Why Do Polls Fail? The Case of Four US Presidential Elections, Brexit, and Two India General Elections

Andreas N. Philippou

July 29, 2021

Abstract: One of the most widely known and important applications of probability and statistics is scientific polling to forecast election results. In 1936, Gallup correctly predicted the victory of Roosevelt over Landon in the US presidential election, using scientific sampling of a few thousand persons, whereas the Literary Digest failed using 2.4 million answers to 10 million questionnaires mailed to automobile and telephone owners. Since then, polls have grown into a flourishing, highly influential, multibillion-dollar industry, spreading around the world. Polls have mostly been accurate in the US presidential elections, with a few exceptions. Their two most notable failures were their wrong predictions of the 1948 and 2016 US presidential elections. Most polls also failed in the 2016 UK Referendum, in the 2014 and 2019 India Lok Sabha elections, and in the 2020 US presidential election, even though in the latter three they did predict the winner. We discuss these polls in the present paper. The failure in 1948 was due to non-random sampling. In 2016 and 2020 it was mainly due to the problem of non-response and possible biases of the pollsters. In 2014 and 2019 it was due to non-response and political biases of the polling agencies and news outlets that produced the polls.

Let $p$ be the unknown proportion of members in a population who possess an attribute. If, in a random sample of $n$ members from this population, $r$ members are found to possess this attribute, then $\hat{p} = r/n$ is used to estimate $p$. Sometimes people write $p = r/n \pm E$, where $E$ is the margin of error of the estimate. By the Central Limit Theorem, for large $n$, $|\hat{p} - p| \le 1.96\sqrt{p(1-p)/n}$ with probability $\approx 95\%$. Since $\hat{p} = r/n$, it may be stated with approximate probability 95% that $p = r/n \pm E$, where $E = 1.96\sqrt{(r/n)(1 - r/n)/n}$, with $E \le 3\%$ for $n = 1{,}068$ and $E \le 3.5\%$ for $n = 800$, since $(r/n)(1 - r/n) \le 1/4$.

The above confidence interval for $p$, based on simple random sampling, has essentially been the basis for polling since 1948, when the pollsters failed to predict Truman's victory using quota sampling and decided to replace it with probability sampling. Twelve years earlier, in 1936, Dr. George Gallup [2] (see also Squire [21] and Warren [22]) correctly predicted the victory of Roosevelt over Landon in the US presidential election, using only 50,000 responses to a "scientific sample", whereas the Literary Digest failed using 2.4 million answers to a "straw sample" of mailed questionnaires. Since 1936, polls have grown into a flourishing, highly influential, multibillion-dollar industry, spreading around the world. They have mostly been accurate in the US presidential elections, with a few exceptions. Their two most notable failures were their wrong predictions of the 1948 and 2016 US presidential elections. The polls also failed in the 2016 UK Referendum, in the 2014 and 2019 India Lok Sabha elections, and in the 2020 US presidential election, even though in the latter three they did predict the winner. The failure in 1948 was due to non-random sampling. In 2016 and 2020 it was mainly due to the problem of non-response. In 2014 and 2019 it was due to non-response and political biases of the polling agencies and news outlets that produced the polls.
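As a quick numerical illustration of the confidence interval above, the following minimal Python sketch computes $\hat{p}$ and the 95% margin of error $E$; the counts used ($n = 1{,}068$, $r = 550$) are hypothetical, chosen only to show that $E$ stays at or below 3% at that sample size.

```python
import math

def margin_of_error(r: int, n: int, z: float = 1.96) -> float:
    """95% margin of error for the sample proportion r/n (z = 1.96)."""
    p_hat = r / n
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 550 of 1,068 respondents favor the candidate.
n, r = 1068, 550
E = margin_of_error(r, n)
print(f"estimate: {r / n:.3f} +/- {E:.3f}")  # E <= 0.030 for any r when n = 1068
```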
For decades, polls were typically conducted by telephone, using live interviewers. Today, internet surveys, random digit dial (RDD) telephone polls, and Interactive Voice Response (IVR) polls are increasingly common. Most sampled persons, more than 90%, refuse to answer polls. In particular, according to David Shor [12, 13], the whiz-kid of the 2012 Obama presidential campaign, roughly 1 percent of people respond to RDD polls. But those who respond to polls are "weird": they are not the same as those who do not, and this biases the polls.

In one form or another, opinion polls have been part of the American scene for more than 150 years; they gradually became more scientific, and their use spread around most of the world. During 1920-1932, the mass periodical Literary Digest became very famous for its successful predictions of the winner of the US presidential elections, based on very large samples of persons. In 1936, however, the series of successes ended dramatically, and the Digest ceased publication in 1938. The periodical mailed 10 million "ballot-questionnaires" to automobile and telephone owners and, on the basis of about 2.4 million "ballots" (2,376,523 to be exact) received back, it predicted a 3-to-2 landslide victory for Alfred Landon against Franklin Roosevelt. A landslide victory it was, but the winner, with 61% of the vote, was Roosevelt, who became US President again, not Landon. There were two reasons for the failure of the Digest. First, the "straw sample" of 10 million recipients was not representative of the American voters, and, second and more important, the 2.4 million respondents were not representative of the 10 million recipients (the 7.6 million non-respondents were different). According to Squire [21], who used data from a 1937 Gallup survey which asked about participation in the Literary Digest poll, the magazine's sample and the response were both biased and jointly produced the wildly incorrect estimate of the vote. However, he states, if all of those who were sampled had responded, the magazine would have at least predicted Roosevelt as the winner. See also Bryson [2], who disputes the first reason altogether and writes about the making of a statistical myth. In contrast, Gallup, Roper and Crossley used "scientific sampling methods" designed to include the proper proportion of voters from each economic stratum, not just those who owned automobiles and phones, and their predictions, based on 50,000 responses, were closer to the actual landslide victory of Roosevelt. See [2], [21] and [22].

All three failed, however, 12 years later. Gallup, Roper and Crossley wrongly predicted Dewey's victory, with 6, 15 and 5 percentage points more than Truman's, respectively, when in fact Truman won the US Presidency with a margin of 5 points and 114 more Electors than Dewey. All three pollsters had used quota sampling, with interviewers, in order to ensure that the sample represented the voters in various strata (residential area, sex, age, race, economic status). Each interviewer was assigned specified quotas in each stratum. The Chicago Tribune felt so confident in the polls that on the night of the election it went ahead and printed the following morning's edition with the headline DEWEY DEFEATS TRUMAN. The photograph of President Truman holding up that paper is one of the most famous images in American politics. What went wrong is that each interviewer was free to pick the voters in each category any way he pleased, not randomly. Evidently, more Republicans were interviewed than Democrats.
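The 1948 failure, like the Digest's, was a selection problem: the people actually interviewed differed systematically from the electorate. The following small Python simulation illustrates how even a modest differential in who responds translates into a large polling error; the population size, candidate shares, and response rates are all invented for the example, not taken from 1948 or any real poll.

```python
import random

random.seed(0)

# Hypothetical population: 50% support candidate A overall, but A's
# supporters respond to the poll at 1.5% while B's respond at 1.0%.
N = 1_000_000
voters = ["A" if i < N // 2 else "B" for i in range(N)]
response_rate = {"A": 0.015, "B": 0.010}

respondents = [v for v in voters if random.random() < response_rate[v]]
poll_share_A = sum(v == "A" for v in respondents) / len(respondents)

print("true share of A:   0.500")
print(f"polled share of A: {poll_share_A:.3f}")  # roughly 0.60: a 10-point bias
```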
In the aftermath, the three pollsters and other members of AAPOR, which was founded in 1947, held a meeting in Iowa City and developed industry standards for public opinion polls, inaugurating a shift from quota sampling to probability sampling. For decades, starting in the 1970s, polls were typically conducted by telephone, using live interviewers. In 1980, the ASA Section on Survey Research Methods published the booklet What Is a Survey? by R. Ferber et al. [5] (see also Fritz Scheuren [15] for an updated version of the same title) to help avoid mistakes and to promote a better understanding of what is involved in carrying out sample surveys correctly, especially those aspects that have to be taken into account in evaluating the results of surveys. The sample from the target population should be taken randomly, in order to be representative of the population, and non-response should be avoided or minimized as much as possible. Nowadays, internet surveys, random digit dial (RDD) telephone polls, and Interactive Voice Response (IVR) polls are increasingly common (see, for example, [8], [10] and [13]). But now more people refuse to respond, 90 to 99 percent, and this biases the polls despite weighting adjustments for education, age, race, gender, etc.

Donald Trump's victory in 2016, like that of Harry Truman in 1948, is considered one of the greatest political upsets in modern US history [6]. What went wrong with the polls? The polls did predict the winner of the national vote, albeit Clinton's recorded win margin of 2.1% was smaller than the margin predicted by most polls (4% by the CNN Poll of Polls). But they failed to observe the swing to Trump of many white blue-collar workers within the Great Lakes and Rust Belt regions, to whom his rhetoric appealed, especially in Michigan, Pennsylvania and Wisconsin. The US Election 2016 poll tracker of the Financial Times (ft.com), for example, on the day of the election found Clinton winning Wisconsin by a margin of 6.5%. Trump managed to win these states, and he won the election with 304 electoral votes.

According to Kennedy et al. [11], a committee commissioned by AAPOR and headed by Kennedy conducted an extensive investigation of the performance of pre-election polls in 2016. While the general public reaction was that the polls failed, the committee found the reality to be more complex. Some polls indeed had large problematic errors, but many polls did not. In particular, the national polls were generally correct (with respect to the popular vote) and accurate by historical standards. The most glaring problems occurred in state-level general election polling, particularly in the Upper Midwest. The committee evaluated a number of different theories as to why so many polls underestimated support for Donald Trump. The explanations for which the most evidence exists are a late swing in vote preference toward Trump and a pervasive failure to adjust for the overrepresentation of college graduates (who favored Clinton). In addition, there is clear evidence that voter turnout changed from 2012 to 2016 in ways that favored Trump. Despite widespread speculation, there is little evidence that socially desirable responding was a major contributor to poll error. If there was a "Shy Trump" effect on responses, it does not appear to have been particularly large. One encouraging result from the historical analysis, they state, is that there is no systematic bias toward one major party or the other in US polling.
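The weighting adjustments mentioned above can be illustrated with a minimal post-stratification sketch in Python. The education shares and support figures below are invented for the example; they merely echo the committee's point about overrepresented college graduates.

```python
# Minimal sketch of weighting (post-stratification) on a single variable,
# education, with invented numbers for illustration only.

population_share = {"college": 0.35, "non_college": 0.65}  # assumed electorate
sample_share     = {"college": 0.50, "non_college": 0.50}  # assumed poll sample

# Candidate support observed within each group of the hypothetical sample.
support = {"college": 0.58, "non_college": 0.44}

# Each group's weight: its population share divided by its sample share.
weight = {g: population_share[g] / sample_share[g] for g in population_share}

unweighted = sum(sample_share[g] * support[g] for g in support)
weighted   = sum(sample_share[g] * weight[g] * support[g] for g in support)

print(f"unweighted estimate: {unweighted:.3f}")  # 0.510 (college vote overstated)
print(f"weighted estimate:   {weighted:.3f}")    # 0.489
```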
One broader question raised is whether the polling problems of 2016 could reoccur. The 2016 election featured a number of unusual circumstances that are perhaps unlikely to repeat (e.g., both major party candidates being historically unpopular, split outcomes in the popular vote and the Electoral College, nearly 14 million votes across three states breaking for a candidate by about 0.5%), but several structural weaknesses of polls are likely to persist. Errors in state polls like those observed in 2016 are not uncommon, even though 2016 was a particularly bad election for state polls. Finally, a late swing in favor of one candidate (as appears to have occurred in 2016) is not something that pollsters can necessarily guard against, other than by polling closer to Election Day.

Although Clinton conceded defeat to Donald Trump, the Democrats opposed him fiercely, never accepting his victory. On December 18, 2019, Donald Trump was impeached by the House of Representatives on charges of abuse of power and obstruction of Congress, but he was acquitted by the Senate on February 5, 2020. In the 2020 US presidential election Biden defeated incumbent President Trump, as projected by almost all pollsters, even though his national vote lead of 4.4% was considerably smaller than the lead projected by most of them (10% by the CNN Poll of Polls as of 11/2/2020). The errors of the polls in Michigan, Pennsylvania and Wisconsin, which underrepresented Republicans, were even bigger. Overall, the errors of the polls in 2020 were bigger than in 2016, underestimating Trump's national support and slightly overestimating Biden's, mainly due to the problem of non-response. But the polls correctly predicted the victory of Biden. According to a task force of AAPOR headed by Josh Clinton [3], which examined 2,858 national and state-level election polls, the polling error was of "unusual magnitude", resulting in the worst performance of national polls in 40 years and of state polls in 20 years. Among polls conducted in the last two weeks before the election, the average signed error on the vote margin was too favorable for Biden by 3.9% in the national polls and by 4.3% in the state polls. See also Panagopoulos [13], who found that the pro-Democratic bias of the polls in 2020 was systematic. According to David Shor [12, 13], a successful data scientist and whiz-kid of the 2012 Obama presidential campaign, every "high-quality public pollster" in the USA now does random digit dialing, and roughly only 1 percent of people respond. Then, despite weighting for education, age, race, and gender, the pollsters fail, because respondents are quite different from non-respondents.
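For clarity, the "average signed error on the vote margin" quoted above can be computed as in the short Python sketch below. The individual poll margins are invented for illustration; only the 4.4% actual national margin comes from the text. The sign is retained so that errors favoring one candidate do not cancel against errors favoring the other.

```python
# Signed error on the vote margin: predicted (Biden - Trump) margin minus
# the actual margin, averaged with the sign retained. Poll values invented.
predicted_margins = [10.0, 8.0, 6.5, 7.5]  # hypothetical final-week polls, %
actual_margin = 4.4                        # national popular-vote margin, %

errors = [m - actual_margin for m in predicted_margins]
average_signed_error = sum(errors) / len(errors)

print(f"average signed error: {average_signed_error:+.1f} points toward Biden")
```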
The United Kingdom joined the European Economic Community in 1973. On 23 June 2016, in a Referendum, the UK voted to Leave the European Union by 51.89% to 48.11% for Remain, a margin of 3.78%. However, even on the day of the Referendum a YouGov poll predicted 52% for Remain to 48% for Leave, using a sample of 4,772 voters! The following day, the British Polling Council [1] reported its analysis of the final EU referendum polls: "Seven member companies issued 'final' polls of voting intentions in the EU referendum. While no company forecast the eventual result exactly, in three cases the result was within the poll's margin of error of plus or minus three points, and in one of them Leave was correctly estimated to be ahead. In the four remaining cases, however, support for Remain was clearly overestimated. This is obviously a disappointing result for the pollsters, and for the BPC, especially because every single poll, even those within sampling error, overstated the Remain vote share." The error is mainly due to the problem of non-response. It is also likely that the pollsters were biased in favor of Remain.

The 2014 India Lok Sabha election was held from 7 April to 12 May 2014. About 834 million people were eligible to vote, and turnout was over 66 per cent, the highest until then. The National Democratic Alliance (led by the Bharatiya Janata Party) won 336 seats, the United Progressive Alliance (led by the Indian National Congress) won 59 seats, and others won 149 seats. It was the greatest upset in Indian political history. Almost all polls, including the exit polls, grossly underestimated the strength of the NDA and overestimated the strength of the UPA, even though they did predict the victory of the NDA. Jonah Force Hill [7] of Harvard, writing in The Diplomat ahead of the 2014 Lok Sabha (Lower House of Parliament) elections, stated that "Election polling in India is a notoriously unreliable exercise. It suffers from the political biases of the polling agencies and news outlets that produce the polls". He continued: "A more serious challenge to reliability comes from operational problems inherent in India's mammoth electorate, complex demographics, daunting geography and poor infrastructure, all of which make accurate polling an immensely labor intensive, expensive and often-dubious process". See also Praveen Rai [16].

The 2019 India Lok Sabha election was held from 11 April to 19 May 2019. About 911 million people were eligible to vote, and turnout was over 67 per cent, the highest ever. The National Democratic Alliance won 353 seats, the United Progressive Alliance won 91 seats, and others won 98 seats. Most of the polls in 2019, including the exit polls [12], underestimated the NDA and overestimated the UPA, even though they did predict the winner. However, two exit polls [19, 20] were almost excellent.

In ending the paper, we make the following remarks.

1. The success of Gallup in 1936, who used a small "scientific sample" of voters responding to interviewers, in contrast to the failure of the Literary Digest, which mailed millions of "ballot-questionnaires" and received back as responding "ballots" only about 24% of them, was the beginning of successful scientific polling in the United States, albeit with a few failures. Polls then spread throughout most of the countries of the world.

2. The two most notable failures of pollsters in the US were their wrong predictions of the 1948 and 2016 US Presidential Elections. In 1948, the failure was clearly due to non-random sampling. In 2016, it was mainly due to a very high non-response rate, 90 to 99 percent, and the resulting bias despite weighting adjustments. It is true that the national polls did predict correctly the winner of the national vote. Several state polls, however, especially in Michigan, Pennsylvania and Wisconsin, failed to observe the swing to Trump of many white blue-collar workers. Even on the day of the election, Clinton was predicted to win Wisconsin by a margin of 6.5%! Trump won all these states, reaching 306 electors to Clinton's 232, and became President receiving 304 votes to Clinton's 227, as seven electors defected.

3. In the 2020 US Presidential Election, almost all pollsters correctly predicted Biden's victory over incumbent President Trump, even though the national vote lead was considerably smaller than the lead projected by most of them.
After examining more than 2,800 national and state-level election polls, a task force of AAPOR found that the polling error was of "unusual magnitude", resulting in the worst performance of national polls in 40 years and of state polls in 20 years. Among polls conducted in the last two weeks before the election, the average signed error on the vote margin was too favorable for Biden by 3.9% in the national polls and by 4.3% in the state polls. Panagopoulos [15] found that the pro-Democratic bias of the polls in 2020 was systematic.

4. In the 2016 UK Brexit Referendum, the UK voted to leave the European Union by a margin of 3.78%. However, even on the day of the Referendum a YouGov poll predicted a margin of 4% for Remain, using a sample of almost 5,000! As the British Polling Council reported, seven member companies issued 'final' polls of voting intentions in the EU referendum. In three cases the result was within the poll's margin of error of plus or minus three points. In the four remaining cases, however, support for Remain was clearly overestimated. This is obviously a disappointing result for the pollsters, especially because every single poll, even those within sampling error, overstated the Remain vote share. The error is mainly due to the problem of non-response. It is also likely that the pollsters were not unbiased.

5. In the 2014 India Lok Sabha Elections, the National Democratic Alliance won 336 seats, the United Progressive Alliance won 59 seats, and others won the remaining seats. Almost all polls, including exit polls, grossly underestimated the strength of the NDA and overestimated the strength of the UPA, mainly due to non-response and the political biases of the polling agencies and news outlets that produced the polls.

6. In the 2019 India Lok Sabha Elections, the National Democratic Alliance won 353 seats, the United Progressive Alliance won 91 seats, and others won the remaining seats. Although most of the polls and the exit polls underestimated the strength of the NDA and overestimated the strength of the UPA, they did predict the winner. Two of the exit polls were excellent.

7. The main problem in polling is the problem of non-response, despite weighting adjustments. The people who answer questions in polls are different from the kind of people who refuse to answer, and this biases the polls.

References

[1] Performance of the polls in the EU referendum. British Polling Council (britishpollingcouncil.org).
[2] The Literary Digest Poll: The Making of a Statistical Myth.
[3] 2020 Pre-Election Polling: An Evaluation of the 2020 General Election Polls. Report of a Task Force.
[4] Opinion polling in a democracy.
[5] What Is a Survey? Booklet.
[6] Flashback: It's happened before: Truman's defeat of Dewey had hints of Trump-Clinton.
[7] In India's national elections don't trust the polls. The Diplomat.
[8] The evolution of election polling in the United States.
[9] Elements of Statistical Inference, 4th edition, 4th printing.
[10] Gauging the Impact of Growing Nonresponse on Estimates from a National RDD Telephone Survey.
[11] An evaluation of the 2016 Election Polls in the United States.
[12] David Shor's Unified Theory of American Politics.
[13] One pollster's explanation for why the polls got it wrong. Election results: Why the polls got it wrong. Vox.
[14] Lok Sabha Elections 2019 Exit Poll Results.
[15] Polls and Elections: Accuracy and Bias in the 2020 U.S. General Election Polls.
[16] Fallibility of opinion polls in India.
[17] What Is a Survey? Booklet.
[18] Statistical Methods, 6th edition, 7th printing. The Iowa State University Press.
[19] Lok Sabha.
[20] India Today-Axis My India exit poll gets Lok Sabha result spot on.
[21] Why the 1936 Literary Digest Poll failed.
[22] Public Opinion Polls.