As the US presidential elections wind their way towards a painful and tortuous conclusion, I’ve been thinking a lot about the difference between reporting on a survey that offers a snapshot in time, as opposed to using polls, and past history, to predict what the results might be in a few days, weeks, or months.
My colleagues and I have done lots of surveys over the last eight years in Colombia, India, Mexico, Morocco, Nigeria, and the USA. In all this work, we offered estimates of how many people in a given population engaged in this or that behavior, supported this or that policy, or thought positively or negatively about a particular organization or institution. There is always a confidence interval around every one of those estimates, influenced in large part by the number of valid responses. This allows you to answer questions like, “what percentage of the population living in the Moroccan cities of Casablanca or Rabat supported this or that political party in May 2015?” Or, “what is the statistical association between attending mosque and believing that women’s rights are human rights?”
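To make the relationship between sample size and the confidence interval concrete, here is a minimal sketch using the standard normal-approximation interval for a proportion (the support figure of 42% and the sample sizes are invented for illustration):

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """95% confidence interval for an estimated proportion.

    p_hat: estimated share (e.g. 0.42 means 42% support)
    n: number of valid responses
    z: critical value (1.96 for a 95% interval)
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# The interval tightens as the number of valid responses grows:
lo, hi = proportion_ci(0.42, 400)     # roughly +/- 4.8 points
lo2, hi2 = proportion_ci(0.42, 1600)  # roughly +/- 2.4 points
```

Quadrupling the number of valid responses only halves the margin of error, which is why shrinking those intervals gets expensive quickly.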
If I wanted to ask how many people were likely to vote for a specific Moroccan political party in the future, however, I’d have to do so much more than extrapolate from the previous work I’d done with my colleagues. I’d want to have more polls over time, so that I could track trends, and I’d want to figure out a way of predicting who would be likely to actually participate in the election, assuming that participation was voluntary.
In the US, pollsters did not do as good a job as we had hoped for the 2020 presidential elections, leading at least one New York Times op-ed writer and survey expert to ask, “Can We Finally Agree to Ignore Election Forecasts?” There is a lot of back and forth on the issue. (For a pollster’s defense of his industry’s 2020 performance, see Mark Mellman’s recent piece in The Washington Post.)
The biggest problem this time around, as in 2016, was that most pollsters under-estimated the pro-Trump vote in a number of key states. They did so by missing the roughly 3-4% of Trump supporters who refused to answer pollsters, for one reason or another; by under-estimating the percentage of Hispanic voters who supported Trump in Florida and Texas; and by under-estimating the number of Trump supporters who would actually turn out to vote. US pollsters, in other words, don’t have as good a grasp on the political views and behavior of non-college educated whites and Hispanics as they would like.
Political scientists will come up with more and better explanations in the coming months as they and their graduate students sit down with the 2020 polling data and analyze it carefully, state by state and demographic segment by demographic segment. They will also want to re-examine the 2018 and 2016 electoral data, and re-interpret the lessons learned then in light of the 2020 results. Then, pollsters will incorporate those lessons into their weighting algorithms and turnout predictions, hoping to do better in 2024.
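The weighting step mentioned above can be illustrated with a toy post-stratification example: respondents are reweighted so the sample’s demographic mix matches assumed population targets. Every number here, including the education shares and support levels, is invented purely for illustration:

```python
# Toy post-stratification: reweight respondents so the sample's
# educational mix matches (assumed) population shares.
# All figures below are invented for illustration only.

sample = {"college": 700, "non_college": 300}              # respondents per group
population_share = {"college": 0.40, "non_college": 0.60}  # census-style targets

n = sum(sample.values())
# Weight = target share / observed share; over-represented groups
# get weights below 1, under-represented groups above 1.
weights = {g: population_share[g] / (sample[g] / n) for g in sample}

# Weighted support estimate, given (invented) raw support by group:
raw_support = {"college": 0.35, "non_college": 0.55}
weighted = sum(raw_support[g] * population_share[g] for g in sample)
```

The catch the essay points to is that weights can only correct for who answered, not for whether the respondents you did reach in a group actually resemble the group members who refused to answer at all.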
One problem that cannot be fixed, however, is that pollsters, survey experts, and anyone else in the business of estimating the views of the non-college educated Americans who support Trump are, by definition, college educated; it’s hard to be a survey expert, pollster, or commentator without higher education. The non-membership of pollsters in the very demographic group that they are trying so hard to understand is thus hard-wired into the entire enterprise. This problem, moreover, can never, ever go away. By definition. It’s as if all the pollsters in the US were white and monolingual, and yet nonetheless hoped to really understand the views of people of color, or people who speak only Spanish. It’s really hard. Even the best and most sensitive ethnographers, such as the UC Berkeley super-sociologist Arlie Hochschild (author of “Strangers in Their Own Land”), will invariably find themselves viewing non-college educated Americans as a foreign tribe.