I owe many people an apology today. I was extremely confident that HRC had this election in the bag and presumptuously advertised this confidence over the past weeks, dismissing any thought that she would lose. It wasn't an act, and it has made it all the more devastating to me now that she has in fact lost.
Because of my work with Gradient, which does statistical modeling in a business setting, over this cycle a lot of people have asked me for my interpretation of the various forecasting models. When discussing it with them, I was bullish, overconfident, and as it turns out, terribly wrong. I feel terrible for giving people this false impression of security and making the result any more jarring and devastating than it already is.
What was I wrong about? Well, almost everything - but two very large things stand out: polling bias and the correlation of polling errors. In general there are two types of statistical error: bias and variance. Variance is when you're dancing around the right result - any one measurement is off, but on average the errors cancel out; bias is when the errors don't cancel. Polling bias in presidential elections (even in state polls) has been estimated over many cycles and has typically been small (about 1%). What polling bias means in concrete terms here is that many more people voted for Trump than the polls captured; there was a social movement happening in front of our eyes that did not show up in the data. I trusted the data and did not foresee the possibility that the polling bias would be so large across the board in swing states.
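To make the bias/variance distinction concrete, here is a minimal sketch in Python. Every number in it (the true margin, the noise level, the size of the bias) is made up for illustration and is not an estimate from real polls; the function name simulate_polls is likewise just a hypothetical helper.

```python
import random
import statistics

def simulate_polls(true_margin, bias, noise_sd, n_polls, seed=0):
    """Return simulated poll margins: truth + systematic bias + random noise."""
    rng = random.Random(seed)
    return [true_margin + bias + rng.gauss(0.0, noise_sd) for _ in range(n_polls)]

# Illustrative assumption: margin = candidate A minus candidate B, in points.
TRUE_MARGIN = -1.0

# Variance only: each poll is off, but the errors cancel in the average.
noisy_polls = simulate_polls(TRUE_MARGIN, bias=0.0, noise_sd=3.0, n_polls=2000)

# Bias: every poll is shifted the same way, so averaging does not help.
biased_polls = simulate_polls(TRUE_MARGIN, bias=4.0, noise_sd=3.0, n_polls=2000)

noisy_avg_error = statistics.mean(noisy_polls) - TRUE_MARGIN
biased_avg_error = statistics.mean(biased_polls) - TRUE_MARGIN

print(f"average error with variance only: {noisy_avg_error:+.2f} points")
print(f"average error with bias:          {biased_avg_error:+.2f} points")
```

Averaging more polls shrinks the first error toward zero but leaves the second one sitting near the assumed bias, which is why a poll average can look very stable and still be wrong.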
That leads to the next point. State results are obviously correlated - blue and red states tend to vote together. This means that polls of those states should be correlated too. So outcomes and the measurements of those outcomes should be correlated - but should the errors be correlated? I had no reason to think so. For example, polls had HRC ahead in Michigan and Wisconsin; I thought that all the correlation between the two would be captured by the correlated polling results across the two states. I did not think that a polling error in one state made a polling error in the same direction in the other state more likely. It is obvious now that it did.
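A rough simulation shows why this assumption matters so much. The sketch below is not any forecaster's actual model: the 4-point leads, the error sizes, and the helper prob_lose_both are all assumptions chosen for illustration. It compares two states whose polling errors are independent with two states whose errors share a common component, keeping the total error in each state the same.

```python
import math
import random

def prob_lose_both(lead_a, lead_b, shared_sd, state_sd, n_sims=20000, seed=0):
    """Estimate the chance that the poll leader loses BOTH states.

    Each state's polling error = shared error + state-specific error.
    shared_sd = 0 means the two states' errors are independent.
    """
    rng = random.Random(seed)
    both_lost = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, shared_sd)  # error common to both states
        result_a = lead_a + shared + rng.gauss(0.0, state_sd)
        result_b = lead_b + shared + rng.gauss(0.0, state_sd)
        if result_a < 0 and result_b < 0:
            both_lost += 1
    return both_lost / n_sims

# Illustrative assumption: a 4-point poll lead in each state and a total
# error standard deviation of 4 points per state in both scenarios.
p_independent = prob_lose_both(4.0, 4.0, shared_sd=0.0, state_sd=4.0)
p_correlated = prob_lose_both(4.0, 4.0, shared_sd=math.sqrt(12.0), state_sd=2.0)

print(f"lose both, independent errors: {p_independent:.3f}")
print(f"lose both, correlated errors:  {p_correlated:.3f}")
```

Under these made-up numbers each state on its own is equally uncertain in both scenarios; only the chance of losing both together changes, and it is noticeably higher when the errors move together.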
This election is going to prompt a rethink across the entire polling and data analysis industry - myself very much included, even though politics is not my professional remit. But that's small potatoes compared to the vastly more consequential implications of this election: climate change; respect for women, immigrants, and Muslims; the Supreme Court; our security and trade relationships around the world; the nuclear codes; and so on.
As for what happens next, I hope you agree with me that it's critical that we stay engaged with our country's politics as opposed to recoiling in horror. Quite literally, I think our country needs us and people like us to fight tooth and nail to preserve our vision of what America is and what makes it great.