Nate Silver’s forecast would have done better had he gotten a few states wrong. 

I really like 538 and they have done a great job. I have one minor quibble with how they represent their forecasts and their results. 538’s forecasts are not binary predictions; they are probability estimates. That means that of all the races the model puts at 70% probability, the favored candidate should win 70% of the time – not 100% of the time. But 538 and many others are saying that he got 50 of 50 states right, and that confuses the picture a lot. What does it mean to call Florida “correctly” when the model gave Obama a 50.1% chance of winning?

The example I used when explaining this to some friends: if a weatherman said there was a 55% chance of rain for 10 straight days, and it rained every day, you could say that the weatherman correctly predicted the weather every day – but the better interpretation is that his estimates were too low. 
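To put a rough number on the analogy (assuming the days are independent and the 55% figure is calibrated), ten rainy days in a row would itself be a rare event:

```python
# Probability of rain on all 10 days if each day truly had a 55% chance
# of rain (days assumed independent).
p_all_ten = 0.55 ** 10
print(f"{p_all_ten:.4f}")  # about 0.0025, i.e. roughly 1 in 400
```

So observing 10 for 10 is strong evidence the 55% estimate was too low, even though each individual "prediction" was nominally correct.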

The code below simply takes 538’s estimates from right before the election, with each probability being the one 538 assigned to the eventual winner. The code assumes that 538’s probability estimates were exactly correct, and shows the distribution of the number of states that would be in error across 10,000 simulations. As you can see, if 538’s estimates were exactly correct, it’s significantly more likely that he would have gotten 1, 2, or 3 states wrong than none at all. 
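The original code isn’t reproduced here; the sketch below shows the idea. The probabilities are illustrative stand-ins (mostly near-certain states plus a handful of competitive ones), not 538’s actual published numbers:

```python
import random
from collections import Counter

# Illustrative stand-ins for the win probability assigned to the eventual
# winner in each of the 50 states (NOT 538's actual figures): 40 safe
# states near certainty, plus ten competitive ones.
probs = [0.999] * 40 + [0.97, 0.95, 0.92, 0.90, 0.85,
                        0.80, 0.77, 0.70, 0.65, 0.501]

def simulate_misses(probs, n_sims=10_000, seed=0):
    """Assuming the stated probabilities are exactly correct, count in each
    simulation how many states the modeled favorite loses."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        misses = sum(rng.random() > p for p in probs)
        counts[misses] += 1
    return counts

counts = simulate_misses(probs)
for k in sorted(counts):
    print(f"{k} misses: {counts[k] / 10_000:.1%}")
```

With numbers like these, runs with 1–3 misses each come up more often than runs with zero misses, which is the point: a perfectly calibrated forecaster *should* get a few states wrong.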

This is not to say that his estimates weren’t spot on – there is a good chance they were. I only want to point out that there is a bit more going on than saying that 538 got 50 of 50 states “right”.

You brought words to a numbers fight

One of the great joys of this election for nerds like me is following the debate about how to track the election. There are a few competing models available to look through, such as the Princeton Election Consortium, Votamatic, FiveThirtyEight – and even UnSkewed Polls.

Each of these modelers has taken a different approach to a difficult problem – namely how to estimate the outcome of an election using available data. There are tradeoffs around each approach and there’s a fairly lively debate about which approach is most enlightening. I am excited to see how they fare tomorrow.

On the other hand, you have a bunch of pundits making proclamations about what will happen, seemingly untethered to the evidence available to them. They show only the results of their internal model (if there is one), not the model itself. Obviously many of them are motivated by something other than the “search for truth”, but even if they weren’t, their contribution would be mostly useless. Here’s why.

First, a model can be useful even if its predictions are wrong, especially when it is predicting things that are compositional in nature (i.e., not single events in and of themselves, but composed of many events). Second, a model can make a correct prediction even when the model itself is wrong.

Most importantly, when two projections disagree, it’s impossible to work out why they disagree unless we know how the models that produced them work in the first place. 

Without a model, you are treating a projection like an opinion. 

So: come with a model. 

Updated simulation w/ more swing-state prices

Moved a bunch of states out of the “safe” categories and into the “swing state” categories with associated prices: Indiana, Arizona, Missouri, Michigan, and Wisconsin. (That may in fact be all of them.)

Doesn’t change the outcome whatsoever. So far this race seems safely in the bag for Obama, with this informal model putting his chance of winning at 97%. 
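For reference, here is a minimal sketch of this kind of price-driven simulation. The safe-state totals and the per-state prices below are placeholders I’ve made up for illustration, not the figures the model actually used (the electoral vote counts are the real 2012 values):

```python
import random

# Sketch of a price-driven election simulation. The safe-state totals and
# the "prices" (treated as Obama's win probability in each state) are
# hypothetical placeholders, not the actual inputs used in the post.
SAFE_EV = {"obama": 237, "romney": 158}  # hypothetical safe electoral votes
SWING = {  # state: (2012 electoral votes, price/probability Obama wins it)
    "FL": (29, 0.35), "OH": (18, 0.65), "VA": (13, 0.55),
    "WI": (10, 0.70), "CO": (9, 0.58), "IA": (6, 0.62),
    "NV": (6, 0.75), "NH": (4, 0.68), "MI": (16, 0.85),
    "AZ": (11, 0.05), "IN": (11, 0.03), "MO": (10, 0.08),
}

def obama_win_prob(n_sims=100_000, seed=1):
    """Flip each swing state independently at its price and return the
    fraction of simulations in which Obama reaches 270 electoral votes."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        ev = SAFE_EV["obama"]
        for votes, price in SWING.values():
            if rng.random() < price:
                ev += votes
        wins += ev >= 270
    return wins / n_sims

print(obama_win_prob())
```

Treating each state as an independent coin flip is the big simplifying assumption here – state outcomes are correlated in reality, and correlation widens the tails of the electoral-vote distribution.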

Obama FTW