Nate Silver first made a name for himself by using statistics to make sports predictions. But, like most people, I became aware of him after he accurately called 49 out of 50 states in the 2008 Presidential election. His fame rose when he called the 2012 election accurately as well, despite many on the right not quite having faith in his numbers.
The core of his technique is nothing magical, although neither will I shortchange him by calling it “obvious” as so many people are wont to do after somebody clever does something new. It’s obvious in hindsight; it wasn’t so obvious before he did it. He’s published a general outline of his methodology after every Presidential election, and you don’t have to be an actual statistician to follow what he’s doing. You do, however, need to have a basic understanding of the underlying statistical methodology. Any undergraduate stats course should let you follow along – conceptually, at least, if not in the details.
Before I dive into my main point, let me emphasize that Mr. Silver’s methodology will work brilliantly the vast majority of the time. His methodology is just about as truly data driven as it’s possible to be. He uses the best data that’s out there. And given his reputation, he can now get access to that data easily. He also uses standard and sound statistical methods.
However, the day will eventually come when Nate Silver will fail – and when it does, he will fail big.
To understand why, we first need to have a basic understanding of his methods. A decade or two ago, somebody had the keen insight that although any individual poll taken during an election season had to be taken with a huge grain of salt, if you average all the polls together you end up with numbers that are pretty reliable. I’m not sure who had the eureka moment first, but Real Clear Politics popularized the concept with their RCP poll average in the early 2000s and it’s been a staple of politics ever since.
Mr. Silver took the concept even further and improved upon it in several ways.
First, he realized that in Presidential politics it was the state polls that mattered – not the national polls. So he computed polling averages for each individual state.
Second, he did historical analysis of each polling company and concluded that some were more reliable than others. He quantified this using standard statistical techniques, and then adjusted his averages by weighting each poll according to its historical reliability. This alone is a big improvement to the RCP model, and its validity shouldn’t be discounted.
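The weighting idea is easy to sketch in code. The numbers and weights below are entirely invented for illustration – Silver derives his pollster weights from a historical accuracy analysis that isn't shown here – but the arithmetic of a reliability-weighted average looks something like this:

```python
# Hypothetical sketch of a reliability-weighted polling average.
# The poll numbers and weights are made up; the real weights come
# from a historical analysis of each pollster's accuracy.

def weighted_average(polls):
    """Each poll is (candidate_share_pct, reliability_weight)."""
    total_weight = sum(w for _, w in polls)
    return sum(share * w for share, w in polls) / total_weight

state_polls = [
    (47.0, 0.9),  # pollster with a strong track record
    (44.0, 0.5),  # mediocre track record
    (49.0, 0.2),  # historically unreliable
]

print(round(weighted_average(state_polls), 2))  # → 46.31
```

Notice that the unreliable pollster's outlying 49% barely moves the result, whereas a simple unweighted average would give it full voice.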
Third, he added other factors into his model: the general state of the economy and how it favors the incumbent; endorsements; experience of the candidate; and several other factors. The predictive value of these factors is smaller, so they're weighted less in his model – but they still count.
Fourth, he improved the whole thing by running Monte Carlo simulations. This is also a giant improvement over the RCP average. Basically, it works like this: you write a simple computer program that takes the poll numbers as given and, using the model you've devised (in this case, points 1 through 3 above), simulates a given election. With the polls, endorsements, etc. as given, you also account for some randomness in the actual results. To do this, you account for the historical error of the polls – if a candidate is polling at, say, 45%, then history might suggest that his actual vote could land anywhere from 40% to 50%, and you can compute a probability curve that matches that range.
Then you run this simulation – a lot. Thousands of times or tens of thousands of times. Let’s say you run it ten thousand times, and out of those ten thousand times, Candidate A wins the election five thousand times: exactly half. You then say that candidate has a 50% chance of winning the election.
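The loop described above can be sketched in a few lines. Everything concrete here is invented – the states, poll averages, error sizes, and electoral-vote counts are toy values, and the normal-distribution error model is a simplification of whatever distributions Silver actually fits – but the shape of the simulation is the same:

```python
# Toy Monte Carlo election simulation in the spirit of the method
# described above. All states, polls, errors, and electoral-vote
# counts are invented for illustration.
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# state: (Candidate A's poll average %, historical error as std dev, electoral votes)
states = {
    "Alpha": (52.0, 3.0, 10),
    "Beta":  (48.5, 3.0, 20),
    "Gamma": (50.5, 4.0, 15),
}

def simulate_once():
    """Simulate one election: perturb each state's polls by its
    historical error, then tally the electoral votes."""
    total_ev = sum(ev for _, _, ev in states.values())
    a_votes = sum(
        ev
        for share, error, ev in states.values()
        if random.gauss(share, error) > 50.0  # A carries the state
    )
    return a_votes > total_ev / 2

def win_probability(n=10_000):
    wins = sum(simulate_once() for _ in range(n))
    return wins / n

print(f"Candidate A wins about {win_probability():.0%} of simulated elections")
```

The fraction of simulated elections a candidate wins is the probability the model reports – which is why Silver's headline numbers read "X% chance of winning" rather than a single predicted vote share.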
The methodology is pretty sound. But it has some serious flaws, and because of these, eventually Nate Silver will fail. Here are the problems.
First, the model requires that the input polling data be good. If the polls aren't good, then Silver's model isn't any good either. Note that it doesn't require any individual poll to be perfect. But it does require a few things. Each poll should generally fall within, or close to, its historical margin of error. And the polls should cancel out each other's errors. In other words, if one poll gets Candidate A's share of the vote too high, a competitor's poll should get it too low. If both polls are wrong in the same direction, then averaging them doesn't help.
There is strong evidence – even documented by Silver himself – that the polls are getting worse. Indeed, the polling companies are having so much trouble that Gallup has stopped polling the Presidential races altogether. There’s also evidence that the polls have started to weight their data so that they match more closely to other polls. That skews their value and makes them less reliable. So the polls themselves are a problem – and a growing one.
Second, polling long before an election is hugely inaccurate. Accuracy increases greatly the closer a poll is taken to the actual election. This is why Mr. Silver’s 2008 and 2012 predictions weren’t magic: the “predictions” relied on polls taken within days of the election. With respect to Mr. Silver, this accomplishment isn’t as big as many made it out to be. At that point, the polls are generally pretty accurate. His achievement was simply to look at the right polls.
To be fair to Mr. Silver, he's quite aware of this problem and has discussed it at some length. He refrains from even making predictions before certain points in the campaign, and he's the first to tell you that they're of little value even when he begins them. However, having his predictions become accurate only days or a few weeks before the actual election robs them of much of the value of a prediction. It doesn't make them worthless, mind you, just of small utility for most of us.
But the real problem isn’t even those issues, as bad as they might be. The real problem is that the map is not the territory. Mr. Silver has constructed a wonderful model of elections. But it’s just that: a model. It is not the reality.
The biggest area where this will eventually bite him is in the non-polling factors that he includes. For instance, months ago Mr. Silver was claiming that Donald Trump’s low favorability ratings put a cap on the support he’d manage to get at the polls. He made the claim in several places, but this piece from July 2015 is the one I managed to find with a few seconds of Googling. In it he claims that candidates with Trump’s net favorability ratings rarely grow beyond 20 or 30% of the vote. As of this writing, the RCP average has Trump at 29% in Iowa (about to break that ceiling), 32.2% in New Hampshire (broke the ceiling) and 34.8% nationally (shattered it). A poll released today shows that he’s nearing 50% in Florida.
What happened? Trump’s favorability changed – a lot. Gallup last week showed him at +27% among Republicans, up 23 points from where Silver had him in the July piece listed above. That’s yuge.
Again, as I noted above – the map is not the territory. Silver’s model, as good as it is, doesn’t account for this kind of thing to happen. Now, it’s easy to say, “let’s update the model to allow for the off chance of someone increasing his favorability.” Fine. But the underlying problem is that favorability doesn’t directly predict anything. It’s a proxy.
Think of it this way: there's no ironclad law of physics that says that a candidate with low favorability ratings can't win. Mr. Silver has merely observed that so far, in the elections we've seen, this hasn't happened. It seems to have a strong correlation with the winner. But correlation does not equal causation. In this particular case, the variables are probably weakly linked. That is, how favorably the electorate views a candidate probably does have some impact on how they eventually vote. But it's not a perfect match.
Mr. Silver will readily admit this, and that's why the value is weighted relatively small compared to other data. But the problem is that all of Mr. Silver's data is intrinsically a proxy, including the actual polling data. How people say they'll vote is not the same thing as how they'll actually vote. The correlation is high, but correlation is not causation.
Someday we'll hit a point in the territory where the map doesn't agree with it. When that happens, we'll have no choice but to conclude that the map is wrong. As they say in sports, there's a reason they play the games.
There's good reason to suspect that this election cycle may be it. Mr. Silver has been giving Mr. Trump roughly 5% odds of winning the nomination, based mostly on his model. Personally, I think his model is wrong in this specific case. "This time is different" gets called a lot and is rarely true. But sometimes it is true: sometimes this time really is different. By all outside appearances, this election certainly seems to be one of those cases. I believe that Mr. Silver has too much invested in his model to be able to step back and honestly admit that it may not cover this case. Again, to be fair to Mr. Silver, I don't believe this is a conscious choice. But I think it's real.
But this may not be the time, either. It may well be that this time Mr. Silver is right again and I am wrong. I fully accept that, and I’ll admit it here if it’s the case. But even if this time isn’t the one, sooner or later Nate Silver will fail – and it will be yuge.