Tuesday, November 22, 2016

A complete guide to the election day data prediction mishap - oops, we forgot about Johnson and errors

  • The New York Times, a top journal with a top data visualization team, completely failed to predict a Trump win. It predicted an 85% chance of a Clinton victory.
  • Nate Silver, founder of FiveThirtyEight and the guy famous for his near-perfect prediction of the 2008 Obama election results, predicted that Clinton had about a 2/3 chance of winning. He also said Trump had a 30% chance of winning, significantly higher than most people expected. People with an extreme distaste for Trump probably expected a chance close to 0.
    • Nate Silver later posted on his Twitter: "A: There's a 30% chance of an earthquake. B: LOL ur crazy no way it's that high. *earthquake* B: Idiot! You said a 70% chance of no earthquake."
  • We all forgot about Johnson, the third-party contender whose presence might have "stolen" the 1-5% margin Clinton so badly needed, leaving her narrowly losing to Trump in key Democratic states. Many forgot that these marginal alternative votes for Johnson cost her key states like Pennsylvania, where her vote share was only about a percentage point away from Trump's. Personally, I believe that the existence of a third candidate caused Clinton to lose swing states. In exercising their right to vote for an alternative, many voters accidentally handed the presidency to Trump, whom they would not have voted for at all. It was an unintended mathematical result of voting alternative.
  • Practitioners of traditional sampling and polling methods claimed, postmortem, to have been "blindsided". They claimed the root data was wrong. Specifically, many blamed the early exit polls.
    • Postmortem examination of this method shows clear selection bias: more vocal people were more likely to reveal their votes, as were those whose votes matched popular expectations.
    • Margin of error. Columbia University researcher Andrew Gelman found that the margin of error of such polling can be as high as 7 percentage points in either direction, a 14-point range. An estimated 50% vote share is actually anywhere from 43% to 57%. Crazy, that's the difference between a majority win and a loss.
  • The Huffington Post, a Silicon Valley and women-friendly media outlet, claimed that Clinton had a more than 90% chance of winning.
  • On election night, people watched in utter shock as Trump racked up Electoral College votes in what looked like a landslide.
  • The updated NYTimes visualization below shows the stunning "upset" where Trump overtook Clinton in an Election Night victory
  • Just like we cannot predict stock market win/loss, we cannot predict election win/loss.  
  • The data analysis results were wrong, but there were some new data visualization charts. Nate Silver came up with a snake-like chart, resembling a board game or an intestine, for the Electoral College votes and the states. Some criticisms of the snake chart: it offers no useful info, it is just a fancy map of electoral votes, and it's a design spin on a vintage game.
  • Subjective perspective and inherent bias. Many media and data outlets were criticized after the election. While no statistician would ever unwisely predict a 100% Clinton win, nearly all liberal outlets were prematurely hailing a Clinton win. Even Nate Silver, who gave Trump a 30% chance of winning, did not step up to help the public understand this statistic until after the election.
  • Simulation and randomness. Nearly all respectable data outlets ran simulations with random factors. They built in scenarios such as swing states flipping and margins of error. Yet the models still fell short.
    • Imagine simulating percentage votes versus Electoral College votes. A vote percentage is continuous and can change a fraction of a percentage point at a time. Electoral votes are much more discrete and are allocated in blocks of 3 to 55 (Alaska to California), so the margin of error becomes huge. This "landslide effect" behind Clinton's major upset was very apparent on election night, when chunks of electoral votes went to Trump, escalating him quickly to the threshold and swiftly making him the apparent winner.
  • Personally, on election night, watching the live visualization on the Wall Street Journal website, I saw that large cities were voting as expected, leaning either Democratic or Republican, and yet many lesser-known counties were overwhelmingly voting for Trump, from California to New York. Few studies and data analyses were granular down to the county level. We were so focused on state-level results.
  • This was an election in which swing states mattered a great deal more than usual. Nate Silver used two numbers, tipping-point chance and voter power index, to highlight the important states that played a crucial role on election night.
  • Who's winning the popular vote? Nate Silver estimated an average of 48.5% for Clinton and 44.9% for Trump, really not bad at all. And we forgot Johnson's 5%! That is enough margin to make Trump the winner! If Clinton fails to capture her full 48.5% and Johnson fails to capture his full 5%, that's enough margin going to Trump. Plus the margin of error of the estimates... Wow, a Trump or Hillary win was more like a flip of a coin, 50/50. (Visit Nate Silver's blog to see this useful visualization.) Again, my personal opinion is that we forgot about Johnson.
  • In my personal opinion, Trump's win was not a landslide; it only appeared to be one because of our Electoral College system. The actual vote (the popular vote) was a much more even split. I personally think we really forgot about errors and Johnson. Landslide victories were unlikely (Obama had a true landslide), so margins of error and Johnson's presence were extremely important. Yet we forgot about them. We still don't think about them when we simply claim there was a landslide victory and then try to learn and justify what Trump did right. Really, he did a lot of things right, and Clinton was close to doing a lot of other things right. One of them won by chance. No one predicted that.
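
To make the simulation point above concrete, here is a rough sketch of how a modest polling miss can flip whole blocks of winner-take-all electoral votes at once. This is not how any outlet's model actually works. The toy states, their leads, and their vote counts are made up for illustration, and the sketch assumes that the 7-point margin of error is a 95% interval and that one shared polling miss hits every state in the same direction.

```python
import random

# Hypothetical toy states: (electoral votes, polled lead for candidate A in points).
# These numbers are illustrative assumptions, not real 2016 polling data.
TOY_STATES = {
    "Safe A": (55, 20.0),
    "Lean A 1": (20, 3.0),
    "Lean A 2": (16, 3.0),
    "Lean A 3": (10, 3.0),
    "Safe B": (38, -15.0),
    "Lean B": (29, -1.0),
}


def margin_to_sigma(margin_of_error):
    """Convert a margin of error (assumed to be a 95% interval) to a standard deviation."""
    return margin_of_error / 1.96


def electoral_votes(leads):
    """Winner-take-all: A collects a state's whole block whenever A's lead is positive."""
    return sum(ev for ev, lead in leads if lead > 0)


def simulate(states, margin_of_error, trials=10_000, seed=0):
    """Estimate how often candidate A reaches a majority of electoral votes."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    sigma = margin_to_sigma(margin_of_error)
    total = sum(ev for ev, _ in states.values())
    needed = total // 2 + 1  # strict majority of the toy electoral votes
    wins = 0
    for _ in range(trials):
        # Assumption: the same polling miss applies to every state in a trial.
        shared_miss = rng.gauss(0.0, sigma)
        leads = [(ev, lead + shared_miss) for ev, lead in states.values()]
        if electoral_votes(leads) >= needed:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    for moe in (0.0, 3.0, 7.0):
        print(f"margin of error {moe}: A wins {simulate(TOY_STATES, moe):.0%} of trials")
```

With a margin of error of zero, candidate A wins every trial, because the polls are taken at face value. As the margin grows, a single shared miss of a little over 3 points hands all three "Lean A" blocks to the other candidate at once, which is the landslide-looking effect described above even though the underlying vote shares barely moved.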

Sources and Further Reading
  • Fast Company
  • Nate Silver 


