Vegas' Accuracy of Predicting the Point Spread

How good is Vegas at predicting the point spread? Or better yet, is Vegas getting any better at it? In this post I study the difference between Vegas' prediction of the point spread and the actual outcome. That is,
Difference = Point Spread Outcome - Vegas Line

The Point Spread Outcome is computed as Visiting Team Score - Home Team Score because my Vegas Line follows this convention: negative if the home team is the favorite and positive if the visiting team is the favorite. So, if the difference is HIGHLY POSITIVE, the VISITING team was underrated by the Vegas Line; if it is HIGHLY NEGATIVE, the HOME team was underrated. For example, take last year's Super Bowl (with Chicago as the 'home team'). The Vegas Line was 6.5 (favoring Indianapolis, since it is positive). The final score was 29-17 in favor of Indianapolis, so the actual point spread was 12 and the difference was 12 - 6.5 = 5.5. One can say that the Vegas Line underestimated the visiting team.
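As a small sketch of the arithmetic above (the function name is hypothetical, and the line follows the post's sign convention: negative when the home team is favored, positive when the visitor is favored), the difference can be computed as:

```python
def vegas_difference(visitor_score: int, home_score: int, vegas_line: float) -> float:
    """Point Spread Outcome minus Vegas Line, using the post's sign convention.

    vegas_line is negative when the home team is favored and positive when the
    visiting team is favored, so it sits on the same visitor-minus-home scale
    as the outcome.
    """
    outcome = visitor_score - home_score  # Point Spread Outcome
    return outcome - vegas_line


# Super Bowl example from the text: Indianapolis (visitor) 29, Chicago (home) 17, line 6.5
diff = vegas_difference(29, 17, 6.5)
print(diff)  # 5.5 -> positive, so the visiting team was underrated
```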

I grouped the difference into 7 categories:

  • ' < -10' = HOME team underrated by more than 10 points

  • ' (-10,-6)' = HOME team underrated by less than 10 points but more than 6

  • ' (-6,-2)' = HOME team underrated by less than 6 points but more than 2

  • ' (-2,2)' = Vegas got within 2 points of the actual outcome

  • ' (2,6)' = VISITING team underrated by less than 6 points but more than 2

  • ' (6,10)' = VISITING team underrated by less than 10 points but more than 6

  • ' > 10' = VISITING team underrated by more than 10 points
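The post does not say which category a difference falling exactly on a boundary (such as exactly 10 or exactly -2) belongs to. The sketch below assumes boundary values go to the category with the smaller miss, so ' > 10' means strictly more than 10; that rule, the function names, and the (season, difference) input format are my assumptions. It bins each difference and turns the counts into per-season percentages of the kind shown in the graph.

```python
from collections import Counter, defaultdict

# Category labels in the order used in the post (leading spaces dropped).
LABELS = ["< -10", "(-10,-6)", "(-6,-2)", "(-2,2)", "(2,6)", "(6,10)", "> 10"]


def category(diff: float) -> str:
    """Assign a difference to one of the seven categories.

    Assumption: a value exactly on a boundary goes to the category with the
    smaller miss (e.g. exactly 10 -> '(6,10)', exactly -2 -> '(-2,2)').
    """
    size = abs(diff)
    if size <= 2:
        return "(-2,2)"
    if size <= 6:
        return "(2,6)" if diff > 0 else "(-6,-2)"
    if size <= 10:
        return "(6,10)" if diff > 0 else "(-10,-6)"
    return "> 10" if diff > 0 else "< -10"


def season_percentages(games):
    """games: iterable of (season, difference) pairs -> {season: {label: percent}}."""
    counts = defaultdict(Counter)
    for season, diff in games:
        counts[season][category(diff)] += 1
    result = {}
    for season, season_counts in counts.items():
        total = sum(season_counts.values())
        result[season] = {label: 100.0 * season_counts[label] / total for label in LABELS}
    return result


# Toy input (made-up differences, not real data):
toy = [(2006, 5.5), (2006, -12.0), (2006, 0.0), (2006, 11.0)]
print(season_percentages(toy)[2006]["> 10"])  # 25.0
```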

Here is a graph of the percentage of games in each category within each season:


See any trends?

Within each category there aren't any real trends. One thing to notice is that the two largest categories, each averaging about 23% of games, are the ones where Vegas misses by more than 10 points (underrating either the visiting or the home team). Another interesting aspect of this graph is last year's big spike in the ' > 10' category, implying that last year Vegas underestimated the visiting team more than in any other year. (One might expect Vegas to correct for this this year, so watch out for underrated home teams.) Come back in a couple of days: I will study the characteristics (if any) of the games where Vegas misses by more than 10 points, and if I find something interesting I will post it.

We know that on average Vegas does well. Since 1992 the average difference is a mere -0.14, with a minimum of -47.5, a maximum of 44, and a standard deviation of 13. That standard deviation seems highly volatile to me: because the mean is so close to zero, 13 points is roughly the root-mean-square miss, so Vegas is routinely far off the actual outcome. Is Vegas getting better at predicting the point spread? In short, no. The following table shows the mean and standard deviation of the difference for each season from 1992 to 2006.

Difference: Point Spread Outcome - Vegas Line, by season
Season     N     Mean   Std Dev
1992     204   -0.061    14.412
1993     198     0.15    12.913
1994     204     1.62    12.284
1995     235     0.36    12.601
1996     243    -1.29    12.877
1997     251    -0.23    13.188
1998     251    -0.98    12.185
1999     259    -0.72    13.289
2000     259    -0.61    13.384
2001     259     0.20    13.050
2002     267    -0.24    13.540
2003     267    -1.03    13.470
2004     267    -0.05    12.900
2005     267    -0.89    12.871
2006     266     1.96    13.432
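Two quick checks on these numbers, as a sketch using only the table values and the overall standard deviation of 13 quoted above. First, a least-squares trend line through the per-season standard deviations (my choice of summary, not part of the original analysis): a clearly negative slope would mean Vegas is missing by less over time, but the slope comes out at roughly +0.008 points per season, essentially flat. The N-weighted average of the season means, about -0.15, also agrees with the overall -0.14 once rounding is allowed for. Second, assuming (not verified here) that the differences are roughly normal, a standard deviation of 13 corresponds to a mean absolute miss of only about 10.4 points, and each of the two "more than 10 points" tail categories should hold roughly 22% of games, close to the roughly 23% seen in the graph.

```python
import math

# Per-season figures transcribed from the table above.
SEASONS = list(range(1992, 2007))
N = [204, 198, 204, 235, 243, 251, 251, 259, 259, 259, 267, 267, 267, 267, 266]
MEAN = [-0.061, 0.15, 1.62, 0.36, -1.29, -0.23, -0.98, -0.72,
        -0.61, 0.20, -0.24, -1.03, -0.05, -0.89, 1.96]
SD = [14.412, 12.913, 12.284, 12.601, 12.877, 13.188, 12.185, 13.289,
      13.384, 13.050, 13.540, 13.470, 12.900, 12.871, 13.432]


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution, written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# 1) Is Vegas getting better? A clearly negative slope would mean shrinking misses.
sd_trend = ols_slope(SEASONS, SD)
# N-weighted average of the season means recovers the overall mean difference.
pooled_mean = sum(n * m for n, m in zip(N, MEAN)) / sum(N)

# 2) What does a standard deviation of 13 mean? Assumes (unverified) normal misses.
sigma = 13.0
mean_abs_miss = sigma * math.sqrt(2.0 / math.pi)  # mean absolute miss when the mean is ~0
share_over_10 = 1.0 - normal_cdf(10.0, pooled_mean, sigma)  # ' > 10' category
share_under_10 = normal_cdf(-10.0, pooled_mean, sigma)      # ' < -10' category

print(f"Std Dev trend: {sd_trend:+.4f} points per season")
print(f"Pooled mean difference: {pooled_mean:.3f}")
print(f"Mean absolute miss if normal: {mean_abs_miss:.1f}")
print(f"Expected shares: > 10: {share_over_10:.1%}, < -10: {share_under_10:.1%}")
```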

Introduction to Wagering on National Football League Games

Intro
The problem of estimating the outcome of National Football League games only started to gain attention in the literature in the 1970s! Most, though not all, of these authors argue that one can estimate the spread or build profitable wagering strategies using the outcomes of games from past seasons. Some of these strategies are simple and easy to understand, while others require a strong statistical background. My interest is in the statistical wagering strategies, but why use these if the simpler methods work better? So we will first look at the simple, non-statistical approaches to wagering in the next blog, Simple Wagering Strategies. Here, for those not familiar with NFL wagering, I will briefly discuss how it works.

Wagering on NFL Games
The gambling procedure for the NFL is quite simple. Each week bookmakers establish a point spread (or just the spread) for each game. A bet on the favorite wins if the favorite wins by more than the point spread; a bet on the underdog wins if the favorite wins by fewer points than the spread or the underdog wins the game outright. (If the favorite wins by exactly the spread, the bet is a push and the stake is returned.) The bookmaker charges a 10% ante on all bets. That is, if a $100 bet is placed ($110 with the ante) and wins, the bettor is paid $100 and keeps the ante; if the bettor loses, she pays $110. How many winning bets does the bettor need in order to break even? 52.38%. (Let p be the proportion of winning bets; to break even, solve 100p = 110(1 - p), which gives p = 110/210 ≈ 0.5238, or 52.38%.)
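The break-even arithmetic generalizes to any payout: for a bet that risks `risk` to win `win`, the break-even rate is risk / (win + risk). A short sketch (the function names are mine) that also shows the expected profit per bet at a given winning rate:

```python
def break_even_rate(win: float = 100.0, risk: float = 110.0) -> float:
    """Winning proportion p that solves win * p = risk * (1 - p)."""
    return risk / (win + risk)


def expected_profit(p: float, win: float = 100.0, risk: float = 110.0) -> float:
    """Average profit per bet for a bettor who wins a proportion p of bets."""
    return win * p - risk * (1.0 - p)


print(f"{break_even_rate():.4f}")       # 0.5238
print(expected_profit(0.50))            # -5.0: a coin-flip bettor loses $5 per bet
print(round(expected_profit(0.55), 2))  # 5.5
```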

I am not a bookmaker, nor do I intend to be, so I do not know exactly how bookmakers arrive at the point spread. I do know that their intention is not to predict the true point spread but to split the betting population in half. This way, the bookmaker can lock in a profit of about 5% regardless of the outcome of the game (the 10% ante collected from the losing half of the bettors). It is for this reason that many researchers believe there must be a profitable strategy. But which one? And if people find out about it, will it still be profitable? So why am I wasting my time trying to find strategies in football wagering? In short, because I'm a statistical geek.
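The bookmaker's side of the same arithmetic, as a sketch assuming every bettor risks $110 to win $100 and ignoring pushes (the function and parameter names are hypothetical): with equal action on both teams the book keeps $10 whichever team covers, which is 5% of the $200 in base stakes, or about 4.5% of the $220 actually wagered. If the action is lopsided, the book is exposed to the result.

```python
def bookmaker_profit(bets_on_a: int, bets_on_b: int, a_covers: bool,
                     win: float = 100.0, risk: float = 110.0) -> float:
    """Book's profit when every bettor risks `risk` to win `win` (pushes ignored)."""
    handle = (bets_on_a + bets_on_b) * risk  # all money taken in
    winners = bets_on_a if a_covers else bets_on_b
    payout = winners * (risk + win)          # stake returned plus winnings
    return handle - payout


print(bookmaker_profit(1, 1, a_covers=True))   # 10.0
print(bookmaker_profit(1, 1, a_covers=False))  # 10.0
print(bookmaker_profit(3, 1, a_covers=True))   # -190.0: lopsided action, A covers
```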

This study is a competition between strategies. I could have gathered the data from previous years, obtained the results immediately, and found out which strategies were more profitable, but I find it more amusing to test these strategies in real time. Watching NFL games with friends is fun. Talking about players' expected performances is better than listening to the guys on ESPN. Seeing your predictions unfold right in front of your eyes: priceless.

Performance of the wagering strategies and picks in this blog is based solely on profitability against the mother of all bookmakers, Vegas baby! Of all the websites listing the daily Vegas point spreads, I will use the top Google result for the keywords "Vegas point spreads": www.vegas.com/gaming/index.html. All wagering strategies are data-based; that is, no feelings, preferences, or opinions about a city or team are taken into consideration. Only the past performance of each team (or player) up to the day before the game is used.
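The "only data up to the day before the game" rule amounts to a no-look-ahead filter. A minimal sketch, assuming a hypothetical Game record with a date, team names, and scores (the field names, placeholder teams, and scores are made up for illustration):

```python
from datetime import date
from typing import List, NamedTuple


class Game(NamedTuple):
    """Hypothetical game record; the fields are illustrative assumptions."""
    day: date
    visitor: str
    home: str
    visitor_score: int
    home_score: int


def team_history(games: List[Game], team: str, game_day: date) -> List[Game]:
    """Games involving `team` played strictly before `game_day` (no look-ahead)."""
    return [g for g in games if g.day < game_day and team in (g.visitor, g.home)]


# Toy schedule with placeholder teams and made-up scores.
games = [
    Game(date(2006, 9, 10), "Team A", "Team B", 20, 13),
    Game(date(2006, 9, 17), "Team C", "Team A", 10, 24),
    Game(date(2006, 9, 24), "Team A", "Team C", 17, 17),
]
# Predicting Team A's Sept. 24 game may only use the first two games.
print(len(team_history(games, "Team A", date(2006, 9, 24))))
```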

On a side note, don't try this at home! Sports betting (online and offline), and specifically betting on the outcome of football games, is not only illegal in most states but also a very risky endeavor. I do not encourage anybody to use the strategies I will discuss in real gambling situations, since even the authors who claim to have found profitable strategies made at most 5% on their money, which is what I'm currently getting from a risk-free CD at my local bank.

First Post - How I Got Into This

History
I have been fascinated with the combination of statistics and sports since I was a child. Growing up, I remember telling my friends, who were playing video games with me, not to touch the joystick while I wrote down the stats. In those days, about 1986, sports video games did not record stats, so I did. It was a lot of work, and we kept track of only a small fraction of what people track today, but we could still use these stats to, for example, select a video game tournament's MVP. Statistics were also on my mind whenever I played a sport. During a basketball game I knew, at any moment, how many points, rebounds, and assists I had. Was this egotistical and selfish? No: a few times I told my friends how they had performed in the game, and they would claim to have had many more rebounds or assists, since the league recorded only points. Directly or indirectly, I was always interested in having a way to rank players or teams, which brings me to this blog.
Purpose


HOW TO USE STATISTICS TO PREDICT THE POINT SPREAD OF A GAME?
This blog is not written to give gambling tips or online sports wagering picks, nor do I encourage anybody to go into sports gambling based on these experiments. So, what will I be doing here? Basically, using different statistical measures to predict the outcomes of National Football League (NFL) games. By predicting outcomes I mean estimating the true spread, the number of points by which team A will beat team B. That is, instead of predicting that the score will be 20-13 in favor of A, I am only interested in predicting that A will beat B by 7. I will keep track of ten different ranking measures, or ways to estimate NFL game outcomes. The ranking measures published in this blog were obtained from scientific journals, websites, and my own cooked-up measures. This blog will serve as an online notebook where I keep track of and compare the different estimators' performance on the 2006 NFL season outcomes.

Future Blogs
In my next few blogs, I will introduce and explain each of the statistical estimators and the different sports (NFL) wagering picks. Some of these estimators are the work of other statisticians who published them in well-known statistical journals. The estimators vary in the statistical methodology and the variables they use. For example, one estimator may base her decision solely on previous scores and home-field advantage, while another may take into consideration yards rushed, yards passed, interceptions, sacks, and penalties. One estimator may use a standard linear model, another a Bayesian approach, and another an ad hoc methodology such as surveying free opinions from different websites. I will try to explain how each estimator works without too many mathematical technicalities, but for statistically savvy readers I will post references to scientific journal articles and websites with detailed explanations of each estimator. After that, once the 2006 NFL season starts, you can watch each week what these estimators predict and how much imaginary money they win or lose. Stay tuned!
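Once the season starts, each estimator's weekly result boils down to two steps: pick a side by comparing its predicted margin with the Vegas Line, then grade the pick against the final score. A minimal sketch, assuming each estimator outputs a predicted margin on the same visitor-minus-home scale as the Vegas Line, flat $110-to-win-$100 bets, no bet when the prediction equals the line, and a refunded push when the outcome lands exactly on the line (the function names are hypothetical):

```python
from typing import Optional


def pick_side(predicted: float, vegas_line: float) -> Optional[str]:
    """Bet on the side the estimator thinks Vegas underrates.

    Both numbers use the post's convention: visitor score minus home score.
    """
    if predicted > vegas_line:
        return "visitor"  # estimator expects the visitor to beat the line
    if predicted < vegas_line:
        return "home"
    return None  # prediction equals the line: no bet


def grade(pick: Optional[str], visitor_score: int, home_score: int,
          vegas_line: float, win: float = 100.0, risk: float = 110.0) -> float:
    """Imaginary profit of one bet: +win, -risk, or 0 for no bet or a push."""
    diff = (visitor_score - home_score) - vegas_line
    if pick is None or diff == 0:
        return 0.0
    covered = "visitor" if diff > 0 else "home"
    return win if pick == covered else -risk


# Hypothetical estimator that predicted Indianapolis by 10 in the Super Bowl example.
pick = pick_side(10.0, 6.5)
print(pick, grade(pick, 29, 17, 6.5))
```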