Fooled by Randomness

Quick Pick:
Kansas City +6 - No comment

Currently, I am reading a very interesting book called Fooled by Randomness by Nassim Taleb. The book is about "how we perceive and deal with luck in life and business". It contains a story about two people. One got lucky in the stock market and 'explained' after the fact why his move was a 'smart' one. The other recognized randomness in the market and played it safe; although he never made huge amounts of money, he never lost it all, as the first character did once randomness kicked in. One simply got lucky; the other had some knowledge, played it safe, and did well in the long run.

The same can be said of NFL gambling.

Say I create a methodology that has tested at 65% success over 10 years (which I have). I can then build a story explaining why it is a good strategy (something like the spread overreacting to recent games, or giving too many points to home favorites). The reality is that things change, and so does the NFL. One should always play it safe and not overreact to success. A method (even one that is very hard to replicate) that wins 65% of the time for 10 years may yield 40% over the next 2 years, at which point anyone who kept gambling at the same rate would go broke. In general, one can find even simpler methodologies that may be profitable, like betting home underdogs or betting against the line movement, but you may have to wait 2, 5, 10, or even 20 years for them to hit above 53%.
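The following minimal Python sketch shows how much a win rate can wander by chance. It uses the normal approximation to the binomial and assumes each pick is an independent trial with a fixed true success rate. The function name `win_rate_range` is illustrative only, as is the 40-pick sample (roughly two seasons of about 20 picks). Neither is part of the methodology above.

```python
import math

def win_rate_range(true_p, n_bets, z=1.96):
    """Approximate 95% range of the observed ATS win rate over n_bets picks
    for a method whose true success rate is true_p (normal approximation
    to the binomial; assumes independent picks)."""
    se = math.sqrt(true_p * (1 - true_p) / n_bets)  # standard error of a proportion
    return true_p - z * se, true_p + z * se

# Hypothetical: a method that truly wins 55% of the time, judged over
# about two seasons of roughly 20 picks each.
lo, hi = win_rate_range(0.55, 40)
print(f"{lo:.1%} to {hi:.1%}")
```

With a true rate of 55% over 40 picks, the printed range runs from about 40% to about 70%. Over two seasons, a genuinely profitable method can look broken or brilliant.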

Part of the reason I liked having a computer make the picks is that there is no BS about why a pick is so good; it's good because the data says so. Explaining why a pick is such a good one is all baloney. If you hear an expert on the radio or the internet explaining why Kansas City is a great pick in week 17 over the 6-point favorite Jets, block your ears and bet with your instinct. The best part of the book I have read so far is about how people try to explain why a decision was a mistake. It is only a mistake if you catch it before making the decision; after the fact, it is too late. It's like the senators who voted for the war and are now campaigning on what a mistake the president made in going to war. There is a saying in Puerto Rico: "It is easy to guess a dog is male after you have seen its testicles", but can you guess with certainty before?

This is not a goodbye speech, nor am I giving up. I still believe that a statistician of my caliber can beat the less-educated bookies in Vegas. All I am saying is that I will not be "fooled by randomness" by explaining why my Tampa Bay or Denver pick last week was a mistake. I will also not try to convince anyone that a pick is a gimme or a sure thing; at most, it would be 65% certain. I will also continue my search for better predictors of the spread and superior NFL betting strategies.

I only have 1 pick this week. Most of the model's predictions don't apply, because they involve teams that have already clinched a playoff spot. Over the past 10 years, this strategy has won 62% of the time, and it applies to only one game in week 17: Kansas City +6.

Until the playoffs,


Week 16

NFL Point Spread Picks:
Tampa Bay -6 - San Francisco is weak and inexperienced
Philadelphia +3.5 - Going against ESPN, who chose NO to cover
Denver +9 - I wouldn't bet on this one, but it is what came out
Indianapolis -7 - Indianapolis knows what to do before entering the playoffs: gain momentum.

Last week the model did not perform well, going 2-3 ATS on the 5 yellow picks posted. Since I started color-coding favorite picks in week 9, the record is now 58% for yellow picks and 60% for green picks. This week I built a more robust model, which I will use to verify the current model's picks. I have tested it, and it does as well as the current model. When both models pick the same team, accuracy on games from 2000-2007 improves by 2%, reaching 55% for all games.
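The agreement check described above can be sketched as follows. This is a hypothetical illustration, not the code behind the models. Picks are team abbreviations, one per game, and the toy data is made up. It only shows how accuracy is computed on the subset of games where the two models agree.

```python
def agreement_accuracy(picks_a, picks_b, winners):
    """ATS accuracy restricted to games where both models pick the same team.
    Each argument is a list with one entry per game; `winners` holds the team
    that actually covered. Returns (accuracy, number_of_agreeing_games)."""
    agreed = [(a, w) for a, b, w in zip(picks_a, picks_b, winners) if a == b]
    if not agreed:
        return None, 0  # the models never agreed, so there is nothing to score
    correct = sum(1 for pick, winner in agreed if pick == winner)
    return correct / len(agreed), len(agreed)

# Made-up example: the models agree on 2 of 4 games and split those 1-1.
acc, n = agreement_accuracy(["NE", "DAL", "GB", "SD"],
                            ["NE", "NYG", "GB", "KC"],
                            ["NE", "DAL", "DET", "SD"])
print(acc, n)  # 0.5 2
```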

The purpose of statistically predicting games should be to find opportunities, not to predict ALL games successfully. I have found situations with 30 and 40 games in which more than 70% were predicted correctly.

This week we have one pick above 60% and 3 yellow ones. I have excluded the Minnesota game since Collins has only played 2 games and the data used includes games where Jason Campbell played. Good luck everyone!

NFL Picks Week 16
<%image(20071221-NFL2007_week16c.jpg|668|459|NFL Picks week 16)%>

Week 15

NFL Point Spread picks:
Buffalo +6 - Swimming against the current; it's not easy
Houston +1 - Home underdog, same record, I like it.
NY Giants -4.5 - My computer doesn't even know that Jason Campbell is not playing!
Indianapolis -11 - I don't like hot streaks, but boy did they look good on Sunday!
Jacksonville +4 - Tough pick, but I think JAC wins the game.

I started categorizing games as "green" and "yellow" in week 9. The purpose was to show the model's success rate for the circumstances into which the game at hand falls. Green indicates a significantly good success rate, currently defined as above 60%. Yellow indicates an OK success rate, currently defined as 55%-60%. Since then, the yellow picks have gone 8-4 ATS and the green picks 5-3 ATS. Overall, that is 13-7, or 65%. Huge, right? Well, I am not convinced, since it is only 5 weeks and anyone can get lucky.
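How lucky is 13-7? The minimal sketch below treats each pick as an independent coin flip (a true ATS rate of 50%). It then computes the exact binomial probability of doing at least that well by chance alone. The function name `prob_at_least` is illustrative.

```python
from math import comb

def prob_at_least(wins, games, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p): the chance that a picker
    whose true ATS rate is p does at least this well by luck alone."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

# The 13-7 green/yellow record since week 9, against a coin-flipper.
print(round(prob_at_least(13, 20), 3))  # 0.132
```

That is roughly a 13% chance, so a pure coin-flipper would post 13-7 or better about one time in eight.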

This week, there wasn't a game I could categorize as "green", even though the top one has 61% confidence. Although I should not pay attention to the wisdom of the crowds, I found that 75% picked Cleveland to cover. I have been tracking these percentages. Some weeks the crowd is right, like last week, but sometimes it is totally wrong. There is a scientific paper that 'proves' that betting against the line move is a profitable strategy. It wins about 54% over the time frame of games the statistician looked at. That is, if the line moves one way, you bet the other. Well, I am not changing my pick, but I will not categorize it as green: Buffalo +6.
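A bare-bones version of the "bet against the line move" rule might look like this. It assumes this blog's convention (negative lines favor the home team) and that both the opening and current lines are known. The function name and the sample lines are hypothetical.

```python
def contrarian_pick(home, away, opening_line, current_line):
    """Bet against the line move.
    Lines follow this blog's convention: negative favors the home team.
    Returns None when the line has not moved."""
    move = current_line - opening_line
    if move == 0:
        return None
    # A line that moved more negative moved toward the home team,
    # so the contrarian bet is the visiting team, and vice versa.
    return away if move < 0 else home

# Hypothetical lines: the home team opens at -5.5 and moves to -6.
print(contrarian_pick("CLE", "BUF", -5.5, -6.0))  # BUF
```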

Here are the NFL week 15 computer picks:
<%image(20071211-NFL2007_week15.jpg|639|454|NFL week 15 picks)%>

Week 14

NFL Computerized Statistically robust picks:
Cleveland -3.5
Arizona +7
Philadelphia -3
Washington -3

Another week of huge and unexpected SUCCESS. Carolina and Pittsburgh were both picked last week with 'yellow' confidence, and the yellow picks have covered the spread 4-0 over the past 2 weeks. Also, last week's highest green pick, San Diego, covered, making it 4 weeks in a row for our top green pick! Still, the model is not, and probably never will be, perfect, and it disappointed me with Cleveland's loss against Arizona.

This week I have some good picks for you as well. For new readers, here's a quick explanation of the table below. The Estimate column is produced by a regression model of team rankings, offensive/defensive yards, and home-field advantage. The home team is the reference team, so a negative number favors the home team.

The success of these picks is cross-tabbed by 5 categories of the Vegas spread and 5 categories of the difference between the prediction and the Vegas spread. The Total Games column is the number of games that have fallen into each category since 2002. The percent is the model's success rate for that cell of the cross-tab. For example, the MIN @ SF game falls into two categories: "Vegas spread favors the visiting team by more than 6 points" and "estimate favors the visiting team by more than 6 points beyond the spread". This category has had 3 games, 2 of which were predicted correctly (hence 67%).
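The cross-tab can be sketched in a few lines of Python. The post does not list the exact boundaries, so the cut points defining the 5 spread categories and 5 difference categories are assumptions. Each game is a hypothetical `(vegas_line, estimate, pick_covered)` tuple using this blog's sign convention (positive favors the visitor).

```python
from collections import defaultdict

# Hypothetical cut points giving 5 categories each (not the blog's actual ones).
SPREAD_EDGES = [-6, -3, 3, 6]
DIFF_EDGES = [-6, -3, 3, 6]

def bucket(value, edges):
    """Index of the interval that value falls into, given sorted cut points."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

def crosstab_success(games):
    """games: iterable of (vegas_line, estimate, pick_covered) tuples.
    Returns {(spread_bucket, diff_bucket): (total_games, success_rate)}."""
    counts = defaultdict(lambda: [0, 0])  # [total, correct] per cell
    for line, estimate, covered in games:
        key = (bucket(line, SPREAD_EDGES), bucket(estimate - line, DIFF_EDGES))
        counts[key][0] += 1
        counts[key][1] += int(covered)
    return {k: (n, c / n) for k, (n, c) in counts.items()}

# Made-up games where Vegas favors the visitor by 6+ and the estimate
# favors the visitor by 6+ more than Vegas: 3 games, 2 correct.
games = [(7, 14, True), (8, 15.5, True), (6.5, 13, False)]
n, rate = crosstab_success(games)[(4, 4)]
print(n, round(rate, 2))  # 3 0.67
```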

I have not chosen Minnesota or Indianapolis, since their Total Games (sample size) is very small. Here are the NFL picks for week 14:
<%image(20071205-NFL2007_week14b.jpg|625|410|NFL Week 14 Point Spread Picks)%>

Week 13

Quick Picks:
Cleveland +1
San Diego -5
Pittsburgh -8.5

Last week was a HUGE SUCCESS! Overall, the statistical model had a 67% success rate. My purpose is to select games with a higher chance of success, but it is still good to know that the model performed well overall. Specifically, the games the model selected with high confidence also performed well. Both of last week's yellow picks (mid-high confidence), Green Bay and Indianapolis, covered as predicted, and last week's highest-confidence pick (New Orleans) covered by a wide margin.

The only disappointment was the OAK @ KC game, which (lesson learned) should not have been predicted because of injuries. The technique I use weighs recent games more heavily, but games dating back to last season are still considered. During that time, KC ran with Larry Johnson and Oakland did not use Culpepper. I currently have no way to account for injuries. From now on, when a team has a key player missing, I will not include that game among my confidence picks.
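The post does not spell out how recent games are weighted. The sketch below therefore assumes a simple exponential decay with a hypothetical `half_life` parameter: the latest game gets weight 1, and weights halve every `half_life` games back. It is one common way to weigh recent games more heavily, not necessarily the technique used here.

```python
def recency_weighted_mean(values, half_life=4.0):
    """Weighted mean of per-game values (oldest first, most recent last).
    The most recent game gets weight 1; each older game's weight halves
    every half_life games. The exponential scheme is an assumption."""
    n = len(values)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# With half_life=1, an older 0 and a recent 10 average to about 6.67, not 5.
print(round(recency_weighted_mean([0, 10], half_life=1), 2))  # 6.67
```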

NFL week 13 point spreads seem harder to predict, or, let's say, I feel a bit less confident than last week. Feelings aside, there are 4 games that the computer picks with mid-high to high confidence. Below is the usual table with the Vegas spreads, computer point spread predictions, NFL picks, and percent confidence. In the full post, we delve into the statistical details and graphs for each of these 4 matchups.

<%image(20071127-NFL2007_week13.jpg|524|478|NFL Week 13 Picks)%>

Week 12

NFL Point Spread Picks:
New Orleans -3
Kansas City -6
Indianapolis -11.5

Motivated by one of our readers, I have decided to run the current model and show you two GREEN picks and two YELLOW picks. At the top, with 71% confidence, we have New Orleans covering a spread of 3 at Carolina. Personally, I don't like this pick, but I have to disregard my thoughts and feelings, since they might be biased by reading other people's opinions and blogs. Second, we have Kansas City covering a spread of 6 at home against Oakland. I do like this one.


The computer also shows some other attractive games worth mentioning. The model predicts Indianapolis over Atlanta by 18 points, which would comfortably cover an 11.5-point spread. The line has shifted to 12.5 on some sites, which is still a bargain since it is below two touchdowns.

One curious game is Houston defeating Cleveland by 3, while the spread is currently 3.5 favoring Cleveland. I have not highlighted it in yellow because it goes against most people's opinion (which might actually be good). The Browns are playing well, so I didn't include this game; it might be an anomaly in the model.

Finally, we have Green Bay beating Detroit by more than 3.5 at Detroit. The model chose this one with 56% confidence, and most experts (67%) back it. Also, the line has moved about 1/2 to 1 point toward Green Bay, which may be a sign that the spread was too low to start with.

Let's dig deeper into the current stats for both GREEN picks.

New Orleans @ Carolina
First, let's look at how these teams have fared against the spread in the past 5 games.

<%image(20071121-lst512ca.gif|570|340|NO @ CAR)%>

Neither team has covered the spread in the past two weeks, but New Orleans missed by more. That includes a surprising loss to St. Louis, where they were 10-point favorites and lost by 8. Before that, NO covered by more than 12 points in weeks 8 and 9 against SF and JAC, Jacksonville being the only heavy contender. Carolina, on the other hand, has not won or covered the spread in the past 4 weeks, although they've had a slightly tougher schedule, playing IND, TEN, ATL, and GB. On 10/14 you see a huge jump of 20 points when they beat Arizona by 15. Let's look at their OFF/DEF stats.


In the past 5 games, Carolina has averaged about 10 passing yards per game less than their opponents. New Orleans has been significantly better, at 40 yards more per game. The rushing yards surprised me: New Orleans is averaging 20 rushing yards less per game than its opponents, while Carolina is a little above 5 yards more.

Finally, we will look at how these two teams have fared against each other since 2001.


Watch the red bars: if they are above zero, Carolina covered. If the green bars are above zero, Carolina was the favorite. The last game was on 10/07/2007, when the Vegas line favored New Orleans by 4. Carolina covered by 7 instead, beating NO 16-13 at New Orleans. This gives me chills. Notice any patterns? Carolina was the favorite from 2004 until the last game this year, but covered the spread only half the time. That is not really a pattern, and I do not see anything significant jumping out at me, so I will back my model and say New Orleans -3.

Oakland @ Kansas City



Week 11

Predicting the NFL Point Spread

Currently I am working on different kinds of models and statistical techniques in order to find a satisfactory one. So far, the best model gives a 52% winning rate on ALL games from 2000-2007. This model would leave a bettor at roughly break-even, which is unsatisfactory. We worked on finding circumstances where the model is more accurate. We found that there are indeed situations where it predicts with more than 65% accuracy, but these situations are hardly the norm.

I have come to the conclusion that a simple linear regression model will not work with the data I have. I have tried other similar techniques. One is robust linear regression, where outliers are down-weighted in order to obtain better estimates. Another is logistic regression, which models a binary response (did the team cover or not?) to predict the probability, or equivalently the odds, that a team covers the spread.
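Below is a minimal from-scratch sketch of what logistic regression does here. It uses a single hypothetical predictor x (model estimate minus Vegas line) and a 0/1 response (did the visiting team cover?). The coefficients are fit by plain gradient ascent on the log-likelihood. A real analysis would use a statistics package such as R's `glm` with a binomial family, and the toy data is made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by batch gradient ascent
    on the average log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # derivative of log-likelihood w.r.t. the linear term
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: x = model estimate minus Vegas line,
# y = 1 if the visiting team covered, else 0.
xs = [-4, -3, -2, -1, 1, 2, 3, 4]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(f"P(visitor covers | x=3) = {sigmoid(b0 + b1 * 3):.2f}")
```

A positive slope b1 means the more the model favors the visitor relative to Vegas, the more likely the visitor is to cover.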

A few of the problems I see are that 1) the data is not linear, 2) the observations are not independent, and 3) there are other significant factors that are not contained in the data. All hope is not lost; there are statistical techniques to work around this. Non-linear techniques and Bayesian statistics might be more accurate and take the severe randomness into account. There are also repeated-measures models (also called mixed models). These are more commonly used in drug trials and account for the correlation among observations taken repeatedly on a single entity.

Finally, there is a vast number of data mining techniques that might be a good fit for the NFL data. These include:
- neural networks, which can capture complex behavior among inter-correlated/connected nodes;
- regression/decision trees, which accommodate predictive modeling and classification (in our case, will a team cover the spread?);
- clustering procedures, which could group games by their predictability, Vegas spread, and other factors, helping to build estimates for each cluster type.

Stay tuned for next week when I deploy my next best model. I will give you the NFL picks and their level of accuracy from 2000-2007.

Week 10

Quick NFL Picks:
Dallas -1.5
Buffalo -3

My computer admits last week's picks were brutal; she told me so herself. But she did tell me that her most confident game was right on the nose: Houston indeed covered the spread. The brutality came with the other 2 medium-confidence games. Although the Cleveland/Seattle game came down to the wire, there is no excuse. The question is, should we just pay attention to the 'green' games? Well, I wouldn't go that far just yet.

The good news is that in NFL week 10, the model produced 2 high-confidence ('green') games. Is it Minnesota/Green Bay, Indianapolis/San Diego, or Chicago/Oakland? If I were picking on pure gut feeling, these are the games I would choose, and I would pick the visiting team in all 3. The green games of the week are Dallas @ NY Giants and Buffalo @ Miami. We will look at more detailed stats for these two games in the next post, but for this one, we'll stick to the picks and predictions.

Below you will find the table with the picks and the new confidence column I described in my previous post. I couldn't believe my eyes: it predicts the Cowboys to beat the Giants by about 6 points and to cover the spread with 79% confidence! OMG. I have also included a column of total games from 2002-2007 that fall into the same category as these games. This column should be used as a gauge of sample size (the smaller it is, the less reliable the confidence).

What is a good sample size? That is something I just thought about. For a categorical problem like ours, more than 5% of the observations in each category is usually considered reliable. In our case, 5% of all games from 2002-2007 is about 60, so we are below that level. Next week I will modify the categories to obtain bigger samples in each. For now, let's look at the numbers:
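The 5% rule of thumb is easy to automate. The sketch below uses hypothetical bucket names and counts, plus a hypothetical total of 1,200 games, which puts the cutoff at 60. The integer comparison avoids floating-point edge cases right at the threshold.

```python
def reliable_buckets(bucket_counts, total_games, min_percent=5):
    """Keep only buckets holding at least min_percent% of all games.
    Integer arithmetic avoids floating-point surprises at the cutoff."""
    return {name: n for name, n in bucket_counts.items()
            if 100 * n >= min_percent * total_games}

# Hypothetical counts: 5% of 1,200 games is 60, so "B" is too small.
print(reliable_buckets({"A": 75, "B": 12, "C": 60}, 1200))  # {'A': 75, 'C': 60}
```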


I would like to see the defensive/offensive stats, the spread history, and the last-5-game performance for the two so-called high-confidence games. By inspection, I like these picks. Buffalo has been very good to my computer: every time she has picked Buffalo, they have come through, and yes, they are playing Miami. Dallas is playing unbelievably well, and Vegas is still doubting them. Watching them destroy Philadelphia last week showed me that they are not far behind New England and Indianapolis.

Notice there are 2 games with confidence higher than 60% but less than 65% and those are:
Minnesota @ Green Bay
Cleveland @ Pittsburgh

Minnesota comes as no surprise, and most lines are starting to move because of gamblers' preference for the Vikings. I am less comfortable with the Cleveland @ Pittsburgh game after seeing last week's domination. We will have to dig deeper into the stats for each of these 4 games in my next post. Stay tuned!

Clash of the Titans: New England vs. Indianapolis

What a game! Is it Sunday yet? Indianapolis is undefeated and so is New England. Peyton Manning was last year's Super Bowl MVP, and Tom Brady is en route to breaking the record for the most TD passes in one season. Although the Colts are playing at home, Vegas is giving a 5-point advantage (Bodog has the spread at -6) to the visiting team. In this post we'll explore some stats and predictions to see whether this spread is justifiable.

First, we start with the history of these two teams' meetings. The graph below shows all the games since 2001 in which these teams have battled it out. At the far right we see last year's AFC Championship. The positive green bar shows that Indianapolis was favored by 3 points. As we all know, they went on to beat the Patriots 38-34, covering the spread by 1 point (hence the small positive red bar). Earlier in the 2006 season, New England was favored by four points, but again the Colts covered, and as you can see from the red bar, by 10 points. You have to go back to 1/16/2005 (the 2004 Divisional Playoffs) to find New England covering. They did so easily, by more than 16 points (the actual score was 20-3).


Let's look at this year's stats:

The graph below shows the average yardage difference against opponents in the past 5 games. Tom Brady's excellence shows: New England has out-passed its opponents by more than 100 yards per game, while Indianapolis has 'only' done so by 80 yards. Indianapolis' rushing yardage looks slightly better than New England's (this could be key in Sunday's game).


As for which team my statistical model picks, I have to say it does not know. The statistical problem here is that these two teams are the 'extremes', so regression pulls them back toward the 'mean'. It has both teams squarely even, but again, this is not a reliable prediction. I would say this game is highly volatile. New England has been playing their best ever (now without taping other teams' signals), and the Colts are as good as last year. I wouldn't bet on this game, but if I did, I would go with my instinct.

I do have week 9 picks that the model is predicting with great confidence, but I'll leave those to my next blog.

Week 9

Quick picks:
Seattle +1
Houston +3
Baltimore +9.5

I ran my model for this week, compared its success this year and in previous seasons, and automatically created picks accordingly. The system is based on the situations in which the model predicts correctly. For example, suppose the Vegas spread favors the home team by 3 to 6 points, but my model predicts the visiting team to win by more than 6. In that situation, the visiting team ends up covering the spread 75% of the time. This week we have one such situation: Houston at Oakland.

If you have been reading my blog, you will notice that I added a new dimension to measuring the effectiveness of the model. Before, I defined 'situations' as intervals of the difference, in points, between my spread and the Vegas line. Now, I consider not just this difference but also the value of the Vegas line at that difference. You can visualize it as a 3-way cross-tab.

I found some situations with 90% success from 2003-2007, although their sample sizes were at most 10 games. I did find more than 75 games that fit into situations with more than 65% success. I will not display all the results since they would take up pages, but I will give you this week's computer-generated NFL point spreads and picks. Beware: this week's picks are all visiting teams, something to look into more closely. Also, I am currently working on a method to better measure trends, or "momentum", if it exists. Good luck!


NFL Picks and Measures of Confidence

Lately, I have been highlighting in green the games that have a 'high' degree of confidence, but what is this measure of confidence? It is simple. Statisticians use a wide variety of measures to understand the accuracy of models and procedures. For regression models (the kind currently used in this blog), examples include R-Squared (R^2) and Mean Square Error (MSE). R^2 measures the proportion of the variance explained by the model. MSE measures the expected squared error (how far my predictions could be from the actual point spread). These measures are used to select the model, i.e. the one with the highest R^2 and/or lowest MSE.
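Both measures are short formulas. The minimal sketch below uses plain Python lists. The sample outcomes and predictions are made-up point spreads.

```python
def mse(actual, predicted):
    """Mean squared error: average squared gap between outcome and prediction."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Proportion of the variance in the actual outcomes explained by the model."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)                 # total variation
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained part
    return 1 - ss_res / ss_tot

actual = [3, -7, 10, 0]      # made-up actual point spreads
predicted = [1, -3, 6, 2]    # made-up model predictions
print(mse(actual, predicted), round(r_squared(actual, predicted), 3))  # 10.0 0.732
```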

In the NFL, being accurate at predicting the point spread is not so important (Vegas does a pretty good job); what matters most is selecting winning picks. Therefore, this problem turns into a classification problem: deciding which team is more likely to cover the spread. The decision's 'confidence' can be determined from the error rate or, equivalently, from its complement: the percentage of times the decision correctly predicts the winning team against the spread.

As you have seen in previous posts, we have been trying to classify each game into certain "buckets" depending on the Vegas line and the model's prediction. This weekend I was asking myself: are these the right buckets? I don't have much more data to use, so how should these buckets be partitioned? That brought me back to grad school and the decision-tree learning methods I studied. Of course, I thought: let the data partition them to achieve the lowest error rate. Thanks to the open-source statistical software R, I will be able to build these decision trees, include more variables (essentially offensive and defensive yards), and hopefully make better predictions.
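The simplest decision tree is a single split, often called a 'stump'. The sketch below is an illustration, not the R code mentioned above. It checks every midpoint between observed differences (model estimate minus Vegas line) and keeps the threshold with the lowest ATS error rate. It picks the home team below the threshold and the visiting team above it. Full trees with several variables are commonly grown in R with a package such as rpart; the data here is hypothetical.

```python
def best_split(diffs, home_covered):
    """One-level decision tree: find the threshold t on (estimate - Vegas line)
    that minimizes the error rate of the rule 'pick home if diff < t,
    else pick the visitor'. Returns (threshold, error_rate), or None if
    there are fewer than two distinct differences to split between."""
    values = sorted(set(diffs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        # The rule picks home when d < t; it errs when that disagrees with the result.
        errors = sum(1 for d, hc in zip(diffs, home_covered) if (d < t) != hc)
        rate = errors / len(diffs)
        if best is None or rate < best[1]:
            best = (t, rate)
    return best

# Hypothetical games: the home team covered only when the model liked it by 4+.
print(best_split([-5, -4, -3, 1, 2, 4],
                 [True, True, False, False, False, False]))  # (-3.5, 0.0)
```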

What is going to change in the next blog posts? Not much. Besides the pick decided by the model, I will include its success rate (one minus the error rate) for each game. For example, you might see Dallas @ NY Giants 78% (I already ran this one, so consider it an early week 10 pick). That means that for games falling in the same category as this game, the decision has predicted the correct pick 78% of the time. Huge confidence!

Week 8

Quick picks:
Tennessee -7.5
Buffalo +3
Green Bay +3
Jacksonville +4

For example, when the difference between the prediction and the Vegas spread is between -5 and -3 points (home team preferred), picking the home team has been correct 9 of 12 times (75%). When the difference is between -1 and 0, picking the home team has been right only 16% of the time. One could argue that picking the visiting team would then be right 84% of the time. I will not highlight games whose difference falls in this bucket.


Now let's take a look at this week's computer-generated picks and their difference from this week's Vegas spread. Unfortunately, this week only one game (Oak @ Ten) falls in one of the 'good' categories. Two (GB @ Den, Buf @ NYJ) are very close to these categories, and one (Jac @ TB) is a huge outlier that should be investigated further.


Let's look at stats for each of these 4 games:

Oakland @ Tennessee
When I initially saw the line, I thought it was an easy pick for Oakland. I thought Tennessee would be an underdog or a slight favorite. Apparently, last year's data is dragging Oakland through the mud. My model shows high confidence, but 6 weeks of running it still have not given me personal confidence in this pick. But I won't back down from my regression friend: Tennessee -7.5.


Oakland is clearly struggling in passing and rushing yards against opponents in the past 5 games, averaging about 40 yards less in both categories. Tennessee, on the other hand, is rushing about 10 yards and passing over 60 yards more than its opponents in the past 5 games.


On average, Tennessee is doing slightly better against the spread, finishing almost 6 points above it, while Oakland is about 4 points above. Notice also that Oakland has been below the spread by about 4 points in the past two games.

Buffalo @ NY Jets and Green Bay @ Denver
These games have an equal spread and an equal prediction. Both games fall into an undesired category for the model, but they are still very close to the 1-2 bucket (which has picked 67% correctly). Both Green Bay and Buffalo are playing well and are still drawing doubt from bettors, so let's look at how they have performed recently.

The following graph shows the difference between the Vegas point spread and the actual result, along with the Vegas line. Notice the last game these two teams played (9/30/2007): although New York was the favorite, Buffalo covered the spread by 6 points. Last year New York was also the favorite, and this time Buffalo covered by more than 20 points. The model's pick plus this graph is enough for me to conclude Buffalo +3.


Look at the following two graphs. Green Bay is clearly kicking ass in the yards department. Also notice Denver's struggles covering the spread, although they are still favorites in this game. I say no way: Green Bay +3.


Week 7

In green you will see games where the prediction confidence is at its highest, 65%-70%. In orange, the confidence is at the second-highest level, 55%-65%. Note that my Vegas line and point spread predictions take the home team as the reference; that is, when the Vegas line is negative, the home team is the favorite. For example, the Arizona @ Washington game has Washington favored by 8 points. The computer-generated point spread prediction is 2.1 points more than the Vegas spread. That difference falls into this year's low-confidence category (see previous posts), so the row is left white. For other games where the difference is too large, confidence is low, since other factors not included in the model may be affecting the estimate.

This week we have 2 games with high confidence and 2 games with 'medium' confidence. In the San Francisco @ NY Giants game, the estimate favors the home team by more than 4 points. In the Tampa Bay @ Detroit game, the prediction gives the visiting team a slight advantage. In orange, we have Dallas covering the spread by more than two points, and New Orleans beating Atlanta but falling short of the 8.5 points.


In the coming weeks, I will be exploring how outliers may be affecting my estimates and weighting those observations differently. For now, let's explore the data for the Atlanta @ New Orleans and San Francisco @ NY Giants games:

Atlanta @ New Orleans
A first look at the passing yards difference between each team and its opponents shows both teams averaging fewer yards than their opponents in the past 5 games. Atlanta is averaging about 30 passing yards and about 38 rushing yards less than its opponents. New Orleans is not performing that well against opponents either, averaging around 12 passing yards and 15 rushing yards less than its opponents.


To evaluate how each team has performed against the Vegas line, we plot the number of points above or below the line in each team's past few games. We clearly see how badly New Orleans started, finishing 20 points or more below the Vegas line. Remember that New Orleans lost by 17 to Tennessee in week 3, by 17 to Tampa Bay, and by 31 to Indianapolis in week 1. They rebounded last week by beating Seattle by 11. Atlanta lost to the Giants by 21 last week (16 points below the spread). Apart from that, Atlanta has been bouncing around the spread, with its highest peak in week 4, a 10-point win over Houston. So do you think Atlanta will rebound from last week's loss and New Orleans will continue its streak? Considering that last year New Orleans beat Atlanta twice by more than 20 points, probably, but my model does not think so. Tough pick: Atlanta +8.5


San Francisco @ NY Giants
The graphs clearly support the model's decision that 9.5 points is not enough. The passing game, the rushing game, and past few games show that the Giants are playing much better. You be the judge:


Tampa Bay @ Detroit


Good luck!

Vegas' Accuracy of Predicting the Point Spread

How good is Vegas in predicting the point spread? Or better yet, is Vegas getting any better at predicting the point spread? In this post I am studying the difference between Vegas' prediction of the point spread and the actual outcome. That is,
Difference = Point Spread Outcome - Vegas Line

The Point Spread Outcome is computed as Visiting Team Score - Home Team Score since my Vegas Line is referenced in the following way: negative if the home team is favorite and positive if the visiting team is favorite. So, if the difference is HIGHLY POSITIVE it implies that the VISITING team was underrated by the Vegas Line. If the difference is HIGHLY NEGATIVE then the HOME team was underrated. For example, take last year's Super Bowl (Chicago as the 'home team'). The Vegas Line was 6.5 (favoring Indianapolis since it is positive). The actual score was 29-17, hence the actual point spread was 12 and the difference 12-6.5 = 5.5. One can say that the Vegas Line underestimated the visiting team.

I grouped the difference into 7 categories:

  • '< -10' = HOME team underrated by more than 10 points

  • '(-10,-6)' = HOME team underrated by less than 10 points but more than 6

  • '(-6,-2)' = HOME team underrated by less than 6 points but more than 2

  • '(-2,2)' = Vegas got within 2 points of the actual outcome

  • '(2,6)' = VISITING team underrated by less than 6 points but more than 2

  • '(6,10)' = VISITING team underrated by less than 10 points but more than 6

  • '> 10' = VISITING team underrated by more than 10 points
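The difference and its 7 categories translate directly into code. The minimal sketch below makes one assumption: which side of a boundary an exact 2, 6, or 10 falls on, since the list above uses open intervals. The Super Bowl example reproduces the 5.5 computed above.

```python
def vegas_difference(visitor_score, home_score, vegas_line):
    """Difference = Point Spread Outcome - Vegas Line, where the outcome is
    visitor score minus home score and the line is negative when the home
    team is favored and positive when the visitor is favored."""
    return (visitor_score - home_score) - vegas_line

def category(diff):
    """Map a difference to one of the 7 categories (boundary handling assumed)."""
    if diff < -10:
        return "< -10"
    if diff < -6:
        return "(-10,-6)"
    if diff < -2:
        return "(-6,-2)"
    if diff <= 2:
        return "(-2,2)"
    if diff <= 6:
        return "(2,6)"
    if diff <= 10:
        return "(6,10)"
    return "> 10"

# Last year's Super Bowl: Indianapolis (visitor) 29, Chicago (home) 17, line 6.5.
d = vegas_difference(29, 17, 6.5)
print(d, category(d))  # 5.5 (2,6)
```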

Here is a graph of the percentage of games in each category within each season:

See any trends?

Within each category there aren't any real trends. One thing to notice is that the two highest categories, averaging about 23%, are the ones where Vegas misses by more than 10 points (for both visiting and home teams). Another interesting aspect of this graph is last year's big spike in the '> 10' category. It implies that last year Vegas underestimated the visiting team more than in any other year. One might expect them to correct for this this year, so watch out for underrated home teams. Come back in a couple of days: I will study the characteristics (if any) of these games where Vegas misses by more than 10 points, and if I find something interesting, I will post it.

We know that on average Vegas does well. Since 1992, the average difference is a mere -0.14, with a minimum of -47.5, a maximum of 44, and a standard deviation of 13. That seems highly volatile to me: a standard deviation of 13 points means Vegas routinely misses the actual outcome by double digits. Is Vegas getting better at predicting the point spread? In short, no. The following table shows the mean and standard deviation of the difference by season from 1992 to 2006.

Difference: Point Spread - Vegas Line
Season N Mean Std Dev
1992 204 -0.061 14.412
1993 198 0.15 12.913
1994 204 1.62 12.284
1995 235 0.36 12.601
1996 243 -1.29 12.877
1997 251 -0.23 13.188
1998 251 -0.98 12.185
1999 259 -0.72 13.289
2000 259 -0.61 13.384
2001 259 0.20 13.050
2002 267 -0.24 13.540
2003 267 -1.03 13.470
2004 267 -0.05 12.900
2005 267 -0.89 12.871
2006 266 1.96 13.432
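As a rough check on the table, the per-season rows can be pooled back into one overall mean and standard deviation, and a straight line can be fitted to the seasonal standard deviations to see whether the misses are shrinking. This is a sketch that assumes each seasonal standard deviation is the usual sample (n - 1) version; the function names are hypothetical, and the minimum and maximum cannot be recovered from the table.

```python
import math

# (season, N, mean, std dev) copied from the table above
TABLE = [
    (1992, 204, -0.061, 14.412), (1993, 198, 0.15, 12.913),
    (1994, 204, 1.62, 12.284), (1995, 235, 0.36, 12.601),
    (1996, 243, -1.29, 12.877), (1997, 251, -0.23, 13.188),
    (1998, 251, -0.98, 12.185), (1999, 259, -0.72, 13.289),
    (2000, 259, -0.61, 13.384), (2001, 259, 0.20, 13.050),
    (2002, 267, -0.24, 13.540), (2003, 267, -1.03, 13.470),
    (2004, 267, -0.05, 12.900), (2005, 267, -0.89, 12.871),
    (2006, 266, 1.96, 13.432),
]


def pooled_mean_sd(rows):
    """Combine per-season means and std devs into one overall mean and std dev.

    Total sum of squares = within-season part + between-season part,
    assuming each season's std dev uses the (n - 1) denominator.
    """
    n_total = sum(n for _, n, _, _ in rows)
    grand_mean = sum(n * m for _, n, m, _ in rows) / n_total
    ss = sum((n - 1) * s ** 2 + n * (m - grand_mean) ** 2 for _, n, m, s in rows)
    return grand_mean, math.sqrt(ss / (n_total - 1))


def slope_per_season(rows):
    """Least-squares slope of the seasonal std dev against the season year."""
    xs = [season for season, _, _, _ in rows]
    ys = [s for _, _, _, s in rows]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


mean, sd = pooled_mean_sd(TABLE)
print(f"pooled mean {mean:.2f}, pooled std dev {sd:.1f}")
print(f"trend in std dev: {slope_per_season(TABLE):+.3f} points per season")
```

The pooled figures should land close to the overall numbers quoted above (a mean near -0.14 and a standard deviation near 13), and the fitted slope is a tiny fraction of a point per season, which is consistent with the answer "no".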

Introduction to Wagering on National Football League Games

The problem of estimating the outcome of National Football League games only started to gain attention in the literature in the 1970s! Most of these authors, though not all, argue that one can estimate the spread or build profitable wagering strategies using the outcomes of games from past seasons. Some of these strategies are simple and easy to understand, while others require a strong statistical background. My interest is in the statistical wagering strategies, but why use these if simpler methods work better? So, we will first look at the simple, non-statistical approaches to wagering in the next blog, Simple Wagering Strategies. Here, for those not familiar with NFL wagering, I will briefly discuss how it works.

Wagering on NFL Games
The gambling procedure for the NFL is quite simple. Each week bookmakers establish a point spread (or just the spread) for each game. A bet on the favorite wins if the favorite wins by more than the point spread; a bet on the underdog wins if the favorite wins by fewer points than the spread or the underdog wins the game outright. (If the favorite wins by exactly the spread, the bet is a push and the stake is returned.) The bookmaker charges a 10% ante on all bets. That is, if a $100 bet is placed ($110 with the ante) and wins, the bettor is paid $100 and keeps her $110 stake; if the bet loses, she pays the full $110. In order to break even, what proportion of bets does the bettor need to win? 52.38%. (Let p be the proportion of winning bets; to break even, solve 100p = 110(1 - p), which gives p = 110/210 = .5238, or 52.38%.)
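The break-even calculation can be checked with exact fractions, along with the expected profit per bet at a given winning rate. This is a small sketch assuming the $110-to-win-$100 terms above and ignoring pushes; the names are hypothetical.

```python
from fractions import Fraction

STAKE = 110   # amount risked, including the 10% ante
PAYOUT = 100  # profit on a winning bet


def breakeven_win_rate(stake=STAKE, payout=PAYOUT):
    """Solve payout * p = stake * (1 - p) for p, exactly."""
    return Fraction(stake, stake + payout)


def expected_profit(p, stake=STAKE, payout=PAYOUT):
    """Expected profit per bet when a fraction p of bets win (pushes ignored)."""
    return p * payout - (1 - p) * stake


p_star = breakeven_win_rate()
print(f"break-even win rate: {p_star} = {float(p_star):.4f}")  # 11/21 = 0.5238
print(f"{expected_profit(0.55):+.2f}")  # +5.50 per bet at a 55% win rate
print(f"{expected_profit(0.50):+.2f}")  # -5.00 per bet at a coin-flip 50%
```

The last line shows why a bettor who merely flips a coin loses about $5 per bet to the ante.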

I am not a bookmaker, nor do I intend to be, so I do not know exactly how they arrive at the point spread. I do know that their intention is not to predict the true point spread but to split the betting population in half. This way, the bookmaker can guarantee a profit of about 5% regardless of the outcome of the game (10% from the losing half of the betting population). Because a line set to balance the betting need not be an unbiased estimate of the true margin, many researchers believe there must be a profitable strategy. But which one? And if people find out about it, will it still be profitable? So why am I wasting my time trying to find strategies in football wagering? In short, because I'm a statistical geek.
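To see why splitting the betting population in half matters, here is a sketch of the bookmaker's profit under each outcome, assuming every bet is the standard $110-to-win-$100 wager described above and ignoring pushes. The function name and the bet counts are hypothetical.

```python
def bookmaker_profit(bets_on_a: int, bets_on_b: int, stake: int = 110, payout: int = 100):
    """Bookmaker's profit under each outcome for counts of $110-to-win-$100 bets.

    Returns (profit if side A covers, profit if side B covers).
    """
    # If A covers, the book keeps the losing B stakes and pays each A bettor $100.
    profit_if_a = bets_on_b * stake - bets_on_a * payout
    # If B covers, the roles are reversed.
    profit_if_b = bets_on_a * stake - bets_on_b * payout
    return profit_if_a, profit_if_b


print(bookmaker_profit(100, 100))  # (1000, 1000): same profit either way
print(bookmaker_profit(150, 50))   # (-9500, 11500): the book is now gambling
```

With 100 bets on each side, the book earns $1,000 whichever team covers, which is 5% of the $20,000 in $100 base bets (about 4.5% of the $22,000 actually handed over). With a 150/50 split, the same book either loses $9,500 or wins $11,500 depending on the result.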

This study is a competition between strategies. I could have gathered the data from previous years, obtained the results immediately, and found out which strategies were more profitable, but I find it more amusing to test these strategies in real time. Watching NFL games with friends is fun. Talking about players' expected performances is better than listening to the guys on ESPN. Seeing your predictions unfold right in front of your eyes, priceless.

Performance of the wagering strategies and picks in this blog is based solely on profitability against the mother of all bookmakers, Vegas baby! Of all the websites giving the daily Vegas point spreads, I will use the top result of a Google search for the keyword "Vegas point spreads". All wagering strategies are data-based; that is, no feelings, preferences, or opinions about a city or team are taken into consideration. Only the past performance of each team (or player) up to the day before the game is used.
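One plausible way to score a strategy against Vegas under these rules is sketched below; it is not necessarily how the picks in this blog are graded. It assumes an estimator outputs a predicted margin in the same visitor-minus-home convention as the Vegas Line, bets on whichever side the line appears to underrate, and only ever sees games played before the game day. The `Game` record, the function names, and the predicted margins in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Game:
    day: date
    visitor: str
    home: str
    visitor_score: int
    home_score: int
    vegas_line: float  # positive when the visiting team is favored


def history_before(games, game_day):
    """Games played strictly before game_day, so no result leaks into a prediction."""
    return [g for g in games if g.day < game_day]


def grade_pick(predicted_diff, game, stake=110, payout=100):
    """Bet on the side the estimator thinks the line underrates; return profit in dollars.

    predicted_diff is an estimated (visitor score - home score).
    """
    if predicted_diff == game.vegas_line:
        return 0  # no disagreement with the line, so no bet
    bet_on_visitor = predicted_diff > game.vegas_line
    actual_vs_line = (game.visitor_score - game.home_score) - game.vegas_line
    if actual_vs_line == 0:
        return 0  # push: stake returned
    visitor_covered = actual_vs_line > 0
    return payout if bet_on_visitor == visitor_covered else -stake


# Last year's Super Bowl, graded for two hypothetical predictions
sb = Game(date(2007, 2, 4), "Indianapolis", "Chicago", 29, 17, 6.5)
print(grade_pick(10.0, sb))  # 100: predicted Indianapolis by 10, they covered
print(grade_pick(3.0, sb))   # -110: predicted only a 3-point win, so bet Chicago
```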

On a side note, don't try this at home! Sports betting (online and offline), and specifically betting on the outcome of football games, is not only illegal in most states but also a very risky endeavor. I do not encourage anybody to use the strategies I will discuss in real gambling situations, since even the authors who claim to have found profitable strategies made at most 5% on their money, which is what I'm currently getting in my risk-free CD at my local bank.

First Post - How I Got Into This

I have been fascinated with the combination of statistics and sports since I was a child. Growing up, I remember telling my friends, who were playing video games with me, not to touch the joystick while I wrote down the stats. In those days, about 1986, sports video games did not record the stats, so I did. It was a lot of work, and we only kept track of a small percentage of what people keep track of today, but we were still able to use these stats to, for example, select a video game tournament's MVP. Statistics were also very present in my thoughts when I was playing a sport. In a basketball game I would know at any moment how many points, rebounds, and assists I had. Was this egotistical and selfish? No: a few times I would tell my friends how they had performed in the game, and they would claim to have had many more rebounds or assists, since the league recorded only points. Directly or indirectly, I was always interested in having a way to rank players or teams, which brings me to this blog.

This blog is not written to give gambling tips or online sports wagering picks, nor do I encourage anybody to go into sports gambling based on these experiments. So, what will I be doing here? Basically, using different statistical measures to predict the outcomes of National Football League (NFL) games. By predicting outcomes, I mean estimating the true spread, the number of points by which team A will beat team B. That is, instead of predicting that the score will be 20-13 in favor of A, I am only interested in predicting that A will beat B by 7. I will keep track of ten different ranking measures, or ways to estimate NFL game outcomes. The ranking measures published in this blog were obtained from scientific journals, websites, and my own cooked-up measures. This blog will serve as an online notebook to keep track of and compare the different estimators' performance on the 2006 NFL season outcomes.

Future Blogs
In my next few blogs, I will introduce and explain each of the statistical estimators and the different sports (NFL) wagering picks. Some of these estimators are the work of other statisticians who published in well-known statistical journals. The estimators vary in the statistical methodology and the variables they use. For example, one estimator may base her decision solely on previous scores and home-field advantage, while another may take into consideration yards rushed, yards passed, interceptions, sacks, and penalties. One estimator may use a standard linear model, another a Bayesian approach, and another an ad hoc methodology like surveying free opinions from different websites. I will try to stay away from too many mathematical technicalities when explaining how each estimator works, but for savvy statistical readers, I will post references to scientific journal articles and websites with detailed explanations of each estimator. After that, once the 2006 NFL season starts, you can watch each week what these estimators predict and how much imaginary money they win or lose. Stay tuned!