Week 13

Quick Picks:
Cleveland +1
San Diego -5
Pittsburgh -8.5

Last week was a HUGE SUCCESS! Overall, the statistical model had a 67% success rate, and although my purpose is to select the games with a higher chance of success, it is still good to know that the model performed well overall. More importantly, the games that the model selected with high confidence also performed well. Both of last week's yellow picks (mid-high confidence), Green Bay and Indianapolis, covered as predicted. Last week's highest-confidence pick (New Orleans) covered by a wide margin.

The only disappointment was the OAK @ KC game, which (lesson learned) should not have been picked because of injuries. The technique I use weights recent games more heavily, but games dating back to last season are still considered. During that time, KC ran with Larry Johnson and Oakland did not use Culpepper. Since there is currently no way for me to account for injuries, from now on when a team has a key player missing, I will not include that game among my confidence picks.
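The idea of weighting recent games more heavily can be sketched roughly as follows. This is only an illustration: the exponential decay scheme, the `decay` value, and the sample margins are assumptions, not the weights the model actually uses.

```python
def recency_weighted_average(values, decay=0.8):
    """Average `values` (oldest first), weighting recent entries more heavily.

    The most recent value gets weight 1, the one before it `decay`,
    then `decay**2`, and so on. `decay` is a hypothetical tuning knob.
    """
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Made-up point margins for one team, oldest game first.
margins = [-10, -3, 4, 7, 14]
print(round(recency_weighted_average(margins), 2))
```

With `decay=1.0` this reduces to a plain average; smaller values let last season's games fade faster, which is one way a roster change (such as an injury) would eventually wash out of the estimate.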

NFL week 13 point spreads seem harder to predict, or let's say I feel a bit less confident than last week. Feelings aside, there are 4 games that the computer picks with mid-high to high confidence. Below is the usual table with the Vegas spreads, computer point spread predictions, NFL picks, and percent confidence. Click Read More and we will delve into the statistical details and graphs for each of these 4 matchups.

<%image(20071127-NFL2007_week13.jpg|524|478|NFL Week 13 Picks)%>

Week 12

NFL Point Spread Picks:
New Orleans -3
Kansas City -6
Indianapolis -11.5

Motivated by one of our readers, I have decided to run the current model and show you two GREEN picks and two YELLOW picks. At the top, with 71% confidence, we have New Orleans covering a spread of 3 at Carolina. Personally I don't like this pick, but I have to disregard my thoughts and feelings since they might be biased by reading other people's opinions and blogs. Second, we have Kansas City covering a spread of 6 at home against Oakland. I do like this one.


The computer also shows some other attractive games which are worth mentioning. The model is predicting Indianapolis over Atlanta by 18 points, which would comfortably cover an 11.5 spread. The line has shifted to 12.5 at some sites, which is still a bargain since it is below two TDs. One curious game that I have not highlighted in yellow, because it goes against most people's opinion (which might actually be good), is Houston defeating Cleveland by 3 while the spread is currently set at 3.5 favoring Cleveland. The Browns are playing well, so I didn't want to include this game; it might be an anomaly in the model. Finally, we have Green Bay over Detroit by more than 3.5 at Detroit. This one has been chosen by the model with 56% confidence and is backed by most (67%) experts. Also, the line has moved about 1/2 to 1 point toward Green Bay, which may be a sign that the spread was too low to start with.

Let's dig in deeper into the current stats for both of the GREEN picks.

New Orleans @ Carolina
First, let's look at how these teams have fared against the spread in their past 5 games.

<%image(20071121-lst512ca.gif|570|340|NO @ CAR)%>

In the past two weeks, neither team has covered the spread, but New Orleans has been farther off, including a surprising loss against St. Louis where they were favorites by 10 points and lost by 8. Before that, NO covered by more than 12 points in weeks 8 and 9 against SF and JAC, with Jacksonville being the only strong contender. Carolina, on the other hand, has not won or covered the spread in the past 4 weeks, although they've had a slightly tougher schedule, playing IND, TEN, ATL, and GB. On 10/14 you see a huge jump of 20 when they beat Arizona by 15. Let's look at their OFF/DEF stats.


In the past 5 games, Carolina has averaged about 10 fewer passing yards per game than their opponents, while New Orleans has been significantly better with 40 yards more per game. The rushing yards were quite surprising to me: New Orleans has averaged 20 yards less than their opponents per game, while Carolina has been a little more than 5 yards better.

Finally, we will look at how these two teams have fared against each other since 2001.


Watch the red bars: if they are above zero, it means Carolina covered, and if the green bars are above zero, it means Carolina was the favorite. The last game was on 10/07/2007, when the Vegas line was favoring New Orleans by 4 but Carolina covered by 7, beating NO 16-13 at New Orleans. This gives me chills. Notice any patterns? I see that Carolina had been the favorite from 2004 until the last game this year, but covered the spread only half the time. That is not really a pattern, and I do not see anything significant jumping out at me, so I will back my model and say New Orleans -3.

Oakland @ Kansas City



Week 11

Predicting the NFL Point Spread

Currently I am working on different kinds of models and statistical techniques in order to obtain one that is satisfactory. So far, the best model gives a 52% winning rate on ALL games from 2000-2007. This model would only let a bettor break even, which is unsatisfactory. I worked on finding circumstances where the model is more accurate and found that there are indeed situations where it predicts with more than 65% accuracy, but these situations were hardly the norm.
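To see why a 52% winning rate is roughly break-even, here is a small worked example. It assumes standard -110 pricing (risk 110 to win 100), which is an assumption on my part and not something stated above; the function names are hypothetical.

```python
def break_even_rate(risk=110.0, win=100.0):
    """Winning rate at which expected profit is zero.

    Assumes every bet risks `risk` to win `win` (standard -110 pricing
    by default; this pricing is an assumption, not taken from the post).
    """
    return risk / (risk + win)


def expected_profit_per_bet(win_rate, risk=110.0, win=100.0):
    """Expected profit of one bet, in the same units as `risk` and `win`."""
    return win_rate * win - (1.0 - win_rate) * risk


print(f"break-even rate: {break_even_rate():.4f}")
for rate in (0.52, 0.65):
    print(f"win rate {rate:.2f}: expected profit {expected_profit_per_bet(rate):+.2f} per bet")
```

Under that pricing the break-even point sits a little above 52%, so a 52% model is essentially treading water, while the 65% situations would be comfortably profitable if they held up out of sample.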

I have come to the conclusion that a simple linear regression model will not work with the data I have. Other similar techniques I have tried include robust linear regression, where outliers are down-weighted in order to obtain better estimates, and logistic regression, which takes a binary response in order to predict the probability (odds) that a team will cover the spread.
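For readers who have not seen logistic regression, here is a bare-bones sketch with a single predictor. Everything in it is illustrative: the predictor (a hypothetical "edge", meaning the model spread minus the Vegas line), the made-up data, and the plain gradient-ascent fit are assumptions, not the model described above.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(cover) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Made-up data: edge in points, and whether the team covered (1) or not (0).
edges = [-6, -4, -3, -1, 0, 1, 2, 3, 5, 7]
covered = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(edges, covered)
print(f"P(cover | edge = 5) is about {sigmoid(b0 + b1 * 5):.2f}")
```

The output of the fit is a probability rather than a point margin, which is what makes it a natural candidate for a "will this team cover?" question.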

A few of the problems I see are that 1) the data is not linear, 2) the observations are not independent, and 3) there are other significant factors that are not contained in the data. Not all hope is lost: there are statistical techniques to get around these problems. There are non-linear techniques, like Bayesian statistics, that might be more accurate and take the severe randomness into account. There are also repeated-measures models, also called mixed models, which are more commonly used in drug trials and account for the correlation among observations taken repeatedly on a single entity.

Finally, there is a vast number of data mining techniques that might be a good fit for the NFL data. These include neural networks, which are able to capture complex behavior among inter-correlated, connected nodes; regression and decision trees, which accommodate predictive modeling and classification (in our case, will a team cover the spread?); and clustering procedures, which could group games in terms of their predictability, Vegas spread, and other factors that can help in building estimates for each cluster type.

Stay tuned for next week when I deploy my next best model. I will give you the NFL picks and their level of accuracy from 2000-2007.

Week 10

Quick NFL Picks:
Dallas -1.5
Buffalo -3

My computer admits last week's picks were brutal; she told me so. But she did tell me that her most confident game was right on the nose. Houston indeed covered the spread. The brutality came with the other 2 medium-confidence games. Although the Cleveland/Seattle game came down to the wire, there is no excuse. The question is, should we just pay attention to the 'green' games? Well, I wouldn't go that far just yet. The good news is that in this NFL week 10, the model produced 2 high-confidence ('green') games. Are they Minnesota/Green Bay, Indianapolis/San Diego, or Chicago/Oakland? If I were to pick on pure gut feeling, these are the games I would choose, and I would pick the visiting team in all 3. But no, the green games of the week are Dallas @ NY Giants and Buffalo @ Miami. We will look at more detailed stats for these two games in the next post, but for this one, we'll stick to the picks and predictions.

Below you will find the table with the picks and the new confidence column I described in my previous post. I couldn't believe my eyes: it predicts the Cowboys to beat the Giants by about 6 points and to cover the spread with 79% confidence! OMG. I have also included a column with the total number of games from 2002-2007 that fall into a similar category as these games. This column should be used as a gauge of the sample size (the smaller it is, the less reliable the confidence). What is a good sample size? That is something I just thought about, and for a categorical problem like the one we have here, usually more than 5% in each category is considered reliable. In our case, 5% of all games from 2002-2007 is about 60, so we are below that threshold. I will modify the categories in order to obtain bigger samples in each (next week). For now, let's look at the numbers:
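The 5% rule of thumb is easy to check mechanically. In this small sketch, the total of 1200 games is only what "5% is about 60" implies, not an exact count, and the function name is hypothetical.

```python
def meets_sample_threshold(category_games, total_games, min_percent=5):
    """True if a category holds more than `min_percent` percent of all games.

    Integer arithmetic is used so the comparison is exact.
    """
    return category_games * 100 > total_games * min_percent


# Total implied by "5% of all games is about 60"; an approximation, not an exact count.
TOTAL_GAMES = 1200

for n in (25, 60, 75):
    print(n, "games in category ->", meets_sample_threshold(n, TOTAL_GAMES))
```

Merging neighboring categories is the simplest way to push a sparse category over the line, at the cost of a less specific "situation".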


I would like to see the defensive/offensive stats, the spread history, and the last-5-game performance for the two so-called high-confidence games. By inspection, I like these picks. Buffalo has been very good to my computer. Every time she has predicted Buffalo, they have come through, and oh yes, they are playing Miami. Dallas is playing unbelievably well and Vegas is still doubting them. Watching them destroy Philadelphia last week showed me that they are not far from being up there with New England and Indianapolis.

Notice there are 2 games with confidence higher than 60% but less than 65% and those are:
Minnesota @ Green Bay
Cleveland @ Pittsburgh

Minnesota comes as no surprise, and most lines are starting to move because of gamblers' preference for the Vikings. I am less comfortable with the Cleveland @ Pittsburgh game after seeing last week's domination. We will dig deeper into the stats for each of these 4 games in my next post. Stay tuned!

Clash of the Titans: New England vs. Indianapolis

What a game! Is it Sunday yet? Indianapolis is undefeated and so is New England. Peyton Manning was last year's MVP and Tom Brady is en route to breaking the record for the most TDs in one season. Although the Colts are playing at home, Vegas is giving a 5-point advantage to the visiting team (Bodog has the spread at -6). In this post we'll explore some stats and predictions to see if this spread is justifiable.

First, we start with the history of games in which these two teams have faced each other. The graph below shows all of their meetings since 2001. At the far right we see last year's AFC Championship game. The positive green bar shows that Indianapolis was the favorite by 3 points, and as we all know they went on to beat the Patriots 38-34, covering the spread by 1 point (hence the small positive red bar). In 2006, New England was the favorite by four points, but again the Colts covered, and as you can see from the red bar, by 10 points. Not until you go back to 1/16/2005 (the Divisional Playoffs of the 2004 season) do you find New England covering, which they did easily, by more than 16 points (the actual score was 20-3).


Let's look at this year's stats:

The graph below shows each team's average yardage relative to its opponents over the past 5 games. Tom Brady's excellence shows: New England has out-passed its opponents by more than 100 yards per game, while Indianapolis has done so by 'only' 80 yards. Indianapolis' rushing yardage looks slightly better than New England's (this could be key in Sunday's game).


As for who my statistical model predicts, I have to say it does not know. The statistical problem here is that these two teams are the 'extremes', and so regression pulls them back toward the 'mean'. It has both teams squarely even, but again, this is not a reliable prediction. I would say that this game is highly volatile. New England has been playing their best ever (now without taping other teams' signals) and the Colts are as good as last year. I wouldn't bet on this game, but if I did, I would go with my instinct.

I do have week 9 picks that the model is predicting with great confidence, but I'll leave those to my next blog.

Week 9

Quick picks:
Seattle +1
Houston +3
Baltimore +9.5

I ran my model for this week, compared its success to this year and previous seasons, and automatically created picks accordingly. The system is based on identifying the situations in which the model predicts correctly. For example, when the Vegas spread favors the home team by 3 to 6 points but my model predicts the visiting team to win by more than 6, the visiting team ends up covering the spread 75% of the time. This week we have one such situation, Houston at Oakland.
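As a rough sketch, checking one such situation against past games could look like this. The field names, the sign convention (margins from the home team's point of view), the handling of pushes, and the three sample games are all assumptions made for illustration.

```python
def in_situation(game):
    """Vegas favors the home team by 3 to 6, but the model has the visitor winning by more than 6."""
    return 3 <= game["vegas"] <= 6 and game["model"] < -6


def visitor_cover_rate(games):
    """Return (cover rate, games counted) for the visitor in this situation.

    All margins are from the home team's point of view (positive = home ahead).
    Pushes (actual margin equals the line) are left out of the count.
    """
    hits = total = 0
    for g in games:
        if not in_situation(g) or g["actual"] == g["vegas"]:
            continue
        total += 1
        if g["actual"] < g["vegas"]:  # home team failed to beat the line
            hits += 1
    return (hits / total if total else None), total


# Made-up historical games, for illustration only.
history = [
    {"vegas": 3.0, "model": -7.0, "actual": -10},  # visitor wins outright: covers
    {"vegas": 4.5, "model": -8.0, "actual": 3},    # home wins by 3, line 4.5: visitor covers
    {"vegas": 6.0, "model": -6.5, "actual": 14},   # home wins big: visitor fails
    {"vegas": 7.0, "model": -9.0, "actual": -3},   # line outside 3 to 6: ignored
]
print(visitor_cover_rate(history))
```

Returning the game count alongside the rate matters, because a high percentage from a handful of games is not worth much.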

If you have been reading my blog, you will notice that I added a new dimension to measuring the effectiveness of the model. Before, I was defining 'situations' as intervals of the difference in points between my spread and the Vegas line. Now, I consider not just this difference, but also the value of the Vegas line itself. You can visualize it as a 3-way cross-tab. I found some situations with 90% accuracy from 2003-2007, although their sample sizes were no bigger than 10. I did find more than 75 games that fit into situations with more than 65% accuracy. I will not display all the results since they would take up pages, but I will give you this week's computer-generated NFL point spreads and picks. Beware: this week's picks are all visiting teams, which is something to look into more closely. Also, I am currently figuring out a method to better measure trends or "momentum", if it exists. Good luck!
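One way to picture the cross-tab is as a table keyed by two intervals, with the outcome counted inside each cell. The sketch below is only an illustration: the 3-point bucket width, the tuple layout, and the sample games are assumptions.

```python
import math
from collections import defaultdict


def bucket(value, width=3):
    """Return the half-open interval [low, low + width) containing `value`."""
    low = math.floor(value / width) * width
    return (low, low + width)


def cross_tab(games, width=3):
    """Tabulate results by (Vegas line bucket, model-minus-Vegas bucket).

    `games` holds (vegas_line, model_spread, pick_covered) tuples.
    Each cell maps to [times the pick covered, total games].
    """
    table = defaultdict(lambda: [0, 0])
    for vegas, model, pick_covered in games:
        key = (bucket(vegas, width), bucket(model - vegas, width))
        table[key][1] += 1
        if pick_covered:
            table[key][0] += 1
    return dict(table)


# Made-up games, for illustration only.
games = [(4.5, -3.0, True), (3.0, -4.0, True), (5.0, -3.5, False), (-7.0, -2.0, True)]
for (line_bucket, diff_bucket), (covered, total) in sorted(cross_tab(games).items()):
    print(line_bucket, diff_bucket, f"{covered}/{total}")
```

Printing the raw counts rather than just a percentage keeps the sample size in view for every cell.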


NFL Picks and Measures of Confidence

Lately, I have been highlighting in green the games that have a 'high' degree of confidence, but what is this measure of confidence? It is simple. Statisticians use a wide variety of deviance measures to understand the accuracy of models and procedures. For regression models (which is the kind of model currently being used in this blog), some examples include R-Squared (R^2) and Mean Squared Error (MSE). The former measures the proportion of the variance explained by the model, and the latter the expected value of the squared error (how far off could I be from the actual point spread). These measures are used to select the model to use, i.e. the one with the highest R^2 and/or the lowest MSE.
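Both measures are short to compute by hand. Here is a minimal sketch using their textbook definitions; the sample margins are made up.

```python
def mse(actual, predicted):
    """Mean squared error: the average squared gap between prediction and result."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up final margins and model predictions for five games.
actual = [7, -3, 14, 3, -10]
predicted = [4, 1, 9, 6, -6]
print(f"MSE = {mse(actual, predicted):.2f}, R^2 = {r_squared(actual, predicted):.2f}")
```

Note that MSE is in squared points, so its square root is the easier number to compare against a typical spread.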

In the NFL, it is not so important to be accurate in predicting the point spread (Vegas does a pretty good job of that); it is most important to select winning picks. Therefore, this problem turns into a classification procedure that decides which team is more likely to cover the spread. The 'confidence' of the decision can be determined from the error rate: it is the percentage of times that the decision correctly predicts the winning team against the spread, i.e. one minus the error rate.

As you have seen in previous posts, we have been trying to classify each game into certain "buckets" depending on the Vegas line and the prediction of the model. I was asking myself this weekend: since I don't have much more data to use, are these the right buckets? How should they be partitioned? That brought me back to grad school and the decision-tree learning methods I studied. Of course! I thought. Let the data partition them to achieve the lowest error rate. Thanks to the open source statistical software R, I will be able to build these decision trees, include more variables (essentially offensive and defensive yards), and hopefully make better predictions.
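The core step of a decision tree, letting the data choose where to cut a variable so that the error rate is lowest, can be sketched in a few lines. This is a single-split illustration in Python, not the R tree-building described above, and the variable and data are made up.

```python
def best_split(values, labels):
    """Find the cut point on one variable that gives the lowest error rate.

    Each side of the cut predicts its majority label (1 = covered, 0 = did not).
    Returns (threshold, error_rate); threshold is None if no split is possible.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 1.0)
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        # Errors on each side = size of the minority label on that side.
        errors = min(sum(left), len(left) - sum(left)) + min(sum(right), len(right) - sum(right))
        rate = errors / len(pairs)
        if rate < best[1]:
            best = (threshold, rate)
    return best


# Made-up data: model spread minus Vegas line, and whether the pick covered.
diffs = [-5.0, -2.0, 0.5, 1.0, 3.5, 4.0, 6.0, 8.0]
covered = [0, 0, 1, 0, 0, 1, 1, 1]
print(best_split(diffs, covered))
```

A full tree simply repeats this search inside each resulting group, over every available variable, until the groups get too small to trust.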

What is going to change in the next blog posts? Not much, only that for each game, besides the pick decided by the model, I will include this confidence figure (one minus the error rate). So if you see, for example, Dallas @ NY Giants 78% (I already ran this one, so it is a quick pick for week 10), it means that for past games in the same category as this game, the decision has predicted the correct pick 78% of the time. Huge confidence!