NFL Picks and Measures of Confidence

Lately, I have been highlighting in green the games that have a 'high' degree of confidence, but what is this measure of confidence? It is simple. Statisticians use a wide variety of deviance measures to understand the accuracy of models and procedures. For regression models (the kind of model currently used in this blog), examples include R-squared (R^2) and mean squared error (MSE). The former measures the proportion of the variance in the outcome that the model explains. The latter measures the expected squared error, i.e. how far, on average and in squared points, the prediction lands from the actual point spread. These measures are used to select a model: the one with the highest R^2 and/or the lowest MSE.
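To make the two measures concrete, here is a minimal Python sketch (the blog itself uses R) that computes MSE and R^2 from their textbook definitions. The five games and their predicted and actual margins are made up for illustration; they are not results from this blog's model.

```python
# Made-up predicted vs. actual point spreads (home margin) for five games.
actual = [7, -3, 10, 3, -14]
predicted = [4, -1, 6, 5, -10]


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`:
    1 - (residual sum of squares / total sum of squares)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


print(f"MSE = {mean_squared_error(actual, predicted):.2f}")  # -> MSE = 9.80
print(f"R^2 = {r_squared(actual, predicted):.3f}")           # -> R^2 = 0.864
```

An MSE of 9.80 means the typical miss is roughly 3 points (its square root), which is the "how far off could I be" reading of the measure.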

In the NFL, it is not so important to predict the point spread accurately (Vegas already does a pretty good job); what matters most is selecting winning picks. The problem therefore turns into a classification task: deciding which team is more likely to cover the spread. The 'confidence' of a decision can then be measured by its error rate, the percentage of games in which the decision picks the wrong team against the spread, or equivalently by its complement, the percentage of games in which it picks the right one.
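Here is a small Python sketch of how a predicted margin becomes a pick against the spread and how the error rate is counted. The sign convention is an assumption for illustration: `line` is the number of points the home team is favored by (home -3 in Vegas notation becomes `line = 3`), margins are home score minus away score, and pushes are left out of the count. The games are made up.

```python
def ats_pick(predicted_margin, line):
    """Pick the home team if the model expects it to beat the line, else the away team.
    (A prediction exactly on the line goes to the away team; that tie rule is arbitrary.)"""
    return "home" if predicted_margin > line else "away"


def ats_result(actual_margin, line):
    """Which side actually covered; None for a push (margin lands exactly on the line)."""
    if actual_margin > line:
        return "home"
    if actual_margin < line:
        return "away"
    return None


def error_rate(games):
    """Fraction of non-push games where the pick was wrong.
    Each game is a (predicted_margin, line, actual_margin) tuple."""
    decided = [(p, l, a) for p, l, a in games if ats_result(a, l) is not None]
    wrong = sum(1 for p, l, a in decided if ats_pick(p, l) != ats_result(a, l))
    return wrong / len(decided)


# Made-up games, for illustration only.
games = [
    (6.0, 3.0, 10),    # pick home, home covered -> correct
    (1.0, 3.0, 7),     # pick away, home covered -> wrong
    (-4.0, -7.0, -3),  # away favored by 7; pick home, home covered -> correct
    (2.5, 6.5, 0),     # pick away, away covered -> correct
    (9.0, 3.0, 3),     # push, excluded from the count
]

err = error_rate(games)
print(f"error rate = {err:.0%}, accuracy = {1 - err:.0%}")  # -> error rate = 25%, accuracy = 75%
```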

As you have seen in previous posts, we have been classifying each game into certain "buckets" depending on the Vegas line and the model's prediction. This weekend I asked myself: are these the right buckets? How should they be partitioned, given that I don't have much more data to work with? That brought me back to grad school and the decision-tree learning methods I studied there. Of course, I thought: let the data partition them to achieve the lowest error rate. Thanks to the open-source statistical software R, I will be able to build these decision trees, include more variables (essentially offensive and defensive yards), and hopefully make better predictions.
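The idea of letting the data draw the bucket boundaries can be sketched with a tiny CART-style classification tree. This is a from-scratch Python illustration, not the R code behind this blog: it uses Gini impurity and axis-aligned splits, and the features are hypothetical, namely `edge` (model prediction minus line) and `line`, with label 1 when the home team covered. Each leaf is a bucket that stores its majority pick and how often that pick was right in the training games.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y, min_leaf):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y) * len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # keep every bucket from getting too small
            score = gini(left) * len(left) + gini(right) * len(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow(X, y, depth=0, max_depth=2, min_leaf=2):
    """Recursively split; each leaf keeps its majority pick, accuracy and size."""
    split = best_split(X, y, min_leaf) if depth < max_depth else None
    if split is None:
        ones = sum(y)
        pick = 1 if ones >= len(y) - ones else 0
        return {"pick": pick, "accuracy": max(ones, len(y) - ones) / len(y), "n": len(y)}
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_leaf),
        "right": grow([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Follow the splits down to a leaf (bucket) and return it."""
    while "pick" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Made-up training games: (edge, line) -> 1 if the home team covered, 0 if the away team did.
X = [(-6, 3), (-4, -2), (-3, 7), (-1, 1), (0.5, -3), (1, 3), (2, 6), (4, -1), (5, 2), (7, 10)]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

tree = grow(X, y)
print(predict(tree, (3.0, 4.0)))  # -> {'pick': 1, 'accuracy': 1.0, 'n': 5}
```

With so few games, a leaf can easily show 100% accuracy on the games that built it, which is why `max_depth` and `min_leaf` (both hypothetical settings here) limit how finely the tree may carve up the data.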

What is going to change in the next blog posts? Not much: for each game, besides the pick decided by the model, I will include its historical success rate (one minus the error rate). For example, I have already run this for week 10, so here is a quick pick: Dallas @ NY Giants 78%. This means that, among past games that fell in the same category as this one, the decision picked the correct team 78% of the time. Huge confidence!
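The percentage attached to each pick is just the hit rate of its bucket over past games. Below is a minimal Python sketch of that lookup, assuming hypothetical buckets defined by boundaries on the absolute edge |prediction minus line|; the boundaries and the history are made up, not the ones behind the 78% figure.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries on the absolute edge: [0, 1.5), [1.5, 4.0), [4.0, inf).
BOUNDARIES = [1.5, 4.0]


def bucket_of(edge):
    """Index of the bucket that an absolute edge falls into."""
    return bisect_right(BOUNDARIES, edge)


def bucket_hit_rates(history):
    """history: list of (abs_edge, pick_was_correct). Returns {bucket: (hit_rate, n)}."""
    counts = {}
    for edge, correct in history:
        b = bucket_of(edge)
        hits, n = counts.get(b, (0, 0))
        counts[b] = (hits + int(correct), n + 1)
    return {b: (hits / n, n) for b, (hits, n) in counts.items()}


# Made-up past games, for illustration only.
history = [(0.5, True), (1.0, False), (1.2, False), (2.0, True), (3.0, False),
           (3.5, True), (4.5, True), (5.0, True), (6.0, False), (8.0, True)]

rates = bucket_hit_rates(history)
rate, n = rates[bucket_of(5.5)]  # a new game with an edge of 5.5 points
print(f"pick confidence: {rate:.0%} (based on {n} past games)")  # -> pick confidence: 75% (based on 4 past games)
```

Printing the number of past games next to the rate is a deliberate choice: a rate computed from a handful of games is far less reliable than the same rate computed from many.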
