Afterthoughts from Week 8 - A Bigger Sample Size
Three years ago a friend told me that I should use my analytical skills to predict the point spread. So I decided to build 10 different models, test them, and start making predictions using the models that worked best.
I experimented with simple offensive and defensive ratios, various linear regression models to predict the point spread, a logistic regression (binary outcome) for the probability that a team would beat the spread, and a neural network to predict straight-up winners. None of them worked very well at predicting winners against the spread (ATS).
Then I started thinking of a way to combine and categorize these models to search for games in which the spread is a bit "off". I combined the estimates of the models that turned out to be somewhat successful (above 52%; the best one hit only 55%), split the point spread and the estimates into categories, and started to see some good percentages.
Games in which the models predicted the home team to beat the spread and the home team was favored by 3-4 points did best, hitting above 65% on average. Heavily favored visiting teams that the models predicted would narrowly fail to cover also did well (63%). But what happened in Week 8? Didn't I choose categories whose games were predicted to be above 60% successful, and still get only 2 out of 5? Yes. The problem was the sample size.
I have split the point spread into 10 categories and the estimate into 8, which gives 80 possible categories that a game could land in. Although I have 10 years of historical data, some of these categories contain fewer than 10 games, which is a very small sample size.
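To put a rough number on how little a small category tells you, here is a minimal sketch that computes an approximate 95% Wilson score interval for a category's hit rate. The function name and the example records (7 of 10 and 39 of 60) are made up for illustration and are not taken from my data; the point is only to compare a small category with a larger one at a similar hit rate.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a hit rate.

    Hypothetical helper for illustration; z=1.96 corresponds to
    roughly 95% confidence.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return center - half, center + half

# Illustrative records only: a small category and a larger one.
for wins, games in [(7, 10), (39, 60)]:
    low, high = wilson_interval(wins, games)
    print(f"{wins}/{games}: plausible hit rate {low:.1%} to {high:.1%}")
```

Under these assumptions, the interval for the 10-game category reaches well below 50%, so a coin flip cannot be ruled out, while the interval for the 60-game category stays above 50%.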
This is why, when you see stats like "this team is 5-0 ATS at home on Monday night", you should ignore them. Five games? That's it? A reasonable sample size begins at around 30, and once you reach 50 or 60 you can start to feel confident that a trend is real.
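A quick back-of-the-envelope check shows why a 5-0 record means so little. The sketch below assumes every game against the spread is an independent coin flip (p = 0.5) and that there are 32 teams; both numbers are my assumptions for illustration, and the function names are hypothetical.

```python
def prob_perfect_record(games, p=0.5):
    """Chance that one team covers in every one of `games` games by luck alone."""
    return p ** games

def prob_some_team_perfect(games, teams, p=0.5):
    """Chance that at least one of `teams` independent teams has a perfect record."""
    return 1 - (1 - prob_perfect_record(games, p)) ** teams

single = prob_perfect_record(5)
league = prob_some_team_perfect(5, teams=32)
print(f"One given team going 5-0 by luck: {single:.1%}")
print(f"At least one of 32 teams going 5-0 by luck: {league:.1%}")
```

With those assumptions, a single team goes 5-0 by pure chance about 3% of the time, but across 32 teams it is more likely than not that somebody does. Slice the schedule by home, away, day of the week and so on, and a "5-0 ATS" stat is almost guaranteed to turn up somewhere.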
Looking back at last week's point spread picks, I would have done much better had I stuck with bigger sample sizes. I fell into a gambler's trap: I thought there was a trend and went for it when there was not enough evidence that a trend existed. I have already written this 50 times: "Do not pick a game that does not fall into a category with a sample size above 20 and a confidence level greater than 58%." This is my 3rd year doing this, so I guess these are the lessons a beginning handicapper learns.
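The rule itself is simple enough to write down as a filter. This is only a sketch: the names are hypothetical, and I am reading "confidence level" as the category's historical hit rate, which is an assumption about the wording rather than a formal statistical confidence level.

```python
MIN_GAMES = 20        # sample size must be above this
MIN_HIT_RATE = 0.58   # historical hit rate must be greater than this

def is_playable(wins, games):
    """Return True only if a category passes both thresholds of the rule."""
    if games <= MIN_GAMES:
        return False
    return wins / games > MIN_HIT_RATE

# Illustrative categories only (wins, games); not real records.
for wins, games in [(7, 10), (13, 20), (39, 60), (30, 60)]:
    verdict = "pick" if is_playable(wins, games) else "pass"
    print(f"{wins}/{games}: {verdict}")
```

Checking the sample size first keeps a flashy percentage on a handful of games from ever reaching the pick list.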
I still believe this strategy will pay off. We are currently sitting at 59% post Week 4, which is not bad, but I want to be above 60% by the end of the year. You will keep receiving my analysis and picks of the week, at least for this year; if I am not above 60% by the end of the year, I will close the blog.
Any comments? Good luck.
Comments
Science can't account for luck, and some years you will get unlucky.