Brandon Harris

Cloud + Data Engineering + Analytics

Last year I spent quite a bit of time building models to predict regular season NCAA basketball games. It took about 2-3 weeks to arrive at a basic classification model (away team wins = 1/0) that I was happy with. Then I ran into Scott Turner, who pointed me toward predicting score differentials, and I spent another month or so tearing everything down and rebuilding the model. Around mid-February I finally got everything automated, with a set of predictions and results emailed to me each morning, just in time for March Madness and the end of the season.

This year I started much earlier and rebuilt the entire process from the ground up. The model is new, the support architecture is new, and I’m really happy with the way things have turned out (so far). I’ve christened the new model / interface DeepDribble 1.0, and it’s currently live (though I’m still working out minor kinks here and there). Results are also going to be picked up at The Prediction Tracker, an aggregator for various sports predictions that features some of the big names in prediction (Sagarin, StatFox, etc.). I’ll probably hold off on submitting results until early December, when I’ve had a bit more time to collect team performance data for this season, but I’m looking forward to seeing how the model stacks up against others who’ve been doing this for a while.

The model draws on data that I scrape daily, which results in a final data set of 200+ independent variables. The data (raw and aggregate) is stored in a MySQL database, and the modeling is done entirely in Python. DeepDribble (the model) itself is an ensemble: it combines weighted output from various regularized regressions, SVMs, trees and neural networks, each using a different subset of the full data set, to generate a final score differential prediction for each matchup. The training data for the model is historical NCAA data from after the 2008-2009 season.
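To make the ensemble idea a bit more concrete, here is a minimal sketch of how weighted output from several members, each looking at its own subset of features, could be combined into one score differential. This is not the actual DeepDribble code: the `Member` class, the `ensemble_margin` function, the feature names, the weights, and the two stand-in members are all hypothetical, and I am assuming a positive margin means the away team is favored. In practice each `predict` would be a fitted model rather than a one-line lambda.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Member:
    """One ensemble member (hypothetical structure, for illustration only)."""
    name: str
    columns: Sequence[str]                   # feature subset this member sees
    predict: Callable[[List[float]], float]  # returns a predicted margin
    weight: float                            # relative say in the final answer


def ensemble_margin(members: Sequence[Member], features: Dict[str, float]) -> float:
    """Weighted average of each member's predicted score differential."""
    total_weight = sum(m.weight for m in members)
    if total_weight <= 0:
        raise ValueError("ensemble weights must sum to a positive number")
    weighted_sum = 0.0
    for m in members:
        # Each member only receives the columns it was assigned.
        inputs = [features[c] for c in m.columns]
        weighted_sum += m.weight * m.predict(inputs)
    return weighted_sum / total_weight


# Made-up feature values for a single matchup.
matchup = {
    "away_off_eff": 108.0,
    "home_off_eff": 104.0,
    "away_rest_days": 2.0,
    "home_rest_days": 1.0,
}

# Stand-ins for fitted models; real members would be trained regressors.
members = [
    Member("regression_stand_in", ["away_off_eff", "home_off_eff"],
           lambda x: 0.5 * (x[0] - x[1]), weight=3.0),
    Member("tree_stand_in", ["away_rest_days", "home_rest_days"],
           lambda x: 1.0 if x[0] > x[1] else -1.0, weight=1.0),
]

margin = ensemble_margin(members, matchup)
print(f"predicted margin: {margin:+.2f}")
```

Normalizing by the total weight keeps the final number on the same scale as the individual predictions, so the weights only control how much each member counts, not the size of the margin.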

The 2015-2016 season is the first one I’ve been able to model from the very beginning, so I’m very excited about DeepDribble 1.0 and I’m looking forward to sharing the results!