Brandon Harris

Cloud + Data Engineering + Analytics

TL;DR: Daily (early-season model) predictions are here.

Well hello there! Been a while, right!? Hope you’ve been well.

I looked back at some of the posts, and it's clearly been some time since the site was active. There's a reason for that, and her name is Claudia. Claudia was born in June and has been fun and adorable while draining my free time in a way I never would have imagined. I love spending time with her, so it's definitely not a bad thing, but it caught me off guard just how much of your day a newborn can take. I'm sure those of you with kids are knowingly amused, but since she's our first child, it took me a while to adjust. Some things had to be pushed to the side to make room, and unfortunately the blog was one of them.

That being said, we've come a long way since June and she's almost sleeping through the night, so I've had a few free hours a day to work on personal hobbies. Deep Dribble was at the top of that list, since I knew NCAAM basketball was right around the corner. While I haven't made many improvements to the system, I do think we'll see much better performance than last year.

Last year was a bit rough as far as predictions go: looking back at the stats on Prediction Tracker, the MSE ended up at almost 150! Not a fantastic result. Back-testing DD 1.1 across multiple full seasons has brought that down into the low 120s, so I'm hopeful for this new season. I've made a few changes, including a modified RPI-based variable (per team) and an engineered 'tier-based' feature that groups teams by clustering their historical (multi-year) performance.

The tier feature was primarily meant to combat the poor MSE/RMSE performance when the less-prominent teams are included. Before this, I had no variable indicating that a team was (for lack of a better word) a 'nobody', and the large score differentials in games involving some of those teams really hurt overall model performance. The tier-based variable has helped with that, and while it took some manual effort to work out logical rules to apply to the clusters, it was worth it. I've also moved from an ensemble-based model to a single model, and experimented with resolving the seasonal cold-start problem of having no game data at the beginning of the season.
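For anyone curious what a clustering-based tier feature might look like, here is a minimal sketch. It is not the Deep Dribble implementation: the post doesn't say which clustering algorithm or inputs were used, so this assumes plain 1-D k-means over each team's multi-year average scoring margin, and the team names and margins are made up. It also skips the hand-written rules applied on top of the clusters.

```python
from statistics import mean


def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means (Lloyd's algorithm) with deterministic initialisation."""
    assert k >= 2 and len(values) >= k
    ordered = sorted(values)
    # Start centroids at evenly spaced positions in the sorted data.
    centroids = [ordered[int(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def assign_tiers(team_margins, k=3):
    """Hypothetical helper: map each team to a tier (1 = strongest).

    team_margins: dict of team name -> list of per-season average margins.
    """
    avg = {team: mean(m) for team, m in team_margins.items()}
    centroids = kmeans_1d(list(avg.values()), k)
    # Rank centroid indices from highest to lowest margin -> tier 1, 2, ...
    by_strength = sorted(range(k), key=lambda c: -centroids[c])
    tier_of = {idx: pos + 1 for pos, idx in enumerate(by_strength)}
    return {
        team: tier_of[min(range(k), key=lambda c: abs(a - centroids[c]))]
        for team, a in avg.items()
    }


# Made-up multi-season average margins for six hypothetical teams.
example = {
    "A": [15, 12, 18],
    "B": [14, 10, 11],
    "C": [2, -1, 3],
    "D": [0, 1, -2],
    "E": [-14, -18, -12],
    "F": [-16, -11, -15],
}
tiers = assign_tiers(example, k=3)
print(tiers)
```

The resulting tier number could then be fed to the model as a categorical feature, so that a game against a bottom-tier team carries that signal even before the current season has produced any data.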

Regarding the cold-start issue, I'm now making predictions from Day 1. Because this was a somewhat rushed change, I'm not treating these early games as official predictions (though you can certainly still see them on the site). Normally I wait until mid-December, when there's enough data to start getting accurate results, and at that point I'll consider the predictions official.

I would definitely put a 'beta' label on these early-season predictions if I knew enough HTML to do so, but anyway, consider yourself warned! You can see the early-season performance here and the daily predictions here. I didn't bother to change the header links, so you will have to go to those URLs directly. If you try to use the header to navigate the website, you'll end up in the 'official' area, which has no data until mid-December.

Here’s to a great 2016-2017 NCAA season!