In 2010, the SportVU optical player and ball tracking system was first deployed in select NBA arenas by teams seeking an edge in player and team analysis. Given the value of the tracking data, the NBA adopted SportVU league-wide prior to the 2013-14 season. Since then, nearly all analysis and decision-making for NBA teams has been data-driven, utilizing not only the raw positional data but also tactical insights derived from the event markings detected automatically by machine learning algorithms (e.g., screens, isolations, and drives).
However, when it comes to analyzing NCAA players for an upcoming draft, NBA teams are severely limited in their decision-making because they do not have the same detailed tracking data for NCAA players. In-venue hardware solutions are impractical for the NCAA: there are over 300 Division I schools alone, in addition to the numerous exhibition, tournament, and post-season games not played at NCAA venues. Additionally, for an NBA front office to model a college player’s future potential output, it needs historical tracking data from the college careers of current NBA players to build a training set for modeling, which is something that in-venue solutions cannot provide.
To circumvent this issue, we have utilized state-of-the-art computer vision techniques to capture player and ball tracking data from thousands of historical NCAA D-I Men’s basketball games directly from broadcast video. This volume of data equates to more than 650,000 possessions and over 300 million frames of broadcast video. From the tracking data, we automatically detect events such as ball-screens, drives, isolations, post-ups, off-ball screens, and defensive match-ups using our actor-action attention neural network system, achieving recall and precision rates of 0.8 and 0.7, respectively.
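For readers less familiar with the detection metrics quoted above, the following is a minimal sketch of how precision and recall are computed from detection counts. The function name and the counts are hypothetical and chosen only so that the arithmetic reproduces rates of 0.7 and 0.8; they are not taken from our evaluation.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a set of event detections.

    precision = fraction of detected events that are real events
    recall    = fraction of real events that were detected
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for one event type (e.g., ball-screens): 56 correct
# detections, 24 spurious detections, and 14 missed events.
precision, recall = precision_recall(56, 24, 14)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints "precision=0.70, recall=0.80"
```

In this illustration, a recall above the precision means the detector misses relatively few true events at the cost of some spurious detections.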
Although generating tracking data from broadcast video for college basketball is in itself a massive breakthrough in the field of basketball analytics, it is not enough. The value of the generated data is best gauged through a predictive task. In this paper, we focus on the task of predicting the talent of future NBA players. We do this by predicting the probability of a player making the NBA directly from college data. We show that using tracking data yields more accurate forecasts than current data sources (tracking log-loss: 0.30 vs. play-by-play log-loss: 0.40). An additional benefit of our approach is that we apply “interpretable machine learning” techniques (i.e., Shapley values) not only to create accurate predictions but also to identify the strengths and weaknesses of a specific player.
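Because the comparison above is stated in terms of log-loss, a short sketch of that metric may help. It assumes the standard binary log-loss (mean negative log-likelihood) over "made the NBA" labels; the labels and probabilities below are invented for illustration and are not outputs of our models.

```python
import math


def log_loss(labels, probabilities, eps=1e-15):
    """Mean binary log-loss; lower is better.

    labels        -- 1 if the player made the NBA, else 0
    probabilities -- predicted probability of making the NBA
    """
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical example: five college players, two of whom made the NBA.
made_nba = [1, 0, 0, 1, 0]
confident_model = [0.8, 0.1, 0.2, 0.7, 0.1]  # sharper, mostly correct
hedging_model = [0.6, 0.4, 0.4, 0.5, 0.3]    # less informative

print(log_loss(made_nba, confident_model))
print(log_loss(made_nba, hedging_model))
```

A model that assigns higher probability to the players who actually made the NBA, and lower probability to those who did not, obtains the lower log-loss, which is the sense in which 0.30 improves on 0.40.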