What Makes a Breakout? Analyzing the Most Promising Young Stars in the National Basketball Association – Dartmouth Sports Analytics

By Sabin Hart '24, Grace McGinley '26, Dean Lowery '27, Tyler Goldstein '27, and Cam Cowperthwaite '27

Introduction

Across all sports, one of the most persistent problems is figuring out which prospects will pan out and which will crash and burn. Millions of dollars are poured into predictive analytics for everything from next-day parlays in sports betting to ranking draft prospects years in advance. With this in mind, we wanted to explore what causes a player in the National Basketball Association (NBA) to make the leap from role player to all-star. Can the statistics tell us whether the 41st overall pick will be a future Most Valuable Player? This paper looks at recent all-stars to determine whether they showed early hints before they made the leap to stardom.

Data and Methods

To begin, we defined a breakout player as one with at least one all-star selection. To find breakout players, we looked at all-stars between 2018 and 2023. We examined the two seasons immediately preceding each player's first all-star selection and compared them with the seasons of players who were not all-stars. Basketball Reference was used to collect data [1]. We began by collecting basic and advanced per-game statistics for every player from the 2016-17 season through the present. We then filtered this data to exclude players who had already, by our definition, broken out, that is, players who were already all-stars in the first year of our data set. Additionally, we excluded players over the age of 25, as our research focused on young players with the potential to be stars rather than older players who have likely missed their window to break out.
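
The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PlayerSeason` record, its field names, the end-year season encoding, and the sample rows are all hypothetical and do not reflect the actual layout of the Basketball Reference data or the authors' own code.

```python
from dataclasses import dataclass
from typing import List, Optional

FIRST_SEASON = 2017  # assumed encoding: the 2016-17 season stored by its end year
MAX_AGE = 25         # players over this age are excluded


@dataclass
class PlayerSeason:
    """One player-season row (hypothetical schema, for illustration only)."""
    player: str
    season: int                    # season end year, e.g. 2019 for 2018-19
    age: int
    first_all_star: Optional[int]  # season of first all-star selection, or None


def is_candidate(row: PlayerSeason) -> bool:
    """Apply the two exclusions described in the text."""
    already_broke_out = (
        row.first_all_star is not None and row.first_all_star <= FIRST_SEASON
    )
    return not already_broke_out and row.age <= MAX_AGE


def filter_candidates(rows: List[PlayerSeason]) -> List[PlayerSeason]:
    """Keep only rows for young players who had not broken out before the data begins."""
    return [row for row in rows if is_candidate(row)]


# Made-up rows, not real players.
rows = [
    PlayerSeason("Player A", 2019, 22, 2021),  # young, breaks out later: kept
    PlayerSeason("Player B", 2019, 24, 2016),  # already an all-star: excluded
    PlayerSeason("Player C", 2019, 29, None),  # over 25: excluded
    PlayerSeason("Player D", 2019, 25, None),  # exactly 25, never selected: kept
]
kept = filter_candidates(rows)
print([row.player for row in kept])  # ['Player A', 'Player D']
```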

After processing the data, we trained a model to predict which players would break out, using every numerical variable so that we could see which ones held the most predictive power. Since the outcome was binary, we used logistic regression. The data were also normalized to z-scores so that differences in the scale of the variables did not inflate or deflate their apparent predictive power. We used five-fold cross-validation to get the best possible model across all of our data. The model had an accuracy of 97 percent, which is high but to be expected, as the vast majority of players in the data are non-breakouts (zeros). The model is rudimentary, so its recall and precision were lower, as we had also expected. Because the model was trained on the sample data, testing it on that same data would be irresponsible, so our results instead rely on feature importance.
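
To make these steps concrete, here is a minimal, dependency-free sketch of z-score standardization, a logistic regression fit by gradient descent, and a comparison of accuracy against a baseline that always predicts "no breakout". It is not the authors' implementation: the synthetic data, the breakout rule, the `noise_stat` feature, and the learning-rate and epoch settings are all assumptions chosen for illustration, and cross-validation is omitted for brevity. In practice a library such as scikit-learn would likely be used instead.

```python
import math
import random


def standardize(column):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_b += error
            for j in range(k):
                grad_w[j] += error * x[j]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(predicted, actual):
    """Share of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def recall(predicted, actual):
    """Share of true breakouts that were predicted as breakouts."""
    positives = sum(actual)
    hits = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    return hits / positives if positives else 0.0


# Synthetic data (assumption): breakouts are rare and driven by scoring,
# while noise_stat is unrelated to the outcome.
random.seed(0)
points = [random.gauss(8, 4) for _ in range(1000)]
noise_stat = [random.gauss(0, 1) for _ in range(1000)]
labels = [1 if p + random.gauss(0, 2) > 16 else 0 for p in points]

features = list(zip(standardize(points), standardize(noise_stat)))
weights, bias = fit_logistic(features, labels)

model_preds = [
    1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0
    for x in features
]
baseline_preds = [0] * len(labels)  # always predict "no breakout"

print("coefficients (points, noise):", [round(w, 3) for w in weights])
print("model accuracy:", accuracy(model_preds, labels),
      "recall:", recall(model_preds, labels))
print("baseline accuracy:", accuracy(baseline_preds, labels),
      "recall:", recall(baseline_preds, labels))
```

Because the features are on the same z-score scale, the size of each coefficient can be read as a rough measure of feature importance. The baseline line shows why high accuracy alone says little when breakouts are rare: predicting zero for everyone is already accurate most of the time while catching no breakouts at all.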

Results and Discussion

The results from this machine learning analysis reveal a correlation of 0.2357 between a player's points scored and their likelihood of breaking out. This suggests a positive, though modest, relationship between the variables: the more points a player scores, the more likely they are to become an all-star during their time in the NBA. This relationship was the strongest predictor among the basic and advanced per-game statistics. In basketball terms, it comes as no shock that points scored exhibit the greatest correlation. Assessing a player's prowess typically involves examining their scoring output, which makes it a reliable indicator of skill level.
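
As a point of reference for how a number like this can be obtained, the sketch below computes a Pearson correlation between a per-game statistic and a 0/1 breakout label (with a binary variable, this is the point-biserial correlation). The article does not state exactly how its values were computed, so treating them as Pearson correlations is an assumption, and the sample values are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented points-per-game values and breakout labels (1 = later all-star).
points_per_game = [4.1, 6.3, 7.8, 9.5, 11.2, 12.0, 14.6, 18.9]
broke_out = [0, 0, 0, 0, 0, 1, 0, 1]

r = pearson(points_per_game, broke_out)
print(f"correlation between points and breakout: {r:.4f}")
```

Values range from -1 to 1, so a figure such as 0.2357 indicates a positive but fairly weak linear association.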

The correlation between points and breaking out is closely followed by those for free throws attempted (FTA) and personal fouls, at 0.1807 and 0.1774 respectively. The FTA result can be explained by the defense's inability to guard the potential all-star. When a defender fouls a player in the act of shooting, the shooter is awarded two or three free throws depending on where the shot was taken. The more free throws a player attempts, the greater the offensive threat they pose, since defenders commit fouls when they struggle to guard that player legally. Although personal fouls are a statistic that players generally aim to avoid, they serve as an indicator of a player's defensive activity and engagement on the court. Even if unintentional, it is highly unlikely for a player to complete a game without committing any fouls while diligently defending; fouling is simply an inherent aspect of the game. Although NBA players are only allowed six fouls per game, successful players are more likely to play until they get close to that limit. Even if a player is taken off after a couple of quick fouls, they will likely see the court again, which is consistent with future stars accumulating more fouls.

The next strongest correlations are with year and age, which are negative at -0.1561 and -0.1434 respectively. Year corresponds to the number of years the player has been in the league, while age is simply the player's age. Although players over the age of 25 were removed from the data set, the older players who remained were still less likely to break out. Because older players have less time left in the league to develop, it is reasonable to expect that they are less likely to break out.

Interestingly, a player's three-point percentage had almost no correlation with breaking out, at -0.0064. One would suppose that, in an era of high-volume three-point shooting, coaches would want high-percentage shooters on the floor late in games, so that players with a higher three-point percentage would earn their coaches' trust, get more minutes, and have a better chance of breaking out. The data do not show such a relationship.

Conclusion

In sum, the model yielded some surprising results. While some were foreseeable, such as more points per game being associated with breaking out, others were unexpected, such as a higher turnover percentage being positively correlated with breaking out. That said, correlation does not establish a causal relationship here; in the case of turnovers, it could very well be that a skilled player possesses the ball more, which in turn naturally produces more turnovers. Nevertheless, these data reveal an interesting relationship between basic and advanced statistics and breaking out.

Data Sources

[1] Sports Reference. Basketball Statistics & History of every Team & NBA and WNBA players. Basketball Reference. https://www.basketball-reference.com/