Evaluating the Reliability of Points As a Measure of Player Performance in the NHL
- Claire Mezzina
- Jul 10, 2021
This project was done for an Intermediate Statistics class in the spring semester of 2021, one of the very first projects of my undergraduate career. Since then, my research and analysis skills have grown a lot, but I believe this project was a solid first step in developing my skills and - hopefully - making some decent inferences about an issue that is relevant in hockey analytics today.
Abstract
In the National Hockey League and in ice hockey in general, points (goals + assists) are often used as a tell-all way to evaluate a player’s performance. In this study, I wanted to analyze the reliability of points as a metric for evaluating and comparing players. I looked at the effect that playing time has on points. Are points influenced by how much playing experience (and opportunity) a player has? Will a player who is given more playing time, both throughout the season and within games, score significantly more points? Can we compare two players with different playing opportunities by points at all?
My data was from QuantHockey.com, which collects all standard boxscore stats for each player in the NHL. For the purpose of this study, I decided to narrow the dataset down to forwards only. (Defensemen were excluded since points do not apply to them as much.)
The techniques I used to answer these questions were: 1) inference on the variance of two groups, 2) two-way ANOVA testing with interaction, and 3) a multiple linear regression model (taking interaction into account). In the first test, I tested whether the variance of Points (P) was higher than the variance of Points Per 60 Minutes (P/60). I found enough evidence to support this hypothesis. In the second test, I used two-way ANOVA to ask whether Games Played (GP) and average Time-On-Ice (TOI) were significant factors in P/60. I found enough evidence to conclude that GP and TOI were statistically significant factors in determining P/60, although there was a significant interaction between the two factors. Finally, I created a linear regression model to determine the relationship between GP, TOI, and the dependent variable, P/60. This model showed that GP and TOI interact in their relationship with P/60. In other words, P/60 generally rises as GP and TOI rise together.
Introduction
It is widely accepted in hockey that the more points a player scores, the better that player is. However, there is a large number of underlying factors that go into how or why a player is able to score a goal or make an assist. In other words, points that were scored in very different situations are still counted the same on the scoreboard, even though they were not generated equally. This creates some unreliability in how we look at a hockey player’s performance based on their number of points.
One such factor that goes into points is playing time. Most players do not play every single game in a season, whether due to injury or the coach’s choice to bench them. Furthermore, hockey players do not play the whole game; they take very short shifts that add up to anywhere between approximately 6 and 30 total minutes of time-on-ice (playing time) per game. This leads to vastly different levels of opportunity between players. Player A may have scored fewer points than Player B, but was only given 12 minutes of time-on-ice per game on average, whereas Player B was given 21 minutes on average. Is Player B scoring more because of this?
Previous work done on this exact topic is rather limited, mostly since analytics haven’t truly become an established method in the sport of hockey yet. There has been a lot of work done on the value of Expected Goals (xG), which takes into account all the “other factors” that go into a goal being scored, thus showing how not all goals are equal (Hockey-Graphs). However, the topic of how much time a player actually plays and how that either increases (or decreases) their points scored has not been studied much. A little bit of work has been done on Games Played and its relationship to Goals (Arctic Ice Hockey), but no further than that.
Studying this topic is important because it is extremely difficult to truly compare players in the NHL. Since points are so widely used as a metric to compare players (forwards in particular), NHL decision-makers need to be sure that such a metric is actually accurate. If points are shown to be too reliant on factors besides the player’s actual ability, then they should no longer be used as a metric for player performance.
The first step in this report is to determine which specific “points” variable to use. There are three options: Points, Points Per Game Played (P/GP), and Points Per 60 Minutes Played (P/60). To decide, I used inference on variance. Following this, I used ANOVA testing and a linear regression model to find whether playing time (Games Played and Time-On-Ice) had a significant effect on the points metric, and what that relationship was. Finally, I draw conclusions from my findings and discuss how they answer my questions about points as a reliable metric.
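To make the three candidate metrics concrete, here is a minimal Python sketch of how each could be computed from one player's season line. It assumes TOI is recorded as average seconds per game; the function name and the sample numbers are hypothetical and are not taken from the dataset.

```python
def point_metrics(points, games_played, avg_toi_seconds):
    """Return (P, P/GP, P/60) for one player's season."""
    # Total minutes played over the season, from average seconds per game.
    total_minutes = games_played * avg_toi_seconds / 60
    p_per_gp = points / games_played
    p_per_60 = points / total_minutes * 60
    return points, p_per_gp, p_per_60

# Hypothetical players echoing the Player A / Player B example:
# A gets 12 minutes a night, B gets 21 minutes a night.
player_a = point_metrics(points=30, games_played=70, avg_toi_seconds=12 * 60)
player_b = point_metrics(points=45, games_played=70, avg_toi_seconds=21 * 60)

print("Player A (P, P/GP, P/60):", player_a)
print("Player B (P, P/GP, P/60):", player_b)
```

With these made-up numbers, Player B leads in Points and P/GP, but Player A has the higher P/60, which is exactly the kind of disagreement between metrics that motivates choosing one carefully.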
Data and Methodology
The data from QuantHockey.com provides statistical metrics on 439 forwards in the NHL. The statistics gathered include information on playing time, penalty minutes, goals and assists, points-per-strength-state (even-strength, 5v4, 4v5, et cetera), shots taken, defensive metrics (shot blocks and hits), and faceoff metrics.
I specifically decided to use data from the 2019-2020 season because it is larger in size while still being relatively recent. The 2020-2021 data was not suitable despite being the most recent because 1) the season is not finished yet, and 2) it is a shortened season (56 games as opposed to the normal 82 games). The 2019-2020 season was shortened as well due to COVID-19, but only by a few games, thus making it a larger dataset than 2020-2021.
Descriptive statistics:

Points: mean 28.86, minimum 1, maximum 110

TOI (seconds per game): mean 887.47, minimum 333, maximum 1356

GP (games): mean 56.44, minimum 20, maximum 71
Methodologies:
Inference on Variance - This technique is used to compare the variance of Points versus Points Per 60 Minutes. This is to test whether taking time into account will affect the consistency of points scored across players. If P/60 has a lower variance than Points, it would show that when taking playing time into account, points scored is not as different from player to player as basic Points would have us believe.
Two-way ANOVA Test - This technique is used to test if playing time does, in fact, play a significant role in determining points scored. The factors I used as proxies for playing time were Games Played (GP) and Average Time-On-Ice (TOI). The dependent variable for points was P/60, as determined by the variance test.
Linear Regression Model - The model shows the relationship of the two independent variables (GP and TOI) to the dependent variable, P/60. It also shows how the interaction of GP and TOI impacts P/60. With this technique, we can show the manner in which playing time affects points scored.
Analysis
Inference on Variance
Issue: When playing time is taken into account, does this change the consistency of points across players?
To solve this problem, I used a hypothesis test for inference on variance between Points (P) and P/60.
Null hypothesis: The variance of P is less than or equal to the variance of P/60.
Alternate hypothesis: The variance of P is greater than the variance of P/60.
If the variance of P is greater than the variance of P/60, then when playing time is taken into account, points scored are more consistent between players than points alone would have us believe.
Output:
test statistic = 726.64
p-value = 0.00
The p-value is smaller than the significance level of 0.05. Therefore, we can reject the null hypothesis and conclude that the variance of Points is greater than the variance of P/60.
My interpretation of these results is that because P/60 shows more consistency across players than Points does, players do not differ as much in scoring once playing time is taken into account. Thus, the large variation in Points between players may mislead one into thinking that players are much more different than they actually are; P/60, instead, is a more reliable metric because it takes at least one factor (TOI) into account.
The limitation of this interpretation is that it does not take other factors into account (e.g., Games Played) and does not determine whether less variance is a better reflection of player ability. However, for the purpose of this study, it represents a closer reflection of points in relation to playing time, which is the focus.
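Only the test statistic and p-value are reported here, so the exact procedure is not shown. One common way to run this kind of one-sided comparison is an F-test on the ratio of the two sample variances; the sketch below assumes that approach, and the function name and toy data are hypothetical.

```python
from statistics import variance

def variance_ratio(sample_a, sample_b):
    """F statistic and degrees of freedom for H1: var(a) > var(b)."""
    # statistics.variance is the sample variance (divides by n - 1).
    f_stat = variance(sample_a) / variance(sample_b)
    return f_stat, len(sample_a) - 1, len(sample_b) - 1

# Toy numbers only; these are not the QuantHockey data.
points = [5, 12, 28, 40, 61, 85]
p_per_60 = [0.9, 1.4, 1.8, 2.1, 2.6, 3.0]

f_stat, df1, df2 = variance_ratio(points, p_per_60)
print(f"F = {f_stat:.2f} on {df1} and {df2} degrees of freedom")
```

The one-sided p-value is the upper-tail probability of the F distribution at `f_stat`, for example `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is installed. Keep in mind that P and P/60 are measured on different scales, so part of any variance ratio between them reflects units rather than player-to-player consistency.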
Two-way ANOVA Test
Issue: Do Games Played and Time-On-Ice have a significant effect on P/60?
To solve this problem, I used a two-way ANOVA test with interaction effect assumed to determine if one or both of GP and TOI have an effect on P/60.
Output:
GP p-value: 2e-16
TOI p-value: 2e-16
GP:TOI p-value: 0.0127
The p-values for GP and TOI are both less than 0.05. Therefore, we can conclude that they have a significant effect on P/60.
The interaction p-value is less than 0.05. Therefore, we can conclude that there is an interaction effect between GP and TOI.
This test shows evidence that P/60 is, in some way, dependent on GP and TOI, and therefore playing time. Furthermore, the results suggest that the influence that GP has on P/60 will differ based on different levels of TOI.
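Written out, the model behind a test like this can be sketched as follows. This assumes GP and TOI enter as numeric terms with a single product term, which is not stated explicitly above; the β and ε symbols are notation introduced here for illustration.

```latex
\[
\text{P/60}_i = \beta_0 + \beta_1\,\text{GP}_i + \beta_2\,\text{TOI}_i
              + \beta_3\,(\text{GP}_i \times \text{TOI}_i) + \varepsilon_i
\]
\[
H_0^{\text{GP:TOI}}:\ \beta_3 = 0
\qquad \text{versus} \qquad
H_1^{\text{GP:TOI}}:\ \beta_3 \neq 0
\]
```

Under this form, the test on the interaction term is the same test as the one on the GP:TOI coefficient in a regression with these three terms, which would be consistent with the interaction p-value of about 0.0127 appearing in both the ANOVA and the regression output.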
Linear Regression Model
Issue: Do players with more GP and higher TOI have more opportunity to score points, as measured by P/60?
To solve this problem, I used a multiple linear regression model to find the relationship that GP and TOI have with P/60. First, I covered the three assumptions:
1. The relationships between the dependent variable and independent variables are linear.


2. Non-multicollinearity between the independent variables, with a cutoff of 0.9.
Correlation between GP and TOI is 0.553.
3. Dependent variable must be normally distributed.

We would like to know how GP and TOI affect P/60, as well as how GP and TOI interact in how they affect P/60.
Output:
Intercept p-value: 0.2799
GP p-value: 0.00818
TOI p-value: 0.00493
GP:TOI p-value: 0.01272
Residual standard error: 0.514 on 435 degrees of freedom
Multiple R-squared: 0.5198
Adjusted R-squared: 0.5165
F-statistic: 157 on 3 and 435 degrees of freedom
p-value: 2.2e-16
P/60 = 4.386x10⁻¹ - 1.941x10⁻² * GP + 1.502x10⁻³ * TOI + 2.235x10⁻⁵ * (GP*TOI)
We can rework this linear regression equation to:
P/60 = 4.386x10⁻¹ + (2.235x10⁻⁵ * TOI - 1.941x10⁻²)*GP + 1.502x10⁻³ * TOI
This model shows that the relationships of GP and TOI with P/60 depend on each other. The slope on TOI, 1.502x10⁻³ + 2.235x10⁻⁵ * GP, is positive for any number of games played. The slope on GP, 2.235x10⁻⁵ * TOI - 1.941x10⁻², is positive only when TOI is above roughly 868 seconds (about 14.5 minutes) per game, and negative below that. In other words, the impact of GP on P/60 differs depending on the value of TOI.
This is evidence that more ice-time per game and, for players above that ice-time threshold, more games played in the season go along with a higher scoring rate. A likely reason is that the player has more opportunity to score points. Therefore, players with less TOI or fewer GP cannot be fairly compared to players with more TOI or GP in terms of points.
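To see what the interaction term implies in practice, the fitted equation can be evaluated directly. The sketch below uses only the coefficients reported above and assumes TOI is in seconds per game, matching the descriptive statistics; the function and variable names are hypothetical.

```python
# Coefficients copied from the fitted equation above.
INTERCEPT = 4.386e-1
B_GP = -1.941e-2
B_TOI = 1.502e-3
B_GP_TOI = 2.235e-5

def predict_p60(gp, toi_seconds):
    """Predicted P/60 from the fitted model."""
    return (INTERCEPT + B_GP * gp + B_TOI * toi_seconds
            + B_GP_TOI * gp * toi_seconds)

def gp_slope(toi_seconds):
    """Change in predicted P/60 per extra game at a given TOI."""
    return B_GP_TOI * toi_seconds + B_GP

def toi_slope(gp):
    """Change in predicted P/60 per extra second of TOI at a given GP."""
    return B_TOI + B_GP_TOI * gp

# TOI (seconds per game) at which the GP slope changes sign.
toi_break_even = -B_GP / B_GP_TOI

# Minimum, mean, and maximum TOI from the descriptive statistics.
for toi in (333, 887.47, 1356):
    print(f"TOI {toi:>7}: GP slope = {gp_slope(toi):+.5f}")
```

Evaluating `gp_slope` at a few TOI values shows how the sign and size of the GP effect depend on ice time, while `toi_slope` stays positive across the observed range of GP.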
Conclusion
The results of this study indicate that more playing time is associated with a player scoring more points. Therefore, two players who do not have equal playing time cannot fairly be compared in terms of points. This makes points an unreliable way to measure a player’s performance - without the ability to compare players, and with so many factors that go into points (the focus of this study being playing time), they do not lead to an accurate assessment of the player’s ability.
This demonstrates a need in hockey for points to stop being used as an assessment of player performance. Despite the push from the analytics community to move away from boxscore stats like goals, assists, and points, points are still the most prominent metric used. With this analysis, we can see how even something as simple as playing time can cause points to be misleading. Playing time is often at the mercy of coaches or circumstance (e.g., a player who is on a very competitive roster and therefore gets less playing time), which makes the player’s ability even harder to assess.
The likely reason for these results is that a player with more playing time simply has more opportunity to score points. Oftentimes in hockey, we see a player who is given very little ice-time perform poorly; but then, when he goes to a different team that gives him more ice-time, he suddenly starts to score more. Furthermore, more playing time can build confidence in players, which can boost their performance. The more playing time they have, the more confidence they have, and the more points they are likely to score.
One limitation of this study is that it does not take into account other factors that can impact point scoring, such as luck, quality of linemates, or the performance of the opposing goaltender. This could be expanded upon in a further study that examines the relationship between those factors and points scored. A second major limitation of this study is that it assumes playing time is the cause and points scored the effect, not the other way around. There may be some instances where a player receives less playing time because they are not scoring as many points. Therefore, this study must assume that playing time is the cause.
References
Chatel, T., Chatel, T., P., D., D., & Hohl, G. (2020, May 13). Expected goals. Hockey Graphs. https://hockey-graphs.com/tag/expected-goals/
S. (2011, July 16). SSW: Forward G, A, Pts, +/-, PIM reliability and Regression to the mean. Arctic Ice Hockey. https://www.arcticicehockey.com/2011/7/16/2273803/ssw-forward-g-a-pts-pim-reliability-and-regression-to-the-mean