There comes a time in the season when a brief refresher on stats is important. What better time than now, with some people throwing out some questions about why we use advanced stats? For those of you starting to try and understand the hows and whys of the numbers, I hope this is helpful.
Let me introduce some of the stats that I will be using in this article:
CF% = Corsi For Percentage, the percentage of the total shot attempts in the game taken by a team.
xGF% = Expected Goals For Percentage, the percentage of goals in the game a team should have scored. By factoring in shot type and location and comparing each shot to the rest of the league’s historical results on similar shots, we can figure out what percentage of goals a team or player should have.
GF% = Goals For Percentage, the percentage of goals in the game a team has actually scored.
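For anyone who finds definitions easier to read as code, here is a minimal Python sketch of the three percentages. The game log, the team labels, and the xG values attached to each attempt are all made up for illustration; real xG values come from a model built on league shot history, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempt:
    team: str       # team taking the attempt
    xg: float       # hypothetical expected-goal value for this attempt (0 to 1)
    is_goal: bool   # whether the attempt actually went in


def share_pct(for_total: float, against_total: float) -> float:
    """Percentage share of an event total that belongs to one team."""
    total = for_total + against_total
    if total == 0:
        return 50.0  # assumption: treat an empty sample as an even split
    return 100.0 * for_total / total


def team_shares(attempts, team):
    """Return (CF%, xGF%, GF%) for `team` from a list of ShotAttempt."""
    mine = [a for a in attempts if a.team == team]
    theirs = [a for a in attempts if a.team != team]
    cf = share_pct(len(mine), len(theirs))
    xgf = share_pct(sum(a.xg for a in mine), sum(a.xg for a in theirs))
    gf = share_pct(sum(a.is_goal for a in mine), sum(a.is_goal for a in theirs))
    return cf, xgf, gf


# Made-up game log: ANA takes more attempts, but from lower-value spots.
attempts = [
    ShotAttempt("ANA", 0.02, False),
    ShotAttempt("ANA", 0.03, False),
    ShotAttempt("ANA", 0.05, True),
    ShotAttempt("ANA", 0.04, False),
    ShotAttempt("ANA", 0.06, False),
    ShotAttempt("ANA", 0.05, False),
    ShotAttempt("OPP", 0.10, True),
    ShotAttempt("OPP", 0.15, False),
    ShotAttempt("OPP", 0.05, True),
    ShotAttempt("OPP", 0.20, False),
]
cf, xgf, gf = team_shares(attempts, "ANA")
print(f"CF% {cf:.1f}, xGF% {xgf:.1f}, GF% {gf:.1f}")
```

In this invented log, ANA takes 60 percent of the attempts but ends up with only about a third of the expected goals and a third of the actual goals, which is the kind of gap between shot share and chance quality that the three stats are meant to expose.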
Recently in the Anaheim Calling Slack Chat, editor Eric Stites posed the question: “If we study advanced stats and say, for example, that the Ducks have been playing better recently but the results aren’t showing and sometimes they play poorly and win games because hockey is just weird, why do we study advanced stats at all?”
There might be plenty of you thinking the same thing. Since December 18th, when the eight-game skid began, the Ducks have had a 53 CF% (6th in the league over that span) and a 47.6 xGF% (23rd in the league over that span), all while only having a 34 GF% (29th in the league over that span). These numbers tend to show the Ducks have done a good job controlling the share of shots in games, but haven’t been turning that into bona fide scoring chances.
Having said that, the team should not have a 34 GF% when controlling those kinds of shot numbers in the game and being that close to break even in expected goals.
This gets us to why the Ducks are losing these games, and Eric said it best:
Sometimes hockey is just weird.
Hockey is a game that is greatly affected by variance. The puck can bounce in weird directions and take odd deflections off players, referees, and the boards on its way into the back of the net. To make matters worse, goals are relatively infrequent over the course of a game, so these odd bounces can have a large impact on the outcome. Sometimes those odd bounces can even decide a Stanley Cup (yes, this is a reference to Travis Moen’s winning goal in Game 5 of the 2007 Stanley Cup Final).
Due to this variance, it’s hard to predict the outcome of a game. Look no further than Monday night’s affair between the San Jose Sharks and Los Angeles Kings in San Jose. On the surface this game should’ve been an absolute lock for the Sharks. They are by far the better team in standings, shot metrics, and expected goals. Per the predictive model built by Dom Luszczyszyn of The Athletic, the Sharks had a 67% chance to win the game.
The Kings still had a 33% chance to win the game, even though they were significantly the lesser of the two teams. The two numbers are around the highest and lowest percentage chances you will see in Dom’s model to win a game. This is due to the fact that bounces and, for lack of a better term, “luck”, come into play so often.
As hard as it is to conceptualize luck, this reflects the main reason the Ducks have not been able to win a game in their last eight attempts.
Eric’s concern about why we are even studying advanced stats at all, if results don’t always follow positive play in the CF% and xGF% columns, has been echoed, and is still echoed, by many people within the hockey community.
The skepticism toward analytics exists because this season has offered plenty of examples of teams that are good in the standings despite poor shot and chance metrics, and plenty of bad teams with great shot and chance metrics.
You don’t have to look very far back for clear examples of this, with the Hurricanes having extremely high shot and expected goal metrics to start the year. Yet, they have only collected 43 points in 41 games.
The Buffalo Sabres have had poor shot and expected goal metrics, yet have collected 50 points in 42 games.
With these two clear examples in mind, let’s start actually jumping into the questions of why we study advanced stats at all.
The quick answer is that statistics like CF% and xGF% do not merely paint a better picture of what happened over the course of a game, but they are also better predictors of future GF% in the short term than actual GF%.
That might sound a bit weird at first, but let’s go back to the idea of variance and puck luck. For example, suppose the Ducks won a game 3-2, but two of their goals came off weird bounces, giving them a win in a game in which they were completely outplayed.
In that game the Ducks had a 60 GF% (three goals for the Ducks divided by five total goals), even though they deserved to be at 33 GF% (one non-fluke goal for the Ducks divided by three total non-fluke goals).
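The arithmetic in that hypothetical 3-2 game can be written out as a short Python sketch. The "fluke" labels are a judgment call assigned by hand for illustration; they are not something the stat sites track.

```python
# Each goal: (scoring team, was it a fluke bounce?).
# The fluke flags are hand-assigned for this hypothetical game.
goals = [
    ("ANA", True),
    ("ANA", True),
    ("ANA", False),
    ("OPP", False),
    ("OPP", False),
]


def gf_pct(goals, team):
    """Goals For Percentage for `team` over a list of (team, fluke) goals."""
    if not goals:
        raise ValueError("no goals to take a percentage of")
    scored = sum(1 for scorer, _ in goals if scorer == team)
    return 100.0 * scored / len(goals)


actual = gf_pct(goals, "ANA")
deserved = gf_pct([g for g in goals if not g[1]], "ANA")
print(f"actual GF% {actual:.0f}, fluke-free GF% {deserved:.0f}")
# prints: actual GF% 60, fluke-free GF% 33
```

With only five goals in the sample, moving two of them swings the result from a comfortable majority to a clear minority.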
In the short term, those two goals have a large impact and do not tell us how the two teams actually played. Over a long enough period of time, however, variance comes to play a smaller role as the sample grows, and past GF% becomes a more reliable guide to future GF%. Per Stat Shot: The Ultimate Guide to Hockey Analytics by Rob Vollman, this usually takes several seasons.
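The sample-size effect can be illustrated with a toy simulation in Python. This is a sketch under strong assumptions rather than a model of real hockey: each goal is treated as an independent coin flip for a perfectly average team, and the sample sizes (5 goals for a single game, 1,500 as a stand-in for several seasons) are hypothetical round numbers.

```python
import random
import statistics


def simulated_gf_spread(n_goals, trials=1000, true_share=0.5, seed=0):
    """Standard deviation of GF% across many simulated samples of n_goals.

    Assumption: every goal independently goes to our team with probability
    true_share, which ignores score effects, opponents, and everything else.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        ours = sum(rng.random() < true_share for _ in range(n_goals))
        results.append(100.0 * ours / n_goals)
    return statistics.pstdev(results)


one_game = simulated_gf_spread(5)            # roughly one game of goals
several_seasons = simulated_gf_spread(1500)  # hypothetical multi-season sample
print(f"spread over 5 goals:    {one_game:.1f} points of GF%")
print(f"spread over 1500 goals: {several_seasons:.1f} points of GF%")
```

Even though the simulated team's true talent never changes, its GF% over a handful of goals swings by tens of percentage points from sample to sample, while the multi-season figure stays within a point or two of 50.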
The issue is that we as fans, along with GMs and coaches, do not have several seasons. We are all focused on the current season, the fans as a passion and the GMs and coaches as a job, which is where CF% and xGF% come into play. There are plenty of shot attempts over the course of a single game, let alone multiple games. That gives us a larger sample size that is less influenced by variance. xGF% goes a step further, assigning each shot attempt in the game an xG value based on similar shot attempts in recent league history. Adding those values up yields each team’s expected goal total at the end of the game. Both of these, as seen below, have been shown to have a strong correlation with GF%. See the following quote and image from Hockey Graphs (I also highly suggest reading their article for a more in-depth look at this correlation):
Together, the correlation and the larger sample size become better ways to predict the success of a team. A team like the Hurricanes should eventually start posting a better GF% as the season goes along, which will lead to more wins that reflect their strong play. On the opposite side, these numbers suggest that the Sabres will probably fall off a bit.
Back to the original question from Eric: why do we study analytics? It’s because it better informs us about how the future could look for the team if they continue to play this way. Certain teams are able to outperform where they “should be” for a season due to extremely high-end skill, like the Ducks with John Gibson, or underperform due to a lack of finishers or poor goaltending. But in the long run, a team has a better chance at sustained success by controlling the majority of shot attempts and expected goals.
With that knowledge, why would you not want your team to play in a way that would lead to long-term success?
*All stats per Corsica.Hockey unless stated otherwise.