
Kid Ish’s Pizzametrics, Week 22: Real Stats Talk

A Thought on Stats — They Will Never Go Away
The following comment by EricTheHawk got me thinking again about how misunderstood advanced statistics are by some. A few years ago this sort of resistance to them was more frequent. But I read this as more a resistance toward statistics consumption in sports generally, not anything aimed at hockey. Sports stats are only going to continue evolving; they aren't going to cease to be. Here's the comment:

"The problem I see is that many advanced stats advocates seem to think of them as more predictive than descriptive. As much as people want to sabermetricize the game, the sport of hockey doesn't fit into a box that a sport like baseball does because it's a flow based game as opposed to a game of static events building on top of one another. Not to mention, you'll have much larger data sets in baseball over the course of a 162 game MLB season.
Whether the stats are expansions of already existing numbers (Corsi/Fenwick as extensions of shots, zone starts pinpointing where certain shifts begin) or wholly made up with no statistical basis (PDO arbitrarily adds two numbers together and claims to describe something), they're nothing more than additional angles at which past events can be viewed. Can they give additional clarity to analyzing a game/team? Sure. But with each game as an independent event with a multitude of variables involved, they can't be anything more that descriptors."

We have to start this conversation honestly. Sports in general are a playground for predictive logic and comparative analysis. Prediction is the primary conversation outside basic storyline narrative for both fans and those within the industry alike. It drives sports. “We want to win, here’s how we win” is the foundational prediction upon which organizations and leagues operate. That prediction benefits from factual information.

Statistics are numerical systems meant to inform. I’ll share definitions simply to remind everyone that this isn’t specific to hockey:
Statistics: the practice/science of collecting, noting, analyzing numerical data.
Analysis: the detailed examination of aspects or the structure of a thing for the purpose of understanding, discussing, interpreting.
Interpretation: the process of explaining the meaning of something.
A contextual understanding of stats is critical to them having viable space in the normal predictive conversation. This is as true for population studies as it is for sports analysis though. It isn’t a fair anti-hockey stats argument.

It is also dishonest to pin a criticism of hockey statistics themselves on the interpretations of their usage. The numbers are facts; the subsequent analysis of them is certainly interpretative. That feels like a different conversation than the resistance to stats given above.

Bringing up baseball to denounce hockey analytics is curious because they are different sports. It would only apply if we were trying to derive the same basic stats from each, but nobody is doing that. They are different sports, and they are analyzed differently.

The “box” suggestion is flawed outright though: both baseball and hockey are sports wherein a series of recurring events occur—and recurrence is measurable in every system. It is arbitrary to claim stats only work in one series of recurring instances but not in another series of recurring instances simply because of the speed those instances occur. Statistics don’t work with that limitation. (And recall, stats are there to help us rid ourselves of human bias, and “flow of game” is one.)

To that point, the NHL already tracks all of the measurements used to underpin “advanced statistics.” The data is already there and distributed as widely as can be. There are certainly deeper areas of quantifications available—and many of us track those things manually (which is easier than it may seem). That’s where more subjective interpretations of the statistics can play into things, sure, but what we’re using right now isn’t that—it’s pretty easily derived information that doesn’t suffer any setback due to the flow of the game.

One last thing on any baseball versus hockey thinking here. The number of statistical measurements on each active play in baseball is staggeringly huge. Way bigger than the range of data to track in hockey. There is also a massive level of pre-play numbers to consider, plus varying contextual points to include based on desired outcome versus probable outcome, and so on. Baseball is far more complex than hockey as a sport; there’s just so much nuance. Data collection and assignment in hockey… I don’t know. It doesn’t seem as massive to me. It may grow into something more complex, but we’re not there yet.

I’d love to discuss this point further—but that’s what the comments are for, right? In the interest of removing dishonesty or misunderstanding from the conversation: hockey statistics are not baseball statistics, but nobody wants them to be. Nobody is saying they are. Nobody uses them in the same way.

This does raise a viable and honest objection: a lot of people misuse hockey statistics. It tends to be those who misunderstand them, so we’re still stuck there. But it’s also just a human thing, especially with sports. Sports is all about discussion, debate, consumption; predictions and comparisons drive that socially. People will always trend toward picking what favors their argument and using only that metric.

That’s a fair criticism of the culture of hockey stats, but again it isn’t entirely honest to pin it on the stats themselves. The stats just “are.”

Phew, tiring. But this is what I wanted to get to all along. HOCKEY!

Advanced stats (aka fancystats) aren’t actually advanced. The basis for all of the articles, blogs, and broadcast mentions you’re seeing or beginning to see is really simple. Now of course, there are advanced uses for them that go beyond the simple stats themselves, but knowing them, tracking them, caring about them isn’t necessary. I’m all about people consuming their sports in whatever manner they want.

There are two stats you should at least learn about properly, regardless of your consumption. A word on this: PDO and Corsi are community names. Who cares what broadcasts or media call the stats when the logic is the same? I don’t.

PDO.

Eric’s favorite metric! I can understand how people think this is an arbitrary measure—especially since the guy who named it did so arbitrarily (it literally means nothing). I’m not going to get into probability theory here, but this stems directly from it.

PDO is save percentage plus shooting percentage. Calculated league-wide, across all teams and an entire year, this number is always 100%, because every shot on goal is either saved or goes in. If the league average save percentage is .920, the league average shooting percentage is 8%. Probability theory confirms that any variation below or above will trend toward those averages with enough time.

On a team level, save percentage and shooting percentage are independent of each other. That is why a team’s PDO number is not necessarily 100: because the two are independent, 100 doesn’t have to be the set average for a team. If a team has a good (and fairly consistent) goaltender, it might have a generally higher PDO number. More on this in a second.
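The arithmetic is easy to check for yourself. Here is a minimal Python sketch, assuming a made-up two-team league where each team's shots for are the other team's shots against. The `TeamTotals` class, the `pdo` function, and every number in it are hypothetical and only there to show that team PDO can sit above or below 100 while the league total stays at 100.

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    goals_for: int
    shots_for: int        # shots on goal taken
    goals_against: int
    shots_against: int    # shots on goal faced


def pdo(t: TeamTotals) -> float:
    """Shooting percentage plus save percentage, on a 100 scale."""
    sh_pct = t.goals_for / t.shots_for
    sv_pct = 1 - t.goals_against / t.shots_against
    return 100 * (sh_pct + sv_pct)


# Hypothetical two-team league: A's shots for are B's shots against.
team_a = TeamTotals(goals_for=12, shots_for=120, goals_against=6, shots_against=80)
team_b = TeamTotals(goals_for=6, shots_for=80, goals_against=12, shots_against=120)

# League totals are just the sums across teams.
league = TeamTotals(
    goals_for=team_a.goals_for + team_b.goals_for,
    shots_for=team_a.shots_for + team_b.shots_for,
    goals_against=team_a.goals_against + team_b.goals_against,
    shots_against=team_a.shots_against + team_b.shots_against,
)

print(f"Team A PDO: {pdo(team_a):.1f}")   # 102.5
print(f"Team B PDO: {pdo(team_b):.1f}")   # 97.5
print(f"League PDO: {pdo(league):.1f}")   # 100.0
```

Team A shoots 10% and stops 92.5%, Team B shoots 7.5% and stops 90%, and the league as a whole still lands on 100.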

There’s math out there supporting the notion that goaltenders actually peak very early on and spend the rest of their careers gradually playing back down to average. This means that goalies are what their SV% is, which doesn’t tend to range as far from league average as our human biases tell us (i.e., a brilliant playoff performance of .940 is not what that goalie always is, per se). Why this is important is that in looking at PDO, it’s easier to project the goalie’s contributions to the team number. Generally speaking, this has allowed us to see that wider fluctuations in PDO tend to be in shooting percentage.

This is why a lot of attention is paid to the inherently volatile nature of shooting percentage. A goalie’s one slip in play (whether he’s .940 or .910) always equals one goal against. But a slip in shooting percentage for a team relying on it to outscore their mistakes can mean the difference between putting up three in a game or getting shut out. It isn’t a one-to-one slip, in other words. The aim is to mitigate the unsustainable play resulting from fluctuating PDO so that dips in SH% (which happen every year in the postseason) don’t change the game plan. A perfect example of this was in last year’s first round playoff series.

Corey Perry is a high SH% player. It will always vary somewhat, but I suspect he’ll carry plus-10 SH% most of his career. Detroit targeted him in the postseason and gambled that the rest of the roster couldn’t make up for his scoring if he was shut down. Whether Perry comes alive in a seven-game playoff series isn’t something you can predict. Either he does or he doesn’t. But Anaheim was too reliant on him and Ryan Getzlaf scoring to win games; there wasn’t enough solid team play (with and without the puck) to mitigate the dip in SH% certain players experienced.

The team’s numbers regressed, in other words, right when they were predicted to.
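If the regression idea feels abstract, a quick simulation helps. This is a rough Python sketch, not a model of any real team: it assumes every shot goes in with the same fixed 8% chance (the league-average figure used above), 30 shots per game, and an 82-game season. The function name, the seed, and the sample sizes are all my own assumptions.

```python
import random
import statistics


def simulated_sh_pct(true_rate: float, shots: int, rng: random.Random) -> float:
    """Observed shooting percentage over `shots` independent shots."""
    goals = sum(1 for _ in range(shots) if rng.random() < true_rate)
    return goals / shots


rng = random.Random(22)   # fixed seed so the run is repeatable
TRUE_RATE = 0.08          # assumed true shooting talent
SHOTS_PER_GAME = 30       # assumed shots on goal per game
GAMES = 82                # assumed season length

# Observed SH% in many single games versus many full seasons.
single_games = [simulated_sh_pct(TRUE_RATE, SHOTS_PER_GAME, rng) for _ in range(1000)]
full_seasons = [simulated_sh_pct(TRUE_RATE, SHOTS_PER_GAME * GAMES, rng) for _ in range(200)]

game_spread = statistics.pstdev(single_games)
season_spread = statistics.pstdev(full_seasons)

print(f"Spread of SH% across single games: {100 * game_spread:.2f} points")
print(f"Spread of SH% across full seasons: {100 * season_spread:.2f} points")
```

Same shooters, same talent, and the single-game numbers still bounce around far more than the season numbers do. A seven-game series sits much closer to the single-game end of that range.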

One last thing: if a team is an average above-100 PDO team, like some fans believe Colorado or Anaheim must certainly be (to have won so much), then looking at that sub-10X play should be consistent with 100-level teams who win games with lower PDO—like Los Angeles, Chicago, San Jose. If the Ducks average at 102, then PDO of 101 shouldn’t hurt their record, right?

During the most recent streak of 5-6-2 (not best), Anaheim’s PDO/game was: 102.7 (L), 102.0 (L), 99.8 (L), 100.5 (W), 100 (W), 101.8 (W), 101.3 (L SO), 100.1 (L SO), 100.2 (L), 98.9 (L), 100.7 (W), 102.6 (W). With that much high-end fluctuation, you’d hope for more wins during this stretch. The high PDO instances in losses suggest there are some structural issues within the Ducks’ game. And aren’t we seeing that with our eyes, irrespective of numbers? I think so.
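You can split those numbers by result yourself. A small Python sketch, using only the twelve per-game values listed above (the stated record implies thirteen games, so one value isn't in the list and isn't in the sketch either). The 101.0 cutoff for a "high" PDO game is an arbitrary threshold I picked for illustration.

```python
from statistics import mean

# (PDO, result) pairs exactly as listed above; "SOL" is a shootout loss.
games = [
    (102.7, "L"), (102.0, "L"), (99.8, "L"), (100.5, "W"),
    (100.0, "W"), (101.8, "W"), (101.3, "SOL"), (100.1, "SOL"),
    (100.2, "L"), (98.9, "L"), (100.7, "W"), (102.6, "W"),
]

HIGH_PDO = 101.0  # arbitrary cutoff, chosen only for illustration

wins = [p for p, result in games if result == "W"]
losses = [p for p, result in games if result != "W"]
high_pdo_losses = [p for p in losses if p > HIGH_PDO]

print(f"Average PDO in wins:   {mean(wins):.2f}")     # 101.12
print(f"Average PDO in losses: {mean(losses):.2f}")   # 100.71
print(f"Losses with PDO above {HIGH_PDO}: {len(high_pdo_losses)} of {len(losses)}")
```

Three of the seven listed losses came with a PDO above 101, which is the point: the bounces were there and the results weren't.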

Here’s how this stat matters to you: since this stat has been adopted, the best teams in the league can win with average or lower PDO. It’s been pretty consistent. The context for this phenomenon is simple: those teams play in such a way as to limit the fluctuations in shooting percentage, usually in the form of controlling the puck. (This is how it relates to Corsi.)

Corsi.

Corsi is shot attempts as a proxy measure for a team’s time on attack. It has been compared against tracking a team’s “time in the offensive zone” and is incredibly accurate. There are good articles/blog posts covering that; I’m sure someone here can link to them for you if you can’t find them via an Internet search.

Time on attack is the key here, as it is meaningful possession, or “using the puck to score.” I used to call it meaningful intent, but it’s been a while. Both teams are competing directly for that game state more than any other (there are three on-puck, three off-puck), which makes tracking it as a comparison easier—what one team has here, the other doesn’t.

(Fewer studies have been undertaken to see if there’s any purpose of tracking non-Time-On-Attack possession. Like the ones done previously with OZ time, shot attempts as TOA broken out into percentage tends to align with everything else, even this. It’s basically been largely abandoned as pointless because SAs are remarkably accurate measures of possession in all game states.)

That’s actually it when it concerns Corsi. It measures which team has the puck on attack more in a game. It doesn’t say anything else on its own—that’s really where all the “advanced” stuff comes from, with people attempting to learn more about the game by delving deeper. This is just a foundation.

One last thing. Shots from anywhere count as valid attempts because every shot carries a percentage of success and results in the same probable outcomes: goal, save, miss. (Broadly put, a miss is a missed shot, a blocked shot, or a penalty.) In terms of data, there isn’t differentiation between a shot from the NZ and a shot in the slot. As a measure of having the puck (possession) for the purpose of trying to score (meaningful intent), it is equal. Especially when you look at puck recovery probability.

This isn't as "noisy" as people want to claim. As long as you remember that Corsi and "shot quality" have no inherent relation, you're fine. Corsi just measures meaningful possession.
Oh, and I guess to be informational: Fenwick took Corsi's idea and removed blocked shots from the calculation because blocked shots are coached as attempt prevention. It makes sense but has a little different usage.
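For anyone who wants to see the counting, here is a minimal Python sketch of both. It uses the usual breakdown of a shot attempt into shots on goal, missed shots, and blocked shots, with each team's share expressed as a percentage of all attempts in the game. The `ShotEvents` class, the function names, and the game totals are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShotEvents:
    on_goal: int   # goals plus saves
    missed: int    # attempts that missed the net
    blocked: int   # attempts blocked before reaching the net


def corsi(e: ShotEvents) -> int:
    """All shot attempts: on goal, missed, and blocked."""
    return e.on_goal + e.missed + e.blocked


def fenwick(e: ShotEvents) -> int:
    """Shot attempts with blocked shots removed."""
    return e.on_goal + e.missed


def share(attempts_for: int, attempts_against: int) -> float:
    """One team's percentage of all attempts in the game."""
    return 100 * attempts_for / (attempts_for + attempts_against)


# Hypothetical single-game totals.
ducks = ShotEvents(on_goal=30, missed=12, blocked=14)
opponent = ShotEvents(on_goal=25, missed=10, blocked=9)

corsi_pct = share(corsi(ducks), corsi(opponent))
fenwick_pct = share(fenwick(ducks), fenwick(opponent))

print(f"Corsi:   {corsi(ducks)} for, {corsi(opponent)} against, {corsi_pct:.1f}%")        # 56, 44, 56.0%
print(f"Fenwick: {fenwick(ducks)} for, {fenwick(opponent)} against, {fenwick_pct:.1f}%")  # 42, 35, 54.5%
```

Anything above 50% means that team had more of the attempts, which is all the stat claims on its own.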

Here's how this stat matters to you: the Ducks are a little bipolar. When trailing, they tend to be possession monsters. They put up a ton of attempts and, this is what tends to get ignored, they suppress shot attempts against much better. But if the score is tied, the Ducks are pretty bad. They don't control play (take attempts for, stop attempts against) enough to mitigate their fluctuating PDO numbers.

Furthermore, they give up a lot more attempts/entry as the season has gone on, which concerns me. Teams are able to gain entry into the OZ and attempt shots repeatedly. We all hope they shore this up.

All of the really advanced parts of hockey stats come from these two foundations—PDO and Corsi.
As Darryl Sutter recently said: "The game's changed. They think there's defending in today's game. Nah, it's how much you have the puck. Teams that play around in their own zone think they're defending but they're generally getting scored on or taking face-offs and they need a goalie to stand on his head if that's the way they play."
Advanced stats in hockey aren't going anywhere. You don't have to like them, you don't have to use them, you can stay away. I'm ok with this. I just warn off arguments like "well these aren't perfect for me because" when there's some dishonest logic behind it. Like them or not, the stats just "are."
