Raw Data

ROBBY:
In my last post, I laid out my idea for an interesting summer project: Can we determine how particular statistics relate to a team's success? For example, can we find any statistical link between a team's ability to win faceoffs and the number of standings points it accumulates in a season?

Since then, I've used some of your suggestions to come up with a list of statistics to test. I've also gathered the raw data we need from the past three regular seasons to begin this analysis (you can check out the raw data in Google Docs). Here are the particular team characteristics we'll examine:

  • Goals For
  • Corsi (Shots For - Shots Against)
  • Faceoff %
  • PP %
  • Blocked Shots
  • Secondary Scoring (10+ Goals)
  • PIM
  • Secondary Scoring (20+ Points)
  • Takeaways
  • Shootout Winning Percentage

Using Excel's integrated charting function, I've created graphs showing how each of these measures relates to standings points. Join me after the jump to take a look at these graphs as we begin analyzing what, if any, correlation exists between these measures and the standings points a team accumulates.

Before we get started, a few disclaimers:

  • To better visualize some of these statistics, I've adjusted the horizontal axes. While this lets us better see how clustered data is distributed, it also exaggerates the visual relationship between variables. So while we can probably get a sense of whether any sort of relationship exists, the degree of that relationship will be hard to quantify until I actually run the regression analysis.
  • I did my best to make sure this data is accurate, but there may be some errors. If you happen to notice anything, let me know in the comments, and I'll take a look. Compiling some of this was an extremely manual process.
  • Any conclusions we reach are merely suggestions of possible correlation. This is by no means a definitive effort, especially since we're only looking at one statistic at a time.

[Chart: Goals For vs. standings points]

Of all the measures I'm testing, Goals For is probably the most strongly related to standings points. As you'll see in the other charts, no other measure exhibits the sort of strong positive relationship the data suggests here. And ultimately, it makes logical sense that the more goals a team scores, the more likely it is to win games.

In case you're wondering, that data point on the upper right represents the 2009-2010 Washington Capitals, who scored a staggering 313 goals on their way to a league-best 121 points. Over the span of this analysis, the Caps' 313 goals are nearly 25 more than the total of the next closest team (the 2008-2009 Red Wings).

[Chart: Corsi vs. standings points]

While this data is fairly clustered, there does seem to be a positive relationship between a team's shot differential and the number of standings points it earns in a season. Again, this would seem to be a logical result. However, the lack of a really strong relationship also points to the fact that teams can still be successful even while being outshot. For instance, our very own Ducks had the third worst Corsi rating in 2010-2011, and yet they still finished with 99 points, good for 9th best overall.

Over the course of this study, the 2009-2010 Chicago Blackhawks registered the best regular season Corsi rating with +740, meaning they outshot the competition by an average of 9 shots per game. The worst team? This year's Minnesota Wild, who registered a ghastly -477.
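For anyone who wants to check the per-game figure, here is a minimal Python sketch of the arithmetic. It assumes an 82-game regular season and uses only the two season totals quoted above; the function and variable names are mine, for illustration only.

```python
# Assumption: an 82-game NHL regular season.
GAMES_PER_SEASON = 82


def per_game(season_total: int, games: int = GAMES_PER_SEASON) -> float:
    """Average a season-long shot differential over the games played."""
    return season_total / games


# Season totals as quoted in the post.
blackhawks = per_game(740)
wild = per_game(-477)

print(f"Blackhawks: {blackhawks:+.1f} shots per game")
print(f"Wild: {wild:+.1f} shots per game")
```

Dividing by games played puts both extremes on the same per-game scale, which is easier to picture than a season total.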

[Chart: Faceoff % vs. standings points]

Faceoff percentage is a tough one to call from the visual view alone. There seems to be a general trend here, but the vast majority of the data points are so clustered that I'm willing to bet we won't see an r value above .25.
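If the r value is new to you, it is the Pearson correlation coefficient, which runs from -1 (perfect negative relationship) to +1 (perfect positive relationship). Below is a minimal Python sketch of how it can be computed by hand. The sample numbers are made up purely for illustration and are NOT taken from the raw data, and `pearson_r` is a name I picked for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable never changes")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT real team data.
faceoff_pct = [48.0, 49.0, 50.0, 51.0, 52.0]
standings_points = [90, 85, 95, 88, 100]

r = pearson_r(faceoff_pct, standings_points)
print(f"r = {r:.2f}")
```

The closer r sits to zero, the less a straight line explains the scatter, which is why tightly clustered clouds like this one tend to produce small values.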

For fun, the worst team with a faceoff winning percentage of 50% or more was the 2008-2009 Tampa Bay Lightning. That year, they won 50.2% of their draws on their way to 66 points. Conversely, the best team with a faceoff winning percentage of less than 50% was the 2010-2011 Pittsburgh Penguins. Despite only winning 49.2% of draws, the Penguins racked up 106 points during the campaign.

[Chart: PP % vs. standings points]

Like faceoffs, power play percentage seems to have a positive correlation to winning, but it's tough to tell for sure. The data points are all over the map on this one, although no team with a success rate over 20% finished with fewer than 88 points in a given season.

The worst power play over the last three years belonged to the 2008-2009 Columbus Blue Jackets, who somehow managed to rack up 92 points while converting only 12.7% of chances. In our sample, three teams managed to crack 25%: the 2008-2009 Red Wings, the 2008-2009 Capitals, and the 2009-2010 Capitals.

[Chart: Blocked Shots vs. standings points]

Here's our first real surprise: blocked shots seem to negatively correlate with standings points. While the distribution is similar to faceoffs and power play in its relative clustering about the middle, there does seem to be a general negative trend. In trying to suggest why this might happen, my best guess would be that teams with more blocked shots probably face more shots overall. So while they may be blocking more shots than the rest of the league, they may also be giving up more goals.

In 2010-2011, the ability to block shots was a bad omen. Among the top five teams in blocked shots this year, only the Flyers and the Rangers (numbers 3 and 4, respectively) were able to reach the post-season. The other three teams in the top five (the Islanders, Leafs, and Thrashers) averaged 79.3 points.

[Chart: Secondary Scoring (10+ Goals) vs. standings points]

I decided to look at secondary scoring because it's one of those things you always hear about during the playoffs. Some commentator will lament team X's secondary contributions while pointing to team Y's goal distribution. But does that really matter?

As a matter of full disclosure, I had a hard time figuring out how to quantify secondary scoring (which is why I look at two different measures of it in this analysis). Ten or more goals seemed reasonable, and I was stunned by how tightly teams were grouped here. No matter how good or bad a team is, they probably have somewhere in the realm of 8 to 10 skaters who are capable of potting 10 or more goals. In fact, more than half the teams in this sample (60%) fell in that range.
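To make the definition concrete, here is a minimal Python sketch of the count behind this measure: the number of skaters on a roster at or above a threshold. The roster totals below are hypothetical, and `count_secondary_scorers` is just a name chosen for this example. The same function would cover the 20+ points version by passing point totals and a threshold of 20.

```python
def count_secondary_scorers(skater_totals, threshold):
    """Count the skaters whose season total meets or exceeds the threshold."""
    return sum(1 for total in skater_totals if total >= threshold)


# Hypothetical goal totals for one team's skaters, NOT real data.
goals_by_skater = [32, 27, 21, 18, 14, 12, 10, 9, 7, 4, 2]

ten_goal_scorers = count_secondary_scorers(goals_by_skater, threshold=10)
print(ten_goal_scorers)  # prints 7
```

Note that a count like this treats a 10-goal scorer and a 32-goal scorer identically, so it says nothing about how far past the threshold anyone goes.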

This graph seems to imply that there is a relationship between secondary goal scoring and winning, but the data is just too tightly grouped to tell for sure. What's more likely is that secondary scoring (at least as I've defined it here) is more myth than reality. Most teams have a similar number of players who are able to score 10+ goals in a season. The key is whether those players are scoring significantly more than 10 goals.

[Chart: PIM vs. standings points]

While PIM is a mixed measure of a team's pugilism and its discipline (with a dash of reputation thrown in), it's an important characteristic to test. The data implies there might be a slightly negative relationship here, which is what we would logically expect. However, the relative clustering here makes me think we probably won't see an r value more strongly negative than -.3.

This year's version of the New York Islanders featured the highest total in our sample at 1,515. This is probably the result of their 31 (!!!) misconducts. In our sample, only the 2009-2010 Nashville Predators finished with fewer than 700 PIM.

[Chart: Secondary Scoring (20+ Points) vs. standings points]

This was my second attempt at quantifying secondary scoring. By looking at the number of players on each team with 20+ points in a season, I was hoping to get a better understanding of secondary scoring impact. However, like my first attempt, this measure of secondary scoring is just too clustered to draw any meaningful conclusions.

By the way, that data point all alone in the lower left? That would be the 2008-2009 Tampa Bay Lightning, who only featured 7 skaters with 20 or more points. When you're relying on Steve Eminger for 23 points, I guess it's not so surprising that the Lightning only earned 66 standings points during 2008-2009.

[Chart: Takeaways vs. standings points]

Takeaways is another one of those areas where I thought we'd see a more definitive result. However, the data is just too clustered to reach any conclusions. For instance, teams with more than 800 takeaways over the past three seasons ranged from a low of 79 standings points to a high of 117.

Interestingly, the Ducks recorded exactly one more takeaway in 2010-2011 (447) than they did in 2009-2010 (446). I'm not sure why that's so noteworthy to me, but if you had told me that before the season started, I don't think I would have believed you. Whatever the reason for the similar numbers, it's clear that the Ducks are missing some of the shutdown guys they had in 2008-2009, when they recorded 671 takeaways.

[Chart: Shootout Winning Percentage vs. standings points]

For kicks, I wanted to throw in a team's shootout proficiency as one final measure. After all, it wouldn't be a sufficiently controversial NHL blog post without mentioning the shootout. And as we can see, it probably doesn't matter how bad you are at the shootout. While it might help to nab that extra standings point on occasion, the fact of the matter is that the last two Presidents' Trophy winners lost more shootouts than they won.

The most prolific shootout team (at least in terms of winning the shootouts they participate in) over the past three seasons was this year's Colorado Avalanche, who were 6-1 in the shootout. Unfortunately for them, this was the only thing they did well this year.

Closing Thoughts

Personally, I'm glad this step is now done. The data collection took me several hours and now that it's all here, we'll be able to start making some inferences.

I'm surprised by what we're seeing with Blocked Shots. I can't wait to investigate that further with some regression analysis to see whether there actually is a negative relationship here. Otherwise, I can't say I'm all that stunned by any of the results. While the parity among rosters was greater than I expected (at least in terms of goal and point producers), most of what we're seeing is in line with what I expected.

In my next post, I will actually run a least squares regression on each of these scenarios to get an equation that approximates the relationship between each characteristic and standings points. I'll also calculate the r value for each statistic to find out just how strongly that characteristic correlates to standings points.
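As a preview of what that involves, here is a minimal Python sketch of a simple least squares fit with one predictor. It finds the slope and intercept of the line that minimizes the squared vertical distances to the data. This is not necessarily how the next post will do it, since a spreadsheet can produce the same line. The numbers are made up for illustration and are NOT from the raw data, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x; zero means every x is identical.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    # The least squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values, NOT real team data.
goals_for = [200, 220, 240, 260]
standings_points = [80, 88, 100, 104]

slope, intercept = fit_line(goals_for, standings_points)
print(f"points = {slope:.2f} * goals_for + {intercept:.1f}")
```

The slope reads as the expected change in standings points for each additional unit of the statistic, while the r value says how tightly the data hugs that line.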