Moneyball – How NFL Teams Make Multimillion-Dollar Decisions, Why They Get It Wrong and How To Do It Right

Last week I was invited to give a talk at the Big Data Innovation Summit in Boston. My colleague, Patrick Philips, and I gave a talk about how LinkedIn uses crowdsourcing to improve its machine learning algorithms and data products. Since this post is not about that talk, I will just mention that we got great feedback, and you can watch it here.

In parallel to the data innovation summit, another conference was held in Boston: the Sports Analytics Innovation Summit. Since Patrick and I are big sports fans, our work involves a lot of analysis, and we had a free pass (both conferences were organized by the same group), we decided to drop by.

We weren’t wrong. It was incredibly interesting to see the difference between our world – data-product-oriented consumer internet companies – and the world of sports analytics. While the state of the art for internet companies is analyzing terabytes of data using distributed big data frameworks and applying machine learning algorithms such as deep neural networks, the state of the art for sports analytics is, to put it mildly… different.

One of the most informative sessions at the conference was given by the data analysts of the National Football League. In this session, the people in charge of analytics for the NFL explained that since data gathering and cleaning is a task that needs to be performed by all NFL teams, the NFL has built a new platform for teams to consume this data.

Platform is a pretty big word. One might imagine this platform as something similar to Google Analytics, where a team coach could log in and watch fancy graphs and charts about his team’s performance. It’s not exactly like that. It’s more like an Excel spreadsheet that holds data very similar to what you might encounter at Yahoo! Sports or ESPN. Actually, it’s not just like an Excel spreadsheet; it is an Excel spreadsheet, emailed to the teams in the league every week.

The spreadsheet contains a lot of tabs and a lot of canned reports about which teams played, how the players performed, which players were on the field for any given play, and links to videos of the plays. Pretty straightforward stuff. But there was also something a little different there. Something that immediately caught my eye, because it was a data product, and one whose idea is very similar to many of the products we develop at LinkedIn. The product was named “Similar Running Backs”, which sounds a lot like LinkedIn’s “Similar Profiles”, “Similar Companies” and “Similar Schools” data products.

The way the NFL analysts explained the idea behind Similar Running Backs: every year, teams need to renegotiate contracts with their players. To make these negotiations efficient (some contracts run to more than ten million dollars a year), it is very helpful for teams to understand how similar players are compensated. So the league created this tool for the teams as part of its new platform, and the first version of the tool compared running backs.

Here is how it works – you select a player from a drop-down list, then select two numbers which define the similarity range of the players you are looking for. The smaller the range, the fewer players will fit the criteria, and vice versa. For example: the values 95% and 105% will return the players whose every regular-season stat is between 95% and 105% of the corresponding statistic of the selected player.

Now let’s look at real data and see how the algorithm works. Note: for this analysis I only looked at players who played at least 10 games in the 2012 season and had at least 10 rushing attempts. Let’s see what data we have. The NFL used the following stats to assess running back similarity: number of games played, rushing attempts, total rushing yards, rushing yards per game, rushing yards per attempt and rushing touchdowns.

Issue #1

Since the job of the running back is to carry the ball through the defensive line, which is basically a set of about seven 300 lb. guys, running backs tend to get injured a lot. This causes them to miss a lot of games and makes it hard to compare players who played a full season to players who played only part of one.

Solution #1

Normalize the data by the amount of games played. That is, instead of counting total rushing attempts and total rushing touchdowns, use rushing attempts per game and rushing touchdowns per game.
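The per-game normalization can be sketched as below. The two players and their season totals are invented for illustration; the point is that a back who missed half the season looks identical on a per-game basis if his rate of production was the same.

```python
# Made-up season totals for two hypothetical backs.
season = [
    {"name": "Back A", "games": 16, "attempts": 320, "touchdowns": 12},
    {"name": "Back B", "games": 8,  "attempts": 160, "touchdowns": 6},
]

# Divide each counting stat by games played.
for p in season:
    p["attempts_per_game"] = p["attempts"] / p["games"]
    p["touchdowns_per_game"] = p["touchdowns"] / p["games"]

# Despite playing half the games, Back B now matches Back A per game.
print(season[0]["attempts_per_game"], season[1]["attempts_per_game"])  # 20.0 20.0
```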

Now that we have the stats right, let’s try to find all the players within a +/-5% range on every statistic.

Issue #2

Not all stats are alike. For example, in 2012 the top player in rushing yards per game was Adrian Peterson with 131 yards per game, while the lowest, Jorvorskie Lane, rushed for only 0.8 yards per game, about 160X less. This means the gaps between players’ rushing yards per game can be very significant. In comparison, in the rushing yards per attempt category, the top player, Cedric Peerman, rushed for 7.2 yards per attempt, while the lowest player rushed for 1 yard per attempt, only 7.2X lower. Since the spreads of these two metrics are so different, it doesn’t make sense to treat a 5% difference on both as the same “similarity”: being within 5% of rushing yards per game is very similar, while for rushing yards per attempt it’s not.

Solution #2

Normalize the data to have the same units of distance. What we want to do here is transform our measurements to have the same spread. One way to do so is to use the standard deviation. Standard deviation is a measure of how wide our range is: think of a bell curve; the wider the bell curve, the higher the standard deviation. We want the bell curves for all of our stats to look similar to each other. To accomplish this, we can divide each stat by its standard deviation. (This post is too short to explain why this works. Feel free to read more in this article.)
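Here is a small sketch of that scaling, using Python’s standard library. The two value lists are illustrative (they only echo the extremes quoted above, not full league data); after dividing by the standard deviation, both stats have a standard deviation of 1, so a given distance means the same thing on either axis.

```python
from statistics import pstdev

# Illustrative league spreads for two stats with very different ranges.
yards_per_game = [131.0, 80.0, 45.0, 10.0, 0.8]
yards_per_attempt = [7.2, 5.0, 4.0, 2.5, 1.0]

def normalize(values):
    """Scale a stat so its (population) standard deviation becomes 1."""
    sd = pstdev(values)
    return [v / sd for v in values]

norm_ypg = normalize(yards_per_game)
norm_ypa = normalize(yards_per_attempt)
# Both normalized stats now have the same spread, so differences are comparable.
```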

Now that we have the right stats and they are all comparable, we can start looking at which players are similar to each other. Remember, since we have only four stats to work with, the most similar players will have all four of their stats within 95%-105% of each other. Less similar players will match on only three stats, then two, and so on.
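Counting how many stats two players share within the band can be sketched like this. The three backs and their four per-game stats are hypothetical; in this toy data, Back A and Back B match on three of the four stats, the kind of partial match the graph below shows.

```python
from itertools import combinations

# (attempts/game, yards/game, yards/attempt, touchdowns/game) - made-up values.
stats = {
    "Back A": (20.0, 100.0, 5.0, 0.6),
    "Back B": (19.5, 98.0, 5.1, 0.5),
    "Back C": (10.0, 40.0, 4.0, 0.2),
}

def shared_stats(a, b, low=0.95, high=1.05):
    """Count how many of b's stats fall within [low, high] of a's."""
    return sum(low * x <= y <= high * x for x, y in zip(stats[a], stats[b]))

for a, b in combinations(stats, 2):
    print(a, b, shared_stats(a, b))
```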

It appears there are no two players in the league who are very similar on all four stats, but there are some who share three. Here is a visualization of these similarities:

We can see from this graph that there are three pairs of players that are very similar to each other in their stats and a cluster of six players who are also similar to one another.

Issue #3

While Darius Reynaud is similar to D.J. Ware, who is similar to Le’Ron McClain, who is similar to Jason Snelling, the first and the last are not very similar to each other. While neither player’s output was high, Jason Snelling rushed for twice as many yards per game and per attempt as Darius Reynaud.

Issue #4

This similarity metric is too coarse. It’s all or nothing: either the players are within 5% of each other on most stats or they aren’t. The same problem remains even if we lower the bar so that players only have to be similar on two stats, as can be seen in this graph.

We still get only pairs of players who are similar to each other, and it is hard to see how they compare to other players.

Issue #5

This similarity product has too many levers. We need to supply the range that defines when two players count as similar, and we also need to supply how many stats should match.

Solution to #3, #4 and #5

We can provide a visualization that:

  1. Displays all the players

  2. Uses a continuous similarity metric where closer means more similar instead of the binary similar or not similar we used before

  3. Doesn’t need any levers

In order to achieve that, we will cluster all the players into groups and then display all the players on a chart where similar players are close to each other and dissimilar players are far apart.
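The clustering step can be sketched with a tiny hand-rolled k-means over normalized stats. This is a toy version of the idea, not the analysis behind the chart: six hypothetical backs reduced to two normalized coordinates, three clusters, and deterministic starting centers so the sketch is reproducible (real work would use all the stats and a proper library).

```python
# Hypothetical players as (normalized stat 1, normalized stat 2) points.
points = {
    "Star A": (3.0, 3.1), "Star B": (2.9, 3.0),   # high-output backs
    "Mid A":  (1.5, 1.6), "Mid B":  (1.4, 1.5),   # second tier
    "Low A":  (0.2, 0.3), "Low B":  (0.3, 0.2),   # least productive
}

def kmeans(data, centers, iters=10):
    """Minimal k-means: alternate nearest-center assignment and center updates."""
    assign = {}
    for _ in range(iters):
        # Assign each player to the nearest center (squared Euclidean distance).
        for name, (x, y) in data.items():
            assign[name] = min(
                range(len(centers)),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
        # Move each center to the mean of its assigned players.
        for c in range(len(centers)):
            members = [data[n] for n, a in assign.items() if a == c]
            if members:
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assign

# Deterministic initial centers keep this sketch reproducible.
groups = kmeans(points, [points["Star A"], points["Mid A"], points["Low A"]])
print(groups)  # each pair of backs lands in its own cluster
```

Plotting the points colored by cluster then gives exactly the kind of single-look chart described next: similar players sit close together, dissimilar ones far apart, with no levers to tune.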

Now we can see all the players in a single graph, separated into five groups. The red group (number 1) contains the superstars, guys like Adrian Peterson, Marshawn Lynch and Arian Foster. These are the guys with the most rushing attempts, the highest yardage per game and by far the most touchdowns. The group closest to it, in magenta (number 5), contains the second-tier guys. These guys are very productive running backs, just not as productive as the guys in the first group. But while these two groups are interesting, pretty much every football fan could have put these players into these buckets. What is more interesting is who makes up the other three groups.

The second magenta group (number 3) contains our least productive players. These players rushed for only 4 yards on average per game, with each attempt advancing them slightly more than 2 yards. The blue group (number 4) is made of players who, while rushing five times the yards per game and twice the yardage per attempt, managed to score about the same number of touchdowns, 0.07 a game. The green group is made of players who are very similar to the blue group, only twice as effective at scoring touchdowns.

While this analysis does not provide a myriad of insights that aren’t already known to subject matter experts, it does provide a nice, robust framework for understanding player similarities at a single glance. Also, while it’s very easy to compare players according to their rushing abilities alone, things become more complicated when we add more dimensions, like fumbles and catches. Which running backs finished last season most similar to each other in terms of all these stats combined? That is a much harder question, which I will let you guess in the comments, but the answer could once again easily be displayed as points on a flat surface.

As always, you should follow me on Twitter