Published: 31 January 2015
The recent debunking and de-debunking of the Patriots' fumble numbers has taken a decidedly unscientific turn. The posts I've read seem to be "advocacy" research, not unlike research sponsored by Nabisco claiming that Oreos are good for you. Each side's biases and contortions aren't very hard to detect. I think the possibility that NE's fumble rates are connected to the current allegations warrants some fair-minded investigation. Some of the debunking has merit, but some of it does not. Whether NE appears to be a 4-sigma outlier or an 8-sigma outlier is beside the point. Without knowing whom to trust, I thought I'd go a step further with my own look into the numbers.
Instead of slicing the data this way and that (by year, dome/outdoor, home/away, elite-QB and non-elite-QB teams, or whatever), let's allow the numbers to speak for themselves. We won't exclude any teams or years. Instead, we'll design a simple model that can account for all of these factors. It seems to me that the debate boils down to two primary questions: Should dome teams be compared with outdoor teams? And does the fact that the NE offense is normally very good anyway account for their level of ball security?
I ran a linear regression to predict how many fumbles we should expect of an offense. The predicted number of fumbles isn't really what we're after. It's the error of the model: the difference between the fumbles a team actually had and the fumbles the model predicts. If one team or another is consistently defying the odds with superior ball security, even after accounting for the variables that are likely to explain low fumble rates, that might reasonably raise our suspicion that NE has enjoyed some kind of advantage. Some of you more sophisticated readers might pooh-pooh linear regression, but it does have its uses, and I think this is a good one.
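The post ran its regressions in R (the lm output is in the notes). To make the "error of the model" idea concrete, here is a minimal, dependency-free Python sketch of the same technique: ordinary least squares solved through the normal equations, with the residuals (actual minus predicted) playing the role of "fumbles above expected." The function names and the toy data are made up for illustration; this is not the author's code.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Zero out this column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares fit with an intercept; returns (coefficients, residuals)."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]  # X'X
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]            # X'y
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, x)) for x in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # actual minus expected
    return coef, residuals


# Toy, made-up data (not the post's data): y = 2 + 3*x1 - x2 exactly.
rows = [[1, 0], [2, 1], [3, 5], [4, 2], [5, 7]]
y = [5, 7, 6, 12, 10]
coef, residuals = ols(rows, y)
print([round(c, 6) for c in coef])  # [2.0, 3.0, -1.0]
```

With the real data, each row would be one team-season's predictors and y its offensive fumbles. For serious work, a library routine such as R's lm is more numerically robust than hand-rolled normal equations.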
The model predicts each team's total number of offensive fumbles (special teams fumbles are excluded) based on its completions, incompletions, sacks, and run attempts. Each of these play types has its own probability of producing a fumble, so a team's expected fumble total should vary with its mix of play types. These variables are intended to account for the nature and quality of each offense and its players. Because the play counts enter the model directly, fumble "rate" is implicitly accounted for as well: more plays naturally mean more chances to fumble.
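Plugging numbers into the fitted model shows how a single expected value is produced. This sketch hard-codes the coefficients printed in the indoor/outdoor regression output in the notes (indoor1 there is R's label for the indoor = 1 level). The team-season stats and the "actual" fumble count are hypothetical, made up for illustration.

```python
# Coefficients as printed in the post's indoor/outdoor regression output (rounded).
COEF = {
    "intercept": 596.262989,
    "sk": 0.154891,
    "comp": -0.005998,
    "inc": 0.027522,
    "ratt": 0.007952,
    "year": -0.292761,
    "indoor": -0.283076,
}

def expected_fumbles(sk, comp, inc, ratt, year, indoor):
    """Predicted offensive fumbles for one team-season (indoor is 1 for a dome team, else 0)."""
    return (COEF["intercept"]
            + COEF["sk"] * sk
            + COEF["comp"] * comp
            + COEF["inc"] * inc
            + COEF["ratt"] * ratt
            + COEF["year"] * year
            + COEF["indoor"] * indoor)

# Hypothetical outdoor team-season (made-up stats).
exp_f = expected_fumbles(sk=30, comp=380, inc=220, ratt=430, year=2010, indoor=0)
actual = 14  # hypothetical actual fumbles
print(round(exp_f, 2))           # 19.66
print(round(actual - exp_f, 2))  # -5.66, i.e. about 5.7 fewer fumbles than expected
```

Because the intercept and the year term are large and opposite in sign, rounding in the printed coefficients can shift the result by up to about a thousandth of a fumble, which doesn't matter here.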
Additionally, the model accounts for a team's home stadium type (indoor or outdoor). Instead of including or excluding certain teams, the regression will give us a fair estimate of how much playing in a dome helps a team's fumble rates. I also included year in the model because fumble rates have been declining league-wide since 2000. Without accounting for this trend, it would appear that all recent teams are fumbling less than expected.
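Since the model is linear, the effect of one input can be read off with everything else held fixed, and the intercept cancels out. This sketch uses the printed year and indoor1 coefficients to compare the league-wide trend over a 14-season gap (for example, 2000 to 2014) with the estimated dome effect. The helper name is hypothetical.

```python
# Coefficients copied from the post's indoor/outdoor regression output.
YEAR_COEF = -0.292761    # change in expected fumbles per additional season
INDOOR_COEF = -0.283076  # shift in expected fumbles for a dome team

def trend_shift(year_from, year_to):
    """Change in expected fumbles for an otherwise identical offense moved between seasons."""
    return YEAR_COEF * (year_to - year_from)

print(round(trend_shift(2000, 2014), 2))                # -4.1
print(round(trend_shift(2000, 2014) / INDOOR_COEF, 1))  # 14.5
```

By this estimate, the same offense would be expected to fumble about 4 fewer times per season at the end of a 14-season span than at the start, about 14.5 times the estimated dome effect. That is why leaving year out would make every recent team look like an over-performer.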
I also created an alternate specification of the model. I thought the indoor/outdoor dummy variable might be too blunt to capture actual playing conditions, such as climate or even turf type. Some environments, such as wet, muddy, or extremely cold conditions, make fumbling much more likely than others. Instead of indoor/outdoor, I used a variable for the number of fumbles by each team's opponents. (Only fumbles from the games in which they played each other count, not season-long totals for every opponent.) The idea is that opponent fumbles are a fair way of gauging each game environment's "fumblerificness." For any one game there's a lot of randomness, but in aggregate it might be a clever way of looking at things.
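The opponent-fumble variable can be built from game-level data by crediting each team with the fumbles its opponent committed in that same game, then summing by team-season. The post doesn't describe its data pipeline, so the record layout, the function name, and the games below are hypothetical. A week number is included so that division rivals who meet twice in a season aren't confused.

```python
from collections import defaultdict

# Hypothetical game-level records: (season, week, team, opponent, team's offensive fumbles).
# Every game appears twice, once from each team's side.
games = [
    (2010, 1, "NE", "MIA", 1), (2010, 1, "MIA", "NE", 3),
    (2010, 2, "NE", "NYJ", 0), (2010, 2, "NYJ", "NE", 2),
    (2010, 3, "MIA", "NYJ", 1), (2010, 3, "NYJ", "MIA", 0),
]

def opponent_fumbles(games):
    """Total fumbles committed by each team's opponents in head-to-head games, per team-season."""
    own = {(season, week, team): f for season, week, team, _, f in games}
    totals = defaultdict(int)
    for season, week, team, opponent, _ in games:
        # Only the opponent's fumbles in this game, not its season-long total.
        totals[(season, team)] += own[(season, week, opponent)]
    return dict(totals)

print(opponent_fumbles(games)[(2010, "NE")])  # 5
```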
Here are the results for the first specification (indoor/outdoor). Each cell tells us how many more or fewer fumbles a team actually had compared to how many we would expect. Teams with far fewer fumbles than expected are in green, and those with far more than expected are in red. I have deliberately not sorted the table or emphasized NE in any way.
Offensive Fumbles Above Expected
Here are the results using opponent fumbles as a control for game environment. Not much difference. In fact, when I switch back and forth between the two tables, the color codes barely change.
Offensive Fumbles Above Expected (Opponent Fumble Spec.)
READ THIS IF YOU WANT TO BELIEVE NE IS INNOCENT
NE is not an outlier in any sense. They are actually #2 on the list after ATL for the period since 2006. There are other teams with nearly as impressive over-performance as NE for several-year stretches, such as ATL, IND, and DET. Should we also be investigating ATL for ball tampering?
The model may not completely account for QB "eliteness." Brady's best years have come since 2007, and the Manning-led Colts had impressive fumble over-performance for a certain period. Drew Brees' Saints seem to show a similar pattern.
NE actually had 3 more fumbles than we'd expect in 2013, breaking the string of consecutive seasons with fewer than expected fumbles.
The results show that indoor/outdoor considerations aren't that important. It's improper to throw out all the dome teams.
Bill Belichick is known to be very focused on ball security. He is very conscious of ball conditions during practice and deliberately makes his players practice with balls in poor condition. He quickly benches players who fumble, and even cuts them. We should expect NE to have good fumble numbers.
READ THIS IF YOU WANT TO BELIEVE THERE'S SOMETHING FISHY GOING ON
NE is #2 since '06, but is #1 since '07, #1 since '08, #1 since '09...you get the picture. The string of consecutive seasons of over-performance is unmistakable. One year where they might be nipped by randomness (and just barely) does not mean they don't have an advantage.
Besides, the standard here isn't that NE is in the clear unless they are an extreme outlier. It's misleading to imply otherwise. The standard should be: which explanation is the data most consistent with?
The tables above show that the data is exactly what we'd expect if a team had an advantage, yet extremely unusual and unlikely to be observed by chance alone.
Regarding ATL, yes, they are also very good with ball security. Should they be investigated too? Possibly, but I wouldn't start an investigation or make an accusation on statistics alone. The bottom line is that nothing here rules out the possibility that other teams also tamper with their footballs.
The timing of the sudden improvement is suspicious. Just a year into the new rules (starting in 2006, each team's offense could supply its own game balls), NE goes from being average overall, with just about the number of fumbles we would expect given their offensive style, to being the absolute best in the league.
NE's over-performance cannot be explained by Brady's skills or his "eliteness." Brady was the same player before and after the rule change, but his team's fumble numbers didn't change until after. Further, when the certifiably non-elite Matt Cassel held the reins in '08, NE's over-performance was as impressive as in any other year of the post-rule-change period. Those considerations, plus the fact that the model accounts for run-pass balance, completion percentage, and sacks, disprove the "Brady is just that good" argument.
PS: Ever notice that Brady underperformed his usual self in the two Super Bowls since the rule change? What's different about those games? The league provides all the balls.
WHAT DO I THINK?
First, I believe we should have very strong evidence before making accusations about cheating or ethical violations whether we're talking about football or any other endeavor. Statistics alone could never be expected to provide that level of evidence.
Regarding the two main questions of the analysis, I'm split. I agree that throwing out all the indoor teams appears to be premature. Indoor teams do fumble less frequently, but not by enough to justify excluding them from the comparison entirely. Accounting for the other predictors, the model estimates that indoor teams have only about 0.3 fewer fumbles per season than other teams, and that estimate is not statistically distinguishable from zero. That's not enough to explain why ATL and NO also fumble relatively infrequently.
On the other hand, I think this model shows that the basic nature and quality of the NE offense doesn't fully explain their over-performance. But that is tempered by the fact that there are certainly other factors the model doesn't capture.
I admit I'm also sympathetic to what I'd call the maximum likelihood explanation. Which alternative explanation is the evidence most consistent with? For those of you statisticians out there, think of it as Bayesian parameter fitting. The parameter is the set of possible explanations, i.e. "the truth": {NE tampered, NE did not tamper}. Which of these theories does the data best fit? But it's not completely clear to me which way things lean. I guess it depends on how cynical you are.
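To make the likelihood-versus-prior distinction concrete: a maximum-likelihood comparison asks only which hypothesis makes the observed data more probable, while a Bayesian comparison also weights each hypothesis by its prior probability. This sketch applies Bayes' rule to the two-hypothesis case. The likelihoods and priors are made-up numbers, not estimates from the fumble data; they only show how the same evidence can lead to different conclusions depending on where you start.

```python
def posterior_prob(prior, like_if_tampered, like_if_not):
    """Bayes' rule for two hypotheses: P(tampered | data)."""
    numerator = prior * like_if_tampered
    return numerator / (numerator + (1 - prior) * like_if_not)

# Hypothetical: the data are 4x as likely if NE tampered (0.4 vs 0.1).
print(round(posterior_prob(0.1, 0.4, 0.1), 3))  # 0.308 (skeptical prior)
print(round(posterior_prob(0.5, 0.4, 0.1), 3))  # 0.8   (cynical prior)
```

Maximum likelihood would pick "tampered" in both cases because 0.4 > 0.1; only the prior changes the posterior.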
To be honest, I'm not as interested in the statistical issue at hand as I am in the epistemological considerations surrounding it. There's a difference between wondering how often any NFL team might be so fortunate with fumbles, and wondering how likely it is that NE itself would be so fortunate in a certain period of time. It's not like we're scanning the stats of all 32 teams to find unusual patterns over any period and then making accusations of improper behavior. In this case, the accusations already exist based on non-statistical evidence, and we're scanning NE's stats in a certain period to see if they're consistent or inconsistent with the accusations. Those are very different questions, and given the same data, they yield very different answers.
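The gap between those two questions can be illustrated with a deliberately crude calculation. Assume (an assumption for illustration, not something the model establishes) that each team-season's residual is independently equally likely to land above or below zero, and take a hypothetical run of six straight seasons below expectation within one fixed window. The function names are hypothetical.

```python
def p_run_specific(seasons, p_below=0.5):
    """Chance that one team named in advance beats expectation in every one of `seasons` seasons."""
    return p_below ** seasons

def p_run_any(seasons, teams=32, p_below=0.5):
    """Chance that at least one of `teams` independent teams does so in the same window."""
    return 1 - (1 - p_run_specific(seasons, p_below)) ** teams

print(round(p_run_specific(6), 4))  # 0.0156 (1 in 64)
print(round(p_run_any(6), 3))       # 0.396
```

Under these assumptions, a run that is a 1-in-64 event for a team named in advance shows up somewhere in the league about 40% of the time. Letting the window start in any season would push that figure higher still.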
At most, we could say the numbers are consistent with the theory that there was tampering, and stop there. NE isn't some crazy outlier, but they have been the best in the league at ball security since shortly after the rule change, which is what we'd expect if the accusations were true. The numbers do not prove it by any stretch, and the alternative theory that the numbers are explained by other factors remains plausible.
HOW BIG A DEAL IS THIS ANYWAY?
Let's say that NE had exactly the number of fumbles over the last few seasons that we'd expect given their offensive parameters. How much difference would that make to the bottom line of winning games? Fumbles (not fumbles lost) average -0.055 Win Probability Added (WPA). In other words, each fumble reduces a team's chances of winning a game by about 5 or 6 percentage points. NE has averaged 5 fewer fumbles than expected in each season since '07. That means their over-performance is worth about 5 × 0.055 ≈ 0.28 WPA, a little over a quarter of a win, per season. Every little bit matters, but that's not enough to make a big dent in NE's outstanding record over that period. Then again, this only considers fumbles, and not other possible benefits of under-inflation, such as in throwing and catching.
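The arithmetic in the paragraph above, written out. Both inputs are the figures quoted in the post; summing WPA over a season gives expected wins added.

```python
WPA_PER_FUMBLE = -0.055        # average WPA of a fumble, as quoted in the post
FEWER_FUMBLES_PER_SEASON = 5   # NE's average shortfall vs. expectation since '07, per the post

wins_added = -WPA_PER_FUMBLE * FEWER_FUMBLES_PER_SEASON
print(f"{-WPA_PER_FUMBLE:.1%} win probability per fumble")  # 5.5% win probability per fumble
print(f"{wins_added:.4f} wins per season")                  # 0.2750 wins per season
```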
1. Regression results pasted below.
2. I also ran specifications where year was a set of dummy (0/1) variables. This did not change the results meaningfully, so I kept the simpler version with year as a linear predictor.
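Replacing the linear year term with year dummies means one 0/1 column per season, with one season left out as the baseline so the columns aren't collinear with the intercept. In R, wrapping year in factor() inside the lm formula does this automatically, using the first level as the baseline. The post doesn't show the formula it used for that specification, so the Python helper below is only a hypothetical sketch of the encoding.

```python
def year_dummies(years, baseline=None):
    """One 0/1 column per season except the baseline (default: earliest season)."""
    levels = sorted(set(years))
    baseline = levels[0] if baseline is None else baseline
    kept = [lvl for lvl in levels if lvl != baseline]
    rows = [[1 if yr == lvl else 0 for lvl in kept] for yr in years]
    return kept, rows

cols, X = year_dummies([2000, 2001, 2002, 2001])
print(cols)  # [2001, 2002]
print(X)     # [[0, 0], [1, 0], [0, 1], [1, 0]]
```

The dummy version lets each season have its own level instead of forcing a straight-line decline; per note 2, that flexibility didn't change the results meaningfully.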
Call:
lm(formula = ofum ~ sk + comp + inc + ratt + year + indoor, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-17.1969  -3.4299  -0.3501   3.6088  18.3044

Coefficients:
               Estimate  Std. Error t value Pr(>|t|)
(Intercept)  596.262989  116.293152   5.127 4.30e-07 ***
sk             0.154891    0.023258   6.660 7.67e-11 ***
comp          -0.005998    0.004910  -1.222  0.22240
inc            0.027522    0.009647   2.853  0.00452 **
ratt           0.007952    0.003660   2.173  0.03028 *
year          -0.292761    0.057957  -5.051 6.29e-07 ***
indoor1       -0.283076    0.586677  -0.483  0.62967
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.266 on 471 degrees of freedom
Multiple R-squared: 0.1689, Adjusted R-squared: 0.1583
F-statistic: 15.95 on 6 and 471 DF, p-value: < 2.2e-16
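A few of the summary numbers in this output can be recomputed from each other, which is a quick check on how to read it: each t value is the estimate divided by its standard error, and the adjusted R-squared and F-statistic follow from R-squared, the number of predictors (p = 6), and the number of team-seasons. The 471 residual degrees of freedom plus 7 estimated coefficients imply n = 478. Because R-squared is printed rounded, recomputed values agree only to the displayed precision.

```python
def t_value(estimate, se):
    """t statistic for a coefficient: estimate / standard error."""
    return estimate / se

def adjusted_r2(r2, n, p):
    """Adjusted R-squared; p counts predictors, excluding the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def f_stat(r2, n, p):
    """Overall F statistic for a regression with an intercept."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

n, p = 471 + 6 + 1, 6  # residual df + predictors + intercept = team-seasons
print(round(t_value(-0.283076, 0.586677), 3))  # -0.483 (indoor1)
print(round(adjusted_r2(0.1689, n, p), 4))     # 0.1583
print(round(f_stat(0.1689, n, p), 2))          # 15.95
```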
Call:
lm(formula = ofum ~ sk + comp + inc + ratt + year + oppfum, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-16.7713  -3.4957  -0.3868   3.6292  18.4993

Coefficients:
               Estimate  Std. Error t value Pr(>|t|)
(Intercept)  557.868140  117.670637   4.741 2.82e-06 ***
sk             0.154281    0.023169   6.659 7.70e-11 ***
comp          -0.007245    0.004777  -1.517  0.13006
inc            0.025897    0.009656   2.682  0.00757 **
ratt           0.006918    0.003665   1.887  0.05971 .
year          -0.273874    0.058624  -4.672 3.90e-06 ***
oppfum         0.087719    0.047598   1.843  0.06597 .
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.248 on 471 degrees of freedom
Multiple R-squared: 0.1744, Adjusted R-squared: 0.1639
F-statistic: 16.59 on 6 and 471 DF, p-value: < 2.2e-16
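Approximate 95% confidence intervals (estimate ± 1.96 × standard error) give another way to read the two environment coefficients the discussion hinges on. With 471 residual degrees of freedom the exact t quantile is about 1.965, so the normal value 1.96 is a close approximation. This is a sketch added for reading the output, not part of the original analysis.

```python
Z95 = 1.96  # normal approximation to the t quantile with 471 residual df

def ci95(estimate, se):
    """Approximate 95% confidence interval for a regression coefficient."""
    return estimate - Z95 * se, estimate + Z95 * se

lo, hi = ci95(-0.283076, 0.586677)  # indoor1, first specification
print(round(lo, 2), round(hi, 2))   # -1.43 0.87
lo, hi = ci95(0.087719, 0.047598)   # oppfum, second specification
print(round(lo, 3), round(hi, 3))   # -0.006 0.181
```

Both intervals include zero, consistent with the printed p-values of 0.63 for indoor1 and 0.066 for oppfum. The indoor interval is also wide, running from about 1.4 fewer to about 0.9 more fumbles per season.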