I have no doubt that both of those approaches can offer teams insights and benefits, but it remains to be seen just how big those benefits might be and how costly and difficult it will be to get them. I get the sense that there will be an avalanche of data and it will require large and expensive efforts to gain marginal benefits above what conventional methods offer.

That's why I remain convinced that the most direct, most demonstrable, and most actionable analytic approach in football is in-game decision support.

By

By

By

Perhaps the most important aspect of game strategy analysis is that it's the most cost-effective way to increase a team's win total, by far.

]]>

First, clients now have direct access to the full version of the tool under the Client menu. This allows up to 20,000 replications of a game and can provide very high accuracy.

Second, a limited version is now available to the public under the Tools menu. This is different from the demo I published a couple of weeks ago: it allows only up to 200 replications, so it provides limited accuracy. I'll keep it available as long as the server can handle the load.

Both versions feature the identical underlying model. Entering n=1 will continue to provide play-level output for validation. I like using this feature to watch what's happening under the hood and to gain insight into why the results are coming out the way they do.
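The accuracy gap between 200 and 20,000 replications can be made concrete with the standard error of a simulated win probability. This is only a rough sketch: it assumes each replication is an independent win/loss trial, and the function name is hypothetical rather than anything in the tool itself.

```python
import math

def wp_standard_error(p, n):
    """Standard error of a win probability estimated from n independent replications."""
    return math.sqrt(p * (1.0 - p) / n)

for n in (200, 20_000):
    se = wp_standard_error(0.5, n)  # p = 0.5 is the worst case for the error
    print(f"n={n:>6}: roughly +/- {1.96 * se:.3f} at 95% confidence")
```

Under those assumptions, 200 replications leave a margin of about 7 percentage points either way on a 50/50 game, while 20,000 replications narrow it to under 1 point.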

]]>

The demo on the sim will continue, and I appreciate any more feedback you have. You can leave comments here or on the original demo post from last week. As with any software fix, the solution usually creates some other unintended problem, so please give it another whirl.

It's also looking more likely that I'll be able to host some sort of limited version of the sim that will allow multiple replications. In other words, you'd be able to run a bunch of games from the same starting point to get an estimate of simulated win probability. The real value of a sim is in its "excursions," that is, alternate specifications of the model that can be compared. For example, an excursion would be setting up the sim to start the 2-minute offense for one team at 5 min to play rather than 2 min to play. We could compare how often that team wins in each specification, and then perhaps gain some insight into why the results come out as they do.

]]>

a) attempt the field goal right away to save time for a possible last-minute touchdown drive? Or,

b) should you continue to drive for a touchdown, and hope to get a quick field goal later?

It seems like a question we could answer with empirical win probability, but the relevant historical cases leave the intent of the offenses unclear. Teams overwhelmingly prefer option b. The answer is dependent on a number of important factors, including score, time, field position, and timeouts. We would need a large sample size for each combination of factors to build a reliable model for when (if ever) it would be a good idea to try an immediate field goal. It’s definitely a question outside the box of convention, and a perfect problem for my simulation model, the WOPR.

]]>

Instead of writing a long-winded post on how to use the tools, I thought I'd jump into the 21st century and make a short video demonstrating how they work and what the results mean. This is the first of what I hope will be a series of videos on various topics. For the time being I'm making both tools available on the fan side of the site. You can find them under the Tools menu.

]]>

There's a distinction between the WP model’s empirical methodology and its automatic output without any intervention or input from a human. When I do a detailed analysis for any specific play, I have the luxury of time and logic to dig directly into the data. The “auto” model that spits out WP estimates without any human input is based on lots of assumptions and interpolation on top of extrapolation etc. There are literally billions of combinations of game states (yd lines, downs, to go distances, seconds remaining, score difference, time outs). It’s just a matter of how much time I can put into coding the calculator to handle special cases like “a team's very last desperation play.”

With all the attention on that final play in the GB-SEA game, I thought it would be useful to look at Hail Mary success rates.

I looked at all the situations in which there were 8 or fewer seconds remaining (enough for one play) and a team needed a TD to tie or win (down by 4 through 8 points), to see how often they got that TD based on field position. This includes any plays that ultimately result in a TD, which would include any defensive penalty that enabled a subsequent scoring play. Even though SEA was specifically down by 5, the situations for being down by 4-8 are effectively indistinguishable for the purpose of estimating the chance they can get the TD.

From the 2000 through 2011 seasons, there were 223 examples in total--a little over 20 per 10-yard bin of field position. The chart below plots the TD success rate in the sample.

The Seahawks were at the 24 yd line, which would correspond to just over a 10% chance of a TD and winning the game.
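The filter-and-bin step described above can be sketched in a few lines. This is a minimal illustration, not the actual query: the tuple layout, the function name, and the 1-10, 11-20, ... bin edges are all assumptions, and the rows are toy values rather than real play-by-play data.

```python
from collections import defaultdict

def td_rate_by_bin(plays, max_seconds=8, min_deficit=4, max_deficit=8, bin_width=10):
    """plays: iterable of (seconds_left, deficit, yards_to_goal, scored_td)."""
    counts = defaultdict(lambda: [0, 0])  # bin start -> [TDs, situations]
    for seconds_left, deficit, yards_to_goal, scored_td in plays:
        # Keep only last-play situations where a TD is needed to tie or win
        if seconds_left > max_seconds or not (min_deficit <= deficit <= max_deficit):
            continue
        start = ((yards_to_goal - 1) // bin_width) * bin_width + 1  # 1-10, 11-20, ...
        counts[start][0] += int(scored_td)
        counts[start][1] += 1
    return {start: tds / n for start, (tds, n) in sorted(counts.items())}

# Toy rows only, NOT real data
toy = [(6, 5, 24, True), (8, 4, 28, False), (3, 7, 22, False),
       (5, 6, 45, False), (20, 5, 24, True), (4, 3, 10, True)]
rates = td_rate_by_bin(toy)
print(rates)
```

The last two toy rows are dropped by the filter (too much time left, and a deficit a field goal could erase), which is the same screening the sample above relies on.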

Notes:

-The one indicated TD from the offense’s own 24 was from a 2003 NO-JAX game. I'm told that's an error. Thanks for the correction.

-It’s possible in short ranges (inside the 20), teams could get off 2 plays, not just 1, with 8 sec left. Limiting the results successively to cases within 7, 6, 5 sec, etc. doesn’t affect the results around the 20-30 bin. Actually, only the 1-10 yd line bin drops, and it becomes measurably lower than the 10-20 yd bin. This might suggest that it’s easier to score from slightly farther back from the goal line. (There’s more space for receiver routes.)]]>

`Unless you're inside the 10 kicking on 3rd down isn't a good idea. Even a gain of 1 yd improves FG prob more than the chance of a bad snap.`

— Brian Burke (@Adv_NFL_Stats) December 2, 2013

Admittedly, I wrote that just based on my familiarity with the relevant numbers, so I thought I'd do the legwork. FG% improves with every yard closer a team gets. Every yard matters. In fact, every yard matters to the tune of 1.6% per yard when the line of scrimmage is between the 35-yard line and the 10-yard line.

Yesterday, Keith looked at this kind of situation in the context of the CHI-MIN game, and his results suggest the same conclusion. This post will examine play outcomes on 3rd down when the game is on the line and teams are in deep FG (

Run plays average 4.3 yards, and pass plays average 6.3 when a team is down by 0 through 3 points in the final 3 minutes of a game and their field position is between the 35 and 25. (Those seem long to me, too--but the averages are 3.8 for runs and 4.6 for pass plays in situations under 1 minute to play.) But averages aren't everything--here are the distributions:

We also need to consider turnovers, which would typically be fatal or near-fatal. But there's not much to consider. In the last 13 seasons, there has never been a fumble lost on a 3rd down prior to a make-or-break long FG attempt, and there have only been 3 interceptions. Those 3 interceptions have come on 173 pass plays, for a 1.7% rate. Offenses are wisely protecting the ball.

A back-of-the-envelope analysis says that a run on 3rd down, which averages 4.3 yards, improves the chance of making the FG by 4.3 yds * 1.6% = 6.8%. A pass on 3rd down improves the chance of making the FG by 6.3 yds * 1.6% = 9.9%. But when we consider the cost of a turnover, the benefit is cut by, say, about 2%. Ultimately, we'd be safe saying that running a conventional scrimmage play on 3rd down improves the chances of winning by a net of at least 4%.

What about bad snaps and holds? Unfortunately those aren't official league statistics. So how can we estimate their likelihoods? Well, extra points don't get any easier as a matter of a kicking exercise, so let's proffer that

XPs were successful on 99.5% of all attempts in 2012 and 99.7% so far in 2013. The numbers for FGs inside the 29 are just as high. Someone with game-charted play-by-play might have a better number, but I'm confident bad snaps and bobbles occur in fewer than 1 in 200 FG attempts, or 0.5%.

Even with the most conservative of assumptions, a conventional play on 3rd down improves the chances of winning, on average, by at least 8 times the risk of a bad snap or hold.
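The arithmetic behind that comparison can be laid out explicitly. This is a rough sketch using only the rounded figures quoted above (1.6 points per yard, about 2 points of turnover cost, and a 0.5% ceiling on bad snaps and holds); the constant and function names are made up for illustration.

```python
FG_GAIN_PER_YARD = 1.6  # percentage points of FG probability per yard (35 to the 10)
TURNOVER_COST = 2.0     # rough allowance for turnover risk, in percentage points
BAD_SNAP_RISK = 0.5     # assumed ceiling on bad snaps and holds, in percentage points

def net_benefit(avg_yards):
    """Net gain in FG probability (percentage points) from running a 3rd-down play."""
    return avg_yards * FG_GAIN_PER_YARD - TURNOVER_COST

for label, yards in (("run", 4.3), ("pass", 6.3)):
    gross = yards * FG_GAIN_PER_YARD
    print(f"{label}: gross {gross:.1f}, net {net_benefit(yards):.1f}")

conservative_net = 4.0  # the "at least 4%" floor used in the text
print(conservative_net / BAD_SNAP_RISK)  # how many times larger than the snap risk
```

With the rounded 1.6 figure the gross gains land a hair above the 6.8% and 9.9% quoted earlier, presumably because those were computed from an unrounded per-yard rate. Either way the net benefit stays above the 4% floor, which is 8 times the 0.5% snap-and-hold risk.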

Inside the 10 yard line, things change

But what about the value of peace of mind to the snapper and holder, as @dmv726 pointed out on Twitter? I don't think there's much to that, based on what I've learned over the years. It's equally likely players focus even more in such situations. But the real evidence is that FG attempts are no less successful in clutch situations, so it's extremely unlikely snaps and holds are adversely affected by pressure.

Run a play on 3rd down and gain some yards. "Field goal range" is a myth. Closer is better, and the risks aren't big enough to make the difference.]]>

Watching the Sunday night game this week, I was struck by how the Bears returner struggled to get past the 20. The Bears began their first four drives at their own 6, 15, 20, and 19. Their returner was bringing the ball out from deep in the end zone, and I wondered if it were simpler, safer, and better in the long run to just take the touchback. Does the risk of penalties, short returns, and turnovers outweigh the potential reward of a good return?

Based on the Bears' first four returns, no. But later in the game, their strategy paid off with a touchdown return. One lucky break does not make a strategy, however, so I looked at all the numbers, and I think I know the answer.]]>

The

Two strategies are compared. The first is playing for the stop and forcing the FG attempt. This may be dangerous due to the ability of the offense to burn clock. The second strategy is to allow an immediate TD. This strategy forfeits points to the opponent in exchange for enough time to respond with a game-winning TD drive.

There is no guarantee the offense will take the bait and score a TD. If the offense is cognizant of the strategy, they may take a knee close to the goal line. So strictly speaking, this analysis merely estimates

There are many considerations in the analysis:

1. The current time on the clock

2. How many timeouts you have

3. When you would get the ball back given a series 'stop' and a FG

4. The current field position of your opponent

5. The accuracy of a FG attempt from the expected attempt distance

6. The chance of scoring either a FG or TD after the 'stop' and FG

7. The chance of scoring a TD from your own 20 given an intentionally allowed TD at the current clock time, minus the time for the TD play itself

The problem boils down to a single comparison: A defense would prefer to intentionally allow a TD whenever the probability of scoring a TD in response exceeds the total probability of winning by forcing the FG attempt, that is, the probability that the offense misses the FG plus the probability that the FG is made and either a FG or TD can be scored in response.

It's a very complex problem with many moving parts. To simplify the analysis, there are several assumptions needed. The intent here is to begin to get our arms around a seemingly impenetrable problem. First, for now, I'll only look at first downs as decision points. In other words, we'll decide whether to play for the stop or to allow a TD immediately following a series conversion by the offense.

Second, once within reasonable field goal range, the offense will only run the ball and will not make another conversion. Of course they could convert, but this would be the least preferable outcome for the defense. The possibility of a series conversion would only add weight to the scale on the side of intentionally allowing a TD. As with other analyses questioning conventional wisdom, it's best to choose simplifying assumptions that count in favor of the conventional choice and against the unconventional choice. In other words, this analysis says, "Coach,

Another assumption is that the team on offense will play smartly enough not to commit a significant penalty or turnover.

Also, the offense will gain a modest amount of field position on its three plays prior to a FG attempt. For the sake of simplicity, I'm going to say there will be 5 yards gained between 1st and 4th down prior to the FG attempt. Unless the offense is at a very long FG attempt range, a few yards in either direction will not make a large difference in the final analysis. Additionally, the offense will use 39 seconds between plays (whenever a timeout is not called or two-minute warning does not occur), and plays themselves will take 6 seconds.

Lastly, the analysis assumes that after a score, the subsequent drive will start very near a team's own 20-yard line. This is plausible because touchbacks are now so common that the average starting position following a kickoff is a team's own 22, and touchbacks would be preferred by the receiving team because no time elapses on the play.

If a team does allow a TD, it would need its own in response to win. The probability of scoring a TD is a function of only time remaining. The probability estimates for scoring are based on recent history where teams need a TD to tie or win on a final possession.

If the team on defense forces the FG attempt and it's successful, a TD or FG in response would be needed to win. The probability of scoring either a TD or FG is also a function of time and based on recent historical scoring rates for teams that need a score to tie or win in the endgame.

The main question boils down to three possibilities:

A. Forcing a stop and hoping for a FG miss

B. Given an opponent's made FG, getting your own FG or TD in response after the opponent has burned as much time as possible

C. Getting your own TD in response to an intentionally allowed TD at the current time remaining (minus the duration of the play).

Possibilities A and B comprise the win probability (wp) for forcing the stop and the FG attempt. Possibility C directly relates to the chances of winning after allowing an intentional TD. [I left wp in lowercase to distinguish it from the global Win Probability model I often use. This analysis estimates the chances of winning

Or alternatively:

wp[force FG] = p(FG fail) + p(FG made) * p(scoring | made FG)

wp[allow TD] = p(scoring own TD in response)

The decision should be whichever wp is higher:

Decision = max{wp[force FG], wp[allow TD]}

The next four parts of this series will estimate the time that the team on defense will regain possession (part 2), estimate the probability of a failed FG attempt (part 3), estimate the probabilities of the team on defense responding with its own score (part 4), and present the final results (part 5).

This is the second part of a five-part series on when a defense would prefer an intentionally allowed TD to forcing a FG. The first part laid out the analysis and assumptions. This part explains how to estimate the time remaining when the defense would regain possession following a forced FG attempt by the offense.

The first task of the analysis was to create an algorithm to compute the time on the clock when the team on defense would get the ball back following a forced FG. This is a function of current time and time outs remaining for the defense. For example, suppose the offense has just converted a series so that the 1st down snap will happen at 1:20, and the defense has two timeouts. The offense will run three times, you'll call both timeouts, and following a FG, you'll probably get the ball back with 17 seconds remaining. The two-minute warning is factored in, which is more challenging than it might seem.

The time-you-get-the-ball-back algorithm assumes that the defense will use its timeouts at every immediate opportunity. The only exception will be when the play itself spans the 2-minute warning. For example, if there is 2:10 on the clock at the snap and the play duration is 6 seconds, the defense will call a timeout at 2:04 rather than allow the clock to wind down to the 2-minute warning.

However, there is a special case where the defense may want to allow the clock to run down to the two-minute warning rather than use all its timeouts. For example, if there is 2:10 remaining between 2nd and 3rd down and a team has 2 timeouts remaining, it may choose to allow the clock to wind down to 2:00. The third down snap would occur following the two-minute warning, and the defense would call its 2nd timeout between 3rd and 4th down, at around 1:54. This would allow the defense to save one timeout for use on offense.

I did not account for this scenario in my analysis. The reason was that timeouts on offensive final drives did not have a discernible effect on scoring success. In fact, the number of timeouts available to the offense appeared to have a slightly negative effect. I believe this is an artifact of sample error, but the fact remains that I can't estimate the value of an offensive timeout with the current data. For that reason, I said teams would always use their timeouts while on defense, which

The graph below indicates how much time will be on the clock at kickoff (vertical axis) based on the time remaining at the 1st down snap (horizontal axis). This assumes the offense will attempt to keep the clock running on 1st, 2nd and 3rd downs, and the defense will use its timeouts whenever a play does not span the two-minute warning. It also assumes the offense will consume 39 seconds between plays whenever the clock is allowed to run and that the duration of each play is 6 seconds.

Each color represents a number of timeouts remaining to the defense. The sharp vertical segments indicate the effect of the two-minute warning.

Here are two examples to help understand this chart. Suppose the defense has two timeouts and the 1st down will occur with 1:40 (100 seconds) to play. On the horizontal axis find the 100 mark and move upward until you hit the green (2 timeouts) line. Look directly leftward from that point to the vertical axis to see that there will be 37 seconds remaining. Here's the breakdown from the algorithm for this example:

1st down snap: 100

Timeout

2nd down snap: 94

Timeout

3rd down snap: 88

4th down snap: 43

Get ball back at: 37

Timeouts left: 0

Now suppose the defense has one timeout and the 1st down will occur with 2:10 (130 seconds) to play. On the horizontal axis find the 130 mark and move upward until you hit the red (1 timeout) line. Look directly leftward from that point to the vertical axis to see that there will be 67 seconds remaining. Here is that breakdown:

1st down snap: 130

Timeout

2nd down snap: 124

Two minute warning

3rd down snap: 118

4th down snap: 73

Get ball back at: 67

Timeouts left: 0
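The clock rules described above can be sketched as a short function. This is a reconstruction from the stated assumptions (6-second plays, 39 seconds between plays when the clock runs, timeouts used at every opportunity unless the play itself spans the two-minute warning), not the original algorithm; the function and constant names are hypothetical, and it does not model the special case of saving a timeout for offense.

```python
PLAY_SECONDS = 6     # assumed duration of each play
RUNOFF_SECONDS = 39  # assumed time between plays when the clock runs
TWO_MINUTE = 120     # two-minute warning, in seconds remaining

def time_at_kickoff(first_down_snap, timeouts):
    """Return (seconds left when the defense gets the ball back, timeouts left)."""
    clock = first_down_snap
    for _down in (1, 2, 3):
        end_of_play = clock - PLAY_SECONDS
        if clock > TWO_MINUTE >= end_of_play:
            # The play spans the two-minute warning: free stoppage, keep the timeout
            clock = end_of_play
        elif timeouts > 0:
            timeouts -= 1
            clock = end_of_play
        else:
            next_snap = end_of_play - RUNOFF_SECONDS
            if end_of_play > TWO_MINUTE > next_snap:
                next_snap = TWO_MINUTE  # a running clock stops at the warning
            clock = next_snap
    clock -= PLAY_SECONDS  # the 4th-down FG attempt itself
    return max(clock, 0), timeouts  # clamp: the clock cannot go below zero

print(time_at_kickoff(100, 2))  # first worked example
print(time_at_kickoff(130, 1))  # second worked example
```

Run on the two worked examples, this sketch returns 37 and 67 seconds with no timeouts left, matching the breakdowns, and it gives 17 seconds for the 1:20, two-timeout case mentioned earlier.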

These numbers are critical to the analysis because they will largely determine the likelihood of scoring in response to either a forced FG or an allowed TD, which in turn will determine which is the better option.

The next three installments will estimate the probability of a failed FG attempt (part 3), estimate the probabilities of the team on defense responding with its own score (part 4), and present the results (part 5).

This is the third part of a five-part post on when an intentionally allowed TD is preferable to forcing a FG in the endgame. The first two parts of the post laid out the analysis and assumptions, and estimated the time remaining when the team on defense would get back possession. This installment looks at the probability of an unsuccessful FG attempt to take the lead.

The simplest way to win for a defense caught in the

Also, as mentioned previously, an estimate of the expected field position on 4th down is required. For now, I will say that the offense will gain 5 yards during its 1st, 2nd and 3rd down plays. This is essentially a plausible placeholder for now, and a more detailed analysis can be done to confirm or adjust this.

The graph below shows three things. The jagged blue line is the actual raw FG success rate by line of scrimmage. (Add 17 or 18 yards for the commonly used 'kick distance'.) The red line is the estimated true probability of success. This estimate was computed non-parametrically, using locally weighted regression. The green line is the same curve as the red line, but offset 5 yards closer to the uprights. (Inside the 10, the 5 yards are progressively curtailed.)

But what really matters is the complement of the chart above. What we care about is the expected probability of failure, that is, one minus the probability of success. The chart below provides a different perspective, emphasizing how the FG failure rates dramatically increase with range.

The final two parts of this post will look at the probabilities of a responding scoring drive (part 4) and will present the final results of when a defense would prefer to allow a TD to forcing a FG (part 5).

If the defense plays conventionally, and the opponent's FG is successful, a score will be needed to win. Either a FG or TD will do. We can assume that the drive will begin at or very near the offense's own 20-yard line for a couple of reasons. First, the average starting field position for all drives is the 22. And second, it's very likely that, with time at a premium, the offense would prefer a touchback so that no time expires on the kickoff.

For this estimate, I looked at all game situations in which an offense needed a score to survive and had a 1st down at or very near its own 20. Success is defined as any drive that results in a TD or FG.

The blue line is the raw success rate as a function of seconds remaining at the 1st down snap. The red line is the smoothed estimate of the probability of scoring based on a local regression. Because there were several bins of data with very few cases (which caused the large noisy swings in the raw averages), I used a regression method that weighted each case by how often it appeared. In other words, if there were 10 cases where a team gained possession with around 50 seconds to play and 20 with around 60 seconds to play, the regression weighted each bin of cases proportionately.
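Here's a minimal sketch of what a case-weighted local regression of this kind might look like. The source says only that the smoothing was a local regression with bins weighted by their number of cases; the local linear fit, the tricube kernel, the bandwidth, and the function name are all assumptions, and the bins below are toy values rather than real data.

```python
def local_linear_smooth(xs, ys, counts, x0, bandwidth):
    """Estimate y at x0 by weighted least squares on bins within `bandwidth` of x0.

    Each bin's weight is its case count times a tricube kernel on distance from x0.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for x, y, n in zip(xs, ys, counts):
        d = abs(x - x0) / bandwidth
        if d >= 1.0:
            continue  # bin is outside the local window
        w = n * (1.0 - d ** 3) ** 3
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    if sw == 0.0:
        raise ValueError("no data within bandwidth")
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # only one distinct x in the window: use the weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0

# Toy bins only, NOT real data: seconds remaining, raw success rate, number of cases
seconds = [20, 30, 40, 50, 60, 70]
raw_rate = [0.05, 0.20, 0.15, 0.30, 0.25, 0.35]
cases = [12, 8, 3, 10, 20, 6]
smoothed = [local_linear_smooth(seconds, raw_rate, cases, s, bandwidth=35) for s in seconds]
print([round(v, 3) for v in smoothed])
```

Weighting by case count means a thin bin with only 3 cases pulls the curve far less than a neighboring bin with 20, which is what tames the noisy swings in the raw averages.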

The probability of success drops precipitously under 30 seconds to play. That appears to be the bare minimum amount of time a team needs to get into FG range from its own 20. Beyond 30 seconds, each additional second of time remaining is only marginally more valuable.

In this case, the offense now needs a TD of its own. Like the situation that requires a response to a FG, the team now on offense is assumed to gain possession at or very near its own 20-yard line. The preceding TD play was estimated to consume six seconds.

For this estimate I looked at all endgame situations in which the offense needed a TD to survive and had a 1st down very near its own 20. The graph below plots the success rate by time remaining. As mentioned previously, timeouts remaining for the offense do not have a measurable effect. The jagged blue line is the raw data, and the red line is the smoothed estimate.

Without the possibility of a FG, the curve appears smoother than the probability of either a TD or FG, at least according to the regression. I intuitively suspect that the true probability is closer to the initial steepness that the raw numbers indicate, but for now I'm leaving the regression as is. It's another factor I'd be willing to revisit.

Comparing the two charts (FG or TD needed vs. TD needed) suggests that teams play differently, and perhaps irrationally, based on the score situation. When time is not too pressing (with about 2 minutes or slightly more to play), offenses that need a TD appear slightly

Next, the fifth and final part of this series will put everything together and present the results.

This is the fifth and final part of the series on when a defense would prefer an intentionally allowed TD to forcing a FG. The previous four parts laid out the analysis and assumptions, computed the time remaining when the defense would regain possession, estimated the probability of a failed FG attempt, and estimated the probabilities of the team on defense responding with its own score. Now it's time to put all the parts together and present the results.

Taking a step back, the goal is to compare two strategies for the defense. The first is to play conventionally and force a stop and a FG attempt, hoping it will either fail or that there is enough time to match it with a counter-score. The second is to intentionally allow a TD immediately and use the time remaining to respond with a counter-TD.

So far, we have estimates for the key inputs:

-When the team on defense would get the ball back

-The probability of failure on the FG attempt

-The probability of responding to a made FG with a score

-The probability of responding to an allowed TD with a TD

The allow-the-TD strategy is the simpler one to value. We can account for the time of the intentional TD play and plot the probability of responding with another TD as a function of time. However, there is one wrinkle. If the team on defense is only ahead by one point, the offense would be smart to go for the two point conversion following a TD, allowed or not. If the offense converts the two point conversion, a response TD only ties. If the two point conversion fails, it's no different than kicking the extra point. A response TD wins either way. The offense therefore has nothing to lose by going for the two point conversion.

For when the team on defense is ahead by two points:

wp[allow TD] = p(scoring own TD in response)

But for when the team on defense is ahead by one point, and the offense would go for the two point conversion to take a 7-point lead:

wp[allow TD] = p(scoring own TD in response) * [p(2-pt conv fails) + 0.5 * p(2-pt conv succeeds)]

The value of forcing the FG is slightly more complicated because it combines the possibility of a failed FG with the possibility of responding with another score.

This is a total probability computation, much like we do for typical 4th down decisions. The probability of scoring is a function of time remaining, and the probability of a failed FG is a function of field position. The calculation becomes:

wp[force FG] = p(FG fail) * 1 + p(scoring) * (1 - p(FG fail))
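Put together, the comparison can be sketched as below. This is a minimal illustration rather than the actual model: the function names are hypothetical, the input probabilities are made up, and the one-point-lead case is read as valuing a tie at 0.5 only when the response TD is actually scored. The 0.47 default is the two-point conversion rate chosen for this analysis.

```python
def wp_force_fg(p_fg_fail, p_score_after_fg):
    """Total probability: the FG misses, or it is made and the defense answers with a score."""
    return p_fg_fail * 1.0 + (1.0 - p_fg_fail) * p_score_after_fg

def wp_allow_td(p_td_response, lead, p_two_pt=0.47):
    """Win probability after allowing a TD, for a defense leading by 1 or 2 points."""
    if lead == 2:
        return p_td_response  # a response TD wins outright
    if lead == 1:
        # Offense goes for two: a response TD wins if it failed, only ties if it succeeded
        return p_td_response * ((1.0 - p_two_pt) + 0.5 * p_two_pt)
    raise ValueError("only 1- and 2-point leads are covered here")

def decide(p_fg_fail, p_score_after_fg, p_td_response, lead):
    force = wp_force_fg(p_fg_fail, p_score_after_fg)
    allow = wp_allow_td(p_td_response, lead)
    return ("allow TD" if allow > force else "force FG"), force, allow

# Made-up inputs for illustration only
print(decide(p_fg_fail=0.05, p_score_after_fg=0.05, p_td_response=0.20, lead=2))
```

In practice the two scoring probabilities would be looked up at different clock times: the time the defense gets the ball back after the forced FG, and the current time minus the TD play for the allowed TD.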

There are too many variables to show in a single illustration, so presenting the results required some creativity. There are timeouts, field position, time remaining, plus the result variable, win probability. To simplify things, the results are broken out into separate graphs for each possible number of timeouts remaining. Also, field position is represented by multiple lines on each graph, with each color denoting a 5-yard increment.

I'm going to go out of order so I can illustrate the results with a prominent example from Super Bowl 46 between the Giants and Patriots. Up by 2 points, the Patriots defense took the field to stop a final Giants drive that started on the New York 12 with 3:46 to play. It took only three plays for the Giants to make it inside New England's 35.

The graph below shows the win probability for defenses with two timeouts remaining. The horizontal axis is time remaining at the 1st down snap. The vertical axis represents the wp for the various situations described in the curves. The black line is the wp for allowing an intentional TD. The colored lines are the wp for forcing the FG attempt and each one represents the field position at the 1st down snap. Wherever the black "allow TD" line is higher, an immediate offensive TD would be preferable to forcing a FG.

You'll notice two abrupt vertical inclines in the colored curves for the force-FG option. The leftward one is due to the rapidly increasing probability of responding to a made FG with a score with respect to time. The second is due to the two minute warning. The force-FG option curves are so irregular because the time the defense would get the ball back is so irregular. The allow-TD curve is smooth because the time the defense would get the ball back is nearly immediate.

The Giants had three first downs inside FG range. The first (1) was at 2:52 at the NE 34. The second (2) was at the NE 18 immediately following the two minute warning. The third (3) was at 1:09 at the NE 7.

As the chart shows, 1st down #1 was well above the

Curiously, Patriots coach Bill Belichick did not call a timeout between play #2 and play #3. Not until following a 1-yard gain on 1st and goal from the 7 did Belichick call his second timeout. On the very next play, Ahmad Bradshaw was (by most accounts) allowed to score the TD. Ultimately, the Patriots got the ball with 57 sec to play and one timeout remaining.

Had NE called a timeout prior to 1st down #3 and the identical events unfolded--NYG scoring a TD on their subsequent 2nd down--NE would have had an estimated 20% chance of winning instead of the 6% or so they had when they actually took possession. It's possible Belichick was hoping that somehow time would run out on the Giants. But it's more likely that, with two timeouts in his pocket, Belichick chose to wait to see how the first down play turned out before deciding to use them. If his defense held, he would use one, but if his defense allowed a conversion, he would wait to see how the subsequent first down play went. I think this was a mistake because at any time the very next play could be a touchdown, and he'd rather have the extra 39 seconds than an extra timeout on offense.

Here are the resulting charts for when an immediate TD is preferable to forcing a FG. (Suitable for lamination, coaches!) With no timeouts remaining, the situation is very dire, and there is a relatively large window for preferring to allow a TD (or for taking a knee). The solid black line is the wp for when the team on defense leads by two points. The dashed black line is the wp for when the team on defense leads by one point. Note: I chose a value of 47% for the chance of the offense converting a two point conversion in the case of the 1-point lead for the defense.

As a reminder, wherever the first down situation is above the appropriate black line, the preferred option is to force the stop and FG attempt. Wherever the situation is below the appropriate black line, the preferred option is to allow the TD.

With a single timeout, the window gets smaller as the defending team's ability to respond to a made FG comes into play.

Here is the chart for two timeouts remaining, which we saw earlier in the example from Super Bowl 46.

With all three timeouts available to the defense, the immediate TD is almost never preferable to forcing the FG. There's just a tiny window with about a minute left and the ball inside the 15.

There's more work to be done. As pointed out by a commenter, if the offense misses its FG attempt but still has timeouts and time on the clock, the probability of winning by making a stop would be lower than I've estimated here. We also want to know the numbers for when the game is tied, or when the defense is up by three.

This is why football is uniquely compelling. In what other sport would it be better to allow your opponent to achieve a major score? When would you prefer that your opponent score a goal in hockey or soccer or lacrosse? When would you want your opponent to ever hit a three-pointer? What about baseball or cricket? Sure, you'd prefer to walk in one run to save four runs, but that's intuitive, the same way a football defense would normally prefer to give up 3 points instead of 7.

This may be the most complex, most challenging, and most counter-intuitive analysis I've done. There were some assumptions made in this analysis that could use some refinement, but I think we've got our arms around the problem, and we have a framework for further research. We also have a clear way of presenting the results so a coach can look them up quickly in the heat of battle.]]>

Previous studies on 4th down decision-making include Carroll, Palmer, and Thorn's book Hidden Game of Football (1988, 1998) and Professor David Romer's Do Firms Maximize? (2005). The first serious study of the concepts used in these studies was by former NFL quarterback Virgil Carter, who co-authored an operations research paper examining the value of field position using data from the first 56 games of the 1969 season.

My own analysis published in this post largely repeats the methods used in previous studies. But I think I can add a good deal to the topic. First, this analysis is based on a much larger data set compared to previous research. Second, this analysis offers possible confirmation of previous results. Third, I think I explain a complex, abstract subject such as this in a straightforward manner, which is essential if the 4th down revolution is going to make any headway.

This is how the study goes: At each yard line, I'll calculate and compare the expected point value, based on recent historical averages, of each of the three 4th down options--punt, field goal, or go for it. The option with the highest value is the recommended choice.

'Expected Points'

Speaking of abstract subjects, this entire analysis rests upon the foundation of the Expected Points (EP) concept. Most readers here may already be familiar with EP, but I'm going to summarize it now before I move on to the meat of the study.

EP is the average potential points a team can expect given a certain situation. The most common example is the potential point value of a 1st down at each yard line on the field. EP is the average of all 'next' score values at any given yard line. It's not necessarily the average points scored on the current possession because possession could be exchanged several times before the 'next' score. EP is positive when the offense will usually score next, and negative if the defense will usually score next.

Here is the EP graph for a 1st down at each field position. These EP values are based on data from 2,400 NFL games from the 2000-2008 seasons. I used only data from the 1st and 3rd quarters to exclude situations hurried by an expiring clock and by desperate teams or teams with large leads playing differently late in games.

I'll be referring to the graph several times, so be sure you understand how to read it. A 1st down on an opponent's 20 is worth 3.7 EP. But a 1st down on an offense's own 5 yd line (95 yards to the end zone) is worth -0.5 EP. The team on defense is actually more likely to eventually score next.

Note that a 1st down at an offense's own 27 yd line is worth 0.7 EP. This is critical to explain an important twist in the EP concept. Every score requires a subsequent kickoff, and this has value to the receiving team. So to understand the real value of the score, we need to subtract the value of the kickoff. For example, field goals aren't really worth 3 points. In the long run, they're worth 3 - 0.7 = 2.3 EP. And touchdowns are really worth 7 - 0.7 = 6.3 EP.

This concept is especially apparent when considering safeties. After a safety, the scoring team gets the ball back, on average at its own 40, which is equivalent to 1.3 EP. A safety is therefore really worth 2 + 1.3 = 3.3 EP. These score values and the resulting EP values are used throughout the rest of this analysis.
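The kickoff adjustment is just a few lines of arithmetic. Here is a minimal sketch using only the figures quoted above. The names `KICKOFF_EP`, `SAFETY_FREE_KICK_EP`, and `net_score_value` are illustrative, and the 7-point touchdown assumes a made extra point, which is what the 6.3 figure implies.

```python
# Net value of a score = nominal points adjusted for who gets the ball next.
KICKOFF_EP = 0.7           # EP of a 1st down at a team's own 27 (from the text)
SAFETY_FREE_KICK_EP = 1.3  # EP of a 1st down at a team's own 40 (from the text)

def net_score_value(points, next_possession_ep):
    """Nominal points plus the EP of the next possession, from the scorer's view.

    next_possession_ep is negative when the opponent receives the ball.
    """
    return points + next_possession_ep

field_goal = net_score_value(3, -KICKOFF_EP)
touchdown = net_score_value(7, -KICKOFF_EP)        # assumes a made extra point
safety = net_score_value(2, +SAFETY_FREE_KICK_EP)  # scoring team gets the ball back

print(round(field_goal, 1), round(touchdown, 1), round(safety, 1))
```

The sign of the second argument is the whole point: after a field goal or touchdown the opponent gets the ball, while after a safety the scoring team does.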

'Expected Outcomes' in General

Before I go any further, I'm going to take a step back and explain the concept of 'expected outcomes' in general. Say I can choose one of two routes home from work each day. One route is through the side roads, and the other is via the highway. If I take the side roads my commute always takes 20 minutes.

But the highway is more dicey. Sometimes it's clogged with traffic and can take 25 minutes to get home, but most other times traffic is clear and it only takes 15 minutes to get home. The highway is backed up 40% of the time and clear 60% of the time. Which route should I normally choose?

My 'expected commute time' can be calculated using simple proportions. The side roads take 20 minutes 100% of the time, so that's easy--the expected time for that route is 20 minutes. But the highway's expected time would be:

(25 min * 0.40) + (15 min * 0.60) = 19 min

So over the long run, the highway is the better option. Unfortunately, life isn't that simple. We don't have such clear-cut options and we don't know the probabilities and payoff values of our decisions. But in football, we do.
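For readers who prefer code to arithmetic, the commute comparison can be sketched as a probability-weighted average. The helper name `expected_value` is my own, not something from a library.

```python
def expected_value(outcomes):
    """Probability-weighted average of (probability, value) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)

# Commute times in minutes, with the probabilities from the example.
routes = {
    "side roads": expected_value([(1.0, 20)]),
    "highway": expected_value([(0.40, 25), (0.60, 15)]),
}

# Lower expected commute time is better.
best_route = min(routes, key=routes.get)
print(routes, best_route)
```

The same helper works for any decision with known probabilities and payoffs, which is exactly the situation the 4th down analysis relies on.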

Punts

The EP value of the punt option is relatively straightforward. Based on recent historical data, we know the average net distance for punts from each yard line. The closer a team is to the end zone, the shorter a punt will tend to be due to touchbacks. Since we know the net distance of the punt, we know the expected subsequent field position for the opponent.

For example, a punt from a team's own 40 (60 yds from the end zone) nets around 37 yards, giving the opponent a 1st down at their own 23. This corresponds to 0.5 EP for the opponent, which is -0.5 EP for the punting team.
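The punt calculation is a lookup plus a sign flip. The sketch below assumes a hypothetical `punt_ep` helper and an EP lookup that holds only the points quoted in this post. A real table would cover every yard line.

```python
def punt_ep(yards_to_goal, net_punt_yards, opponent_ep_at):
    """EP of punting, from the punting team's point of view.

    yards_to_goal: distance from the line of scrimmage to the end zone being attacked.
    opponent_ep_at: function mapping the opponent's own yard line to its 1st-down EP.
    """
    # After the punt the opponent starts this many yards from its own goal line.
    opponent_yard_line = yards_to_goal - net_punt_yards
    return -opponent_ep_at(opponent_yard_line)

# Only the EP values quoted in the text (own yard line -> EP).
QUOTED_EP = {23: 0.5, 27: 0.7, 40: 1.3}

def quoted_ep_at(own_yard_line):
    return QUOTED_EP[own_yard_line]

# Punt from a team's own 40 (60 yards from the end zone), netting 37 yards.
print(punt_ep(60, 37, quoted_ep_at))
```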

Field Goal Attempts

The EP value of a FG attempt is based on the probability of making the kick, which is dependent on kick distance. Just like taking the highway home from work, we can calculate the overall value of a FG attempt. Below is the graph of FG percentage by line of scrimmage.

A successful FG is worth 3 points minus the value of the ensuing kickoff for a total of 2.3 points. A missed FG is worth the EP value of a first down for the opponent at the spot of the kick (or the 20 yd line, whichever is larger).

For example, with the ball on the 20 yard line, the NFL average FG percentage is 82%. The spot of the kick would be the 27, which corresponds to 0.7 EP (that's -0.7 EP for the FG kicking team). Therefore the EP value of a field goal attempt from the 20 would be:

(0.82 * 2.3) + ((1-0.82) * -0.7) = 1.8 EP
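Here is the same field goal calculation as a small function. This is a sketch under stated assumptions: the 7-yard gap between the line of scrimmage and the spot of the kick is inferred from the examples in this post (20 to 27, 37 to 44), and the function and parameter names are illustrative.

```python
def field_goal_ep(yards_to_goal, make_prob, opponent_ep_at,
                  made_value=2.3, snap_to_kick_yards=7):
    """EP of a FG attempt, from the kicking team's point of view.

    opponent_ep_at maps the opponent's own yard line to its 1st-down EP.
    snap_to_kick_yards=7 is an assumption inferred from the text's examples.
    """
    # On a miss the opponent takes over at the spot of the kick or its own 20,
    # whichever is farther from its goal line.
    miss_spot = max(yards_to_goal + snap_to_kick_yards, 20)
    miss_value = -opponent_ep_at(miss_spot)
    return make_prob * made_value + (1 - make_prob) * miss_value

# Only the EP values quoted in the text (own yard line -> EP).
QUOTED_EP = {27: 0.7, 44: 1.1}

fg_from_20 = field_goal_ep(20, 0.82, QUOTED_EP.__getitem__)
print(round(fg_from_20, 2))
```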

Going For It

The value of a successful conversion attempt would be at least the EP value at the 1st down marker. Often, conversion attempts would obviously go further than the marker, but for now let's consider the minimum value of the conversion. The minimum value of an unsuccessful conversion attempt would be the EP value of a 1st down for the opponent at the spot of the attempt.

The probability of a successful conversion is primarily dependent on the distance to go. Field position also affects the chances of success due to the compression of the field in the red zone. With less area to defend, the defense's task is easier. The graph below plots the probability of a successful conversion by distance to go, broken out by areas of the field.

For example, a 4th down and 3 at the 50 yd line could be converted 56% of the time. A successful conversion would (at worst) give the offense a 1st down at the opponent's 47, worth 1.9 EP. And an unsuccessful conversion would give the opponent a first down at the 50, worth 1.8 EP to them and -1.8 EP to the current offense. The value of going for it is therefore:

(0.56 * 1.9) + ((1-0.56) * -1.8) = 0.3 EP
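As a sketch, the conversion attempt has the same two-outcome structure. The function name `go_for_it_ep` is hypothetical, and the inputs are the values that appear in the formula above. Because a successful play is assumed to stop exactly at the marker, the result is a lower bound.

```python
def go_for_it_ep(convert_prob, ep_if_converted, opponent_ep_if_failed):
    """Lower-bound EP of a conversion attempt, from the offense's point of view.

    ep_if_converted: EP of a 1st down at the marker (the minimum gain).
    opponent_ep_if_failed: opponent's EP for a 1st down at the spot of the attempt.
    """
    return (convert_prob * ep_if_converted
            - (1 - convert_prob) * opponent_ep_if_failed)

# 4th and 3 at the 50: 56% conversion rate, 1.9 EP on success, 1.8 EP to the opponent on failure.
midfield_go = go_for_it_ep(0.56, 1.9, 1.8)
print(round(midfield_go, 2))
```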

A Sample Situation

Let's say you're the coach of a team facing a 4th down and 3 from the opponent's 37. It's early in the second quarter and the score is tied. Should you call a punt, attempt a FG, or go for it? In reality, coaches have called for the punt 100% of the time in close games early in the second quarter. But is this the best thing to do?

Let's start with a punt. From the 37, we would expect a net punt distance of 23 yards, coming to the 14 yd line on average. The 14 yd line corresponds to -0.2 EP for the opponent, which is +0.2 EP for your team.

A FG attempt would be successful 45% of the time. A made FG would yield 2.3 EP factoring in the kickoff. A missed FG would give the ball to the opponent at his own 44. This is worth 1.1 EP to him and therefore -1.1 EP to us. The total expected point value of a field goal attempt would be:

(0.45 * 2.3) + ((1-0.45) * -1.1) = 0.4 EP

A 4th down and 3 conversion attempt from that part of the field would be successful 56% of the time. A successful conversion would mean a 1st down at the opponent's 34 or better, which is worth 3.3 EP. A failed conversion attempt gives the opponent a 1st down at his own 37, worth 1.3 EP to him and -1.3 EP to us. The total expected point value of going for it would be:

(0.56 * 3.3) + ((1-0.56) * -1.3) = 1.3 EP

So in this example, the best decision is to go for the 1st down. In simple terms, it's worth the risk in the long run. It's not even close.
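The three options for this sample situation can be laid side by side in a few lines. This is a sketch that uses only the numbers quoted above. The break-even calculation at the end is my own addition: it solves the go-for-it formula for the conversion rate at which going for it merely matches the best kick.

```python
# EP of each 4th-and-3 option from the opponent's 37, using the figures in the text.
options = {
    "punt": 0.2,
    "field goal": 0.45 * 2.3 + (1 - 0.45) * -1.1,
    "go for it": 0.56 * 3.3 + (1 - 0.56) * -1.3,
}
best_option = max(options, key=options.get)

# Break-even conversion rate: p * success + (1 - p) * failure = best kick value.
success_ep, failure_ep = 3.3, -1.3
best_kick_ep = max(options["punt"], options["field goal"])
break_even = (best_kick_ep - failure_ep) / (success_ep - failure_ep)

print(best_option, {k: round(v, 2) for k, v in options.items()})
print(round(break_even, 2))
```

With these inputs the break-even conversion rate comes out to roughly 38%, well below the 56% historical rate, which is another way of seeing how lopsided the decision is.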

Putting It All Together

To build a chart of general recommendations for where teams should go for it or kick, we can simply repeat this analysis for each yard line and distance to go. We'll start by plotting the EP values for kicks from various yard lines. First, here are the values for punts:

And here are the values for FGs:

The graph for 'go for it' attempts is a little trickier. While punts are the same value regardless of distance to go, the value of a conversion attempt is highly dependent on it. The colored curves plotted below correspond to the EP values for each distance to go.

Now, let's put it all together and overlay the graphs for the kick values. (Click on the graph to enlarge).

Wherever the value lines for going for it are above or overlap the value lines for kicking, the decision should normally be to go for it. Remember, we assumed a successful conversion would be exactly at the first down marker and no further, which means the tie goes to ‘go for it.’ The final graph below charts the recommended option for each field position and distance to go combination. On the line or below it, a coach should go for the 1st down.

That chart is the bottom line, the take-away. It says that coaches should normally be far more aggressive on 4th down.
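For anyone who wants to rebuild the recommendation chart from their own data, the procedure can be sketched as a loop over yard lines and distances. Everything here is hypothetical scaffolding: `recommend` and `build_chart` are illustrative names, and the three EP functions must be supplied from real punt, FG, and conversion data. The demo fills in only the single sample situation worked through earlier.

```python
def recommend(yards_to_goal, to_go, punt_ep, fg_ep, go_ep):
    """Return the recommended 4th down option; ties go to 'go for it'."""
    kicks = {"punt": punt_ep(yards_to_goal), "field goal": fg_ep(yards_to_goal)}
    best_kick = max(kicks, key=kicks.get)
    if go_ep(yards_to_goal, to_go) >= kicks[best_kick]:
        return "go for it"
    return best_kick

def build_chart(yard_lines, distances, punt_ep, fg_ep, go_ep):
    """Map (yards_to_goal, to_go) to a recommendation for every valid combination."""
    return {
        (y, d): recommend(y, d, punt_ep, fg_ep, go_ep)
        for y in yard_lines
        for d in distances
        if d <= y  # distance to go cannot exceed the distance to the end zone
    }

# Demo: one cell only, 4th and 3 at the opponent's 37, with the EP values from the text.
chart = build_chart(
    [37], [3],
    punt_ep=lambda y: 0.2,
    fg_ep=lambda y: 0.43,
    go_ep=lambda y, d: 1.276,
)
print(chart)
```

The `>=` comparison encodes the tie-breaking rule described above: because the conversion value is a lower bound, an exact tie favors going for it.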

If the benefit of going for it is so clear, why are coaches choosing to kick so often? The authors of Hidden Game of Football suggest that the current 4th down doctrine in football is a hold-over from the early days of the sport. Back in the day, teams were lucky if they mounted one successful scoring drive all game. A good punt basically ensured the opponent wouldn't score on their ensuing possession.

David Romer's explanation goes a step further. He suggests that coaches are thinking more about their job security than their team's chances of winning. Coaches know that if they follow age-old convention by kicking and lose, then the players get most of the blame. But if they defy convention and go for the 1st down and fail, even if it was the best decision, they'll take all the criticism.

I buy both of those explanations, plus I'll throw in my own take. In addition to the natural conservatism of coaches, I believe much of the reason why coaches don't go for the conversion more often can be explained by Prospect Theory. As I outlined in my Decision Theory article, people tend to fear a loss more than they value an equivalent gain. This built-in tendency toward risk aversion means that coaches would be biased toward kicks rather than conversion attempts.

Do I expect coaches to do all this math on the sideline? Of course not. What I hope is that some coaches will one day see research like this and reset their baseline 4th down paradigm.

End Notes

The 37 yard line is the boundary between FGs and punts.

All data are from official NFL gamebooks for all non-preseason games from 2000 through 2008.

This analysis only applies to ‘typical’ game situations when the score is relatively close, time is not expiring, and weather is not a large factor. With time expiring or if one team has a large lead, a different type of analysis is required. An analysis based on Win Probability can be generalized to any game situation.

This type of analysis can be tailored to any team’s specific characteristics, or opponent characteristics. For example, the Expected Points curve, 4th down conversion probability, and FG range and accuracy can be customized to produce a chart specific to a particular game.]]>