Frank brought up some good questions about using win-probability to determine how to organize a bullpen. I’ll try to answer those questions… in a later post. To be honest, I don’t have good answers ready to roll, and my initial thoughts brought me back to thinking about the best way to determine high-leverage situations in the first place. So I’ll discuss those issues first.
The most commonly used measure of leverage (the relative importance of a specific point in a baseball game) that I’ve seen is called “P”. It’s simply the difference between the current win-probability and the win-probability if no more runs are scored the rest of the half-inning. (This is what I was calculating in the Closer Revolution post.) P was initially concocted by Doug Drinen. Dave Studeman (Baseball Graphs Blog and The Hardball Times) put together a sweet spreadsheet that allows the tracking of the play by play of a game, and assigns each change in win-probability to individual players. You can then create graphs and summarize a game in cool ways, like this. Included in the file is a nice table of the P-values for every game-state. The file can be downloaded through this link and contains pretty thorough instructions. (Warning, it relies on macros and doesn’t work on Macs for some unknown reason.)
Another approach to leverage is to compare the best- and worst-case scenarios. Of course, the worst-case scenario for any half-inning is giving up an infinite number of runs, so this approach is often limited to just one plate-appearance. Leverage using this method is defined to be the difference in win-probability between a strikeout and a home run. I find this approach lacking because it mostly boils down to how many runners are on base and doesn’t separate potential from actual. For example, bases loaded with zero outs and four runs scored with zero outs are “only” about two runs different from an expected-value point of view (4 from the HR plus .5 expected with bases empty, versus 2.5 expected with bases loaded). However, they could be four runs different from an actual point of view (all score or they all don’t). The potential for not allowing those runs to score needs to be considered even if it only happens 13% of the time.
Neither of the above methods thrills me because they don’t consider the full spectrum of outcomes; they merely compare extreme cases. Ideally, you’d like to account for the difference between allowing no more runs, one more run, two more runs, or eight more runs. Not all those outcomes are equally likely in a specific situation, and their probabilities vary greatly between situations. We need some data that shows the likelihood of allowing a certain number of runs given a game-state. With it, you could compute the expected change in win-probability by multiplying the probability of allowing each specific number of additional runs by the change in win-probability those runs would cause, and adding it all up. Here’s a simple example with the visiting team pitching with the bases loaded and two outs in the bottom of the eighth inning, leading by one run. I’m completely making up these numbers:
Current WP: 40%
Outcome — Likelihood — New WP — Change in WP
No runs — 65% — 60% — +20% (P)
One run — 15% — 50% — +10%
Two runs — 10% — 40% — 0
Three runs — 5% — 20% — -20%
Four runs — 3% — 5% — -35%
Five+ runs — 2% — 0% — -40% (exactly five in calculation below)
EV = .65×.20 + .15×.10 + .10×0 + .05×(-.20) + .03×(-.35) + .02×(-.40) = +.1165, or about +12%
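The same sum can be checked with a few lines of Python. This is only a sketch of the arithmetic above: the numbers are the made-up ones from the table, and the names `current_wp`, `outcomes`, and `expected_wp_change` are mine, not from the WPA spreadsheet.

```python
# Made-up example from the table above.
# Each entry: outcome -> (likelihood, win-probability at the end of the inning)
current_wp = 0.40
outcomes = {
    "no runs": (0.65, 0.60),
    "one run": (0.15, 0.50),
    "two runs": (0.10, 0.40),
    "three runs": (0.05, 0.20),
    "four runs": (0.03, 0.05),
    "five runs": (0.02, 0.00),  # "five+" treated as exactly five
}


def expected_wp_change(current_wp, outcomes):
    """Probability-weighted average of (new WP - current WP)."""
    return sum(p * (new_wp - current_wp) for p, new_wp in outcomes.values())


ev = expected_wp_change(current_wp, outcomes)
print(f"Expected change in WP: {ev:+.4f}")  # roughly +0.1165, i.e. about +12%
```

The likelihoods sum to 100%, so the result is a proper weighted average of the "Change in WP" column.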
Light bulb moment: What that EV means is that we expect the win-probability to increase by 12% from the current bases-loaded/two-outs situation to the end of the inning. But that doesn’t make sense. If we expect the win-probability to increase by 12%, doesn’t that just mean that the win-probability is actually 12% higher right now? By definition, the expected change in win-probability should be zero, because win-probability is calculated from the likelihood of all potential outcomes in the first place.
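Here is one way to write that argument down. The notation is my own and is not taken from the spreadsheet or any of the sources mentioned: $s$ is the current game-state, $R$ is the number of additional runs allowed in the half-inning, $\mathrm{WP}(s)$ is the current win-probability, and $\mathrm{WP}(s, r)$ is the win-probability at the end of the half-inning after allowing $r$ more runs. It assumes the likelihoods and the win-probabilities all come from the same underlying model.

```latex
\begin{aligned}
\mathrm{WP}(s) &= \sum_{r} P(R = r \mid s)\,\mathrm{WP}(s, r)
  && \text{(law of total probability)} \\
\mathrm{E}[\Delta \mathrm{WP}]
  &= \sum_{r} P(R = r \mid s)\,\bigl[\mathrm{WP}(s, r) - \mathrm{WP}(s)\bigr] \\
  &= \mathrm{WP}(s) - \mathrm{WP}(s)\sum_{r} P(R = r \mid s)
   = \mathrm{WP}(s) - \mathrm{WP}(s) = 0
\end{aligned}
```

If the current win-probability and the outcome likelihoods come from different sources, the first line need not hold exactly, and the expected change can drift away from zero.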
Taking real numbers (what a concept!) from the WPA spreadsheet and this chart at TangoTiger’s site (also included in the WPA file) yields:
Current WP: 64.7% (taken from 2005 empirical data — grrr)
Outcome — Likelihood — New WP — Change in WP
No runs — 67.5% — 84.2% — +19.5%
One run — 9.2% — 50% — -14.7%
Two runs — 10.5% — 15.8% — -48.9%
Three runs — 5.5% — 7% — -57.7%
Four runs — 4.8% — 3% — -61.7%
Five+ runs — 2.5% — 1.3% — -63.4% (exactly five in calculation below)
EV = .675*.195 + .092*-.147 + .105*-.489 + .055*-.577 + .048*-.617 + .025*-.634 = -1.0%
Ok, so my theory’s not proved: the result isn’t exactly zero. The discrepancy likely comes from combining the theoretical numbers with the empirical 2005 number used for current WP. But I still believe my theory is correct: the expected change in win probability is by definition zero.
So if the expected change is always zero, what other calculation could take into account the many different possible outcomes? How about standard deviation? Here’s the standard deviation calculation using the “real” numbers above. Standard deviation is the square root of the sum, over all outcomes, of each outcome’s probability times the squared difference between its actual change and the expected change, or sqrt( sum[ P(Ai)*(Ai-EV)^2 ] ). That reduces to sqrt( sum[ P(Ai)*Ai^2 ] ) because we’re assuming EV is zero:
SD = sqrt[ .675*(.195)^2 + .092*(.147)^2 + .105*(.489)^2 + .055*(.577)^2 + .048*(.617)^2 + .025*(.634)^2 ] = 31.5%
The standard deviation of all the possible changes in win-expectancy between the current situation and the end of the inning is 31.5%. If we performed another calculation for a different situation and got a higher number, that situation would be more important. A lower value would imply less importance. (For example, the standard deviation of changes in win-probability for the same situation except no runners on base is only 13.3%, so it makes sense that this situation is much less critical to the outcome of the game.) One benefit of using standard deviation from a mathy point of view is that it’s sensitive to outliers. Outcomes that result in more drastic changes in win-probability (walk-off homers, getting out of a bases-loaded jam) carry a lot of weight because values are squared in the calculation of standard deviation. I really like this approach.
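For anyone who wants to check the arithmetic, here is a minimal Python sketch using the “real” table rows above, with likelihoods entered as fractions (9.2% as 0.092). The variable names are mine. It recomputes the expected change from those rows, then the standard deviation two ways: assuming the expected change is zero, as in the calculation above, and around the recomputed expected change.

```python
import math

current_wp = 0.647  # 2005 empirical value from the table above

# (likelihood, win-probability at the end of the inning), one row per outcome
outcomes = [
    (0.675, 0.842),  # no runs
    (0.092, 0.500),  # one run
    (0.105, 0.158),  # two runs
    (0.055, 0.070),  # three runs
    (0.048, 0.030),  # four runs
    (0.025, 0.013),  # five+ runs, treated as exactly five
]

# Change in win-probability for each outcome
changes = [(p, new_wp - current_wp) for p, new_wp in outcomes]

# Expected change: probability-weighted average of the changes
ev = sum(p * d for p, d in changes)

# Standard deviation assuming the expected change is zero (as in the post)
sd_zero_mean = math.sqrt(sum(p * d ** 2 for p, d in changes))

# Standard deviation measured around the recomputed expected change
sd_about_mean = math.sqrt(sum(p * (d - ev) ** 2 for p, d in changes))

print(f"EV: {ev:+.4f}")
print(f"SD (EV assumed zero): {sd_zero_mean:.4f}")
print(f"SD (around EV):       {sd_about_mean:.4f}")
```

With these rows the expected change comes out close to -1%, and both versions of the standard deviation round to 31.5%, so the zero-EV shortcut costs almost nothing here.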
One last issue: should we be defining change in win-probability as the current situation compared to the end of the inning, or do we want to compare the current situation to the result of the current plate-appearance? I’m not sure. Comparing to the end of the inning takes into account the fact that the current scoring threat doesn’t go away until the inning’s over. But on the other hand, pitcher decisions are made according to the current plate appearance, and isn’t chopping up a game into more separate pieces a good idea? I’m still trying to figure out which way to go. The good news is that the math for the plate-appearance method is the same, though the data is a little more difficult to find: just find the likelihood of each outcome of a plate-appearance in a given situation (SO, single, HBP, GIDP, etc.) and its corresponding change in win-probability. Then compute the standard deviation.
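A rough sketch of what the plate-appearance version could look like in Python. Everything here is an assumption for illustration: the function name `leverage_sd` is mine, the outcome categories are collapsed to four, and the probabilities and win-probabilities are invented placeholders, not real data. To stay consistent with the zero-expected-change argument, the sketch defines the current win-probability as the probability-weighted average of the win-probabilities after each outcome.

```python
import math


def leverage_sd(outcomes):
    """Standard deviation of the change in win-probability.

    outcomes: list of (probability, win-probability after the event).
    The current win-probability is taken to be the weighted average of the
    post-event win-probabilities, so the expected change is zero by
    construction. Returns (current_wp, standard_deviation).
    """
    total = sum(p for p, _ in outcomes)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {total}, expected 1.0")
    current_wp = sum(p * wp for p, wp in outcomes)
    variance = sum(p * (wp - current_wp) ** 2 for p, wp in outcomes)
    return current_wp, math.sqrt(variance)


# Placeholder plate-appearance outcomes, invented purely for illustration.
pa_outcomes = [
    (0.70, 0.80),  # out of any kind
    (0.10, 0.60),  # walk or HBP
    (0.15, 0.45),  # single
    (0.05, 0.20),  # extra-base hit
]

wp_now, sd = leverage_sd(pa_outcomes)
print(f"Current WP: {wp_now:.4f}, leverage (SD): {sd:.4f}")
```

The same function works for the end-of-inning method: pass in one row per number of runs allowed instead of one row per plate-appearance result.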
The standard deviation approach is my own idea that I haven’t seen anywhere else. I’d also like to point out that Keith Woolner created his own measure of leverage and discusses it in the 2006 Baseball Prospectus book, but I haven’t read the article and don’t really know what his approach is. Also, TangoTiger has his own measure of leverage called “leverage index” but has not yet disclosed its methodology. He anticipates posting an article about it later this month. I’m really looking forward to reading it.
Hat Tips…
… to Dave Studeman for the WPA Spreadsheet, interest in the subject, and the nice introductory article at The Hardball Times.
… to the FanGraphs guy(s) for figuring out how to track one of the coolest stats out there.
Sky is a baseball fan and racket sport aficionado living in upstate NY. His favorite color is orange and he is just about ready to give up on his life-long dream to become the next Magnus ver Magnuson (World's Strongest Man). His favorite baseball teams are the Yankees and Red Sox, proving that there's hope in the Middle East.
April 13th, 2006 at 1:04 pm
Yeah, it seems like the EV should be 0 for the theoretical average reliever. If there were some way you could come up with profiles of each reliever giving the likelihood of each outcome based on his past performance (i.e. the average reliever would surrender 0 runs 60% of the time, whereas a stud reliever would do so 80% of the time), you could calculate the win probability added by putting said stud reliever in that situation.
This is rather off topic, but one thing that’s always bothered me about reliever usage is when a reliever on the mound who has been effective and hasn’t pitched for long is taken out for a situational advantage. It seems to me that pitchers frequently either have their stuff or don’t on a particular day. If you have a pitcher in who is obviously throwing well, why not leave him in instead of risking bringing in another pitcher who may or may not be having a good day?
I’d be curious to see some sort of analysis of play by play data to see what the expected outcomes are for a pitcher coming in cold, vs a pitcher that had gotten one out previously, vs pitchers who had gotten 2 consecutive outs, etc. If the BA for hitters against pitchers who had gotten 2 outs without giving up a hit is significantly lower it would seem to be an advantage to leave that pitcher in rather than trying to get some smaller advantage out of a platoon split.
Another case would be whether a reliever who pitched the previous inning scorelessly is more likely to pitch well the next inning. That would probably be easier to check. I haven’t read too much performance analysis stuff, so somebody might have done this already, but I’d be curious about the results…