2004 Predictions -- Keeping Score

By Tom Tippett
October 14, 2004

When we release our annual Projection Disk in the spring, we give our customers a chance to get a head start on the baseball season. With projected statistics and ratings for over 1600 established big leaguers and top minor-league prospects, plus league schedules, park factors, team rosters, projected pitching rotations, bullpen assignments, lineups and depth charts, the Projection Disk gives them everything they need to play out the new season using the Diamond Mind Baseball simulation game.

It also gives us a chance to get a head start on the season. Ever since we created the first Projection Disk in 1998, we've been publishing our projected standings along with comments on the outlook for all 30 teams. Those projected standings are based on the average of a number of full-season simulations using the Projection Disk.

Of course, nobody really knows what's going to happen when the real season starts, but we're always curious to see how our projected results compare to the real thing. And we're equally interested in seeing how our projections stack up against the predictions made by other leading baseball experts and publications. This article takes a look at those preseason predictions and identifies the folks who were closest to hitting the mark in 2004. And because anyone can get lucky and pick the winners in one season, we also look at how everyone has done over a period of years.

Comparing predictions

In addition to projecting the order of finish, our simulations provide us with projected win-loss records, projected runs for and against, and the probability that each team will make the postseason by winning its division or grabbing the wild card.

Unfortunately, most of the predictions that are published in major newspapers, magazines and web sites don't include projected win-loss records. Instead, they give the projected order of finish without indicating which races are expected to be hotly contested and which will be runaways. Some don't even bother to predict the order of finish, but settle instead for the division winners and wild card teams.

As a result, we do our best to assign a meaningful score to each prediction based solely on order of finish within each division. We borrowed the scoring system from our friend Pete Palmer, co-author of Total Baseball and The Hidden Game of Baseball, who has been projecting team standings for more than 35 years.

Pete's scoring system subtracts each team's actual placement from its projected placement, squares this difference, and adds them up for all the teams. For example, if you predict a team will finish fourth and they finish second, that's a difference of two places. Square the result, and you get four points. Do this for every team and you get a total score. The lower the score, the more accurate your predictions.

We don't try to break ties. If, for example, two teams tie for first, we say that each team finished in 1.5th place for the purposes of figuring out how many places a prediction was off. Suppose a team was projected to finish third and they tied for first instead. That's a difference of 1.5 places. The square of 1.5 is 2.25, so that would be the point total for this team. That's why you'll see some fractional scores in the tables below.
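To make the arithmetic concrete, here is a minimal Python sketch of the scoring rule described above. The function names (places_with_ties, placement_score) and the five-team division with teams A through E are hypothetical, invented only for illustration; this is not Pete's code or ours, just one way the rule could be written down.

```python
def places_with_ties(groups):
    """Turn a finishing order into places, averaging places for tied teams.

    `groups` is a list of lists in finishing order; teams in the same
    inner list are tied. Two teams tied for first each get place 1.5.
    """
    places = {}
    next_place = 1
    for group in groups:
        # k tied teams occupy places next_place .. next_place + k - 1;
        # each of them is assigned the average of those places.
        shared = next_place + (len(group) - 1) / 2
        for team in group:
            places[team] = shared
        next_place += len(group)
    return places


def placement_score(predicted, actual):
    """Sum of squared differences between predicted and actual places.

    Both arguments map team name -> place. Lower is better.
    """
    return sum((predicted[team] - actual[team]) ** 2 for team in predicted)


# Hypothetical division: predicted order A, B, C, D, E.
predicted = places_with_ties([["A"], ["B"], ["C"], ["D"], ["E"]])
# Hypothetical result: A and C tie for first, then B, D, E.
actual = places_with_ties([["A", "C"], ["B"], ["D"], ["E"]])

# A: (1 - 1.5)^2 = 0.25, B: (2 - 3)^2 = 1, C: (3 - 1.5)^2 = 2.25, D and E: 0
print(placement_score(predicted, actual))  # 3.5
```

Team C in this made-up example is the case from the text: projected third, tied for first, and charged 2.25 points. A full score would simply add up the division totals for all six divisions.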

Keeping things in perspective

In 1998, the first year we did this, we created a little database with our projected standings and those of fourteen national publications, and we were pleased to see that we ended the year with the best accuracy score among those fifteen forecasts. When we wrote up the results and posted them to our web site, however, we were very careful not to make any grand claims, saying:

"I'm not sure what to make of all this. It's just one year, and it's entirely possible that we were just lucky. Time will tell whether our approach to projecting seasons is consistently better than average."

Over time, we expanded our database to include the predictions of prominent baseball writers from major newspapers and other publications. This is easier said than done because some publications and web sites change their approach from year to year. For example, we used to track the predictions of several ESPN.com writers and editors, but they limited their picks to division winners in 2003. So the number of entries in our database can rise and fall depending on what the various publications do and whether we were able to find those predictions in our spring survey.

In the sections below, we'll show you how various prognosticators ranked in 2004 and over a period of years, with the period varying in length depending on when we added that person or publication to our database. We don't make any claims of completeness here -- there are lots of other predictions that are not in our database -- but we think you'll find that our sample is an interesting one.

For several reasons, we want to emphasize that it's important that nobody take these rankings too seriously.

First, this isn't the only scoring system one could use to rank these projections, of course. A fellow named Gerry Hamilton runs a predictions contest every year (see http://www.tidepool.com/~ggh1/index.html) and assigns a score based on how many games each team finished out of their predicted place in the standings. (We came 22nd out of 195 predictions in their 2004 contest after finishing 4th in 2003.)

Second, because of publishing deadlines, the predictions in some spring baseball magazines are made long before spring training started, others are prepared in early-to-mid March, and some are compiled just before opening day. Obviously, the longer you wait, the more information you have on player movement and injuries.

Third, many newspaper editors ask staff writers to make predictions so their readers have something to chew on for a couple of days. Some writers hate doing them but comply because their editors insist. Some do it even though their main beat is a different sport. Others may make off-the-wall picks just for grins or feel compelled to favor the hometown teams.

Rankings for 2004

It's interesting to see how everyone did this year, but it's even more interesting to look back to see how different people perceived the baseball world before the season started. We'll start by showing you the prediction rankings for the current season, then we'll follow that up with a review of each division race and how those races affected these rankings.

Forecaster                              Score
New York Times                            30
Las Vegas over-under line                 32.5
Tony DeMarco, MSNBC.com                   40
Diamond Mind simulations                  42
Bob Hohler, Boston Globe                  42
Joe Sheehan, Baseball Prospectus          42
Michael Wolverton, Baseball Prospectus    42
David Lipman, ESPN.com                    44
Michael Holley, Boston Globe              46
Gary Huckabay, Baseball Prospectus        46
Team payroll (per USA Today)              46
Poll of SABR members                      48
Athlon                                    48
Eric Mack, CBS SportsLine                 48
2003 final standings                      48
MLB Yearbook                              50
Baseball Prospectus                       52
Nate Silver, Baseball Prospectus          52
Lindy's                                   52
Dan Shaughnessy, Boston Globe             52
ESPN.com power rankings                   56
Phil Rogers, ESPN.com                     56
Steve Mann                                56
The Sporting News (Ken Rosenthal)         58
Rany Jazayerli, Baseball Prospectus       58
Charley McCarthey, CBS SportsLine         58
Baseball America                          60
Sports Illustrated                        60
Spring Training Yearbook                  60
Tristan Cockroft, CBS SportsLine          60
USA Today                                 61.5
Street & Smith                            62
Chris Kahrl, Baseball Prospectus          62
Miami Herald                              64
Derek Zumsteg, Baseball Prospectus        64
USA Today Sports Weekly                   66
Jonah Keri, Baseball Prospectus           66
Pete Palmer                               68
Dallas Morning News                       68
Seattle Times                             68
CBS SportsLine                            72
Gordon Edes, Boston Globe                 72
Scott Miller, CBS SportsLine              72
ESPN the magazine (Peter Gammons)         74
Los Angeles Times                         74
Bob Ryan, Boston Globe                    76
Adam Reich, CBS SportsLine                80
Spring training results                  134

The "Diamond Mind simulations" entry is the one representing the average result of simulating the season 100 times. These simulations were done about three weeks before the season started.

There are a few other entries in this list that don't represent the views of a writer or a publication. If you predicted that the 2004 standings would be the same as in 2003, your score would have been 48. If you put together a set of standings based on the Las Vegas over-under line, you'd have racked up an impressively low total of 32.5 points. If you thought the teams would finish in order from highest to lowest payroll, your score would have been 46.

And if you predicted that the regular season standings would match the 2004 spring training standings, your score would have been 134. In other words, the spring training results were almost useless as a predictor of the real season, and that's been true for at least the past four years.
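The non-writer entries above all start from a number per team (last year's wins, an over-under win total, a payroll) rather than from a stated order of finish. The sketch below shows one plausible way to turn such numbers into predicted places within a division, giving tied teams the average of the places they share. We have not said here how ties in these entries were handled, so the tie rule is an assumption carried over from how tied finishes are scored, and the function name and the win totals are hypothetical.

```python
def places_from_values(values):
    """Rank teams in one division by a number, higher value = better place.

    `values` maps team name -> number (for example an over-under win
    total or a payroll). Teams with equal values share the average of
    the places they jointly occupy (an assumed tie rule).
    """
    ordered = sorted(values, key=values.get, reverse=True)
    places = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every team tied with the team at index i.
        j = i
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        # Indexes i..j correspond to places i+1..j+1; use their average.
        shared = ((i + 1) + (j + 1)) / 2
        for team in ordered[i:j + 1]:
            places[team] = shared
        i = j + 1
    return places


# Hypothetical over-under win totals for a five-team division.
over_under = {"A": 95.5, "B": 95.5, "C": 80.5, "D": 72.0, "E": 68.5}
print(places_from_values(over_under))
# {'A': 1.5, 'B': 1.5, 'C': 3.0, 'D': 4.0, 'E': 5.0}
```

Under this assumed rule, a tie on the prediction side produces fractional places, which is one way a total such as 32.5 could arise even in a season with no ties in the final standings.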

Reviewing the divisions

Much more interesting than the overall scores, in our opinion, are the details. Which teams were consistently under- or over-estimated? Which divisions contained the biggest surprises? Did anyone predict that certain teams would have a sudden change of fortune?

Leaving out the entries that don't represent writers or publications, here are some observations about how the others saw things last spring:

AL East. Everyone had either New York or Boston winning the division, with the Yankees being picked first four more times than the Red Sox. Other than Gary Huckabay, who picked Toronto second and Boston third, everyone had this as a two-team race. A good number of people picked Baltimore third ahead of Toronto, but four people picked the Orioles to finish last, too, so there was no clear consensus on the Orioles.

AL Central. The Kansas City Royals were the downfall for many this year. The young Royals led the division for much of the 2003 season before fading down the stretch, then added some veteran players during the winter. As a result, they were a trendy pick to win the division or finish second behind Minnesota. A good part of the reason our score is among the leaders in 2004 is that we identified the Royals as one of the teams most likely to disappoint. That was based largely on our simulation results, but also based on the fact that the 2003 Royals didn't have the statistical foundation to justify their high placement. Surprisingly, seven predictions had Detroit finishing fourth, in every case because they thought the Indians would be even worse.

AL West. A year ago, our score was significantly improved because we chose to rank the Mariners ahead of the Angels when those two teams finished in a virtual tie for second in our simulations. This year, those teams were again neck and neck, with the Mariners averaging one more win but the Angels having a slightly better run margin. In a decision we'd love to have back, we gave the nod to Seattle. More than twice as many people chose Anaheim to win the division as chose Oakland, with three choosing the Mariners for first place. Everyone picked the Rangers to finish last, meaning that nobody in our survey got this division (or any other division) correct from top to bottom.

NL East. Before the season, the Phillies appeared to be loaded with talent, the Marlins were shedding payroll after winning the World Series, and the Braves seemed quite vulnerable. All three teams were selected by at least one person to win the division, with Philly being the choice about 80% of the time. Most predictions had a clear separation between the top three and the bottom two, but Montreal (five times) and New York (three times) snuck into third place on a few lists.

NL Central. Only two entries (Diamond Mind and Steve Mann) had the Cardinals finishing first in this division. The others seemed caught up in the hype surrounding the Cubs' young pitching (Prior, Wood, Zambrano) and the Astros' older pitching (Clemens, Pettitte). Just about every prediction had the Cubs and Astros duking it out for first with the Cardinals third. The picks for first place were almost evenly split between Chicago and Houston, with the Cubs having a very slight edge. There was some variation in the order of the bottom three teams, but nobody picked any of them to finish in the top half of the division.

NL West. Picking the Dodgers to finish at or near the top was a key to the better-scoring predictions this year, as was picking against the Diamondbacks. We were among those who thought Arizona would finish ahead of Los Angeles, but we were not alone. Approximately 2/3 of the predictions had Arizona beating the Dodgers, with thirteen people picking the D'backs to win the division outright. (In an example of the importance of timing, Arizona finished one game ahead of the Dodgers in our simulations, but the teams would have been reversed had we run them again after Milton Bradley was traded to LA.) It's clear that many people thought this division was wide open, as four of the five teams (everyone but the Rockies) were picked to finish first at least once.

Summing up. For the first time ever, not a single division was nailed by even a single predictor. Certain teams surprised a lot of people by overachieving (Texas, Los Angeles) or falling short (Arizona, Seattle, Kansas City, Toronto). As a result, the prediction scores were much higher this year than in 2003. A year ago, things went more in accordance with expectations.

Seven-year rankings

Here are the rankings for those who were included in our sample every year. There's a new entry this year. We went back and ranked all of the teams based on their payroll as reported in USA Today in April, and we computed a standings score based on the "prediction" that teams would finish in order from highest to lowest payroll. As you can see, that doesn't seem to be a very good predictor.

Forecaster            2004  2003  2002  2001  2000  1999  1998  Total
Diamond Mind          42.0  28.0  40.0  54.5  68.0  42.0  44.5  319.0
Las Vegas over-under  32.5  30.0  46.0  65.5  51.5  48.0  52.0  325.5
Sports Illustrated    60.0  30.0  48.0  56.5  40.0  56.0  54.0  344.5
Steve Mann            56.0  48.0  60.0  38.5  58.0  54.0  44.0  358.5
Sports Weekly         66.0  38.0  42.0  46.5  58.0  51.5  60.0  362.0
Athlon                48.0  36.0  38.0  67.5  42.0  72.0  72.0  375.5
Sporting News         58.0  44.0  54.0  52.5  38.0  78.0  54.0  378.5
Pete Palmer           68.0  56.0  50.0  70.5  54.0  40.0  58.0  396.5
Street & Smith        62.0  36.0  70.0  68.5  58.0  68.0  64.0  426.5
Previous season       48.0  42.0  48.0  64.5  56.0  70.0 100.0  428.5
Payroll ranking       46.0  64.0 102.0  60.0  88.0  72.0  44.0  476.0

Six-year rankings

In 1999, we added some writers from the Boston Globe.

Forecaster                  2004  2003  2002  2001  2000  1999  Total
Gordon Edes, Boston Globe   52.0  32.0  54.0  56.5  26.0  28.0  248.5
Las Vegas over-under line   32.5  30.0  46.0  65.5  51.5  48.0  273.5
Diamond Mind simulations    42.0  28.0  40.0  54.5  68.0  42.0  274.5
Sports Illustrated          60.0  30.0  48.0  56.5  40.0  56.0  290.5
USA Today Sports Weekly     66.0  38.0  42.0  46.5  58.0  51.5  302.0
Athlon                      48.0  36.0  38.0  67.5  42.0  72.0  303.5
Baseball America            60.0  28.0  48.0  54.5  54.0  70.0  314.5
Steve Mann                  56.0  48.0  60.0  38.5  58.0  54.0  314.5
Sporting News               58.0  44.0  54.0  52.5  38.0  78.0  324.5
Previous season standings   48.0  42.0  48.0  64.5  56.0  70.0  328.5
Dan Shaughnessy, Globe      52.0  56.0  70.0  44.5  54.0  58.0  334.5
Pete Palmer                 68.0  56.0  50.0  70.5  54.0  40.0  338.5
Bob Ryan, Boston Globe      76.0  40.0  58.0  84.5  58.0  40.0  356.5
Street & Smith              62.0  36.0  70.0  68.5  58.0  68.0  362.5
Payroll ranking             46.0  64.0 102.0  60.0  88.0  72.0  432.0

Five-year rankings

The Diamond Mind simulations missed the mark by quite a bit in 2000. We added a new concept to our projection system that year, but we were unhappy with the results, and we took that out of the model before generating our projections in 2001. The results have been much better since. As you can see, the Las Vegas over-under line has been getting much better in recent years.

Forecaster                   2004  2003  2002  2001  2000  Total
Las Vegas over-under line    32.5  30.0  46.0  65.5  51.5  225.5
Athlon                       48.0  36.0  38.0  67.5  42.0  231.5
Diamond Mind simulations     42.0  28.0  40.0  54.5  68.0  232.5
Sports Illustrated           60.0  30.0  48.0  56.5  40.0  234.5
Gordon Edes, Boston Globe    72.0  32.0  54.0  56.5  26.0  240.5
Baseball America             60.0  28.0  48.0  54.5  54.0  244.5
Sporting News                58.0  44.0  54.0  52.5  38.0  246.5
Previous season standings    48.0  42.0  48.0  64.5  56.0  248.5
USA Today Sports Weekly      66.0  38.0  42.0  46.5  58.0  250.5
Steve Mann                   56.0  48.0  60.0  38.5  58.0  260.5
Dan Shaughnessy, Globe       52.0  56.0  70.0  44.5  54.0  276.5
Street & Smith               62.0  36.0  70.0  68.5  58.0  294.5
Pete Palmer                  68.0  56.0  50.0  70.5  54.0  298.5
Bob Ryan, Boston Globe       76.0  40.0  58.0  84.5  58.0  316.5
Payroll ranking              46.0  64.0 102.0  60.0  88.0  360.0

Four-year rankings

Lindy's was a strong addition to our survey in 2001. We also added the San Francisco Chronicle that year, but they've been dropped from this list because we couldn't find their 2004 predictions. That paper ranked second from 2001 to 2003.

Forecaster                   2004  2003  2002  2001  Total
Diamond Mind simulations     42.0  28.0  40.0  54.5  164.5
Lindy's                      52.0  40.0  42.0  36.5  170.5
Las Vegas over-under line    32.5  30.0  46.0  65.5  174.0
Tony DeMarco, MSNBC.com      40.0  34.0  34.0  67.5  175.5
Athlon                       48.0  36.0  38.0  67.5  189.5
Baseball America             60.0  28.0  48.0  54.5  190.5
USA Today Sports Weekly      66.0  38.0  42.0  46.5  192.5
Sports Illustrated           60.0  30.0  48.0  56.5  194.5
Steve Mann                   56.0  48.0  60.0  38.5  202.5
Previous season standings    48.0  42.0  48.0  64.5  202.5
Sporting News                58.0  44.0  54.0  52.5  208.5
Los Angeles Times            74.0  18.0  44.0  73.5  209.5
Gordon Edes, Boston Globe    72.0  32.0  54.0  56.5  214.5
Dan Shaughnessy, Globe       52.0  56.0  70.0  44.5  222.5
Street & Smith               62.0  36.0  70.0  68.5  236.5
Pete Palmer                  68.0  56.0  50.0  70.5  244.5
Bob Ryan, Boston Globe       76.0  40.0  58.0  84.5  258.5
Payroll ranking              46.0  64.0 102.0  60.0  272.0
Spring training results     134.0  70.0  86.0 113.5  403.5

Three-year rankings

Here's how things looked from 2002 to 2004. The LA Times was unable to follow up the excellent 2003 predictions that put them in top spot in last year's two-season rankings.

Forecaster                   2004  2003  2002  Total
Tony DeMarco, MSNBC.com      40.0  34.0  34.0  108.0
Las Vegas over-under line    32.5  30.0  46.0  108.5
Diamond Mind simulations     42.0  28.0  40.0  110.0
Bob Hohler, Boston Globe     42.0  32.0  38.0  112.0
Athlon                       48.0  36.0  38.0  122.0
Lindy's                      52.0  40.0  42.0  134.0
Los Angeles Times            74.0  18.0  44.0  136.0
Baseball America             60.0  28.0  48.0  136.0
Sports Illustrated           60.0  30.0  48.0  138.0
Previous season standings    48.0  42.0  48.0  138.0
USA Today Sports Weekly      66.0  38.0  42.0  146.0
USA Today                    61.5  32.0  58.0  151.5
Sporting News                58.0  44.0  54.0  156.0
Gordon Edes, Boston Globe    72.0  32.0  54.0  158.0
Steve Mann                   56.0  48.0  60.0  164.0
Street & Smith               62.0  36.0  70.0  168.0
Bob Ryan, Boston Globe       76.0  40.0  58.0  174.0
Pete Palmer                  68.0  56.0  50.0  174.0
Dan Shaughnessy, Globe       52.0  56.0  70.0  178.0
Payroll ranking              46.0  64.0 102.0  212.0
Spring training results     134.0  70.0  86.0  290.0

Two-year rankings

Finally, here's how things have looked over the past two years.

Forecaster                   2004  2003  Total
Las Vegas over-under line    32.5  30.0   62.5
Diamond Mind simulations     42.0  28.0   70.0
Tony DeMarco, MSNBC.com      40.0  34.0   74.0
Bob Hohler, Boston Globe     42.0  32.0   74.0
Athlon                       48.0  36.0   84.0
Baseball America             60.0  28.0   88.0
Sports Illustrated           60.0  30.0   90.0
Previous season standings    48.0  42.0   90.0
MLB Yearbook                 50.0  40.0   90.0
Lindy's                      52.0  40.0   92.0
Los Angeles Times            74.0  18.0   92.0
USA Today                    61.5  32.0   93.5
Street & Smith               62.0  36.0   98.0
Sporting News                58.0  44.0  102.0
USA Today Sports Weekly      66.0  38.0  104.0
Gordon Edes, Boston Globe    72.0  32.0  104.0
Steve Mann                   56.0  48.0  104.0
Dan Shaughnessy, Globe       52.0  56.0  108.0
Spring Training Yearbook     60.0  48.0  108.0
ESPN the magazine            74.0  36.0  110.0
Payroll ranking              46.0  64.0  110.0
Pete Palmer                  68.0  56.0  114.0
Bob Ryan, Boston Globe       76.0  40.0  116.0
Spring training results     134.0  70.0  204.0

Summing up

Overall, we've been pretty happy with our results, and if there's one thing that stands out, it's our ability to identify over-rated teams.

In 2004, we saw the Royals as a 2003 overachiever that was unlikely to repeat, we projected the Blue Jays to finish below .500, and we didn't buy all of the hype surrounding the Cubs and Astros.

A year earlier, our simulations correctly indicated that the Mets were likely to finish at the bottom of their division again, the Angels were very unlikely to repeat their 2002 success, and the Dodgers wouldn't score enough runs to make a serious run at the NL West title.

Even so, we're always surprised by something that happens each year. We didn't anticipate the emergence of the Rangers and Dodgers in 2004 or the surprising finishes of the Marlins and Royals the year before. As a result, we have a bunch of test cases to study as we consider possible improvements to our projection system.

More than anything, this process -- projecting the season in March, watching the real thing for six months, and taking a look back after the season -- is highly educational for us. So we'll be back with our projected 2005 team standings in March.

2001 Gold Glove Review

By Tom Tippett
December 10, 2001

If you haven't already done so, please read the introduction to the 2002 Gold Glove Review article for a summary of the techniques we use for evaluating defensive performance.

Pitchers. There's a very strong tendency for Gold Glove voters to fixate on one guy and keep giving him the award year after year after year, as long as he doesn't get hurt or do anything to make it clear that something has changed. This tendency is especially strong for pitchers, perhaps because the voters don't get to see them as often.

At other positions, we can judge performance over a span of 1,000 to 1,400 defensive innings, but even the most durable starting pitchers are in the field only for 200-250 innings. And relievers get only a fraction of the innings of a starting pitcher.

With 14 or 16 teams in the league, a voter might get to see a certain shortstop play 80 innings in the field. That's not much in the context of a whole season, but it sure beats the 10-20 innings they might see of a starting pitcher or the 4-5 innings a reliever might pitch in those games.

So it's hard for anyone to evaluate pitcher defense just by watching, because nobody is in position to watch enough pitchers in enough situations to get a complete picture. And it's hard to evaluate pitchers just by looking at their putouts and assists because a pitcher's tendency to induce ground balls can have a major impact on those numbers. Even if you're a brilliant fielder, you're not going to look good next to Greg Maddux if you're a fly-ball pitcher and they're using traditional fielding stats to evaluate you.

This year, Mike Mussina was chosen for the fifth time, and he's a pretty good pick. He had a good year, handling 43 chances successfully while participating in 5 double plays, making only one error, and doing a very good job holding opposing runners. But there are other deserving candidates.

(By the way, I'll leave it up to you to decide whether holding runners is a pitching skill or a defensive skill. But I'll mention it for those of you who think it's relevant to a Gold Glove debate.)

Freddy Garcia also participated in five double plays and made only one error while handling 68 chances successfully, more than half again as many as Mussina. On the other hand, Garcia creates more chances for himself because he's a ground ball pitcher, and he doesn't hold runners well.

Steve Sparks had 62 successful chances, only one error, and held runners well despite throwing a pitch, the knuckleball, that is easy to run on. He was involved in one double play.

Brad Radke had 57 successful chances, four double plays, and only one error, but wasn't quite as good as Sparks and Mussina at holding runners.

Andy Pettitte was error-free in 49 successful chances with one double play and has a terrific pickoff move, though he is less successful holding runners close when he goes home with the pitch.

Jeff Weaver also handled 49 chances without an error. He was in on four double plays and was in the middle of the pack in holding runners. All things considered, my vote would have gone to Garcia this year.

In the other league, Greg Maddux won his 12th straight, and there's no question that he's a very good fielder. But it must also be said that he has a head start on his competition because he's an extreme ground-ball pitcher who creates for himself a ton of opportunities to make plays. This year, he led the majors by handling 72 chances successfully, making only one error in the process.

But there are two arguments against Maddux's iron grip on this award. First, quite a few others have ranked above Maddux each year in plays made per batted ball in his zone. Second, Maddux has made 14 errors in the past five years; that's a lot for a pitcher, and only three other pitchers have made more in that span.

Consider Kirk Rueter. I'll bet if the voters had picked him a few years ago, they'd keep picking him every year just like they do with Maddux, because if Rueter had once been deemed the best, he's definitely doing enough to reinforce the view that he still is.

This year, Rueter handled 61 chances without an error and took part in eleven (!) double plays. Among pitchers with at least 50 balls hit into their zone, he ranked #1 in converting those chances into outs. And he was almost impossible to run on.

Last year, Rueter handled 52 chances without an error and took part in four double plays. He converted an extremely high number of batted balls into outs and was almost impossible to run on. In 1999, Rueter handled 45 successful chances but made one error.

Over the past five years, Maddux has made 14 errors in 424 chances for a fielding percentage of .967. In the same span, Rueter has made 3 errors in 265 chances for a fielding percentage of .989. Rueter has been involved in seven more double plays (26 to 19) despite pitching about 240 fewer innings. Rueter has converted a noticeably higher percentage of batted balls into outs. The only area where Maddux has the edge is raw totals, and that's only because he generates so many more comebackers than the average pitcher.
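For readers who want to check the fielding percentages quoted above, here is a short Python sketch. Fielding percentage is successful chances divided by total chances; the function name is just an illustrative choice, and the inputs are the five-year totals given in the text.

```python
def fielding_percentage(chances, errors):
    """Share of total chances handled without an error."""
    return (chances - errors) / chances


# Five-year totals quoted in the text.
maddux = fielding_percentage(chances=424, errors=14)
rueter = fielding_percentage(chances=265, errors=3)

print(f"Maddux: {maddux:.3f}")  # Maddux: 0.967
print(f"Rueter: {rueter:.3f}")  # Rueter: 0.989
```

Note that the statistic says nothing about how many balls a pitcher reaches in the first place, which is why the comparison above also looks at the percentage of batted balls converted into outs.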

Getting back to the 2001 season, the pitchers who bested Maddux in converting opportunities into outs are Adam Eaton, Rueter, Chris Reitsma, Livan Hernandez, Russ Ortiz, Tom Glavine, Javier Vazquez, and Mike Hampton, in that order.

Eaton only pitched for half the season and made two errors, so I don't consider him to be in the same league as the others, though he's someone to watch for the future. Rueter, Reitsma, Hernandez, Glavine, Vazquez, and Hampton each handled more than fifty chances without making an error.

Maddux was a good choice. Any of these guys I just mentioned would have been a slightly better choice. Rueter was the best of the bunch and deserved the Gold Glove this year. Just as he did last year.

Catchers. Ivan Rodriguez is the owner of one of the best throwing arms in history, and has been a lock for this award for many years. He had another great throwing year, and even though he missed a third of the season due to injury, he's the hands-down choice again this year. For some reason, the best arms have found their way into the other league in the past few years, and there's nobody left in the AL to challenge him.

A year ago, I argued that Brad Ausmus should have been the choice in the AL, partly because he had a great year defensively and partly because Rodriguez missed half the season. Ausmus is now in the NL and had another good year throwing, though others bested him in that department, and backed it up by allowing only one passed ball (best in the majors) and making only three errors (tied for second best in the majors).

There were other candidates, of course. Jason LaRue, Mike Matheny, and Henry Blanco threw out a higher percentage of enemy base stealers. But LaRue allowed 15 passed balls, second most in baseball, despite starting only 95 games behind the plate. Blanco started only 94 games himself, and didn't quite match up to Ausmus at any rate.

In my eyes, it's almost impossible to choose between Ausmus and Matheny. Playing time was similar. Ausmus made one fewer error and was charged with five fewer passed balls. On the other hand, Matheny had a better year throwing, though he got more help from his pitchers than Ausmus did. All in all, I think Ausmus was a worthy victor.

First basemen. Based on our analysis, there are four men who could reasonably be thought of as viable candidates at this position, two in each league: Doug Mientkiewicz and Tino Martinez in the AL, Kevin Young and Todd Helton in the NL.

The voters got it right when they chose Mientkiewicz over Martinez. Doug had a better fielding percentage, turned a higher percentage of batted balls into outs, and led the majors in highlight-reel plays. It's actually an easy choice, but I wanted to mention Martinez because he's a very good fielder who had another very good year, and he deserves some recognition.

It's not quite so clear in the NL. The voters picked Helton, who I thought should have won the award over J. T. Snow in 2000, but Young had a terrific year, too. Both the Diamond Mind and STATS methods for assessing range give Young a slight edge over Helton. And after making a boatload of errors in 1999 and 2000, Young got his act together and finished around the league average in fielding percentage. Helton led the league in this category.

Over the past four years, Helton has shown more range than any other first baseman in baseball. Young is second. You rarely hear good things about Young's range because he made far too many errors in two of those four seasons. But the man can cover ground at first base.

Helton and Young were almost on par with each other this year, but I'd agree with the voters and choose Helton. He's been the best in the league since 1998 and this year sustained his high level of play over 157 starts (compared to only 125 for Young).

Second basemen. Here's some of what I wrote a year ago:

"Here we go again. Roberto Alomar won his ninth Gold Glove, and there isn't a baseball writer or television commentator who doesn't gush incessantly about Alomar's brilliance in the field. And I've seen him make some very spectacular plays myself. Problem is, year after year, our analysis (and other measures such as range factors and the STATS zone rating) shows that he doesn't make many more plays than the average second baseman.

Alomar was one of three Cleveland infielders to be rewarded with Gold Gloves this season. But that infield was below the league average in turning ground balls into outs. And according to the STATS Major League Handbook, they were fourth-worst in the league in converting double plays when grounders were hit in double-play situations.

And even though they used a lot of different pitchers this year, I don't think you can argue that this defense was made to look worse by a lousy pitching staff. They did, after all, get almost 600 innings from three good starting pitchers (Burba, Colon, Finley) and a bunch more from a group of veteran relievers who have fared quite well playing in front of other defenses in the recent past.

The bottom line is that somebody isn't making nearly as many plays as people think ..."

I'm repeating so much of last year's comment because it's still relevant. This season, Cleveland's infield was 13th in the league in the percentage of ground balls turned into outs. And they were only a hair above the league average in double-play percentage.

You could argue that the infield looks bad because the corner guys -- Jim Thome at first, Travis Fryman and Russ Branyan at third -- don't cover much ground, and you'd be correct. Problem is, there's absolutely no evidence that their middle infielders are doing more than their share, either.

The best case for Alomar's Gold Glove is that he won the fielding percentage title by making only five errors all season. His nearest rivals, Ray Durham and Bret Boone, made ten errors each. But Alomar's range factor was .12 below the league average despite playing behind a ground-ball staff. His STATS zone rating was thirty-five points below the norm for his position. According to our method, Alomar made 20 fewer plays than the average 2B, and he was consistently below average on all types of plays -- line drives, ground balls and popups. And he was 33 years old this year, an age when many middle infielders struggle to keep up with their younger rivals.

Those numbers are indicative of a player who deserves our Fair rating. But we gave him an Average rating anyway. Why? Because he has a great reputation and because it's possible that his pitching staff did indeed make him look worse than he really is.

This is the fifth time in the past nine years that we've given Alomar a rating that's better than our analysis shows is justified. Not once in those nine years has his play-making score been far enough above the league average to merit a Very Good rating.

Every year we say to ourselves that there must be some aspect of his ability that doesn't show up in fielding studies. But don't you think that if Alomar were truly the best at his position in the history of baseball, he'd score well at least once in nine years? Is it really possible that external factors or quirks in the data would make him look worse every single year?

I know that some people will look at this rating and conclude that (a) we're vastly underestimating his ability, (b) we have something against Alomar, and/or (c) we know nothing about baseball. Looking at all of the evidence, however, I have to say that, if anything, we've been generous in how we've rated him over the years.

I'll end this commentary with a quote from The New Bill James Historical Baseball Abstract:

"[Alomar is] an overrated fielder, in my opinion; a good fielder, even a very good one, but no better than some guys who don't win Gold Gloves, like Fernando Vina."

That was written before the 2001 data was available, and I agree with Bill's assessment of Alomar's career. We're now in the late stages of that career, however, and we're seeing evidence of a decline in Alomar's play-making ability.

Other worthy candidates for the AL Gold Glove were Adam Kennedy, Ray Durham, Bret Boone, and Jerry Hairston. Kennedy was the best of this group, but started only 123 games. Nevertheless, I'd go with Kennedy.

The other league's Gold Glove went to Fernando Vina. If Pokey Reese had played the entire year at second, instead of splitting his time between second and short, he would have gotten my vote. But he didn't, and that left things open for Vina, who I nominated as my choice a year ago.

Vina had another good year, with above-average range and a low error rate, and the Cardinals were second in the NL in double play percentage. Those are solid credentials. And he played a lot more than some of the other guys (Ron Belliard, Damian Jackson, Mark Grudzielanek) who could be considered viable candidates.

Third basemen. The voters got it right at this position. Scott Rolen was so amazing that he managed to stand out in a league featuring several other very good players who had very good years. His closest rivals were Robin Ventura and Jeff Cirillo. But Rolen was so good that if there was an award for defense -- an MVP or Cy Young for defense, a single award that crosses all positions -- Rolen would be my choice for NL Defensive Player of the Year.

The AL produced three strong candidates, Eric Chavez (the winner), Corey Koskie, and David Bell. Of the three, Chavez was best in range and sure-handedness, and he played a lot more than Bell. So I agree with this selection, too.

Shortstops. As I mentioned above, the voters tend to settle on one guy and give him the award year after year as long as he doesn't blow it. By posting the second-best fielding percentage in the majors (.989, trailing only Rey Sanchez's .991), and by continuing to ply his trade with grace and style, Omar Vizquel did enough this year to keep the voters' trust, and he was rewarded with his ninth straight Gold Glove.

I'm not going to spend a lot more time writing about the Cleveland defense because I did that in the second base comment above. Suffice it to say that Vizquel's range wasn't all that good this year. If Rey Sanchez hadn't been traded out of the league, I'd nominate him, as he bested Vizquel in both range and steadiness. But Sanchez WAS traded out of the league, and in his stead, my vote goes to Toronto's Alex Gonzalez.

Interestingly, I don't recall hearing any gripes about Orlando Cabrera getting the nod in the NL. I figured that with Rey Ordonez healthy and playing a full season, some in New York would have pushed for him to get it back. But Ordonez' range was nothing special according to the measures we use, and it may be that the lingering effects of his arm and shoulder injuries affected his ability to make certain plays for at least part of the season.

On the other hand, Cabrera showed above-average range and was among the steadiest fielders in either league. Rich Aurilia also looked quite good, but in my opinion, Cabrera was a deserving winner.

Outfielders. There are a lot of good outfield candidates this year, and with one major exception, all of the winners were drawn from that pool. In other words, five of the six choices were at least in the right ballpark.

According to our analysis, five center fielders stood out this year, and all of them are in the AL. They are, from top to bottom, Chris Singleton, Kenny Lofton, Mike Cameron, Darin Erstad, and Torii Hunter. Bobby Higginson and Jacque Jones were the two left fielders who separated themselves from the pack. In right, the top performers were in the NL, with Jermaine Dye and Ichiro Suzuki being the best of the AL contenders.

The voters and I agree on Mike Cameron, so I'll focus on the voters' selection of Torii Hunter and Ichiro.

Given that center field is the most demanding outfield position and that we have a large number of deserving candidates there, I see no reason to choose a corner outfielder. Furthermore, according to our analysis, Ichiro had above-average range and an above-average arm, but he wasn't as far above average as the media would have you believe.

Ichiro's range factor was .26 above the norm, but he played behind a pitching staff that produced almost 200 more fly balls than the average AL team (according to the STATS Player Profiles book). His STATS zone rating was seven points below the major-league average for right fielders.

Nevertheless, based on his reputation and the fact that our fielding analysis shows that Ichiro would almost certainly have made more plays if he wasn't playing next to Cameron, we believe he's worthy of a Very Good rating. But we don't see evidence of Gold Glove range here.

In addition, he had only 8 assists, a below-average number for a RF who played as much as he did. And it's not as if nobody was willing to test him. Runners tried to advance on him a little less often than against the average RF, but not that much less. It does appear as if runners got a little more wary of his arm as the season progressed, but not a lot more wary. So we've rated him Very Good in throwing as well.

The media seems to be saying that Ichiro is unquestionably excellent in all phases of the game. According to our methods, he's excellent at a lot of things (hitting for average, hitting in the clutch, sacrifice bunting, running the bases, stealing bases, avoiding errors, staying healthy), very good at some things (getting to balls in right and keeping runners from taking extra bases), and below average in some ways (drawing walks, hitting for power). That's quite a package, and I'd definitely want this guy on my team. But I just don't see the evidence that he's among the top defensive outfielders in the game.

So, if Ichiro doesn't get my vote, then who does deserve the other two outfield Gold Gloves for the AL? Singleton topped the charts in plays-made-per-opportunity, but he only started 102 games. Lofton only started 123 games. Singleton and Hunter have subpar throwing arms. (Hunter tied for the league lead in assists by a CF with 14, but several of those came on plays where the lead runner scored, and he allowed lots of runners to take extra bases.) Hunter plays in a tough park -- it's easy to lose balls in the Metrodome roof -- so he's better than his numbers suggest, and his numbers are very good to begin with. Erstad made only one error all season, leading all major-league CFs in fielding percentage.

It's a very close call, but there are some big differences in playing time to consider. Performance rates are very important, but when it comes to seasonal awards, the volume of performance is more important. So when someone performs at a high level for 145 games, that trumps someone else who performed at a slightly higher level for 120 games. On that basis, my other two votes would go to Erstad and Hunter.

Over in the NL, the top candidates (in my mind) were Geoff Jenkins in left, Andruw Jones in center, plus Larry Walker, Vladimir Guerrero, and Brian Jordan in right. J. D. Drew would have been on this list were it not for the injury that cost him about 50 games. The voters chose Walker, Jones, and Jim Edmonds.

I agree with the selections of Walker and Jones, but in my opinion, either Jenkins or Guerrero would have been a much better choice than Edmonds. Jenkins is a terrific left fielder, but I have to give it to Guerrero because (a) Jenkins started only 104 games, (b) Guerrero showed great range too, and (c) Guerrero has a cannon for an arm. Guerrero does make too many errors, but his range and arm more than compensate for them.

Jim Edmonds has made some of the most amazing plays I have ever seen, but he simply doesn't cover as much ground as some of the younger players at this position. This year, he was below average in range factor and the STATS zone rating, and according to our method, made 16 fewer plays than the average CF given the opportunities presented to him. He battled groin, toe and knee problems, and he's starting to get up in years. I just don't see any reason to believe that he's a more valuable outfielder than the other guys I mentioned.

Recap. Here's how my selections would agree or disagree with those of the voters:

Pos   Voters                        Diamond Mind
P     Mussina, Maddux               Garcia, Rueter
C     Rodriguez, Ausmus             same
1B    Mientkiewicz, Helton          same
2B    Alomar, Vina                  Kennedy, Vina
3B    Chavez, Rolen                 same
SS    Vizquel, Cabrera              Gonzalez, Cabrera
OF    Cameron, Walker               same
OF    Hunter, Jones                 same
OF    Ichiro, Edmonds               Erstad, Guerrero


We agree on twelve of the eighteen selections. I haven't been keeping track, but I'm guessing this represents the highest rate of agreement since we began doing this.

Other players

Now that we've offered our two cents' worth on the Gold Glove winners, there are some other players worth mentioning:

Bobby Abreu, RF -- According to our system, Abreu's play-making scores have been very erratic lately -- quite good through 1998, subpar in 1999, very good in 2000, and average this year. Looked at in the context of the past three seasons, it now seems as if the Excellent rating we assigned for his performance last year was generous, even though he was clearly in the top tier statistically that season. I'm at a loss to explain these ups and downs.

Craig Biggio, 2B -- This former Gold Glover missed the last two months of the 2000 season with a knee injury that required surgery. In January, his general manager warned that Biggio's range and baserunning ability would most likely be limited, especially early in the year. Those comments proved to be accurate, as Biggio's range was far below its previous level and he stole only seven bases, down from 50 only three years ago. His baserunning instincts are still good, so he was a little above average in that regard, but nowhere near the Excellent level he sustained before he hurt his knee.

Tony Clark, 1B -- A great athlete who has earned our Very Good rating for defense the past two years, Clark has been battling back problems that have kept him out of the lineup and hurt his power and defense. We downgraded his range rating to Fair as a result, but if he regains his health, you can expect it to rebound next year.

Ken Griffey, CF -- Spent much of the season trying to play despite a torn hamstring and its after-effects, and it clearly showed. In a little more than half a season of playing time, Griffey made ten fewer plays than the average CF, thereby earning a Fair rating. Expect that to rise next year if he's back at 100%.

Derek Jeter, SS -- I know we're going to take some heat from New York fans on this one, but I assure you that there is no bias in our decision to assign Jeter a Fair range rating this year.

According to our analysis, Jeter made 32 fewer plays than the average shortstop given the opportunities presented to him. He was below average going to his right, below average going to his left, and below average on balls hit more or less at his position. His STATS zone rating was fifty points below average. His range factor was lowest in the majors among those who played at least 100 games at the position. At one time, Scott Brosius's superior range affected Jeter's numbers, but Brosius has declined from Excellent to Average in recent years and is no longer a factor in evaluating Jeter.

The New York infield ranked 10th in the league in the percentage of ground balls that were turned into outs. And it was 13th in double play percentage. Alfonso Soriano probably deserves most of the blame for the low DP rate, but if Jeter was an outstanding fielder, he would have compensated for Soriano's limitations to some extent, and the team would have been closer to the league average.

In his defense, he played behind a staff that produced 5% fewer ground balls than the average team, so his range factor was artificially depressed. Take that into account, and Jeter's range factor would have been only the second- or third-worst in the majors. And, of course, in the playoffs, he made a couple of very heady and gutsy plays that had everyone talking about his courage, his will to win, and his intelligence.

But a couple of attention-getting plays aren't enough, in my opinion, to offset the mountain of evidence indicating that Jeter simply didn't get to as many balls as most of the other shortstops in the game.

Ryan Klesko, 1B -- Earlier in his career, before he was traded to San Diego, Klesko didn't show much range at first base in the limited amount of time he played that position for Atlanta. In 2000, he showed average range in his first full season as a 1B. We gave him an Average rating for that performance, even though we weren't certain that he had improved that much. But there was a major drop this year, and his Poor (Pr) rating reflects that. Klesko has surprised a lot of people by stealing 23 bases in each of the past two seasons, but his career record is quite poor in both left field and at first base, so it seems as if his 2000 season was the anomaly.

Carlos Lee, LF -- Different fielding metrics suggest that Lee's range in left was anywhere from a little above average to a little below average. Yet his defense was sharply criticized in Sports Illustrated's pre-season baseball issue and again late in the season in a Baseball Weekly note. He was replaced defensively 39 times, and that normally happens only to players who are major liabilities in the field. In this case, however, the guys replacing him were superior defenders like Chris Singleton, so it doesn't necessarily mean that Lee was terrible, only that the other guys were better. We asked several people who follow the Sox, and their opinions ranged from "he's under-rated" to "he looks awkward but gets the job done" to "he's as bad as they say." We've chosen to assign him an Average rating this year. That may be a little generous, and I wouldn't be surprised if he slips back to a Fair rating next year.

Raul Mondesi, RF -- Has a very good reputation for defense, but that's mostly based on his great arm. In terms of range, our analysis shows that he's been slightly above average throughout his career. In the spring, it was reported that Mondesi came to camp carrying some extra weight, and his defensive numbers took a big dive. Coincidence? Maybe, but we felt a Fair rating was an accurate reflection of his 2001 performance. He could easily rebound next year.

Todd Zeile, 1B -- A year ago, we wrote that his Excellent range came as a complete surprise even though third basemen often move across the diamond and look very good relative to the men who play first. But we were skeptical. He's never had a reputation as a good fielder, and we wondered whether he'd be able to keep it up. He didn't, so it may be that last year was a fluke or a case where the various fielding measures over-stated his value for some reason. We rated him Average this year.

Can pitchers prevent hits on balls in play?

Tom Tippett
July 21, 2003

In January, 2001, Voros McCracken published an article that shook the baseball analysis community.

In an attempt to better understand how to separate the contributions of pitching and defense, McCracken divided the traditional pitching stats into two groups -- those that are under the direct control of the pitcher (hit batsmen, walks, strikeouts, homers) and those that aren't (hits on balls in play). He called the first group defense-independent pitching stats, or DIPS for short.

I'll get into the details shortly, but before I do, the reason McCracken's work caused such a stir is that he reached a conclusion that seems very counter-intuitive and, if true, extremely important. In his own words, he stated his major finding in these two ways, once at the beginning of the article and once at the end:

"hits allowed are not particularly meaningful in the evaluation of pitchers"

"major-league pitchers don't appear to have the ability to prevent hits on balls in play"

McCracken wasn't able to give a reason why this would be true, but stated rather emphatically that it is true.

Ever since I read that article, I've been wondering how this could possibly be. It seems so obvious that certain pitchers must be able to get more than their share of easy outs. Doesn't Greg Maddux produce more than his share of routine ground balls? Doesn't Mariano Rivera's cutter eat up opposing hitters even when they don't strike out? Doesn't a flame-thrower like Roger Clemens induce a lot of weak swings from hitters who are down in the count? Wouldn't a knuckleball lead to more lazy popups from hitters who are just guessing at where that pitch will dance next?

McCracken's analysis used a stat that I'll call in-play average (or IPAvg), which he defined as (H - HR) / (BF - HR - HBP - BB - K). That's just non-homer hits divided by balls in play, and because all but a handful of homers leave the yard, it's a good reflection of how well pitchers and defenses are able to turn batted balls (that stay in the field of play) into outs.
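As a minimal sketch, the formula above can be written as a small Python function. The function name and the sample season line are invented for illustration and do not come from McCracken's article or from any real pitcher.

```python
def in_play_average(h, hr, bf, hbp, bb, k):
    """Non-homer hits divided by balls in play: (H - HR) / (BF - HR - HBP - BB - K)."""
    balls_in_play = bf - hr - hbp - bb - k
    if balls_in_play <= 0:
        raise ValueError("pitcher has no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical season line, made up for illustration only.
example = in_play_average(h=200, hr=20, bf=900, hbp=5, bb=60, k=150)
print(f"{example:.3f}")
```

Here the pitcher allowed 180 non-homer hits on 665 balls in play.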

He found that:

  • there are "massive differences in the ability of pitchers" even before considering balls in play. To put it another way, a lot of a pitcher's ERA is explained by his walk rate, strikeout rate, and ability to prevent homers.
  • the correlation between a pitcher's IPAvg one year and the next is low, suggesting that pitching ability might not have a major impact on IPAvg, as compared to other factors such as defense and luck
  • some of the best pitchers in the game, such as Greg Maddux and Pedro Martinez, have gone from the top to the bottom and back to the top in IPAvg in subsequent seasons, again suggesting that these results are largely out of their control
  • the variations in IPAvg decrease when you add park effects and the quality of the defense to the analysis
  • projections of next-year pitching stats are more accurate if you use a team's collective IPAvg than if you use each pitcher's personal IPAvg from the year before

My reaction was to think that McCracken was on to something but may have gone too far, so I began to think about how to dig a little deeper.

McCracken appears to have done most of his work using stats from two seasons. I wasn't sure whether those two seasons were representative or not, so I decided to apply his method to all pitcher-seasons since 1913. Why 1913? Because that's the first year my historical database has all of the stats needed to compute IPAvg and the DIPS for every pitcher. And I figured that 90 years would be more than enough to prove the point one way or the other.

After compiling this information and studying it for a while, I discovered a pair of columns by Rob Neyer of ESPN.com. In the first column, Rob described the McCracken article. In the second one, which appeared a couple of days later, Rob included email messages from Craig Wright and Bill James with their take on McCracken's assertion.

Wright described his own work in this area:

"Like McCracken, I've studied hits allowed per ball in play (though with the small difference that I subtract sacrifice hits) ... I agree that this type of hit rate is not as heavily influenced by the pitcher as is commonly believed, but at the same time I am distinctly uncomfortable with McCracken's conclusion."

James wrote that he hadn't studied this issue, but that he shared Wright's reservations and suggested that someone do a large-scale study to find out whether the idea would hold up. It appears that the work I had just finished doing was exactly what Bill was proposing.

In addition, Bill wrote about McCracken's work in the New Bill James Historical Baseball Abstract. Based on a review of an unspecified number of pitching careers and about 400 pitcher-seasons, he concluded that pitchers do have an influence on these outcomes but confirmed McCracken's finding that there's still a lot of random variation in single-season performances.

Finally, in recent months, I've seen more and more references to McCracken's assertion in various baseball articles and posts to baseball research forums. There's enough momentum building behind this idea that a few of our customers have asked how we might change the design of our Diamond Mind Baseball game to reflect this new knowledge about how baseball works.

Before making any changes to our game or our method for projecting player performance, I figured it was worth spending some time looking at this question.

NOTE: In an article published on Baseball Primer last year, McCracken softened his original conclusion a little, saying that there are small differences among pitchers in their ability to prevent hits on balls in play, and those differences are "statistically significant if generally not very relevant." Except for the regulars on Baseball Primer, I don't think many people in the baseball research community are aware of this update to McCracken's thinking.

The methodology

For every pitcher who appeared in the big leagues since 1913, I computed his HBP rate, walk rate, strikeout rate, homerun rate, and IPAvg for each of his seasons. The first four numbers are computed quite simply -- take the relevant stat and divide by batters faced. The IPAvg figures were computed according to McCracken's formula, which I wrote out a few paragraphs back.

To establish a baseline against which to evaluate those figures, I also computed those stats for each league-season and each team-season since 1913.

This enables us to evaluate every pitcher relative to the norms for his league. Last year, for example, Roger Clemens faced 768 batters and fanned 192 of them. That's a strikeout rate of .250 in a league where the average was only .163. His advantage over the league can be stated in two ways: (a) his rate was .087 higher than the league, and (b) he had 67 more strikeouts than the league-average pitcher would have had if he faced the same number of batters as Clemens. The same method was used to determine how many hit batsmen, walks, and homeruns each pitcher yielded above or below the league average.
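The comparison with the league can be sketched in a few lines of Python, using the Clemens figures from the paragraph above. The function name and return layout are assumptions made for this illustration.

```python
def versus_league(events, batters_faced, league_rate):
    """Return the pitcher's rate, the rate difference, and the count difference
    relative to a league-average pitcher facing the same number of batters."""
    rate = events / batters_faced
    rate_diff = rate - league_rate
    count_diff = events - league_rate * batters_faced
    return rate, rate_diff, count_diff


# Clemens: 192 strikeouts in 768 batters faced, league strikeout rate .163.
rate, rate_diff, count_diff = versus_league(192, 768, 0.163)
print(f"rate={rate:.3f}  vs league={rate_diff:+.3f}  extra strikeouts={count_diff:+.0f}")
```

The same function would apply to hit batsmen, walks, and homeruns by swapping in the relevant count and league rate.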

For balls in play, I compared the in-play batting average for each pitcher and subtracted from that the corresponding in-play batting average for the league. As was the case with strikeouts, the result can be expressed either as a number of batting average points above/below the league or a number of hits above/below the league.

But hits on balls in play are subject to some outside influences that make comparisons with the league average a little suspect. Some parks (like Coors Field) tend to inflate batting averages. Some defenses are much better than others. If Jamie Moyer allows 15 fewer hits than normal, how can we decide whether to give Moyer the credit or chalk it up to Safeco Field and the talents of Mike Cameron and Ichiro?

To account for the effects of park and defense, I also computed the in-play average for each team-season in the period from 1913 to 2002. If McCracken is correct when he says that pitchers have virtually no influence over these outcomes, every pitcher on a given team should have roughly the same IPAvg. After all, those pitchers share a common park and a common defense.

If we then (a) compute the IPAvg for each team, (b) compare the IPAvg for each pitcher to that of his team, and (c) study those differences, we should find that the differences in IPAvg between a pitcher and his teammates are random. In other words, those differences should be centered around zero, equally likely to be above zero as below zero, and have no predictive value from one year to the next.

If we find that these differences are not random, there must be another factor, apart from defense and park effects, that accounts for them. And it follows that the missing factor must be an attribute of the pitcher. Because if the pitcher had nothing to do with it, there'd be no reason for that external factor to be evident only for this pitcher.

Studying career totals

At this stage of the process, we now know how much a pitcher exceeded or fell short of his league in five categories -- HBP, BB, K, HR and hits on balls in play -- for every season of his career. And we also know how much a pitcher exceeded or fell short of his teammates on in-play hits for every season of his career. The last step is to sum these values to obtain career totals (from 1913 forward) for every pitcher.
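A rough sketch of that summing step for in-play hits, in Python. The `Season` record, its field names, and the two sample seasons are hypothetical. How the team baseline is built (for example, whether it includes the pitcher's own balls in play) is not specified here, so the caller supplies it.

```python
from dataclasses import dataclass


@dataclass
class Season:
    balls_in_play: int    # pitcher's balls in play that season
    in_play_hits: int     # pitcher's non-homer hits allowed
    league_ipavg: float   # league in-play average that season
    team_ipavg: float     # team in-play average that season (caller-supplied)


def career_net_hits(seasons):
    """Sum season-by-season in-play hits above (+) or below (-) each baseline."""
    vs_league = sum(s.in_play_hits - s.league_ipavg * s.balls_in_play for s in seasons)
    vs_team = sum(s.in_play_hits - s.team_ipavg * s.balls_in_play for s in seasons)
    return vs_league, vs_team


# Two made-up seasons, for illustration only.
career = [
    Season(balls_in_play=600, in_play_hits=160, league_ipavg=0.290, team_ipavg=0.280),
    Season(balls_in_play=650, in_play_hits=170, league_ipavg=0.285, team_ipavg=0.275),
]
vs_league, vs_team = career_net_hits(career)
print(f"vs league: {vs_league:+.1f}  vs team: {vs_team:+.1f}")
```

A negative total means the pitcher allowed fewer in-play hits than the baseline would predict over the same number of balls in play.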

McCracken asserted that pitchers have a lot of control over the defense-independent pitching stats, so I would expect to see substantial differences among pitchers in their career HBP, walk, strikeout, and homerun rates, even after normalizing all of these figures against the league averages for each season.

After crunching the numbers for a total of 29,973 seasons by 6,004 pitchers, we did indeed find very large differences among pitchers in some of the defense-independent statistics, especially walks and strikeouts. That's not likely to surprise any of you. It didn't surprise me, and it's entirely consistent with McCracken's findings.

More importantly, McCracken asserted that pitchers have almost no control over balls in play. If he's right, we would expect to see essentially random values for the career rates of in-play hits, especially for net in-play hits relative to the team baseline.

 

But we also found meaningful differences in the number of hits allowed on balls in play. In other words, a large number of pitchers consistently demonstrated the ability to limit the number of those hits. Their influence on these outcomes isn't as great as it is on the defense-independent stats, but it is real, and it is large enough to be important.

Here's a partial list of the top pitchers based on the number of career hits they saved relative to the IPAvg of their teams. The list includes two figures for each pitcher, the first without adjustments for park and defense and the second with those adjustments:

Pitcher            IPHits vsLg  IPHits vsTm
-----------------  -----------  -----------
Charlie Hough          -371         -299
Walter Johnson         -277*        -214*
Tom Seaver             -269         -201
Catfish Hunter         -296         -185
Warren Spahn           -266         -183
Fergie Jenkins         -128         -182
Pete Alexander         -197*        -177*
Phil Niekro            -147         -172
Jim Palmer             -315         -170
Ned Garver              -71         -168

* excludes seasons before 1913

Charlie Hough has prevented more hits on balls in play than any other pitcher in our study, and our sample includes the last ninety years, so we've covered most of baseball history. Compared with the league-average pitcher, Hough has allowed 371 fewer hits on balls in play. Compared with his teammates, that figure drops to 299 hits, suggesting that his parks and defenses deserve some of the credit.

How important is 299 hits? Hough would have given up an extra run every three games or so if he had allowed hits on balls in play at the same rate as his teammates over the course of his career. That's a pretty big deal.

Could this happen by chance? No, it couldn't. Hough allowed batters to put 11,586 balls in play over the course of his career. If these results were random, there'd be a 95% chance that his net hits allowed would fall between +93 and -93 and a 99% chance they would fall between +116 and -116. The probability that a pitcher could reduce hits by 299 totally by chance is exceedingly small. (For the statisticians among you, Hough was more than six standard deviations from the mean.)
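One way to sanity-check the chance argument is to treat each ball in play as an independent trial with a fixed hit probability (a simple binomial model). Both that model and the .270 baseline below are assumptions made for illustration. The exact method and baseline behind the figures above are not stated, so the bands will not necessarily match them to the hit.

```python
import math


def binomial_sd(balls_in_play, hit_prob):
    """Standard deviation of the hit count under a simple binomial model."""
    return math.sqrt(balls_in_play * hit_prob * (1 - hit_prob))


BALLS_IN_PLAY = 11_586   # Hough's career balls in play, from the text
NET_HITS = -299          # hits saved relative to teammates, from the text
ASSUMED_IPAVG = 0.270    # assumption: a plausible baseline, not taken from the text

sd = binomial_sd(BALLS_IN_PLAY, ASSUMED_IPAVG)
band_95 = 1.96 * sd      # half-width of the two-sided 95% range
z_score = NET_HITS / sd  # distance from zero in standard deviations

print(f"sd={sd:.1f}  95% band=+/-{band_95:.0f}  z={z_score:.1f}")
```

Under these assumptions the 299-hit gap sits more than six standard deviations from zero, in line with the parenthetical above.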

And Hough wasn't the only one, not by a long shot. In a sample of 351 pitchers with at least 6000 career balls in play, more than 12% of them posted results that would happen less than 1% of the time by chance. And that understates the case, too, because you get to keep pitching if you're that much better than the league, but you usually don't make it to 6000 balls in play if you're that much worse than the league. If one end of the distribution hadn't been truncated by job losses, approximately 20% of those pitchers would have fallen outside the range that can be explained by chance.

There are two knuckle-ballers on this list, and while you can't see it here, I can tell you that if I had run this list a little further, you'd have seen 6 knuckle-ballers in the top 35. (The other four are Eddie Rommel, Ted Lyons, Hoyt Wilhelm and Tim Wakefield.)

NOTE: The observation that knuckleball pitchers are especially good in this area is not new. Craig Wright noted the same thing in his email to Rob Neyer in January, 2001, and McCracken made this point in an article on Baseball Primer last year.

 

Some pitchers got a lot of help from their defense and park -- almost half of Jim Palmer's hits saved can be attributed to his defense (mostly) and his park -- while others look even better after the defense/park adjustment.

Of course, when you rank players based on counts, rather than averages, you're going to see a lot of guys with very long careers at the top of the list. So let's rank them again, this time dividing career hits saved by career balls in play, and setting a minimum of 5000 balls in play:

Pitcher            IPAvg vs Lg  IPAvg vs Tm
-----------------  -----------  -----------
Charlie Hough         -.032       -.026
Don Wilson            -.015       -.023
Andy Messersmith      -.033       -.021
Ned Garver            -.008       -.020
Tim Wakefield         -.020       -.019
Catfish Hunter        -.028       -.017
Bud Black             -.020       -.017
Oral Hildebrand       -.015       -.017
Walter Johnson        -.021       -.016
Dave Stieb            -.022       -.016

Hough remains the career leader by holding enemy hitters to an in-play batting average that was 26 points lower than that of the pitchers on his teams. That's a very substantial advantage, and one that is entirely inconsistent with McCracken's conclusion.
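The rate version of the ranking is just hits saved divided by balls in play. As a quick check, this sketch uses Hough's figures quoted earlier (11,586 balls in play, 371 hits saved versus the league, 299 versus his teams); the function name is only illustrative.

```python
def in_play_avg_diff(hits_saved, balls_in_play):
    """Net hits on balls in play expressed as a batting-average difference.
    Negative means fewer hits allowed than the baseline."""
    return hits_saved / balls_in_play

HOUGH_BALLS_IN_PLAY = 11586  # career total, from the text

vs_league = in_play_avg_diff(-371, HOUGH_BALLS_IN_PLAY)
vs_team = in_play_avg_diff(-299, HOUGH_BALLS_IN_PLAY)

print(f"vs league: {vs_league:+.3f}")
print(f"vs team:   {vs_team:+.3f}")
```

Rounded to three places, these match the -.032 and -.026 shown for Hough in the table.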

To recap, this examination of career totals suggests very strongly that a meaningful number of pitchers have demonstrated the ability to reduce the rate of hits on balls in play.

Year-to-year variations, part one

By comparing the results for two seasons, McCracken concluded that "there is little correlation between what a pitcher does one year in the stat and what he will do the next." I'll start by looking at a few of the pitchers mentioned in the McCracken article, then expand the study and get a little more scientific.

McCracken pointed out that Greg Maddux had one of the league's best marks in 1998, then had one of the worst in 1999, and bounced back with a good in-play average in 2000. The following chart shows his entire career, with bars going up indicating an IPAvg that was worse than average and the bars going down indicating a lower-than-average rate of hits on balls in play:

The wild swings of 1998-2000 look like an anomaly when you examine Maddux's entire career. In fact, it appears that he struggled a bit as a youngster, reeled off a decade of good-to-great performances, then began to lose it as he got into his mid-30s. That sounds like a pretty normal career progression to me.

Pedro Martinez was another pitcher who gave up a lot of in-play hits in 1999 but bounced back in 2000. It should be noted that Pedro had a 2.07 ERA despite all those in-play hits in 1999, so we can only imagine what he would have done if he'd been a little less unlucky. Here's Pedro's career:

There's really only one bad year in this line, but it happened to fall in one of the years McCracken looked at. I think it's fair to say that Pedro has shown an above-average ability to prevent hits on balls in play, but his influence on these results is much less than on strikeouts, where he consistently mowed down an extra 90 or more hitters a year, and an incredible 181 more than average in 1999.

McCracken wrote that "You'll often hear people use names like Randy Johnson, Jamie Moyer and Andy Pettitte [as being very good at preventing hits on balls in play], but by any definition you want to use, these guys are not particularly good in the stat." Here's Moyer's career:

Moyer wasn't very good in this respect, or in most other respects, for the first half of his career. But he figured something out in 1996 and has been consistently better than the league ever since, with the exception of 2000. If I were McCracken and I were looking at the 1999 and 2000 seasons, I would have concluded that Moyer isn't particularly effective in preventing hits, but his last seven years say otherwise.

By the way, it's tempting to assume that Safeco Field and a very good Seattle defense are responsible for these recent successes, but that wouldn't be true. First of all, the 1996-1999 numbers were accumulated in a mix of Fenway Park, the Kingdome, and Safeco, with only the second half of 1999 in Safeco. More importantly, these numbers are relative to the in-play average for his teams, so they already factor out the impact of the park and the defense. The bottom line is that Jamie Moyer has been a master at preventing hits on balls in play since 1996.

How about Andy Pettitte? Here's his career:

McCracken was quite correct in pointing out that Pettitte is not a pitcher who prevents hits on balls in play. On the other hand, he's a very good counter-example regarding the claim that pitchers are not consistent in this regard.

Randy Johnson is the third pitcher mentioned by McCracken in the quote I cited above. Here's how Johnson has fared on balls in play over his career:

That's nine straight seasons at or better than the league average, followed by five seasons that were league-average or worse. The shift occurred at the very moment that he moved from the AL to the NL. I'm not sure whether that's meaningful, or whether it has more to do with the fact that he turned 35 in 1998. As with Pedro, Johnson's main asset is not his ability to prevent hits on balls in play; it's his ability to prevent balls in play in the first place. But Johnson was pretty good on those balls in play for nine years.

McCracken also claimed that "Randy Johnson gives up fewer hits than Scott Karl. That's not because batters hit the ball harder off Karl than Johnson, but because they hit the ball more often off Karl than Johnson." Here's Karl's career:

You might be able to make the case that Karl in his prime wasn't any worse than Randy Johnson in his late 30s, but if you compare the two pitchers at the same age, there's a noticeable edge for Johnson.

While we're on the subject of consistency from year to year, let's take a look at some of the knuckleballers, starting with Charlie Hough:

This chart is a little misleading in one respect. There are two bars for 1980, one for each of the teams he played for that year. Hough's IPAvg was awful in his 32 innings with the Dodgers and quite good in his 61 innings with Texas. Overall, he was a little worse than average for the year. The bottom line is that Hough was remarkably good at preventing hits on balls in play for a very long time.

Here's another knuckleballer, Tim Wakefield:

And a third knuckleballer, Phil Niekro:

Hough and Wakefield were remarkably good throughout their careers, and if you ignore the years after his 43rd birthday (1983 to the end), you could say the same about Niekro, too.

Number two on the all-time list was Walter Johnson, whose career looked like this:

Remember, I cut things off at 1913, so this leaves out his early years. It's quite possible that he would have been the all-time leader if those seasons had been included.

Sandy Koufax got some help from Dodger Stadium, but that wasn't the only reason he was so dominant during the last five years of his career. Even with the park and defense factored out, his IPAvg was consistently good during those years:

Finally, here's Jim Palmer, another Hall-of-Famer who was consistently good on balls in play during his career, except for the very beginning and end of his time in the big leagues:

If I had run Palmer's chart showing his performance relative to the league average (instead of his team), it would have been twice as impressive.

We could go on and do a lot more pitchers, but I think we've seen enough to make the point that it's not too hard to find examples where these in-play averages appear to be anything but random. In other words, this is highly persuasive evidence that these pitchers did indeed have the ability to prevent hits on balls in play.

Year-to-year variations, part two

It goes without saying that one cannot prove or disprove the idea that "there is little correlation between what a pitcher does one year in the stat and what he will do the next" by examining only ten or twelve careers.

To get a better handle on this phenomenon, I compiled a database consisting of all pairs of consecutive seasons in which a pitcher faced at least 400 batters in each season. Using this sample of 7,486 season-pairs, I computed the correlation coefficient for the net HBP rate, BB rate, K rate, HR rate, and in-play hit rate.

I found the highest correlation (.73) for strikeout rates. Walk rates (.66) were also highly correlated. The correlation coefficients dropped to .36 for hit batsmen, .29 for homeruns, and .16 for in-play batting average relative to the league. The lowest correlation (.09) was seen for in-play batting average relative to the team.
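To make the method concrete, here is a minimal sketch of the correlation calculation for one statistic. The six season pairs are made-up illustrative values, not data from the study, and `pearson` is simply a hand-written version of the standard correlation coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical season pairs: a pitcher's net strikeout rate (relative to
# the league) in one season and in the following season.
season_pairs = [
    (0.045, 0.038), (-0.020, -0.012), (0.010, 0.018),
    (-0.035, -0.030), (0.060, 0.041), (0.000, -0.008),
]

year1 = [first for first, _ in season_pairs]
year2 = [second for _, second in season_pairs]
r = pearson(year1, year2)
print(f"year-to-year correlation: {r:.2f}")
```

The same function would be applied to each of the five net rates in turn, using every qualifying pair of consecutive seasons.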

It may appear to be contradictory to say that certain pitchers appear to be consistently good while the overall correlation rate is quite low. But that's not necessarily so.

If McCracken is right, the difference between a pitcher's IPAvg and that of his team should vary randomly around zero as he moves through his career, and the correlation would be quite weak.

But if pitchers do have some influence over these outcomes, they could still exhibit a weak correlation by varying around some value other than zero that reflects the ability of the pitcher.
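A small simulation illustrates how real skill and a weak year-to-year correlation can coexist. Everything here is an assumption chosen for illustration: a .285 league in-play average, true talent spread of eight points from pitcher to pitcher, and 500 balls in play per season. None of these values comes from the study.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_season(true_avg, balls_in_play, rng):
    """Observed in-play average for one season of independent trials."""
    hits = sum(1 for _ in range(balls_in_play) if rng.random() < true_avg)
    return hits / balls_in_play

rng = random.Random(2003)   # fixed seed so the run is repeatable
LEAGUE_AVG = 0.285          # assumed league in-play average
SKILL_SD = 0.008            # assumed spread of true talent
BALLS_IN_PLAY = 500         # assumed balls in play per season
N_PITCHERS = 2000

year1, year2 = [], []
for _ in range(N_PITCHERS):
    # Each pitcher has a fixed true level that differs from the league.
    true_avg = LEAGUE_AVG + rng.gauss(0.0, SKILL_SD)
    year1.append(simulate_season(true_avg, BALLS_IN_PLAY, rng))
    year2.append(simulate_season(true_avg, BALLS_IN_PLAY, rng))

r = pearson(year1, year2)
print(f"simulated year-to-year correlation: {r:.2f}")
```

Under these assumptions every simulated pitcher has a genuine, unchanging skill level, yet single-season noise is large enough that the correlation between consecutive seasons stays low.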

What about the weaker pitchers?

Most of our work to this point has focused on pitchers who had long and mostly successful careers in the big leagues. How do the DIPS and IPAvg stats of these players compare to those of players who weren't good enough to last that long?

The following table shows how eleven groups of pitchers compared with the overall averages. The first row includes all pitchers who faced less than 1,000 batters in their careers. The second row includes all pitchers who faced at least 1,000 batters but less than 2,000 batters during their careers. And so on.

 Career BF          BF    HBP     BB      K     HR   vsLg   vsTm
   1 -  999    401,138   .002   .027  -.017   .002   .017   .015
1000 - 1999    931,981   .001   .013  -.009   .001   .006   .004
2000 - 2999  1,105,712   .001   .007  -.005   .000   .002   .001
3000 - 3999  1,179,916   .000   .006  -.003   .000   .000   .000
4000 - 4999    906,271   .000   .002  -.002   .000   .000   .001
5000 - 5999    920,680   .000   .001   .000   .000   .000   .000
6000 - 6999    647,553   .000  -.004  -.002   .001  -.001  -.001
7000 - 7999    843,937   .000  -.003   .000   .000  -.002  -.001
8000 - 8999    716,200  -.001  -.005   .005   .000  -.002  -.002
9000 - 9999    788,532   .000  -.008  -.001  -.001  -.002  -.001
10000+       2,589,409  -.001  -.010   .008  -.001  -.004  -.003

Let's walk through the first row so it's clear how to read this table. Those pitchers, as a group:

  • faced a total of 401,138 batters in their careers
  • hit batters at a rate that was .002 above the league average. In other words, they hit two more batters per 1000 BF than did the average pitcher.
  • walked 27 more batters per 1000 BF
  • struck out 17 fewer batters per 1000 BF
  • gave up 2 more homers per 1000 BF
  • gave up 17 more hits per 1000 balls in play when compared with the league-average pitcher
  • gave up 15 more hits per 1000 balls in play when compared with the in-play averages of their teammates

As you can see from the table, the pitchers with longer careers were progressively better than their shorter-career counterparts in every respect. They walked fewer batters, struck out more hitters, gave up fewer homeruns, and gave up fewer hits on balls in play. The ability to prevent hits on balls in play appears to be as much of a skill as anything else.

It might be easier to see this in chart form, so here are the walk rate, strikeout rate, homerun rate, and in-play averages for these groups of pitchers:

Another interesting aspect of this breakdown by career length is the total number of batters faced by each group. Only a very small percentage of batters are faced by pitchers with short careers. Of the roughly 11 million plate appearances since 1913 (including the Federal League of 1914-15), only 3.6% featured pitchers who finished their careers with less than 1000 batters faced.

In fact, the midpoint falls in the 6000-6999 group. A little more than half of the plate appearances since 1913 have been initiated by a pitcher who faced at least 6000 hitters in his career. We, along with other baseball analysts, often compare pitchers to the league average. Those league averages reflect the fact that the majority of plate appearances involve pitchers who are good enough to face thousands of big-league hitters.
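Both figures can be checked directly from the BF column of the table. This sketch copies the eleven group totals and finds the short-career share and the group containing the midpoint; the variable names are only illustrative.

```python
# Batters faced by career-length group, copied from the table.
career_bf_groups = [
    ("1-999", 401138),
    ("1000-1999", 931981),
    ("2000-2999", 1105712),
    ("3000-3999", 1179916),
    ("4000-4999", 906271),
    ("5000-5999", 920680),
    ("6000-6999", 647553),
    ("7000-7999", 843937),
    ("8000-8999", 716200),
    ("9000-9999", 788532),
    ("10000+", 2589409),
]

total_bf = sum(bf for _, bf in career_bf_groups)
short_career_share = career_bf_groups[0][1] / total_bf

# Walk up the groups until the running total passes half of all batters faced.
running = 0
midpoint_group = None
for label, bf in career_bf_groups:
    running += bf
    if running >= total_bf / 2:
        midpoint_group = label
        break

print(f"total batters faced: {total_bf:,}")
print(f"share faced by the shortest careers: {short_career_share:.1%}")
print(f"midpoint falls in group: {midpoint_group}")
```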

That's a very high standard. And that may explain why it's difficult for any pitcher to consistently perform at a level higher than the league average. The table shows that the pitchers with the longest careers are only a little better than average. (They peak at a higher level, of course, but if you take their entire careers, there's not a huge difference.)

A better indicator may be the comparison of the short-career pitchers to the league averages. The chart shows that these marginal hurlers are far worse than the average in every way. In particular, they give up a lot more hits on balls in play than do the pitchers who are good enough to be big-league regulars for several years.

What's the right baseline?

At this point, we've seen (a) career totals that demonstrate that pitchers do influence these outcomes over the course of their careers, (b) several examples of pitchers who have been very consistent in IPAvg during their careers, and (c) that pitchers with longer careers are better than pitchers with shorter careers in every respect, including IPAvg.

In other words, pitchers do affect the rate of hits on balls in play. That means we can no longer use the team's IPAvg as a baseline against which to evaluate a pitcher. McCracken asserted that the team's IPAvg depended only on the park and the defense, but we've found that it depends on the park, the defense, and the quality of the pitchers on that team. If we use team IPAvg as the baseline, a good pitcher on a good staff is going to look worse than he really is. A good pitcher on a bad staff is going to look better than he really is. A good pitcher on an average team is still going to look a little worse than he really is because his own good performance is included in the team's IPAvg.

That leads to a good question, one that is not easily resolved. Is it better to compare a pitcher's IPAvg to that of his league or his team? If we use the league IPAvg as our baseline, we leave out the impact of the park and the defense. If we use the team's IPAvg as the baseline, we adjust for the park and the defense, but we introduce the quality of the fellow pitchers as a variable that can skew the results.

Neither approach is completely satisfactory. It's probably best to evaluate each pitcher's IPAvg against that of his team but make some accommodation for the quality of the pitching staff before making any judgments about that pitcher and before making any predictions about future performance.

Pitcher Profiles

In addition to ranking pitchers on IPAvg, this exercise provides a different way of looking at pitching careers. By putting each pitcher's career totals for net HBP, BB, K, HR, and IPHits side by side, we get a very clear picture of the reasons why they were successful.

Let's do a few, starting with Roger Clemens:

How's that for a picture of all-around greatness? Sure, he hit a few more batters than the average pitcher, but compared to the league averages, he walked 173 fewer and struck out 1,355 more, allowed 138 fewer homers, and surrendered 101 fewer hits on balls in play. (The IPHits figures include the defense/park adjustments for all of these profiles.)

Pedro Martinez shows a very similar pattern to that of Roger Clemens, but based on less than half of Clemens' batters faced.

 

Greg Maddux demonstrates awesome control, an above-average K rate, and the ability to keep the ball in the park. He had some influence on IPAvg, but that was only a part of his success.

By the way, some of those 69 hits saved might be attributable to his own defensive skill rather than his pitching skill. It's also quite possible that the -69 figure significantly understates his contribution. Maddux saved 97 hits relative to the league averages, and now that we've shown that the team IPAvg reflects the ability of the other pitchers on the staff, that figure may represent Maddux's talent more accurately.

This line shows only one dominating characteristic -- the strikeouts. But if you're going to dominate in one area, that's a good one, because they can't get a hit if they can't put the ball in play. Fortunately for Johnson, his control is only a little worse than the norm, and got better in the later stages of his career.

Guys with below-average strikeout rates aren't supposed to be successful, but Moyer's exceptional control and low IPAvg have been the keys, especially in the later stages of his career.

Now here's a guy who didn't strike anyone out and gave up a lot of hits on balls in play, but survived because he had excellent control and kept the ball in the park. In particular, he kept the ball on the ground, meaning that a lot of those extra hits were singles and that a good number of potential rallies were killed by double plays.

John's profile made me think that it would have been a good idea to extend McCracken's work to measure GDP rates, but that notion didn't hit me until it was too late. Some day, I'll go back and add that to the study and see what pops out.

We can't leave this section without looking at the all-time leader in in-play hits saved. As you can see, Hough hit more batters, walked more batters, struck out only a few more batters, and gave up more homers than the average pitcher. His ability to prevent hits on balls in play is the biggest reason he had a long and successful career.

Is there really any doubt that Don Sutton is a Hall-of-Famer when you look at this profile?

Groups of similar pitchers

We could go on forever this way, so let's speed things up by looking at groups of pitchers with similar styles. Maybe we'll see some patterns.

Power pitchers

                   HBP    BB      K    HR   IPHits
Nolan Ryan         +44  +878  +2578  -117    -133
Randy Johnson      +44  +107  +1769   -52     -10
Roger Clemens      +17  -173  +1355  -138    -101
Dazzy Vance        +19   -65  +1122   -20     -19
Steve Carlton      -49    -1  +1042    +5     -31
Bob Feller          -1  +149  +1022   -42     -53
Sandy Koufax       -35   +64  +1015   -12     -94
Pedro Martinez     +27  -152   +974   -60     -47

Obviously, the defining characteristic of these pitchers was their ability to retire batters without help from anyone else. As a group, with the exception of Ryan, they had average control. All of them were better than average on hits per ball in play, but that wasn't the main reason for their success.

Closers

                   HBP    BB      K    HR   IPHits
Rich Gossage       +10   +90   +492   -31     -57
Lee Smith          -17   +25   +447   -21     +12
Tom Henke          -10   -24   +391   -12     -20
Rollie Fingers      +4  -109   +358   -16     +12
Armando Benitez     -3   +78   +332    +1     -41
Trevor Hoffman     -17   -34   +317    -8     -49
John Wetteland      -5   -25   +310    -5     -39
Billy Wagner         0   +17   +295    -6     -23
Robb Nen           -18    -3   +283   -27     +12
Troy Percival        0   +30   +279    -9     -55
Bruce Sutter        -4   -48   +269    +1     -54

This is just a special case of the power pitchers group, but it's interesting to see how many of these guys have posted impressive IPHits numbers even though they pitch many fewer innings than do the power pitchers in the previous table.

Control freaks

                   HBP    BB      K    HR   IPHits
Robin Roberts      -40  -772    -15   +56     -82
Pete Alexander     -50  -570   +247    -1    -177
Jim Kaat           +19  -566   -264    -4    +144
Ferguson Jenkins   -12  -534   +635  +125    -182
Greg Maddux         -4  -507   +150  -147     -69
Ted Lyons          -43  -481   -366    -7    -121
Dutch Leonard       +7  -477   -100   -53     -64
Don Sutton         -25  -476   +512   +42    -138
Lew Burdette        -9  -445   -611   -13     +32
Walter Johnson     +25  -442   +847   -20    -214

Some of these guys (Roberts, Jenkins, and Sutton) gave up more than their share of homers, but with control this good, plus the ability to reduce hits on balls in play, a lot of those homers were solo shots.

Crafty lefties

                   HBP    BB      K    HR   IPHits
Warren Spahn       -63  -437    -36   -44    -183
Bud Black           +2  -110   -204   +23    -114
Randy Jones        -18  -189   -346   -13     -97
Wilbur Wood          0  -238   -135   -13     -84
John Tudor          -3  -146    -50    +5     -82
Kenny Rogers        +3   -39   -105   -40     -74
Larry Gura          +4  -127   -276   +21     -72
Jim Deshaies        -7   +21    -34   +44     -72
Jamie Moyer          0  -238   -153   +15     -65
Don Carman          +4   +44     -4   +36     -65

This is a list of left-handed pitchers with below-average strikeout rates. Most had very good control, but six of them were at least as susceptible to the long ball as the average pitcher. A significant part of their success is/was the ability to keep hitters off balance and keep their in-play batting averages down.

Putting the pieces together

We've seen that there's more than one way to succeed as a big-league pitcher. Robin Roberts walked 772 fewer batters than his peers. Roger Clemens struck out 1355 more batters than average. Greg Maddux yielded 147 fewer homeruns. And Charlie Hough prevented somewhere between 299 and 371 hits on balls in play.

So what's the most important element of a pitcher's repertoire?

Well, the value of various baseball events depends on the era. When scoring is up, as it has been in recent years, an extra baserunner comes around to score more often than during a period like the 1960s. In The Hidden Game of Baseball, Pete Palmer provided a table of run values for various periods in the 20th century, and I'll use those values to evaluate these events.

Palmer puts the value of a walk at about a third of a run, so the 772 walks saved by Robin Roberts are worth about 250 runs over the course of a career. That's not bad.

Clemens struck out 1355 more batters, but if he hadn't, some of those batters would have reached base, and some would have been retired in other ways. If his strikeout rate had been at the league average, it's possible that he would have allowed another 125 walks, 35 homers, and 320 more hits on balls in play. Using Palmer's run values and reasonable assumptions about the distribution of those hits among singles, doubles, and triples, those strikeouts are worth about 250-280 runs.

Palmer puts the value of a homer at about 1.4 runs, so Maddux saved about 200 runs by keeping his homerun rate down.

And the 300+ hits saved by Hough are worth about 150-175 runs.
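The walk and homer conversions are simple multiplications, sketched below. The run values are the rounded figures quoted in the text ("about a third of a run" and "about 1.4 runs"), not Palmer's exact era-by-era values, so the results are approximations in the same spirit.

```python
# Approximate run values as quoted in the text (rounded, not era-specific).
WALK_RUN_VALUE = 1 / 3
HOMER_RUN_VALUE = 1.4

ROBERTS_WALKS_SAVED = 772    # Robin Roberts, from the control table
MADDUX_HOMERS_SAVED = 147    # Greg Maddux, from the control table

roberts_runs = ROBERTS_WALKS_SAVED * WALK_RUN_VALUE
maddux_runs = MADDUX_HOMERS_SAVED * HOMER_RUN_VALUE

print(f"Roberts, walks saved:  about {roberts_runs:.0f} runs")
print(f"Maddux, homers saved:  about {maddux_runs:.0f} runs")
```

The strikeout and in-play figures need more than one multiplication, because each depends on how the affected plate appearances would otherwise have been distributed among walks, homers, hits, and outs.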

Those are impressive figures, and they'd be even more impressive if we were evaluating them against replacement level pitchers instead of the league average. As we noted before, the league average is a very high standard.

The bottom line is that success in all four areas is important. You can have a good career if you're average in all four areas or if you can offset one weak area with a strength in another. You can have a very good career if you have no major weaknesses and you have a special ability in one of these respects. And you can have a great career if you're better than average in all four areas.

Summing up

Having completed this study, I can sum up my own beliefs as follows:

1. Pitchers have more influence over in-play hit rates than McCracken suggested. In fact, some pitchers (like Charlie Hough and Jamie Moyer) owe much of their careers to the ability to excel in this respect.

2. Their influence over in-play hit rates is weaker than their influence over walk and strikeout rates. The most successful pitchers in history have saved only a few hits per season on balls in play, when compared with the league or team average. That seems less impressive than it really is, because the league average is such a high standard. Compared to a replacement-level pitcher, the savings are much greater.

3. The low correlation coefficients for in-play batting average suggest that there's a lot more room for random variation in these outcomes than in the defense-independent outcomes. I believe this follows quite naturally from the physics of the game. When a round bat meets a round ball at upwards of 90 miles per hour, and when that ball has laces and some sort of spin, minuscule differences in the nature of that impact can make the difference between a hit and an out. In other words, there's quite a bit of luck involved.

4. Year-to-year variations in IPAvg-versus-team can occur if the quality of a pitcher's teammates varies from year to year, even if that pitcher's performance is fairly consistent.

5. The fact that there's room for random variation doesn't necessarily mean a pitcher doesn't have any influence over the outcomes. It just means that his year-to-year performances can vary randomly around a value other than zero, a value that reflects his skills.

6. Unusually good or bad in-play hit rates aren't likely to be repeated the next year. This has significant implications for projections of future performance.

7. Even if a pitcher has less influence on in-play averages than on walks and strikeouts, that doesn't necessarily mean that in-play outcomes are less important. Nearly three quarters of all plate appearances result in a ball being put in play. Because these plays are much more frequent, small differences in these in-play hit rates can have a bigger impact on scoring than larger differences in walk and strikeout rates.

The process of separating pitching stats into defense-independent and defense-dependent groups is illuminating. The notion that pitchers don't have as much control over in-play outcomes as they do over defense-independent outcomes is both obvious (in retrospect) and very important. Voros McCracken deserves a lot of credit for introducing this way of thinking.

The bottom line, though, is that I am convinced that pitchers do influence in-play outcomes to a significant degree. There's a reason why Charlie Hough and Jamie Moyer and Phil Niekro and Tom Glavine and Bud Black have had successful careers despite mediocre strikeout rates. There's a reason why the top strikeout pitchers have also suppressed in-play hits at a good rate. Using power or control or deception or a knuckleball, pitchers can keep hitters off balance and induce more than their share of routine grounders, popups, and lazy fly balls.

Stolen Games

Written by Tom Tippett
September 15, 2003

After Oakland's 8-6 win over the Red Sox on August 20th, these two quotes appeared in ESPN.com's game story:

"I feel like we stole two games," Oakland third baseman Eric Chavez said. "These aren't the kind of games we're going to win down the line."

"We felt like we had the right people up there at the right time at several points in the game, but we couldn't get more runs across," Boston manager Grady Little said.

Chavez talked about stealing the game because Boston outhit the A's 18 to 11 and drew seven walks to only one for Oakland. Add up the total bases and walks (TBW) for both teams and you find that Boston outproduced the visitors 28 bases to 19. But 17 Red Sox runners were stranded, Oakland bunched their hits with a key Boston error in a four-run eighth inning, and the visiting team went home with the win.

That got us thinking. How often does this happen? How often does a team win the statistical battle yet lose the final-score war?

Measuring team performance in a season

For several years, we've been looking at measures of team production to learn more about why a season played out the way it did and to get a sense for each team's chances the next year. (For our recap of the 2002 season, see Measuring Team Efficiency.)

One of those measures is total bases plus walks. By comparing the TBW produced by each team's hitters with the TBW allowed by its pitchers, we get a good indication of the strength of that team.

Most times, those TBW figures flow quite naturally into runs, which flow quite naturally into wins, and you can see the statistical underpinning for a team's performance. For instance, the 2002 Yankees produced 558 more TBW than they allowed, outscored their opponents by 200 runs, and finished with the AL's best record.

Sometimes, however, these relationships don't hold up. The 2002 Angels were exceptionally good at converting offensive events into runs, compiling a run margin that was a little better than New York's even though their TBW differential was less than half that of the Yankees. By taking full advantage of their opportunities, they finished with 99 wins, beat New York in the divisional series, and didn't stop until they'd won it all.

Measuring team performance in a game

We've been wondering whether we'd learn anything by applying this approach to the results of individual games. How often does the team with the higher TBW figure actually win the game? And do the games that go the other way have a significant effect on the standings?

While the TBW differential is a very good measure of team performance over a season and has the advantage of being easy to figure, it isn't perfect. Among other things, it doesn't include events like hit batsmen and errors.

Most of the time, we can safely ignore those events when evaluating full seasons. The difference between bases gained by a team and given to its opponents in these ways is usually very small and doesn't affect any conclusions one might draw from the TBW differentials.

In a single game, though, HBP and errors can make the difference, so we added them for this project. For every game in the last ten years (through the end of August, 2003) we computed the number of bases produced by each team on hits, walks, HBP and errors that allowed their batters to reach base.

It turns out that the team that produced more bases in these ways was the victor 82% of the time. In 4% of the games, the teams tied in bases produced, so the win could have gone to either team. That leaves 14%, or about one game in seven, in which a team was outproduced but found a way to win anyway.

In a little more than half of the games that went to the less productive team, the winners were outproduced by only one or two bases, leaving about 7% of the games in which one team overcame a deficit of at least three bases. For the rest of this discussion, we'll focus on this subset, and for lack of a better term, we'll call them "stolen games".
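As a rough sketch of the bookkeeping, the code below counts bases from hits, walks, HBP and errors, then applies the three-base threshold. It assumes that each batter who reaches on an error counts for one base, which the text does not spell out, and the class and function names are only illustrative. The sample lines use the April 15, 2003 Anaheim at Texas game discussed in the next section.

```python
from dataclasses import dataclass

@dataclass
class TeamLine:
    singles: int
    doubles: int
    triples: int
    homers: int
    walks: int
    hit_by_pitch: int = 0
    reached_on_error: int = 0
    runs: int = 0

    def bases_produced(self):
        # Total bases on hits, plus one base for each walk, HBP and
        # (by assumption) each batter reaching on an error.
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.homers)
        return (total_bases + self.walks
                + self.hit_by_pitch + self.reached_on_error)

def winner_base_deficit(team_a, team_b):
    """Bases by which the winning team was outproduced (negative if it
    outproduced the loser). Assumes the game did not end in a tie."""
    winner, loser = (team_a, team_b) if team_a.runs > team_b.runs else (team_b, team_a)
    return loser.bases_produced() - winner.bases_produced()

def is_stolen_game(team_a, team_b, threshold=3):
    return winner_base_deficit(team_a, team_b) >= threshold

# April 15, 2003: ten Anaheim hits (a triple, two homers) and four walks;
# six Texas hits (two doubles), three walks and a hit batsman.
angels = TeamLine(singles=7, doubles=0, triples=1, homers=2, walks=4, runs=4)
rangers = TeamLine(singles=4, doubles=2, triples=0, homers=0, walks=3,
                   hit_by_pitch=1, runs=5)

print(angels.bases_produced(), rangers.bases_produced())
print(is_stolen_game(angels, rangers))
```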

The big ones

Two of the biggest steals of the 2003 season came in back-to-back games involving Anaheim and Texas.

On April 15th, at Texas, the Angels drew four walks and pounded out ten hits, including a triple and a pair of homers, for a total of 22 bases on hits and walks. Meanwhile, Jarrod Washburn and Brendan Donnelly held the Rangers to six hits (two doubles), three walks, and a hit batsman, for a total of 12 bases. But Texas won 5-4 because the Anaheim hits were scattered and much of the Texas action was crammed into a single five-run inning.

The tables were turned a day later. Both teams had 13 hits, allowed one hitter to reach on an error, and drew four walks. But the Rangers blasted four homers to none for the Angels. Add it all up and the Texas hitters accounted for 12 more bases. All that production went for naught, however, when Anaheim bunched their hits in a seven-run eighth inning that gave them an 8-7 win. This deficit of 12 bases was the season's largest for a winning team.

In the past ten years, only 17 games (of 22,334 that were played) have exhibited a larger deficit, topped by a pair of games in which the winner overcame an 18-base shortfall.

Winners and losers in 2003

With 7% of all wins going to a team that overcame a deficit of at least three bases, we'd expect each club to have about five wins and five losses of this type through the end of August.
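The arithmetic behind that expectation can be sketched as follows. The 7% rate comes from the text; the figure of 135 games played through the end of August is an assumed round number rather than any club's actual total, and the even split assumes stolen games fall to either side with equal likelihood.

```python
STOLEN_RATE = 0.07           # share of games that are "stolen" (from the text)
GAMES_THROUGH_AUGUST = 135   # assumption: typical games played by Aug 31


def expected_stolen_split(games: int, rate: float = STOLEN_RATE) -> tuple[float, float]:
    """Expected (stolen wins, stolen losses) for an average team.

    Assumes every stolen game a team is involved in is equally likely
    to be a win or a loss for that team.
    """
    stolen_games = rate * games
    return stolen_games / 2, stolen_games / 2


wins, losses = expected_stolen_split(GAMES_THROUGH_AUGUST)
print(f"expected stolen wins: {wins:.1f}, expected stolen losses: {losses:.1f}")
```

Under these assumptions the expectation lands a little below five of each, in line with the "about five wins and five losses" quoted above.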

And most did. Twenty-four teams had between 3 and 7 stolen wins, while twenty-six teams gave away between 3 and 7 losses of this type.

The Cincinnati Reds were far and away the biggest winners in the 2003 stolen-game sweepstakes. Twelve times the Reds picked up a victory in a game in which they were outproduced by at least three bases. Only once did they lose a game in this fashion. That's why they were able to hang around .500 for a few months despite having the worst run margin and the worst TBW differential in the NL.

Montreal has also improved its standing by winning eight and losing only three of these games. But, like the Reds, the Expos faded after a promising start and are no longer serious contenders for a postseason berth.

Three teams have lost more than their share of these games, but two of them are Detroit (five more stolen losses than wins) and San Diego (six more). Nothing of great importance there, at least in terms of postseason implications.

And then there are the Boston Red Sox. Only three teams had more than seven stolen losses, and Boston heads that list with twelve. With only four stolen wins to their credit, Boston has lost eight more stolen games than they've won, easily the worst imbalance in the majors.

In case you want to check out the boxscores and game logs, here are the games:

Date   Opp    Bases   Score   Comment
---------------------------------------------------------------------
5/11   @Min   31-24    8-9    Rally from 8-0 deficit falls short
5/21   NYY    17-13    2-4
5/31   @Tor   27-23    7-10   Five Tor hits bunched in 5-run sixth
6/10   StL    32-29    7-9
6/12   StL    37-31    7-8    Nixon leaves bases loaded four times
6/28   Flo    33-22    9-10   Marlins score four each in 8th and 9th
7/3    @Tam   26-20    5-6
7/25   NYY    19-16    2-4
8/8    Bal    24-20    4-10   O's get 6 of their 13 hits in 7-run inning
8/8    Bal    15-11    2-4
8/10   Bal    23-17    3-5
8/20   Oak    28-19    6-8    Boston strands 17 runners

The four-game series against Baltimore in early August was particularly disheartening for Boston fans. The home team outproduced the O's in every game but still managed to lose the series three games to one.

In 17 of the 19 games against the Yankees (including the three games in September), the more productive side emerged victorious. But both of the stolen games went in New York's favor. So the season series, won 10-9 by the Yankees, turned on these stolen games.

Remember that these twelve losses came in games where Boston outproduced its opponents by at least three bases. The Red Sox also lost five games in which they had an edge of one or two bases. Counting every game that went to the less productive team, their record was 5-17. That's a very big deal.

This isn't the only statistical evidence to support the idea that Boston hasn't taken full advantage of its opportunities this year. Their run margin is right up there with Seattle's for the league lead. And their TBW differential (+539 through 9/14) is far better than New York's (+403).

In fact, the Sox are on pace to post the fifth-best TBW differential in the past thirty years. The only teams ahead of them on that list are the 1998 Braves (who finished with a 106-56 record), the 1998 Yankees (114-48), the 2001 Mariners (116-46), and the 1995 Indians (100-44 in a shorter season). That's great company. In other words, this Boston team is a statistical juggernaut that should be leading the league in wins.

Note: These measures of team performance exclude stolen bases. Looking over the boxscores for the dozen games listed above, I found only one game where steals might have made the difference. Boston has been a good running team this season, and I don't believe the conclusions would have changed if we had included stolen bases in our measure of a team's performance in a game.

 

The outlook for the Red Sox

After they blew the August 20th game against Oakland, I thought the Red Sox were done. Time after time, they had been able to bounce back from tough losses, and they had earned a lot of praise for being a resilient team. But you can only dig a hole and climb out of it so often, and I thought they might have used up their quota.

To their credit, they won the series finale against Oakland, swept the Mariners at home, and took two of three from New York in Yankee Stadium the next weekend. During the toughest part of the schedule, they played their best baseball of the season.

So they're still in position to be playing in October. But had the Red Sox been able to play .500 ball in these stolen games, their magic number for clinching a playoff spot would be in the low single digits right now. Instead, they're fighting tooth and nail just to get in.

Does their poor record in these games point to a weakness in the makeup of this team? Or was it just a run of bad luck? I don't know the answer to these questions. I can say that the Red Sox are a very strong team statistically, and if they can put all of this behind them and start posting a win-loss record that is consistent with their production, they can be very dangerous in October.

That, of course, is a very big if. And if Boston doesn't make the playoffs, you can bet that New England is in for a winter full of hot-stove conversations about how the Yankees "know how to win" and how the local nine is missing something.

2015 Debut Players

 Some of you might be anxious to get started on your pre-draft planning, so we've put together this set of stats for the players who made their major league debuts in 2015. If your league has a rookie draft, these are the players who'll be available, listed alphabetically. 

Batters
Name  UID  Tm  AVG  G  AB  H  2B  3B  HR  R  RBI  HBP  BB  K  SB
---------------------------------------------------------------------
Hanser Alberto29443TEX0.22241992221012402171
Dariel Alvarez29456BAL0.24112297101310280
Nevin Ashley29471MIL0.10012202100211080
Jett Bandy29474ANA0.500221001110000
Austin Barnes29477LAN0.20720296200411661
Steve Baron29478SEA0.0004110000000020
Greg Bird29498NYA0.261461574190112631119530
Carson Blair29505OAK0.129113140013304180
Ryan Brett29519TBA0.667332100000100
Socrates Brito29523ARI0.303183310310510171
Trevor Brown30198SFN0.23113399300150381
Keon Broxton29527PIT0.000720000300011
Kris Bryant29529CHN0.27515155915431526879997719913
Byron Buxton29536MIN0.209461292771216616442
Ramon Cabrera29538CIN0.367133011101430050
Orlando Calixte29539KCA0.000230000100000
Mark Canha29540OAK0.254124441112223166170833967
Daniel Castro29550ATL0.24033962321214503150
Darrell Ceciliani29555NYN0.2063968142015324255
Dusty Coleman30174KCA0.000450000000030
Michael Conforto30199NYN0.270561744714093026117390
Carlos Correa29571HOU0.279993871082212252681407814
Kaleb Cowart29573ANA0.174344682018405191
Cheslor Cuthbert29584KCA0.217194610211680490
Cody Decker29597SDN0.0008110000010050
Delino DeShields29600TEX0.26112142511122102833735310125
Alex Dickerson29603SDN0.2501182000000030
Wilmer Difo29605WAS0.18215112000100020
Danny Dorn28080ARI0.167233051000302100
Brandon Drury29613ARI0.214205612302381280
Matt Duffy29615HOU0.375883100030120
Allan Dykstra29621TBA0.129133140013416120
Ed Easley29623SLN0.000460000010010
Taylor Featherston29634ANA0.1621011542551223937464
Daniel Fields29640DET0.333131100100020
Ramon Flores29645NYA0.21912327100300040
Rocky Gale29650SDN0.10011101000000010
Joey Gallo29652TEX0.20436108223161614015573
Adonis Garcia30173ATL0.277581915312010202605350
Dustin Garneau29659COL0.1572270113026806140
Slade Heathcott29708NYA0.400172510202680250
Austin Hedges29709SDN0.1685613723203131118380
Oscar Hernandez29717ARI0.161183151004113150
Odubel Herrera29719PHI0.2971474951473038644182812916
John Hicks29722SEA0.063173221001101181
Travis Jankowski29739SDN0.21134901922291204242
Micah Johnson29746CHA0.230361002340010429303
Jung Ho Kang29756PIT0.2871264211212421560581728995
Max Kepler29762MIN0.143371000000030
Kyle Kubitza29771ANA0.194193670006103150
Tyler Ladendorf29773OAK0.2359174010320120
Ryan LaMarre29774CIN0.08021252000200090
Francisco Lindor29793CLE0.313993901222241250511276912
Ryan Lollis30171SFN0.1675122000000111
Dixon Machado29804DET0.2352468163006507141
Mikie Mahtook29806TBA0.2954110531519221936314
Luke Maile30209TBA0.17115356300220080
Deven Marrero29817BOS0.2262553120018303192
Jefry Marte30172DET0.21333801740491108220
Ketel Marte29818SEA0.288572196315322517024438
Max Muncy29872OAK0.206451022181314909310
Danny Muno29873NYN0.148172741002004111
Tom Murphy29877COL0.257113591035904100
Rey Navarro29882BAL0.27610298201530030
Rico Noel30213NYA0.5001521000500005
Peter O'Brien29899ARI0.4008104101130250
Hector Olivera29908ATL0.25324792041241125120
Paulo Orlando29912KCA0.24986241601467312725533
Jarrett Parker29916SFN0.347214917206111405211
Jose Peraza29920LAN0.1827224110310223
Carlos Perez29921ANA0.250862606513042021019492
Stephen Piscotty29934SLN0.305632337115472939120562
Kevin Plawecki29935NYN0.21973233519031822417600
Michael Reed30216MIL0.333762100200030
Rob Refsnyder29955NYA0.302164313302350372
Yadiel Rivera29969MIL0.0717141000000040
Eddie Rosario29987MIN0.267122453121181513605001511811
Addison Russell29993CHN0.2421424751152911360543421494
Tyler Saladino29994CHA0.22568236536443320212518
Miguel Sano30003MIN0.26980279751711846521531191
Scott Schebler30008LAN0.250193690036413132
Kyle Schwarber30013CHN0.246692325761165243436773
Corey Seager30016LAN0.3372798338141717114192
Pedro Severino30217WAS0.250241100100010
Richie Shaffer30024TBA0.189317414304116310320
Travis Shaw30025BOS0.2706522661100133136218570
Cody Stanley30054SLN0.4009104100230030
Ryan Strausborger30219TEX0.200314590019303112
Darnell Sweeney30070PHI0.176378515413911013270
Blake Swihart30071BOS0.274842887917154731118774
Travis Tartamella30220SLN0.500321000000000
Trayce Thompson30083CHA0.29544122368351716013261
Yasmany Tomas30087ARI0.273118406111193940482171105
Kelby Tomlinson30088SFN0.30354178546322320114405
Ronald Torreyes30089LAN0.333862100110110
Devon Travis30094TOR0.304622176618083835218433
Preston Tucker30097HOU0.2439830073190133533320680
Trea Turner30101WAS0.225274091015104122
Giovanny Urshela30106CLE0.22581267608162521218580
Mason Williams30136NYA0.2868216301330130
Mac Williamson30140SFN0.21910327010211080

 

Pitchers

 

Name  UID  Tm  G  GS  W  L  S  ERA  Inn  H  R  ER  BB  K  HR
----------------------------------------------------------------------
Scott Alexander29447KCA400004.506533330
Miguel Almonte29453KCA900206.238.77667104
Cody Anderson29458CLE15157303.059177323124449
Matt Andriese29461TBA2583524.116669323018498
Elvis Araujo29465PHI4002103.383529171319341
Shawn Armstrong29468CLE800002.2585222111
Jonathan Aro30186BOS600106.97101588482
Alec Asher29470PHI770609.312942303010168
Manny Banuelos29475ATL761405.132630171512194
Kyle Barraclough30195MIA2502102.5924128718301
Yhonathan Barrios30196MIL500000.006.7300070
Chris Beck29484CHA110106.0061054430
Andrew Bellatti30179TBA1703102.3123167610184
Matt Boyd30178TOR2202014.856.7151111175
Matt Boyd30178DET11101406.5751563937193612
Silvino Bracho30197ARI1300011.46129224172
Archie Bradley29516ARI882305.803636232322233
Jake Brigham29521ATL1200108.64172816168121
Mike Broadway29525SFN2100205.19172010107131
Danny Burawa29530NYA1000054.000.7344111
Danny Burawa29530ATL1200003.65128554101
Enrique Burgos29533ARI3002224.672727151415392
Angel Castro29549OAK500102.254811341
Miguel Castro30169COL5001010.135.3666462
Miguel Castro30169TOR1300244.381215766122
A.J. Cole29565WAS310015.799.314116191
Adam Conley29567MIA15114103.766765282821597
Tim Cooney29568SLN661003.163128121110293
Scott Copeland29569TOR531106.4615241111261
John Cornely30185ATL1000036.001344111
Caleb Cotham30200NYA1201006.529.714771114
Tyler Cravy29577MIL1470805.704347292722355
Brandon Cunniff29583ATL3902204.633527201822374
Zach Davies29588MIL663203.713426141415242
Abel de los Santos30201WAS200005.401.7211131
Jose De Paula29595NYA100002.703.3211421
Oliver Drake29612BAL1300002.871616759171
Tyler Duffey29614MIN10105103.105856202020534
Ryan Dull29618OAK1301214.241712886164
Carl Edwards29624CHN500003.864.7332340
Jerad Eickhoff29627PHI883302.655140161513495
Brian Ellington30202MIA2302102.88251710813181
Andrew Faulkner29633TEX1100002.799.78333102
Michael Feliz29635HOU500007.888977472
Jeff Ferrell30184DET900006.35111288463
Kendry Flores29644MIA711204.97131687490
Jason Garcia29656BAL2101004.253025191417223
Sean Gilmartin29668NYN5013202.675750171718542
Mychal Givens30190BAL2202001.803020766381
Zack Godley30203ARI965103.193729131317344
David Goforth29672MIL2001004.01253213118244
Chi Chi Gonzalez29674TEX14104603.906749332932306
Severino Gonzalez29676PHI773307.92314427277285
Nick Goody30204NYA700006.355.7644330
Trevor Gott30194ANA4804203.024843171616272
Matt Grace29679WAS2602104.2417261188140
J.R. Graham29680MIN3911104.9564734135215310
Jon Gray29681COL990205.534152262514404
Mayckol Guaipe29684SEA2100305.402734191613225
Deolis Guerra28057PIT1002006.48172612123175
Junior Guerra30188CHA300006.754733131
Jason Gurka30205COL900009.397.71688271
Cody Hall29697SFN700006.488.31066471
Mitch Harris30189SLN2602103.672730141113154
Marcus Hatley29704SLN200000.001.3100220
Keith Hessler30206ARI1800108.03121611114124
Dalier Hinojosa30181BOS100000.001.7000320
Dalier Hinojosa30181PHI1802000.782315328211
Adrian Houser30207MIL200000.002100200
Edgar Ibarra29733ANA200002.254411330
Raisel Iglesias29735CIN18163704.15958145442810411
Jay Jackson29736SDN600006.234.3733140
Luke Jackson29737TEX700004.266.3533261
Brian Johnson29745BOS110108.314.3344430
Taylor Jungmann29753MIL21219803.7711910655504710711
Keone Kela29758TEX6807512.396052181618684
Ryan Kelly30191ATL1700007.02172114136105
Guido Knudson29769DET4000018.005131010365
John Lamb29775CIN10101505.805058323219588
Raudel Lazo30208MIA700003.185.7522251
Jack Leathersich29784NYN1700102.311212337140
Zach Lee29786LAN1101013.504.71177131
Arnold Leon29790OAK1900204.39273014139193
Adam Liberatore29791LAN3902204.25302614149293
Jacob Lindgren29792NYA700005.147544483
Jorge Lopez29800MIL221105.401014665100
Michael Lorenzen29801CIN27214905.401131317068578318
Sugar Ray Marimon29812ATL1600107.362630212114143
Matt Marksberry30210ATL3100305.012322161316212
Cody Martin29819OAK4202014.009161414534
Cody Martin29819ATL2102305.40222413137244
Rafael Martin29822WAS1302005.111212975254
Steven Matz29824NYN664002.2736349910344
Cory Mazzoni29829SDN8000020.778.7232220582
Lance McCullers29832HOU22226703.2212610649454312910
Scott McGough30211MIA600009.456.71277440
Andrew McKirahan29837ATL2701005.932740181810222
Alex Meyer29846MIN2000016.882.7455332
Frankie Montas29860CHA720204.801514889201
Mike Montgomery29862SEA16164604.6090924946376411
Diego Moreno30183NYA401005.2310966381
Adam Morgan29865PHI15155704.4884884542174914
Akeel Morris30223NYN1000067.500.7355301
Jon Moscot29870CIN331104.63121166562
Toru Murata30193CLE110108.103.3453122
Colton Murray30212PHI800105.877.71155292
Angel Nesbitt29886DET2401105.40222214138142
Justin Nicolino29891MIA12125404.017472333320238
Aaron Nola29894PHI13136203.5978743131196811
Scott Oberg30192COL6403415.0958583533314410
Nefi Ogando29905PHI400009.004754220
Tyler Olson29910SEA1101105.401318881082
Ryan O'Rourke29903MIN2800006.142216151515243
Josh Osich30176SFN3502002.2029241278274
Roberto Osuna30170TOR68016202.587048212016757
Henry Owens29915BOS11114404.576362353224507
James Pazos30214NYA1100000.005300330
Ariel Pena29919MIL652104.282724141314272
Williams Perez29924ATL23207614.781171306662517313
Branden Pinder29932NYA2500202.9328289914254
Noe Ramirez29946BOS1700104.1513131267133
Josh Ravin30187LAN902106.759.313774123
Colin Rea30215SDN662204.263229161511262
Chris Rearick29951SDN5000012.003644242
Chris Reed29954MIA200004.504622110
Felipe Rivero29971WAS4902122.794835151511432
Ken Roberts30177PHI6010010.384.3955110
Ken Roberts30177COL900105.799.31366250
Hansel Robles29974NYN5704303.675437272218618
Carlos Rodon29975CHA26239603.7513913063587113911
Eduardo Rodriguez29977BOS212110603.851221205552379813
David Rollins29983SEA2000207.56253721218213
Joe Ross29989WAS16135503.647764333121697
Nick Rumbelow29991NYA1701104.021616875152
Keyvius Sampson29998CIN13122606.545267433826427
A.J. Schugel30011ARI500005.00917135552
Luis Severino30022NYA11115302.896253212022569
Josh Smith30042CIN970406.893342272521305
Sammy Solis30051WAS1801103.3821251184172
Giovanni Soto30218CLE600000.003.3300000
Noah Syndergaard30072NYN24249703.2415012660543116619
Ryan Tepera30079TOR3200213.27332314126228
Matt Tracy30091NYA100000.002230210
Jose Urena30103MIA2091505.256273373625285
Jose Valdez30107DET700104.0091044442
Vincent Velasquez30112HOU1971104.375650282721585
Pat Venditte30114OAK2602204.402922141412233
Logan Verrett30115TEX400106.0091176431
Logan Verrett30115NYN1441113.033923131311365
Tyler Wagner30121MIL330207.2414221111751
Ryan Weber30221ATL550304.76282515156193
Tyler Wilson30144BAL952203.503639141411131
Daniel Winkler30148ATL2000010.801.7222122
Matt Wisler30151ATL20198804.711091195957407216
A.Wojciechowski30153HOU530107.16162313137162
Mike Wright30157BAL1293506.044552303018269
Tony Zych30222SEA1310002.451817653241

© 2025 Diamond Mind Baseball