Mathematicians Rebecca Sparks and David Abrahamson, a husband-and-wife team who teach at Rhode Island College, have developed a formula that predicts which pitchers will place first through third in Cy Young voting. The researchers structured their formula to predict the voting results for starting pitchers, who almost always win the award, rather than relief pitchers, who are rarely the recipients. However, their formula reveals a lack of standout American League starting pitchers this year, suggesting that the AL award will go to relief pitcher Mariano Rivera for his extraordinary 2005 season.
Sparks and Abrahamson presented their model in the April 2005 issue of Math Horizons, a magazine published by the Mathematical Association of America (MAA). Abrahamson will discuss the model in a talk about math and sports at a regional MAA meeting to take place at the University of New Hampshire on November 18 and 19, 2005.
Every season, the baseball writers' association selects two sportswriters from every city in the major leagues to vote for their first-, second- and third-place choices. The ballots are due right after the regular season ends. "The identities of the voters change frequently," Sparks and Abrahamson write in their Math Horizons article, "but we will see that their voting results follow a predictable course."
The pair took an extremely pragmatic approach in developing a method to forecast Cy Young winners. They did not consider which pitchers should win the award, or which qualities were most important in a pitcher. They simply aimed to develop a mathematical formula that would best match the voting results.
Their formula computes a score for each pitcher on a scale from roughly 0 to 10. For their formula to be successful, it must yield the top score in a particular season to the pitcher who places first in Cy Young voting, the next-highest score to the player who places second, and the third-highest score to the player who places third.
To calculate the scores, they first chose four key pitching statistics: wins, losses, strikeouts, and ERA (earned run average, which is the average number of runs that the pitcher is responsible for giving up per 9 innings of play). They also included a fifth statistic, the winning percentage of the pitcher's team, as they thought that it influences the voting results.
But the main question, according to the two researchers, is how much importance the voters placed on each of those five categories. Do voters, consciously or unconsciously, generally value a pitcher's number of wins more than his number of strikeouts? Does a pitcher on a first-place team really have a better chance of winning the award than a pitcher with slightly better stats on a last-place team?
The tools of mathematics can answer this seemingly subjective question. First, the researchers looked up the statistics in those five categories for starting pitchers between 1993 and 2002 and compared them to the Cy Young voting results for those years.
Then, to determine the relative importance of each of the five categories in the voting results, they turned to a mathematical method, dating to the 1940s, called linear programming. The method was first developed by mathematician George Dantzig and by economists (who won the Nobel Prize for work that employed it). The idea is to find the missing numbers (in this case, the relative importance or "weight" of each pitching category in the voting) that satisfy certain constraints (here, that the resulting formula correctly yield the first- through third-place results for Cy Young balloting).
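One way such a problem could be written down is sketched below. This is an illustrative formulation, not the one published by Sparks and Abrahamson: the article does not state their objective function or exact constraints. The symbols are assumptions introduced here. The vector $x_a$ holds the five statistics for pitcher $a$, the vector $w$ holds the five unknown weights, $P$ is the set of pairs $(a, b)$ from the same league and season in which the voters ranked $a$ ahead of $b$, and $s_{ab}$ is a slack variable that measures by how much the formula fails to reproduce that ordering.

```latex
\begin{aligned}
\min_{w,\,s}\quad & \sum_{(a,b)\in P} s_{ab} \\
\text{subject to}\quad & w^{\top}x_a - w^{\top}x_b + s_{ab} \ge 1 && \text{for all } (a,b)\in P,\\
& s_{ab} \ge 0 && \text{for all } (a,b)\in P.
\end{aligned}
```

Both the objective and the constraints are linear in the unknowns $w$ and $s$, which is what makes this a linear program. If every slack comes out as zero, the weighted score $w^{\top}x$ reproduces every ordering in $P$.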
Analyzing the 1993 to 2002 data, they concluded that a pitcher's number of wins carried almost three times as much weight in the voting as his earned run average. ERA, in turn, was about one-and-a-half times more important than strikeouts, and about twice as important as the winning percentage of the pitcher's team. Almost completely insignificant, according to the model, is a pitcher's number of losses; they seemed to have very little bearing on the voting results.
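To make the idea of a weighted score concrete, here is a minimal Python sketch. It is not the researchers' formula, which the article does not reproduce. The weights are simply the approximate ratios quoted above, scaled so that ERA has weight 1, and the small weight on losses is a placeholder for "almost insignificant." The normalization constants, the `PitcherSeason` class, the function names, and the sample pitchers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    name: str
    wins: int
    losses: int
    strikeouts: int
    era: float
    team_win_pct: float


# Illustrative relative weights only (ERA = 1.0), taken from the ratios
# quoted in the article; the actual fitted values are not given there.
WEIGHTS = {
    "wins": 3.0,               # about three times the weight of ERA
    "era": 1.0,
    "strikeouts": 1.0 / 1.5,   # ERA about 1.5 times as important
    "team_win_pct": 0.5,       # ERA about twice as important
    "losses": 0.05,            # placeholder for "almost insignificant"
}


def normalize(p: PitcherSeason) -> dict:
    """Map each statistic to a rough 0-to-1 scale where higher is better.

    The divisors are hypothetical round numbers chosen for this sketch.
    """
    return {
        "wins": p.wins / 25,
        "losses": 1 - p.losses / 20,    # fewer losses is better
        "strikeouts": p.strikeouts / 300,
        "era": (6.0 - p.era) / 6.0,     # lower ERA is better
        "team_win_pct": p.team_win_pct,
    }


def score(p: PitcherSeason) -> float:
    """Weighted sum of the five normalized statistics."""
    features = normalize(p)
    return sum(WEIGHTS[key] * features[key] for key in WEIGHTS)


def top_three(pitchers: list) -> list:
    """Return the three highest-scoring pitchers, best first."""
    return sorted(pitchers, key=score, reverse=True)[:3]


# Hypothetical stat lines, not real players.
sample = [
    PitcherSeason("Pitcher A", 21, 5, 213, 2.83, 0.617),
    PitcherSeason("Pitcher B", 14, 10, 230, 3.20, 0.500),
    PitcherSeason("Pitcher C", 10, 12, 150, 4.50, 0.450),
    PitcherSeason("Pitcher D", 18, 8, 180, 3.00, 0.550),
]
predicted = [p.name for p in top_three(sample)]
```

In this sketch, Pitcher D outranks Pitcher B despite having fewer strikeouts, because wins carry far more weight. The scores it produces are not on the article's 0-to-10 scale, since the real normalization is unknown.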
By taking each pitcher's statistics in these five categories and adjusting their values according to these relative weights, the researchers' formula correctly yielded all but one of the first-, second- and third-place vote-getters in each league from 1993 to 2002. Recently, they incorporated the data for the 2003 and 2004 seasons into the model, and predicted three out of four Cy Young winners (the fourth was a reliever). By looking at the 2003 and 2004 statistics, they again found that the relative weights of the five categories were almost exactly the same as in the earlier data.
Using their formula, the researchers come up with the following predictions for the first three places in the 2005 National League voting:
- Chris Carpenter, St. Louis (6.4257 points)
- Dontrelle Willis, Florida (6.3420)
- Roy Oswalt, Houston (5.9064)
According to Abrahamson, it is possible that voters may drift away from their past behavior by voting for Roger Clemens or Andy Pettitte ahead of Roy Oswalt this year.
Clemens and Pettitte are generally better known veterans who may have a somewhat higher profile in the news media than Oswalt.
In the American League, the top starters in their model are, in order:
- Bartolo Colon, LA/Anaheim (5.8074)
- Johan Santana, Minnesota (5.3671)
- Jon Garland, Chicago (5.0730)
The model shows that there is no standout starter in the American League this year. Bartolo Colon, the top starter according to their model, has a total score of less than 6, a far cry from many AL Cy Young award winners in years past, such as Barry Zito (6.75, 2002) and Pedro Martinez (7.54, 1999).
"Our model quantifies the fact that there is no AL pitcher who will knock the voters' socks off," says Abrahamson. Therefore, Sparks says the two are "very confident" that the AL Cy Young Award will go to Mariano Rivera, a relief pitcher who had a particularly outstanding year. A Cy Young for Rivera, they say, would also serve as a kind of "lifetime achievement award" as Rivera, who has never earned the award, is likely toward the end of a very distinctive career.
The researchers think that their mathematical approach, known generally as "constrained optimization," might work for other sports awards, such as the most valuable player in various leagues. It also might help provide insights into how magazines rank corporations, or top colleges. But the point of their approach, they say, is to show how the methods of mathematics can apply in many unexpected everyday situations.
"The moral is always the same for the mathematical modeler," they write in their Math Horizons article. "More often than we may know, there is a pattern out there. We just have to keep thinking creatively, and we have got a good chance of finding it."
Rebecca L. Sparks and David L. Abrahamson, "A Mathematical Model to Predict Award Winners," in Math Horizons, April 2005.