Thursday, August 24, 2006

Fielding as a Continuous Measure

One of the hardest aspects of baseball to quantify and evaluate is fielding ability. Most events in baseball, such as hitting events, are discrete, which makes them easy to tabulate and model probabilistically. The central difficulty with fielding is that we are trying to evaluate players on a continuous playing surface, where we must take into account not just whether a successful play was made, but whether a successful play was possible. The much-maligned error statistic is a subjective attempt at discretising this phenomenon: players are assigned an error if the official scorer deems that their unsuccessful play should have been successful. However, tabulating errors isn't a good measure of ability without a corresponding measure that credits a player for making a play that most players wouldn't have made.

Recent techniques such as Ultimate Zone Rating or the Plus-Minus system from The Fielding Bible are based on the tabulation of both positive and negative fielding events. These statistics are more detailed and accurate measures of fielding ability than error counts alone. These techniques are also getting a bit more attention in the regular media, as evidenced by this recent Yahoo! Sports article. However, despite being obvious improvements on previous methods, both of these approaches are still based on dividing the baseball field into discrete zones and vectors, and tabulating events within each zone. Ideally, the baseball field would be treated as the continuous playing surface that it actually is, instead of as a set of zones or vectors.

Over the next few weeks, we will present results from our own approach to this problem, which we call SAFE: Spatial Aggregate Fielding Evaluation. We use the same raw BIS data used for The Fielding Bible. However, instead of tabulating fielding events within discrete zones, we fit continuous probability distributions to each fielder based on their past fielding events. The closest technique (at least in spirit) to our approach is the work by David Pinto. Once we have probability models for each player, we use numerical integration to calculate the expected runs that they save or cost their team.
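To make the idea of integrating a fitted probability model a little more concrete, here is a rough sketch in Python. It is not our actual SAFE model: it collapses the field to a single dimension (distance from the fielder's starting position), and the logistic success curve, the exponential density of balls in play, the parameter values and the 0.75 run value are all made-up placeholders chosen purely for illustration.

```python
import math


def success_prob(distance_ft, intercept, slope):
    """Hypothetical logistic curve: chance the fielder converts a ball
    hit `distance_ft` feet from his starting position."""
    return 1.0 / (1.0 + math.exp(-(intercept - slope * distance_ft)))


def bip_density(distance_ft, scale_ft=30.0):
    """Assumed (illustrative) exponential density of balls in play
    as a function of distance from the fielder."""
    return math.exp(-distance_ft / scale_ft) / scale_ft


def expected_plays_per_bip(intercept, slope, max_dist=150.0, steps=1500):
    """Trapezoid-rule integral of success_prob * bip_density over distance."""
    h = max_dist / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * success_prob(x, intercept, slope) * bip_density(x)
    return total * h


def runs_saved(player, average, balls_in_play, run_value=0.75):
    """Runs saved relative to an average fielder.

    `player` and `average` are (intercept, slope) pairs; `run_value`
    is a placeholder for the run value of converting one extra play.
    """
    diff = expected_plays_per_bip(*player) - expected_plays_per_bip(*average)
    return diff * balls_in_play * run_value


# Made-up parameters: the "good" fielder's curve falls off more slowly.
average_ss = (3.0, 0.10)
good_ss = (3.5, 0.10)
print(runs_saved(good_ss, average_ss, balls_in_play=500))
```

The point of the sketch is only the shape of the calculation: a smooth success curve per player, weighted by where balls are actually hit, integrated numerically and then compared against a baseline fielder.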

We'll give a more detailed outline of our methodology, along with lots of results, in the upcoming weeks. As a teaser, though, we'll just say that Adam Everett really is a good shortstop and Derek Jeter really is a bad one, but the results are not as dramatic as The Fielding Bible would lead you to believe!

Tuesday, August 22, 2006

Welcome!

The primary goal of this blog is a discussion of interesting and novel statistical research for the study of baseball. The founders of this blog are two professors (Shane Jensen and Abraham Wyner) in the Department of Statistics at the University of Pennsylvania. We are overseeing an active baseball statistics research group that started last year in collaboration with ESPN.

Our plan is to post our own research as often as possible, and we would really like to hear feedback as well as tips on other interesting baseball research out there on the web.

More soon...