For those who don’t know, Retrosheet is a non-profit organization with the goal of archiving historical baseball data. They put together cool lists like occurrences of the hidden ball trick (Ozzie Guillen fell victim three times) and have boxscores from an amazing number of seasons. But the coolest resource is play-by-play data for every year between 1957 and 2005 (except 1999), thanks to some newly added years. This is the source for the data that almost everyone out there uses for their play-by-play analysis, including sites like Tangotiger and Baseball Prospectus. Retrosheet provides a couple of basic tools to turn the data into boxscores, game summaries, or event files (which can be analyzed more easily).

Baseball Hacks is a cool book published a couple of months ago that discusses tricks for finding, consolidating, and analyzing baseball data on the web. It points out another tool (with a GUI) for analyzing the Retrosheet data: Chadwick. (I haven’t used it yet, to be honest.) Baseball Hacks also walks through a variety of ways to get the Retrosheet data into a MySQL database and discusses some basic Perl scripts to manipulate it. I’ve only been reading the book for a day, and I’m already set to propose. You can buy Baseball Hacks through Amazon, or, better yet, read it online at O’Reilly’s website, where the first fourteen days are free.
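To give a feel for what manipulating the play-by-play data looks like, here is a rough sketch in Python (not the Perl from the book, and not anything Retrosheet ships). It assumes the comma-separated event-file layout as I understand it, where each `play` record carries the inning, a 0/1 visitor/home flag, the batter's ID, the count, the pitch sequence, and the event code. The sample lines, the player IDs, and the `parse_plays` helper are all made up for illustration.

```python
import csv
from io import StringIO

# Made-up sample in the style of a Retrosheet event file.
# The game ID and player IDs are placeholders, not real records.
SAMPLE = """\
id,XXX198304080
play,1,0,aaaa001,12,CBFX,S7
play,1,0,bbbb001,00,X,63
play,1,1,cccc001,32,BBCBFX,HR/F7
"""


def parse_plays(text):
    """Return one dict per 'play' record, skipping every other record type."""
    plays = []
    for row in csv.reader(StringIO(text)):
        # Assumed layout: play,inning,home flag,batter,count,pitches,event
        if len(row) == 7 and row[0] == "play":
            _, inning, home, batter, count, pitches, event = row
            plays.append({
                "inning": int(inning),
                "home": home == "1",   # 0 = visiting team batting, 1 = home team
                "batter": batter,
                "count": count,
                "pitches": pitches,
                "event": event,
            })
    return plays


plays = parse_plays(SAMPLE)
home_plays = [p for p in plays if p["home"]]
```

From there the list of dicts could be loaded into a database table or summarized directly, which is roughly the kind of thing the book's scripts are doing.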

The point of this post? Umm, I guess to point out two amazing resources for play-by-play analysis.
