May 11, 2008

Sports-oriented graph paper

I’ve been preparing a post on the Pirates’ win records over the last few years, and I had created an Excel spreadsheet to plot their win fraction (total wins/total games played) as a function of time during each season. Initially, I plotted their win fraction versus the date, but that made comparing seasons complicated because start and end dates vary a bit from year to year. So, I plotted it versus the number of games played instead. When I did that, I was somewhat surprised to find that sections of the plots from different seasons overlapped exactly, sometimes for 5 or more games. This indicated that these plots were constrained to consist of a series of small sections drawn from a relatively small, finite set of curves. However, I didn’t know what these curves were, so I set about figuring them out. The final product of this investigation is graph paper that a sports fan can use to easily plot the progress of a favorite team as the season progresses.

Before long, I had figured out how to fill the graph with these curves. If I drew a set of curves that plotted a team’s win fraction assuming they won their first W games (with W varying from 1 to 161) and then lost all subsequent games, that gave me half of the curves I needed: the curves that govern a team’s win fraction trajectory when it loses games. To get the curves that govern the trajectory when the team wins games, I needed curves that assumed the team lost its first L games (with L also varying from 1 to 161) and then won all subsequent games. I implemented this in Excel, but Excel balked at the idea of plotting 322 (2*161) curves. So, I decided to try writing a Gnuplot script to do the same thing. The functions I used in my Excel spreadsheet were a little kludgy, but I gradually figured out that the simplest representations of the functions I needed to plot were F(G) = W/G for a fixed number of wins and F(G) = (G-L)/G = 1-(L/G) for a fixed number of losses, where W represents the total number of wins (and L the total number of losses) a team has accumulated over its G games played and F is the win fraction. (Note that W+L = G, or L = G-W.) Using these functions, I was able to get everything plotted using Gnuplot. However, Gnuplot didn’t do a good job of plotting the F = 1/G curve. Plus, I was having a hard time getting everything formatted the way I wanted. So, I scrapped Gnuplot and tried using Mathematica.
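To make the two families of curves concrete, here is a minimal Python sketch (not the Excel, Gnuplot, or Mathematica code used for the actual graph paper). The function names `iso_win`, `iso_loss`, and `curve_family` are made up for illustration, and exact fractions are used so that points shared by two curves compare equal.

```python
from fractions import Fraction


def iso_win(wins, games):
    """Win fraction after `games` games for a team stuck at `wins` wins: F = W/G."""
    return Fraction(wins, games)


def iso_loss(losses, games):
    """Win fraction after `games` games for a team stuck at `losses` losses: F = 1 - L/G."""
    return 1 - Fraction(losses, games)


def curve_family(season_length):
    """Build both curve families for a season of `season_length` games.

    Returns two dicts mapping a win (or loss) count to a list of (G, F) points.
    A curve for W wins only exists once at least W games have been played.
    """
    games = range(1, season_length + 1)
    win_curves = {
        w: [(g, iso_win(w, g)) for g in games if g >= w]
        for w in range(1, season_length)
    }
    loss_curves = {
        l: [(g, iso_loss(l, g)) for g in games if g >= l]
        for l in range(1, season_length)
    }
    return win_curves, loss_curves


win_curves, loss_curves = curve_family(16)
print(len(win_curves) + len(loss_curves))  # 30 curves for a 16-game season
```

Calling `curve_family(162)` gives the 322 curves mentioned above. The point lists could be handed to any plotting tool.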

I’ve used Mathematica quite a bit for classes that I’ve taken and TA’ed, so I’m reasonably familiar with it. It turns out I was able to create the graphs easily and in only one line (it spans multiple lines, but it’s really just one compound command). Mathematica is slower than Gnuplot but I find that making formatting changes is easier (mostly because I’m more familiar with Mathematica and Mathematica has a better help browser). In any case, I decided to make “graph paper” that a fan could use to easily follow a favorite team’s win percentage as the season unfolds.

The graph paper necessary for an entire (regular) season of baseball is pretty involved, since there are 162 games in the season. It’s easier to see the important features of this graph paper if we produce graph paper for the regular NFL season instead, since there are only 16 games.

Let me explain what the various curves represent. The green curves are what might be called iso-win curves. That is, each green curve represents the team’s win fraction as a function of the number of games played if the team has won the number of games that appears at the top of the graph where the curve terminates. As long as the team keeps losing (that is, its number of wins doesn’t change), the win fraction trajectory stays coincident with this curve. The red curves might be called iso-loss curves. That is, they represent the win fraction as a function of the number of games played given that the team has lost the number of games that appears at the bottom of the graph where the curve terminates. As long as the team keeps winning (that is, its number of losses doesn’t change), the win fraction trajectory remains coincident with this curve. When the team loses a game, its win fraction trajectory stays on the same iso-win curve but jumps from one iso-loss curve to the next one to the right; likewise, when it wins a game, it stays on the same iso-loss curve but jumps to the next iso-win curve. For example, the 2007 Pittsburgh Steelers took their second loss of the season in game 6 and then won games 7, 8, and 9. Thus, after games 6, 7, 8, and 9, their win fraction trajectory was coincident with the 2-loss iso-loss curve. However, they lost their 10th game of the season. Up to that point they had won 7 games. So, as a result of the loss, they moved down the 7-win iso-win curve and jumped from the 2-loss iso-loss curve to the 3-loss iso-loss curve.

This description may seem pretty arcane. Let me explain how this works in practice. If you want to plot your favorite team’s progress over the course of a season, you start by putting a point at the top left or the bottom left of the plot, depending on whether they won (top left) or lost (bottom left) their first game. If they won their first game and keep on winning, then the line tracking their win fraction will remain at the top of the plot. However, if they lose a game, you then follow the appropriate green curve down to the first intersection with a red curve. After each subsequent game is completed, if it was a win, move upward along the red curve to the next green-red intersection. If the game was a loss, move downward along the green curve to the nearest green-red intersection. Repeat this process until the end of the season.
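The plotting procedure above amounts to keeping a running tally of wins and losses. A small Python sketch, assuming the results are given as a string of `W` and `L` characters (the function name `trajectory` and the sample sequence `"WWLWL"` are hypothetical, not real team data):

```python
from fractions import Fraction


def trajectory(results):
    """Return one (games, wins, losses, win_fraction) tuple per game played.

    Each tuple is a green-red intersection on the graph paper: the point
    lies on the iso-win curve for `wins` and the iso-loss curve for `losses`.
    """
    wins = 0
    losses = 0
    points = []
    for result in results:
        if result == "W":
            wins += 1      # move up along the current red (iso-loss) curve
        elif result == "L":
            losses += 1    # move down along the current green (iso-win) curve
        else:
            raise ValueError(f"unexpected result {result!r}; use 'W' or 'L'")
        games = wins + losses
        points.append((games, wins, losses, Fraction(wins, games)))
    return points


for games, wins, losses, fraction in trajectory("WWLWL"):
    print(f"after game {games}: {wins}-{losses}, win fraction {float(fraction):.3f}")
```

The first point is at the top left (win fraction 1) or the bottom left (win fraction 0), depending on the first result, which matches the starting step described above.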

The value of this graph paper lies in the information it gives you after you have performed the above procedure, or even in the middle of the season. The x coordinate of a given red-green intersection gives the total number of games the team has played and the y coordinate of the intersection gives the win fraction the team has earned to that point. Furthermore, if you follow the green curve back to its terminus, the number there will tell you how many wins the team has had and doing the same procedure with the red curve will tell you how many losses the team has had.
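Reading a record off an intersection can also be done numerically, since W = F*G and L = G-W. A minimal sketch, with the hypothetical helper name `record_from_point`; it assumes the point really is a green-red intersection, so that F*G is a whole number up to rounding error.

```python
def record_from_point(games, win_fraction):
    """Recover (wins, losses) from the x and y coordinates of an intersection."""
    wins = round(games * win_fraction)  # W = F * G, rounded to absorb float error
    losses = games - wins               # L = G - W
    return wins, losses


print(record_from_point(9, 7 / 9))
```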

Furthermore, this graph paper exhibits all the behavior we would expect of win fractions. For example, a team’s win fraction changes dramatically with each additional win or loss at the beginning of the season, but by the end of the season each win or loss has only a small effect. This is seen clearly in win fraction trajectories plotted using this graph paper. In addition, if a team alternates between winning and losing for a few games, the net change in its win fraction over those games depends on whether the starting win fraction was above or below 0.5. If the starting win fraction was below 0.5, then a team that wins half of its subsequent games will improve its win fraction. However, if the team started with a win fraction above 0.5 and subsequently wins half of its games, its win fraction will decrease, approaching 0.5 asymptotically. This behavior can be seen easily by plotting such a trajectory on this graph paper. Teams that start with a win fraction of 0.5 and subsequently win half of their games will, on average, maintain their 0.5 win fraction.
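Both observations follow from a little algebra on F = W/G. The derivation below is a short sketch in the notation used earlier (W wins and L = G-W losses after G games); the symbol k, the number of additional wins matched by an equal number of additional losses, is introduced here only for illustration.

```latex
\begin{align*}
\text{one more win:}\quad
  & \frac{W+1}{G+1} - \frac{W}{G} = \frac{G-W}{G(G+1)} = \frac{L}{G(G+1)} \\
\text{one more loss:}\quad
  & \frac{W}{G+1} - \frac{W}{G} = -\frac{W}{G(G+1)} \\
\text{$k$ more wins and $k$ more losses:}\quad
  & \frac{W+k}{G+2k} - \frac{W}{G} = \frac{k\,(G-2W)}{G\,(G+2k)}
\end{align*}
```

The first two lines shrink roughly like 1/G, which is why late-season games barely move the win fraction. The last line is positive when W/G is below 1/2, negative when it is above 1/2, and zero at exactly 1/2, and (W+k)/(G+2k) tends to 1/2 as k grows.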

It’s certainly possible that other people before me have reported using this type of graph paper to visualize the win fraction of a sports team over the course of its season; however, I have never seen any such report. It may also be that such a scheme has been proposed to plot some similar quantity in a non-sports context, but, again, I have not seen anything like this. I expect that this type of behavior is well-known to mathematicians, but I hope that its application to sports represents an original contribution. The links below will allow you to download the PDF version of the graph paper for the NFL (16-game), NBA/NHL (82-game), and MLB (162-game) regular seasons. If you would like the files in SVG format, let me know. WordPress isn’t letting me upload them, so I might have to email them to you. In addition, I have included a link to a PDF copy of the Mathematica notebook I used to create the plots. After creating the plots, I saved them in SVG format and imported them into Inkscape, which I used to resize and massage them a bit before saving them to PDF. With the Mathematica notebook, you can create your own or improve on mine. Please let me know if you have any difficulties or any suggestions on how these might be improved.

NFL Graph Paper (PDF)

NBA/NHL Graph Paper (PDF)

MLB Graph Paper (PDF)

Graph Paper Mathematica Notebook (PDF)


6 Responses to “Sports-oriented graph paper”


  1. May 12, 2008 at 7:04 am

    Kludgy, a new word for me in paragraph two. It means “a system and especially a computer system made up of poorly matched components,” if anyone else needs the definition.

    One suggestion, there’s no indication on the red-green NFL graph that the green lines represent wins and the red losses. Though I’m sure smart people would probably figure it out, perhaps you could add a key or simply add the labels ‘total games won’ at the top and ‘total games lost’ at the bottom.

    Yes, dear husband, I did read this entire post. I love you! :)

  2. andy (not andyl)
    May 12, 2008 at 12:26 pm

    So what you’re saying is, if being a materials nerd doesn’t work out for you, you’re going to try and get a job at Baseball Prospectus?

  3. Colin
    May 12, 2008 at 1:05 pm

    @andy: Yes. I was just talking to an officemate about that. Well, not Baseball Prospectus, but working for an actual team. I’m trying to figure out how to get my foot in the door with the Pirates.

  4. Colin
    May 14, 2008 at 10:40 am

    I realized recently that this type of graph paper could also be used to plot players’ batting averages. If the wins curves instead represent hits, the losses curves instead represent outs, and the total games scale represents at-bats, then the analogy is complete. Maybe I’ll create a sheet that you could use to plot your favorite player’s batting average over the course of the year. Maybe I won’t, however, since many players get 500+ at-bats in a year and that would be one complicated piece of graph paper if it involved 1000+ curves.

  5. andy (not andyl)
    May 14, 2008 at 10:39 pm

    Well, you could write a web application that takes a player’s at-bat data and then plots it on your graph, but in a nicely zoomable AJAXy graph, like the google finance stock graphs. You’d be able to move between the long and short views fairly quickly.

