Reading: Section 2.4 on Graphical Presentations

# Charts and Graphs

Presenting data is important, and graphs, charts, histograms and other visual presentations are very important. Being able to visualize data helps to focus on what is important: outliers, peaks and valleys, trends of growth or decline, and so forth.

## Sample Data

On January 1, 2006, The Boston Globe Magazine published a cover story on the occasion of the 1000th execution in the U.S. since the Supreme Court reinstated the death penalty in 1976. The article had a breakdown of that data by state, which is interesting as a dataset. Thanks to Irene Laursen in the Wellesley's Science Library, we have that data, plus additional detail. Take a few minutes to look over the data.

## Line Graphs

Line graphs are useful when the x-axis is ordered in some way. The most common way for it to be ordered is time. The graphs we've seen in Extend for the amount of money or the number of rabbits or whatever are interesting because of how they change over time, and a Line Graph is a good way of doing this.

Let's graph the number of executions in the U.S. since the reinstatement of the death penalty in 1976. To get started, choose "Insert / Chart" or click on the icon of a bar chart. This starts the "chart wizard," which helps you create a chart in four steps.

1. Chart Type:
• We'll take a minute to discuss the different types, but for this example, choose "Line"
• We'll also take a minute to discuss the different sub-types, but for now, choose the one in the upper left (also "Line")
• click "next"
2. Chart Source Data:
• Click in the "Data Range" box and then select the row with the data you want. By data, it means the y-axis numbers. The data can be in a column if you want; it doesn't matter.
• For this example, choose the row with "U.S. Total" in it. When you're done, your data range formula should look like "Sheet1!\$c\$13:\$AE\$13"
• click "next"
3. Chart Options. These are mostly labels and other non-data stuff
• The Chart Title can only be a fixed string. Make up an appropriate one, such as "U.S. Executions since 1976"
• The X axis can only be a fixed string. Type in "year"
• The Y axis can only be a fixed string. Type in "executions"
• We'll leave the other options at their defaults for now.
• click "next"
4. Chart Location. The chart can either be placed in an existing sheet or a new sheet. The default is reasonable.

Look at the result. Feel free to compare it with the chart under "Chart1" in the tabs below the spreadsheet. Notice:

• The X axis is 1-29, which is not nearly as usful as knowing the particular year. How can we change this?
• The Y axis goes from 0 to 120, but none of the data are over 100, so maybe that would be a better choice. How can we change this?
• The legend calls this line "Series1," which isn't very informative. How can we change this?
• The line is light blue. What if we wanted a different color?

You should see a little window floating in front of your plot. Some of the things we are interested in changing can be changed there. For example, to change the line color, pull down the menu and choose "Series 1". The little icon next to the pull-down menu brings up a window called "Format Data Series." Alternatively, you can try double-clicking on the line itself. From the Format Data Series window, you can choose different colors and such.

If you double-click the "120" on the y-axis, you can bring up a "Format Axis" window, which has a tab for "scale." Change it to 100.

We'll fix the others by doing them right the first time. (We could also do it after the fact, but I think it's better and easier to see how to get it right the first time.) So we'll re-do our chart. Delete the first one and then:

1. Same choices for the chart type
2. For the Chart Data
• click on "series" at the top of the window. This is used if there is more than one dependent variable (y-axis value) and if there is a non-trivial x axis variable.
• By default, Excel has grabbed a bunch of rows, with each a series. This is actually pretty good. Let's keep it for now.
• Click below that in the box next to "Category (X) axis labels:" and select the C11:AE11 range.
• Click "finish," since we're just curious about what this will do.

Pretty cool, hunh? Notice that the X axis is now correct. However, the Y data is more than we want, at least for this graph. Let's fix that.

• Click on "Chart/Source Data" and you'll get a window much like step 2 of the chart wizard.
• Click on "series" to switch to that tab
• Delete the series you don't want
• Add the series you do want. You might want only the total, in which case the "Name" box will be "=Sheet!\$A\$42" and the "Values" box will be "=Sheet1!\$C\$42:\$AE\$42"
• Click on OK when you are done. Feel free to compare it with the chart under "Chart2" in the tabs below.

Examine the result. Looking at the breakdown by the three major regions is worthwhile. This is a good time to consider the other chart sub-types. You can change chart sub-types with "Chart/Chart Type."

## Bar Charts

Another way to display data is via a bar chart. A bar chart is more appropriate when there isn't necessarily any order to the X axis, or, even if there is, "trends" are not the point. For example, we could look at the row totals, broken down by region or even by state.

### Executions by Region

Use the chart wizard again.

1. Chart Type: Choose "Column," and choose the sub-type in the upper left ("clustered column").
2. Chart Source Data. You can see that Excel has already made some "guesses." These will typically be useless, unless you have a simple spreadsheet. Here, delete it's notion of the data, and replace it with four cell addresses, separated by commas:
• B13 (the U.S. total)
• B17 (the northeast)
• B28 (the midwest)
• B42 (the south)

Then, switch to the "series" tab and choose a "name" cell for each series, namely the data in column A.

3. Chart Options: fill in the title and axis labels as you see fit.
4. Chart Location: put it anywhere you like. Compare with "Chart3"

Notice that the data from the Department of Justice doesn't include the 2005 data. I added a column to compute the totals through 2005; it's column AE.

### Executions by State

Redo the chart, building a bar chart by state. Unfortunately, Excel's desire to guess something useful will work against us. To see, let's try it:

1. Same choice for the first step
2. For the second step, try to get the data ranges correct. It's a pain, especially since there are 50+ of them.

Start over. This time, copy the first two columns (starting at US total) to a blank sheet.

1. Same choice for the first step
2. For the second step, choose "series in columns" and switch to the series tab. Then, amend the formula for values for Series1 to be the B column (B1:B62) and the Category (X) axis labels to be the A column (A2:A62).
3. Choose an appropriate chart title and axis labels
4. Finish and compare with Chart4

Having the state labels vertical is hard to read. This is a good case for the horizontal orientation.

• Switch the chart type to "Bar"
• Notice that not all of the states are listed. Double-click on the y-axis labels and you'll see a "Format Axis" window. Under the "Scale" tab, make sure "number of categories between tick mark labels" is 1.
• Reduce the font so that you can read all the text.
• Compare to "Chart5"

## Histograms

Consult Flick Coleman's histogram tutorial in Flash.

Note that it's actually not necessary to sort the data; the histogram tool will sort a copy of the data.

## Data Summarization

Do you agree with summarizing by region?

How would you summarize the data about the 1000 total executions since 1976?

• What is the mean?
• What is the median?
• What is the mode?

Sheet3 has the data with just the basic 52 regions (50 states, plus DC and Federal).

### Sorting

You could sort the data by state name or by magnitude. Let's try both.

• On Sheet 3, notice that I added a header row with column labels (region name and executions). This can be helpful with sorting, but it's not necessary
• Select the whole sheet (command-A or the little diamond) then choose "Data/Sort." Notice the option where you can say whether the data has a header row (in case Excel guesses wrong).
• Choose to sort by region name, ascending.
• Make a chart of the data. What sticks out?
• Sort by number of executions and make another chart.

### Summarizing

Assume you can report several numbers, not just one. What do you report? Why?

• What would you report if you could report only one number?
• What would you report if you could report only two numbers?
• What would you report if you could report only three numbers?
• What would you report if you could report only four numbers?

## Chart Analysis

Look at the chart Executions by Year 1608-2000? (It's at the bottom of the page.) What do you think of it? Why?

## More Chart Analysis

I'll hand out a copy of the January 30, 2006 Newsweek cover story on boys and girls, which has two charts on the bottom of page 52 that I think are worth trying to re-do. Please do them.

If you want to compare, here is my charts for the boys and girls reading and writing scores

What is the effect of zooming in on the data?