Helpdesk IBM SPSS Statistics 20

For students from Arnhem Business School

Graphs Histogram

In this example we use data about the tallest buildings in the world. The data comes from Wikipedia and was downloaded and edited in September 2011, so a more recent version is surely available by now.
The following variables are in our data file. Height is recorded in meters. Year is the year the building was completed.


We want to show the height distribution of these skyscrapers by means of a histogram. In this example we use Graphs > Legacy Dialogs > Histogram...:

We have only filled in the variable to use (Height). There is no need to panel the graph. Panelling means that SPSS draws separate histograms for subgroups defined by the panelling variable.
The title will be dealt with later.


The first result

Clearly this graph needs editing. We know we have to add a title. The Burj Khalifa is an extreme outlier that on its own takes up half of the available image space. Do we want this or not? And do we want to keep the statistics in the upper right corner of the graph?

SPSS has automatically created a set of classes for this scale variable. Is it the right choice for us or do we want to change it?
For example, the first class starts at 233.3 m. We can do better than that.
Depending on where the graph will be published, it might need a source reference.
And of course we might want to experiment with the scale on the vertical axis, gridlines, colors and the fonts that are used.

Double-clicking on the chart in the SPSS output window opens the graph in a new Chart Editor window.



If you click on a part of the graph the corresponding Properties Window appears. It might have several tabs. As an example you see the Properties Window that pops up when you click on the histogram bars. On the active tab you can adjust the binning, but as you can see there are more tabs where you can make adjustments.

It isn't useful to show you all of these dialog boxes. There are too many of them and they all work in a similar way. You choose the options that you want and click on "Apply". The changes will show in the chart.
If you like it then keep it. Otherwise go back to the dialog and change it again.

If the Properties Window doesn't show, you can activate it via the Properties button on the toolbar.

Explore the other buttons of the toolbars as well.


The edited result:

(Figure: final histogram of the tall buildings data)

In the plot we have edited a number of things, but we kept the scale intact, so that all buildings show up in the picture and the extreme size of the Burj Khalifa is emphasized.

We have changed the binning. The class width is now 25 m and the first class starts at 225 m. Also the scaling of the vertical axis has changed.
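To make the effect of these two settings concrete, here is a minimal Python sketch of how a class width and a starting point (anchor) determine the classes. It is not what SPSS does internally. The function name `bin_counts`, the convention that a class includes its lower limit but not its upper limit, and the heights in the list are all assumptions made for illustration; the heights are not the Wikipedia data.

```python
import math
from collections import Counter

def bin_counts(values, width, anchor):
    """Count values per class; class limits lie at anchor + k * width.

    Assumption: a class includes its lower limit and excludes its upper limit.
    """
    counts = Counter()
    for v in values:
        k = math.floor((v - anchor) / width)   # index of the class that holds v
        counts[anchor + k * width] += 1        # key = lower limit of that class
    return dict(sorted(counts.items()))

# Made-up heights in meters, for illustration only.
heights = [230, 240, 248, 251, 260, 274, 301, 828]

counts = bin_counts(heights, width=25, anchor=225)
for lower, n in counts.items():
    print(f"{lower}-{lower + 25} m: {n}")
```

With a width of 25 and an anchor of 225 every class limit is a round number (225, 250, 275, ...), which is easier to read than limits derived from a starting point such as 233.3.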

Note: Copying and pasting graphs into Word works fine most of the time, but not always.
As an alternative you can use the Export option in the SPSS output window.
Select the chart you want to export in the output window and choose File > Export... from the menu.
Next fill in the dialog box to get what you want.

Note: If you find it hard to add a text field in the SPSS chart, don't hesitate to export the chart and continue your editing in another program of your choice.


About binning

In the left picture above you see what happens if you pick a class width that is too large. You will get a few big lumps that tell you nothing about the actual distribution of the data.

In the right picture above you see the opposite. The class width is too small and this results in a very spiky graph. But again the distribution of the data is obscured.

In the third picture (below left) we got it right. And now we see clearly that we are dealing with a bimodal distribution.
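The same three situations can be reproduced with a small text histogram in Python. This is only a sketch: the function `text_histogram` and the sample values are made up for illustration and have nothing to do with the applet or with SPSS.

```python
import math
from collections import Counter

def text_histogram(values, width, anchor=0):
    """Return one line per class: the lower class limit and a bar of '#'."""
    counts = Counter(math.floor((v - anchor) / width) for v in values)
    lines = []
    for k in range(min(counts), max(counts) + 1):
        lower = anchor + k * width
        lines.append(f"{lower:>3} {'#' * counts.get(k, 0)}")
    return lines

# Made-up sample with two clusters (around 12 and around 32).
sample = [8, 10, 11, 12, 12, 13, 14, 16, 28, 30, 31, 32, 32, 33, 34, 36]

for width in (40, 1, 4):  # too wide, too narrow, reasonable
    print(f"class width {width}:")
    print("\n".join(text_histogram(sample, width)))
```

With a class width of 40 all values end up in one lump. A width of 1 gives 29 classes, most of them empty or holding a single value. A width of 4 shows the two groups clearly.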

If you like you can have a look at the applet we used to create these pictures.

When you ask for a histogram in SPSS the program uses the data range and the number of observations to make a set of classes for the graph. Often this will give an acceptable result, but not always. Have a look at the examples below. The first one looks like the spiky graph we dismissed above. The spikes obscure the global picture.
(Also note that there is no unit of measurement for the variable income on the horizontal axis. One might argue that it is given in the text around the graph, but you can never be sure how your report will be used or quoted, so always make sure the graph is self-explanatory.)
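To get a feeling for how a program can derive classes from the data range and the number of observations, here is a sketch based on Sturges' rule, a common textbook rule of thumb. This page does not say which rule SPSS actually uses, so treat the rule, the function name `sturges_classes` and the made-up sample purely as an illustration.

```python
import math

def sturges_classes(values):
    """Suggest a number of classes and a class width using Sturges' rule.

    Assumption: this is a textbook rule of thumb, not the rule SPSS applies.
    """
    n = len(values)
    k = math.ceil(math.log2(n)) + 1            # number of classes from n
    width = (max(values) - min(values)) / k    # class width from the data range
    return k, width

# Made-up sample, for illustration only.
sample = [8, 10, 11, 12, 12, 13, 14, 16, 28, 30, 31, 32, 32, 33, 34, 36]

k, width = sturges_classes(sample)
print(k, width)
```

For these 16 values the rule suggests 5 classes with a width of 5.6. A width like that is rarely convenient for the reader, which is why an automatically generated set of classes usually needs rounding to a sensible width and starting point.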

We will examine a second example in some more detail.
In this case the corresponding dataset is not available. So you will have to practice with the tall buildings data to learn this technique.

It is immediately clear that the labeling is horrible, but that is another story. It is also clear that the automatically generated set of classes results in a spiky graph with gaps between bars that don't seem to be justified. So how can we set this right?

By double clicking on the chart in the output window of SPSS you open it in its own editing window.

  1. We want to adjust the binning. To do that, you click on the histogram bars. The properties window below left will pop up. It has a tab called "Binning". As you can see the default setting is automatic binning, where SPSS chooses for you. The above example shows that this does not always give the best result. So you activate the radio button for Custom and make your own choices. Setting the interval width and choosing a custom value for the anchor gives you the control you need.

  2. We want to have a proper labeling of the horizontal axis. To do that, you click on that axis. The properties window below right will pop up. It has several tabs for you to control every aspect of the labeling. Using the "Scale" tab you can specify how many numbers you want to see, and where they should start. With "Number Format" you specify the decimals.

Experiment a little with the settings at your disposal until you are satisfied with the result.
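The axis settings from step 2 boil down to a starting value, an end value, a step between the numbers and a number of decimals. The Python sketch below shows how these four choices determine the labels on the axis. The function `axis_labels` and its parameter names are hypothetical and do not correspond to SPSS options one to one.

```python
def axis_labels(minimum, maximum, increment, decimals):
    """List the axis labels from minimum to maximum in steps of increment."""
    steps = round((maximum - minimum) / increment)
    # Multiply instead of adding repeatedly to avoid accumulating rounding errors.
    return [f"{minimum + i * increment:.{decimals}f}" for i in range(steps + 1)]

print(axis_labels(0, 2, 0.5, 1))  # ['0.0', '0.5', '1.0', '1.5', '2.0']
```

Choosing a step that is a multiple of the class width keeps the numbers on the axis aligned with the class limits of the bars.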

As you can see, we have plenty of tools for a proper axis display at our disposal. A decent result might look like this.

By taking a class width of 0.5 instead of 0.25 we get a result that is too coarse, as you can see in the picture below.


Choose the class limits wisely

The graph below is another example from students' work. In this case the class width is Can$ 30,000.

The choice of class width is fine. What is unconventional here is that 0 is not a class limit. Every national bureau of statistics will have a set of classes for household income where one of the classes starts at 0.
You should do the same. It enables you to compare your results to those of the experts.
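Whether 0 is a class limit depends only on the anchor and the class width. A quick check in Python, as a sketch: the function name is made up, and the anchor of -15,000 is a hypothetical value, since we do not know which anchor was used in the student's graph.

```python
def zero_is_class_limit(anchor, width):
    """True if 0 coincides with one of the class limits anchor + k * width.

    Assumption: anchor and width are whole numbers.
    """
    return anchor % width == 0

print(zero_is_class_limit(-15000, 30000))  # False: limits at -15000, 15000, 45000, ...
print(zero_is_class_limit(0, 30000))       # True: limits at 0, 30000, 60000, ...
```

Setting the anchor to 0, or to any multiple of the class width, guarantees that one of the classes starts at 0.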

And also ask yourself if you really want a normal curve superimposed over the histogram. Is there any need for you to assess whether the income distribution is approximately normal? If the answer is "no", leave the curve out.


Last modified 30-10-2012

Jos Seegers, 2009; English version by Gé Groenewegen.