Helpdesk IBM SPSS Statistics 20

For students from Arnhem Business School

Data > Sort Cases

Inspecting your raw data to see what it seems to say is a useful step in exploratory data analysis.
Sorting the cases in your data file lets you look at the data from a particular perspective and makes it easy to find outliers and other unusual cases.

You use "Sort Cases" if you want to order your data based on the values of one or more variables in your file.
This is a simple technique, and you are probably already familiar with it from Excel.

An example of sorting

For a number of years we have asked foundation year students from Arnhem Business School to fill in a simple questionnaire with some questions about who they are. We use this data in class in our introductory statistics course. The following information is collected in the file (see abs_students.sav):

[Figure: codebook of the ABS students data file]

In our sorting example we will order our data set by the height of the students. Choose Data > Sort Cases from the menu:

[Figure: the Data menu with Sort Cases selected]

[Figure: the Sort Cases dialog]

Select the variable by which you want to sort your data, then click OK.

You can choose the Sort Order (Ascending or Descending).

If you select several variables for the sorting process, the data is first sorted by the variable at the top of the list. Cases with the same value for this first variable are then sorted by the second variable, and so on.

If you want to try this, sort the ABS student data by gender and then by age.
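
The ordering rule for several sort variables can also be illustrated outside SPSS. The Python sketch below is only an illustration of that rule, not what SPSS runs internally. The records, the field names `id`, `gender` and `age`, and all values are made up for the example and are not taken from abs_students.sav.

```python
# Hypothetical records; field names and values are assumptions for illustration.
students = [
    {"id": 1, "gender": "male", "age": 21},
    {"id": 2, "gender": "female", "age": 19},
    {"id": 3, "gender": "male", "age": 18},
    {"id": 4, "gender": "female", "age": 22},
    {"id": 5, "gender": "female", "age": 19},
]

# Sort by gender first; cases with the same gender are then sorted by age.
# Both keys are ascending, comparable to choosing "Ascending" for each variable.
by_gender_age = sorted(students, key=lambda s: (s["gender"], s["age"]))

for s in by_gender_age:
    print(s["id"], s["gender"], s["age"])
```

All female students come first, ordered by age, followed by all male students, ordered by age. The second variable only matters within a group of cases that share the same value on the first.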

The result of sorting by height:

[Figure: top of the data file after sorting by height]

We see four students for whom the height is unknown. Their cells for "height" are empty, shown as a dot (.), which denotes a system-missing value.
Next we see a male student who is only 130 cm tall. Although this is not impossible, it is a remarkable case. Whether or not you want to keep the value depends on two things.

First, ask yourself: "Do I trust this data?"
    If your answer is "no", delete this data.

    If your answer is "yes", ask yourself next whether this data is an influential outlier that might distort your analyses of the rest of the data.
        If it is, report in the data editing part of your analysis that you found an outlier, that it can be trusted, what it is about, and why you deleted it from the data set.
        If it is not, you can leave it in the data set.
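
The two questions above form a small decision procedure, which can be written down as a sketch. The function name `outlier_decision` and the three action labels are hypothetical and chosen only for this illustration. Whether a value is trusted or influential remains your own judgment, which the code simply takes as input.

```python
def outlier_decision(trusted: bool, influential: bool) -> str:
    """Return the action for a remarkable value, following the two questions above."""
    if not trusted:
        # Question 1: the value is not trusted, so it is deleted.
        return "delete"
    if influential:
        # Question 2: trusted but disturbing, so it is deleted and the deletion is reported.
        return "delete and report"
    # Trusted and not disturbing: the value stays in the data set.
    return "keep"


print(outlier_decision(trusted=True, influential=False))
```

For the student who is 130 cm tall, you would call the function with your own answers to both questions.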

Case number 6 is a female student, not born in Europe, who is 150 cm tall. There is nothing strange about this case.

At the bottom of the file we find the tall guys in our dataset. Nobody seems to be exceptionally tall.

[Figure: bottom of the data file after sorting by height, showing the tallest students]


Last modified 30-10-2012

Jos Seegers, 2009; English version by Gé Groenewegen.