Helpdesk IBM SPSS Statistics 20

For students from Arnhem Business School

Codebook

On this page we explain the basic ideas behind the definition and description of variables in SPSS. It will not be exhaustive.
Also check the section Data for additional information.
The most common situations will be explained by means of frequently occurring types of survey questions. They also serve as guidelines for choosing a proper table or graph. These examples are also present in the Analysis part of our site.

Purpose of a codebook

A codebook provides the translation from the set of possible answers given by respondents or obtained by measurements/observation into the SPSS data matrix. The codebook ensures a correct and complete definition of all variables in the data matrix and is a foundation for data analysis.

Basic guidelines

  1. Give every completed questionnaire or set of answers a number or other unique identification code.
    Make a variable in SPSS to contain that number or code.

  2. Use numeric variables whenever possible.

  3. Use the variable names as links to the numbers of the questions in the questionnaire (like V1, V2, ... or Q1a, Q1b, Q2, ...).
    Use the variable labels to describe the meaning of the variables.

  4. Use concise but complete variable labels and value labels.

  5. Define missing values.

  6. Be consistent in the (number) codes that you use for missing values.
    For example, consistently use 9, 99 or 999 as missing value codes. Make sure that the missing value code can never be a valid answer code. Hence don't use 99 as the missing value for the variable age.
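
Guidelines 5 and 6 can be checked mechanically. What follows is a minimal Python sketch, not SPSS syntax. The codebook layout, the variable names and the valid ranges (for instance ages from 0 to 120) are assumptions made only for this illustration. It tests whether a missing value code falls outside the set of valid answers.

```python
# Hypothetical mini codebook: variable name -> label, valid answers, missing code.
CODEBOOK = {
    "gender": {"label": "Gender", "valid": range(1, 3), "missing": 9},
    "age": {"label": "Age in years", "valid": range(0, 121), "missing": 999},
}


def missing_code_is_safe(valid, missing):
    """Return True if the missing code can never be a valid answer."""
    return missing not in valid


for name, spec in CODEBOOK.items():
    ok = missing_code_is_safe(spec["valid"], spec["missing"])
    print(name, "-", spec["label"], "-", "OK" if ok else "CONFLICT")

# Using 99 for age would clash, because a respondent can be 99 years old.
print(missing_code_is_safe(range(0, 121), 99))  # False
```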

The choices in the variable view explained

You can make a choice for each of the elements in the variable view, or you can settle for the default choices made by SPSS.
And remember the basic guidelines.

Variable View toolbar

  •  Name:  SPSS refers to variables by their name. Keep these short and preferably use the number of the question in your questionnaire.

  • Type: Here you specify the type of variable to use. The default choice and also our preferred choice is "numeric".
    The second most used option is "string". You specify its width by choosing the number of characters to use.

  • Width: Here you specify the size of the variable. For example the number 12345.67 has a width of 8, namely 5 digits, then the decimal separator and finally 2 decimals. This gives a total of 5 + 1 + 2 = 8 positions.
    numeric width
    The width of a string variable equals the maximum number of characters it may contain. Of course strings don't have decimals.
    string width

  • Label: Here you clearly describe the meaning of the variable. Try to be concise. SPSS uses this label to refer to the variable in all its output.

  • Values: This is the place to describe the answer codes that you use.

  • Missing: Here you specify which value(s) are used to indicate non response (like "don't know", "refuse to answer" or "not applicable"). These values will be treated by SPSS differently from the valid answer codes. For example they won't be used in the calculation of an average.

  • Columns: Here you specify the column width of the variable in the SPSS Data View Window. This doesn't have to equal the variable width.
    For example you can use a string variable of width 100. To limit its amount of space on the screen you may choose Columns = 10.
    column width

  • Align: This specifies the alignment in the Data View Window. By default string variables are aligned left and numeric variables are aligned right.
    data alignment

  • Measure: Here you specify the level of measurement of the variable. Some statistical techniques require a certain level of measurement for the variables involved. SPSS "knows" about these requirements and in some procedures is programmed to act accordingly.
    To refresh your memory a short overview of the options:

    • Nominal
      The values (the answer options) are a list of categories (of course complete and without overlaps). For example the variable Gender is coded as 1 = male, 2 = female and 9 = unknown (the missing value). Note that an average gender has no meaning.

    • Ordinal
      A variable is ordinal if the list of answer options has a logical ordering. Ordering t-shirt sizes alphabetically (L, M, S, XL, XS, XXL) makes no sense. We all know that the logical ordering should be XS, S, M, L, XL, XXL.
      Another well-known example is the set of answer options used in Likert scales: 1 = totally disagree, 2 = disagree, ... , 5 = totally agree. The higher the value, the stronger the agreement. If you like, you can reverse the order, but scrambling it makes no sense.

    • Scale
      A variable where the answers are actual numbers and where arithmetic makes sense. Examples are age, price, income, time until completion and the like.

    • Unknown
      This is new to SPSS 20. If you create a new variable, its level of measurement is unknown to the program and it will label it as such. There is no default level of measurement. You have to set it yourself.

  • Role: This feature is new to SPSS 20. Here you indicate the role a variable plays in your data analysis.
    For example in regression analysis a variable can be used as cause (input) or as effect (target).

The SPSS Help tells us about the new Role feature: "Some dialogs support predefined roles that can be used to pre-select variables for analysis. When you open one of these dialogs, variables that meet the role requirements will be automatically displayed in the destination list(s). By default, all variables are assigned the Input role.

Available roles are:
    Input. The variable will be used as an input (e.g., predictor, independent variable).
    Target. The variable will be used as an output or target (e.g., dependent variable).
    Both. The variable will be used as both input and output.
    None. The variable has no role assignment.
    Partition. The variable will be used to partition the data into separate samples for training, testing, and validation.
    Split. Included for round-trip compatibility with IBM® SPSS® Modeler. Variables with this role are not used as split-file variables."
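
Two of the ideas above lend themselves to a small illustration: the arithmetic under Width and the logical ordering under Ordinal. The sketch below is plain Python, not SPSS. The names `numeric_width`, `SIZE_ORDER` and `rank` are made up for this example.

```python
def numeric_width(integer_digits, decimals):
    """Positions needed: digits before the separator, the separator, the decimals."""
    separator = 1 if decimals > 0 else 0
    return integer_digits + separator + decimals


# 12345.67 has 5 digits, 1 decimal separator and 2 decimals.
print(numeric_width(5, 2))  # 8

# Ordinal data: sort by a rank that reflects the logical order,
# not by the alphabet.
SIZE_ORDER = ["XS", "S", "M", "L", "XL", "XXL"]
rank = {size: position for position, size in enumerate(SIZE_ORDER, start=1)}

answers = ["L", "XS", "XXL", "M", "S", "XL"]
print(sorted(answers))                # alphabetical order, which makes no sense here
print(sorted(answers, key=rank.get))  # logical order from XS up to XXL
```

Storing the rank as the numeric code (1 = XS, ..., 6 = XXL) is what makes the logical order available to any later sorting or tabulation.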

An overview of your codebook

Once you have finished defining all variables you can check your work by creating an overview of the codebook in the SPSS output. In our example we use the file example_data. In Variable View it looks like this:

This gives us a useful first impression. For a complete overview choose Analyze > Reports > Codebook and select the variables you want in your overview. In our case we only have two variables: v01 and v02.

The default settings for the Output and Statistics tabs are:

Using these standard settings our output is:

An alternative way to see how our data is defined is through File > Display Data File Information > Working File.
You can also use this to view the data structure of another SPSS file.

display data info command

Typical questions and their codes

There are sample data files available that correspond to the examples below. You can download them and open them in SPSS to experiment on your own.

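We present examples of the following types of questions:

  • Nominal question
  • Scale question with an open answer
  • Set of ordinal scale questions
  • Multiple response question, dichotomies
  • Multiple response question, categories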

Nominal question example

Question 1

Coding:

Multiple choice question

Because we want a code for every possible answer and there might be people who don't specify a gender, we have code 9, labeled "Unknown", and we also specified code 9 to be a missing value. We will also use this code if someone ticks both options, or if for some reason we can't read the answer that was chosen.

For this type of question we start our analysis with a frequency table or a chart. See the part on Analysis for more details.
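
To show what declaring code 9 as missing means for a frequency table, here is a minimal Python sketch (not SPSS output). The answers are made up for illustration and the names `VALUE_LABELS`, `MISSING` and `n_valid` are our own. Percentages are based on the valid answers only.

```python
from collections import Counter

# Value labels as described above; 9 is declared missing.
VALUE_LABELS = {1: "Male", 2: "Female", 9: "Unknown"}
MISSING = {9}

# Made-up answers, for illustration only.
answers = [1, 2, 2, 9, 1, 2, 2, 1, 9, 2]

counts = Counter(answers)
n_valid = sum(n for code, n in counts.items() if code not in MISSING)

for code in sorted(counts):
    n = counts[code]
    if code in MISSING:
        print(f"{VALUE_LABELS[code]:<8} {n:>3}   (missing)")
    else:
        print(f"{VALUE_LABELS[code]:<8} {n:>3}   {100 * n / n_valid:5.1f}% of valid")
```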

SPSS data file to practice with a nominal question example

Scale question with an open answer

Question 2

Coding:

In order to be able to process all answers that might occur we need room for 3 digits.
We don't need value labels for normal answers. With the variable label "Age in years" the answer "57" is perfectly clear. The value label "Unknown" for the outcome 999 is the only addition we need to make.
Remember to specify 999 as a missing value.
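
The following Python sketch (not SPSS, with made-up ages) illustrates why that last step matters: if 999 is not declared missing, it is treated as a real age and distorts the average.

```python
from statistics import mean

MISSING_AGE = 999  # code for "Unknown", as described above

# Made-up ages, for illustration only.
ages = [23, 57, 999, 41, 35, 999]

valid_ages = [age for age in ages if age != MISSING_AGE]

print(mean(valid_ages))  # 39, the mean of the four valid answers
print(mean(ages))        # 359, distorted because 999 counts as an age
```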

SPSS data file to practice with a scale question with an open answer

Set of ordinal scale questions

Question 3

Coding:

Note: The main part of the question is not part of the variable labels, but will be used in the title of the tables and graphs made for this set of variables.

SPSS data file to practice with a set of ordinal scale questions

Multiple response question, dichotomies

Question 4

Coding:



Do you understand why these variables have no missing values?
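
As a hint, consider the following Python sketch of dichotomy coding. It is only an illustration: the answer options are hypothetical, and we assume the codes 1 = ticked and 0 = not ticked, which need not be the codes used in the example file. Under that assumption every variable always receives a valid code, because "not ticked" is itself an answer.

```python
# Hypothetical answer options of a "tick all that apply" question.
OPTIONS = ["newspaper", "radio", "television", "internet"]


def to_dichotomies(ticked):
    """Return one 0/1 code per option: 1 = ticked, 0 = not ticked (assumed codes)."""
    return {option: int(option in ticked) for option in OPTIONS}


print(to_dichotomies({"radio", "internet"}))
print(to_dichotomies(set()))  # nothing ticked still gives a code for every variable
```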

SPSS data file to practice with a multiple response question with dichotomies

Multiple response question, categories

Question 5

Coding:

Since respondents are asked to tick (at most) two options, two variables suffice.

Note: Coding this question using multiple dichotomies is an alternative. See the explanation above.
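
The two coding schemes can be compared in a short Python sketch. Everything in it is an assumption made for illustration: the category codes, the missing code 9 for an unused second variable, and the function names. It stores at most two ticked options in two variables and then converts that row into the alternative dichotomy coding.

```python
# Hypothetical category codes and missing code.
CATEGORIES = {1: "newspaper", 2: "radio", 3: "television", 4: "internet"}
MISSING = 9


def to_category_variables(ticked_codes, slots=2):
    """Store up to `slots` ticked codes; pad unused variables with MISSING."""
    codes = sorted(ticked_codes)
    if len(codes) > slots:
        raise ValueError("more options ticked than variables available")
    return codes + [MISSING] * (slots - len(codes))


def to_dichotomies(category_values):
    """Alternative coding: one 0/1 variable per category."""
    chosen = {value for value in category_values if value != MISSING}
    return {label: int(code in chosen) for code, label in CATEGORIES.items()}


row = to_category_variables({4, 2})
print(row)                         # [2, 4]
print(to_category_variables({3}))  # [3, 9]
print(to_dichotomies(row))
```

With categories the number of variables equals the maximum number of ticks allowed. With dichotomies it equals the number of answer options.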

SPSS data file to practice with a multiple response question with categories

Last modified 30-10-2012

© Jos Seegers, 2009; English version by Gé Groenewegen.