Codebook
On this page we explain the basic ideas behind the definition and description
of variables in SPSS. It will not be exhaustive.
Also check the section Data for additional information.
The most common situations are explained by means of frequently occurring types of
survey questions. They also serve as guidelines for choosing a proper table or
graph. These examples are also present in the Analysis
part of our site.
A codebook provides the translation from the set of possible answers given by respondents or obtained by measurements/observation into the SPSS data matrix.
The codebook ensures a correct and complete definition of all variables in the
data matrix and is a foundation for data analysis.
- Give every completed questionnaire or set of answers a number or other unique identification code. Create a variable in SPSS to contain that number or code.
- Use numeric variables whenever possible.
- Use the variable names as links to the numbers of the questions in the questionnaire (like V1, V2, ... or Q1a, Q1b, Q2, ...). Use the variable labels to describe the meaning of the variables.
- Use concise but complete variable labels and value labels.
- Define missing values.
- Be consistent in the (number) codes that you use for missing values. For example, consistently use 9, 99 or 999 as missing values. Make sure that the missing value code can never be a valid answer code. Hence don't use 99 as the missing value for the variable age.
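The last guideline can be checked mechanically. The following is a minimal Python sketch, not an SPSS feature: the dictionary layout, the function name and the assumed valid ranges (for example ages 0 to 120) are illustrative assumptions only.

```python
# Hypothetical codebook fragment: valid answer codes and the missing code per variable.
# The valid ranges are assumptions made for this illustration.
codebook = {
    "gender": {"valid": range(1, 3), "missing": 9},    # 1 = male, 2 = female
    "age": {"valid": range(0, 121), "missing": 999},   # age in years, assumed 0..120
}


def conflicting_missing_codes(codebook):
    """Return the names of variables whose missing code is also a valid answer code."""
    return [
        name
        for name, spec in codebook.items()
        if spec["missing"] in spec["valid"]
    ]


# With 9 and 999 as missing codes there is no conflict.
print(conflicting_missing_codes(codebook))

# Using 99 as the missing code for age collides with a possible real age of 99.
bad_codebook = {"age": {"valid": range(0, 121), "missing": 99}}
print(conflicting_missing_codes(bad_codebook))
```

A check like this is most useful before data entry starts, because changing a missing code afterwards means recoding the data.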
For each of the elements in the Variable View you can make your own choice, or
you can settle for the default chosen by SPSS.
Keep the basic guidelines in mind while doing so.
- Name: SPSS refers to variables by their name. Keep names short and preferably use the number of the question in your questionnaire.
- Type: Here you specify the type of variable to use. The default choice, and also our preferred choice, is "numeric". The second most used option is "string". You specify its width by choosing the number of characters to use.
- Width: Here you specify the size of the variable. For example, the number 12345.67 has a width of 8: 5 digits, then the decimal separator and finally 2 decimals. This gives a total of 5 + 1 + 2 = 8 positions. The width of a string variable equals the maximum number of characters it may contain. Of course strings don't have decimals.
- Label: Here you clearly describe the meaning of the variable. Try to be concise. SPSS uses this label to refer to the variable in all its output.
- Values: This is the place to describe the answer codes that you use.
- Missing: Here you specify which value(s) are used to indicate nonresponse (like "don't know", "refuse to answer" or "not applicable"). SPSS treats these values differently from the valid answer codes. For example, they won't be used in the calculation of an average.
- Columns: Here you specify the column width of the variable in the SPSS Data View window. This doesn't have to equal the variable width. For example, you can use a string variable of width 100. To limit the space it takes on the screen you may choose Columns = 10.
- Align: This specifies the alignment in the Data View window. By default string variables are aligned left and numeric variables are aligned right.
- Measure: Here you specify the level of measurement of the variable. Some statistical techniques require a certain level of measurement for the variables involved. SPSS "knows" about these requirements and in some procedures is programmed to act accordingly.
  To refresh your memory, a short overview of the options:
  - Nominal: The values (the answer options) are a list of categories (of course complete and without overlaps). For example, the variable Gender is coded as 1 = male, 2 = female and 9 = unknown (the missing value). Note that an average gender has no meaning.
  - Ordinal: A variable is ordinal if the list of answer options has a logical ordering. Ordering T-shirt sizes alphabetically (L, M, S, XL, XS, XXL) makes no sense. We all know that the logical ordering should be XS, S, M, L, XL, XXL. Another well-known example is the set of answer options used in Likert scales: 1 = totally disagree, 2 = disagree, ... , 5 = totally agree. The higher the value, the stronger the agreement. If you like, you can reverse the order, but scrambling it makes no sense.
  - Scale: A variable where the answers are actual numbers and where arithmetic makes sense. Examples are age, price, income, time until completion and the like.
  - Unknown: This is new to SPSS 20. If you create a new variable, its level of measurement is unknown to the program and it will label it as such. There is no default level of measurement. You have to set it yourself.
- Role: This feature is new to SPSS 20. Here you indicate the role a variable plays in your data analysis. For example, in regression analysis a variable can be used as cause (input) or as effect (target). The SPSS Help tells us about the new Role feature: "Some dialogs support predefined roles that can be used to pre-select variables for analysis. When you open one of these dialogs, variables that meet the role requirements will be automatically displayed in the destination list(s). By default, all variables are assigned the Input role.
  Available roles are:
  - Input. The variable will be used as an input (e.g., predictor, independent variable).
  - Target. The variable will be used as an output or target (e.g., dependent variable).
  - Both. The variable will be used as both input and output.
  - None. The variable has no role assignment.
  - Partition. The variable will be used to partition the data into separate samples for training, testing, and validation.
  - Split. Included for round-trip compatibility with IBM® SPSS® Modeler. Variables with this role are not used as split-file variables."
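To make the Variable View elements concrete, here is a minimal Python sketch of one codebook entry together with the width arithmetic from the Width item. It only mirrors the concepts; the class name, field names, the variable name q1 and the helper function are hypothetical and are not part of SPSS.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDefinition:
    """One row of a codebook, loosely mirroring the Variable View columns (illustrative)."""
    name: str
    label: str
    width: int
    decimals: int = 0
    var_type: str = "numeric"
    values: dict = field(default_factory=dict)   # answer code -> value label
    missing: tuple = ()                          # codes treated as missing
    measure: str = "unknown"                     # nominal, ordinal, scale or unknown
    role: str = "input"


def numeric_width(integer_digits, decimals):
    """Total positions: the digits, one decimal separator if there are decimals, the decimals."""
    separator = 1 if decimals > 0 else 0
    return integer_digits + separator + decimals


# The number 12345.67 needs 5 + 1 + 2 = 8 positions.
print(numeric_width(5, 2))

# A hypothetical nominal question coded 1 = male, 2 = female, 9 = unknown.
gender = VariableDefinition(
    name="q1",
    label="Gender",
    width=numeric_width(1, 0),
    values={1: "male", 2: "female", 9: "unknown"},
    missing=(9,),
    measure="nominal",
)
print(gender)
```

Writing the definitions down in one place like this, on paper or in code, makes it easy to compare them with what was actually entered in Variable View.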
Once you have finished defining all variables you can check your work by
creating an overview of the codebook in the SPSS output. In our example we use the
file example_data and first inspect it in Variable View.
This gives us a useful first impression. For a complete overview, choose
Analyze > Reports > Codebook and select the variables you want in your
overview. In our case we only have two variables: v01 and v02.
We leave the Output and Statistics tabs at their default settings and run the
procedure with these standard settings.
An alternative way to see how our data is defined is through
File > Display Data File Information > Working File.
You can also use this to view the data structure of another
SPSS file.
There are sample data files available that correspond to the examples below.
You can download them and open them in SPSS to experiment on your own.
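We present examples of the following types of questions:
- a nominal question
- a scale question with an open answer
- a set of ordinal scale questions
- a multiple response question with dichotomies
- a multiple response question with categories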
Coding:
Because we want a code for every answer and some people might not
specify a gender, we have code 9, labeled "Unknown",
and we have also specified code 9 as a missing value.
We also use this code if someone ticks both options, or if for some reason
we can't read the answer that was chosen.
For this type of questions we start our analysis with a frequency table or a
chart. See the part on Analysis for more details.
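What such a frequency table does with the missing code can be sketched in a few lines of Python. This is an illustration of the idea and not SPSS output: the answers are made up, and the function name and table layout are assumptions.

```python
from collections import Counter

VALUE_LABELS = {1: "male", 2: "female", 9: "unknown"}
MISSING = {9}


def frequency_table(codes):
    """Return (label, count, valid percent) rows; missing codes get no valid percent."""
    counts = Counter(codes)
    n_valid = sum(n for code, n in counts.items() if code not in MISSING)
    rows = []
    for code in sorted(counts):
        if code in MISSING:
            valid_pct = None
        else:
            valid_pct = round(100 * counts[code] / n_valid, 1)
        rows.append((VALUE_LABELS.get(code, str(code)), counts[code], valid_pct))
    return rows


answers = [1, 2, 2, 9, 1, 2]  # made-up answers for illustration
for label, n, valid_pct in frequency_table(answers):
    shown = "missing" if valid_pct is None else f"{valid_pct}%"
    print(f"{label:<10}{n:>3}  {shown}")
```

The valid percentages are computed over the five valid answers only, which is why code 9 must be declared as missing and not merely labeled.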
SPSS data file to practice with a
nominal question example
Coding:
In order to be able to process all answers that might occur we need room for 3
digits.
We don't need value labels for normal answers. With the variable label "Age in
years" the answer "57" is perfectly clear. The value label "Unknown" for the
outcome 999 is the only addition we need to make.
Remember to specify 999 as a missing value.
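The reason for that reminder shows up as soon as an average is computed. The sketch below uses made-up ages and a hypothetical helper function; it only illustrates the effect of leaving 999 in the calculation.

```python
from statistics import mean

AGE_MISSING = 999  # the code for "Unknown"


def mean_age(ages, missing_code=AGE_MISSING):
    """Average age over valid answers only; None if there are no valid answers."""
    valid = [age for age in ages if age != missing_code]
    return mean(valid) if valid else None


ages = [57, 34, 999, 102, 21]  # made-up answers; one respondent gave no age

print(mean_age(ages))   # average over the four valid answers
print(mean(ages))       # what happens if 999 is treated as a real age

# Every code, including the missing code, fits in 3 digits.
print(max(len(str(age)) for age in ages))
```

With the missing code excluded the average is 53.5; with 999 counted as a real age it becomes 242.6, which is clearly meaningless.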
SPSS data file to practice with a
scale question with an open answer
Coding:
Note: The main part of the question is not part of the
variable labels, but will be used in the title of the tables and graphs made for
this set of variables.
SPSS data file to practice with a set of ordinal scale questions
Coding:
Do you understand why these variables have no missing values?
SPSS data file to practice with a multiple response question with dichotomies
Coding:
Since respondents are asked to tick (at most) two options, two variables
suffice.
Note: Coding this question using multiple dichotomies is an alternative. See the
explanation above.
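The two coding schemes carry the same information, which a short Python sketch can show. Everything specific here is an assumption for illustration: the four answer options, the use of None for an unused second variable, and the 0/1 coding of the dichotomies (1 = ticked, 0 = not ticked).

```python
# Hypothetical answer options of a "tick at most two" question.
OPTIONS = {1: "option A", 2: "option B", 3: "option C", 4: "option D"}


def categories_to_dichotomies(first, second, options=OPTIONS):
    """Convert two category variables into one 0/1 variable per answer option.

    first and second hold the codes of the ticked options;
    None means the variable was not used by this respondent.
    """
    chosen = {code for code in (first, second) if code is not None}
    return {code: int(code in chosen) for code in options}


# A respondent who ticked options 2 and 4.
print(categories_to_dichotomies(2, 4))

# A respondent who ticked only option 3.
print(categories_to_dichotomies(3, None))
```

Coding with categories needs only as many variables as the maximum number of ticks, while dichotomies need one variable per answer option; which one is more convenient depends on the number of options.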
SPSS data file to practice with a multiple response question with categories