Helpdesk IBM SPSS Statistics 20

For students from Arnhem Business School

Data Merge files

Using Merge Files you can combine two data files. There are two ways of doing this, namely:

  • Add Cases: Two data files have the same variables and you combine the cases from files 1 and 2.

  • Add Variables: Two data files are linked via a Key variable.
    You combine the variable values from files 1 and 2, using the key to match the corresponding cases.

Add Cases

We describe the process in three steps: the preparation, the merging in SPSS and the result.


The preparation

When a project group uses face-to-face interviews (for example during a fair) or a written questionnaire, the data entry has to be done manually. In such a case it is convenient if several people enter data in parallel.
Proceed as follows:

  • Make sure the codebook is complete and the corresponding SPSS data file (with all variables defined but still without cases) contains no mistakes.
  • Make as many copies of this master as are needed for the parallel data entry process.
  • Make sure that every copy of the SPSS data file has its own unique name. SPSS can handle duplicate file names, but it makes the merging process more complicated for you.
  • Make sure the paper questionnaires are numbered and assign each subgroup its own pile of questionnaires.
  • Every subgroup now enters the data into its copy of the master data file.
  • Make sure that each subgroup backs up its work.
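
The copying step can also be scripted outside SPSS. The following is a minimal sketch in Python, not part of the original procedure. The file name master.sav and the subgroup names are hypothetical, and an empty placeholder file stands in for a real SPSS master file.

```python
import shutil
import tempfile
from pathlib import Path


def make_entry_copies(master: Path, subgroups: list[str]) -> list[Path]:
    """Copy the empty master file once per subgroup, each under a unique name."""
    copies = []
    for name in subgroups:
        target = master.with_name(f"{master.stem}_{name}{master.suffix}")
        if target.exists():
            # Refuse to overwrite work that a subgroup may already have entered.
            raise FileExistsError(f"{target} already exists; choose another name")
        shutil.copy2(master, target)
        copies.append(target)
    return copies


# Demonstration in a temporary folder with a stand-in for the master file.
workdir = Path(tempfile.mkdtemp())
master = workdir / "master.sav"   # hypothetical name
master.write_bytes(b"")           # placeholder; a real master comes from SPSS
copies = make_entry_copies(master, ["group1", "group2", "group3"])
print([c.name for c in copies])
```

Refusing to overwrite an existing copy is a design choice in this sketch: it protects data that a subgroup has already entered.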


The merging in SPSS

In this example we use three separate data files: merge_files_1.sav, merge_files_2.sav and merge_files_3.sav.

If you open merge_files_1.sav in SPSS, you will see that it contains 7 cases.

Now choose from the menu:
Data > Merge Files > Add Cases

The following dialog box opens:

Browse to merge_files_2.sav and click Continue.

Note: An alternative is to open merge_files_2.sav first; the file will then be listed under "An open dataset".
In other words, you open both data files before you start the actual merge. Use this alternative if you prefer it.


The Result

The following screen shows us the result:

We can infer from it that:

  • all variables exist in both files;
  • all variables will be used in the new merged file;
  • all variables are unique, so the merging will work just fine.
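
The check that the dialog box performs can be illustrated outside SPSS. The following Python sketch is an illustration only, not how SPSS implements it. It compares the variable names of two files and reports which ones are paired and which appear in only one file. Apart from respondent_number, the variable names (q1, q2) are hypothetical.

```python
def compare_variables(active_vars, other_vars):
    """Split variable names into paired, only-in-active and only-in-other."""
    active, other = set(active_vars), set(other_vars)
    paired = [v for v in active_vars if v in other]
    only_active = [v for v in active_vars if v not in other]
    only_other = [v for v in other_vars if v not in active]
    return paired, only_active, only_other


# Hypothetical variable lists; both files come from the same master file.
vars_file_1 = ["respondent_number", "q1", "q2"]
vars_file_2 = ["respondent_number", "q1", "q2"]

paired, only_1, only_2 = compare_variables(vars_file_1, vars_file_2)
print("paired:", paired)
print("only in file 1:", only_1)
print("only in file 2:", only_2)
```

If both copies were made from the same master file, the two "only in" lists should be empty, which matches the situation described above.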

Now click OK. Part of the data window now looks as follows:

We see that the data file now contains new cases in addition to the original 7 from merge_files_1.sav.

Since we used the identification variable respondent_number, it is easy to check whether the merging process was successful and whether all the questionnaires involved are in the merged file.
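
Conceptually, Add Cases stacks the cases of one file under those of the other. The sketch below shows that idea in Python, together with a check on respondent_number for duplicates and missing questionnaires. It is an illustration under assumptions: the first file has 7 cases as in the example, while the 5 cases in the second file and the numbering 1 to 12 are hypothetical.

```python
from collections import Counter


def add_cases(*files):
    """Stack the cases of all files in the order given."""
    merged = []
    for cases in files:
        merged.extend(dict(case) for case in cases)
    return merged


def check_respondent_numbers(cases, expected_numbers):
    """Return respondent numbers that occur more than once or not at all."""
    counts = Counter(case["respondent_number"] for case in cases)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    missing = sorted(set(expected_numbers) - set(counts))
    return duplicates, missing


file_1 = [{"respondent_number": n} for n in range(1, 8)]    # 7 cases, as in the example
file_2 = [{"respondent_number": n} for n in range(8, 13)]   # hypothetical: 5 cases

merged = add_cases(file_1, file_2)
duplicates, missing = check_respondent_numbers(merged, range(1, 13))
print(len(merged), duplicates, missing)
```

Two empty lists mean that every numbered questionnaire appears exactly once in the merged data.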

It is now up to you to add merge_files_3.sav to the active data file.
Save the end result under a proper name. We used merge_files_total.sav.


Add Variables

In our example we have a data file with basic information about our customers, containing demographics, characteristics, self-declared info and the like. This data is collected in the SPSS file demographics.sav.
For a number of our customers we have additional attitudinal data. It is in the SPSS file attitudes.sav.

Below you can see the two data files. Every customer has a unique identification code (variable number) and both files contain additional information.

demographics data: We know how old each customer is (variable age).
opinion data: We asked some people "Do you like Brussels sprouts?", so we know some of their preferences (variable opinion).

We want to combine this into one merged file. We start by opening the file demographics.sav and we choose from the menu Data > Merge Files > Add Variables.

In the first dialog box we now have to specify where the variables that are to be added come from.
In this example they come from our file with attitudinal data. We have already opened that file and selected it in the dialog box.

If you like, you can also add variables from an external data file.

Of course we want to add the extra information about liking Brussels sprouts to the correct customer. For this we use a key variable (in our example it is called number).
This variable uniquely identifies each customer, so we can use it to add the opinion variable to our file with basic information.

It is essential that both files in the addition process are sorted in ascending order on the key variable, before the merging starts.
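
The sorting requirement can be illustrated with a short Python sketch. It is not SPSS code. It checks whether a list of cases is in ascending order on the key variable number and sorts it if necessary. The case values are hypothetical.

```python
from operator import itemgetter


def is_sorted_on(cases, key):
    """True if the cases are in ascending order on the key variable."""
    keys = [case[key] for case in cases]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Hypothetical attitude data, entered in arbitrary order.
attitudes = [
    {"number": 5, "opinion": "no"},
    {"number": 2, "opinion": "yes"},
    {"number": 8, "opinion": "no"},
]

print(is_sorted_on(attitudes, "number"))          # False: 5 comes before 2
attitudes_sorted = sorted(attitudes, key=itemgetter("number"))
print(is_sorted_on(attitudes_sorted, "number"))   # True
```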

As the screenshots of the two data files show, we have taken care of this. So we can continue to the next step and specify the merge.

As you can see, we have selected "Match cases on key variables in sorted files" and "Both files provide cases".

The box on the top right shows us which variables will be in the merged file and also where they come from.

The variables indicated by (*) come from our active dataset.
The variables indicated by (+) come from the other file.

Everything is set. Click OK to see the result:

The data view after the merging shows what has happened.
Both files have provided cases. We have no demographics for customer number 8, but he or she took part in the survey (and doesn't like Brussels sprouts).
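
The matching logic can be sketched in Python to show why both files must be sorted and why customer 8 ends up with a missing age. This is an illustration under assumptions, not the SPSS implementation. The ages and opinions are hypothetical, keys are assumed to be unique within each file, missing values are represented by None, and where both files contain a variable with the same name the sketch keeps the value from the active file.

```python
def add_variables(active, other, key):
    """Match two lists of cases on `key`.

    Both lists must be sorted ascending on `key`, with unique key values.
    Cases from either file are kept ("both files provide cases").
    """
    # Variable order: active file first, then new variables from the other file.
    all_vars = list(dict.fromkeys(v for case in active + other for v in case))
    merged = []
    i = j = 0
    while i < len(active) or j < len(other):
        a = active[i] if i < len(active) else None
        b = other[j] if j < len(other) else None
        if b is None or (a is not None and a[key] < b[key]):
            row = dict(a)          # case exists only in the active file
            i += 1
        elif a is None or b[key] < a[key]:
            row = dict(b)          # case exists only in the other file
            j += 1
        else:
            row = {**b, **a}       # same key in both; active file wins on shared names
            i += 1
            j += 1
        merged.append({v: row.get(v) for v in all_vars})
    return merged


# Hypothetical values; only customer 8 (no demographics, opinion "no") follows the example.
demographics = [
    {"number": 1, "age": 34},
    {"number": 2, "age": 27},
    {"number": 5, "age": 52},
]
attitudes = [
    {"number": 2, "opinion": "yes"},
    {"number": 8, "opinion": "no"},
]

for case in add_variables(demographics, attitudes, "number"):
    print(case)
```

Because the two lists are walked through in step, an unsorted file would cause keys to be compared in the wrong order and cases to be mismatched, which is why the sorting step matters.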

We also see in the title bar (note the *) that our data file has changed since we last saved it.
We can either save the extended file under the same name or choose "Save As" and give it a new name.

Last modified 30-10-2012

Jos Seegers, 2009; English version by Gé Groenewegen.