Helpdesk IBM SPSS Statistics 20

Testing: a basic example

This page works through an example that explains, step by step, the basic procedure for testing hypotheses.
Elsewhere we say a little more about the philosophy of testing and show further examples of tests in which the calculations are done with SPSS.

An example

Somewhere in the world there is a national railway company that claims to comply with government targets. The target is that at least 90% of all trains arrive on time. A consumer organization of rail travellers disputes this. They claim the target is not met: fewer than 90% of all trains arrive on time.
The consumer organization sets up a large-scale research project. Over a whole year they track the arrival times of 1000 scheduled trains, randomly selected from all scheduled trains, and record for each train whether it arrived on time. This should settle the dispute.

Suppose that at the end of the research we had found that 89.9% of the trains in the sample were on time. That is below 90%, but would we have a case against the railway company? Not really. We are dealing with a sample from a larger population, and random fluctuations will almost always make the sample result deviate somewhat from the true population value. With a result of 89.9% the consumer organization has insufficient evidence to back up its claim.
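The size of these random fluctuations can be explored with a small simulation. The Python sketch below is only an illustration under assumptions that are ours, not part of the original study: the company meets the target exactly (true p = 0.90), and each train is on time independently of the others. The function name `simulate_on_time_counts`, the seed and the number of repeated surveys are arbitrary choices.

```python
import random

def simulate_on_time_counts(n_trains=1000, p_true=0.90, n_surveys=2000, seed=1):
    """Repeat a hypothetical survey of n_trains randomly chosen trains many times,
    assuming each train is on time independently with probability p_true.
    Returns the number of on-time trains found in each simulated survey."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = []
    for _ in range(n_surveys):
        on_time = sum(rng.random() < p_true for _ in range(n_trains))
        counts.append(on_time)
    return counts

counts = simulate_on_time_counts()
# Share of simulated surveys that find 89.9% (899 trains) or less on time.
share_at_most_899 = sum(c <= 899 for c in counts) / len(counts)
print(f"lowest count: {min(counts)}, highest count: {max(counts)}")
print(f"share of surveys with 89.9% or less on time: {share_at_most_899:.2f}")
```

Under these assumptions, close to half of the simulated surveys find 899 or fewer trains on time, even though the company meets its target by construction. A sample result just under 90% is exactly what chance alone tends to produce.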

The situation would be completely different if the research showed only 10% of the trains to be on time. Everyone would then consider this clear evidence that the railway company totally fails to meet its punctuality target.

Let us assume that in this example the research found 880 of the 1000 trains to be on time. Is this the proof we are looking for, or does the railway company go free for lack of evidence? Is the sample result something, or nothing at all?


Specifying the problem

Testing of hypotheses follows a fixed pattern of steps from specifying the problem to the final conclusion. It always starts with a description of the property of the population that is disputed.
This property can be a parameter (like the mean, the variance or a proportion), the distribution of a random variable or something else.

In our example the dispute is about p, the proportion of all scheduled trains (the population) that arrive on time.



The alternative hypothesis HA states the claim we seek to prove. The null hypothesis H0 is the initial belief: it states the status quo, in this case that the company is doing a decent job.
This corresponds to the starting point of any court case, namely that everyone is innocent until proven guilty.
The consumer organization is making claims against the railway company. Hence it has to come up with evidence to prove the substandard performance it claims exists.

H0: p ≥ 0.90
HA: p < 0.90


Test design and test statistic

Now we set up the research: we specify what we will measure and how. The measurements are summarized in a single number, the test statistic T.

In this example we choose T = the number of trains in our random sample of 1000 that arrive on time.


Distribution as predicted by the null hypothesis

Given the random selection of trains and the specification of H0, we know that T follows a binomial distribution with parameters n = 1000 and p = 0.90 (we use p = 0.90, the boundary of the range specified by the null hypothesis; a larger p would make low values of T even less likely).

This binomial distribution is well-known; it looks like this:

[Figure: binomial distribution with n = 1000 and p = 0.9]

The probability that we nonetheless find fewer than 900 trains arriving on time, even though the railway company meets its target, is 0.4734. The distribution shows the expected average (900 trains) but also the variability we may expect due to sampling.
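The expected average and the variability follow from the standard formulas for a binomial distribution. Under H0, with n = 1000 and p = 0.90:

```latex
\begin{aligned}
P(T = k) &= \binom{1000}{k}\, 0.9^{k}\, 0.1^{1000-k}, \qquad k = 0, 1, \dots, 1000 \\
E(T) &= np = 1000 \times 0.90 = 900 \\
\mathrm{SD}(T) &= \sqrt{np(1-p)} = \sqrt{1000 \times 0.90 \times 0.10} = \sqrt{90} \approx 9.49
\end{aligned}
```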

It also shows that finding only 861 trains on time (the lower end of the scale in the picture) would be very unlikely if H0 were true. So the question is: when do we conclude that we have found something, that we have proof against the railway company, and when do we have to concede that nothing is going on, that there is insufficient evidence against the railway company?
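The two probabilities mentioned in this section can be checked with a short Python sketch. It sums exact binomial probabilities using `fractions.Fraction`, so no rounding occurs until the final conversion to a floating-point number; the helper names `binom_pmf` and `binom_cdf` are our own.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """Exact P(T = k) for T ~ Bin(n, p), with p given as a Fraction."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(t, n, p):
    """Exact P(T <= t): the sum of the probabilities of 0, 1, ..., t successes."""
    return sum(binom_pmf(k, n, p) for k in range(t + 1))

n, p0 = 1000, Fraction(9, 10)  # sample size and boundary value of H0

# "Fewer than 900" means at most 899 trains on time.
fewer_than_900 = float(binom_cdf(899, n, p0))
# Probability of a result as low as 861 trains on time (or lower).
at_most_861 = float(binom_cdf(861, n, p0))

print(f"P(T < 900)  = {fewer_than_900:.4f}")
print(f"P(T <= 861) = {at_most_861:.1e}")
```

The first value should match the 0.4734 quoted above; the second is well below 0.001, which is why 861 trains on time would be so hard to reconcile with H0.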


Significance of the sampling result

As stated in the introduction, the value we found for our test statistic is T = 880.

First of all, note that the test result is consistent with HA: the sample shows a punctuality below 90%.
But could this be attributed to chance, or is that too unlikely?

To answer this we calculate the significance of the sample result: the probability, under the distribution based on the null hypothesis, of finding T = 880 or an even more extreme value, that is, 880 or fewer trains on time. So we need P( T ≤ 880 | T ~ Bin(n = 1000, p = 0.90) ).
Using a computer program or a graphing calculator we find:

P( T ≤ 880 ) = 0.022.
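The computer calculation can be sketched in a few lines of Python. The function below (the name `lower_tail_p_value` is hypothetical) computes the left-sided significance by adding the exact binomial probabilities of 0, 1, ..., 880 trains on time under H0. The value of p0 is passed as a decimal string so that `Fraction` turns it into exactly 9/10 rather than a binary floating-point approximation.

```python
from fractions import Fraction
from math import comb

def lower_tail_p_value(t_observed, n, p0):
    """Significance of a left-sided binomial test (H0: p >= p0 vs HA: p < p0):
    the probability under Bin(n, p0) of t_observed or fewer successes."""
    p0 = Fraction(p0)  # exact arithmetic; Fraction("0.90") equals 9/10
    q0 = 1 - p0
    tail = sum(comb(n, k) * p0**k * q0**(n - k) for k in range(t_observed + 1))
    return float(tail)

sig = lower_tail_p_value(880, 1000, "0.90")
print(f"P(T <= 880) = {sig:.3f}")
```

This should print a value of about 0.022, in line with the result above. If SciPy is installed, `scipy.stats.binomtest(880, 1000, 0.9, alternative='less').pvalue` performs the same one-sided binomial test.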



Small probabilities cast doubt on the null hypothesis: its predictions, combined with the survey data, lead to a very unlikely situation. This is the evidence we were looking for.
The result we found (T = 880) is too unlikely for us to keep trusting the railway company. We reject the null hypothesis and choose the alternative instead.

In plain English this means: our research shows that the punctuality of the railway company is not good enough. It is below the target value of 90% (significance = 0.022).


Significance levels

The example above shows that the choice between "stick to H0" and "reject H0, choose HA" is based on a probability, the significance of the test result. The smaller it is, the more convincing our evidence. But when is it small enough? Here are some guidelines:

  • Smaller than 0.10: some suspicion, but nowhere near conclusive evidence for HA
  • Smaller than 0.05: considerable evidence for HA
  • Smaller than 0.01: very strong evidence for HA
  • Smaller than 0.001: practically conclusive evidence for HA

In marketing research the usual convention is to reject the null hypothesis when the significance drops below 0.05.
In our example the significance was 0.022, which is smaller than 0.05. Therefore we rejected H0.
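The guidelines and the decision rule can be written down as two small Python functions. This is only a sketch: the function names are hypothetical, the thresholds are the rough guidelines listed above (not fixed rules), and the label for a significance of 0.10 or more is our own wording, since the list does not give one.

```python
def describe_evidence(sig):
    """Verbal label for a significance value, following the rough guidelines above."""
    if sig < 0.001:
        return "practically conclusive evidence for HA"
    if sig < 0.01:
        return "very strong evidence for HA"
    if sig < 0.05:
        return "considerable evidence for HA"
    if sig < 0.10:
        return "some suspicion, but nowhere near conclusive evidence for HA"
    return "insufficient evidence for HA"  # our own label, not from the list

def reject_h0(sig, alpha=0.05):
    """Reject H0 when the significance drops below the chosen level alpha."""
    return sig < alpha

sig = 0.022  # significance found in the railway example
print(describe_evidence(sig), "| reject H0:", reject_h0(sig))
```

With the default level of 0.05 the railway example is classified as considerable evidence for HA, and H0 is rejected.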


Last modified 30-10-2012

© Jos Seegers, 2009; English version by Gé Groenewegen.