
Testing Philosophy

In every hypothesis test we have to choose between the null hypothesis and the alternative hypothesis. But there is no level playing field between the two. One of them (H0) is true until proven otherwise. The other one (HA) is an interesting claim that we hope, or need, to prove.
On this page we explain a little about the philosophy of testing.

The alternative hypothesis

The alternative is the interesting part of the story behind a test.

Situation 1: Testing a research hypothesis.
The research hypothesis should be formulated as the alternative one. H0 is based on the established theory or the statement that the research treatment will have no effect. Our hope is to prove that we have found something that is new, better, different, faster, ...
Whenever the sample data contradicts H0 we reject it. In this case, HA is concluded to be true.

Situation 2: Testing the validity of a claim.
It could be about the quality of a product or service. The guiding principle is: everyone is innocent until proven guilty.
This implies that in general the producer’s quality statement is chosen as H0.
If it says on the bottles he produces that the content is 1 liter, we start by believing it really is.
The challenge to this statement (poor quality or guilt) is chosen as the alternative one. Action against the statements of the producer will be taken when the sample data contradict H0. When that occurs, the challenge implied by HA is concluded to be true. Only in this way can the accusation made by HA be proven.
How eager you are to prove this alternative hypothesis depends on the situation. You want to convict criminals, but honest producers should be left alone. But again, the alternative hypothesis is the really interesting part of the story.

Situation 3: Testing in decision-making situations.
Often a decision maker must choose between two courses of action, one associated with the null hypothesis and one associated with the alternative hypothesis. It is part of the decision maker's job to choose H0 and HA wisely.
This is a tricky one, because the null hypothesis gets the benefit of the doubt: hypothesis testing is an unbalanced situation. As a manager you have to make a choice here. The course of action you assign to the null hypothesis is the preferred one until proven otherwise.


The null hypothesis

In statistics, the null hypothesis proposes an established model for the world. Then we look at the data. If the data is consistent with that model, we have no reason to disbelieve H0.
In a lot of cases H0 states that all groups are the same, that the new idea is no improvement, that we should carry on as usual.
The null hypothesis H0 is our point of reference. This description of the population is considered to be the truth, until proven otherwise.

We know what to expect according to H0. This is input into the calculations that are needed to find a significance for the sample data we have collected.


Test design and test statistic

Now we have to set up our research. We specify what and how we will measure. These measurements will be summarized in a single number, the test statistic T. The actual formula that describes T differs from test to test.
If you want to learn about that formula for a specific test, you can use the SPSS help. It has a special section on the algorithms it uses for all its procedures. We cite from the Introduction to Algorithms:

Throughout much of the documentation, we avoid detailed discussion of the inner workings of procedures in order to promote readability. The algorithms documents are designed as a resource for those interested in the specific calculations performed by procedures. The algorithms are available in two forms:
• Integrated into the overall Help system. In the Contents tab, there is a section labeled "Algorithms" which contains all the algorithms in alphabetic order. In the Index tab, each procedure's main index entry has a second-level "algorithms" index entry.
• As a separate document in PDF format, available on the Manuals CD and online at the SPSS help site.
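
As a small illustration of what such a formula can look like, here is a sketch of one common test statistic, the one-sample t statistic T = (sample mean - mu0) / (s / sqrt(n)), written out in plain Python. The data (bottle contents in liters) and the H0 value of 1 liter are made up for this example.

    # A sketch of one concrete test statistic: the one-sample t statistic.
    # The data (bottle contents in liters) and mu0 = 1 liter are hypothetical.
    import math

    sample = [0.98, 1.02, 1.01, 0.97, 0.99, 1.03, 0.96, 1.00]
    mu0 = 1.0                                             # content claimed under H0

    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))   # sample standard deviation
    T = (mean - mu0) / (s / math.sqrt(n))                           # the test statistic
    print(f"n = {n}, mean = {mean:.4f}, s = {s:.4f}, T = {T:.3f}")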


Distribution as predicted by the null hypothesis

Given the setup and the specifications of H0, statistical theory tells us the exact distribution of T, or a useful approximation (for example, that T approximately follows a normal distribution).

We have one sample that results in a single value for the test statistic T. We would like to know if this result is "Something" or "Nothing At All".
We answer this question by positioning the outcome of T in the sampling distribution that is predicted by H0.
The key question now becomes: Does it fit in nicely?

Yes, then H0 stands;   No, then we have proof for HA.  

“Fitting in nicely” is translated into a probability: how likely is this outcome, given the null hypothesis? We call this the significance of the sample data. Using this probability we conclude whether we have found something of interest (proof for HA) or have to concede that there is nothing at all going on (we stick to H0).
Please get it firmly into your head that a significance is a conditional probability. It is calculated as:

    sample significance = P( T = sample result or more extreme | the distribution for T as predicted by H0 ).

A large significance is consistent with the null hypothesis, while a small significance is (very) unlikely given H0, hence casts serious doubt on the correctness of H0.
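
To make this conditional probability concrete, the sketch below simulates the sampling distribution of the mean as predicted by H0 (here, hypothetically, bottles with a true mean content of 1 liter and a known spread) and positions one observed result in it. All numbers are assumptions chosen purely for illustration.

    # Sketch: simulate the sampling distribution predicted by H0 and compute
    # the significance of one observed result. All numbers are hypothetical.
    import random

    random.seed(1)
    mu0, sigma, n = 1.00, 0.05, 25        # the world according to H0
    observed_mean = 0.98                  # the single result from our own sample

    null_means = []
    for _ in range(100_000):
        sample = [random.gauss(mu0, sigma) for _ in range(n)]
        null_means.append(sum(sample) / n)

    # significance = P( result this extreme or more extreme | H0 ), one-sided here
    significance = sum(m <= observed_mean for m in null_means) / len(null_means)
    print(f"simulated significance = {significance:.4f}")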

In many cases the alternative hypothesis states that there is some difference between groups or between a population (parameter) and a given number or distribution. If we stick to the null hypothesis we have found insufficient evidence against it.
It is important to realize that this is a negative conclusion.

"No evidence of a difference" is definitely not the same as "evidence of no difference".

If in a court case an accused person is acquitted because there are doubts regarding the evidence, that by no means implies that we have proven this person to be innocent. He or she might be, but that was not the issue. Not enough evidence against H0 is something other than proving the truth of H0. In a hypothesis test you can never hope to prove H0. You can only look for proof of HA and against H0.


Significance levels

The explanation above tells us that the choice between "stick to H0" and "reject H0, choose HA" is based on a probability, the significance of the test result. The smaller it gets, the more convincing our evidence is. But when is it small enough? Here are some guidelines:

  • Smaller than 0.10: some suspicion, but nowhere near conclusive evidence for HA
  • Smaller than 0.05: considerable evidence for HA
  • Smaller than 0.01: very strong evidence for HA
  • Smaller than 0.001: practically conclusive evidence for HA

In marketing research the default setting is that when the significance drops below 0.05 we will reject the null hypothesis.

Please note that significances vary on a continuous scale and that the value of 0.05 is not written in stone. It was settled on when significances were hard to compute and so some specific values needed to be provided in tables. Nowadays calculating exact significances is easy (thank you SPSS) and so an investigator can report "sign. = 0.06" and leave it to the reader to decide how significant it is.
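
If you do the calculation outside SPSS, for example in Python with SciPy, an exact significance for a one-sample t test is a single function call. A minimal sketch with made-up data:

    # Sketch: report an exact significance instead of only "below 0.05".
    # Requires scipy; the data and the H0 value of 1 liter are made up.
    from scipy import stats

    sample = [0.98, 1.02, 1.01, 0.97, 0.99, 1.03, 0.96, 1.00]
    T, sign = stats.ttest_1samp(sample, popmean=1.0)   # two-sided by default
    print(f"T = {T:.3f}, sign. = {sign:.3f}")          # leave the judgement to the reader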

Referring to outcomes where sign. < 0.05 as significant and where sign. > 0.05 as nonsignificant is problematic when the significance is close to 0.05. We are not dealing with an all-or-nothing situation, where 0.049 means everything and 0.051 means nothing. Ask yourself whether the effect is interesting enough for further research.


Conclusion

How do we proceed when H0 is rejected? A statistically significant result is not automatically a scientifically significant result.
The importance is to a large extent determined by the magnitude of the effect and, if we are testing a population parameter, by the width of the confidence interval (a small sketch follows the list of questions below).
Ask yourself the following questions:

  • Are the results meaningful?
    For example do they indicate market segments that can be targeted individually?
  • Are the results stable?
    If our HA shows only short-term or transitory differences, it is not a sound basis for action.
  • Are the results actionable?
    This means we can focus various marketing strategies and tactics on the different groups we found.
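
As announced above, the magnitude of the effect and the width of the confidence interval can be inspected with a few lines of code when we are testing a population mean. A sketch using the same hypothetical bottle data (requires SciPy):

    # Sketch: judge practical importance via an effect size and a 95% confidence
    # interval for the mean. Data and the reference value of 1 liter are made up.
    import math
    from scipy import stats

    sample = [0.98, 1.02, 1.01, 0.97, 0.99, 1.03, 0.96, 1.00]
    mu0 = 1.0

    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

    d = (mean - mu0) / s                        # effect size (Cohen's d)
    t_crit = stats.t.ppf(0.975, df=n - 1)       # two-sided 95% critical value
    half_width = t_crit * s / math.sqrt(n)
    print(f"effect size d = {d:.2f}")
    print(f"95% CI for the mean: [{mean - half_width:.3f}, {mean + half_width:.3f}]")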


Errors

"Statistics is the only profession that demands the right to make mistakes five percent of the time."

We base our conclusions on sample data. No matter how good the test design was and how well it was executed, a sample can never give you absolute certainty about the properties of the underlying population. The possibility of errors is inherent in any test. There are two types of errors we can make:

 

                            State of Nature (the truth about the population)
                            Null hypothesis true      Null hypothesis false
Decision    Stick to H0     Correct decision          Type II error
            Reject H0       Type I error              Correct decision

Note that we know the probability of a Type I error: it is equal to the significance level we use. We can control this and impose a maximum (like 0.05) before we start the research.

The probability that a test will correctly decide that the null hypothesis is false is far harder to handle. It is called the power of the test. This topic is beyond the scope of this website. If you are interested, start by searching for "power of a test". It gave over 75,000 hits on Google when I wrote this page.
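
To get a feeling for both error types and for power, a small simulation can help. The sketch below estimates the Type I error rate (how often H0 is rejected when it is true) and the power under one assumed alternative; all parameter values are chosen purely for illustration and SciPy is required.

    # Sketch: estimate the Type I error rate and the power of a one-sample
    # t test by simulation. Parameter values are assumptions for illustration.
    import random
    from scipy import stats

    def rejection_rate(true_mean, mu0=1.0, sigma=0.05, n=25, alpha=0.05, reps=5000):
        """Fraction of simulated samples in which H0: mean = mu0 is rejected."""
        rejections = 0
        for _ in range(reps):
            sample = [random.gauss(true_mean, sigma) for _ in range(n)]
            _, p = stats.ttest_1samp(sample, popmean=mu0)
            if p < alpha:
                rejections += 1
        return rejections / reps

    random.seed(1)
    print("Type I error rate (H0 true):", rejection_rate(true_mean=1.00))    # close to alpha
    print("Power when the true mean is 0.98:", rejection_rate(true_mean=0.98))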


Last modified 30-10-2012

© Jos Seegers, 2009; English version by Gé Groenewegen.