Statistical hypotheses form the basis of the statements and conclusions that we can make about sets of data. A hypothesis is a statement designed to be proven or disproven, such as "The sample means of two sets of data are statistically the same and the samples come from the same overall population."
Taking the example above comparing sample means, we would define the hypothesis H:
H: "two sets of data (1 and 2), with sample means m1 and m2, are both part of the same population, so that their population means are equal, μ1=μ2."
If we accept this hypothesis, we are saying that despite the fact that the samples came from two different measurements, they are part of the same overall population or that the measurement is being made on the same general system. If we reject the hypothesis, we are saying the population means are different, and that we are dealing with two separate systems.
We use statistical tests of significance to determine whether to accept the hypothesis or not, and we choose the test depending on whether we are comparing two or more means, standard deviations, or variances. Two tests are covered in this tutorial: the t-test and the F-test. Other tests, such as Z-tests, χ2-tests, and Analysis of Variance (ANOVA), are described in most statistics textbooks.
First, we will see how to construct a hypothesis. Referring to the above example of a comparison between means, assume we want to analyze some soil to determine its arsenic content and to see if it exceeds the allowable amount. We have run a series of n=7 tests on various soil samples and find that the mean arsenic concentration is 4 ppm, with a standard deviation of s=0.9 ppm. If the allowable limit is 2 ppm arsenic, we wish to construct a hypothesis to determine whether the soil is indeed contaminated, or whether the difference between the sample mean and the allowable limit is due to random error.
There are two possibilities:
- The true mean of the soil arsenic concentration μ is greater than the allowable limit μ0 = 2 ppm: μ > μ0
- The true mean of the soil arsenic concentration μ is the same as or less than the allowable limit, and any deviation is due to random error: μ ≤ μ0
To set up the hypothesis, we make what is called the null hypothesis, which says there is no difference between the means. We also set up an alternate hypothesis, which is the hypothesis we adopt if the null hypothesis is disproved.
- Null Hypothesis H0: μ = μ0
- Alternate Hypothesis HA: μ > μ0
where μ0 = 2 ppm is the allowable limit.
If our statistical test supports the null hypothesis, we conclude that any difference between the means is due to random error and that the arsenic concentration in the soil is not above the allowable limit. If the test shows that the null hypothesis is false, then we accept the alternate hypothesis and can conclude that the arsenic concentration in the soil is indeed above the allowable limit.
The statistical test we would use in this case is the t-test, which we will explore on the next page.
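As a preview of how these hypotheses turn into a decision, the following Python sketch computes the one-sample t statistic, t = (sample mean − μ0) / (s / √n), from the soil data and compares it with a tabulated critical value. The 95% confidence level is an assumption, since the text does not specify one; 1.943 is the standard one-tailed critical value of t for n − 1 = 6 degrees of freedom at that level. The variable names are illustrative only.

```python
import math

# Values from the soil example in the text
n = 7              # number of soil measurements
sample_mean = 4.0  # mean arsenic concentration, ppm
s = 0.9            # sample standard deviation, ppm
mu_0 = 2.0         # allowable limit, ppm

# One-sample t statistic: how many standard errors the sample mean
# lies above the allowable limit
standard_error = s / math.sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error

# ASSUMPTION: 95% confidence level (not stated in the text).
# One-tailed critical t for n - 1 = 6 degrees of freedom, from a t table.
T_CRITICAL = 1.943

# H0: mu = mu_0, HA: mu > mu_0 (one-tailed), so reject H0 only for large t
reject_null = t_stat > T_CRITICAL

print(f"t = {t_stat:.2f}, critical t = {T_CRITICAL}")
print("Reject H0" if reject_null else "Do not reject H0")
```

With these numbers t is about 5.88, well above 1.943, so under the assumed confidence level the null hypothesis would be rejected in favor of the alternate hypothesis.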
It is important to note that the hypothesis we just established corresponds to a one-tailed test, since we asked only whether the true mean was "greater than" or "less than or equal to" 2 ppm. It is also possible to have a two-tailed test, where we would try to establish "equal to" or "not equal to." This concept is covered in the pages on confidence levels and one- and two-tailed tests.
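To illustrate why the distinction matters, here is a minimal Python sketch of the two decision rules. It assumes a 95% confidence level and 6 degrees of freedom (as for n=7 measurements); the critical values 1.943 and 2.447 are taken from a standard t table, and the function names and the t value of 2.2 are hypothetical, chosen only to show a case where the two rules disagree.

```python
# Critical t values for 6 degrees of freedom at an ASSUMED 95% confidence level
T_CRIT_ONE_TAILED = 1.943  # all 5% of the rejection region in one tail
T_CRIT_TWO_TAILED = 2.447  # 2.5% of the rejection region in each tail


def reject_one_tailed(t_stat: float) -> bool:
    """HA: mu > mu_0. Only a large positive t counts against H0."""
    return t_stat > T_CRIT_ONE_TAILED


def reject_two_tailed(t_stat: float) -> bool:
    """HA: mu != mu_0. A large t in either direction counts against H0."""
    return abs(t_stat) > T_CRIT_TWO_TAILED


# Hypothetical t statistic, not taken from the soil example
t_example = 2.2
print("one-tailed rejects H0:", reject_one_tailed(t_example))
print("two-tailed rejects H0:", reject_two_tailed(t_example))
```

Because the two-tailed test splits the rejection region between both tails, its critical value is larger, so a borderline statistic such as 2.2 is significant in the one-tailed test but not in the two-tailed one.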