## Introduction to Statistical Hypotheses:

Whenever we want to apply a statistical test to evaluate experimental data, we need to frame our question in a statistically appropriate form. In other words, we need to state a hypothesis from which conclusions can be drawn. Some common questions have already been outlined; in this section, we will see how to formulate these into hypotheses that can then be subjected to statistical evaluation.

Suppose, for example, that we have two sets of replicate data obtained for the same sample. This could be as a result of an analyst repeating the determination on different occasions, or having two different analysts perform the same determination on the same sample. We might want to know several things about the two sets of data:

• Did the two sets of measurements yield the same result?
• Is one set of measurements more or less precise than the other?
• Is one set of results more or less accurate than the other?

Remember that any set of measurements represents a sample from the population of all possible results; there will always be some inherent variation in the mean and standard deviation for each set of replicate measurements. What we therefore need to establish is whether our two sets of measurements are drawn from the same population or from different populations. In statistical terms, we might therefore propose a hypothesis statement (H) that:

H: “two sets of data (1 and 2) with sample means m1 and m2, are both part of the same population such that their population means μ1 and μ2 are equal (μ1 = μ2)”
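As a rough illustration of how such a hypothesis might be put to the test, the sketch below computes the sample means and standard deviations for two sets of replicate data and combines them into a t statistic. The data values are hypothetical and chosen only for illustration, and the use of Welch's form of the t statistic (which does not assume equal variances) is an assumption made here, not a method prescribed by the text.

```python
import math
import statistics

# Hypothetical replicate results for the same sample
# (illustrative values only, not taken from the text)
set_1 = [10.1, 10.3, 9.9, 10.2, 10.0]
set_2 = [10.4, 10.6, 10.5, 10.3, 10.7]

# Sample means (m1, m2) and sample standard deviations (s1, s2)
m1, m2 = statistics.mean(set_1), statistics.mean(set_2)
s1, s2 = statistics.stdev(set_1), statistics.stdev(set_2)
n1, n2 = len(set_1), len(set_2)

# Assumption: Welch's t statistic is used to compare the two means.
# A value far from zero suggests the hypothesis mu1 = mu2 may not hold.
t_stat = (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

print(f"m1 = {m1:.2f}, s1 = {s1:.3f}")
print(f"m2 = {m2:.2f}, s2 = {s2:.3f}")
print(f"t  = {t_stat:.2f}")
```

The t statistic on its own does not decide the question; it would still need to be compared against a critical value for the chosen confidence level and degrees of freedom.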

### The Null Hypothesis:

An important part of performing any statistical test, such as the t-test, F-test, Grubbs’ test, Dixon’s Q test, Z-tests, χ²-tests, and Analysis of Variance (ANOVA), is the concept of the Null Hypothesis, H0. This is the hypothesis that the value of the test parameter derived from the data is purely the result of random sampling error in taking the sample measurements from the population of all possible values. The exact interpretation depends to some extent on the type of test being performed, but essentially, if the null hypothesis is true, then there is no significant difference between the sample and population values.

As an illustration, consider the analysis of a soil sample for arsenic content. In such a situation, we might want to know whether the experimental value exceeds the maximum allowable concentration (MAC). Suppose a set of 7 replicate measurements on a soil sample returned a mean concentration of 4.0 ppm with standard deviation s = 0.9 ppm, and that the MAC was 2.0 ppm. Our null hypothesis would then be that the mean arsenic concentration is less than or equal to the MAC within experimental error:

H0: μ ≤ 2.0 ppm

We can also formulate the alternate hypothesis, HA, that the mean arsenic concentration is greater than the MAC:

HA: μ > 2.0 ppm

Note that we implicitly acknowledge that what we are primarily concerned with is the population mean soil arsenic concentration: we would not want to draw a false conclusion about the arsenic content of the soil simply because our sample had somewhat less arsenic than average in it!
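To make the arsenic example concrete, the following minimal sketch evaluates H0 against HA using the summary statistics quoted above. It assumes that a one-sample, one-tailed t-test at the 95% confidence level is the chosen test; the variable names are illustrative, and the critical value is the standard tabulated one for 6 degrees of freedom.

```python
import math

# Summary statistics quoted in the text
n = 7        # number of replicate measurements
mean = 4.0   # sample mean, ppm
s = 0.9      # sample standard deviation, ppm
mac = 2.0    # maximum allowable concentration, ppm

# Assumption: one-sample, one-tailed t-test of H0: mu <= MAC vs HA: mu > MAC
standard_error = s / math.sqrt(n)
t_stat = (mean - mac) / standard_error
df = n - 1

# Tabulated one-tailed critical t value for df = 6 at the 95% confidence level
t_crit = 1.943

# H0 is rejected only if the statistic exceeds the critical value
reject_h0 = t_stat > t_crit
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
# prints: t = 5.88, df = 6, reject H0: True
```

Under these assumptions the calculated t value is well above the critical value, so the null hypothesis would be rejected in favor of HA at this confidence level.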