Comparison of standard deviations - The F-test:

When comparing one sample against another, or a sample against what we would expect to see given a certain population distribution, we are interested in whether the spread, or dispersion, of the two sets of data is comparable. Technically, we wish to compare the variances of the two (where the variance is the square of the standard deviation). Doing so allows us to answer questions such as:

  • Is the precision of one set of values better or worse than the other?
  • Is a set of replicate measurements representative of the population of expected results, or is it possible that it derives from some other population?

Such questions are relevant when evaluating whether a given sample is representative, whether a new method of analysis performs comparably to an existing one, and whether individual analysts and laboratories employing standard methods are technically proficient.

The F-test provides the means for performing such comparisons, which are a necessary precursor to employing the t-test for the comparison of two sample means.

Comparison of variance:

Whether we are comparing two sample variances, or a sample and a population variance, we need to define our null and alternate hypotheses first.

Obviously, if the two variances were exactly equal, then their ratio would be 1; if they differed, the ratio would be greater than or less than 1. For convenience, the Fisher F-test is defined so that the larger variance is always the numerator and the smaller the denominator.

That is, we test the null hypothesis

H0: σ₁² = σ₂²

against the appropriate alternate hypothesis

H1: σ₁² > σ₂² or σ₁² < σ₂²

We therefore calculate the Fisher F-value as:

F = s₁² / s₂²

where s₁² ≥ s₂², so that F ≥ 1.

The degrees of freedom for the numerator and denominator are n1-1 and n2-1, respectively. Note that it is not necessary to have exactly the same number of replicate values in each set.

As with the t-test, we can either compare Fcalc to a tabulated critical value Ftab, or calculate the probability of obtaining such an F-value if the null hypothesis were true, to decide whether to accept or reject the null hypothesis. We can also perform 1-tailed or 2-tailed F-tests. The following two examples illustrate the use of such tests.
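Below is a minimal sketch of this calculation in Python, using the F distribution from scipy.stats; the helper name f_test and its argument order are illustrative assumptions, not a standard API.

    from scipy import stats

    def f_test(s1, n1, s2, n2, tails=1):
        """Return (F_calc, dfn, dfd, p) for the Fisher F-test on two sample
        standard deviations s1 and s2 obtained from n1 and n2 replicates."""
        v1, v2 = s1 ** 2, s2 ** 2
        # The larger variance always goes in the numerator, so that F >= 1.
        if v1 >= v2:
            f_calc, dfn, dfd = v1 / v2, n1 - 1, n2 - 1
        else:
            f_calc, dfn, dfd = v2 / v1, n2 - 1, n1 - 1
        # Probability of an F at least this large if H0 were true (upper tail);
        # for a 2-tailed test the tail area is doubled.
        p = stats.f.sf(f_calc, dfn, dfd)
        if tails == 2:
            p = min(2 * p, 1.0)
        return f_calc, dfn, dfd, p

Comparing the returned p against the chosen significance level (e.g. 0.05) is equivalent to comparing Fcalc against the tabulated critical value.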


Example 1:

As an example, assume we want to see if a method (Method A) for determining the arsenic concentration in soil is significantly more precise than a second method (Method B). Each method was tested ten times, yielding the following values:

Method    Mean (ppm)    Standard Deviation (ppm)
A         6.7           0.8
B         8.2           1.2

A method is more precise if its standard deviation is lower than that of the other method. Taking Method A as data set 1 and Method B as data set 2, we therefore test the null hypothesis H0: σ₂² = σ₁² against the alternate hypothesis HA: σ₂² > σ₁².

Since s₂ > s₁, Fcalc = s₂²/s₁² = 1.2²/0.8² = 2.25. The tabulated value for ν₁ = ν₂ = 9 degrees of freedom and a 1-tailed test at the 95% confidence level is F₉,₉ = 3.179. In this case, Fcalc < F₉,₉, so we accept the null hypothesis that the two standard deviations are equal, and we are 95% confident that any difference between the sample standard deviations is due to random error. We use a 1-tailed test here because the only question of interest is whether Method A is more precise than Method B.
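The numbers above can be checked with a short Python sketch (assuming scipy.stats is available; ppf gives the tabulated critical value):

    from scipy import stats

    s_A, s_B, n = 0.8, 1.2, 10               # standard deviations (ppm), 10 replicates each
    f_calc = s_B ** 2 / s_A ** 2             # larger variance on top: 1.44 / 0.64 = 2.25
    f_tab = stats.f.ppf(0.95, n - 1, n - 1)  # 1-tailed, 95% critical value: F(9,9) = 3.179
    print(f_calc < f_tab)                    # True -> retain H0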

Example 2:

If we are not interested in whether one method is better than another, but simply want to determine whether two variances are the same or different, we need a 2-tailed test. For instance, assume we made two sets of measurements of the ethanol concentration in a sample of vodka using the same instrument, but on two different days. On the first day we found a standard deviation of s₁ = 9 ppm, and on the next day s₂ = 2 ppm. Both datasets comprised 6 measurements. We want to know whether we can combine the two datasets, or whether there is a significant difference between them, in which case we should discard one.

As usual, we begin by defining the null hypothesis, H0: σ₁² = σ₂², and the alternate hypothesis, HA: σ₁² ≠ σ₂². The "≠" sign indicates that this is a 2-tailed test, because we are interested in both cases: σ₁² > σ₂² and σ₁² < σ₂². For the F-test, you can perform a 2-tailed test by doubling the P value of a 1-tailed table, so using a table for a 1-tailed test at P = 0.05, we would be performing a 2-tailed test at P = 0.10, i.e. at the 90% confidence level.

For this dataset, s₁ > s₂, so Fcalc = s₁²/s₂² = 9²/2² = 20.25. The tabulated value for ν₁ = ν₂ = 5 at 90% confidence is F₅,₅ = 5.050. Since Fcalc > F₅,₅, we reject the null hypothesis, and can say with 90% certainty that there is a significant difference between the standard deviations of the two datasets.
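A corresponding Python sketch (again assuming scipy.stats) reproduces this 2-tailed comparison, using the 1-tailed 95% point as the 2-tailed 90% critical value:

    from scipy import stats

    s1, s2, n = 9.0, 2.0, 6                  # ppm; 6 measurements on each day
    f_calc = s1 ** 2 / s2 ** 2               # 81 / 4 = 20.25
    f_tab = stats.f.ppf(0.95, n - 1, n - 1)  # F(5,5) = 5.050
    print(f_calc > f_tab)                    # True -> reject H0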

Tables for other confidence levels can be found in most statistics or analytical chemistry textbooks. When using these tables, be careful to note whether the table is for a 1-tailed or a 2-tailed test. In most cases, tables are given for 2-tailed tests, so you can divide the stated P value by 2 to obtain the corresponding 1-tailed value. For the F-test, always ensure that the larger standard deviation is in the numerator, so that F ≥ 1.
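If no printed table is at hand, the critical values can also be generated directly. The sketch below (assuming scipy.stats) prints a small grid of 1-tailed critical F values at P = 0.05, with numerator degrees of freedom across the columns and denominator degrees of freedom down the rows; the particular degrees of freedom shown are arbitrary.

    from scipy import stats

    dof = [3, 5, 9, 19]                      # degrees of freedom to tabulate
    print("   " + "".join(f"{d:>8}" for d in dof))
    for dfd in dof:
        row = [stats.f.ppf(0.95, dfn, dfd) for dfn in dof]
        print(f"{dfd:>3}" + "".join(f"{v:8.3f}" for v in row))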