We just saw how a series of measurements can follow a normal distribution, such that most of the measurements will be around the mean value, with fewer and fewer measurements occurring the farther away they are from the mean.

This can tell us a lot about a dataset. If a certain value is far from the mean, the chance of obtaining that value at random is small. If we take the same measurement numerous times and keep getting that value, then it is likely that we are measuring a system with a different mean than we had previously thought. So when we get values that are far from the mean, we can say that there is a small probability that they belong to the same system, and a larger probability that they belong to a different system with a different mean.

Consider the following data set:

2.1 | 2.3 | 2.6 | 2.1 | 1.9 | 2.2 | 1.8 | 3.8

All the values are clustered fairly evenly around 2.1, except for 3.8. The question is whether the large value of 3.8 occurred by random chance and is a valid measurement of the system we are interested in, or whether it comes from a different system entirely. We can look at the probability distribution for this set of data, which has a mean of 2.35 and a standard deviation of 0.635. The probability distribution is shown in the following figure.
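The quoted mean and standard deviation can be checked with a short calculation. This sketch uses only Python's standard library; note that `statistics.stdev` computes the *sample* standard deviation, which is what the value 0.635 corresponds to here.

```python
import statistics

# The dataset from the table above
data = [2.1, 2.3, 2.6, 2.1, 1.9, 2.2, 1.8, 3.8]

mean = statistics.mean(data)    # arithmetic mean of the eight measurements
stdev = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

print(f"mean = {mean:.2f}, standard deviation = {stdev:.3f}")
```

Running this reproduces the values used in the rest of the discussion: a mean of 2.35 and a standard deviation of about 0.635.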

This figure shows both the probability distribution for a system with mean 2.35, and the dataset discussed above, with each data point marked at its position on the curve. The probability of each value occurring refers to the relative frequency of each value if a large number of measurements were taken. As you can see, all the values have a fairly high probability of occurring, except 3.8, which would occur with a probability of less than 0.05, or less than 5% of the time. This means that if you made many measurements of a system with mean 2.35 and standard deviation 0.635, there is only a 5% chance that you would get a result near 3.8. Getting one or two results in the neighbourhood of 3.8 is not unlikely, since about 1 result in 20 (5%) should fall that far from the mean.
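How unlikely 3.8 actually is can be estimated by converting it to a standard score (the number of standard deviations it lies from the mean) and reading off the tail probability of the normal curve. A minimal sketch, assuming the mean and standard deviation computed above, using the error function from the standard library:

```python
import math

mean, stdev = 2.35, 0.635
x = 3.8

# Standard score: how many standard deviations x lies from the mean
z = (x - mean) / stdev  # about 2.28

def norm_cdf(z):
    """Cumulative distribution function of the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Two-tailed probability of a result at least this far from the mean
p = 2 * (1 - norm_cdf(z))
print(f"z = {z:.2f}, P = {p:.3f}")
```

The resulting probability comes out near 0.02, comfortably below the 0.05 threshold discussed here.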

However, if you were to see a large number of results around 3.8, it would be highly unlikely that the value of 3.8 is due solely to random error. More likely, the system changed while you were working, and the multiple values around 3.8 are part of a different system. So you can be 95% confident that the value of 3.8 is from a different system. We call this a 95% confidence level.

The graph below shows the region, centered on the mean, that contains 95% of the values.

The shaded region shows the central 95% of the values around the mean, which we can also write as P = 0.95. This means that 95% of the time, or 19 times in 20, the value will fall within this region. The unshaded regions on either side together make up P = 0.025 + 0.025 = 0.05. Only 5% of the time, or 1 time in 20, will a value fall within these unshaded regions. Throughout this tutorial, we will denote the probability with P, so that P(0.05) or P = 0.05 will indicate the 95% confidence level. It is also sometimes written as α, so you might also see α = 0.05.

Most statistical tests discussed in this tutorial (*t*-test, *F*-test,
*Q*-test) are based on this kind of probability distribution. If a statistical test shows that a result falls outside the 95%
region, then you can be 95% certain that the result was not due to random chance, and is a *significant* result.
You can learn about all these tests in the section on Evaluation of Data. Or, you can proceed to the next
page to learn about degrees of freedom in statistical calculations.