# Stats Tutorial - Mean, Variance and Standard Deviation

The three main measures in quantitative statistics are the mean, variance and standard deviation. These measures form the basis of any statistical analysis.

## The Mean

The mean, or μ, is the average of a set of measurements. It can also be viewed as the expected outcome E(x) of a measurement x: the value that the average approaches as the measurement is repeated many times. However, this definition is more theoretical and beyond the scope of this tutorial.

### Population versus Sample Mean and Standard Deviation

It is important to differentiate between the population mean, μ, and the sample mean, $\bar{x}$. The population mean is the expected outcome: if an infinite number of measurements were made, their average would be the population mean. This represents the true value of a measurement. For instance, if a food manufacturer claims that there are 100 mg of sodium in its canned soup product, this would then be the expected outcome for any measurement of the sodium in that product, and is taken as the "true" value. A literature or accepted value is usually treated as a population mean, since a very large number of measurements have been made to come up with this value. The population mean is usually denoted μ, and is the expected value E(x) for a measurement.
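
The idea that the average settles on the population mean as more measurements are made can be illustrated with a small simulation. This is only a sketch: the 100 mg "true" value comes from the soup example, while the 5 mg spread and the choice of normally distributed measurement noise are assumptions made purely for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN_MG = 100.0   # population mean (the label claim in the example)
ASSUMED_SD_MG = 5.0    # hypothetical spread, chosen only for illustration


def simulated_mean(n):
    """Average of n simulated sodium measurements (normal noise assumed)."""
    draws = [random.gauss(TRUE_MEAN_MG, ASSUMED_SD_MG) for _ in range(n)]
    return sum(draws) / n


# The average of a few measurements can sit noticeably away from 100 mg;
# the average of many measurements lies much closer to it.
for n in (5, 50, 5000, 500000):
    print(f"n = {n:>6}: average = {simulated_mean(n):.2f} mg")
```

With only five measurements the average is an estimate of the true value; it is the large-n limit that the population mean describes.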

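The sample mean is the average value of a sample, which is a finite series of measurements, and is an estimate of the population mean. The sample mean, denoted $\bar{x}$, is calculated with the formula

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $n$ is the number of measurements and $x_i$ is the $i$-th measurement.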

For example, if we use atomic absorption spectroscopy to measure the sodium content of five different cans of soup, and obtain the results 108.6, 104.2, 96.1, 99.6, and 102.2 mg, the sample mean is (108.6 + 104.2 + 96.1 + 99.6 + 102.2)/5 = 102.1 mg sodium.
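
The same calculation can be written out in a few lines of Python. This is a minimal sketch using the five values from the example; the variable names are only illustrative.

```python
# Sodium content (mg) of the five cans from the example
sodium_mg = [108.6, 104.2, 96.1, 99.6, 102.2]

# Sample mean: sum of the measurements divided by their number n
n = len(sodium_mg)
sample_mean = sum(sodium_mg) / n

print(f"n = {n}, sample mean = {sample_mean:.1f} mg")
# prints: n = 5, sample mean = 102.1 mg
```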

In the previous section, you learned how to calculate the sample mean in Excel with the `AVERAGE` function.

## Variance and Standard Deviation

The variance and standard deviation are related indicators of the spread of data within a population or sample. The same distinction exists between the population and sample variance as between the population and sample mean. The population variance and standard deviation, denoted σ² and σ, describe how much individual measurements deviate from the population mean, for the entire population. The sample variance and standard deviation, s² and s, describe how much each individual measurement deviates from the sample mean. As is the case with the mean, the population variance and standard deviation are the expected, or true, deviations. The population variance is calculated using the population mean:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

where $N$ is the number of members in the population.

To calculate the sample variance, we first find the errors of all the measurements, that is, the difference between each measurement and the sample mean, $x_i - \bar{x}$. We then square each value, add them all together, and divide by the number of measurements minus 1:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

The sample standard deviation is simply the square root of the sample variance, $s = \sqrt{s^2}$. From the previous example, the standard deviation of the sodium content is 4.7 mg.

The sample variance uses $n - 1$ degrees of freedom because we are estimating the population mean with the sample mean, which costs one degree of freedom. This leaves us with one less independent measurement, so we must subtract one. Degrees of freedom in statistics will be discussed later on.
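
The steps described above (deviation from the sample mean, squaring, summing, dividing by n - 1, then taking the square root) can be sketched in Python for the sodium example. The variable names are illustrative, and the final lines cross-check the hand calculation against Python's standard `statistics` module, whose `variance` and `stdev` functions also divide by n - 1.

```python
import math
import statistics

# Sodium content (mg) of the five cans from the example
sodium_mg = [108.6, 104.2, 96.1, 99.6, 102.2]
n = len(sodium_mg)
sample_mean = sum(sodium_mg) / n

# Squared deviation of each measurement from the sample mean
squared_deviations = [(x - sample_mean) ** 2 for x in sodium_mg]

# Sample variance: divide by the degrees of freedom, n - 1
sample_variance = sum(squared_deviations) / (n - 1)

# Sample standard deviation: square root of the sample variance
sample_std = math.sqrt(sample_variance)

print(f"s^2 = {sample_variance:.1f} mg^2, s = {sample_std:.1f} mg")
# prints: s^2 = 22.2 mg^2, s = 4.7 mg

# Cross-check against the standard library (both use n - 1)
assert math.isclose(sample_variance, statistics.variance(sodium_mg))
assert math.isclose(sample_std, statistics.stdev(sodium_mg))
```

Note that the variance carries squared units (mg²), while the standard deviation is in the same units as the measurements, which is why the standard deviation is the value usually reported alongside the mean.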

© 2006 Dr. David C. Stone & Jon Ellis, Chemistry, University of Toronto
Last updated: September 25th, 2006