Statistics in Analytical Chemistry

Errors in the Regression Equation:

There is always some error associated with the measurement of any signal. Earlier, we saw how this affected replicate measurements, and how it could be treated statistically in terms of the mean and standard deviation.

The same phenomenon applies to each measurement taken in the course of constructing a calibration curve, causing a variation in the slope and intercept of the calculated regression line. This can be reduced - though never completely eliminated - by making replicate measurements for each standard.

Multiple calibrations with single values compared to the mean of all three trials. Note how all the regression lines pass close to the centroid of the data.
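The centroid behaviour noted above can be checked numerically. The following is a minimal sketch, assuming three made-up replicate trials measured at the same standards (the numbers and the helper name fit_line are illustrative, not the tutorial's data). Each least-squares line passes exactly through its own centroid, and because the trial means differ only slightly, all the lines pass close to the centroid of the pooled data.

```python
# Hypothetical replicate calibrations: same standards, three trials.
x = [0, 2, 4, 6, 8]
trials = [
    [0.1, 2.1, 3.9, 6.2, 7.9],
    [0.3, 1.8, 4.2, 5.9, 8.3],
    [-0.1, 2.2, 4.0, 6.0, 8.1],
]

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return y_mean - b * x_mean, b

# Point-by-point mean of the three trials, and the pooled centroid.
mean_y = [sum(col) / len(col) for col in zip(*trials)]
x_bar = sum(x) / len(x)
grand_y_bar = sum(mean_y) / len(mean_y)

# Value of each fitted line at x_bar (the last entry is the fit to the means).
at_centroid = []
for y in trials + [mean_y]:
    a, b = fit_line(x, y)
    at_centroid.append(a + b * x_bar)
    print(f"a = {a:.3f}, b = {b:.3f}, line at x_bar = {at_centroid[-1]:.3f}")

print(f"pooled centroid: ({x_bar:.1f}, {grand_y_bar:.3f})")
```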

Even with this precaution, we still need some way of estimating the likely error (or uncertainty) in the slope and intercept, and the corresponding uncertainty associated with any concentrations determined using the regression line as a calibration function.

The Uncertainty of the Regression:

We saw earlier that the spread of the actual calibration points on either side of the line of regression of y on x (which we are using as our calibration function) can be expressed in terms of the regression residuals, (y_i − ŷ_i), where ŷ_i is the value predicted by the regression line at x_i. The greater these residuals, the greater the uncertainty in where the true regression line actually lies. The uncertainty in the regression is therefore calculated in terms of these residuals. Technically, this is the standard error of the regression, s_y/x:

$$ s_{y/x} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}} $$

Note that there are (n − 2) degrees of freedom in calculating s_y/x. This is because we are making two assumptions in this equation: a) that the sample population is representative of the entire population, and b) that the fitted ŷ_i values are representative of the true y-values. For each assumption, we remove one degree of freedom, and our estimated standard deviation becomes larger.

Another way of understanding the degrees of freedom is to note that we are estimating two parameters from the regression – the slope and the intercept. Therefore, ν = n − 2 and we need at least three points to perform the regression analysis.
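As a rough illustration of the calculation described above, here is a minimal sketch that fits a line by least squares and then computes s_y/x from the residuals with (n − 2) degrees of freedom. The calibration data and the function names are assumptions made for the example, not values from this tutorial.

```python
import math

# Illustrative calibration data (assumed values, not from the tutorial).
x = [0, 2, 4, 6, 8, 10, 12]                    # concentration of each standard
y = [2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7]    # measured signal

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    return y_mean - b * x_mean, b

def regression_standard_error(x, y):
    """Standard error of the regression, s_y/x, with n - 2 degrees of freedom."""
    n = len(x)
    if n < 3:
        raise ValueError("at least three points are needed (n - 2 degrees of freedom)")
    a, b = fit_line(x, y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_res / (n - 2))

s_yx = regression_standard_error(x, y)
print(f"s_y/x = {s_yx:.4f}")
```

The explicit check on n mirrors the point made above: with only two points the line fits exactly and there are no degrees of freedom left to estimate the scatter.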

The Uncertainty of the Slope:

The slope of the regression line is obviously important, as it determines the sensitivity of the calibration function; that is, the rate at which the signal changes with concentration. The higher (steeper) the slope, the easier it is to distinguish between concentrations which are close to one another. (Technically, the greater the resolution in concentration terms.) The uncertainty in the slope is expressed as the standard error (or deviation) of the slope, s_b, and is calculated in terms of the standard error of the regression as:

$$ s_b = \frac{s_{y/x}}{\sqrt{\sum_i (x_i - \bar{x})^2}} $$

The corresponding confidence interval for the slope is calculated using the t-statistic for (n − 2) degrees of freedom as:

$$ b \pm t_{n-2}\, s_b $$

Remember: here n is the number of calibration points used in the regression calculation.
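The slope calculation can be sketched as follows. This is an illustrative example only: the data are made up, and the t value (2.571 for 5 degrees of freedom at the 95% level, two-tailed) is hard-coded from standard tables rather than computed, so it must be changed if n changes.

```python
import math

# Illustrative calibration data (assumed values, not from the tutorial).
x = [0, 2, 4, 6, 8, 10, 12]
y = [2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7]

# Two-tailed 95% t value for n - 2 = 5 degrees of freedom (from tables).
T_95_DF5 = 2.571

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
s_xx = sum((xi - x_mean) ** 2 for xi in x)     # sum of squared x residuals
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

b = s_xy / s_xx                                # slope
a = y_mean - b * x_mean                        # intercept

# Standard error of the regression, then of the slope.
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(ss_res / (n - 2))
s_b = s_yx / math.sqrt(s_xx)

half_width = T_95_DF5 * s_b                    # half-width of the 95% interval
print(f"b = {b:.4f}, s_b = {s_b:.4f}")
print(f"95% CI for slope: {b - half_width:.3f} to {b + half_width:.3f}")
```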

The Uncertainty of the Intercept:

The intercept of the regression line has implications for both the smallest detectable signal (measured response) and the corresponding lowest detectable concentration. The uncertainty in the intercept is also calculated in terms of the standard error of the regression as the standard error (or deviation) of the intercept, s_a:

$$ s_a = s_{y/x} \sqrt{\frac{\sum_i x_i^2}{n \sum_i (x_i - \bar{x})^2}} $$

The corresponding confidence interval for the intercept is calculated in the same way as that for the slope, namely:

$$ a \pm t_{n-2}\, s_a $$
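A corresponding sketch for the intercept is given below. It assumes the standard textbook expression s_a = s_y/x · sqrt(Σx_i² / (n · Σ(x_i − x̄)²)), uses made-up calibration data, and again takes the t value for 5 degrees of freedom from tables rather than computing it.

```python
import math

# Illustrative calibration data (assumed values, not from the tutorial).
x = [0, 2, 4, 6, 8, 10, 12]
y = [2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7]

# Two-tailed 95% t value for n - 2 = 5 degrees of freedom (from tables).
T_95_DF5 = 2.571

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
s_xx = sum((xi - x_mean) ** 2 for xi in x)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

b = s_xy / s_xx
a = y_mean - b * x_mean

ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(ss_res / (n - 2))

# Standard error of the intercept (assumed textbook form, see lead-in).
sum_x_squared = sum(xi ** 2 for xi in x)
s_a = s_yx * math.sqrt(sum_x_squared / (n * s_xx))

half_width = T_95_DF5 * s_a
print(f"a = {a:.4f}, s_a = {s_a:.4f}")
print(f"95% CI for intercept: {a - half_width:.3f} to {a + half_width:.3f}")
```

Note that s_a depends on Σx_i², so standards that lie far from the origin give a less certain intercept for the same scatter about the line.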

Suggested Exercise:

Try calculating: the standard error of the regression; the slope, standard error of the slope, and its 95% confidence interval; and the intercept, standard error of the intercept, and its 95% confidence interval for the fluorescence calibration data. Examine the effect of including more of the curved region on the standard error of the regression, as well as on the estimates of the slope and intercept.

Using Excel’s Functions:

So far, we have been performing regression analysis using only the simple built-in functions or the chart trendline options. However, Excel also provides a built-in function called LINEST, while the Analysis ToolPak provided with some versions includes a Regression tool. These can be used to simplify regression calculations, although each has its own disadvantages.

(a) LINEST: You can access LINEST either through the Insert→Function... menu item, or by typing the function directly as a formula within a cell. The function takes up to four arguments, in the form =LINEST(known_y's, known_x's, const, stats): the array of y values, the array of x values, a value of TRUE if the intercept is to be calculated explicitly, and a value of TRUE if additional statistics are to be determined.

Once you have completed the formula and pressed Enter or Return, you will see a single value in the cell, which is the slope of the regression line. To see the rest of the information, you need to tell Excel to expand the results from LINEST over a range of cells. To do this, first click and drag from the cell containing your formula so that you end up with a selection consisting of all the cells in 5 rows and 2 columns.

Now press F2, followed by CTRL+SHIFT+ENTER (Mac OS: control+u then command+return); this will expand the results into a table of values.

The values obtained in this way are as follows:

slope b	intercept a
standard error of the slope s_b	standard error of the intercept s_a
coefficient of determination R²	standard error of the regression s_y/x
Fisher’s F statistic	degrees of freedom ν
sum of the squares of the regression	sum of the squares of the residuals

Note that the sum of the last two values (bottom row) is equal to the total sum of squares, Σ(y_i − ȳ)², the term from the equation for R, while the sum of the squares of the residuals is used in calculating s_y/x.
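To make the layout of the LINEST output concrete, the sketch below rebuilds the same 5-row by 2-column table in plain Python. It is not Excel's implementation: the function name linest_table and the calibration data are assumptions for illustration, and it handles only a single x variable with the intercept fitted.

```python
import math

def linest_table(x, y):
    """Return a 5x2 table laid out like LINEST's output with statistics enabled."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    fitted = [a + b * xi for xi in x]
    ss_reg = sum((fi - y_mean) ** 2 for fi in fitted)          # regression
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residuals
    df = n - 2
    s_yx = math.sqrt(ss_res / df)
    s_b = s_yx / math.sqrt(s_xx)
    s_a = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * s_xx))
    r_squared = ss_reg / (ss_reg + ss_res)
    f_stat = ss_reg / (ss_res / df)
    return [
        [b, a],                # slope, intercept
        [s_b, s_a],            # their standard errors
        [r_squared, s_yx],     # R squared, standard error of the regression
        [f_stat, df],          # F statistic, degrees of freedom
        [ss_reg, ss_res],      # sums of squares: regression, residuals
    ]

# Illustrative calibration data (assumed values, not from the tutorial).
x = [0, 2, 4, 6, 8, 10, 12]
y = [2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7]

table = linest_table(x, y)
for left, right in table:
    print(f"{left:12.4f} {right:12.4f}")

# The bottom row sums to the total sum of squares of y about its mean.
y_mean = sum(y) / len(y)
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
print(f"ss_reg + ss_res = {sum(table[4]):.4f}, total = {ss_tot:.4f}")
```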

(b) Regression: Excel 2003 and Excel:Mac 2004 included various additional utilities that could be added through the Tools menu. If you don’t see a Data Analysis... item at the bottom of the Tools menu, select the Add-Ins... item instead. Check the Analysis ToolPak item in the dialog box, then click OK to add this to your installed application.

Once the Data Analysis... item is installed, selecting it will call up a dialog containing numerous options: select Regression, fill in the fields in the resulting dialog, and the tool will insert the same regression statistics into your worksheet.


Stats Tutorial - Instrumental Analysis and Calibration