# Stats Tutorial - Errors in the Regression Equation

In any area of measurement science, there is always some error in any signal. The error can arise from many sources, and can normally be accounted for using statistical techniques. However, because measurement is inherently random, it contributes some degree of uncertainty to the result. We express this uncertainty as a confidence limit: a range within which we can be fairly certain that the true value lies.

This is why results are normally reported together with their estimated error, such as C = 51.2 ± 0.5 μg/ml. The ± 0.5 is the error, normally 2.58 standard deviations; the factor of 2.58 is explained below.

When preparing a calibration curve, there is always some degree of uncertainty in the calibration equation, both in the slope and in the y-intercept. To calculate the standard errors of the slope and the intercept, we require the residuals. For a given observation, the residual is the difference between the measured y-value and the y-value calculated from the calibration curve. The calculated y-value is determined from the calibration equation and denoted $\hat{y}_i$, so the residual is $y_i - \hat{y}_i$.
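As a minimal sketch of the residual calculation, the following fits a straight line by least squares and subtracts the calculated y-values from the measured ones. The data values and the helper names `fit_line` and `residuals` are illustrative assumptions; they are not the fluorescence data used in this tutorial.

```python
# Illustrative calibration data (hypothetical, NOT the tutorial's fluorescence data).
x = [0.0, 1.0, 2.0, 3.0, 4.0]   # standard concentrations
y = [1.1, 2.8, 5.1, 7.0, 9.0]   # measured signals


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    return a, b


def residuals(x, y, a, b):
    """Return the residual (measured y minus calculated y) for each observation."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


a, b = fit_line(x, y)
res = residuals(x, y, a, b)
print(f"a = {a:.3f}, b = {b:.3f}")
for xi, ri in zip(x, res):
    print(f"x = {xi:.1f}  residual = {ri:+.3f}")
```

For a least-squares line the residuals always sum to zero (to within rounding), which is a quick check that the fit was calculated correctly.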

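Once the residuals are known, we can calculate the standard deviation of y, $s_{x/y}$, which is a measure of the random error in the y-values:

$$ s_{x/y} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n-2}} $$

where $y_i - \hat{y}_i$ is the residual for the i-th of the n observations.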

The n − 2 in the denominator is the number of degrees of freedom. Normally, there is a degree of freedom for each data point. However, we are making two assumptions in this equation: a) that the sample population is representative of the entire population, and b) that the calculated values $\hat{y}_i$ are representative of the true y-values. For each assumption we make, we must remove a degree of freedom, and our estimated standard deviation becomes larger. (Equivalently, two parameters, the slope and the intercept, have been estimated from the same data.)
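The calculation can be sketched in a few lines of Python. The data are the same kind of illustrative (hypothetical) values, the function name `residual_std` is an assumption, and the intercept and slope are taken as already known from a least-squares fit of these points.

```python
import math


def residual_std(x, y, a, b):
    """Standard deviation of the y-residuals about y = a + b*x,
    using n - 2 degrees of freedom."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points for n - 2 degrees of freedom")
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_res / (n - 2))


# Illustrative data (hypothetical, NOT the tutorial's fluorescence data).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.8, 5.1, 7.0, 9.0]
a, b = 1.0, 2.0  # least-squares intercept and slope for these points

s = residual_std(x, y, a, b)
print(f"residual standard deviation = {s:.4f}")
```

Dividing by n − 2 rather than n gives a larger estimate, and the difference matters most for the small numbers of standards typical of a calibration curve.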

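This $s_{x/y}$ standard deviation can be used to calculate the standard deviations of the slope and the y-intercept using the formulas

$$ s_b = \frac{s_{x/y}}{\sqrt{\sum_i (x_i - \bar{x})^2}} \qquad s_a = s_{x/y} \sqrt{\frac{\sum_i x_i^2}{n \sum_i (x_i - \bar{x})^2}} $$

where $s_b$ is the standard deviation of the slope, $s_a$ is the standard deviation of the y-intercept, and $\bar{x}$ is the mean of the x-values.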
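A short sketch of this step, assuming the usual straight-line expressions: the slope error is the residual standard deviation divided by the square root of Σ(x − x̄)², and the intercept error is the residual standard deviation times the square root of Σx² / (n Σ(x − x̄)²). The function name `slope_intercept_errors` and the numbers are illustrative assumptions.

```python
import math


def slope_intercept_errors(x, s_resid):
    """Return (s_b, s_a): standard deviations of the slope and intercept,
    given the x-values and the residual standard deviation s_resid."""
    n = len(x)
    x_mean = sum(x) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)      # sum of squared x deviations
    sum_x2 = sum(xi ** 2 for xi in x)               # sum of squared x-values
    s_b = s_resid / math.sqrt(s_xx)
    s_a = s_resid * math.sqrt(sum_x2 / (n * s_xx))
    return s_b, s_a


# Illustrative values (hypothetical, NOT the tutorial's fluorescence data).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
s_resid = math.sqrt(0.06 / 3)   # residual sum of squares 0.06, n - 2 = 3

s_b, s_a = slope_intercept_errors(x, s_resid)
print(f"s_b = {s_b:.4f}, s_a = {s_a:.4f}")
```

Both errors shrink as the x-values are spread over a wider range, since Σ(x − x̄)² appears in the denominator of each.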

Confidence intervals for the slope and intercept are calculated from the t-statistic for n − 2 degrees of freedom. Tables of t-statistics are available in any statistics textbook, and are also included in the lab manual. Note that some tables give values of t for different values of n, while others give them for values of ν = n − 1. Check carefully so that you use the appropriate value.
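To avoid the n versus ν confusion, one option is to key the table directly by degrees of freedom and do the subtraction in one place. The sketch below is an assumption about how this might be organised, not part of the lab manual; the dictionary holds standard two-sided 99% critical values of Student's t for 1 to 10 degrees of freedom, and the names `T_99` and `t_for_regression` are hypothetical.

```python
# Two-sided 99% critical values of Student's t, keyed by degrees of freedom.
T_99 = {
    1: 63.657, 2: 9.925, 3: 5.841, 4: 4.604, 5: 4.032,
    6: 3.707, 7: 3.499, 8: 3.355, 9: 3.250, 10: 3.169,
}


def t_for_regression(n_points, table=T_99):
    """Look up t for a straight-line fit to n_points data points,
    which has n_points - 2 degrees of freedom."""
    dof = n_points - 2
    if dof not in table:
        raise KeyError(f"no table entry for {dof} degrees of freedom")
    return table[dof]


t_value = t_for_regression(5)   # 5 calibration points -> 3 degrees of freedom
print(f"t (99%, 3 degrees of freedom) = {t_value}")
```

With only a few calibration points the t-value is much larger than the large-sample limit, so the confidence interval widens accordingly.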

The confidence limit for the slope is $b \pm t_{n-2} s_b$ and for the y-intercept $a \pm t_{n-2} s_a$. For a large number of samples with a 99% confidence interval, we can use $t_{n-2} = 2.58$. For the fluorescence data, the standard deviation of the slope is $s_b = 0.0350$, so the slope with its confidence interval is $b = 1.88 \pm (2.58 \times 0.0350) = 1.88 \pm 0.09$. The y-intercept with its confidence interval is $a = 1.8 \pm 0.9$.
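The arithmetic for the slope can be checked with a short sketch. It also shows where 2.58 comes from: for a very large number of degrees of freedom the t-distribution approaches the normal distribution, whose two-sided 99% point is obtained here from Python's standard library. The function name `confidence_interval` is an assumption made for illustration.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, t_value):
    """Return (lower, upper, half_width) for estimate ± t_value * std_error."""
    half_width = t_value * std_error
    return estimate - half_width, estimate + half_width, half_width


# Large-sample limit of t for 99% confidence: 0.5% in each tail of the normal curve.
z_99 = NormalDist().inv_cdf(0.995)
print(f"z (99%) = {z_99:.2f}")

# Slope and its standard deviation as quoted in the text.
low, high, half = confidence_interval(1.88, 0.0350, 2.58)
print(f"b = 1.88 ± {half:.2f}  (from {low:.2f} to {high:.2f})")
```

The half-width 2.58 × 0.0350 = 0.0903 rounds to the ± 0.09 reported for the slope.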

Now that we've seen how to calculate the regression equation and its associated error, we can use it as a calibration curve to determine an unknown concentration from a measured signal, and report that concentration correctly with its associated uncertainty.

© 2006 Dr. David C. Stone & Jon Ellis, Chemistry, University of Toronto
Last updated: September 26th, 2006