Statistics in Analytical Chemistry

The Regression Line:

In previous sections, we saw how to plot calibration data and display a regression equation. We also saw how to compute the correlation coefficient, R, and estimate the linear portion of the calibration curve. In this section, we see how to compute the regression equation ourselves and use it effectively.

Calculating the Regression Line:

Calculation of the regression line is straightforward. The best-fit straight line has the form y = bx + a, where b is the slope and a is the y-intercept of the line. The slope and intercept are given by:

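$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\bar{x}$$

Equations for the slope and intercept of the best-fit straight line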

Remember that x̄ and ȳ represent the centroid (mean x and mean y) of the calibration points. Use only the values for the linear portion that were used when calculating R. The slope and intercept are easily calculated manually in Excel™ from the table of data used to generate the plot and calculate R. Try this for yourself using the 9-point fluorescence data from the previous example; you should find that the equation of the line is y = 1.880x - 1.749.
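The same calculation can be sketched outside a spreadsheet. The short Python example below is only an illustration: the function name `fit_line` and the five data points are made up for this sketch (they lie exactly on y = 2x + 1 so the result is easy to check) and are not the fluorescence data from the previous example.

```python
from statistics import mean

def fit_line(x, y):
    """Return (slope b, intercept a) of the line of regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(x), mean(y)          # the centroid
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = s_xy / s_xx                          # slope
    a = y_bar - b * x_bar                    # intercept: line passes through the centroid
    return b, a

# Hypothetical illustration data (NOT the tutorial's fluorescence data):
conc = [0.0, 2.0, 4.0, 6.0, 8.0]
signal = [1.0, 5.0, 9.0, 13.0, 17.0]

b, a = fit_line(conc, signal)
print(f"slope = {b:.3f}, intercept = {a:.3f}")  # slope = 2.000, intercept = 1.000
```

Replacing the two lists with the points from the linear portion of a real calibration should give the same slope and intercept as the manual spreadsheet calculation.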

Regression Assumptions:

Technically, the best-fit straight line we have calculated is termed the "line of regression of y on x". This method of linear regression makes the key assumptions that:

All error occurs in the y-values only
All errors are normally distributed

Obviously, the first assumption is not strictly true for calibration data, since there is always some uncertainty associated with the concentrations of the calibration standards used; good technique can minimise this, however, so that these uncertainties are small relative to the actual measurement errors.

The second assumption often holds, since measurement errors are often distributed normally; however, this is clearly not the case for data that has been linearised, such as potentiometric calibration data.
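A small numerical sketch may help show why linearising data disturbs the error distribution. It assumes a base-10 logarithm as the linearising transform and uses made-up numbers (a value of 10 with a symmetric error of ±2); neither comes from the tutorial's data.

```python
import math

# Hypothetical value with a symmetric error band (illustration only).
y, delta = 10.0, 2.0

# Offsets of the band edges from the centre after taking log10.
low = math.log10(y - delta) - math.log10(y)
high = math.log10(y + delta) - math.log10(y)

print(f"offset below centre: {low:+.4f}")
print(f"offset above centre: {high:+.4f}")
```

The two offsets differ in size, so an error band that was symmetric before the transform is no longer symmetric afterwards, which is inconsistent with a normal distribution of errors in the transformed values.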

Regression Residuals:

The figure below shows an example of a regression line with the calibration data, centroid (red circle) and y-residuals from the regression line displayed.

XY scatter plot showing the centroid, regression line, and y-residuals

To calculate the regression residuals, we determine the difference between the measured values (y_i) and the values predicted from the actual concentrations using the regression equation, ŷ_i (pronounced “y-hat”):

$$\hat{y}_i = b x_i + a$$

$$y_{i,\mathrm{res}} = y_i - \hat{y}_i$$
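As a minimal sketch of this step, the Python snippet below computes residuals from the regression equation quoted earlier (y = 1.880x - 1.749). The helper name `residuals` and the three (concentration, signal) pairs are hypothetical, chosen only to show the arithmetic; they are not the tutorial's measurements.

```python
def residuals(x, y, b, a):
    """Return y_i - y_hat_i for each point, where y_hat_i = b * x_i + a."""
    return [yi - (b * xi + a) for xi, yi in zip(x, y)]

# Slope and intercept taken from the regression equation in the text.
b, a = 1.880, -1.749

# Hypothetical measurements (illustration only).
conc = [2.0, 4.0, 6.0]
signal = [2.05, 5.70, 9.60]

res = residuals(conc, signal, b, a)
for xi, yi, ri in zip(conc, signal, res):
    print(f"x = {xi:.1f}  measured = {yi:.3f}  predicted = {yi - ri:.3f}  residual = {ri:+.3f}")
```

A positive residual means the measured value lies above the regression line, and a negative residual means it lies below.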

The method of least squares (regression of y on x) minimises the sum of the squares of these residuals. As mentioned previously, the residuals provide a convenient means of checking whether the calibration data is actually linear. To do this, calculate the regression residuals for all the points used in the regression analysis, and look for any pattern in their magnitude and sign. One easy way to do this is to plot a bar chart of the residuals. The figure below shows such a plot for the fluorescence calibration data. Note how the curvature at high concentrations is clearly visible in the chart:

Final calibration curve showing the linear portion; bar chart of residuals from the regression line

Calibration curve and chart of the residuals from the regression line for all calibration points; regression line calculated for the first 9 points in the data set. Notice how the curvature of the graph at higher concentrations is very evident in this representation.
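The same check can be sketched in a few lines of Python. Everything in this example is assumed for illustration: the nine data points are invented (roughly linear for the first six, then flattening), the choice of six points for the fit is arbitrary, and the text "bar chart" is only a stand-in for a proper plot.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for the regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return b, y_bar - b * x_bar

# Hypothetical calibration data that curves over at high concentration.
conc = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
signal = [0.1, 1.9, 4.1, 5.9, 8.1, 9.9, 11.5, 12.8, 13.6]

n_linear = 6  # assumed size of the linear portion
b, a = fit_line(conc[:n_linear], signal[:n_linear])

# Residuals for ALL points, using the line fitted to the linear portion only.
res = [yi - (b * xi + a) for xi, yi in zip(conc, signal)]

# Crude text bar chart: one '#' per 0.05 signal units of residual.
for xi, ri in zip(conc, res):
    print(f"{xi:4.1f} {ri:+7.3f} {'#' * round(abs(ri) * 20)}")

# Sign pattern: a long run of one sign suggests curvature.
signs = "".join("+" if ri > 0 else "-" for ri in res)
print(signs)
```

With these invented numbers, the residuals of the fitted points are small and alternate in sign, while the last three are all negative and grow steadily in size, which is the kind of pattern that indicates curvature.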

