# Stats Tutorial - The Regression Line

In previous sections, we saw how to display and calculate a regression equation, how to compute the correlation coefficient R, and how to estimate the linear portion of a set of data. In this section, we look at how the regression equation itself is computed and how it can be used effectively.

Calculation of the regression line is straightforward. The best-fit line has the form y = bx + a, where b is the slope of the line and a is the y-intercept. The slope is given by the formula

$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

and the intercept is

$$a = \bar{y} - b\,\bar{x}$$

both of which can be easily calculated in Excel with the table of data used in the previous section. The method is similar to that in the previous section. The `AVERAGE` function can be used to calculate the means $\bar{x}$ and $\bar{y}$. Using the 9-point fluorescence data, the equation of the line is y = 1.880x − 1.749.
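For readers who want to check the arithmetic outside a spreadsheet, here is a minimal Python sketch of the standard least-squares slope and intercept calculation. The function name `fit_line` and the sample values are hypothetical and chosen only for illustration. They are not the tutorial's 9-point fluorescence data, so the result will not match the equation quoted above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (b, a) for the least-squares line y = b*x + a."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired (x, y) values")

    # The means of x and y (what AVERAGE returns in Excel).
    x_bar = mean(xs)
    y_bar = mean(ys)

    # Sum of products of deviations, and sum of squared x deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return b, a


# Hypothetical illustration data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

b, a = fit_line(xs, ys)
print(f"y = {b:.3f}x + {a:.3f}")
```

Because the sample points lie exactly on a straight line, the fitted slope and intercept reproduce that line, which makes the sketch easy to verify by hand.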

Figure 2 shows an example of a regression line with the calibration data, centroid and y-residuals displayed. Note that, as is commonly the case, it is assumed that any error in the data lies solely in the y-values. Technically, the best-fit straight line shown is termed the "line of regression of y on x". This method for linear regression assumes that the errors are normally distributed. Other methods exist that do not make this type of assumption.

Figure 2 - XY scatter plot showing the centroid (red circle), regression line, and y-residuals.
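The quantities shown in the figure can also be computed directly. The sketch below uses hypothetical data (not the tutorial's calibration set) to find the centroid, which is the point whose coordinates are the mean x and mean y, and the y-residuals, which are the vertical distances between each observed y and the fitted line. It then checks two general properties of a least-squares line of y on x: the line passes through the centroid, and the y-residuals sum to zero within rounding error.

```python
from statistics import mean

# Hypothetical calibration-style data, for illustration only.
xs = [0.0, 2.0, 4.0, 6.0, 8.0]
ys = [0.1, 3.9, 8.2, 11.8, 16.0]

# Centroid of the data: (mean of x, mean of y).
x_bar, y_bar = mean(xs), mean(ys)

# Least-squares slope and intercept for the regression of y on x.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx
a = y_bar - b * x_bar

# y-residuals: observed y minus the y predicted by the line.
residuals = [y - (b * x + a) for x, y in zip(xs, ys)]

# The fitted line evaluated at x_bar should reproduce y_bar.
centroid_on_line = abs((b * x_bar + a) - y_bar) < 1e-9
# The residuals should cancel out when summed.
residuals_cancel = abs(sum(residuals)) < 1e-9

print("centroid:", (x_bar, y_bar))
print("line passes through centroid:", centroid_on_line)
print("residuals sum to zero:", residuals_cancel)
```

Only the vertical (y) distances are used here, which reflects the assumption that all error lies in the y-values.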

Finally, it should be noted that errors in y values for large x values tend to distort or skew the best-fit line. This can be taken into account using either a weighted or robust regression technique. However, this is beyond the scope of the present tutorial.

Now, proceed to the next page for a discussion of errors in the regression equation.

© 2006 Dr. David C. Stone & Jon Ellis, Chemistry, University of Toronto
Last updated: September 26th, 2006