In the first part of this tutorial, we saw how to use the trendline feature in
Excel to fit a straight line through calibration data and obtain both
the equation of the best-fit straight line and the correlation
coefficient, *R* (sometimes written as *R*^{2}).

There are in fact various correlation coefficients, but the one we are
interested in here is the Pearson or product-moment correlation
coefficient. The Pearson *R* value provides a measure of the degree to
which the values of *x* and *y* are linearly correlated. We can
assess this visually using a scatter plot (Figure 1), in which we also mark the
centroid of the data, .

Figure 1 - XY scatter plot showing the centroid of the data

If *x* and *y* were linearly correlated, we would expect all the points to fall
on a straight line passing through the centroid. As a result, we would expect
all *x* values to be uniformly distributed either side of ; similarly, all the
*y* values should be uniformly distributed about . The Pearson R is calculated
using the formula

If *x* and *y* are perfectly correlated in a linear fashion, we
would expect the value of *R* to be either +1 or -1, depending on whether
*y* increases (positive slope) or decreases (negative slope) with *x*.

To demonstrate how to calculate this formula in Excel, we return to our previous example of fluorescence intensity data. Then,

- Set up a spreadsheet with the
*x*and_{i}*y*values in columns_{i} - In the adjacent cells, set up expressions for ,
, their squares, and their product. For instance,
the formula for may look like
`=B3-AVERAGE(B$2:B$8)`

, depending on the location of your cells in the spreadsheets. - Determine the sums of squares and ,
and the sum of products in Excel and insert these values in the formula for
*R*. - To calculate the square root in the denominator, use the
`SQRT`

function.

The easiest way to calculate *R* in Excel is by setting up a table to
calculate the required values, as shown below. This yields a
correlation coefficient *R*^{2}=0.9978, so the data are
well-correlated and the best-fit line describes the data.

The following equation is an alternate and simpler way to calculate the correlation coefficient. It is left as an excercise for the reader to verify and implement this formula.

A few points to mention regarding the correlation coefficient:

- It is essential to retain a large number of significant figures in the
numerator and denominator during the calculation, otherwise a misleading value
of
*R*may be obtained. - Even a high
*R*value of, say, 0.9991 does not necessarily indicate that the data fits to a straight line. The trendline should always be plotted and inspected visually.*R*^{2}is more discriminating in this respect, although it no longer indicates the slope of the regression line. This, however, is evident by inspection. - Any curvature in the data will result in erroneous conclusions about the
correlation.
*R*values are only applicable to linear correlations. Nonlinear correlations are possible, but involve a different measure than*R*, and*R*values will not necessarily be close to 1. - The statistical significance of R depends on the number of samples in the
data set
*n*.

We have seen how to calculate the correlation coefficient using Excel. On the next page, we discuss the use of linear portions of non-linear curves to estimate a calibration equation.