Stats Tutorial - The Correlation Coefficient:


In the first part of this tutorial, we saw how to use the trendline feature in Excel to fit a straight line through calibration data and obtain both the equation of the best-fit straight line and the correlation coefficient, R (which Excel's trendline reports as its square, R²).

There are in fact various correlation coefficients, but the one we are interested in here is the Pearson or product-moment correlation coefficient. The Pearson R value provides a measure of the degree to which the values of x and y are linearly correlated. We can assess this visually using a scatter plot (Figure 1), in which we also mark the centroid of the data, (x̄, ȳ).


Figure 1 - XY scatter plot showing the centroid of the data


If x and y were linearly correlated, we would expect all the points to fall on a straight line passing through the centroid. As a result, we would expect all the x values to be uniformly distributed either side of x̄; similarly, all the y values should be uniformly distributed about ȳ. The Pearson R is calculated using the formula

$$R = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

If x and y are perfectly correlated in a linear fashion, we would expect the value of R to be either +1 or -1, depending on whether y increases (positive slope) or decreases (negative slope) with x.
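
As a rough illustration of the formula and of the ±1 limits, the sketch below evaluates R directly from its definition for two exactly linear data sets. The function name `pearson_r` and the test values are assumptions made for this example only; they do not come from the tutorial's data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient, from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / len(x)          # x-coordinate of the centroid
    y_bar = sum(y) / len(y)          # y-coordinate of the centroid
    dx = [xi - x_bar for xi in x]    # deviations (xi - x_bar)
    dy = [yi - y_bar for yi in y]    # deviations (yi - y_bar)
    s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of products
    s_xx = sum(a * a for a in dx)              # sum of squares in x
    s_yy = sum(b * b for b in dy)              # sum of squares in y
    # Note: fails with a division by zero if all x (or all y) are identical.
    return s_xy / math.sqrt(s_xx * s_yy)


# Invented, exactly linear test data (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(x, [2 * xi + 1 for xi in x])      # positive slope
r_down = pearson_r(x, [-3 * xi + 10 for xi in x])  # negative slope
print(r_up, r_down)
```

For these two lines the printed values should be +1 and -1 to within floating-point rounding; real calibration data with scatter would give a magnitude below 1.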


To demonstrate how to evaluate this formula in Excel, we return to our previous example of fluorescence intensity data and proceed as follows:

  1. Set up a spreadsheet with the xi and yi values in columns
  2. In the adjacent cells, set up expressions for (xi - x̄), (yi - ȳ), their squares, and their product. For instance, the formula for one of these deviations may look like =B3-AVERAGE(B$2:B$8), depending on the location of your cells in the spreadsheet.
  3. Determine the sums of squares Σ(xi - x̄)² and Σ(yi - ȳ)², and the sum of products Σ(xi - x̄)(yi - ȳ), in Excel and insert these values in the formula for R.
  4. To calculate the square root in the denominator, use the SQRT function.
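
The steps above can also be sketched outside Excel. The following minimal Python version builds the same columns (deviations, squares, and products) and then combines their sums. The variable names `conc` and `signal` and all the numbers are hypothetical, invented for illustration; they are not the fluorescence values used in this tutorial.

```python
import math

# Hypothetical calibration data, invented for illustration only
# (NOT the fluorescence intensity values from the tutorial).
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]       # x_i column
signal = [2.1, 5.0, 9.0, 12.6, 17.3, 21.0, 24.7]   # y_i column

x_bar = sum(conc) / len(conc)       # plays the role of =AVERAGE(...) on x
y_bar = sum(signal) / len(signal)   # plays the role of =AVERAGE(...) on y

# Step 2: one row per data point, with the adjacent "cells".
rows = []
for xi, yi in zip(conc, signal):
    dx = xi - x_bar
    dy = yi - y_bar
    rows.append((xi, yi, dx, dy, dx * dx, dy * dy, dx * dy))

# Step 3: column sums (what =SUM(...) would give for each column).
sum_dx2 = sum(row[4] for row in rows)
sum_dy2 = sum(row[5] for row in rows)
sum_dxdy = sum(row[6] for row in rows)

# Step 4: square root in the denominator (what =SQRT(...) would give).
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

print("x      y      dx     dy     dx^2    dy^2    dx*dy")
for row in rows:
    print("  ".join(f"{value:6.2f}" for value in row))
print(f"R = {r:.4f}, R^2 = {r * r:.4f}")
```

Laying the calculation out column by column like this makes it easy to compare each intermediate value against the corresponding spreadsheet cell.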

The easiest way to calculate R in Excel is by setting up a table to calculate the required values, as shown below. This yields a correlation coefficient of R² = 0.9978, so the data are well correlated and the best-fit line describes the data well.

Spreadsheet layout showing the calculation of the correlation coefficient

The following equation is an alternative and simpler way to calculate the correlation coefficient. It is left as an exercise for the reader to verify and implement this formula.
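
One widely used computational form of the Pearson coefficient works directly with the raw sums, so no deviations from the means are needed. It is an assumption that this is the form intended here. The symbol n, the number of data points, is not used elsewhere on this page.

```latex
\[
R = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}
         {\sqrt{\left[\, n \sum_i x_i^2 - \left(\sum_i x_i\right)^2 \right]
                \left[\, n \sum_i y_i^2 - \left(\sum_i y_i\right)^2 \right]}}
\]
```

Expanding the products in the definition of R and substituting the means as (Σxi)/n and (Σyi)/n leads to this expression, which is one way to carry out the verification.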

A few points to mention regarding the correlation coefficient:


We have seen how to calculate the correlation coefficient using Excel. On the next page, we discuss the use of linear portions of non-linear curves to estimate a calibration equation.

© 2006 Dr. David C. Stone & Jon Ellis, Chemistry, University of Toronto
Last updated: September 26th, 2006