
# Spearman Rank Correlation

Credit: Technology Networks.

It is common in statistics to be interested in the relationship between two quantitative variables, such as age and height. Spearman’s rank correlation is a flexible statistical tool that assesses the strength and direction of the relationship between two quantitative, ranked variables. In this article, we will explore the theory, assumptions and interpretation of Spearman’s rank correlation, and work through an example of calculating Spearman’s rank correlation coefficient, often referred to as Spearman’s ρ (“rho”).

## What is Spearman’s rank correlation coefficient, or Spearman rho?

Spearman’s rank correlation is a statistical technique used to understand the relationship between two variables when the relationship does not conform to a linear pattern. Spearman’s rank correlation coefficient, denoted by ρ (pronounced “rho”) or sometimes by rs, measures the strength and direction of an association between two variables in ranked or ordered data. It is a non-parametric measure of association, meaning it makes no assumption about the distribution of the data and does not require linearity (where the relationship follows a straight line) between the two variables. This is why it is used as an alternative to Pearson’s correlation coefficient, which is sensitive to the assumption of linearity. Spearman’s rank correlation coefficient works by ranking the observations in a dataset and calculating the correlation between the ranks rather than the observations themselves.

## Spearman rank correlation formula

Once the ranks for the two variables are found, we apply the formula for the correlation to the ranks as follows:

ρ = 1 − (6 Σ dᵢ²) / (n(n² − 1))

where:

ρ = Spearman’s rank correlation coefficient

dᵢ = difference between the two ranks of each observation in the dataset

n = number of observations
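As an illustrative sketch (not part of the original article), the ranking-and-formula procedure can be written in plain Python. It assumes no tied ranks, in which case the dᵢ² formula is exact:

```python
def rank_desc(values):
    # Rank observations from 1 (largest) to n (smallest); assumes no ties.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman_rho(x, y):
    # Spearman's rho via the sum-of-squared-rank-differences formula.
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_desc(x), rank_desc(y)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# A perfectly monotonic (but non-linear) relationship yields rho = 1:
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # -> 1.0
```

Because only the ranks enter the formula, any strictly increasing transformation of either variable leaves ρ unchanged.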

## Spearman test assumptions

There are two key assumptions for the Spearman’s rank correlation coefficient:

• The data should be on the ordinal or continuous scale. An example of an ordinal variable is a survey question that ranks a five-point satisfaction scale from “most satisfied” to “least satisfied”. Examples of continuous variables are height, temperature and test performance (0–100).
• Calculating Spearman’s ρ requires the two variables to have a monotonic relationship. Monotonic means that as one variable increases (or decreases) the other variable also increases (or decreases). This is so that ranks can be formed between them. Figure 1 shows example scatter plots of two monotonic relationships and a non-monotonic relationship between two variables:

Figure 1: Scatter plots showing examples of monotonic and non-monotonic relationships. Credit: Technology Networks.

## Spearman vs Pearson correlation and when to use Spearman correlation coefficient

Both Spearman’s rank and Pearson’s correlation tests share the purpose of assessing the strength and direction (negative or positive) of an association between two variables. For both correlation coefficients it is possible to conduct a hypothesis test, as with other statistical tests, to quantify the strength of evidence for correlation between the two variables of interest.

Pearson’s correlation assumes normality and a linear relationship between the variables, while Spearman’s rank is non-parametric and assumes neither specific distributions nor a linear relationship. It follows that Pearson’s correlation coefficient is calculated based on standard deviation and covariance, whereas Spearman’s rank correlation coefficient is based on ranking the data points and measures monotonic association.
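To make the contrast concrete, here is a small sketch (the data are invented for illustration, and SciPy is assumed to be available) in which the relationship is perfectly monotonic but strongly non-linear:

```python
import math
from scipy import stats  # assumes SciPy is installed

# Invented data: y grows exponentially with x: monotonic, but far from linear.
x = list(range(1, 11))
y = [math.exp(v) for v in x]

pearson_r, _ = stats.pearsonr(x, y)
spearman_rho, _ = stats.spearmanr(x, y)

print(f"Pearson's r:    {pearson_r:.3f}")     # noticeably below 1
print(f"Spearman's rho: {spearman_rho:.3f}")  # 1: the ranks agree perfectly
```

Pearson’s r is pulled well below 1 by the curvature, while Spearman’s ρ is 1 because each increase in x is matched by an increase in y.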

## Spearman correlation interpretation

Spearman’s ρ can range from −1 to +1, with the sign of the coefficient indicating a negative or positive monotonic relationship. A positive correlation means that as one variable increases the other tends to increase; a negative correlation means that as one variable increases the other tends to decrease. Values closer to −1 or +1 indicate stronger relationships, whereas values closer to 0 indicate weaker relationships.

An important point to note in the interpretation of correlation coefficients is that correlation does not necessarily imply a causal relationship between the two variables. Additional methods such as multivariable regression analysis and formalized causal thinking are needed to assess causality between variables in data research.

## Spearman correlation test example

Let us suppose a university lecturer is interested in comparing two sets of exam scores from their class of 16 students to see whether the two sets of exam results are correlated. The general chemistry and mathematics exams were both scored between 0 and 100.

Step one is to calculate the ranks for each student’s exam scores, the differences between the ranks and the differences squared, tabulated as follows (Table 1):

Table 1: Students’ math and chemistry exam scores and values required to calculate Spearman’s correlation.

| Student | Math Score (X) | Rank (X) | Chemistry Score (Y) | Rank (Y) | Difference (d) | d² |
|---|---|---|---|---|---|---|
| 1 | 85 | 2 | 78 | 4 | 2 | 4 |
| 2 | 80 | 3 | 82 | 3 | 0 | 0 |
| 3 | 90 | 1 | 88 | 1 | 0 | 0 |
| 4 | 70 | 5 | 75 | 5 | 0 | 0 |
| 5 | 60 | 6 | 65 | 6 | 0 | 0 |
| 6 | 75 | 4 | 85 | 2 | 2 | 4 |
| 7 | 54 | 8 | 60 | 7 | 1 | 1 |
| 8 | 55 | 7 | 55 | 8 | 1 | 1 |
| 9 | 45 | 9 | 40 | 9 | 0 | 0 |
| 10 | 35 | 10 | 32 | 11 | 1 | 1 |
| 11 | 25 | 13 | 30 | 12 | 1 | 1 |
| 12 | 11 | 14 | 36 | 10 | 4 | 16 |
| 13 | 4 | 16 | 16 | 14 | 2 | 4 |
| 14 | 10 | 15 | 5 | 16 | 1 | 1 |
| 15 | 27 | 12 | 6 | 15 | 3 | 9 |
| 16 | 30 | 11 | 25 | 13 | 2 | 4 |

It is also always useful to visualize our data using a scatter plot to check that the assumption of a monotonic relationship is met (Figure 2):

Figure 2: A scatter plot of the exam result data. Credit: Elliot McClenaghan.

Step two is to sum the squared differences (d²), which in this case sum to 46, and calculate Spearman’s rank correlation coefficient using the formula:

ρ = 1 − (6 × 46) / (16(16² − 1)) = 1 − 276/4080 ≈ 0.93
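As a check on the hand calculation, the same coefficient can be obtained directly from the raw scores with SciPy (a sketch assuming SciPy is available; `spearmanr` ranks the data internally):

```python
from scipy import stats  # assumes SciPy is installed

# Raw exam scores for the 16 students from Table 1.
math_scores = [85, 80, 90, 70, 60, 75, 54, 55, 45, 35, 25, 11, 4, 10, 27, 30]
chem_scores = [78, 82, 88, 75, 65, 85, 60, 55, 40, 32, 30, 36, 16, 5, 6, 25]

rho, p_value = stats.spearmanr(math_scores, chem_scores)
print(f"Spearman's rho = {rho:.2f}")  # -> Spearman's rho = 0.93
```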

Step three is to conduct a hypothesis test for the relationship between the two variables using Spearman’s ρ.

Our hypotheses for this test are as follows:

• Null hypothesis (H0): the two variables are independent; there is no correlation.
• Alternative hypothesis (H1): there is a monotonic association between the two variables (as one increases, the other tends to increase or decrease).

Next, we calculate the test statistic (t). With a sample size greater than 10, as in our example, the test statistic can be calculated as:

t = ρ √((n − 2) / (1 − ρ²))

which follows a t-distribution with n − 2 degrees of freedom under the null hypothesis.
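Plugging in the values from our example (a sketch; it takes n = 16 and the coefficient ρ ≈ 0.932 computed from the exam data):

```python
import math

rho = 0.932  # Spearman's rho for the exam data (to three decimal places)
n = 16       # number of students

# t-statistic for testing H0 (no association), with n - 2 degrees of freedom
t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
print(f"t = {t:.2f} on {n - 2} degrees of freedom")  # -> t = 9.62 on 14 degrees of freedom
```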

The test statistic can then be used to look up the one-sided or two-sided p-value using a t-distribution table. The p-value is the probability of observing data as extreme as, or more extreme than, the observed results, assuming the null hypothesis is true. The t-distribution is a probability distribution that gives the probabilities of different outcomes for an experiment and is commonly used in statistical hypothesis testing.

The two-sided p-value is usually of more interest, as it covers both directions of the effect and better represents the hypotheses of our test. In this case, we find a two-sided p-value of p < 0.001. In practice this lookup, like the whole calculation of Spearman’s ρ and the hypothesis test, can be done using statistical software. Our p-value indicates strong evidence against the null hypothesis, that is, evidence of a correlation between the math scores and chemistry scores.
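The table lookup itself can also be sketched in software, here with SciPy's t-distribution (assuming SciPy is available; the test statistic t ≈ 9.62 on n − 2 = 14 degrees of freedom is taken from the example calculation):

```python
from scipy import stats  # assumes SciPy is installed

t_stat = 9.62  # test statistic for the exam data (assumed from the example)
df = 14        # n - 2 degrees of freedom

# The survival function gives P(T > t); doubling it gives the two-sided p-value.
p_two_sided = 2 * stats.t.sf(t_stat, df)
print(f"two-sided p = {p_two_sided:.1e}")  # far below 0.001
```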