The Wilcoxon Signed-Rank Test
Learn how and when to use the Wilcoxon signed-rank test.
What is the Wilcoxon signed rank test?
The Wilcoxon signed-rank test, also known as the Wilcoxon signed-rank sum test and the Wilcoxon matched-pairs test, is a non-parametric statistical test used to compare two dependent samples (in other words, two groups consisting of data points that are matched or paired). As with other non-parametric tests, it does not assume that the data being analyzed follow a specific distribution, such as the Normal distribution. The parametric equivalent to the Wilcoxon signed-rank test is the dependent samples t-test (or paired t-test).
The Wilcoxon signed-rank test can also be used to compare one sample to a specified value, as a non-parametric equivalent to a one-sample z-test or t-test. However, we focus here on the use of the Wilcoxon signed-rank test in the analysis of paired data. As with the Mann-Whitney U test and the Kruskal-Wallis test, the Wilcoxon signed-rank test is based on ranks assigned to the data points rather than the actual observed data.
The hypotheses for the Wilcoxon signed rank test for paired data are as follows:
- The null hypothesis (H0) is that the difference between the paired observations in the population is zero.
- The alternative hypothesis (H1) is that the difference between the paired observations is not equal to zero.
As with the Mann-Whitney U test, you can interpret this in terms of medians, here as a test of whether the median of the differences between the paired observations is zero, but note that the median is not actually involved in calculating the test statistic.
When to use the Wilcoxon signed-rank test
As this is a non-parametric test of the magnitude of difference between paired data, it follows that the variable of interest should be continuous (able to take any number in a range) or discrete (able to take only certain values). It is also important to understand what we mean when we say data are paired. Paired data arise when each observation in one sample is uniquely matched or related to an observation in the other sample. This may be due to them coming from the same individual (such as measurements of blood pressure before and after a treatment) or from related individuals (such as a brother and sister, or matched participants from the drug treatment and control arms of a clinical trial). It is important that paired data are analyzed as such and not treated as independent samples.
Non-parametric tests, also known as distribution-free tests, make no assumptions about the shape of the distributions of your data. They are used to test hypotheses when the assumption of normality is not met. Usually, this will be in the context of small data sets that are not normally distributed.
Wilcoxon signed-rank test example
In a clinical trial conducted to evaluate the effectiveness of a new pain relief medication, 8 patients were given the medication and rated their pain level on a scale of 1 to 10 both before and after taking the drug. As each patient contributes a pair of data points for a discrete numeric variable, the sample size is small and normality cannot be assumed, the Wilcoxon signed-rank test for paired data is suitable for testing the differences between the pairs. This can be done by hand in four steps.
Step one: Present the null and alternative hypotheses
The hypotheses to be tested in this example are as follows:
- The null hypothesis (H0) is that there is no difference in pain ratings between the before and after measurements, or that the median difference between the pairs is zero.
- The alternative hypothesis (H1) is that there is a difference in pain ratings between the before and after measurements, or that the median difference between the pairs is not equal to zero.
Step two: Calculate the differences between paired measurements and rank them
To find the differences between the pairs of data, we subtract the pain rating after treatment from the pain rating before treatment. The measurements and their differences are shown in Table 1. Note that when the pain rating increases after treatment, the difference is presented as a negative value. We then rank the differences by their absolute size, from the smallest (rank 1) to the largest (rank 8), ignoring whether each difference is positive or negative.
Patient ID | Pain rating (before treatment) | Pain rating (after treatment) | Difference | Ranks |
1 | 8.0 | 6.5 | 1.5 | 2 |
2 | 6.0 | 5.0 | 1.0 | 1 |
3 | 3.5 | 5.5 | -2.0 | 3 |
4 | 9.5 | 4.0 | 5.5 | 8 |
5 | 10.0 | 6.5 | 3.5 | 5 |
6 | 8.0 | 3.5 | 4.5 | 7 |
7 | 9.0 | 5.0 | 4.0 | 6 |
8 | 7.0 | 10.0 | -3.0 | 4 |
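Table 1: Pain ratings before and after treatment, the differences (before minus after) and the ranks of the absolute differences.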
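The differences and ranks in Table 1 can be reproduced with a short script. The following is a minimal sketch in plain Python, not part of the original worked example: the function names `signed_differences` and `rank_absolute` are hypothetical, and the handling of tied values (sharing the average rank) is an assumption, since no ties occur in this data set.

```python
# Pain ratings from Table 1, in patient order (patients 1 to 8).
before = [8.0, 6.0, 3.5, 9.5, 10.0, 8.0, 9.0, 7.0]
after = [6.5, 5.0, 5.5, 4.0, 6.5, 3.5, 5.0, 10.0]


def signed_differences(before, after):
    """Return before minus after for each pair (negative = pain increased)."""
    return [b - a for b, a in zip(before, after)]


def rank_absolute(diffs):
    """Rank |d| from smallest (rank 1) to largest.

    Assumption: tied absolute values share the average of their ranks.
    Table 1 contains no ties, so this branch is not exercised here.
    """
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(len(abs_d)), key=lambda i: abs_d[i])
    ranks = [0.0] * len(abs_d)
    i = 0
    while i < len(order):
        # Find the end (j) of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


diffs = signed_differences(before, after)
ranks = rank_absolute(diffs)

for patient, (d, r) in enumerate(zip(diffs, ranks), start=1):
    print(patient, d, r)
```

The printed differences and ranks should match the last two columns of Table 1.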
Step three: Calculate the sums of the ranks to find the test statistic
Next, we use the ranks and calculate a sum of the ranks for the negative differences (W^{-}) and a sum of the ranks for the positive differences (W^{+}).
W^{-} = 3 + 4 = 7
W^{+} = 2 + 1 + 8 + 5 + 7 + 6 = 29
The Wilcoxon signed-rank test statistic is taken as the smaller of the two sums of the ranks, so in this case, our test statistic is W=7.
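The rank sums can be checked in a few lines. This is a small illustrative sketch that hard-codes the differences and ranks from Table 1; the variable names are ours. It also checks that the two sums add up to n(n+1)/2, which is a quick way to catch arithmetic slips when working by hand.

```python
# Differences (before minus after) and ranks of |difference| from Table 1.
diffs = [1.5, 1.0, -2.0, 5.5, 3.5, 4.5, 4.0, -3.0]
ranks = [2, 1, 3, 8, 5, 7, 6, 4]

# Sum the ranks separately for negative and positive differences.
w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)

# Sanity check: with no zero differences, the two sums total n(n+1)/2.
n = len(diffs)
assert w_minus + w_plus == n * (n + 1) // 2

# The test statistic is the smaller of the two rank sums.
w = min(w_minus, w_plus)
print(w_minus, w_plus, w)
```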
Step four: Obtain and interpret the p-value
Next, we determine a critical value of W with which to compare our calculated test statistic. We do this using a reference table of critical values. In this case, for our sample size of n=8, a significance level of 0.05 and a two-sided test (not specifying the direction of difference), our critical value is 3. The null hypothesis is rejected only if the calculated W is less than or equal to the critical value. Our calculated W statistic (W=7) is larger than the critical value, hence our p-value is greater than 0.05.
We therefore do not reject the null hypothesis in this example: these data provide insufficient evidence of a difference between the pain ratings before and after treatment.
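The critical-value lookup in step four can be cross-checked by brute force for a sample this small. The sketch below is an illustration rather than the method used in the worked example: it assumes no tied or zero differences, and the function names `exact_two_sided_p` and `critical_value` are hypothetical. Under the null hypothesis each rank is equally likely to belong to a positive or a negative difference, so we can enumerate all 2^n sign patterns and count how many give a rank sum at least as small as the one observed.

```python
from itertools import product


def exact_two_sided_p(w, n):
    """Exact two-sided p-value for the smaller rank sum w.

    Assumes n non-zero differences with no ties, so the ranks are 1..n.
    """
    count = 0
    for signs in product((0, 1), repeat=n):
        # Sum of the ranks flagged by this sign pattern.
        rank_sum = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if rank_sum <= w:
            count += 1
    # Double the one-tail probability (the null distribution is symmetric).
    return min(1.0, 2 * count / 2 ** n)


def critical_value(n, alpha=0.05):
    """Largest w with a two-sided exact p-value <= alpha, or None if none exists."""
    best = None
    for w in range(n * (n + 1) // 2 + 1):
        if exact_two_sided_p(w, n) <= alpha:
            best = w
        else:
            break  # the p-value only grows as w increases
    return best


p_value = exact_two_sided_p(7, 8)
crit = critical_value(8)
print(p_value, crit)
```

For n=8 and W=7 this gives an exact two-sided p-value of 38/256, roughly 0.148, and a critical value of 3 at the 0.05 level, in agreement with the reference table. Enumeration is only practical for small samples, since the number of sign patterns doubles with each additional pair.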