The Kruskal–Wallis Test
What is the Kruskal–Wallis test?
The Kruskal–Wallis test is a statistical test used to compare two or more independent groups on a continuous or discrete variable. It is a non-parametric test, meaning that it assumes no particular distribution for your data, and it is the non-parametric analogue of the one-way analysis of variance (ANOVA). The Kruskal–Wallis test is sometimes referred to as the one-way ANOVA on ranks or the Kruskal–Wallis one-way ANOVA.
The hypotheses of the Kruskal–Wallis test are as follows:
- The null hypothesis (H0) is that the population medians of all groups are equal.
- The alternative hypothesis (H1) is that at least one group's population median differs from the population median of at least one other group.
Kruskal–Wallis test assumptions
Assumptions for the Kruskal–Wallis test are detailed below:
- Data are assumed to be non-Normal or to follow a skewed distribution. If data follow a Normal distribution, one-way ANOVA should be used instead.
- The variable of interest should be measured across two or more independent groups. The test is most commonly used in the analysis of three or more groups; for two groups, the Mann–Whitney U test should be used instead.
- The data are assumed to take a similar distribution across the groups.
- The data should be randomly selected independent samples, in that the groups should have no relationship to each other.
- Each group sample should have at least 5 observations for a sufficient sample size.
These assumptions are similar to those of the Mann–Whitney U test, as the Kruskal–Wallis test is essentially an extension of that test to more than two independent samples. Like the Mann–Whitney U test, the Kruskal–Wallis test is based on ranking the data and calculating a test statistic.
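To illustrate the first assumption, one common way to judge whether one-way ANOVA or the Kruskal–Wallis test is more appropriate is a normality check on each group. Below is a minimal sketch using SciPy's Shapiro–Wilk test with made-up example data (the group values are purely illustrative, not from this article):

```python
from scipy.stats import shapiro

# Hypothetical example data for two groups (illustrative only)
group_a = [2, 3, 3, 4, 5, 5, 6, 7, 8, 30]   # right-skewed: one large value
group_b = [4, 5, 5, 6, 6, 7, 7, 8, 9, 10]

for name, data in [("group_a", group_a), ("group_b", group_b)]:
    stat, p = shapiro(data)
    # A small p-value suggests the data deviate from a Normal distribution
    print(f"{name}: W = {stat:.3f}, p = {p:.4f}")
```

If any group's data look clearly non-Normal, a non-parametric test such as Kruskal–Wallis is the safer choice, particularly with small samples where normality is hard to establish.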
When to use the Kruskal–Wallis test
The Kruskal–Wallis test and other non-parametric (or distribution-free) tests are useful for testing hypotheses when the assumption of normality does not hold. They make no assumptions about the shape of data distributions, which makes them particularly useful when a dataset is small. It is important to note that non-parametric statistical tests tend to give a more conservative result (a larger p-value) than their parametric counterparts. The Kruskal–Wallis test should be used when the variable of interest is continuous (taking any value within a range, e.g. age, height, blood pressure) or discrete (taking a countable value, e.g. shoe size, number of hospital visits, number of people in a household).
Kruskal–Wallis test by hand
A researcher working in the field of psychology may be interested in the relationship between the sleeping habits of young people and their mental wellbeing. They conduct a small survey of 15 young people who report sleeping either more than 8 hours, 6–8 hours or less than 6 hours per night on average. They then measure their mental wellbeing using a validated score. Table 1 shows the raw wellbeing scores collected across the sleeping categories along with the median wellbeing score in each category.
| Hours of sleep per night | Wellbeing score values | Median |
| --- | --- | --- |
| >8 hours | 42, 34, 57, 69, 55 | 55 |
| 6–8 hours | 29, 66, 46, 68, 42 | 46 |
| <6 hours | 16, 32, 35, 66, 59 | 35 |

Table 1
Since we are working with a discrete outcome variable, three independent groups and a small sample size, and cannot assume a Normal distribution in the groups, the Kruskal–Wallis test is appropriate for testing whether there is a difference in wellbeing scores across sleep categories. Widely used statistical software can perform this test easily, but we can also work through it by hand in five steps.
Step one: Present the null and alternative hypotheses
The hypotheses in this example are as follows:
- The null hypothesis (H0) is that the median wellbeing score is equal across sleeping groups, or that the difference between the medians is zero.
- The alternative hypothesis (H1) is that in at least one sleeping group the population median wellbeing score differs from the population median of one of the other groups.
Step two: Sort and assign ranks to the data
Next we sort the data from all groups into ascending order and assign ranks to the wellbeing scores, as shown in Table 2.
| Sleep category | <6 | 6–8 | <6 | >8 | <6 | >8 | 6–8 | 6–8 | >8 | >8 | <6 | 6–8 | <6 | 6–8 | >8 |
| Score | 16 | 29 | 32 | 34 | 35 | 42 | 42 | 46 | 55 | 57 | 59 | 66 | 66 | 68 | 69 |
| Rank | 1 | 2 | 3 | 4 | 5 | 6.5 | 6.5 | 8 | 9 | 10 | 11 | 12.5 | 12.5 | 14 | 15 |

Table 2
Note that when two scores are tied, each is assigned the average of the ranks they would have received had they not been tied.
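This ranking step can be reproduced programmatically; below is a minimal sketch using SciPy's rankdata function, which averages the ranks of tied values by default, applied to the pooled scores from Table 2:

```python
from scipy.stats import rankdata

# Wellbeing scores from all three sleep groups, sorted in ascending order
scores = [16, 29, 32, 34, 35, 42, 42, 46, 55, 57, 59, 66, 66, 68, 69]

# rankdata assigns ranks 1..n, averaging the ranks of tied values
ranks = rankdata(scores)
print(list(ranks))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.5, 6.5, 8.0, 9.0, 10.0, 11.0, 12.5, 12.5, 14.0, 15.0]
```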
Step three: Add up the ranks for each group
Next, we find the total of the ranks in each of the sleeping groups, which we can call "T_j", by simply adding together the ranks for each group using the information in Table 2:
- T_1 (rank total for the <6 hours sleep group): 1 + 3 + 5 + 11 + 12.5 = 32.5
- T_2 (rank total for the 6–8 hours sleep group): 2 + 6.5 + 8 + 12.5 + 14 = 43
- T_3 (rank total for the >8 hours sleep group): 4 + 6.5 + 9 + 10 + 15 = 44.5
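These rank totals can be verified in a few lines of Python; a sketch assuming the group labels and ranks laid out in Table 2:

```python
# Ranks from Table 2, paired with the sleep group each score belongs to
groups = ["<6", "6-8", "<6", ">8", "<6", ">8", "6-8", "6-8",
          ">8", ">8", "<6", "6-8", "<6", "6-8", ">8"]
ranks = [1, 2, 3, 4, 5, 6.5, 6.5, 8, 9, 10, 11, 12.5, 12.5, 14, 15]

# Sum the ranks within each group to get the rank totals T_j
totals = {g: sum(r for grp, r in zip(groups, ranks) if grp == g)
          for g in ("<6", "6-8", ">8")}
print(totals)  # {'<6': 32.5, '6-8': 43.0, '>8': 44.5}
```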
Step four: Calculate the H statistic
As with other statistical tests, we assess the hypothesis using a test statistic, which in the case of the Kruskal–Wallis test is called an H statistic. The H statistic is given by the following formula:
$$H = \frac{12}{n(n+1)} \sum_{j=1}^{c} \frac{T_j^2}{n_j} - 3(n+1)$$
In the formula, n is the total number of observations across all groups (n = 15 in our example), T_j is the rank total for each group (T_1 = 32.5, T_2 = 43 and T_3 = 44.5) and n_j is the number of observations in each group (n_1 = 5, n_2 = 5 and n_3 = 5). The value 12 is a constant in this formula, as it occurs naturally in relation to the mean of the sum of squares between ranked groups.
The first part of the formula to solve is the summation term, which represents taking each group's rank total, squaring it and dividing the result by the number of observations in that group, before adding these values together. In the formula, j = 1 tells us the first value of the sum and c is the final value (in our example c = 3, as there are 3 groups).
$$\sum_{j=1}^{c} \frac{T_j^2}{n_j} = \frac{32.5^2}{5} + \frac{43^2}{5} + \frac{44.5^2}{5} = 211.25 + 369.8 + 396.05 = 977.1$$
Next, we can plug this value and the total number of observations into the full formula to find H:
$$H = \frac{12}{15(15+1)} \times 977.1 - 3(15+1) = 0.05 \times 977.1 - 48 = 48.855 - 48 = 0.855$$
This gives us our test statistic of H = 0.855. The degrees of freedom (df) for this test are given by the number of groups minus one, so we have 2 df.
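Steps three and four can also be checked in code; the sketch below reproduces the by-hand formula exactly (with no correction for tied ranks, matching the calculation above):

```python
# Rank totals and group sizes from steps two and three
rank_totals = [32.5, 43.0, 44.5]   # T_1, T_2, T_3
group_sizes = [5, 5, 5]            # n_1, n_2, n_3
n = sum(group_sizes)               # total observations, n = 15

# H = 12 / (n(n+1)) * sum(T_j^2 / n_j) - 3(n+1)
sum_term = sum(t**2 / m for t, m in zip(rank_totals, group_sizes))
H = 12 / (n * (n + 1)) * sum_term - 3 * (n + 1)
print(round(H, 3))  # 0.855
```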
Step five: Obtain and interpret the p-value
Our final step is to compare the H statistic with a critical value from the chi-squared distribution, a theoretical distribution often used in non-parametric statistics, and interpret the resulting p-value. In our example, with 2 df and a 0.05 significance level, the critical value is 5.991. Since our calculated H statistic of 0.855 is much smaller than this, we obtain a p-value of >0.05 (the exact p-value calculated using statistical software is p = 0.652). This means the likelihood of obtaining a value of H at least as large as ours by chance alone is 0.652. This is a large p-value, so we conclude that there is insufficient evidence to reject the null hypothesis; the data do not provide evidence of a difference in wellbeing scores across the three sleeping groups.
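In practice, the whole test is a single function call in most statistical software. Below is a sketch using SciPy's kruskal function with the raw scores from Table 1; note that SciPy applies a small correction for the tied ranks, so its H statistic (≈ 0.858) and p-value differ very slightly from the uncorrected hand calculation above:

```python
from scipy.stats import kruskal

# Wellbeing scores by sleep category, from Table 1
more_than_8 = [42, 34, 57, 69, 55]
six_to_8    = [29, 66, 46, 68, 42]
less_than_6 = [16, 32, 35, 66, 59]

stat, p = kruskal(more_than_8, six_to_8, less_than_6)
print(f"H = {stat:.3f}, p = {p:.3f}")  # H = 0.858, p = 0.651
```

Either way, the p-value is far above 0.05, matching the conclusion of the hand calculation.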