Repeatability vs. Reproducibility
Article Mar 25, 2019 | by Ruairi J Mackenzie, Science Writer for Technology Networks
In measuring the quality of experiments, repeatability and reproducibility are key. In this article, we explore the differences between the two terms, and why they are important in determining the worth of published research.
Repeatability: the basics
Repeatability is a measure of the likelihood that, having produced one result from an experiment, you can try the same experiment, with the same setup, and produce that exact same result. It’s a way for researchers to verify that their own results are true and are not just chance artifacts.
To demonstrate a technique’s repeatability, the conditions of the experiment must be kept the same. These include:
- Measuring tools
- Other apparatus used in the experiment
- Time period (taking month-long breaks between repetitions isn’t good practice)
Bland and Altman authored an extremely useful paper in 1986 which highlighted one benefit of assessing repeatability: it allows one to make comparisons between different methods of measurement.
Previous studies had used similarities in the correlation coefficient (r) between techniques as an indicator of agreement. Bland and Altman showed that r actually measured the strength of the relation between two techniques, not the extent to which they agree with each other. This means r is quite meaningless in this context; if two different techniques were both designed to measure heart rate, it would be bizarre if they weren’t related to each other!
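The difference between correlation and agreement can be shown with a small sketch. The heart-rate readings below are made up for illustration, and `pearson_r` is a helper written for this example rather than anything taken from Bland and Altman's paper. The second technique reads exactly 10 beats per minute higher than the first, so the two are perfectly related but never agree.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical heart rates (beats per minute) for six subjects.
technique_a = [62, 70, 75, 81, 90, 104]
# Technique B is assumed to read a constant 10 bpm higher than technique A.
technique_b = [a + 10 for a in technique_a]

r = pearson_r(technique_a, technique_b)
mean_difference = mean(b - a for a, b in zip(technique_a, technique_b))

print(f"r = {r:.3f}, mean difference = {mean_difference} bpm")
```

The correlation is as strong as it can be, yet every single reading differs by 10 bpm, which is why r alone says little about whether two techniques can be used interchangeably.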
Bland and Altman showed that repeatability, on the other hand, can be used to compare two techniques. By calculating the mean of multiple measurements taken by a technique and pairing those means with those derived from the other technique, we can work out whether the two techniques agree.
If the differences between the two techniques follow a normal distribution, about 95% of them are expected to fall within the "limits of agreement": the mean difference plus or minus 1.96 times the standard deviation of the differences.
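As a minimal sketch of that calculation, the following computes the mean difference and the 95% limits on paired values. The function name `limits_of_agreement` and the readings are assumptions made for illustration. Each value stands in for one subject's mean of repeated measurements by a given technique.

```python
from statistics import mean, stdev

def limits_of_agreement(technique_a, technique_b):
    """Return (lower limit, mean difference, upper limit) for paired values."""
    differences = [a - b for a, b in zip(technique_a, technique_b)]
    bias = mean(differences)      # mean difference between the techniques
    spread = stdev(differences)   # sample standard deviation of the differences
    return bias - 1.96 * spread, bias, bias + 1.96 * spread

# Hypothetical per-subject means (beats per minute) from two techniques.
technique_a = [72, 80, 65, 90, 77]
technique_b = [70, 83, 64, 88, 79]

lower, bias, upper = limits_of_agreement(technique_a, technique_b)
print(f"mean difference: {bias:.2f} bpm")
print(f"95% limits of agreement: {lower:.2f} to {upper:.2f} bpm")
```

Whether limits of this width are acceptable is a judgment about the application, not something the calculation itself can decide.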
Reproducibility: the basics
The reproducibility of data is a measure of whether results in a paper can be attained by a different research team, using the same methods. This shows that the results obtained are not artifacts of the unique setup in one research lab. It’s easy to see why reproducibility is desirable, as it reinforces findings and protects against rare cases of fraud, or less rare cases of human error, in the production of significant results.
What is the reproducibility crisis?
Over recent decades, science, and in particular the social and life sciences, has placed increasing importance on the reproducibility of published studies. Large-scale efforts to assess the reproducibility of scientific publications have turned up worrying results. For example, a 2015 paper by a group of psychology researchers dubbed the “Open Science Collaboration” repeated 100 experiments published in high-ranking, peer-reviewed journals. Whereas 97% of the original studies had reported statistically significant results, only 36% of the reproductions did. These efforts are part of a growing field of “metascience” that aims to take on the reproducibility crisis.
But what about replicability?
Whilst the distinction between repeatability and reproducibility is clear, sometimes a third term, replicability, comes in to muddy the waters. For some authors, replicability and reproducibility can be used interchangeably. For others, the distinction is of great importance. Sometimes, authors have tried to swap them around or erase one altogether. The history of these different attempts can be read in this excellent article. This can all be a bit confusing, but this guide, at least, will try to be clear, using previously established definitions of both terms:
Reproducibility: (Different team, same experimental setup). If an observation is reproducible, a different team should be able to make it by repeating the experiment using the same experimental data and methods, under the same operating conditions, in the same or a different location, on multiple trials.
Replicability: (Different team, different experimental setup). If an observation is replicable, a different team should be able to make it using a different measuring system and dataset, in a different location, on multiple trials. This would therefore involve collecting data anew.
This means replicability is somewhat harder to achieve than reproducibility. It also shows why the reproducibility crisis is so damaging: results based on fully reported methods and reliable data should always be reproducible.
A lot of thought is being put into improving experimental reproducibility. Below are just some of the ways you can improve reproducibility:
- Journal checklists – more and more journals are coming to understand the importance of including all relevant details in published studies, and are bringing in mandatory checklists for any published papers. The days of leaving out sample numbers and animal model descriptions in methods sections are over, and blinding and randomization should be standard where possible.
- Strict on stats – power calculations, multiple comparisons tests and descriptive statistics are all essential to making sure that reported results are statistically sound.
- Technology can lend a hand – automating processes and using high-throughput systems can improve accuracy in individual experiments, and enable more measurements to be taken in a given time, increasing sample numbers.
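To make the "strict on stats" point concrete, here is a rough sketch of two routine calculations: a sample size estimate and a corrected significance threshold for multiple comparisons. It assumes a two-sided comparison of two group means and uses the normal approximation, which gives slightly smaller numbers than an exact t-test calculation. The Bonferroni correction is just one of several multiple-comparison methods, and both function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group needed to detect a standardized effect size."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = standard_normal.inv_cdf(power)          # desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests

# A medium standardized effect (0.5) at 5% significance and 80% power.
print(sample_size_per_group(0.5))

# Ten comparisons in the same study at an overall 5% significance level.
print(bonferroni_threshold(0.05, 10))
```

Running a calculation like this before collecting data, and reporting it in the methods section, makes it easier for another team to judge whether a study was large enough to support its claims.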