

Repeatability vs. Reproducibility



In measuring the quality of experiments, repeatability and reproducibility are key. In this article, we explore the differences between the two terms, and why they are important in determining the worth of published research. 

What is repeatability?

Repeatability is a measure of the likelihood that, having produced one result from an experiment, you can repeat the same experiment with the same setup and obtain the same result (or one that agrees within a small margin of measurement error). It’s a way for researchers to verify that their own results are genuine and not just chance artifacts.

To demonstrate a technique’s repeatability, the conditions of the experiment must be kept the same. These include: 

  • Location
  • Measuring tools
  • Other apparatus used in the experiment
  • Observer
  • Hypothesis
  • Time period (taking month-long breaks between repetitions isn’t good practice) 
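
To put a number on repeatability under these fixed conditions, one common approach in the measurement-agreement literature is to measure each subject twice and compute a repeatability coefficient: 1.96 × √2 × the within-subject standard deviation (s_w), or roughly 2.77 s_w. The Python sketch below assumes exactly two measurements per subject. The function name and the readings are hypothetical, made up for illustration.

```python
import math

def repeatability_coefficient(first, second):
    """Repeatability coefficient from duplicate measurements (hypothetical helper).

    first[i] and second[i] are two measurements of subject i made under the
    same conditions: same observer, instruments, location and a short interval.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists covering at least 2 subjects")
    diffs = [a - b for a, b in zip(first, second)]
    # With duplicates, the within-subject variance is the mean of d**2 / 2.
    within_var = sum(d * d for d in diffs) / (2 * len(diffs))
    s_w = math.sqrt(within_var)
    # About 95% of repeat pairs should differ by less than 1.96 * sqrt(2) * s_w.
    return 1.96 * math.sqrt(2) * s_w

# Hypothetical heart-rate readings (bpm) taken twice on four subjects.
first_reading = [72, 80, 65, 90]
second_reading = [74, 79, 65, 88]
rc = repeatability_coefficient(first_reading, second_reading)
print(f"repeatability coefficient = {rc:.2f} bpm")  # prints 2.94
```

A smaller coefficient means better repeatability. If the conditions listed above drift between repetitions (a different observer, a month-long gap), the differences between repeat readings, and therefore the coefficient, will tend to grow.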

Bland and Altman authored an extremely useful paper in 1986 which highlighted one benefit of assessing repeatability: it allows one to make comparisons between different methods of measurement.

Previous studies had used the correlation coefficient (r) between two techniques as an indicator of agreement. Bland and Altman showed that r actually measures the strength of the relationship between two techniques, not the extent to which they agree with each other. This makes r fairly meaningless in this context: if two different techniques were both designed to measure heart rate, it would be bizarre if they weren’t related to each other!

Bland and Altman showed that repeatability, on the other hand, can be used to compare two techniques. By measuring the same subjects with both techniques (averaging repeated measurements from each technique where these are available) and calculating the difference between each pair of values, we can work out whether the two techniques agree.

If the differences follow a normal distribution, about 95% of them are expected to lie within the limits of agreement: the mean difference ± 1.96 times the standard deviation of the differences.
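
The calculation behind these limits of agreement is short. The Python sketch below is a minimal illustration, assuming paired measurements of the same subjects by two techniques; the function name and the example readings are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(method_a, method_b):
    """Bland-Altman bias and 95% limits of agreement (hypothetical helper).

    method_a[i] and method_b[i] are measurements of the same subject i by the
    two techniques, or the means of repeated measurements by each technique.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)  # mean difference: the systematic offset between methods
    sd = stdev(diffs)   # sample standard deviation of the differences
    # If the differences are roughly normal, ~95% should fall in this range.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical readings (bpm) of the same five subjects by two techniques.
technique_a = [72, 80, 65, 90, 77]
technique_b = [70, 81, 63, 88, 76]
bias, lower, upper = limits_of_agreement(technique_a, technique_b)
print(f"bias = {bias:.2f}, limits of agreement = {lower:.2f} to {upper:.2f}")
```

Whether limits of this width are acceptable is a practical judgment about the measurement in question, not a statistical one; the calculation only tells you how far apart the two techniques are likely to be for a given subject.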

What is reproducibility?

The reproducibility of data is a measure of whether results in a paper can be attained by a different research team, using the same methods. This shows that the results obtained are not artifacts of the unique setup in one research lab. It’s easy to see why reproducibility is desirable, as it reinforces findings and protects against rare cases of fraud, or less rare cases of human error, in the production of significant results.

Why are repeatability and reproducibility important?

Science advances gradually, and each step relies on independent verification and on researchers being able to show that their findings are correct and transparently reported. Academic research findings are only useful to the wider scientific community if they can be repeated and shared among research groups. As such, irreproducible and unrepeatable studies are a source of much concern within science.

What is the reproducibility crisis?

Over recent decades, science, in particular the social and life sciences, has placed increasing importance on the reproducibility of published studies. Large-scale efforts to assess the reproducibility of scientific publications have turned up worrying results. For example, a 2015 paper by a group of psychology researchers dubbed the “Open Science Collaboration” attempted to reproduce 100 experiments published in high-ranking, peer-reviewed journals. Although 97% of the original studies had reported statistically significant results, only 36% of the reproductions did, and even when the original and reproduction data were combined, just 68% of the effects remained statistically significant. These efforts are part of a growing field of “metascience” that aims to tackle the reproducibility crisis.

But what about replicability? 

Whilst the distinction between repeatability and reproducibility is clear, sometimes a third term, replicability, comes in to muddy the waters. For some authors, replicability and reproducibility can be used interchangeably. For others, the distinction is of great importance. Sometimes, authors have tried to swap them around or erase one altogether. The history of these different attempts can be read in this excellent article. This can all be a bit confusing, but this guide, at least, will try to be clear, using previously established definitions of both terms: 

Reproducibility (different team, same experimental setup): If an observation is reproducible, it should be possible for a different team to make it by repeating the experiment with the same experimental data and methods, under the same operating conditions, in the same or a different location, on multiple trials.

Replicability (different team, different experimental setup): If an observation is replicable, it should be possible for a different team to make it using a different measuring system and dataset, in a different location, on multiple trials. This therefore involves collecting data anew.

This means replicability is somewhat harder to achieve than reproducibility. It also shows why the reproducibility crisis is so damaging: if results are based on fully reported methods and reliable data, they should always be reproducible.

How can we improve reproducibility?

A lot of thought is being put into improving experimental reproducibility. Below are just some of the ways you can improve it:

  • Journal checklists – more and more journals recognize the importance of including all relevant details in published studies and are introducing mandatory checklists for the papers they publish. The days of leaving sample numbers and animal model descriptions out of methods sections are over, and blinding and randomization should be standard where possible.
  • Strict on stats – power calculations, multiple comparisons tests and descriptive statistics are all essential to making sure that reported results are statistically sound. 
  • Technology can lend a hand – automating processes and using high-throughput systems can improve accuracy in individual experiments and enable more measurements to be taken in a given time, increasing sample numbers.
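
As an example of the power calculations mentioned above, the Python sketch below estimates how many subjects each group needs in a two-group comparison of means. It uses the standard normal-approximation formula n = 2 × (z(1 − α/2) + z(power))² × σ² / δ², where z(p) is the standard normal quantile, δ is the smallest difference worth detecting and σ is an assumed common standard deviation. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate sample size per group (hypothetical helper).

    Two-sided, two-sample comparison of means, normal approximation.
    delta: smallest difference in means worth detecting.
    sd: assumed common standard deviation in both groups.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # ~1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)            # ~0.84 for 80% power
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)                   # round up: no fractional subjects

print(n_per_group(delta=5, sd=10))  # prints 63
```

With δ = 5, σ = 10, a 5% significance level and 80% power, this gives 63 subjects per group. Dedicated power-analysis software, which typically uses the t distribution, will usually return a slightly larger figure for small studies.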