Addressing the Reproducibility Crisis, One Step at a Time

Published: June 23, 2020

Naomi Heffer

Delivering reproducible scientific findings – that is, results that are consistently observed when studies are independently replicated – is key to ensuring the credibility of scientific research; some may even go as far as to say that reproducibility is the “demarcation criterion between science and nonscience”. And yet, in this time of rapidly developing technology, where scientific collaboration, experimentation and data analysis are in many ways quicker and easier than ever before, the majority of scientific researchers agree that we have a “reproducibility crisis” on our hands. Empirical attempts to estimate how much of the published research is replicable suggest that published scientific findings are more likely to be false (i.e. not replicable) than true.

This is a complex problem resulting from biases across different levels, from the cognitive biases of individual researchers, to journal publication biases, and prejudices that are inherent in our current scientific mentality. But as individual researchers and research teams, there are potentially problematic practices that we can all seek to address. By keeping up to date with developing statistical and methodological best practices, taking steps to mitigate our own cognitive biases, and endeavoring to be as transparent as possible in how we report our research, we can help to address the reproducibility crisis, one step at a time.

Be aware of your own biases, and take steps to mitigate them

As human beings, and especially as enthusiastic scientists with a vested interest in our fields of research, we are susceptible to cognitive biases, such as confirmation bias – the tendency to focus on evidence that supports our beliefs – and hindsight bias – the tendency to perceive events as predictable only after they have occurred. Such biases can make it hard for us to recognize when we engage in problematic practices like “p-hacking”, where analytic flexibility is exploited by changing small aspects of pre-processing or analysis routines to make statistical significance more likely in the outcome. As Marcus Munafò, professor of biological psychology at the University of Bristol, explains: “It’s very easy, once we see the data, to become excited about apparent patterns in the data – we all go into science because we want to discover something. But this excitement and enthusiasm can lead us astray if we’re not careful, and as humans we have a natural tendency to see patterns in noise.”

There are several practical ways that we can protect ourselves from being misled by our own biases, including blinding. Depending on the experimental design, it may be possible to blind study participants and those carrying out data collection to the different experimental conditions under investigation, and/or the key research hypotheses. This is not relevant or possible for all experimental designs, but for almost all studies it should be possible to blind statistical analysis, such that the identities of experimental conditions and variable labels are masked during data preparation and analysis, to ensure that processes like identification of outliers are not influenced by the researchers’ own expectations of what the results should show.
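As a rough illustration of blinded analysis, the sketch below swaps real condition names for neutral codes before the data reach the analyst. The column name, the example records and the `blind_conditions` helper are all hypothetical, and a real workflow would have someone outside the analysis team hold the unblinding key.

```python
import random


def blind_conditions(rows, column="condition", seed=None):
    """Return (blinded_rows, key), with condition names replaced by neutral codes."""
    rng = random.Random(seed)
    conditions = sorted({row[column] for row in rows})
    codes = [f"group_{i + 1}" for i in range(len(conditions))]
    rng.shuffle(codes)  # so the code number reveals nothing about the condition
    key = dict(zip(conditions, codes))  # real name -> masked code
    # Copy each record so the original, unblinded data are left untouched.
    blinded = [{**row, column: key[row[column]]} for row in rows]
    return blinded, key


# Hypothetical records, for illustration only.
rows = [
    {"participant": 1, "condition": "drug", "score": 12.1},
    {"participant": 2, "condition": "placebo", "score": 9.8},
    {"participant": 3, "condition": "drug", "score": 11.4},
    {"participant": 4, "condition": "placebo", "score": 10.2},
]

blinded_rows, unblinding_key = blind_conditions(rows, seed=2020)
```

Outlier rules and other data-cleaning decisions can then be applied to `blinded_rows`, with `unblinding_key` consulted only once the analysis pipeline is fixed.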

Another way of protecting against the influence of bias on data collection and analysis is pre-registration, where researchers publish a statement outlining their study design, primary outcomes and analysis plan before beginning the planned research. “Pre-registration is simply a way to lay out what we intended to do before we set out on a study,” Professor Munafò clarifies. He goes on to explain that this can help protect us against the biases that may lead us into problematic analytical practices “…by having an explicit statement of what we planned at the outset, to remind ourselves.” Thus, pre-registration can help to dissuade researchers from exploiting analytical flexibility to make their results appear more novel or conclusive. Support for study pre-registration is increasing, with websites such as the Open Science Framework offering pre-registration services and guidance to researchers who are new to this approach. As Katherine Button, senior lecturer in psychology at the University of Bath, points out, pre-registration “done properly” will also “force you to think about many key aspects of study design which are important for reproducible results (e.g., clear hypotheses, sample size justification, analysis plan).”

Keep up to date with methodological and analytical best practice

Attempts to do reproducible science can, in many cases, be confounded by common misconceptions and a lack of methodological and statistical understanding; for example, a misunderstanding of the meaning and importance of statistical power. Analytical and methodological best practices are continually improving, especially in this period where metascience (the study of science itself, and how it operates) is a flourishing area of research. In the absence of formal requirements and/or programs for continuing education, researchers should be encouraged to make the most of freely accessible educational resources. Brief, interactive, web-based modules on key topics of methodological and analytical importance are becoming increasingly available, such as the app “P-hacker”, which demonstrates how easy it is to generate statistically significant findings if you look hard enough for them.
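To make the point about looking "hard enough" concrete, here is a minimal simulation sketch. It is not how the P-hacker app works; it simply assumes two groups with no true difference, a set of independent outcome measures with a known standard deviation of 1, and a researcher who reports a study as positive if any outcome reaches p < 0.05. All function names and parameter values are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(group_a, group_b):
    """Two-sided z-test p-value for a difference in means (known SD of 1 assumed)."""
    standard_error = (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_per_group=20, n_studies=2000,
                        alpha=0.05, seed=1):
    """Share of null studies in which at least one outcome reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: no true effect exists.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sided_p(a, b))
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


one_outcome = false_positive_rate(n_outcomes=1)
ten_outcomes = false_positive_rate(n_outcomes=10)
```

Under these assumptions, a single pre-specified outcome should be "significant" in roughly 5% of null studies, whereas picking the best of ten independent outcomes should succeed in roughly 1 − 0.95^10 ≈ 40% of them.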

In terms of improving analytical rigor, one thing we can all do as researchers is to ensure that our studies are adequately powered. As Professor Munafò explains: “Smaller studies, all other things being equal, tend to give less precise results, which means they might be less likely to replicate.” He outlines “a nice illustration of this in a study which looked at studies that had been subject to a replication attempt – the larger the original study, the more likely that finding was to replicate in the subsequent replication study.” While methodological best practice suggests that studies should be powered to around 80%, empirical attempts to estimate power in the published neuroscience literature suggest that studies are only 20% powered on average to address their key research questions.

Low statistical power, by definition, reduces the probability of detecting effects that are genuinely there (i.e. it increases false negatives), but it also undermines the reproducibility of science in a less obvious way. Perhaps counterintuitively, the lower the statistical power, the lower the probability that an observed effect which reaches an arbitrary statistical significance level (such as p < 0.05) actually reflects a true effect, so a larger share of the significant findings that do emerge are false positives. As Dr Button explains further, “With bigger samples you are less likely missing genuine effects, and when you do find a significant result it is less likely to be false positive. Bigger samples also increase the precision of your effect estimates, so as long as your sample is representative, your results will be closer to the true population effect with tighter confidence intervals.”
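The link between power and the trustworthiness of a significant result can be sketched with the positive predictive value (PPV): the probability that a significant finding reflects a true effect. A standard formulation is PPV = (power × R) / (power × R + α), where R is the pre-study odds that a tested effect is real. The value of R used below is purely an assumption for illustration, not an estimate for any field.

```python
def positive_predictive_value(power, prior_odds, alpha=0.05):
    """Probability that a statistically significant result reflects a true effect."""
    true_positives = power * prior_odds  # real effects that reach significance
    false_positives = alpha              # null effects that reach significance
    return true_positives / (true_positives + false_positives)


# Assumption for illustration: one real effect for every four null effects tested.
prior_odds = 0.25

ppv_at_80_percent_power = positive_predictive_value(0.80, prior_odds)  # about 0.80
ppv_at_20_percent_power = positive_predictive_value(0.20, prior_odds)  # about 0.50
```

With these assumed odds, dropping power from 80% to 20% means that a significant result goes from being true about four times in five to being no better than a coin flip.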

One way that all researchers can help to reduce the threats to reproducibility posed by false positives and inflated effect sizes is to do an a priori power calculation to determine the sample size needed to address their key research questions. A number of free and easy-to-use software packages are available for this, including the commonly used package G*Power. Where large-scale studies are made difficult by a lack of resources within individual research teams, collaboration between research teams and across multiple sites can be invaluable in increasing power. Dr Button says that her network of research collaborators “use the Open Science Framework to manage [their] collaborative projects and opt for open source software and resources where possible to maximize [their] ability to share them.”
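For a sense of what an a priori power calculation involves, the sketch below estimates the sample size per group for a two-group comparison of means. It uses the normal approximation n = 2 × ((z₁₋α/₂ + z_power) / d)², where d is the standardized effect size, rather than the exact t-test calculation that dedicated power-analysis software performs, so it will tend to come out a participant or two lower. The function name and the effect sizes shown are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, power=0.80, alpha=0.05):
    """Approximate participants needed per group for a two-sided, two-group test."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = standard_normal.inv_cdf(power)          # quantile for target power
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


small = n_per_group(0.2)   # 393 per group
medium = n_per_group(0.5)  # 63 per group
large = n_per_group(0.8)   # 25 per group
```

The steep rise in required sample size as the expected effect shrinks is one reason multi-site collaboration can be so valuable for studying small effects.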

Increase transparency in reporting

When we read and evaluate the scientific literature, we rely on everything being reported transparently so we can make judgements about whether results are trustworthy and under what conditions they are likely to apply. When asked to provide one key message for researchers seeking to do more reproducible research, Professor Munafò stated: “The one thing we have control over is the extent to which we make our work transparent.”

There are lots of practical steps we can take towards making our research more transparent. Importantly, we can be clear about what is confirmatory and what is exploratory analysis when we report our study results. Seeking to do reproducible research “doesn’t prevent us from doing exploratory analyses…” advises Professor Munafò; it just requires us to make “…clear which analyses were pre-specified (i.e., confirmatory) and which were not (i.e., exploratory).” Exploring the data we obtain can be very informative and important for generating new hypotheses, but in the case of exploratory analysis, researchers should only describe what they observe in the data, rather than reporting p-values or significance tests, as p-values by definition are designed to test pre-existing hypotheses. By making it clear which analyses are confirmatory and which are exploratory, researchers can tell the reader which effects should be replicable, and which need further investigation to be considered true, reproducible findings.

Researchers can also increase transparency by documenting all pre-processing and data cleaning steps, making their analysis pipelines publicly available (or even better, using pre-existing standardized pipelines, as this additionally protects against analytic malpractices like p-hacking) and making all research data publicly available. Professor Munafò highlights how taking some of these steps towards transparency can also help make us more efficient researchers. He explains that in his research group “preparing our data for sharing […] means we have to curate our data carefully – well-labelled data files and analysis code, a corresponding data dictionary describing the variables, a readme file describing the basics of the study. This all means the data are easy for anyone to understand and that also means we can understand them better if we ever return to them, months or years later.” All of the data from Professor Munafò’s research group are available to download from the University of Bristol Data Depository. Guidelines for improving the quality and transparency of research reporting are widely available and offer specific guidance for different types of study design; many of them are accessible via the EQUATOR Network online database.

Take one step towards doing more reproducible research

Whether you take a step towards reporting your results and procedures more transparently, commit to some further training in recent methodological and statistical advances, or pre-register your next study to help protect against biases during data collection and analysis, the key message from those researching best practice is to commit to doing something. It also pays to think about reproducibility guidelines at the point of design rather than at the point of reporting, when much of the “damage” to the reproducibility of your data may already have been done. “There are lots of open research practices – pre-registration, sharing data, posting preprints,” explains Professor Munafò, “It can feel a lot to do at once, so my suggestion would be to do something – whatever seems easiest and most interesting to you.”
