Stats Study Reveals Reason for Replication Crisis in Neuroscience

The ability to image inside our skulls has advanced neuroscience research in much the same way that Ignaz Semmelweis’s pioneering of handwashing advanced hospital practice. In short, it has proved a revolutionary technology that has hugely improved our understanding of the brain.

But new research casts significant doubt on the statistical validity of a swathe of experiments. These studies, which had represented the next big step for neuroimaging, are efforts to link specific signatures in brain scans to complex psychiatric symptoms and states.

Issues with association

The findings, published in Nature, suggest that vast numbers of so-called brain-wide association studies (BWAS) may be statistically underpowered. “This specific manuscript is focused on the reproducibility of linking neuroimaging measures with complex behavioral phenotypes, much in the same way geneticists focus on linking genes to similar complex phenotypes,” explains Dr. Scott Marek, study co-author and instructor in the department of psychiatry at the Washington University School of Medicine.

Neuroimaging research studies are incredibly varied. Some of the most foundational papers in this area have linked general cognitive processes, like memory, to regions of the brain that house the neurons controlling said processes. Other study designs monitor how changes in blood oxygenation mirror the activation of certain brain areas in response to behavioral tasks.

BWAS studies differ from these more classical neuroimaging studies in several ways. In targeting complex psychiatric behaviors, BWAS research tries to pin a biological signature to brain processes that are infamously variable. While it would be a medical marvel to discover a brain that doesn’t use hippocampal structures in memory recall, it’s far more common to find two people with treatment-resistant depression who have different levels of brain activity. This means that the association between any one brain measure and the behavior in question is much weaker.

Lifting research by the bootstraps

Marek and colleagues wanted to assess what these smaller effect sizes meant for ideal BWAS study design. To work this out, they used data from three studies that represent the current magnum opuses of BWAS research. These are the Adolescent Brain Cognitive Development (ABCD) study, Human Connectome Project (HCP) and the UK Biobank. These beefy endeavors, assembled by well-funded international consortia, contain a huge volume of brain data – nearly 50,000 scans’ worth. With this vast, very real dataset, Marek and his team used a statistical technique called bootstrapping to create a series of virtual datasets, ranging in size from the more commonly used small sizes (n=25) up to huge analyses of tens of thousands of scans.

Marek’s team then essentially modeled how accurate and reproducible the findings from each of these hypothetical datasets were. Their results suggest that current BWAS approaches may require a seismic shift to ensure their data is reliable.
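The resampling idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the population below is synthetic, the true correlation of 0.1 is an assumed value chosen for illustration, and the names `make_population`, `bootstrap_correlations` and `interval_95` are hypothetical. The sketch draws repeated samples with replacement at several sample sizes and reports how widely the estimated brain-behavior correlation swings at each size.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_population(size, true_r, rng):
    """Synthetic (brain measure, behavior score) pairs; true_r is an assumption."""
    population = []
    for _ in range(size):
        brain = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        behavior = true_r * brain + math.sqrt(1 - true_r ** 2) * noise
        population.append((brain, behavior))
    return population


def bootstrap_correlations(population, n, n_resamples, rng):
    """Draw n_resamples samples of size n (with replacement) and correlate each."""
    estimates = []
    for _ in range(n_resamples):
        sample = rng.choices(population, k=n)  # sampling with replacement
        xs, ys = zip(*sample)
        estimates.append(pearson_r(xs, ys))
    return estimates


def interval_95(values):
    """Empirical 2.5th and 97.5th percentiles of the resampled estimates."""
    ordered = sorted(values)
    low = ordered[int(0.025 * len(ordered))]
    high = ordered[int(0.975 * len(ordered)) - 1]
    return low, high


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = make_population(20000, true_r=0.1, rng=rng)

results = {}
for n in (25, 250, 2000):
    estimates = bootstrap_correlations(population, n, n_resamples=300, rng=rng)
    results[n] = interval_95(estimates)
    print(f"n={n}: 95% of estimated correlations fall in {results[n]}")
```

Under these assumptions, the interval should narrow sharply as the sample size grows, which is the pattern the paragraph above describes.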

Inflated effect sizes

The team found that, regardless of size, the types of association analyzed in BWAS studies were extremely prone to being inflated by chance. This meant that the findings of Marek’s smaller simulated studies were largely irreproducible. Only once the experiments were modeled with thousands of brain scans did the inflated effect sizes begin to reduce.

Essentially, small BWAS studies, set up to analyze minuscule potential effects with low numbers of scans, are highly likely to be irreproducible. Moreover, in these smaller studies, the authors write, it is the very findings that are most inflated by chance and least reliable that are most likely to be found “statistically significant” and make it to publication.
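A short simulation can show why filtering on significance inflates reported effects in small studies. This is a sketch under stated assumptions rather than the analysis from the paper: the true correlation (`TRUE_R = 0.1`), the number of simulated studies, and the helper names are all hypothetical, and significance is judged with the Fisher z approximation to a two-sided test at p < 0.05 rather than an exact t-test.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_study(n, true_r, rng):
    """Simulate one study of n participants and return its observed correlation."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1) for x in xs]
    return pearson_r(xs, ys)


def is_significant(r, n, z_crit=1.96):
    """Fisher z approximation to a two-sided significance test at p < 0.05."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def mean_significant_effect(n, true_r, n_studies, rng):
    """Mean |r| among the simulated studies that reach significance, and their count."""
    observed = [simulate_study(n, true_r, rng) for _ in range(n_studies)]
    significant = [abs(r) for r in observed if is_significant(r, n)]
    mean_abs_r = sum(significant) / len(significant) if significant else float("nan")
    return mean_abs_r, len(significant)


TRUE_R = 0.1  # assumed true effect size, chosen for illustration only
rng = random.Random(1)  # fixed seed so the sketch is repeatable

summary = {}
for n in (25, 1000):
    mean_abs_r, n_sig = mean_significant_effect(n, TRUE_R, n_studies=500, rng=rng)
    summary[n] = (mean_abs_r, n_sig)
    print(f"n={n}: {n_sig} of 500 studies significant, mean |r| among them = {mean_abs_r:.3f}")
```

With these assumed settings, a study of 25 participants can only reach significance if its observed correlation is roughly 0.4 or larger, so the small studies that pass the filter report effects several times the true value of 0.1. The larger studies that pass the filter stay close to the true value.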

Marek is careful not to throw the baby out with the bathwater. Other types of neuroimaging analysis are much more dependable at smaller sample sizes. “The reproducibility, or lack thereof, is a result of the correlational nature of these studies, rather than neuroimaging writ large,” he explains. “Some neuroimaging studies (e.g., basic brain mapping of specific functions, task induced effects, etc.) do not fall under the umbrella of BWAS.”

He also emphasizes that even BWAS studies have a clear, if hardly facile, way to improve their reproducibility: more samples, which ultimately means more funding is required. “The most straightforward way to improve BWAS would be to increase sample sizes, as was recently done with genome-wide association studies (GWAS). This can be done through large consortia, such as the ABCD, HCP and UK Biobank studies or through data aggregation across multiple labs,” he says.

Following the GWAS journey

GWAS analyses, which tie physiological or psychiatric metrics to particular gene signatures, have been through their own replication journey over the 21st century. Initial sample sizes of fewer than 100 genomes were plagued by irreproducibility. In response to these issues, and the crashing price of genomic sequencing, GWAS studies have been able to vastly expand their sample numbers into the millions. It’s not immediately clear how that approach is compatible with current BWAS practices, where small labs, operating with tight budgets, use a median sample size of 23.

Marek also suggests that clearer reporting of effect sizes, regardless of their statistical significance, would help make the reasons for irreproducibility clearer.

Finally, Marek is keen to stress that BWAS studies are worthwhile, given an adequate sample design. “Given the lack of reproducibility of BWAS with current sample sizes, one could argue we have a lot to learn about how the brain relates to complex phenotypes,” he says.

Researchers trying to understand the complex variety of the brain, Marek says, must bring to bear an equally variable and powerful set of experimental approaches. “If [a] researcher want[s] to conduct a BWAS study, they should do so only in the largest available dataset and report all effect sizes,” he concludes.

Reference: Marek, S, Tervo-Clemmens, B, Calabro, FJ, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022. doi: 10.1038/s41586-022-04492-9