A research team has called for a range of measures to curb the flood of "low-quality" and "science fiction" papers, including stronger peer review and the use of statistical reviewers for studies built on complex datasets.

In a study published in PLOS Biology, researchers reviewed papers published between 2014 and 2024 that proposed an association between a predictor and a health condition using a US government dataset, the National Health and Nutrition Examination Survey (NHANES).

NHANES is a large, publicly available dataset used by researchers around the world to study links between health conditions, lifestyle and clinical outcomes. The team found that between 2014 and 2021, an average of just four NHANES association-based studies were published each year, but this rose to 33 in 2022, 82 in 2023 and 190 in 2024.

"While AI has the clear potential to help the scientific community make breakthroughs that benefit society, our study has found that it is also part of a perfect storm that could be damaging the foundations of scientific rigour," said Dr Matt Spick, co-author of the study from the University of Surrey.

"We’ve seen a surge in papers that look scientific but don’t hold up under scrutiny – this is ‘science fiction’ using national health datasets to masquerade as science fact. The use of these easily accessible datasets via APIs, combined with large language models, is overwhelming some journals and peer reviewers, reducing their ability to assess more meaningful research – and ultimately weakening the quality of science overall."

The study found that many post-2021 papers used a superficial and oversimplified approach to analysis – often focusing on single variables while ignoring more realistic, multi-factor explanations of the links between health conditions and potential causes. Some papers cherry-picked narrow data subsets without justification, raising concerns about poor research practice, including data dredging or changing research questions after seeing the results.

"We’re not trying to block access to data or stop people using AI in their research – we’re asking for some common sense checks," said Tulsi Suchak, post-graduate researcher at the University of Surrey and lead author of the study.

"This includes things like being open about how data is used, making sure reviewers with the right expertise are involved, and flagging when a study only looks at one piece of the puzzle. These changes don’t need to be complex, but they could help journals spot low-quality work earlier and protect the integrity of scientific publishing."

To help tackle the issue, the team has laid out a number of practical steps for journals, researchers and data providers. They recommend that researchers use the full datasets available to them unless there’s a clear and well-explained reason to do otherwise, and that they are transparent about which parts of the data were used, over what time periods, and for which groups.

For journals, the authors suggest strengthening peer review by involving reviewers with statistical expertise and making greater use of early desk rejection to reduce the number of formulaic or low-value papers entering the system. Finally, they propose that data providers assign unique application numbers or IDs to track how open datasets are used – a system already in place for some UK health data platforms.

"We believe that in the AI era, scientific publishing needs better guardrails," said Anietie E Aliu, co-author of the study and post-graduate student. "Our suggestions are simple things that could help stop weak or misleading studies from slipping through, without blocking the benefits of AI and open data. These tools are here to stay, so we need to act now to protect trust in research."

Reference: Suchak T, Aliu AE, Harrison C, Zwiggelaar R, Geifman N, Spick M. Explosion of formulaic research articles, including inappropriate study designs and false discoveries, based on the NHANES US national health database. PLOS Biology. 2025;23(5):e3003152. doi: 10.1371/journal.pbio.3003152

This article has been republished from press release materials. Note: material may have been edited for length and content. For further information, please contact the cited source.