March 5, 2026

A research team supported by the National Institutes of Health (NIH) has developed a versatile machine learning model that could one day greatly expand what medical scans can tell us about disease. Scientists used their tool, named Merlin, to assess 3D abdominal computed tomography (CT) scans, accomplishing tasks that ranged from identifying anatomical features to predicting disease onset years in advance. Despite being developed as a general-purpose CT model, Merlin matched or outperformed a battery of specialized automated tools on the tasks they were built to handle.





The team trained their model on a unique set of patient CT scans linked to radiology reports and medical diagnosis codes collected from the Stanford University School of Medicine. The researchers note that it is the largest collection of abdominal CT data to date.





“Rich datasets like this are necessary to push the limits of what artificial intelligence models can accomplish in medicine,” said Bruce Tromberg, Ph.D., director of NIH’s National Institute of Biomedical Imaging and Bioengineering (NIBIB). “This work exemplifies how meticulously crafted training data can enable remarkable insights that significantly streamline workflows and assist in clinical decision-making.”





CT is a common form of medical imaging, often performed in the early stages of a medical evaluation. To reach a diagnosis, a radiologist must interpret the results, and additional tests and clinical assessments are often needed as well. The process is lengthy to begin with, and the growing shortage of physicians in the United States only makes it more cumbersome.





“With Merlin, you could potentially go beyond traditional radiology and jump straight from imaging to a possible diagnosis. And that’s just one potential use,” said co-first author Louis Blankemeier, Ph.D., who conducted this work while a graduate student at Stanford University.





Merlin represents a new class of models, commonly referred to as foundation models, that are trained on large-scale, unlabeled datasets spanning many kinds of information.





In the new work, the researchers tested Merlin across six broad categories of activities, spanning more than 750 individual tasks that entailed diagnostics, prognostics, and quality assessment.





To prepare Merlin for this breadth of tasks, the researchers first trained it on their clinical data trove, which linked more than 15,000 3D abdominal CT scans to their radiology reports and nearly one million diagnostic codes. Using this information as study material, Merlin learned about relationships between visual and written data.
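
One common way for a model to learn relationships between images and text is contrastive training, in which each scan's embedding is pulled toward the embedding of its own report and pushed away from the others. The article does not say which objective Merlin uses, so the sketch below is only an illustration under that assumption. The function names, the toy two-dimensional embeddings, and the temperature value are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def contrastive_loss(scan_embs, report_embs, temperature=0.1):
    """Symmetric image-text contrastive loss (illustrative, not Merlin's code).

    scan_embs[k] and report_embs[k] are assumed to come from the same study.
    """
    n = len(scan_embs)
    sims = [[cosine(s, r) / temperature for r in report_embs] for s in scan_embs]
    # Each scan should pick out its own report among all reports...
    scan_to_report = sum(cross_entropy(sims[k], k) for k in range(n)) / n
    # ...and each report should pick out its own scan among all scans.
    report_to_scan = sum(
        cross_entropy([sims[row][k] for row in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (scan_to_report + report_to_scan)

# Toy embeddings: correctly paired versus deliberately mismatched.
scans = [[1.0, 0.0], [0.0, 1.0]]
matched_reports = [[1.0, 0.0], [0.0, 1.0]]
swapped_reports = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = contrastive_loss(scans, matched_reports)
loss_swapped = contrastive_loss(scans, swapped_reports)
```

In this toy setup the loss is close to zero when each scan sits next to its own report and large when the pairs are swapped, which is the signal such training would minimize.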





The researchers then quizzed Merlin on more than 50,000 previously unseen abdominal CT scans, each from one of four different hospitals, to learn how closely their model could match the human-produced conclusions associated with each scan.





“Merlin tackled some tasks, such as predicting diagnosis codes, head-on, while other more complicated tasks, such as drafting radiology reports from scratch or identifying and outlining organs in a 3D space, called for additional training,” said co-first author Ashwin Kumar, a graduate student at Stanford University.





The team also deployed state-of-the-art models, specializing in each task type, to serve as points of comparison.





On average across 692 different diagnostic codes, Merlin successfully predicted which of two scans was more likely to be associated with a particular code over 81% of the time, outperforming several variants of two other models. For a subset of 102 codes, Merlin’s performance rose to 90%.
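
The "which of two scans" figure can be read as a pairwise ranking accuracy: over every pair made of one scan that carries the code and one that does not, how often does the model score the first higher? This quantity coincides with the area under the ROC curve, although the article does not name the metric. The sketch below computes it for a single code. The function name, scores, and labels are made up for illustration and are not data from the study.

```python
from itertools import product

def pairwise_ranking_accuracy(scores, labels):
    """Fraction of (positive, negative) scan pairs ranked correctly.

    scores: model outputs for one diagnostic code, higher means more likely.
    labels: 1 if the scan carries the code, 0 otherwise.
    Ties count as half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative scan")
    wins = 0.0
    for pos, neg in product(positives, negatives):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical scores for six scans and one diagnostic code.
scores = [0.91, 0.40, 0.75, 0.30, 0.62, 0.55]
labels = [1, 0, 1, 0, 0, 1]
accuracy = pairwise_ranking_accuracy(scores, labels)  # 8 of 9 pairs correct
```

A value of 0.5 corresponds to guessing, and 1.0 means every positive scan outranks every negative one. Averaging this value over many codes gives a summary figure of the kind quoted above.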





In another category, the team pushed Merlin to predict the onset of chronic diseases, such as diabetes, osteoporosis, and heart disease, in healthy patients based solely on CT scans.





The study authors found that, when comparing scans from different subjects, Merlin could identify patients who were at higher risk of developing a particular disease in the next five years 75% of the time, versus 68% for the other model. These findings hint that the model can detect key features in scans that may be lost to human eyes, suggesting that the tool could help identify new biomarkers for disease, Blankemeier explained.





The researchers ramped up the difficulty further by challenging Merlin to interpret CT scans of the chest, a body part completely absent from its CT study material. Merlin’s unique ability to identify generalizable features of disease allowed it to perform as well as or better than models trained exclusively on chest scans.





Despite being a jack-of-all-trades, Merlin exceeded or matched the specialist models across all tasks. The authors attribute Merlin’s magic touch to its architecture and training data, which allowed it to process complex 3D scans and build associations between visual and written information.





The researchers have high hopes that their approach could soon draw on existing precedent to obtain regulatory approval for simpler tasks, but they also plan to refine Merlin to better handle more complicated challenges, such as report writing.





While the tool is powerful out of the box, they encourage users to fine-tune the model with their own data to address their specific needs.





Reference: Blankemeier L, Kumar A, Cohen JP, et al. Merlin: a computed tomography vision–language foundation model and dataset. Nature. 2026. doi: 10.1038/s41586-026-10181-8





