

Biomarker Bonanza: Computational Methods and Metabolomics Advances


A biomarker, short for biological marker, is a measurable indicator of a patient's state in the clinical setting. Biomarkers are of interest because they can pave the way for personalized medicine and guide patient care through appropriate clinical decisions.

Biomarkers come in multiple guises, most notably as diagnostic, prognostic and predictive, which can potentially aid patient management across the stages of their disease course, from diagnosis, to prognosis, to treatment. Diagnostic biomarkers are characteristic of a specific disease state and thus help narrow down and possibly accelerate diagnosis. Once a diagnosis is made, a prognostic marker can assess the overall anticipated outcome for a patient, independent of treatment. A predictive biomarker gauges the expected treatment response to a specified therapy and aids in selecting the optimal treatment.

There are additional types of biomarkers indicative of various characteristics, such as risk, safety, monitoring and digital biomarkers. Besides individual patients, biomarkers can also help in clinical trial stratification, facilitating testing of targeted, precision therapies.

Biomarker discovery has been successfully implemented in clinical practice for many diseases, particularly cancer. Several companion diagnostics are available to match cancer patients with an actionable mutation or molecular characteristic, e.g., immune checkpoint expression, to a tailored therapy. This treatment paradigm pinpoints a specific actionable target that drives disease pathology. However, the scenario is not as straightforward for the many diseases that lack a distinct etiology or have a multifactorial pathophysiology driven by several intertwined pathways.

Recently, advances in computational methods have enabled analysis of complex biological datasets. Artificial intelligence and machine learning (ML) approaches are adept at biomarker discovery and can extract subtle signatures from large intricate datasets. Moreover, besides genomics, interest has grown in leveraging alternate datasets for biomarker discovery, specifically in metabolomics.

This article reviews advances in computationally driven biomarker discovery, including from metabolomics datasets, for complex, multifactorial diseases.

Biomarker progress in Parkinson’s disease


Neurodegenerative diseases are a growing concern in our aging population. Parkinson's disease (PD), characterized by α-synuclein protein aggregation, is the second most common neurodegenerative disease, and its global prevalence is set to double from 6 million to 12 million by 2040.

Classically, PD was considered a movement disorder, but it is now recognized as a multisystem disease affecting multiple domains. “PD is a very complex disorder that includes a wide variety of motoric and non-motoric symptoms,” explained Enrico Glaab, professor at the Luxembourg Centre for Systems Biomedicine at the University of Luxembourg. Different non-motoric symptoms are present in virtually all patients, and span constipation, urinary dysfunction, orthostatic hypotension (low blood pressure upon standing), pain, impaired sense of smell, cognitive impairment and various neuropsychiatric symptoms.

“The first non-motoric symptoms often appear during a prodrome phase many years before a clinical diagnosis. They are relatively nonspecific, difficult to discern, and often missed until the onset of the first visible motoric symptoms that lead to an examination and a clinical diagnosis,” Glaab elaborated. Using statistical and computational methods, e.g., ML, he is interested in seeking PD biomarkers. Given this prodrome phase, he thinks analyzing relevant biomedical measurement data taken during this early disease stage can identify diagnostic biomarkers.

Biomedical measurements may include molecular data, such as genome, transcriptome or metabolome data, neuroimaging data, or digital sensor data. “Specific diagnostic biomarkers could enable an earlier diagnosis as well as an improved differential diagnosis, preventing misdiagnosis of atypical Parkinsonism forms as PD. An earlier and more reliable diagnosis would also help pave the way for more effective treatment. Moreover, diagnostic biomarkers could provide new insights into the early pathological changes in the disease.”

In an integrated analysis of plasma metabolomics and brain imaging by positron emission tomography (PET), Glaab applied ML to differentiate PD patients from controls. He found that combining PET imaging biomarkers with metabolomics data significantly improved the performance of his model for identifying PD patients. Although this study was performed at the disease stage, and not during the prodrome phase, it provided proof-of-concept that biomarker discovery using multimodal data could enhance diagnostic capability.

“In addition to earlier and differential diagnosis, our main interest is to improve prognostication of future disease course. Many, but not all, PD patients develop comorbidities and complications in later disease stages, including impulse control disorders, sleep disorders, cognitive decline, and depression, among others,” Glaab continued. “Biomarkers can help us predict these future complications well in advance, allowing for earlier and more effective interventions.”

In an example of such work, Glaab and his colleagues, as part of the National Centre of Excellence in Research on Parkinson’s Disease Consortium, examined the ability to prognosticate in PD patients using rapid eye movement-sleep behavior disorder, a disorder of movement during sleep. “PD has been subdivided into brain-first, meaning α-synuclein deposits in the brain first, or body-first, meaning pathology initiates in the enteric or peripheral autonomic nervous system,” Glaab said, describing the disease course. “We hypothesized that rapid eye movement-sleep behavior disorder would predominantly occur in body-first PD through autonomic dysfunction in the brainstem, differentiating it from brain-first PD.”

Using statistical analysis, the study found that rapid eye movement-sleep behavior disorder indeed correlated with autonomic dysfunction and depression, along with hallucinations and constipation, corroborating this body-first PD subtype and its predicted disease course. “Understanding anticipated complications, such as sleep disorders, will help the clinician devise a care plan, such as managing impaired sleep.”

Another avenue for biomarker discovery is to assess current disease stage and severity from biomedical data, which is easier than relying on a complete clinical examination. “This type of analysis can facilitate a more continuous, real-time, and cost- and time-efficient disease monitoring, as well as improved assessments of potential beneficial or harmful effects from new therapeutic approaches. Digital health measures are especially amenable to this type of analysis,” Glaab said of the approach.

As part of the Parkinson’s Disease Digital Biomarker DREAM Challenge Consortium, Glaab was involved in a study that used crowd-sourcing to extract features from digital accelerometer and gyroscope data to predict the severity of tremor, dyskinesia and bradykinesia in PD patients. “We found that combining features submitted by various participating teams significantly improved prediction of PD status (area under the receiver operating curve of 0.87), as well as predicting severity in tremor, dyskinesia, and bradykinesia to varying extents. This study illustrated the potential of digital biomarkers from electronic wearables for assessing disease severity,” Glaab concluded.
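For readers unfamiliar with the metric quoted above, the area under the receiver operating characteristic curve (AUC) can be read as the probability that a randomly chosen patient receives a higher model score than a randomly chosen control. The following is a minimal sketch of that calculation, not the consortium's code; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (patient, control) pairs ranked correctly.

    labels: 1 for patient, 0 for control. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): three patients, three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a sense of where the reported 0.87 sits.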

When asked about significant obstacles to advancing biomarker discovery, Glaab added, “The main obstacles are related to limitations in available measurement data. Brain tissue from living patients is not accessible for obvious ethical reasons, and in vivo and in vitro models cannot perfectly mimic the conditions in the human brain in PD. In addition, measurement data are typically strongly influenced by several confounding factors (e.g., systematic variation in sample storage duration, age and sex representation between the investigated study groups) as well as various sources of noise and bias. We therefore need to carefully quality control and pre-process the data and integrate multiple complementary data types to ensure that our findings are robust and mechanistically interpretable.”

Biomarker discovery from metabolomics

Several types of molecular omics datasets can be leveraged for biomarker discovery. Although genomics is frequently used, especially in cancer, analysis of additional omics datasets, or multimodal analysis, can shed light on complex diseases or diseases lacking a known genetic cause. Increasingly, attention has turned to metabolomics, since it reflects the overall disease state as the culmination of genomic, epigenomic and proteomic forces.


The human metabolome comprises hundreds of thousands of metabolites and represents a rich dataset, fertile ground for biomarker discovery. Advances in mass spectrometry and the founding of databases, such as The Human Metabolome Database, have facilitated entry of metabolomics into the biomarker discovery arena.

“There are numerous benefits to leveraging metabolomics for biomarker discovery. One advantage of using the metabolomics approach, compared to other ‘omics (e.g., genomics), is that metabolites are indicators of real-time biochemical reactions, not just reactions that could occur because the right genes or proteins are in place. Thus, metabolomics offers a snap-shot of the current state of the cell or tissue or, in the clinical sense, the patient,” explained Sean M. Richards, professor in the Department of Obstetrics and Gynecology at the University of Tennessee College of Medicine.

Richards is interested in leveraging metabolomics to identify disease biomarkers and evaluate the impact of pharmaceuticals and environmental toxicants on humans. “Through metabolomics, we can determine the effects that a pharmaceutical or toxicant has on multiple biochemical pathways simultaneously. This has many advantages when assessing a patient’s health or their response to a pharmaceutical, such as in personalized medicine.”

Another advantage is that the metabolome, and in turn the biomarker, can be assessed from very specific tissues, or non-invasively by sampling blood, saliva or urine. “Ease of biofluid accessibility has significant advantages for the patient, especially if repeated assessments are required. For example, drawing a sample for screening, at the diagnostic stage of a disease, and then again when monitoring the patient’s response to treatment,” Richards said of the benefits of metabolomics biomarker analysis from fluids. Furthermore, he highlighted the comparatively low cost of metabolomics, which makes it better suited as a screening assay than more expensive or invasive techniques.

Richards has evaluated the potential of metabolomics-based diagnostic biomarker discovery from blood or serum as a screening tool in cancer. “Currently, colorectal cancer screening is either highly invasive, by colonoscopy, or unreliable, using fecal occult blood tests. A robust blood (serum)-based metabolomic biomarker assay would truly streamline screening,” Richards explained.

In a recent collaborative study, Richards applied ML to serum metabolomics profiles from colorectal cancer patients and controls. The best model, developed using an ensemble ML algorithm, had a classification accuracy of 100% for differentiating colorectal cancer samples from controls.

“Endometrial cancer similarly lacks an optimal screening tool at present. Patients undergo progressively more invasive procedures to arrive at a diagnosis,” Richards continued. In another collaborative study, Richards and colleagues again applied ML to a serum metabolomics dataset from endometrial cancer patients and controls. Ensemble ML was able to generate a classification model with 100% sensitivity and 96% specificity.
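Sensitivity is the share of true cancer cases a test flags, and specificity is the share of cancer-free controls it correctly clears. The short sketch below shows how both are derived from a classifier's confusion-matrix counts. The function name `screening_metrics` and the counts are hypothetical; the counts were picked only so that the arithmetic is easy to follow and are not taken from the study.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: patients correctly flagged / missed.
    tn/fp: controls correctly cleared / wrongly flagged.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, NOT the study's data: 50 patients and 50 controls.
sens, spec, acc = screening_metrics(tp=50, fn=0, tn=48, fp=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

For a screening assay, high sensitivity limits missed cancers, while high specificity limits the number of healthy people sent on for invasive follow-up procedures.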

Although these studies demonstrated the potential of biomarker discovery from metabolomics datasets, obstacles remain. Currently, it is possible to differentiate the serum metabolomics profiles of cancer patients from cancer-free controls. However, for real-world applications, the methodology will need to be able to differentiate serum metabolomics profiles from patients with various cancers, e.g., colorectal from endometrial, and from carriers of benign tumors. “The biggest disadvantage right now is the lack of data,” Richards elaborated. “However, as we characterize the metabolome of healthy and diseased populations around the world, the advantages of metabolomics will grow more tangible.”

Looking to the future, Richards expressed enthusiasm for ML to facilitate biomarker discovery. In a recent systematic review of the literature, Richards examined the ability of ML to process high-throughput, high-dimensional data collected by mass spectrometry, microarray, DNA/RNA-sequencing and imaging modalities. “Our survey of the literature highlighted a broad scope of over 20 machine learning or neural network algorithms across 17 areas of investigation. Therefore, increasingly, machine learning will play a significant role in biomarker discovery.”

When queried about the advantages of ML for feature extraction, i.e., distilling large datasets into specific signatures that constitute a biomarker profile, Richards replied, “I think a better way to think about this is that feature extraction makes machine learning more efficient. Feature extraction reduces or simplifies redundant or unnecessary data, e.g., noise, from huge datasets, such as ‘omics datasets. In turn, this improves the accuracy of the models, or biomarkers, we are using to either diagnose or predict a health outcome.”
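As a rough illustration of how uninformative measurements can be screened out before modeling, the sketch below ranks features by how well each separates patients from controls, using a simple univariate filter (absolute difference in group means divided by the average within-group spread). This is one of many possible approaches and is not the method used in Richards' studies; the function name `rank_features` and the toy "metabolite" values are hypothetical.

```python
from statistics import mean, pstdev


def rank_features(samples, labels):
    """Return feature indices ordered from most to least discriminative.

    samples: list of rows, one row of feature values per subject.
    labels: 1 for patient, 0 for control.
    """
    n_features = len(samples[0])
    scored = []
    for j in range(n_features):
        cases = [row[j] for row, y in zip(samples, labels) if y == 1]
        controls = [row[j] for row, y in zip(samples, labels) if y == 0]
        # Average within-group spread; tiny floor avoids division by zero.
        spread = (pstdev(cases) + pstdev(controls)) / 2 or 1e-12
        scored.append((abs(mean(cases) - mean(controls)) / spread, j))
    return [j for _, j in sorted(scored, reverse=True)]


# Hypothetical toy data: feature 0 differs between groups,
# feature 1 is noise, feature 2 is constant.
samples = [
    [5.0, 1.0, 2.0],
    [5.2, 0.2, 2.0],
    [4.8, 0.9, 2.0],
    [1.0, 0.8, 2.0],
    [1.1, 0.3, 2.0],
    [0.9, 1.1, 2.0],
]
labels = [1, 1, 1, 0, 0, 0]
ranked = rank_features(samples, labels)
print("features ranked by separation:", ranked)
```

Keeping only the top-ranked features before training a classifier is one way to reduce the noise and redundancy Richards describes, although in practice the selection step should be validated on data held out from model fitting.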

Richards anticipates growing adoption of ML. “As artificial intelligence grows and improves – which it surely will – machine learning, and especially deep learning, will absolutely grow and improve biomarker discovery. For example, the ability of deep learning to train itself to recognize subtle biochemical patterns and biochemical pathways will only increase as more data are obtained.”

Overall, computational methods, such as ML, are making large strides in biomarker discovery. Coupled with complex datasets, including integrated multimodal datasets, computational methods stand to advance the biomarker field significantly, with the anticipated goal of realizing personalized medicine.