Overcoming the Identification Bottleneck in Untargeted Metabolomics: Combining Technological and Software Innovations

Published: January 25, 2019

By Warwick Dunn, Professor of Analytical and Clinical Metabolomics, School of Bioscience, University of Birmingham.

Metabolomics, a rapidly evolving field that examines the profile and concentration of the small-molecule constituents of cellular processes, has the potential to drive significant improvements in the diagnosis and treatment of human diseases. Because it sits effectively downstream of other 'omics' approaches (such as genomics, transcriptomics and proteomics), metabolomics is teaching us much about how our genes, proteins and metabolites ultimately influence the phenotype in both healthy and disease states.

Metabolomics experiments can be either ‘targeted’ (in which the metabolites to be measured are known in advance) or ‘untargeted’ (in which the aim is to characterize and quantify a large number of metabolites in a sample). Untargeted metabolomics approaches are yielding previously unreported insights into the pathogenesis of disease. However, identifying the large numbers of unknown and structurally diverse metabolites typically present in biological samples has proven to be a major bottleneck, requiring the collection, analysis and interpretation of a huge amount of data. Fortunately, recent advances in technology and software are helping to overcome this challenge.

Advances in MS instrumentation and software

To detect and identify metabolites that may be present at low concentrations, analysis methods must be sufficiently sensitive and have the resolving power to distinguish between different molecules of very similar mass. Here, mass spectrometry (MS) and nuclear magnetic resonance (NMR) are the most widely used techniques for metabolomics research.

While ongoing improvements in NMR are resulting in increasingly powerful analytical performance, MS has come to dominate the field, and in recent years the capabilities of MS instruments and software have advanced significantly. New high-resolution accurate mass (HRAM) systems can detect metabolites at ultra-low concentrations and measure very small differences in their mass, allowing previously indistinguishable MS signals to be resolved and attributed to specific analyte fragments. For this reason, HRAM mass analyzers are playing an increasingly important role in untargeted metabolomics experiments, improving confidence in the identification and quantification of metabolites.
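To make the idea of resolving small mass differences concrete, the sketch below computes the m/z of two protonated amino acids that share a nominal mass (glutamine and lysine), the mass error in parts per million, and a rough estimate of the resolving power needed to separate them. The function names are illustrative, the atomic masses are rounded literature values, and the R = m/Δm rule of thumb is a simplification; it is not a description of how any particular instrument works.

```python
# Approximate monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # approximate proton mass (Da), used for [M+H]+ ions


def mz_protonated(formula):
    """m/z of the singly charged [M+H]+ ion for an elemental composition.

    `formula` maps element symbols to atom counts, e.g. {"C": 5, "H": 10}.
    """
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def required_resolving_power(mz_a, mz_b):
    """Rule-of-thumb resolving power R = m / delta_m for two nearby peaks."""
    return ((mz_a + mz_b) / 2) / abs(mz_a - mz_b)


glutamine = mz_protonated({"C": 5, "H": 10, "N": 2, "O": 3})
lysine = mz_protonated({"C": 6, "H": 14, "N": 2, "O": 2})

print(f"glutamine [M+H]+: {glutamine:.4f}")
print(f"lysine    [M+H]+: {lysine:.4f}")
print(f"resolving power needed: {required_resolving_power(glutamine, lysine):.0f}")
```

Both ions have a nominal m/z of 147, so a low-resolution analyzer would report them as a single signal; the two differ only in the second decimal place.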

Improvements in compound identification are also being achieved by coupling MS to other techniques, such as gas or liquid chromatography, and/or by using tandem MS approaches. With tandem MS (MS/MS, or MSⁿ), ions of a particular mass-to-charge ratio (known as precursor ions) are selected and subsequently fragmented and analyzed in a second round of MS (or in multiple rounds in the case of MSⁿ), enabling more confident compound annotation and identification.

When using higher orders of MSⁿ there can be a trade-off between analytical speed and sensitivity. However, novel software solutions are being developed to overcome this. Advanced 'intelligent' software programs can direct the mass spectrometer to efficiently collect more meaningful information by using an iterative re-injection system, processing data after each round of MS and determining which ions to include and exclude in the next analysis. Bringing together these advances in instrumentation and software is helping to streamline procedures and improve the quality and resolution of the data used for compound identification.
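The iterative re-injection idea described above can be sketched as a simple loop: in each injection the most intense precursors not yet fragmented are selected, and they are then added to an exclusion list so that the next injection targets different ions. This is a minimal illustration under stated assumptions, not any vendor's acquisition algorithm; `plan_injections`, the `top_n` parameter and the feature intensities are all hypothetical.

```python
def plan_injections(features, top_n=2, max_rounds=5):
    """Plan which precursor ions to fragment in each successive injection.

    `features` maps precursor m/z -> MS1 intensity (hypothetical values).
    Returns a list of rounds, each a list of the m/z values selected.
    """
    excluded = set()  # precursors already fragmented in an earlier round
    rounds = []
    for _ in range(max_rounds):
        candidates = [mz for mz in features if mz not in excluded]
        if not candidates:
            break  # every detected precursor has been fragmented
        # Select the most intense precursors that are not on the exclusion list.
        selected = sorted(candidates, key=lambda mz: features[mz], reverse=True)[:top_n]
        rounds.append(selected)
        excluded.update(selected)  # exclude them from the next injection
    return rounds


# Hypothetical MS1 feature list: precursor m/z -> intensity.
features = {
    147.0764: 9.0e6,
    147.1128: 4.0e5,
    180.0634: 2.5e6,
    205.0972: 8.0e4,
    132.1019: 1.2e6,
}
rounds = plan_injections(features, top_n=2)
for number, selected in enumerate(rounds, start=1):
    print(f"injection {number}: {selected}")
```

Without the exclusion list, every injection would fragment the same two abundant ions and the low-intensity precursors would never be sampled. A real system would match ions within m/z and retention-time tolerances rather than by exact value.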

Expanding metabolomics databases and libraries

To identify unknown metabolites, researchers compare their own experimental results to data stored in metabolomics databases and libraries. Several metabolomics databases and libraries have been established using a combination of experimental data, literature information and computational modelling. In recent years, these repositories have been greatly expanded to accelerate compound identification. The Human Metabolome Database, for example, has grown from just over 2,000 metabolites in 2007 to approximately 115,000 metabolites today. The mzCloud advanced mass spectral database (HighChem) features a searchable collection of curated high-resolution accurate mass spectra, currently covering 8,384 compounds and just under 2.9 million spectra. This rate of expansion would not have been possible using only experimentally collected data, and computational techniques have played a major role in developing these databases.
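As a rough illustration of how an experimental spectrum might be compared against library entries, the sketch below filters candidates by precursor m/z within a ppm tolerance and then ranks them by cosine similarity of their binned fragment spectra. Cosine scoring is one common choice rather than the method of any specific database, and the library entries, compound names and peak lists here are invented purely for the example.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Collapse (m/z, intensity) peaks into integer bins of summed intensity."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_library(query_mz, query_peaks, library, ppm_tol=10.0):
    """Return (name, score) pairs for entries within ppm_tol of the precursor, best first."""
    hits = []
    for name, (lib_mz, lib_peaks) in library.items():
        if abs(query_mz - lib_mz) / lib_mz * 1e6 <= ppm_tol:
            hits.append((name, cosine_similarity(query_peaks, lib_peaks)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented library: name -> (precursor m/z, fragment peaks as (m/z, intensity)).
library = {
    "compound_A": (147.0764, [(84.04, 100.0), (130.05, 40.0), (56.05, 15.0)]),
    "compound_B": (147.0770, [(84.08, 100.0), (130.09, 60.0)]),
    "compound_C": (180.0634, [(84.04, 100.0)]),
}
query_peaks = [(84.04, 95.0), (130.05, 45.0), (56.05, 10.0)]
hits = search_library(147.0765, query_peaks, library)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

Here compound_C is rejected on precursor mass alone, while compound_B passes the precursor filter but shares no fragment bins with the query, which is why fragment spectra add confidence beyond accurate mass. Fixed-width binning can split peaks that fall near a bin edge, so production tools typically use tolerance-based peak matching instead.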

Currently, chemical standards are available for less than 10% of the human metabolome, which limits our ability to identify the chemical structures of metabolites with high confidence. However, software programs can now be used to simulate the process of fragmentation computationally and to predict properties such as retention times. Software tools can then combine in silico fragmentation techniques with library searching, making compound identification quicker and easier. New hybrid techniques can even couple computational MSⁿ and NMR predictions into a single analysis platform. The ability to use data from multiple sources quickly and easily can improve the accuracy of annotating and identifying unknown metabolites.

Combining instrument and software solutions to overcome the identification bottleneck

The exceptional resolution of HRAM technologies, together with the development of intelligent data acquisition strategies and improvements in techniques such as MSⁿ, has led to greater speed and confidence in metabolite annotation and identification. Annotation requires extensive metabolomics databases and libraries, which have expanded hugely in recent years, largely due to advanced computational prediction techniques. For 100% confidence in the identification of compounds, pure chemical standards need to be run and compared to experimental data.

While the identification of unknown metabolites remains challenging, technology is advancing rapidly. The latest generation of instruments brings together several of the workflow solutions described above, offering improved data acquisition and enhanced computational tools within a single device. Modern HRAM instruments employ intelligent, automated data acquisition to guide iterative rounds of MSⁿ, and can process the data generated using in silico fragmentation modelling and library searching. Integrating this functionality helps to streamline the process of annotating ‘unknowns’, advancing the potential of metabolomics research.

Metabolomics has developed rapidly over the past decade, and its impact is set to grow further as challenges associated with identifying unknown metabolites are addressed. Thanks to improvements in MS technologies and compound identification workflows, the untargeted identification bottleneck is being overcome. The latest instrument solutions are set to help researchers realize the full potential of metabolomics to drive improvements in disease diagnosis, monitoring and prevention.
