NMR Spectroscopy and Databases for the Identification of Metabolites
Article Mar 01, 2019 | By Becky A. Gee, Ph.D.
Nuclear magnetic resonance (NMR) spectroscopy is a powerful technique used to identify and quantify the constituents of complex mixtures. Every atomic nucleus with a non-zero spin quantum number possesses angular momentum and therefore a magnetic moment. When a sample is placed in the static magnetic field of a superconducting magnet, the magnetic moments of the nuclei (e.g. protons, 1H) couple to this field. NMR spectroscopists probe the chemical environment of the nuclei by applying a radio-frequency pulse tailored to select the signals (resonances) from specific nuclei such as protons. Each nucleus in a unique chemical environment gives rise to a signal at a specific resonance frequency. In addition, the integrated area of each resonance is proportional to the number of nuclei in that chemical environment. These two pieces of information, resonance frequency and area, make up the NMR spectrum and are used by research scientists to identify the structure of a molecule.
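To make the link between magnetic field strength and resonance frequency concrete, the following minimal Python sketch evaluates the Larmor relation, frequency = (gyromagnetic ratio / 2π) × field. The gyromagnetic ratios are standard approximate values; the function name and the 11.74 T example field are illustrative assumptions, not taken from the article.

```python
# Approximate gyromagnetic ratios divided by 2*pi, in MHz per tesla.
# Standard textbook values; the ratio for 15N is negative, so the magnitude
# is used when reporting a resonance frequency.
GAMMA_OVER_2PI_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": -4.316}


def larmor_frequency_mhz(nucleus: str, field_tesla: float) -> float:
    """Return the magnitude of the Larmor frequency (MHz) at a given field."""
    return abs(GAMMA_OVER_2PI_MHZ_PER_T[nucleus]) * field_tesla


if __name__ == "__main__":
    # Illustrative field strength (an assumption, not from the article).
    for nucleus in ("1H", "13C", "15N"):
        print(nucleus, round(larmor_frequency_mhz(nucleus, 11.74), 1), "MHz")
```

At 11.74 T the proton frequency comes out close to 500 MHz, which is why such an instrument is usually described as a "500 MHz" spectrometer. The small, environment-dependent deviations from this base frequency are what the spectrum actually records.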
For example, the aliphatic, aromatic and carboxylic acid protons in aspirin (C9H8O4) give rise to unique frequencies and integrated signal areas in the 1H NMR spectrum, allowing one to evaluate the purity of an aspirin sample. Larger molecules with many protons in different chemical environments give rise to very complex NMR spectra. To obtain additional information for structure determination, NMR spectroscopists probe other nuclei such as 13C or 15N. They also employ sophisticated techniques in which sequences of radio-frequency pulses are applied to two nuclei (e.g. 1H-13C heteronuclear correlation NMR) to obtain spectra that show how the atoms in a molecule are bonded together. The versatility and breadth of these pulse sequences give NMR spectroscopists a powerful tool to determine the molecular structures and identities of numerous types of materials.
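The proportionality between integrated area and proton count can be sketched in a few lines of Python. Aspirin has eight protons: three methyl (aliphatic), four aromatic and one carboxylic acid. The raw areas below are hypothetical numbers chosen for illustration, and the function name is an assumption rather than part of any NMR package.

```python
def protons_per_environment(areas: dict, total_protons: int) -> dict:
    """Scale raw integrated areas so that they sum to the known proton count."""
    scale = total_protons / sum(areas.values())
    return {name: area * scale for name, area in areas.items()}


# Hypothetical raw integrals in arbitrary units (not measured data).
aspirin_areas = {
    "carboxylic acid": 1.02,
    "aromatic": 4.05,
    "methyl (aliphatic)": 2.97,
}

estimated = protons_per_environment(aspirin_areas, total_protons=8)
rounded = {name: round(value) for name, value in estimated.items()}

if __name__ == "__main__":
    for name, value in estimated.items():
        print(f"{name}: {value:.2f} protons (nearest integer {rounded[name]})")
```

Integrals that deviate clearly from the expected 1:4:3 ratio, or extra signals that fit no proton of the molecule, would point to impurities in the sample.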
Sophisticated NMR spectroscopy techniques are widely used to determine the structure of proteins such as the amyloid-β monomers and fibrils that play a role in Alzheimer’s disease. NMR spectroscopy, including 1H NMR and heteronuclear correlation spectroscopy, is also the predominant method used to acquire metabolic profiling data. In combination with computational methods and databases of resonances, NMR spectroscopy has allowed research scientists to identify the metabolites produced by metabolic processes. The identification of these metabolites may be used in areas such as the early stages of drug development and as indicators of human health. Here, we take a look at how researchers are using NMR spectroscopy and databases in the emerging field of metabolomics.
Metabolic profiling and NMR
Professor Jeremy Everett, University of Greenwich, and his research group use metabolic profiling, also known as metabonomics or metabolomics, as a powerful methodology for establishing the metabolic phenotypes of organisms. Metabolic phenotypes are important because they report on the integrated physiological status of an organism in real time. Genomics, in contrast, reports on the structure, function, evolution, and mapping of an organism’s genome, which can be modified by gene splicing to alter an organism’s traits.
The reproducibility of NMR spectroscopy makes the identification of metabolites in complex mixtures straightforward, as long as the spectra are acquired under similar conditions and at a similar magnetic field strength. Given these requirements, NMR spectra of metabolite reference standards in databases such as the Human Metabolome Database (HMDB) (1) match well with those of the same metabolites in complex mixtures. Hence, databases such as the HMDB are critical tools, alongside other methodologies, in aiding metabolite identification in metabolic profiling experiments (2).
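As a rough illustration of matching a mixture spectrum against reference standards, the sketch below scores each candidate by the fraction of its reference peaks found among the observed peaks within a chemical-shift tolerance. The reference shifts are approximate, illustrative values and are not copied from the HMDB; the function names, the observed peak list and the 0.02 ppm tolerance are likewise assumptions.

```python
# Approximate 1H chemical shifts in ppm, for illustration only.
REFERENCE_SHIFTS_PPM = {
    "lactate": [1.33, 4.11],
    "alanine": [1.48, 3.78],
    "acetate": [1.92],
}


def match_fraction(observed, reference, tolerance_ppm=0.02):
    """Fraction of reference peaks that have an observed peak within tolerance."""
    hits = sum(
        any(abs(obs - ref) <= tolerance_ppm for obs in observed)
        for ref in reference
    )
    return hits / len(reference)


def rank_candidates(observed, library, tolerance_ppm=0.02):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = {
        name: match_fraction(observed, shifts, tolerance_ppm)
        for name, shifts in library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical peak list picked from a mixture spectrum.
observed_peaks = [1.32, 1.91, 4.10, 3.05]
ranking = rank_candidates(observed_peaks, REFERENCE_SHIFTS_PPM)

if __name__ == "__main__":
    for name, score in ranking:
        print(f"{name}: {score:.2f}")
```

A fixed tolerance only works when the mixture and the reference standards were measured under similar conditions, which is exactly the requirement stated above. Real identification workflows also use multiplicity, coupling constants and 2D correlations.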
“NMR Spectroscopy is the tool of choice for metabolite detection and identification in our metabonomics group at the University of Greenwich, UK, because of its non-selective metabolite detection capability, its quantitative nature, its stability and ruggedness and its unparalleled ability to determine the structures of metabolites in complex mixtures. We are successfully using NMR spectroscopy for fundamental metabonomics studies in the areas of obesity, ageing and colorectal cancer,” says Professor Everett.
The power of automated analysis
Other research groups are combining the power of NMR and databases to automate the analysis of spectra. The 1H NMR data of metabolites from cells, tissues, and other biological samples may be collected in under one hour. However, it may take an expert spectroscopist a day or more to analyze the data from one sample. Research scientists may analyze twenty or more samples in a small metabolomics research project where each sample contains about 50 metabolites. Professor John Markley and his research group at the University of Wisconsin at Madison have developed a web-based platform to automate the analysis of 1H NMR data thereby reducing the time to identify metabolites in complex mixtures.
Professor Markley’s automated analysis of 1H NMR data relies on an existing library of over 1000 small molecules and statistical (Bayesian) analysis where spectra of each compound from a database are used to identify individual metabolites in a mixture.
“We are developing a new approach to the automated analysis of 1H NMR data that builds on our growing library of NMR spin system matrices for over 1000 small molecules and the solid underpinning of Bayesian analysis. The basic idea is that parameterization by parameterized small molecule templates (PSMTs) representing the NMR spectra of individual metabolites can form the basis for Bayesian analysis of NMR spectra of metabolite mixtures leading to significantly enhanced identification and quantification of constituent compounds,” says Professor Markley.
As mentioned above, the identification of metabolites in complex mixtures is straightforward, as long as the spectra are acquired under similar conditions. However, NMR spectroscopists may collect data at different magnetic field strengths and under other differing experimental conditions, such as compound concentration, pH, or temperature. Professor Markley’s automated analysis of NMR data relies on software that he and his colleagues developed to handle these differing conditions and to give software and database developers a unique string of letters and numbers for each metabolite.
“This project leverages two recent technologies developed at the National Magnetic Resonance Facility at Madison (NMRFAM): a software tool that takes the 3D structure of a compound and creates a unique and reproducible name for that compound as well as labels for each of its constituent atoms (3) and a software tool that expedites the calculation of a PSMT for a compound that represents its proton NMR spectrum (4). Together, these tools form the basis for a novel approach to identify and quantify compounds in soluble mixtures of the kind studied by metabolomics. PSMTs can be used to simulate NMR spectra collected at different magnetic field strengths and different pH values, and they also can be tuned to account for differential relaxation effects and solute interactions. Because PSMTs can simulate NMR spectra at different field strengths, our library of PSMTs and software will remain useful as new NMR instruments operating at higher fields become available,” explains Professor Markley.
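One reason a spectrum changes with field strength is that chemical shifts are fixed in ppm while scalar (J) couplings are fixed in Hz. The sketch below shows this for a simple first-order doublet. It is a deliberately simplified illustration and not the spin-system calculation behind PSMTs; the function name, the 1.33 ppm shift and the 7 Hz coupling are assumed example values.

```python
def doublet_lines_ppm(center_ppm: float, j_hz: float, spectrometer_mhz: float):
    """Return the two line positions (ppm) of a first-order doublet.

    A coupling of j_hz hertz corresponds to j_hz / spectrometer_mhz ppm,
    because 1 ppm equals spectrometer_mhz hertz.
    """
    half_split_ppm = (j_hz / 2.0) / spectrometer_mhz
    return (center_ppm - half_split_ppm, center_ppm + half_split_ppm)


if __name__ == "__main__":
    # Same hypothetical doublet viewed at two spectrometer frequencies.
    for mhz in (400.0, 800.0):
        low, high = doublet_lines_ppm(1.33, 7.0, mhz)
        print(f"{mhz:.0f} MHz: lines at {low:.5f} and {high:.5f} ppm")
```

The splitting stays at 7 Hz in both cases but occupies half as many ppm at 800 MHz as at 400 MHz, so multiplets look narrower and overlap less at higher field. Strongly coupled spin systems need a full quantum-mechanical simulation, which this sketch does not attempt.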
As the fields of NMR spectroscopy and metabolomics grow, the scientific community relies on publicly available research tools, such as the one Professor Markley and his colleagues have developed, to automate and speed up the analysis of NMR data.
“Using PSMTs as a model, we have developed a web server called Bayesxplorer which accepts 1H NMR spectra along with a list of expected metabolites chosen from those for which PSMTs (GISSMO files) already are available,” says Professor Markley.
Bayesxplorer gives researchers information about the probability that a specific metabolite is present in their sample and also flags compounds that do not match those in the database. Professor Markley remarks, “Bayesxplorer implements the latest advances in computational Bayesian statistics to provide robust identification and quantification results. The web server then returns information on the likelihood of the detected compounds being present and estimates (with error bars) on their relative concentration. In addition, signatures that do not match compounds in the PSMT library are detected and identified for further investigation.”
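The general idea of assigning a probability to a metabolite being present can be illustrated with a toy Bayesian model comparison. This is not the algorithm used by the web server, whose internals the article does not describe. It assumes Gaussian noise, a single template per compound, and a uniform prior over concentration on a grid; every name and number below is hypothetical.

```python
import math


def log_likelihood(observed, predicted, sigma):
    """Gaussian log-likelihood of observed intensities given predicted ones."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(
        -0.5 * ((o - p) / sigma) ** 2 - norm for o, p in zip(observed, predicted)
    )


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


def posterior_present(observed, template, prior=0.5, sigma=0.05,
                      max_conc=2.0, steps=200):
    """Posterior probability that the templated compound is in the mixture.

    "Present" averages the likelihood over a uniform grid of concentrations
    in (0, max_conc]; "absent" predicts zero signal everywhere.
    """
    grid = [max_conc * (i + 1) / steps for i in range(steps)]
    log_present = log_mean_exp(
        [log_likelihood(observed, [c * t for t in template], sigma) for c in grid]
    )
    log_absent = log_likelihood(observed, [0.0] * len(observed), sigma)
    log_odds = math.log(prior / (1.0 - prior)) + log_present - log_absent
    # Stable logistic function to avoid overflow for large |log_odds|.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical template and spectra sampled at four frequency points.
template = [0.0, 1.0, 0.5, 0.0]
spectrum_with_compound = [0.01, 0.98, 0.51, -0.02]
spectrum_noise_only = [0.01, -0.02, 0.02, 0.0]

if __name__ == "__main__":
    print(posterior_present(spectrum_with_compound, template))
    print(posterior_present(spectrum_noise_only, template))
```

Averaging over concentrations, rather than taking the best-fitting one, is what lets the model favor "absent" when the spectrum contains only noise. A fit at the optimal concentration alone could never score worse than the zero-signal model.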
His research group continues to refine and improve Bayesxplorer by adding more 1H NMR data that NMR spectroscopists will use to identify metabolites in complex mixtures. “This development site is benefitting from our experience in handling a wide range of spectra. Bugs are being fixed and a priority list has been created for the 1H NMR spectra of additional compounds to be turned into PSMTs to make the approach more robust and comprehensive,” says Professor Markley.
Publicly available NMR databases for metabolomics
There are numerous publicly available databases for the metabolomics NMR research community in addition to the HMDB mentioned above (1). These databases collectively contain 1H and 13C 1D and 2D NMR data for over 40,000 compounds (5). The research community accesses these databases as a starting point to analyze and interpret their NMR data.
The Biological Magnetic Resonance Bank (BMRB) contains 1H, 13C, and homo- and heteronuclear NMR data for 906 compounds such as proteins, peptides, nucleic acids and other biomolecules (6).
The Madison-Qingdao Metabolomics Consortium Database (MQMCD) (7), the Birmingham Metabolite Library (BML-NMR) (8), the Platform for RIKEN Metabolomics via SpinAssign (PRIMe) (9), and the TOCSY Customized Carbon Trace Archive (TOCCATA) (10), in contrast, each contain NMR data of metabolites.
The Magnetic Resonance Metabolomics Database (MRMD) also summarizes NMR parameters such as chemical shift, multiplicity and isotope along with experimental conditions such as the magnetic field strength, temperature and pH (11).
NMRShiftDB is the most extensive of these databases, containing 1H and 13C NMR data for over 40,000 organic molecules.
As the research community continues to make progress in the field of NMR spectroscopy, these and many other databases will continue to grow and enable spectroscopists, such as Dr. Joseph Sachleben, Technical Director of the Biomolecular NMR Core Facility at the University of Chicago, to solve crucial scientific problems.
“As technical director, I help our faculty solve problems in medicine and biochemistry. Online NMR databases, such as the Biological Magnetic Resonance Bank (BMRB), Human Metabolome Library (HML), the AOCS Lipid library, play a crucial role in our ability to solve those problems quickly and efficiently. For instance, when probing protein-protein or protein-drug interactions, having access to assigned protein NMR spectra from the BMRB allows us to concentrate on the interactions of interest without the time and cost of first assigning the spectra. In metabolomics and lipidomics studies, access to those databases allows us to efficiently identify relevant substances. Without access to these databases, progress in solving important medical and biochemical problems would be significantly slowed. The databases allow the true power of the knowledge acquired by the world’s NMR spectroscopists to be used to solve important scientific problems.”
References
1. Wishart, D.S., Feunang, Y.D., Marcu, A., Guo, A.C., Liang, K., et al. HMDB 4.0: The Human Metabolome Database for 2018. Nucleic Acids Res. 46 (D1), D608-17 (2018).
2. Dona, A., Kyriakides, M., Scott, F., Shephard, E., Varshavi, D., Veselkov, K. & Everett, J.R. A Guide to the Identification of Metabolites in NMR-Based Metabonomics/Metabolomics Experiments. Computational And Structural Biotechnology Journal, 14, 135-153 (2016).
3. Dashti, H., Westler, W.M., Markley, J.L., Eghbalnia, H.R. Unique identifiers for small molecules enable rigorous labeling of their atoms. Scientific Data. 4, 170073 (2017).
4. Dashti, H., Westler, W.M., Tonelli, M., Wedell, J.R., Markley, J.L., Eghbalnia, H.R. Spin System Modeling of Nuclear Magnetic Resonance Spectra for Applications in Metabolomics and Small Molecule Screening. Anal Chem. 89 (22), 12201-8 (2017).
5. Ellinger, J., Chylla, R.A., Ulrich, E.L., Markley, J.L. Databases and Software for NMR-Based Metabolomics. Curr Metabolomics. 1 (1), 15-27 (2013).
6. Markley, J.L., Anderson, M.E., Cui, Q., Eghbalnia, H.R., Lewis, I.A., Hegeman, A.D., Li, J., Schulte, C.F., Sussman, M.R., Westler, W.M., Ulrich, E.L., Zolnai, Z., New bioinformatics resources for metabolomics. Pac. Symp Biocomput., 157–168 (2007).
7. Cui, Q., Lewis, I.A., Hegeman, A.D., Anderson, M.E., Li, J., Schulte, C.F., Westler, W.M., Eghbalnia, H.R., Sussman, M.R., Markley, J.L. Metabolite identification via the Madison Metabolomics Consortium Database. Nat Biotechnol. 26 (2), 162–164 (2008).
8. Ludwig, C., Easton, J., Lodi, A., Tiziani, S., Manzoor, S., Southam, A., Byrne, J., Bishop, L., He, S., Arvanitis, T., Günther, U., Viant, M. Birmingham Metabolite Library: a publicly accessible database of 1-D 1H and 2-D 1H J-resolved NMR spectra of authentic metabolite standards (BML-NMR). Metabolomics. 8 (1), 8–18 (2012).
9. Akiyama, K., Chikayama, E., Yuasa, H., Shimada, Y., Tohge, T., Shinozaki, K., Hirai, M.Y., Sakurai, T., Kikuchi, J., Saito, K., PRIMe: A Web Site That Assembles Tools for Metabolomics and Transcriptomics. In Silico Biology. 8 (3), 339–345 (2008).
10. Bingol, K., Zhang, F., Bruschweiler-Li, L., Brüschweiler, R. TOCCATA: A Customized Carbon Total Correlation Spectroscopy NMR Metabolomics Database. Anal Chem. 84 (21), 9395–9401 (2012).
11. Lundberg, P., Vogel, T., Malusek, A., Lundquist, P., Cohen, L., Dahlqvist, O. MDL - The Magnetic Resonance Metabolomics Database (mdl.imv.liu.se). ESMRMB 2005 Congress; Basel, Switzerland (2005).