RAMclust/RAMsearch: efficient post-XCMS feature clustering and annotation of MS-based metabolomics datasets
Poster Dec 22, 2016
Corey D. Broeckling and Jessica E. Prenni
Introduction: Chromatographically coupled mass spectrometry is a powerful tool for profiling, semi-quantitatively or quantitatively, a breadth of small molecules with sensitivity and selectivity. The complexity of these datasets has driven the development of informatics approaches for feature finding, retention time alignment, feature grouping, and annotation. However, the complexity of signals derived from a single compound is generally underestimated, resulting in poor spectral reproducibility, misannotation, and misinterpretation of individual mass signals. This limitation has driven us to develop informatics tools to improve the quality of post-XCMS data processing.
Methods: RAMclustR is written in R and is freely available. It is designed with memory constraints in mind and typically runs in minutes, although it can take an hour when peak shape similarity scoring is also used. The output is initially an R object containing a dataset of reduced dimensionality compared with the input XCMS set, along with spectra that are written to .msp format. These spectra can include MSE (indiscriminate MS/MS) spectra when available. The .msp files serve as input for RAMsearch, a .NET-based GUI for batch spectral searching against NIST-formatted spectral libraries. The results can be exported in a format that can be imported back into RAMclustR.
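To make the .msp hand-off concrete, the following is a minimal sketch of writing one clustered spectrum as an MSP-style text record. It assumes only the basic fields of the NIST text format (`Name:`, `Num Peaks:`, then one m/z and intensity pair per line). The function name `format_msp_record` and the cluster label `C0001` are hypothetical, and the records RAMclustR actually writes may carry additional fields not shown here.

```python
def format_msp_record(name, peaks):
    """Return one MSP-style record as text.

    name  -- spectrum label (hypothetical cluster name in the usage below)
    peaks -- list of (mz, intensity) tuples
    """
    lines = [
        f"Name: {name}",
        f"Num Peaks: {len(peaks)}",
    ]
    # One peak per line: m/z to four decimals, intensity rounded to an integer.
    lines += [f"{mz:.4f} {intensity:.0f}" for mz, intensity in peaks]
    return "\n".join(lines) + "\n"


# Illustrative values only; these are not data from the poster.
record = format_msp_record("C0001", [(85.0284, 1200.0), (127.0390, 560.0)])
print(record)
```

Multiple records would normally be written to the same file separated by blank lines, so that a batch search tool can read them in one pass.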
Preliminary Results: RAMclustR feature similarity scores are calculated for all feature pairs in the input XCMS R object, where feature similarity is the product of the individual similarities in intensity correlation across the dataset, feature retention time, and peak shape. The contribution of each score is tunable using sigmoid functions, so results can be evaluated and the parameters adjusted when necessary. The output datasets demonstrate improved injection reproducibility compared with individual features, reduce the false discovery rate burden, and improve annotation quality. Annotation efficiency is dramatically improved by using the output spectra from RAMclustR as input for spectral searching with RAMsearch, a novel GUI for batch searching and manual validation of search results. The output from RAMsearch is imported into RAMclustR, enabling the storing, visualization, and sharing of the evidence for a given annotation. These outputs are suitable as supplementary material upon publication of the dataset, to ensure transparency in the annotation process. This workflow reduces annotation time severalfold by automating routine manual tasks. Further, it is designed to streamline the effort that goes into reporting annotation confidence, which will enable more robust, transparent, and accessible reporting of metabolomics data.
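The product-of-similarities idea can be sketched in a few lines. This is an illustration under stated assumptions, not RAMclustR's implementation: the abstract does not give the functional form, so a logistic curve stands in for the "sigmoid functions", the midpoint and steepness values are arbitrary, the peak shape term is omitted, and all function and parameter names are hypothetical.

```python
import math


def logistic_decay(x, midpoint, steepness):
    """Sigmoid near 1 for x well below midpoint and near 0 well above it."""
    z = steepness * (x - midpoint)
    z = max(-500.0, min(500.0, z))  # keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(z))


def pearson(a, b):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant feature carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def feature_similarity(int_a, int_b, rt_a, rt_b,
                       corr_mid=0.5, corr_k=10.0, rt_mid=2.0, rt_k=2.0):
    """Product of correlation-based and retention-time-based similarities.

    All tuning defaults are assumed values chosen for illustration.
    """
    s_corr = logistic_decay(1.0 - pearson(int_a, int_b), corr_mid, corr_k)
    s_rt = logistic_decay(abs(rt_a - rt_b), rt_mid, rt_k)
    return s_corr * s_rt


# Illustrative intensities across four injections (not data from the poster).
f0, f1, f2 = [1, 2, 3, 4], [2, 4, 6, 8], [5, 1, 4, 2]
co_eluting = feature_similarity(f0, f1, 100.0, 100.5)  # correlated, same RT
unrelated = feature_similarity(f0, f2, 100.0, 160.0)   # uncorrelated, far RT
print(co_eluting > unrelated)
```

Because the terms are multiplied, a pair must score well on every criterion to be grouped: two features that correlate strongly but elute a minute apart still receive a similarity near zero.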