Analysis of Cerebrospinal Fluid (CSF) samples holds great promise to diagnose neurological pathologies and gain insight into the molecular background of these pathologies. Proteomics and metabolomics methods provide invaluable information on the biomolecular content of CSF and thereby on the possible status of the central nervous system, including neurological pathologies. The combined information provides a more complete description of CSF content. Extracting the full combined information requires a combined analysis of different datasets i.e. fusion of the data.
A novel fusion method is presented and applied to proteomics and metabolomics data from a pre-clinical model of multiple sclerosis: an Experimental Autoimmune Encephalomyelitis (EAE) model in rats. The method follows a mid-level fusion architecture. The relevant information is extracted per platform using extended canonical variates analysis. The results are subsequently merged in order to be analyzed jointly. We find that the combined proteome and metabolome data allow for the efficient and reliable discrimination between healthy, peripherally inflamed rats, and rats at the onset of the EAE. The predicted accuracy reaches 89 % on a test set. The important variables (metabolites and proteins) in this model are known to be linked to EAE and/or multiple sclerosis.
Fusion of proteomics and metabolomics data is possible. The main issues of high-dimensionality and missing values are overcome. The outcome leads to higher accuracy in prediction and more exhaustive description of the disease profile. The biological interpretation of the involved variables validates our fusion approach.
This article is published in BMC Bioinformatics and is free to access.