In the spirit of making data widely accessible to the scientific community during the COVID-19 pandemic, some researchers are creating interactive data dashboards that offer greater visualization capabilities. Among them is a COVID-19 multi-omics data dashboard generated by the authors of a paper recently published in Cell Systems titled "Large-Scale Multi-omic Analysis of COVID-19 Severity." Ian Miller, research data scientist at the Coon Laboratory, University of Wisconsin-Madison, was a key developer of the platform and walked Technology Networks through the main features of each type of analysis, summarized below.
General features to note:
- Patient subgroups are color-coded.
- Use the dropdown box to display proteins, lipids, metabolites, transcripts or combined biomolecules. A single biomolecule of interest can also be selected, and the plots will be automatically updated.
Principal component analysis
This first page helps you get oriented using a discovery approach. You can ask "which molecules should I be most interested in? And how do they relate to the structure of the data?"
The main page features a principal component analysis (PCA) scores plot, where each point represents a patient sample. The way the points cluster in space represents the extent of similarity across all measurements. PCA helps you compress the most interesting information down into fewer dimensions and reveals an innate structure in the data. The samples from patients with the most severe disease are grouped together, suggesting that differences in protein abundance are driving this separation.
On the right-hand side of the page is a PCA loadings plot, where each point represents one biomolecule (i.e., one of 517 proteins, if "proteins" is selected on the dropdown box). The way they are spread out in space on the loadings plot explains how points on the PCA scores plot are separated. In other words, molecules on the far right of the PCA loadings plot are those that drive separation of points in the PCA plot.
Of note, one protein that stands out at the far right of the PCA loadings plot is cartilage acidic protein-1, a protein involved in olfactory bulb development. This is notable considering that a loss of smell has been reported in COVID-19 patients. Miller points out that although a solid experimental connection has not been made, it is an observation that could be helpful for future studies.
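To make the relationship between the scores plot and the loadings plot concrete, here is a minimal sketch in Python. It is not the dashboard's code: the function name, the toy data and the use of power iteration (a simple stand-in for a full PCA routine) are all assumptions made for illustration. It finds only the first principal component, returning one loading per biomolecule and one score per sample.

```python
import math

def first_principal_component(samples, iterations=100):
    """Return (loadings, scores) for the first principal component.

    samples: list of rows, one per patient sample; each row holds one
    measurement per biomolecule. Power iteration on the covariance
    matrix is used here as a simple substitute for a full PCA routine.
    """
    n_samples, n_molecules = len(samples), len(samples[0])

    # Center each biomolecule (column) on its mean.
    means = [sum(row[j] for row in samples) / n_samples
             for j in range(n_molecules)]
    centered = [[row[j] - means[j] for j in range(n_molecules)]
                for row in samples]

    # Sample covariance matrix between biomolecules.
    cov = [[sum(row[i] * row[j] for row in centered) / (n_samples - 1)
            for j in range(n_molecules)]
           for i in range(n_molecules)]

    # Power iteration converges to the direction of greatest variance.
    direction = [1.0] * n_molecules
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(n_molecules))
                   for i in range(n_molecules)]
        norm = math.sqrt(sum(x * x for x in product))
        direction = [x / norm for x in product]

    loadings = direction  # one value per biomolecule (loadings plot)
    # Project each centered sample onto the component (scores plot).
    scores = [sum(row[j] * loadings[j] for j in range(n_molecules))
              for row in centered]
    return loadings, scores

# Hypothetical data: 4 samples (rows) x 3 biomolecules (columns).
# The first two rows mimic one patient subgroup, the last two another.
samples = [
    [10.0, 5.0, 1.0],
    [11.0, 5.1, 1.2],
    [2.0, 5.0, 1.1],
    [1.0, 4.9, 0.9],
]
loadings, scores = first_principal_component(samples)
```

In this toy data the first biomolecule differs most between the two subgroups, so it receives the largest loading, and the two subgroups end up with scores of opposite sign. That mirrors how molecules at the edge of the loadings plot explain the separation seen in the scores plot.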
“Part of the challenge of developing this type of tool is to get it sophisticated enough to make it powerful for analysis, but simple enough to make it usable and user friendly. It’s a huge challenge. So it's difficult to walk that line. But if nothing else, it provides someone like a clinician an easy way to look up their molecule quickly without having to dig through supplementary data.” – Ian Miller.
Volcano plot: Volcano plots were originally popularized by the transcriptomics field, which has inspired a lot of the work in proteomics, especially in data analysis. They are a nice way to compare A versus B, i.e., COVID versus non-COVID. The volcano plot simultaneously tells you the effect size and the statistical significance. We used a log2 fold change (the fold change of the average measurement in COVID patients compared to the average measurement in non-COVID patients) and transformed the p values (negative log 10) to suit the plot style, so that increasing values are more significant. Basically, the further a point is from the center, the larger the effect, and the higher up it is, the more statistically significant it is. A threshold line can be drawn at 2 on the y-axis, which corresponds to a p value of 0.01.
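The two transformations behind a volcano plot can be sketched in a few lines of Python. This is an illustration rather than the dashboard's code: the function name and the example numbers are hypothetical, and it assumes the group averages are positive abundances that have not already been log-transformed.

```python
import math

def volcano_coordinates(mean_covid, mean_non_covid, p_value):
    """Return (x, y) for one biomolecule on a volcano plot.

    x is the log2 fold change of the COVID average over the non-COVID
    average; y is the negative log10 of the p value, so larger y means
    more statistically significant.
    """
    log2_fold_change = math.log2(mean_covid / mean_non_covid)
    neg_log10_p = -math.log10(p_value)
    return log2_fold_change, neg_log10_p

# A y value of 2 corresponds to a p value of 0.01.
SIGNIFICANCE_LINE = 2.0

# Hypothetical biomolecule: four times more abundant in COVID samples.
x_up, y_up = volcano_coordinates(mean_covid=8.0, mean_non_covid=2.0, p_value=0.001)

# Hypothetical biomolecule: almost unchanged and not significant.
x_flat, y_flat = volcano_coordinates(mean_covid=2.1, mean_non_covid=2.0, p_value=0.5)

above_line_up = y_up > SIGNIFICANCE_LINE
above_line_flat = y_flat > SIGNIFICANCE_LINE
```

The first molecule lands far to the right and well above the threshold line, while the second sits near the center and close to the bottom of the plot.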
Table: Displays results of different types of statistical tests, enabling insights into effect size and confounding variables.
This page provides insight into how biomolecules relate to disease severity. Clinicians or researchers can select a standard clinical measurement (e.g., hospital-free days at day 45, C-reactive protein concentration) and see how it relates to a biomolecule's relative normalized abundance across patient subgroups.
Clustergrammer is a third-party, interactive heatmap from the Ma’ayan Lab at the Icahn School of Medicine at Mount Sinai. Each column represents a sample, and each row represents a biomolecule measurement (protein, metabolite, lipid or transcript, depending on which dataset is selected in the control panel). The tool provides a way to zoom in and investigate how a biomolecule of interest fits into larger patterns across samples. Tips on how to use the interactive features can be found here. Upon spotting a cluster, one could ask, for example, whether the biomolecules belong to the same pathway or are linked to immune dysfunction or drug response.