Take a Tour of a COVID-19 Multiomics Data Dashboard

Article

Published: April 6, 2021

Michele Trott, PhD

In the spirit of making data widely accessible to the scientific community during the COVID-19 pandemic, some researchers are creating interactive data dashboards that offer greater visualization capabilities. Among them is a COVID-19 multiomics data dashboard generated by authors of a paper recently published in Cell Systems titled: Large-Scale Multiomic Analysis of COVID-19 Severity. Ian Miller, research data scientist at the Coon Laboratory, University of Wisconsin-Madison, was a key developer of the platform and walked Technology Networks through some key features of each type of analysis, summarized below.

General features to note:

Patient subgroups are color-coded.
Use the dropdown box to display proteins, lipids, metabolites, transcripts or combined biomolecules. A single biomolecule of interest can also be selected, and the plots will be automatically updated.

Principal component analysis

This first page helps you get oriented using a discovery approach. You can ask "which molecules should I be most interested in? And how do they relate to the structure of the data?"

The main page features a principal component analysis (PCA) scores plot, where each point represents a patient sample. How the points cluster in space reflects how similar the samples are across all measurements. PCA compresses the most informative variation into fewer dimensions and reveals innate structure in the data. The most severe patients are grouped together, suggesting that differences in protein abundance drive this separation.

On the right-hand side of the page is a PCA loadings plot, where each point represents one biomolecule (i.e., one of 517 proteins, if "proteins" is selected in the dropdown box). How the points are spread out on the loadings plot explains how the points on the PCA scores plot are separated. In other words, molecules on the far right of the PCA loadings plot are those that drive the separation of points in the PCA scores plot.
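The relationship between the scores plot and the loadings plot can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's code: the sample names, molecule names and values are hypothetical, only the first principal component is computed, and power iteration is used here simply to avoid external libraries.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (scores, loadings) for PC1 of a samples-by-molecules table."""
    n, m = len(rows), len(rows[0])
    # Center each molecule (column) on its mean across samples.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance between molecules (an m x m matrix).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration converges to the leading eigenvector of the covariance.
    vec = [1.0] * m
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Loadings: one weight per molecule. Scores: one coordinate per sample.
    scores = [sum(c[j] * vec[j] for j in range(m)) for c in centered]
    return scores, vec

# Hypothetical data: 4 patient samples x 3 molecules.
sample_names = ["mild_1", "mild_2", "severe_1", "severe_2"]
molecule_names = ["molecule_a", "molecule_b", "molecule_c"]
table = [
    [1.0, 5.0, 3.0],
    [1.2, 5.1, 2.9],
    [4.0, 5.3, 3.1],
    [4.2, 5.2, 3.0],
]

scores, loadings = first_principal_component(table)
for name, score in zip(sample_names, scores):
    print(f"{name}: PC1 score = {score:.2f}")
for name, weight in zip(molecule_names, loadings):
    print(f"{name}: PC1 loading = {weight:.2f}")
```

In this toy table, the molecule that differs most between the mild and severe samples receives the largest loading, and the two groups land at opposite ends of the score axis. That is the sense in which molecules at the edge of a loadings plot drive separation in the scores plot.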

Of note, one protein that stands out at the far right of the PCA loadings plot is cartilage acidic protein-1, a protein involved in olfactory bulb development. This is notable because loss of smell has been reported in COVID-19 patients. Miller points out that although a solid experimental connection has not been made, the observation could be helpful for future studies.

“Part of the challenge of developing this type of tool is to get it sophisticated enough to make it powerful for analysis, but simple enough to make it usable and user friendly. It’s a huge challenge. So it's difficult to walk that line. But if nothing else, it provides someone like a clinician an easy way to look up their molecule quickly without having to dig through supplementary data.” – Ian Miller.

Differential expression

Volcano plot: Volcano plots were originally popularized by the transcriptomics field, which has inspired a lot of the work in proteomics, especially in data analysis. They are a convenient way to compare A versus B, e.g., COVID versus non-COVID. The volcano plot simultaneously shows the effect size and the statistical significance. We used a log2 fold change (the fold change of the average measurement in COVID patients compared to the average measurement in non-COVID patients) and transformed the p values (negative log10) to suit the plot style, so that increasing values are more significant. Basically, the further points are from the center horizontally, the larger the effect, and the higher up they are, the more significant they are. A threshold line can be drawn at 2 on the y-axis, which corresponds to a p value of 0.01.
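The two axis transformations can be written out as a short Python sketch. The molecule names, group means and p values below are hypothetical, the means are assumed to be positive and on a raw (not already log-transformed) scale, and the p values are taken as given rather than computed from a statistical test.

```python
import math

def volcano_coordinates(mean_covid, mean_non_covid, p_value):
    """Return (x, y) for one biomolecule on a volcano plot."""
    # x-axis: log2 fold change of the average COVID measurement
    # relative to the average non-COVID measurement.
    log2_fc = math.log2(mean_covid / mean_non_covid)
    # y-axis: -log10(p), so smaller p values sit higher on the plot.
    neg_log10_p = -math.log10(p_value)
    return log2_fc, neg_log10_p

# Hypothetical molecules: (mean in COVID, mean in non-COVID, p value).
examples = {
    "molecule_up": (8.0, 2.0, 0.0001),
    "molecule_down": (1.0, 4.0, 0.001),
    "molecule_flat": (3.0, 3.0, 0.5),
}

# A line at y = 2 corresponds to p = 0.01, because -log10(0.01) = 2.
Y_THRESHOLD = 2.0

points = {name: volcano_coordinates(*vals) for name, vals in examples.items()}
significant = sorted(name for name, (_, y) in points.items() if y > Y_THRESHOLD)

for name, (x, y) in points.items():
    print(f"{name}: log2 fold change = {x:.2f}, -log10(p) = {y:.2f}")
print("Above the threshold line:", significant)
```

A fourfold increase and a fourfold decrease land at +2 and -2 on the x-axis, which is why the log2 scale makes up- and down-regulation symmetric around the center.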

Table: Displays results of different types of statistical tests, enabling insights into effect size and confounding variables.

Linear regression

This page gives insight into how biomolecules relate to disease severity. Clinicians or researchers can select a standard clinical measurement (e.g., hospital-free days at day 45 or C-reactive protein concentration) and see how it relates to a biomolecule’s relative normalized abundance across patient subgroups.
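As a rough illustration of what such a fit involves, the sketch below runs a simple ordinary least squares regression in Python. The data are hypothetical, and both the choice to treat abundance as the predictor and the absence of any covariates are assumptions made for the example. The dashboard's actual model may differ.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical values for five patients.
abundance = [1.0, 2.0, 3.0, 4.0, 5.0]              # relative normalized abundance
hospital_free_days = [40.0, 34.0, 27.0, 22.0, 12.0]  # hospital-free days at day 45

slope, intercept, r_squared = fit_line(abundance, hospital_free_days)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.3f}")
```

A negative slope in this toy example would mean that higher abundance of the molecule goes with fewer hospital-free days, i.e., with more severe disease.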

Clustergrammer

Clustergrammer is a third-party, interactive heatmap from the Ma’ayan Lab at the Icahn School of Medicine at Mount Sinai. Each column represents a sample, and each row represents a biomolecule measurement (protein, metabolite, lipid or transcript, depending on which dataset is selected in the control panel). The tool provides a way to zoom in and investigate how a biomolecule of interest fits into larger patterns across samples. Tips on how to use the interactive features can be found here. Upon spotting a cluster, one could ask whether its biomolecules belong to the same pathway or are linked to immune dysfunction or drug response, for example.
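To give a feel for how rows of a clustered heatmap end up next to each other, here is a small Python sketch of hierarchical clustering. It does not use Clustergrammer itself: the molecule names and values are hypothetical, and the Euclidean distance and average linkage are illustrative assumptions that may not match the settings used in the dashboard.

```python
import math

def euclidean(a, b):
    """Distance between two abundance profiles measured on the same samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_rows(profiles):
    """Average-linkage agglomerative clustering; returns the merge history."""
    clusters = [[name] for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all pairs drawn from the two clusters.
                dists = [euclidean(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical rows: each molecule measured across the same four samples.
profiles = {
    "protein_a": [1.0, 1.1, 3.0, 3.2],
    "protein_b": [0.9, 1.0, 3.1, 3.0],
    "lipid_c": [3.0, 2.9, 1.0, 1.1],
    "lipid_d": [3.1, 3.0, 0.9, 1.0],
}

merges = cluster_rows(profiles)
for left, right, dist in merges:
    print(f"merge {left} + {right} at distance {dist:.2f}")
```

Molecules with similar profiles across samples are merged first, so they sit in adjacent rows of the heatmap. Those adjacent blocks are the clusters one would then inspect for shared biology.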

Article updated September 6, 2021, to remove the hyphen from "Multi-omics".

Meet the Author

Michele Trott, PhD

Michele Trott holds a PhD in endocrinology from Lincoln University. She works as a freelance science writer and scientific content manager for Izon Science.
