Data visualization makes its comeback — from static plots to storytelling
Around 15 years ago, for biologists awash with data, visualization was often dealt with as an add-on, an afterthought. The tools were basic, and there was little in the way of interactivity.
Fast-forward to today, and data visualization is making something of a comeback. Advances in technology, together with a growing appreciation of its importance and potential among the life sciences community and funders, are contributing to a renaissance.
So why is data visualization making a return?
Data visualization: The wonder of web
Today’s data visualization tools have been transformed by their ability to run in web browsers with no need to download specialized software. “When you think of web technology – the high-quality computer games that run in a browser, the technology that’s enabled standardization across browser platforms – it has really pushed the envelope in terms of the visualization tools you can create”, says Nils Gehlenborg, Assistant Professor in the Department of Biomedical Informatics at Harvard Medical School.
The field has benefited enormously from efforts across computer science involving a very large community of people. “Today, we can create tools that will run pretty much in whatever web browser you’ve got, and you don’t have to worry about whether people have the right software to use it, you know it’ll just work.”
This has allowed the visualization community to focus more on the problem at hand, rather than worry about building the infrastructure just to get the tools to work.
It’s a game changer, according to Marc Streit, Associate Professor at Johannes Kepler University Linz, and CEO of Datavisyn, which develops visualization tools for pharma and biomedical R&D: “The move to the web has also brought cloud technology, which is particularly critical for pharma companies, who want to pull data from multiple in-house and public databases and access everything in the same system.”
Streit is working with pharma on tools that combine clinical and experimental data from patients on trials, public data like The Cancer Genome Atlas, in-house cell culture experiments, and metadata. “This fusion is now happening in the browser of the user, and can pull information from multiple servers simultaneously.”
3D data visualization: Plug and play
Being able to render 3D data very quickly in a browser is not the only thing helping to create better visualization tools. There are now many more standard components available that researchers can use rather than creating everything from scratch.
“People are explicitly designing now with the intention that we will provide a platform that can be extended towards different ends”, says Jeremy Goecks, Assistant Professor of Biomedical Engineering at Oregon Health and Science University. “The intention is to provide a common core set of functionalities and then allow people to extend that.”
A key advance is ‘virtualization’, he says, where you have either fully virtual operating systems or computing clusters that work like a container, holding all the software that you need to run a visualization in one place. You just plug this container in and it’s already been configured for you. “It makes it so much easier to work with these tools, because bioinformatics has lots of dependencies – some are written by professional software developers, but many are not, and these virtualizations make it much easier to plug them together.”
Data visualization for exploration
One of the reasons visualization is making a comeback is its potential to aid exploration and generate new insights. There are two worlds in data visualization, Streit says. One uses visualization for presentation, to explain something that you already know. The other uses it for exploration.
“Most of the people who hear data visualization think about presentation and not exploration. Pharma R&D is of course interested in presentation because it’s necessary to present findings to stakeholders. But the other world is becoming even more interesting to them because they want to find new insights, new targets for their drug.”
Traditionally, data visualization has been predominantly used to create static plots, he explains. But these days, people want to interact with the data. They’ll spot something interesting in the visualization and want to be able to drill down to see the detail. That’s something that wasn’t possible with the older tools.
Interactive visualization could also play a key role in tackling the data reproducibility crisis, says Streit. “New systems can track what the user does during data exploration, documenting every step. You can recall the information later and use it to share with others and explain what you’ve found, and importantly, how you found it.”
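To make the idea of tracking exploration steps concrete, here is a minimal sketch in Python. It is not how any particular tool works: the `ExplorationLog` class, the action names `filter_min` and `sort_desc`, and the gene data are all invented for illustration. The sketch only shows the general principle that each interaction is recorded as a small description of what was done, so the same sequence can later be shared and replayed on the original data.

```python
import json


def apply_step(rows, step):
    """Apply one recorded step to a list of row dictionaries."""
    # The two action names below are hypothetical; a real tool would
    # record many more kinds of interaction.
    if step["action"] == "filter_min":
        return [r for r in rows if r[step["column"]] >= step["value"]]
    if step["action"] == "sort_desc":
        return sorted(rows, key=lambda r: r[step["column"]], reverse=True)
    raise ValueError(f"unknown action: {step['action']}")


class ExplorationLog:
    """Records each exploration step so the session can be replayed later."""

    def __init__(self):
        self.steps = []

    def record(self, action, **params):
        # Store a description of the step, not its result.
        self.steps.append({"action": action, **params})

    def to_json(self):
        # A plain-text form that can be saved or sent to a colleague.
        return json.dumps(self.steps)

    @staticmethod
    def replay(rows, serialized):
        # Re-run every recorded step, in order, on the original data.
        for step in json.loads(serialized):
            rows = apply_step(rows, step)
        return rows


# Invented example data, for illustration only.
rows = [
    {"gene": "GENE_A", "expression": 2.5},
    {"gene": "GENE_B", "expression": 7.1},
    {"gene": "GENE_C", "expression": 5.0},
]

log = ExplorationLog()
log.record("filter_min", column="expression", value=4.0)
log.record("sort_desc", column="expression")

shared = log.to_json()                      # what gets shared with others
result = ExplorationLog.replay(rows, shared)  # how they reproduce the view
```

Because the log stores the steps rather than the final picture, a colleague who receives `shared` can see not only what was found but how it was found, which is the point Streit makes about reproducibility.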
Bridging the gap
Data visualization is the primary way that we bridge the gap between the computer and the human investigator, says Gehlenborg. And this gap is only getting bigger as new data and models emerge. “What we’re seeing now with machine and deep learning, is that people have these great predictive models but the problem is how to interpret what the model is doing.”
This is a big topic for the data visualization community overall, he explains, and one where progress is needed at the interface between the researchers building the algorithms and those doing the visualization. “We need to bring all these systems together and allow developers and analysts to interpret data in a comprehensive fashion rather than looking only at small-scale problems.”
There are also new consumers for visualization: How will a clinician want to look at data? How do you visualize biomedical data for patients?
Goecks is a fan of dashboards that present several visualizations simultaneously, allowing users to drag and drop visualizations and seamlessly connect them together: “One visualization is probably not sufficient in many cases, even for one data type. If you can allow users to compose dashboards it will help ensure that the tools are usable, not only by technologists but by the end-users – the physicians, the biologists, the translational researchers. It’s got to be interactive so that we can dive down into the details because every physician who brings their expertise to the table will want to look at that data slightly differently.”
There is a need for real creativity to produce these more sophisticated visualizations, says Streit: “You need to keep thinking about clever strategies to visualize your data. It would be nice if, as the user, you could type in or ask the system ‘I want to see the distribution of gene expression in my data’, and the system could come up with the perfect visualization because you formulated the question. But we’re not there yet. That will be the challenge for the next ten years.”