Can Supercomputers Break Down Data Silos?
Article Mar 28, 2018 | by Ruairi J Mackenzie, Science Writer for Technology Networks
The Iridis Supercomputer. Credit: University of Southampton
The race to build ever-more powerful supercomputers often seems somewhat superficial: a simple game of one-upmanship between research institutes and universities, in which academic performance and potential are measured in cores and petaflops. Announcements heralding a new supercomputer are often quite drab. They focus on the raw numbers rather than on how the machine actually benefits its academic home, which spends lofty sums buying and maintaining it.
The University of Southampton’s release about its latest hulking computer, Iridis 5, was a bit different. It dispensed with the dry figures (if that’s your thing, they are 20,000 cores, 1,305 teraflops and 251st place on the TOP500 supercomputer list) and instead highlighted the impact Iridis has on day-to-day research. The supercomputer’s power has helped scientists working with America’s Cup yacht racers model fluid dynamics, powered studies of how bacteria defend themselves against antibiotics and even allowed researchers to examine jaguar population dynamics. I therefore jumped at the chance to discuss how Iridis is used in more detail with Oz Parchment, Director of the University’s iSolutions IT support division.
Supercomputers: More Than Data Generation
Oz explains: “You put in a model [to a supercomputer] and you get some data out, and then you take that data and put it in a completely different environment to get some insight into what you’re doing. What we’re trying to focus on is supporting more of the research pipeline. So not just being able to generate staggering amounts of data for people to analyse, but actually helping them in their analysis.”
Oz continues: “We’re plugging Iridis into bioinformatics pipelines and plugging it into deep storage arrays where people can look at multi-petabytes’ worth of data, pulling that on and off the system. There’s data analytics, there’s visualisation. Where before we really had just a data generation engine, we’re now trying to use it to underpin the workflow of research. It’s quite a different approach for us.”
The drive to use supercomputers as complete research tools rather than mere number crunchers comes down to two factors: increased power and a spike in use by researchers.
Twenty years ago, Oz tells me, having a dozen people on the shared computing system would just about cripple the network. Now, Iridis supports a user base of around 1,100 researchers. The ability to support this research ecosystem means that more projects can take advantage of supercomputers: “Our researchers are wanting to do far more complex workloads than they’ve done before and need much more automation and infrastructure support.”
But Iridis, built by OCF, an integrator of high-performance computing, storage and data analytics systems, can do much more than support research pipelines. One innovation could see the technology uncover trends in data that would previously have gone undiscovered: Southampton have committed to trialling StrongLink, a new data management system from data solutions company Strongbox.
StrongLink makes it easier to store and manage the vast amounts of data that research produces. The program is vendor-agnostic and can bridge multiple data formats. Oz hopes that running StrongLink alongside a system as powerful as Iridis can make research more open and collaborative. The aim is to break down data silos: accumulations of unshared data isolated within a single department.
Oz comments, “One of the things that we’ve been trying to do over the last five years is get more value out of our data. When you think about the amount of simulations that are run on a supercomputer, and what data you have generated, oftentimes you don’t necessarily know all the value that’s locked away inside that data, you tend to have a view of what aspects of that data you’re going to want to look at. The rest of the data is just not looked at, simply because you haven’t thought about how you can use it.”
Breaking Down Data Silos
“One way StrongLink interested us was its cognitive component, the ability to look at and match up metadata at scale, which gets interesting when you combine that with different data infrastructures. Our setup currently includes large-scale tape stores, large-scale disc stores, some of that being active data, some of that being nearline data, some being effectively offline data. But then, by linking these into the [Iridis] framework, which StrongLink allows us to do, we can connect these various data lakes that we have across the research side of the organization, and begin to create an open data space for our community where people in one discipline can look through data and see what kinds of data are available in other communities.”
What’s certain is that Southampton are approaching their supercomputing setup in a way that could fully realize the system’s power. For Oz, Iridis’s key innovation is allowing the University to use data like never before: “We’re putting data at the heart of what we do, in terms of trying to manage that data and giving our people the best environment to be able to exploit and turn that data into insight and knowledge.”