4 Ways That the Cloud is Changing Research
Cloud-based informatics solutions have become an integral part of research. The cloud is everywhere these days, and every new informatics tool seems to have some cloud feature or function. But what does this actually mean for researchers? With this handy list, you’ll find that the answer is “quite a lot”, as we explore four key ways in which the cloud has changed research.
A definition of the cloud
Before we get started, it’s important to lay out exactly what we mean by cloud computing. Cloud computing makes IT services and resources available to a user on demand, without those resources having to be directly managed or hosted by that user. Cloud platforms are typically characterized by the huge scale of the resources they provide, which takes advantage of economies of scale; their elasticity (they can grow or shrink depending on the needs of the user); and the sharing and pooling of resources between users.
Cloud computing is typically divided into three types of service. Software-as-a-service (SaaS) models provide applications and software on a pay-as-you-go basis. Platform-as-a-service (PaaS) provides the development tools needed to design bespoke apps within the cloud infrastructure, whilst infrastructure-as-a-service (IaaS) lets users leverage the power of high-performance computing through the cloud. To learn more about how cloud-based informatics tools are designed, download our infographic, available here. For now, let’s delve into four ways that the cloud is changing research.
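To make the IaaS idea concrete, the sketch below shows what “resources on demand” looks like in practice: a compute instance is provisioned only for the duration of a job and then released. It is a minimal illustration assuming an AWS account with the boto3 library configured; the region, AMI ID and instance type are placeholder values, not recommendations.

```python
# Minimal sketch of the IaaS model: provisioning compute on demand with boto3
# (the AWS SDK for Python). The AMI ID, instance type and region below are
# placeholders -- substitute values from your own account. Requires AWS
# credentials to be configured locally.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Spin up a single on-demand instance: capacity exists only while it is needed.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder image with your analysis stack
    InstanceType="c5.4xlarge",         # pay-as-you-go: pick a size to match the job
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched {instance_id}")

# ...run the analysis, then release the resources so billing stops.
ec2.terminate_instances(InstanceIds=[instance_id])
```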
Increased access to applications
Cloud technologies started with SaaS, and the last five years have seen these software models become commonplace, as was predicted by the European Commission’s Cloud Expert Group back in 2012. SaaS removes the financial burden of expensive licenses for labs and replaces it with a far cheaper pay-as-you-go model, allowing researchers to use the tools they need for the task at hand – and that task alone. PaaS models give researchers the tools to design their own applications using the infrastructure of the cloud provider (we’ll touch on the benefits of that infrastructure below). Applications offered on a pay-as-you-go model through the cloud now go far beyond basic word processors and email clients. Researchers can use vendor-hosted Laboratory Information Management Systems, Electronic Lab Notebooks and Chromatography Data Systems, and, with the flexibility of PaaS solutions, can leverage the cloud for specialized applications. The cloud can therefore provide applications that are beneficial across the entirety of biomedical science. You name it, there is a cloud-based tool for it: from medical imaging to electrophysiology to mass spectrometry.
Supercomputers for everyone
IaaS is possibly the cloud application with the widest appeal – rather than offering specialized applications, IaaS makes high-performance computing resources available to any scientist who might need them. This has had two major impacts on research. First, it has opened the door to big data projects, which are increasingly important across science. The omics explosion has made huge datasets far more common; a single whole-genome sequence takes up roughly 100 gigabytes, and analysing files of this size would be mind-numbingly slow for even the most patient researcher using classic computing methods. Instead, cloud-hosted tools like CloudAligner and CloudBurst allow research teams to take the load off their own infrastructure and split the effort across backend servers. This has made it possible for researchers to leverage datasets that would previously have been too unwieldy. The omics revolution has only been made possible by the concurrent boom in cloud technologies.
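The underlying pattern is simple scatter/gather: split a large sequencing file into chunks, process the chunks in parallel, and combine the results. The sketch below is a minimal Python illustration of that pattern, not CloudBurst’s actual API – a local process pool stands in for the fleet of cloud workers, and align_chunk() and sample.fastq are hypothetical placeholders for the real alignment step and input file.

```python
# Illustrative scatter/gather sketch of how cloud alignment tools parallelize
# work: split a large FASTQ file into chunks and process them concurrently.
# In a real cloud deployment each chunk would be dispatched to a separate
# worker node; here a process pool stands in for that cluster.
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

def read_chunks(fastq_path, reads_per_chunk=100_000):
    """Yield chunks of reads (4 lines per read in FASTQ format) from a large file."""
    with open(fastq_path) as handle:
        while True:
            chunk = list(islice(handle, reads_per_chunk * 4))
            if not chunk:
                break
            yield chunk

def align_chunk(chunk):
    """Hypothetical placeholder for aligning one chunk against a reference genome."""
    return len(chunk) // 4   # here: simply count the reads in the chunk

if __name__ == "__main__":
    # Scatter the chunks across workers, then gather and combine the results.
    with ProcessPoolExecutor(max_workers=8) as pool:
        total_reads = sum(pool.map(align_chunk, read_chunks("sample.fastq")))
    print(f"Processed {total_reads} reads")
```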
The second major impact of IaaS platforms is a huge democratization of science. Big data projects are no longer the sole reserve of major research institutes – although those institutes can still acquire seriously powerful in-house tech; the Blue Brain Project, a neural simulation initiative based at the Swiss École Polytechnique Fédérale de Lausanne, recently paid nearly $18 million for its new supercomputer, Blue Brain 5. With cloud computing, even if your budget doesn’t stretch to a couple of dozen petaflops, you can still run the experiments you need. A great example of the platforms the cloud enables is the Broad Institute’s FireCloud, an open platform for scalable data analysis that is accessible to any researcher with a Google account. FireCloud’s elastic nature means researchers can use as much or as little power as they wish, and the Broad Institute has also made its best-practice workflows available to users.
Databases for all your needs
Before researchers can start playing with cloud-based analytics applications and cloud-hosted supercomputers, they need access to data, and the cloud is playing an increasingly important role in providing that access. Data from large-scale genomics projects such as 1000 Genomes are available through cloud providers like Amazon Web Services (AWS). Even as the cloud opens up science, however, provision of cloud services remains an oligopoly, with Amazon, Google and Microsoft owning over half the market for cloud infrastructure between them. The NIH’s decision to open up its genetic databases to cloud storage in 2015 has only accelerated the growth of cloud databases. As researchers involved in the Pan-Cancer Analysis of Whole Genomes detailed in a recent Nature comment piece, if a researcher wanted to download the International Cancer Genome Consortium’s dataset, which weighs in at two petabytes – equivalent to just under seven years of full HD video recording – then a typical university internet connection would have them waiting more than 15 months. Cloud technologies are the best way to store and download files in the new era of big data.
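Public cloud datasets like 1000 Genomes can be browsed where they live rather than downloaded wholesale. The sketch below assumes the boto3 library and AWS’s public “1000genomes” Open Data bucket (the “release/” prefix is an assumption about its layout); it lists a handful of objects anonymously, with no AWS account required.

```python
# Sketch: browsing the public 1000 Genomes dataset hosted on AWS S3 without
# downloading it. Anonymous (unsigned) requests are used, so no credentials
# are needed. Bucket name and prefix are assumptions about the public layout.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List the first few objects under an assumed release prefix.
resp = s3.list_objects_v2(Bucket="1000genomes", Prefix="release/", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(f'{obj["Size"]:>15,}  {obj["Key"]}')
```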
Sharing can become a reality
Building up huge piles of results and sitting on them like a data dragon doesn’t do anyone much good. Sharing data efficiently and effectively is essential in modern-day research, and the cloud makes that process far easier. Cloud innovation is a key part of the NIH’s BD2K initiative, which aims to maximize biomedical sciences’ digital potential, whilst the European Open Science Cloud (EOSC) is a Europe-wide project to create a data commons for scientists. Whilst for the EOSC the term “cloud” is largely a metaphor for open data, cloud technologies will be essential to the project’s goal of making data, and the resources needed to analyse that data, accessible across the continent.