Scientists have used supercomputers to help understand the biological, chemical and physical changes our world is going through, to discover new frontiers in science, and to invent new technologies to improve the human condition. Now, researchers from the University of Victoria (UVic) on British Columbia’s Vancouver Island are using supercomputing resources to address one of society’s biggest challenges – the risks of an unregulated drug market – and help change the lives of those impacted by overdose. Their work shows how these cold, complex machines can have personal impacts on our lives and opens doors to greater possibilities for our safety and security.
The Arbutus HPC cloud
The Arbutus phase 2 HPC cloud was recently deployed at the University of Victoria (UVic). Part of a Compute Canada and WestGrid initiative, Arbutus was initially launched in 2015 to support a new generation of investigators who needed access to HPC cloud resources. “Our existing IT services at the time did not have the infrastructure that could provide the answers to some of our researchers’ advanced computing needs,” commented Belaid Moa, advanced research computing specialist with the Research Computing Services unit, University Systems Department. “We had HPC clusters, but researchers were in dire need of high-availability collaborative platforms, customized web sites, root access, micro-services environments and other services of cloud computing, which was rapidly becoming as important as HPC clusters and an essential ARC service for many researchers.”
Because UVic lacked cloud services at the time, some researchers were running projects on commercial cloud platforms from Amazon, Microsoft (Azure) and Google. Based on this demand, the private Arbutus OpenStack cloud infrastructure was built in 2015 to deliver infrastructure-as-a-service (IaaS) resources and support a diverse library of workloads. Arbutus 1 included several thousand cores of Intel Xeon processors, 10 gigabit Ethernet networking and 1.6 petabytes (PB) of triple redundant Ceph storage (4.8 PB total). Ceph is an open-source software storage platform that implements object storage on a single distributed computer cluster.
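The storage figures above follow from simple replication arithmetic. A minimal sketch, assuming "triple redundant" means every object is kept as three full copies (the function name is illustrative, not part of Ceph):

```python
def raw_capacity_pb(usable_pb: float, replicas: int) -> float:
    """Raw disk capacity needed when every object is stored `replicas` times."""
    return usable_pb * replicas


usable_pb = 1.6  # usable capacity quoted for Arbutus 1
replicas = 3     # assumption: "triple redundant" read as three full copies

raw_pb = raw_capacity_pb(usable_pb, replicas)
print(f"{usable_pb} PB usable x {replicas} copies = {raw_pb:.1f} PB raw")
```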
Over the following four years, new research projects were launched, many of which began using emerging technology capabilities and research environments, such as machine learning (ML), artificial intelligence (AI), JupyterHub and big data. These new projects, along with increasing demand for cloud services, required more storage, advanced computing and larger memory pools – leading to a larger cloud infrastructure and Arbutus 2.
Arbutus and the Vancouver Island Drug Checking Project
One of the researchers at UVic taking advantage of these new resources is Professor Dennis Hore of UVic’s Chemistry and Computer Science Departments.
“For the last 20 years, I’ve been studying how molecules interact with surfaces,” Hore stated. “As an example, there are many types of plastics used in the human body – catheters, stents, sutures, artificial organs – the conformation of proteins attached to these devices is of paramount importance to their function. Our group tackles questions related to resolving the molecular basis of biocompatibility using a combination of experimental and theoretical approaches, including molecular dynamics simulations on HPC clusters.”
But an inquiry from a harm reduction pharmacist inspired a new project to help reduce overdoses among users of non-prescription street drugs, including fentanyl, the compound at the heart of the opioid crisis in Canada and the United States.
“The pharmacist wanted to develop an in-house quality control test for prescription medication,” Hore explained. “While he had been acquiring the particular drug for years from the same manufacturer, his customers were telling him it was affecting them differently from past usage. The goal of identifying components in drugs ultimately led to the Vancouver Island Drug Checking Project.”
Gaining insight into street drugs with machine learning
Over the past three years the project has combined Hore’s core area of research with his interests in big data, machine learning, software and hardware engineering. The project is providing new data about the use of opioids and other drugs.
“People want to know what is in their drugs,” Hore added. “How much fentanyl, MDMA (Ecstasy/Molly), morphine, or other chemicals and cutting agents. We work anonymously with people to inform them about the makeup of drugs they voluntarily bring in for analysis. Our goal is to quantify the constituents of the sample, both active ingredients and cutting agents. Once we have the analysis, harm reduction workers let service users know what they’re dealing with and give them appropriate guidance if they want it.”
In 20 minutes, the team runs each sample through a host of analytical instruments, including a mass spectrometer, an infrared spectrometer, two different types of Raman scattering and antibody test strips. Besides the potential hardware issues of blitzing the analyses through all these instruments and tests in parallel in a short time, understanding the results is difficult. Chemical agents leave fingerprints, and Hore’s team relies on libraries and databases of known agents to which they match the fingerprints. But the more compounds in the sample or the larger the library, the more difficult it becomes to match. And as new designer drugs for which there is no known fingerprint hit the streets, such as those based on fentanyl, the analyses become even more complicated. The project needed the resources of cloud HPC. That’s where Arbutus came in.
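To make the idea of fingerprint matching concrete, here is a minimal sketch that ranks library entries by cosine similarity to a measured spectrum. It is an illustration only, not the project's method: the compound names, the six-point intensity vectors and the function names are all made up, and real spectra would need preprocessing and would often be mixtures of several library entries.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library_matches(sample, library):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = {name: cosine_similarity(sample, ref) for name, ref in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference fingerprints on a shared six-point grid.
library = {
    "compound_a": [0.0, 1.0, 0.2, 0.0, 0.0, 0.1],
    "compound_b": [0.9, 0.0, 0.0, 0.3, 0.0, 0.0],
    "cutting_agent_c": [0.0, 0.0, 0.1, 0.0, 1.0, 0.4],
}

# Hypothetical measurement: mostly compound_a, with a little noise.
sample = [0.05, 0.95, 0.25, 0.0, 0.1, 0.1]

ranked = rank_library_matches(sample, library)
best_name, best_score = ranked[0]
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Every library entry has to be scored against the sample, so the cost grows with the size of the library, and a mixture no longer resembles any single reference closely. This is one way to see why larger libraries and multi-component samples make matching harder.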
“When a researcher requests an environment, what we consider their own virtual lab,” Moa explained, “we set up the network and hardware to support their work. They can then create their own environment in minutes from a library of tools and software with or without our support.”
Each researcher customizes his or her virtual lab. Some only need highly available web sites to collect and/or share data. Others install big data applications, like Apache Spark. Some are running small-scale HPC workloads, including GROMACS, the molecular dynamics software used for studying things like the SARS-CoV-2 virus. And others build machine learning platforms. Hore’s project required a mixed environment for many workloads.
“We use the data we collect from each sample along with chemical libraries and databases to build machine learning algorithms and applications,” Hore explained.
His team is using everything from unsupervised methods, such as principal component analysis and hierarchical cluster analysis, to supervised methods, such as partial least squares regression and random forest classification, along with other approaches. According to Hore, there are too many possibilities for a single pipeline.
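Several of the methods named above can be tried out with scikit-learn. The following is a minimal sketch on synthetic data, not the project's pipeline: the two "compound" classes, the 50-channel spectra and the noise level are assumptions chosen only so the example runs quickly. Partial least squares regression is left out for brevity.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for spectra: two classes, each with one strong peak.
n_per_class, n_channels = 30, 50
base_a = np.zeros(n_channels)
base_a[10] = 1.0
base_b = np.zeros(n_channels)
base_b[35] = 1.0
X = np.vstack([
    base_a + 0.05 * rng.standard_normal((n_per_class, n_channels)),
    base_b + 0.05 * rng.standard_normal((n_per_class, n_channels)),
])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Principal component analysis: compress each spectrum to two scores.
scores = PCA(n_components=2).fit_transform(X)

# Hierarchical (agglomerative) cluster analysis on the PCA scores, no labels used.
clusters = AgglomerativeClustering(n_clusters=2).fit_predict(scores)
# Cluster ids are arbitrary, so compare up to swapping 0 and 1.
agreement = max(np.mean(clusters == y), np.mean(clusters != y))

# Random forest classification: train on even rows, test on odd rows.
train = np.arange(len(y)) % 2 == 0
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X[train], y[train])
accuracy = forest.score(X[~train], y[~train])

print(f"cluster/label agreement: {agreement:.2f}")
print(f"random forest test accuracy: {accuracy:.2f}")
```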
“The idea here is to be able to do the full stage of online machine learning,” added Moa. “We learn from existing samples, use that learning to describe the next samples, predict outcomes, and prescribe actions based on what we’re learning.”
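The learn-then-predict loop Moa describes can be sketched as a model that is updated one sample at a time. This is a minimal illustration using a hand-written logistic regression on synthetic two-feature data. The class name, the learning rate and the data generator are assumptions, and the project's actual models are not described in this article.

```python
import math
import random


class OnlineLogisticRegression:
    """Logistic regression updated one sample at a time by gradient descent."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, label):
        # Gradient of the log loss for a single sample is (p - label) * x.
        error = self.predict_proba(x) - label
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def make_sample():
    """Synthetic sample: class 1 clusters near (1, 0), class 0 near (0, 1)."""
    label = random.randint(0, 1)
    centre = (1.0, 0.0) if label else (0.0, 1.0)
    return [c + random.gauss(0, 0.1) for c in centre], label


random.seed(0)
model = OnlineLogisticRegression(n_features=2)

n_samples = 200
correct = 0
for _ in range(n_samples):
    x, label = make_sample()
    prediction = int(model.predict_proba(x) >= 0.5)  # describe the new sample first
    correct += prediction == label
    model.learn_one(x, label)                        # then learn from it

running_accuracy = correct / n_samples
print(f"predict-then-learn accuracy: {running_accuracy:.2f}")
```

Scoring each sample before the model has seen its label gives an honest running estimate of how well earlier samples describe the next one.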
But that requires a flexible yet powerful computing infrastructure like Arbutus, one that can run many different types of workloads on a single system.
Built with 208 Lenovo ThinkSystem SR630, SR670, and SD530 nodes, Arbutus Phase 2 was deployed in early 2020. The new system comprises an addition of nearly 8,000 cores of 2nd Gen Intel Xeon Gold 6248 processors and Intel Xeon Gold 6130 processors. One terabyte of Intel Optane persistent memory in each node provides the memory capacity to support an increasing number of researchers and persistent workloads that run 24/7—sometimes indefinitely.
According to Moa, Arbutus 2 allows users to choose from different machine learning environments (such as TensorFlow, PyTorch, Julia, Pandas, and Apache Spark). These environments rely on Conda distributions. The Conda distribution, an open-source package manager and environment management system, uses the Intel Math Kernel Library for low-level operations when using Python packages, such as NumPy, SciPy and scikit-learn.
Designing the future of remote, ML-based analysis
The research Hore and his team are doing in Victoria has potential across many more applications, such as remote healthcare, where findings could lead to the development of portable devices and kiosks that can quickly and interactively analyze chemical compounds. This would be particularly useful at remote sites where analysis resources and computing facilities do not exist. This remote analysis, using machine learning, can then provide rapid insight about the makeup of the sample and give guidance to those seeking its analysis.
Within traditional research spaces, such as Hore’s project, researchers have the benefits of a large lab, expensive instruments, and trained operators. Scientists do the analysis, ask questions, learn from the data and evolve the algorithms. To do this kind of analysis as a mobile or portable service requires rethinking the technology, including access to computing resources.
For his drug analysis project, Hore has envisioned portable compute-based solutions that integrate analysis with online machine learning and prescription, such that they remove human bias and can be deployed to remote regions with access to cloud resources.
“With the emergence of cloud-based home and personal assistants, people are used to talking to computers. They are learning to trust the unbiased guidance of the technology. A recently funded project seeks to build an interactive kiosk, where people can bring their samples for analysis, and the computer provides guidance based on science, without gender or race bias or bias based on their answers to questions.”
But going small and mobile introduces new challenges. “If you want to do this using mobile technology, the instrument has to run off batteries and rely on remote cloud-based sources,” Hore explained. “It has to be small, so it can be carried around in the trunk of a car or be handheld and able to do multiple analyses, while also being affordable to build. Right now, we still have a way to go in terms of having good technologies for mobile and portable applications. Some instruments are large, the operators highly trained, and the computing resources powerful and accessible. Thinking about that type of deployment makes us go back to square one and rethink how to engineer the instruments.”
Ken Strandberg is a technical storyteller. He writes articles, white papers, seminars, web-based training, video and animation scripts, and technical marketing and interactive collateral for emerging technology companies, Fortune 100 enterprises, and multi-national corporations. Mr. Strandberg’s technology areas include Software, HPC, Industrial Technologies, Design Automation, Networking, Medical Technologies, Semiconductor, and Telecom. He can be reached at firstname.lastname@example.org.
This article was produced as part of Intel’s editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC and AI communities through advanced technology. The publisher of the content has final editing rights and determines what articles are published.