Aurora Exascale System To Advance Cosmology Research: Building the Universe in a Supercomputer
Cosmology, the study of the universe, has advanced rapidly, but surprises remain and much is still to be discovered. Amazing new observations from sky surveys, combined with terrestrial experiments in particle physics, have produced a comprehensive descriptive model of cosmology. From this research, cosmologists know that small fluctuations early in the development of the universe were the seeds of the structure seen in the universe today, that dark matter exists, and that the universe is not only expanding but doing so at an increasing rate.
This is one area of research being targeted by scientists preparing for the upcoming Intel-HPE exascale supercomputer, Aurora, which will be housed at the U.S. Department of Energy’s (DOE) Argonne National Laboratory. Supported by the Argonne Leadership Computing Facility’s (ALCF) Aurora Early Science Program, a cosmology team led by Dr. Salman Habib, director of Argonne's Computational Science (CPS) Division and an Argonne Distinguished Fellow, is contributing to the revolution in cosmological research. With support from the Exascale Computing Project (ECP), the ExaSky team developed the Hardware/Hybrid Accelerated Cosmology Code (HACC) framework designed to run cosmology simulations on supercomputers. They are now evolving the code for Aurora and the nation’s other upcoming exascale systems.
“It is exciting that researchers can pose questions and find answers about the universe by running supercomputer simulations. The ability to do cosmological physics has expanded with the evolution of supercomputers used in the research. Using computers in the 1980s, scientists could barely carry out a computer simulation of cosmological structure formation. Supercomputers have improved by factors of millions in the field of computational science. We can now do things that we could not have imagined a few decades ago which enable a whole new way of doing research. In addition, using exascale supercomputers such as Argonne’s Aurora supercomputer will allow scientists to perform complete simulations of the universe as viewed by next-generation surveys, instead of just small parts of the observable universe,” states Habib.
Dark sky research
The HACC team’s research will connect some of the world’s largest and most detailed extreme-scale cosmological simulations with large-scale data obtained from the Legacy Survey of Space and Time (LSST) conducted at the Rubin Observatory, currently under construction in Chile. This astronomical survey will provide some of the most comprehensive observations to date of the visible sky. The simulations complement the observations by enabling unique probes of the universe using computations based on the underlying physics. By implementing cutting-edge data-intensive and machine learning (ML) techniques, this combined approach will usher in a new era of cosmological inference targeted at scientific breakthroughs.
The team’s “Dark Sky” research label refers to the fact that the universe is dominated by dark matter and dark energy (the cause of cosmic acceleration). Dark matter is unusual because it does not emit or absorb light, but it can be observed indirectly through its gravitational influence. Dark energy is a mysterious hypothetical component that exerts a repulsive pressure, behaving like the opposite of gravity and thereby causing the expansion rate of the universe to increase, as observations require. The team’s simulation models include dark matter and dark energy physics as well as familiar atomic, or visible, matter, because the dark and visible components interact via gravity. “We are trying to literally build a universe within the computer that is representative of what an observation can tell us. HACC software models have been constructed with the needed physics capabilities and can be run on an exascale supercomputer like Aurora. They describe the behavior of normal matter in a cosmological context – gas physics, hot gas, stars, galaxies, astrophysical explosions, and include aspects like injecting black holes,” says Habib.
Cosmology simulations of the universe using the HACC code
Simulations are currently performed on a variety of supercomputers capable of running HACC, including systems at national laboratories such as Summit at Oak Ridge National Laboratory, Cori at Lawrence Berkeley National Laboratory, and Theta at Argonne National Laboratory. But what excites Habib is the capability of exascale systems such as Aurora to run more complete simulations.
The HACC code is a cosmology N-body code framework developed by Habib’s team; it includes treatments of gravity and gas dynamics, with astrophysical mechanisms included as subgrid models. The code had its origins in 2008 at Los Alamos National Laboratory, driven by the prospect of running on the first petascale system. After the developers moved to Argonne National Laboratory in 2011, work continued on the code that eventually became HACC. Development of HACC is currently funded by DOE’s Office of Science as part of the ExaSky Applications Development effort within DOE’s Exascale Computing Project (ECP). The HACC code is written entirely by the team and places minimal reliance on external libraries or software.
The HACC code is unique because it is designed to run on every possible computer architecture at a high-performance level. “Using the results of supercomputer simulations run with HACC allows researchers to look at billions of objects in the universe, such as galaxies and how they are distributed. The purpose of the simulations is to model this large-scale structure, to make accurate predictions about the galaxy distribution, as well as the underlying distribution of mass and hot gas in the universe. Our team is able to make predictions for cosmological statistics with a high level of accuracy—down to the percent level of accuracy,” states Habib. The CosmoTools framework is a tools library within HACC that is used to enable extraction of relevant information at the same time as the simulation is running; it can also be used in post-processing mode. When the team runs complex simulations, they make the data available to other researchers via the ALCF Petrel service which allows researchers to share data at large scales.
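The idea of extracting information while a simulation is running, with the same analysis also usable afterward on saved data, can be illustrated with a minimal Python sketch. This is not CosmoTools or HACC code: the function names (`density_histogram`, `run_with_in_situ_analysis`), the parameters, and the simple drift-only particle update are all hypothetical and chosen only to show the pattern.

```python
from collections import Counter


def density_histogram(positions, box_size, n_cells):
    """Count particles per cell along x (a stand-in for a real analysis product)."""
    counts = Counter()
    for x, _, _ in positions:
        cell = int((x % box_size) / box_size * n_cells)
        counts[min(cell, n_cells - 1)] += 1  # guard against rounding at the edge
    return [counts[c] for c in range(n_cells)]


def run_with_in_situ_analysis(positions, velocities, n_steps, dt,
                              box_size, analyze_every, analysis):
    """Advance particles and call `analysis` every `analyze_every` steps.

    The update is a plain drift in a periodic box, a placeholder for the
    full physics update a real code would perform.
    """
    results = {}
    for step in range(1, n_steps + 1):
        positions = [
            tuple((p + v * dt) % box_size for p, v in zip(pos, vel))
            for pos, vel in zip(positions, velocities)
        ]
        if step % analyze_every == 0:
            # In situ: analyze the live particle data without writing it out first.
            results[step] = analysis(positions)
    return positions, results


if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
    vels = [(0.25, 0.0, 0.0), (0.0, 0.0, 0.0)]
    final, products = run_with_in_situ_analysis(
        start, vels, n_steps=4, dt=1.0, box_size=1.0, analyze_every=2,
        analysis=lambda p: density_histogram(p, 1.0, 2),
    )
    print(products)
    # Post-processing mode: the same routine applied to saved positions.
    print(density_histogram(final, 1.0, 2))
```

Because the analysis is an ordinary function of the particle positions, the same routine serves both modes; only the caller changes.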
“Simulations must be high-resolution and work together with the physics requirements. Our team must write code that captures all the needed physics. Simulations can be used to fine-tune the details of the physics models to determine the optimal parameter choices while running a sky survey simulation. When running a simulation campaign, it is possible to tune internal parameters, determine associated errors, and find out how to come up with solutions to mitigate any simulation issues,” indicates Habib.
The HACC research team works closely with ALCF staff to optimize how HACC runs on ALCF supercomputer hardware. This work is often useful for benchmarking the performance of supercomputers as they are being installed and for identifying and helping to address problems that need to be fixed before a system is finally accepted.
Testing HACC software to run on future exascale supercomputers
HACC was written specifically for accelerated systems and is designed to run on both CPUs and GPUs. The physics portion of the code consists of many particles interacting with each other through different interaction kernels, depending on the physics being modeled. These kernels are heavily optimized and run best on GPUs. In general, the code can run at greater than 50 percent of the available peak performance. HACC can be run with multiple programming models. For instance, it was already written to run using OpenCL, which was very helpful when getting ready for Aurora. The team is porting the kernels to Data Parallel C++ (DPC++) using Intel oneAPI compilers and libraries, which allow heterogeneous computing across various hardware choices. They will also take advantage of Intel’s Distributed Asynchronous Object Storage (DAOS), an open source, software-defined, scale-out object store that provides high bandwidth, low latency, and high I/O operations per second (IOPS). The team uses the Joint Laboratory for System Evaluation (JLSE) resources at Argonne to test HACC on new hardware from Intel. In addition, the team is working closely with Intel on both hardware and software updates.
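To give a sense of what a particle interaction kernel computes, here is a minimal Python sketch of a gravitational kernel that sums softened pairwise forces directly over all particle pairs. It is an illustration only, not HACC's implementation: the function name `pairwise_accelerations`, the Plummer-style softening, the default parameter values, and the unit choice `G=1.0` are all assumptions made for this example, and a production kernel would be far more heavily optimized.

```python
import math


def pairwise_accelerations(positions, masses, softening=0.01, G=1.0):
    """Return the gravitational acceleration on each particle.

    Uses direct O(N^2) summation with Plummer softening:
        a_i = G * sum_j m_j * (x_j - x_i) / (|x_j - x_i|^2 + eps^2)^(3/2)
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        for j in range(i + 1, n):
            # Separation vector pointing from particle i to particle j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # Each pair is visited once; the two forces are equal and opposite.
                acc[i][k] += G * masses[j] * inv_r3 * d[k]
                acc[j][k] -= G * masses[i] * inv_r3 * d[k]
    return acc


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    m = [1.0, 2.0, 3.0]
    for a in pairwise_accelerations(pos, m):
        print(a)
```

The inner loop body is the kind of small, arithmetic-heavy operation that maps well onto GPUs, since every pair can be evaluated independently. The softening length keeps the force finite when two particles come very close together.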
Challenges for future cosmology physics research
Writing code to work on supercomputers is complex. Habib states, “It is especially challenging to evolve the HACC code for an exascale system that doesn’t yet exist and is being continuously developed. Our biggest challenge is how to evolve the HACC software to keep pace with hardware evolution.”
“One of the things that we already see happening in the field of cosmological research is that artificial intelligence (AI) and machine learning (ML) techniques are being deployed and this will only increase as data sizes grow,” Habib continued. “I think there will be a confluence of conventional high-performance computing and AI/ML which will define the next generation of interesting work. AI/ML methods will be essential in dealing with science problems involving exceptionally large amounts of data.”
“Aurora is being positioned as the first supercomputer that can do high-performance modeling and simulations, data analytics, as well as AI/ML on the same machine. Our team is looking forward to using such a supercomputer and I think this is a sign of things to come,” he concluded.
The ALCF is a DOE Office of Science User Facility. Research for ExaSky was supported by the Exascale Computing Project, a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative.
This article was produced as part of Intel’s editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC and AI communities through advanced technology. The publisher of the content has final editing rights and determines what articles are published.