HPC in Research: Analysing more data, more quickly
Article Jun 28, 2017 | by Jack Rudd, Senior Editor for Technology Networks
Credit: University of Bristol
For over a decade, the University of Bristol has contributed to world-leading and life-changing scientific research using High Performance Computing (HPC), investing over £16 million in HPC and research data storage. To continue meeting the needs of its researchers, who work with large and complex datasets, the University has commissioned a new HPC machine, named BlueCrystal 4 (BC4). Designed, integrated and configured by the HPC, storage and data analytics integrator OCF, BC4 has more than 15,000 cores, making it the largest UK university system by core count, and a theoretical peak performance of 600 teraflops.
To find out more about this new system and the work it will enable, we spoke to Simon Burbidge, Director of Advanced Computing at the University of Bristol.
What is BC4 and how does it build on the system it has superseded?
Alongside scientific theory and experimentation, High Performance Computing (HPC) is the third pillar of modern research. At the University of Bristol we have a rolling programme to continually update and invest in our HPC services. Every three years we talk to our users to find out what they need from a system, what they want, and what their future ambitions are. We then set out to procure a machine that meets those needs and provides as much power as possible within our budget.
BC4 is the culmination of a lot of hard work from the University and our HPC integration partner OCF. The team worked to design, integrate and configure the new system in collaboration with Lenovo, DDN and Intel. It's highly compatible with BC3, our previous HPC cluster, so our users can easily migrate work from the existing machine onto the new system, which is much more powerful: in the region of three to four times more so.
As well as being significantly faster, does BC4 enable any new applications?
BC4 includes 32 nodes with dual NVIDIA Tesla P100 GPUs, plus a GPU login node, which will really make a difference to certain research areas. In molecular dynamics, for example, the new system enables us to carry out much larger simulations, tackling much bigger atomic systems containing many more molecules.
GPUs are hard to program for computational use: you need to rewrite applications, which is complex and time-consuming, so you need to be a good programmer. But because of the inherent power of GPUs, and by using the parallel features of the chips, you can get a substantial speed-up over CPUs.
There are particular applications, like the Amber Molecular Dynamics Package and BUDE, that can take advantage of CUDA, the parallel computing platform and application programming interface (API) created by NVIDIA for its GPUs.
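To illustrate the kind of rewriting involved (a minimal sketch, not code from Amber or BUDE, and it needs NVIDIA's nvcc compiler and a GPU to run), a CUDA program replaces a sequential CPU loop with a kernel launched across thousands of parallel threads:

```cuda
#include <cstdio>

// CPU version: a single thread walks the whole array sequentially.
void add_cpu(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// GPU version: the loop disappears. Each of thousands of threads
// handles one element, and the GPU runs them in parallel.
__global__ void add_gpu(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Memory must be made visible to the GPU; unified memory
    // (cudaMallocManaged) is the simplest way to do that.
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    add_gpu<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Restructuring a real scientific code this way, so that its hot loops map onto independent GPU threads, is the complex, time-consuming work Burbidge describes.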
Could you tell us about some of the exciting research this new cluster is going to support?
BC4 played a pivotal role in a €1.8m study into Ebola, looking at the speed of the virus's evolution and the corresponding effect on vaccines, diagnostics and treatment. The capabilities of BC4 were invaluable to the research: it was used to analyse raw data on the virus in 179 patient blood samples to determine the precise genetic make-up of the virus in each case.
This allowed the team, led by Dr. David Matthews, Senior Lecturer in Virology at the University, to examine how the virus had evolved over the previous year, informing public health policy in key areas such as diagnostic testing, vaccine deployment and experimental treatment options.
This complex data analysis process took around 560 days of supercomputer processing time, generating nine thousand billion letters of genetic data in order to determine the virus's 18,000-letter genetic sequence for all 179 blood samples.
Dr. Matthews will be using BC4 to help with further research into Dengue fever and the Zika virus.
BC4 is also being used as part of the UK Biobank research into genomics. The data comes from real patients, whose genetics are studied to help identify possible common causes of diseases by searching through genetic structures and correlating them.
As you can imagine, that's very compute-intensive. The UK Biobank recruited 500,000 people aged 40–69 from across the country to improve the prevention, diagnosis and treatment of a wide range of life-threatening illnesses, including cancer, heart disease, stroke, diabetes, arthritis, osteoporosis, eye disorders, depression and forms of dementia.
BC4 will enable them to process many more datasets much more quickly.
Simon Burbidge was speaking to Jack Rudd, Senior Editor for Technology Networks.