20 Years On: The Evolution of HPC
The following article is an opinion piece written by Russell Slack. The views and opinions expressed in this article are those of the author and do not necessarily reflect the official position of Technology Networks.
High performance computing (HPC) is at the core of research institutions and engineering businesses around the world. However, only in the last decade has it gathered the momentum that has brought it to where it is today, revolutionizing the way we solve the world's most complex problems.
Looking at the evolution of HPC, around 20 years ago it was still in its embryonic stage and only just starting to find its feet. Environments were much smaller in scale and very much a hidden secret, tucked away in a cupboard.

An HPC system was a relatively small resource that a handful of users would use for a particular field of research. It was also very manual. The components were not specifically designed for the tasks they were being used for. Things like GPUs for accelerated computing and high-performance interconnects based on open standards, like InfiniBand, simply did not exist at the time.
There has been an incredible transformation since then, so let's go back to where it all started.
At the beginning
In the early days, a typical HPC environment was a Beowulf cluster, essentially consisting of first-generation Intel "pizza box" servers with 100 Mb/s network cards wired between them and a very rough cut of Red Hat Linux running on every server.

Back then, it was a challenge to build and stabilize these clusters, especially when you included the cutting-edge high-speed interconnects between the servers, which were a nightmare to cable. From a software perspective, the tools used to discover, orchestrate and manage the system were very immature, and the applications running on top were largely developed by the customers themselves.
HPC was also a research tool that only the most advanced Linux users had any use for. The tooling was relatively undeveloped and mainly open source. HPC technology was also less predictable in terms of which CPU architectures or network interconnects were on the horizon. Certainly, concepts such as GPU computing or cloud bursting for HPC did not exist when the first HPC systems were being built.

So, HPC was very much in its infancy, and that brought its own challenges with regard to stability, performance and fine tuning. However, once the issues were ironed out, early adopters of this type of computing made some great breakthroughs in their quest to solve complex problems.
Nowadays, the open-source community and commercial software providers have put huge amounts of effort into providing tooling for everyday systems administrators and end users, both of whom can now get up and running with the service quickly. This ease of access, together with powerful management tooling, makes it possible to use every spare cycle of a system. These mature and widely supported tools now allow very high levels of utilization, and give the user base the flexibility to change things on the fly to suit the dynamic nature of their work.
Data management has been totally transformed over the years. Storage was initially attached directly to one system in the cluster, but with storage area networks and the internet, data is now readily shareable, which promotes collaborative working. Architectures are also very different from how they were years ago. We used to work with big, cumbersome pieces of equipment, and when they broke they were irreplaceable. Now, if one part of a cluster breaks, that part can easily be replaced. Twenty years ago the internet was not pervasive, but now we can offer our customers the benefit of HPC in the cloud, where you don't even need to own on-premises infrastructure.
Headaches of heat
There is a lot more packed into a smaller space nowadays. That is great in that customers no longer need a huge amount of data centre space to house these beasts, but it creates headaches: the system needs a lot of power to feed it, and all this super-dense computing infrastructure generates a massive amount of heat.

On the latter point, heat management has been interesting to watch over the years. In the past, we had to fit big, clunky air baffles to IBM BladeCenters in an attempt to channel red-hot air out of clusters. These evolved into massive exhaust ducting and then into IT rack rear-door heat exchangers, where water is now brought very close to the IT infrastructure. In the last few years, we have been installing HPC systems with direct water cooling straight onto the CPU and memory.
It's also hard to believe that, when HPC was in its infancy, a support contract for an HPC solution wasn't even a thing; customers didn't ask for one. How times have changed. Traditionally, the work was an infrastructure-only build, after which the customer took over management and support.

This has evolved over the years as customers have required dedicated technical support to be available whenever they have service issues, and more recently it has evolved further into a requirement to provide managed services.
It’s only the beginning
As technology and understanding have developed, the use cases for HPC have evolved. HPC has come a long way and is now far more of a service for all intensive computing needs, rather than a system perceived as something only the most advanced computing experts could get to grips with and use effectively.

We now have access to the latest emerging technologies, cooled by innovative direct-to-silicon water cooling, which enables highly dense HPC solutions including GPUs and dedicated silicon. Thanks to these highly efficient designs and the progress of management software, these solutions are now used across entire organizations, including in disciplines that previously either had no access to such resources or did not know about them. HPC is now often one of the biggest research tools an institution possesses, a true enterprise offering, and it will only progress further over the next 20 years.
About the author:
Russell Slack is managing director of OCF.