The Rise of On-Demand High Performance Computing (HPC)

High Performance Computing (HPC) is changing. Not just because it’s maturing and becoming increasingly pervasive across many different areas of business, but because it’s becoming accessible to a new group of users.

The future of HPC might not be just on premise.

There will always be a race for the fastest: the quickest processor, the lowest latency, the largest storage system. That competition drives the IT industry just as NASCAR and Formula One drive the automotive world. The performance race can never be won; today’s champion is tomorrow’s runner-up. If I’m running a Formula One team then I might still worry about the last few cycles per second in order to get ahead of the competition. There might be a tangible benefit from many hours and days invested in careful selection of a hardware platform, custom development of an HPC software stack, and endless optimisation of my batch jobs. Perhaps it’ll even help me win a race that my team would otherwise have lost.

For an increasingly large section of computer users, flat-out performance isn’t the driver any more. Who cares if your laptop has a 2.2GHz processor or a 2.6GHz one? Does my desktop need 4GB of video memory or 6GB? Those questions might be important for some users, but many of us will just choose the biggest number we can afford, or possibly the colour that matches the other technology on our desk. There is still a place for optimisation, but for the vast majority of technical computing users, inefficiencies in our usage of HPC creep in long before we submit our first batch job.

The average purchasing cycle for a compute cluster in academia is a frightening six months of elapsed time. Commercial purchasers often aren’t much better: following the typical, well-worn capital expenditure route means a lot of process to wade through. Even when your datacentre is prepared and your cluster hardware arrives, it can take weeks or months before users can run something. What often follows are rounds of operating system and application installations, iterating closer and closer to a system your researchers actually want to use. I’ve seen clusters take six months to be launched once installation has finished, taking huge bites out of the operational lifespan and technological relevance of your expensive hardware. Adding more delay to optimise your software to deliver 2% better throughput seems like lunacy.
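To put rough numbers on that trade-off, here is a minimal back-of-envelope sketch. The six-month launch delay and the 2% throughput gain come from the paragraph above; the 36-month hardware lifespan and the extra month spent tuning are purely illustrative assumptions, and the function names are hypothetical.

```python
# Back-of-envelope model of deployment delay versus tuning gains.
# ASSUMPTIONS (not from the article): a 36-month useful hardware lifespan
# and one extra month of delay spent on tuning.


def usable_fraction(lifespan_months: float, delay_months: float) -> float:
    """Fraction of the hardware's lifespan left for real work after a delay."""
    if lifespan_months <= 0:
        raise ValueError("lifespan_months must be positive")
    # Clamp the delay so the result always stays between 0 and 1.
    delay = min(max(delay_months, 0.0), lifespan_months)
    return (lifespan_months - delay) / lifespan_months


def relative_work(
    lifespan_months: float,
    delay_months: float,
    throughput_gain: float = 0.0,
) -> float:
    """Total work delivered, relative to a cluster with no delay and no tuning."""
    return usable_fraction(lifespan_months, delay_months) * (1.0 + throughput_gain)


if __name__ == "__main__":
    lifespan = 36.0      # assumed lifespan in months
    launch_delay = 6.0   # launch delay described in the text
    tuning_delay = 1.0   # assumed extra month spent optimising
    gain = 0.02          # 2% throughput improvement from the text

    launched = relative_work(lifespan, launch_delay)
    tuned = relative_work(lifespan, launch_delay + tuning_delay, gain)

    print(f"Launched after 6 months, untuned: {launched:.1%} of ideal output")
    print(f"Launched after 7 months, tuned:   {tuned:.1%} of ideal output")
```

Under these assumptions, the six-month launch delay alone costs about a sixth of the cluster's potential output, and the extra month of tuning loses more than the 2% gain gives back. A different lifespan or tuning time would shift the figures, but the delay term tends to dominate.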

There is now an alternative. Public cloud providers like Amazon Web Services (AWS) have been revolutionising the IT industry for the past few years, and now it’s the turn of HPC to receive its makeover. At Alces, we’ve been developing solutions for HPC clusters, specifically making them available to users quickly, simply and cheaply. Our new Flight environment can deliver an HPC cluster running on AWS, complete with batch scheduler and applications, in a matter of minutes, leveraging the most scalable resource available: the end-users themselves.

For the first time in HPC, users have been handed the keys and put squarely in the driving seat. With the ability to choose from more than thirty different compute node types, five different batch schedulers and more than 1,100 application, library, compiler and MPI versions, users can truly achieve self-service. Launching from AWS Marketplace, researchers can get exactly what they need near-instantly, allowing them to create and optimise repeatable workflows like never before. Free from the limitations of on-premise clusters, users on different sites can easily collaborate, launching compute clusters near where their data is stored, or where public datasets are available.

The cost for this convenience? Just pay for what you use. Flight Solo Community Edition is available for free, delivering a single-user cluster of up to 8,192 cores for just as long as you need it. There are no start-up fees and no long-term subscription models; just run your cluster when you need it, and stop it when you’re done for zero ongoing cost.

So what are people using this new self-service HPC environment for? In the next few months we’ll be catching up with users running workloads on AWS and asking them why they’re finding self-service HPC clusters so compelling. We’ll talk to researchers running life-sciences, engineering, and geographic workloads on AWS, uncovering the problems they’ve solved and publishing real-life data showing performance characteristics and cost-of-ownership for the compute and storage resources they’re using.

High performance computing has changed — it’s become the commodity we were hoping for: accessible, flexible and more cost-effective than ever. Follow our posts to discover how a new generation of users are making their own way in HPC and achieving a new kind of scalability and agility.
