Desperately seeking Supercomputer


Alces Flight user Mike Croucher (aka Walking Randomly) had a dilemma. He'd set up a training course for NAG, the Numerical Algorithms Group based in Oxford. He'd planned the course on parallel programming using MPI and had assembled the list of experts who needed to be trained. He had the materials, he was familiar with the course, and, most importantly, he knew where to find the best coffee.

What he didn’t have was a supercomputer.

A few days before the course, one of the trainers contacted him asking for details of the HPC cluster they would be using. In that moment he realised that, despite all his planning, he'd forgotten one crucial element: the compute resources to train the experts on.

Mike considered his options. The HPC cluster he and his colleagues regularly used at Sheffield's Research Software Engineering Group wouldn't be available. He needed 128 cores and a set of training accounts for a few days, and all he had requested was a budget for coffee.

That's when inspiration struck. Pooling his resources, he set about building a supercomputer with a training environment using Amazon Web Services and Alces Flight. Because Flight can quickly spin up an HPC environment and be configured to look and feel just like the resources Mike was familiar with, he was able to focus on making sure his users got their training while he stayed within his coffee budget.

RESULT: He managed two cups of coffee per day, plus two each for his instructors, and the users got their full training in parallel programming.

Mike was kind enough to write about his experiences on the blog he keeps about his adventures in supercomputing. You can read about his initial panic at forgetting to request resources here, followed by how he plugged AWS and Alces Flight together to make it work here. If you are interested in seeing exactly how he did it, he has also made his scripts available on GitHub. (He'll also happily remind you that this cluster was set up for training and might not be suitable for a production environment.)

The moral of Mike's story is simple: public cloud is a viable option for HPC in a pinch. Access to the cloud lets users quickly build resources that would otherwise be out of reach. In Mike's case, he was able to create a safe training environment for his users when he needed it and dismantle it once the class was complete. And because he built in the cloud with Alces Flight, his resources scaled down to only what was necessary whenever the class wasn't in session, which kept him well within his coffee budget.

The crew here at Alces Flight were happy to hear that we could help one of our dedicated users in their time of HPC need. If you find yourself needing HPC resources quickly, why not check out the range of options on our website?

Written by

Software for research computing
