CIUK’21: The Student Cluster Challenge begins with deploying AI solutions on commodity cloud

Teams Bristol/Bath and Durham kick off the second annual CIUK Student Cluster Challenge using a GPU cluster environment on Microsoft Azure.

The opportunity to develop and hone skills in High Performance Computing is the backbone of the second annual CIUK Student Cluster Challenge. Following on from our successful first year, the Alces Flight Crew returned with a series of challenges aimed at developing skills that are highly sought after in today’s market: working with GPUs, automating complex workflows, and leveraging software containers. Student teams were provided with an unconfigured HPC cluster and asked to install, set up and execute an AI workload that trained a neural network and used it to identify clothing from an image dataset.

“We once again went to the cloud to ensure that each team had their own cluster environment to work in,” said Cristin Merritt, Program Manager for Cloud HPC at Alces Flight. “Over the years we’ve worked with our clients to rapidly build and deploy project-based environments, such as the one we built for this challenge, to speed up project goals, educate end users, and enable greater capacity for their on-premises systems. This challenge showcased the different elements, both hardware and software, that go into building an HPC cluster environment, giving both teams the opportunity to get some real hands-on experience.”

Moving away from the traditional benchmarking exercises that dominate most HPC cluster challenges, Alces Flight approached this first challenge with the modern job market in mind.

“HPC is getting more complex — not less — and there is a big call for diverse skills in the field. We took the time to review the jobs market and look at where demand is strongest. For our team we saw skills gaps in machine learning, Research Software Engineering (RSE), and GPU configuration and management. These jobs have a huge mix of both technical and critical thinking skills and we wanted to highlight that to the students.”

The cluster environment was created on the Microsoft Azure cloud platform using the OpenFlightHPC open-source HPC toolkit and included the Slurm job scheduler. The students could use these tools to download, install and configure a Docker™ container holding the neural network software and the dataset to be used for training. The full cluster build included AMD EPYC-powered login and compute nodes, with NVIDIA Tesla T4 GPUs. A shared cluster filesystem mounted across all nodes enabled students to submit their test jobs for execution during the challenge.

The teams had two hours to complete as many of the challenges set before them as possible. Each team was assigned a dedicated mentor, and an overall support team ensured that the students got the most out of their time in the cluster environment. At the end of the challenge, Team Bristol/Bath took the win and scored the first 10 points of the overall event.

“Both teams really put everything into the first challenge,” Cristin said. “I look forward to seeing how they progress over the coming weeks, and we are excited to host one of their finale challenges in person at Manchester Central this December.”

The CIUK Student Cluster Challenge continues with four online challenges and an on-site finale series that will determine the final winner. If you would like to read more about the challenges as they progress, visit the CIUK website.

If you would like to see the finale and get the latest HPC news and updates, CIUK will be holding its annual event in Manchester, UK on December 9 & 10.

If you would like to talk to the Alces Flight team about how we build, manage and grow HPC clusters and cluster environments, you can read more or get in touch.
