Call for Presentations and Participation: hpc-ch forum on Virtualization

DATE AND LOCATION

hpc-ch forum on Virtualization
Thursday, May 20, 2021
Virtual Event

FORUM DETAILS

Description

Traditional HPC centres have mainly been used to solve scientific problems posed by the so-called “hard” sciences such as physics, chemistry, or astrophysics. Their infrastructures account for most of the compute and storage resources available on most university campuses.

However, the arrival of the life sciences, in particular the medical sciences, and more recently the human sciences has “generated” a new class of users for whom traditional tools (the command line, scripting in Bash or Python) are considered an insurmountable obstacle. Nevertheless, the data analysis needs of these neophyte users are growing exponentially.

Tools that try to simplify these tasks exist, but they remain outside the scope of the applications available on traditional HPC resources. This has led these researchers to install many powerful workstations under their desks that are never used optimally, are problematic to manage, and are not cost-effective.

These “novice” users are nevertheless the main group of researchers using scientific computing resources in generalist universities. It is important to address their needs by providing them with adequate resources in a timely manner, without burdening IT personnel with the management and support of heterogeneous and geographically scattered systems.

Key Questions

System virtualisation: The main goal of system virtualisation is to optimise the use of compute cycles and memory by pooling them in a centrally managed infrastructure. In which cases are virtualisation technologies relevant to the users described in the introduction? Which virtualisation technology do you use? How do you efficiently deploy workstations and servers? How do you adapt the software stacks to the different needs of the users? How do you manage the lifecycle of these VMs?
On-premises cloud technologies: Cloud technologies bring unprecedented ease of use to deploying applications on virtual infrastructures. However, on-premises cloud infrastructures are usually very complex and require a highly skilled team to manage and support the underlying systems. What are your use cases for on-premises cloud infrastructures? How do they compare with standard virtualisation technologies? In the rapidly evolving landscape of cloud technologies, which one do you use? Have you considered an exit strategy in case the selected product is no longer actively maintained, or is only provided commercially with outrageously expensive licensing fees?
Containers: The demand to support containers for scientific computing applications is growing. Several container technologies and orchestration platforms are competing with each other. To name a few, we can cite Docker, Singularity and Mesos for the container technologies, and Swarm, Kubernetes and Chronos/Marathon for orchestration. What are the most suitable container solutions for scientific computing applications? Which scientific computing use cases are better run on a container cluster than on a classical HPC machine? Do we need to manage two different clusters for classical HPC services and container services, or can we efficiently blend the two classes of computing services on the same cluster? Combining containers with GPU computing is an ongoing trend. NVIDIA supplies and supports, in its NGC Catalog, a set of curated, GPU-optimized containerized applications running on CUDA for AI, HPC and visualization. Should we build our GPU container platforms around NVIDIA-Certified Systems to (as promised by the vendors) seamlessly take advantage of the application packaging and distribution solution offered by NGC, or should we build our own vanilla GPU container clusters and packaging platforms?
File systems on demand: HPC infrastructures generally rely on massive central filesystems attached to all compute nodes in order to provide high throughput and IOPS. However, in some cases these filesystems themselves become the bottleneck that slows down data analysis. Some filesystem technologies make it possible to federate the local disks of the compute nodes into a transient filesystem, usually at the user's request, in order to keep that data traffic separate from the main storage. Do you think that such technologies are useful in an HPC environment? Which technologies have you tested, and for which use cases? A similar question arises on the container side. Docker, Mesos and Kubernetes support on-demand persistent storage through CSI (the Container Storage Interface). Vendors of several HPC storage products, such as IBM Spectrum Scale, WekaIO Matrix, BeeGFS and VAST Data, among many others, have created CSI drivers that connect their storage platforms to containers. This provides on-demand persistent file or block storage services to the containers. Is this technology mature enough to be used in a production environment? Is anyone using it and, if so, what are the use cases?