HPC Systems Engineer (W/M) – Open Position

The SCITAS platform (www.epfl.ch/research/facilities/scitas/) gives EPFL researchers and their partners access to infrastructure and expertise in High-Performance Computing (HPC). SCITAS also contributes to research and development activities that maintain EPFL’s reputation as a leading research facility.

Our infrastructure includes:

  • 2k+ compute nodes (CPU + GPU),
  • Large storage systems,
  • Deployment and configuration management tools,
  • A centralized platform for storing, visualizing and monitoring metrics and events.

And these ongoing developments:

  • 100+ nodes for remote 3D visualization,
  • Elastic compute solutions in the cloud,
  • Long-term archive systems.

To strengthen our team, we are seeking an HPC Systems Engineer. The successful candidate will join the Systems Team, which is responsible for the deployment, operation and evolution of the HPC services.

Working for EPFL means being part of a prestigious school that consistently ranks among the top 20 universities worldwide.

Main duties and responsibilities include :

As the ideal candidate, you will aim to automate recurring infrastructure operations and optimize existing architectures, following the FURPS+ model and agile methods.

More specifically, you will:

  • Design, build and deploy highly scalable scientific compute environments that can be provisioned in less than one hour and can grow with demand.
  • Design and manage a variety of storage solutions and the tools to monitor them.
  • Investigate and troubleshoot issues with hardware, operating systems, networking and scientific applications.
  • Develop automated tests to ensure that changes do not cause regressions.
  • Participate in the SCITAS selection process for the acquisition of next-generation HPC systems.
  • Support and train the user community.
  • Take a leading role in one or more of the activities described above.

Your profile :

You have strong technical skills for solving complex problems in a large-scale production environment. You can investigate issues independently while working in a collaborative team with a solid open-source culture.

Applicants must have:

  • Extensive experience in managing distributed GNU/Linux computing systems (Beowulf clusters), including services, low-latency networks and large-scale upgrade operations.
  • Comprehensive knowledge of distributed file systems used for large-scale computing clusters (GPFS, Lustre or BeeGFS).
  • Deep knowledge of Configuration Management tools, such as Puppet, Ansible or similar.
  • Experience with container technologies, including Docker and Kubernetes.
  • Experience with Infrastructure as Code and other development tools, including Terraform, Vault, Git and Jenkins.
  • Robust Python and shell scripting experience.
  • Ability to clearly document procedures with a focus on sharing knowledge.

Applicants should have:

  • Master’s or Bachelor’s degree in an applicable field.
  • Experience with workload management systems, such as Slurm, at large scale.
  • Experience with monitoring and alerting systems.

We offer :

What you can expect from us

  • We offer competitive salaries that take into account job profiles, skills and years of experience.
  • Employees are affiliated with EPFL’s advantageous pension system.
  • Based on a five-day week, the work week is 41 hours for full-time employees. Work schedules are mutually agreed upon by supervisors and staff members based on service requirements. Some flexibility is allowed.
  • In addition to public holidays, staff members are eligible for five to six weeks of vacation per year based on age.
  • EPFL offers family allowances. Employees receive a cantonal allowance plus a supplement from EPFL.
  • Day care centres located on campus give priority to the children of EPFL employees.
  • More than 100 sports activities are available at the Sports Centre at very attractive rates.

What you need to know before applying

  • At SCITAS we speak French and English fluently. We accept non-bilingual applicants willing to learn the other language.
  • Only candidates who applied through the EPFL website or our partner Jobup’s website will be considered.
  • Promoting equality between women and men in scientific careers as well as within the administrative and technical staff is an integral part of the policy of continued excellence implemented by EPFL.
  • During periods when EPFL imposes telework, you will work remotely from any location within Switzerland.
  • There will be no relocation assistance provided.
  • In response to the COVID-19 pandemic, all interviews will be conducted virtually.
  • Only selected candidates will be contacted. We appreciate the time spent on the application.

Start date :

As soon as possible

Term of employment :

Unlimited (CDI)

Work rate :

100%

Contact :

For additional information, please contact Mr. Antonio Russo by e-mail at antonio.russo@epfl.ch

Remark :

Only candidates who applied through the EPFL website or our partner Jobup’s website will be considered. Files sent by agencies without a mandate will not be taken into account.