Get “Up to Speed” with Cray Cascade Course
CSCS’ next flagship system, to be brought into production in April 2013, is a Cray XC30 (Cascade) with a peak performance of 750 Teraflops. It uses Intel Sandy Bridge processors together with a new network interface chip and an advanced interconnect topology. The CSCS system is currently the largest Cray XC30 in the world, with 2256 compute nodes, over 36,000 compute cores and 70 Terabytes of memory, and it offers a five-fold increase in system bisection network bandwidth compared to Monte Rosa.
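As a rough sanity check, the headline figures can be reproduced from the node count with a few lines of Python. This is only a back-of-the-envelope sketch: the node count comes from the announcement, while the per-node values (two 8-core sockets, a 2.6 GHz nominal clock, 8 double-precision operations per core per cycle with AVX, and 32 GiB of memory per node) are assumptions about the configuration, not figures stated here.

```python
# Back-of-the-envelope check of the headline system figures.
# Only NODES is taken from the announcement; every per-node value
# below is an assumption made for illustration.
NODES = 2256              # from the announcement
SOCKETS_PER_NODE = 2      # assumption: dual-socket nodes
CORES_PER_SOCKET = 8      # assumption: 8-core Sandy Bridge processors
CLOCK_GHZ = 2.6           # assumption: nominal clock frequency
FLOPS_PER_CYCLE = 8       # assumption: AVX, 4-wide DP add + 4-wide DP multiply
MEM_PER_NODE_GIB = 32     # assumption: memory per node

cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
peak_tflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0  # Gflop/s -> Tflop/s
memory_tib = NODES * MEM_PER_NODE_GIB / 1024.0              # GiB -> TiB

print(f"compute cores: {cores}")
print(f"peak performance: {peak_tflops:.1f} Tflop/s")
print(f"total memory: {memory_tib:.1f} TiB")
```

Under these assumptions the results land close to the quoted "over 36,000 compute cores", 750 Teraflops and 70 Terabytes; a different clock frequency or memory size per node would shift the numbers accordingly.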
This three-and-a-half-day course gives an introduction to the Cray XC30 at CSCS, demonstrates how to get the best performance out of the Intel Sandy Bridge processors and shows how to take full advantage of the interconnect. Experts from Intel will dive into the features of the Sandy Bridge processor and demonstrate the use of the Intel tools for generating optimised code. Cray specialists will show how to use their set of tools and numerical libraries, and how to take full advantage of the MPI libraries and communication strategies.
The course will be rich in hands-on practical sessions to demonstrate the tools and techniques.
Course tutors will be from Intel’s High Performance Computing Team (Christopher Dahnken and Michael Klemm), Cray’s Performance and Exascale Teams (Nathan Wichmann and Alfio Lazzaro) and CSCS’ own staff (Neil Stringfellow).
Prerequisites: Competency in Fortran, C or C++, combined with MPI and OpenMP. You will need to bring a laptop that can connect to CSCS machines via ssh and display output from applications using the X11 window system.
Day 1 (afternoon only) – Introduction to Piz Daint at CSCS
Cray will give an overview of the XC architecture in general and highlight major differences from previous XT/XE architectures.
CSCS staff will describe the particular configuration of the XC30 at CSCS – “Piz Daint” – the filesystems and external login environment. They will demonstrate best practices for using the Slurm and ALPS interfaces and show useful tools in the operating environment. Participants will have plenty of practical opportunity to familiarise themselves with the machine.
Day 2 (full day) and Day 3 (morning) – Intel tuition on optimisation for Sandy Bridge processors
Experts from Intel will give a deep dive into the Sandy Bridge architecture, demonstrate the usage of Intel tools including compilers and numerical libraries, and run practical sessions on getting the best performance through vectorisation and threading.
Day 3 (afternoon) and Day 4 (full day) – Cray tuition on optimising for the XC30 system
Cray specialists will describe the network architecture and demonstrate how to get the best from communication libraries. They will introduce the Cray programming environment, including compilers, performance analysis tools and optimised numerical libraries, and show how to get the best from the Sandy Bridge processors and the XC30 system as a whole using their tools.
Attendees will then have the opportunity to stay on for a fifth day to participate in the debugging course given by Allinea on the DDT debugging tool. Those who wish to do so should register separately for that event.
Parallel Debugging with Allinea’s DDT Course
When writing high-performance distributed memory applications that run on multiple nodes, potentially also exploiting hybrid multi-core+accelerator nodes, the complexity of data volumes and workflows can make tracking down bugs and resolving errors an extremely time-consuming and laborious task. Professional parallel debugging tools can greatly accelerate the problem-solving process. CSCS has therefore recently purchased a number of licences for Allinea’s DDT debugger to give the user community the opportunity to increase their productivity.
This one-day course gives a practical hands-on introduction to the parallel debugger DDT from Allinea, and shows how to use the tool to find bugs, examine data structures and understand program execution.
This course will be taught by staff from Allinea.
Prerequisites: Competency in Fortran, C or C++; understanding of MPI, OpenMP and/or GPU programming would be desirable. You will need to bring a laptop that can connect to CSCS machines via ssh and display output from applications using the X11 window system.