Course on Getting the best out of multi-core at CSCS

CSCS is pleased to announce the following course, which will be held at CSCS in Lugano on December 10-12, 2012:
Getting the best out of multi-core

Modern multi-core x86 processors offer 100 times more peak performance than comparable single-core processors from ten years ago, but most applications have not been able to take advantage of this power. This three-day, hands-on course shows how to get the most out of Intel Sandy Bridge and AMD Interlagos processors by investigating the following techniques:

Code vectorization

Understanding processor architecture and the potential speedup from vectorization
Using compiler feedback to understand where vectorization is and is not achieved
Using compiler feedback, compiler options and pragmas to improve vectorization
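As a minimal sketch of the kind of loop this topic deals with (an illustration, not course material), the C function below is a SAXPY kernel. The `restrict` qualifiers promise the compiler that `x` and `y` do not overlap, removing an aliasing doubt that often blocks auto-vectorization, and `#pragma omp simd` is an explicit vectorization hint. The function name and the GCC flags mentioned here are assumptions chosen for illustration; the course itself uses the Cray and Intel compilers.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i], the classic SAXPY kernel (illustrative example).
 * `restrict` tells the compiler that x and y never alias, so it can
 * safely load and compute several elements at once with SIMD instructions. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    /* OpenMP 4.0 SIMD hint: compilers ignore it unless built with
     * OpenMP/SIMD support enabled (e.g. GCC's -fopenmp-simd). */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

With GCC, for example, compiling with `-O3 -fopt-info-vec-optimized` reports which loops were vectorized, and `-fopt-info-vec-missed` reports which were not and why; `-fopenmp-simd` makes GCC honour the `simd` pragma without enabling threading.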

Tuning for the cache hierarchy

Understanding the cache and memory hierarchy on modern multi-core processors
Analysing performance reports to determine poor cache utilisation
Code changes and compiler options to improve cache utilisation
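A minimal sketch of one such code change, loop interchange, assuming a row-major C array stored as a flat buffer (the function names are hypothetical). Both functions compute the same sum, but only the second walks memory with unit stride, so it uses every element of each cache line it loads; for large arrays this is typically much faster. Because the summation order differs, results on general floating-point data may differ in the last bits.

```c
#include <assert.h>
#include <stddef.h>

/* C stores 2-D arrays row-major: a[i*cols + j] and a[i*cols + j + 1]
 * are adjacent in memory and usually share a cache line. */

/* Column-by-column traversal: consecutive accesses are `cols` elements
 * apart, so each access may pull in a different cache line. */
double sum_strided(size_t rows, size_t cols, const double *a)
{
    assert(rows * cols == 0 || a != NULL);
    double s = 0.0;
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            s += a[i * cols + j];
    return s;
}

/* After loop interchange: the inner loop runs along a row, i.e. with
 * unit stride, so every element of each loaded cache line is used. */
double sum_unit_stride(size_t rows, size_t cols, const double *a)
{
    assert(rows * cols == 0 || a != NULL);
    double s = 0.0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            s += a[i * cols + j];
    return s;
}
```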

Multi-threading

An example threading model: OpenMP
Using tools to help produce multi-threaded code
Understanding threading pitfalls that affect code correctness
Understanding threading performance issues on multi-socket, multi-core nodes
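A minimal OpenMP sketch of a classic correctness pitfall and its fix (the function name is hypothetical). Accumulating into a shared variable from a parallel loop is a data race; the `reduction(+:sum)` clause gives each thread a private partial sum and combines them when the loop ends. The loop index is a signed `long` because OpenMP versions before 3.0 require a signed integer loop variable.

```c
#include <assert.h>

/* Dot product parallelised with OpenMP (illustrative example).
 * Without reduction(+:sum), all threads would update the shared `sum`
 * concurrently: a data race giving wrong, run-to-run varying results.
 * If compiled without OpenMP support, the pragma is ignored and the
 * loop simply runs serially. */
double dot(long n, const double *x, const double *y)
{
    assert(n >= 0);
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i] * y[i];

    return sum;
}
```

Build with OpenMP enabled (for example GCC's `-fopenmp`) and set `OMP_NUM_THREADS` to choose the thread count. Since the reduction reorders the additions, results may differ from a serial run in the last bits.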

We will use powerful tools to help understand code performance and to introduce vectorization and threading: the Cray tools CrayPAT, Apprentice2 and Reveal on a Cray system, and the Intel tools on a Sandy Bridge cluster. In particular, we will use Reveal to analyse compiler optimisations and performance reports, and its OpenMP directive insertion features to help introduce multi-threading into codes.

The course will include many hands-on practical sessions demonstrating these tools. Participants will also see the critical effects of poor resource utilisation, methods to alleviate these problems, and best practices for implementing multi-process, multi-threaded codes.

We will also demonstrate how these techniques can be applied to the Intel Xeon Phi (also known as MIC, Many Integrated Core) architecture.

Prerequisites:

Competency in C++, Fortran or C.
Basic understanding of OpenMP.
Basic understanding of MPI.

You will need to bring a laptop capable of SSH access to CSCS machines and of displaying output from X11 applications.

Course registration is open until 1 December 2012.

For the course syllabus and registration, see »
