Course on Getting the best out of multi-core at CSCS

CSCS is pleased to announce the following course, which will be held at CSCS in Lugano on December 10-12, 2012:
Getting the best out of multi-core

Modern multi-core x86 processors have 100 times more peak performance than similar single-core processors from ten years ago, but most applications haven’t been able to leverage this power to their advantage. This three-day, hands-on course shows how to get the most out of Intel Sandy Bridge and AMD Interlagos processors by investigating the following techniques:

Code vectorization

  • Understanding processor architecture and the potential speedup from vectorization
  • Using compiler feedback to understand where vectorization is and is not achieved
  • Using compiler feedback, compiler options and pragmas to improve vectorization

Tuning for the cache hierarchy

  • Understanding the cache and memory hierarchy on modern multi-core processors
  • Analysing performance reports to determine poor cache utilisation
  • Code changes and compiler options to improve cache utilisation

Multi-threading

  • An example of a threading model – OpenMP
  • Use of tools to help produce multi-threaded code
  • Understanding of threading pitfalls that affect code correctness
  • Understanding of threading performance issues on multi-socket multi-core nodes

We will use powerful tools to help understand code performance and to introduce vectorization and threading: the Cray tools CrayPAT, Apprentice2 and Reveal on a Cray system, and Intel tools on a Sandy Bridge cluster. In particular, we will use the Reveal tool to analyse compiler optimisations and performance reports, and its OpenMP directive insertion options to help introduce multi-threading into codes.

The course will be rich in hands-on practical sessions that demonstrate these tools. It will also show developers the critical effects of poor resource utilisation, methods to alleviate these problems, and best practices for implementing multi-process, multi-threaded codes.

We will also give a demonstration of how these techniques can be applied to the Intel Xeon Phi (also known as MIC – Many Integrated Core) architecture.


Prerequisites

  • Competency in C, C++ or Fortran.
  • Basic understanding of OpenMP.
  • Basic understanding of MPI.

You will need to bring a laptop capable of SSH access to CSCS machines and of displaying output from X11 applications.

Course registration is open until 1 December 2012.

For the course syllabus and registration, see »