Tutorial on Node-Level Performance Engineering at the University of Basel, Switzerland, March 13-14, 2017
The advent of multi- and many-core chips has further widened the gap between peak and application performance for many scientific codes. This trend is accelerating as we move from petascale to exascale. Paradoxically, bad node-level performance helps to “efficiently” scale to massive parallelism, but at the price of increased overall time to solution. If the user cares about time to solution on any scale, optimal performance on the node level is often the key factor.
We teach the architectural features of current processor chips, multiprocessor nodes, and accelerators, as far as they are relevant for the practitioner. Peculiarities like SIMD vectorization, shared vs. separate caches, bandwidth bottlenecks, and ccNUMA characteristics are introduced, and the influence of system topology and affinity on the performance of typical parallel programming constructs is demonstrated. Performance engineering and performance models are powerful tools that help the user understand the bottlenecks at hand and assess the impact of possible code optimizations. A cornerstone of these concepts is the Roofline model, which is described in detail. Hands-on exercises will provide ample opportunity for attendees to put all concepts to the test.
Lecturers: Dr. Georg Hager and Prof. Gerhard Wellein, FAU Erlangen, Germany
Location: Kollegienhaus, Petersplatz 1, 4003 Basel, Regenzzimmer 111
Contact: Mrs. Yvonne Wegmüller (yvonne.wegmueller@unibas.ch)
Registration deadline: March 10, 2017 (by email)
Fees: MA/PhD students CHF 50.-; academic staff CHF 150.-; industry staff CHF 250.-
Requirements: Laptop with an SSH client installed.
*****
Agenda
Monday, March 13, 2017
08:00 – 09:00 Registration
09:00 – 10:15 Computer architecture and simple multicore tools
10:15 – 10:45 Coffee break
10:45 – 12:00 Microbenchmarking for architectural exploration (and more)
12:15 – 13:30 Lunch (provided)
13:30 – 14:30 Introduction to the Roofline performance model
14:30 – 17:00 Hands-on exercise (with on-the-fly coffee break)
Tuesday, March 14, 2017
09:00 – 10:15 Roofline case study: dense matrix-vector multiplication
10:15 – 10:45 Coffee break
10:45 – 12:00 Roofline case study: Jacobi smoother
12:15 – 13:30 Lunch (provided)
13:30 – 14:30 Optimal use of parallel resources: SIMD & ccNUMA
14:30 – 17:00 Hands-on exercise (with on-the-fly coffee break)
*****
Further information is available as a PDF download.
Further tutorial details: https://moodle.rrze.uni-erlangen.de/course/view.php?id=274&username=guest&password=guest