“Logging and Monitoring”

The hpc-ch forum on Logging and Monitoring, to be held on Thursday, October 3, 2019 from 09:30 until 16:30 and kindly hosted by WSL Birmensdorf, is fast approaching.

hpc-ch forum on Logging and Monitoring
Thursday, October 3, 2019
WSL Birmensdorf
Room Englersaal
Zürcherstrasse 111



High Performance Computing systems generate a huge volume of log and metric data during operation: information about resource utilization, performance, failures, errors and so on is worth storing and analyzing.
This kind of data is often unstructured and not easily comprehensible: finding correlations, recognizing meaningful events and discarding false positives are common challenges that all HPC centers have to face.
The reward is worth the effort: post-mortem investigation, troubleshooting of problems and incidents, security threat hunting, early warning and alerting, application performance analysis and evaluation of resource utilization all benefit from careful processing of log and metric data.
A thorough understanding of the underlying infrastructure that produces this information is essential to make sense of it, especially considering the complex hardware and software stacks that modern large-scale systems comprise.

Key Questions

• What are the benefits of collecting log and metric data?
• How to correlate logs from different systems?
• Centralized collection of logs and metrics: challenges and returns
• Are logs and metrics Big Data?
• How to tackle the increasing complexity of multi-layered architectures (virtualization, containers, etc.)?
• Threat Intelligence: proactively identifying unusual network activity and unauthorized access

Further information

The forum provides an open discussion for exchanging new ideas among researchers and practitioners working on various aspects of collecting logs and metrics from HPC clusters and infrastructures.

