Call for presentations and participants: hpc-ch Forum on Handling huge amounts of data for HPC

Dear members and guests of the hpc-ch community,

We are pleased to invite you to the upcoming hpc-ch forum on

Handling huge amounts of data for HPC

to be held on Thursday, October 25th, 2012 from 10:15 until 16:40, kindly hosted by CSCS.

The hpc-ch meeting begins the evening before, on Wednesday, October 24th, with a group dinner at the Antica Osteria del Porto in Lugano, starting at 19:00. The dinner will be sponsored by the Institute of Computational Science of the Università della Svizzera italiana.

Below you will find a description of the forum and a few key questions as a guideline for contributions.

Please let michele.delorenzi(at)cscs.ch know by October 5th, 2012:

  • the names of the representatives of your organization who will participate in the Topic Forum,
  • whether they will attend the dinner, and
  • (optional) the title and speakers’ names for a short presentation (20 minutes including Q&A).

We are looking forward to meeting you all in October.

Sincerely,
Stefano Gorini, CSCS
Michele De Lorenzi, CSCS
Rolf Krause, USI/ICS

Setting the Scope

As HPC service providers, we are concerned with handling an increasing amount of data. The size of the data we manage for HPC applications is growing quickly, and it appears we have a Data Explosion Problem to deal with.

Many scientific fields rely on increasingly large data sets. The data can be the result of measurements, as in astrophysics or genomics, or the result of simulations, as in climate research. The buzzword for this trend is “Big Data”. Our challenge, as providers of HPC systems, will be to handle petabytes and even exabytes of data and to give customers an efficient way to store and use the generated data. This will mean striking a balance between bandwidth, capital investment, operational costs, security, availability, reliability, etc.

The trend toward Big Data in HPC is also changing the relationships inside our organizations. Until now, data management was largely a speciality of the departments providing general IT services, but HPC now needs its own Big Data management, with specific know-how that partially diverges from the standard way of preserving and using data. This opens the discussion about leadership on storage.

Key Questions

What are useful storage hierarchies? (scratch, project, archive, …)
What are our experiences with Hierarchical Storage Management (HSM) systems (HPSS, TSM, DMF, StorNext, others…)?
What are the different storage policies we support?
What are the pros and cons of the different parallel file systems we use? (Lustre, GPFS, …)
How well are those parallel file systems integrated with HSM solutions?
How do we integrate different storage technologies and migrate between different systems over time as technology evolves?
How do we provide access to data to remote users and systems? And how far away do you have to be to be considered “remote”?
How long do we need to preserve data?
How does new technology help? SSDs, or venerable tapes?
As the cost of HPC CPU time decreases and time to solution shrinks, is it cheaper to archive the results or to re-run the whole simulation?

Location / Hosting

The Topic Forum will take place at CSCS, the Swiss National Supercomputing Centre.

Chairmanship

Stefano Gorini, CSCS, tel. 091 610 82 92
Michele De Lorenzi, CSCS, tel. 091 610 82 08
Rolf Krause, USI/ICS, tel. 058 666 43 09

Agenda (preliminary)

10:00 – 10:30 Coffee and registration
10:30 – 10:40 Greeting and introduction, Stefano Gorini (CSCS) and Michele De Lorenzi (CSCS & hpc-ch)
10:40 – 12:00 Presentations
12:00 – 12:15 Community Development, Michele De Lorenzi
12:15 – 13:15 Lunch
13:15 – 14:00 Guided tour of the new building
14:00 – 16:40 Presentations (with a short break at 15:30)
16:40 Farewell and end of the meeting