SC11 Report: Climate Thrust Area (Part 1)

The short series of reports from the SC11 conference closes with two postings from our colleague Will Sawyer (CSCS) on the Climate Thrust Area. The climate thrust area had a significant presence at the conference: the basic program consisted of one plenary talk, four masterworks presentations, one BoF session, two panels and a number of technical papers. This first posting covers the climate plenary talk, the BoF on Challenges in Analysis and Visualization of Large Climate Data Sets, the panels on Parallel I/O and on Pushing the Frontiers of Climate and Weather Models, and the masterworks presentation on Climate Computing.

Climate Plenary Talk

In the climate thrust area plenary talk “Climate Prediction and Research: The Next 20 Years”, Terry Davies from the UK Meteorological Office presented the current status of climate models and the prospects for the future. As a general overview, there were not many surprises: ensembles are central in climate studies; climate projections are improving, but their metrics of reliability are not the same as forecast skill in numerical weather prediction; and there are significant impediments to scaling the current models to petascale, one such hurdle being the amount of data which the models currently generate.

BoF: Challenges in Analysis and Visualization of Large Climate Data Sets

Rob Jacob of Argonne National Laboratory outlined the data production of climate models, and the fact that we are now drowning in data. The Coupled Model Intercomparison Project phase 5 (CMIP5) is set to generate 2 PB of data. For example, CAM-HOMME at 0.125 degree resolution, run for a 100-year simulation, would produce 40 TB of output. The turn-around time for the analysis of this data will be the limiting factor.
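For a feel of where output volumes of this order come from, here is a back-of-envelope sketch in Python. The grid size, level count, number of variables, output frequency and precision below are assumptions chosen purely for illustration, not figures quoted at the BoF:

    # Back-of-envelope estimate of climate model output volume.  All numbers
    # below (grid size, levels, variables, output frequency, precision) are
    # illustrative assumptions, not figures quoted at the BoF.
    nlat, nlon, nlev = 1440, 2880, 30      # roughly a 0.125 degree global grid
    n_vars = 40                            # 3-D fields written to history files
    bytes_per_value = 4                    # single precision
    outputs_per_year = 12                  # monthly means
    years = 100

    bytes_total = (nlat * nlon * nlev * n_vars * bytes_per_value
                   * outputs_per_year * years)
    # Prints roughly 24 TB: a few tens of TB, the same order of magnitude
    # as the 40 TB figure quoted above.
    print(f"~{bytes_total / 1e12:.0f} TB over {years} simulated years")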

Three new DOE projects (PIs: Williams, Bethel, Jacob) have been funded to address the visualization problem. For example, Jacob is trying to parallelize NCL. Williams’ team is parallelizing CDAT: they are using the VTK data model and bringing CDAT into that framework. VisIt and ParaView have not handled mappings well, and thus have not found acceptance in the climate community.

Possible solutions to save us from drowning in data:

  • “Try not to output so much”, although exascale also says, “try not to use too much memory”.
  • A commitment from scientists not to generate multiple copies of the data (e.g., monthly, yearly, two-yearly, five-yearly means).

We found these solutions insufficient to address the exponentially growing amounts of data, and asked Rob whether anyone had employed Principal Component Analysis (PCA) to remove the vast correlations in time-series data (the potential compression factor of this admittedly lossy technique could be as much as 1000). The concept, he replied, is virtually unknown in the field, although he did mention that Kwan-Liu Ma (UC Davis) is doing something in this direction for ultrascale visualization.
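To make the idea concrete, here is a minimal sketch (our own illustration, not something shown at the BoF) of how a truncated PCA/EOF decomposition could compress a gridded time series. The field dimensions, the synthetic data and the number of retained components k are all assumptions:

    import numpy as np

    # Synthetic stand-in for model output: n_t time steps of an n_lat x n_lon
    # grid, flattened to shape (n_t, n_points).  Sizes are illustrative.
    n_t, n_lat, n_lon = 1200, 96, 144          # e.g. 100 years of monthly means
    field = np.random.rand(n_t, n_lat * n_lon).astype(np.float32)

    # Remove the time mean and compute a truncated PCA (EOF) decomposition.
    mean = field.mean(axis=0)
    anom = field - mean
    U, s, Vt = np.linalg.svd(anom, full_matrices=False)

    k = 20                                     # retained components (assumed)
    pcs, eofs = U[:, :k] * s[:k], Vt[:k]       # principal components and EOFs

    # Lossy reconstruction from the k leading components.
    recon = pcs @ eofs + mean

    orig_size = field.size
    comp_size = pcs.size + eofs.size + mean.size
    print(f"compression factor ~ {orig_size / comp_size:.1f}")
    print(f"explained variance ~ {np.sum(s[:k]**2) / np.sum(s**2):.3f}")

On real climate fields, which are strongly correlated in space and time, a small k typically captures most of the variance, which is where the large (lossy) compression factors would come from; the random data above is only there to make the sketch self-contained.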

Panel: Parallel I/O

John Biddiscombe (CSCS) moderated a panel with participants Quincy Koziol (HDF Group), John Dennis (UCAR), Jay Lofstead (Georgia Tech) and Rob Ross (Argonne) to discuss different parallel I/O paradigms. As moderator, he took the point of view of a user who wants the complexity of parallel I/O to be transparent.

John Dennis gave an introduction to the Parallel I/O library (PIO), a user-level application library intended to reduce memory usage and improve performance with a single file and multiple writers. Backends: pNetCDF, NetCDF-4 and MPI-IO. The existing parallel libraries, pNetCDF and NetCDF-4, were deemed too simplistic and to make numerous assumptions. Multi-dimensional fields on non-rectangular domains are supported, with no imprinting on the file (the output looks as if it were written serially).
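PIO itself exposes Fortran and C interfaces, so the following is not its API; it is only a hedged sketch of the single-file, multiple-writers pattern it builds on, written with mpi4py and the netCDF4-python bindings (which delegate to the parallel HDF5/pNetCDF layers underneath). The file name, grid size and latitude-band decomposition are assumptions for illustration:

    from mpi4py import MPI
    from netCDF4 import Dataset
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    # Illustrative global field decomposed into contiguous latitude bands.
    nlat, nlon = 96, 144                  # assumed grid size
    rows_per_rank = nlat // nprocs        # assumes nlat % nprocs == 0
    lo = rank * rows_per_rank
    hi = lo + rows_per_rank

    # Every rank opens the same file; the library coordinates the writes,
    # so the resulting file carries no imprint of the process count.
    nc = Dataset("field.nc", "w", parallel=True, comm=comm)
    nc.createDimension("lat", nlat)
    nc.createDimension("lon", nlon)
    var = nc.createVariable("t_surf", "f4", ("lat", "lon"))
    var.set_collective(True)              # collective writes for this variable

    # Each rank writes only its own latitude band of the global array.
    var[lo:hi, :] = np.full((rows_per_rank, nlon), float(rank), dtype="f4")
    nc.close()

Run with, e.g., mpirun -n 4 python write_field.py, assuming HDF5 and netCDF were built with MPI (parallel) support.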

Rob Ross summarized: none of the current libraries is exascale-ready, but we can learn from them. The future: simulation data models should be central to designs, and analysis should be performed wherever it is most efficient. The POSIX data model needs to go away!

Quincy Koziol is trying to keep HDF5 relevant for exascale. He led off with a brief introduction to HDF5, explaining its abstract data model. He claimed that HDF5 will be relevant in the long term. He conceded there are limitations to HDF5 which other (research) libraries are addressing, but suggested that the business nature of the HDF Group makes it the long-term choice. In a subsequent discussion, he explained that HDF5’s abstract data model might be extended to support GRIB, which would be of significance to European models such as COSMO, IFS and ECHAM.

Jay Lofstead introduced ADIOS. It has the advantage of being a new software development, and is conceived to scale (e.g., on the Jaguar platform). It supports numerous file formats (BP, pNetCDF, NetCDF-4, HDF5). Key advantages of ADIOS are the ability to give asynchronous I/O “hints” and the possibility of getting great performance. For out-of-core file staging, ADIOS is not a solution. Current users: ORNL, Sandia, Rutgers.
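As a small aside on the abstract data model Koziol referred to, here is a hedged sketch of its three main ingredients (groups, datasets, attributes) using the h5py bindings; the file name, group and dataset names, and attribute values are all assumptions chosen for illustration:

    import h5py
    import numpy as np

    # The HDF5 abstract data model in miniature: a file is a rooted group
    # hierarchy, groups contain datasets (typed multi-dimensional arrays),
    # and both groups and datasets carry attributes.
    with h5py.File("climate_demo.h5", "w") as f:
        run = f.create_group("run_001")
        run.attrs["model"] = "illustrative-model"     # assumed metadata
        run.attrs["resolution_deg"] = 0.5

        temp = run.create_dataset("t_surf", shape=(12, 96, 144), dtype="f4",
                                  compression="gzip")  # per-dataset filters
        temp.attrs["units"] = "K"
        temp[0, :, :] = np.zeros((96, 144), dtype="f4")  # partial (hyperslab) write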

There was a subsequent discussion, with considerable agreement on what is needed for exascale, though some of the panelists clearly stated they were in competition. One key difference between the competing libraries: PIO uses collective communication, while ADIOS manages to avoid collective communication (a requirement from its users). There was a discussion between Jay and the audience as to whether ADIOS-generated output files are thus truly imprint-free.

Panel: Pushing the Frontiers of Climate and Weather Models: High Performance Computing, Numerical Techniques and Physical Consistency

Christiane Jablonowski (UMICH) moderated a scientific panel on Pushing the Frontiers of Climate and Weather Models. The panelists were Peter Lauritzen (NCAR), Terry Davies (UKMO), Shian-Jiann Lin (GFDL), Bill Putman (NASA), and Dave Randall (CSU).

Peter presented the CAM model as used for paleoclimate studies (300 km resolution, 40 simulated days per wall-clock day), climate change (100 km, 5 days/day), and ultra-high-resolution runs (25 km). Constraints on the models: conserve mass and usually energy (the energy fixer is computationally expensive), and provide consistent, shape-preserving multi-tracer transport. Climate models, unlike weather models, do not converge out of the box.

Terry mentioned that the upper lid needs to double, from 40 to 80 km, and that coupled atmosphere-ocean models should run at least 1 simulated year per day; he asked the other panelists what an adequate horizontal resolution would be (no consensus). Convection is parameterized in 1-D but is inherently 3-D, and soon it will no longer be acceptable to parameterize it in less than 3-D. This will add significant communication and will become a new scalability limitation.

S.-J. presented his work on kilometer-scale direct cloud resolution in climate models. There has been no breakthrough in modeling deep convective clouds since the 1970s, and gravity wave drag is still the dominant mechanism controlling the dynamics of the stratosphere and mesosphere. Resolving tropical convection and stratosphere dynamics is key to mid-range projections. Do we really need to resolve clouds? There are some indications that it is not strictly necessary, and 25 km may be sufficient. Stretched grids may be an effective way to avoid mesh refinement.

Bill discussed the non-hydrostatic, explicit cloud-resolving model GEOS-5, developed at GFDL and NASA. At 3.5 km resolution the tropical convection is much more realistic than at 28 km, and the dynamical core takes 90% of the execution time. Mixed precision can achieve the same accuracy with a 60% improvement in performance. GPUs are being investigated for GEOS-6.

“Starting Over Keeps You Young” was the theme of Dave’s introduction. A key climate model goal would be to reproduce ice-age cycles, which would require 100 km resolution with million-year simulations; we are out of luck. Secondly, increased resolution makes NWP forecasts better and better, but the benefit reaches an asymptotic limit for climate runs. On physics: GCRMs (global cloud-resolving models) are simpler than existing parameterizations, but are much more expensive. “Super-parameterization” is a compromise approach.

The subsequent discussion was very lively, and covered scalability limits (the data transpositions of spectral models are becoming prohibitive), forecast skill (skill must not suffer as scalability improves), I/O schemes (GEOS-5: restarts with MPI-IO, history parallelized over the number of streams; UM: in-situ analysis, but now looking at parallel I/O; similarly for other models), tuning, adaptive mesh refinement (cloud microphysics is an issue with variable grid sizes), and implicit methods (not very scalable, so we may go back to explicit schemes).

Climate Computing: Computational, Data and Scientific Scalability

In a masterworks presentation, V. Balaji gave a summary of the HPC issues in climate modeling. The scientifically challenging axes are resolution (number of grid points), complexity (new subsystems), and capacity (storage issues). He briefly discussed Gaea, the climate-dedicated computer at ORNL.

An example of climate change research is hurricane simulation: models tell us that the total number of hurricanes may not increase, but the number and intensity of category 4 and 5 hurricanes is likely to increase. Another example is the question of whether the climate is ergodic: some results imply that it is, while others imply it is not.

The rest of his talk treated the HPC aspects of climate, e.g., it takes 2000 years of simulation to initialize a run, I/O is a huge problem, and fault tolerance is an issue. He addressed data scalability: the 40 TB from the IPCC-3 runs were located at one center; for future IPCC runs this will no longer be possible. Developing metadata standards is an unglamorous but important component of making the data useful.
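To illustrate what such metadata standards look like in practice, here is a small sketch of CF-convention-style attributes written with netCDF4-python; the variable names, attribute values and grid sizes are assumptions for the example, not anything from Balaji’s talk:

    from netCDF4 import Dataset
    import numpy as np

    # Minimal illustration of CF-convention-style metadata on a netCDF file.
    with Dataset("tas_example.nc", "w") as nc:
        nc.Conventions = "CF-1.6"
        nc.title = "Illustrative surface air temperature output"

        nc.createDimension("time", None)          # unlimited time dimension
        nc.createDimension("lat", 96)
        nc.createDimension("lon", 144)

        time = nc.createVariable("time", "f8", ("time",))
        time.units = "days since 1850-01-01 00:00:00"
        time.calendar = "noleap"

        tas = nc.createVariable("tas", "f4", ("time", "lat", "lon"))
        tas.standard_name = "air_temperature"     # CF standard name
        tas.units = "K"
        tas[0, :, :] = 288.0 + np.zeros((96, 144), dtype="f4")

It is exactly this kind of shared, machine-readable labeling that lets the much larger community of data consumers make sense of model output they did not produce themselves.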

He also addressed “scientific scalability”, i.e., enabling “downstream” science: there are far more consumers of climate data than producers.

Finally, he analyzed potential exascale scalability, where we will have O(10^5) more processors. He anticipates a factor of O(10) from concurrent components and possibly O(10-100) from ensemble members, which together account for at most about three orders of magnitude and leave a substantial gap. It is therefore an open question how climate models will manage at exascale.