Summit Darshan Archival Dataset

10.13139/OLCF/2305496

The Summit Darshan Archival Dataset contains 2021 Summit Darshan log data for 25 applications, grouped into science domains. The dataset has been processed, and all proprietary fields are anonymized. The resulting data is converted into a tabular structure and saved in parquet file format. In this notebook, we demonstrate how to access the data.

Data Organization: The data is organized into two directories:
  • Darshan total (`darshan_total`): Lists the high-level information generated by the `darshan-parser --total` command on `.darshan` files. There is one parquet file for each application. Note: the `uid` and `exe` fields are masked.
  • Darshan detail (`darshan_detail`): Contains detailed job-level log information extracted by the `darshan-parser` command from the raw `.darshan` files. The data is sorted by directory hierarchy in the order `year/month/day` (for example, `2021/12/07`). For instance, the data for `job_id` 3819766 of application `App11`, which was executed on `2021-12-07`, is found under the `2021/12/07` directory. Note: the `uid` and `filename` fields are masked.
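The directory layout described above can be navigated with a short script. The following is a minimal sketch, not the dataset authors' own code. It assumes that the `year/month/day` hierarchy sits directly under `darshan_detail`, that the tables expose a `job_id` column, and that pandas with a parquet engine such as pyarrow is installed. The helper names and the `dataset_root` location are hypothetical, and no file names inside the date directory are assumed.

```python
from datetime import date
from pathlib import Path


def detail_dir_for_date(root, run_date):
    """Return the darshan_detail directory for one execution date.

    Assumption: the year/month/day hierarchy (e.g. 2021/12/07) sits
    directly under the darshan_detail directory.
    """
    return (
        Path(root)
        / "darshan_detail"
        / f"{run_date:%Y}"
        / f"{run_date:%m}"
        / f"{run_date:%d}"
    )


def list_parquet_files(directory):
    """List parquet files below a directory; file names are not assumed."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(directory.rglob("*.parquet"))


def load_job(parquet_path, job_id):
    """Read one parquet file and keep only the rows of a single job.

    Assumption: the table has a `job_id` column. pandas is imported
    lazily so the path helpers above work without it.
    """
    import pandas as pd

    frame = pd.read_parquet(parquet_path)
    return frame[frame["job_id"] == job_id]


# "dataset_root" is a hypothetical path to the downloaded dataset.
day_dir = detail_dir_for_date("dataset_root", date(2021, 12, 7))
candidate_files = list_parquet_files(day_dir)
```

Each path in `candidate_files` can then be passed to `load_job(path, 3819766)` to select the job used in the example above. The same `pd.read_parquet` call should also work for the per-application files in `darshan_total`.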

Published: 2024-02-15 15:14:23

Dataset Properties

Authors:
  • Karimi, Ahmad Maroof (Oak Ridge National Laboratory)
  • Khan, Awais (Oak Ridge National Laboratory)
  • Oral, Sarp (Oak Ridge National Laboratory)
  • Zimmer, Christopher (Oak Ridge National Laboratory)
Project Identifier: STF218, STF008
Dataset Type: ND Numeric Data
Subjects:
  • 97 MATHEMATICS AND COMPUTING
Keywords:
  • Darshan
  • Summit storage system
Software Needed: .parquet file reader
Originating Organizations: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organizations: Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
DOE Contract: DE-AC05-00OR22725

Acknowledgements

Papers using this dataset are requested to include the following text in their acknowledgements:

Support for 10.13139/OLCF/2305496 is provided by the U.S. Department of Energy, project STF218, STF008 under Contract DE-AC05-00OR22725. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility.