Disk Failure Dataset from the Campaign Storage System
- Ransom, Garret Wilson | Los Alamos National Laboratory
- George, Anjus | Oak Ridge National Laboratory
 
Overview

Description
This dataset consists of 1,389 hard disk drive (HDD) failure events collected from the Campaign storage system at LANL. Over its lifespan, Campaign supported several compute platforms, including Cielo, Fire, Ice, and, notably, the Trinity supercomputer. Each event record includes its detection timestamp (in ISO 8601 format) and details such as the failed drive's location within the storage system (rack, enclosure, and drive slot number). The data span May 4, 2021, to July 25, 2023 (2 years, 2 months, and 22 days), covering the final years of Campaign's operation; this window accounts for 26% of the system's total operational time.
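The per-event fields and the observation-window figures described above can be sketched in Python. This is a minimal, illustrative sketch only: the dataset's actual file format and column names are not given here, so the `DiskFailure` record, the `parse_event` helper, the field types, and the sample rows are all hypothetical. The date arithmetic uses only the figures stated in the description: `(END - START).days` gives 812 days, so the stated "22 days" appears to count both the first and last day (813 days inclusive).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class DiskFailure:
    """One failure event. Field names and types are hypothetical."""
    detected: datetime  # detection timestamp parsed from ISO 8601
    rack: str
    enclosure: str
    slot: int


def parse_event(timestamp: str, rack: str, enclosure: str, slot: str) -> DiskFailure:
    """Build a DiskFailure from raw string fields (hypothetical helper)."""
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+,
    # so rewrite it as an explicit UTC offset for older interpreters.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    return DiskFailure(datetime.fromisoformat(timestamp), rack, enclosure, int(slot))


# Figures stated in the dataset description.
START = date(2021, 5, 4)
END = date(2023, 7, 25)
N_FAILURES = 1389
SHARE_OF_LIFETIME = 0.26  # rounded share of Campaign's total operational time

window_days = (END - START).days   # exclusive count: END minus START
inclusive_days = window_days + 1   # counting both the first and last day
failures_per_day = N_FAILURES / window_days
# Rough estimate only: the 26% figure is rounded, so this is approximate.
implied_lifetime_years = window_days / SHARE_OF_LIFETIME / 365.25

if __name__ == "__main__":
    # Made-up sample rows for illustration; not taken from the dataset.
    events = [
        parse_event("2021-05-04T10:15:00Z", "rack01", "enc03", "17"),
        parse_event("2022-11-30T22:41:09+00:00", "rack01", "enc05", "42"),
        parse_event("2023-07-25T08:00:00Z", "rack07", "enc01", "3"),
    ]
    per_rack = Counter(e.rack for e in events)
    print("failures per rack:", per_rack.most_common())
    print("window:", window_days, "days exclusive,", inclusive_days, "inclusive")
    print(f"mean failures/day: {failures_per_day:.2f}")
    print(f"implied total operational time: ~{implied_lifetime_years:.1f} years")
```

On the stated figures this works out to about 1.7 failures per day on average, and dividing the window by 26% implies a total operational time of a little over 8.5 years; because the percentage is rounded, treat that lifetime figure as a rough derived estimate rather than a reported value.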
          
Funding Resources
DOE Contract Number: DE-AC05-00OR22725
Originating Research Organization: Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Other Contributing Organizations: Oak Ridge National Laboratory, Los Alamos National Laboratory
Sponsoring Organization: Office of Science (SC)
Project Identifier: STF008

Related Resources
- Continues (DOI): https://doi.org/10.1145/3624062.3624119
- Continues (DOI): https://doi.org/10.13139/OLCF/2441482
- Continues (DOI): https://doi.org/10.13139/ORNLNCCS/1868941
 
Details
DOI: 10.13139/OLCF/2446874
Release Date: September 25, 2024

Dataset
Dataset Type: ND Numeric Data
Other ID Number(s): LA-UR-24-30003 (LANL)

Acknowledgements
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Advanced Scientific Computing Research programs in the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Category
- 97 MATHEMATICS AND COMPUTING
 
Keywords
- Disk failures
- Parallel File System
- MarFS
- Campaign
- Trinity
- HPC storage
- Supercomputer