
Disk Failure Dataset from the Campaign Storage System

  • Ransom, Garret Wilson | Los Alamos National Laboratory
  • George, Anjus | Oak Ridge National Laboratory
Overview

Description

This dataset consists of 1,389 hard disk drive (HDD) failure events collected from the Campaign storage system at LANL. Over its lifespan, Campaign supported several compute platforms, including Cielo, Fire, Ice, and, most notably, the Trinity supercomputer. Each recorded event includes its detection timestamp (in ISO 8601 format) and details such as its location within the storage system (rack, enclosure, and drive slot number). The data span May 4, 2021, to July 25, 2023 (2 years, 2 months, and 22 days). This window covers the final years of Campaign's operation and accounts for 26% of its total operational time.
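The description does not specify the file layout. The following is therefore a minimal sketch that assumes each event can be read as a row with hypothetical fields `timestamp`, `rack`, `enclosure`, and `slot` (for example, from a CSV file). It parses the ISO 8601 timestamps, computes inter-arrival times between consecutive failures, counts failures per rack, and derives the average daily failure rate from the event count and date range stated above. The sample rows are invented for illustration and are not taken from the dataset.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class FailureEvent:
    """One disk failure record. Field names are hypothetical, not the dataset's schema."""
    detected: datetime  # detection timestamp parsed from ISO 8601
    rack: str
    enclosure: str
    slot: int


def parse_event(row: dict) -> FailureEvent:
    # Assumes a dict-like row (e.g. from csv.DictReader) with hypothetical keys.
    # Note: datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+.
    return FailureEvent(
        detected=datetime.fromisoformat(row["timestamp"]),
        rack=row["rack"],
        enclosure=row["enclosure"],
        slot=int(row["slot"]),
    )


def interarrival_hours(events: list[FailureEvent]) -> list[float]:
    """Hours between consecutive failures, after sorting by detection time."""
    times = sorted(e.detected for e in events)
    return [(later - earlier).total_seconds() / 3600 for earlier, later in zip(times, times[1:])]


def failures_per_rack(events: list[FailureEvent]) -> Counter:
    """Number of failure events recorded in each rack."""
    return Counter(e.rack for e in events)


def mean_failures_per_day(n_events: int, start: date, end: date) -> float:
    """Average failures per day over [start, end), counting elapsed days."""
    return n_events / (end - start).days


# Hypothetical sample rows for illustration only (not real dataset records).
rows = [
    {"timestamp": "2021-05-04T08:00:00", "rack": "r01", "enclosure": "e2", "slot": "17"},
    {"timestamp": "2021-05-04T20:00:00", "rack": "r01", "enclosure": "e1", "slot": "3"},
    {"timestamp": "2021-05-05T02:00:00", "rack": "r02", "enclosure": "e4", "slot": "40"},
]
events = [parse_event(r) for r in rows]

# Figures stated in the description: 1,389 events from May 4, 2021, to July 25, 2023.
span_days = (date(2023, 7, 25) - date(2021, 5, 4)).days
rate = mean_failures_per_day(1389, date(2021, 5, 4), date(2023, 7, 25))
print(span_days, f"{rate:.2f}")
```

Counting elapsed days gives 812 days (813 if both endpoints are included), or roughly 1.7 disk failures per day on average across the whole system.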

Funding resources

DOE contract number

DE-AC05-00OR22725

Originating research organization

Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)

Other contributing organizations

Oak Ridge National Laboratory, Los Alamos National Laboratory

Sponsoring organization

Office of Science (SC)

Details

DOI

10.13139/OLCF/2446874

Release date

September 25, 2024

Dataset

Dataset type

ND Numeric Data

Other ID number(s)

LA-UR-24-30003 (LANL)

Acknowledgements

Papers using this dataset are requested to include the following text in their acknowledgements:

Support for 10.13139/OLCF/2446874 is provided by the U.S. Department of Energy, project STF008 under Contract DE-AC05-00OR22725. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility.

Category

  • 97 MATHEMATICS AND COMPUTING

Keywords

  • Disk failures
  • Parallel File System
  • MarFS
  • Campaign
  • Trinity
  • HPC storage
  • Supercomputer