OLCF Summit Supercomputer GPU Snapshots During Double-Bit Errors and Normal Operations
Woong Shin | Oak Ridge National Laboratory
Vladyslav Oles | Oak Ridge National Laboratory
Anna Schmedding | William and Mary
George Ostrouchov | Oak Ridge National Laboratory
Evgenia Smirni | William and Mary
Description
Funding Information
DOE Contract Number
DE-AC05-00OR22725
Originating Research Organization
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization
Office of Science (SC)
Related Works
- IsDerivedFrom (DOI): https://doi.org/10.1145/3458817.3476188
- IsSupplementTo (DOI): https://doi.org/10.13139/OLCF/1861393
- IsSupplementedBy (DOI): https://doi.org/10.1145/3650200.3656615
Details
Release Date
April 20, 2023
Subject
97 MATHEMATICS AND COMPUTING, 99 GENERAL AND MISCELLANEOUS
Keywords
High-performance Computing, system power and thermal, reliability, HBM2e, GPUs, medium temperature water cooling, direct liquid cooling
Dataset
Dataset Type
ND Numeric Data
Other ID Number(s)
GEN150
Cite This Dataset:
Shin, W., Oles, V., Schmedding, A., Ostrouchov, G., Smirni, E., Engelmann, C., Wang, F. (2023). OLCF Summit Supercomputer GPU Snapshots During Double-Bit Errors and Normal Operations. Oak Ridge National Laboratory. https://doi.org/10.13139/OLCF/1970187.
Acknowledgements
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Advanced Scientific Computing Research programs in the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.