OLCF Summit Supercomputer GPU Snapshots During Double-Bit Errors and Normal Operations
- Shin, Woong | Oak Ridge National Laboratory
- Oles, Vladyslav | Oak Ridge National Laboratory
- Schmedding, Anna | William and Mary
- Ostrouchov, George | Oak Ridge National Laboratory
- Smirni, Evgenia | William and Mary
- Engelmann, Christian | Oak Ridge National Laboratory
- Wang, Feiyi | Oak Ridge National Laboratory
Description
Funding Resources
DOE Contract Number
DE-AC05-00OR22725
Originating Research Organization
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization
Office of Science (SC)
Project Identifier
STF218
Related Resources
- IsDerivedFrom (DOI): https://doi.org/10.1145/3458817.3476188
- IsSupplementTo (DOI): https://doi.org/10.13139/OLCF/1861393
- IsSupplementedBy (DOI): https://doi.org/10.1145/3650200.3656615
Details
DOI
10.13139/OLCF/1970187
Release Date
April 20, 2023
Dataset
Dataset Type
ND Numeric Data
Other ID Number(s)
GEN150
Acknowledgements
Users should acknowledge the OLCF in all publications and presentations that speak to work performed on OLCF resources:
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Category
- 97 MATHEMATICS AND COMPUTING
- 99 GENERAL AND MISCELLANEOUS
Keywords
- High-performance Computing
- system power and thermal
- reliability
- HBM2e
- GPUs
- medium temperature water cooling
- direct liquid cooling