ORBIT-2 Dataset for Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling
- Lu, Dan | Oak Ridge National Laboratory
- Wang, Xiao | Oak Ridge National Laboratory
- Tsaris, Aristeidis | Oak Ridge National Laboratory
- Choi, Jong Youl | Oak Ridge National Laboratory
- Ashfaq, Moetasim | Oak Ridge National Laboratory
Overview
Description
This dataset release corresponds to the work conducted in ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling, in which large-scale AI models were applied to increase the spatial resolution of weather and climate data. The collection integrates four widely used, publicly available datasets: ERA5, PRISM, DAYMET, and IMERG.
To prepare the data for ORBIT-2 model training and evaluation, we applied a preprocessing pipeline that generates paired low-resolution and high-resolution samples, enabling supervised downscaling experiments. The transformation from coarse to fine scales was performed using bilinear regridding, consistent with the procedures described in WeatherBench2, a community benchmark for weather and climate AI models.
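The regridding step can be illustrated with a minimal NumPy sketch of bilinear interpolation between regular latitude/longitude grids. This is not the ORBIT-2 or WeatherBench2 code. Several details are illustrative assumptions and are not taken from the source: the function name `bilinear_regrid`, the requirement of 1-D strictly increasing coordinates, clamping at the grid edges, the synthetic 0.25° field and the 4× coarsening factor.

```python
import numpy as np


def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D field from a regular (src_lat, src_lon) grid
    onto a (dst_lat, dst_lon) grid.

    Hypothetical helper. Assumes 1-D, strictly increasing coordinate arrays with
    at least two points each; targets outside the source range are clamped to the edge.
    """
    field = np.asarray(field, dtype=np.float64)
    # Fractional index of every target coordinate along each source axis.
    fi = np.interp(dst_lat, src_lat, np.arange(src_lat.size))
    fj = np.interp(dst_lon, src_lon, np.arange(src_lon.size))
    # Lower-left corner of the enclosing source cell, kept inside the grid.
    i0 = np.clip(np.floor(fi).astype(int), 0, src_lat.size - 2)
    j0 = np.clip(np.floor(fj).astype(int), 0, src_lon.size - 2)
    # Interpolation weights along latitude (rows) and longitude (columns).
    wi = (fi - i0)[:, None]
    wj = (fj - j0)[None, :]
    # Values at the four corners of each enclosing cell.
    f00 = field[np.ix_(i0, j0)]
    f01 = field[np.ix_(i0, j0 + 1)]
    f10 = field[np.ix_(i0 + 1, j0)]
    f11 = field[np.ix_(i0 + 1, j0 + 1)]
    return ((1 - wi) * (1 - wj) * f00 + (1 - wi) * wj * f01
            + wi * (1 - wj) * f10 + wi * wj * f11)


# Synthetic example (not ORBIT-2 data): a smooth field on a hypothetical 0.25-degree grid.
hr_lat = np.linspace(25.0, 50.0, 101)
hr_lon = np.linspace(-125.0, -67.0, 233)
hr = 0.5 * hr_lat[:, None] + 0.1 * hr_lon[None, :]

# Coarse grid: every 4th fine-grid coordinate (an illustrative 4x factor).
lr_lat, lr_lon = hr_lat[::4], hr_lon[::4]
lr = bilinear_regrid(hr, hr_lat, hr_lon, lr_lat, lr_lon)

# One option: regrid the coarse field back onto the fine grid, so that the
# low-resolution input and the high-resolution target share a shape.
lr_on_hr = bilinear_regrid(lr, lr_lat, lr_lon, hr_lat, hr_lon)
print(lr.shape, lr_on_hr.shape)  # (26, 59) (101, 233)
```

Bilinear interpolation reproduces any field that is linear in latitude and longitude exactly. A synthetic linear field like this one is therefore a convenient sanity check before running the same step on real data.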
This dataset supports the development and evaluation of foundation models designed for weather and climate downscaling at exascale. Additional details on methodology and applications can be found in Wang et al., ORBIT-2 (arXiv:2505.04802, 2025).
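The paired samples are distributed as plain NumPy archives (.npz) and arrays (.npy), so `numpy.load` alone is enough to inspect them. The sketch below assumes nothing about how the published files are named or which arrays they contain. The helper names `summarize_npz` and `load_npy`, the file names and the array keys `input`/`target` are hypothetical and are exercised here on synthetic data only.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array name: (shape, dtype)} for every array stored in an .npz archive."""
    with np.load(path) as archive:  # NpzFile closes the underlying zip file on exit
        return {name: (archive[name].shape, str(archive[name].dtype))
                for name in archive.files}


def load_npy(path):
    """Memory-map a single .npy array read-only instead of reading it all into RAM."""
    return np.load(path, mmap_mode="r")


# Demo with synthetic files; the keys "input"/"target" are hypothetical and do
# not describe the layout of the published archive.
with tempfile.TemporaryDirectory() as tmp:
    npz_path = os.path.join(tmp, "sample.npz")
    np.savez(npz_path,
             input=np.zeros((26, 59), dtype=np.float32),
             target=np.zeros((101, 233), dtype=np.float32))
    summary = summarize_npz(npz_path)

    npy_path = os.path.join(tmp, "sample.npy")
    np.save(npy_path, np.arange(12, dtype=np.float32).reshape(3, 4))
    arr = load_npy(npy_path)
    col_means = np.array(arr.mean(axis=0))  # copy the result out of the memory map
    del arr  # release the memory map before the temporary directory is removed

print(summary)
```

Memory-mapping with `mmap_mode="r"` is useful for large .npy files because only the slices actually accessed are read from disk. The `mmap_mode` argument does not apply to members of an .npz archive.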
Funding Resources
DOE Contract Number
AC05-00OR22725
Originating Research Organization
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization
Office of Science (SC)
Project Identifier
LRN036
Related Resources
- References (URL): https://arxiv.org/abs/2505.04802
- References (URL): https://weatherbench2.readthedocs.io/en/latest/index.html
- References (URL): https://prism.oregonstate.edu
- References (URL): https://daymet.ornl.gov
- References (URL): https://gpm.nasa.gov/data/imerg
Details
DOI
10.13139/OLCF/2589526
Release Date
October 10, 2025
Dataset
Dataset Type
ND Numeric Data
Software
The data can be read and analyzed with widely used scientific computing and visualization libraries, including NumPy, Matplotlib, Xarray, and scikit-learn. All files are provided in standard NumPy binary formats (.npz or .npy).
Acknowledgements
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Advanced Scientific Computing Research programs in the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Category
- 54 ENVIRONMENTAL SCIENCES
- 58 GEOSCIENCES
- 97 MATHEMATICS AND COMPUTING