ORBIT-2 Dataset for Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling

  • Lu, Dan | Oak Ridge National Laboratory
  • Wang, Xiao | Oak Ridge National Laboratory
  • Tsaris, Aristeidis | Oak Ridge National Laboratory
  • Choi, Jong Youl | Oak Ridge National Laboratory
  • Ashfaq, Moetasim | Oak Ridge National Laboratory
Overview

Description

This dataset release accompanies ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling, in which large-scale AI methods were applied to increase the spatial resolution of weather and climate data. The collection integrates four widely used, publicly available datasets: ERA5, PRISM, DAYMET, and IMERG. To prepare the data for ORBIT-2 model training and evaluation, we applied a preprocessing pipeline that generates paired low-resolution and high-resolution samples, enabling supervised downscaling experiments. The transformation from coarse to fine scales was performed using bilinear regridding, consistent with the procedures described in WeatherBench2, a community benchmark for weather and climate AI models. The dataset supports the development and evaluation of foundation models for weather and climate downscaling at exascale. Methodology and applications are described in Wang et al., ORBIT-2 (arXiv:2505.04802, 2025).
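
The bilinear regridding mentioned above can be illustrated with a minimal NumPy sketch. It is not the ORBIT-2 preprocessing code: the function name `bilinear_regrid`, the grid sizes, and the assumption of regular grids that share the same domain with aligned end points are all hypothetical choices made for illustration.

```python
import numpy as np


def bilinear_regrid(field, out_shape):
    """Bilinearly interpolate a 2-D field onto a regular grid of shape out_shape.

    Assumption (not from the dataset record): source and target grids cover
    the same domain and their first and last points coincide.
    """
    in_h, in_w = field.shape
    out_h, out_w = out_shape

    # Fractional source-grid coordinates of every target row and column.
    ys = np.linspace(0.0, in_h - 1, out_h)
    xs = np.linspace(0.0, in_w - 1, out_w)

    # Index of the upper-left neighbour, clipped so that index + 1 stays valid.
    y0 = np.clip(np.floor(ys).astype(int), 0, in_h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, in_w - 2)

    # Interpolation weights along each axis, shaped for broadcasting.
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]

    # Interpolate along x on the two bracketing rows, then along y.
    top = field[y0][:, x0] * (1 - wx) + field[y0][:, x0 + 1] * wx
    bottom = field[y0 + 1][:, x0] * (1 - wx) + field[y0 + 1][:, x0 + 1] * wx
    return top * (1 - wy) + bottom * wy


# Hypothetical 3x4 coarse field regridded onto a 5x7 fine grid.
coarse = np.arange(12, dtype=float).reshape(3, 4)
fine = bilinear_regrid(coarse, (5, 7))
```

Because bilinear interpolation reproduces fields that are linear in each coordinate, the example field above is recovered exactly on the fine grid, which makes it a convenient sanity check. Real geospatial regridding would also need to account for latitude/longitude coordinates and masked values, which this sketch ignores.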

Funding Resources

DOE Contract Number

AC05-00OR22725

Originating Research Organization

Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization

Office of Science (SC)

Project Identifier

LRN036

Details

DOI

10.13139/OLCF/2589526

Release Date

October 10, 2025

Dataset

Dataset Type

ND Numeric Data

Software

The data can be interpreted and analyzed using widely adopted scientific computing and visualization libraries, including NumPy, Matplotlib, Xarray, and scikit-learn. All datasets are provided in the standard NumPy binary format (.npz or .npy).
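
As a starting point for inspecting the files, the sketch below lists the arrays in a `.npy` or `.npz` file together with their shapes and dtypes. The file names (`example_field.npy`, `example_pair.npz`) and array keys (`low_res`, `high_res`) are hypothetical stand-ins written to a temporary directory so the example runs on its own; the actual file layout and key names in this release are not stated here and should be checked after download.

```python
import tempfile
from pathlib import Path

import numpy as np


def describe_numpy_file(path):
    """Return {array_name: (shape, dtype)} for a .npy or .npz file."""
    path = Path(path)
    if path.suffix == ".npz":
        summary = {}
        # An .npz archive holds several named arrays; .files lists the names.
        with np.load(path) as archive:
            for name in archive.files:
                array = archive[name]
                summary[name] = (array.shape, str(array.dtype))
        return summary
    # A .npy file holds a single array; use the file stem as its name.
    array = np.load(path)
    return {path.stem: (array.shape, str(array.dtype))}


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Hypothetical stand-in files; real names and keys may differ.
    np.save(tmp / "example_field.npy", np.zeros((4, 8), dtype=np.float32))
    np.savez(
        tmp / "example_pair.npz",
        low_res=np.zeros((4, 8), dtype=np.float32),
        high_res=np.zeros((16, 32), dtype=np.float32),
    )

    npy_summary = describe_numpy_file(tmp / "example_field.npy")
    npz_summary = describe_numpy_file(tmp / "example_pair.npz")

print(npy_summary)
print(npz_summary)
```

Pointing `describe_numpy_file` at a downloaded file shows which arrays it contains before any of them are passed to Matplotlib, Xarray, or scikit-learn.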

Acknowledgements

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Advanced Scientific Computing Research programs in the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Category

  • 54 ENVIRONMENTAL SCIENCES
  • 58 GEOSCIENCES
  • 97 MATHEMATICS AND COMPUTING