Supporting data for climatic clustering and longitudinal analysis with impacts on food, bioenergy, and pandemics


This data supports the conclusions of the paper "Climatic clustering and longitudinal analysis with impacts on food, bioenergy, and pandemics." Included here are (i) the binarized geolocation vectors used for exhaustive vector comparisons, (ii) the resulting climatic networks, (iii) the results of applying Markov clustering to the climatic networks, and (iv) the results of applying Correlation-of-Correlations (cor-cor) to the climatic networks.

The binarized geolocation vectors used as inputs for the Combinatorial Metrics library (CoMet) are in files named comet-UUUUUxVVVVV-XXXX-YYYY.shuffled.tped, where UUUUU is the number of vectors, VVVVV is the length of each vector, XXXX is the starting year, and YYYY is the ending year. Each line corresponds to a geolocation vector of binary elements A (i.e., 0) and T (i.e., 1).

The climatic networks used for downstream network analysis are in files named network-U-way-XXXX-YYYY.parsed.txt, where U is the order of the comparison (2-way or 3-way), XXXX is the starting year, and YYYY is the ending year. Each line corresponds to an edge linking two geolocations (defined by latitude and longitude) with its corresponding edge weight (i.e., DUO score).

The cluster results are in files named clusters-U-way-XXXX-YYYY-thresh-VVVV-inflation-WWW.clustered.txt, where U is the order of the comparison (2-way or 3-way), XXXX is the starting year, YYYY is the ending year, VVVV is the similarity threshold, and WWW is the Markov clustering inflation rate. Each line corresponds to a single cluster and lists the geolocations (defined by latitude and longitude) that belong to it.

The cor-cor results are in files named corcor-U-way-XXXX-YYYY.cumulative.txt, where U is the order of the comparison (2-way or 3-way), XXXX is the starting year, and YYYY is the ending year. Each line corresponds to a single geolocation with its corresponding cor-cor value.
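The file-naming conventions described above can be turned into a small lookup helper. The following Python sketch is illustrative only and is not part of the dataset or of CoMet. The function names `parse_dataset_filename` and `decode_alleles` are hypothetical, and the regular expressions assume four-digit years and treat the threshold and inflation fields as opaque strings, since the description does not specify how they are written. The column layout of the .tped lines is also not specified here, so `decode_alleles` expects only the A/T symbols, with any leading columns already removed.

```python
import re

# Patterns transcribed from the naming conventions in the description.
# Assumptions: years are four digits; threshold/inflation are kept as raw text.
_PATTERNS = {
    "vectors": re.compile(
        r"^comet-(?P<num_vectors>\d+)x(?P<vector_length>\d+)"
        r"-(?P<start_year>\d{4})-(?P<end_year>\d{4})\.shuffled\.tped$"
    ),
    "network": re.compile(
        r"^network-(?P<order>[23])-way"
        r"-(?P<start_year>\d{4})-(?P<end_year>\d{4})\.parsed\.txt$"
    ),
    "clusters": re.compile(
        r"^clusters-(?P<order>[23])-way"
        r"-(?P<start_year>\d{4})-(?P<end_year>\d{4})"
        r"-thresh-(?P<threshold>.+?)-inflation-(?P<inflation>.+?)"
        r"\.clustered\.txt$"
    ),
    "corcor": re.compile(
        r"^corcor-(?P<order>[23])-way"
        r"-(?P<start_year>\d{4})-(?P<end_year>\d{4})\.cumulative\.txt$"
    ),
}

# Fields that are safe to convert to integers.
_INT_FIELDS = {"num_vectors", "vector_length", "start_year", "end_year", "order"}


def parse_dataset_filename(name):
    """Return a dict of fields for a recognized file name, else None."""
    for kind, pattern in _PATTERNS.items():
        match = pattern.match(name)
        if match:
            fields = {"kind": kind}
            for key, value in match.groupdict().items():
                fields[key] = int(value) if key in _INT_FIELDS else value
            return fields
    return None


def decode_alleles(tokens):
    """Map allele symbols to bits (A -> 0, T -> 1), as stated in the description."""
    mapping = {"A": 0, "T": 1}
    try:
        return [mapping[token] for token in tokens]
    except KeyError as exc:
        raise ValueError(f"unexpected symbol: {exc.args[0]!r}") from None
```

For example, with a made-up file name, `parse_dataset_filename("network-2-way-1958-1962.parsed.txt")` yields a dict with `kind` set to `"network"`, `order` set to 2, and the two years as integers. A name that matches none of the four conventions returns `None`.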

Published: 2021-11-18 09:18:07

Dataset Properties

Field Value
Creators
  • Lagergren, John Oak Ridge National Laboratory
  • Cashman, Mikaela Oak Ridge National Laboratory
  • Melesse Vergara, Veronica Oak Ridge National Laboratory
  • Eller, Paul Oak Ridge National Laboratory
  • Gazolla, Joao Oak Ridge National Laboratory
  • Chhetri, Hari Oak Ridge National Laboratory
  • Streich, Jared Oak Ridge National Laboratory
  • Climer, Sharlee University of Missouri
  • Thornton, Peter Oak Ridge National Laboratory
  • Joubert, Wayne Oak Ridge National Laboratory
  • Jacobson, Daniel Oak Ridge National Laboratory
Project Identifier SYB105
Dataset Type ND Numeric Data
Keywords
  • Exhaustive vector comparison
  • climatic clustering
  • longitudinal network analysis
  • high-performance computing
  • exascale computing
  • predictive modeling
Software Needed CoMet (
Originating Organizations Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); University of Tennessee, Knoxville, TN; University of Missouri, St. Louis, MO
Sponsoring Organizations Office of Science (SC), Biological and Environmental Research (BER) (SC-23); National Institutes of Health (NIH)
DOE Contract DE-AC05-00OR22725


Papers using this dataset are requested to include the following text in their acknowledgements:

*Support for 10.13139/ORNLNCCS/1828678 is provided by the U.S. Department of Energy, project SYB105 under Contract DE-AC05-00OR22725. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility.