Supporting information for Few-Shot Learning Enables Population-Scale Analysis of Leaf Traits in Populus trichocarpa

DOI: 10.13139/ORNLNCCS/1908723

In this work, we use few-shot learning to segment the body and vein architecture of P. trichocarpa leaves from high-resolution scans obtained in the UC Davis common garden. Leaf and vein segmentation are formulated as separate tasks, in which convolutional neural networks (CNNs) are used to iteratively expand partial segmentations until reaching stopping criteria. Our leaf and vein segmentation approaches use just 50 and 8 manually traced images for training, respectively, and are applied to a set of 2,634 top and bottom leaf scans. We show that both methods achieve high segmentation accuracy, in some cases exceeding even human-level segmentation. The leaf and vein segmentations are subsequently used to extract 68 morphological traits using traditional open-source image processing tools, and the extracted traits are validated against real-world physical measurements. For a biological perspective, we perform a genome-wide association study (GWAS) using the vein density trait to discover novel genetic architectures associated with multiple physiological processes relating to leaf development and function.

In addition to sharing all of the few-shot learning code (see https://github.com/jlager/few-shot-leaf-segmentation), we are releasing all images, manual segmentations, model predictions, 68 extracted leaf phenotypes, and a new set of SNPs called against the v4 P. trichocarpa genome for 1,419 genotypes.

The data folder includes all images, ground truth segmentations, predicted segmentations, and extracted leaf traits. All images encode the sample ID in the file name by indicating the treatment, block, row, position, and leaf side, in that order. For example, the file C_1_1_2_bot.jpeg indicates the control treatment, block 1, row 1, position 2, and the bottom side of the leaf. Tabulated results include position IDs as well as the corresponding genotype IDs.

The folders contain the following:
  • images: the 2,906 high-resolution leaf scans taken in the field.
  • leaf_masks: 50 ground truth segmentations used for training the leaf tracing algorithm.
  • leaf_preds: the 2,906 predicted segmentations from the leaf tracing algorithm.
  • vein_masks: 8 ground truth segmentations used for training the vein growing algorithm.
  • vein_preds: the 1,453 predicted segmentations from the vein growing algorithm.
  • vein_probs: the 1,453 predicted probability maps from the vein growing algorithm before thresholding.
  • genomes: the set of SNPs called against the v4 P. trichocarpa genome for 1,419 genotypes, with a README file detailing the steps taken.
  • results:
      (i) digital_traits.tsv: raw values of the 68 predicted leaf traits.
      (ii) manual_traits.tsv: manually measured values of petiole length and width.
      (iii) vein_density_tps_adj.tsv: thin plate spline (TPS) adjusted values of the vein density trait.
      (iv) vein_density_blups.tsv: best linear unbiased prediction (BLUP) adjusted values of the vein density trait.
      (v) gwas_results.csv: GWAS results for the vein density trait, including chromosome positions and corresponding P values.
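
The file naming convention described above (treatment, block, row, position, leaf side) can be unpacked programmatically when matching images to tabulated results. The following is a minimal sketch, not code from the released repository: the names SampleID and parse_sample_id are hypothetical, and it assumes that the five fields are always separated by underscores and that block, row, and position are integers. Only the C (control) treatment code and the bot side label are confirmed by the example above; other codes are passed through unchanged.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SampleID:
    """Fields encoded in an image file name (hypothetical container)."""
    treatment: str  # e.g. "C" for the control treatment
    block: int
    row: int
    position: int
    side: str       # e.g. "bot" for the bottom side of the leaf


def parse_sample_id(filename: str) -> SampleID:
    """Split a name like 'C_1_1_2_bot.jpeg' into its five fields.

    Assumes underscore-separated fields in the order
    treatment, block, row, position, side.
    """
    stem = Path(filename).stem  # drop directory and extension
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {stem!r}")
    treatment, block, row, position, side = parts
    return SampleID(treatment, int(block), int(row), int(position), side)


if __name__ == "__main__":
    print(parse_sample_id("C_1_1_2_bot.jpeg"))
    # SampleID(treatment='C', block=1, row=1, position=2, side='bot')
```

A frozen dataclass is used so that parsed IDs are hashable and can serve as dictionary keys, for example when pairing the two scans of the same leaf.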

Published: 2023-01-26 14:37:16

Dataset Properties

Authors:
  • Lagergren, John (Oak Ridge National Laboratory)
  • Pavicic, Mirko (Oak Ridge National Laboratory)
  • Chhetri, Hari (Oak Ridge National Laboratory)
  • York, Larry (Oak Ridge National Laboratory)
  • Hyatt, Doug (University of Tennessee, Knoxville)
  • Kainer, David (Oak Ridge National Laboratory)
  • Rutter, Erica M (University of California, Merced)
  • Flores, Kevin (North Carolina State University)
  • Taylor, Gail (University of California, Davis)
  • Jacobson, Daniel (Oak Ridge National Laboratory)
  • Streich, Jared (Oak Ridge National Laboratory)
Project Identifier: SYB105
Dataset Type: IP Still Images or Photos
Subjects:
  • 09 BIOMASS FUELS
  • 59 BASIC BIOLOGICAL SCIENCES
Keywords:
  • Few-shot learning
  • image-based plant phenotyping
  • genomic analysis
Originating Organizations: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organizations: Office of Science (SC), Biological and Environmental Research (BER) (SC-23); USDOE; ORNL Laboratory Directed Research and Development (LDRD)
DOE Contract: DE-AC05-00OR22725

Acknowledgements

Papers using this dataset are requested to include the following text in their acknowledgements:

Support for 10.13139/ORNLNCCS/1908723 is provided by the U.S. Department of Energy, project SYB105 under Contract DE-AC05-00OR22725. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility.