Proteome-scale Structure Prediction Data - Pseudodesulfovibrio mercurii

DOI: 10.13139/ORNLNCCS/1861307

This dataset covers 3,446 proteins predicted for Pseudodesulfovibrio mercurii, each of which has five predicted structures from an AlphaFold run, as well as structural alignment results from the TM-score-based structural alignment method within the APoc program.

AlphaFold outputs the atoms and coordinates of each protein model in human-readable PDB files and quantitative prediction metrics in Python pickle files. The five models are ranked by the predicted TM-score (pTMS), a quantitative confidence metric output by AlphaFold that reports on protein model quality. The top-ranked model has undergone an energy minimization calculation to relax the structure and remove any potential clashes in the atomic coordinates.

Structural alignment results are stored in two files for each protein; the top-ranked model (as described above) is used for all alignment analyses. Both files are gzip-compressed and, once unpacked, are human readable. The first file is the TMalign score results file. It contains the quantitative metrics for the top alignments between the predicted structure and experimental structures from the PDB70, a curated non-redundant database of about 80,000 experimental structures developed by the Söding lab. Each data point in this file is directly associated with one experimental structure; the PDB ID and brief metadata about the protein, taken from the PDB70 file, are reported alongside the quantitative metrics. The second file contains the raw results associated with each alignment reported in the score results file. Specifically, the translation and rotation arrays for each alignment are provided so that the structural alignment can be recreated. Additionally, residue-level scores are reported to quantify the closeness of the aligned residues between the predicted and experimental models.
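The workflow described above (rank the five models by pTMS, unpack the gzip-compressed alignment files, and recreate an alignment from its translation and rotation arrays) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the dataset authors' tooling. It assumes each pickle file holds a dict with a "ptm" entry, as AlphaFold's pTM-enabled models typically write, and that the alignment transform is applied as x' = R x + t. The dataset description confirms neither the key name nor which structure is the moving one, so check both against the files. The helper names are hypothetical, and unpickling real AlphaFold output will generally also require NumPy to be installed.

```python
import gzip
import pickle
from pathlib import Path


def rank_models_by_ptm(pickle_paths):
    """Return (path, pTM) pairs sorted from highest to lowest pTM."""
    scored = []
    for path in pickle_paths:
        with open(path, "rb") as handle:
            result = pickle.load(handle)
        # Assumption: the pickled object is a dict with a "ptm" entry.
        scored.append((Path(path), float(result["ptm"])))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def read_gzip_text(path):
    """Return the lines of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read().splitlines()


def apply_rigid_transform(coords, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) coordinate.

    Assumption: `rotation` is a 3x3 row-major matrix and `translation`
    a length-3 vector; the file layout may differ from this convention.
    """
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            translation[i]
            + rotation[i][0] * x
            + rotation[i][1] * y
            + rotation[i][2] * z
            for i in range(3)
        ))
    return moved
```

Only pickle files from a trusted source should be loaded, because unpickling can execute arbitrary code. The transform helper uses plain Python lists so that it carries no dependencies; for whole proteins a NumPy array operation would be the more natural choice.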

Published: 2022-04-11 16:28:43

Dataset Properties

Authors:
  • Davidson, Russell B (Oak Ridge National Laboratory)
  • Coletti, Mark (Oak Ridge National Laboratory)
  • Gao, Mu (Georgia Institute of Technology)
  • Parks, Jerry M (Oak Ridge National Laboratory)
  • Sedova, Ada (Oak Ridge National Laboratory)
Project Identifier: bif135_pdsvm
Dataset Type: ND Numeric Data
Subjects:
  • 59 BASIC BIOLOGICAL SCIENCES
  • 71 CLASSICAL AND QUANTUM MECHANICS, GENERAL PHYSICS
  • 99 GENERAL AND MISCELLANEOUS
Keywords:
  • Structural Proteomics
  • AlphaFold
  • APoc
Software Needed: Python
Originating Organizations: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organizations: Office of Science (SC); Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
DOE Contract: DE-AC05-00OR22725
Related Identifiers:
  • IsDerivedFrom (DOI): 10.1038/s41586-021-03819-2
  • IsDerivedFrom (DOI): 10.1093/bioinformatics/btt024

Acknowledgements

Papers using this dataset are requested to include the following text in their acknowledgements:

Support for 10.13139/ORNLNCCS/1861307 is provided by the U.S. Department of Energy, project bif135_pdsvm under Contract DE-AC05-00OR22725. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility.