Proteome-scale Structure Prediction Data - Pseudodesulfovibrio mercurii
- Davidson, Russell B | Oak Ridge National Laboratory
- Coletti, Mark | Oak Ridge National Laboratory
- Gao, Mu | Georgia Institute of Technology
- Parks, Jerry M | Oak Ridge National Laboratory
- Sedova, Ada | Oak Ridge National Laboratory
Overview
Description
This dataset covers the 3,446 proteins predicted for Pseudodesulfovibrio mercurii. Each protein has five predicted structures from an AlphaFold run, as well as structural alignment results from the TM-score-based structural alignment method within the APoc program. AlphaFold outputs the atoms and coordinates of each protein model in human-readable PDB files and quantitative prediction metrics in Python pickle files. The five models are ranked by predicted TM-score (pTMS), a quantitative confidence metric output by AlphaFold that reports on protein model quality. The top-ranked model has undergone an energy minimization calculation to relax the structure and remove any potential clashes in the atomic coordinates.

Structural alignment results are stored in two files for each protein; the top-ranked model described above is used for all alignment analyses. Both files are gzip-compressed and are human-readable once unpacked. The first file holds the TMalign score results: the quantitative metrics for the top alignments between the predicted structure and experimental structures from the PDB70, a curated non-redundant database of about 80,000 experimental structures developed by the Söding lab. Each data point in this file corresponds to exactly one experimental structure, and the PDB ID and brief metadata about the protein, taken from the PDB70 file, are reported alongside the quantitative metrics. The second file contains the raw results for each alignment reported in the score results file. It provides the translation and rotation arrays for each alignment so that the structural alignment can be recreated, along with residue-level scores that quantify how close the aligned residues are between the predicted and experimental models.
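As an illustration of how a stored alignment might be recreated from its rotation and translation arrays, the following is a minimal sketch, not the dataset's own tooling. It assumes the common superposition convention x' = t + R·x, with R a 3x3 rotation matrix and t a 3-element translation vector. The layout of the raw results file is not described here, so parsing is left out, and the function name `apply_rigid_transform` and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def apply_rigid_transform(
    coords: Sequence[Sequence[float]],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Vec3]:
    """Return coords after applying x' = t + R.x to every point.

    Assumption: `rotation` is a 3x3 row-major matrix and `translation`
    a length-3 vector, as read from an alignment result.
    """
    transformed: List[Vec3] = []
    for x, y, z in coords:
        new_point = tuple(
            translation[i]
            + rotation[i][0] * x
            + rotation[i][1] * y
            + rotation[i][2] * z
            for i in range(3)
        )
        transformed.append(new_point)
    return transformed


# Hypothetical example: rotate 90 degrees about the z axis, then shift +1 along x.
rotation = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
translation = [1.0, 0.0, 0.0]
moved = apply_rigid_transform([(1.0, 0.0, 0.0)], rotation, translation)
```

In practice the coordinates would come from the ATOM records of the top-ranked PDB model. Which of the two structures the transform applies to should be checked against the alignment file itself before use.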
Funding resources
DOE contract number
DE-AC05-00OR22725
Originating research organization
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring organization
Office of Science (SC); Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
Related resources
- IsDerivedFrom (DOI): https://doi.org/10.1038/s41586-021-03819-2
- IsDerivedFrom (DOI): https://doi.org/10.1093/bioinformatics/btt024
Details
DOI
10.13139/ORNLNCCS/1861307
Release date
April 11, 2022
Dataset
Dataset type
ND Numeric Data
Software
Python
Acknowledgements
Users should acknowledge the OLCF in all publications and presentations that speak to work performed on OLCF resources:
This work was carried out [in part] at Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under contract DE-AC05-00OR22725.
Category
- 59 BASIC BIOLOGICAL SCIENCES
- 71 CLASSICAL AND QUANTUM MECHANICS, GENERAL PHYSICS
- 99 GENERAL AND MISCELLANEOUS
Keywords
- Structural Proteomics
- AlphaFold
- APoc