Proteome-scale Structure Prediction Data - Pseudodesulfovibrio mercurii

Davidson, Russell B | Oak Ridge National Laboratory
Coletti, Mark | Oak Ridge National Laboratory
Gao, Mu | Georgia Institute of Technology
Parks, Jerry M | Oak Ridge National Laboratory
Sedova, Ada | Oak Ridge National Laboratory

Overview

Description

This dataset covers the 3,446 predicted proteins of Pseudodesulfovibrio mercurii. Each protein has five predicted structures from an AlphaFold run, as well as structural alignment results from the TM-score-based structural alignment method within the APoc program. AlphaFold writes the atoms and coordinates of each protein model to human-readable PDB files and quantitative prediction metrics to Python pickle files. The five models are ranked by predicted TM-score (pTMS), a quantitative confidence metric output by AlphaFold that reports on protein model quality. The top-ranked model has undergone an energy minimization calculation to relax the structure and remove any potential clashes in the atomic coordinates.

Structural alignment results are stored in two files for each protein; the top-ranked model is used for all alignment analyses. Both files are gzip-compressed and, once unpacked, human readable. The first file is the TMalign score results file, which contains the quantitative metrics for the top alignments between the predicted structure and experimental structures from the PDB70, a curated non-redundant database of about 80,000 experimental structures developed by the Söding lab. Each data point in this file corresponds to one experimental structure; the PDB ID and brief metadata about the protein, taken from the PDB70 file, are reported alongside the quantitative metrics. The second file contains the raw results for each alignment reported in the score results file. It provides the translation and rotation arrays for each alignment so that the structural alignment can be recreated, along with residue-level scores that quantify how closely the aligned residues of the predicted and experimental models match.
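As an illustration of how these files might be used, the following Python sketch ranks models by pTMS, reads a gzip-compressed results file, and applies a rotation and translation to a set of coordinates. It is not the authors' workflow. Several details are assumptions rather than facts stated in this record: that each pickle holds a dictionary with a "ptm" entry, that the alignment uses the convention x' = t + R x, and all file names and numeric values in the demo. Check the actual files for the real layout, and for which of the two structures the transformation is meant to move. Unpickling AlphaFold output may also require NumPy to be installed.

```python
import gzip
import pickle


def load_ptm(pickle_path):
    """Return the predicted TM-score stored in one AlphaFold result pickle.

    Assumption: the pickle holds a dict with a "ptm" entry.
    """
    with open(pickle_path, "rb") as handle:
        result = pickle.load(handle)
    return float(result["ptm"])


def rank_by_ptm(ptm_by_model):
    """Return model names sorted from highest to lowest pTMS."""
    return sorted(ptm_by_model, key=ptm_by_model.get, reverse=True)


def read_gzip_lines(path):
    """Read a gzip-compressed text file into a list of lines."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle]


def apply_alignment(coords, rotation, translation):
    """Apply x' = t + R x to each (x, y, z) coordinate.

    Assumption: rotation is a 3x3 row-major matrix and translation
    is a length-3 vector, applied in this order.
    """
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            translation[i]
            + rotation[i][0] * x
            + rotation[i][1] * y
            + rotation[i][2] * z
            for i in range(3)
        ))
    return moved


if __name__ == "__main__":
    # Hypothetical pTMS values for three models of one protein.
    scores = {"model_1": 0.71, "model_2": 0.84, "model_3": 0.65}
    print(rank_by_ptm(scores))

    # Hypothetical transformation: 90 degree rotation about z, then a shift.
    rotation = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
    translation = [1.0, 2.0, 3.0]
    print(apply_alignment([(1.0, 0.0, 0.0)], rotation, translation))
```

Only the standard library is used so that the sketch runs without extra dependencies; for whole-structure transformations, an array library would be the more practical choice.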

Funding Resources

DOE Contract Number

DE-AC05-00OR22725

Originating Research Organization

Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization

Office of Science (SC); Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); Office of Science (SC), Biological and Environmental Research (BER) (SC-23)

Project Identifier

bif135_pdsvm

Related Resources

IsDerivedFrom (DOI): https://doi.org/10.1038/s41586-021-03819-2
IsDerivedFrom (DOI): https://doi.org/10.1093/bioinformatics/btt024

Details

DOI

10.13139/ORNLNCCS/1861307

Release Date

April 11, 2022

Dataset

Dataset Type

ND Numeric Data

Software

Python

Acknowledgements

Users should acknowledge the OLCF in all publications and presentations that speak to work performed on OLCF resources:

This work was carried out [in part] at Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under contract DE-AC05-00OR22725.

Keywords

Structural Proteomics,
AlphaFold,
APoc