PDB NextGen Archive Now Provides Intra-molecular Connectivity

Jasmine Young
Thu, Jul 6, 2023 4:31 PM

Dear PDB-l,

Version 1.0 of the next generation archive repository (NextGen) for the
PDB archive was made available in early 2023
http://www.wwpdb.org/news/news?year=2023#63cedad9b5f08ee94ab73826.
This “NextGen” archive hosts enriched atomic coordinate files, in both
PDBx/mmCIF and PDBML formats, with files available to download at
files-nextgen.wwpdb.org https://files-nextgen.wwpdb.org/.

The initial launch of the NextGen archive
http://www.wwpdb.org/news/news?year=2023#63cedad9b5f08ee94ab73826
enriched coordinate files from the core PDB archive with sequence
annotation from external resources such as UniProt, SCOP2 and Pfam at
atom, residue, and chain levels. After consultation with the user community,
this release adds intra-molecular connectivity for each residue
present in an entry, helping users transition from legacy PDB format
to PDBx/mmCIF format
http://www.wwpdb.org/news/news?year=2023#63ff72ccc031758bf1c30ff7. The
connectivity information includes atom pairs, bond order, aromatic flag,
and stereochemistry as incorporated from the PDB Chemical Component
Dictionary (CCD). Users can extract this information from the
_chem_comp_bond and _chem_comp_atom categories of the
PDBx/mmCIF-formatted files in the NextGen archive.
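
As a rough illustration of pulling the bond records out of such a file, the sketch below reads one looped category from mmCIF text using only the Python standard library. The function name read_loop_category and the SAMPLE text are hypothetical and hand-written for this example, not taken from an archive file. The parser is deliberately simplified: it assumes whitespace-separated values on a single line per row and does not handle multi-line or space-containing quoted values, so a full mmCIF parser is the better choice for real files.

```python
def read_loop_category(cif_text, category):
    """Return the rows of one looped mmCIF category as a list of dicts.

    Simplified sketch: assumes one row per line, whitespace-separated
    values, and no multi-line (semicolon-delimited) text fields.
    """
    prefix = category + "."
    tags, rows = [], []
    in_loop = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line == "loop_":
            if rows:              # the target loop has already been read
                break
            in_loop, tags = True, []
            continue
        if not in_loop or not line:
            continue
        if line.startswith("#"):
            if rows:              # comment line after data: loop is finished
                break
            in_loop = False
            continue
        if line.startswith("_"):
            if rows:              # a new item name after data: loop is finished
                break
            tags.append(line.split()[0])
            continue
        # Data row: keep it only if every column belongs to the category.
        if tags and all(tag.startswith(prefix) for tag in tags):
            values = [token.strip('"') for token in line.split()]
            names = [tag[len(prefix):] for tag in tags]
            rows.append(dict(zip(names, values)))
    return rows


# Hand-written sample in the style of a _chem_comp_bond loop (illustrative only).
SAMPLE = """\
data_sample
#
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
_chem_comp_bond.pdbx_aromatic_flag
ALA N  CA SING N
ALA CA C  SING N
ALA C  O  DOUB N
#
"""

bonds = read_loop_category(SAMPLE, "_chem_comp_bond")
for bond in bonds:
    print(bond["comp_id"], bond["atom_id_1"], bond["atom_id_2"], bond["value_order"])
```

For a downloaded file, the text could be obtained with gzip.open(path, "rt").read() before calling the function.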

To support the transition from legacy PDB format to PDBx/mmCIF, file names
and directories are organized by extended PDB IDs
http://www.wwpdb.org/news/news?year=2023#63ff72ccc031758bf1c30ff7 and
a two-letter hash code made up of the third-from-last and second-from-last
characters of the ID. This hash code will remain consistent once PDB ID codes
are extended beyond four characters with the pdb_ prefix. For example, PDB
entry 8aly (hash code "al") is at:
https://files-nextgen.wwpdb.org/pdb_nextgen/data/entries/divided/al/pdb_00008aly/pdb_00008aly_xyz-enrich.cif.gz.
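
The path layout can be sketched in a few lines of Python. The function name nextgen_url is hypothetical, and the rule of zero-padding the ID to eight characters after the pdb_ prefix is an assumption inferred from the single 8aly example above, not a documented specification.

```python
BASE = "https://files-nextgen.wwpdb.org/pdb_nextgen/data/entries/divided"


def nextgen_url(pdb_id):
    """Build the NextGen enriched mmCIF URL for a PDB ID (sketch)."""
    code = pdb_id.strip().lower()
    if code.startswith("pdb_"):
        code = code[len("pdb_"):]
    # Assumption: extended IDs are zero-padded to 8 characters, as in pdb_00008aly.
    extended = "pdb_" + code.rjust(8, "0")
    # Hash code: third-from-last and second-from-last characters of the ID.
    hash_code = extended[-3:-1]
    return f"{BASE}/{hash_code}/{extended}/{extended}_xyz-enrich.cif.gz"


print(nextgen_url("8aly"))
```

Because the hash code is taken from the end of the ID, a classic four-character ID and its extended form map to the same directory.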

Users are encouraged to adopt PDBx/mmCIF format as early as possible.
Learn more about PDBx/mmCIF format and related software resources at
mmcif.wwpdb.org https://mmcif.wwpdb.org/.

In the future, the PDB NextGen archive will continue to be updated with
more enriched annotations from external database resources in the
metadata, building on the content already provided in the structure
model files in the PDB archive at files.wwpdb.org
https://files.wwpdb.org/.

--
Regards,

Jasmine

---==========================
Jasmine Young, Ph.D.
Biocuration Team Lead
RCSB Protein Data Bank
Research Professor
Institute for Quantitative Biomedicine
Rutgers, The State University of New Jersey
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087

Email:jasmine@rcsb.rutgers.edu
Phone: (848)445-0103 ext 4920
Fax: (732)445-4320

---==========================
