Empathy List Archives

pdb-l@lists.wwpdb.org

The PDB mailing list

Resources for Supporting the Extended PDB ID Format (pdb_00001abc)

Jasmine Young

Tue, Jan 9, 2024 6:28 PM

Dear PDB-l,

wwPDB anticipates that all four-character PDB accession codes (PDB
IDs) will be consumed by 2029.

With the continuous growth of the PDB archive, wwPDB has revised the PDB
accession code format by extending its length and prepending "pdb_"
(e.g., "1abc" will become "pdb_00001abc"). This change will enable
text-mining detection of PDB entries in the published literature and allow
for more informative and transparent delivery of revised data files.
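
As an illustration of the format change, here is a minimal Python sketch that
converts a legacy ID to the extended form and finds extended IDs in free text.
The function names, the regular expressions, and the choice to normalize to
lowercase are assumptions made for this example, not a wwPDB specification.

```python
import re

# Assumption: a legacy ID is one digit followed by three alphanumerics.
LEGACY_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")
# Assumption: an extended ID is "pdb_" followed by eight alphanumerics.
EXTENDED_ID = re.compile(r"\bpdb_[0-9a-z]{8}\b", re.IGNORECASE)


def to_extended_id(pdb_id: str) -> str:
    """Return the 12-character form, e.g. '1abc' -> 'pdb_00001abc'."""
    pdb_id = pdb_id.strip()
    if EXTENDED_ID.fullmatch(pdb_id):
        # Already extended; only normalize the case (an assumption).
        return pdb_id.lower()
    if not LEGACY_ID.fullmatch(pdb_id):
        raise ValueError(f"not a recognizable PDB ID: {pdb_id!r}")
    # Left-pad the four-character code with zeros to eight characters.
    return "pdb_" + pdb_id.lower().zfill(8)


def find_extended_ids(text: str) -> list[str]:
    """Return all extended PDB IDs mentioned in a piece of text."""
    return [m.group(0).lower() for m in EXTENDED_ID.finditer(text)]
```

The fixed "pdb_" prefix is what makes the second function practical: a bare
four-character code such as "1abc" is hard to tell apart from ordinary text,
whereas the prefixed form is unlikely to match by accident.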

Entries with extended PDB IDs (12 characters) will not be compatible
with the legacy PDB file format once four-character PDB IDs are
consumed. wwPDB encourages scientific journals, PDB community and users
to transition to using the PDBx/mmCIF format and the extended PDB ID
format as soon as possible.

Resources are available to help PDB users with this transition through
the wwPDB resource portal page (Extended PDB ID With 12 Characters)
http://www.wwpdb.org/documentation/new-format-for-pdb-ids. This page
links to useful resources for handling this change, including an FAQ on
PDB ID extension
http://www.wwpdb.org/documentation/pdb-id-extension-faq, materials to
learn more about PDBx/mmCIF format, and links to other PDBx/mmCIF
resources and software tools. As the transition phase progresses, more
training resources will be added to this page.

Additionally, a PDB “beta” archive will be provided during the
transition phase in 2026. The directory structure of this “beta” archive
will mirror the data organization of the PDB Versioned Archive
http://files-versioned.wwpdb.org/ in the form of
https://files-beta.org/pub/pdb/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>.
The two-letter hash will be based on the n-2 and n-3 characters, that is,
the two characters immediately before the last character of the ID. For
example, PDB entry PDB_12345678 will be under /67/. This will maintain
consistency with the current PDB archive, where, e.g., PDB entry 1abc is
under /ab/.
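
The hash rule can be sketched in Python as follows. This is only an
illustration based on the two examples above; the function names, the
lowercase normalization, and the relative path layout returned by
entry_path are assumptions, not a description of the final archive.

```python
def hash_dir(pdb_id: str) -> str:
    """Return the two-letter hash: the two characters before the last one."""
    pdb_id = pdb_id.strip().lower()  # lowercase is an assumption
    if len(pdb_id) < 3:
        raise ValueError(f"ID too short to hash: {pdb_id!r}")
    # Slice from the third-to-last character up to, not including, the last.
    return pdb_id[-3:-1]


def entry_path(pdb_id: str) -> str:
    """Hypothetical relative path of an entry below the entries directory."""
    pdb_id = pdb_id.strip().lower()
    return f"{hash_dir(pdb_id)}/{pdb_id}/"
```

Because the rule counts from the end of the ID, the same function gives "ab"
for both 1abc and pdb_00001abc, which is the consistency with the current
archive that the paragraph above describes.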

Once all four-character PDB accession codes are consumed, this PDB
“beta” archive will become the PDB main archive and the current PDB
archive will be removed.

Example files containing extended PDB IDs, intended to help with software
adoption, can be downloaded from GitHub:
https://github.com/wwPDB/extended-wwPDB-identifier-examples.

wwPDB recently announced that the three-character PDB Chemical Component
Dictionary (CCD) IDs have been consumed:
http://www.wwpdb.org/news/news?year=2023#656f4404d78e004e766a96c6
Five-character alphanumeric accession codes for CCD IDs are now issued
by the OneDep system.

For any further information, please contact us at info@wwpdb.org.

--
Regards,

Jasmine

---==========================
Jasmine Young, Ph.D.
Biocuration Team Lead
RCSB Protein Data Bank
Research Professor
Institute for Quantitative Biomedicine
Rutgers, The State University of New Jersey
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087

Email: jasmine@rcsb.rutgers.edu
Phone: (848)445-0103 ext 4920
Fax: (732)445-4320

---==========================
