Future Planning: PDB entries with extended CCD or PDB IDs will be distributed in the PDBx/mmCIF format only

Jasmine Young
Mon, Sep 26, 2022 3:27 PM

Dear PDB-l,

wwPDB, in collaboration with the PDBx/mmCIF Working Group
(http://www.wwpdb.org/task/mmcif), plans to extend the length of the
accession codes (IDs) for PDB and Chemical Component Dictionary (CCD)
entries. PDB entries containing these extended IDs will not be
supported by the legacy PDB file format (see the previous
announcement:
http://www.wwpdb.org/news/news?year=2021#607760112786e73a79c76f9d).

    CCD ID extension

CCD entries are currently identified by unique three-character
alphanumeric IDs. At current growth rates, we anticipate running out of
three-character IDs before 2024. After that point, the wwPDB will issue
five-character alphanumeric CCD IDs in the OneDep system. To avoid
confusion with current four-character PDB IDs, four-character codes will
not be used. Owing to limitations of the legacy PDB file format, PDB
entries containing the new five-character CCD IDs will be distributed
only in PDBx/mmCIF format.
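
As a rough illustration of the length rules above, the Python sketch below classifies a CCD ID as an existing code or a new five-character code and rejects four-character codes. The function name classify_ccd_id is hypothetical, and it assumes IDs contain only uppercase letters and digits. It also accepts existing IDs shorter than three characters, since short codes such as ZN (zinc) are already in the CCD.

```python
import re

# Assumption: CCD IDs are uppercase letters and digits only; input is
# upper-cased before checking.
_ALNUM = re.compile(r"[A-Z0-9]+")


def classify_ccd_id(ccd_id: str) -> str:
    """Classify a CCD ID under the announced length scheme (hypothetical helper).

    "legacy"   -> existing IDs of up to three characters
                  (short codes such as "ZN" or "K" also exist)
    "extended" -> new five-character IDs
    Four-character codes are rejected: they will not be issued, to
    avoid confusion with four-character PDB IDs.
    """
    code = ccd_id.strip().upper()
    if not _ALNUM.fullmatch(code):
        raise ValueError(f"not an alphanumeric CCD ID: {ccd_id!r}")
    if len(code) <= 3:
        return "legacy"
    if len(code) == 5:
        return "extended"
    raise ValueError(f"invalid CCD ID length {len(code)}: {ccd_id!r}")


print(classify_ccd_id("ATP"))    # legacy
print(classify_ccd_id("A1B2C"))  # extended
```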

In addition, wwPDB has reserved the CCD IDs 01-99, DRG, INH, and LIG,
which will never be used in the PDB. These reserved codes can be used
for new ligands during structure determination, so that such ligands can
be identified as new upon deposition and added to the CCD during
biocuration.
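
A minimal sketch of how software might recognise the reserved placeholder codes, for example to flag ligands that still need a permanent CCD ID. The names RESERVED_CCD_IDS, is_reserved_ccd_id, and placeholder_ligands are hypothetical, and the code assumes that "01 - 99" means the two-digit codes 01 through 99.

```python
# Assumption: "01 - 99" denotes the two-digit codes "01" through "99".
RESERVED_CCD_IDS = frozenset(
    [f"{n:02d}" for n in range(1, 100)] + ["DRG", "INH", "LIG"]
)


def is_reserved_ccd_id(ccd_id: str) -> bool:
    """True if ccd_id is one of the wwPDB-reserved placeholder codes."""
    return ccd_id.strip().upper() in RESERVED_CCD_IDS


def placeholder_ligands(ccd_ids):
    """Return, in input order, the IDs that are reserved placeholders.

    Hypothetical helper: such ligands are new and would receive a
    permanent CCD ID during biocuration.
    """
    return [c for c in ccd_ids if is_reserved_ccd_id(c)]


print(placeholder_ligands(["ATP", "LIG", "05", "HOH"]))  # ['LIG', '05']
```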

    PDB ID extension

wwPDB will extend PDB IDs to eight characters, prefixed by ‘pdb_’, e.g.,
pdb_00001abc. Each PDB entry has a corresponding Digital Object
Identifier (DOI), which is often required for manuscript submission to
journals and is cited by structure authors in their publications.
Extended PDB IDs and the corresponding PDB DOIs have been included in
the PDBx/mmCIF atomic coordinate files of all new and re-released
entries since August 2021.
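
The mapping between the two ID forms, as illustrated by pdb_00001abc, can be sketched as follows. This assumes, from that example alone, that the extended ID is ‘pdb_’ followed by the four-character ID left-padded with "0000", and that IDs are compared case-insensitively in lower case. The function names are hypothetical.

```python
import re

# Assumptions: IDs are case-insensitive and normalised to lower case,
# as in the examples; a classic ID is any four alphanumerics.
_SHORT = re.compile(r"[a-z0-9]{4}")
_EXTENDED = re.compile(r"pdb_[a-z0-9]{8}")


def to_extended_pdb_id(pdb_id: str) -> str:
    """Return the extended form, e.g. "1ABC" -> "pdb_00001abc"."""
    code = pdb_id.strip().lower()
    if _EXTENDED.fullmatch(code):
        return code                    # already extended
    if _SHORT.fullmatch(code):
        return "pdb_0000" + code       # zero-pad to eight characters
    raise ValueError(f"unrecognised PDB ID: {pdb_id!r}")


def to_short_pdb_id(pdb_id: str):
    """Return the four-character ID if one exists, else None."""
    body = to_extended_pdb_id(pdb_id)[4:]   # the eight characters after "pdb_"
    return body[4:] if body.startswith("0000") else None


print(to_extended_pdb_id("1ABC"))       # pdb_00001abc
print(to_short_pdb_id("pdb_00099xyz"))  # None
```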

For example, a PDB entry issued with the four-character PDB ID 1abc has
the extended PDB ID pdb_00001abc and the corresponding PDB DOI
10.2210/pdb1abc/pdb, as listed in the _database_2 PDBx/mmCIF category:

loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB 1abc pdb_00001abc 10.2210/pdb1abc/pdb
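
For quick inspection, a category loop like the one above can be read with a few lines of Python. This is a simplified sketch, not a PDBx/mmCIF parser: it assumes a single loop, unquoted whitespace-separated values, and no other categories in the input. Production code should use a full mmCIF reader. parse_database_2 is a hypothetical name.

```python
def parse_database_2(text: str) -> list:
    """Parse one simple _database_2 loop into a dict per row.

    Sketch only: assumes unquoted, whitespace-separated values and a
    single loop containing nothing but _database_2 items.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or lines[0] != "loop_":
        raise ValueError("expected a loop_ block")
    keys, values = [], []
    for ln in lines[1:]:
        if ln.startswith("_database_2."):
            keys.append(ln.split(".", 1)[1])   # item name without category
        else:
            values.extend(ln.split())          # row values
    if not keys or len(values) % len(keys):
        raise ValueError("value count does not match column count")
    n = len(keys)
    return [dict(zip(keys, values[i:i + n])) for i in range(0, len(values), n)]


example = """
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB 1abc pdb_00001abc 10.2210/pdb1abc/pdb
"""
rows = parse_database_2(example)
print(rows[0]["pdbx_database_accession"])  # pdb_00001abc
```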

By contrast, a PDB entry issued after all four-character IDs are
consumed, such as pdb_00099xyz, will be listed as follows:

loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB pdb_00099xyz pdb_00099xyz 10.2210/pdb_00099xyz/pdb
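
Judging only from the two _database_2 examples above, the PDB DOI keeps its current form for entries that have a four-character ID and embeds the full extended ID otherwise. A hedged sketch of that inferred rule follows; pdb_doi is a hypothetical helper, and the value recorded in _database_2.pdbx_DOI remains the authoritative DOI.

```python
import re


def pdb_doi(pdb_id: str) -> str:
    """Build a PDB DOI following the pattern of the two examples (inferred rule).

    A four-character ID (or an extended ID that merely zero-pads one)
    keeps the form 10.2210/pdb1abc/pdb; an extended-only ID is used
    whole, e.g. 10.2210/pdb_00099xyz/pdb.
    """
    code = pdb_id.strip().lower()
    padded = re.fullmatch(r"pdb_0000([a-z0-9]{4})", code)
    if padded:
        code = padded.group(1)         # recover the four-character ID
    if re.fullmatch(r"[a-z0-9]{4}", code):
        return f"10.2210/pdb{code}/pdb"
    if re.fullmatch(r"pdb_[a-z0-9]{8}", code):
        return f"10.2210/{code}/pdb"
    raise ValueError(f"unrecognised PDB ID: {pdb_id!r}")


print(pdb_doi("pdb_00001abc"))  # 10.2210/pdb1abc/pdb
print(pdb_doi("pdb_00099xyz"))  # 10.2210/pdb_00099xyz/pdb
```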

After all four-character PDB IDs are consumed, newly deposited PDB
entries will be issued only extended PDB IDs, and these entries will be
distributed only in PDBx/mmCIF format. PDB entries with four-character
PDB IDs will remain unchanged.
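
The distribution rule described in this announcement can be summarised as a small check. In this sketch (hypothetical function names), an entry can be offered in legacy PDB format only if it has a four-character PDB ID and all of its CCD IDs fit the three-character residue-name field of that format. Other existing restrictions, such as limits on entry size, are deliberately not modelled.

```python
import re


def legacy_pdb_format_possible(pdb_id: str, ccd_ids) -> bool:
    """ID-related rule only: needs a four-character PDB ID and CCD IDs
    of at most three characters (the legacy residue-name field is
    three columns wide). Entry-size and other limits are not checked.
    """
    code = pdb_id.strip().lower()
    has_short_id = re.fullmatch(r"[a-z0-9]{4}|pdb_0000[a-z0-9]{4}", code) is not None
    ccd_ids_fit = all(len(c.strip()) <= 3 for c in ccd_ids)
    return has_short_id and ccd_ids_fit


def distribution_formats(pdb_id: str, ccd_ids) -> list:
    """Formats an entry could be distributed in, per the announced rules."""
    formats = ["PDBx/mmCIF"]           # always available
    if legacy_pdb_format_possible(pdb_id, ccd_ids):
        formats.append("PDB")
    return formats


print(distribution_formats("1abc", ["ATP", "ZN"]))    # ['PDBx/mmCIF', 'PDB']
print(distribution_formats("pdb_00099xyz", ["ATP"]))  # ['PDBx/mmCIF']
```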

    Resources

wwPDB asks users and software developers to review their code, remove
any current limitations on PDB and CCD ID lengths, and enable the use of
PDBx/mmCIF format files. To assist with code revisions, example files
with extended PDB and/or CCD IDs are available on GitHub at
https://github.com/wwPDB/extended-wwPDB-identifier-examples. To learn
more about PDBx/mmCIF, please visit https://mmcif.wwpdb.org/.
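
When reviewing code, a common limitation is a hard-coded length check or a fixed-width field sized for four-character PDB IDs or three-character CCD IDs. One possible replacement, assuming storage widths of 12 characters for extended PDB IDs (‘pdb_’ plus eight) and 5 for CCD IDs, is a pattern-based check that accepts both ID generations. The constants and function names are illustrative, not part of any wwPDB API.

```python
import re

# Assumed storage widths: "pdb_" + 8 characters, and 5-character CCD IDs.
MAX_PDB_ID_LEN = len("pdb_00001abc")   # 12
MAX_CCD_ID_LEN = 5

# Accept both generations of IDs instead of hard-coding len(...) == 4 or 3.
_PDB_ID = re.compile(r"pdb_[a-z0-9]{8}|[a-z0-9]{4}", re.IGNORECASE)
_CCD_ID = re.compile(r"[a-z0-9]{1,3}|[a-z0-9]{5}", re.IGNORECASE)


def is_valid_pdb_id(s: str) -> bool:
    """True for a classic four-character ID or an extended pdb_ ID."""
    return _PDB_ID.fullmatch(s.strip()) is not None


def is_valid_ccd_id(s: str) -> bool:
    """True for a CCD ID of 1-3 or 5 characters (never 4)."""
    return _CCD_ID.fullmatch(s.strip()) is not None


for value in ("1abc", "pdb_00099xyz", "ATP", "A1B2C", "ABCD"):
    print(value, is_valid_pdb_id(value), is_valid_ccd_id(value))
```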

For further information, please contact us at info@wwpdb.org.

--
Regards,

Jasmine

---==========================
Jasmine Young, Ph.D.
Biocuration Team Lead
RCSB Protein Data Bank
Research Professor
Institute for Quantitative Biomedicine
Rutgers, The State University of New Jersey
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087

Email: jasmine@rcsb.rutgers.edu
Phone: (848)445-0103 ext 4920
Fax: (732)445-4320

---==========================
