Transitioning to PDBx/mmCIF and Extended PDB IDs

Irina Persikova
Wed, Jul 16, 2025 3:45 PM

The wwPDB strongly encourages all users to adopt the extended PDB ID
format and to transition to the PDBx/mmCIF file format as soon as
possible. This includes updating software, referring to structures by
their full 12-character IDs in all communications, and encouraging your
communities to do the same.

    Transitioning to Extended PDB IDs

As the PDB archive continues to expand, the four-character PDB accession
codes (PDB IDs) are expected to be fully assigned before 2028. To
support the growth of the archive, the wwPDB has extended PDB IDs to
12 characters, including the "pdb_" prefix (e.g., "1abc" will become
"pdb_00001abc"; IDs are case insensitive). The prefix also improves text
mining of the published literature: users and journals will be able to
recognize PDB IDs by the "pdb_" prefix. The prefix and the leading zeros
must be included in the extended PDB ID.
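The conversion rule is mechanical, so a short sketch may help. The Python below assumes only what is stated above: a legacy ID gains the "pdb_" prefix plus four zeros, IDs are case insensitive, and extended IDs can be spotted in text by their prefix. The function names are hypothetical, and normalizing to lowercase is our own choice rather than a wwPDB requirement.

```python
import re

# Extended ID: "pdb_" prefix + 8 alphanumeric characters = 12 characters total.
EXTENDED_ID_RE = re.compile(r"\bpdb_[0-9a-z]{8}\b", re.IGNORECASE)
LEGACY_ID_RE = re.compile(r"[0-9a-z]{4}")


def to_extended_id(pdb_id: str) -> str:
    """Convert a legacy four-character ID (e.g. "1abc") to "pdb_00001abc".

    IDs are case insensitive, so the result is normalized to lowercase.
    An ID that is already in extended form is returned normalized.
    """
    pdb_id = pdb_id.strip().lower()
    if EXTENDED_ID_RE.fullmatch(pdb_id):
        return pdb_id
    if not LEGACY_ID_RE.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    # The prefix and the four padding zeros are both mandatory.
    return "pdb_0000" + pdb_id


def find_extended_ids(text: str) -> list[str]:
    """Find extended PDB IDs in free text by their "pdb_" prefix."""
    return [m.group(0).lower() for m in EXTENDED_ID_RE.finditer(text)]


if __name__ == "__main__":
    print(to_extended_id("8Y9M"))
    print(find_extended_ids("Compare PDB_00008Y9M with pdb_10001xyz."))
```

Recognizing legacy four-character IDs in free text is much harder, since almost any four-character token could match; the prefix is what makes the extended form easy to mine.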

Once the four-character PDB IDs are fully assigned, new entries will
receive only extended PDB IDs, and their data will not be provided as
legacy PDB-format files.

Further details, including a transition plan, example files, and
supporting FAQs, are available at wwPDB: Extended PDB ID With 12
Characters https://www.wwpdb.org/documentation/new-format-for-pdb-ids.

Users can start using extended PDB IDs for all PDB entries immediately:
each entry's extended ID is recorded in the
_database_2.pdbx_database_accession data item of the PDBx/mmCIF
formatted structure files.
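For a quick look at this data item without installing anything, the following Python sketch scans PDBx/mmCIF text for _database_2.pdbx_database_accession values. It is not a full CIF parser: it handles the simple key-value and loop_ forms with whitespace-separated values, but not quoted strings containing spaces or semicolon-delimited text fields. The function name is hypothetical; for real work, one of the mmCIF parsers listed in the wwPDB software resources is the safer choice.

```python
TAG = "_database_2.pdbx_database_accession"


def extended_ids_from_mmcif(cif_text: str) -> list[str]:
    """Collect _database_2.pdbx_database_accession values from mmCIF text.

    Minimal sketch: simple whitespace-separated values only.
    """
    lines = [line.strip() for line in cif_text.splitlines()]
    found = []
    i = 0
    while i < len(lines):
        parts = lines[i].split()
        # Key-value form: "_database_2.pdbx_database_accession pdb_00001abc"
        if len(parts) == 2 and parts[0] == TAG:
            found.append(parts[1])
        # Loop form: a loop_ whose tags belong to the _database_2 category.
        elif (parts == ["loop_"] and i + 1 < len(lines)
              and lines[i + 1].startswith("_database_2.")):
            i += 1
            tags = []
            while i < len(lines) and lines[i].startswith("_database_2."):
                tags.append(lines[i])
                i += 1
            col = tags.index(TAG) if TAG in tags else None
            # Rows end at a blank line, comment, new tag, loop, or data block.
            while i < len(lines) and lines[i] and not lines[i].startswith(
                    ("#", "_", "loop_", "data_")):
                values = lines[i].split()
                if col is not None and len(values) == len(tags):
                    found.append(values[col])
                i += 1
            continue
        i += 1
    # "?" (unknown) and "." (not applicable) are CIF placeholder values.
    return [v for v in found if v not in ("?", ".")]
```

Applied to a loop_ whose rows list the database identifiers for an entry, the function returns only the rows that carry a real accession, skipping placeholder values.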

    New PDB DOI Format

All existing PDB entries issued with four-character PDB IDs have DOIs of
the form 10.2210/pdb[4-character_PDB_ID]/pdb, which resolve to the
corresponding wwPDB DOI landing page. For example, PDB entry 8y9m
(pdb_00008y9m) has the DOI https://doi.org/10.2210/pdb8y9m/pdb.
Importantly, these DOIs will remain unchanged in the future.

When all four-character PDB IDs have been exhausted, all new PDB entries
will be issued extended PDB IDs and a NEW DOI of the form
10.2210/[Extended_PDB_ID]/pdb, which will resolve to the corresponding
wwPDB DOI landing page. For example, PDB entry "pdb_10001xyz" will have
the DOI https://doi.org/10.2210/pdb_10001xyz/pdb.
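The two DOI patterns can be captured in a small Python function. One assumption here is ours, not the announcement's: an extended ID beginning with "pdb_0000" is treated as a legacy four-character entry whose DOI keeps the original form, which agrees with the 8y9m example. The function names are hypothetical.

```python
import re

EXTENDED_ID_RE = re.compile(r"pdb_[0-9a-z]{8}")


def pdb_doi(extended_id: str) -> str:
    """Return the DOI for an entry, given its extended PDB ID.

    Assumption: "pdb_0000XXXX" corresponds to the legacy ID XXXX, whose DOI
    keeps the 10.2210/pdbXXXX/pdb form. Every other entry uses
    10.2210/<extended ID>/pdb.
    """
    eid = extended_id.strip().lower()
    if not EXTENDED_ID_RE.fullmatch(eid):
        raise ValueError(f"not an extended PDB ID: {extended_id!r}")
    if eid.startswith("pdb_0000"):
        return f"10.2210/pdb{eid[8:]}/pdb"  # legacy form, unchanged
    return f"10.2210/{eid}/pdb"  # new form for extended-only entries


def doi_url(extended_id: str) -> str:
    """Build the resolvable doi.org URL for the entry's landing page."""
    return "https://doi.org/" + pdb_doi(extended_id)
```

With these definitions, pdb_00008y9m maps to 10.2210/pdb8y9m/pdb and pdb_10001xyz to 10.2210/pdb_10001xyz/pdb, matching the two examples given above.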

    Transitioning to PDBx/mmCIF Format

To help users adopt extended PDB IDs and the PDBx/mmCIF file format, the
wwPDB offers an mmCIF User Guide
https://mmcif.wwpdb.org/docs/user-guide/guide.html and software
resources https://mmcif.wwpdb.org/docs/software-resources.html, such as
mmCIF parsers and CIF Editor.

In addition, the wwPDB will provide a Beta PDB Archive organized by
extended PDB ID (including file naming, directories, and datablock
naming) in early 2026. The current PDB archive, by contrast, groups data
files by data type, e.g., coordinates, experimental data, assemblies,
and validation reports.

A major change in the Beta PDB Archive will be the reorganization of
file directories at the entry level, following the same file
organization as the PDB Versioned Archive
https://www.wwpdb.org/ftp/pdb-versioned-ftp-site. In other words, all
the data files associated with an entry will be grouped together in a
directory named by its extended PDB ID, placed under a two-letter hash
directory. Please watch wwPDB.org and community bulletin boards for
announcements on the file organization of the Beta PDB Archive later
this year.
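Because the exact layout has not yet been announced, the following Python sketch is purely illustrative of an entry-level, hash-bucketed directory scheme. It assumes a hypothetical "data/entries" root and assumes the two-letter hash is taken from the two characters before the last character of the ID, mirroring how the current archive's divided directories bucket four-character IDs (1abc under ab). Neither assumption is confirmed by this announcement.

```python
import re
from pathlib import PurePosixPath

EXTENDED_ID_RE = re.compile(r"pdb_[0-9a-z]{8}")


def entry_directory(extended_id: str, root: str = "data/entries") -> PurePosixPath:
    """Hypothetical Beta PDB Archive directory holding all files of one entry.

    ASSUMPTIONS: the root path, and that the hash is the two characters
    before the final character of the ID (pdb_00001abc -> "ab").
    """
    eid = extended_id.strip().lower()
    if not EXTENDED_ID_RE.fullmatch(eid):
        raise ValueError(f"not an extended PDB ID: {extended_id!r}")
    two_letter_hash = eid[-3:-1]
    # Every file for the entry (coordinates, experimental data, validation
    # reports, ...) would live together in this single directory.
    return PurePosixPath(root) / two_letter_hash / eid
```

Hashing into two-letter buckets keeps any single directory from holding hundreds of thousands of entries, which is the usual reason archives of this size use such a scheme.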

We recommend that users fully adopt all of these changes before the end
of 2026. Early adoption will contribute to the long-term sustainability
and interoperability of 3D biostructure data across the scientific
community.

In particular, journals should begin adopting the extended PDB ID format
(in text, tables, and Data Availability Statements), update links for
PDB IDs in journal articles to point to the wwPDB DOI landing page via
CrossRef, and verify that software tools linked from journal articles
(e.g., FirstGlance or JSmol for 3D visualization) support extended PDB
IDs and PDBx/mmCIF.

Should you have any questions or require further assistance, please do
not hesitate to contact us at info@wwpdb.org. We greatly appreciate your
support and cooperation as we work together to enhance the future of
structural data accessibility.

Please find the full news here:
https://www.wwpdb.org/news/news?year=2025#6875133f3b59581b68019794

On behalf of the wwPDB

--
IRINA PERSIKOVA, Ph.D.
Deputy Biocuration Lead, RCSB Protein Data Bank
Research Associate, Institute for Quantitative Biomedicine
Rutgers, The State University of New Jersey
174 Frelinghuysen Road, Piscataway NJ 08854
P: 848.445.4938 | E: irina.persikova@rcsb.org
rcsb.org | iqb.rutgers.edu |
facebook https://www.facebook.com/RCSBPDB | twitter
https://twitter.com/buildmodels
