Dear colleagues,
We are pleased to announce the release of the PDBe mmCIF Validator, a
tool for validating mmCIF files against the official PDBx/mmCIF
dictionary. This tool aims to make pre-deposition validation more
accessible and to help ensure dictionary compliance and data quality before
submission via the wwPDB OneDep system.
The PDBe mmCIF Validator is available from:
* _GitHub Repository_ <https://github.com/PDBeurope/mmcif-validator>
* _VSCode Marketplace_ <https://marketplace.visualstudio.com/items?itemName=PDBEurope.pdbe-mmcif-validator>
* _Open VSX_ <https://open-vsx.org/extension/PDBEurope/pdbe-mmcif-validator>
* _PyPI (Python package)_ <https://pypi.org/project/pdbe-mmcif-validator/>
The tool is available in two complementary forms:
1. Visual Studio Code extension
   * Real-time validation as you edit (on open, on save, and with debounced
     validation on changes)
   * Errors and warnings highlighted in the editor with precise positioning
   * Full syntax highlighting and hover information for mmCIF tags
   * Metadata completeness indicator: percentage score in the status bar
     and a dedicated Metadata Completeness view in the sidebar (missing
     categories/items, method detection for xray/em/nmr)
   * Works out-of-the-box with automatic dictionary download from the
     official wwPDB repository
2. Standalone Python script
   * Command-line validation for single files or batch processing
   * No external dependencies (Python standard library only; Python 3.7+)
   * Machine-readable JSON output with line numbers, severity, and
     character positions for CI/CD integration
   * Exit codes (0/1) for automated workflows
   * Supports local dictionary files or download from a URL
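As an illustration of how the 0/1 exit codes could be used in an automated workflow, here is a minimal sketch in Python. It relies only on the exit-code behaviour described above. The function name find_failing_files and the validator_cmd parameter are hypothetical: validator_cmd stands for whatever command line invokes the validator on your system, so please check the GitHub repository for the actual invocation and options.

```python
import subprocess
from typing import List, Sequence


def find_failing_files(validator_cmd: Sequence[str], files: Sequence[str]) -> List[str]:
    """Run the validator once per file and return the files that did not pass.

    validator_cmd is a placeholder (an assumption of this sketch) for the
    command that starts the validator; the file path is appended as the
    last argument. Any non-zero exit code is treated as a failure.
    """
    failing = []
    for path in files:
        # capture_output keeps the validator's own output off the CI console here
        result = subprocess.run([*validator_cmd, path], capture_output=True, text=True)
        if result.returncode != 0:
            failing.append(path)
    return failing
```

A CI job could call this with its list of mmCIF files and fail the build when the returned list is not empty.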
Both implementations share the same validation engine, ensuring
consistent results.
The validator performs comprehensive checks including: mandatory item
presence (category-aware), enumeration and data type validation, range
constraints (with distinction between strictly allowed and advisory
ranges), parent/child category relationships, foreign key and composite
key integrity, operation expression validation (e.g. for virus
assemblies), and duplicate category/item detection. It also applies
regex-based validation from deposition-specific dictionary categories
(e.g. email, ORCID ID, PDB ID) aligned with OneDep requirements.
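To make the distinction between strictly allowed and advisory ranges concrete, here is a simplified sketch in Python. It is not the validator's actual code: the names Issue, in_range and check_range are hypothetical, bounds are treated as inclusive for simplicity, and the mapping of an allowed-range violation to an error and an advisory-range violation to a warning is an assumption made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple

# (minimum, maximum); None means the range is unbounded on that side
Range = Tuple[Optional[float], Optional[float]]


class Issue(NamedTuple):
    severity: str  # "error" or "warning" (assumed severity labels)
    message: str


def in_range(value: float, bounds: Range) -> bool:
    """Return True if value lies within bounds (inclusive on both sides)."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)


def check_range(tag: str, value: float, allowed: Range, advisory: Range) -> List[Issue]:
    """Report an error outside the allowed range, a warning outside the advisory one."""
    if not in_range(value, allowed):
        return [Issue("error", f"{tag}: {value} is outside the allowed range {allowed}")]
    if not in_range(value, advisory):
        return [Issue("warning", f"{tag}: {value} is outside the advisory range {advisory}")]
    return []


# The bounds below are made up for this example, not taken from the dictionary.
for issue in check_range("_refine.ls_d_res_high", 12.0, (0.0, None), (0.5, 8.0)):
    print(issue.severity, issue.message)
```

In this sketch a value can be accepted by the dictionary yet still be flagged as unusual, which is the kind of case an advisory range is meant to catch.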
The tool is aimed at structural biologists preparing structures for
deposition, and institutions implementing automated quality control
pipelines.
The software is freely available under the MIT licence. It is developed
by Deborah Harrus and maintained by the Protein Data Bank in Europe
(PDBe) at EMBL-EBI. We acknowledge the wwPDB mmCIF working group and our
wwPDB partners for maintaining the PDBx/mmCIF dictionary.
Demo videos are available here:
https://www.youtube.com/watch?v=CCkC9Bc6FY8
https://www.youtube.com/watch?v=li7ETeSA8FI
We welcome feedback and contributions. For questions or issues, please
email pdbehelp@ebi.ac.uk or use the GitHub repository.
Kind regards,
Deborah Harrus
Protein Data Bank in Europe (PDBe) at EMBL-EBI
--
I may send communications outside of your normal working hours. Please do not feel obligated to reply until your normal working hours.
Deborah Harrus, Ph.D.
PDBe Archive Project Leader, Biocuration Lead
Protein Data Bank in Europe
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD UK
http://www.PDBe.org