Clustering of distinct protein conformations across the whole PDB archive

David Armstrong
Mon, Jul 17, 2023 9:21 AM

Dear pdb-l,

The PDBe team is excited to unveil a new process for identifying protein
conformational states across the Protein Data Bank (PDB) archive. Using
deterministic clustering, we have developed a data pipeline that offers
unprecedented insights into the structural variability of proteins.

The pipeline considers all proteins in the PDB with 100% sequence
identity and calculates a global dissimilarity score called "GLOCON".
Chains with similar scores are clustered together. This process enables
us to explore the range of conformations across protein structures,
capturing both minor conformational differences and large domain movements.
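
As an illustration only, the idea of grouping chains by a pairwise dissimilarity score can be sketched as below. This is not the PDBe pipeline: the average-linkage rule, the cutoff, the chain labels and the score values are all assumptions chosen for the example, and the announcement does not say how GLOCON scores are computed or clustered. The sketch is deterministic in the sense that ties are broken by index order, so the same input always gives the same clusters.

```python
from itertools import combinations


def cluster_chains(labels, dissimilarity, cutoff):
    """Average-linkage agglomerative clustering with a distance cutoff.

    labels: chain identifiers (hypothetical, e.g. "A", "B").
    dissimilarity: dict mapping frozenset({a, b}) to a pairwise score
        (illustrative numbers, not real GLOCON values).
    cutoff: merging stops once the closest two clusters are farther apart
        than this value (an assumed parameter).
    """
    clusters = [[label] for label in labels]

    def linkage(c1, c2):
        # Mean dissimilarity over all cross-cluster chain pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dissimilarity[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Closest pair of clusters; the index tuple breaks ties deterministically.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: (linkage(clusters[ij[0]], clusters[ij[1]]), ij),
        )
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Made-up scores for four chains of the same protein.
scores = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"C", "D"}): 1.5,
    frozenset({"A", "C"}): 10.0,
    frozenset({"A", "D"}): 11.0,
    frozenset({"B", "C"}): 12.0,
    frozenset({"B", "D"}): 9.0,
}
print(cluster_chains(["A", "B", "C", "D"], scores, cutoff=5.0))
```

With these made-up scores, chains A and B end up in one cluster and C and D in another, because the two groups are far apart relative to the cutoff.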

The results of the data pipeline are available through PDBe-KB
aggregated views of proteins. Users can visualise the results per
UniProt accession using the "3D view of superposed structures" button
to explore the structures in the Mol* viewer.

In several case studies, including hexokinase, aldose reductase and
KaiB, the pipeline identifies distinct protein conformations and sheds
light on their biological significance.

To find out more, view the full news item at
https://www.ebi.ac.uk/pdbe/news/illuminating-protein-conformational-landscapes-pdb-archive

We also encourage you to view our pre-print at
https://doi.org/10.1101/2023.07.13.545008

If you have any questions or would like further details, please feel
free to contact the team at pdbehelp@ebi.ac.uk.

Kind regards,
David

--
David Armstrong
Outreach and Training Lead
PDBe
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD UK

Eric Martz
Mon, Jul 17, 2023 3:38 PM

Dear David and team,

This is a fabulous new tool! Are you planning to incorporate an option
to do a linear interpolation morph, and display the animation? For pairs
of structures, I think that would greatly help to visualize where and
how the structures differ most and least. I realize that implementing
this in a general manner for all possible pairs may be challenging.
However, I think a straightforward basic implementation would be
illuminating in most cases.
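
For readers unfamiliar with the term, a linear interpolation morph can be sketched as follows. This is a minimal illustration, not an existing PDBe or Mol* feature: it assumes the two structures are already superposed and have the same atoms in the same order, and the function name and toy coordinates are hypothetical. The intermediate frames are a visual aid only and do not represent a physical pathway.

```python
def linear_morph(start, end, n_frames):
    """Return n_frames coordinate sets morphing linearly from start to end.

    start, end: lists of (x, y, z) tuples for matched atoms; assumed to be
        already superposed and in the same atom order.
    n_frames: number of frames including both endpoints (at least 2).
    """
    if len(start) != len(end):
        raise ValueError("start and end must contain the same number of atoms")
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")

    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frame = [
            tuple((1 - t) * s + t * e for s, e in zip(atom_start, atom_end))
            for atom_start, atom_end in zip(start, end)
        ]
        frames.append(frame)
    return frames


# Toy two-atom example with made-up coordinates.
conformer_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformer_b = [(0.0, 2.0, 0.0), (1.0, 0.0, 4.0)]
for frame in linear_morph(conformer_a, conformer_b, 3):
    print(frame)
```

Atoms that barely move between the two structures stay nearly still across frames, while atoms that differ most sweep the largest distance, which is what makes the animation useful for spotting where the structures differ.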

( See https://proteopedia.org/w/Morphs )

Best regards, -Eric

http://Martz.MolviZ.Org

On 7/17/23 5:21 AM, David Armstrong via pdb-l wrote:

> Dear pdb-l,
>
> The PDBe team is excited to unveil a new process for identifying
> protein conformational states across the Protein Data Bank (PDB)
> archive. Using deterministic clustering, we have developed a data
> pipeline that offers unprecedented insights into the structural
> variability of proteins.
>
> [snip]

David Armstrong
Wed, Jul 19, 2023 8:07 AM

Hi Eric,

Forwarding on the following from Joseph Ellaway from the PDBe team who
is working on this project (also copied).

"Dear Eric Martz,

Many thanks for the suggestion! We would definitely like to implement
some animations in order to highlight differences between clustered
chains. A linear interpolation would be a nice way of achieving this
goal, provided we ensure it does not get misinterpreted as “true” or
simulated molecular motion. If you have any suggestions on how we could
better visualise both the within- and between-cluster differences in 3D,
we’d love to hear them! We have been in contact with Steven Hayward, the
maintainer of DynDom, and David Sehnal, the developer of core Mol*, to
explore ways of validly displaying molecular motion given PDB models.

Currently, we are working to enrich the now-released cluster annotations
with analysis of intra- and inter-cluster model variation. We’re also
working on improved ways to display the 2D data the current process
generates, as a complement to overlaying information onto the 3D models.
Let us know if you have any further questions or suggestions.

Best,
Joseph Ellaway."

On 17/07/2023 16:38, Eric Martz via pdb-l wrote:

> Dear David and team,
>
> This is a fabulous new tool! Are you planning to incorporate an option
> to do a linear interpolation morph, and display the animation? For
> pairs of structures, I think that would greatly help to visualize
> where and how the structures differ most and least. I realize that
> implementing this in a general manner for all possible pairs may be
> challenging. However, I think a straightforward basic implementation
> would be illuminating in most cases.
>
> ( See https://proteopedia.org/w/Morphs )
>
> Best regards, -Eric
>
> http://Martz.MolviZ.Org
>
> On 7/17/23 5:21 AM, David Armstrong via pdb-l wrote:
>> Dear pdb-l,
>>
>> The PDBe team is excited to unveil a new process for identifying
>> protein conformational states across the Protein Data Bank (PDB)
>> archive. Using deterministic clustering, we have developed a data
>> pipeline that offers unprecedented insights into the structural
>> variability of proteins.
>>
>> [snip]
> The archive of messages, sent to pdb-l@lists.wwpdb.org, can be found at:
>  https://lists.wwpdb.org/empathy/list/pdb-l.lists.wwpdb.org
>
> To subscribe via email, send a message with subject or body
> 'subscribe' to:
>  pdb-l-request@lists.wwpdb.org
> and follow the instructions in the newly received email.
>
> To unsubscribe via email, send a message with subject or body
> 'unsubscribe' to:
>  pdb-l-request@lists.wwpdb.org
> and follow the instructions in the newly received email.

--
David Armstrong
Outreach and Training Lead
PDBe
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD UK
