Empathy List Archives

pdb-l@lists.wwpdb.org

The PDB mailing list

Correct ligand ID for partially modeled ligands?

Edward A. Berry

Mon, Sep 15, 2025 4:19 AM

When building a structure with a non-protein ligand based on electron density, is it recommended to use the ligand corresponding to the actual molecule, even if many atoms are missing due to disorder and a smaller analog contains all the atoms modeled? The example in question is ubiquinone. Most of the 50-atom "tail" tends to be disordered, with only the head group and one or two isoprenyl units of the tail visible. Ligand structures are defined for UQ, UQ6, UQ2, and UQ1. If it is the endogenous ubiquinone, or if it is a pure quinone that you have added, you should know what the actual molecule is. But is it acceptable to pick a smaller analog such as UQ1 or UQ2 that contains all the atoms you can model?

A counter-example would be PEG polymers. Since the preparations contain a distribution of chain lengths, you don't know the exact length of the molecule, and what you observe is usually much shorter. You don't know which specific atoms of PEG-4K contribute to the three or four polymer units you see in the density; in fact, it may be promiscuous. In that case I think it is reasonable to model it as PEG, PG4, PG6, etc.

Bernhard Rupp

Mon, Sep 15, 2025 9:35 AM

Hi Fellows,

For biological macromolecules, PDB policy is to include in the seqres
everything that was (supposedly) crystallized, i.e. the construct sequence,
and you can more or less omit in modelling what you cannot see. This
practice does in fact omit some information and we have discussed this:
https://journals.iucr.org/j/issues/2025/02/00/gj5316/index.html

In a similar fashion, the various HET records for the ligand and its
restraint file refer to the entire co-crystallized ligand, and it would be
tempting to apply the same philosophy of simply omitting the
density-invisible parts. However, in contrast to, say, an affinity
tag vanishing into the great solvent void, a ligand is often the actual
part of interest in a model, and the fact that/which parts don't bind well
does carry potentially useful information. Of course, great is the
temptation to 'improve' the real space metrics of the ligand (RSCC etc) by
modelling only the 'good' parts (your bias will probably determine what is
considered 'good'). Note also that the RSCC is insensitive to occupancy: A
small sphere correlates perfectly with a big sphere at the same location...
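
The occupancy remark can be checked numerically. The following is only a
minimal sketch, not how any validation program computes RSCC: it assumes a
one-dimensional Gaussian "atom" on a grid, and the helper names (pearson,
gaussian_density) and all numbers are made up for illustration. It relies on
the fact that a Pearson correlation coefficient is unchanged when one of the
two maps is multiplied by a positive constant, which is what a change in
occupancy amounts to here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def gaussian_density(grid, center, sigma, occupancy):
    """Toy 1D density of a single atom: a Gaussian scaled by its occupancy."""
    return [
        occupancy * math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
        for x in grid
    ]


# Hypothetical grid from -5 to 5 in steps of 0.1 (arbitrary units).
grid = [i * 0.1 for i in range(-50, 51)]

# "Observed" density: the atom is really only 30% occupied.
rho_obs = gaussian_density(grid, center=0.0, sigma=0.7, occupancy=0.3)

# Model density: the atom is modelled at full occupancy.
rho_calc = gaussian_density(grid, center=0.0, sigma=0.7, occupancy=1.0)

# Same position and shape, different scale: correlation stays at 1.
rscc_wrong_occupancy = pearson(rho_obs, rho_calc)

# For contrast, a model atom displaced by 1.5 units correlates poorly.
rho_shifted = gaussian_density(grid, center=1.5, sigma=0.7, occupancy=1.0)
rscc_wrong_position = pearson(rho_obs, rho_shifted)

print(f"wrong occupancy: {rscc_wrong_occupancy:.3f}")
print(f"wrong position:  {rscc_wrong_position:.3f}")
```

In this toy setting the correlation flags a misplaced atom but not a wrongly
scaled one, so a high RSCC by itself says nothing about occupancy.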

In Fig. 5 of the above reference we also suggest that ligands that are really
bound can reasonably be modelled/refined in ensembles, while ligands that
are, charitably speaking, poorly supported by density sometimes even
depart instead of showing the brush-like ensemble of the 'invisible'
or uncertain part of a 'good' ligand (cf. the phosphate group of the
ligand). The ligand RSCC of 0.98 for the entire ligand would suggest
near-perfection, but the dynamic ER model 'fuzz' corresponding to the
higher B-factors of the static model illustrates that there are still
different local dynamics to consider (here even at 1 Å resolution).

The PEG issue is another one: it is difficult to argue that the same molecule,
only because it is modelled incompletely, should have a different
restraint/HET/nomenclature file. We discussed that a while ago; see Section
3.6.5 (PEG fragments) and Table 4 in
https://journals.iucr.org/d/issues/2016/12/00/rr5136/index.html

Summary: I would model the complete and chemically correct, co-crystallized
ligand, and look at the Ensemble to get some idea what the density-poor
parts might do.

Best, BR

On Mon, Sep 15, 2025 at 6:20 AM Edward A. Berry via pdb-l <
pdb-l@lists.wwpdb.org> wrote:

A counter-example would be PEG polymers. Since the preparations contain a
distribution of chain lengths, you don't know the exact length of the
molecule, and what you observe is usually much shorter. You don't know
which specific atoms of PEG-4K contribute to the three or four polymer
units you see in the density, in fact it may be promiscuous. In that case I
think it is reasonable to model it as PEG, PG4, PG6 etc.
The archive of messages, sent to pdb-l@lists.wwpdb.org, can be found at:
https://lists.wwpdb.org/empathy/list/pdb-l.lists.wwpdb.org

To subscribe via email, send a message with subject or body 'subscribe' to:
pdb-l-request@lists.wwpdb.org
and follow the instructions in the newly received email.

To unsubscribe via email, send a message with subject or body
'unsubscribe' to:
pdb-l-request@lists.wwpdb.org
and follow the instructions in the newly received email.
