Mad Cow Home or Best Links

Coordinate data: quality control, quality assurance
Status of best available mouse and hamster coordinates
A few residual issues for Official Swiss Mouse
Table of bad bonds in refined Swiss mouse
What does bond strain in prion core mean?
Amber vs Engh & Huber force fields: are structural issues real?
Off-site force fields: AMBER ... CHARMM...CHEAT95

Coordinate data: QC, QA

3 Oct 97  QA/QC testing done by webmaster
0. I ran both mouse and hamster structures against an absolutely astonishing online 3D quality-control checker, which found a number of anomalies in both structures [below]. Never, ever use a pdb structure without running this software first. It's a little unnerving to use because it swoops down onto your hard drive to grab the file, so mount a 20 meg RAM disk ($3 a meg) first and put the file there for security -- your Web browser and cache should be entirely on a RAM disk anyway for speed doubling.

1. The refined Swiss mouse 121-231 coordinates, 1AG2, were not released by Brookhaven yesterday, as had long been promised. Instead, Mike Miley (Protein Data Bank Help Desk) writes that "1AG2 is undergoing final quality checks and will be loaded onto the database this Tues night. It should be available for official download on WED OCT 8, 1997," although the Swiss group has already posted its official submission.

2. The initial unrefined UCSF hamster 90-231 coordinates, accession number unassigned, were submitted to PDB on 3 Oct 97 by Shauna Farr-Jones, who also maintains a Web page. PDB will take some unknown amount of time to scrutinize these before posting.

Shauna writes, "Ours is not a high res structure, so don't pay much attention to [precision in] the side chains. We are working on a high res structure now. If you average the [ensemble of 15 structures posted on my Web site] structures you will get a mess.... My 'personal favorites' are ensembles 6 and 7, these are the best energetically." I used ensemble #7 for all hamster graphics.

3. The refined higher-resolution UCSF hamster 90-231 coordinates are being worked on right now by the James group and could be ready in January 1998.

4. Roland Riek kindly advised us yesterday of the availability of refined mouse coordinates (today updated to the official submission to pdb!!! ) and a mouse-hamster superpositioning [based on ensemble 1 of hamster]. This group maintains and updates Web pages 1, 2.

Riek will probably recalculate the hamster-mouse superpositioning figure based on ensemble 6 or 7 or wait for Shauna's entry to clear pdb scrutiny. The RMSD between the two proteins for the residues used for superposition is currently 2.4A. This is a very important validation of both structures -- has to make you a believer if two labs get two close structures in two species.

Roland has no doubt corrected the oddities of tyrosine 226, the spelling of His as Hip, Cys as Cyx, and will find some necessary but missing hydrogen bonds for buried internal arg, lys, glu, asp, his, asn and gln. These polar side chains are basically never observed to lack a hydrogen-bonding partner.

5. Manuel Peitsch writes that he does not allow the use of SwissModel for threading unauthorized or unofficial pdb structures [because these can be full of contradictions???], but that there is usually overnight updating from PDB, ie, mark Thurs OCT 9, 1997 on your threading calendars.

6. So what happens up at the Biotech Collaboration's "Validation Suite for Protein Structures" when you run the very latest official Swiss mouse pdb and Farr-Jones-James-UCSF ensemble 7? The output does not demonstrate that there are real problems with the hamster or mouse structures; sometimes the problem lies in the structure-checking programs. Still, the reports are excellent cautionary advisories as to where anomalies might occur.

Recall that this precious program returns:

 Full geometric analysis by PROCHECK 
    (postscript graphics: configure browser to view in PS2EPS+)
              Ramachandran plot 
              Gly & Pro Ramachandran plot 
              Plot of chi1 vs chi2 
              Main-chain properties 
              Side-chain properties 
              Residue properties 
              Main-chain bond length distributions 
              Main-chain bond angle distributions 
              RMS distances from planarity 
              Distorted geometry plots 
              Residue-by-residue listing 
              Residue information 
              Main-chain bond lengths and bond angles 
              Bad contacts listing 
              Summary statistics and quality assessment 

Atomic volume analysis by SurVol (not working on day of test)
              Voronoi and radical planes methods to check for volume anomalies

Lots of checks by WHATIF
       Verification of bond angles 
       Verification of bond lengths 
       Buried Hydrogen Bond Donor Check *
       Bump Check 
       Verification of chain names 
       Peptide Plane Flip Check 
       Chiral Handedness Check 
       HIS GLN ASN side chain conformation Check 
       Nomenclature Check 
       Check of side chain planarity 
       Check of proline puckering 
       Quality Check 
       Side-chain Rotamer Check 
       Symmetry Check 
       Torsion angle Check 
       Check of water clusters 
       Check of atomic occupancy 
* That is, in 135 classical high-resolution structures, internal residues that are capable of hydrogen bonding are almost always found to be hydrogen bonded:
"This check [BPOCHK by WHAT IF ] lists all the hydrogen bond donors that are not solvent accessible and nevertheless do not form a hydrogen bond to the protein itself. This situation occurs only seldomly in well-refined structures. For the side chains of charged residues this should not occur at all. ... Hydrogen bond donors and acceptors that are buried inside the protein normally form hydrogen bonds within the protein. If there are any non hydrogen bonded buried hydrogen bond donors/acceptors in the structure, they will be labelled here as 'NON-HBO'. The polar side chain atoms of ARG, LYS, GLU, ASP, HIS, ASN and GLN are, when they are buried, almost invariably involved in at least one hydrogen bond. If any of these atoms are listed, an investigation should be undertaken. They will be labelled 'MIS-HBO'. "

A few residual issues for Official Swiss Mouse

The new Swiss mouse pdb has many small improvements and shows few residual internal inconsistencies. A few hypothetical problems were detected at 11 residues within the 25-residue stretch 125-149 LGGYMLGSAMSRPMIHFGNDWEDRY; the checking program treats clustering of problems as affirming its single-residue conclusions. The program found one weird bond angle, two van der Waals interferences, one peptide bond perhaps flipped, the rest obscure. Again, these aren't necessarily problems in the structure, just issues to be aware of. The problems are summarized in a graphic:

Caption to Plot 6. Residue properties

The various graphs and diagrams on this plot illustrate different properties of the residues in the structure:
1. Absolute deviation of chi1 torsion angle from the "ideal". 
2. Absolute deviation of omega torsion angle from the "ideal". 
3. Absolute deviation of zeta "virtual" torsion angle (defined by the atoms Calpha-N-C-Cbeta) from the "ideal". 
4. Absolute deviation of main-chain hydrogen bond energy from the "ideal". 
5. B-value of gamma atom (O, C, or S - being whichever is used in definition of the chi1 torsion angle). 
6. Average B-value of main-chain atoms. 
7. Average B-value of side-chain atoms. 
8. G-factors for phi-psi distributions. 
9. G-factors for chi1-chi2 distributions. 
10. Overall residue-by-residue G-factor. 
11. Approximate accessibility, as estimated by each residue's Ooi number. 

Secondary structure & estimated accessibility

Below the three main graphs is a schematic picture of the protein's secondary structure, as defined using the Kabsch & Sander (1983) assignments. The key just below the picture shows which structure is which. Beta strands are taken to include all residues with a Kabsch & Sander assignment of beta (E), helices correspond to both helix (H) and G assignments, while everything else is taken to be random coil.
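The three-way grouping described above can be sketched in a few lines of Python. This is only an illustration of the mapping rule as stated in the caption; the function name and the sample string are made up for the example and are not part of PROCHECK.

```python
def simplify_secondary_structure(ks_codes):
    """Collapse one-letter Kabsch & Sander codes into strand / helix / coil.

    Rule taken from the caption: E -> strand, H or G -> helix,
    anything else (including blanks) -> coil.
    """
    classes = []
    for code in ks_codes:
        if code == "E":
            classes.append("strand")
        elif code in ("H", "G"):
            classes.append("helix")
        else:
            classes.append("coil")
    return classes


# Hypothetical assignment string, not taken from either prion structure.
print(simplify_secondary_structure(" EETTHHHHGGG S"))
```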

The shading behind the schematic picture gives an approximation to the residue accessibilities. The approximation is a fairly crude one, being based on each residue's Ooi number (Nishikawa & Ooi, 1986). An Ooi number is a count of the number of other Calpha atoms within a radius of, in this case, 14A of the given residue's own Calpha. Although crude, this does give a good impression of which parts of the structure are buried and which are exposed on the surface. Future versions of PROCHECK will include an accurate calculation of residue accessibility.
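As a rough sketch of the Ooi number just described, the count can be computed directly from a list of Calpha coordinates. The function name and the toy coordinates are assumptions for illustration; whether a neighbour at exactly 14A is counted is not stated in the caption, so the sketch simply includes it.

```python
import math


def ooi_numbers(ca_coords, radius=14.0):
    """For each Calpha, count the other Calphas within `radius` Angstroms."""
    counts = []
    for i, a in enumerate(ca_coords):
        n = 0
        for j, b in enumerate(ca_coords):
            # Skip the residue's own Calpha; boundary handling (<=) is an assumption.
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts


# Hypothetical chain: ten Calphas spaced 3.8 A apart on a straight line.
toy_chain = [(3.8 * k, 0.0, 0.0) for k in range(10)]
print(ooi_numbers(toy_chain))
```

A fully extended toy chain like this gives low counts everywhere; in a folded protein, buried residues collect many more neighbours than exposed ones, which is what the shading is meant to convey.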

Sequence & Ramachandran regions

The next section shows the sequence of the structure (using the 20 standard amino-acid codes) and a set of markers that identify the region of the Ramachandran plot in which each residue is located in Plot 1 above. There are four marker types, one for each of the four different types of region, and the key explains which is which.

Max. deviation

The small histogram of asterisks and plus-signs shows each residue's "maximum deviation" from one of the ideal values. The asterisk scores are the same as those on the .out listing, and in fact correspond to the final column of that listing. Refer to the .out file to see which is the parameter that deviates by the amount shown here. (See also Part 1 of Appendix E).


The final part of the plot shows a shaded chequer-board of the G-factors for various residue properties (the PROCHECKer board). The darker the shading the more unusual the value of that property. Where several of a residue's properties are unusual, the overall G-factor for that residue will reflect this and identify residues that may need closer scrutiny. Each G-factor is a measure of the 'normality' of a particular property. For the dihedral angle G-factors, G(dih), the standard distribution of each property, for each residue type, has been obtained from a non-homologous, high-resolution data set. For the main-chain bond lengths and bond angles, the Engh & Huber (1991) small-molecule means and standard deviations are used.

Mouse: values more than two standard deviations from the norm, full documentation available (1, 2):

* Main-chain bond length distributions: no problems
* Main-chain bond angle distributions:  3 bad CNCa and 1 CaCO
* Main-chain bond angle distributions: 5 bad CbCaC, 2 NCaC
* Gly & Pro Ramachandran phi-psi plot*: 1 bad proline, 4 bad glycines
* Other AA Ramachandran plot of the phi-psi torsion angles: none; helix, beta-sheet angles acceptable
* Plot of chi1 vs chi2 for all residue types w both torsion angles: 1 bad arg, 2 asp,  1 his, 1 ile, 2 leu
* Side-chain properties: chi1 gauche minus torsion angles seem off
* Deviation of chi1 torsion angle,  omega torsion angle:  problems at 134 and 139; ten others **.
* Chi1-chi2 plots for lys, met, phe, trp, tyr (have both torsion angles): 2 bad mets
* Peptide bond planarity (ideal of 180 degrees): collectively a little off 
* Distorted geometry plots: trp 145 is a little bent out of shape, not planar
* RMS distances from planarity for  planar groups: 
      aromatic rings (Phe, Tyr, Trp, His): 1 bad trp, 1 marginal tyr 
      planar end-groups (Arg, Asn, Asp, Gln, Glu): none
*Gly & Pro Ramachandran plot: data set of 163 non-homologous, high-resolution protein chains chosen from structures solved by X-ray crystallography to a resolution of 2.0A or better and an R-factor no greater than 20%. It is also possible to show all 20 individual Ramachandran plots, one for each residue type, by amending the appropriate parameter.

What does it mean when a protein shows bond strain?

1 Oct 97 comments by webmaster
In mouse prion, a few hypothetical problems were detected at 11 residues clustered within the 25-residue stretch 125-149 LGGYMLGSAMSRPMIHFGNDWEDRY; the checking program has been run on 6,000 different proteins and its authors have concluded that clustering of problems strongly affirms conclusions about single residues. So what does this mean for the prion protein? The options:

a. Nothing. Every protein has a few non-standard bonds.

b. Nothing. It is an experimental nmr artefact. It will disappear after more structure refinement.

c. Nothing. The protein was recombinant, lacks its CHO, was denatured in 8M GnHCl, isn't native.

d. This stretch of protein is genuinely under conformational strain, implying:

... 1. Nothing. This is just a fragment of the full protein, is conventional in hamster, or at full length.

... 2. Nothing. It is just under conformational strain, so what.

... 3. Nothing. It just misses its membrane association, another protein, a cofactor, a substrate.

... 4. This region -- and the bond strain -- play a key role in the energetics of the conformational switch from normal to rogue conformer, and from predominantly alpha to abundant beta. The fragment is an accident waiting to happen: a metastable state, a loaded mousetrap ready to be sprung, or at least be in equilibrium with rogue conformer, through binding to a tension-releasing chaperone, and to stably assume its low energy conformation when bound to a growing fiber already in the new conformation. This conformation might be assumed as the protein rebounds from denaturation during purification even though it is not the lowest energy state -- there is by no means kinetic time to explore configuration space. It is a little odd that a region implicated in rogue conformation beta should show up as the one region where the geometry-checker chokes. Let's see if the hamster is similar.

>Mus_mouse_long 124-226 Reference Sequence

Bad bonds in mouse?

Bad atom values official mouse: None

Bad structure values official mouse: None

Bad residue values official mouse:

       ANGCHK: Bond angles by WHAT IF 
       BMPCHK: Van der Waals overlap verification by WHAT IF 
       HNQCHK: Side chain hydrogen bonding by WHAT IF 
       QUACHK: Directional Atomic Contact Analysis by WHAT IF 

Bond angles

The bond angles in all protein residues were compared with standard bond angles taken from Engh and Huber [REF], for DNA/RNA residues with standard values taken from Parkinson et al [REF]. The table below gives for each residue the highest absolute deviation (max|Z-score|) from the ideal values.
Values are considered poor if they are larger than 3.00.
Values are considered bad if they are larger than 5.00.
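A minimal sketch of how such a per-residue flag could be derived, assuming the usual definition of a Z-score as (observed - ideal) / standard deviation. The function names are invented for the example, and the ideal angle and sigma below are placeholder numbers, not values taken from Engh and Huber.

```python
def angle_z_score(observed, ideal, sigma):
    """Z-score of one bond angle relative to its ideal value and spread."""
    return (observed - ideal) / sigma


def classify_residue(z_scores, poor=3.0, bad=5.0):
    """Return (max |Z|, label) using the thresholds quoted in the report."""
    worst = max(abs(z) for z in z_scores)
    if worst > bad:
        label = "bad"
    elif worst > poor:
        label = "poor"
    else:
        label = "ok"
    return worst, label


# Placeholder ideal angle (110.0 deg) and sigma (2.0 deg): assumptions only.
zs = [angle_z_score(obs, 110.0, 2.0) for obs in (111.0, 103.0, 112.5)]
print(classify_residue(zs))
```

Because the table reports only the largest absolute deviation per residue, one badly strained angle is enough to flag a residue whose other angles are unremarkable.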

Van der Waals overlap verification

The contact distances of all atom pairs have been checked. Two atoms are said to `bump' if they are closer than the sum of their Van der Waals radii minus 0.40 Angstrom. For hydrogen bonded pairs a tolerance of 0.55 Angstrom is used. To get a per-residue score all bumps made by the atoms in the residue are added together. A really good structure does not have any non-zero values in this table.
Values are considered poor if they are larger than 0.00.
Values are considered bad if they are larger than 0.10.
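The bump rule can be sketched as follows. Several things here are assumptions made for illustration: the van der Waals radii are generic textbook-style values rather than WHAT IF's own table, the per-residue score is taken to be the summed overlap distance in Angstroms, and the atom records are invented.

```python
import math

# Illustrative radii in Angstroms (assumed; WHAT IF's own values may differ).
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def bump_overlap(atom_a, atom_b, hydrogen_bonded=False):
    """Overlap in A beyond the allowed contact distance, or 0.0 if no bump."""
    tolerance = 0.55 if hydrogen_bonded else 0.40
    allowed = (VDW_RADIUS[atom_a["element"]]
               + VDW_RADIUS[atom_b["element"]] - tolerance)
    distance = math.dist(atom_a["xyz"], atom_b["xyz"])
    return max(0.0, allowed - distance)


def residue_bump_scores(pairs):
    """Add every bump onto both residues involved (assumed scoring scheme)."""
    scores = {}
    for atom_a, atom_b, hbonded in pairs:
        overlap = bump_overlap(atom_a, atom_b, hbonded)
        if overlap > 0.0:
            for atom in (atom_a, atom_b):
                scores[atom["residue"]] = scores.get(atom["residue"], 0.0) + overlap
    return scores


# Hypothetical pair of oxygens 2.5 A apart.
a = {"element": "O", "xyz": (0.0, 0.0, 0.0), "residue": "RES 1"}
b = {"element": "O", "xyz": (2.5, 0.0, 0.0), "residue": "RES 2"}
print(residue_bump_scores([(a, b, False)]))  # treated as a plain contact
print(residue_bump_scores([(a, b, True)]))   # treated as a hydrogen bond
```

The same pair of atoms can bump or not depending on whether the checker recognizes the contact as a hydrogen bond, which is one way a checker and a refinement program can disagree about the same coordinates.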

Side chain hydrogen bonding

Residues of these types that were found to hydrogen bond better if flipped by 180 degrees around their last chi angle are marked in this table.

Directional Atomic Contact Analysis

Residues marked bad are 1) incorrectly modelled, or 2) involved in ligand binding, or 3) involved in crystal contacts while the crystal partner is not available during the analysis, or 4) part of the active site. Residues marked poor are simply poor and perhaps need human intervention. A series of poor and/or bad residues in a row could indicate misthreading.
Values are considered poor if they are less than -3.00.
Values are considered bad if they are less than -5.00.

Amber versus Engh and Huber nmr force fields

6 Oct 97 correspondence
AMBER Home Page - Assisted Model Building with Energy Refinement
Amber Intro
Structure Calculations ... Spectral Assignments ... Structure Assessments
NMR Software

Dr. Martin Billeter and Roland Riek comment:

"A few days ago, an output of WHATIF evaluating our NMR structure on the mouse prion protein (PDB entry 1AG2) was posted. After checking the structure we come to the conclusion that all these "short-comings" are due to the use of different force fields.

Tests based on the Engh & Huber data are becoming a standard; nonetheless, our structure has been refined using the AMBER force field. Below is a list of all energy contributions exceeding 2 kcal/mol for van der Waals, bonds, bond angles and dihedral angles. Obviously, the most constrained regions according to AMBER are different than those identified by WHATIF.

Finally, all peptide bonds are in a trans conformation (180 +- 20 deg.)"

      Van der Waals energies >   2.0 kcal/mol
        TYR   128: OH   - ASP-  178: OD1        3.39    ( 2.57 )
        PRO   137: O    - TYR   150: OH         2.21    ( 2.64 )
        TYR   149: OH   - ASP-  202: OD1        2.86    ( 2.60 )
        ARG+  156: NH1  - GLU-  196: OE1        2.45    ( 2.69 )
        ASN   159: N    - GLN   160: N          2.28    ( 2.69 )
        ARG+  164: NH2  - ASP-  178: OD2        2.13    ( 2.71 )
        ASP-  178: N    - CYSS  179: N          2.25    ( 2.69 )
        VAL   180: N    - ASN   181: N          2.12    ( 2.70 )
        THR   190: N    - THR   190: OG1        3.05    ( 2.58 )
        THR   190: N    - THR   191: N          2.08    ( 2.70 )
        THR   199: OG1  - ASP-  202: OD2        2.92    ( 2.60 )
        ARG+  208: N    - VAL   209: N          2.04    ( 2.71 )
        VAL   215: O    - GLN   219: N          3.15    ( 2.65 )

      Bond energies >   2.0 kcal/mol

      Bond angle energies >   2.0 kcal/mol
        HIS   140: ND1  - CG   - CD2        3.17    ( 107.81 120.00 )
        HIS   140: CG   - ND1  - CE1        3.15    ( 107.84 120.00 )
        HIS   140: CG   - CD2  - NE2        3.42    ( 107.33 120.00 )
        HIS   140: ND1  - CE1  - NE2        2.03    ( 110.25 120.00 )
        HIS   140: CD2  - NE2  - CE1        2.23    ( 106.76 117.00 )
        ARG+  151: CA   - CB   - CG         2.24    ( 123.05 109.50 )
        HIS   177: ND1  - CG   - CD2        3.16    ( 107.83 120.00 )
        HIS   177: CG   - ND1  - CE1        3.38    ( 107.40 120.00 )
        HIS   177: CG   - CD2  - NE2        3.27    ( 107.61 120.00 )
        HIS   177: CD2  - NE2  - CE1        2.43    ( 106.32 117.00 )
        HIS   187: ND1  - CG   - CD2        3.49    ( 107.21 120.00 )
        HIS   187: CG   - ND1  - CE1        3.17    ( 107.81 120.00 )
        HIS   187: CG   - CD2  - NE2        3.13    ( 107.88 120.00 )
        HIS   187: CD2  - NE2  - CE1        2.36    ( 106.49 117.00 )
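The bond angle rows above appear consistent with a simple harmonic angle term, E = K * (theta - theta0)^2 with the deviation in radians, reading the bracketed pair as (observed angle, reference angle). The sketch below rests on that reading; the force constants of 70 and 40 kcal/mol/rad^2 are back-fitted here to reproduce the listed energies and are not quoted anywhere in the correspondence.

```python
import math


def angle_energy(theta_deg, theta0_deg, k=70.0):
    """Harmonic angle energy in kcal/mol; k is an assumed, back-fitted constant."""
    delta = math.radians(theta_deg - theta0_deg)
    return k * delta * delta


# HIS 140 ND1-CG-CD2 row: observed 107.81, reference 120.00, listed 3.17
print(round(angle_energy(107.81, 120.00), 2))
# ARG 151 CA-CB-CG row: observed 123.05, reference 109.50, listed 2.24
print(round(angle_energy(123.05, 109.50, k=40.0), 2))
```

On this reading, every histidine entry comes from comparing ring angles near 107-110 degrees against a reference of 117 or 120 degrees, so the penalties may say more about the reference values applied to the imidazole ring than about strain at those particular residues.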

      Dihedral angle energies >   2.0 kcal/mol
        GLN   172: CG   - CD   - NE2  - HE22       2.05    ( 140.14  180.00  2 )
        GLY   126: N    - CA   - C    - N          2.43    ( 288.32  180.00  2 )
        MET   129: N    - CA   - C    - N          2.47    ( 107.07  180.00  2 )
        LEU   130: N    - CA   - C    - N          2.67    (  96.53  180.00  2 )
        HIS   140: N    - CA   - C    - N          2.44    ( 108.13  180.00  2 )
        ASN   143: OD1  - CG   - ND2  - HD22       4.00    (   2.59  0.00  1 )
        ASN   153: OD1  - CG   - ND2  - HD22       3.99    (   4.21  0.00  1 )
        ASN   159: OD1  - CG   - ND2  - HD22       3.96    (  10.78  0.00  1 )
        GLN   160: OE1  - CD   - NE2  - HE22       3.88    ( 339.80  0.00  1 )
        VAL   166: N    - CA   - C    - N          2.67    (  83.83  180.00  2 )
        GLN   168: OE1  - CD   - NE2  - HE22       3.97    ( 349.45  0.00  1 )
        SER   170: N    - CA   - C    - N          2.57    ( 257.38  180.00  2 )
        ASN   171: OD1  - CG   - ND2  - HD22       3.97    ( 349.99  0.00  1 )
        GLN   172: OE1  - CD   - NE2  - HE22       3.76    ( 331.62  0.00  1 )
        ASN   173: OD1  - CG   - ND2  - HD22       4.00    ( 358.06  0.00  1 )
        ASN   174: OD1  - CG   - ND2  - HD22       3.79    ( 333.73  0.00  1 )
        CYSS  179: N    - CA   - C    - N          2.37    ( 290.39  180.00  2 )
        ASN   181: N    - CA   - C    - N          2.66    ( 276.84  180.00  2 )
        ASN   181: OD1  - CG   - ND2  - HD22       3.99    (   4.98  0.00  1 )
        GLN   186: OE1  - CD   - NE2  - HE22       4.00    (   2.25  0.00  1 )
        THR   188: N    - CA   - C    - N          2.37    ( 290.61  180.00  2 )
        ASN   197: OD1  - CG   - ND2  - HD22       4.00    (   0.94  0.00  1 )
        GLN   212: N    - CA   - C    - N          2.21    ( 295.22  180.00  2 )
        GLN   212: OE1  - CD   - NE2  - HE22       3.99    ( 355.53  0.00  1 )
        GLN   217: N    - CA   - C    - N          2.59    ( 281.58  180.00  2 )
        GLN   217: OE1  - CD   - NE2  - HE22       4.00    (   3.16  0.00  1 )
        GLN   219: OE1  - CD   - NE2  - HE22       4.00    ( 359.81  0.00  1 )
        GLN   223: OE1  - CD   - NE2  - HE22       3.98    (   7.29  0.00  1 )
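The dihedral rows can be read against the standard cosine torsion term used by AMBER-style force fields, E = k * [1 + cos(n*phi - gamma)], taking the bracketed triple as (observed angle, phase gamma, periodicity n). That column reading is an assumption, as are the barrier constants below (2.0 for the amide hydrogen rows, 1.35 for the N-CA-C-N rows), which were chosen here only because they reproduce the listed energies.

```python
import math


def dihedral_energy(phi_deg, gamma_deg, n, k):
    """Cosine torsion energy in kcal/mol: k * (1 + cos(n*phi - gamma))."""
    return k * (1.0 + math.cos(math.radians(n * phi_deg - gamma_deg)))


# ASN 143 OD1-CG-ND2-HD22 row: (2.59, 0.00, 1), listed 4.00
print(round(dihedral_energy(2.59, 0.0, 1, k=2.0), 2))
# GLN 160 OE1-CD-NE2-HE22 row: (339.80, 0.00, 1), listed 3.88
print(round(dihedral_energy(339.80, 0.0, 1, k=2.0), 2))
# GLY 126 N-CA-C-N row: (288.32, 180.00, 2), listed 2.43
print(round(dihedral_energy(288.32, 180.0, 2, k=1.35), 2))
```

If this reading is right, the many Asn and Gln entries near 4.00 all sit at the maximum of a one-fold term for a planar amide group, so they would recur at every Asn and Gln rather than marking unusual geometry.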
