Coordinate data: quality control, quality assurance
Status of best available mouse and hamster coordinates
A few residual issues for Official Swiss Mouse
Table of bad bonds in refined Swiss mouse
What does bond strain in the prion core mean?
AMBER vs Engh & Huber force fields: are structural issues real?
Off-site force fields: AMBER ... CHARMM ... CHEAT95
3 Oct 97: QA/QC testing done by webmaster. I ran both the mouse and hamster structures through an absolutely astonishing online 3D quality-control checker (the Biotech Collaboration's Validation Suite for Protein Structures), which found a number of anomalies in both structures [below]. Never, ever use a PDB structure without running this software first. It is a little unnerving to use because it swoops down onto your hard drive to grab the file, so for security first mount a 20 meg RAM disk ($3 a meg) and put the file there. Your Web browser and cache should be entirely on a RAM disk anyway, for speed doubling.
1. The refined Swiss mouse 121-231 coordinates, 1AG2, were not released by Brookhaven yesterday, as long promised. Instead, Mike Miley (Protein Data Bank Help Desk) writes that "1AG2 is undergoing final quality checks and will be loaded onto the database this Tues night. It should be available for official download on WED OCT 8, 1997, although the Swiss group has posted its official submission."
2. The initial unrefined UCSF hamster 90-231 coordinates (accession number not yet assigned) were submitted to PDB on 3 Oct 97 by Shauna Farr-Jones, who also maintains a Web page. PDB will take some unknown amount of time to scrutinize these before posting.
Shauna writes, "Ours is not a high res structure, so don't pay much attention to [precision in] the side chains. We are working on a high res structure now. If you average the [ensemble of 15 structures posted on my Web site] structures you will get a mess.... My 'personal favorites' are ensembles 6 and 7, these are the best energetically." I used ensemble #7 for all hamster graphics.
3. The refined higher-resolution UCSF hamster 90-231 coordinates are being worked on right now by the James group and could be ready in January 1998.
4. Roland Riek kindly advised us yesterday of the availability of refined mouse coordinates (today updated to the official submission to PDB!!!) and of a mouse-hamster superposition [based on ensemble 1 of hamster]. This group maintains and updates Web pages 1, 2.
Riek will probably either recalculate the hamster-mouse superposition figure based on ensemble 6 or 7, or wait for Shauna's entry to clear PDB scrutiny. The RMSD between the two proteins for the residues used in the superposition is currently 2.4 Å. This is a very important validation of both structures: it has to make you a believer when two labs get two closely similar structures in two species.
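For readers who want to compute an RMSD like the 2.4 Å figure above, here is a minimal sketch. It assumes the two coordinate lists are already paired atom by atom (for example, the Calpha of each residue used in the superposition) and already share a common orientation; it removes only the translation, by centering each set on its centroid. A true superposition also needs the optimal rotation (for example, the Kabsch algorithm), which this sketch deliberately omits, so it does not reproduce the Swiss group's calculation. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def centroid(coords: List[Point]) -> Point:
    """Mean x, y, z of a list of atom coordinates."""
    n = len(coords)
    return (
        sum(p[0] for p in coords) / n,
        sum(p[1] for p in coords) / n,
        sum(p[2] for p in coords) / n,
    )


def centered_rmsd(a: List[Point], b: List[Point]) -> float:
    """RMSD between paired atoms after centering both sets on their centroids.

    Hypothetical helper: a[i] and b[i] must be the same atom (e.g. the
    Calpha of the same residue) in the two structures. Only translation is
    removed; no rotational fitting is done.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and paired")
    ca, cb = centroid(a), centroid(b)
    total = 0.0
    for p, q in zip(a, b):
        for k in range(3):
            d = (p[k] - ca[k]) - (q[k] - cb[k])
            total += d * d
    return math.sqrt(total / len(a))
```

A rigidly translated copy of a structure gives an RMSD of (numerically) zero, which is a quick sanity check before feeding in real paired coordinates.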
Roland has no doubt corrected the oddities of tyrosine 226 and the spelling of His as Hip and Cys as Cyx, and will find some necessary but missing hydrogen bonds for buried arg, lys, glu, asp, his, asn and gln side chains. These polar side chains are basically never observed to lack a hydrogen-bonding partner.
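The observation that buried polar side chains essentially always find a hydrogen-bonding partner can be turned into a simple screen. The following is a minimal, hypothetical sketch, not WHAT IF's algorithm: it treats any polar atom in another residue within an assumed 3.5 Å cutoff as a partner (no donor/acceptor typing, no angle test), and it takes burial as a precomputed flag rather than calculating solvent accessibility.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PolarAtom:
    residue: str                     # e.g. "ARG 156"
    name: str                        # e.g. "NH1"
    xyz: Tuple[float, float, float]
    buried: bool                     # precomputed burial flag (assumption)


def unsatisfied_buried(atoms: List[PolarAtom], cutoff: float = 3.5) -> List[PolarAtom]:
    """Return buried polar atoms with no polar partner within `cutoff` angstroms.

    Partners in the same residue are ignored, so that e.g. an Arg's own
    NH1/NH2 pair does not count. The 3.5 A cutoff and the absence of an
    angle criterion are simplifying assumptions of this sketch.
    """
    flagged = []
    for a in atoms:
        if not a.buried:
            continue
        has_partner = any(
            b.residue != a.residue and math.dist(a.xyz, b.xyz) <= cutoff
            for b in atoms
        )
        if not has_partner:
            flagged.append(a)
    return flagged
```

Atoms returned by `unsatisfied_buried` are the ones worth inspecting by hand, in the spirit of the 'MIS-HBO' label that WHAT IF uses for buried polar side-chain atoms without a hydrogen bond.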
5. Manuel Peitsch writes that he does not allow the use of SwissModel for threading unauthorized or unofficial PDB structures [because these can be full of contradictions???], but that there is usually overnight updating from PDB; i.e., mark Thurs OCT 9, 1997 on your threading calendars.
6. So what happens up at the Biotech Collaboration's "Validation Suite for Protein Structures" when you run the very latest official Swiss mouse PDB file and Farr-Jones-James-UCSF ensemble 7? The output does not by itself demonstrate that real problems exist in the hamster or mouse structures; sometimes the problem lies in the structure-checking programs. Still, the results are excellent cautionary advisories as to where anomalies might occur.
Recall that this precious program returns:
Full geometric analysis by PROCHECK (PostScript graphics: configure browser to view in PS2EPS+)
* Ramachandran plot
* Gly & Pro Ramachandran plot
* Plot of chi1 vs chi2
* Main-chain properties
* Side-chain properties
* Residue properties
* Main-chain bond length distributions
* Main-chain bond angle distributions
* RMS distances from planarity
* Distorted geometry plots
* Residue-by-residue listing
* Residue information
* Main-chain bond lengths and bond angles
* Bad contacts listing
* Summary statistics and quality assessment
Atomic volume analysis by SurVol (not working on day of test)
* Voronoi and radical planes methods to check for volume anomalies
Lots of checks by WHAT IF
* Verification of bond angles
* Verification of bond lengths
* Buried Hydrogen Bond Donor Check (see note below)
* Bump Check
* Verification of chain names
* Peptide Plane Flip Check
* Chiral Handedness Check
* HIS GLN ASN side chain conformation Check
* Nomenclature Check
* Check of side chain planarity
* Check of proline puckering
* Quality Check
* Side-chain Rotamer Check
* Symmetry Check
* Torsion angle Check
* Check of water clusters
* Check of atomic occupancy

Note on the Buried Hydrogen Bond Donor Check: it rests on the observation that, in 135 classical high-resolution structures, internal residues capable of hydrogen bonding are almost always found to be hydrogen bonded:
"This check [BPOCHK by WHAT IF ] lists all the hydrogen bond donors that are not solvent accessible and nevertheless do not form a hydrogen bond to the protein itself. This situation occurs only seldomly in well-refined structures. For the side chains of charged residues this should not occur at all. ... Hydrogen bond donors and acceptors that are buried inside the protein normally form hydrogen bonds within the protein. If there are any non hydrogen bonded buried hydrogen bond donors/acceptors in the structure, they will be labelled here as 'NON-HBO'. The polar side chain atoms of ARG, LYS, GLU, ASP, HIS, ASN and GLN are, when they are buried, almost invariably involved in at least one hydrogen bond. If any of these atoms are listed, an investigation should be undertaken. They will be labelled 'MIS-HBO'. "
1. Absolute deviation of chi1 torsion angle from the "ideal".
2. Absolute deviation of omega torsion angle from the "ideal".
3. Absolute deviation of zeta "virtual" torsion angle (defined by the atoms Calpha-N-C-Cbeta) from the "ideal".
4. Absolute deviation of main-chain hydrogen bond energy from the "ideal".
5. B-value of gamma atom (O, C, or S - being whichever is used in definition of the chi1 torsion angle).
6. Average B-value of main-chain atoms.
7. Average B-value of side-chain atoms.
8. G-factors for phi-psi distributions.
9. G-factors for chi1-chi2 distributions.
10. Overall residue-by-residue G-factor.
11. Approximate accessibility, as estimated by each residue's Ooi number.
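Several quantities in the list above (chi1, omega, and the zeta "virtual" torsion Calpha-N-C-Cbeta) are dihedral angles computed from four atom positions, then compared with an "ideal" value. A minimal sketch of both steps, assuming plain (x, y, z) tuples and the usual IUPAC sign convention; PROCHECK's own ideal values are not reproduced here, and the helper names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane p0-p1-p2
    n2 = _cross(b2, b3)  # normal to the plane p1-p2-p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def angular_deviation(angle: float, ideal: float) -> float:
    """Absolute difference between two angles in degrees, allowing for wrap-around."""
    d = (angle - ideal) % 360.0
    return min(d, 360.0 - d)
```

For omega, the four atoms are CA(i), C(i), N(i+1) and CA(i+1), and `angular_deviation(omega, 180.0)` gives the deviation from a planar trans peptide; the wrap-around handling matters because -170° and 190° describe the same geometry.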
The shading behind the schematic picture gives an approximation to the residue accessibilities. The approximation is a fairly crude one, being based on each residue's Ooi number (Nishikawa & Ooi, 1986). An Ooi number is a count of the number of other Calpha atoms within a radius of, in this case, 14 Å of the given residue's own Calpha. Although crude, this does give a good impression of which parts of the structure are buried and which are exposed on the surface. Future versions of PROCHECK will include an accurate calculation of residue accessibility.
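The Ooi number as defined above is straightforward to compute from Calpha coordinates. A minimal sketch, assuming a dictionary from residue label to Calpha position; whether an atom exactly at 14 Å counts is not specified above, so the inclusive comparison here is an assumption, and the function name is hypothetical.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float]


def ooi_numbers(ca: Dict[str, Vec], radius: float = 14.0) -> Dict[str, int]:
    """Count, for each residue, the other Calpha atoms within `radius` angstroms.

    Higher counts suggest a more buried residue. This is only the crude
    accessibility proxy described above, not a true accessibility calculation.
    """
    counts = {}
    for res, xyz in ca.items():
        counts[res] = sum(
            1
            for other, oxyz in ca.items()
            if other != res and math.dist(xyz, oxyz) <= radius
        )
    return counts
```

The double loop is quadratic in the number of residues, which is negligible for a 100-residue fragment such as mouse 124-226.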
Mouse: items more than two standard deviations from the norm (full documentation available: 1, 2):
* Main-chain bond length distributions: no problems
* Main-chain bond angle distributions: 3 bad CNCa and 1 CaCO
* Main-chain bond angle distributions: 5 bad CbCaC, 2 NCaC
* Gly & Pro Ramachandran phi-psi plot*: 1 bad proline, 4 bad glycines
* Other AA Ramachandran plot of the phi-psi torsion angles: none; helix, beta-sheet angles acceptable
* Plot of chi1 vs chi2 for all residue types with both torsion angles: 1 bad arg, 2 asp, 1 his, 1 ile, 2 leu
* Side-chain properties: chi1 gauche minus torsion angles seem off
* Deviation of chi1 torsion angle, omega torsion angle: problems at 134 and 139; ten others
* Chi1-chi2 plots for lys, met, phe, trp, tyr (have both torsion angles): 2 bad mets
* Peptide bond planarity (ideal of 180 degrees): collectively a little off
* Distorted geometry plots: trp 145 is a little bent out of shape, not planar
* RMS distances from planarity for planar groups: aromatic rings (Phe, Tyr, Trp, His): 1 bad trp, 1 marginal tyr; planar end-groups (Arg, Asn, Asp, Gln, Glu): none

*Gly & Pro Ramachandran plot: data set of 163 non-homologous, high-resolution protein chains chosen from structures solved by X-ray crystallography to a resolution of 2.0 Å or better and an R-factor no greater than 20%. It is also possible to show all 20 individual Ramachandran plots, one for each residue type, by amending the appropriate parameter.
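The "more than two standard deviations from the norm" criterion behind the mouse list above amounts to a z-score cut-off. A minimal sketch for a single bond angle, assuming the caller supplies the target mean and standard deviation; the numbers in the tests are placeholders, not Engh & Huber or PROCHECK values, and the function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def bond_angle(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, with b as the vertex atom."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(u[i] * v[i] for i in range(3))
    cos_t = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_outlier(value: float, mean: float, sd: float, n_sd: float = 2.0) -> bool:
    """True if `value` lies more than `n_sd` standard deviations from `mean`."""
    return abs(value - mean) > n_sd * sd
```

For a C-N-CA angle, for example, `bond_angle` would be called with the C(i-1), N(i) and CA(i) coordinates, and the result passed to `is_outlier` together with whatever reference mean and standard deviation the checker uses.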
1 Oct 97 comments by webmaster
a. Nothing. Every protein has a few non-standard bonds.
b. Nothing. It is an experimental NMR artefact. It will disappear after more structure refinement.
c. Nothing. The protein was recombinant, lacks its CHO (carbohydrate), was denatured in 8 M GnHCl, and isn't native.
d. This stretch of protein is genuinely under conformational strain, implying:
... 1. Nothing. This is just a fragment of the full protein; the region may be conventional in hamster, or at full length.
... 2. Nothing. It is just under conformational strain, so what.
... 3. Nothing. It is just missing its membrane association, another protein, a cofactor, or a substrate.
... 4. This region -- and the bond strain -- play a key role in the energetics of the conformational switch from normal to rogue conformer, and from predominantly alpha to abundant beta. The fragment is an accident waiting to happen: a metastable state, a loaded mousetrap ready to be sprung (or at least in equilibrium with the rogue conformer) through binding to a tension-releasing chaperone, and ready to assume its low-energy conformation stably when bound to a growing fiber already in the new conformation. This conformation might be assumed as the protein rebounds from denaturation during purification, even though it is not the lowest-energy state: kinetically, there is by no means enough time to explore configuration space. It is a little odd that a region implicated in the rogue beta conformation should show up as the one region where the geometry checker chokes. Let's see if the hamster is similar.
>Mus_mouse_long 124-226 Reference Sequence
GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNN
FVHDCVNITIKQHTVTTTTKGENFTETDVKMMERVVEQMCVTQYQKESQAYY
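Residue numbers quoted elsewhere on this page (Trp 145, His 140, Tyr 226 and so on) can be cross-checked against the reference sequence by offsetting from the first residue number, 124 here. A minimal sketch; the function name is hypothetical, and the demonstration string is only the first line of the sequence above (residues 124-174).

```python
def residue_at(sequence: str, first_number: int, number: int) -> str:
    """One-letter code of residue `number` in a sequence that starts at `first_number`."""
    index = number - first_number
    if not 0 <= index < len(sequence):
        raise IndexError(f"residue {number} is outside the sequence")
    return sequence[index]


# First line of the mouse 124-226 reference sequence above (residues 124-174).
MOUSE_124_174 = "GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNN"

print(residue_at(MOUSE_124_174, 124, 145))  # W: the Trp 145 flagged by PROCHECK
print(residue_at(MOUSE_124_174, 124, 140))  # H: His 140 in the AMBER angle listing
```

The same lookup against the full 103-residue string is a cheap way to catch numbering offsets or typos before comparing checker output between laboratories.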
Bad structure values official mouse: None
Bad residue values official mouse:
* ANGCHK: Bond angles by WHAT IF
* BMPCHK: Van der Waals overlap verification by WHAT IF
* HNQCHK: Side chain hydrogen bonding by WHAT IF
* QUACHK: Directional Atomic Contact Analysis by WHAT IF
Values are considered poor if they are larger than 3.00. Values are considered bad if they are larger than 5.00.
Values are considered poor if they are larger than 0.00. Values are considered bad if they are larger than 0.10.
Values are considered poor if they are less than -3.00. Values are considered bad if they are less than -5.00.
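The three threshold rules above share one pattern: a value is "poor" past one cutoff and "bad" past a stricter one, in a given direction. A minimal sketch of that rule; which rule applies to which WHAT IF check is not stated on this page, so the hypothetical `grade` function simply takes the cutoffs and the direction as parameters.

```python
def grade(value: float, poor: float, bad: float, higher_is_worse: bool = True) -> str:
    """Classify a check value as 'ok', 'poor' or 'bad'.

    With higher_is_worse=True, values above `poor` are poor and values
    above `bad` are bad; with False, the comparisons become 'less than'.
    """
    if higher_is_worse:
        if value > bad:
            return "bad"
        if value > poor:
            return "poor"
    else:
        if value < bad:
            return "bad"
        if value < poor:
            return "poor"
    return "ok"
```

The three rules above then read as `grade(v, 3.00, 5.00)`, `grade(v, 0.00, 0.10)` and `grade(v, -3.00, -5.00, higher_is_worse=False)`.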
6 Oct 97 correspondence
AMBER Home Page - Assisted Model Building with Energy Refinement
"A few days ago, an output of WHATIF evaluating our NMR structure of the mouse prion protein (PDB entry 1AG2) was posted. After checking the structure we came to the conclusion that all these 'shortcomings' are due to the use of different force fields.
Tests based on the Engh & Huber data are becoming a standard; nonetheless, our structure has been refined using the AMBER force field. Below is a list of all energy contributions exceeding 2 kcal/mol for van der Waals, bonds, bond angles and dihedral angles. Obviously, the most constrained regions according to AMBER are different from those identified by WHATIF.
Finally, all peptide bonds are in a trans conformation (180 ± 20°)."
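The trans criterion quoted above (omega within 20° of 180°) can be applied directly to angles reported on either a -180 to 180° or a 0 to 360° scale. A minimal sketch, assuming the omega values have already been computed from coordinates; the labels and function names are hypothetical.

```python
from typing import Dict, List


def to_signed(angle: float) -> float:
    """Map an angle in degrees onto the range (-180, 180]."""
    a = angle % 360.0
    return a - 360.0 if a > 180.0 else a


def non_trans_peptides(omega: Dict[str, float], tolerance: float = 20.0) -> List[str]:
    """Labels of peptide bonds whose omega is more than `tolerance` degrees from 180."""
    flagged = []
    for label, angle in omega.items():
        deviation = 180.0 - abs(to_signed(angle))
        if deviation > tolerance:
            flagged.append(label)
    return flagged
```

An empty result corresponds to the statement that all peptide bonds are trans within ±20°; a cis peptide (omega near 0°) would always be flagged.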
Van der Waals energies > 2.0 kcal/mol
TYR 128: OH - ASP- 178: OD1  3.39 (2.57)
PRO 137: O - TYR 150: OH  2.21 (2.64)
TYR 149: OH - ASP- 202: OD1  2.86 (2.60)
ARG+ 156: NH1 - GLU- 196: OE1  2.45 (2.69)
ASN 159: N - GLN 160: N  2.28 (2.69)
ARG+ 164: NH2 - ASP- 178: OD2  2.13 (2.71)
ASP- 178: N - CYSS 179: N  2.25 (2.69)
VAL 180: N - ASN 181: N  2.12 (2.70)
THR 190: N - THR 190: OG1  3.05 (2.58)
THR 190: N - THR 191: N  2.08 (2.70)
THR 199: OG1 - ASP- 202: OD2  2.92 (2.60)
ARG+ 208: N - VAL 209: N  2.04 (2.71)
VAL 215: O - GLN 219: N  3.15 (2.65)

Bond energies > 2.0 kcal/mol
(none)

Bond angle energies > 2.0 kcal/mol
HIS 140: ND1 - CG - CD2  3.17 (107.81 120.00)
HIS 140: CG - ND1 - CE1  3.15 (107.84 120.00)
HIS 140: CG - CD2 - NE2  3.42 (107.33 120.00)
HIS 140: ND1 - CE1 - NE2  2.03 (110.25 120.00)
HIS 140: CD2 - NE2 - CE1  2.23 (106.76 117.00)
ARG+ 151: CA - CB - CG  2.24 (123.05 109.50)
HIS 177: ND1 - CG - CD2  3.16 (107.83 120.00)
HIS 177: CG - ND1 - CE1  3.38 (107.40 120.00)
HIS 177: CG - CD2 - NE2  3.27 (107.61 120.00)
HIS 177: CD2 - NE2 - CE1  2.43 (106.32 117.00)
HIS 187: ND1 - CG - CD2  3.49 (107.21 120.00)
HIS 187: CG - ND1 - CE1  3.17 (107.81 120.00)
HIS 187: CG - CD2 - NE2  3.13 (107.88 120.00)
HIS 187: CD2 - NE2 - CE1  2.36 (106.49 117.00)

Dihedral angle energies > 2.0 kcal/mol
GLN 172: CG - CD - NE2 - HE22  2.05 (140.14 180.00 2)
GLY 126: N - CA - C - N  2.43 (288.32 180.00 2)
MET 129: N - CA - C - N  2.47 (107.07 180.00 2)
LEU 130: N - CA - C - N  2.67 (96.53 180.00 2)
HIS 140: N - CA - C - N  2.44 (108.13 180.00 2)
ASN 143: OD1 - CG - ND2 - HD22  4.00 (2.59 0.00 1)
ASN 153: OD1 - CG - ND2 - HD22  3.99 (4.21 0.00 1)
ASN 159: OD1 - CG - ND2 - HD22  3.96 (10.78 0.00 1)
GLN 160: OE1 - CD - NE2 - HE22  3.88 (339.80 0.00 1)
VAL 166: N - CA - C - N  2.67 (83.83 180.00 2)
GLN 168: OE1 - CD - NE2 - HE22  3.97 (349.45 0.00 1)
SER 170: N - CA - C - N  2.57 (257.38 180.00 2)
ASN 171: OD1 - CG - ND2 - HD22  3.97 (349.99 0.00 1)
GLN 172: OE1 - CD - NE2 - HE22  3.76 (331.62 0.00 1)
ASN 173: OD1 - CG - ND2 - HD22  4.00 (358.06 0.00 1)
ASN 174: OD1 - CG - ND2 - HD22  3.79 (333.73 0.00 1)
CYSS 179: N - CA - C - N  2.37 (290.39 180.00 2)
ASN 181: N - CA - C - N  2.66 (276.84 180.00 2)
ASN 181: OD1 - CG - ND2 - HD22  3.99 (4.98 0.00 1)
GLN 186: OE1 - CD - NE2 - HE22  4.00 (2.25 0.00 1)
THR 188: N - CA - C - N  2.37 (290.61 180.00 2)
ASN 197: OD1 - CG - ND2 - HD22  4.00 (0.94 0.00 1)
GLN 212: N - CA - C - N  2.21 (295.22 180.00 2)
GLN 212: OE1 - CD - NE2 - HE22  3.99 (355.53 0.00 1)
GLN 217: N - CA - C - N  2.59 (281.58 180.00 2)
GLN 217: OE1 - CD - NE2 - HE22  4.00 (3.16 0.00 1)
GLN 219: OE1 - CD - NE2 - HE22  4.00 (359.81 0.00 1)
GLN 223: OE1 - CD - NE2 - HE22  3.98 (7.29 0.00 1)
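The angle and dihedral entries above are consistent with the standard AMBER functional forms, E_angle = K_theta (theta - theta_eq)^2 and E_dihedral = (V_n/2)[1 + cos(n phi - gamma)]. The numbers in parentheses appear to be the actual and reference angle for bond angles, and the actual angle, the phase gamma and the periodicity n for dihedrals. A minimal sketch that recomputes a few entries follows. The force constants are not given on this page: K_theta = 70 kcal/mol/rad^2 for the His ring angles and V_n/2 = 2.0 kcal/mol for the amide terms are back-calculated assumptions that reproduce the listed energies to within rounding, not values taken from the AMBER parameter files.

```python
import math


def angle_energy(theta_deg: float, theta_eq_deg: float, k_theta: float) -> float:
    """Harmonic AMBER-style angle term K * (theta - theta_eq)^2, angles in degrees.

    k_theta is in kcal/mol/rad^2, so the deviation is converted to radians.
    """
    d = math.radians(theta_deg - theta_eq_deg)
    return k_theta * d * d


def dihedral_energy(phi_deg: float, gamma_deg: float, n: int, v_half: float) -> float:
    """AMBER-style torsion term (V_n/2) * (1 + cos(n*phi - gamma)); v_half = V_n/2."""
    return v_half * (1.0 + math.cos(math.radians(n * phi_deg - gamma_deg)))


# HIS 140 ND1-CG-CD2: listed as 3.17 kcal/mol at 107.81 deg (reference 120.00).
print(round(angle_energy(107.81, 120.00, 70.0), 2))    # 3.17 (K = 70 is assumed)
# ASN 159 OD1-CG-ND2-HD22: listed as 3.96 kcal/mol at 10.78 deg (phase 0, n = 1).
print(round(dihedral_energy(10.78, 0.00, 1, 2.0), 2))  # 3.96 (V_n/2 = 2.0 is assumed)
```

Because a single periodicity-1 term with phase 0 peaks at phi = 0, every amide OD1-CG-ND2-HD22 or OE1-CD-NE2-HE22 torsion near 0° lands close to the 4.00 kcal/mol maximum in this listing.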