Statistical potential

From Wikipedia, the free encyclopedia
Example of an interatomic pseudopotential between the β-carbons of isoleucine and valine residues, generated using MyPMFs.[1]

In protein structure prediction, statistical potentials or knowledge-based potentials are scoring functions derived from an analysis of known protein structures in the Protein Data Bank (PDB).

The original method to obtain such potentials is the quasi-chemical approximation, due to Miyazawa and Jernigan.[2] It was later followed by the potential of mean force (statistical PMF[Note 1]), developed by Sippl.[3] Although the obtained scores are often considered as approximations of the free energy (and are thus referred to as pseudo-energies), this physical interpretation is incorrect.[4][5] Nonetheless, they are applied successfully in many cases, because they frequently correlate with actual Gibbs free energy differences.[6]

Overview

Possible features to which a pseudo-energy can be assigned include:

The classic application, however, is based on pairwise amino acid contacts or distances, and thus produces statistical interatomic potentials. For pairwise amino acid contacts, a statistical potential is formulated as an interaction matrix that assigns a weight or energy value to each possible pair of standard amino acids. The energy of a particular structural model is then the sum of the energies of all pairwise contacts in the structure, where a contact is defined as two amino acids within a certain distance of each other. The energies are determined using statistics on amino acid contacts in a database of known protein structures obtained from the PDB.
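The sketch below is a rough illustration of the contact formulation above: it scores a model with an interaction matrix. It assumes one representative atom per residue (hypothetically Cβ), a hypothetical 8 Å contact cutoff, and a made-up energy matrix over only three residue types. None of these values come from a published potential.

```python
import math

# Hypothetical contact energies (arbitrary units) for a toy three-letter alphabet.
# Keys are alphabetically sorted residue-name pairs; a real matrix would cover all
# 210 unordered pairs of the 20 standard amino acids, derived from PDB statistics.
CONTACT_ENERGY = {
    ("ILE", "ILE"): -1.0,
    ("ILE", "SER"): -0.1,
    ("ILE", "VAL"): -0.8,
    ("SER", "SER"): 0.2,
    ("SER", "VAL"): -0.1,
    ("VAL", "VAL"): -0.7,
}


def pair_energy(a: str, b: str) -> float:
    """Matrix entry for residues a and b, independent of their order."""
    return CONTACT_ENERGY[tuple(sorted((a, b)))]


def contact_energy(residues: list[str], coords: list[tuple[float, float, float]],
                   cutoff: float = 8.0) -> float:
    """Sum the matrix entries of all residue pairs closer than `cutoff` (Å).

    One representative atom per residue is assumed (here, hypothetically, C-beta).
    """
    total = 0.0
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            if math.dist(coords[i], coords[j]) < cutoff:
                total += pair_energy(residues[i], residues[j])
    return total


residues = ["ILE", "VAL", "SER"]
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contact_energy(residues, coords))  # only the ILE-VAL pair is in contact: -0.8
```

Many implementations also skip pairs that are close in sequence; that refinement is omitted here.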

History

Initial development

Many textbooks present the statistical PMFs proposed by Sippl[3] as a simple consequence of the Boltzmann distribution, as applied to pairwise distances between amino acids. This is incorrect, but it is a useful starting point for introducing how the potential is constructed in practice. The Boltzmann distribution applied to a specific pair of amino acids is given by:

$$P(r) = \frac{1}{Z} e^{-\frac{F(r)}{kT}}$$

where $r$ is the distance, $k$ is the Boltzmann constant, $T$ is the temperature and $Z$ is the partition function, with

$$Z = \int e^{-\frac{F(r)}{kT}}\, dr$$

The quantity $F(r)$ is the free energy assigned to the pairwise system. Simple rearrangement results in the inverse Boltzmann formula, which expresses the free energy $F(r)$ as a function of $P(r)$:

$$F(r) = -kT \ln P(r) - kT \ln Z$$
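The inverse Boltzmann formula can be checked numerically. The sketch below makes three assumptions: a discretized distance axis (so a sum replaces the integral in Z), energies in units of kT, and an arbitrary toy profile F(r). It builds P(r) from F(r) and then recovers F(r) = -kT ln P(r) - kT ln Z.

```python
import math

KT = 1.0  # energies expressed in units of kT (an illustrative choice)


def boltzmann(free_energies: list[float], kt: float = KT) -> tuple[list[float], float]:
    """Discrete Boltzmann distribution P_i = exp(-F_i / kT) / Z, returned with Z.

    The sum over bins stands in for the integral over r in the partition function.
    """
    weights = [math.exp(-f / kt) for f in free_energies]
    z = sum(weights)
    return [w / z for w in weights], z


def inverse_boltzmann(probs: list[float], z: float, kt: float = KT) -> list[float]:
    """Inverse Boltzmann formula: F_i = -kT ln P_i - kT ln Z."""
    return [-kt * math.log(p) - kt * math.log(z) for p in probs]


# Toy free-energy profile over five distance bins (arbitrary values).
F = [2.0, 0.5, -1.0, 0.0, 0.3]
P, Z = boltzmann(F)
F_back = inverse_boltzmann(P, Z)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(F, F_back)))  # True
```

Database statistics do not provide Z. As a result, -kT ln P(r) fixes F(r) only up to the additive constant -kT ln Z; differences between bins are unaffected.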

To construct a PMF, one then introduces a so-called reference state with a corresponding distribution $Q_R$ and partition function $Z_R$, and calculates the following free energy difference:

$$\Delta F(r) = -kT \ln \frac{P(r)}{Q_R(r)} - kT \ln \frac{Z}{Z_R}$$

The reference state typically results from a hypothetical system in which the specific interactions between the amino acids are absent. The second term, involving $Z$ and $Z_R$, does not depend on $r$ and can therefore be ignored as a constant.
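The following is a minimal sketch of the PMF with a reference state. It assumes that $P(r)$ and $Q_R(r)$ are both estimated from binned counts and that kT = 1. A hypothetical pseudocount of 1 is added to every bin so that the logarithm of zero is never taken; this smoothing is not prescribed by the formula. The constant $-kT \ln(Z/Z_R)$ is dropped.

```python
import math


def pmf_from_counts(observed: list[float], reference: list[float],
                    kt: float = 1.0, pseudocount: float = 1.0) -> list[float]:
    """Per-bin Delta F(r) = -kT ln[P(r) / Q_R(r)], dropping the constant -kT ln(Z / Z_R).

    observed:    counts for one residue pair per distance bin (from known structures)
    reference:   counts or weights per bin for the reference state
    pseudocount: hypothetical smoothing so that empty bins stay finite
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference must use the same bins")
    obs = [c + pseudocount for c in observed]
    ref = [c + pseudocount for c in reference]
    obs_total, ref_total = sum(obs), sum(ref)
    return [-kt * math.log((o / obs_total) / (r / ref_total)) for o, r in zip(obs, ref)]


# Hypothetical counts for one residue pair over four distance bins.
observed = [1, 30, 10, 5]
reference = [5, 10, 10, 15]
print([round(x, 2) for x in pmf_from_counts(observed, reference)])  # [1.23, -0.91, 0.13, 1.11]
```

Negative values mark bins where the pair is observed more often than the reference state predicts.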

In practice, $P(r)$ is estimated from the database of known protein structures, while $Q_R(r)$ typically results from calculations or simulations. For example, $P(r)$ could be the conditional probability of finding the $C_\beta$ atoms of a valine and a serine at a given distance $r$ from each other, giving rise to the free energy difference $\Delta F$. The total free energy difference of a protein, $\Delta F_{\mathrm{T}}$, is then claimed to be the sum of all the pairwise free energies:

$$\Delta F_{\mathrm{T}} = \sum_{i<j} \Delta F(r_{ij} \mid a_i, a_j) = -kT \sum_{i<j} \ln \frac{P(r_{ij} \mid a_i, a_j)}{Q_R(r_{ij} \mid a_i, a_j)}$$

where the sum runs over all amino acid pairs $a_i, a_j$ (with $i<j$) and $r_{ij}$ is their corresponding distance. In many studies $Q_R$ does not depend on the amino acid sequence.[7]
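Summing the pairwise terms over a whole structure could look as follows. This sketch assumes that PMF tables have already been computed for each residue-pair type on fixed-width distance bins. The bin width, the cutoff beyond which pairs contribute nothing, and the table values are all hypothetical.

```python
import math
from itertools import combinations

BIN_WIDTH = 2.0  # Å per distance bin (hypothetical)
N_BINS = 5       # pairs at r >= BIN_WIDTH * N_BINS contribute nothing (hypothetical cutoff)


def total_pmf(residues, coords, tables):
    """Delta F_T = sum over i < j of Delta F(r_ij | a_i, a_j).

    residues: residue names a_1..a_n
    coords:   one (x, y, z) position per residue
    tables:   alphabetically sorted residue-name pair -> list of N_BINS Delta F values
    """
    total = 0.0
    for i, j in combinations(range(len(residues)), 2):  # all pairs with i < j
        r_ij = math.dist(coords[i], coords[j])
        k = int(r_ij // BIN_WIDTH)                       # bin containing r_ij
        if k < N_BINS:
            pair = tuple(sorted((residues[i], residues[j])))
            total += tables[pair][k]
    return total


# Hypothetical PMF tables (kT units), e.g. as produced from database counts.
tables = {
    ("ILE", "VAL"): [1.0, -0.5, -0.2, 0.0, 0.0],
    ("ILE", "SER"): [1.0, 0.3, 0.1, 0.0, 0.0],
    ("SER", "VAL"): [1.0, 0.2, 0.1, 0.0, 0.0],
}
residues = ["ILE", "VAL", "SER"]
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 5.0, 0.0)]
print(total_pmf(residues, coords, tables))  # -0.5 + 0.1 + 0.1, i.e. about -0.3
```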

Conceptual issues

Intuitively, a low value for $\Delta F_{\mathrm{T}}$ indicates that the set of distances in a structure is more likely in proteins than in the reference state. However, the physical meaning of these statistical PMFs has been widely disputed since their introduction.[4][5][8][9] The main issues are:

  1. The wrong interpretation of this "potential" as a true, physically valid potential of mean force;
  2. The nature of the so-called reference state and its optimal formulation;
  3. The validity of generalizations beyond pairwise distances.

Controversial analogy

In response to the issue regarding physical validity, the first justification of statistical PMFs was attempted by Sippl.[10] It was based on an analogy with the statistical physics of liquids. For liquids, the potential of mean force is related to the radial distribution function $g(r)$, which is given by:[11]

$$g(r) = \frac{P(r)}{Q_R(r)}$$

where $P(r)$ and $Q_R(r)$ are the respective probabilities of finding two particles at a distance $r$ from each other in the liquid and in the reference state. For liquids, the reference state is clearly defined; it corresponds to the ideal gas, consisting of non-interacting particles. The two-particle potential of mean force $W(r)$ is related to $g(r)$ by:

$$W(r) = -kT \ln g(r) = -kT \ln \frac{P(r)}{Q_R(r)}$$

According to the reversible work theorem, the two-particle potential of mean force $W(r)$ is the reversible work required to bring two particles in the liquid from infinite separation to a distance $r$ from each other.[11]
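For the liquid analogy, the ideal-gas reference has a simple form. At uniform density, the probability of finding a second particle in the shell between r and r + Δr is proportional to the shell volume. The sketch below evaluates W(r) = -kT ln g(r) with that reference. It assumes equal-width shells starting at r = 0, kT = 1, and a hypothetical histogram of pair distances.

```python
import math


def shell_volumes(n_bins: int, width: float) -> list[float]:
    """Volume of each spherical shell [k * width, (k + 1) * width)."""
    return [4.0 / 3.0 * math.pi * (((k + 1) * width) ** 3 - (k * width) ** 3)
            for k in range(n_bins)]


def pmf_ideal_gas_reference(counts: list[int], width: float, kt: float = 1.0):
    """W(r) = -kT ln g(r), with g(r) = P(r) / Q_R(r) and an ideal-gas Q_R(r).

    For non-interacting particles at uniform density, the chance of finding a second
    particle in a shell is proportional to the shell's volume. Both distributions are
    normalized over the histogram range only, so W(r) is defined up to a constant.
    Empty bins have no finite W(r) and are returned as None.
    """
    vols = shell_volumes(len(counts), width)
    n_total, v_total = sum(counts), sum(vols)
    return [None if n == 0 else -kt * math.log((n / n_total) / (v / v_total))
            for n, v in zip(counts, vols)]


# Hypothetical pair-distance histogram with 1 Å shells starting at r = 0.
print(pmf_ideal_gas_reference([0, 14, 19, 37], width=1.0))
```

A histogram whose counts are proportional to the shell volumes (1 : 7 : 19 : 37 for unit-width shells) gives W(r) close to zero in every bin, as expected for non-interacting particles.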

Sippl justified the use of statistical PMFs, a few years after he introduced them for use in protein structure prediction, by appealing to the analogy with the reversible work theorem for liquids. For liquids, $g(r)$ can be measured experimentally using small-angle X-ray scattering; for proteins, $P(r)$ is obtained from the set of known protein structures, as explained in the previous section. However, as Ben-Naim wrote in a publication on the subject:[5]

[...] the quantities, referred to as "statistical potentials," "structure based potentials," or "pair potentials of mean force", as derived from the protein data bank (PDB), are neither "potentials" nor "potentials of mean force," in the ordinary sense as used in the literature on liquids and solutions.

Moreover, this analogy does not solve the issue of how to specify a suitable reference state for proteins.

Machine learning

In the mid-2000s, authors started to combine multiple statistical potentials, derived from different structural features, into composite scores.[12] For that purpose, they used machine learning techniques, such as support vector machines (SVMs). Probabilistic neural networks (PNNs) have also been applied to train a position-specific distance-dependent statistical potential.[13] In 2016, the DeepMind artificial intelligence research laboratory started to apply deep learning techniques to the development of a torsion- and distance-dependent statistical potential.[14] The resulting method, named AlphaFold, won the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP) by producing the most accurate structure for 25 out of 43 free modelling domains.

Explanation

Bayesian probability

Baker and co-workers[15] justified statistical PMFs from a Bayesian point of view and used these insights in the construction of the coarse-grained ROSETTA energy function. According to Bayesian probability calculus, the conditional probability $P(X \mid A)$ of a structure $X$, given the amino acid sequence $A$, can be written as:

$$P(X \mid A) = \frac{P(A \mid X)\, P(X)}{P(A)} \propto P(A \mid X)\, P(X)$$

$P(X \mid A)$ is proportional to the product of the likelihood $P(A \mid X)$ and the prior $P(X)$. By assuming that the likelihood can be approximated as a product of pairwise probabilities, and applying Bayes' theorem, the likelihood can be written as:

$$P(A \mid X) \approx \prod_{i<j} P(a_i, a_j \mid r_{ij}) \propto \prod_{i<j} \frac{P(r_{ij} \mid a_i, a_j)}{P(r_{ij})}$$

where the product runs over all amino acid pairs $a_i, a_j$ (with $i<j$), and $r_{ij}$ is the distance between amino acids $i$ and $j$. The negative logarithm of this expression has the same functional form as the classic pairwise distance statistical PMFs, with the denominator playing the role of the reference state. This explanation has two shortcomings: it relies on the unfounded assumption that the likelihood can be expressed as a product of pairwise probabilities, and it is purely qualitative.
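Bayes' theorem gives P(a_i, a_j | r) = P(r | a_i, a_j) P(a_i, a_j) / P(r). Since P(a_i, a_j) does not depend on the structure, each pairwise factor is proportional to P(r | a_i, a_j) / P(r). The sketch below assumes a hypothetical table of joint counts, with residue-pair type against distance bin. It checks the identity numerically and evaluates the resulting negative-log ratio, which uses the pooled distance distribution P(r) as its reference state.

```python
import math


def distance_marginal(counts: dict[str, list[int]]) -> list[float]:
    """P(r): distribution over distance bins, pooled over all residue-pair types."""
    total = sum(sum(row) for row in counts.values())
    n_bins = len(next(iter(counts.values())))
    return [sum(row[k] for row in counts.values()) / total for k in range(n_bins)]


def bayes_identity_holds(counts: dict[str, list[int]]) -> bool:
    """Check P(a | r) == P(r | a) * P(a) / P(r) in every bin (bins must be non-empty)."""
    total = sum(sum(row) for row in counts.values())
    p_r = distance_marginal(counts)
    for row in counts.values():
        p_a = sum(row) / total                      # P(a_i, a_j)
        for k, n in enumerate(row):
            p_a_given_r = n / (p_r[k] * total)      # P(a_i, a_j | r)
            p_r_given_a = n / sum(row)              # P(r | a_i, a_j)
            if not math.isclose(p_a_given_r, p_r_given_a * p_a / p_r[k]):
                return False
    return True


def marginal_reference_pmf(counts: dict[str, list[int]]) -> dict[str, list[float]]:
    """-ln[P(r | a_i, a_j) / P(r)] per pair type and bin (all counts must be positive)."""
    p_r = distance_marginal(counts)
    return {pair: [-math.log((n / sum(row)) / p_r[k]) for k, n in enumerate(row)]
            for pair, row in counts.items()}


# Hypothetical joint counts: residue-pair type -> counts per distance bin.
COUNTS = {"ILE-VAL": [2, 20, 8], "SER-SER": [6, 4, 10]}
print(bayes_identity_holds(COUNTS))  # True
print(marginal_reference_pmf(COUNTS))
```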

Probability kinematics

Hamelryck and co-workers[6] later gave a quantitative explanation for the statistical potentials. According to this explanation, they approximate a form of probabilistic reasoning due to Richard Jeffrey and named probability kinematics. This variant of Bayesian thinking (sometimes called "Jeffrey conditioning") allows updating a prior distribution based on new information on the probabilities of the elements of a partition on the support of the prior. From this point of view, (i) it is not necessary to assume that the database of protein structures used to build the potentials follows a Boltzmann distribution, (ii) statistical potentials generalize readily beyond pairwise distances, and (iii) the reference ratio is determined by the prior distribution.
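Jeffrey conditioning is easy to state on a finite outcome space. Given a prior P(x), a partition into cells E_i, and new probabilities q_i for those cells, the update is P'(x) = q_i P(x) / P(E_i) for every x in E_i. Relative probabilities inside each cell are kept; only the cell totals change. The sketch below uses a hypothetical prior and partition.

```python
def jeffrey_update(prior, cell_of, new_cell_probs):
    """Probability kinematics (Jeffrey conditioning) on a finite outcome space.

    prior:          dict outcome -> prior probability P(x)
    cell_of:        function mapping an outcome to its partition cell E_i
    new_cell_probs: dict cell -> new probability q_i (values should sum to 1)
    Returns the updated distribution P'(x) = q_i * P(x) / P(E_i) for x in E_i.
    """
    old_cell_probs = {}
    for x, p in prior.items():
        cell = cell_of(x)
        old_cell_probs[cell] = old_cell_probs.get(cell, 0.0) + p
    return {x: new_cell_probs[cell_of(x)] * p / old_cell_probs[cell_of(x)]
            for x, p in prior.items()}


def parity(x):
    """Hypothetical partition: cell 0 holds even outcomes, cell 1 odd ones."""
    return x % 2


prior = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
posterior = jeffrey_update(prior, parity, {0: 0.3, 1: 0.7})
print(posterior)  # odd outcomes now carry about 0.7 in total, still in ratio 1:3
```

Take the outcomes to be fine-grained states x, the cells to be the values y of a coarse-grained variable Y = f(X), the prior to be Q(X), and the new cell probabilities to be P(Y). The update then becomes P(Y)/Q(Y) × Q(X), which has the form of the reference ratio.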

Reference ratio

The reference ratio method. $Q(X)$ is a probability distribution that describes the structure of proteins on a local length scale (right). Typically, $Q(X)$ is embodied in a fragment library, but other possibilities are an energy function or a graphical model. In order to obtain a complete description of protein structure, one also needs a probability distribution $P(Y)$ that describes nonlocal aspects, such as hydrogen bonding. $P(Y)$ is typically obtained from a set of solved protein structures from the PDB (left). In order to combine $Q(X)$ with $P(Y)$ in a meaningful way, one needs the reference ratio expression (bottom), which takes the signal in $Q(X)$ with respect to $Y$ into account.

Expressions that resemble statistical PMFs naturally result from the application of probability theory to solve a fundamental problem that arises in protein structure prediction: how to improve an imperfect probability distribution $Q(X)$ over a first variable $X$ using a probability distribution $P(Y)$ over a second variable $Y$, with $Y = f(X)$.[6] Typically, $X$ and $Y$ are fine-grained and coarse-grained variables, respectively. For example, $Q(X)$ could concern the local structure of the protein, while $P(Y)$ could concern the pairwise distances between the amino acids. In that case, $X$ could, for example, be a vector of dihedral angles that specifies all atom positions (assuming ideal bond lengths and angles). In order to combine the two distributions, such that the local structure will be distributed according to $Q(X)$ while the pairwise distances will be distributed according to $P(Y)$, the following expression is needed:

$$P(X, Y) = \frac{P(Y)}{Q(Y)}\, Q(X)$$

where $Q(Y)$ is the distribution over $Y$ implied by $Q(X)$. The ratio in the expression corresponds to Sippl's PMF. Typically, $Q(X)$ is brought in by sampling (usually from a fragment library) and is not explicitly evaluated; the ratio, in contrast, is explicitly evaluated. This explanation is quantitative, and allows the generalization of statistical PMFs from pairwise distances to arbitrary coarse-grained variables. It also provides a rigorous definition of the reference state, which is implied by $Q(X)$. Conventional applications of pairwise distance statistical PMFs usually lack two features necessary to make them fully rigorous: the use of a proper probability distribution over pairwise distances in proteins, and the recognition that the reference state is rigorously defined by $Q(X)$.
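The reference-ratio expression can be verified on a toy discrete space. The sketch assumes a hypothetical fine-grained variable X with four states, a coarse-grained variable Y = f(X) with two states, an imperfect Q(X), and a target P(Y). It computes Q(Y) by summing Q(X) over each value of Y and applies P(Y)/Q(Y) × Q(X). It then checks that the result has P(Y) as its marginal over Y while keeping the relative weights that Q assigns to the values of X within each value of Y.

```python
from collections import defaultdict


def marginal(dist_x, f):
    """Distribution over Y = f(X) implied by a distribution over X."""
    dist_y = defaultdict(float)
    for x, p in dist_x.items():
        dist_y[f(x)] += p
    return dict(dist_y)


def reference_ratio(q_x, f, p_y):
    """Combined distribution P(X, Y) = P(Y) / Q(Y) * Q(X), with Y = f(X).

    Because Y is a function of X, the result is indexed by x alone.
    q_x: dict x -> Q(x), the imperfect fine-grained distribution
    f:   function computing the coarse-grained variable from x
    p_y: dict y -> P(y), the target distribution over the coarse-grained variable
    """
    q_y = marginal(q_x, f)  # Q(Y), the reference state implied by Q(X)
    return {x: p_y[f(x)] / q_y[f(x)] * q for x, q in q_x.items()}


def coarse(x):
    """Hypothetical coarse-graining: keep only the second label of x."""
    return x[1]


# Hypothetical Q(X) over (local, nonlocal) label pairs and target P(Y).
q_x = {("a", "near"): 0.6, ("b", "near"): 0.2, ("a", "far"): 0.15, ("b", "far"): 0.05}
p_y = {"near": 0.3, "far": 0.7}

p_x = reference_ratio(q_x, coarse, p_y)
print(marginal(p_x, coarse))  # approximately {'near': 0.3, 'far': 0.7}
```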

Applications

Statistical potentials are used as energy functions in the assessment of an ensemble of structural models produced by homology modeling or protein threading. Many differently parameterized statistical potentials have been shown to successfully identify the native state structure from an ensemble of decoy or non-native structures.[16] Statistical potentials are not only used for protein structure prediction, but also for modelling the protein folding pathway.[17][18]

Notes

  1. Not to be confused with an actual PMF.

References

  1. Postic, Guillaume; Hamelryck, Thomas; Chomilier, Jacques; Stratmann, Dirk (2018). "MyPMFs: a simple tool for creating statistical potentials to assess protein structural models". Biochimie. 151: 37–41. doi:10.1016/j.biochi.2018.05.013. ISSN 0300-9084. PMID 29857183. S2CID 46923560.
  2. Miyazawa S, Jernigan R (1985). "Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation". Macromolecules. 18 (3): 534–552. Bibcode:1985MaMol..18..534M. CiteSeerX 10.1.1.206.715. doi:10.1021/ma00145a039.
  3. Sippl MJ (1990). "Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins". J Mol Biol. 213 (4): 859–883. doi:10.1016/s0022-2836(05)80269-4. PMID 2359125.
  4. Thomas PD, Dill KA (1996). "Statistical potentials extracted from protein structures: how accurate are they?". J Mol Biol. 257 (2): 457–469. doi:10.1006/jmbi.1996.0175. PMID 8609636.
  5. Ben-Naim A (1997). "Statistical potentials extracted from protein structures: Are these meaningful potentials?". J Chem Phys. 107 (9): 3698–3706. Bibcode:1997JChPh.107.3698B. doi:10.1063/1.474725.
  6. Hamelryck T, Borg M, Paluszewski M, et al. (2010). Flower DR (ed.). "Potentials of mean force for protein structure prediction vindicated, formalized and generalized". PLOS ONE. 5 (11): e13714. arXiv:1008.4006. Bibcode:2010PLoSO...513714H. doi:10.1371/journal.pone.0013714. PMC 2978081. PMID 21103041.
  7. Rooman M, Wodak S (1995). "Are database-derived potentials valid for scoring both forward and inverted protein folding?". Protein Eng. 8 (9): 849–858. doi:10.1093/protein/8.9.849. PMID 8746722.
  8. Koppensteiner WA, Sippl MJ (1998). "Knowledge-based potentials–back to the roots". Biochemistry Mosc. 63 (3): 247–252. PMID 9526121.
  9. Shortle D (2003). "Propensities, probabilities, and the Boltzmann hypothesis". Protein Sci. 12 (6): 1298–1302. doi:10.1110/ps.0306903. PMC 2323900. PMID 12761401.
  10. Sippl MJ, Ortner M, Jaritz M, Lackner P, Flockner H (1996). "Helmholtz free energies of atom pair interactions in proteins". Fold Des. 1 (4): 289–98. doi:10.1016/s1359-0278(96)00042-9. PMID 9079391.
  11. Chandler D (1987). Introduction to Modern Statistical Mechanics. New York: Oxford University Press, USA.
  12. Eramian, David; Shen, Min-yi; Devos, Damien; Melo, Francisco; Sali, Andrej; Marti-Renom, Marc (2006). "A composite score for predicting errors in protein structure models". Protein Science. 15 (7): 1653–1666. doi:10.1110/ps.062095806. PMC 2242555. PMID 16751606.
  13. Zhao, Feng; Xu, Jinbo (2012). "A Position-Specific Distance-Dependent Statistical Potential for Protein Structure and Functional Study". Structure. 20 (6): 1118–1126. doi:10.1016/j.str.2012.04.003. PMC 3372698. PMID 22608968.
  14. Senior AW, Evans R, Jumper J, et al. (2020). "Improved protein structure prediction using potentials from deep learning". Nature. 577 (7792): 706–710. Bibcode:2020Natur.577..706S. doi:10.1038/s41586-019-1923-7. PMID 31942072. S2CID 210221987.
  15. Simons KT, Kooperberg C, Huang E, Baker D (1997). "Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions". J Mol Biol. 268 (1): 209–225. CiteSeerX 10.1.1.579.5647. doi:10.1006/jmbi.1997.0959. PMID 9149153.
  16. Lam SD, Das S, Sillitoe I, Orengo C (2017). "An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences". Acta Crystallogr D. 73 (8): 628–640. Bibcode:2017AcCrD..73..628L. doi:10.1107/S2059798317008920. PMC 5571743. PMID 28777078.
  17. Kmiecik S, Kolinski A (2007). "Characterization of protein-folding pathways by reduced-space modeling". Proc. Natl. Acad. Sci. U.S.A. 104 (30): 12330–12335. Bibcode:2007PNAS..10412330K. doi:10.1073/pnas.0702265104. PMC 1941469. PMID 17636132.
  18. Adhikari AN, Freed KF, Sosnick TR (2012). "De novo prediction of protein folding pathways and structure using the principle of sequential stabilization". Proc. Natl. Acad. Sci. U.S.A. 109 (43): 17442–17447. Bibcode:2012PNAS..10917442A. doi:10.1073/pnas.1209000109. PMC 3491489. PMID 23045636.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Statistical_potential&oldid=1314669752"