The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics. It is a special case of a normalizing constant in probability theory, for the Boltzmann distribution. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated probability measure, the Gibbs measure, has the Markov property. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks (the Hopfield network), and applications such as genomics, corpus linguistics and artificial intelligence, which employ Markov networks and Markov logic networks. The Gibbs measure is also the unique measure that has the property of maximizing the entropy for a fixed expectation value of the energy; this underlies the appearance of the partition function in maximum entropy methods and the algorithms derived therefrom.
The partition function ties together many different concepts, and thus offers a general framework in which many different kinds of quantities may be calculated. In particular, it shows how to calculate expectation values and Green's functions, forming a bridge to Fredholm theory. It also provides a natural setting for the information geometry approach to information theory, where the Fisher information metric can be understood to be a correlation function derived from the partition function; it happens to define a Riemannian manifold.
When the setting for random variables is on complex projective space or projective Hilbert space, geometrized with the Fubini–Study metric, the theory of quantum mechanics and more generally quantum field theory results. In these theories, the partition function is heavily exploited in the path integral formulation, with great success, leading to many formulas nearly identical to those reviewed here. However, because the underlying measure space is complex-valued, as opposed to the real-valued simplex of probability theory, an extra factor of i appears in many formulas. Tracking this factor is troublesome, and is not done here. This article focuses primarily on classical probability theory, where the probabilities sum to one.
Given a set of random variables $X_i$ taking on values $x_i$, and some sort of potential function or Hamiltonian $H(x_1, x_2, \dots)$, the partition function is defined as

$$Z(\beta) = \sum_{x_i} \exp\left(-\beta H(x_1, x_2, \dots)\right)$$
The function $H$ is understood to be a real-valued function on the space of states $\{X_1, X_2, \dots\}$, while $\beta$ is a real-valued free parameter (conventionally, the inverse temperature). The sum over the $x_i$ is understood to be a sum over all possible values that each of the random variables $X_i$ may take. Thus, the sum is to be replaced by an integral when the $X_i$ are continuous, rather than discrete. Thus, one writes

$$Z(\beta) = \int \exp\left(-\beta H(x_1, x_2, \dots)\right)\, dx_1\, dx_2 \cdots$$

for the case of continuously varying $X_i$.
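As a concrete illustration of the discrete sum above, the following sketch enumerates every configuration of a small system and accumulates $\exp(-\beta H)$. The three-variable Hamiltonian used here is purely hypothetical, chosen only so the example runs.

```python
import numpy as np
from itertools import product

# Hypothetical Hamiltonian on three binary variables x_i in {-1, +1}:
# a simple nearest-neighbour coupling, used only for illustration.
def H(x):
    return -(x[0] * x[1] + x[1] * x[2])

def partition_function(beta):
    # Z(beta) = sum over all configurations of exp(-beta * H(x))
    return sum(np.exp(-beta * H(x)) for x in product([-1, 1], repeat=3))

print(partition_function(0.5))
```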
When $H$ is an observable, such as a finite-dimensional matrix or an infinite-dimensional Hilbert space operator or element of a C-star algebra, it is common to express the summation as a trace, so that

$$Z(\beta) = \operatorname{tr}\left(e^{-\beta H}\right)$$
When $H$ is infinite-dimensional, then, for the above notation to be valid, the argument must be trace class, that is, of a form such that the summation exists and is bounded.
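For the trace form, a minimal sketch with a finite-dimensional observable: the matrix $H$ below is just a random symmetric matrix standing in for an observable, and $\operatorname{tr}(e^{-\beta H})$ is computed from its eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = (A + A.T) / 2                      # symmetrize so H is a valid (Hermitian) observable

beta = 0.5
eigenvalues = np.linalg.eigvalsh(H)
# tr(exp(-beta H)) is the sum of exp(-beta * lambda) over the eigenvalues of H.
Z = np.exp(-beta * eigenvalues).sum()
print(Z)
```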
The number of variables need not be countable, in which case the sums are to be replaced by functional integrals. Although there are many notations for functional integrals, a common one would be

$$Z(\beta) = \int \mathcal{D}\varphi\; \exp\left(-\beta H[\varphi]\right)$$
Such is the case for the partition function in quantum field theory.
A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a generating function for correlation functions. This is discussed in greater detail below.
The role or meaning of the parameter $\beta$ can be understood in a variety of different ways. In classical thermodynamics, it is an inverse temperature. More generally, one would say that it is the variable that is conjugate to some (arbitrary) function of the random variables. The word conjugate here is used in the sense of conjugate generalized coordinates in Lagrangian mechanics; thus, properly, $\beta$ is a Lagrange multiplier. It is not uncommonly called the generalized force. All of these concepts have in common the idea that one value is meant to be kept fixed, as others, interconnected in some complicated way, are allowed to vary. In the current case, the value to be kept fixed is the expectation value of $H$, even as many different probability distributions can give rise to exactly this same (fixed) value.
For the general case, one considers a set of functions $\{H_k(x_1, x_2, \dots)\}$ that each depend on the random variables $X_i$. These functions are chosen because one wants to hold their expectation values constant, for one reason or another. To constrain the expectation values in this way, one applies the method of Lagrange multipliers. In the general case, maximum entropy methods illustrate the manner in which this is done.
Some specific examples are in order. In basic thermodynamics problems, when using the canonical ensemble, the use of just one parameter $\beta$ reflects the fact that there is only one expectation value that must be held constant: the free energy (due to conservation of energy). For chemistry problems involving chemical reactions, the grand canonical ensemble provides the appropriate foundation, and there are two Lagrange multipliers. One is to hold the energy constant, and another, the fugacity, is to hold the particle count constant (as chemical reactions involve the recombination of a fixed number of atoms).
For the general case, one has

$$Z(\beta_1, \beta_2, \dots) = \sum_{x_i} \exp\left(-\sum_k \beta_k H_k(x_1, x_2, \dots)\right)$$

with $\beta = (\beta_1, \beta_2, \dots)$ a point in a space.
For a collection of observables $H_k$, one would write

$$Z(\beta_1, \beta_2, \dots) = \operatorname{tr}\left[\,\exp\left(-\sum_k \beta_k H_k\right)\right]$$

As before, it is presumed that the argument of tr is trace class.
The corresponding Gibbs measure then provides a probability distribution such that the expectation value of each $H_k$ is a fixed value. More precisely, one has

$$\frac{\partial}{\partial \beta_k}\left(-\log Z\right) = \langle H_k\rangle = \mathrm{E}\left[H_k\right]$$

with the angle brackets $\langle H_k\rangle$ denoting the expected value of $H_k$, and $\mathrm{E}[\,\cdot\,]$ being a common alternative notation. A precise definition of this expectation value is given below.
Although the value of $\beta$ is commonly taken to be real, it need not be, in general; this is discussed in the section Normalization below. The values of $\beta$ can be understood to be the coordinates of points in a space; this space is in fact a manifold, as sketched below. The study of these spaces as manifolds constitutes the field of information geometry.
The potential function itself commonly takes the form of a sum:

$$H(x_1, x_2, \dots) = \sum_s V(s)$$

where the sum over $s$ is a sum over some subset of the power set $P(X)$ of the set $X = \{x_1, x_2, \dots\}$. For example, in statistical mechanics, such as the Ising model, the sum is over pairs of nearest neighbors. In probability theory, such as Markov networks, the sum might be over the cliques of a graph; so, for the Ising model and other lattice models, the maximal cliques are edges.
The fact that the potential function can be written as a sum usually reflects the fact that it is invariant under the action of a group symmetry, such as translational invariance. Such symmetries can be discrete or continuous; they materialize in the correlation functions for the random variables (discussed below). Thus a symmetry in the Hamiltonian becomes a symmetry of the correlation function (and vice versa).
This symmetry has a critically important interpretation in probability theory: it implies that the Gibbs measure has the Markov property; that is, it is independent of the random variables in a certain way, or, equivalently, the measure is identical on the equivalence classes of the symmetry. This leads to the widespread appearance of the partition function in problems with the Markov property, such as Hopfield networks.
The value of the expression

$$\exp\left(-\beta H(x_1, x_2, \dots)\right)$$

can be interpreted as a likelihood that a specific configuration of values $(x_1, x_2, \dots)$ occurs in the system. Thus, given a specific configuration $(x_1, x_2, \dots)$,

$$P(x_1, x_2, \dots) = \frac{1}{Z(\beta)}\exp\left(-\beta H(x_1, x_2, \dots)\right)$$

is the probability of the configuration occurring in the system, which is now properly normalized so that $0 \le P(x_1, x_2, \dots) \le 1$, and such that the sum over all configurations totals to one. As such, the partition function can be understood to provide a measure (a probability measure) on the probability space; formally, it is called the Gibbs measure. It generalizes the narrower concepts of the grand canonical ensemble and canonical ensemble in statistical mechanics.
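A brief sketch of this normalization, reusing the hypothetical three-spin Hamiltonian from the earlier example: each configuration receives weight $\exp(-\beta H)/Z$, the weights lie in $[0, 1]$, and they sum to one.

```python
import numpy as np
from itertools import product

# Hypothetical three-spin Hamiltonian, as in the sketch above.
def H(x):
    return -(x[0] * x[1] + x[1] * x[2])

beta = 0.5
configs = list(product([-1, 1], repeat=3))
weights = np.array([np.exp(-beta * H(x)) for x in configs])
Z = weights.sum()
P = weights / Z                                  # Gibbs probability of each configuration
print(P.min() >= 0, np.isclose(P.sum(), 1.0))    # True True
```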
There exists at least one configuration for which the probability is maximized; this configuration is conventionally called the ground state. If the configuration is unique, the ground state is said to be non-degenerate, and the system is said to be ergodic; otherwise the ground state is degenerate. The ground state may or may not commute with the generators of the symmetry; if it commutes, it is said to be an invariant measure. When it does not commute, the symmetry is said to be spontaneously broken.
Conditions under which a ground state exists and is unique are given by the Karush–Kuhn–Tucker conditions; these conditions are commonly used to justify the use of the Gibbs measure in maximum-entropy problems.[citation needed]
The values taken by the random variables $x_i$ depend on the mathematical space over which the random field varies. Thus, real-valued random fields take values on a simplex: this is the geometrical way of saying that the sum of probabilities must total to one. For quantum mechanics, the random variables range over complex projective space (or complex-valued projective Hilbert space), where the random variables are interpreted as probability amplitudes. The emphasis here is on the word projective, as the amplitudes are still normalized to one. The normalization for the potential function is the Jacobian for the appropriate mathematical space: it is 1 for ordinary probabilities, and i for Hilbert space; thus, in quantum field theory, one sees $itH$ in the exponential, rather than $\beta H$. The partition function is very heavily exploited in the path integral formulation of quantum field theory, to great effect. The theory there is very nearly identical to that presented here, aside from this difference, and the fact that it is usually formulated on four-dimensional space-time, rather than in a general way.
The partition function is commonly used as a probability-generating function for expectation values of various functions of the random variables. So, for example, taking $\beta$ as an adjustable parameter, then the derivative of $\log Z(\beta)$ with respect to $\beta$

$$\mathrm{E}[H] = \langle H\rangle = -\frac{\partial \log Z(\beta)}{\partial \beta}$$

gives the average (expectation value) of $H$. In physics, this would be called the average energy of the system.
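A quick numerical check of this identity, again using the hypothetical three-spin Hamiltonian from the earlier sketches: a central finite difference of $\log Z$ in $\beta$ is compared with the directly computed expectation value $\langle H\rangle$.

```python
import numpy as np
from itertools import product

# Hypothetical three-spin Hamiltonian, used only for illustration.
def H(x):
    return -(x[0] * x[1] + x[1] * x[2])

configs = list(product([-1, 1], repeat=3))

def log_Z(beta):
    return np.log(sum(np.exp(-beta * H(x)) for x in configs))

beta, h = 0.5, 1e-5
# Central finite difference approximating -d(log Z)/d(beta).
finite_diff = -(log_Z(beta + h) - log_Z(beta - h)) / (2 * h)

# Direct expectation value <H> under the Gibbs distribution.
weights = np.array([np.exp(-beta * H(x)) for x in configs])
avg_H = (weights / weights.sum()) @ np.array([H(x) for x in configs])

print(np.isclose(finite_diff, avg_H))   # True
```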
Given the definition of the probability measure above, the expectation value of any function $f$ of the random variables $X$ may now be written as expected: so, for discrete-valued $X$, one writes

$$\langle f\rangle = \sum_{x_i} f(x_1, x_2, \dots)\, P(x_1, x_2, \dots) = \frac{1}{Z(\beta)} \sum_{x_i} f(x_1, x_2, \dots) \exp\left(-\beta H(x_1, x_2, \dots)\right)$$
The above notation makes sense for a finite number of discrete random variables. In more general settings, the summations should be replaced with integrals over a probability space.
Thus, for example, the entropy is given by

$$S = -\langle \log P\rangle = -\sum_{x_i} P(x_1, x_2, \dots) \log P(x_1, x_2, \dots) = \beta\langle H\rangle + \log Z(\beta)$$
The Gibbs measure is the unique statistical distribution that maximizes the entropy for a fixed expectation value of the energy; this underlies its use in maximum entropy methods.
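A sketch of the standard Lagrange-multiplier argument behind this statement (the multipliers $\lambda$ and $\beta$, enforcing normalization and a fixed mean energy $E$, are introduced here only for illustration): setting

$$\frac{\partial}{\partial P(x)}\left[-\sum_x P(x)\log P(x) - \lambda\left(\sum_x P(x) - 1\right) - \beta\left(\sum_x P(x) H(x) - E\right)\right] = 0$$

gives $\log P(x) = -1 - \lambda - \beta H(x)$, so that $P(x) \propto e^{-\beta H(x)}$, which is the Gibbs measure after normalizing by $Z(\beta)$.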
The points $\beta$ can be understood to form a space, and specifically, a manifold. Thus, it is reasonable to ask about the structure of this manifold; this is the task of information geometry.
Multiple derivatives with respect to the Lagrange multipliers give rise to a positive semi-definite covariance matrix

$$g_{ij}(\beta) = \frac{\partial^2}{\partial \beta^i\, \partial \beta^j} \log Z(\beta) = \langle H_i H_j\rangle - \langle H_i\rangle\langle H_j\rangle$$

This matrix is positive semi-definite, and may be interpreted as a metric tensor, specifically, a Riemannian metric. Equipping the space of Lagrange multipliers with a metric in this way turns it into a Riemannian manifold.[1] The study of such manifolds is referred to as information geometry; the metric above is the Fisher information metric. Here, $\beta$ serves as a coordinate on the manifold. It is interesting to compare the above definition to the simpler Fisher information, from which it is inspired.
That the above defines the Fisher information metric can be readily seen by explicitly substituting for the expectation value:

$$g_{ij}(\beta) = \sum_{x} P(x)\, \frac{\partial \log P(x)}{\partial \beta^i}\, \frac{\partial \log P(x)}{\partial \beta^j}$$

where we've written $P(x)$ for $P(x_1, x_2, \dots)$ and the summation is understood to be over all values of all random variables. For continuous-valued random variables, the summations are replaced by integrals, of course.
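A numerical sketch of the metric as a covariance matrix, using a hypothetical two-observable family on three spins (the observables $H_1$ and $H_2$ below are not from the text; they are chosen only to make the example concrete):

```python
import numpy as np
from itertools import product

# Hypothetical observables: a nearest-neighbour coupling and a "field" term.
def H1(x): return -(x[0] * x[1] + x[1] * x[2])
def H2(x): return -(x[0] + x[1] + x[2])

def fisher_metric(beta1, beta2):
    configs = list(product([-1, 1], repeat=3))
    w = np.array([np.exp(-beta1 * H1(x) - beta2 * H2(x)) for x in configs])
    P = w / w.sum()                                   # Gibbs probabilities
    Hs = np.array([[H1(x), H2(x)] for x in configs])  # observable values per configuration
    mean = P @ Hs
    # g_ij = <H_i H_j> - <H_i><H_j>: the covariance of the observables.
    return (Hs - mean).T @ np.diag(P) @ (Hs - mean)

g = fisher_metric(0.3, 0.2)
print(np.all(np.linalg.eigvalsh(g) >= -1e-12))   # positive semi-definite
```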
Curiously, the Fisher information metric can also be understood as the flat-space Euclidean metric, after appropriate change of variables, as described in the main article on it. When the underlying variables are complex-valued, the resulting metric is the Fubini–Study metric. When written in terms of mixed states, instead of pure states, it is known as the Bures metric.
By introducing artificial auxiliary functions $J_k$ into the partition function, it can then be used to obtain the expectation value of the random variables. Thus, for example, by writing

$$Z(\beta, J) = Z(\beta, J_1, J_2, \dots) = \sum_{x_i} \exp\left(-\beta H(x_1, x_2, \dots) + \sum_n J_n x_n\right)$$

one then has

$$\mathrm{E}[x_k] = \langle x_k\rangle = \left.\frac{\partial}{\partial J_k} \log Z(\beta, J)\right|_{J=0}$$

as the expectation value of $x_k$. In the path integral formulation of quantum field theory, these auxiliary functions are commonly referred to as source fields.
Multiple differentiations lead to the connected correlation functions of the random variables. Thus the connected correlation function $C(x_j, x_k)$ between variables $x_j$ and $x_k$ is given by:

$$C(x_j, x_k) = \left.\frac{\partial}{\partial J_j}\frac{\partial}{\partial J_k} \log Z(\beta, J)\right|_{J=0} = \langle x_j x_k\rangle - \langle x_j\rangle\langle x_k\rangle$$
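A finite-difference sketch of both derivatives, again on the hypothetical three-spin system: the first derivative of $\log Z(\beta, J)$ in a source recovers $\langle x_1\rangle$, and the mixed second derivative recovers the connected correlator of $x_1$ and $x_3$.

```python
import numpy as np
from itertools import product

# Hypothetical three-spin Hamiltonian, with source terms J added to the exponent.
def H(x):
    return -(x[0] * x[1] + x[1] * x[2])

configs = np.array(list(product([-1, 1], repeat=3)), dtype=float)
beta = 0.5

def log_Z(J):
    # log Z(beta, J) = log sum_x exp(-beta*H(x) + sum_n J_n x_n)
    return np.log(sum(np.exp(-beta * H(x) + J @ x) for x in configs))

h = 1e-4
dJ1 = np.array([h, 0.0, 0.0])   # step in the source coupled to x_1
dJ3 = np.array([0.0, 0.0, h])   # step in the source coupled to x_3

# First derivative at J = 0: the expectation value <x_1>.
mean_x1 = (log_Z(dJ1) - log_Z(-dJ1)) / (2 * h)

# Mixed second derivative at J = 0: the connected correlator <x_1 x_3> - <x_1><x_3>.
corr_13 = (log_Z(dJ1 + dJ3) - log_Z(dJ1 - dJ3)
           - log_Z(-dJ1 + dJ3) + log_Z(-dJ1 - dJ3)) / (4 * h * h)
print(mean_x1, corr_13)
```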
For the case where $H$ can be written as a quadratic form involving a differential operator $D$, that is, as

$$H = \frac{1}{2} \sum_n x_n\, D\, x_n$$

then the partition function can be understood to be a sum or integral over Gaussians. The correlation function $\langle x_j x_k\rangle$ can be understood to be the Green's function for the differential operator (and generally giving rise to Fredholm theory). In the quantum field theory setting, such functions are referred to as propagators; higher order correlators are called n-point functions; working with them defines the effective action of a theory.
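For orientation, the standard finite-dimensional Gaussian result, assuming real-valued variables and a symmetric, positive-definite operator (matrix) $D$ of size $N$:

$$Z(\beta) = \int \exp\left(-\frac{\beta}{2}\sum_{m,n} x_m D_{mn} x_n\right) dx_1 \cdots dx_N = \sqrt{\frac{(2\pi)^N}{\det(\beta D)}}, \qquad \langle x_j x_k\rangle = \frac{1}{\beta}\left(D^{-1}\right)_{jk}$$

so the two-point correlation function is indeed the inverse (Green's function) of $D$.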
When the random variables are anti-commuting Grassmann numbers, then the partition function can be expressed as a determinant of the operator $D$. This is done by writing it as a Berezin integral (also called Grassmann integral).
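The corresponding Berezin-integral identity, in one common sign and ordering convention for the Grassmann measure, is

$$\int \exp\left(-\sum_{m,n} \bar\theta_m D_{mn} \theta_n\right)\, \prod_n d\bar\theta_n\, d\theta_n = \det D$$

contrasting with the bosonic Gaussian case above, which produces $(\det D)^{-1/2}$ rather than $\det D$.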
Partition functions are used to discuss critical scaling, universality and are subject to the renormalization group.