Log probability

From Wikipedia, the free encyclopedia

Logarithm of probabilities, useful for calculations

Not to be confused with log odds.

In probability theory and computer science, a log probability is simply a logarithm of a probability.^[1] The use of log probabilities means representing probabilities on a logarithmic scale $(-\infty ,0]$, instead of the standard $[0,1]$ unit interval.

Since the probabilities of independent events multiply, and logarithms convert multiplication to addition, log probabilities of independent events add. Log probabilities are thus practical for computations, and have an intuitive interpretation in terms of information theory: the negative expected value of the log probabilities is the information entropy of an event. Similarly, likelihoods are often transformed to the log scale, and the corresponding log-likelihood can be interpreted as the degree to which an event supports a statistical model. The log probability is widely used in implementations of computations with probability, and is studied as a concept in its own right in some applications of information theory, such as natural language processing.
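The two properties described above can be sketched in a few lines of Python. The three-outcome distribution is a hypothetical example chosen so that the base-2 logarithms are exact; it is an illustration under that assumption, not part of any standard definition.

```python
import math

# Hypothetical discrete distribution over three outcomes (illustrative values).
dist = [0.5, 0.25, 0.25]

# Log probabilities in base 2, so the entropy is measured in bits.
log_probs = [math.log2(p) for p in dist]

# Entropy as the negative expected value of the log probabilities.
entropy_bits = -sum(p * lp for p, lp in zip(dist, log_probs))

# Independent events: probabilities multiply, so log probabilities add.
joint_log = log_probs[0] + log_probs[1]  # equals log2(0.5 * 0.25)

print(entropy_bits)  # 1.5
print(joint_log)     # -3.0
```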

Motivation

Representing probabilities in this way has several practical advantages:

Speed. Since multiplication is more expensive than addition, taking the product of a large number of probabilities is often faster if they are represented in log form. (The conversion to log form is expensive, but is only incurred once.) Multiplication arises from calculating the probability that multiple independent events occur: the probability that all independent events of interest occur is the product of all these events' probabilities.
Accuracy. The use of log probabilities improves numerical stability when the probabilities are very small, because of the way in which computers approximate real numbers.^[1]
Simplicity. Many probability distributions have an exponential form. Taking the log of these distributions eliminates the exponential function, unwrapping the exponent. For example, the log probability of the normal distribution's probability density function is $-\tfrac{1}{2}((x-m_{x})/\sigma _{m})^{2}+C$ instead of $C_{2}\exp \left(-\tfrac{1}{2}((x-m_{x})/\sigma _{m})^{2}\right)$. Log probabilities make some mathematical manipulations easier to perform.
Optimization. Since most common probability distributions, notably the exponential family, are only logarithmically concave,^[2]^[3] and concavity of the objective function plays a key role in the maximization of a function such as probability, optimizers work better with log probabilities.
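The accuracy point can be illustrated with a minimal Python sketch. The list of 1000 events with probability $10^{-5}$ each is a hypothetical input chosen to force underflow in double-precision floating point.

```python
import math

# Hypothetical data: 1000 independent events, each with probability 1e-5.
probs = [1e-5] * 1000

# Direct product: the true value 1e-5000 is far below the smallest
# positive double, so the result underflows to 0.0.
direct = math.prod(probs)

# Log space: the sum of the logs stays comfortably representable.
log_total = sum(math.log(p) for p in probs)

print(direct)     # 0.0
print(log_total)  # roughly -11512.9
```

The direct product loses all information about the result, while the log-space sum keeps it as an ordinary finite number.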

Representation issues

The logarithm function is not defined for zero, so log probabilities can only represent non-zero probabilities. Since the logarithm of a number in the interval $(0,1)$ is negative, negative log probabilities are often used instead. In that case the log probabilities in the following formulas would be negated (replaced by their additive inverses).

Any base can be selected for the logarithm.
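As a brief sketch of what the choice of base means in practice, the following Python snippet computes the log probability of a hypothetical value $p = 0.125$ in three bases and checks the change-of-base relation. The variable names are illustrative only.

```python
import math

p = 0.125  # hypothetical probability, equal to 2**-3

ln_p = math.log(p)       # natural logarithm (nats)
log2_p = math.log2(p)    # base 2 (bits)
log10_p = math.log10(p)  # base 10

# Changing base only rescales the value by a constant factor.
from_ln = ln_p / math.log(2)

# Negative log probability, as often used in practice.
neg_log2_p = -log2_p

print(log2_p)      # -3.0
print(neg_log2_p)  # 3.0
```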

Basic manipulations

Further information: Log semiring

In this section, the probabilities $x$ and $y$ in logarithmic space are denoted $x'$ and $y'$ for short:

$$x'=\log(x)\in \mathbb {R}$$

$$y'=\log(y)\in \mathbb {R}$$

The product of probabilities $x\cdot y$ corresponds to addition in logarithmic space.

$$\log(x\cdot y)=\log(x)+\log(y)=x'+y'.$$

The sum of probabilities $x+y$ is a bit more involved to compute in logarithmic space, requiring the computation of one exponential and one logarithm.

However, in many applications a multiplication of probabilities (giving the probability of all independent events occurring) is used more often than their addition (giving the probability of at least one of mutually exclusive events occurring). Additionally, the cost of computing the addition can be avoided in some situations by simply using the highest probability as an approximation. Since probabilities are non-negative, this gives a lower bound. This approximation is used in reverse to get a continuous approximation of the max function.
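The lower-bound property of the max approximation can be checked numerically. The sketch below uses two hypothetical probabilities of very different magnitude, small enough in number that the exact sum can still be computed directly for comparison.

```python
import math

# Hypothetical probabilities of two mutually exclusive events.
x, y = 0.3, 1e-6
x_log, y_log = math.log(x), math.log(y)

# Exact log of the sum, computed directly since x + y is representable.
exact = math.log(x + y)

# Approximation: keep only the larger log probability.
approx = max(x_log, y_log)

# The gap equals log(1 + y/x), which is tiny when y is much smaller than x.
error = exact - approx

print(approx <= exact)  # True
```

The approximation is tight when one term dominates and is worst when the two probabilities are equal, where the gap is $\log 2$.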

Addition in log space

$$
\begin{aligned}
\log(x+y) &= \log(x+x\cdot y/x)\\
&= \log(x+x\cdot \exp(\log(y/x)))\\
&= \log(x\cdot (1+\exp(\log(y)-\log(x))))\\
&= \log(x)+\log(1+\exp(\log(y)-\log(x)))\\
&= x'+\log \left(1+\exp \left(y'-x'\right)\right)
\end{aligned}
$$

The formula above is more accurate than $\log \left(e^{x'}+e^{y'}\right)$, provided one takes advantage of the asymmetry in the addition formula: $x'$ should be the larger (least negative) of the two operands. This ordering also produces the correct behavior if one of the operands is floating-point negative infinity, which corresponds to a probability of zero.

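If the operand equal to $-\infty$ is placed first, the formula gives

$$-\infty +\log \left(1+\exp \left(y'-(-\infty )\right)\right)=-\infty +\infty$$

This quantity is indeterminate, and will result in NaN. With the larger operand first, the formula gives

$$x'+\log \left(1+\exp \left(-\infty -x'\right)\right)=x'+0$$

This is the desired answer.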

The above formula alone will incorrectly produce an indeterminate result in the case where both arguments are $-\infty$ . This should be checked for separately to return $-\infty$ .
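The rules above can be collected into a minimal Python sketch. The function name `log_add` is hypothetical, and `math.log1p(t)` is used here in place of `log(1 + t)` as an assumption about what is numerically preferable for small `t`; the structure otherwise follows the formula and the special cases described in this section.

```python
import math

def log_add(x_log: float, y_log: float) -> float:
    """Return log(exp(x_log) + exp(y_log)) while staying in log space."""
    # Put the larger (least negative) operand first.
    if x_log < y_log:
        x_log, y_log = y_log, x_log
    # If the larger operand is -inf, both probabilities are zero;
    # return -inf explicitly to avoid computing -inf - (-inf) = NaN.
    if x_log == -math.inf:
        return -math.inf
    # x' + log(1 + exp(y' - x')); the exponent is never positive.
    return x_log + math.log1p(math.exp(y_log - x_log))

print(log_add(-math.inf, -math.inf))  # -inf
```

For example, `log_add(math.log(0.2), math.log(0.3))` agrees with `math.log(0.5)` up to rounding, and `log_add(-1000.0, -1000.0)` returns a finite value even though `math.exp(-1000.0)` underflows to zero.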