Natural evolution strategy

From Wikipedia, the free encyclopedia

Natural evolution strategies (NES) are a family of numerical optimization algorithms for black-box problems. Similar in spirit to evolution strategies, they iteratively update the (continuous) parameters of a search distribution by following the natural gradient towards higher expected fitness.

Method


The general procedure is as follows: the parameterized search distribution is used to produce a batch of search points, and the fitness function is evaluated at each such point. The distribution's parameters (which include strategy parameters) allow the algorithm to adaptively capture the (local) structure of the fitness function. For example, in the case of a Gaussian distribution, this comprises the mean and the covariance matrix. From the samples, NES estimates a search gradient on the parameters towards higher expected fitness. NES then performs a gradient ascent step along the natural gradient, a second-order method which, unlike the plain gradient, renormalizes the update with respect to uncertainty. This step is crucial, since it prevents oscillations, premature convergence, and undesired effects stemming from a given parameterization. The entire process reiterates until a stopping criterion is met.

All members of the NES family operate based on the same principles. They differ in the type of probability distribution and the gradient approximation method used. Different search spaces require different search distributions; for example, in low dimensionality it can be highly beneficial to model the full covariance matrix. In high dimensions, on the other hand, a more scalable alternative is to limit the covariance to the diagonal only. In addition, highly multi-modal search spaces may benefit from more heavy-tailed distributions (such as Cauchy, as opposed to the Gaussian). A last distinction arises between distributions where we can analytically compute the natural gradient, and more general distributions where we need to estimate it from samples.
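To make the scaling trade-off concrete, here is a small NumPy sketch contrasting sampling from a full-covariance Gaussian with the diagonal (separable) alternative. The dimensionality and parameter values are arbitrary illustrations, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                     # search-space dimensionality
mu = np.zeros(d)                          # mean of the search distribution

# Full covariance: d*(d+1)/2 strategy parameters, O(d^2) sampling cost.
A = rng.standard_normal((d, d))
full_cov = A @ A.T + np.eye(d)            # any symmetric positive-definite matrix
x_full = rng.multivariate_normal(mu, full_cov)

# Diagonal covariance: only d strategy parameters, O(d) sampling cost.
sigma = np.full(d, 0.5)                   # per-coordinate standard deviations
x_diag = mu + sigma * rng.standard_normal(d)
```

In low dimensions the full matrix captures correlations between coordinates; in high dimensions the diagonal variant keeps both the parameter count and the per-sample cost linear in $d$.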

Search gradients


Let $\theta$ denote the parameters of the search distribution $\pi(x \,|\, \theta)$ and $f(x)$ the fitness function evaluated at $x$. NES then pursues the objective of maximizing the expected fitness under the search distribution

$$J(\theta) = \operatorname{E}_{\theta}[f(x)] = \int f(x)\,\pi(x \,|\, \theta)\,dx$$

through gradient ascent. The gradient can be rewritten as

$$\begin{aligned}
\nabla_{\theta} J(\theta) &= \nabla_{\theta} \int f(x)\,\pi(x \,|\, \theta)\,dx \\
&= \int f(x)\,\nabla_{\theta}\pi(x \,|\, \theta)\,dx \\
&= \int f(x)\,\nabla_{\theta}\pi(x \,|\, \theta)\,\frac{\pi(x \,|\, \theta)}{\pi(x \,|\, \theta)}\,dx \\
&= \int \Big[ f(x)\,\nabla_{\theta}\log\pi(x \,|\, \theta) \Big]\,\pi(x \,|\, \theta)\,dx \\
&= \operatorname{E}_{\theta}\left[ f(x)\,\nabla_{\theta}\log\pi(x \,|\, \theta) \right]
\end{aligned}$$

that is, the expected value of $f(x)$ times the log-derivatives at $x$. In practice, it is possible to use the Monte Carlo approximation based on a finite number of $\lambda$ samples

$$\nabla_{\theta} J(\theta) \approx \frac{1}{\lambda} \sum_{k=1}^{\lambda} f(x_k)\,\nabla_{\theta}\log\pi(x_k \,|\, \theta).$$
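As an illustration, the Monte Carlo estimator above can be written out for the mean of an isotropic Gaussian $N(\mu, \sigma^2 I)$, whose log-derivative with respect to $\mu$ is $(x - \mu)/\sigma^2$. The following NumPy sketch uses illustrative function and variable names and a toy fitness function, none of which come from the article:

```python
import numpy as np

def search_gradient(f, mu, sigma, lam, rng):
    """Monte Carlo estimate of the search gradient with respect to the
    mean mu of an isotropic Gaussian search distribution N(mu, sigma^2 I)."""
    xs = mu + sigma * rng.standard_normal((lam, mu.size))   # lam samples
    fitness = np.array([f(x) for x in xs])
    log_derivs = (xs - mu) / sigma**2     # grad_mu of log N(x; mu, sigma^2 I)
    return (fitness[:, None] * log_derivs).mean(axis=0)

rng = np.random.default_rng(0)
f = lambda x: -np.sum(x**2)               # toy fitness, maximal at the origin
grad = search_gradient(f, mu=np.array([1.0, -2.0]), sigma=0.5, lam=1000, rng=rng)
# for this fitness, the true gradient of J with respect to mu is -2 * mu,
# so the estimate points from mu back towards the origin
```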

Finally, the parameters of the search distribution can be updated iteratively

$$\theta \leftarrow \theta + \eta\,\nabla_{\theta} J(\theta)$$

Natural gradient ascent


Instead of using the plain stochastic gradient for updates, NES follows the natural gradient, which has been shown to possess numerous advantages over the plain (vanilla) gradient, e.g.:

  • the gradient direction is independent of the parameterization of the search distribution
  • the update magnitudes are automatically adjusted based on uncertainty, in turn speeding up convergence on plateaus and ridges.

The NES update is therefore

$$\theta \leftarrow \theta + \eta\,\mathbf{F}^{-1}\nabla_{\theta} J(\theta),$$

where $\mathbf{F}$ is the Fisher information matrix. The Fisher matrix can sometimes be computed exactly; otherwise it is estimated from samples, reusing the log-derivatives $\nabla_{\theta}\log\pi(x \,|\, \theta)$.
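When the Fisher matrix is not available in closed form, it can be estimated from the same samples as the mean outer product of the log-derivatives. A NumPy sketch for the isotropic-Gaussian case (illustrative names and values; for this distribution the exact Fisher with respect to the mean is $I/\sigma^2$, so the estimate below approximates that):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, lam = np.array([1.0, -2.0]), 0.5, 500
f = lambda x: -np.sum(x**2)               # toy fitness

xs = mu + sigma * rng.standard_normal((lam, mu.size))
g = (xs - mu) / sigma**2                  # log-derivatives w.r.t. mu
grad_J = (np.array([f(x) for x in xs])[:, None] * g).mean(axis=0)

# Empirical Fisher matrix: F ~ E[ grad log pi  (grad log pi)^T ]
F = (g[:, :, None] * g[:, None, :]).mean(axis=0)

# Natural-gradient step; solve F x = grad_J rather than forming F^{-1}
eta = 0.1
mu_new = mu + eta * np.linalg.solve(F, grad_J)
```

Solving the linear system instead of explicitly inverting $\mathbf{F}$ is the usual numerically preferable way to apply $\mathbf{F}^{-1}$.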

Fitness shaping


NES utilizes rank-based fitness shaping in order to render the algorithm more robust and invariant under monotonically increasing transformations of the fitness function. For this purpose, the fitness of the population is transformed into a set of utility values $u_1 \geq \dots \geq u_\lambda$. Let $x_i$ denote the $i$-th best individual. Replacing fitness with utility, the gradient estimate becomes

$$\nabla_{\theta} J(\theta) = \sum_{k=1}^{\lambda} u_k\,\nabla_{\theta}\log\pi(x_k \,|\, \theta).$$

The choice of utility function is a free parameter of the algorithm.
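As an example of one common choice (the article leaves the utility function free), the following NumPy sketch maps raw fitness values to logarithmic rank-based utilities; the exact formula is one conventional option among several, not prescribed by the article:

```python
import numpy as np

def rank_utilities(fitness):
    """Map raw fitness values to fixed rank-based utilities: a truncated
    logarithmic weighting, normalized to sum to 1 and then centered so
    the utilities sum to zero. Depends only on ranks, so it is invariant
    under monotonically increasing transformations of the fitness."""
    lam = len(fitness)
    ranks = np.empty(lam, dtype=int)
    ranks[np.argsort(fitness)[::-1]] = np.arange(1, lam + 1)  # best gets rank 1
    u = np.maximum(0.0, np.log(lam / 2 + 1) - np.log(ranks))
    return u / u.sum() - 1.0 / lam

u = rank_utilities(np.array([3.0, -1.0, 10.0, 0.5]))
```

Because only ranks enter, `rank_utilities(f)` and `rank_utilities(f**3)` (cubing preserves order) yield identical utilities.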

Pseudocode

input: $f$, $\theta_{\text{init}}$
repeat
    for $k = 1 \ldots \lambda$ do    // $\lambda$ is the population size
        draw sample $x_k \sim \pi(\cdot \,|\, \theta)$
        evaluate fitness $f(x_k)$
        calculate log-derivatives $\nabla_{\theta} \log \pi(x_k \,|\, \theta)$
    end
    assign the utilities $u_k$    // based on rank
    estimate the gradient $\nabla_{\theta} J \leftarrow \frac{1}{\lambda} \sum_{k=1}^{\lambda} u_k \cdot \nabla_{\theta} \log \pi(x_k \,|\, \theta)$
    estimate $\mathbf{F} \leftarrow \frac{1}{\lambda} \sum_{k=1}^{\lambda} \nabla_{\theta} \log \pi(x_k \,|\, \theta)\, \nabla_{\theta} \log \pi(x_k \,|\, \theta)^{\top}$    // or compute it exactly
    update parameters $\theta \leftarrow \theta + \eta \cdot \mathbf{F}^{-1} \nabla_{\theta} J$    // $\eta$ is the learning rate
until stopping criterion is met
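The pseudocode above translates almost line for line into a runnable sketch. The following minimal NumPy implementation adapts only the mean of an isotropic Gaussian search distribution (a simplifying assumption; practical NES variants also adapt the covariance), and uses centered-rank utilities and a toy fitness function, both illustrative choices rather than requirements:

```python
import numpy as np

def nes(f, theta_init, sigma=0.5, lam=50, eta=0.5, iters=150, seed=0):
    """Minimal NES sketch: isotropic-Gaussian search distribution with
    fixed sigma, centered-rank utilities, empirical Fisher matrix."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(iters):
        xs = theta + sigma * rng.standard_normal((lam, theta.size))
        fitness = np.array([f(x) for x in xs])
        # centered rank-based utilities in [-0.5, 0.5], best sample highest
        ranks = fitness.argsort().argsort()
        u = ranks / (lam - 1) - 0.5
        g = (xs - theta) / sigma**2                    # log-derivatives
        grad_J = (u[:, None] * g).mean(axis=0)
        F = (g[:, :, None] * g[:, None, :]).mean(axis=0)
        theta = theta + eta * np.linalg.solve(F, grad_J)
    return theta

# maximize f(x) = -||x||^2; the optimum is the origin
theta = nes(lambda x: -np.sum(x**2), theta_init=[2.0, -1.5])
```

With a fixed $\sigma$ the iterate cannot converge arbitrarily close to the optimum; it settles within a neighborhood whose size scales with $\sigma$, which is why full NES variants include $\sigma$ among the strategy parameters.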

Retrieved from "https://en.wikipedia.org/w/index.php?title=Natural_evolution_strategy&oldid=1267266303"