
Probability distributions - torch.distributions#

Created On: Oct 19, 2017 | Last Updated On: Jun 13, 2025

The distributions package contains parameterizable probability distributions and sampling functions. This allows the construction of stochastic computation graphs and stochastic gradient estimators for optimization. This package generally follows the design of the TensorFlow Distributions package.

It is not possible to directly backpropagate through random samples. However, there are two main methods for creating surrogate functions that can be backpropagated through. These are the score function estimator/likelihood ratio estimator/REINFORCE and the pathwise derivative estimator. REINFORCE is commonly seen as the basis for policy gradient methods in reinforcement learning, and the pathwise derivative estimator is commonly seen in the reparameterization trick in variational autoencoders. Whilst the score function only requires the value of samples f(x), the pathwise derivative requires the derivative f'(x). The next sections discuss these two in a reinforcement learning example. For more details see Gradient Estimation Using Stochastic Computation Graphs.

Score function#

When the probability density function is differentiable with respect to its parameters, we only need sample() and log_prob() to implement REINFORCE:

\Delta\theta = \alpha r \frac{\partial \log p(a \mid \pi^\theta(s))}{\partial \theta}

where \theta are the parameters, \alpha is the learning rate, r is the reward, and p(a \mid \pi^\theta(s)) is the probability of taking action a in state s given policy \pi^\theta.

In practice we would sample an action from the output of a network, apply this action in an environment, and then use log_prob to construct an equivalent loss function. Note that we use a negative because optimizers use gradient descent, whilst the rule above assumes gradient ascent. With a categorical policy, the code for implementing REINFORCE would be as follows:

probs = policy_network(state)
# Note that this is equivalent to what used to be called multinomial
m = Categorical(probs)
action = m.sample()
next_state, reward = env.step(action)
loss = -m.log_prob(action) * reward
loss.backward()

Pathwise derivative#

The other way to implement these stochastic/policy gradients would be to use the reparameterization trick from the rsample() method, where the parameterized random variable can be constructed via a parameterized deterministic function of a parameter-free random variable. The reparameterized sample therefore becomes differentiable. The code for implementing the pathwise derivative would be as follows:

params = policy_network(state)
m = Normal(*params)
# Any distribution with .has_rsample == True could work based on the application
action = m.rsample()
next_state, reward = env.step(action)
# Assuming that reward is differentiable
loss = -reward
loss.backward()

Distribution#

class torch.distributions.distribution.Distribution(batch_shape=(), event_shape=(), validate_args=None)[source]#

Bases: object

Distribution is the abstract base class for probability distributions.

Parameters:
  • batch_shape (torch.Size) – The shape over which parameters are batched.

  • event_shape (torch.Size) – The shape of a single sample (without batching).

  • validate_args (bool, optional) – Whether to validate arguments. Default: None.

property arg_constraints: dict[str, Constraint]#

Returns a dictionary from argument names to Constraint objects that should be satisfied by each argument of this distribution. Args that are not tensors need not appear in this dict.
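For example, a minimal sketch of inspecting the constraints on Normal's parameters:

from torch.distributions import Normal

# Each parameter name maps to the constraint its values must satisfy.
print(Normal.arg_constraints)
# {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}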

property batch_shape: Size#

Returns the shape over which parameters are batched.

cdf(value)[source]#

Returns the cumulative density/mass function evaluated at value.

Parameters:

value (Tensor) –

Return type:

Tensor

entropy()[source]#

Returns entropy of distribution, batched over batch_shape.

Returns:

Tensor of shape batch_shape.

Return type:

Tensor

enumerate_support(expand=True)[source]#

Returns tensor containing all values supported by a discrete distribution. The result will enumerate over dimension 0, so the shape of the result will be (cardinality,) + batch_shape + event_shape (where event_shape = () for univariate distributions).

Note that this enumerates over all batched tensors in lock-step [[0, 0], [1, 1], …]. With expand=False, enumeration happens along dim 0, but with the remaining batch dimensions being singleton dimensions, [[0], [1], …].

To iterate over the full Cartesian product use itertools.product(m.enumerate_support()).

Parameters:

expand (bool) – whether to expand the support over the batch dims to match the distribution's batch_shape.

Returns:

Tensor iterating over dimension 0.

Return type:

Tensor
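A short sketch of both modes on a batched Bernoulli:

import torch
from torch.distributions import Bernoulli

m = Bernoulli(torch.tensor([0.3, 0.9]))   # batch_shape == (2,)
m.enumerate_support()                     # shape (2, 2): (cardinality,) + batch_shape
# tensor([[0., 0.],
#         [1., 1.]])
m.enumerate_support(expand=False)         # shape (2, 1): batch dims left as singletons
# tensor([[0.],
#         [1.]])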

property event_shape: Size#

Returns the shape of a single sample (without batching).

expand(batch_shape, _instance=None)[source]#

Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to batch_shape. This method calls expand on the distribution's parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in __init__.py, when an instance is first created.

Parameters:
  • batch_shape (torch.Size) – the desired expanded size.

  • _instance – new instance provided by subclasses that need to override .expand.

Returns:

New distribution instance with batch dimensions expanded to batch_shape.
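A minimal sketch of expanding a scalar distribution:

import torch
from torch.distributions import Normal

d = Normal(torch.tensor(0.0), torch.tensor(1.0))   # batch_shape == ()
d2 = d.expand(torch.Size([3, 2]))
d2.batch_shape      # torch.Size([3, 2])
d2.sample().shape   # torch.Size([3, 2])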

icdf(value)[source]#

Returns the inverse cumulative density/mass function evaluated at value.

Parameters:

value (Tensor) –

Return type:

Tensor

log_prob(value)[source]#

Returns the log of the probability density/mass function evaluated at value.

Parameters:

value (Tensor) –

Return type:

Tensor

property mean: Tensor#

Returns the mean of the distribution.

property mode: Tensor#

Returns the mode of the distribution.

perplexity()[source]#

Returns perplexity of distribution, batched over batch_shape.

Returns:

Tensor of shape batch_shape.

Return type:

Tensor

rsample(sample_shape=())[source]#

Generates a sample_shape shaped reparameterized sample or sample_shapeshaped batch of reparameterized samples if the distribution parametersare batched.

Return type:

Tensor
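A short sketch of why rsample matters: gradients flow through the reparameterized sample back to the parameters.

import torch
from torch.distributions import Normal

loc = torch.tensor(0.0, requires_grad=True)
x = Normal(loc, 1.0).rsample()   # x = loc + scale * eps, with eps ~ N(0, 1)
x.backward()
loc.grad                         # tensor(1.) -- gradient flows through the sample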

sample(sample_shape=())[source]#

Generates a sample_shape shaped sample or sample_shape shaped batch ofsamples if the distribution parameters are batched.

Return type:

Tensor

sample_n(n)[source]#

Generates n samples or n batches of samples if the distributionparameters are batched.

Return type:

Tensor

static set_default_validate_args(value)[source]#

Sets whether validation is enabled or disabled.

The default behavior mimics Python's assert statement: validation is on by default, but is disabled if Python is run in optimized mode (via python -O). Validation may be expensive, so you may want to disable it once a model is working.

Parameters:

value (bool) – Whether to enable validation.
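For example:

from torch.distributions import Distribution

# Turn validation off globally, e.g. once a model is known to be well-behaved.
Distribution.set_default_validate_args(False)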

property stddev: Tensor#

Returns the standard deviation of the distribution.

property support: Constraint | None#

Returns a Constraint object representing this distribution's support.

property variance: Tensor#

Returns the variance of the distribution.

ExponentialFamily#

class torch.distributions.exp_family.ExponentialFamily(batch_shape=(), event_shape=(), validate_args=None)[source]#

Bases: Distribution

ExponentialFamily is the abstract base class for probability distributions belonging to an exponential family, whose probability mass/density function has the form defined below

p_F(x; \theta) = \exp(\langle t(x), \theta \rangle - F(\theta) + k(x))

where \theta denotes the natural parameters, t(x) denotes the sufficient statistic, F(\theta) is the log normalizer function for a given family, and k(x) is the carrier measure.
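As a worked instance (not from the original docs): a Bernoulli distribution with success probability p fits this form with t(x) = x, natural parameter \theta = \log\frac{p}{1-p}, log normalizer F(\theta) = \log(1 + e^\theta), and carrier measure k(x) = 0, so that

p_F(x; \theta) = \exp(x\theta - \log(1 + e^\theta)), \quad x \in \{0, 1\}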

Note

This class is an intermediary between the Distribution class and distributions which belong to an exponential family, mainly to check the correctness of the .entropy() and analytic KL divergence methods. We use this class to compute the entropy and KL divergence using the AD framework and Bregman divergences (courtesy of: Frank Nielsen and Richard Nock, Entropies and Cross-entropies of Exponential Families).

entropy()[source]#

Method to compute the entropy using Bregman divergence of the log normalizer.

Bernoulli#

class torch.distributions.bernoulli.Bernoulli(probs=None, logits=None, validate_args=None)[source]#

Bases: ExponentialFamily

Creates a Bernoulli distribution parameterized by probs or logits (but not both).

Samples are binary (0 or 1). They take the value 1 with probability p and 0 with probability 1 - p.

Example:

>>> m = Bernoulli(torch.tensor([0.3]))
>>> m.sample()  # 30% chance 1; 70% chance 0
tensor([ 0.])

Parameters:
  • probs (Number, Tensor) – the probability of sampling 1

  • logits (Number, Tensor) – the log-odds of sampling 1

  • validate_args (bool, optional) – whether to validate arguments, None by default

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}#
entropy()[source]#
enumerate_support(expand=True)[source]#
expand(batch_shape, _instance=None)[source]#
has_enumerate_support = True#
log_prob(value)[source]#
property logits: Tensor#
property mean: Tensor#
property mode: Tensor#
property param_shape: Size#
property probs: Tensor#
sample(sample_shape=())[source]#
support = Boolean()#
property variance: Tensor#

Beta#

class torch.distributions.beta.Beta(concentration1, concentration0, validate_args=None)[source]#

Bases: ExponentialFamily

Beta distribution parameterized by concentration1 and concentration0.

Example:

>>> m = Beta(torch.tensor([0.5]), torch.tensor([0.5]))
>>> m.sample()  # Beta distributed with concentration concentration1 and concentration0
tensor([ 0.1046])

Parameters:
  • concentration1 (float or Tensor) – 1st concentration parameter of the distribution (often referred to as alpha)

  • concentration0 (float or Tensor) – 2nd concentration parameter of the distribution (often referred to as beta)

arg_constraints = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0)}#
property concentration0: Tensor#
property concentration1: Tensor#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

support = Interval(lower_bound=0.0, upper_bound=1.0)#
property variance: Tensor#

Binomial#

class torch.distributions.binomial.Binomial(total_count=1, probs=None, logits=None, validate_args=None)[source]#

Bases: Distribution

Creates a Binomial distribution parameterized by total_count and either probs or logits (but not both). total_count must be broadcastable with probs/logits.

Example:

>>> m = Binomial(100, torch.tensor([0, .2, .8, 1]))
>>> x = m.sample()
tensor([   0.,   22.,   71.,  100.])
>>> m = Binomial(torch.tensor([[5.], [10.]]), torch.tensor([0.5, 0.8]))
>>> x = m.sample()
tensor([[ 4.,  5.],
        [ 7.,  6.]])

Parameters:
  • total_count (int or Tensor) – number of Bernoulli trials

  • probs (Tensor) – Event probabilities

  • logits (Tensor) – Event log-odds

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0), 'total_count': IntegerGreaterThan(lower_bound=0)}#
entropy()[source]#
enumerate_support(expand=True)[source]#
expand(batch_shape, _instance=None)[source]#
has_enumerate_support = True#
log_prob(value)[source]#
property logits: Tensor#
property mean: Tensor#
property mode: Tensor#
property param_shape: Size#
property probs: Tensor#
sample(sample_shape=())[source]#
property support#
Return type:

_DependentProperty

property variance: Tensor#

Categorical#

class torch.distributions.categorical.Categorical(probs=None, logits=None, validate_args=None)[source]#

Bases: Distribution

Creates a categorical distribution parameterized by either probs or logits (but not both).

Note

It is equivalent to the distribution that torch.multinomial() samples from.

Samples are integers from \{0, \ldots, K-1\} where K is probs.size(-1).

If probs is 1-dimensional with length-K, each element is the relative probability of sampling the class at that index.

If probs is N-dimensional, the first N-1 dimensions are treated as a batch of relative probability vectors.

Note

The probs argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. probs will return this normalized value. The logits argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. logits will return this normalized value.

See also: torch.multinomial()
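A minimal sketch of the normalization behavior described in the note above:

import torch
from torch.distributions import Categorical

m = Categorical(probs=torch.tensor([1.0, 1.0, 2.0]))   # unnormalized weights
m.probs
# tensor([0.2500, 0.2500, 0.5000])  -- weights normalized to sum to 1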

Example:

>>> m = Categorical(torch.tensor([0.25, 0.25, 0.25, 0.25]))
>>> m.sample()  # equal probability of 0, 1, 2, 3
tensor(3)

Parameters:
  • probs (Tensor) – event probabilities

  • logits (Tensor) – event log probabilities (unnormalized)

arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}#
entropy()[source]#
enumerate_support(expand=True)[source]#
expand(batch_shape, _instance=None)[source]#
has_enumerate_support = True#
log_prob(value)[source]#
property logits: Tensor#
property mean: Tensor#
property mode: Tensor#
property param_shape: Size#
property probs: Tensor#
sample(sample_shape=())[source]#
property support#
Return type:

_DependentProperty

property variance: Tensor#

Cauchy#

class torch.distributions.cauchy.Cauchy(loc, scale, validate_args=None)[source]#

Bases: Distribution

Samples from a Cauchy (Lorentz) distribution. The distribution of the ratio of independent normally distributed random variables with means 0 follows a Cauchy distribution.

Example:

>>> m = Cauchy(torch.tensor([0.0]), torch.tensor([1.0]))
>>> m.sample()  # sample from a Cauchy distribution with loc=0 and scale=1
tensor([ 2.3214])

Parameters:
  • loc (float or Tensor) – mode or median of the distribution.

  • scale (float or Tensor) – half width at half maximum.

arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
icdf(value)[source]#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

support = Real()#
property variance: Tensor#

Chi2#

class torch.distributions.chi2.Chi2(df, validate_args=None)[source]#

Bases: Gamma

Creates a Chi-squared distribution parameterized by shape parameter df. This is exactly equivalent to Gamma(alpha=0.5*df, beta=0.5).

Example:

>>> m = Chi2(torch.tensor([1.0]))
>>> m.sample()  # Chi2 distributed with shape df=1
tensor([ 0.1046])

Parameters:

df (float or Tensor) – shape parameter of the distribution

arg_constraints = {'df': GreaterThan(lower_bound=0.0)}#
property df: Tensor#
expand(batch_shape, _instance=None)[source]#

ContinuousBernoulli#

class torch.distributions.continuous_bernoulli.ContinuousBernoulli(probs=None, logits=None, lims=(0.499, 0.501), validate_args=None)[source]#

Bases: ExponentialFamily

Creates a continuous Bernoulli distribution parameterized by probs or logits (but not both).

The distribution is supported in [0, 1] and parameterized by 'probs' (in (0, 1)) or 'logits' (real-valued). Note that, unlike the Bernoulli, 'probs' does not correspond to a probability and 'logits' does not correspond to log-odds, but the same names are used due to the similarity with the Bernoulli. See [1] for more details.

Example:

>>> m = ContinuousBernoulli(torch.tensor([0.3]))
>>> m.sample()
tensor([ 0.2538])

Parameters:
  • probs (Number, Tensor) – (0,1) valued parameters

  • logits (Number, Tensor) – real valued parameters whose sigmoid matches 'probs'

[1] The continuous Bernoulli: fixing a pervasive error in variational autoencoders, Loaiza-Ganem G and Cunningham JP, NeurIPS 2019. https://arxiv.org/abs/1907.06845

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
icdf(value)[source]#
log_prob(value)[source]#
property logits: Tensor#
property mean: Tensor#
property param_shape: Size#
property probs: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

sample(sample_shape=())[source]#
property stddev: Tensor#
support = Interval(lower_bound=0.0, upper_bound=1.0)#
property variance: Tensor#

Dirichlet#

class torch.distributions.dirichlet.Dirichlet(concentration, validate_args=None)[source]#

Bases: ExponentialFamily

Creates a Dirichlet distribution parameterized by concentration concentration.

Example:

>>> m = Dirichlet(torch.tensor([0.5, 0.5]))
>>> m.sample()  # Dirichlet distributed with concentration [0.5, 0.5]
tensor([ 0.1046,  0.8954])

Parameters:

concentration (Tensor) – concentration parameter of the distribution (often referred to as alpha)

arg_constraints = {'concentration': IndependentConstraint(GreaterThan(lower_bound=0.0), 1)}#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

support = Simplex()#
property variance: Tensor#

Exponential#

class torch.distributions.exponential.Exponential(rate, validate_args=None)[source]#

Bases: ExponentialFamily

Creates an Exponential distribution parameterized by rate.

Example:

>>> m = Exponential(torch.tensor([1.0]))
>>> m.sample()  # Exponential distributed with rate=1
tensor([ 0.1046])

Parameters:

rate (float or Tensor) – rate = 1 / scale of the distribution

arg_constraints = {'rate': GreaterThan(lower_bound=0.0)}#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
icdf(value)[source]#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

property stddev: Tensor#
support = GreaterThanEq(lower_bound=0.0)#
property variance: Tensor#

FisherSnedecor#

class torch.distributions.fishersnedecor.FisherSnedecor(df1, df2, validate_args=None)[source]#

Bases: Distribution

Creates a Fisher-Snedecor distribution parameterized by df1 and df2.

Example:

>>> m = FisherSnedecor(torch.tensor([1.0]), torch.tensor([2.0]))
>>> m.sample()  # Fisher-Snedecor-distributed with df1=1 and df2=2
tensor([ 0.2453])

Parameters:
  • df1 (float or Tensor) – degrees of freedom parameter 1

  • df2 (float or Tensor) – degrees of freedom parameter 2

arg_constraints = {'df1': GreaterThan(lower_bound=0.0), 'df2': GreaterThan(lower_bound=0.0)}#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

support = GreaterThan(lower_bound=0.0)#
property variance: Tensor#

Gamma#

class torch.distributions.gamma.Gamma(concentration, rate, validate_args=None)[source]#

Bases: ExponentialFamily

Creates a Gamma distribution parameterized by shape concentration and rate.

Example:

>>> m = Gamma(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # Gamma distributed with concentration=1 and rate=1
tensor([ 0.1046])

Parameters:
  • concentration (float or Tensor) – shape parameter of the distribution (often referred to as alpha)

  • rate (float or Tensor) – rate parameter of the distribution (often referred to as beta), rate = 1 / scale

arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

support = GreaterThanEq(lower_bound=0.0)#
property variance: Tensor#

GeneralizedPareto#

class torch.distributions.generalized_pareto.GeneralizedPareto(loc, scale, concentration, validate_args=None)[source]#

Bases: Distribution

Creates a Generalized Pareto distribution parameterized by loc, scale, and concentration.

The Generalized Pareto distribution is a family of continuous probability distributions on the real line. Special cases include Exponential (when loc = 0, concentration = 0), Pareto (when concentration > 0, loc = scale / concentration), and Uniform (when concentration = -1).

This distribution is often used to model the tails of other distributions. This implementation is based on the implementation in TensorFlow Probability.

Example:

>>> m = GeneralizedPareto(torch.tensor([0.1]), torch.tensor([2.0]), torch.tensor([0.4]))
>>> m.sample()  # sample from a Generalized Pareto distribution with loc=0.1, scale=2.0, and concentration=0.4
tensor([ 1.5623])

Parameters:
  • loc (float or Tensor) – Location parameter of the distribution

  • scale (float or Tensor) – Scale parameter of the distribution

  • concentration (float or Tensor) – Concentration parameter of the distribution

arg_constraints = {'concentration': Real(), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
icdf(value)[source]#
log_cdf(value)[source]#
log_prob(value)[source]#
log_survival_function(value)[source]#
property mean#
property mode#
rsample(sample_shape=())[source]#
property support#
Return type:

_DependentProperty

property variance#

Geometric#

class torch.distributions.geometric.Geometric(probs=None, logits=None, validate_args=None)[source]#

Bases: Distribution

Creates a Geometric distribution parameterized by probs, where probs is the probability of success of Bernoulli trials.

P(X=k) = (1-p)^k p, \quad k = 0, 1, \ldots

Note

For torch.distributions.geometric.Geometric() the (k+1)-th trial is the first success, hence it draws samples in \{0, 1, \ldots\}, whereas for torch.Tensor.geometric_() the k-th trial is the first success, hence it draws samples in \{1, 2, \ldots\}.
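A minimal sketch of the difference in supports (sampled values shown are illustrative):

import torch
from torch.distributions import Geometric

Geometric(torch.tensor([0.3])).sample()   # e.g. tensor([2.]) -- failures before first success, in {0, 1, ...}
torch.empty(3).geometric_(0.3)            # e.g. tensor([1., 4., 2.]) -- index of the first successful trial, in {1, 2, ...}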

Example:

>>> m = Geometric(torch.tensor([0.3]))
>>> m.sample()  # underlying Bernoulli has 30% chance 1; 70% chance 0
tensor([ 2.])

Parameters:
  • probs (Number, Tensor) – the probability of sampling 1. Must be in range (0, 1]

  • logits (Number, Tensor) – the log-odds of sampling 1.

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
log_prob(value)[source]#
property logits: Tensor#
property mean: Tensor#
property mode: Tensor#
property probs: Tensor#
sample(sample_shape=())[source]#
support = IntegerGreaterThan(lower_bound=0)#
property variance: Tensor#

Gumbel#

class torch.distributions.gumbel.Gumbel(loc, scale, validate_args=None)[source]#

Bases: TransformedDistribution

Samples from a Gumbel Distribution.

Examples:

>>> m = Gumbel(torch.tensor([1.0]), torch.tensor([2.0]))
>>> m.sample()  # sample from Gumbel distribution with loc=1, scale=2
tensor([ 1.0124])

Parameters:
  • loc (float or Tensor) – Location parameter of the distribution

  • scale (float or Tensor) – Scale parameter of the distribution

arg_constraints: dict[str, Constraint] = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
property stddev: Tensor#
support = Real()#
property variance: Tensor#

HalfCauchy#

class torch.distributions.half_cauchy.HalfCauchy(scale, validate_args=None)[source]#

Bases: TransformedDistribution

Creates a half-Cauchy distribution parameterized by scale where:

X ~ Cauchy(0, scale)
Y = |X| ~ HalfCauchy(scale)

Example:

>>> m = HalfCauchy(torch.tensor([1.0]))
>>> m.sample()  # half-cauchy distributed with scale=1
tensor([ 2.3214])

Parameters:

scale (float or Tensor) – scale of the full Cauchy distribution

arg_constraints: dict[str, Constraint] = {'scale': GreaterThan(lower_bound=0.0)}#
base_dist: Cauchy#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
icdf(prob)[source]#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
property scale: Tensor#
support = GreaterThanEq(lower_bound=0.0)#
property variance: Tensor#

HalfNormal#

class torch.distributions.half_normal.HalfNormal(scale, validate_args=None)[source]#

Bases: TransformedDistribution

Creates a half-normal distribution parameterized by scale where:

X ~ Normal(0, scale)
Y = |X| ~ HalfNormal(scale)

Example:

>>> m = HalfNormal(torch.tensor([1.0]))
>>> m.sample()  # half-normal distributed with scale=1
tensor([ 0.1046])

Parameters:

scale (float or Tensor) – scale of the full Normal distribution

arg_constraints: dict[str, Constraint] = {'scale': GreaterThan(lower_bound=0.0)}#
base_dist: Normal#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
icdf(prob)[source]#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
property scale: Tensor#
support = GreaterThanEq(lower_bound=0.0)#
property variance: Tensor#

Independent#

class torch.distributions.independent.Independent(base_distribution, reinterpreted_batch_ndims, validate_args=None)[source]#

Bases: Distribution, Generic[D]

Reinterprets some of the batch dims of a distribution as event dims.

This is mainly useful for changing the shape of the result of log_prob(). For example to create a diagonal Normal distribution with the same shape as a Multivariate Normal distribution (so they are interchangeable), you can:

>>> from torch.distributions.multivariate_normal import MultivariateNormal
>>> from torch.distributions.normal import Normal
>>> loc = torch.zeros(3)
>>> scale = torch.ones(3)
>>> mvn = MultivariateNormal(loc, scale_tril=torch.diag(scale))
>>> [mvn.batch_shape, mvn.event_shape]
[torch.Size([]), torch.Size([3])]
>>> normal = Normal(loc, scale)
>>> [normal.batch_shape, normal.event_shape]
[torch.Size([3]), torch.Size([])]
>>> diagn = Independent(normal, 1)
>>> [diagn.batch_shape, diagn.event_shape]
[torch.Size([]), torch.Size([3])]

Parameters:
  • base_distribution (torch.distributions.distribution.Distribution) – a base distribution

  • reinterpreted_batch_ndims (int) – the number of batch dims to reinterpret as event dims

arg_constraints: dict[str, Constraint] = {}#
base_dist: D#
entropy()[source]#
enumerate_support(expand=True)[source]#
expand(batch_shape, _instance=None)[source]#
property has_enumerate_support: bool#
property has_rsample: bool#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

sample(sample_shape=())[source]#
Return type:

Tensor

property support#
Return type:

_DependentProperty

property variance: Tensor#

InverseGamma#

class torch.distributions.inverse_gamma.InverseGamma(concentration, rate, validate_args=None)[source]#

Bases: TransformedDistribution

Creates an inverse gamma distribution parameterized by concentration and rate where:

X ~ Gamma(concentration, rate)
Y = 1 / X ~ InverseGamma(concentration, rate)

Example:

>>> m = InverseGamma(torch.tensor([2.0]), torch.tensor([3.0]))
>>> m.sample()
tensor([ 1.2953])

Parameters:
  • concentration (float or Tensor) – shape parameter of the distribution (often referred to as alpha)

  • rate (float or Tensor) – rate = 1 / scale of the distribution (often referred to as beta)

arg_constraints: dict[str, Constraint] = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}#
base_dist: Gamma#
property concentration: Tensor#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
property mean: Tensor#
property mode: Tensor#
property rate: Tensor#
support = GreaterThan(lower_bound=0.0)#
property variance: Tensor#

Kumaraswamy#

class torch.distributions.kumaraswamy.Kumaraswamy(concentration1, concentration0, validate_args=None)[source]#

Bases: TransformedDistribution

Samples from a Kumaraswamy distribution.

Example:

>>> m = Kumaraswamy(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # sample from a Kumaraswamy distribution with concentration alpha=1 and beta=1
tensor([ 0.1729])

Parameters:
  • concentration1 (float or Tensor) – 1st concentration parameter of the distribution (often referred to as alpha)

  • concentration0 (float or Tensor) – 2nd concentration parameter of the distribution (often referred to as beta)

arg_constraints: dict[str, Constraint] = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0)}#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
property mean: Tensor#
property mode: Tensor#
support = Interval(lower_bound=0.0, upper_bound=1.0)#
property variance: Tensor#

LKJCholesky#

class torch.distributions.lkj_cholesky.LKJCholesky(dim, concentration=1.0, validate_args=None)[source]#

Bases: Distribution

LKJ distribution for lower Cholesky factor of correlation matrices. The distribution is controlled by the concentration parameter \eta to make the probability of the correlation matrix M generated from a Cholesky factor proportional to \det(M)^{\eta - 1}. Because of that, when concentration == 1, we have a uniform distribution over Cholesky factors of correlation matrices:

L ~ LKJCholesky(dim, concentration)
X = L @ L' ~ LKJCorr(dim, concentration)

Note that this distribution samples the Cholesky factor of correlation matrices and not the correlation matrices themselves and thereby differs slightly from the derivations in [1] for the LKJCorr distribution. For sampling, this uses the Onion method from [1] Section 3.

Example:

>>> l = LKJCholesky(3, 0.5)
>>> l.sample()  # l @ l.T is a sample of a correlation 3x3 matrix
tensor([[ 1.0000,  0.0000,  0.0000],
        [ 0.3516,  0.9361,  0.0000],
        [-0.1899,  0.4748,  0.8593]])

Parameters:
  • dim (int) – dimension of the matrices

  • concentration (float or Tensor) – concentration/shape parameter of the distribution (often referred to as eta)

References

[1] Generating random correlation matrices based on vines and extended onion method (2009), Daniel Lewandowski, Dorota Kurowicka, Harry Joe. Journal of Multivariate Analysis. 100. 10.1016/j.jmva.2009.04.008

arg_constraints = {'concentration': GreaterThan(lower_bound=0.0)}#
expand(batch_shape, _instance=None)[source]#
log_prob(value)[source]#
sample(sample_shape=())[source]#
support = CorrCholesky()#

Laplace#

class torch.distributions.laplace.Laplace(loc, scale, validate_args=None)[source]#

Bases: Distribution

Creates a Laplace distribution parameterized by loc and scale.

Example:

>>> m = Laplace(torch.tensor([0.0]), torch.tensor([1.0]))
>>> m.sample()  # Laplace distributed with loc=0, scale=1
tensor([ 0.1046])

Parameters:
  • loc (float or Tensor) – mean of the distribution

  • scale (float or Tensor) – scale of the distribution

arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
icdf(value)[source]#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

property stddev: Tensor#
support = Real()#
property variance: Tensor#

LogNormal#

class torch.distributions.log_normal.LogNormal(loc, scale, validate_args=None)[source]#

Bases: TransformedDistribution

Creates a log-normal distribution parameterized by loc and scale where:

X ~ Normal(loc, scale)
Y = exp(X) ~ LogNormal(loc, scale)

Example:

>>> m = LogNormal(torch.tensor([0.0]), torch.tensor([1.0]))
>>> m.sample()  # log-normal distributed with mean=0 and stddev=1
tensor([ 0.1046])

Parameters:
  • loc (float or Tensor) – mean of log of distribution

  • scale (float or Tensor) – standard deviation of log of the distribution

arg_constraints: dict[str, Constraint] = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}#
base_dist: Normal#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
property loc: Tensor#
property mean: Tensor#
property mode: Tensor#
property scale: Tensor#
support = GreaterThan(lower_bound=0.0)#
property variance: Tensor#

LowRankMultivariateNormal#

class torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal(loc, cov_factor, cov_diag, validate_args=None)[source]#

Bases: Distribution

Creates a multivariate normal distribution with covariance matrix having a low-rank form parameterized by cov_factor and cov_diag:

covariance_matrix = cov_factor @ cov_factor.T + cov_diag

Example

>>> m = LowRankMultivariateNormal(
...     torch.zeros(2), torch.tensor([[1.0], [0.0]]), torch.ones(2)
... )
>>> m.sample()  # normally distributed with mean=`[0,0]`, cov_factor=`[[1],[0]]`, cov_diag=`[1,1]`
tensor([-0.2102, -0.5429])

Parameters:
  • loc (Tensor) – mean of the distribution with shape batch_shape + event_shape

  • cov_factor (Tensor) – factor part of low-rank form of covariance matrix with shape batch_shape + event_shape + (rank,)

  • cov_diag (Tensor) – diagonal part of low-rank form of covariance matrix with shape batch_shape + event_shape

Note

The computation for determinant and inverse of covariance matrix is avoided when cov_factor.shape[1] << cov_factor.shape[0] thanks to Woodbury matrix identity and matrix determinant lemma. Thanks to these formulas, we just need to compute the determinant and inverse of the small size "capacitance" matrix:

capacitance = I + cov_factor.T @ inv(cov_diag) @ cov_factor
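A short sketch checking the low-rank covariance decomposition numerically:

import torch
from torch.distributions import LowRankMultivariateNormal

loc = torch.zeros(3)
cov_factor = torch.randn(3, 1)   # rank-1 factor
cov_diag = torch.ones(3)
m = LowRankMultivariateNormal(loc, cov_factor, cov_diag)
torch.allclose(m.covariance_matrix,
               cov_factor @ cov_factor.T + torch.diag(cov_diag))   # True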
arg_constraints = {'cov_diag': IndependentConstraint(GreaterThan(lower_bound=0.0), 1), 'cov_factor': IndependentConstraint(Real(), 2), 'loc': IndependentConstraint(Real(), 1)}#
property covariance_matrix: Tensor#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
property precision_matrix: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

property scale_tril: Tensor#
support = IndependentConstraint(Real(), 1)#
property variance: Tensor#

MixtureSameFamily#

class torch.distributions.mixture_same_family.MixtureSameFamily(mixture_distribution, component_distribution, validate_args=None)[source]#

Bases: Distribution

The MixtureSameFamily distribution implements a (batch of) mixture distribution where all components are from different parameterizations of the same distribution type. It is parameterized by a Categorical "selecting distribution" (over k components) and a component distribution, i.e., a Distribution with a rightmost batch shape (equal to [k]) which indexes each (batch of) component.

Examples:

>>> # Construct Gaussian Mixture Model in 1D consisting of 5 equally
>>> # weighted normal distributions
>>> mix = D.Categorical(torch.ones(5,))
>>> comp = D.Normal(torch.randn(5,), torch.rand(5,))
>>> gmm = MixtureSameFamily(mix, comp)

>>> # Construct Gaussian Mixture Model in 2D consisting of 5 equally
>>> # weighted bivariate normal distributions
>>> mix = D.Categorical(torch.ones(5,))
>>> comp = D.Independent(D.Normal(
...     torch.randn(5, 2), torch.rand(5, 2)), 1)
>>> gmm = MixtureSameFamily(mix, comp)

>>> # Construct a batch of 3 Gaussian Mixture Models in 2D each
>>> # consisting of 5 random weighted bivariate normal distributions
>>> mix = D.Categorical(torch.rand(3, 5))
>>> comp = D.Independent(D.Normal(
...     torch.randn(3, 5, 2), torch.rand(3, 5, 2)), 1)
>>> gmm = MixtureSameFamily(mix, comp)

Parameters:
  • mixture_distribution (Categorical) – torch.distributions.Categorical-like instance. Manages the probability of selecting components. The number of categories must match the rightmost batch dimension of the component_distribution. Must have either scalar batch_shape or batch_shape matching component_distribution.batch_shape[:-1]

  • component_distribution (Distribution) – torch.distributions.Distribution-like instance. Right-most batch dimension indexes components.

arg_constraints: dict[str, Constraint] = {}#
cdf(x)[source]#
property component_distribution: Distribution#
expand(batch_shape, _instance=None)[source]#
has_rsample = False#
log_prob(x)[source]#
property mean: Tensor#
property mixture_distribution: Categorical#
sample(sample_shape=())[source]#
property support#
Return type:

_DependentProperty

property variance: Tensor#

Multinomial#

class torch.distributions.multinomial.Multinomial(total_count=1, probs=None, logits=None, validate_args=None)[source]#

Bases: Distribution

Creates a Multinomial distribution parameterized by total_count and either probs or logits (but not both). The innermost dimension of probs indexes over categories. All other dimensions index over batches.

Note that total_count need not be specified if only log_prob() is called (see example below).

Note

The probs argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. probs will return this normalized value. The logits argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. logits will return this normalized value.

  • sample() requires a single shared total_count for all parameters and samples.

  • log_prob() allows different total_count for each parameter and sample.

Example:

>>> m = Multinomial(100, torch.tensor([1., 1., 1., 1.]))
>>> x = m.sample()  # equal probability of 0, 1, 2, 3
tensor([ 21.,  24.,  30.,  25.])
>>> Multinomial(probs=torch.tensor([1., 1., 1., 1.])).log_prob(x)
tensor([-4.1338])

Parameters:
  • total_count (int) – number of trials

  • probs (Tensor) – event probabilities

  • logits (Tensor) – event log probabilities (unnormalized)

arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
log_prob(value)[source]#
property logits: Tensor#
property mean: Tensor#
property param_shape: Size#
property probs: Tensor#
sample(sample_shape=())[source]#
property support#
Return type:

_DependentProperty

total_count: int#
property variance: Tensor#

MultivariateNormal#

class torch.distributions.multivariate_normal.MultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)[source]#

Bases: Distribution

Creates a multivariate normal (also called Gaussian) distribution parameterized by a mean vector and a covariance matrix.

The multivariate normal distribution can be parameterized either in terms of a positive definite covariance matrix \mathbf{\Sigma}, or a positive definite precision matrix \mathbf{\Sigma}^{-1}, or a lower-triangular matrix \mathbf{L} with positive-valued diagonal entries, such that \mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top. This triangular matrix can be obtained via e.g. Cholesky decomposition of the covariance.

Example

>>> m = MultivariateNormal(torch.zeros(2), torch.eye(2))
>>> m.sample()  # normally distributed with mean=`[0,0]` and covariance_matrix=`I`
tensor([-0.2102, -0.5429])

Parameters:
  • loc (Tensor) – mean of the distribution

  • covariance_matrix (Tensor) – positive-definite covariance matrix

  • precision_matrix (Tensor) – positive-definite precision matrix

  • scale_tril (Tensor) – lower-triangular factor of covariance, with positive-valued diagonal

Note

Only one of covariance_matrix or precision_matrix or scale_tril can be specified.

Using scale_tril will be more efficient: all computations internally are based on scale_tril. If covariance_matrix or precision_matrix is passed instead, it is only used to compute the corresponding lower triangular matrices using a Cholesky decomposition.
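A minimal sketch of the three equivalent parameterizations described above:

import torch
from torch.distributions import MultivariateNormal

loc = torch.zeros(2)
cov = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
# Three equivalent parameterizations; scale_tril is the cheapest internally.
m1 = MultivariateNormal(loc, covariance_matrix=cov)
m2 = MultivariateNormal(loc, precision_matrix=torch.inverse(cov))
m3 = MultivariateNormal(loc, scale_tril=torch.linalg.cholesky(cov))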

arg_constraints = {'covariance_matrix': PositiveDefinite(), 'loc': IndependentConstraint(Real(), 1), 'precision_matrix': PositiveDefinite(), 'scale_tril': LowerCholesky()}#
property covariance_matrix: Tensor#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
property precision_matrix: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

property scale_tril: Tensor#
support = IndependentConstraint(Real(), 1)#
property variance: Tensor#

NegativeBinomial#

class torch.distributions.negative_binomial.NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)[source]#

Bases: Distribution

Creates a Negative Binomial distribution, i.e. distribution of the number of successful independent and identical Bernoulli trials before total_count failures are achieved. The probability of success of each Bernoulli trial is probs.

Parameters:
  • total_count (float or Tensor) – non-negative number of negative Bernoulli trials to stop, although the distribution is still valid for real valued count

  • probs (Tensor) – Event probabilities of success in the half open interval [0, 1)

  • logits (Tensor) – Event log-odds for probabilities of success

arg_constraints = {'logits': Real(), 'probs': HalfOpenInterval(lower_bound=0.0, upper_bound=1.0), 'total_count': GreaterThanEq(lower_bound=0)}#
expand(batch_shape, _instance=None)[source]#
log_prob(value)[source]#
property logits: Tensor#
property mean: Tensor#
property mode: Tensor#
property param_shape: Size#
property probs: Tensor#
sample(sample_shape=())[source]#
support = IntegerGreaterThan(lower_bound=0)#
property variance: Tensor#

Normal#

class torch.distributions.normal.Normal(loc, scale, validate_args=None)[source]#

Bases: ExponentialFamily

Creates a normal (also called Gaussian) distribution parameterized by loc and scale.

Example:

>>> m = Normal(torch.tensor([0.0]), torch.tensor([1.0]))
>>> m.sample()  # normally distributed with loc=0 and scale=1
tensor([ 0.1046])

Parameters:
  • loc (float or Tensor) – mean of the distribution (often referred to as mu)

  • scale (float or Tensor) – standard deviation of the distribution (often referred to as sigma)

arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
icdf(value)[source]#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

sample(sample_shape=())[source]#
property stddev: Tensor#
support = Real()#
property variance: Tensor#

OneHotCategorical#

class torch.distributions.one_hot_categorical.OneHotCategorical(probs=None, logits=None, validate_args=None)[source]#

Bases: Distribution

Creates a one-hot categorical distribution parameterized by probs or logits.

Samples are one-hot coded vectors of size probs.size(-1).

Note

The probs argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. probs will return this normalized value. The logits argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. logits will return this normalized value.

See also: torch.distributions.Categorical() for specifications of probs and logits.

Example:

>>> m = OneHotCategorical(torch.tensor([0.25, 0.25, 0.25, 0.25]))
>>> m.sample()  # equal probability of 0, 1, 2, 3
tensor([ 0.,  0.,  0.,  1.])

Parameters:
  • probs (Tensor) – event probabilities

  • logits (Tensor) – event log probabilities (unnormalized)

arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}#
entropy()[source]#
enumerate_support(expand=True)[source]#
expand(batch_shape, _instance=None)[source]#
has_enumerate_support = True#
log_prob(value)[source]#
property logits: Tensor#
property mean: Tensor#
property mode: Tensor#
property param_shape: Size#
property probs: Tensor#
sample(sample_shape=())[source]#
support = OneHot()#
property variance: Tensor#

Pareto#

class torch.distributions.pareto.Pareto(scale, alpha, validate_args=None)[source]#

Bases: TransformedDistribution

Samples from a Pareto Type 1 distribution.

Example:

>>> m = Pareto(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # sample from a Pareto distribution with scale=1 and alpha=1
tensor([ 1.5623])

Parameters:
  • scale (float or Tensor) – Scale parameter of the distribution

  • alpha (float or Tensor) – Shape parameter of the distribution

arg_constraints: dict[str, Constraint] = {'alpha': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}#
entropy()[source]#
Return type:

Tensor

expand(batch_shape, _instance=None)[source]#
Return type:

Pareto

property mean: Tensor#
property mode: Tensor#
property support: Constraint#
Return type:

_DependentProperty

property variance: Tensor#

Poisson#

class torch.distributions.poisson.Poisson(rate, validate_args=None)[source]#

Bases: ExponentialFamily

Creates a Poisson distribution parameterized by rate, the rate parameter.

Samples are nonnegative integers, with a pmf given by

\mathrm{rate}^k \frac{e^{-\mathrm{rate}}}{k!}

Example:

>>> m = Poisson(torch.tensor([4]))
>>> m.sample()
tensor([ 3.])

Parameters:

rate (Number, Tensor) – the rate parameter

arg_constraints = {'rate': GreaterThanEq(lower_bound=0.0)}#
expand(batch_shape, _instance=None)[source]#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
sample(sample_shape=())[source]#
support = IntegerGreaterThan(lower_bound=0)#
property variance: Tensor#

RelaxedBernoulli#

class torch.distributions.relaxed_bernoulli.RelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)[source]#

Bases: TransformedDistribution

Creates a RelaxedBernoulli distribution, parametrized by temperature, and either probs or logits (but not both). This is a relaxed version of the Bernoulli distribution, so the values are in (0, 1), and it has reparametrizable samples.

Example:

>>> m = RelaxedBernoulli(torch.tensor([2.2]),
...                      torch.tensor([0.1, 0.2, 0.3, 0.99]))
>>> m.sample()
tensor([ 0.2951,  0.3442,  0.8918,  0.9021])

Parameters:
  • temperature (Tensor) – relaxation temperature

  • probs (Number, Tensor) – the probability of sampling 1

  • logits (Number, Tensor) – the log-odds of sampling 1

arg_constraints: dict[str, Constraint] = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}#
base_dist: LogitRelaxedBernoulli#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
property logits: Tensor#
property probs: Tensor#
support = Interval(lower_bound=0.0, upper_bound=1.0)#
property temperature: Tensor#

LogitRelaxedBernoulli#

class torch.distributions.relaxed_bernoulli.LogitRelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)[source]#

Bases: Distribution

Creates a LogitRelaxedBernoulli distribution parameterized by probs or logits (but not both), which is the logit of a RelaxedBernoulli distribution.

Samples are logits of values in (0, 1). See [1] for more details.

Parameters:
  • temperature (Tensor) – relaxation temperature

  • probs (Number, Tensor) – the probability of sampling 1

  • logits (Number, Tensor) – the log-odds of sampling 1

[1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables (Maddison et al., 2017)

[2] Categorical Reparametrization with Gumbel-Softmax (Jang et al., 2017)

arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}#
expand(batch_shape, _instance=None)[source]#
log_prob(value)[source]#
property logits: Tensor#
property param_shape: Size#
property probs: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

support = Real()#

RelaxedOneHotCategorical#

class torch.distributions.relaxed_categorical.RelaxedOneHotCategorical(temperature, probs=None, logits=None, validate_args=None)[source]#

Bases: TransformedDistribution

Creates a RelaxedOneHotCategorical distribution parametrized by temperature, and either probs or logits. This is a relaxed version of the OneHotCategorical distribution, so its samples are on the simplex, and are reparametrizable.

Example:

>>> m = RelaxedOneHotCategorical(torch.tensor([2.2]),
...                              torch.tensor([0.1, 0.2, 0.3, 0.4]))
>>> m.sample()
tensor([ 0.1294,  0.2324,  0.3859,  0.2523])

Parameters:
  • temperature (Tensor) – relaxation temperature

  • probs (Tensor) – event probabilities

  • logits (Tensor) – unnormalized log probability for each event

arg_constraints: dict[str, Constraint] = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}#
base_dist: ExpRelaxedCategorical#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
property logits: Tensor#
property probs: Tensor#
support = Simplex()#
property temperature: Tensor#

StudentT#

class torch.distributions.studentT.StudentT(df, loc=0.0, scale=1.0, validate_args=None)[source]#

Bases: Distribution

Creates a Student's t-distribution parameterized by degree of freedom df, mean loc and scale scale.

Example:

>>> m = StudentT(torch.tensor([2.0]))
>>> m.sample()  # Student's t-distributed with degrees of freedom=2
tensor([ 0.1046])

Parameters:
  • df (float or Tensor) – degrees of freedom

  • loc (float or Tensor) – mean of the distribution

  • scale (float or Tensor) – scale of the distribution

arg_constraints = {'df': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

support = Real()#
property variance: Tensor#

TransformedDistribution#

class torch.distributions.transformed_distribution.TransformedDistribution(base_distribution, transforms, validate_args=None)[source]#

Bases: Distribution

Extension of the Distribution class, which applies a sequence of Transforms to a base distribution. Let f be the composition of transforms applied:

X ~ BaseDistribution
Y = f(X) ~ TransformedDistribution(BaseDistribution, f)
log p(Y) = log p(X) + log |det (dX/dY)|

Note that the .event_shape of a TransformedDistribution is the maximum shape of its base distribution and its transforms, since transforms can introduce correlations among events.

An example for the usage of TransformedDistribution would be:

# Building a Logistic Distribution
# X ~ Uniform(0, 1)
# f = a + b * logit(X)
# Y ~ f(X) ~ Logistic(a, b)
base_distribution = Uniform(0, 1)
transforms = [SigmoidTransform().inv, AffineTransform(loc=a, scale=b)]
logistic = TransformedDistribution(base_distribution, transforms)

For more examples, please look at the implementations of Gumbel, HalfCauchy, HalfNormal, LogNormal, Pareto, Weibull, RelaxedBernoulli and RelaxedOneHotCategorical.

arg_constraints: dict[str, Constraint] = {}#
cdf(value)[source]#

Computes the cumulative distribution function by inverting the transform(s) and computing the score of the base distribution.

expand(batch_shape, _instance=None)[source]#
property has_rsample: bool#
icdf(value)[source]#

Computes the inverse cumulative distribution function using transform(s) and computing the score of the base distribution.

log_prob(value)[source]#

Scores the sample by inverting the transform(s) and computing the score using the score of the base distribution and the log abs det jacobian.

rsample(sample_shape=())[source]#

Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched. Samples first from base distribution and applies transform() for every transform in the list.

Return type:

Tensor

sample(sample_shape=())[source]#

Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched. Samples first from base distribution and applies transform() for every transform in the list.

property support#
Return type:

_DependentProperty

Uniform#

class torch.distributions.uniform.Uniform(low, high, validate_args=None)[source]#

Bases: Distribution

Generates uniformly distributed random samples from the half-open interval [low, high).

Example:

>>> m = Uniform(torch.tensor([0.0]), torch.tensor([5.0]))
>>> m.sample()  # uniformly distributed in the range [0.0, 5.0)
tensor([ 2.3418])

Parameters:
  • low (float or Tensor) – lower range (inclusive).

  • high (float or Tensor) – upper range (exclusive).

property arg_constraints#
cdf(value)[source]#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
icdf(value)[source]#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
rsample(sample_shape=())[source]#
Return type:

Tensor

property stddev: Tensor#
property support#
Return type:

_DependentProperty

property variance: Tensor#

VonMises#

class torch.distributions.von_mises.VonMises(loc, concentration, validate_args=None)[source]#

Bases: Distribution

A circular von Mises distribution.

This implementation uses polar coordinates. The loc and value args can be any real number (to facilitate unconstrained optimization), but are interpreted as angles modulo 2 pi.

Example:

>>> m = VonMises(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # von Mises distributed with loc=1 and concentration=1
tensor([1.9777])
Parameters:
  • loc (float or Tensor) – an angle in radians

  • concentration (float or Tensor) – concentration parameter

arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'loc': Real()}#
expand(batch_shape, _instance=None)[source]#
has_rsample = False#
log_prob(value)[source]#
property mean: Tensor#

The provided mean is the circular one.

property mode: Tensor#
sample(sample_shape=())[source]#

The sampling algorithm for the von Mises distribution is based on the following paper: D.J. Best and N.I. Fisher, "Efficient simulation of the von Mises distribution." Applied Statistics (1979): 152-157.

Sampling is always done in double precision internally to avoid a hang in _rejection_sample() for small values of the concentration, which starts to happen for single precision around 1e-4 (see issue #88443).

support = Real()#
property variance: Tensor#

The provided variance is the circular one.

Weibull#

class torch.distributions.weibull.Weibull(scale, concentration, validate_args=None)[source]#

Bases: TransformedDistribution

Samples from a two-parameter Weibull distribution.

Example

>>> m = Weibull(torch.tensor([1.0]), torch.tensor([1.0]))
>>> m.sample()  # sample from a Weibull distribution with scale=1, concentration=1
tensor([ 0.4784])

Parameters:
  • scale (float or Tensor) – Scale parameter of distribution (lambda).

  • concentration (float or Tensor) – Concentration parameter of distribution (k/shape).

  • validate_args (bool, optional) – Whether to validate arguments. Default: None.

arg_constraints: dict[str, Constraint] = {'concentration': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
property mean: Tensor#
property mode: Tensor#
support = GreaterThan(lower_bound=0.0)#
property variance: Tensor#

Wishart#

class torch.distributions.wishart.Wishart(df, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)[source]#

Bases: ExponentialFamily

Creates a Wishart distribution parameterized by a symmetric positive definite matrix \Sigma, or its Cholesky decomposition \mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top.

Example

>>> m = Wishart(torch.Tensor([2]), covariance_matrix=torch.eye(2))
>>> m.sample()  # Wishart distributed with mean=`df * I` and
>>>             # variance(x_ij)=`df` for i != j and variance(x_ij)=`2 * df` for i == j

Parameters:
  • df (float or Tensor) – real-valued parameter larger than the (dimension of Square matrix) - 1

  • covariance_matrix (Tensor) – positive-definite covariance matrix

  • precision_matrix (Tensor) – positive-definite precision matrix

  • scale_tril (Tensor) – lower-triangular factor of covariance, with positive-valued diagonal

Note

Only one of covariance_matrix or precision_matrix or scale_tril can be specified. Using scale_tril will be more efficient: all computations internally are based on scale_tril. If covariance_matrix or precision_matrix is passed instead, it is only used to compute the corresponding lower triangular matrices using a Cholesky decomposition. 'torch.distributions.LKJCholesky' is a restricted Wishart distribution. [1]

References

[1] Wang, Z., Wu, Y. and Chu, H., 2018. On equivalence of the LKJ distribution and the restricted Wishart distribution.
[2] Sawyer, S., 2007. Wishart Distributions and Inverse-Wishart Sampling.
[3] Anderson, T. W., 2003. An Introduction to Multivariate Statistical Analysis (3rd ed.).
[4] Odell, P. L. & Feiveson, A. H., 1966. A Numerical Procedure to Generate a Sample Covariance Matrix. JASA, 61(313):199-203.
[5] Ku, Y.-C. & Bloomfield, P., 2010. Generating Random Wishart Matrices with Fractional Degrees of Freedom in OX.

property arg_constraints#
property covariance_matrix: Tensor#
entropy()[source]#
expand(batch_shape, _instance=None)[source]#
has_rsample = True#
log_prob(value)[source]#
property mean: Tensor#
property mode: Tensor#
property precision_matrix: Tensor#
rsample(sample_shape=(), max_try_correction=None)[source]#

Warning

In some cases, the sampling algorithm based on Bartlett decomposition may return singular matrix samples. Several tries to correct singular samples are performed by default, but it may still end up returning singular matrix samples. Singular samples may return -inf values in .log_prob(). In those cases, the user should validate the samples and either fix the value of df or adjust the max_try_correction argument of .rsample accordingly.

Return type:

Tensor

property scale_tril: Tensor#
support = PositiveDefinite()#
property variance: Tensor#

KL Divergence#

torch.distributions.kl.kl_divergence(p, q)[source]#

Compute Kullback-Leibler divergence KL(p \| q) between two distributions.

KL(p \| q) = \int p(x) \log\frac{p(x)}{q(x)} \, dx

Parameters:
  • p (Distribution) – A Distribution object.

  • q (Distribution) – A Distribution object.

Returns:

A batch of KL divergences of shape batch_shape.

Return type:

Tensor

Raises:

NotImplementedError – If the distribution types have not been registered via register_kl().
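For example:

import torch
from torch.distributions import Normal
from torch.distributions.kl import kl_divergence

p = Normal(torch.tensor([0.0]), torch.tensor([1.0]))
q = Normal(torch.tensor([1.0]), torch.tensor([2.0]))
kl_divergence(p, q)   # tensor([0.4431])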

KL divergence is currently implemented for the following distribution pairs:
  • Bernoulli and Bernoulli

  • Bernoulli and Poisson

  • Beta and Beta

  • Beta and ContinuousBernoulli

  • Beta and Exponential

  • Beta and Gamma

  • Beta and Normal

  • Beta and Pareto

  • Beta and Uniform

  • Binomial and Binomial

  • Categorical and Categorical

  • Cauchy and Cauchy

  • ContinuousBernoulli and ContinuousBernoulli

  • ContinuousBernoulli and Exponential

  • ContinuousBernoulli and Normal

  • ContinuousBernoulli and Pareto

  • ContinuousBernoulli and Uniform

  • Dirichlet and Dirichlet

  • Exponential and Beta

  • Exponential and ContinuousBernoulli

  • Exponential and Exponential

  • Exponential and Gamma

  • Exponential and Gumbel

  • Exponential and Normal

  • Exponential and Pareto

  • Exponential and Uniform

  • ExponentialFamily and ExponentialFamily

  • Gamma and Beta

  • Gamma and ContinuousBernoulli

  • Gamma and Exponential

  • Gamma and Gamma

  • Gamma and Gumbel

  • Gamma and Normal

  • Gamma and Pareto

  • Gamma and Uniform

  • Geometric and Geometric

  • Gumbel and Beta

  • Gumbel and ContinuousBernoulli

  • Gumbel and Exponential

  • Gumbel and Gamma

  • Gumbel and Gumbel

  • Gumbel and Normal

  • Gumbel and Pareto

  • Gumbel and Uniform

  • HalfNormal and HalfNormal

  • Independent and Independent

  • Laplace and Beta

  • Laplace and ContinuousBernoulli

  • Laplace and Exponential

  • Laplace and Gamma

  • Laplace and Laplace

  • Laplace and Normal

  • Laplace and Pareto

  • Laplace and Uniform

  • LowRankMultivariateNormal and LowRankMultivariateNormal

  • LowRankMultivariateNormal and MultivariateNormal

  • MultivariateNormal and LowRankMultivariateNormal

  • MultivariateNormal and MultivariateNormal

  • Normal and Beta

  • Normal and ContinuousBernoulli

  • Normal and Exponential

  • Normal and Gamma

  • Normal and Gumbel

  • Normal and Laplace

  • Normal and Normal

  • Normal and Pareto

  • Normal and Uniform

  • OneHotCategorical and OneHotCategorical

  • Pareto and Beta

  • Pareto and ContinuousBernoulli

  • Pareto and Exponential

  • Pareto and Gamma

  • Pareto and Normal

  • Pareto and Pareto

  • Pareto and Uniform

  • Poisson and Bernoulli

  • Poisson and Binomial

  • Poisson and Poisson

  • TransformedDistribution and TransformedDistribution

  • Uniform and Beta

  • Uniform and ContinuousBernoulli

  • Uniform and Exponential

  • Uniform and Gamma

  • Uniform and Gumbel

  • Uniform and Normal

  • Uniform and Pareto

  • Uniform and Uniform

torch.distributions.kl.register_kl(type_p, type_q)[source]#

Decorator to register a pairwise function with kl_divergence(). Usage:

@register_kl(Normal, Normal)
def kl_normal_normal(p, q):
    # insert implementation here

Lookup returns the most specific (type, type) match ordered by subclass. If the match is ambiguous, a RuntimeWarning is raised. For example, to resolve the ambiguous situation:

@register_kl(BaseP, DerivedQ)
def kl_version1(p, q): ...

@register_kl(DerivedP, BaseQ)
def kl_version2(p, q): ...

you should register a third most-specific implementation, e.g.:

register_kl(DerivedP, DerivedQ)(kl_version1)  # Break the tie.
Parameters:
  • type_p (type) – A subclass of Distribution.

  • type_q (type) – A subclass of Distribution.
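
A hedged end-to-end sketch of registering an implementation for a user-defined distribution (MyDist and its KL formula are hypothetical placeholders):

from torch.distributions import Distribution
from torch.distributions.kl import register_kl

class MyDist(Distribution):
    ...  # parameters and methods omitted

@register_kl(MyDist, MyDist)
def _kl_mydist_mydist(p, q):
    ...  # must return a tensor of shape batch_shape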

Transforms#

class torch.distributions.transforms.AbsTransform(cache_size=0)[source]#

Transform via the mapping y = |x|.

class torch.distributions.transforms.AffineTransform(loc, scale, event_dim=0, cache_size=0)[source]#

Transform via the pointwise affine mapping y = \text{loc} + \text{scale} \times x.

Parameters:
  • loc (Tensor orfloat) – Location parameter.

  • scale (Tensor orfloat) – Scale parameter.

  • event_dim (int) – Optional size of event_shape. This should be zero for univariate random variables, 1 for distributions over vectors, 2 for distributions over matrices, etc.
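
For instance, a brief sketch (values illustrative) of combining AffineTransform with TransformedDistribution:

import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import AffineTransform

base = Normal(torch.zeros(3), torch.ones(3))
# y = 2.0 + 0.5 * x, so samples are distributed as Normal(2.0, 0.5)
dist = TransformedDistribution(base, [AffineTransform(loc=2.0, scale=0.5)])
sample = dist.sample()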

class torch.distributions.transforms.CatTransform(tseq, dim=0, lengths=None, cache_size=0)[source]#

Transform functor that applies a sequence of transforms tseq component-wise to each submatrix at dim, of length lengths[dim], in a way compatible with torch.cat().

Example:

x0 = torch.cat([torch.arange(1.0, 11.0), torch.arange(1.0, 11.0)], dim=0)  # torch.range is deprecated
x = torch.cat([x0, x0], dim=0)
t0 = CatTransform([ExpTransform(), identity_transform], dim=0, lengths=[10, 10])
t = CatTransform([t0, t0], dim=0, lengths=[20, 20])
y = t(x)
class torch.distributions.transforms.ComposeTransform(parts, cache_size=0)[source]#

Composes multiple transforms in a chain. The transforms being composed are responsible for caching.

Parameters:
  • parts (list of Transform) – A list of transforms to compose.

  • cache_size (int) – Size of cache. If zero, no caching is done. If one, the latest single value is cached. Only 0 and 1 are supported.

class torch.distributions.transforms.CorrCholeskyTransform(cache_size=0)[source]#

Transforms an unconstrained real vector x of length D(D-1)/2 into the Cholesky factor of a D-dimensional correlation matrix. This Cholesky factor is a lower triangular matrix with positive diagonals and unit Euclidean norm for each row. The transform is processed as follows:

  1. First we convert x into a lower triangular matrix in row order.

  2. For each row X_i of the lower triangular part, we apply a signed version of StickBreakingTransform to transform X_i into a unit Euclidean length vector using the following steps:

     - Scale into the interval (-1, 1): r_i = \tanh(X_i).
     - Transform into an unsigned domain: z_i = r_i^2.
     - Apply s_i = StickBreakingTransform(z_i).
     - Transform back into the signed domain: y_i = \text{sign}(r_i) \cdot \sqrt{s_i}.
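
A brief illustrative sketch (the dimension D = 3 is arbitrary):

import torch
from torch.distributions.transforms import CorrCholeskyTransform

t = CorrCholeskyTransform()
L = t(torch.randn(3))  # D*(D-1)/2 = 3 unconstrained inputs for D = 3
corr = L @ L.T  # a valid 3x3 correlation matrix with unit diagonal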

class torch.distributions.transforms.CumulativeDistributionTransform(distribution, cache_size=0)[source]#

Transform via the cumulative distribution function of a probability distribution.

Parameters:

distribution (Distribution) – Distribution whose cumulative distribution function to use for the transformation.

Example:

# Construct a Gaussian copula from a multivariate normal.
base_dist = MultivariateNormal(
    loc=torch.zeros(2),
    scale_tril=LKJCholesky(2).sample(),
)
transform = CumulativeDistributionTransform(Normal(0, 1))
copula = TransformedDistribution(base_dist, [transform])
class torch.distributions.transforms.ExpTransform(cache_size=0)[source]#

Transform via the mapping y = \exp(x).

class torch.distributions.transforms.IndependentTransform(base_transform, reinterpreted_batch_ndims, cache_size=0)[source]#

Wrapper around another transform to treat reinterpreted_batch_ndims-many extra of the rightmost dimensions as dependent. This has no effect on the forward or backward transforms, but does sum out reinterpreted_batch_ndims-many of the rightmost dimensions in log_abs_det_jacobian().

Parameters:
  • base_transform (Transform) – A base transform.

  • reinterpreted_batch_ndims (int) – The number of extra rightmost dimensions to treat as dependent.

class torch.distributions.transforms.LowerCholeskyTransform(cache_size=0)[source]#

Transform from unconstrained matrices to lower-triangular matrices with nonnegative diagonal entries.

This is useful for parameterizing positive definite matrices in terms of their Cholesky factorization.

class torch.distributions.transforms.PositiveDefiniteTransform(cache_size=0)[source]#

Transform from unconstrained matrices to positive-definite matrices.

class torch.distributions.transforms.PowerTransform(exponent, cache_size=0)[source]#

Transform via the mapping y = x^{\text{exponent}}.

class torch.distributions.transforms.ReshapeTransform(in_shape, out_shape, cache_size=0)[source]#

Unit Jacobian transform to reshape the rightmost part of a tensor.

Note that in_shape and out_shape must have the same number of elements, just as for torch.Tensor.reshape().

Parameters:
  • in_shape (torch.Size) – The input event shape.

  • out_shape (torch.Size) – The output event shape.

  • cache_size (int) – Size of cache. If zero, no caching is done. If one, the latest single value is cached. Only 0 and 1 are supported. (Default 0.)

class torch.distributions.transforms.SigmoidTransform(cache_size=0)[source]#

Transform via the mapping y = \frac{1}{1 + \exp(-x)} and x = \text{logit}(y).

class torch.distributions.transforms.SoftplusTransform(cache_size=0)[source]#

Transform via the mapping \text{Softplus}(x) = \log(1 + \exp(x)). The implementation reverts to the linear function when x > 20.

class torch.distributions.transforms.TanhTransform(cache_size=0)[source]#

Transform via the mapping y = \tanh(x).

It is equivalent to

ComposeTransform(
    [
        AffineTransform(0.0, 2.0),
        SigmoidTransform(),
        AffineTransform(-1.0, 2.0),
    ]
)

However this might not be numerically stable, thus it is recommended to use TanhTransform instead.

Note that one should use cache_size=1 when it comes to NaN/Inf values.
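
For example, a hedged sketch of a tanh-squashed Gaussian, a common use of this transform for producing bounded samples (all values below are illustrative):

import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform

base = Normal(torch.zeros(1), torch.ones(1))
squashed = TransformedDistribution(base, [TanhTransform(cache_size=1)])
x = squashed.rsample()     # lies in (-1, 1)
lp = squashed.log_prob(x)  # cache_size=1 reuses the cached pre-tanh value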

class torch.distributions.transforms.SoftmaxTransform(cache_size=0)[source]#

Transform from unconstrained space to the simplex via y = \exp(x) then normalizing.

This is not bijective and cannot be used for HMC. However this acts mostly coordinate-wise (except for the final normalization), and thus is appropriate for coordinate-wise optimization algorithms.

class torch.distributions.transforms.StackTransform(tseq, dim=0, cache_size=0)[source]#

Transform functor that applies a sequence of transforms tseq component-wise to each submatrix at dim in a way compatible with torch.stack().

Example:

x = torch.stack([torch.arange(1.0, 11.0), torch.arange(1.0, 11.0)], dim=1)  # torch.range is deprecated
t = StackTransform([ExpTransform(), identity_transform], dim=1)
y = t(x)
class torch.distributions.transforms.StickBreakingTransform(cache_size=0)[source]#

Transform from unconstrained space to the simplex of one additional dimension via a stick-breaking process.

This transform arises as an iterated sigmoid transform in a stick-breaking construction of the Dirichlet distribution: the first logit is transformed via sigmoid to the first probability and the probability of everything else, and then the process recurses.

This is bijective and appropriate for use in HMC; however it mixes coordinates together and is less appropriate for optimization.

class torch.distributions.transforms.Transform(cache_size=0)[source]#

Abstract class for invertible transformations with computable log det Jacobians. They are primarily used in torch.distributions.TransformedDistribution.

Caching is useful for transforms whose inverses are either expensive or numerically unstable. Note that care must be taken with memoized values since the autograd graph may be reversed. For example, while the following works with or without caching:

y = t(x)
t.log_abs_det_jacobian(x, y).backward()  # x will receive gradients.

the following will error when caching due to dependency reversal:

y = t(x)
z = t.inv(y)
grad(z.sum(), [y])  # error because z is x

Derived classes should implement one or both of _call() or _inverse(). Derived classes that set bijective=True should also implement log_abs_det_jacobian() (a minimal subclass sketch follows the variables list below).

Parameters:

cache_size (int) – Size of cache. If zero, no caching is done. If one, the latest single value is cached. Only 0 and 1 are supported.

Variables:
  • domain (Constraint) – The constraint representing valid inputs to this transform.

  • codomain (Constraint) – The constraint representing valid outputs to this transform which are inputs to the inverse transform.

  • bijective (bool) – Whether this transform is bijective. A transform t is bijective if t.inv(t(x)) == x and t(t.inv(y)) == y for every x in the domain and y in the codomain. Transforms that are not bijective should at least maintain the weaker pseudoinverse properties t(t.inv(t(x))) == t(x) and t.inv(t(t.inv(y))) == t.inv(y).

  • sign (int or Tensor) – For bijective univariate transforms, this should be +1 or -1 depending on whether the transform is monotone increasing or decreasing.
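
A minimal subclass sketch (ShiftTransform is a hypothetical example following the pattern described above; it is not part of the library):

import torch
from torch.distributions import constraints
from torch.distributions.transforms import Transform

class ShiftTransform(Transform):
    # Bijective mapping y = x + shift.
    domain = constraints.real
    codomain = constraints.real
    bijective = True
    sign = +1

    def __init__(self, shift, cache_size=0):
        super().__init__(cache_size=cache_size)
        self.shift = shift

    def _call(self, x):
        return x + self.shift

    def _inverse(self, y):
        return y - self.shift

    def log_abs_det_jacobian(self, x, y):
        # dy/dx = 1 everywhere, so log|dy/dx| = 0.
        return torch.zeros_like(x)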

property inv: Transform#

Returns the inverse Transform of this transform. This should satisfy t.inv.inv is t.

property sign: int#

Returns the sign of the determinant of the Jacobian, if applicable. In general this only makes sense for bijective transforms.

log_abs_det_jacobian(x, y)[source]#

Computes the log det Jacobian log |dy/dx| given input and output.

forward_shape(shape)[source]#

Infers the shape of the forward computation, given the input shape. Defaults to preserving shape.

inverse_shape(shape)[source]#

Infers the shape of the inverse computation, given the output shape. Defaults to preserving shape.

Constraints#

class torch.distributions.constraints.Constraint[source]#

Abstract base class for constraints.

A constraint object represents a region over which a variable is valid, e.g. within which a variable can be optimized.

Variables:
  • is_discrete (bool) – Whether the constrained space is discrete. Defaults to False.

  • event_dim (int) – Number of rightmost dimensions that together define an event. The check() method will remove this many dimensions when computing validity.

check(value)[source]#

Returns a byte tensor of sample_shape + batch_shape indicating whether each event in value satisfies this constraint.
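
For example (a brief illustrative sketch using the interval constraint):

import torch
from torch.distributions import constraints

c = constraints.interval(0.0, 1.0)
valid = c.check(torch.tensor([0.5, 1.5]))  # elementwise: [True, False]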

torch.distributions.constraints.cat[source]#

alias of _Cat

torch.distributions.constraints.dependent_property[source]#

alias of _DependentProperty

torch.distributions.constraints.greater_than[source]#

alias of _GreaterThan

torch.distributions.constraints.greater_than_eq[source]#

alias of _GreaterThanEq

torch.distributions.constraints.independent[source]#

alias of _IndependentConstraint

torch.distributions.constraints.integer_interval[source]#

alias of _IntegerInterval

torch.distributions.constraints.interval[source]#

alias of _Interval

torch.distributions.constraints.half_open_interval[source]#

alias of _HalfOpenInterval

torch.distributions.constraints.is_dependent(constraint)[source]#

Checks if constraint is a _Dependent object.

Parameters:

constraint – A Constraint object.

Returns:

True if constraint can be refined to the type _Dependent, False otherwise.

Return type:

bool

Examples

>>> import torch
>>> from torch.distributions import Bernoulli
>>> from torch.distributions.constraints import is_dependent

>>> dist = Bernoulli(probs=torch.tensor([0.6], requires_grad=True))
>>> constraint1 = dist.arg_constraints["probs"]
>>> constraint2 = dist.arg_constraints["logits"]

>>> for constraint in [constraint1, constraint2]:
>>>     if is_dependent(constraint):
>>>         continue
torch.distributions.constraints.less_than[source]#

alias of _LessThan

class torch.distributions.constraints.MixtureSameFamilyConstraint(base_constraint)[source]#

Constraint for the MixtureSameFamily distribution that adds back the rightmost batch dimension before performing the validity check with the component distribution constraint.

Parameters:

base_constraint – The Constraint object of the component distribution of the MixtureSameFamily distribution.

check(value)[source]#

Check validity of value as a possible outcome of sampling the MixtureSameFamily distribution.

torch.distributions.constraints.multinomial[source]#

alias of _Multinomial

torch.distributions.constraints.stack[source]#

alias of _Stack

ConstraintRegistry#

PyTorch provides two global ConstraintRegistry objects that link Constraint objects to Transform objects. These objects both input constraints and return transforms, but they have different guarantees on bijectivity.

  1. biject_to(constraint) looks up a bijective Transform from constraints.real to the given constraint. The returned transform is guaranteed to have .bijective = True and should implement .log_abs_det_jacobian().

  2. transform_to(constraint) looks up a not-necessarily bijective Transform from constraints.real to the given constraint. The returned transform is not guaranteed to implement .log_abs_det_jacobian().

The transform_to() registry is useful for performing unconstrained optimization on constrained parameters of probability distributions, which are indicated by each distribution’s .arg_constraints dict. These transforms often overparameterize a space in order to avoid rotation; they are thus more suitable for coordinate-wise optimization algorithms like Adam:

loc = torch.zeros(100, requires_grad=True)
unconstrained = torch.zeros(100, requires_grad=True)
scale = transform_to(Normal.arg_constraints["scale"])(unconstrained)
loss = -Normal(loc, scale).log_prob(data).sum()

The biject_to() registry is useful for Hamiltonian Monte Carlo, where samples from a probability distribution with constrained .support are propagated in an unconstrained space, and algorithms are typically rotation invariant:

dist = Exponential(rate)
unconstrained = torch.zeros(100, requires_grad=True)
sample = biject_to(dist.support)(unconstrained)
potential_energy = -dist.log_prob(sample).sum()

Note

An example where transform_to and biject_to differ is constraints.simplex: transform_to(constraints.simplex) returns a SoftmaxTransform that simply exponentiates and normalizes its inputs; this is a cheap and mostly coordinate-wise operation appropriate for algorithms like SVI. In contrast, biject_to(constraints.simplex) returns a StickBreakingTransform that bijects its input down to a one-fewer-dimensional space; this is a more expensive, less numerically stable transform, but is needed for algorithms like HMC.
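
A brief sketch contrasting the two registries on the simplex (output shapes noted in comments):

import torch
from torch.distributions import biject_to, transform_to, constraints

u = torch.randn(4)
transform_to(constraints.simplex)(u)  # shape (4,) via SoftmaxTransform
biject_to(constraints.simplex)(u)     # shape (5,) via StickBreakingTransform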

The biject_to and transform_to objects can be extended by user-defined constraints and transforms using their .register() method, either as a function on singleton constraints:

transform_to.register(my_constraint, my_transform)

or as a decorator on parameterized constraints:

@transform_to.register(MyConstraintClass)
def my_factory(constraint):
    assert isinstance(constraint, MyConstraintClass)
    return MyTransform(constraint.param1, constraint.param2)

You can create your own registry by creating a new ConstraintRegistry object.

class torch.distributions.constraint_registry.ConstraintRegistry[source]#

Registry to link constraints to transforms.

register(constraint, factory=None)[source]#

Registers a Constraint subclass in this registry. Usage:

@my_registry.register(MyConstraintClass)
def construct_transform(constraint):
    assert isinstance(constraint, MyConstraintClass)
    return MyTransform(constraint.arg_constraints)
Parameters:
  • constraint (subclass of Constraint) – A subclass of Constraint, or a singleton object of the desired class.

  • factory (Callable) – A callable that inputs a constraint object and returns a Transform object.
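
A minimal sketch of a user-defined registry (the positive-to-exp pairing below is illustrative):

from torch.distributions import constraints
from torch.distributions.constraint_registry import ConstraintRegistry
from torch.distributions.transforms import ExpTransform

my_registry = ConstraintRegistry()
my_registry.register(constraints.positive, lambda constraint: ExpTransform())
t = my_registry(constraints.positive)  # looks up the registered transform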
