
General API quickstart#

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm
import pytensor.tensor as pt
RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)
az.style.use("arviz-darkgrid")

1. Model creation#

Models in PyMC are centered around the Model class. It has references to all random variables (RVs) and computes the model logp and its gradients. Usually, you would instantiate it as part of a with context:

with pm.Model() as model:
    # Model definition
    pass

We discuss RVs further below, but let's create a simple model to explore the Model class.

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    obs = pm.Normal("obs", mu=mu, sigma=1, observed=rng.standard_normal(100))
model.basic_RVs
[mu ~ N(0, 1), obs ~ N(mu, 1)]
model.free_RVs
[mu ~ N(0, 1)]
model.observed_RVs
[obs ~ N(mu, 1)]
model.compile_logp()({"mu": 0})
array(-143.03962875)

It's worth highlighting the design choice we made with logp. As you can see above, logp is not a static attribute but a method of the model instance: it puts together a function based on the current state of the model – or on the state given as an argument to logp (see the example below).

For diverse reasons, we assume that a Model instance isn't static. If you need to use logp in an inner loop and it needs to be static, simply compile it once, e.g. logp = model.compile_logp(), and reuse the compiled function. Here is an example – note the speedup when the function is compiled only once:

%timeit model.compile_logp()({"mu": 0.1})
logp = model.compile_logp()
%timeit logp({"mu": 0.1})
83 ms ± 7.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
18 µs ± 276 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

2. Probability Distributions#

Every probabilistic program consists of observed and unobserved Random Variables (RVs). Observed RVs are defined via likelihood distributions, while unobserved RVs are defined via prior distributions. In the PyMC module, the structure for probability distributions looks like this:

Distributions

  • Continuous (pymc:api_distributions_continuous)
  • Discrete (pymc:api_distributions_discrete)
  • Multivariate (pymc:api_distributions_multivariate)
  • Mixture (pymc:api_distributions_mixture)
  • Timeseries (pymc:api_distributions_timeseries)
  • Censored (pymc:api_distributions_censored)
  • Simulator (pymc:api_distributions_simulator)
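Distributions from several of these families can be mixed freely within a single model. As a small illustration of ours (the variable names are arbitrary), the following model combines a continuous, a discrete, and a multivariate distribution:

with pm.Model():
    p = pm.Beta("p", alpha=2, beta=2)  # continuous prior on [0, 1]
    k = pm.Binomial("k", n=10, p=p)  # discrete distribution
    z = pm.MvNormal("z", mu=np.zeros(2), cov=np.eye(2))  # multivariate; shape follows from mu/cov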

Unobserved Random Variables#

Every unobserved RV has the following calling signature: name (str), parameter keyword arguments. Thus, a normal prior can be defined in a model context like this:

with pm.Model():
    x = pm.Normal("x", mu=0, sigma=1)

As with the model, we can evaluate its logp:

pm.logp(x, 0).eval()
array(-0.91893853)

Observed Random Variables#

Observed RVs are defined just like unobserved RVs but require data to be passed into the observed keyword argument:

with pm.Model():
    obs = pm.Normal("x", mu=0, sigma=1, observed=rng.standard_normal(100))

observed supports lists, numpy.ndarray and pytensor data structures.
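For example (a small sketch of ours), the same likelihood can be given a plain Python list or a NumPy array:

with pm.Model():
    # observed as a list
    obs_list = pm.Normal("obs_list", mu=0, sigma=1, observed=[-0.2, 0.1, 1.3])
    # observed as a numpy array
    obs_array = pm.Normal("obs_array", mu=0, sigma=1, observed=np.array([-0.2, 0.1, 1.3]))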

Deterministic transforms#

PyMC allows you to freely do algebra with RVs in all kinds of ways:

with pm.Model():
    x = pm.Normal("x", mu=0, sigma=1)
    y = pm.Gamma("y", alpha=1, beta=1)

    plus_2 = x + 2
    summed = x + y
    squared = x**2
    sined = pm.math.sin(x)

Though these transformations work seamlessly, their results are not stored automatically. Thus, if you want to keep track of a transformed variable, you have to use pm.Deterministic:

with pm.Model():
    x = pm.Normal("x", mu=0, sigma=1)
    plus_2 = pm.Deterministic("x plus 2", x + 2)

Note that plus_2 can be used in exactly the same way as above; we only tell PyMC to keep track of this quantity for us.
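For instance (a sketch of ours, using an underscore in the name to keep it a valid identifier), the deterministic is registered on the model and will later be stored alongside the free RVs in the sampling results:

with pm.Model() as det_model:
    x = pm.Normal("x", mu=0, sigma=1)
    plus_2 = pm.Deterministic("x_plus_2", x + 2)

det_model.deterministics  # the tracked deterministic shows up here and in the trace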

Lists of RVs / higher-dimensional RVs#

Above we have seen how to create scalar RVs. In many models, we want multiple RVs. Users will sometimes try to create lists of RVs, like this:

with pm.Model():
    # bad:
    x = [pm.Normal(f"x_{i}", mu=0, sigma=1) for i in range(10)]

This works, but it is slow and not recommended. Instead, we can use coordinates:

coords={"cities":["Santiago","Mumbai","Tokyo"]}withpm.Model(coords=coords)asmodel:# good:x=pm.Normal("x",mu=0,sigma=1,dims="cities")

x is now an array of length 3, and each of the 3 variables within this array is associated with a label. This will make it very easy to distinguish the 3 different variables when we go to look at results. We can index into this array or do linear algebra operations on it:

with model:
    y = x[0] * x[1]  # indexing is supported
    x.dot(x.T)  # linear algebra is supported

Initialize Random Variables#

Though PyMC automatically initializes models, it is sometimes helpful to define initial values for RVs. This can be done via the initval kwarg:

withpm.Model(coords={"idx":np.arange(5)})asmodel:x=pm.Normal("x",mu=0,sigma=1,dims="idx")model.initial_point()
{'x': array([0., 0., 0., 0., 0.])}
withpm.Model(coords={"idx":np.arange(5)})asmodel:x=pm.Normal("x",mu=0,sigma=1,dims="idx",initval=rng.standard_normal(5))model.initial_point()
{'x': array([-0.36012097, -0.16168135,  1.07485641, -0.08855632, -0.03857412])}

This technique is sometimes useful when trying to identify problems with model specification or initialization.

3. Inference#

Once we have defined our model, we have to perform inference to approximate the posterior distribution. PyMC supports two broad classes of inference: sampling and variational inference.

3.1 Sampling#

The main entry point to MCMC sampling algorithms is the pm.sample() function. By default, this function tries to auto-assign the right sampler(s). pm.sample() returns an arviz.InferenceData object. InferenceData objects can easily be saved to and loaded from a file and can carry additional (meta)data such as date/version and posterior predictive samples. Take a look at the ArviZ Quickstart to learn more.

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    obs = pm.Normal("obs", mu=mu, sigma=1, observed=rng.standard_normal(100))

    idata = pm.sample(2000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [mu]
100.00% [6000/6000 00:03<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 4 seconds.
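As a quick sketch of the save/load round trip mentioned above (the file name is ours), an InferenceData object can be written to and read back from a NetCDF file:

idata.to_netcdf("quickstart_trace.nc")  # persist the result to disk
idata_reloaded = az.from_netcdf("quickstart_trace.nc")  # ...and load it back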

As the sampling output above shows, for a model that exclusively contains continuous variables, PyMC assigns the NUTS sampler, which is very efficient even for complex models. PyMC also runs initial tuning to find good starting parameters for the sampler. Here we draw 2000 samples from the posterior in each chain and allow the sampler to adjust its parameters in an additional 1000 tuning iterations.

If not set via the chains kwarg, the number of chains is determined from the number of available CPU cores.

idata.posterior.dims
Frozen({'chain': 2, 'draw': 2000})

The tuning samples are discarded by default. With discard_tuned_samples=False they can be kept and end up in a separate group within the InferenceData object (i.e., idata.warmup_posterior).
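A minimal sketch of this, reusing the model defined above (idata_w is our own name):

with model:
    idata_w = pm.sample(discard_tuned_samples=False)

idata_w.warmup_posterior  # the tuning draws end up in this extra group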

You can control how the chains are run in parallel using the chains and cores kwargs:

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    obs = pm.Normal("obs", mu=mu, sigma=1, observed=rng.standard_normal(100))

    idata = pm.sample(cores=4, chains=6)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (6 chains in 4 jobs)
NUTS: [mu]
100.00% [12000/12000 00:07<00:00 Sampling 6 chains, 0 divergences]
Sampling 6 chains for 1_000 tune and 1_000 draw iterations (6_000 + 6_000 draws total) took 7 seconds.
idata.posterior["mu"].shape
(6, 1000)
# get values of a single chain
idata.posterior["mu"].sel(chain=2).shape
(1000,)

3.2 Analyze sampling results#

The most commonly used plot for analyzing sampling results is the so-called trace plot:

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    sd = pm.HalfNormal("sd", sigma=1)
    obs = pm.Normal("obs", mu=mu, sigma=sd, observed=rng.standard_normal(100))

    idata = pm.sample()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [mu, sd]
100.00% [4000/4000 00:03<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 4 seconds.
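The trace plot itself can then be drawn with ArviZ, for example:

az.plot_trace(idata);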

Another common metric to look at is the Gelman-Rubin statistic, or R-hat:

az.summary(idata)
      mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
mu  -0.019  0.096  -0.193    0.163      0.002    0.002    1608.0    1343.0    1.0
sd   0.967  0.069   0.835    1.089      0.002    0.001    1836.0    1406.0    1.0

R-hat is also shown as part of the az.plot_forest output:

az.plot_forest(idata, r_hat=True);

Finally, for a plot of the posterior that is inspired by [Kruschke, 2014], you can use az.plot_posterior().
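A minimal call, reusing the idata from the run above, could look like this:

az.plot_posterior(idata, var_names=["mu", "sd"]);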

For high-dimensional models it becomes cumbersome to look at the traces for all parameters. When using NUTS we can look at the energy plot to assess problems of convergence:

withpm.Model(coords={"idx":np.arange(100)})asmodel:x=pm.Normal("x",mu=0,sigma=1,dims="idx")idata=pm.sample()az.plot_energy(idata);
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [x]
100.00% [4000/4000 00:04<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 5 seconds.

For more information on sampler stats and the energy plot, see Sampler Statistics. For more information on identifying sampling problems and what to do about them, see Diagnosing Biased Inference with Divergences.

3.3 Variational inference#

PyMC supports various Variational Inference techniques. While these methods are much faster, they are often also less accurate and can lead to biased inference. The main entry point is pymc.fit().

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    sd = pm.HalfNormal("sd", sigma=1)
    obs = pm.Normal("obs", mu=mu, sigma=sd, observed=rng.standard_normal(100))

    approx = pm.fit()
100.00% [10000/10000 00:00<00:00 Average Loss = 142.01]
Finished [100%]: Average Loss = 142

The returned Approximation object has various capabilities, like drawing samples from the approximated posterior, which we can analyse like a regular sampling run:

idata = approx.sample(1000)
az.summary(idata)
arviz - WARNING - Shape validation failed: input_shape: (1, 1000), minimum_shape: (chains=2, draws=4)
      mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
mu  -0.023  0.169  -0.338    0.296      0.005    0.004     973.0     880.0    NaN
sd   0.989  0.158   0.694    1.262      0.005    0.004     972.0    1026.0    NaN

The variational submodule offers a lot of flexibility in which VI method to use and follows an object-oriented design. For example, full-rank ADVI estimates a full covariance matrix:

mu = pm.floatX([0.0, 0.0])
cov = pm.floatX([[1, 0.5], [0.5, 1.0]])
with pm.Model(coords={"idx": np.arange(2)}) as model:
    pm.MvNormal("x", mu=mu, cov=cov, dims="idx")
    approx = pm.fit(method="fullrank_advi")
100.00% [10000/10000 00:03<00:00 Average Loss = 0.013113]
Finished [100%]: Average Loss = 0.012772
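One way to check that the off-diagonal structure is recovered (a sketch of ours, not part of the original cells) is to draw samples from the approximation and compute their empirical covariance:

idata_fr = approx.sample(5000)
x_samples = idata_fr.posterior["x"].stack(sample=("chain", "draw")).values  # shape (2, n_samples)
np.cov(x_samples)  # should be close to the cov matrix defined above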

An equivalent expression using the object-oriented interface is:

withpm.Model(coords={"idx":np.arange(2)})asmodel:pm.MvNormal("x",mu=mu,cov=cov,dims="idx")approx=pm.FullRankADVI().fit()
100.00% [10000/10000 00:03<00:00 Average Loss = 0.020591]
Finished [100%]: Average Loss = 0.020531
withpm.Model(coords={"idx":np.arange(2)})asmodel:pm.MvNormal("x",mu=mu,cov=cov,dims="idx")approx=pm.FullRankADVI().fit()
100.00% [10000/10000 00:03<00:00 Average Loss = 0.014234]
Finished [100%]: Average Loss = 0.014143
plt.figure()
idata = approx.sample(10000)
az.plot_pair(idata, var_names="x", coords={"idx": [0, 1]});

Stein Variational Gradient Descent (SVGD) uses particles to estimate the posterior:

w = pm.floatX([0.2, 0.8])
mu = pm.floatX([-0.3, 0.5])
sd = pm.floatX([0.1, 0.1])

with pm.Model() as model:
    pm.NormalMixture("x", w=w, mu=mu, sigma=sd)
    approx = pm.fit(method=pm.SVGD(n_particles=200, jitter=1.0))
100.00% [10000/10000 01:20<00:00]
with pm.Model() as model:
    pm.NormalMixture("x", w=[0.2, 0.8], mu=[-0.3, 0.5], sigma=[0.1, 0.1])
plt.figure()
idata = approx.sample(10000)
az.plot_dist(idata.posterior["x"]);

For more information on variational inference, see Variational Inference.

4. Posterior Predictive Sampling#

The sample_posterior_predictive() function performs prediction on hold-out data and posterior predictive checks.

data = rng.standard_normal(100)
with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    sd = pm.HalfNormal("sd", sigma=1)
    obs = pm.Normal("obs", mu=mu, sigma=sd, observed=data)

    idata = pm.sample()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [mu, sd]
100.00% [4000/4000 00:03<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 4 seconds.
with model:
    idata.extend(pm.sample_posterior_predictive(idata))
100.00% [2000/2000 00:00<00:00]
fig, ax = plt.subplots()
az.plot_ppc(idata, ax=ax)
ax.axvline(data.mean(), ls="--", color="r", label="True mean")
ax.legend(fontsize=10);

4.1 Predicting on hold-out data#

In many cases you want to predict on unseen / hold-out data. This is especially relevant in Probabilistic Machine Learning and Bayesian Deep Learning. PyMC includes a pm.MutableData container to help with such uses. It is a wrapper around a pytensor.shared variable and allows the values of the data to be changed later. Otherwise, pm.MutableData objects can be used just like any other numpy array or tensor.

This distinction is significant since internally all models in PyMC are giant symbolic expressions. When you pass raw data directly into a model, you are giving PyTensor permission to treat this data as a constant and optimize it away if doing so makes sense. If you need to change this data later you may not have any way to point at it within the larger symbolic expression. Using pm.MutableData offers a way to point to a specific place in the symbolic expression and change what is there.

x = rng.standard_normal(100)
y = x > 0

coords = {"idx": np.arange(100)}
with pm.Model() as model:
    # create shared variables that can be changed later on
    x_obs = pm.MutableData("x_obs", x, dims="idx")
    y_obs = pm.MutableData("y_obs", y, dims="idx")

    coeff = pm.Normal("x", mu=0, sigma=1)
    logistic = pm.math.sigmoid(coeff * x_obs)
    pm.Bernoulli("obs", p=logistic, observed=y_obs, dims="idx")

    idata = pm.sample()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [x]
100.00% [4000/4000 00:03<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 3 seconds.

Now assume we want to predict on unseen data. For this we have to change the values of x_obs and y_obs. In principle we don't need to set y_obs, since it is what we want to predict, but it has to match the shape of x_obs.

with model:
    # change the value and shape of the data
    pm.set_data(
        {
            "x_obs": [-1, 0, 1.0],
            # use dummy values with the same shape:
            "y_obs": [0, 0, 0],
        },
        coords={"idx": [1001, 1002, 1003]},
    )

    idata.extend(pm.sample_posterior_predictive(idata))
100.00% [2000/2000 00:00<00:00]
idata.posterior_predictive["obs"].mean(dim=["draw", "chain"])
<xarray.DataArray 'obs' (idx: 3)>
array([0.0215, 0.488 , 0.982 ])
Coordinates:
  * idx      (idx) int64 1001 1002 1003

References#

[Kruschke, 2014]

John Kruschke. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, 2014.

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor,aeppl,xarray
Last updated: Fri Jun 03 2022

Python implementation: CPython
Python version       : 3.9.13
IPython version      : 8.4.0

pytensor: 2.6.2
aeppl   : 0.0.31
xarray  : 2022.3.0

arviz     : 0.12.1
numpy     : 1.22.4
pymc      : 4.0.0b6
pytensor  : 2.6.2
matplotlib: 3.5.2

Watermark: 2.3.1

License notice#

All the notebooks in this example gallery are provided under the MIT License, which allows modification and redistribution for any use provided the copyright and license notices are preserved.

Citing PyMC examples#

To cite this notebook, use the DOI provided by Zenodo for the pymc-examples repository.

Important

Many notebooks are adapted from other sources: blogs, books… In such cases you should cite the original source as well.

Also remember to cite the relevant libraries used by your code.

Here is a citation template in BibTeX:

@incollection{citekey,
  author    = "<notebook authors, see above>",
  title     = "<notebook title>",
  editor    = "PyMC Team",
  booktitle = "PyMC examples",
  doi       = "10.5281/zenodo.5654871"
}

which once rendered could look like: