A backend for storing MCMC draws.


pymc-devs/mcbackend


Where do you want to store your MCMC draws? In memory? On disk? Or in a database running in a datacenter?

No matter where you want to put them, or which PPL generates them: McBackend takes care of your MCMC samples.

Quickstart

The mcbackend package consists of three parts:

Part 1: A schema for MCMC run & chain metadata

No matter which programming language your favorite PPL is written in, the Protocol Buffers definitions from McBackend can be used to generate code in languages like C++, C#, Python and many more to represent commonly used metadata about MCMC runs, chains and model variables.

The definitions in protobufs/meta.proto are designed to maximize compatibility with ArviZ objects, making it easy to transform MCMC draws stored according to the McBackend schema into InferenceData objects for plotting & analysis.
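To give a feel for the kind of metadata involved, here is a minimal sketch using plain Python dataclasses. The names `Variable`, `RunMeta`, `name`, `dtype`, `shape`, `dims` and `rid` are assumptions for illustration and are not copied from `protobufs/meta.proto`. The sketch shows why per-variable shapes and dimension names matter for ArviZ compatibility: ArviZ labels posterior arrays with leading `chain` and `draw` dimensions followed by the variable's own dimensions, and the fallback `<name>_dim_<i>` mimics ArviZ's default naming.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Variable:
    """Hypothetical per-variable metadata (not the actual meta.proto message)."""
    name: str
    dtype: str
    shape: Tuple[int, ...] = ()
    dims: Tuple[str, ...] = ()  # one dimension name per axis of `shape`


@dataclass
class RunMeta:
    """Hypothetical run-level metadata: a run ID plus the model variables."""
    rid: str
    variables: List[Variable] = field(default_factory=list)


def arviz_dims(meta: RunMeta) -> Dict[str, List[str]]:
    """Map each variable to the full dimension labels of its posterior array."""
    result = {}
    for var in meta.variables:
        if var.dims and len(var.dims) != len(var.shape):
            raise ValueError(f"{var.name}: dims and shape disagree")
        # Fall back to ArviZ-style default names when no dims were given.
        own = list(var.dims) or [f"{var.name}_dim_{i}" for i in range(len(var.shape))]
        # Draws always carry leading (chain, draw) dimensions.
        result[var.name] = ["chain", "draw", *own]
    return result


meta = RunMeta(
    rid="run-001",
    variables=[
        Variable("mu", "float64"),
        Variable("beta", "float64", shape=(3,), dims=("coef",)),
    ],
)
print(arviz_dims(meta))
# {'mu': ['chain', 'draw'], 'beta': ['chain', 'draw', 'coef']}
```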

Part 2: A storage backend interface

The draws and stats created by MCMC sampling algorithms at runtime need to be stored somewhere.

This "somewhere" is called the storage backend in PPLs/MCMC frameworks like PyMC or emcee.

Most storage backends must be initialized with metadata about the model variables so they can, for example, pre-allocate memory for the draws and stats they're about to receive. After receiving thousands of draws and stats, they must then provide methods by which the draws/stats can be retrieved.
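The initialize, append, retrieve life cycle described above can be sketched in a few lines of standard-library Python. `PreallocatedChain` and its methods are hypothetical names for this illustration, draws are kept as flat lists of floats to avoid a NumPy dependency, and doubling the capacity once the pre-allocated slots are used up is just one common strategy, not necessarily what any McBackend backend does.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple


class PreallocatedChain:
    """Hypothetical in-memory store that pre-allocates one slot per expected draw."""

    def __init__(self, shapes: Dict[str, Tuple[int, ...]], capacity: int = 100):
        # Number of scalar values per draw (prod(()) == 1 for scalars).
        self._sizes = {name: prod(shape) for name, shape in shapes.items()}
        self._n = 0  # number of draws appended so far
        # Pre-allocate `capacity` zero-filled slots per variable;
        # each slot holds the flattened values of one draw.
        self._data: Dict[str, List[List[float]]] = {
            name: [[0.0] * size for _ in range(capacity)]
            for name, size in self._sizes.items()
        }

    def append(self, draw: Dict[str, Sequence[float]]) -> None:
        # Validate everything first so a bad draw leaves the store unchanged.
        if set(draw) != set(self._sizes):
            raise KeyError("draw must contain exactly the initialized variables")
        for name, values in draw.items():
            if len(values) != self._sizes[name]:
                raise ValueError(f"{name}: expected {self._sizes[name]} values")
        for name, values in draw.items():
            slots = self._data[name]
            if self._n == len(slots):
                # Pre-allocated room is used up: double the capacity (at least +1).
                extra = max(len(slots), 1)
                slots.extend([0.0] * self._sizes[name] for _ in range(extra))
            slots[self._n] = list(values)
        self._n += 1

    def get_draws(self, name: str) -> List[List[float]]:
        # Return only the slots that were actually filled.
        return self._data[name][: self._n]


chain = PreallocatedChain({"mu": (), "beta": (2,)}, capacity=2)
for i in range(3):  # the third append exceeds the initial capacity
    chain.append({"mu": [float(i)], "beta": [float(i), -float(i)]})
print(chain.get_draws("mu"))  # [[0.0], [1.0], [2.0]]
```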

The mcbackend.core module has classes such as Backend, Run, and Chain to define these interfaces for any storage backend, no matter if it's an in-memory, filesystem or database storage. Although this implementation is currently Python-only, the interface signature should be portable to e.g. C++.
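A stripped-down sketch of such a Backend/Run/Chain hierarchy can be written with Python's `abc` module. The method names `init_run`, `init_chain`, `append`, `get_run`, `get_chains` and `get_draws` are assumptions loosely modeled on the calls used elsewhere in this README; they are not the full `mcbackend.core` signatures. The point is that any storage, here plain dicts in memory, can satisfy the same abstract contract.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


# Simplified, hypothetical abstract interfaces (not mcbackend.core itself).
class Chain(ABC):
    @abstractmethod
    def append(self, draw: Dict[str, float]) -> None: ...

    @abstractmethod
    def get_draws(self, var_name: str) -> List[float]: ...


class Run(ABC):
    @abstractmethod
    def init_chain(self, chain_number: int) -> Chain: ...

    @abstractmethod
    def get_chains(self) -> List[Chain]: ...


class Backend(ABC):
    @abstractmethod
    def init_run(self, run_id: str) -> Run: ...

    @abstractmethod
    def get_run(self, run_id: str) -> Run: ...


# A toy implementation that keeps everything in Python dicts and lists.
class DictChain(Chain):
    def __init__(self):
        self._draws: Dict[str, List[float]] = {}

    def append(self, draw):
        for name, value in draw.items():
            self._draws.setdefault(name, []).append(value)

    def get_draws(self, var_name):
        return list(self._draws.get(var_name, []))


class DictRun(Run):
    def __init__(self):
        self._chains: Dict[int, DictChain] = {}

    def init_chain(self, chain_number):
        self._chains[chain_number] = DictChain()
        return self._chains[chain_number]

    def get_chains(self):
        return [self._chains[k] for k in sorted(self._chains)]


class DictBackend(Backend):
    def __init__(self):
        self._runs: Dict[str, DictRun] = {}

    def init_run(self, run_id):
        self._runs[run_id] = DictRun()
        return self._runs[run_id]

    def get_run(self, run_id):
        return self._runs[run_id]


backend = DictBackend()
chain = backend.init_run("demo").init_chain(0)
chain.append({"mu": 0.5})
print(isinstance(backend, Backend))  # True
print(backend.get_run("demo").get_chains()[0].get_draws("mu"))  # [0.5]
```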

Via mcbackend.backends the McBackend package then provides backend implementations. Currently you may choose from:

```python
import clickhouse_driver
import mcbackend

backend = mcbackend.NumPyBackend()
backend = mcbackend.ClickHouseBackend(
    client=clickhouse_driver.Client("localhost")
)

# All that matters:
isinstance(backend, mcbackend.Backend)
# >>> True
```

Part 3: PPL adapters

Anything that is a Backend can be wrapped by an adapter that makes it compatible with your favorite PPL.

In the example below, a ClickHouseBackend is initialized to store MCMC draws from a PyMC model in a ClickHouse database. See the Development section below for how to run ClickHouse in Docker.

```python
import clickhouse_driver
import mcbackend
import pymc as pm

# 1. Create _any_ kind of backend
ch_client = clickhouse_driver.Client("localhost")
backend = mcbackend.ClickHouseBackend(ch_client)

with pm.Model():
    # 2. Create your model
    ...
    # 3. Hit the inference button ™ while passing the backend!
    pm.sample(trace=backend)
```

In the case of PyMC, the adapter lives in the PyMC codebase since version 5.1.1, so all you need to do is pass any mcbackend.Backend via the pm.sample(trace=...) parameter!

Instead of using PyMC's built-in NumPy backend, the MCMC draws now end up in ClickHouse.
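To make the adapter idea concrete, here is a toy sketch of what an adapter does: it drives a sampler and translates each iteration's state into draws and stats that it hands to the storage. Everything in it is hypothetical (`ListRun`, `ListChain`, `sample_with`, and the `init_chain`/`append` method names), and the small random-walk Metropolis sampler merely stands in for a real PPL; PyMC's actual adapter is considerably more involved.

```python
import math
import random
from typing import Dict, List, Protocol


class ChainLike(Protocol):
    def append(self, draw: Dict[str, float], stats: Dict[str, object]) -> None: ...


class RunLike(Protocol):
    def init_chain(self, chain_number: int) -> ChainLike: ...


class ListChain:
    """Toy storage: keeps one dict of draws and one dict of stats per iteration."""

    def __init__(self):
        self.draws: List[Dict[str, float]] = []
        self.stats: List[Dict[str, object]] = []

    def append(self, draw, stats):
        self.draws.append(draw)
        self.stats.append(stats)


class ListRun:
    def __init__(self):
        self.chains: Dict[int, ListChain] = {}

    def init_chain(self, chain_number):
        self.chains[chain_number] = ListChain()
        return self.chains[chain_number]


def sample_with(run: RunLike, n_draws: int, chain_number: int = 0, seed: int = 0) -> None:
    """Random-walk Metropolis on a standard normal, writing every iteration to `run`."""

    def logp(v: float) -> float:
        return -0.5 * v * v  # unnormalized log-density of N(0, 1)

    rng = random.Random(seed)
    chain = run.init_chain(chain_number)
    x = 0.0
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, 1.0)
        # 1 - random() lies in (0, 1], so the log is always defined.
        accepted = math.log(1.0 - rng.random()) < logp(proposal) - logp(x)
        if accepted:
            x = proposal
        # The "adapter" part: translate sampler state into draws + stats.
        chain.append({"x": x}, {"accepted": accepted})


run = ListRun()
sample_with(run, n_draws=500)
draws = [d["x"] for d in run.chains[0].draws]
acceptance_rate = sum(s["accepted"] for s in run.chains[0].stats) / len(draws)
print(f"{len(draws)} draws, acceptance rate {acceptance_rate:.2f}")
```

Because `sample_with` only relies on the two small protocols, any storage object with matching `init_chain` and `append` methods could be swapped in without touching the sampler.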

Retrieving the draws & stats

Continuing the example from above we can now retrieve draws from the backend.

Note that since this example wrote the draws to ClickHouse, we could run the code below on another machine, and even while the above model is still sampling!

```python
backend = mcbackend.ClickHouseBackend(ch_client)

# Fetch the run from the database (downloads just metadata).
# `trace.run_id` stands for the ID of the run written by the sampling example above.
run = backend.get_run(trace.run_id)

# Get all draws from a chain
chain = run.get_chains()[0]
chain.get_draws("my favorite variable")
# >>> array([ ... ])

# Convert everything to `InferenceData`
idata = run.to_inferencedata()
print(idata)
# >>> Inference data with groups:
# >>>     > posterior
# >>>     > sample_stats
# >>>     > observed_data
# >>>     > constant_data
# >>>
# >>> Warmup iterations saved (warmup_*).
```
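Conceptually, turning a run into InferenceData means stacking the draws of each variable from all chains into one array with leading (chain, draw) dimensions. The dependency-free sketch below shows only that reshaping step with nested lists. `stack_chains` is a hypothetical helper, not an mcbackend function, and truncating to the shortest chain (handy when reading while sampling is still in progress) is an assumed policy, not documented McBackend behavior.

```python
from typing import List, Sequence


def stack_chains(chains: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack per-chain draws of one scalar variable into a (chain, draw) nested list.

    Chains of unequal length (e.g. read while sampling is still running)
    are truncated to the shortest one so the result is rectangular.
    """
    if not chains:
        return []
    n_draws = min(len(c) for c in chains)
    return [list(c[:n_draws]) for c in chains]


chain_0 = [0.1, 0.3, 0.2, 0.4]
chain_1 = [0.5, 0.6, 0.7]  # this chain is one draw behind
posterior_mu = stack_chains([chain_0, chain_1])
print(len(posterior_mu), len(posterior_mu[0]))  # 2 3
print(posterior_mu)  # [[0.1, 0.3, 0.2], [0.5, 0.6, 0.7]]
```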

Contributing what's next

McBackend just started and is looking for contributions. For example:

  • Schema discussion: Which metadata is needed? (related: PyMC #5160)
  • Interface discussion: How should Backend/Run/Chain evolve?
  • Python backends for disk storage (HDF5, *.proto, ...)
  • C++ Backend/Run/Chain interfaces
  • C++ ClickHouse backend (via clickhouse-cpp)

As the schema and API stabilize, a mid-term goal might be to replace PyMC's BaseTrace/MultiTrace entirely and rely on mcbackend.

Getting rid of MultiTrace was a long-term goal behind making pm.sample(return_inferencedata=True) the default.

Development

First clone the repository and set up a development environment containing the protobuf compiler.

```bash
mamba create -n mcb python=3.11 grpcio-tools protobuf -y
activate mcb
pip install -r requirements-dev.txt
pip install --pre "betterproto[compiler]"
pip install -e .
```

To compile the *.proto files for languages other than Python, check the Protocol Buffers documentation.

The following script compiles them for Python using the betterproto compiler plugin to get nice-looking dataclasses. It also copies the generated files to the right place in mcbackend.

```bash
python protobufs/generate.py
pre-commit run --all
```
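For orientation, a Python compile step of this kind could look roughly like the hypothetical sketch below. It builds a `grpc_tools.protoc` command line that uses betterproto's `--python_betterproto_out` plugin option (available once `betterproto[compiler]` is installed). The output directory `build/generated` and the file list are placeholders; this is not the contents of `protobufs/generate.py`, and the command is only executed when `run=True`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def protoc_command(proto_dir: Path, out_dir: Path, protos: List[str]) -> List[str]:
    """Build a grpc_tools.protoc invocation that uses the betterproto plugin."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"--proto_path={proto_dir}",
        f"--python_betterproto_out={out_dir}",
        *[str(proto_dir / p) for p in protos],
    ]


def compile_protos(proto_dir: Path, out_dir: Path, protos: List[str], run: bool = False) -> List[str]:
    """Return the command and, if `run` is True, execute it."""
    cmd = protoc_command(proto_dir, out_dir, protos)
    if run:
        out_dir.mkdir(parents=True, exist_ok=True)  # protoc expects the directory to exist
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: only print the command that would be executed.
cmd = compile_protos(Path("protobufs"), Path("build/generated"), ["meta.proto"])
print(" ".join(cmd[1:]))
```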

To run the tests:

```bash
pytest -v
```

Some tests need a ClickHouse database server running locally. To start one in Docker:

```bash
docker run --detach --rm --name mcbackend-db -p 9000:9000 --ulimit nofile=262144:262144 clickhouse/clickhouse-server
```
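When no server is running, it can be convenient to skip the database-dependent tests instead of letting them fail. One way, sketched below under the assumption that the server listens on the port mapped above (9000), is to probe that port with a plain TCP connection. `port_is_open` and `requires_clickhouse` are hypothetical names, and McBackend's own test suite may handle this differently.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 9000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


# Example use in a test module (requires pytest):
# import pytest
# requires_clickhouse = pytest.mark.skipif(
#     not port_is_open(), reason="ClickHouse server not reachable on localhost:9000"
# )
```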
