- Notifications
You must be signed in to change notification settings - Fork6
A backend for storing MCMC draws.
License
pymc-devs/mcbackend
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Where doyou want to store your MCMC draws?In memory?On disk?Or in a database running in a datacenter?
No matter where you want to put them, or which PPL generates them: McBackend takes care of your MCMC samples.
Themcbackend package consists of three parts:
No matter which programming language your favorite PPL is written in, theProtocolBuffers from McBackend can be used to generate code in languages like C++, C#, Python and many more to represent commonly used metadata about MCMC runs, chains and model variables.
The definitions inprotobufs/meta.proto are designed to maximize compatibility withArviZ objects, making it easy to transform MCMC draws stored according to the McBackend schema toInferenceData objects for plotting & analysis.
Thedraws andstats created by MCMC sampling algorithms at runtime need to be storedsomewhere.
This "somewhere" is called the storagebackend in PPLs/MCMC frameworks likePyMC oremcee.
Most storage backends must be initialized with metadata about the model variables so they can, for example, pre-allocated memory for thedraws andstats they're about to receive.After then receiving thousands ofdraws andstats they must then provide methods by which thedraws/stats can be retrieved.
Themcbackend.core module has classes such asBackend,Run, andChain to define these interfaces for any storage backend, no matter if it's an in-memory, filesystem or database storage.Albeit this implementation is currently Python-only, the interface signature should be portable to e.g. C++.
Viamcbackend.backends the McBackend package then provides backendimplementations.Currently you may choose from:
backend=mcbackend.NumPyBackend()backend=mcbackend.ClickHouseBackend(client=clickhouse_driver.Client("localhost") )# All that matters:isinstance(backend,mcbackend.Backend)# >>> True
Anything that is aBackend can be wrapped by anadapter that makes it compatible with your favorite PPL.
In the example below, aClickHouseBackend is initialized to store MCMC draws from a PyMC model in aClickHouse database.See below forhow to run it in Docker.
importclickhouse_driverimportmcbackendimportpymcaspm# 1. Create _any_ kind of backendch_client=clickhouse_driver.Client("localhost")backend=mcbackend.ClickHouseBackend(ch_client)withpm.Model():# 2. Create your model ...# 3. Hit the inference button ™ while passing the backend!pm.sample(trace=backend)
In case of PyMC the adapter lives in the PyMC codebasesince version 5.1.1,so all you need to do is pass anymcbackend.Backend via thepm.sample(trace=...) parameter!
Instead of using PyMC's built-in NumPy backend, the MCMC draws now end up in ClickHouse.
Continuing the example from above we can now retrieve draws from the backend.
Note that since this example wrote the draws to ClickHouse, we could run the code below on another machine, and even while the above model is still sampling!
backend=mcbackend.ClickHouseBackend(ch_client)# Fetch the run from the database (downloads just metadata)run=backend.get_run(trace.run_id)# Get all draws from a chainchain=run.get_chains()[0]chain.get_draws("my favorite variable")# >>> array([ ... ])# Convert everything to `InferenceData`idata=run.to_inferencedata()print(idata)# >>> Inference data with groups:# >>> > posterior# >>> > sample_stats# >>> > observed_data# >>> > constant_data# >>># >>> Warmup iterations saved (warmup_*).
McBackend just started and is looking for contributions.For example:
- Schema discussion: Which metadata is needed? (related:PyMC #5160)
- Interface discussion: How should
Backend/Run/Chainevolve? - Python Backends for disk storage (HDF5,
*.proto, ...) - C++
Backend/Run/Chaininterfaces - C++ ClickHouse backend (via
clickhouse-cpp)
As the schema and API stabilizes a mid-term goal might be to replace PyMCBaseTrace/MultiTrace entirely to rely onmcbackend.
Getting rid ofMultiTrace was along-term goal behind makingpm.sample(return_inferencedata=True) the default.
First clone the repository and set up a development environment containing the protobuf compiler.
mamba create -n mcb python=3.11 grpcio-tools protobuf -yactivate mcbpip install -r requirements-dev.txtpip install --pre"betterproto[compiler]"pip install -e.
To compile the*.proto files for languages other than Python, check theProtocolBuffers documentation.
The following script compiles them for Python using thebetterproto compiler plugin to get nice-looking dataclasses.It also copies the generated files to the right place inmcbackend.
python protobufs/generate.pypre-commit run --all
To run the tests:
pytest -v
Some tests need a ClickHouse database server running locally.To start one in Docker:
docker run --detach --rm --name mcbackend-db -p 9000:9000 --ulimit nofile=262144:262144 clickhouse/clickhouse-server
About
A backend for storing MCMC draws.
Resources
License
Code of conduct
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Uh oh!
There was an error while loading.Please reload this page.
Contributors7
Uh oh!
There was an error while loading.Please reload this page.