Support units of measurement in PyMC#7812

drbenvincent started this conversation in Ideas

drbenvincent
Jun 8, 2025
Collaborator

Many statistical models (especially in scientific, engineering, and health applications) are built around physical quantities that carry units (e.g., meters, seconds, kilograms). Currently, PyMC treats all variables as unitless, which may lead to misinterpretations or errors when combining data with different units or when interpreting model parameters.

Adding optional support for units would:

Improve model readability and transparency
Enable automated unit-checking to catch errors (e.g., adding kg to m)
Facilitate interpretation of parameter estimates and priors

This could be implemented via integration with existing Python libraries like pint.

In general, the idea would be to optionally specify the units. Initially it might be required to specify the units of all data and parameters, and PyMC could help with unit checking to avoid errors.

This would be useful as we could check unit consistency, but also make errors less likely (e.g. expressing slope priors in the wrong units).

It could be very fun to explore unit inference. For example, if you specify the units of the data but not the parameters, and we are regressing weight ~ age where weight is in kg and age is in years, the model could infer that the slope is in units of kg/year and the intercept is in units of kg.
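The inference rule for this regression example can be sketched in a few lines of standard-library Python. Units are represented here as a plain dict of base unit to exponent, and both helper names (`divide_units`, `infer_linear_units`) are hypothetical; a real implementation would presumably lean on pint rather than this toy representation.

```python
from collections import Counter


def divide_units(numerator, denominator):
    """Return numerator / denominator, with units stored as {base_unit: exponent}."""
    result = Counter(numerator)
    result.subtract(denominator)
    # Drop units whose exponents cancelled out.
    return {unit: power for unit, power in result.items() if power != 0}


def infer_linear_units(y_units, x_units):
    """Infer parameter units for y = intercept + slope * x."""
    return {
        "intercept": dict(y_units),                 # must match y so the sum is valid
        "slope": divide_units(y_units, x_units),    # y per unit of x
    }


units = infer_linear_units({"kg": 1}, {"year": 1})
print(units)
```

Here the slope comes out as kg with exponent 1 and year with exponent -1, i.e. kg/year, and the intercept as kg.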

I'll leave it there - this is an initial proposal which is intended to spark discussion.


ricardoV94
Jun 8, 2025
Maintainer

Code examples?


drbenvincent Jun 8, 2025
Collaborator Author

Something vaguely like this?

    import pint
    import pymc as pm

    ureg = pint.UnitRegistry()

    age = pm.Data("age", [10, 20, 30, 40] * ureg.year)
    weight = pm.Data("weight", [50, 60, 65, 67] * ureg.kg)

    with pm.Model() as model:
        intercept = pm.Normal("intercept", mu=1.2 * ureg.kg, sigma=0.2 * ureg.kg)
        beta = pm.Normal("beta", mu=0.5 * ureg.kg / ureg.year, sigma=0.1 * ureg.kg / ureg.year)
        mu = pm.Deterministic("mu", intercept + beta * age)  # PyMC infers mu should be in kg
        pm.Normal("obs", mu=mu, sigma=0.5 * ureg.kg, observed=weight)

PyMC does unit checks and throws errors if there are incompatibilities
PyMC optionally infers units of any nodes where units are not provided, or throws an error if it is not possible, asking for units of more variables.
In theory I guess you could allow the intercept's mu to be provided in kg and the sigma in another weight unit and auto-convert, but perhaps emit a warning.
Units would be incorporated in the idata
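The auto-convert-with-a-warning idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Mass` class, the `TO_KG` table, and `normal_prior_params` are hypothetical stand-ins, and a real version would delegate conversion to pint and hand the resulting magnitudes to `pm.Normal`.

```python
import warnings
from dataclasses import dataclass

# Conversion factors to kilograms (toy table; pint would supply these in practice).
TO_KG = {"kg": 1.0, "g": 1e-3, "mg": 1e-6}


@dataclass(frozen=True)
class Mass:
    magnitude: float
    unit: str

    def in_unit(self, unit):
        """Return the same mass expressed in another unit."""
        factor = TO_KG[self.unit] / TO_KG[unit]
        return Mass(self.magnitude * factor, unit)


def normal_prior_params(mu, sigma):
    """Return (mu, sigma) magnitudes in mu's unit, warning if sigma was converted."""
    if sigma.unit != mu.unit:
        warnings.warn(
            f"sigma given in {sigma.unit}, converting to {mu.unit}",
            UserWarning,
            stacklevel=2,
        )
        sigma = sigma.in_unit(mu.unit)
    return mu.magnitude, sigma.magnitude


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mu_kg, sigma_kg = normal_prior_params(Mass(1.2, "kg"), Mass(200.0, "g"))

print(mu_kg, sigma_kg, len(caught))
```

Whether a silent conversion, a warning, or a hard error is the right default is exactly the design question raised above; the sketch only shows that the warning variant is cheap to implement.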

Armavica Jun 13, 2025
Maintainer

A few thoughts:

"PyMC optionally infers units of any nodes where units are not provided": I am not sure how this would work. How would you distinguish a dimensionless variable from a variable with unspecified units?
"PyMC infers mu should be in kg": would there be a way to impose that? Something like pm.Deterministic("mu", [...], unit=ureg.kg) that would throw an error if the expression is incompatible?
Perhaps it could also allow unit=ureg.gram, which is compatible because it's also a mass, and make the conversion transparently?


ErikRingen
Jun 13, 2025

Chiming in here to say that I really like the idea of explicit units. Mis-managing units is a really common error in data analysis, leading to mistakes in published papers (a couple off the top of my head: https://www.pnas.org/doi/10.1073/pnas.1900438116, https://www.sciencedirect.com/science/article/pii/S004565352402811X).


jessegrabowski
Jun 13, 2025
Maintainer

I could have sworn there was another discussion thread on this somewhere else started by @williambdean where I put some thoughts on this, but I can't find it now.

First, I love this idea, and I would like to have it. I think there's a ton of powerful stuff we can do with automatic reparameterization if we know units and we know conversions between the units a scientist wants to "think" in and units that are naturally more compatible for sampling. These could form the basis for RV transformations (with appropriate Jacobian correction), the same way we handle sampling RVs that don't live on R+.
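To make the Jacobian point concrete, here is the simplest possible case: a linear change of units from kilograms to grams, written with only the standard library. It is a sketch of the underlying arithmetic, not of how PyMC or PyTensor would implement it, and the function names are invented for the example.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


G_PER_KG = 1000.0


def logp_in_grams(y_g, mu_kg, sigma_kg):
    """Log density of a value in grams when the prior was stated in kilograms."""
    x_kg = y_g / G_PER_KG
    log_jacobian = -math.log(G_PER_KG)  # |dx/dy| = 1 / 1000
    return normal_logpdf(x_kg, mu_kg, sigma_kg) + log_jacobian


y_g = 1350.0
via_jacobian = logp_in_grams(y_g, mu_kg=1.2, sigma_kg=0.2)
direct = normal_logpdf(y_g, mu=1200.0, sigma=200.0)

print(via_jacobian, direct)
```

The two numbers agree up to floating-point error: evaluating the kilogram prior at the converted value and adding the log-Jacobian term gives the same log density as restating the prior directly in grams.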

That said, I think it's something that should be developed on top of PyTensor first. We really want to be able to reason graphically about metadata. I had been thinking mostly about mathematical properties like "strictly positive" or "real", or matrix structure like "lower triangular", "banded", "block diagonal". But I think units also fit very naturally into this structure, and it's an incredibly exciting direction to go in.
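A toy version of "reasoning graphically about metadata" might look like the following. The `Node` class and `is_positive` function are hypothetical and deliberately minimal; they are not PyTensor's API, and they only track a single property ("strictly positive") over real-valued inputs. Units could in principle be propagated through the same kind of traversal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node; this is not PyTensor's actual API."""
    op: str
    inputs: tuple = ()
    positive: bool = False  # declared metadata, only meaningful for leaf nodes


def is_positive(node):
    """Propagate the 'strictly positive' property through the graph."""
    if not node.inputs:
        return node.positive
    if node.op == "exp":
        return True  # exp of any real input is strictly positive
    if node.op in ("add", "mul"):
        return all(is_positive(i) for i in node.inputs)
    return False  # unknown op: make no claim


x = Node("input")                      # unconstrained real
sigma = Node("input", positive=True)   # declared strictly positive

scaled = Node("mul", (sigma, Node("exp", (x,))))
shifted = Node("add", (x, sigma))

print(is_positive(scaled), is_positive(shifted))
```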
