LockingVarInfo #1078


Draft: mhauru wants to merge 29 commits into main from mhauru/lockingvarinfo

Conversation

@mhauru (Member) commented Oct 20, 2025 (edited)

A sketch of what using locks to implement a thread-safe VarInfo might look like. The idea is summarised by this:

```julia
struct LockingVarInfo{T<:AbstractVarInfo} <: AbstractVarInfo
    inner::Ref{T}
    lock::ReentrantLock
end

function getindex(vi::LockingVarInfo, vn::VarName, dist::Distribution)
    return getindex(vi.inner[], vn, dist)
end

function BangBang.setindex!!(vi::LockingVarInfo, vals, vn::VarName)
    @lock vi.lock begin
        vi.inner[] = BangBang.setindex!!(vi.inner[], vals, vn)
    end
    return vi
end

function link(t::AbstractTransformation, vi::LockingVarInfo, args...)
    return LockingVarInfo(link(t, vi.inner[], args...))
end
```

In other words, all operations that may modify the VarInfo modify `vi.inner` while holding `vi.lock`. Everything else just pipes through to `vi.inner[]`.

I have not thought this through carefully, and this certainly doesn't provide proper thread-safety. However, it's good enough to run the following little benchmark:

```julia
module Bench

using BenchmarkTools
using Distributions
using DynamicPPL
using LinearAlgebra
using Random

Random.seed!(23)

@model function f(dim)
    x ~ MvNormal(fill(0.0, dim), I)
    y = Vector{Float64}(undef, dim)
    Threads.@threads for i in eachindex(x)
        y[i] ~ Normal(x[i])
    end
end

m = f(1_000) | (; y=randn(1_000))
vi = VarInfo(m)
tvi = DynamicPPL.ThreadSafeVarInfo(vi)
lvi = DynamicPPL.LockingVarInfo(vi)
@show getlogjoint(last(DynamicPPL.evaluate_threadunsafe!!(m, tvi)))
@show getlogjoint(last(DynamicPPL.evaluate_threadunsafe!!(m, lvi)))
@btime DynamicPPL.evaluate_threadunsafe!!(m, tvi)
@btime DynamicPPL.evaluate_threadunsafe!!(m, lvi)

end
```

Results on 1 thread:

```
DynamicPPL.logjoint(m, tvi) = -3377.744304597237
DynamicPPL.logjoint(m, lvi) = -3377.744304597237
  383.042 μs (9509 allocations: 353.56 KiB)
  1.049 ms (20531 allocations: 682.55 KiB)
```

Results on 8 threads (on a 10 core laptop):

```
DynamicPPL.logjoint(m, tvi) = -3377.744304597236
DynamicPPL.logjoint(m, lvi) = -3377.744304597236
  95.500 μs (9544 allocations: 357.34 KiB)
  1.923 ms (20566 allocations: 686.33 KiB)
```

In sum: This is bad. Either I've done something very suboptimally, or locks have large overheads.

cc @yebai, @penelopeysm

penelopeysm and others added 29 commits August 8, 2025 11:21
* Implement InitContext
* Fix loading order of modules; move `prefix(::Model)` to model.jl
* Add tests for InitContext behaviour
* Inline `rand(::Distributions.Uniform)`. Note that, apart from being simpler code, Distributions.Uniform also doesn't allow the lower and upper bounds to be exactly equal (but we might like to keep that option open in DynamicPPL, e.g. if the user wants to initialise all values to the same value in linked space).
* Document
* Add a test to check that `init!!` doesn't change linking
* Fix `push!` for VarNamedVector. This should have been changed in #940, but slipped through as the file wasn't listed as one of the changed files.
* Add some line breaks (Co-authored-by: Markus Hauru <markus@mhauru.org>)
* Add the option of no fallback for ParamsInit
* Improve docstrings
* Typo
* `p.default` -> `p.fallback`
* Rename `{Prior,Uniform,Params}Init` -> `InitFrom{Prior,Uniform,Params}`

Co-authored-by: Markus Hauru <markus@mhauru.org>

* Use `varname_leaves` from AbstractPPL instead
* Add changelog entry
* Fix import

…!`, `predict`, `returned`, and `initialize_values` (#984)
* Replace `evaluate_and_sample!!` -> `init!!`
* Use `ParamsInit` for `predict`; remove `setval_and_resample!` and friends
* Use `init!!` for initialisation
* Paper over the `Sampling->Init` context stack (pending removal of SamplingContext)
* Remove SamplingContext from JETExt to avoid triggering `Sampling->Init` pathway
* Remove `predict` on vector of VarInfo
* Fix some tests
* Remove duplicated test
* Simplify context testing
* Rename FooInit -> InitFromFoo
* Fix JETExt
* Fix JETExt properly
* Fix tests
* Improve comments
* Remove duplicated tests
* Docstring improvements (Co-authored-by: Markus Hauru <markus@mhauru.org>)
* Concretise `chain_sample_to_varname_dict` using chain value type
* Clarify testset name
* Re-add comment that shouldn't have vanished
* Fix stale Requires dep
* Fix default_varinfo/initialisation for odd models
* Add comment to src/sampler.jl (Co-authored-by: Markus Hauru <markus@mhauru.org>)

Co-authored-by: Markus Hauru <markus@mhauru.org>

…niform}`, `{tilde_,}assume` (#985)
* Remove `SamplingContext` for good
* Remove `tilde_assume` as well
* Split up tilde_observe!! for Distribution / Submodel
* Tidy up tilde-pipeline methods and docstrings
* Fix tests
* Fix ambiguity
* Add changelog
* Update HISTORY.md (Co-authored-by: Markus Hauru <markus@mhauru.org>)

Co-authored-by: Markus Hauru <markus@mhauru.org>

* Delete del
* Fix a typo
* Add HISTORY entry about del

* setleafcontext(model, ctx) and various other fixes
* Fix a bug
* Add warning for `initial_parameters=...`

* Remove resume_from
* Format
* Fix test

* Enable NamedTuple/Dict initialisation
* Add more tests

* Fix `include_all` for predict
* Fix include_all for predict, some perf improvements

* Replace Metadata.flags with Metadata.trans
* Fix a bug
* Fix a typo
* Fix two bugs
* Rename trans to is_transformed
* Rename islinked to is_transformed, remove duplication

* Change pointwise_logdensities default key type to VarName
* Fix a doctest
@github-actions (Contributor)

DynamicPPL.jl documentation for PR #1078 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1078/

@penelopeysm (Member) commented Oct 20, 2025 (edited)

This is v interesting. Here are my completely unprompted thoughts:

  1. I've never used ReentrantLock before, so I'm not sure whether the drop in performance is to do with this specific method of concurrency or whether it's a concurrency thing in general. The past times I've worked with concurrency, it's always been via a Channel (https://docs.julialang.org/en/v1/base/parallel/#Base.Channel). You can queue (or buffer) up to N things that you want to do, and then there's a separate thread that is solely dedicated to doing those things one by one (see the sketch after this list). My suspicion is that the lock is completely unbuffered, and I wonder if changing the buffer size makes a difference. (My recollection from working on AbstractMCMC parallel progress bars is that it makes a difference, although unfortunately I don't remember whether it makes it better or worse!)

  2. I did not implement this, so I'm not saying that I think this is the right way to do things or that you should change it. But in my mind, when I was thinking about these things, I was thinking that the lock should be at the higher level of `tilde_obssume`. What that means is that instead of queuing individual `getindex` / `setindex` operations, you queue entire `tilde_obssume` calls. The assumption, of course, is that everything within a single `tilde_obssume` is thread-safe. I don't know if that is better or worse. It could quite conceivably be worse.
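
For reference, here is a minimal, self-contained sketch of the buffered-Channel pattern mentioned in point 1. None of this is DynamicPPL code; the function name and the stand-in logpdf are made up. Producer tasks push work onto a `Channel` of capacity `buffer`, and a single consumer task folds the results one at a time, so the shared total is only ever touched serially.

```julia
# Sketch of the buffered-Channel pattern: the consumer task below is the only
# code that touches `total`, so no lock is needed. `sum_logpdfs_via_channel`
# and the -0.5x^2 "logpdf" are purely illustrative.
function sum_logpdfs_via_channel(xs; buffer=64)
    ch = Channel{Float64}(buffer)
    consumer = Threads.@spawn begin
        total = 0.0
        for lp in ch           # blocks until the channel is closed and drained
            total += lp
        end
        total
    end
    Threads.@threads for x in xs
        put!(ch, -0.5 * x^2)   # stand-in for an expensive logpdf evaluation
    end
    close(ch)
    return fetch(consumer)
end

sum_logpdfs_via_channel(randn(1_000))
```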

@codecov (bot) commented Oct 20, 2025 (edited)

Codecov Report

❌ Patch coverage is 0% with 131 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.06%. Comparing base (9bd8f16) to head (8fcfaf4).
⚠️ Report is 2 commits behind head on breaking.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/lockingvarinfo.jl | 0.00% | 131 Missing ⚠️ |
Additional details and impacted files
```
@@             Coverage Diff              @@
##           breaking    #1078      +/-   ##
============================================
- Coverage     82.40%   78.06%   -4.34%
============================================
  Files            42       41       -1
  Lines          3791     3848      +57
============================================
- Hits           3124     3004     -120
- Misses          667      844     +177
```

☔ View full report in Codecov by Sentry.

@penelopeysm (Member)

hold on, why is CI failing........ 😱


@mhauru (Member, Author) commented Oct 20, 2025 (edited)

A local experiment shows me that about 80% of the time in the 8-core case is due to the `@lock vi.lock` thing in `map_accumulators!!`, which is what `accumulate_obssume!!` calls. In case you were wondering, like me, whether the problem was with the `Ref`s or in some unexpected place.

We should try locking at a higher level, like at the `tilde_obssume` level. The nice thing about being lower down is that you can do the `logpdf` calls in parallel and only drop to serial execution when accumulating the results in `accumulate_observe!!`. We could also try locking at a lower level, namely at the level of individual accumulators. I have no good intuition for whether these would be better or worse.

I've never used `Channel`s and it's not obvious to me how to use them here. I guess you would queue something like `accumulate_observe!!` calls? Would need to think about how to manage the non-mutating nature of many accumulators.

@yebai (Member)

A quick thought: we could introduce locks at the model level. This would prevent users from using threading, except through an explicit opt-in mechanism such as a hypothetical `x ~ threadsafe_product_distribution(...)`.

@penelopeysm (Member)

> The nice thing about being lower down is that you can do the `logpdf` calls in parallel and only drop to serial execution when accumulating the results in `accumulate_observe!!`

Oh, yes, that's completely correct. I didn't think of that. I guess that means even the current implementation here is too high level, right? Because the `logpdf` calculation is actually done by the accumulator itself, way way inside `map_accumulator`.

And yeah, channels are pretty ugly to work with. The general structure of channel-based code looks like this:

```julia
Distributed.@sync begin
    Distributed.@async begin
        # read from the channel and do things with it one at a time
    end
    Distributed.@async begin
        # here do whatever you like that writes to the channel
    end
end
```

I think the whole thing would get lumped inside `_evaluate!!`; the first async block we would have to write ourselves, and the second one would basically be the model evaluation.

Because the Channel must contain concrete objects (not function calls), we'd probably need a struct that contains all the arguments to `accumulate_obssume` plus a boolean to indicate whether it's an assume or an observe.
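
If it helps picture it, such a work item might look something like the sketch below; every name and field here is a guess for illustration, not the actual `accumulate_assume!!`/`accumulate_observe!!` argument list.

```julia
# Hypothetical work item for the Channel sketch above. All names and fields
# are illustrative guesses, not DynamicPPL API.
struct TildeWorkItem{D,V,N}
    is_assume::Bool  # true for an assume statement, false for an observe
    dist::D          # the distribution on the right-hand side of ~
    value::V         # the value being assumed or observed
    vn::N            # the VarName, if there is one
end
```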

@mhauru (Member, Author)

> I guess that means even the current implementation here is too high level, right? Because the `logpdf` calculation is actually done by the accumulator itself, way way inside `map_accumulator`.

Oh yes, silly me.

Another option could be to not even try to support threaded `tilde_assume!!`, but replace `ThreadSafeVarInfo` with accumulators that use `Threads.Atomic`. This could be really simple and easy for LogProbAccumulators.
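
For the log-prob case, a minimal sketch of what that could look like (the type and function names below are invented; only the `Threads.Atomic` usage is the point):

```julia
# Sketch of an atomic log-prob accumulator. `AtomicLogProbAccumulator`,
# `acclogp!`, and `getlogp` are made-up names for illustration.
struct AtomicLogProbAccumulator
    logp::Threads.Atomic{Float64}
end

AtomicLogProbAccumulator() = AtomicLogProbAccumulator(Threads.Atomic{Float64}(0.0))

# Safe to call from multiple threads without a lock.
function acclogp!(acc::AtomicLogProbAccumulator, increment::Real)
    Threads.atomic_add!(acc.logp, Float64(increment))
    return acc
end

getlogp(acc::AtomicLogProbAccumulator) = acc.logp[]
```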

@penelopeysm (Member)

I like the sound of AtomicAccumulator. In a way that is better, because we want things like `addlogprob` to also be threadsafe, not just tildes. I think threadsafe distributions are also a good idea, but atm I don't know how to generalise them. It would work very nicely for the iid case though.

@mhauru (Member, Author)

I think the problem with atomic accumulators is that if we take that road, all accumulators must be made thread-safe. I can easily wrap the `LogPriorAccumulator.logp` field in a `Threads.Atomic` and be very happy, but then I'm also committed to making `DebugAccumulator` thread-safe, and that doesn't sound fun.

A possible way out would be to have an abstract type `ThreadSafeAccumulator <: AbstractAccumulator`, and anything that is not of that type falls back onto using locks. I don't care if I pay an overhead for using locks on something like `PointwiseLogdensityAccumulator`. Not sure where to put this fallback locking code, though, in such a way that it doesn't complicate and cause overheads in the `accumulate_obssume!!` call stack.
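
To make the fallback idea concrete, here is one hedged way it could be wired up via dispatch; every name below is invented for this sketch, not DynamicPPL API.

```julia
# Illustrative only: accumulators that opt in to thread-safety skip the lock,
# everything else goes through it.
abstract type AbstractAccumulator end
abstract type ThreadSafeAccumulator <: AbstractAccumulator end

# Stand-in for the real per-accumulator update.
accumulate!!(acc::AbstractAccumulator, args...) = acc

# Thread-safe accumulators are called directly, without taking the lock.
maybe_locked_accumulate!!(acc::ThreadSafeAccumulator, lock, args...) =
    accumulate!!(acc, args...)

# Everything else pays the lock overhead, which is acceptable for
# rarely-used accumulators such as a pointwise-logdensity one.
maybe_locked_accumulate!!(acc::AbstractAccumulator, lock, args...) =
    Base.@lock lock accumulate!!(acc, args...)
```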

Base automatically changed from breaking to main on October 21, 2025 17:06
@penelopeysm mentioned this pull request Nov 12, 2025
