Insert eager calls to `finalize` for otherwise-dead finalizeable objects #44056
17791bb to 1d03b6d
I rebased `avi/EASROA`. Maybe rebasing this branch against it would fix the build error?
1d03b6d to 9dff04d

Latest push post-rebase still spams

Error:
Per discussion: the issue here is probably that we don't check that the

This change will also potentially cause extended lifetimes for some allocations, but that's apparently a general issue that needs resolving.
ef8f7be to 04f2d48

9dff04d to 9714afc

We're now factoring in dominance information (hopefully correctly), which appears to have fixed the errors!
This commit ports [EscapeAnalysis.jl](https://github.com/aviatesk/EscapeAnalysis.jl) into Julia base. You can find the documentation of this escape analysis at [this GitHub page](https://aviatesk.github.io/EscapeAnalysis.jl/dev/)[^1].

[^1]: The same documentation will be included in Julia's developer documentation by this commit.

This escape analysis will hopefully be an enabling technology for various memory-related optimizations in Julia's high-level compilation pipeline. Possible target optimizations include alias-aware SROA (JuliaLang#43888), array SROA (JuliaLang#43909), `mutating_arrayfreeze` optimization (JuliaLang#42465), stack allocation of mutables, finalizer elision and so on[^2].

[^2]: It would also be interesting if LLVM-level optimizations could consume IPO information derived by this escape analysis to broaden optimization possibilities.

The primary motivation for porting EA in this PR is to check its impact on latency as well as to get feedback from a broader range of developers. The plan is that we first introduce EA in this commit, and then merge the dependent PRs built on top of this commit, like JuliaLang#43888, JuliaLang#43909 and JuliaLang#42465.

This commit simply defines and runs EA inside the Julia base compiler and enables the existing test suite with it. In this commit, we just run EA before inlining to generate the IPO cache. In the dependent PRs, EA will be invoked again after inlining to be used for various local optimizations.
Enhances SROA of mutables using the novel Julia-level escape analysis (on top of JuliaLang#43800):

1. alias-aware SROA, mutable ϕ-node elimination
2. `isdefined` check elimination
3. load-forwarding for non-eliminable but analyzable mutables

---

1. alias-aware SROA, mutable ϕ-node elimination

EA's alias analysis allows this new SROA to handle nested mutable allocations pretty well. Now we can eliminate the heap allocations completely from this insanely nested example in a single analysis/optimization pass:

```julia
julia> function refs(x)
           (Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref((x))))))))))))[][][][][][][][][][]
       end
refs (generic function with 1 method)

julia> refs("julia"); @allocated refs("julia")
0
```

EA can also analyze the escape of a ϕ-node as well as its aliasing. Mutable ϕ-nodes are eliminated even in a very tricky case like:

```julia
julia> code_typed((Bool,String,)) do cond, x
           # these allocation form multiple ϕ-nodes
           if cond
               ϕ2 = ϕ1 = Ref{Any}("foo")
           else
               ϕ2 = ϕ1 = Ref{Any}("bar")
           end
           ϕ2[] = x
           y = ϕ1[] # => x
           return y
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     goto #3 if not cond
2 ─     goto #4
3 ─     nothing::Nothing
4 ┄     return x
) => Any
```

Combined with the alias analysis and ϕ-node handling above, allocations in the following "realistic" examples will be optimized:

```julia
julia> # demonstrate the power of our field / alias analysis with realistic end to end examples
       # adapted from http://wiki.luajit.org/Allocation-Sinking-Optimization#implementation%5B
       abstract type AbstractPoint{T} end

julia> struct Point{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> mutable struct MPoint{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> add(a::P, b::P) where P<:AbstractPoint = P(a.x + b.x, a.y + b.y);

julia> function compute_point(T, n, ax, ay, bx, by)
           a = T(ax, ay)
           b = T(bx, by)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point(n, a, b)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point!(n, a, b)
           for i in 0:(n-1)
               a′ = add(add(a, b), b)
               a.x = a′.x
               a.y = a′.y
           end
       end;

julia> compute_point(MPoint, 10, 1+.5, 2+.5, 2+.25, 4+.75);

julia> compute_point(MPoint, 10, 1+.5im, 2+.5im, 2+.25im, 4+.75im);

julia> @allocated compute_point(MPoint, 10000, 1+.5, 2+.5, 2+.25, 4+.75)
0

julia> @allocated compute_point(MPoint, 10000, 1+.5im, 2+.5im, 2+.25im, 4+.75im)
0

julia> compute_point(10, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75));

julia> compute_point(10, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im));

julia> @allocated compute_point(10000, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75))
0

julia> @allocated compute_point(10000, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im))
0

julia> af, bf = MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75);

julia> ac, bc = MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im);

julia> compute_point!(10, af, bf);

julia> compute_point!(10, ac, bc);

julia> @allocated compute_point!(10000, af, bf)
0

julia> @allocated compute_point!(10000, ac, bc)
0
```

2. `isdefined` check elimination

This commit also implements a simple optimization to eliminate `isdefined` calls by checking load-forwardability. This optimization may be especially useful for eliminating the extra allocation involved with a capturing closure, e.g.:

```julia
julia> callit(f, args...) = f(args...);

julia> function isdefined_elim()
           local arr::Vector{Any}
           callit() do
               arr = Any[]
           end
           return arr
       end;

julia> code_typed(isdefined_elim)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
└──      goto #3 if not true
2 ─      goto #4
3 ─      $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄      return %1
) => Vector{Any}
```

3. load-forwarding for non-eliminable but analyzable mutables

EA also allows us to forward loads even when the mutable allocation can't be eliminated but its fields are still known precisely. The load forwarding might be useful since it may derive new type information that succeeding optimization passes can use (or just because it allows simpler code transformations down the load):

```julia
julia> code_typed((Bool,String,)) do c, s
           r = Ref{Any}(s)
           if c
               return r[]::String # adce_pass! will further eliminate this type assert call also
           else
               return r
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any}, s)::Base.RefValue{Any}
└──      goto #3 if not c
2 ─      return s
3 ─      return %1
) => Union{Base.RefValue{Any}, String}
```

---

Please refer to the newly added test cases for more examples. Also, EA's alias analysis already succeeds in reasoning about arrays, and so this EA-based SROA will hopefully be generalized for array SROA as well.
Co-authored-by: Shuhei Kadowaki <aviatesk@gmail.com>
9714afc to ea0aaab

Note that calling If you can prove that the object has only known finalizers, and you know what those are, you should be able to call the finalizer directly without going through the normal GC logic. If you can't prove that an exception won't occur, which you probably won't be able to prove at this level, you can just set a flag in the GC to tell it to not run any currently registered finalizers ( Also, as I've said many times before, it is fairly easy to effectively let the GC know about these objects. In most cases all you need to do is to call
AriMKatz commented Mar 7, 2022 • edited
@chflood any thoughts on this? (Particularly wrt GPU)
Yes, MemPool.jl is using a global non-reentrant lock (taken from CUDAdrv/CUDAnative originally), which is taken during allocation, and during finalization. I wouldn't mind changing it to a regular
In principle I love the idea of escape analysis stack-allocating objects that go away without garbage collector intervention; however, objects with finalizers pose special issues that I don't fully understand. I'm still learning Julia, so please indulge my questions. Does Julia have a rule about finalizers running exactly once like Java does? Can a finalizer bring an object back to life by stashing it somewhere? My concern is that the GC might accidentally run a finalizer again on a zombie object. There are also memory model issues in Java, as detailed here, which may or may not be applicable to Julia. If you run the finalizer without some sort of memory barrier, is it possible that instructions may be reordered in incorrect ways? GC provides that memory barrier.
We don't appear to document such a property (or really anything about how finalizers are run). https://docs.julialang.org/en/v1.9-dev/base/base/#Base.finalizer and https://docs.julialang.org/en/v1.9-dev/manual/multi-threading/#Safe-use-of-Finalizers are the only places where they are documented at all. We should probably figure out what properties we want to guarantee and document them.
It appears to me that the current implementation of
25e4b37 to e708e23
The C API should be fairly straightforward as well. The issue with the PR as is is that it will introduce a regression.

Yes.

No, that is not supposed to happen; each finalizer will run only once. It is removed from the list before being called.

Not just that: a finalizer can run at any time that a GC can run. There were proposals about running them on a separate thread, but it is not done and still has issues regarding GC triggered on the finalizer thread.
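The run-once behaviour described in this reply can be observed from user code with the documented `finalizer` and `finalize` functions. The following is only a minimal sketch: the `Handle` type and the `runs` counter are hypothetical names introduced for illustration and are not part of this PR.

```julia
# Hypothetical mutable type; only mutable objects can carry finalizers.
mutable struct Handle
    id::Int
end

# Hypothetical counter recording how many times the finalizer body runs.
const runs = Ref(0)

h = Handle(1)
finalizer(_ -> (runs[] += 1; nothing), h)

finalize(h)  # runs the registered finalizer right away
finalize(h)  # nothing left to run: the finalizer was unregistered before the first call

println(runs[])
```

Under the semantics stated above, the counter is incremented a single time even though `finalize` is called twice.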
9c84ddc to cdef102

This is a variant of the eager-finalization idea (e.g. as seen in #44056), but with a focus on the mechanism of finalizer insertion, since I need a similar pass downstream. Integration of EscapeAnalysis is left to #44056.

My motivation for this change is somewhat different. In particular, I want to be able to insert a `finalize` call such that I can subsequently SROA the mutable object. This requires a couple of design points that are more stringent than the pass from #44056, so I decided to prototype them as an independent PR. The primary things I need here that are not seen in #44056 are:

- The ability to forgo finalizer registration with the runtime entirely (requires additional legality analysis)
- The ability to inline the registered finalizer at the deallocation point (to enable subsequent SROA)

To this end, adding a finalizer is promoted to a builtin that is recognized by inference and inlining (such that inference can produce an inferred version of the finalizer for inlining).

The current status is that this fixes the minimal example I wanted to have work, but does not yet extend to the motivating case I had. Nevertheless, I felt that this was a good checkpoint to synchronize with other efforts along these lines.

Currently working demo:

```
julia> const total_deallocations = Ref{Int}(0)
Base.RefValue{Int64}(0)

julia> mutable struct DoAlloc
           function DoAlloc()
               this = new()
               Core._add_finalizer(this, function(this)
                   global total_deallocations[] += 1
               end)
               return this
           end
       end

julia> function foo()
           for i = 1:1000
               DoAlloc()
           end
       end
foo (generic function with 1 method)

julia> @code_llvm foo()
;  @ REPL[3]:1 within `foo`
define void @julia_foo_111() #0 {
top:
  %.promoted = load i64, i64* inttoptr (i64 140370001753968 to i64*), align 16
;  @ REPL[3]:2 within `foo`
  %0 = add i64 %.promoted, 1000
;  @ REPL[3] within `foo`
  store i64 %0, i64* inttoptr (i64 140370001753968 to i64*), align 16
;  @ REPL[3]:4 within `foo`
  ret void
}
```
* Eager finalizer insertion
* rm redundant copy

Co-authored-by: Shuhei Kadowaki <40514306+aviatesk@users.noreply.github.com>
Abandoned in favor of #45272
Finalizers are a great way to defer the freeing side of memory management until some later point; however, they can have unpredictable behavior when the data that they free is not fully known to the GC (e.g. GPU allocations, or distributed references). This can result in behavior like out-of-memory situations, excessive memory usage, and sometimes more costly freeing behavior (in the event that locks need to be taken).
This seems like a bad situation, but there is a silver lining: some code patterns which allocate such objects don't actually need the allocations to stick around very long, and the lifetime of the object could (in theory) be statically determined by the compiler. Thankfully, with the ongoing work of integrating EscapeAnalysis.jl into the optimizer, we can use the generated escape information to improve this situation.
This PR uses escape info from EA to determine when an object has an attached finalizer, and when its lifetime is provably finite (i.e. the object does not escape the analyzed scope). For such objects, we can insert an early call to `finalize(obj)` at the end of `obj`'s lifetime, which will allow the object's finalizer to be enqueued for execution immediately, minimizing how long finalizeable objects stay live in the GC.
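As a rough illustration of the effect, the code below writes the eager `finalize` call by hand at the point where a non-escaping object is last used. This is a sketch of the intended result only, not the compiler pass itself; `Buffer`, `release`, `make_buffer`, `checksum` and the `freed` counter are hypothetical names chosen for the example.

```julia
# Hypothetical resource wrapper standing in for e.g. a GPU allocation.
mutable struct Buffer
    data::Vector{UInt8}
end

# Hypothetical counter so the effect of the finalizer is observable.
const freed = Ref(0)
release(buf::Buffer) = (freed[] += 1; nothing)

function make_buffer(n)
    buf = Buffer(zeros(UInt8, n))
    finalizer(release, buf)  # cleanup is normally deferred until the GC runs it
    return buf
end

function checksum(n)
    buf = make_buffer(n)
    s = sum(buf.data)
    # `buf` never escapes `checksum`, so its lifetime ends here. This is the
    # kind of call the PR proposes to insert automatically.
    finalize(buf)
    return s
end

checksum(16)
println(freed[])
```

With the explicit call, `release` has run by the time `checksum` returns, rather than at some later collection.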