fix `close` error due to race in `d_closeall` #248


Merged
andreasnoack merged 1 commit into master from tan/misc
Oct 3, 2023

Conversation

@tanmaykm (Member)

It seems possible that `DistributedArrays.d_closeall()` may encounter a condition where it finds a darray id in the `registry`, but the corresponding weakref value is `nothing` because the referenced darray got garbage collected. This has been encountered many times in CI and elsewhere, but is hard to reproduce under normal conditions. This PR adds a check for the weakref value before actually invoking `close` on it.

fixes #246
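The guard described above can be sketched in isolation. This is a minimal illustration and not the DistributedArrays.jl source: `FakeDArray`, `fake_registry`, `from_weakref_or_value`, `close_fake!`, and `close_all!` are hypothetical stand-ins, and `WeakRef(nothing)` is used to simulate an entry whose referent has already been collected.

```julia
# Hypothetical stand-ins for illustration; not the DistributedArrays.jl source.
mutable struct FakeDArray
    id::Tuple{Int,Int}
    closed::Bool
end

# Registry mapping ids to weak references (or direct values).
const fake_registry = Dict{Tuple{Int,Int},Any}()

# Resolve a registry entry. A WeakRef whose referent was collected
# yields `nothing` from its `value` field.
function from_weakref_or_value(id)
    d = get(fake_registry, id, nothing)
    return d isa WeakRef ? d.value : d
end

close_fake!(d::FakeDArray) = (d.closed = true; nothing)

# Close every live entry, skipping entries whose referent is gone.
function close_all!()
    closed = 0
    for id in collect(keys(fake_registry))
        d = from_weakref_or_value(id)
        if d !== nothing          # the guard: never call close on `nothing`
            close_fake!(d)
            closed += 1
        end
    end
    return closed
end

live = FakeDArray((1, 1), false)
fake_registry[(1, 1)] = WeakRef(live)
fake_registry[(1, 2)] = WeakRef(nothing)   # simulates a collected darray
n = close_all!()
```

Without the `d !== nothing` check, the second entry would reach `close_fake!(nothing)` and raise a `MethodError`, which mirrors the failure reported in the linked issue.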

@tanmaykm (Member, Author)

The code below can recreate the situation. It overrides `d_from_weakref_or_d` to force a garbage collection between the registry lookup and the weakref dereference.

    julia> using Distributed

    julia> addprocs(4);

    julia> @everywhere begin
               using Distributed, DistributedArrays
           end

    julia> A = rand(1:100, (100,100));

    julia> DA = distribute(A);

    julia> GC.enable(false)
    true

    julia> DA = nothing

    julia> function DistributedArrays.d_from_weakref_or_d(id)
               d = get(DistributedArrays.registry, id, nothing)
               GC.enable(true)
               GC.gc()
               isa(d, WeakRef) && return d.value
               return d
           end

    julia> DistributedArrays.d_closeall()
    ERROR: MethodError: no method matching close(::Nothing)

    Closest candidates are:
      close(::Union{Base.AsyncCondition, Timer})
       @ Base asyncevent.jl:162
      close(::Union{FileWatching.FileMonitor, FileWatching.FolderMonitor, FileWatching.PollingFileWatcher})
       @ FileWatching /data/Work/julia/binaries/julia-1.9.3/share/julia/stdlib/v1.9/FileWatching/src/FileWatching.jl:328
      close(::DArray)
       @ DistributedArrays ~/.julia/dev/DistributedArrays/src/core.jl:34
      ...

    Stacktrace:
     [1] d_closeall()
       @ DistributedArrays ~/.julia/dev/DistributedArrays/src/core.jl:47
     [2] top-level scope
       @ REPL[9]:1
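The mechanism behind the reproduction, a `WeakRef` losing its referent once the last strong reference is dropped, can be seen without DistributedArrays. This is a rough sketch: `Payload` and `make_weak` are hypothetical names, and whether the object is actually reclaimed by a given `GC.gc()` call is up to the collector, so no specific outcome is asserted.

```julia
# Hypothetical type for illustration only.
mutable struct Payload
    data::Vector{Int}
end

# The only strong reference to the Payload dies when this function returns.
function make_weak()
    p = Payload(collect(1:10))
    return WeakRef(p)
end

w = make_weak()
GC.gc()   # request a full collection; timing and outcome are up to the GC

# Code that dereferences a WeakRef has to handle both outcomes.
status = w.value === nothing ? "collected" : "still alive"
println("payload is ", status)
```

Because the collection can happen at any point between a registry lookup and the dereference, a caller that assumes `w.value` is always a live object is exposed to the race described in this PR.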

@andreasnoack (Member)

Can we be sure that the remote memory is properly freed if the reference has been garbage collected?

@tanmaykm (Member, Author)

Yes, I think so, because the finalizer invokes `close`. I did a small test to confirm that:

    julia> using Distributed

    julia> addprocs(4);

    julia> @everywhere begin
               using Distributed, DistributedArrays
               function print_registry_entries()
                   for k in keys(DistributedArrays.registry)
                       hasval = !(DistributedArrays.d_from_weakref_or_d(k) === nothing)
                       println("key $k hasval $hasval")
                   end
               end
           end

    julia> @everywhere print_registry_entries()

    julia> A = rand(1:100, (100,100));

    julia> DA = distribute(A);

    julia> GC.enable(false)
    true

    julia> DA = nothing

    julia> @everywhere print_registry_entries()
    key (1, 1) hasval true
          From worker 5:    key (1, 1) hasval true
          From worker 3:    key (1, 1) hasval true
          From worker 2:    key (1, 1) hasval true
          From worker 4:    key (1, 1) hasval true

    julia> function DistributedArrays.d_from_weakref_or_d(id)
               d = get(DistributedArrays.registry, id, nothing)
               GC.enable(true)
               GC.gc()
               isa(d, WeakRef) && return d.value
               return d
           end

    julia> @everywhere print_registry_entries()
    key (1, 1) hasval false
          From worker 2:    key (1, 1) hasval true
          From worker 3:    key (1, 1) hasval true
          From worker 5:    key (1, 1) hasval true
          From worker 4:    key (1, 1) hasval true

    julia> DistributedArrays.d_closeall()

    julia> @everywhere print_registry_entries()

Would it be good to add this as a test?
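The finalizer argument in this comment can be illustrated on its own. The sketch below assumes a hypothetical `TrackedResource` type with a `release!` cleanup function standing in for `close`; it is not how `DArray` is defined. It uses `finalize` to run the finalizer immediately, so the result does not depend on GC timing.

```julia
# Hypothetical resource type; a stand-in, not DistributedArrays.DArray.
mutable struct TrackedResource
    name::String
    released::Base.RefValue{Bool}
end

# Cleanup routine, playing the role that `close` plays for a darray.
release!(r::TrackedResource) = (r.released[] = true; nothing)

function make_resource(name, flag)
    r = TrackedResource(name, flag)
    finalizer(release!, r)   # release! runs when r is collected or finalized
    return r
end

flag = Ref(false)
r = make_resource("demo", flag)

# Run the registered finalizer now instead of waiting for a collection.
finalize(r)
println("released: ", flag[])
```

If cleanup is attached as a finalizer in this way, an object that was garbage collected has already had its cleanup run, so skipping it in a later bulk close does not leak the resource.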

@andreasnoack (Member) left a comment

I think it is fine as it is

@andreasnoack merged commit 4e82ecf into master on Oct 3, 2023
Reviewers

@andreasnoack approved these changes

Development

Successfully merging this pull request may close these issues.

MethodError: no method matching close(::Nothing) with Julia 1.9
2 participants
@tanmaykm @andreasnoack
