Visit this blog post for a presentation of Julia multi-threading features.
By default, Julia starts up with 2 threads of execution: 1 worker thread and 1 interactive thread. This can be verified by using the command `Threads.nthreads()`:
```julia-repl
julia> Threads.nthreads(:default)
1

julia> Threads.nthreads(:interactive)
1
```

The number of execution threads is controlled either by using the `-t`/`--threads` command line argument or by using the `JULIA_NUM_THREADS` environment variable. When both are specified, `-t`/`--threads` takes precedence.
The number of threads can either be specified as an integer (`--threads=4`) or as `auto` (`--threads=auto`), where `auto` tries to infer a useful default number of threads to use (see Command-line Options for more details).
See the threadpools section for how to control how many `:default` and `:interactive` threads are in each threadpool.
The `-t`/`--threads` command line argument requires at least Julia 1.5. In older versions you must use the environment variable instead.
Using `auto` as the value of the environment variable `JULIA_NUM_THREADS` requires at least Julia 1.7. In older versions, this value is ignored.
Starting by default with 1 interactive thread, in addition to the 1 worker thread, was introduced in Julia 1.12. If the number of threads is set to 1, either with `-t1` or `JULIA_NUM_THREADS=1`, an interactive thread will not be spawned.
Let's start Julia with 4 threads:
```
$ julia --threads 4
```

Let's verify there are 4 threads at our disposal.

```julia-repl
julia> Threads.nthreads()
4
```

But we are currently on the master thread. To check, we use the function `Threads.threadid`:

```julia-repl
julia> Threads.threadid()
1
```

If you prefer to use the environment variable you can set it as follows in Bash (Linux/macOS):

```bash
export JULIA_NUM_THREADS=4
```

C shell on Linux/macOS, CMD on Windows:

```
set JULIA_NUM_THREADS=4
```

PowerShell on Windows:

```powershell
$env:JULIA_NUM_THREADS=4
```

Note that this must be done *before* starting Julia.
The number of threads specified with `-t`/`--threads` is propagated to worker processes that are spawned using the `-p`/`--procs` or `--machine-file` command line options. For example, `julia -p2 -t2` spawns 1 main process with 2 worker processes, and all three processes have 2 threads enabled. For more fine-grained control over worker threads use `addprocs` and pass `-t`/`--threads` as `exeflags`.
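As a sketch, this is how worker threads can be configured through `addprocs` (assuming the `Distributed` standard library is available):

```julia
using Distributed

# Add 2 local worker processes, each started with 2 threads.
# `exeflags` passes extra command-line flags to the spawned `julia` processes.
addprocs(2; exeflags=`--threads=2`)

# Each worker reports its own thread count.
remote_nthreads = [remotecall_fetch(Threads.nthreads, w) for w in workers()]
```

Here `remote_nthreads` should be `[2, 2]`, regardless of how many threads the main process itself was started with.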
The Garbage Collector (GC) can use multiple threads. By default the number used matches the number of compute worker threads, or it can be configured by either the `--gcthreads` command line argument or the `JULIA_NUM_GC_THREADS` environment variable.
For more details about garbage collection configuration and performance tuning, see Memory Management and Garbage Collection.
When a program's threads are busy with many tasks to run, tasks may experience delays which may negatively affect the responsiveness and interactivity of the program. To address this, you can specify that a task is interactive when you `Threads.@spawn` it:
```julia
using Base.Threads

@spawn :interactive f()
```

Interactive tasks should avoid performing high latency operations, and if they are long duration tasks, should yield frequently.
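As a sketch, a long-duration interactive task might yield on every iteration so that other interactive tasks stay responsive (the computation here is just a placeholder, and this assumes an interactive thread is available):

```julia
using Base.Threads

# Spawn a long-duration task into the :interactive threadpool.
# It yields on every iteration so other interactive tasks (for example a
# REPL or a UI event loop) stay responsive while it runs.
t = @spawn :interactive begin
    acc = 0
    for i in 1:100
        acc += i
        yield()   # hand control back to the scheduler frequently
    end
    acc
end

fetch(t)  # → 5050
```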
By default Julia starts with one interactive thread reserved to run interactive tasks, but that number can be controlled with:
```
$ julia --threads 3,1
julia> Threads.nthreads(:interactive)
1

$ julia --threads 3,0
julia> Threads.nthreads(:interactive)
0
```

The environment variable `JULIA_NUM_THREADS` can also be used similarly:
```bash
export JULIA_NUM_THREADS=3,1
```

This starts Julia with 3 threads in the `:default` threadpool and 1 thread in the `:interactive` threadpool:
```julia-repl
julia> using Base.Threads

julia> nthreadpools()
2

julia> threadpool() # the main thread is in the interactive thread pool
:interactive

julia> nthreads(:default)
3

julia> nthreads(:interactive)
1

julia> nthreads()
3
```

Explicitly asking for 1 thread by doing `-t1` or `JULIA_NUM_THREADS=1` does not add an interactive thread.
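For example (a small sketch, requiring Julia 1.9 or later for threadpools), a task spawned with `Threads.@spawn` lands in the `:default` pool unless `:interactive` is requested explicitly:

```julia
using Base.Threads

# Without a pool argument, @spawn schedules into the :default pool.
# threadpool() reports the pool of the thread the task runs on.
t = @spawn threadpool()
pool = fetch(t)   # → :default

# Spawning into :interactive is only useful if that pool has threads:
if nthreads(:interactive) > 0
    pool_i = fetch(@spawn :interactive threadpool())  # :interactive
end
```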
Depending on whether Julia has been started with interactive threads, the main thread is either in the default or interactive thread pool.
Either or both numbers can be replaced with the word `auto`, which causes Julia to choose a reasonable default.
## The `@threads` Macro

Let's work through a simple example using our native threads. Let us create an array of zeros:
```julia-repl
julia> a = zeros(10)
10-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
```

Let us operate on this array simultaneously using 4 threads. We'll have each thread write its thread ID into each location.
Julia supports parallel loops using the `Threads.@threads` macro. This macro is affixed in front of a `for` loop to indicate to Julia that the loop is a multi-threaded region:
```julia-repl
julia> Threads.@threads for i = 1:10
           a[i] = Threads.threadid()
       end
```

The iteration space is split among the threads, after which each thread writes its thread ID to its assigned locations:
```julia-repl
julia> a
10-element Vector{Float64}:
 1.0
 1.0
 1.0
 2.0
 2.0
 2.0
 3.0
 3.0
 4.0
 4.0
```

Note that `Threads.@threads` does not have an optional reduction parameter like `@distributed`.
## `@threads` without data-races

The concept of a data-race is elaborated on in "Communication and data races between threads". For now, just know that a data race can result in incorrect results and dangerous errors.
Let's say we want to make the function `sum_single` below multithreaded.
```julia-repl
julia> function sum_single(a)
           s = 0
           for i in a
               s += i
           end
           s
       end
sum_single (generic function with 1 method)

julia> sum_single(1:1_000_000)
500000500000
```

Simply adding `@threads` exposes a data race, with multiple threads reading and writing at the same time.
```julia-repl
julia> function sum_multi_bad(a)
           s = 0
           Threads.@threads for i in a
               s += i
           end
           s
       end
sum_multi_bad (generic function with 1 method)

julia> sum_multi_bad(1:1_000_000)
70140554652
```

Note that the result is not `500000500000` as it should be, and will most likely change on each evaluation.
To fix this, buffers that are specific to the task may be used to segment the sum into chunks that are race-free. Here `sum_single` is reused, with its own internal buffers. The input vector `a` is split into at most `nthreads()` chunks for parallel work. We then use `Threads.@spawn` to create tasks that individually sum each chunk. Finally, we sum the results from each task using `sum_single` again:
```julia-repl
julia> function sum_multi_good(a)
           chunks = Iterators.partition(a, cld(length(a), Threads.nthreads()))
           tasks = map(chunks) do chunk
               Threads.@spawn sum_single(chunk)
           end
           chunk_sums = fetch.(tasks)
           return sum_single(chunk_sums)
       end
sum_multi_good (generic function with 1 method)

julia> sum_multi_good(1:1_000_000)
500000500000
```

Buffers should not be managed based on `threadid()`, i.e. `buffers = zeros(Threads.nthreads())`, because concurrent tasks can yield, meaning multiple concurrent tasks may use the same buffer on a given thread, introducing the risk of data races. Further, when more than one thread is available, tasks may change thread at yield points, which is known as task migration.
Another option is the use of atomic operations on variables shared across tasks/threads, which may be more performant depending on the characteristics of the operations.
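For instance, the data race in a naive threaded sum can be removed with an atomic accumulator. This sketch gives the correct result, though every iteration contends on one shared variable, so the chunked approach above usually scales better:

```julia
function sum_multi_atomic(a)
    s = Threads.Atomic{Int}(0)
    Threads.@threads for i in a
        Threads.atomic_add!(s, i)  # atomic read-modify-write: no data race
    end
    return s[]
end

sum_multi_atomic(1:1_000_000)  # → 500000500000
```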
Although Julia's threads can communicate through shared memory, it is notoriously difficult to write correct and data-race free multi-threaded code. Julia's `Channel`s are thread-safe and may be used to communicate safely. There are also sections below that explain how to use locks and atomics to avoid data-races.
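As a minimal sketch, a producer task can pass results to the spawning task through a `Channel` with no explicit locking:

```julia
# A Channel is a thread-safe FIFO: put! and take! may be called from
# different tasks/threads without extra locking.
ch = Channel{Int}(16)

producer = Threads.@spawn begin
    for i in 1:10
        put!(ch, i^2)
    end
    close(ch)   # signal the consumer that no more values are coming
end

results = collect(ch)   # takes values until the channel is closed
```

With a single producer, order is preserved, so `results` is `[1, 4, 9, ..., 100]`.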
In certain cases, Julia is able to detect safety violations, in particular with regard to deadlocks or other known-unsafe operations such as yielding to the currently running task. In these cases, a `ConcurrencyViolationError` is thrown.
You are entirely responsible for ensuring that your program is data-race free, and nothing promised here can be assumed if you do not observe that requirement. The observed results may be highly unintuitive.
If data-races are introduced, Julia is not memory safe. Be very careful about reading *any* data if another thread might write to it, as it could result in segmentation faults or worse. Below are a couple of unsafe ways to access global variables from different threads:
Thread 1:

```julia
global b = false
global a = rand()
global b = true
```

Thread 2:

```julia
while !b; end
bad_read1(a) # it is NOT safe to access `a` here!
```

Thread 3:

```julia
while !@isdefined(a); end
bad_read2(a) # it is NOT safe to access `a` here
```

An important tool to avoid data-races, and thereby write thread-safe code, is the concept of a "lock". A lock can be locked and unlocked. If a thread has locked a lock, and not unlocked it, it is said to "hold" the lock. If there is only one lock, and we write code that requires holding the lock to access some data, we can ensure that multiple threads will never access the same data simultaneously. Note that the link between a lock and a variable is made by the programmer, and not the program.
For example, we can create a lock `my_lock`, and lock it while we mutate a variable `my_variable`. This is done most simply with the `@lock` macro:
```julia-repl
julia> my_lock = ReentrantLock();

julia> my_variable = [1, 2, 3];

julia> @lock my_lock my_variable[1] = 100
100
```

By using a similar pattern with the same lock and variable, but on another thread, the operations are free from data-races.
We could have performed the operation above with the functional version of `lock`, in the following two ways:
```julia-repl
julia> lock(my_lock) do
           my_variable[1] = 100
       end
100

julia> begin
           lock(my_lock)
           try
               my_variable[1] = 100
           finally
               unlock(my_lock)
           end
       end
100
```

All three options are equivalent. Note how the final version requires an explicit `try`-block to ensure that the lock is always unlocked, whereas the first two versions do this internally. One should always use the lock pattern above when changing data (such as assigning to a global or closure variable) accessed by other threads. Failing to do this could have unforeseen and serious consequences.
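For instance, many tasks can safely update shared state as long as every access is made while holding the same lock (a sketch; for a simple counter an atomic would also work):

```julia
lk = ReentrantLock()
total = Ref(0)

Threads.@threads for i in 1:1000
    # Every mutation of `total` happens while holding `lk`,
    # so the increments cannot race.
    lock(lk) do
        total[] += i
    end
end

total[]  # → 500500
```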
Julia supports accessing and modifying values *atomically*, that is, in a thread-safe way to avoid race conditions. A value (which must be of a primitive type) can be wrapped as `Threads.Atomic` to indicate it must be accessed in this way. Here we can see an example:
```julia-repl
julia> i = Threads.Atomic{Int}(0);

julia> ids = zeros(4);

julia> old_is = zeros(4);

julia> Threads.@threads for id in 1:4
           old_is[id] = Threads.atomic_add!(i, id)
           ids[id] = id
       end

julia> old_is
4-element Vector{Float64}:
 0.0
 1.0
 7.0
 3.0

julia> i[]
10

julia> ids
4-element Vector{Float64}:
 1.0
 2.0
 3.0
 4.0
```

Had we tried to do the addition without the atomic tag, we might have gotten the wrong answer due to a race condition. An example of what would happen if we didn't avoid the race:
```julia-repl
julia> using Base.Threads

julia> Threads.nthreads()
4

julia> acc = Ref(0)
Base.RefValue{Int64}(0)

julia> @threads for i in 1:1000
           acc[] += 1
       end

julia> acc[]
926

julia> acc = Atomic{Int64}(0)
Atomic{Int64}(0)

julia> @threads for i in 1:1000
           atomic_add!(acc, 1)
       end

julia> acc[]
1000
```

We can also use atomics at a more granular level using the `@atomic`, `@atomicswap`, `@atomicreplace`, and `@atomiconce` macros.
Specific details of the memory model and other details of the design are written in the Julia Atomics Manifesto, which will later be published formally.
Any field in a struct declaration can be decorated with `@atomic`, and then any write must be marked with `@atomic` also, and must use one of the defined atomic orderings (`:monotonic`, `:acquire`, `:release`, `:acquire_release`, or `:sequentially_consistent`). Any read of an atomic field can also be annotated with an atomic ordering constraint, or will be done with monotonic (relaxed) ordering if unspecified.
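A minimal sketch of a struct with an atomic field, incremented from many tasks (requires Julia 1.7 or later for per-field atomics):

```julia
using Base.Threads

mutable struct AtomicCounter
    @atomic n::Int   # every write to this field must use @atomic
end

c = AtomicCounter(0)

@threads for i in 1:1000
    @atomic c.n += 1          # atomic read-modify-write, sequentially consistent
end

final = @atomic :acquire c.n  # read with an explicit ordering → 1000
```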
When using multi-threading we have to be careful when using functions that are not pure, as we might get a wrong answer. For instance, functions that have a name ending with `!` by convention modify their arguments and thus are not pure.
External libraries, such as those called via `ccall`, pose a problem for Julia's task-based I/O mechanism. If a C library performs a blocking operation, that prevents the Julia scheduler from executing any other tasks until the call returns. (Exceptions are calls into custom C code that call back into Julia, which may then yield, or C code that calls `jl_yield()`, the C equivalent of `yield`.)
The `@threadcall` macro provides a way to avoid stalling execution in such a scenario. It schedules a C function for execution in a separate thread. A threadpool with a default size of 4 is used for this. The size of the threadpool is controlled via the environment variable `UV_THREADPOOL_SIZE`. While waiting for a free thread, and during function execution once a thread is available, the requesting task (on the main Julia event loop) yields to other tasks. Note that `@threadcall` does not return until the execution is complete. From a user point of view, it is therefore a blocking call like other Julia APIs.
It is very important that the called function does not call back into Julia, as it will segfault.
@threadcall may be removed/changed in future versions of Julia.
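For example, a blocking C call such as libc's `sleep` can be moved off the scheduler's threads (this sketch assumes a POSIX libc and is not portable to Windows):

```julia
# Run C's sleep(1) on a thread from the libuv threadpool. Other Julia
# tasks continue to run while the C call blocks.
remaining = @threadcall(:sleep, Cuint, (Cuint,), 1)
# C's sleep() returns the number of unslept seconds, i.e. 0 on success
```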
At this time, most operations in the Julia runtime and standard libraries can be used in a thread-safe manner, if the user code is data-race free. However, in some areas work on stabilizing thread support is ongoing. Multi-threaded programming has many inherent difficulties, and if a program using threads exhibits unusual or undesirable behavior (e.g. crashes or mysterious results), thread interactions should typically be suspected first.
There are a few specific limitations and warnings to be aware of when using threads in Julia:
- Base collection types require manual locking if used simultaneously by multiple threads where at least one thread modifies the collection (common examples include `push!` on arrays, or inserting items into a `Dict`).
- The schedule used by `@spawn` is nondeterministic and should not be relied on.
- Compute-bound, non-memory-allocating tasks can prevent garbage collection from running in other threads that are allocating memory. In these cases it may be necessary to insert a manual call to `GC.safepoint()` to allow GC to run. This limitation will be removed in the future.
- Avoid running top-level operations, e.g. `include`, or `eval` of type, method, and module definitions in parallel.

## Task Migration

After a task starts running on a certain thread it may move to a different thread if the task yields.
Such tasks may have been started with `@spawn` or `@threads`, although the `:static` schedule option for `@threads` does freeze the threadid.
This means that in most cases `threadid()` should not be treated as constant within a task, and therefore should not be used to index into a vector of buffers or stateful objects.
Task migration was introduced in Julia 1.7. Before this, tasks always remained on the same thread that they were started on.
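As an illustration, with the `:static` schedule of `@threads`, tasks do not migrate, so indexing per-thread state by `threadid()` is safe here (though discouraged in new code):

```julia
using Base.Threads

# With :static scheduling, iterations are statically partitioned and each
# task stays on its thread, so threadid() is stable inside the loop body.
counts = zeros(Int, nthreads())
@threads :static for i in 1:1000
    counts[threadid()] += 1   # safe ONLY because of :static
end

sum(counts)  # → 1000
```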
Because finalizers can interrupt any code, they must be very careful in how they interact with any global state. Unfortunately, the main reason that finalizers are used is to update global state (a pure function is generally rather pointless as a finalizer). This leads us to a bit of a conundrum. There are a few approaches to dealing with this problem:
When single-threaded, code could call the internaljl_gc_enable_finalizers C function to prevent finalizers from being scheduled inside a critical region. Internally, this is used inside some functions (such as our C locks) to prevent recursion when doing certain operations (incremental package loading, codegen, etc.). The combination of a lock and this flag can be used to make finalizers safe.
A second strategy, employed by Base in a couple places, is to explicitly delay a finalizer until it may be able to acquire its lock non-recursively. The following example demonstrates how this strategy could be applied to `Distributed.finalize_ref`:
```julia
function finalize_ref(r::AbstractRemoteRef)
    if r.where > 0 # Check if the finalizer is already run
        if islocked(client_refs) || !trylock(client_refs)
            # delay finalizer for later if we aren't free to acquire the lock
            finalizer(finalize_ref, r)
            return nothing
        end
        try # `lock` should always be followed by `try`
            if r.where > 0 # Must check again here
                # Do actual cleanup here
                r.where = 0
            end
        finally
            unlock(client_refs)
        end
    end
    nothing
end
```

A related third strategy is to use a yield-free queue. We don't currently have a lock-free queue implemented in Base, but `Base.IntrusiveLinkedListSynchronized{T}` is suitable. This can frequently be a good strategy to use for code with event loops. For example, this strategy is employed by Gtk.jl to manage lifetime ref-counting. In this approach, we don't do any explicit work inside the `finalizer`, and instead add it to a queue to run at a safer time. In fact, Julia's task scheduler already uses this, so defining the finalizer as `x -> @spawn do_cleanup(x)` is one example of this approach. Note however that this doesn't control which thread `do_cleanup` runs on, so `do_cleanup` would still need to acquire a lock. That doesn't need to be true if you implement your own queue, as you can explicitly only drain that queue from your thread.
This document was generated withDocumenter.jl version 1.16.0 onThursday 20 November 2025. Using Julia version 1.12.2.