There are two main approaches to CPU profiling julia code:
@profile
Where profiling is enabled for a given call via the@profile
macro.
julia> using Profilejulia> @profile foo()julia> Profile.print()Overhead ╎ [+additional indent] Count File:Line; Function========================================================= ╎147 @Base/client.jl:506; _start() ╎ 147 @Base/client.jl:318; exec_options(opts::Base.JLOptions)...
Tasks that are already running can also be profiled for a fixed time period at any user-triggered time.
To trigger the profiling:
ctrl-t
or pass aSIGINFO
signal to the julia process i.e.% kill -INFO $julia_pid
SIGUSR1
signal to the julia process i.e.% kill -USR1 $julia_pid
First, a single stack trace at the instant that the signal was thrown is shown, then a 1 second profile is collected, followed by the profile report at the next yield point, which may be at task completion for code without yield points e.g. tight loops.
Optionally set environment variableJULIA_PROFILE_PEEK_HEAP_SNAPSHOT
to1
to also automatically collect aheap snapshot.
julia> foo()##== the user sends a trigger while foo is running ==##load: 2.53 cmd: julia 88903 running 6.16u 0.97s======================================================================================Information request received. A stacktrace will print followed by a 1.0 second profile======================================================================================signal (29): Information request: 29__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)_pthread_cond_wait at /usr/lib/system/libsystem_pthread.dylib (unknown line)...======================================================================Profile collected. A report will print if the Profile module is loaded======================================================================Overhead ╎ [+additional indent] Count File:Line; Function=========================================================Thread 1 Task 0x000000011687c010 Total snapshots: 572. Utilization: 100% ╎147 @Base/client.jl:506; _start() ╎ 147 @Base/client.jl:318; exec_options(opts::Base.JLOptions)...Thread 2 Task 0x0000000116960010 Total snapshots: 572. Utilization: 0% ╎572 @Base/task.jl:587; task_done_hook(t::Task) ╎ 572 @Base/task.jl:879; wait()...
The duration of the profiling can be adjusted viaProfile.set_peek_duration
The profile report is broken down by thread and task. Pass a no-arg function toProfile.peek_report[]
to override this. i.e.Profile.peek_report[] = () -> Profile.print()
to remove any grouping. This could also be overridden by an external profile data consumer.
Profile.@profile
—Macro@profile
@profile <expression>
runs your expression while taking periodic backtraces. These are appended to an internal buffer of backtraces.
The methods inProfile
are not exported and need to be called e.g. asProfile.print()
.
Profile.clear
—Functionclear()
Clear any existing backtraces from the internal buffer.
Profile.print
—Functionprint([io::IO = stdout,] [data::Vector = fetch()], [lidict::Union{LineInfoDict, LineInfoFlatDict} = getdict(data)]; kwargs...)
Prints profiling results toio
(by default,stdout
). If you do not supply adata
vector, the internal buffer of accumulated backtraces will be used.
The keyword arguments can be any combination of:
format
– Determines whether backtraces are printed with (default,:tree
) or without (:flat
) indentation indicating tree structure.
C
– Iftrue
, backtraces from C and Fortran code are shown (normally they are excluded).
combine
– Iftrue
(default), instruction pointers are merged that correspond to the same line of code.
maxdepth
– Limits the depth higher thanmaxdepth
in the:tree
format.
sortedby
– Controls the order in:flat
format.:filefuncline
(default) sorts by the source line,:count
sorts in order of number of collected samples, and:overhead
sorts by the number of samples incurred by each function by itself.
groupby
– Controls grouping over tasks and threads, or no grouping. Options are:none
(default),:thread
,:task
,[:thread, :task]
, or[:task, :thread]
where the last two provide nested grouping.
noisefloor
– Limits frames that exceed the heuristic noise floor of the sample (only applies to format:tree
). A suggested value to try for this is 2.0 (the default is 0). This parameter hides samples for whichn <= noisefloor * √N
, wheren
is the number of samples on this line, andN
is the number of samples for the callee.
mincount
– Limits the printout to only those lines with at leastmincount
occurrences.
recur
– Controls the recursion handling in:tree
format.:off
(default) prints the tree as normal.:flat
instead compresses any recursion (by ip), showing the approximate effect of converting any self-recursion into an iterator.:flatc
does the same but also includes collapsing of C frames (may do odd things aroundjl_apply
).
threads::Union{Int,AbstractVector{Int}}
– Specify which threads to include snapshots from in the report. Note that this does not control which threads samples are collected on (which may also have been collected on another machine).
tasks::Union{Int,AbstractVector{Int}}
– Specify which tasks to include snapshots from in the report. Note that this does not control which tasks samples are collected within.
Thegroupby
,threads
, andtasks
keyword arguments were introduced in Julia 1.8.
Profiling on windows is limited to the main thread. Other threads have not been sampled and will not show in the report.
print([io::IO = stdout,] data::Vector, lidict::LineInfoDict; kwargs...)
Prints profiling results toio
. This variant is used to examine results exported by a previous call toretrieve
. Supply the vectordata
of backtraces and a dictionarylidict
of line information.
SeeProfile.print([io], data)
for an explanation of the valid keyword arguments.
Profile.init
—Functioninit(; n::Integer, delay::Real)
Configure thedelay
between backtraces (measured in seconds), and the numbern
of instruction pointers that may be stored per thread. Each instruction pointer corresponds to a single line of code; backtraces generally consist of a long list of instruction pointers. Note that 6 spaces for instruction pointers per backtrace are used to store metadata and two NULL end markers. Current settings can be obtained by calling this function with no arguments, and each can be set independently using keywords or in the order(n, delay)
.
Profile.fetch
—Functionfetch(;include_meta = true) -> data
Return a copy of the buffer of profile backtraces. Note that the values indata
have meaning only on this machine in the current session, because it depends on the exact memory addresses used in JIT-compiling. This function is primarily for internal use;retrieve
may be a better choice for most users. By default metadata such as threadid and taskid is included. Setinclude_meta
tofalse
to strip metadata.
Profile.retrieve
—Functionretrieve(; kwargs...) -> data, lidict
"Exports" profiling results in a portable format, returning the set of all backtraces (data
) and a dictionary that maps the (session-specific) instruction pointers indata
toLineInfo
values that store the file name, function name, and line number. This function allows you to save profiling results for future analysis.
Profile.callers
—Functioncallers(funcname, [data, lidict], [filename=<filename>], [linerange=<start:stop>]) -> Vector{Tuple{count, lineinfo}}
Given a previous profiling run, determine who called a particular function. Supplying the filename (and optionally, range of line numbers over which the function is defined) allows you to disambiguate an overloaded method. The returned value is a vector containing a count of the number of calls and line information about the caller. One can optionally supply backtracedata
obtained fromretrieve
; otherwise, the current internal profile buffer is used.
Profile.clear_malloc_data
—Functionclear_malloc_data()
Clears any stored memory allocation data when running julia with--track-allocation
. Execute the command(s) you want to test (to force JIT-compilation), then callclear_malloc_data
. Then execute your command(s) again, quit Julia, and examine the resulting*.mem
files.
Profile.get_peek_duration
—Functionget_peek_duration()
Get the duration in seconds of the profile "peek" that is triggered viaSIGINFO
orSIGUSR1
, depending on platform.
Profile.set_peek_duration
—Functionset_peek_duration(t::Float64)
Set the duration in seconds of the profile "peek" that is triggered viaSIGINFO
orSIGUSR1
, depending on platform.
Profile.Allocs.@profile
—MacroProfile.Allocs.@profile [sample_rate=0.1] expr
Profile allocations that happen duringexpr
, returning both the result and and AllocResults struct.
A sample rate of 1.0 will record everything; 0.0 will record nothing.
julia> Profile.Allocs.@profile sample_rate=0.01 peakflops()1.03733270279065e11julia> results = Profile.Allocs.fetch()julia> last(sort(results.allocs, by=x->x.size))Profile.Allocs.Alloc(Vector{Any}, Base.StackTraces.StackFrame[_new_array_ at array.c:127, ...], 5576)
See the profiling tutorial in the Julia documentation for more information.
Older versions of Julia could not capture types in all cases. In older versions of Julia, if you see an allocation of typeProfile.Allocs.UnknownType
, it means that the profiler doesn't know what type of object was allocated. This mainly happened when the allocation was coming from generated code produced by the compiler. Seeissue #43688 for more info.
Since Julia 1.11, all allocations should have a type reported.
The allocation profiler was added in Julia 1.8.
The methods inProfile.Allocs
are not exported and need to be called e.g. asProfile.Allocs.fetch()
.
Profile.Allocs.clear
—FunctionProfile.Allocs.clear()
Clear all previously profiled allocation information from memory.
Profile.Allocs.print
—FunctionProfile.Allocs.print([io::IO = stdout,] [data::AllocResults = fetch()]; kwargs...)
Prints profiling results toio
(by default,stdout
). If you do not supply adata
vector, the internal buffer of accumulated backtraces will be used.
SeeProfile.print
for an explanation of the valid keyword arguments.
Profile.Allocs.fetch
—FunctionProfile.Allocs.fetch()
Retrieve the recorded allocations, and decode them into Julia objects which can be analyzed.
Profile.Allocs.start
—FunctionProfile.Allocs.start(sample_rate::Real)
Begin recording allocations with the given sample rate A sample rate of 1.0 will record everything; 0.0 will record nothing.
Profile.Allocs.stop
—FunctionProfile.Allocs.stop()
Stop recording allocations.
Profile.take_heap_snapshot
—FunctionProfile.take_heap_snapshot(filepath::String, all_one::Bool=false, streaming=false)Profile.take_heap_snapshot(all_one::Bool=false; dir::String, streaming=false)
Write a snapshot of the heap, in the JSON format expected by the Chrome Devtools Heap Snapshot viewer (.heapsnapshot extension) to a file ($pid_$timestamp.heapsnapshot
) in the current directory by default (or tempdir if the current directory is unwritable), or indir
if given, or the given full file path, or IO stream.
Ifall_one
is true, then report the size of every object as one so they can be easily counted. Otherwise, report the actual size.
Ifstreaming
is true, we will stream the snapshot data out into four files, using filepath as the prefix, to avoid having to hold the entire snapshot in memory. This option should be used for any setting where your memory is constrained. These files can then be reassembled by calling Profile.HeapSnapshot.assemble_snapshot(), which can be done offline.
NOTE: We strongly recommend setting streaming=true for performance reasons. Reconstructing the snapshot from the parts requires holding the entire snapshot in memory, so if the snapshot is large, you can run out of memory while processing it. Streaming allows you to reconstruct the snapshot offline, after your workload is done running. If you do attempt to collect a snapshot with streaming=false (the default, for backwards-compatibility) and your process is killed, note that this will always save the parts in the same directory as your provided filepath, so you can still reconstruct the snapshot after the fact, viaassemble_snapshot()
.
The methods inProfile
are not exported and need to be called e.g. asProfile.take_heap_snapshot()
.
julia> using Profilejulia> Profile.take_heap_snapshot("snapshot.heapsnapshot")
Traces and records julia objects on the heap. This only records objects known to the Julia garbage collector. Memory allocated by external libraries not managed by the garbage collector will not show up in the snapshot.
To avoid OOMing while recording the snapshot, we added a streaming option to stream out the heap snapshot into four files,
julia> using Profilejulia> Profile.take_heap_snapshot("snapshot"; streaming=true)
where "snapshot" is the filepath as the prefix for the generated files.
Once the snapshot files are generated, they could be assembled offline with the following command:
julia> using Profilejulia> Profile.HeapSnapshot.assemble_snapshot("snapshot", "snapshot.heapsnapshot")
The resulting heap snapshot file can be uploaded to chrome devtools to be viewed. For more information, see thechrome devtools docs. An alternative for analyzing Chromium heap snapshots is with the VS Code extensionms-vscode.vscode-js-profile-flame
.
The Firefox heap snapshots are of a different format, and Firefox currently maynot be used for viewing the heap snapshots generated by Julia.
Settings
This document was generated withDocumenter.jl version 1.8.0 onWednesday 9 July 2025. Using Julia version 1.11.6.