
Python Enhancement Proposals

PEP 454 – Add a new tracemalloc module to trace Python memory allocations

Author:
Victor Stinner <vstinner at python.org>
BDFL-Delegate:
Charles-François Natali <cf.natali at gmail.com>
Status:
Final
Type:
Standards Track
Created:
03-Sep-2013
Python-Version:
3.4
Resolution:
Python-Dev message

Abstract

This PEP proposes to add a new tracemalloc module to trace memory blocks allocated by Python.

Rationale

Classic generic tools like Valgrind can get the C traceback where a memory block was allocated. Using such tools to analyze Python memory allocations does not help because most memory blocks are allocated in the same C function, in PyMem_Malloc() for example. Moreover, Python has an allocator for small objects called “pymalloc” which keeps free blocks for efficiency. This is not well handled by these tools.

There are debug tools dedicated to the Python language like Heapy, Pympler and Meliae which list all alive objects using the garbage collector module (functions like gc.get_objects(), gc.get_referrers() and gc.get_referents()), compute their size (ex: using sys.getsizeof()) and group objects by type. These tools provide a better estimation of the memory usage of an application. They are useful when most memory leaks are instances of the same type and this type is only instantiated in a few functions. Problems arise when the object type is very common like str or tuple, and it is hard to identify where these objects are instantiated.

Finding reference cycles is also a difficult problem. There are different tools to draw a diagram of all references. These tools cannot be used on large applications with thousands of objects because the diagram is too huge to be analyzed manually.

Proposal

Using the customized allocation API from PEP 445, it becomes easy to set up a hook on Python memory allocators. A hook can inspect Python internals to retrieve Python tracebacks. The idea of getting the current traceback comes from the faulthandler module. The faulthandler dumps the traceback of all Python threads on a crash; here the idea is to get the traceback of the current Python thread when a memory block is allocated by Python.

This PEP proposes to add a new tracemalloc module, a debug tool to trace memory blocks allocated by Python. The module provides the following information:

  • Traceback where an object was allocated
  • Statistics on allocated memory blocks per filename and per line number: total size, number and average size of allocated memory blocks
  • Computed differences between two snapshots to detect memory leaks

The API of the tracemalloc module is similar to the API of the faulthandler module: enable() / start(), disable() / stop() and is_enabled() / is_tracing() functions, an environment variable (PYTHONFAULTHANDLER and PYTHONTRACEMALLOC), and a -X command line option (-X faulthandler and -X tracemalloc). See the documentation of the faulthandler module.

The idea of tracing memory allocations is not new. It was first implemented in the PySizer project in 2005. PySizer was implemented differently: the traceback was stored in frame objects and some Python types were linked to the trace with the name of the object type. The PySizer patch on CPython adds an overhead on performance and memory footprint, even if PySizer is not used. tracemalloc attaches a traceback to the underlying layer, to memory blocks, and has no overhead when the module is not tracing memory allocations.

The tracemalloc module has been written for CPython. Other implementations of Python may not be able to provide it.

API

To trace most memory blocks allocated by Python, the module should be started as early as possible by setting the PYTHONTRACEMALLOC environment variable to 1, or by using the -X tracemalloc command line option. The tracemalloc.start() function can be called at runtime to start tracing Python memory allocations.

By default, a trace of an allocated memory block only stores the most recent frame (1 frame). To store 25 frames at startup: set the PYTHONTRACEMALLOC environment variable to 25, or use the -X tracemalloc=25 command line option. At runtime, the limit is set with the nframe parameter of the start() function.
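The two startup mechanisms can be checked from a parent process. The following is a minimal sketch, assuming a CPython interpreter that ships tracemalloc; the helper names limit_with_option() and limit_with_environment() are hypothetical and exist only for this illustration.

```python
import os
import subprocess
import sys

# Child program: print the traceback limit the interpreter started with.
CHILD = "import tracemalloc; print(tracemalloc.get_traceback_limit())"


def limit_with_option():
    """Run a child interpreter with the -X tracemalloc=25 option."""
    result = subprocess.run(
        [sys.executable, "-X", "tracemalloc=25", "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


def limit_with_environment():
    """Run a child interpreter with PYTHONTRACEMALLOC=25 in its environment."""
    env = dict(os.environ, PYTHONTRACEMALLOC="25")
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True, text=True, check=True, env=env,
    )
    return int(result.stdout)
```

Both helpers should report the same limit, since the option and the environment variable are two ways of making the same request.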

Functions

clear_traces() function:

Clear traces of memory blocks allocated by Python.

See also stop().

get_object_traceback(obj) function:

Get the traceback where the Python object obj was allocated. Return a Traceback instance, or None if the tracemalloc module is not tracing memory allocations or did not trace the allocation of the object.

See also gc.get_referrers() and sys.getsizeof() functions.
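As an illustration of both return values, the sketch below looks up the same object while tracing and again after tracing has stopped. The variable names are illustrative only; a bytearray is used on the assumption that CPython allocates it fresh rather than recycling a previously freed object.

```python
import tracemalloc

tracemalloc.start()
payload = bytearray(10_000)          # allocated while tracing is active

# While tracing: a Traceback instance pointing at the allocation site.
traced = tracemalloc.get_object_traceback(payload)

tracemalloc.stop()

# After stop(): the module is no longer tracing, so the lookup yields None.
untraced = tracemalloc.get_object_traceback(payload)
```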

get_traceback_limit() function:

Get the maximum number of frames stored in the traceback of a trace.

The tracemalloc module must be tracing memory allocations to get the limit, otherwise an exception is raised.

The limit is set by the start() function.

get_traced_memory() function:

Get the current size and maximum size of memory blocks traced by the tracemalloc module as a tuple: (size: int, max_size: int).

get_tracemalloc_memory() function:

Get the memory usage in bytes of the tracemalloc module used to store traces of memory blocks. Return an int.
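The two measurements can be read side by side: one reports what the traced program allocated, the other what the tracer itself costs. A minimal sketch follows; the allocation sizes are arbitrary.

```python
import tracemalloc

tracemalloc.start()

# Allocate roughly 1 MiB so the counters have something to report.
blocks = [bytearray(4096) for _ in range(256)]

size, max_size = tracemalloc.get_traced_memory()   # bytes traced now, peak bytes
overhead = tracemalloc.get_tracemalloc_memory()    # bytes used to store the traces

tracemalloc.stop()
```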

is_tracing() function:

True if the tracemalloc module is tracing Python memory allocations, False otherwise.

See also start() and stop() functions.

start(nframe: int=1) function:

Start tracing Python memory allocations: install hooks on Python memory allocators. Collected tracebacks of traces will be limited to nframe frames. By default, a trace of a memory block only stores the most recent frame: the limit is 1. nframe must be greater than or equal to 1.

Storing more than 1 frame is only useful to compute statistics grouped by 'traceback' or to compute cumulative statistics: see the Snapshot.compare_to() and Snapshot.statistics() methods.

Storing more frames increases the memory and CPU overhead of the tracemalloc module. Use the get_tracemalloc_memory() function to measure how much memory is used by the tracemalloc module.

The PYTHONTRACEMALLOC environment variable (PYTHONTRACEMALLOC=NFRAME) and the -X tracemalloc=NFRAME command line option can be used to start tracing at startup.

See also stop(), is_tracing() and get_traceback_limit() functions.
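A short sketch of the nframe parameter, assuming CPython rejects an out-of-range value with ValueError (the text above only states that nframe must be at least 1):

```python
import tracemalloc

# nframe=0 is below the documented minimum of 1.
try:
    tracemalloc.start(0)
    rejected = False
except ValueError:
    rejected = True

# A valid limit: keep up to 10 frames per trace.
tracemalloc.start(10)
tracing = tracemalloc.is_tracing()
limit = tracemalloc.get_traceback_limit()
tracemalloc.stop()
```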

stop() function:

Stop tracing Python memory allocations: uninstall hooks on Python memory allocators. Also clear traces of memory blocks allocated by Python.

Call the take_snapshot() function to take a snapshot of traces before clearing them.

See also start() and is_tracing() functions.

take_snapshot() function:

Take a snapshot of traces of memory blocks allocated by Python. Return a new Snapshot instance.

The snapshot does not include memory blocks allocated before the tracemalloc module started to trace memory allocations.

Tracebacks of traces are limited to get_traceback_limit() frames. Use the nframe parameter of the start() function to store more frames.

The tracemalloc module must be tracing memory allocations to take a snapshot, see the start() function.

See also the get_object_traceback() function.
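Because stop() clears the traces, the snapshot has to be taken first. The sketch below shows that ordering, and assumes the exception raised when the module is not tracing is RuntimeError, which the text above does not name.

```python
import tracemalloc

tracemalloc.start()
buffers = [bytearray(1024) for _ in range(100)]

# Take the snapshot while tracing is still active.
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Once tracing has stopped, a new snapshot can no longer be taken.
try:
    tracemalloc.take_snapshot()
    refused = False
except RuntimeError:
    refused = True
```

The snapshot object stays usable after stop(), since it holds its own copy of the traces.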

Filter

Filter(inclusive: bool, filename_pattern: str, lineno: int=None, all_frames: bool=False) class:

Filter on traces of memory blocks.

See the fnmatch.fnmatch() function for the syntax of filename_pattern. The '.pyc' and '.pyo' file extensions are replaced with '.py'.

Examples:

  • Filter(True, subprocess.__file__) only includes traces of the subprocess module
  • Filter(False, tracemalloc.__file__) excludes traces of the tracemalloc module
  • Filter(False, "<unknown>") excludes empty tracebacks

inclusive attribute:

If inclusive is True (include), only trace memory blocks allocated in a file with a name matching filename_pattern at line number lineno.

If inclusive is False (exclude), ignore memory blocks allocated in a file with a name matching filename_pattern at line number lineno.

lineno attribute:

Line number (int) of the filter. If lineno is None, the filter matches any line number.

filename_pattern attribute:

Filename pattern of the filter (str).

all_frames attribute:

If all_frames is True, all frames of the traceback are checked. If all_frames is False, only the most recent frame is checked.

This attribute is ignored if the traceback limit is less than 2. See the get_traceback_limit() function and Snapshot.traceback_limit attribute.
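Filters take effect when passed to Snapshot.filter_traces(). A minimal sketch, reusing two of the example filters above; the variable names are illustrative only.

```python
import tracemalloc

tracemalloc.start()
buffers = [bytearray(2048) for _ in range(50)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Exclusive filters: drop traces from tracemalloc itself and empty tracebacks.
noise = [
    tracemalloc.Filter(False, tracemalloc.__file__),
    tracemalloc.Filter(False, "<unknown>"),
]
cleaned = snapshot.filter_traces(noise)

# Inclusive filter: keep only traces whose most recent frame is in tracemalloc.
internal = snapshot.filter_traces(
    [tracemalloc.Filter(True, tracemalloc.__file__)]
)
```

The original snapshot is left untouched; each call returns a new Snapshot.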

Frame

Frame class:

Frame of a traceback.

The Traceback class is a sequence of Frame instances.

filename attribute:

Filename (str).

lineno attribute:

Line number (int).

Snapshot

Snapshot class:

Snapshot of traces of memory blocks allocated by Python.

The take_snapshot() function creates a snapshot instance.

compare_to(old_snapshot: Snapshot, group_by: str, cumulative: bool=False) method:

Compute the differences with an old snapshot. Get statistics as a sorted list of StatisticDiff instances grouped by group_by.

See the statistics() method for group_by and cumulative parameters.

The result is sorted from the biggest to the smallest by: absolute value of StatisticDiff.size_diff, StatisticDiff.size, absolute value of StatisticDiff.count_diff, StatisticDiff.count and then by StatisticDiff.traceback.
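Comparing two snapshots is the leak-detection workflow the Proposal section describes. A minimal sketch, with a deliberately retained list standing in for a leak:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Simulated leak: about 1 MB that stays referenced.
leak = [bytearray(10_000) for _ in range(100)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences grouped by filename and line number, largest change first.
diffs = after.compare_to(before, "lineno")
top = diffs[0]
```

Here top.traceback identifies the line that allocated the retained blocks, and top.size_diff and top.count_diff give the growth in bytes and in number of blocks.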

dump(filename) method:

Write the snapshot into a file.

Use load() to reload the snapshot.

filter_traces(filters) method:

Create a new Snapshot instance with a filtered traces sequence, filters is a list of Filter instances. If filters is an empty list, return a new Snapshot instance with a copy of the traces.

All inclusive filters are applied at once, a trace is ignored if noinclusive filters match it. A trace is ignored if at least oneexclusive filter matches it.

load(filename) classmethod:

Load a snapshot from a file.

See also dump().
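A round trip through dump() and load() might look like the following sketch. The file name snapshot.pickle is an arbitrary choice for this illustration.

```python
import os
import tempfile
import tracemalloc

tracemalloc.start()
data = [bytearray(512) for _ in range(20)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "snapshot.pickle")
    snapshot.dump(path)                          # write the snapshot to disk
    restored = tracemalloc.Snapshot.load(path)   # classmethod: read it back
```

This allows a snapshot taken in a running application to be analyzed later, or on another machine.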

statistics(group_by: str, cumulative: bool=False) method:

Get statistics as a sorted list of Statistic instances grouped by group_by:

group_by       description
'filename'     filename
'lineno'       filename and line number
'traceback'    traceback

If cumulative is True, cumulate size and count of memory blocks of all frames of the traceback of a trace, not only the most recent frame. The cumulative mode can only be used with group_by equal to 'filename' or 'lineno' and traceback_limit greater than 1.

The result is sorted from the biggest to the smallest by: Statistic.size, Statistic.count and then by Statistic.traceback.
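The grouping modes can be compared on one snapshot. The sketch below assumes that an unsupported combination (cumulative mode with 'traceback') is rejected with ValueError, which the text above does not specify; build() is a hypothetical helper.

```python
import tracemalloc


def build():
    """Allocate about 200 kB from a single line."""
    return [bytearray(1000) for _ in range(200)]


tracemalloc.start(10)        # more than 1 frame, so cumulative mode is meaningful
rows = build()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

by_line = snapshot.statistics("lineno")
by_file_cumulative = snapshot.statistics("filename", cumulative=True)

# Cumulative mode is not available when grouping by 'traceback'.
try:
    snapshot.statistics("traceback", cumulative=True)
    rejected = False
except ValueError:
    rejected = True
```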

traceback_limit attribute:

Maximum number of frames stored in the traceback of traces: result of get_traceback_limit() when the snapshot was taken.

traces attribute:

Traces of all memory blocks allocated by Python: sequence of Trace instances.

The sequence has an undefined order. Use the Snapshot.statistics() method to get a sorted list of statistics.

Statistic

Statistic class:

Statistic on memory allocations.

Snapshot.statistics() returns a list of Statistic instances.

See also the StatisticDiff class.

count attribute:

Number of memory blocks (int).

size attribute:

Total size of memory blocks in bytes (int).

traceback attribute:

Traceback where the memory block was allocated, Traceback instance.

StatisticDiff

StatisticDiff class:

Statistic difference on memory allocations between an old and a new Snapshot instance.

Snapshot.compare_to() returns a list of StatisticDiff instances. See also the Statistic class.

count attribute:

Number of memory blocks in the new snapshot (int): 0 if the memory blocks have been released in the new snapshot.

count_diff attribute:

Difference of number of memory blocks between the old and the new snapshots (int): 0 if the memory blocks have been allocated in the new snapshot.

size attribute:

Total size of memory blocks in bytes in the new snapshot (int): 0 if the memory blocks have been released in the new snapshot.

size_diff attribute:

Difference of total size of memory blocks in bytes between the old and the new snapshots (int): 0 if the memory blocks have been allocated in the new snapshot.

traceback attribute:

Traceback where the memory blocks were allocated, Traceback instance.

Trace

Trace class:

Trace of a memory block.

The Snapshot.traces attribute is a sequence of Trace instances.

size attribute:

Size of the memory block in bytes (int).

traceback attribute:

Traceback where the memory block was allocated, Traceback instance.

Traceback

Traceback class:

Sequence of Frame instances sorted from the most recent frame to the oldest frame.

A traceback contains at least 1 frame. If the tracemalloc module failed to get a frame, the filename "<unknown>" at line number 0 is used.

When a snapshot is taken, tracebacks of traces are limited to get_traceback_limit() frames. See the take_snapshot() function.

The Trace.traceback attribute is a Traceback instance.
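Trace, Traceback and Frame fit together as in the sketch below, which picks the largest traced block and lists its frames. The frame order is the one given in this PEP; later CPython releases may order frames differently, so the sketch does not rely on it. allocate() is a hypothetical helper.

```python
import tracemalloc


def allocate():
    """Allocate a single large block so that it is easy to find."""
    return bytearray(50_000)


tracemalloc.start(5)         # keep up to 5 frames per trace
block = allocate()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Each Trace carries the block size and the Traceback of its allocation.
largest = max(snapshot.traces, key=lambda trace: trace.size)

# A Traceback is a sequence of Frame instances (filename, lineno).
frames = [(frame.filename, frame.lineno) for frame in largest.traceback]
```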

Rejected Alternatives

Log calls to the memory allocator

A different approach is to log calls to malloc(), realloc() and free() functions. Calls can be logged into a file or sent to another computer through the network. Example of a log entry: name of the function, size of the memory block, address of the memory block, Python traceback where the allocation occurred, timestamp.

Logs cannot be used directly: getting the current status of the memory requires parsing previous logs. For example, it is not possible to get directly the traceback of a Python object, like get_object_traceback(obj) does with traces.

Python uses objects with a very short lifetime and so makes an extensive use of memory allocators. It has an allocator optimized for small objects (less than 512 bytes) with a short lifetime. For example, the Python test suite calls malloc(), realloc() or free() 270,000 times per second on average. If the size of a log entry is 32 bytes, logging produces 8.2 MB per second or 29.0 GB per hour.
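The log volume follows directly from the call rate and the entry size. The short calculation below reproduces the quoted figures, assuming that MB and GB are read as binary units (2**20 and 2**30 bytes), which the text does not state.

```python
calls_per_second = 270_000    # allocator calls per second, from the text
entry_size = 32               # bytes per log entry, from the text

bytes_per_second = calls_per_second * entry_size    # 8,640,000 bytes

# Assumption: binary units, which is what matches the quoted numbers.
mb_per_second = bytes_per_second / 2**20
gb_per_hour = bytes_per_second * 3600 / 2**30

print(f"{mb_per_second:.1f} MB per second, {gb_per_hour:.1f} GB per hour")
```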

The alternative was rejected because it is less efficient and has fewer features. Parsing logs in a different process or on a different computer is slower than maintaining traces on allocated memory blocks in the same process.

Prior Work

  • Python Memory Validator (2005-2013): commercial Python memory validator developed by Software Verification. It uses the Python Reflection API.
  • PySizer: Google Summer of Code 2005 project by Nick Smallbone.
  • Heapy (2006-2013): part of the Guppy-PE project written by Sverker Nilsson.
  • Draft PEP: Support Tracking Low-Level Memory Usage in CPython (Brett Cannon, 2006)
  • Muppy: project developed in 2008 by Robert Schuppenies.
  • asizeof: a pure Python module to estimate the size of objects by Jean Brouwers (2008).
  • Heapmonitor: It provides facilities to size individual objects and can track all objects of certain classes. It was developed in 2008 by Ludwig Haehne.
  • Pympler (2008-2011): project based on asizeof, muppy and HeapMonitor
  • objgraph (2008-2012)
  • Dozer: WSGI Middleware version of the CherryPy memory leak debugger, written by Marius Gedminas (2008-2013)
  • Meliae: Python Memory Usage Analyzer developed by John A Meinel since 2009
  • gdb-heap: gdb script written in Python by Dave Malcolm (2010-2011) to analyze the usage of the heap memory
  • memory_profiler: written by Fabian Pedregosa (2011-2013)
  • caulk: written by Ben Timby in 2012

See also Pympler Related Work.

Links

tracemalloc:

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0454.rst

Last modified: 2025-02-01 08:59:27 GMT

