
Python Enhancement Proposals

PEP 445 – Add new APIs to customize Python memory allocators

Author: Victor Stinner <vstinner at python.org>
BDFL-Delegate: Antoine Pitrou <solipsis at pitrou.net>
Status: Final
Type: Standards Track
Created: 15-Jun-2013
Python-Version: 3.4
Resolution: Python-Dev message


Abstract

This PEP proposes new Application Programming Interfaces (API) to customize Python memory allocators. The only implementation required to conform to this PEP is CPython, but other implementations may choose to be compatible, or to re-use a similar scheme.

Rationale

Use cases:

  • Applications embedding Python which want to isolate Python memory from the memory of the application, or want to use a different memory allocator optimized for its Python usage
  • Python running on embedded devices with low memory and slow CPU. A custom memory allocator can be used for efficiency and/or to get access to all the memory of the device.
  • Debug tools for memory allocators:
    • track the memory usage (find memory leaks)
    • get the location of a memory allocation: Python filename and line number, and the size of a memory block
    • detect buffer underflow, buffer overflow and misuse of Python allocator APIs (see Redesign Debug Checks on Memory Block Allocators as Hooks)
    • force memory allocations to fail to test handling of the MemoryError exception

Proposal

New Functions and Structures

  • Add a new GIL-free (no need to hold the GIL) memory allocator:
    • void* PyMem_RawMalloc(size_t size)
    • void* PyMem_RawRealloc(void *ptr, size_t new_size)
    • void PyMem_RawFree(void *ptr)
    • The newly allocated memory will not have been initialized in any way.
    • Requesting zero bytes returns a distinct non-NULL pointer if possible, as if PyMem_Malloc(1) had been called instead.
  • Add a new PyMemAllocator structure:

        typedef struct {
            /* user context passed as the first argument to the 3 functions */
            void *ctx;

            /* allocate a memory block */
            void* (*malloc) (void *ctx, size_t size);

            /* allocate or resize a memory block */
            void* (*realloc) (void *ctx, void *ptr, size_t new_size);

            /* release a memory block */
            void (*free) (void *ctx, void *ptr);
        } PyMemAllocator;

  • Add a new PyMemAllocatorDomain enum to choose the Python allocator domain. Domains:
    • PYMEM_DOMAIN_RAW: PyMem_RawMalloc(), PyMem_RawRealloc() and PyMem_RawFree()
    • PYMEM_DOMAIN_MEM: PyMem_Malloc(), PyMem_Realloc() and PyMem_Free()
    • PYMEM_DOMAIN_OBJ: PyObject_Malloc(), PyObject_Realloc() and PyObject_Free()
  • Add new functions to get and set memory block allocators:
    • void PyMem_GetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)
    • void PyMem_SetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)
    • The new allocator must return a distinct non-NULL pointer when requesting zero bytes
    • For the PYMEM_DOMAIN_RAW domain, the allocator must be thread-safe: the GIL is not held when the allocator is called.
  • Add a new PyObjectArenaAllocator structure:

        typedef struct {
            /* user context passed as the first argument to the 2 functions */
            void *ctx;

            /* allocate an arena */
            void* (*alloc) (void *ctx, size_t size);

            /* release an arena */
            void (*free) (void *ctx, void *ptr, size_t size);
        } PyObjectArenaAllocator;

  • Add new functions to get and set the arena allocator used by pymalloc:
    • void PyObject_GetArenaAllocator(PyObjectArenaAllocator *allocator)
    • void PyObject_SetArenaAllocator(PyObjectArenaAllocator *allocator)
  • Add a new function to reinstall the debug checks on memory allocators when a memory allocator is replaced with PyMem_SetAllocator():
    • void PyMem_SetupDebugHooks(void)
    • Install the debug hooks on all memory block allocators. The function can be called more than once; hooks are only installed once.
    • The function does nothing if Python is not compiled in debug mode.
  • Memory block allocators always return NULL if size is greater than PY_SSIZE_T_MAX. The check is done before calling the inner function.
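
The two size rules in the list above (zero-byte requests and the PY_SSIZE_T_MAX limit) can be sketched in plain C. This is only an illustration, not CPython's implementation: demo_raw_malloc and DEMO_SSIZE_T_MAX are hypothetical names, and PTRDIFF_MAX stands in for PY_SSIZE_T_MAX so that the sketch compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PY_SSIZE_T_MAX (assumption: Py_ssize_t has the range
   of ptrdiff_t on the target platform). */
#define DEMO_SSIZE_T_MAX ((size_t)PTRDIFF_MAX)

/* Hypothetical wrapper applying both size rules before calling malloc(). */
static void *demo_raw_malloc(size_t size)
{
    if (size > DEMO_SSIZE_T_MAX)
        return NULL;    /* rejected before the inner function is called */
    if (size == 0)
        size = 1;       /* zero bytes: behave like a 1-byte request */
    return malloc(size);
}
```

Two zero-byte requests made through such a wrapper yield two different non-NULL pointers for as long as both blocks are alive, which is the "distinct pointer" behaviour required of custom allocators.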

Note

The pymalloc allocator is optimized for objects smaller than 512 bytes with a short lifetime. It uses memory mappings with a fixed size of 256 KB called “arenas”.

Here is how the allocators are set up by default:

  • PYMEM_DOMAIN_RAW, PYMEM_DOMAIN_MEM: malloc(), realloc() and free(); call malloc(1) when requesting zero bytes
  • PYMEM_DOMAIN_OBJ: pymalloc allocator which falls back on PyMem_Malloc() for allocations larger than 512 bytes
  • pymalloc arena allocator: VirtualAlloc() and VirtualFree() on Windows, mmap() and munmap() when available, or malloc() and free()

Redesign Debug Checks on Memory Block Allocators as Hooks

Since Python 2.3, Python implements different checks on memory allocators in debug mode:

  • Newly allocated memory is filled with the byte 0xCB, freed memory is filled with the byte 0xDB.
  • Detect API violations, e.g. PyObject_Free() called on a memory block allocated by PyMem_Malloc()
  • Detect write before the start of the buffer (buffer underflow)
  • Detect write after the end of the buffer (buffer overflow)

In Python 3.3, the checks are installed by replacing PyMem_Malloc(), PyMem_Realloc(), PyMem_Free(), PyObject_Malloc(), PyObject_Realloc() and PyObject_Free() using macros. The new allocator allocates a larger buffer and writes a pattern to detect buffer underflow, buffer overflow and use after free (by filling the buffer with the byte 0xDB). It uses the original PyObject_Malloc() function to allocate memory. So PyMem_Malloc() and PyMem_Realloc() indirectly call PyObject_Malloc() and PyObject_Realloc().
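
The "larger buffer plus pattern" technique described above can be sketched as follows. This is a simplified illustration and not CPython's debug allocator: the demo_* names, the 8-byte guard size and the 0xFB guard value are assumptions made for the sketch, while 0xCB and 0xDB are the fill bytes named in this PEP. Size overflow checks and strict alignment guarantees are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_GUARD_SIZE 8     /* padding on each side (arbitrary choice) */
#define DEMO_GUARD_BYTE 0xFB  /* guard pattern (assumption for this sketch) */
#define DEMO_CLEAN_BYTE 0xCB  /* fill for newly allocated memory */
#define DEMO_DEAD_BYTE  0xDB  /* fill for freed memory */

/* Block layout: [size_t size][guard][user data ...][guard] */
static void *demo_debug_malloc(size_t size)
{
    unsigned char *base = malloc(sizeof(size_t) + 2 * DEMO_GUARD_SIZE + size);
    if (base == NULL)
        return NULL;
    memcpy(base, &size, sizeof(size_t));            /* remember the size */
    unsigned char *user = base + sizeof(size_t) + DEMO_GUARD_SIZE;
    memset(user - DEMO_GUARD_SIZE, DEMO_GUARD_BYTE, DEMO_GUARD_SIZE);
    memset(user, DEMO_CLEAN_BYTE, size);
    memset(user + size, DEMO_GUARD_BYTE, DEMO_GUARD_SIZE);
    return user;
}

/* Return 0 if both guards are intact, -1 on underflow, 1 on overflow. */
static int demo_debug_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *before = user - DEMO_GUARD_SIZE;
    size_t size;
    memcpy(&size, before - sizeof(size_t), sizeof(size_t));
    const unsigned char *after = user + size;
    for (size_t i = 0; i < DEMO_GUARD_SIZE; i++) {
        if (before[i] != DEMO_GUARD_BYTE)
            return -1;
    }
    for (size_t i = 0; i < DEMO_GUARD_SIZE; i++) {
        if (after[i] != DEMO_GUARD_BYTE)
            return 1;
    }
    return 0;
}

/* Check the guards, mark the user area as dead, release the whole block. */
static int demo_debug_free(void *ptr)
{
    unsigned char *user = ptr;
    int status = demo_debug_check(ptr);
    size_t size;
    memcpy(&size, user - DEMO_GUARD_SIZE - sizeof(size_t), sizeof(size_t));
    memset(user, DEMO_DEAD_BYTE, size);
    free(user - DEMO_GUARD_SIZE - sizeof(size_t));
    return status;
}
```

Writing one byte past the end of a block returned by demo_debug_malloc() damages the trailing guard, so the next demo_debug_check() or demo_debug_free() call reports the overflow.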

This PEP redesigns the debug checks as hooks on the existing allocators in debug mode. Examples of call traces without the hooks:

  • PyMem_RawMalloc() => _PyMem_RawMalloc() => malloc()
  • PyMem_Realloc() => _PyMem_RawRealloc() => realloc()
  • PyObject_Free() => _PyObject_Free()

Call traces when the hooks are installed (debug mode):

  • PyMem_RawMalloc() => _PyMem_DebugMalloc() => _PyMem_RawMalloc() => malloc()
  • PyMem_Realloc() => _PyMem_DebugRealloc() => _PyMem_RawRealloc() => realloc()
  • PyObject_Free() => _PyMem_DebugFree() => _PyObject_Free()

As a result, PyMem_Malloc() and PyMem_Realloc() now call malloc() and realloc() in both release mode and debug mode, instead of calling PyObject_Malloc() and PyObject_Realloc() in debug mode.

When at least one memory allocator is replaced with PyMem_SetAllocator(), the PyMem_SetupDebugHooks() function must be called to reinstall the debug hooks on top of the new allocator.

Don’t call malloc() directly anymore

PyObject_Malloc() falls back on PyMem_Malloc() instead of malloc() if size is greater than 512 bytes, and PyObject_Realloc() falls back on PyMem_Realloc() instead of realloc().

Direct calls to malloc() are replaced with PyMem_Malloc(), or PyMem_RawMalloc() if the GIL is not held.

External libraries like zlib or OpenSSL can be configured to allocate memory using PyMem_Malloc() or PyMem_RawMalloc(). If the allocator of a library can only be replaced globally (rather than on an object-by-object basis), it shouldn’t be replaced when Python is embedded in an application.

For the “track memory usage” use case, it is important to track memory allocated in external libraries to have accurate reports, because these allocations can be large (e.g. they can raise a MemoryError exception) and would otherwise be missed in memory usage reports.

Examples

Use case 1: Replace Memory Allocators, keep pymalloc

Dummy example wasting 2 bytes per memory block, and 10 bytes per pymalloc arena:

    #include <Python.h>
    #include <stdlib.h>

    size_t alloc_padding = 2;
    size_t arena_padding = 10;

    void* my_malloc(void *ctx, size_t size)
    {
        /* ctx points to a size_t, so it is read back as a size_t */
        size_t padding = *(size_t *)ctx;
        return malloc(size + padding);
    }

    void* my_realloc(void *ctx, void *ptr, size_t new_size)
    {
        size_t padding = *(size_t *)ctx;
        return realloc(ptr, new_size + padding);
    }

    void my_free(void *ctx, void *ptr)
    {
        free(ptr);
    }

    void* my_alloc_arena(void *ctx, size_t size)
    {
        size_t padding = *(size_t *)ctx;
        return malloc(size + padding);
    }

    void my_free_arena(void *ctx, void *ptr, size_t size)
    {
        free(ptr);
    }

    void setup_custom_allocator(void)
    {
        PyMemAllocator alloc;
        PyObjectArenaAllocator arena;

        alloc.ctx = &alloc_padding;
        alloc.malloc = my_malloc;
        alloc.realloc = my_realloc;
        alloc.free = my_free;

        PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
        PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
        /* leave PYMEM_DOMAIN_OBJ unchanged, use pymalloc */

        arena.ctx = &arena_padding;
        arena.alloc = my_alloc_arena;
        arena.free = my_free_arena;
        PyObject_SetArenaAllocator(&arena);

        /* reinstall the debug hooks on top of the new allocators */
        PyMem_SetupDebugHooks();
    }

Use case 2: Replace Memory Allocators, override pymalloc

If you have a dedicated allocator optimized for allocations of objects smaller than 512 bytes with a short lifetime, pymalloc can be overridden (replace PyObject_Malloc()).

Dummy example wasting 2 bytes per memory block:

    #include <Python.h>
    #include <stdlib.h>

    size_t padding = 2;

    void* my_malloc(void *ctx, size_t size)
    {
        /* ctx points to a size_t, so it is read back as a size_t */
        size_t padding = *(size_t *)ctx;
        return malloc(size + padding);
    }

    void* my_realloc(void *ctx, void *ptr, size_t new_size)
    {
        size_t padding = *(size_t *)ctx;
        return realloc(ptr, new_size + padding);
    }

    void my_free(void *ctx, void *ptr)
    {
        free(ptr);
    }

    void setup_custom_allocator(void)
    {
        PyMemAllocator alloc;
        alloc.ctx = &padding;
        alloc.malloc = my_malloc;
        alloc.realloc = my_realloc;
        alloc.free = my_free;

        PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
        PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
        PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);

        /* reinstall the debug hooks on top of the new allocators */
        PyMem_SetupDebugHooks();
    }

The pymalloc arena does not need to be replaced, because it is no longer used by the new allocator.

Use case 3: Setup Hooks On Memory Block Allocators

Example to set up hooks on all memory block allocators:

    #include <Python.h>

    struct {
        PyMemAllocator raw;
        PyMemAllocator mem;
        PyMemAllocator obj;
        /* ... */
    } hook;

    static void* hook_malloc(void *ctx, size_t size)
    {
        PyMemAllocator *alloc = (PyMemAllocator *)ctx;
        void *ptr;
        /* ... */
        ptr = alloc->malloc(alloc->ctx, size);
        /* ... */
        return ptr;
    }

    static void* hook_realloc(void *ctx, void *ptr, size_t new_size)
    {
        PyMemAllocator *alloc = (PyMemAllocator *)ctx;
        void *ptr2;
        /* ... */
        ptr2 = alloc->realloc(alloc->ctx, ptr, new_size);
        /* ... */
        return ptr2;
    }

    static void hook_free(void *ctx, void *ptr)
    {
        PyMemAllocator *alloc = (PyMemAllocator *)ctx;
        /* ... */
        alloc->free(alloc->ctx, ptr);
        /* ... */
    }

    void setup_hooks(void)
    {
        PyMemAllocator alloc;
        static int installed = 0;

        /* install the hooks only once */
        if (installed)
            return;
        installed = 1;

        alloc.malloc = hook_malloc;
        alloc.realloc = hook_realloc;
        alloc.free = hook_free;

        /* save the current allocators: the hooks call them */
        PyMem_GetAllocator(PYMEM_DOMAIN_RAW, &hook.raw);
        PyMem_GetAllocator(PYMEM_DOMAIN_MEM, &hook.mem);
        PyMem_GetAllocator(PYMEM_DOMAIN_OBJ, &hook.obj);

        alloc.ctx = &hook.raw;
        PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);

        alloc.ctx = &hook.mem;
        PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);

        alloc.ctx = &hook.obj;
        PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
    }

Note

PyMem_SetupDebugHooks() does not need to be called because memory allocators are not replaced: the debug checks on memory block allocators are installed automatically at startup.
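
As one possible use of such a hook, the sketch below forces allocations to fail after a fixed number of successful calls, which is the "test handling of MemoryError" use case from the Rationale. It is a minimal sketch under stated assumptions: DemoAllocator is a local copy of the PyMemAllocator layout so that the code compiles without Python.h, and DemoFailHook, the fail_* and libc_* functions and demo_make_fail_allocator() are hypothetical names. In a real program, the inner allocator would be obtained with PyMem_GetAllocator() and the hook installed with PyMem_SetAllocator().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local copy of the PyMemAllocator layout (stand-in for this sketch). */
typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void *(*realloc)(void *ctx, void *ptr, size_t new_size);
    void (*free)(void *ctx, void *ptr);
} DemoAllocator;

/* Hook state: the wrapped allocator plus a countdown of allowed calls. */
typedef struct {
    DemoAllocator inner;
    long remaining;     /* allocations allowed before failing; < 0 = never fail */
} DemoFailHook;

static void *fail_malloc(void *ctx, size_t size)
{
    DemoFailHook *hook = ctx;
    if (hook->remaining == 0)
        return NULL;                    /* simulated out-of-memory */
    if (hook->remaining > 0)
        hook->remaining--;
    return hook->inner.malloc(hook->inner.ctx, size);
}

static void *fail_realloc(void *ctx, void *ptr, size_t new_size)
{
    DemoFailHook *hook = ctx;
    if (hook->remaining == 0)
        return NULL;                    /* the original block stays valid */
    if (hook->remaining > 0)
        hook->remaining--;
    return hook->inner.realloc(hook->inner.ctx, ptr, new_size);
}

static void fail_free(void *ctx, void *ptr)
{
    DemoFailHook *hook = ctx;
    hook->inner.free(hook->inner.ctx, ptr);
}

/* Inner allocator for the demo: plain libc, 1 byte for zero-size requests. */
static void *libc_malloc(void *ctx, size_t size)
{
    (void)ctx;
    return malloc(size ? size : 1);
}

static void *libc_realloc(void *ctx, void *ptr, size_t new_size)
{
    (void)ctx;
    return realloc(ptr, new_size ? new_size : 1);
}

static void libc_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

/* Build an allocator that succeeds `allowed` times and then returns NULL. */
static DemoAllocator demo_make_fail_allocator(DemoFailHook *hook, long allowed)
{
    hook->inner = (DemoAllocator){NULL, libc_malloc, libc_realloc, libc_free};
    hook->remaining = allowed;
    return (DemoAllocator){hook, fail_malloc, fail_realloc, fail_free};
}
```

With allowed set to 2, the first two allocations succeed and the third returns NULL, while free() keeps working so that earlier blocks can still be released.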

Performances

The implementation of this PEP (issue #3329) has no visible overhead on the Python benchmark suite.

Results of the Python benchmarks suite (-b 2n3): some tests are 1.04x faster, some tests are 1.04x slower. Results of pybench microbenchmark: “+0.1%” slower globally (diff between -4.9% and +5.6%).

The full output of benchmarks is attached to the issue #3329.

Rejected Alternatives

More specific functions to get/set memory allocators

A larger set of C API functions was originally proposed, with one pair of functions for each allocator domain:

  • void PyMem_GetRawAllocator(PyMemAllocator *allocator)
  • void PyMem_GetAllocator(PyMemAllocator *allocator)
  • void PyObject_GetAllocator(PyMemAllocator *allocator)
  • void PyMem_SetRawAllocator(PyMemAllocator *allocator)
  • void PyMem_SetAllocator(PyMemAllocator *allocator)
  • void PyObject_SetAllocator(PyMemAllocator *allocator)

This alternative was rejected because it is not possible to write generic code with more specific functions: code must be duplicated for each memory allocator domain.
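
The kind of generic code that a domain argument makes possible can be sketched as a single loop over all domains. To stay compilable without Python.h, the sketch uses local stand-ins: DemoDomain, DemoAllocator, demo_get_allocator() and demo_set_allocator() are hypothetical substitutes for PyMemAllocatorDomain, PyMemAllocator, PyMem_GetAllocator() and PyMem_SetAllocator(), and the counting hook is only an example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum {
    DEMO_DOMAIN_RAW, DEMO_DOMAIN_MEM, DEMO_DOMAIN_OBJ, DEMO_DOMAIN_COUNT
} DemoDomain;

typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void *(*realloc)(void *ctx, void *ptr, size_t new_size);
    void (*free)(void *ctx, void *ptr);
} DemoAllocator;

static void *libc_malloc(void *ctx, size_t n) { (void)ctx; return malloc(n ? n : 1); }
static void *libc_realloc(void *ctx, void *p, size_t n) { (void)ctx; return realloc(p, n ? n : 1); }
static void libc_free(void *ctx, void *p) { (void)ctx; free(p); }

/* Stand-in for the interpreter's per-domain allocator table. */
static DemoAllocator demo_table[DEMO_DOMAIN_COUNT] = {
    {NULL, libc_malloc, libc_realloc, libc_free},
    {NULL, libc_malloc, libc_realloc, libc_free},
    {NULL, libc_malloc, libc_realloc, libc_free},
};

static void demo_get_allocator(DemoDomain d, DemoAllocator *a) { *a = demo_table[d]; }
static void demo_set_allocator(DemoDomain d, DemoAllocator *a) { demo_table[d] = *a; }

/* One counter per domain; ctx selects which one a call belongs to. */
typedef struct {
    DemoAllocator inner;
    size_t calls;
} DemoCounter;

static DemoCounter counters[DEMO_DOMAIN_COUNT];

static void *count_malloc(void *ctx, size_t size)
{
    DemoCounter *c = ctx;
    c->calls++;
    return c->inner.malloc(c->inner.ctx, size);
}

static void *count_realloc(void *ctx, void *ptr, size_t new_size)
{
    DemoCounter *c = ctx;
    return c->inner.realloc(c->inner.ctx, ptr, new_size);
}

static void count_free(void *ctx, void *ptr)
{
    DemoCounter *c = ctx;
    c->inner.free(c->inner.ctx, ptr);
}

/* The same three lines serve every domain: no per-domain duplication. */
static void demo_install_counters(void)
{
    static int installed = 0;
    if (installed)
        return;         /* installing twice would make a hook wrap itself */
    installed = 1;
    for (int d = 0; d < DEMO_DOMAIN_COUNT; d++) {
        DemoAllocator hook = {&counters[d], count_malloc, count_realloc, count_free};
        demo_get_allocator((DemoDomain)d, &counters[d].inner);
        demo_set_allocator((DemoDomain)d, &hook);
    }
}
```

With one function pair per domain, the loop body would have to be written out three times, once for each pair of getter and setter.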

Make PyMem_Malloc() reuse PyMem_RawMalloc() by default

If PyMem_Malloc() called PyMem_RawMalloc() by default, calling PyMem_SetAllocator(PYMEM_DOMAIN_RAW, alloc) would also patch PyMem_Malloc() indirectly.

This alternative was rejected because PyMem_SetAllocator() would have a different behaviour depending on the domain. Always having the same behaviour is less error-prone.

Add a new PYDEBUGMALLOC environment variable

It was proposed to add a new PYDEBUGMALLOC environment variable to enable debug checks on memory block allocators. It would have had the same effect as calling PyMem_SetupDebugHooks(), without the need to write any C code. Another advantage is that debug checks could be enabled even in release mode: they would always be compiled in, but only enabled when the environment variable is present and non-empty.

This alternative was rejected because a new environment variable would make Python initialization even more complex. PEP 432 tries to simplify the CPython startup sequence.

Use macros to get customizable allocators

To have no overhead in the default configuration, customizable allocators would be an optional feature enabled by a configuration option or by macros.

This alternative was rejected because the use of macros implies having to recompile extension modules to use the new allocator and allocator hooks. Not having to recompile Python or extension modules makes debug hooks easier to use in practice.

Pass the C filename and line number

Define allocator functions as macros using __FILE__ and __LINE__ to get the C filename and line number of a memory allocation.

Example of PyMem_Malloc macro with the modified PyMemAllocator structure:

    typedef struct {
        /* user context passed as the first argument to the 3 functions */
        void *ctx;

        /* allocate a memory block */
        void* (*malloc) (void *ctx, const char *filename, int lineno,
                         size_t size);

        /* allocate or resize a memory block */
        void* (*realloc) (void *ctx, const char *filename, int lineno,
                          void *ptr, size_t new_size);

        /* release a memory block */
        void (*free) (void *ctx, const char *filename, int lineno,
                      void *ptr);
    } PyMemAllocator;

    void* _PyMem_MallocTrace(const char *filename, int lineno,
                             size_t size);

    /* the function is still needed for the Python stable ABI */
    void* PyMem_Malloc(size_t size);

    #define PyMem_Malloc(size) \
            _PyMem_MallocTrace(__FILE__, __LINE__, size)

The GC allocator functions would also have to be patched. For example, _PyObject_GC_Malloc() is used in many C functions and so objects of different types would have the same allocation location.

This alternative was rejected because passing a filename and a line number to each allocator makes the API more complex: pass 3 new arguments (ctx, filename, lineno) to each allocator function, instead of just a context argument (ctx). Having to also modify GC allocator functions adds too much complexity for little gain.

GIL-free PyMem_Malloc()

In Python 3.3, when Python is compiled in debug mode, PyMem_Malloc() indirectly calls PyObject_Malloc() which requires the GIL to be held (it isn’t thread-safe). That’s why PyMem_Malloc() must be called with the GIL held.

This PEP changes PyMem_Malloc(): it now always calls malloc() rather than PyObject_Malloc(). The “GIL must be held” restriction could therefore be removed from PyMem_Malloc().

This alternative was rejected because allowing PyMem_Malloc() to be called without holding the GIL can break applications which set up their own allocators or allocator hooks. Holding the GIL is convenient when developing a custom allocator: there is no need to care about other threads. It is also convenient for a debug allocator hook: Python objects can be safely inspected, and the C API may be used for reporting.

Moreover, calling PyGILState_Ensure() in a memory allocator has unexpected behaviour, especially at Python startup and when creating a new Python thread state. It is better to free custom allocators of the responsibility of acquiring the GIL.

Don’t add PyMem_RawMalloc()

Replace malloc() with PyMem_Malloc(), but only if the GIL is held. Otherwise, keep malloc() unchanged.

PyMem_Malloc() is used without the GIL held in some Python functions. For example, the main() and Py_Main() functions of Python call PyMem_Malloc() when the GIL does not exist yet. In this case, PyMem_Malloc() would be replaced with malloc() (or PyMem_RawMalloc()).

This alternative was rejected because PyMem_RawMalloc() is required for accurate reports of the memory usage. When a debug hook is used to track the memory usage, the memory allocated by direct calls to malloc() cannot be tracked. PyMem_RawMalloc() can be hooked and so all the memory allocated by Python can be tracked, including memory allocated without holding the GIL.

Use existing debug tools to analyze memory use

There are many existing debug tools to analyze memory use. Some examples: Valgrind, Purify, Clang AddressSanitizer, failmalloc, etc.

The problem is to retrieve the Python object related to a memory pointer to read its type and/or its content. Another issue is to retrieve the source of the memory allocation: the C backtrace is usually useless (same reasoning as for macros using __FILE__ and __LINE__, see Pass the C filename and line number), the Python filename and line number (or even the Python traceback) is more useful.

This alternative was rejected because classic tools are unable to introspect Python internals to collect such information. Being able to set up a hook on allocators called with the GIL held makes it possible to collect a lot of useful data from Python internals.

Add a msize() function

Add another function to PyMemAllocator and PyObjectArenaAllocator structures:

    size_t msize(void *ptr);

This function returns the size of a memory block or a memory mapping. Return (size_t)-1 if the function is not implemented or if the pointer is unknown (e.g. NULL pointer).

On Windows, this function can be implemented using _msize() and VirtualQuery().

The function can be used to implement a hook tracking the memory usage. The free() method of an allocator only gets the address of a memory block, whereas the size of the memory block is required to update the memory usage.

The additional msize() function was rejected because only a few platforms implement it. For example, Linux with the GNU libc does not provide a function to get the size of a memory block. msize() is not currently used in the Python source code. The function would only be used to track memory use, and would make the API more complex. A debug hook can implement the function internally, so there is no need to add it to PyMemAllocator and PyObjectArenaAllocator structures.
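
One way a debug hook can implement the size lookup internally is to store the requested size in a small header in front of each block. The sketch below illustrates that idea on top of plain malloc() and free(). DemoHeader and the track_* functions are hypothetical names, the ctx argument points to a byte counter chosen for the example, and a realloc() counterpart is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of every block. The extra union members only
   serve to keep the user data suitably aligned. */
typedef union {
    size_t size;
    long long ll;
    long double ld;
    void *p;
} DemoHeader;

/* ctx points to a size_t holding the number of bytes currently in use. */
static void *track_malloc(void *ctx, size_t size)
{
    size_t *usage = ctx;
    if (size > SIZE_MAX - sizeof(DemoHeader))
        return NULL;                    /* header would overflow the size */
    DemoHeader *header = malloc(sizeof(DemoHeader) + size);
    if (header == NULL)
        return NULL;
    header->size = size;
    *usage += size;
    return header + 1;                  /* user data starts after the header */
}

/* Internal equivalent of msize(): read the size back from the header. */
static size_t track_msize(const void *ptr)
{
    if (ptr == NULL)
        return (size_t)-1;
    return ((const DemoHeader *)ptr - 1)->size;
}

static void track_free(void *ctx, void *ptr)
{
    size_t *usage = ctx;
    if (ptr == NULL)
        return;
    *usage -= track_msize(ptr);         /* size is known without platform help */
    free((DemoHeader *)ptr - 1);
}
```

The cost of this approach is one header per block, which is the price for not depending on a platform-specific function such as _msize().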

No context argument

Simplify the signature of allocator functions, remove the context argument:

  • void* malloc(size_t size)
  • void* realloc(void *ptr, size_t new_size)
  • void free(void *ptr)

It is likely for an allocator hook to be reused for PyMem_SetAllocator() and PyObject_SetAllocator(), or even PyMem_SetRawAllocator(), but the hook must call a different function depending on the allocator. The context is a convenient way to reuse the same custom allocator or hook for different Python allocators.

In C++, the context can be used to pass this.

External Libraries

Examples of API used to customize memory allocators.

Libraries used by Python:

Other libraries:

The new ctx parameter of this PEP was inspired by the API of zlib and Oracle’s OCI libraries.

See also the GNU libc: Memory Allocation Hooks which uses a different approach to hook memory allocators.

Memory Allocators

The C standard library provides the well known malloc() function. Its implementation depends on the platform and on the C library. The GNU C library uses a modified ptmalloc2, based on “Doug Lea’s Malloc” (dlmalloc). FreeBSD uses jemalloc. Google provides tcmalloc which is part of gperftools.

malloc() uses two kinds of memory: heap and memory mappings. Memory mappings are usually used for large allocations (e.g. larger than 256 KB), whereas the heap is used for small allocations.

On UNIX, the heap is handled by brk() and sbrk() system calls, and it is contiguous. On Windows, the heap is handled by HeapAlloc() and can be discontiguous. Memory mappings are handled by mmap() on UNIX and VirtualAlloc() on Windows; they can be discontiguous.

Releasing a memory mapping immediately gives the memory back to the system. On UNIX, the heap memory is only given back to the system if the released block is located at the end of the heap. Otherwise, the memory will only be given back to the system when all the memory located after the released memory is also released.

To allocate memory on the heap, an allocator tries to reuse free space. If there is no contiguous space big enough, the heap must be enlarged, even if there is more free space than the required size. This issue is called “memory fragmentation”: the memory usage seen by the system is higher than real usage. On Windows, HeapAlloc() creates a new memory mapping with VirtualAlloc() if there is not enough free contiguous memory.

CPython has a pymalloc allocator for allocations smaller than 512 bytes. This allocator is optimized for small objects with a short lifetime. It uses memory mappings called “arenas” with a fixed size of 256 KB.

Other allocators:

This PEP makes it possible to choose exactly which memory allocator is used for your application depending on its usage of the memory (number of allocations, size of allocations, lifetime of objects, etc.).

Links

CPython issues related to memory allocation:

Projects analyzing the memory usage of Python applications:

Copyright

This document has been placed into the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0445.rst

Last modified: 2025-02-01 08:59:27 GMT