Important

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Reference Counting. See PEP 1 for how to propose changes.
The PEP was accepted with conditions, including that the tp_dealloc() solution (for accidental de-immortalization) was applied.

Currently the CPython runtime maintains a small amount of mutable state in the allocated memory of each object. Because of this, otherwise immutable objects are actually mutable. This can have a large negative impact on CPU and memory performance, especially for approaches to increasing Python’s scalability.
This proposal mandates that, internally, CPython will support marking an object as one for which that runtime state will no longer change. Consequently, such an object’s refcount will never reach 0, and thus the object will never be cleaned up (except when the runtime knows it’s safe to do so, like during runtime finalization). We call these objects “immortal”. (Normally, only a relatively small number of internal objects will ever be immortal.) The fundamental improvement here is that now an object can be truly immutable.
Object immortality is meant to be an internal-only feature, so this proposal does not include any changes to public API or behavior (with one exception). As usual, we may still add some private (yet publicly accessible) API to do things like immortalize an object or tell if one is immortal. Any effort to expose this feature to users would need to be proposed separately.

There is one exception to “no change in behavior”: refcounting semantics for immortal objects will differ in some cases from user expectations. This exception, and the solution, are discussed below.
Most of this PEP focuses on an internal implementation that satisfies the above mandate. However, those implementation details are not meant to be strictly proscriptive. Instead, at the least they are included to help illustrate the technical considerations required by the mandate. The actual implementation may deviate somewhat as long as it satisfies the constraints outlined below. Furthermore, the acceptability of any specific implementation detail described below does not depend on the status of this PEP, unless explicitly specified.
For example, the particular details of the implementation (such as how objects are marked and recognized as immortal) are not only CPython-specific but are also private implementation details that are expected to change in subsequent versions.
Here’s a high-level look at the implementation:
If an object’s refcount matches a very specific value (defined below) then that object is treated as immortal. The CPython C-API and runtime will not modify the refcount (or other runtime state) of an immortal object. The runtime will now be explicitly responsible for deallocating all immortal objects during finalization, unless statically allocated. (See Object Cleanup below.)

Aside from the change to refcounting semantics, there is one other possible negative impact to consider. The threshold for an “acceptable” performance penalty for immortal objects is 2% (the consensus at the 2022 Language Summit). A naive implementation of the approach described below makes CPython roughly 4% slower. However, the implementation is performance-neutral once known mitigations are applied.
TODO: Update the performance impact for the latest branch(both for GCC and for clang).
As noted above, currently all objects are effectively mutable. That includes “immutable” objects like str instances. This is because every object’s refcount is frequently modified as the object is used during execution. This is especially significant for a number of commonly used global (builtin) objects, e.g. None. Such objects are used a lot, both in Python code and internally. That adds up to a consistent high volume of refcount changes.

The effective mutability of all Python objects has a concrete impact on parts of the Python community, e.g. projects that aim for scalability like Instagram or the effort to make the GIL per-interpreter. Below we describe several ways in which refcount modification has a real negative effect on such projects. None of that would happen for objects that are truly immutable.
Every modification of a refcount causes the corresponding CPU cache line to be invalidated. This has a number of effects.

For one, the write must be propagated to other cache levels and to main memory. This has a small effect on all Python programs. Immortal objects would provide slight relief in that regard.
On top of that, multi-core applications pay a price. If two threads (running simultaneously on distinct cores) are interacting with the same object (e.g. None) then they will end up invalidating each other’s caches with each incref and decref. This is true even for otherwise immutable objects like True, 0, and str instances. CPython’s GIL helps reduce this effect, since only one thread runs at a time, but it doesn’t completely eliminate the penalty.

Speaking of multi-core, we are considering making the GIL a per-interpreter lock, which would enable true multi-core parallelism. Among other things, the GIL currently protects against races between multiple concurrent threads that may incref or decref the same object. Without a shared GIL, two running interpreters could not safely share any objects, even otherwise immutable ones like None.

This means that, to have a per-interpreter GIL, each interpreter must have its own copy of every object. That includes the singletons and static types. We have a viable strategy for that but it will require a meaningful amount of extra effort and extra complexity.

The alternative is to ensure that all shared objects are truly immutable. There would be no races because there would be no modification. This is something that the immortality proposed here would enable for otherwise immutable objects. With immortal objects, support for a per-interpreter GIL becomes much simpler.
For some applications it makes sense to get the application into a desired initial state and then fork the process for each worker. This can result in a large performance improvement, especially in memory usage. Several enterprise Python users (e.g. Instagram, YouTube) have taken advantage of this. However, the above refcount semantics drastically reduce the benefits and have led to some sub-optimal workarounds.
Also note that “fork” isn’t the only operating system mechanism that uses copy-on-write semantics. Another example is mmap. Any such utility will potentially benefit from fewer copy-on-writes when immortal objects are involved, when compared to using only “mortal” objects.

The proposed solution is obvious enough that both of this proposal’s authors came to the same conclusion (and implementation, more or less) independently. The Pyston project uses a similar approach. Other designs were also considered. Several possibilities have also been discussed on python-dev in past years.
Alternatives include:
- special-casing known objects in Py_INCREF() (and related API)
- a no-op dealloc (the object’s tp_dealloc() is a no-op)

Each of the above makes objects immortal, but none of them address the performance penalties from refcount modification described above.
In the case of per-interpreter GIL, the only realistic alternative is to move all global objects into PyInterpreterState and add one or more lookup functions to access them. Then we’d have to add some hacks to the C-API to preserve compatibility for the many objects exposed there. The story is much, much simpler with immortal objects.

Most notably, the cases described in the above examples stand to benefit greatly from immortal objects. Projects using pre-fork can drop their workarounds. For the per-interpreter GIL project, immortal objects greatly simplify the solution for existing static types, as well as objects exposed by the public C-API.
In general, a strong immutability guarantee for objects enables Python applications to scale better, particularly in multi-process deployments. This is because they can then leverage multi-core parallelism without such a significant tradeoff in memory usage as they now have. The cases we just described, as well as those described above in Motivation, reflect this improvement.

A naive implementation shows a 2% slowdown (3% with MSVC). We have demonstrated a return to performance-neutral with a handful of basic mitigations applied. See the mitigations section below.

On the positive side, immortal objects save a significant amount of memory when used with a pre-fork model. Also, immortal objects provide opportunities for specialization in the eval loop that would improve performance.

Ideally this internal-only feature would be completely compatible. However, it does involve a change to refcount semantics in some cases. Only immortal objects are affected, but this includes high-use objects like None, True, and False.
Specifically, when an immortal object is involved:
- refcount modification is a no-op (including via Py_SET_REFCNT())
- the refcount never reaches 0, so the object is never cleaned up

Again, those changes in behavior only apply to immortal objects, not the vast majority of objects a user will use. Furthermore, users cannot mark an object as immortal so no user-created objects will ever have that changed behavior. Users that rely on any of the changed behavior for global (builtin) objects are already in trouble. So the overall impact should be small.
Also note that code which checks for refleaks should keep working fine, unless it checks for hard-coded small values relative to some immortal object. The problems noticed by Pyston shouldn’t apply here since we do not modify the refcount.

See Public Refcount Details below for further discussion.

Hypothetically, a non-immortal object could be incref’ed so much that it reaches the magic value needed to be considered immortal. That means it would never be decref’ed all the way back to 0, so it would accidentally leak (never be cleaned up).

With 64-bit refcounts, this accidental scenario is so unlikely that we need not worry. Even if done deliberately by using Py_INCREF() in a tight loop and each iteration only took 1 CPU cycle, it would take 2^60 cycles (if the immortal bit were 2^60). At a fast 5 GHz that would still take nearly 250,000,000 seconds (over 2,500 days)!
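That arithmetic is easy to check directly. The sketch below assumes, as the text does, a hypothetical immortal bit at 2^60 and the unrealistic best case of one incref per cycle on a 5 GHz core:

```python
# Back-of-envelope check of the accidental-immortality math.
# Assumes a hypothetical immortal bit at 2**60 and one Py_INCREF()
# per CPU cycle on a 5 GHz core (an unrealistic best case).
increfs_needed = 2 ** 60
cycles_per_second = 5 * 10 ** 9

seconds = increfs_needed / cycles_per_second   # ~230 million seconds
days = seconds / (60 * 60 * 24)                # ~2,700 days

print(f"{seconds:,.0f} seconds (~{days:,.0f} days)")
```

Even under these deliberately generous assumptions, the result comfortably exceeds the “over 2,500 days” quoted above.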
Also note that it is doubly unlikely to be a problem because it wouldn’t matter until the refcount would have gotten back to 0 and the object cleaned up. So any object that hit that magic “immortal” refcount value would have to be decref’ed that many times again before the change in behavior would be noticed.

Again, the only realistic way that the magic refcount would be reached (and then reversed) is if it were done deliberately. (Of course, the same thing could be done efficiently using Py_SET_REFCNT() though that would be even less of an accident.) At that point we don’t consider it a concern of this proposal.
On builds with much smaller maximum refcounts, like 32-bit platforms, the consequences aren’t so obvious. Let’s say the magic refcount were 2^30. Using the same specs as above, it would take roughly 4 seconds to accidentally immortalize an object. Under reasonable conditions, it is still highly unlikely that an object be accidentally immortalized. It would have to meet several criteria (e.g. a single object incref’ed in a tight loop with essentially no intervening decrefs).
Even at a much less frequent rate it would not take long to reachaccidental immortality (on 32-bit). However, then it would have to runthrough the same number of (now noop-ing) decrefs before that one objectwould be effectively leaking. This is highly unlikely, especially becausethe calculations assume no decrefs.
Furthermore, this isn’t all that different from how such 32-bit extensionscan already incref an object past 2^31 and turn the refcount negative.If that were an actual problem then we would have heard about it.
Between all of the above cases, the proposal doesn’t consideraccidental immortality a problem.
The implementation approach described in this PEP is compatible with extensions compiled to the stable ABI (with the exception of Accidental Immortality and Accidental De-Immortalizing). Due to the nature of the stable ABI, unfortunately, such extensions use versions of Py_INCREF(), etc. that directly modify the object’s ob_refcnt field. This will invalidate all the performance benefits of immortal objects.

However, we do ensure that immortal objects (mostly) stay immortal in that situation. We set the initial refcount of immortal objects to a value for which we can identify the object as immortal and which continues to do so even if the refcount is modified by an extension. (For example, suppose we used one of the high refcount bits to indicate that an object was immortal. We would set the initial refcount to a higher value that still matches the bit, like halfway to the next bit. See _Py_IMMORTAL_REFCNT.) At worst, objects in that situation would feel the effects described in the Motivation section. Even then the overall impact is unlikely to be significant.
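The “halfway to the next bit” idea can be modeled with hypothetical values (the real constants live in CPython’s internals and may differ):

```python
# Model of the drift-tolerant immortal refcount described above.
# The values here are illustrative, not CPython's actual constants.
IMMORTAL_BIT = 1 << 30                      # marks an object as immortal
IMMORTAL_REFCNT = IMMORTAL_BIT + (1 << 29)  # halfway to the next bit

def is_immortal(ob_refcnt):
    # The check is a bitwise-and against the bit, not an equality test,
    # so the object stays immortal even if the refcount drifts.
    return bool(ob_refcnt & IMMORTAL_BIT)

# An old stable-ABI extension modifies ob_refcnt directly...
refcnt = IMMORTAL_REFCNT
refcnt += 100_000        # unbalanced increfs
assert is_immortal(refcnt)
refcnt -= 300_000        # unbalanced decrefs
assert is_immortal(refcnt)   # ...but the object still reads as immortal
```

Because equality is never required, drift in either direction (short of clearing the bit entirely) leaves the object identifiable as immortal.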
32-bit builds of older stable ABI extensions can take Accidental Immortality to the next level.
Hypothetically, such an extension could incref an object to a value on the next highest bit above the magic refcount value. For example, if the magic value were 2^30 and the initial immortal refcount were thus 2^30 + 2^29 then it would take 2^29 increfs by the extension to reach a value of 2^31, making the object non-immortal. (Of course, a refcount that high would probably already cause a crash, regardless of immortal objects.)
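The limit of that drift tolerance follows from the same bit arithmetic (again with the hypothetical values used in the example):

```python
# With a magic bit of 2**30 and an initial refcount of 2**30 + 2**29,
# exactly 2**29 unbalanced increfs reach 2**31 -- at which point the
# immortal bit is no longer set and the object reads as mortal.
MAGIC_BIT = 1 << 30
INITIAL = MAGIC_BIT + (1 << 29)

drifted = INITIAL + (1 << 29)      # 2**29 stray increfs later...
assert drifted == 1 << 31
assert not (drifted & MAGIC_BIT)   # ...the object is de-immortalized
```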
The more problematic case is where such a 32-bit stable ABI extension goes crazy decref’ing an already immortal object. Continuing with the above example, it would take 2^29 asymmetric decrefs to drop below the magic immortal refcount value. So an object like None could be made mortal and subject to decref. That still wouldn’t be a problem until somehow the decrefs continue on that object until it reaches 0. For statically allocated immortal objects, like None, the extension would crash the process if it tried to dealloc the object. For any other immortal objects, the dealloc might be okay. However, there might be runtime code expecting the formerly-immortal object to be around forever. That code would probably crash.
Again, the likelihood of this happening is extremely small, even on32-bit builds. It would require roughly a billion decrefs on thatone object without a corresponding incref. The most likely scenario isthe following:
A “new” reference to None is returned by many functions and methods. Unlike with non-immortal objects, the 3.12 runtime will basically never incref None before giving it to the extension. However, the extension will decref it when done with it (unless it returns it). Each time that exchange happens with the one object, we get one step closer to a crash.
How realistic is it that some form of that exchange (with a singleobject) will happen a billion times in the lifetime of a Python processon 32-bit? If it is a problem, how could it be addressed?
As to how realistic, the answer isn’t clear currently. However, themitigation is simple enough that we can safely proceed under theassumption that it would not be a problem.
We look at possible solutions later on.

This proposal is CPython-specific. However, it does relate to the behavior of the C-API, which may affect other Python implementations. Consequently, the effect of changed behavior described in Backward Compatibility above also applies here (e.g. if another implementation is tightly coupled to specific refcount values, other than 0, or to exactly how refcounts change, then it may be impacted).
This feature has no known impact on security.
This is not a complex feature so it should not cause much mental overhead for maintainers. The basic implementation doesn’t touch much code so it should not have much impact on maintainability. There may be some extra complexity due to performance penalty mitigation. However, that should be limited to where we immortalize all objects post-init and later explicitly deallocate them during runtime finalization. The code for this should be relatively concentrated.
The approach involves these fundamental changes:
- update Py_INCREF() and Py_DECREF() to no-op for objects that match the magic refcount
- stop modifying PyGC_Head for immortal GC objects (“containers”)

Then setting any object’s refcount to _Py_IMMORTAL_REFCNT makes it immortal.
(There are other minor, internal changes which are not described here.)
In the following sub-sections we dive into the most significant details.First we will cover some conceptual topics, followed by more concreteaspects like specific affected APIs.
In Backward Compatibility we introduced possible ways that user code might be broken by the change in this proposal. Any contributing misunderstanding by users is likely due in large part to the names of the refcount-related API and to how the documentation explains those API (and refcounting in general).
Between the names and the docs, users can come away with clear (if unintended) answers about how refcounts behave.

As part of this proposal, we must make sure that users can clearly understand on which parts of the refcount behavior they can rely and which are considered implementation details. Specifically, they should use the existing public refcount-related API, and the only refcount values with any meaning are 0 and 1. (Some code relies on 1 as an indicator that the object can be safely modified.) All other values are considered “not 0 or 1”.
This information will be clarified in the documentation.
Arguably, the existing refcount-related API should be modified to reflectwhat we want users to expect. Something like the following:
- Py_INCREF() -> Py_ACQUIRE_REF() (or only support Py_NewRef())
- Py_DECREF() -> Py_RELEASE_REF()
- Py_REFCNT() -> Py_HAS_REFS()
- Py_SET_REFCNT() -> Py_RESET_REFS() and Py_SET_NO_REFS()

However, such a change is not a part of this proposal. It is included here to demonstrate the tighter focus for user expectations that would benefit this change.
Also, __del__ and weakrefs must continue working properly.

Regarding “truly” immutable objects, this PEP doesn’t impact the effective immutability of any objects, other than the per-object runtime state (e.g. refcount). So whether or not some immortal object is truly (or even effectively) immutable can only be settled separately from this proposal. For example, str objects are generally considered immutable, but PyUnicodeObject holds some lazily cached data. This PEP has no influence on how that state affects str immutability.
Any object can be marked as immortal. We do not propose anyrestrictions or checks. However, in practice the value of making anobject immortal relates to its mutability and depends on the likelihoodit would be used for a sufficient portion of the application’s lifetime.Marking a mutable object as immortal can make sense in some situations.
Many of the use cases for immortal objects center on immutability, sothat threads can safely and efficiently share such objects withoutlocking. For this reason a mutable object, like a dict or list, wouldnever be shared (and thus no immortality). However, immortality maybe appropriate if there is sufficient guarantee that the normallymutable object won’t actually be modified.
On the other hand, some mutable objects will never be shared between threads (at least not without a lock like the GIL). In some cases it may be practical to make some of those immortal too. For example, sys.modules is a per-interpreter dict that we do not expect to ever get freed until the corresponding interpreter is finalized (assuming it isn’t replaced). By making it immortal, we would no longer incur the extra overhead during incref/decref.

We explore this idea further in the mitigations section below.
If an immortal object holds a reference to a normal (mortal) objectthen that held object is effectively immortal. This is because thatobject’s refcount can never reach 0 until the immortal object releasesit.
Examples:
- containers (e.g. dict and list)
- a PyTypeObject with its tp_subclasses and tp_weaklist
- an object’s type (via ob_type)

Such held objects are thus implicitly immortal for as long as they are held. In practice, this should have no real consequences since it really isn’t a change in behavior. The only difference is that the immortal object (holding the reference) doesn’t ever get cleaned up.
We do not propose that such implicitly immortal objects be changedin any way. They should not be explicitly marked as immortal justbecause they are held by an immortal object. That would provideno advantage over doing nothing.
This proposal does not include any mechanism for taking an immortalobject and returning it to a “normal” condition. Currently thereis no need for such an ability.
On top of that, the obvious approach is to simply set the refcount to a small value. However, at that point there is no way of knowing which value would be safe. Ideally we’d set it to the value that it would have been if it hadn’t been made immortal. However, that value will have long been lost. Hence the complexities involved make it less likely that an object could safely be un-immortalized, even if we had a good reason to do so.
We will add two internal constants:
- _Py_IMMORTAL_BIT - has the top-most available bit set (e.g. 2^62)
- _Py_IMMORTAL_REFCNT - has the two top-most available bits set
The actual top-most bit depends on existing uses for refcount bits,e.g. the sign bit or some GC uses. We will use the highest bit possibleafter consideration of existing uses.
The refcount for immortal objects will be set to _Py_IMMORTAL_REFCNT (meaning the value will be halfway between _Py_IMMORTAL_BIT and the value at the next highest bit). However, to check if an object is immortal we will compare (bitwise-and) its refcount against just _Py_IMMORTAL_BIT.
The difference means that an immortal object will still be consideredimmortal, even if somehow its refcount were modified (e.g. by an olderstable ABI extension).
Note that the top two bits of the refcount are already reserved for other uses. That’s why we are using the third top-most bit.
The implementation is also open to using other values for the immortalbit, such as the sign bit or 2^31 (for saturated refcounts on 64-bit).
API that will now ignore immortal objects:

- Py_INCREF()
- Py_DECREF()
- Py_SET_REFCNT()
- _Py_NewReference()

API that exposes refcounts (unchanged but may now return large values):

- Py_REFCNT()
- sys.getrefcount()

(Note that _Py_RefTotal, and consequently sys.gettotalrefcount(), will not be affected.)
TODO: clarify the status of_Py_RefTotal.
Also, immortal objects will not participate in GC.
All runtime-global (builtin) objects will be made immortal. That includes the following:

- singletons (None, True, False, Ellipsis, NotImplemented)
- static types (e.g. PyLong_Type, PyExc_Exception)
- objects in _PyRuntimeState.global_objects (e.g. identifiers, small ints)

The question of making the full objects actually immutable (e.g. for per-interpreter GIL) is not in the scope of this PEP.
In order to clean up all immortal objects during runtime finalization,we must keep track of them.
For GC objects (“containers”) we’ll leverage the GC’s permanent generation by pushing all immortalized containers there. During runtime shutdown, the strategy will be to first let the runtime try to do its best effort of deallocating these instances normally. Most of the module deallocation will now be handled by pylifecycle.c:finalize_modules() where we clean up the remaining modules as best as we can. It will change which modules are available during __del__, but that’s already explicitly undefined behavior in the docs. Optionally, we could do some topological ordering to guarantee that user modules will be deallocated first before the stdlib modules. Finally, anything left over (if any) can be found through the permanent generation GC list which we can clear after finalize_modules() is done.
For non-container objects, the tracking approach will vary on acase-by-case basis. In nearly every case, each such object is directlyaccessible on the runtime state, e.g. in a_PyRuntimeState orPyInterpreterState field. We may need to add a tracking mechanismto the runtime state for a small number of objects.
None of the cleanup will have a significant effect on performance.
In the interest of clarity, here are some of the ways we are going to try to recover some of the 4% performance we lose with the naive implementation of immortal objects.
Note that none of this section is actually part of the proposal.
We can apply the concept from Immortal Mutable Objects in the pursuit of getting back some of that 4% performance we lose with the naive implementation of immortal objects. At the end of runtime init we can mark all objects as immortal and avoid the extra cost in incref/decref. We only need to worry about immutability with objects that we plan on sharing between threads without a GIL.

Parts of the C-API interact specifically with objects that we know to be immortal, like Py_RETURN_NONE. Such functions and macros can be updated to drop any refcount operations.

There are opportunities to optimize operations in the eval loop involving specific known immortal objects (e.g. None). The general mechanism is described in PEP 659. Also see Pyston.

In the Accidental De-Immortalizing section we outlined a possible negative consequence of immortal objects. Here we look at some of the options to deal with that.
Note that we enumerate solutions here to illustrate that satisfactoryoptions are available, rather than to dictate how the problem willbe solved.
Also note that the problem only matters once a de-immortalized object’s refcount actually reaches 0 (and its tp_dealloc() is called), and that statically allocated objects (e.g. None) already fatally error in tp_dealloc().

One fundamental observation for a solution is that we can reset an immortal object’s refcount to _Py_IMMORTAL_REFCNT when some condition is met.
With all that in mind, a simple, yet effective, solution would be to reset an immortal object’s refcount in tp_dealloc(). NoneType and bool already have a tp_dealloc() that calls Py_FatalError() if triggered. The same goes for other types based on certain conditions, like PyUnicodeObject (depending on unicode_is_singleton()), PyTupleObject, and PyTypeObject. In fact, the same check is important for all statically declared objects. For those types, we would instead reset the refcount. For the remaining cases we would introduce the check. In all cases, the overhead of the check in tp_dealloc() should be too small to matter.
Other (less practical) solutions were also considered. (The discussion thread has further detail.)
Regardless of the solution we end up with, we can do something elselater if necessary.
TODO: Add a note indicating that the implemented solution does not affect the overall performance-neutral outcome.
The immortal objects behavior and API are internal implementation details and will not be added to the documentation.
However, we will update the documentation to make public guaranteesabout refcount behavior more clear. That includes, specifically:
- Py_INCREF() - change “Increment the reference count for object o.” to “Indicate taking a new reference to object o.”
- Py_DECREF() - change “Decrement the reference count for object o.” to “Indicate no longer using a previously taken reference to object o.”
- Py_XINCREF(), Py_XDECREF(), Py_NewRef(), Py_XNewRef(), Py_Clear() - adjust similarly
- Py_REFCNT() - add “The refcounts 0 and 1 have specific meanings and all others only mean code somewhere is using the object, regardless of the value. 0 means the object is not used and will be cleaned up. 1 means code holds exactly a single reference.”
- Py_SET_REFCNT() - refer to Py_REFCNT() about how values over 1 may be substituted with some other value

We may also add a note about immortal objects to the following, to help reduce any surprise users may have with the change:

- Py_SET_REFCNT() (a no-op for immortal objects)
- Py_REFCNT() (value may be surprisingly large)
- sys.getrefcount() (value may be surprisingly large)

Other API that might benefit from such notes is currently undocumented. We wouldn’t add such a note anywhere else (including for Py_INCREF() and Py_DECREF()) since the feature is otherwise transparent to users.
The implementation is proposed on GitHub:
This was discussed in December 2021 on python-dev:
Here is the internal state that the CPython runtime keeps for each Python object:

ob_refcnt is part of the memory allocated for every object. However, _PyObject_HEAD_EXTRA is allocated only if CPython was built with Py_TRACE_REFS defined. PyGC_Head is allocated only if the object’s type has Py_TPFLAGS_HAVE_GC set. Typically this is only container types (e.g. list). Also note that PyObject.ob_refcnt and _PyObject_HEAD_EXTRA are part of PyObject_HEAD.
Garbage collection is a memory management feature of some programminglanguages. It means objects are cleaned up (e.g. memory freed)once they are no longer used.
Refcounting is one approach to garbage collection. The language runtimetracks how many references are held to an object. When code takesownership of a reference to an object or releases it, the runtimeis notified and it increments or decrements the refcount accordingly.When the refcount reaches 0, the runtime cleans up the object.
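The lifecycle just described can be modeled in a few lines (a toy model for explanation only, not CPython’s implementation):

```python
# Toy model of refcounting-based garbage collection.
class TrackedObject:
    def __init__(self):
        self.refcnt = 1      # the creator holds the first reference
        self.freed = False

def take_ref(obj):
    obj.refcnt += 1          # code takes ownership of a reference

def release_ref(obj):
    obj.refcnt -= 1          # code releases a reference
    if obj.refcnt == 0:
        obj.freed = True     # runtime cleans the object up

obj = TrackedObject()
take_ref(obj)                # now two owners
release_ref(obj)
assert not obj.freed         # one reference is still held
release_ref(obj)
assert obj.freed             # refcount hit 0: object is cleaned up
```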
With CPython, code must explicitly take or release references using the C-API’s Py_INCREF() and Py_DECREF(). These macros happen to directly modify the object’s refcount (unfortunately, since that causes ABI compatibility issues if we want to change our garbage collection scheme). Also, when an object is cleaned up in CPython, it also releases any references (and resources) it owns (before its memory is freed).
Sometimes objects may be involved in reference cycles, e.g. whereobject A holds a reference to object B and object B holds a referenceto object A. Consequently, neither object would ever be cleaned upeven if no other references were held (i.e. a memory leak). Themost common objects involved in cycles are containers.
CPython has dedicated machinery to deal with reference cycles, whichwe call the “cyclic garbage collector”, or often just“garbage collector” or “GC”. Don’t let the name confuse you.It only deals with breaking reference cycles.
See the docs for a more detailed explanation of refcountingand cyclic garbage collection:
This document is placed in the public domain or under theCC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0683.rst

Last modified: 2024-06-12 18:00:45 GMT