This PEP describes how Python programs may behave in the presence of concurrent reads and writes to shared variables from multiple threads. We use a happens-before relation to define when variable accesses are ordered or concurrent. Nearly all programs should simply use locks to guard their shared variables, and this PEP highlights some of the strange things that can happen when they don't, but programmers often assume that it's ok to do "simple" things without locking, and it's somewhat unpythonic to let the language surprise them. Unfortunately, avoiding surprise often conflicts with making Python run quickly, so this PEP tries to find a good tradeoff between the two.
So far, we have 4 major Python implementations – CPython, Jython, IronPython, and PyPy – as well as lots of minor ones. Some of these already run on platforms that do aggressive optimizations. In general, these optimizations are invisible within a single thread of execution, but they can be visible to other threads executing concurrently. CPython currently uses a GIL to ensure that other threads see the results they expect, but this limits it to a single processor. Jython and IronPython run on Java's and .NET's threading systems, respectively, which allows them to take advantage of more cores but can also show surprising values to other threads.
So that threaded Python programs continue to be portable between implementations, implementers and library authors need to agree on some ground rules.
A variable is a name that refers to an object; variables are generally introduced by assignment and may be destroyed by passing them to del. Variables are fundamentally mutable, while objects may not be. There are several varieties of variables: module variables (often called "globals" when accessed from within the module), class variables, instance variables (also known as fields), and local variables. All of these can be shared between threads (the local variables if they're saved into a closure). The object in which the variables are scoped notionally has a dict whose keys are the variables' names.

Before talking about the details of data races and the surprising behaviors they produce, I'll present two simple memory models. The first is probably too strong for Python, and the second is probably too weak.
In a sequentially-consistent concurrent execution, actions appear to happen in a global total order with each read of a particular variable seeing the value written by the last write that affected that variable. The total order for actions must be consistent with the program order. A program has a data race on a given input when one of its sequentially consistent executions puts two conflicting actions next to each other.
This is the easiest memory model for humans to understand, although it doesn't eliminate all confusion, since operations can be split in odd places.
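For example, here is a minimal sketch (the function names writer and reader are purely illustrative) of a program with a data race on a shared module variable x:

    import threading

    x = 0  # shared module variable, not guarded by any lock

    def writer():
        global x
        x = 1          # write to x

    def reader():
        r1 = x         # read of x; conflicts with the write in writer()

    t1 = threading.Thread(target=writer)
    t2 = threading.Thread(target=reader)
    t1.start(); t2.start()
    t1.join(); t2.join()

In some sequentially consistent executions the read and the write land next to each other, so this program has a data race on x, even though every execution is still a simple interleaving of the two threads.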
The program contains a collection of synchronization actions, which in Python currently include lock acquires and releases and thread starts and joins. Synchronization actions happen in a global total order that is consistent with the program order (they don't have to happen in a total order, but it simplifies the description of the model). A lock release synchronizes with all later acquires of the same lock. Similarly, given t = threading.Thread(target=worker):
- t.start() synchronizes with the first statement in worker().
- The return from worker() synchronizes with the return from t.join().
- If t.start() happens before (see below) a call to t.isAlive() that returns False, the return from worker() synchronizes with that call.

We call the source of the synchronizes-with edge a release operation on the relevant variable, and we call the target an acquire operation.
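As a concrete sketch of the first two edges, the following relies only on start() and join() to order accesses to an unlocked shared list (the names log and worker are just for illustration):

    import threading

    log = []  # shared variable; no lock is used

    def worker():
        # t.start() below synchronizes with the first statement here, so this
        # thread is guaranteed to see the "before start" entry already present.
        log.append("in worker")

    log.append("before start")
    t = threading.Thread(target=worker)
    t.start()
    t.join()
    # The return from worker() synchronizes with the return from t.join(), so
    # the main thread is guaranteed to see both entries here.
    print(log)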
The happens before order is the transitive closure of the program order with the synchronizes-with edges. That is, action A happens before action B if:

- A falls before B in the program order (which means they run in the same thread),
- A synchronizes with B, or
- you can reach B by following happens-before edges from A.
An execution of a program is happens-before consistent if each read R sees the value of a write W to the same variable such that:

- R does not happen before W, and
- there is no other write V that overwrote W before R had a chance to see it (i.e. it's not the case that W happens before V happens before R).
You have a data race if two conflicting actions aren't related by happens-before.
Let's use the rules from the happens-before model to prove that the following program prints "[7]":
    import threading

    class Queue:
        def __init__(self):
            self.l = []
            self.cond = threading.Condition()

        def get(self):
            with self.cond:
                while not self.l:
                    self.cond.wait()
                ret = self.l[0]
                self.l = self.l[1:]
                return ret

        def put(self, x):
            with self.cond:
                self.l.append(x)
                self.cond.notify()

    myqueue = Queue()

    def worker1():
        x = [7]
        myqueue.put(x)

    def worker2():
        y = myqueue.get()
        print y

    thread1 = threading.Thread(target=worker1)
    thread2 = threading.Thread(target=worker2)

    thread2.start()
    thread1.start()
- Because myqueue is initialized in the main thread before thread1 or thread2 is started, that initialization happens before worker1 and worker2 begin running, so there's no way for either to raise a NameError, and both myqueue.l and myqueue.cond are set to their final objects.
- The initialization of x in worker1 happens before it calls myqueue.put(), which happens before it calls myqueue.l.append(x), which happens before the call to myqueue.cond.release(), all because they run in the same thread.
- In worker2, myqueue.cond will be released and re-acquired until myqueue.l contains a value (x). The call to myqueue.cond.release() in worker1 happens before that last call to myqueue.cond.acquire() in worker2.
- That last call to myqueue.cond.acquire() happens before myqueue.get() reads myqueue.l, which happens before myqueue.get() returns, which happens before print y, again all because they run in the same thread.
- Therefore, the value of x in thread1 is initialized before it is printed in thread2.

Usually, we wouldn't need to look all the way into a thread-safe queue's implementation in order to prove that uses were safe. Its interface would specify that puts happen before gets, and we'd reason directly from that.
Lots of strange things can happen when code has data races. It's easy to avoid all of these problems by just protecting shared variables with locks. This is not a complete list of race hazards; it's just a collection that seems relevant to Python.
In all of these examples, variables starting with r are local variables, and other variables are shared between threads.
This example comes from the Java memory model:
Initially, p is q and p.x == 0.
Thread 1      Thread 2
r1 = p        r6 = p
r2 = r1.x     r6.x = 3
r3 = q
r4 = r3.x
r5 = r1.x

This can produce r2 == r5 == 0 but r4 == 3, proving that p.x went from 0 to 3 and back to 0.
A good compiler would like to optimize out the redundant load of p.x in initializing r5 by just re-using the value already loaded into r2. We get the strange result if thread 1 sees memory in this order:
Evaluation    Computes    Why
r1 = p
r2 = r1.x     r2 == 0
r3 = q        r3 is p
p.x = 3                   Side-effect of thread 2
r4 = r3.x     r4 == 3
r5 = r2       r5 == 0     Optimized from r5 = r1.x because r2 == r1.x
From N2177: Sequential Consistency for Atomics, and also known as Independent Read of Independent Write (IRIW).
Initially, a == b == 0.
Thread 1    Thread 2    Thread 3    Thread 4
r1 = a      r3 = b      a = 1       b = 1
r2 = b      r4 = a

We may get r1 == r3 == 1 and r2 == r4 == 0, proving both that a was written before b (thread 1's data), and that b was written before a (thread 2's data). See Special Relativity for a real-world example.
This can happen if thread 1 and thread 3 are running on processors that are close to each other, but far away from the processors that threads 2 and 4 are running on, and the writes are not being transmitted all the way across the machine before becoming visible to nearby threads.
Neither acquire/release semantics nor explicit memory barriers can help with this. Making the orders consistent without locking requires detailed knowledge of the architecture's memory model, but Java requires it for volatiles, so we could use documentation aimed at its implementers.
From the POPL paper about the Java memory model [#JMM-popl].
Initially, x == y == 0.
Thread 1        Thread 2
r1 = x          r2 = y
if r1 != 0:     if r2 != 0:
  y = 42          x = 42

Can this produce r1 == r2 == 42?
In a sequentially-consistent execution, there's no way to get an adjacent read and write to the same variable, so the program should be considered correctly synchronized (albeit fragile), and should only produce r1 == r2 == 0. However, the following execution is happens-before consistent:
Statement      Value    Thread
r1 = x         42       1
if r1 != 0:    true     1
  y = 42                1
r2 = y         42       2
if r2 != 0:    true     2
  x = 42                2
WTF, you are asking yourself. Because there were no inter-thread happens-before edges in the original program, the read of x in thread 1 can see any of the writes from thread 2, even if they only happened because the read saw them. There are data races in the happens-before model.
We don't want to allow this, so the happens-before model isn't enough for Python. One rule we could add to happens-before that would prevent this execution is:
If there are no data races in any sequentially-consistent execution of a program, the program should have sequentially consistent semantics.
Java gets this rule as a theorem, but Python may not want all of the machinery you need to prove it.
Also from the POPL paper about the Java memory model [#JMM-popl].
Initially, x == y == 0.
Thread 1    Thread 2
r1 = x      r2 = y
y = r1      x = r2

Can x == y == 42?
In a sequentially consistent execution, no. In a happens-before consistent execution, yes: the read of x in thread 1 is allowed to see the value written in thread 2 because there are no happens-before relations between the threads. This could happen if the compiler or processor transforms the code into:
Thread 1          Thread 2
y = 42            r2 = y
r1 = x            x = r2
if r1 != 42:
  y = r1
It can produce a security hole if the speculated value is a secret object, or points to the memory that an object used to occupy. Java cares a lot about such security holes, but Python may not.
From several classic double-checked locking examples.
Initially, d == None.
Thread 1               Thread 2
while not d: pass      d = [3, 4]
assert d[1] == 4

This could raise an IndexError, fail the assertion, or, without some care in the implementation, cause a crash or other undefined behavior.
Thread 2 may actually be implemented as:
    r1 = list()
    r1.append(3)
    r1.append(4)
    d = r1
Because the assignment to d and the item assignments are independent, the compiler and processor may optimize that to:
    r1 = list()
    d = r1
    r1.append(3)
    r1.append(4)
Which is obviously incorrect and explains the IndexError. If we then look deeper into the implementation of r1.append(3), we may find that it and d[1] cannot run concurrently without causing their own race conditions. In CPython (without the GIL), those race conditions would produce undefined behavior.
There's also a subtle issue on the reading side that can cause the value of d[1] to be out of date. Somewhere in the implementation of list, it stores its contents as an array in memory. This array may happen to be in thread 1's cache. If thread 1's processor reloads d from main memory without reloading the memory that ought to contain the values 3 and 4, it could see stale values instead. As far as I know, this can only actually happen on Alphas and maybe Itaniums, and we probably have to prevent it anyway to avoid crashes.
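As noted at the start of this list of hazards, guarding the shared variable with a lock avoids the problem entirely. Here is a minimal sketch (writer and reader are illustrative names) in which the release in thread 2 happens before the acquire in thread 1:

    import threading

    d = None
    d_lock = threading.Lock()

    def writer():            # thread 2
        global d
        with d_lock:
            d = [3, 4]       # the list is fully built before the lock is released

    def reader():            # thread 1
        while True:
            with d_lock:     # this acquire synchronizes with writer()'s release
                if d is not None:
                    assert d[1] == 4
                    return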
From several more double-checked locking examples.
Initially, d == dict() and initialized == False.
Thread 1                       Thread 2
while not initialized: pass    d['a'] = 3
r1 = d['a']                    initialized = True
r2 = r1 == 3
assert r2

This could raise a KeyError, fail the assertion, or, without some care in the implementation, cause a crash or other undefined behavior.
Because d and initialized are independent (except in the programmer's mind), the compiler and processor can rearrange these almost arbitrarily, except that thread 1's assertion has to stay after the loop.
This is a problem with Java final variables and the proposed data-dependency ordering in C++0x.
First execute:

    g = []

    def Init():
        g.extend([1, 2, 3])
        return [1, 2, 3]

    h = None

Then in two threads:
Thread 1                 Thread 2
while not h: pass        r1 = Init()
assert h == [1, 2, 3]    freeze(r1)
assert h == g            h = r1

If h has semantics similar to a Java final variable (except for being write-once), then even though the first assertion is guaranteed to succeed, the second could fail.
Data-dependent guarantees like those final provides only work if the access is through the final variable. It's not even safe to access the same object through a different route. Unfortunately, because of how processors work, final's guarantees are only cheap when they're weak.
The first rule is that Python interpreters can't crash due to race conditions in user code. For CPython, this means that race conditions can't make it down into C. For Jython, it means that NullPointerExceptions can't escape the interpreter.
Presumably we also want a model at least as strong as happens-before consistency because it lets us write a simple description of how concurrent queues and thread launching and joining work.
Other rules are more debatable, so I'll present each one with pros and cons.
We'd like programmers to be able to reason about their programs as if they were sequentially consistent. Since it's hard to tell whether you've written a happens-before race, we only want to require programmers to prevent sequential races. The Java model does this through a complicated definition of causality, but if we don't want to include that, we can just assert this property directly.
If the program produces a self-justifying value, it could expose access to an object that the user would rather the program not see. Again, Java's model handles this with the causality definition. We might be able to prevent these security problems by banning speculative writes to shared variables, but I don't have a proof of that, and Python may not need those security guarantees anyway.
The .NET [#CLR-msdn] and x86 [#x86-model] memory models are based on defining which reorderings compilers may allow. I think that it's easier to program to a happens-before model than to reason about all of the possible reorderings of a program, and it's easier to insert enough happens-before edges to make a program correct than to insert enough memory fences to do the same thing. So, although we could layer some reordering restrictions on top of the happens-before base, I don't think Python's memory model should be entirely reordering restrictions.
Assignments of primitive types are already atomic. If you assign 3 << 72 + 5 to a variable, no thread can see only part of the value. Jeremy Manson suggested that we extend this to all objects. This allows compilers to reorder operations to optimize them, without allowing some of the more confusing uninitialized values. The basic idea here is that when you assign a shared variable, readers can't see any changes made to the new value before the assignment, or to the old value after the assignment. So, if we have a program like:
Initially, (d.a, d.b) == (1, 2), and (e.c, e.d) == (3, 4). We also have class Obj(object): pass.
Thread 1        Thread 2
r1 = Obj()      r3 = d
r1.a = 3        r4, r5 = r3.a, r3.b
r1.b = 4        r6 = e
d = r1          r7, r8 = r6.c, r6.d
r2 = Obj()
r2.c = 6
r2.d = 7
e = r2
(r4, r5) can be (1, 2) or (3, 4) but nothing else, and (r7, r8) can be either (3, 4) or (6, 7) but nothing else. Unlike if writes were releases and reads were acquires, it's legal for thread 2 to see (e.c, e.d) == (6, 7) and (d.a, d.b) == (1, 2) (out of order).
This allows the compiler a lot of flexibility to optimize without allowing users to see some strange values. However, because it relies on data dependencies, it introduces some surprises of its own. For example, the compiler could freely optimize the above example to:
Thread 1        Thread 2
r1 = Obj()      r3 = d
r2 = Obj()      r6 = e
r1.a = 3        r4, r7 = r3.a, r6.c
r2.c = 6        r5, r8 = r3.b, r6.d
r2.d = 7
e = r2
r1.b = 4
d = r1
As long as it didn't let the initialization of e move above any of the initializations of members of r2, and similarly for d and r1.
This also helps to ground happens-before consistency. To see the problem, imagine that the user unsafely publishes a reference to an object as soon as she gets it. The model needs to constrain what values can be read through that reference. Java says that every field is initialized to 0 before anyone sees the object for the first time, but Python would have trouble defining "every field". If instead we say that assignments to shared variables have to see a value at least as up to date as when the assignment happened, then we don't run into any trouble with early publication.
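For instance, here is a hedged sketch of the early-publication situation just described (the class Widget, the variable shared, and the function reader are purely illustrative):

    shared = None  # module variable visible to other threads

    class Widget(object):
        def __init__(self):
            global shared
            shared = self    # unsafely publishes the object mid-construction
            self.x = 3       # ...before this field is assigned

    def reader():
        r = shared
        if r is not None:
            # r.x may be 3 or may raise AttributeError, depending on timing,
            # but under the rule above the reader can never see the object in
            # a state older than the moment "shared = self" ran.
            print r.x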
Most other languages with any guarantees for unlocked variables distinguish between ordinary variables and volatile/atomic variables. They provide many more guarantees for the volatile ones. Python can't easily do this because we don't declare variables. This may or may not matter, since Python locks aren't significantly more expensive than ordinary Python code. If we want to get those tiers back, we could:
- Introduce a set of atomic types whose accesses provide the stronger guarantees; unfortunately, we couldn't assign to them with a plain =.
- Extend the __slots__ mechanism [7] with a parallel __volatiles__ list, and maybe a __finals__ list.

We could just adopt sequential consistency for Python. This avoids all of the hazards mentioned above, but it prohibits lots of optimizations too. As far as I know, this is the current model of CPython, but if CPython learned to optimize out some variable reads, it would lose this property.
If we adopt this, Jython's dict implementation may no longer be able to use ConcurrentHashMap because that only promises to create appropriate happens-before edges, not to be sequentially consistent (although maybe the fact that Java volatiles are totally ordered carries over). Both Jython and IronPython would probably need to use AtomicReferenceArray or the equivalent for any __slots__ arrays.
The x86 model is:

- Loads are not reordered with other loads.
- Stores are not reordered with other stores.
- Stores are not reordered with older loads.
- Loads may be reordered with older stores to different locations, but not with older stores to the same location.
- In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).
- In a multiprocessor system, stores to the same location have a total order.
- In a multiprocessor system, locked instructions have a total order.
- Loads and stores are not reordered with locked instructions.
In acquire/release terminology, this appears to say that every store is a release and every load is an acquire. This is slightly weaker than sequential consistency, in that it allows inconsistent orderings, but it disallows zombie values and the compiler optimizations that produce them. We would probably want to weaken the model somehow to explicitly allow compilers to eliminate redundant variable reads. The x86 model may also be expensive to implement on other platforms, although because x86 is so common, that may not matter much.
We can adopt an initial memory model without totally restricting future implementations. If we start with a weak model and want to get stronger later, we would only have to change the implementations, not programs. Individual implementations could also guarantee a stronger memory model than the language demands, although that could hurt interoperability. On the other hand, if we start with a strong model and want to weaken it later, we can add a from __future__ import weak_memory statement to declare that some modules are safe.
The required model is weaker than any particular implementation. This section tries to document the actual guarantees each implementation provides, and should be updated as the implementations change.
Uses the GIL to guarantee that other threads don't see funny reorderings, and does few enough optimizations that I believe it's actually sequentially consistent at the bytecode level. Threads can switch between any two bytecodes (instead of only between statements), so two threads that concurrently execute:
    i = i + 1
with i initially 0 could easily end up with i == 1 instead of the expected i == 2. If they execute:
    i += 1
instead, CPython 2.6 will always give the right answer, but it's easy to imagine another implementation in which this statement won't be atomic.
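The portable way to guarantee i == 2, consistent with this PEP's advice to guard shared variables with locks, is a sketch along these lines (the names i_lock and increment are arbitrary):

    import threading

    i = 0
    i_lock = threading.Lock()

    def increment():
        global i
        with i_lock:     # the release by one thread synchronizes with the
            i = i + 1    # next acquire, so neither increment can be lost

    threads = [threading.Thread(target=increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # i is now guaranteed to be 2 on any conforming implementation.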
Also uses a GIL, but probably does enough optimization to violate sequential consistency. I know very little about this implementation.
Provides true concurrency under the Java memory model and stores all object fields (except for those in __slots__?) in a ConcurrentHashMap, which provides fairly strong ordering guarantees. Local variables in a function may have fewer guarantees, which would become visible if they were captured into a closure that was then passed to another thread.
Provides true concurrency under the CLR memory model, which probably protects it from uninitialized values. IronPython uses a locked map to store object fields, providing at least as many guarantees as Jython.
Thanks to Jeremy Manson and Alex Martelli for detailed discussions on what this PEP should look like.
This document has been placed in the public domain.