Parallel random number generation#

There are four main strategies implemented that can be used to producerepeatable pseudo-random numbers across multiple processes (localor distributed).

SeedSequence spawning#

NumPy allows you to spawn new (with very high probability) independentBitGenerator andGenerator instances via theirspawn() method.This spawning is implemented by theSeedSequence used for initializingthe bit generators random stream.

SeedSequenceimplements an algorithm to process a user-provided seed,typically as an integer of some size, and to convert it into an initial state foraBitGenerator. It uses hashing techniques to ensure that low-quality seedsare turned into high quality initial states (at least, with very highprobability).

For example,MT19937 has a state consisting of 624uint32integers. A naive way to take a 32-bit integer seed would be to just setthe last element of the state to the 32-bit seed and leave the rest 0s. This isa valid state forMT19937, but not a good one. The Mersenne Twisteralgorithmsuffers if there are too many 0s. Similarly, two adjacent 32-bitinteger seeds (i.e.12345 and12346) would produce very similarstreams.

SeedSequence avoids these problems by using successions of integer hasheswith goodavalanche properties to ensure that flipping any bit in the inputhas about a 50% chance of flipping any bit in the output. Two input seeds thatare very close to each other will produce initial states that are very farfrom each other (with very high probability). It is also constructed in sucha way that you can provide arbitrary-sized integers or lists of integers.SeedSequence will take all of the bits that you provide and mix themtogether to produce however many bits the consumingBitGenerator needs toinitialize itself.

These properties together mean that we can safely mix together the usualuser-provided seed with simple incrementing counters to getBitGeneratorstates that are (to very high probability) independent of each other. We canwrap this together into an API that is easy to use and difficult to misuse.Note that whileSeedSequence attempts to solve many of the issues related touser-provided small seeds, we stillrecommendusingsecrets.randbits to generate seeds with 128 bits of entropy toavoid the remaining biases introduced by human-chosen seeds.

fromnumpy.randomimportSeedSequence,default_rngss=SeedSequence(12345)# Spawn off 10 child SeedSequences to pass to child processes.child_seeds=ss.spawn(10)streams=[default_rng(s)forsinchild_seeds]

For convenience the direct use ofSeedSequence is not necessary.The abovestreams can be spawned directly from a parent generatorviaspawn:

parent_rng=default_rng(12345)streams=parent_rng.spawn(10)

Child objects can also spawn to make grandchildren, and so on.Each child has aSeedSequence with its position in the tree of spawnedchild objects mixed in with the user-provided seed to generate independent(with very high probability) streams.

grandchildren=streams[0].spawn(4)

This feature lets you make local decisions about when and how to split upstreams without coordination between processes. You do not have to preallocatespace to avoid overlapping or request streams from a common global service. Thisgeneral “tree-hashing” scheme isnot unique to numpy but not yet widespread.Python has increasingly-flexible mechanisms for parallelization available, andthis scheme fits in very well with that kind of use.

Using this scheme, an upper bound on the probability of a collision can beestimated if one knows the number of streams that you derive.SeedSequencehashes its inputs, both the seed and the spawn-tree-path, down to a 128-bitpool by default. The probability that there is a collision inthat pool, pessimistically-estimated ([1]), will be about\(n^2*2^{-128}\) wheren is the number of streams spawned. If a program uses an aggressive millionstreams, about\(2^{20}\), then the probability that at least one pair ofthem are identical is about\(2^{-88}\), which is in solidly-ignorableterritory ([2]).

[1]

The algorithm is carefully designed to eliminate a number of possibleways to collide. For example, if one only does one level of spawning, itis guaranteed that all states will be unique. But it’s easier toestimate the naive upper bound on a napkin and take comfort knowingthat the probability is actually lower.

[2]

In this calculation, we can mostly ignore the amount of numbers drawn from eachstream. SeeUpgrading PCG64 with PCG64DXSM for the technical details aboutPCG64. The other PRNGs we provide have some extra protection built inthat avoids overlaps if theSeedSequence pools differ in theslightest bit.PCG64DXSM has\(2^{127}\) separate cyclesdetermined by the seed in addition to the position in the\(2^{128}\) long period for each cycle, so one has to both get on ornear the same cycleand seed a nearby position in the cycle.Philox has completely independent cycles determined by the seed.SFC64 incorporates a 64-bit counter so every unique seed is atleast\(2^{64}\) iterations away from any other seed. Andfinally,MT19937 has just an unimaginably huge period. Gettinga collision internal toSeedSequence is the way a failure would beobserved.

Sequence of integer seeds#

As discussed in the previous section,SeedSequence can not only take aninteger seed, it can also take an arbitrary-length sequence of (non-negative)integers. If one exercises a little care, one can use this feature to designad hoc schemes for getting safe parallel PRNG streams with similar safetyguarantees as spawning.

For example, one common use case is that a worker process is passed oneroot seed integer for the whole calculation and also an integer worker ID (orsomething more granular like a job ID, batch ID, or something similar). Ifthese IDs are created deterministically and uniquely, then one can derivereproducible parallel PRNG streams by combining the ID and the root seedinteger in a list.

# default_rng() and each of the BitGenerators use SeedSequence underneath, so# they all accept sequences of integers as seeds the same way.fromnumpy.randomimportdefault_rngdefworker(root_seed,worker_id):rng=default_rng([worker_id,root_seed])# Do work ...root_seed=0x8c3c010cb4754c905776bdac5ee7501results=[worker(root_seed,worker_id)forworker_idinrange(10)]

This can be used to replace a number of unsafe strategies that have been usedin the past which try to combine the root seed and the ID back into a singleinteger seed value. For example, it is common to see users add the worker ID tothe root seed, especially with the legacyRandomState code.

# UNSAFE! Do not do this!worker_seed=root_seed+worker_idrng=np.random.RandomState(worker_seed)

It is true that for any one run of a parallel program constructed this way,each worker will have distinct streams. However, it is quite likely thatmultiple invocations of the program with different seeds will get overlappingsets of worker seeds. It is not uncommon (in the author’s self-experience) tochange the root seed merely by an increment or two when doing these repeatruns. If the worker seeds are also derived by small increments of the workerID, then subsets of the workers will return identical results, causing a biasin the overall ensemble of results.

Combining the worker ID and the root seed as a list of integers eliminates thisrisk. Lazy seeding practices will still be fairly safe.

This scheme does require that the extra IDs be unique and deterministicallycreated. This may require coordination between the worker processes. It isrecommended to place the varying IDsbefore the unvarying root seed.spawnappends integers after the user-provided seed, so ifyou might be mixing both thisad hoc mechanism and spawning, or passing yourobjects down to library code that might be spawning, then it is a little bitsafer to prepend your worker IDs rather than append them to avoid a collision.

# Good.worker_seed=[worker_id,root_seed]# Less good. It will *work*, but it's less flexible.worker_seed=[root_seed,worker_id]

With those caveats in mind, the safety guarantees against collision are aboutthe same as with spawning, discussed in the previous section. The algorithmicmechanisms are the same.

Independent streams#

Philox is a counter-based RNG based which generates values byencrypting an incrementing counter using weak cryptographic primitives. Theseed determines the key that is used for the encryption. Unique keys createunique, independent streams.Philox lets you bypass theseeding algorithm to directly set the 128-bit key. Similar, but different, keyswill still create independent streams.

importsecretsfromnumpy.randomimportPhilox# 128-bit number as a seedroot_seed=secrets.getrandbits(128)streams=[Philox(key=root_seed+stream_id)forstream_idinrange(10)]

This scheme does require that you avoid reusing stream IDs. This may requirecoordination between the parallel processes.

Jumping the BitGenerator state#

jumped advances the state of the BitGeneratoras-if a large number ofrandom numbers have been drawn, and returns a new instance with this state.The specific number of draws varies by BitGenerator, and ranges from\(2^{64}\) to\(2^{128}\). Additionally, theas-if draws also dependon the size of the default random number produced by the specific BitGenerator.The BitGenerators that supportjumped, along with the period of theBitGenerator, the size of the jump and the bits in the default unsigned randomare listed below.

BitGenerator

Period

Jump Size

Bits per Draw

MT19937

\(2^{19937}-1\)

\(2^{128}\)

32

PCG64

\(2^{128}\)

\(~2^{127}\) ([3])

64

PCG64DXSM

\(2^{128}\)

\(~2^{127}\) ([3])

64

Philox

\(2^{256}\)

\(2^{128}\)

64

[3](1,2)

The jump size is\((\phi-1)*2^{128}\) where\(\phi\) is thegolden ratio. As the jumps wrap around the period, the actual distancesbetween neighboring streams will slowly grow smaller than the jump size,but using the golden ratio this way is a classic method of constructinga low-discrepancy sequence that spreads out the states around the periodoptimally. You will not be able to jump enough to make those distancessmall enough to overlap in your lifetime.

jumped can be used to produce long blocks which should be long enough to notoverlap.

importsecretsfromnumpy.randomimportPCG64seed=secrets.getrandbits(128)blocked_rng=[]rng=PCG64(seed)foriinrange(10):blocked_rng.append(rng.jumped(i))

When usingjumped, one does have to take care not to jump to a stream thatwas already used. In the above example, one could not later useblocked_rng[0].jumped() as it would overlap withblocked_rng[1]. Likewith the independent streams, if the main process here wants to split off 10more streams by jumping, then it needs to start withrange(10,20),otherwise it would recreate the same streams. On the other hand, if youcarefully construct the streams, then you are guaranteed to have streams thatdo not overlap.