Multithreaded generation
The four core distributions (random, standard_normal, standard_exponential, and standard_gamma) all allow existing arrays to be filled using the out keyword argument. Existing arrays need to be contiguous and well-behaved (writable and aligned). Under normal circumstances, arrays created using the common constructors such as numpy.empty will satisfy these requirements.
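As a minimal sketch of the out keyword, the snippet below fills one preallocated array with each of the four core distributions and then tries a strided view, which is not contiguous and is therefore expected to be rejected. The array sizes, the seed, and the gamma shape parameter 2.0 are arbitrary choices made for illustration.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(12345)

# An array from a common constructor is contiguous, writable, and aligned.
out = np.empty(8)
assert out.flags.c_contiguous and out.flags.writeable and out.flags.aligned

# Each of the four core distributions can write into the existing array.
rng.random(out=out)
rng.standard_normal(out=out)
rng.standard_exponential(out=out)
rng.standard_gamma(2.0, out=out)  # shape parameter 2.0 is an arbitrary choice

# A strided view is not contiguous, so it should be rejected as an output.
strided = np.empty(16)[::2]
try:
    rng.standard_normal(out=strided)
    rejected = False
except ValueError:
    rejected = True
```

Each call overwrites the previous contents of out, so after the last call the array holds gamma-distributed values.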
See also
Thread Safety for general information about thread safety in NumPy.
This example makes use of concurrent.futures to fill an array using multiple threads. Threads are long-lived so that repeated calls do not require any additional overheads from thread creation.
The random numbers generated are reproducible in the sense that the same seed will produce the same outputs, provided that the number of threads does not change.
from numpy.random import default_rng, SeedSequence
import multiprocessing
import concurrent.futures
import numpy as np

class MultithreadedRNG:
    def __init__(self, n, seed=None, threads=None):
        if threads is None:
            threads = multiprocessing.cpu_count()
        self.threads = threads

        # One independent generator per thread, all derived from one seed
        seq = SeedSequence(seed)
        self._random_generators = [default_rng(s)
                                   for s in seq.spawn(threads)]

        self.n = n
        self.executor = concurrent.futures.ThreadPoolExecutor(threads)
        self.values = np.empty(n)
        # Chunk length per thread; the last chunk may be shorter
        self.step = np.ceil(n / threads).astype(np.int_)

    def fill(self):
        def _fill(random_state, out, first, last):
            random_state.standard_normal(out=out[first:last])

        futures = {}
        for i in range(self.threads):
            args = (_fill,
                    self._random_generators[i],
                    self.values,
                    i * self.step,
                    (i + 1) * self.step)
            futures[self.executor.submit(*args)] = i
        # Block until every chunk has been filled
        concurrent.futures.wait(futures)

    def __del__(self):
        self.executor.shutdown(False)
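The slicing in fill relies on step being the ceiling of n / threads, so the last slice may extend past the end of the array, where NumPy clips it. A small worked example with hypothetical sizes (n = 10, threads = 4, chosen only for illustration) shows the arithmetic:

```python
import numpy as np

n, threads = 10, 4  # small hypothetical sizes chosen for illustration
step = int(np.ceil(n / threads))

values = np.zeros(n)
# Same (first, last) pairs that fill() passes to each worker.
bounds = [(i * step, (i + 1) * step) for i in range(threads)]
# Slices past the end of the array are clipped, so the last chunk is shorter.
lengths = [values[first:last].size for first, last in bounds]

print(step)     # 3
print(bounds)   # [(0, 3), (3, 6), (6, 9), (9, 12)]
print(lengths)  # [3, 3, 3, 1]
```

The chunk lengths sum to n, so every element is written exactly once.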
The multithreaded random number generator can be used to fill an array. The values attribute shows the value before the fill (zero here, although numpy.empty does not guarantee any particular initial contents) and the random value after.
In [2]: mrng = MultithreadedRNG(10000000, seed=12345)
   ...: print(mrng.values[-1])
Out[2]: 0.0

In [3]: mrng.fill()
   ...: print(mrng.values[-1])
Out[3]: 2.4545724517479104
The time required to fill the array using multiple threads can be compared to the time required to fill it using a single thread.
In [4]: print(mrng.threads)
   ...: %timeit mrng.fill()
Out[4]: 4
   ...: 32.8 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The single-threaded call directly uses a Generator created by default_rng.
In [5]: values = np.empty(10000000)
   ...: rg = default_rng()
   ...: %timeit rg.standard_normal(out=values)
Out[5]: 99.6 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
The gains are substantial (roughly a threefold speedup with 4 threads in the timings above) and the scaling is reasonable even for arrays that are only moderately large. The gains are even larger when compared to a call that does not use an existing array, due to array creation overhead.
In [6]: rg = default_rng()
   ...: %timeit rg.standard_normal(10000000)
Out[6]: 125 ms ± 309 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
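Outside IPython, where %timeit is unavailable, a similar comparison can be sketched with the standard timeit module. The array size here is an assumption, smaller than in the session above so that the script runs quickly, and the printed timings will vary by machine.

```python
import timeit

import numpy as np
from numpy.random import default_rng

n = 1_000_000  # assumed size, smaller than the session above for a quick run
rng = default_rng()
values = np.empty(n)

# Best of 5 repeats of 10 calls each, converted to seconds per call.
fill_existing = min(timeit.repeat(lambda: rng.standard_normal(out=values),
                                  number=10, repeat=5)) / 10
allocate_new = min(timeit.repeat(lambda: rng.standard_normal(n),
                                 number=10, repeat=5)) / 10

print(f"out=values: {fill_existing * 1e3:.2f} ms per call")
print(f"new array:  {allocate_new * 1e3:.2f} ms per call")
```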
Note that if threads is not set by the user, it will be determined by multiprocessing.cpu_count().
In [7]: # simulate the behavior for `threads=None`, if the machine had only one thread
   ...: mrng = MultithreadedRNG(10000000, seed=12345, threads=1)
   ...: print(mrng.values[-1])
Out[7]: 1.1800150052158556
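The dependence on the thread count follows from the seeding scheme: each thread receives its own child of the SeedSequence and fills its own chunk, so changing the number of chunks changes which stream writes which elements. The sketch below illustrates this with a hypothetical helper, fill_chunks, that fills the chunks sequentially rather than in threads. Because every chunk has its own independent generator, this should produce the same values as the threaded version.

```python
import numpy as np
from numpy.random import SeedSequence, default_rng


def fill_chunks(n, seed, threads):
    """Fill an array chunk by chunk, one spawned generator per chunk.

    Hypothetical helper for illustration; chunks are filled sequentially.
    """
    generators = [default_rng(s) for s in SeedSequence(seed).spawn(threads)]
    values = np.empty(n)
    step = int(np.ceil(n / threads))
    for i, rng in enumerate(generators):
        rng.standard_normal(out=values[i * step:(i + 1) * step])
    return values


a = fill_chunks(1000, seed=12345, threads=4)
b = fill_chunks(1000, seed=12345, threads=4)
c = fill_chunks(1000, seed=12345, threads=1)

# Same seed and same thread count: identical output.
same_config_matches = np.array_equal(a, b)
# Same seed but a different thread count: the arrays differ.
different_threads_match = np.array_equal(a, c)
```

For results that must be reproducible across machines, passing threads explicitly avoids depending on the value returned by multiprocessing.cpu_count().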