A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG),[1] is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed (which may include truly random values). Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.[2]
PRNGs are central in applications such as simulations (e.g. for the Monte Carlo method), electronic games (e.g. for procedural generation), and cryptography. Cryptographic applications require the output not to be predictable from earlier outputs, and more elaborate algorithms, which do not inherit the linearity of simpler PRNGs, are needed.
Good statistical properties are a central requirement for the output of a PRNG. In general, careful mathematical analysis is required to have any confidence that a PRNG generates numbers that are sufficiently close to random to suit the intended use. John von Neumann cautioned about the misinterpretation of a PRNG as a truly random generator, joking that "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."[3]
In practice, the output from many common PRNGs exhibits artifacts that cause them to fail statistical pattern-detection tests. These include:
Defects exhibited by flawed PRNGs range from unnoticeable (and unknown) to very obvious. An example was the RANDU random number algorithm used for decades on mainframe computers. It was seriously flawed, but its inadequacy went undetected for a very long time.
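The kind of defect involved can be made concrete with a short sketch. It assumes the commonly cited definition of RANDU, x(k+1) = 65539 · x(k) mod 2^31 with an odd seed; the function name `randu` is chosen here for illustration. Because 65539 = 2^16 + 3, every output satisfies x(k+2) = 6·x(k+1) − 9·x(k) (mod 2^31), so each value is fixed by the two before it.

```python
def randu(seed, count):
    """Return `count` successive RANDU outputs starting from an odd `seed`."""
    modulus = 2**31
    x = seed
    outputs = []
    for _ in range(count):
        x = (65539 * x) % modulus  # multiplicative LCG step
        outputs.append(x)
    return outputs


values = randu(seed=1, count=1000)

# Count consecutive triples that break x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31).
violations = sum(
    1
    for a, b, c in zip(values, values[1:], values[2:])
    if (c - 6 * b + 9 * a) % 2**31 != 0
)
print(violations)  # prints 0: no triple breaks the relation
```

A relation this simple between consecutive outputs is what makes triples of RANDU values fall on a small number of planes in three dimensions.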
In many fields, research work prior to the 21st century that relied on random selection or on Monte Carlo simulations, or in other ways relied on PRNGs, was much less reliable than ideal as a result of using poor-quality PRNGs.[4] Even today, caution is sometimes required, as illustrated by the following warning in the International Encyclopedia of Statistical Science (2010).[5]
The list of widely used generators that should be discarded is much longer [than the list of good generators]. Do not trust blindly the software vendors. Check the default RNG of your favorite software and be ready to replace it if needed. This last recommendation has been made over and over again over the past 40 years. Perhaps amazingly, it remains as relevant today as it was 40 years ago.
As an illustration, consider the widely used programming language Java. Up until 2020, Java still relied on a linear congruential generator (LCG) for its PRNG,[6][7] which is of low quality (see further below). Java support was upgraded with Java 17.
One well-known PRNG that avoids major problems and still runs fairly quickly is the Mersenne Twister (discussed below), which was published in 1998. Other higher-quality PRNGs, both in terms of computational and statistical performance, were developed before and after this date; these can be identified in the List of pseudorandom number generators.
In the second half of the 20th century, the standard class of algorithms used for PRNGs comprised linear congruential generators. The quality of LCGs was known to be inadequate, but better methods were unavailable. Press et al. (2007) described the result thus: "If all scientific papers whose results are in doubt because of [LCGs and related] were to disappear from library shelves, there would be a gap on each shelf about as big as your fist."[8]
A major advance in the construction of pseudorandom generators was the introduction of techniques based on linear recurrences on the two-element field; such generators are related to linear-feedback shift registers.
The 1997 invention of the Mersenne Twister,[9] in particular, avoided many of the problems with earlier generators. The Mersenne Twister has a period of $2^{19937} - 1$ iterations ($\approx 4.3 \times 10^{6001}$), is proven to be equidistributed in (up to) 623 dimensions (for 32-bit values), and at the time of its introduction was running faster than other statistically reasonable generators.
In 2003, George Marsaglia introduced the family of xorshift generators,[10] again based on a linear recurrence. Such generators are extremely fast and, combined with a nonlinear operation, they pass strong statistical tests.[11][12][13]
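To show how little work one step involves, here is a minimal sketch of a 32-bit xorshift step. It assumes the shift triple (13, 17, 5), one of the triples commonly attributed to Marsaglia's paper. The function names are illustrative, and this plain linear variant omits the nonlinear output step mentioned above.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state):
    """Advance a 32-bit xorshift state by one step; `state` must be nonzero."""
    state ^= (state << 13) & MASK32  # left shift, truncated to 32 bits
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state


def xorshift32_stream(seed, count):
    """Return `count` successive states, each of which doubles as an output."""
    outputs = []
    state = seed
    for _ in range(count):
        state = xorshift32(state)
        outputs.append(state)
    return outputs


print(xorshift32_stream(seed=1, count=3))
```

A zero state maps to zero forever, which is why the seed must be nonzero.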
In 2006, the WELL family of generators was developed.[14] The WELL generators in some ways improve on the quality of the Mersenne Twister, which has a too-large state space and a very slow recovery from states containing a large number of zeros.
A counter-based random number generator (CBRNG, also known as a counter-based pseudo-random number generator, or CBPRNG) is a kind of PRNG that uses only an integer counter as its internal state; each output is computed as a function of the counter value and, typically, a key.
They are generally used for generating pseudorandom numbers for large parallel computations, such as over GPU or CPU clusters.[15] They have certain advantages:
Examples include:[15]
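As a rough illustration of the counter-based idea (not one of the published generators cited above), the sketch below derives each output from a key and a counter by hashing them with SHA-256. The function name `counter_based_random`, the 8-byte counter encoding, and the choice of hash are assumptions made for this example; production designs use much cheaper mixing functions.

```python
import hashlib


def counter_based_random(key: bytes, counter: int) -> int:
    """Return a 64-bit value that depends only on `key` and `counter`."""
    block = key + counter.to_bytes(8, "big")  # assumed encoding: key || counter
    digest = hashlib.sha256(block).digest()
    return int.from_bytes(digest[:8], "big")


key = b"example-key"

# Sequential use: the counter is the only state that advances.
sequence = [counter_based_random(key, n) for n in range(5)]

# Random access: output number 3 is computed without generating outputs 0 to 2.
assert counter_based_random(key, 3) == sequence[3]
```

Because no state is carried from one output to the next, separate workers can each take a disjoint range of counter values and produce their parts of the stream independently.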
A PRNG suitable for cryptographic applications is called a cryptographically-secure PRNG (CSPRNG). A requirement for a CSPRNG is that an adversary not knowing the seed has only negligible advantage in distinguishing the generator's output sequence from a random sequence. In other words, while a PRNG is only required to pass certain statistical tests, a CSPRNG must pass all statistical tests that are restricted to polynomial time in the size of the seed. Though a proof of this property is beyond the current state of the art of computational complexity theory, strong evidence may be provided by reducing to the CSPRNG from a problem that is assumed to be hard, such as integer factorization.[16] In general, years of review may be required before an algorithm can be certified as a CSPRNG.
Some classes of CSPRNGs include the following:
It has been shown to be likely that the NSA has inserted an asymmetric backdoor into the NIST-certified pseudorandom number generator Dual_EC_DRBG.[20]
Most PRNG algorithms produce sequences that are uniformly distributed by any of several tests. It is an open question, and one central to the theory and practice of cryptography, whether there is any way to distinguish the output of a high-quality PRNG from a truly random sequence. In this setting, the distinguisher knows that either the known PRNG algorithm was used (but not the state with which it was initialized) or a truly random algorithm was used, and has to distinguish between the two.[21] The security of most cryptographic algorithms and protocols using PRNGs is based on the assumption that it is infeasible to distinguish use of a suitable PRNG from use of a truly random sequence. The simplest examples of this dependency are stream ciphers, which (most often) work by exclusive or-ing the plaintext of a message with the output of a PRNG, producing ciphertext. The design of cryptographically adequate PRNGs is extremely difficult because they must meet additional criteria. The size of its period is an important factor in the cryptographic suitability of a PRNG, but not the only one.
The German Federal Office for Information Security (German: Bundesamt für Sicherheit in der Informationstechnik, BSI) has established four criteria for quality of deterministic random number generators.[22] They are summarized here:
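The stream-cipher construction mentioned above can be sketched in a few lines. This is a toy illustration only: it assumes SHAKE-256 (via Python's `hashlib`) as the keystream generator, a fixed-length key and nonce that are simply concatenated, and the hypothetical names `keystream` and `xor_stream`. It is not a vetted cipher and does nothing to authenticate the message.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes from a key and a nonce (toy design)."""
    return hashlib.shake_256(key + nonce).digest(length)


def xor_stream(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """XOR `data` with the keystream; the same call encrypts and decrypts."""
    stream = keystream(key, nonce, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


key = b"k" * 32    # placeholder key, for illustration only
nonce = b"n" * 12  # must never be reused with the same key
plaintext = b"attack at dawn"

ciphertext = xor_stream(plaintext, key, nonce)
recovered = xor_stream(ciphertext, key, nonce)
assert recovered == plaintext
```

Decryption works because XOR-ing twice with the same keystream cancels out. The scheme is only as strong as the keystream is hard to distinguish from random data.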
For cryptographic applications, only generators meeting the K3 or K4 standards are acceptable.
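Given:
$P$, a probability distribution on $(\mathbb{R}, \mathfrak{B})$ (where $\mathfrak{B}$ is the collection of Borel subsets of the real line);
$\mathfrak{F}$, a non-empty collection of Borel sets, $\mathfrak{F} \subseteq \mathfrak{B}$, e.g. $\mathfrak{F} = \{(-\infty, t] : t \in \mathbb{R}\}$;
$A \subseteq \mathbb{R}$, a non-empty set (not necessarily a Borel set).
We call a function $f : \mathbb{N}_1 \to \mathbb{R}$ (where $\mathbb{N}_1 = \{1, 2, 3, \dots\}$ is the set of positive integers) a pseudo-random number generator for $P$ given $\mathfrak{F}$ taking values in $A$ if and only if:
$f(\mathbb{N}_1) \subseteq A$;
for every $E \in \mathfrak{F}$ and every $\varepsilon > 0$ there is an $N \in \mathbb{N}_1$ such that for all $n \geq N$, $\left| \frac{\#\{i \in \{1, 2, \dots, n\} : f(i) \in E\}}{n} - P(E) \right| < \varepsilon$.
($\#S$ denotes the number of elements in the finite set $S$.)
It can be shown that if $f$ is a pseudo-random number generator for the uniform distribution on $(0, 1)$ and if $F$ is the CDF of some given probability distribution $P$, then $F^{*} \circ f$ is a pseudo-random number generator for $P$, where $F^{*} : (0, 1) \to \mathbb{R}$ is the percentile of $P$, i.e. $F^{*}(x) := \inf\{t \in \mathbb{R} : x \leq F(t)\}$. Intuitively, an arbitrary distribution can be simulated from a simulation of the standard uniform distribution.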
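The definition asks that, for a set E such as an interval (−∞, t], the fraction of the first n outputs falling in E approach the probability P(E) as n grows. The sketch below probes this empirically for the uniform distribution, taking Python's `random.Random` as the generator. The seed, the threshold t = 0.25, and the name `empirical_fraction` are arbitrary choices for this example, and a finite experiment can only suggest the limiting behaviour, not prove it.

```python
import random


def empirical_fraction(outputs, t):
    """Fraction of `outputs` that fall in the interval (-inf, t]."""
    hits = sum(1 for x in outputs if x <= t)
    return hits / len(outputs)


rng = random.Random(2024)  # fixed seed so the run is reproducible
outputs = [rng.random() for _ in range(100_000)]

# For the uniform distribution on (0, 1), P((-inf, 0.25]) = 0.25.
for n in (100, 10_000, 100_000):
    print(n, empirical_fraction(outputs[:n], 0.25))
```

The printed fractions should settle near 0.25 as n increases.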
An early computer-based PRNG, suggested by John von Neumann in 1946, is known as the middle-square method. The algorithm is as follows: take any number, square it, extract the middle digits of the resulting number as the "random number", then use that number as the seed for the next iteration. For example, squaring the number "1111" yields "1234321", which can be written as "01234321", an 8-digit number being the square of a 4-digit number. This gives "2343" as the "random" number. Repeating this procedure gives "4896" as the next result, and so on. Von Neumann used 10-digit numbers, but the process was the same.
A problem with the "middle square" method is that all sequences eventually repeat themselves, some very quickly, such as "0000". Von Neumann was aware of this, but he found the approach sufficient for his purposes and was worried that mathematical "fixes" would simply hide errors rather than remove them.
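The procedure and its weakness can be reproduced with a short sketch. It uses the 4-digit variant from the worked example rather than von Neumann's 10-digit numbers, and the function name `middle_square` is chosen here for illustration.

```python
def middle_square(seed, digits=4):
    """Square `seed`, zero-pad to 2*digits digits, and return the middle digits."""
    squared = str(seed * seed).zfill(2 * digits)
    start = digits // 2
    return int(squared[start:start + digits])


print(middle_square(1111))  # prints 2343, from "01234321"
print(middle_square(2343))  # prints 4896, from "05489649"
print(middle_square(0))     # prints 0: once reached, "0000" repeats forever
```

Since a 4-digit state has only 10,000 possible values, every sequence must enter a cycle after at most 10,000 steps.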
Von Neumann judged hardware random number generators unsuitable, for, if they did not record the output generated, they could not later be tested for errors. If they did record their output, they would exhaust the limited computer memories then available, and so the computer's ability to read and write numbers. If the numbers were written to cards, they would take very much longer to write and read. On the ENIAC computer he was using, the "middle square" method generated numbers at a rate some hundred times faster than reading numbers in from punched cards.
The middle-square method has since been supplanted by more elaborate generators.
A recent innovation is to combine the middle square with a Weyl sequence. This method produces high-quality output through a long period (see middle-square method).
Numbers selected from a non-uniform probability distribution can be generated using a uniform distribution PRNG and a function that relates the two distributions.
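A sketch of the combined generator is given below, following the commonly published 64-bit formulation: square the state, add a Weyl sequence term, and swap the two 32-bit halves so that the middle bits of the square become the output. The class name and the default constant `s` are assumptions for this example; the constant must be odd, and published guidance places further conditions on its digits.

```python
MASK64 = (1 << 64) - 1


class MiddleSquareWeyl:
    """Middle-square generator whose state is perturbed by a Weyl sequence."""

    def __init__(self, s=0xB5AD4ECEDA1CE2A9):
        self.x = 0  # middle-square state
        self.w = 0  # Weyl sequence value
        self.s = s  # odd increment (assumed example constant)

    def next_u32(self):
        self.x = (self.x * self.x) & MASK64  # square, keeping 64 bits
        self.w = (self.w + self.s) & MASK64  # advance the Weyl sequence
        self.x = (self.x + self.w) & MASK64  # add it to the square
        # Swap the halves so the middle of the square lands in the low 32 bits.
        self.x = ((self.x >> 32) | (self.x << 32)) & MASK64
        return self.x & 0xFFFFFFFF


gen = MiddleSquareWeyl()
print([hex(gen.next_u32()) for _ in range(4)])
```

The Weyl term keeps the state from collapsing to zero or falling into the short cycles that affect the plain middle-square method.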
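First, one needs the cumulative distribution function $F(b)$ of the target distribution $f(b)$:
$$F(b) = \int_{-\infty}^{b} f(b')\,db'$$
Note that $0 = F(-\infty) \leq F(b) \leq F(\infty) = 1$. Using a random number $c$ from a uniform distribution on $(0, 1)$ as the cumulative probability to match, we get
$$F(b) = c$$
so that
$$b = F^{-1}(c)$$
is a number randomly selected from distribution $f(b)$. This is based on the inverse transform sampling.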
For example, the inverse of the cumulative Gaussian distribution with an ideal uniform PRNG with range (0, 1) as input would produce a sequence of (positive only) values with a Gaussian distribution; however
Similar considerations apply to generating other non-uniform distributions such as Rayleigh and Poisson.
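Inverse transform sampling is easiest to demonstrate with a distribution whose CDF inverts in closed form. The sketch below uses the exponential distribution as the example: its CDF is F(b) = 1 − exp(−λb), so b = −ln(1 − c)/λ. The function name `sample_exponential` and the use of Python's `random` module as the uniform source are choices made for this illustration.

```python
import math
import random


def sample_exponential(rate, uniform=random.random):
    """Draw one exponential variate by inverting its CDF at a uniform c."""
    c = uniform()  # c lies in [0, 1), so 1 - c lies in (0, 1]
    return -math.log(1.0 - c) / rate


rng = random.Random(7)  # fixed seed so the run is reproducible
samples = [sample_exponential(2.0, rng.random) for _ in range(100_000)]

# The sample mean should be close to the theoretical mean 1 / rate = 0.5.
print(sum(samples) / len(samples))
```

For distributions such as the Gaussian, where the inverse CDF has no closed form, it has to be evaluated numerically or replaced by a different sampling method.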