Inmathematics, areal number is said to besimply normal in anintegerbaseb[Note 1] if its infinite sequence ofdigits is distributed uniformly in the sense that each of theb digit values has the samenatural density 1/b. A number is said to benormal in baseb if, for every positive integern, all possible stringsn digits long have density b−n.
Intuitively, a number being simply normal means that no digit occurs more frequently than any other. If a number is normal, no finite combination of digits of a given length occurs more frequently than any other combination of the same length. A normal number can be thought of as an infinite sequence of coin flips (binary) or rolls of a die (base 6). Even though therewill be sequences such as 10, 100, or more consecutive tails (binary) or fives (base 6) or even 10, 100, or more repetitions of a sequence such as tail-head (two consecutive coin flips) or 6-1 (two consecutive rolls of a die), there will also be equally many of any other sequence of equal length. No digit or sequence is "favored".
A number is said to benormal (sometimes calledabsolutely normal) if it is normal in all integer bases greater than or equal to 2.
While a general proof can be given thatalmost all real numbers are normal (meaning that theset of non-normal numbers hasLebesgue measure zero),[1] this proof is notconstructive, and only a few specific numbers have been shown to be normal. For example, anyChaitin's constant is normal (anduncomputable). It is widely believed that the (computable) numbers√2,π, ande are normal, but a proof remains elusive.[2]
LetΣ be a finitealphabet ofb-digits,Σω the set of all infinitesequences that may be drawn from that alphabet, andΣ∗ the set of finite sequences, orstrings.[Note 2] LetS ∈Σω be such a sequence. For eacha inΣ letNS(a,n) denote the number of times the digita appears in the firstn digits of the sequenceS. We say thatS issimply normal if thelimit
for eacha. Now letw be any finite string inΣ∗ and letNS(w,n) be the number of times the stringw appears as asubstring in the firstn digits of the sequenceS. (For instance, ifS =01010101..., thenNS(010, 8) = 3.)S isnormal if, for all finite stringsw ∈Σ∗,
where|w| denotes the length of the stringw. In other words,S is normal if all strings of equal length occur with equalasymptotic frequency. For example, in a normal binary sequence (a sequence over the alphabet{0,1}),0 and1 each occur with frequency1⁄2;00,01,10, and11 each occur with frequency1⁄4;000,001,010,011,100,101,110, and111 each occur with frequency1⁄8; etc. Roughly speaking, theprobability of finding the stringw in any given position inS is precisely that expected if the sequence had been produced atrandom.
Suppose now thatb is aninteger greater than 1 andx is areal number. Consider the infinite digit sequence expansionSx,b ofx in the basebpositional number system (we ignore the decimal point). We say thatx issimply normal in baseb if the sequenceSx,b is simply normal[3] and thatx isnormal in baseb if the sequenceSx,b is normal.[4] The numberx is called anormal number (or sometimes anabsolutely normal number) if it is normal in baseb for every integerb greater than 1.[5][6]
A given infinite sequence is either normal or not normal, whereas a real number, having a different base-b expansion for each integerb ≥ 2, may be normal in one base but not in another[7][8] (in which case it is not a normal number). For basesr ands withlogr / logsrational (so thatr =bm ands =bn) every number normal in baser is normal in bases. For basesr ands withlogr / logs irrational, there are uncountably many numbers normal in each base but not the other.[8]
Adisjunctive sequence is a sequence in which every finite string appears. A normal sequence is disjunctive, but a disjunctive sequence need not be normal. Arich number in baseb is one whose expansion in baseb is disjunctive:[9] one that is disjunctive to every base is calledabsolutely disjunctive or is said to be alexicon. A number normal in baseb is rich in baseb, but not necessarily conversely. The real numberx is rich in baseb if and only if the set{xbn mod 1 :n ∈N} isdense in theunit interval.[9][Note 3]
We defined a number to be simply normal in baseb if each individual digit appears with frequency1⁄b. For a given baseb, a number can be simply normal (but not normal or rich), rich (but not simply normal or normal), normal (and thus simply normal and rich), or none of these. A number isabsolutely non-normal orabsolutely abnormal if it is not simply normal in any base.[5][10]
The concept of a normal number was introduced byÉmile Borel (1909). Using theBorel–Cantelli lemma, he proved thatalmost all real numbers are normal, establishing the existence of normal numbers.Wacław Sierpiński (1917) showed that it is possible to specify a particular such number. Becher and Figueira (2002) proved that there is acomputable absolutely normal number.Although this construction does not directly give the digits of the numbers constructed, it shows that it is possible in principle to enumerate each digit of a particular normal number.
The set of non-normal numbers, despite being "large" in the sense of beinguncountable, is also anull set (as its Lebesgue measure as a subset of the real numbers is zero, so it essentially takes up no space within the real numbers). Also, the non-normal numbers (as well as the normal numbers) are dense in the reals: the set of non-normal numbers between two distinct real numbers is non-empty since it containsevery rational number (in fact, it is uncountably infinite[11] and evencomeagre). For instance, there are uncountably many numbers whose decimal expansions (in base 3 or higher) do not contain the digit 1, and none of those numbers are normal.
obtained by concatenating the decimal representations of thenatural numbers in order, is normal in base 10. Likewise, the different variants of Champernowne's constant (done by performing the same concatenation in other bases) are normal in their respective bases (for example, the base-2 Champernowne constant is normal in base 2), but they have not been proven to be normal in other bases.
obtained by concatenating theprime numbers in base 10, is normal in base 10, as proved byA. H. Copeland and Paul Erdős (1946). More generally, the latter authors proved that the real number represented in baseb by the concatenation
0.f(1)f(2)f(3)...,
wheref(n) is then-th prime expressed in baseb, is normal in baseb.Besicovitch (1935) proved that the number represented by the same expression, withf(n) =n2,
0.149162536496481100121144...,
obtained by concatenating thesquare numbers in base 10, is normal in base 10.Harold Davenport and Erdős (1952) proved that the number represented by the same expression, withf being any non-constant polynomial whose values on the positive integers are positive integers, expressed in base 10, is normal in base 10.
Nakai and Shiokawa (1992) proved that iff(x) is any non-constantpolynomial with real coefficients such thatf(x) > 0 for allx > 0, then the real number represented by the concatenation
0.[f(1)][f(2)][f(3)]...,
where [f(n)] is theinteger part off(n) expressed in baseb, is normal in baseb. (This result includes as special cases all of the above-mentioned results of Champernowne, Besicovitch, and Davenport & Erdős.) The authors also show that the same result holds even more generally whenf is any function of the form
f(x) = α·xβ + α1·xβ1 + ... + αd·xβd,
where the αs and βs are real numbers with β > β1 > β2 > ... > βd ≥ 0, andf(x) > 0 for allx > 0.
It has been an elusive goal to prove the normality of numbers that are not artificially constructed. While√2,π,ln(2), ande are strongly conjectured to be normal, it is still not known whether they are normal or not. It has not even been proven that all digits actually occur infinitely many times in the decimal expansions of those constants (for example, in the case of π, the popular claim "every string of numbers eventually occurs in π" is not known to be true).[12] It has also been conjectured that everyirrationalalgebraic number is absolutely normal (which would imply that√2 is normal), and no counterexamples are known in any base. However, no irrational algebraic number has been proven to be normal in any base.
Norational number is normal in any base, since the digit sequence of a rational number iseventually periodic, and thus most of the strings that are longer than the period do not appear in the digit sequence.[Note 4]
Martin (2001) gives an example of an irrational number that is absolutely abnormal.[13] Let
Every non-zero real number is the product of two normal numbers. This follows from the general fact that every number is the product of two numbers from a set if the complement ofX has measure 0.
Ifx is normal in baseb, anda ≠ 0 is a rational number, then is also normal in baseb.[14]
If isdense (for every and for all sufficiently largen,), and are the base-b expansions of the elements ofA, then the number, formed by concatenating the elements ofA, is normal in baseb (Copeland and Erdős 1946). From this it follows that Champernowne's number is normal in base 10 (since the set of all positive integers is obviously dense) and that the Copeland–Erdős constant is normal in base 10 (since theprime number theorem implies that the set of primes is dense).
A sequence is normalif and only if everyblock of equal length appears with equal frequency. (A block of lengthk is a substring of lengthk appearing at a position in the sequence that is a multiple ofk: e.g. the first length-k block inS isS[1..k], the second length-k block isS[k+1..2k], etc.) This was implicit in the work of Ziv and Lempel (1978) and made explicit in the work of Bourke, Hitchcock, and Vinodchandran (2005).
A number is normal in baseb if and only if it is simply normal in basebk for all. This follows from the previous block characterization of normality: Since then-th block of lengthk in its baseb expansion corresponds to then-th digit in its basebk expansion, a number is simply normal in basebk if and only if blocks of lengthk appear in its baseb expansion with equal frequency.
A number is normal if and only if it is simply normal in every base. This follows from the previous characterization of baseb normality.
A number isb-normal if and only if there exists a set of positive integers where the number is simply normal in basesbm for all[15] No finite set suffices to show that the number isb-normal.
All normal sequences areclosed under finite variations: adding, removing, or changing afinite number of digits in any normal sequence leaves it normal. Similarly, if a finite number of digits are added to, removed from, or changed in any simply normal sequence, the new sequence is still simply normal.
Agafonov showed an early connection betweenfinite-state machines and normal sequences: every infinite subsequence selected from a normal sequence by aregular language is also normal. In other words, if one runs a finite-state machine on a normal sequence, where each of the finite-state machine's states are labeled either "output" or "no output", and the machine outputs the digit it reads next after entering an "output" state, but does not output the next digit after entering a "no output state", then the sequence it outputs will be normal.[16]
A deeper connection exists with finite-state gamblers (FSGs) and information lossless finite-state compressors (ILFSCs).
Afinite-state gambler (a.k.a.finite-statemartingale) is a finite-state machine over a finite alphabet, each of whose states is labelled with percentages of money to bet on each digit in. For instance, for an FSG over the binary alphabet, the current stateq bets some percentage of the gambler's money on the bit 0, and the remaining fraction of the gambler's money on the bit 1. The money bet on the digit that comes next in the input (total money times percent bet) is multiplied by, and the rest of the money is lost. After the bit is read, the FSG transitions to the next state according to the input it received. A FSGdsucceeds on an infinite sequenceS if, starting from $1, it makes unbounded money betting on the sequence; i.e., ifwhere is the amount of money the gamblerd has after reading the firstn digits ofS (seelimit superior).
Afinite-state compressor is a finite-state machine with output strings labelling itsstate transitions, including possibly the empty string. (Since one digit is read from the input sequence for each state transition, it is necessary to be able to output the empty string in order to achieve any compression at all). An information lossless finite-state compressor is a finite-state compressor whose input can be uniquely recovered from its output and final state. In other words, for a finite-state compressorC with state setQ,C is information lossless if the function, mapping the input string ofC to the output string and final state ofC, is1–1. Compression techniques such asHuffman coding orShannon–Fano coding can be implemented with ILFSCs. An ILFSCCcompresses an infinite sequenceS ifwhere is the number of digits output byC after reading the firstn digits ofS. Thecompression ratio (thelimit inferior above) can always be made to equal 1 by the 1-state ILFSC that simply copies its input to the output.
Schnorr and Stimm showed that no FSG can succeed on any normal sequence, and Bourke, Hitchcock and Vinodchandran showed theconverse. Therefore:
A sequence is normal if and only if there is no finite-state gambler that succeeds on it.
Ziv and Lempel showed:
A sequence is normal if and only if it is incompressible by any information lossless finite-state compressor
(they actually showed that the sequence's optimal compression ratio over all ILFSCs is exactly itsentropy rate, a quantitative measure of its deviation from normality, which is 1 exactly when the sequence is normal). Since theLZ compression algorithm compresses asymptotically as well as any ILFSC, this means that the LZ compression algorithm can compress any non-normal sequence.[17]
These characterizations of normal sequences can be interpreted to mean that "normal" = "finite-state random"; i.e., the normal sequences are precisely those that appear random to any finite-state machine. Compare this with thealgorithmically random sequences, which are those infinite sequences that appear random to any algorithm (and in fact have similar gambling and compression characterizations withTuring machines replacing finite-state machines).
^It is trivial though to construct rational numbers that are simply normal in a given baseb. For example, 0.010101... in base 2 is simply normal, since 0 and 1 occur with the same frequency.
Adamczewski, Boris; Bugeaud, Yann (2010), "8. Transcendence and diophantine approximation", inBerthé, Valérie; Rigo, Michael (eds.),Combinatorics, automata, and number theory, Encyclopedia of Mathematics and its Applications, vol. 135, Cambridge:Cambridge University Press, pp. 410–451,ISBN978-0-521-51597-9,Zbl1271.11073
Besicovitch, A. S. (1935), "The asymptotic distribution of the numerals in the decimal representation of the squares of the natural numbers",Mathematische Zeitschrift,39:146–156,doi:10.1007/BF01201350,S2CID123025145
Billingsley, Patrick (2012),Probability and measure (Anniversary ed.), Hoboken, N.J.: Wiley, p. 15,ISBN9781118122372,OCLC780289503
Harman, Glyn (2002), "One hundred years of normal numbers", in Bennett, M. A.;Berndt, B.C.;Boston, N.; Diamond, H.G.; Hildebrand, A.J.; Philipp, W. (eds.),Surveys in number theory: Papers from the millennial conference on number theory, Natick, MA: A K Peters, pp. 57–74,ISBN1-56881-162-4,Zbl1062.11052