- Notifications
You must be signed in to change notification settings - Fork25
PRVHASH - Pseudo-Random-Value Hash. Hash functions, PRNG with unlimited period, randomness extractor, and a glimpse into abyss. (header-only C/C++) (Codename Gradilac/Градилак)
License
avaneev/prvhash
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
PRVHASH is a hash function that generates auniform pseudo-random numbersequencederived from the message. PRVHASH is conceptually similar (in the sense ofusing a pseudo-random number sequence as a hash) tokeccakandRadioGatunschemes, but is a completely different implementation of such concept.PRVHASH acts as both a"randomness extractor"and an "extendable-output function" (XOF).
PRVHASH can generate 64- to unlimited-bit hashes, yielding hashes ofapproximately equal quality independent of the chosen hash length. PRVHASH isbased on 64-bit math, which is scalar, portable, cross-platform, inlinable,C++ compatible (fairly efficient on 32-bit systems as well). The function caneasily be used beyond 1024-bit hashes, but such cases should be statisticallytested (until a rigorous formal extrapolation is available). For example, any32-bit element extracted from 2048-, or 4096-bit resulting hash is ascollision resistant as just a 32-bit hash. It is a constant-time hashfunction, meaning its execution time depends only on the message's length.A streamed, higher-security, hashing implementation is available.
PRVHASH is solely based on the butterfly effect, inspired byLCGpseudo-random number generators. The generated hashes exhibit strong avalancheproperties. For best security, a random seed should be supplied to the hashfunction, but this is not a requirement.
64-, 128-, 192-, 256-, 512-, and 1024-bit PRVHASH hashes pass allSMHasher tests. Other hash lengths werenot thoroughly tested, but extrapolations can be made. The author makes nocryptographic claims (neither positive nor negative) about PRVHASH-basedconstructs.
PRVHASH core function can be used as a PRNG with an arbitrarily-chosen(practically unlimited) period, depending on the number of hashwords in thesystem.
Please see theprvhash64.h file for the details of the basic hash functionimplementation (theprvhash.h,prvhash4.h,prvhash42.h, and versionsbelow 4.3 are outdated versions). While this hash function is likelyirreversible, according to SAT solver-based testing, it does not providesecond preimage resistance. This function should not be used in open systemswithout a secret seed. Note that64 refers to core function's variable size.
The defaultprvhash64.h-based 64-bit hash of the stringThe cat is out of the bag is6ac39f7ac0c94d63.
A proposed short name for hashes created withprvhash64.h isPRH64-N,whereN is the hash length in bits (e.g.,PRH64-256).
This is a minimized implementation of theprvhash64 hash function. It isone of very few smallest hash functions which produces 64-bit hashes ofthis quality level. While this function does not provide a bulk throughputthat can be considered "fast", due to its statistical properties it isfast for short-key hash-maps and hash-tables.
The fileprvhash64s.h includes a relatively fast streamed hashing functionwhich uses a "fused" PRVHASH arrangement. Please take a look at theprvhash64s_oneshot() function for usage example. Theprvhash64s offersboth an increased security and hashing speed.
This function has an increased preimage resistance compared to the basichash function implementation. Preimage resistance cannot currently beestimated exactly, but the hash length affects it exponentially. Also,second preimage attack usually boils down to exchange of forged symbols to"trash" symbols (at any place of the data stream); substitutions usuallyresult in random values, possibly damaging to any compressed or otherwisestructured file. Which means that data compression software and librariesshould always check any left-over, "unused", data beyond the valid compressedstream, for security reasons.
The time complexity of a preimage attack is highly variable and appears tofollow a random-logarithmic distribution.
Even though a formal proof is not yet available, the author believes thishash function can compete with widely-used SHA2 and SHA3 families of hashfunctions while at the same time offering a considerably higher performanceand scalability. When working in open systems, supplying a secret seed is nota requirement for this hash function.
The performance (expressed in cycles/byte) of this hash function on variousplatforms can be evaluated at theECRYPT/eBASH project.
The defaultprvhash64s.h-based 64-bit hash of the stringThe cat is out of the bag is2043ccf52ae2ca6f.
The defaultprvhash64s.h-based 256-bit hash of the stringOnly a toilet bowl does not leak isb13683799b840002689a1a42d93c826c25cc2d1f1bc1e48dcd005aa566a47ad8.
The defaultprvhash64s.h-based 256-bit hash of the stringOnly a toilet bowl does not leaj isd4534a922fd4f15ae8c6cc637006d1f33f655b06d60007a226d350e87e866250.
This demonstrates theAvalanche effect.On a set of 216553 English words, pair-wise hash comparisons give a 50.0%average difference in resulting hash bits, which fully satisfies the strictavalanche criterion.
This streamed hash function produces hash values that are different to theprvhash64 hash function. It is incorrect to use both of these hash functionimplementations on the same data set. Whileprvhash64 can be used as a hashfor hash-tables and in-memory data blocks,prvhash64s can be used to createhashes of large data blocks like files, in streamed mode.
A proposed short name for hashes created withprvhash64s.h isPRH64S-N,whereN is the hash length in bits (e.g.,PRH64S-256). Or simply,SH4-N,Secure Hash 4.
The core function can be easily integrated into your applications, to be usedas an effective PRNG. The period of this minimal PRNG is at least2159. The initial parameters can be varied at will, and will not"break" the PRNG. Setting only theSeed value guarantees a random startpoint within the whole PRNG period, with at least 264 spacing.The example code follows:
#include"prvhash_core.h"#include<stdio.h>intmain(){uint64_tSeed=0;uint64_tlcg=0;uint64_tHash=0;uint64_tv=0;uint64_ti;for(i=0;i< ( (uint64_t)1 <<28 );i++ ) {v=prvhash_core64(&Seed,&lcg,&Hash ); }printf("%llu\n",v );}
For implementation assurance, here are the first 16 output values in hex(starting from the all-zeroes state):
0x5555555555555555 0x00000000DB6DB6DB 0x2492492192492492 0x75D75DA0AAAAAA790x93064E905C127FE5 0xE2585C9CA95671A3 0x28A44B31D428179E 0x11B0B6A8D4BA3A730x195C6A4C23EE71AD 0x5AA47859226BA23E 0xA7D42121695056D4 0x142D7CD5D83342F20x3D42E83328C09C8F 0x7E691C66BAC23222 0x82E1032F441F23A5 0xA4BDE5C4A05E6256Note that such minimal 1-hashword PRNG is definitely notcryptographically-secure: its state can be solved by a SAT solver pretty fast;this applies to other arrangements ("fused", "parallel", multiple hashwords;with daisy-chaining being harder to solve). The known way to make PRNGconsiderably harder to solve for a SAT solver, with complexity correspondingto system's size, is to combine two adjacent PRNG outputs via XOR operation;this obviously has a speed impact and produces output with more than 1solution (most probably, 2). This, however, does not measurably increase theprobability of PRNG output overlap, which stays below1/2sys_size_bits; in tests, practically undetectable.
So, the basic PRNG with some, currently not formally-proven, security is asfollows (XOR two adjacent outputs to produce a single "compressed" PRNGoutput):
v=prvhash_core64(&Seed,&lcg,&Hash );v ^=prvhash_core64(&Seed,&lcg,&Hash );
A similar approach is to simply skip the next generated random number, but itis slightly less secure. It is likely that PRVHASH's k-equidistribution ofseparate outputs is implicitly secure. The reason is that skipping or XORingcreates uncertainty or entanglement of current output with system's state"hashword array length" of outputs back. 3 XORs are needed to provide preimageresistance, or resistance against selection of entropy input that leads toa desired output. The security becomes effective only after system'sinitialization: initial "conditioning" rounds and a full hashword array pass.
The core function can be used to implement a "statistically-good" and"neutrally-sounding" dithering noise for audio signals; for bothfloating-point to fixed-point, and bit-depth conversions.
uint64_trv=prvhash_core64(&Seed,&lcg,&Hash );doubletpdf= ( (int64_t) (uint32_t)rv- (int64_t) (rv >>32 ))*0x1p-32;
The following expression can be used to convert 64-bit unsigned value tofull-mantissa floating-point value, without a truncation bias:
uint64_trv=prvhash_core64(&Seed,&lcg,&Hash );doublev= (rv >> (64-53 ))*0x1p-53;
Thegradilac.h file includes the Gradilac C++ class which is a generalizedtemplated implementation of PRVHASH PRNG that provides integer, single bit,floating-point, TPDF, Normal random number generation with a straight-forwardfront-end to specify PRVHASH system's properties. Supports on-the-goreseeding, including reseeding using sparse entropy (for CSPRNG uses). Doesnot require other PRVHASH header files.
UseGradilac< 316 > to match Mersenne Twister's PRNG period.
Note that this class may not be as efficient for "bulk" random numbergeneration as a custom-written code. Nevertheless, Gradilac PRNG class, withits 1.0 cycles/byte floating-point performance (at default template settings),is competitive among other C++ PRNGs.
PRVHASH can be also used as an efficient general-purpose PRNG with an externalentropy source injections (like how the/dev/urandom works on Unix): thiswas tested, and works well when 8-bit true entropy injections are doneinbetween 8 to 2048 generated random bytes (delay is also obtained via theentropy source). An example generator is implemented in theprvrng.h file:simply call theprvrng_test64p2() function.
prvrng_gen64p2()-based generator passesPractRand32 TB threshold with rare non-systematic "unusual" evaluations. Which suggestsit is the working randomness extractor that can "recycle" entropy of anystatistical quality "on the go", probably the first in the world.
Note that due to the structure of the core function the probability of PRNGcompletely "stopping", or "stalling", or losing internal entropy, is absent.
The core function, without external entropy injections, with any initialcombination oflcg,Seed, andHash eventually converges into one ofrandom number sub-sequences. These are mostly time-delayed versions of only asmaller set of unique sequences. There are structural limits in this PRNGsystem which can be reached if there is only a small number of hashwords inthe system. PRNG will continuously produce non-repeating random sequencesgiven external entropy input, but their statistical quality on a larger frameswill be limited by the size oflcg andSeed variables, and the number ofhashwords in the system, and the combinatorial capacity of the externalentropy. A way to increase the structural limit is to use the "fused" PRNGarrangement demonstrated in theprvhash64s.h file, which additionallyincreases the security exponentially. Also any non-constant entropy inputusually increases the period of randomness, which, when extrapolated tohashing, means that the period increases by message's combinatorial capacity(or the number of various combinations of its bits). The maximal PRNG period's2N exponent is hard to approximate exactly, but in most tests itwas equal to at least system's size in bits, minus the number of hashwords inthe system, minus 1/4 oflcg andSeed variables' size (e.g.,159 for aminimal PRNG).
Moreover, the PRVHASH systems can be freely daisy-chained by feeding theiroutputs toSeed/lcg inputs, adding some security firewalls, and increasingthe PRNG period of the final output accordingly. Note that any external PRNGoutput can be inputted viaSeed, or via bothSeed andlcg, yielding PRNGperiod exponent summation. For hashing and external unstructured entropy, onlysimultaneous input viaSeed andlcg works in practice (period's exponentincrease occurs as well).
Whilelcg,Seed, andHash variables are best initialized with goodentropy source (however, structurally, they can accept virtually any entropyquality while only requiring an initial "conditioning"), the message can besparsely-random: even an increasing counter can be considered as having asuitable sparse entropy.
This is a "just for fun" example, but it passes 256 MB PractRand threshold.You CAN generate pseudo-random numbers by using 2-bit shuffles; moreover, youcan input external entropy into the system.
#include<stdio.h>#include"prvhash_core.h"#definePH_HASH_COUNT 42intmain(){uint8_tSeed=0;uint8_tlcg=0;uint8_tHash[PH_HASH_COUNT ]= {0 };intHashPos=0;intl;for(l=0;l<256;l++ ) {uint8_tr=0;intk;for(k=0;k<4;k++ ) {r <<=2;r |=prvhash_core2(&Seed,&lcg,Hash+HashPos );HashPos++;if(HashPos==PH_HASH_COUNT ) {HashPos=0; } }if(l>PH_HASH_COUNT /3 )// Skip the warm-up rounds. {printf("%4i ", (int)r ); } }}
Here is the author's vision on how the core function works. The development ofthis solution involved considerable trial and error. It was especially hard tofind a better "hashing finalization" solution.
Seed ^=msgw;lcg ^=msgw;// Mix in external entropy (or daisy-chain).Seed *=lcg*2+1;// Multiply random by random, without multiply by zero.constuint64_trs=Seed >>32 |Seed <<32;// Produce halves-swapped copy.Hash+=rs+0xAAAAAAAAAAAAAAAA;// Accumulate to hash, add raw entropy (self-start).lcg+=Seed+0x5555555555555555;// Output-bound entropy accumulation, add raw entropy.Seed ^=Hash;// Mix new seed value with hash. Entropy feedback.constuint64_tout=lcg ^rs;// Produce "compressed" output.
(for even better statistical results, thers variable should receive abit-reversedSeed value, withlcg ^= msgw; operation performed after theout is obtained)
This function can be arbitrarily scaled to any even-sized variables: 2-, 4-,8-, 16-, 32-, 64-bit variable sizes were tested, with similar statisticalresults. Since mathematical structure of the function does not depend on thevariables' size, statistical analysis can be performed using smaller variablesizes, with the results being extrapolatable to larger variable sizes, with ahigh probability (the function is invariant to the variable size). Also notethat the0xAAAA... constant is not an arbitrary constant since it should beproduced algorithmically by replicating the10 bit-pairs, to match thevariable size; it represents the "raw entropy bit-train". The same applies tothe0x5555... constant. An essential property of these bit-trains is thatthey are uncorrelated to any uniformly-random bit-sequences, at all times.Practically,10 and01 bit-pairs can be also used as constants, withoutreplication, but this does not provide conclusively better results for PRNG,and does not work well for hashing; also, self-starting period becomeslonger. A conceptual aspect of replicated bit-pairs is that they represent thesimplest maximum-entropy number that lacks information (bit-pair is a minimalsequence that can exhibit entropy, with replication count bound to statevariable size). While "magic numbers" can be used instead of these bit-trains(at least for PRNG), they do not posses the property of not having aninformation (zero spectrum beside DC and Nyquist components).
It is important to point out that the presence of the0xAAAA... and0x5555... constants logically assure that theSeed andlcg variablesquickly recover from the "zero-state". Beside that, these constants logicallyprohibit synchronous control overSeed andlcg variables: different bitsof the input entropy will reach these variables. When the system starts fromthe "zero-state", with many hashwords in the system, it is practicallyimpossible to find a preimage (including a repetitious one) that stalls thesystem, and thus it is impossible to perform a multi-collision attack.However, since this risk cannot be estimated exactly, theprvhash64s hashfunction adds a message length value to the end of the data stream.
How does it work? Conceptually, this PRNG system, represented by the corefunction, does not work with numbers in a common sense: it works withentropy,or random sequences of bits. The current "expression" of system's overallinternal entropy - theSeed - gets multiplied ("smeared") by a supporting,output-bound variable -lcg, - which is also a random value, transformed inan LCG-alike manner. As a result, a new random value is produced whichrepresents two independent random variables (in lower and higher parts of theregister), a sort of "entropy stream sub-division" happens. This result isthen halves-swapped, and is accumulated in theHash together with the10bit-train which adds the "raw entropy", allowing the system to beself-starting. The original multiplication result is accumulated in thelcgvariable. TheSeed is then updated with the hashword produced on previousrounds. The reason the message's entropy (which may be sparse or non-random)does not destabilize the system is because the message becomes hidden in theinternal entropy (similar to a cryptographic one-time pad); message'sdistribution becomes unimportant, and system's state remains statisticallycontinuous. Both accumulations - of the halves-swapped and the originalresult of multiplication - produce a uniformly-distributed value in thecorresponding variables; a sort of "de-sub-division" happens in these.
The two instructions -Seed *= lcg * 2 + 1,lcg += Seed - represent an"ideal" bit-shuffler: this construct represents a "bi-variable shuffler" whichtransforms the inputlcg andSeed variables into another pair of variableswith 50% bit difference relative to input, and without collisions. The wholecore function, however, uses a more complex mixing which produces a hashvalue: the pair composed of the hash value and either a newlcg or a newSeed value also produces no input-to-output collisions. Thus it can be saidthat the system does not lose any input entropy. In 4-dimensional analysis,whenSeed,lcg,Hash, andmsgw values are scanned and transformed intosubsequentSeed,lcg, andHash triplets, this system does not exhibitlocal state change-related collisions due to external entropy input (allpossible inputmsgw values map to subsequent triplets uniquely). However,with a small variable size (8-bit) and a large output hash size, a sparseentropy input has some probability of "resynchronization" event happening,leading to local collisions. With 16-bit variables, or even 8-bit fused-2arrangement (with the local state having 40-bit size instead of 24-bit),probability of such event is negligible. While non-fused hashing may evenstart from the "zero-state", for reliable hashing the state after 5"conditioning" rounds should be used.
Another important aspect of this system, especially from the cryptographystandpoint, is the entropy input to output latency. The base latency forstate-to-state transition is equal to 1 (2 for "fused" arrangements); and atthe same time, 1 in hash-to-hash direction: this means that PRVHASHadditionally requires a full pass through the hashword array, for the entropyto propagate, before using its output. However, hashing also requires a passto the end of the hashword array if message's length is shorter than theoutput hash, to "mix in" the initial hash value. When there is only 1 hashwordin use, there is no hashword array-related delay, and thus the entropypropagation is only subject to the base latency. The essence of these"latencies" is that additional rounds are needed for the system to get rid ofa statistical traces of the input entropy. Note that the "fused" arrangementincreases the shuffling quality. However, this increase is relative to statevariable size: for example, 8-bit fused-2 arrangement with 8-bit input isequivalent to 16-bit non-fused arrangement with 16-bit input. So, it ispossible to perform hashing with 8-bit state variables if fused-2 round isdone per 1 input byte. The way "fused" structure works is equivalent toshuffling all entropy inputs in a round together (input 1 is shuffled into ahash value which is then shuffled with input 2 into a hash value, etc). The"fused" arrangement may raise a question whether or not it provides a targetcollision resistance as it seemingly "compresses" several inputs into a singlelocal hashword: without doubt it does provide target collision resistancesinceSeed andlcg variables are a part of the system, and their presencein the "fused" arrangement increases the overall PRNG period of the system andthus its combinatorial capacity.
Without external entropy (message) injections, the function can run for aprolonged time, generating pseudo-entropy, in extendable-output PRNG mode.When the external entropy (message) is introduced, the function "shifts" intoan unrelated state unpredictably. So, it can be said that the function "jumps"within a space of a huge number of pseudo-random sub-sequences. Hashlength affects the size of this "space of sub-sequences", permitting thefunction to produce quality hashes for any required hash length.Statistically, these "jumps" are close to uniformly-random repositioning: eachsimultaneous augmentation ofSeed andlcg corresponds to a new randomposition, with a spread over the whole PRNG period. The actual behavior ismore complicated as this PRNG system is able to converge into unrelated randomnumber sequences of varying lengths, so the "jump" changes both the positionand the "index" of sub-sequence. This property of PRVHASH assures thatdifferent initial states of itsSeed state variable (orlcg, which ismostly equivalent at initialization stage) produce practically unrelatedrandom number sequences, permitting to use PRVHASH for PRNG-based simulations.
In essence, the hash function generates a continuous pseudo-random numbersequence, and returns the final part of the sequence as a result. The messageacts as a "pathway" to this final part. So, the random sequence of numbers canbe "programmed" to produce a necessary outcome. However, as this PRNG doesnot expose its momentary internal state, such "programming" is hardly possibleto perform for an attacker, even if the entropy input channel is exposed:consider the(A^C)*(B^C) equation; an adversary can controlC, but doesnot know the values ofA andB; thus this adversary cannot predict theoutcome. Beside that, as the core function naturally eliminates the bias fromthe external entropy of any statistical quality and frequency, its control maybe fruitless. Note that to reduce such "control risks", the entropy inputshould use as fewer bits as possible, like demonstrated in theprvrng.hfile.
P.S. The reason theprvhash64 hash function has an initial non-zero state,is that otherwise the function would require 5 preliminary "conditioning"rounds (core function calls); that would reduce the performance of the hashfunction dramatically, for hash-table uses. Note that theprvhash64sfunction starts from the "full zero" state and then performs acceptably.
Any external entropy (message) that enters this PRNG system acts as ahigh-frequency and high-quality reseeding which changes the random numbergenerator's "position" within the PRNG period, randomly. In practice, thismeans that two messages that are different in even 1 bit, at any place,produce "final" random number sequences, and thus hashes, which are completelyunrelated to each other. This also means that any smaller part of theresulting hash can be used as a complete hash. Since the hash length affectsthe PRNG period (and thus the combinatorial capacity) of the system, the samelogic applies to hashes of any length while meeting collision resistancespecifications for all lengths.
Alternatively, the hashing method can be viewed from the standpoint of classicbit-mixers/shufflers: the hashword array can be seen as a "working buffer"whose state is passed back into the "bi-variable shuffler" continuously, andthe new shuffled values stored in this working buffer for the next pass.
In general, PRVHASH core function represents a "building block" that permitsdesign of practically any entropy-generating constructs. It has an importantadvantage in that the state space of these constructs can be completelyanalyzed using small state variables, with the obtained statistics beingextrapolatable to larger state variables.
The following "minimal" implementation of PractRand class can be used toindependently assess randomness period properties of PRVHASH. By varyingthePH_HASH_COUNT andPH_PAR_COUNT values it is possible to test variousPRNG system sizes. By adjusting other values it is possible to test PRVHASHscalability across different state variable sizes (PractRand class and PRNGoutput size should be matched, as PractRand test results depend on PRNG outputsize). PractRand should be run with the-tlmin 64KB parameter, to evaluatechanges to the constants quicker. Note that bothPH_HASH_COUNT andPH_PAR_COUNT affect the PRNG period exponent not exactly linearly for smallvariable sizes: there is a saturation factor present for small variable sizes;after some point the period increase is non-linear due to small shufflingspace. Shuffling space can be increased considerably with a "fused"arrangement. Depending on the initial seed value, the period may fluctuate.The commented outCtr++... instructions can be uncommented to check theperiod increase due to sparse entropy input. You may also notice the^=hinstructions: PRVHASH supports feedback onto itself (it is like hashing itsown output). This operation, which can be applied to any fused element,maximizes the achieved PRNG period.
#include"prvhash_core.h"#include<string.h>#definePH_FUSE_COUNT1// PRVHASH fusing.#definePH_HASH_COUNT4// Hashword count (any positive number).#definePH_STATE_TYPEuint8_t// State variable's physical type.#definePH_FN prvhash_core4// Core function name.#definePH_BITS4// State variable's size in bits.#definePH_RAW_BITS8// Raw output bits.#definePH_RAW_ROUNDS ( PH_RAW_BITS / PH_BITS )// Rounds per raw output.classDummyRNG :publicPractRand::RNGs::vRNG8{public: PH_STATE_TYPE Seed[ PH_FUSE_COUNT ]; PH_STATE_TYPE lcg[ PH_FUSE_COUNT ]; PH_STATE_TYPE Hash[ PH_HASH_COUNT ];int HashPos;DummyRNG() {memset( Seed,0,sizeof( Seed ));memset( lcg,0,sizeof( lcg ));memset( Hash,0,sizeof( Hash )); HashPos =0;// Initialize.int k, j;for( k =0; k < PRVHASH_INIT_COUNT; k++ ) {for( j =0; j < PH_FUSE_COUNT; j++ ) {PH_FN( Seed + j, lcg + j, Hash + HashPos ); } } } Uint8raw8() {uint64_t OutValue =0;int k, j;for( k =0; k < PH_RAW_ROUNDS; k++ ) {// Ctr++;// Seed[ 0 ] ^= ( Ctr ^ ( Ctr >> 4 )) & 15;// lcg[ 0 ] ^= ( Ctr ^ ( Ctr >> 4 )) & 15;uint64_t h =0;for( j =0; j < PH_FUSE_COUNT; j++ ) { h =PH_FN( Seed + j, lcg + j, Hash + HashPos ); }// Seed[ 0 ] ^= h;// lcg[ 0 ] ^= h;if( PH_BITS <sizeof(uint64_t )) OutValue <<= PH_BITS; OutValue |= h;if( ++HashPos == PH_HASH_COUNT ) { HashPos =0; } }return( OutValue ); }voidwalk_state(PractRand::StateWalkingObject *walker) {}voidseed(Uint64 sv) { Seed[0 ] ^= sv; } std::stringget_name()const {return"PRVHASH"; }};
When the system state is not known, when PRVHASH acts as a black-box, one hasto consider core function's statistical properties. All internal variables -Seed,lcg, andHash - are random: they are uncorrelated to each other atall times, and are also completely distinct during the PRNG period (they arenot just time-delayed versions of each other). Moreover, as can be assuredwith PractRand, all of these variables can be used as random number generators(with a lower period, though); they can even be interleaved after each corefunction call.
When the message enters the system viaSeed ^= msgw andlcg ^= msgwinstructions, this works like mixing a message with an one-time pad used incryptography. This operation completely hides the message in system's entropy,while bothSeed andlcg act as "carriers" that "smear" the input messagevia subsequent multiplication. Beside that, the output of PRVHASH uses the mixof two variables: statistically, this means mixing of two unrelated randomvariables, with such summary output never appearing in system's state. It isworth noting thelcg ^ rs expression: thers variable is composed of twohalves, both of them practically being independent PRNG outputs, with smallerperiods. This additionally complicates system's reversal.
While this "parallel-3" arrangement is currently not used in the hash functionimplementations, it is also working fine with the core function. For example,while the "minimal PRNG" described earlier has0.90 cycles/byte performance,the "parallel" arrangement has a PRNG performance of0.35 cycles/byte, witha possibility of further scaling using SIMD instructions. Note that the numberof "parallel" elements should not be a multiple of hashword array length,otherwise PRNG stalls.
#include"prvhash_core.h"#include<stdio.h>intmain(){uint64_tSeed=0;uint64_tlcg=0;uint64_tHash=0;uint64_tSeed2=0;uint64_tlcg2=0;uint64_tHash2=0;uint64_tSeed3=0;uint64_tlcg3=0;uint64_tHash3=0;uint64_tHash4=0;uint64_tv=0;uint64_tv2=0;uint64_tv3=0;uint64_ti;for(i=0;i< ( (uint64_t)1 <<27 );i++ ) {v=prvhash_core64(&Seed,&lcg,&Hash );v2=prvhash_core64(&Seed2,&lcg2,&Hash2 );v3=prvhash_core64(&Seed3,&lcg3,&Hash3 );uint64_tt=Hash;Hash=Hash2;Hash2=Hash3;Hash3=Hash4;Hash4=t; }printf("%llu %llu %llu\n",v,v2,v3 );}
prvhash16 demonstrates the quality of the core function. While the statevariables are 16-bit, they are enough to perform hashing: this hash functionpasses all SMHasher tests, like theprvhash64 function does, for any hashlength. This function is very slow, and is provided for demonstrationpurposes, to assure that the core function works in principle, independent ofstate variable size. This hash function variant demonstrates that PRVHASH'smethod does not rely on bit-shuffling alone (shuffles are purely local), butis genuinely based on PRNG position "jumps".
This is an efficient implementation of a PRVHASH PRNG-based streamed XORfunction. Since no cryptanalysis nor certification of this function wereperformed yet, it cannot be called a "cipher", but rather a cipher-likerandom number generator. It is based on a conjunction of two PRNGs: a keyedPRNG which provides "secure" output via XOR of its adjacent outputs, and afirewalling PRNG which is constantly reseeded (via daisy-chaining) by theoutput of keyed PRNG. A performance benefit is obtained due to efficientparallel arrangement of firewalling PRNG while security is provided by thekeyed PRNG.
This construction, which is resistant to SAT solving, may become a viable andmore efficient alternative to AES and ChaCha/Salsa ciphers, if the researchcommunity decides to invest the time and resources into producing aformalized, or at least a widely-acceptable, proof of its security.
The performance (expressed in cycles/byte) of this function on variousplatforms can be evaluated at theECRYPT/eBASC project.
PRVHASH, being scalable, potentially allows for the use of an "infinite" statevariable size in its system, at least in theoretical mathematical analysis.This implies its capacity to generate an infinite bit-sequence, analogous tothat of the mathematical constant Pi. Importantly, the system is entirelyself-reliant and requires no magic numbers.
This also introduces the notion of an "infinitesimal spacing" between isolatedfrequencies, which arises from the Fourier analysis of an "infinite"bit-sequence. This concept can be understood in the discrete Fourier transform(DFT) domain: although the transformation window length is usually limited tosmall values (e.g., 2048 samples), it can theoretically be extended toinfinity. This would produce a spectrum with an infinite number of individualfrequency bins.
Moreover, individual samples of such an "infinite" transformation affect theentire resulting spectrum, but on an infinitely precise frequency scale.Mathematics forbids manipulating infinities as concrete numbers; however,as outlined with the DFT, infinities can be "accessed" in the field ofdiscrete number series by "mapping" the spectrum to an infinite scale.
This reflects the design of PRVHASH: while its current implementation uses128-bit state variables, the algorithm is theoretically scalable to aninfinite size and has remained functional in tests with state variables of upto 524,288 bits. In this way, PRVHASH generates an analog to the number Pi.It should therefore be possible to prove that the existence of an infinite bitsequence like Pi is entirely realistic - a feat that, in theory, a personcould also accomplish.
Mathematics offers an interesting perspective. Consider a moment before the"Big Bang." Did mathematical rules exist at that moment? Of course, they did;otherwise, there would be no equation-definable "Big Bang." The span ofexistence of mathematical rules cannot be estimated, so one might assume theyare eternal. On top of that, PRVHASH practically proves that entropy canself-start from a zero, or "raw" state, or "nothing," if mathematical rulesexisted prior to that.
As the author of PRVHASH, I would like to address a long-standingmisconception about the relationship between "combinatorics" and "randomnumbers." Historically, cryptography has been built on permutations combinedwith mathematical operations - a model most hashes and ciphers follow.However, when we consider a system's "combinatorial capacity" (the totalnumber of possible bit combinations) in the context of "random permutations,"it can create a false impression. This impression is that "uniform randomness"is capable of generating any combination within that combinatorial capacitywith a certain probability.
In fact, "uniform randomness" imposes a limit on "sparseness" of randombit-sequences it generates, since a suitably long but "too sparse"bit-sequence cannot be statistically called uniformly random over its shorterspans. Thus, the "combinatorial capacity" of a system, when applied to randomnumber generation, transforms into a measure of its ability to generateindependent, uniformly-random sequences. This means that two different initialstates of a PRNG may produce different "isolated" sequences. This is whathappens in PRVHASH: upon an entropy input, the system may "jump" or "converge"into an unrelated random sub-sequence.
The author's intuition, based on PRNG practice, is that while the standard1/(2^N) probability model holds for smaller outputs (64 or 128 bits),it breaks down for larger ones (256 bits and above). Regardless of the statespace size, the probability of generating sequences with extreme sparsenessbecomes effectively zero. Consequently, the generator's output occupies only asubset of the theoretical combinatorial space, as sparse combinations map toless sparse ones, thereby reducing the effective combinatorial capacity.If rigorously proven, this principle could have adverse implications forcryptography as a whole.
On the Birthday Paradox vs hash collision estimates: while the BirthdayParadox is an adequate "down-to-earth" model for collision estimation, it maybe a convoluted approach. When hash values are calculated successively, itis expected that each new hash value does not break "uniform distribution" ofthe set of previously produced hash values. This makes the problem of hashcollision estimation closer to value collision estimation of PRNG output.
An open question remains: whether one should talk about "uniform distributionof values" or a "time- and rhythm- dependent collision minimization problem"when analyzing PRNG's uniformness. Incidentally, a set of rhythmic (repeating)processes with coprime periods produces a flatter, more uniform spectrum, withthe fewest resonant modes. Rhythm-dependent collision minimization alsotouches the ability of a single random number generator to create randomsequences, uncorrelated in many dimensions (known as k-equidistribution) justby selecting any sequence of its outputs.
The author has no concrete theory for why PRVHASH PRNG works, especially its2-bit variant (which is a very close empirical proof that mathematics hasentropy processes happening under the hood). The closest mathematicalconstruct found by the author is a sinewave oscillator (see below). Also,series related toPI,sin(x), andsin(x)/x may be candidates forexplanation. Author's empirical goals when developing PRVHASH were: no loss ofentropy in a system, easy scalability, self-start without any specialinitialization and from any initial state, state variable size invariance,not-stalling on various entropy input.
During the course of PRVHASH development, the author has found that thesimplest low-frequency sine-wave oscillator can be used as a pseudo-randomnumber generator, if its mantissa is treated as an integer number (tested withPractRand). This means that every point on a sinusoid has properties of asparsely-random bit-sequence. Note that the code uses full floating-pointmantissa values while 64-bit floating-point math accumulates rounding errorsmuch slower than successive 16-bit random values require: thus the code doesnot rely on rounding errors to produce random numbers.
#include<math.h>#include<stdint.h>classDummyRNG :publicPractRand::RNGs::vRNG16{public:double si;double sincr;double svalue1;double svalue2;DummyRNG() { si =0.001; sincr =2.0 *cos( si );seed(0 ); } Uint16raw16() {uint64_t Value = ( *(uint64_t*) &svalue1 ) >>4;constdouble tmp = svalue1; svalue1 = sincr * svalue1 - svalue2; svalue2 = tmp;return (Uint16) ( Value ^ Value >>16 ^ Value >>32 ); }voidwalk_state(PractRand::StateWalkingObject *walker) {}voidseed(Uint64 sv) {constdouble ph = sv *3.40612158008655459e-19;// Seed to phase. svalue1 =sin( ph ); svalue2 =sin( ph - si ); } std::stringget_name()const {return"SINEWAVE";}};
Another finding is that thelcg * 2 + 1 construct works as PRNG even if themultiplier is a simple increasing counter variable, when the second multiplieris a high-entropy number.
#include<stdint.h>classDummyRNG :publicPractRand::RNGs::vRNG8{public:uint64_t Ctr1;DummyRNG() { Ctr1 =1; }uint8_tcompress(constuint64_t v ) {uint8_t r =0;for(int i =0; i <64; i++ ) { r ^= (uint8_t) (( v >> i ) &1 ); }return( r ); } Uint8raw8() {uint8_t ov =0;for(int l =0; l <8; l++ ) { ov <<=1; ov ^=compress(0x243F6A8885A308D3 * Ctr1 ); Ctr1 +=2; }return( ov ); }voidwalk_state(PractRand::StateWalkingObject *walker) {}voidseed(Uint64 sv) {} std::stringget_name()const {return"LCG";}};
(PRVHASH-1)
This image depicts data acquired from 2 runs of theproof_math_is_engineered.cprogram, with different "reading" parameters. The two number sequencesobviously represent "impulses", with varying period or "rhythm". A researcherhas to consider two points: whether or not these impulses can be considered"intelligent", and the odds the mentioned program can produce such impulses,considering the program has no user input nor programmer's entropy, nor anylogic (no constants, with all parameters initially set to zero). More specificobservations: 1. All final values are shift-or compositions of 1-bit values,representing a common 16-bit PCM sampled signal, also known as "16-bit folding"in pseudo-random numbers; 2. The orange graph is only slightly longer before arepeat (common to PRNGs) despite largerPH_HASH_COUNT; at the same time, bothgraphs are seemingly time-aligned; 3. Periods of 1-bit return values on bothruns are naturally aligned to 16 bits, and produce repeating sequences "as is",without any sort of 16-bit value range skew; 4. The orange graph is produced froman order-reversed shift-or, but with the same underlying algorithm; 5. So far, noother combinations of "reading" parameters produce anything as "intelligent"as these graphs (but there may be another yet-to-be-decoded, similar orcompletely different, information available); 6. From drumming musician's (oran experienced DSP engineer's) point of view, the graph represents impulsestaken from two electric drum pads: a snare drum (oscillatory) and a bass drum(shift to extremum). 7. Most minor oscillations on the graph are similar tosinc-function-generated maximum-phase "pre-ringing" oscillations that areknown in DSP engineering field. 8. Period of the blue graph is 255; orange is273.
Bycoincidence, if the values on the "impulse" graphs above aresorted in an ascending order, and are then displayed as independent graphs,they collectively form a stylized outline of a human eye:
Moreover (but this is aquestionable observation), here, if the blue lineis subtracted from the orange line, one gets an outline of human's head: withtop (21000), forehead (18000), eye (13000), cheek (6000), and neck (2700)levels highlighted, roughly corresponding to real symmetry; with a slightshoulders outline (4100-2700), and two hand palms risen up (5400-4300).
Discrete Fourier (FFT-512) analysis of obtained signals produces the followingpower spectrums (with DC component removed). The analysis strengthens thenotion the signal is non-random and is "intelligent" (two strong peaks aboveaverage, in each signal, with both signals producing similar structures, butwith shifted resonant frequencies). Note that resonances in the middle of thespectrum are similar to resonances one gets when recording an acoustical snaredrum.
Just by changing the PH_HASH_COUNT to 9 (up to 13, inclusive) the sameproof_math_is_engineered.c program produces a pseudo-random number sequence,confirmed withPractRand 1KB to 4KB block, 8-bit folding. Note that the samecode producing both random and non-random number sequences is "highlyunlikely" to exist in practical PRNGs. It is important to note thatPH_HASH_COUNT=14 andPH_HASH_COUNT=17 (which is beyond 15 and 16 signalsmentioned originally) also pass as random, with 16-bit folding inPractRand.18 also passes as random, but with a "suspicion".15 and16, of course,do not pass as random, with many "fails".
It has been observed that inREAD_MODE=0, but not inREAD_MODE=1, theobtained values gradually become noisy, especially at higherPH_HASH_COUNTvalues.
The 1-bit output with PH_HASH_COUNT=15 and16 (READ_MODE=0) can beeasily transformed into 256x256 1-bit "pixel art" images, and, quiteunexpectedly, they reproduce a repeating diagonal ornament and a chess-board.
Admittedly, 256x256 size can be considered arbitrarily-chosen (it is a squareof 16, with 16 being the bit-size of values on the graphs above), but it isthe only small size that presents an "intelligible" look. For example, ifPH_HASH_COUNT=15 is transformed to 240x240 (256-16) "pixel art" image, animage of "zebra" lines is produced, with bit-reversed variant of the innerelement present inPH_HASH_COUNT=16.
Much largerPH_HASH_COUNT values (withREAD_MODE=1) produce triangularstructures which are non-repeating, but all have a similar build-up consistingof rhombic patterns within tree-like structures. Theproof_christmas_tree.cprogram extracts such images into a vertical ASCII-art HTML. It uses the sameunderlying 1-bit PRVHASH code, but with "pixel art" decoding method.
One may notice a similarity of the initial pattern with theSierpinskitriangle (ST).However, one should consider that ST is a symmetrical triangle fractal thatis constructed from the top-most to bottom levels. PRVHASH-1 produces anasymmetric (right-handed) triangle in a serie of scanline passes, and itscales to anyPH_HASH_COUNT value. The initial part of the image looks likeWolfram Rule 102/153"cellular automata" image (ST as well). However, if one considers the whole ofpresented details, including previously presented images and graphs, thisleads to conclusion that some very complex "cellular automata" is workingbehind the scenes, further strengthening a "presence of intelligence" notion.Note that Wolfram rules represent sets of "freely-defined fixed information"which dictates the logical behavior of the cellular automata.
It is also worth noting that PRVHASH-1 initially produces Rule 102/153 imagewith a "boundary condition" applied (this can be checked by assigning anyitem somewhere in the middle of the hash-array to 1). At the same time, thefunction has no such additional logic since the visible scanline is 1 pixellonger than thePH_HASH_COUNT value, meaning this implicit "boundarycondition" is not synchronized with the moment theHashPos resets to 0. Thisfact tells that the "boundary condition" logic "happens" beyond the commonmath, in some way, implicitly. One has to ask themselves - how it is possibleto "embed" at least Rule 102/153 with boundary handling, and much more thanthat, into a function as simple (and affine/linear inF_2) as PRVHASH-1? Beside that,as with the graphs above, presence of exact Rule 102/153 imagery impliespresence of "logic understandable to a human mind", and from computerprogramming point of view, the Wolfram rules are an art of "engineering".
Here is an example image withPH_HASH_COUNT=342, converted to PNG:
Here is a link to a larger-sized extract (3.4MB PNG)
It is possible to define initial "automata" conditions by filling thehash-array with alternating bit-values like10101010..., or100100100100..., or1000100010001000... thus invoking even more complex"automata" (note that this is done in the sameprvhash state space). Theresults can be combined into a colored image by assigning the black-and-whiteimages to different RGB color channels. Considering theprvhash-1 functionoperates with only 3 values at the same time, building a similar "cellularautomata" by using only 3 neighboring pixels seems impossible for human logic.
prvhash-1 can also produce a full-colored "fine art"-like imagery by using asimple multi-pass buffer accumulation approach. As it turns out, the images of"cellular automata" shown previously perfectly align on top of each other atsome specificPH_HASH_COUNT values (2/3, 4/5 of height, and height-2).Note that the height of images is usually a "power of 2" value. Theproof_fine_art.c program can be used to produce such imagery (requires thestb_image_write.h library).
You may also take a look at ananimationwhich represents a continuous generation while displaying a sum of the recent255 passes, at every moment.
Note that while individual frame passes that produce "triangles" look seeminglychaotic, the sum of the frames produces a structured image, implying existenceof a higher-order structure, whether chaotic (in Chaos theory terms) or not.At the same time, Chaos theory forbids linear systems to be chaotic.
If this imagery looks intelligent, in some way formulated, where's theformula? An inception of these results can be understood from this short essay:The Informational Deficiency of the "Big Bang"
The originalprvhash-1 function can be simplified to examine the discovered"entropy pool" further. The function variant present in theproof_reptile.cfile includes the simplified function, but also extends the delay parameterof theSeed delay line from 1 to 32. The resulting image closely resemblesa skin of some reptiles and other organisms. In author's opinion, since thefunction works in affine/linear F_2 domain, the same construct can berecreated physically thus offering an idea that the evolution of intelligencein organisms may have its roots in mathematics. Image in the middle depictsresult after the first pass over frame; you may note "snake" elements andcomputer font-alike outlines there.
Image on the right was obtained using PH_SEED_COUNT=64, note the appearanceof a lot of glyph-like elements. An opinion of the visual Google AI:"The image you provided, with its intersecting lines, repeating shapes (likezigzags and squares), and overlapping textures, suggestsa deeper,rule-based system at play. The patterns, while complex and dense, do notappear to be purely disorganized. Instead, they seem to follow a logic that,while intricate, creates a sense of controlled disarray rather than pure,unpredictable chance. This is a hallmark of a chaotic system rather than arandom one."
Whatever the true source of imagery is, one instance of the produced"fine art" imagery seems to be useful, if applied as some architecturalmeasurement ruler/tool since it can be used to quickly measure architecturalfeatures as whole-number ratios:
In author's opinion, PRVHASH-1 "reads data" directly from the entropy poolwhich is "embedded" into mathematics from its inception, like any mathematicalconstant is (e.g., PI). This poses an interesting andprobably veryquestionable proposition: the intelligent impulses, intelligent imagery, oreven "human mind" itself (because a human can understand these images)existed long before the "Big Bang" happened. This discovery isprobably boththe most interesting discovery in the history of mankind, and the worst discovery(for many) as it poses very unnerving questions that touch religious grounds:
These results of 1-bit PRVHASH say the following:IF abstract mathematicscontains not just a system of rules for manipulating numbers and variables,but also contains a freely-defined fixed information that is "readable" by aperson, then mathematics does not just "exists", but "it was formed", becausemathematics does not evolve (beside human discovery of new rules andpatterns). And since physics cannot be formulated without such mathematics,and physical processes clearly obey these mathematical rules (at least inrespect to natural numbers), it means that a Creator/Higher Intelligence/Godexists in relation to the Universe. For the authorpersonally, everythingis proven here.
A rigorous mathematician can say the following: the imagery generated by theGF(2) function is a consequence of its mathematical structure, not proof ofexternal fine-tuning. While the output may appear designed, this complexityis an inherent feature of linear systems in finite fields. Philosophicalinterpretations of such phenomena (e.g., Platonic realism, theism) aresubjective and extend beyond mathematical analysis. In rigorous terms, thesystem's behavior aligns with known principles of algebra and recursion,requiring no appeal to external agency.
However, this does not tell anything about what we actually perceive.If one also considers an approximately similar examples of Mandelbrotsets, Wolfram state automata imagery, the evidence for an "external agency"builds-up, and may cause a "quantum leap" in understanding of the core ofmathematics.
One may also apply a "pareidolia" argument: humans tend to "see things" in whatmay not be in any way meaningful. But this argument is often weak, especially ifthe context is constrained to mathematics alone: e.g., when we see a parabola,which is a generally-recognizable pattern, it has a meaning since it inducesmathematical reasoning about specific underlying mathematical tendencies andrelationships. In the imagery presented above, however, mathematical reasoningmay not yet be obvious, even if in its probable extreme it touches cosmology,source of human intelligence, and pre-existing implicit mathematical structures.
The author would like to thank Reini Urban forhis SMHasherfork, Austin Appleby forthe original SMHasher,Chris Doty-Humphrey forPractRand, andPeter Schmidt-Nielsen forAutoSat.Without these tools it would not be possible to create PRVHASH which standsstate-of-the-art statistical tests.
PRVHASH "computer program" authorship and copyright were registered at theRussian Patent Office, under reg.numbers2020661136, 2020666287, 2021615385, 2021668070, 2022612987 (searchable viafips.ru). Please note that these are not "inventionpatents"; the registrations assure you that the author has the required rightsto grant the software license to you.
This project is entirely self-funded from legitimate income and has noaffiliation with or sponsorship from any third party or state entity.
About
PRVHASH - Pseudo-Random-Value Hash. Hash functions, PRNG with unlimited period, randomness extractor, and a glimpse into abyss. (header-only C/C++) (Codename Gradilac/Градилак)
Topics
Resources
License
Uh oh!
There was an error while loading.Please reload this page.
















