- Notifications
You must be signed in to change notification settings - Fork0
scru128/spec
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
SCRU128 ID is yet another attempt to supersedeUUID for the users who needdecentralized, globally unique time-ordered identifiers. SCRU128 is inspired byULID andKSUID and has the following features:
- 128-bit unsigned integer type
- Sortable by generation time (as integer and as text)
- 25-digit case-insensitive textual representation (Base36)
- 48-bit millisecond Unix timestamp that ensures useful life until year 10889
- Up to 281 trillion time-ordered but unpredictable unique IDs per millisecond
- 80-bitthree-layer randomness forglobal uniqueness
Examples in the 25-digit canonical textual representation:
0372hg16csmsm50l8dikcvukc0372hg16csmsm50l8djl6xi250372hg16csmsm50l8dmgepzz10372hg16csmsm50l8doir38270372hg16cy3nowracls909wcd0372hg16cy3nowraclvp355ce0372hg16cy3nowraclxf2ctzh0372hg16cy3nowraclyunyjke
SCRU128's size (128 bits) might not fit in some use cases because of storageefficiency. If you need compact, time-ordered unique identifiers generated bydistributed nodes, considerSCRU64. See the following comparison table for theproperties of the two schemes.
SCRU64 | SCRU128 | |
---|---|---|
Long name | Sortable, Clock-based, Realm-specifically Unique identifier | Sortable, Clock and Random number-based Unique identifier |
Binary size | 63 bits | 128 bits |
Textual size | 12 digits | 25 digits |
Textual encoding | Base36 with0-9a-z (case-insensitive) | Base36 with0-9a-z (case-insensitive) |
Timestamp range | January 1, 1970 - February 27, 4261 (UTC) | January 1, 1970 - August 2, 10889 (UTC) |
Timestamp resolution | 256 milliseconds | 1 millisecond |
Number of IDs per timestamp | Up to 8.4 million per 256 milliseconds per node (configurable) | 140.7 trillion per millisecond per node (on average) |
Number of distributed nodes | Up to 8.4 million (configurable) | No specific limitation |
Source of uniqueness | Centrally pre-assigned node ID | Independently generated random numbers |
Choose it when you ... | Prefer 64-bit integer for storage, indexing, and other reasons | Want unique IDs without hassle to coordinate generators |
- C/C++
- Go
- Java (with Kotlin and Android compatibility)
- JavaScript
- Python (and command-line tools)
- Rust
- Swift
If you are interested in implementing SCRU128, see alsoSCRU128 GeneratorTester.
A SCRU128 ID is a 128-bit unsigned integer consisting of four terms:
timestamp*2^80+counter_hi*2^56+counter_lo*2^32+entropy
Where:
timestamp
is a 48-bit Unix timestamp in milliseconds (i.e., millisecondselapsed since 1970-01-01 00:00:00+00:00, ignoring leap seconds).counter_hi
is a 24-bit randomly initialized counter that is incremented byone whencounter_lo
reaches its maximum value.counter_hi
is reset to arandom number whentimestamp
has moved forward by one second or more sincethe last renewal ofcounter_hi
.- Note:
counter_hi
effectively works like an entropy component (rather thana counter) that is refreshed only once per second.
- Note:
counter_lo
is a 24-bit randomly initialized counter that is incremented byone for each new ID generated within the sametimestamp
.counter_lo
isreset to a random number whenevertimestamp
moves forward. Whencounter_lo
reaches its maximum value,counter_hi
is incremented andcounter_lo
isreset to zero.entropy
is a 32-bit random number renewed for each new ID generated.
This definition is equivalent to allocating four unsigned integer fields to a128-bit space according to the following layout:
Bit numbers | Field name | Size | Data type | Byte order |
---|---|---|---|---|
Msb 0 - 47 | timestamp | 48 bits | Unsigned integer | Big-endian |
Msb 48 - 71 | counter_hi | 24 bits | Unsigned integer | Big-endian |
Msb 72 - 95 | counter_lo | 24 bits | Unsigned integer | Big-endian |
Msb 96 - 127 | entropy | 32 bits | Unsigned integer | Big-endian |
Note that this specification does not specify a canonical bit layout of SCRU128ID. An implementation may employ any binary form of a 128-bit unsigned integerto represent a SCRU128 ID.
A SCRU128 ID is encoded in a string using theBase36 encoding. The Base36denotes a SCRU128 ID as a 128-bit unsigned integer in the radix of 36 using thedigits of0-9a-z
(0123456789abcdefghijklmnopqrstuvwxyz
), with leading zerosadded to form a 25-digit canonical representation. The following pseudo equationillustrates the encoding algorithm:
1993501768880490086615869617690763354 = 0 * 36^24 + 3 * 36^23 + 7 * 36^22 + ... + 27 * 36^2 + 29 * 36^1 + 22 = '0' * 36^24 + '3' * 36^23 + '7' * 36^22 + ... + 'r' * 36^2 + 't' * 36^1 + 'm' = "0372ijojuxuhjsfkeryi2mrtm"
Although a 25-digit Base36 numeral can encode more than 128-bit information, anynumeral greater thanf5lxx1zz5pnorynqglhzmsp33
(2^128 - 1
, the largest128-bit unsigned integer) is not a valid SCRU128 ID.
For the sake of uniformity, an encoder should use lowercase letters in encodingIDs. A decoder, on the other hand, must always ignore cases when interpreting orlexicographically sorting encoded IDs.
The Base36 encoding shown above is available by default in several languages(e.g.,BigInteger#toString(int radix)
andBigInteger(String val, int radix)
constructor in Java). Another easy way to implement it is by using 128-bit orarbitrary-precision integer division and modulo operations. The following C codeillustrates a naive algorithm based on normal arrays and integers:
constuint8_tid[16]= {1,127,239,57,194,100,27,165,106,148,131,24,136,65,224,90};// convert byte array into digit value arrayuint8_tdigit_values[25]= {0};for (inti=0;i<16;i++) {unsignedintcarry=id[i];for (intj=24;j >=0;j--) {carry+=digit_values[j]*256;digit_values[j]=carry %36;carry=carry /36; }}// convert digit value array into stringstaticconstchardigits[]="0123456789abcdefghijklmnopqrstuvwxyz";chartext[26];for (inti=0;i<25;i++) {text[i]=digits[digit_values[i]];}text[25]='\0';puts(text);// 0372ijojuxuhjsfkeryi2mrtm
Seethe attached reference code for a comprehensive example and test vectors.
The IDs withtimestamp
set at zero or2^48 - 1
are reserved for specialpurposes (e.g., use as dummy, error, or example values) and must not be used orassigned as an identifier of anything.
A generator should employ a cryptographically strong random or pseudorandomnumber generator to generate unpredictable IDs.
Counter overflow occurs at an extremely low probability when the randomlyinitializedcounter_hi
andcounter_lo
do not provide sufficient space forthe IDs generated within a millisecond. The recommended approach to handlecounter overflow is to incrementtimestamp
and continue in the following way:
- Increment
timestamp
by one. - Reset
counter_hi
to zero. - Reset
counter_lo
to a random number.
This approach is recommended over other options such as the "sleep till nexttick" approach because this technique allows the generation of monotonicallyordered IDs in a non-blocking manner. Raising an error on a counter overflow isgenerally not recommended because a counter overflow is not a fault of users ofSCRU128.
This approach results in a greatertimestamp
value than the real-time clock.Such a gap betweentimestamp
and the wall clock should be handled as a smallclock rollback discussed below.
A SCRU128 generator relies on a real-time clock to ensure the monotonic order ofgenerated IDs; therefore, it cannot guarantee monotonicity when the clock movesback. When a generator detects a clock rollback by comparing the up-to-datetimestamp from the system clock and the one embedded in the last generated ID,the recommended treatment is:
- If the rollback is small enough (e.g., a few seconds), treat the
timestamp
of the last generated ID as the up-to-date one, betting that the wall clockwill catch up soon. - Otherwise, reset
timestamp
to the wall clock andcounter_hi
andcounter_lo
to random numbers if the monotonic order of IDs is notcritically important, or raise an error if it is.
This approach keeps the monotonic order of IDs when a clock rollback is small,while it otherwise resets the generator and proceeds as if another new generatorwere created to minimize the chance of collision.
A generator may fillcounter_hi
andcounter_lo
with random numbers if itgenerates IDs infrequently. Such a stateless implementation is acceptable,though not recommended, because the outcome is not distinguishable fromcompliant IDs.
SCRU128 utilizes timestamps and counters to ensure the uniqueness of IDsgenerated by a single generator, whereas it relies on 80-bit entropy in the usecases with distributed generators. SCRU128 fills the 80-bit field with a randomnumber when a new ID is infrequently (less than one ID per second) generated.For the distributed high-load use cases, SCRU128 assigns different lifetimes tothe three entropy components to improve the collision resistance:
- 24-bit
counter_hi
: reset to a random number every second - 24-bit
counter_lo
: reset to a random number every millisecond - 32-bit
entropy
: reset to a random number for every new ID generated
The longer lifetimes ofcounter_hi
andcounter_lo
reduce the number ofrandom numbers consumed and accordingly reduce the probability of at least onecollision because, for a given length of random bits, the less the number ofdice throws, the lower the chance of collision.
In other words, generators are assigned to 24-bitcounter_hi
buckets everysecond, and thus they will not collide with each other as long as their bucketsdiffer, even if each generates a bunch of IDs. 24-bit random numbers usuallycollide if millions of instances are generated, but the one-second interval ofcounter_hi
renewals decreases the number of trials drastically. Nevertheless,counter_hi
is refreshed every second to prevent potential attackers fromexploiting this field as a generator's fingerprint.
Even within the same bucket, the generators will not collide as long as initialcounter_lo
values are sufficiently distant from each other. Such a near matchprobability, if tried only once a millisecond, is much lower thanthe simplebirthday collision probability calculated over all the IDs generated within amillisecond.entropy
provides additional protection in the extremely rarecases where bothcounter_hi
andcounter_lo
collide, but it is primarilyintended to ensure a certain level of unguessability of consecutive IDsgenerated by a single generator.
This work is licensed under aCreative Commons Attribution 4.0 International(CC BY 4.0) License.
About
SCRU128 Specification