Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

SCRU128 Specification

License

NotificationsYou must be signed in to change notification settings

scru128/spec

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 

Repository files navigation

SCRU128 ID is yet another attempt to supersedeUUID for the users who needdecentralized, globally unique time-ordered identifiers. SCRU128 is inspired byULID andKSUID and has the following features:

  • 128-bit unsigned integer type
  • Sortable by generation time (as integer and as text)
  • 25-digit case-insensitive textual representation (Base36)
  • 48-bit millisecond Unix timestamp that ensures useful life until year 10889
  • Up to 281 trillion time-ordered but unpredictable unique IDs per millisecond
  • 80-bitthree-layer randomness forglobal uniqueness

Examples in the 25-digit canonical textual representation:

0372hg16csmsm50l8dikcvukc0372hg16csmsm50l8djl6xi250372hg16csmsm50l8dmgepzz10372hg16csmsm50l8doir38270372hg16cy3nowracls909wcd0372hg16cy3nowraclvp355ce0372hg16cy3nowraclxf2ctzh0372hg16cy3nowraclyunyjke

SCRU128's size (128 bits) might not fit in some use cases because of storageefficiency. If you need compact, time-ordered unique identifiers generated bydistributed nodes, considerSCRU64. See the following comparison table for theproperties of the two schemes.

SCRU64SCRU128
Long nameSortable, Clock-based, Realm-specifically Unique identifierSortable, Clock and Random number-based Unique identifier
Binary size63 bits128 bits
Textual size12 digits25 digits
Textual encodingBase36 with0-9a-z (case-insensitive)Base36 with0-9a-z (case-insensitive)
Timestamp rangeJanuary 1, 1970 - February 27, 4261 (UTC)January 1, 1970 - August 2, 10889 (UTC)
Timestamp resolution256 milliseconds1 millisecond
Number of IDs per timestampUp to 8.4 million per 256 milliseconds per node (configurable)140.7 trillion per millisecond per node (on average)
Number of distributed nodesUp to 8.4 million (configurable)No specific limitation
Source of uniquenessCentrally pre-assigned node IDIndependently generated random numbers
Choose it when you ...Prefer 64-bit integer for storage, indexing, and other reasonsWant unique IDs without hassle to coordinate generators

Implementations

If you are interested in implementing SCRU128, see alsoSCRU128 GeneratorTester.

Specification v2.1.1

A SCRU128 ID is a 128-bit unsigned integer consisting of four terms:

timestamp*2^80+counter_hi*2^56+counter_lo*2^32+entropy

Where:

  • timestamp is a 48-bit Unix timestamp in milliseconds (i.e., millisecondselapsed since 1970-01-01 00:00:00+00:00, ignoring leap seconds).
  • counter_hi is a 24-bit randomly initialized counter that is incremented byone whencounter_lo reaches its maximum value.counter_hi is reset to arandom number whentimestamp has moved forward by one second or more sincethe last renewal ofcounter_hi.
    • Note:counter_hi effectively works like an entropy component (rather thana counter) that is refreshed only once per second.
  • counter_lo is a 24-bit randomly initialized counter that is incremented byone for each new ID generated within the sametimestamp.counter_lo isreset to a random number whenevertimestamp moves forward. Whencounter_loreaches its maximum value,counter_hi is incremented andcounter_lo isreset to zero.
  • entropy is a 32-bit random number renewed for each new ID generated.

This definition is equivalent to allocating four unsigned integer fields to a128-bit space according to the following layout:

Bit numbersField nameSizeData typeByte order
Msb 0 - 47timestamp48 bitsUnsigned integerBig-endian
Msb 48 - 71counter_hi24 bitsUnsigned integerBig-endian
Msb 72 - 95counter_lo24 bitsUnsigned integerBig-endian
Msb 96 - 127entropy32 bitsUnsigned integerBig-endian

Note that this specification does not specify a canonical bit layout of SCRU128ID. An implementation may employ any binary form of a 128-bit unsigned integerto represent a SCRU128 ID.

Textual representation

A SCRU128 ID is encoded in a string using theBase36 encoding. The Base36denotes a SCRU128 ID as a 128-bit unsigned integer in the radix of 36 using thedigits of0-9a-z (0123456789abcdefghijklmnopqrstuvwxyz), with leading zerosadded to form a 25-digit canonical representation. The following pseudo equationillustrates the encoding algorithm:

1993501768880490086615869617690763354    =  0  * 36^24 +  3  * 36^23 +  7  * 36^22 + ... + 27  * 36^2 + 29  * 36^1 + 22    = '0' * 36^24 + '3' * 36^23 + '7' * 36^22 + ... + 'r' * 36^2 + 't' * 36^1 + 'm'    = "0372ijojuxuhjsfkeryi2mrtm"

Although a 25-digit Base36 numeral can encode more than 128-bit information, anynumeral greater thanf5lxx1zz5pnorynqglhzmsp33 (2^128 - 1, the largest128-bit unsigned integer) is not a valid SCRU128 ID.

For the sake of uniformity, an encoder should use lowercase letters in encodingIDs. A decoder, on the other hand, must always ignore cases when interpreting orlexicographically sorting encoded IDs.

The Base36 encoding shown above is available by default in several languages(e.g.,BigInteger#toString(int radix) andBigInteger(String val, int radix)constructor in Java). Another easy way to implement it is by using 128-bit orarbitrary-precision integer division and modulo operations. The following C codeillustrates a naive algorithm based on normal arrays and integers:

constuint8_tid[16]= {1,127,239,57,194,100,27,165,106,148,131,24,136,65,224,90};// convert byte array into digit value arrayuint8_tdigit_values[25]= {0};for (inti=0;i<16;i++) {unsignedintcarry=id[i];for (intj=24;j >=0;j--) {carry+=digit_values[j]*256;digit_values[j]=carry %36;carry=carry /36;  }}// convert digit value array into stringstaticconstchardigits[]="0123456789abcdefghijklmnopqrstuvwxyz";chartext[26];for (inti=0;i<25;i++) {text[i]=digits[digit_values[i]];}text[25]='\0';puts(text);// 0372ijojuxuhjsfkeryi2mrtm

Seethe attached reference code for a comprehensive example and test vectors.

Special-purpose IDs

The IDs withtimestamp set at zero or2^48 - 1 are reserved for specialpurposes (e.g., use as dummy, error, or example values) and must not be used orassigned as an identifier of anything.

Considerations

Quality of random numbers

A generator should employ a cryptographically strong random or pseudorandomnumber generator to generate unpredictable IDs.

Counter overflow handling

Counter overflow occurs at an extremely low probability when the randomlyinitializedcounter_hi andcounter_lo do not provide sufficient space forthe IDs generated within a millisecond. The recommended approach to handlecounter overflow is to incrementtimestamp and continue in the following way:

  1. Incrementtimestamp by one.
  2. Resetcounter_hi to zero.
  3. Resetcounter_lo to a random number.

This approach is recommended over other options such as the "sleep till nexttick" approach because this technique allows the generation of monotonicallyordered IDs in a non-blocking manner. Raising an error on a counter overflow isgenerally not recommended because a counter overflow is not a fault of users ofSCRU128.

This approach results in a greatertimestamp value than the real-time clock.Such a gap betweentimestamp and the wall clock should be handled as a smallclock rollback discussed below.

Clock rollback handling

A SCRU128 generator relies on a real-time clock to ensure the monotonic order ofgenerated IDs; therefore, it cannot guarantee monotonicity when the clock movesback. When a generator detects a clock rollback by comparing the up-to-datetimestamp from the system clock and the one embedded in the last generated ID,the recommended treatment is:

  1. If the rollback is small enough (e.g., a few seconds), treat thetimestampof the last generated ID as the up-to-date one, betting that the wall clockwill catch up soon.
  2. Otherwise, resettimestamp to the wall clock andcounter_hi andcounter_lo to random numbers if the monotonic order of IDs is notcritically important, or raise an error if it is.

This approach keeps the monotonic order of IDs when a clock rollback is small,while it otherwise resets the generator and proceeds as if another new generatorwere created to minimize the chance of collision.

Stateless variant

A generator may fillcounter_hi andcounter_lo with random numbers if itgenerates IDs infrequently. Such a stateless implementation is acceptable,though not recommended, because the outcome is not distinguishable fromcompliant IDs.

Design notes: three-layer randomness

SCRU128 utilizes timestamps and counters to ensure the uniqueness of IDsgenerated by a single generator, whereas it relies on 80-bit entropy in the usecases with distributed generators. SCRU128 fills the 80-bit field with a randomnumber when a new ID is infrequently (less than one ID per second) generated.For the distributed high-load use cases, SCRU128 assigns different lifetimes tothe three entropy components to improve the collision resistance:

  1. 24-bitcounter_hi: reset to a random number every second
  2. 24-bitcounter_lo: reset to a random number every millisecond
  3. 32-bitentropy: reset to a random number for every new ID generated

The longer lifetimes ofcounter_hi andcounter_lo reduce the number ofrandom numbers consumed and accordingly reduce the probability of at least onecollision because, for a given length of random bits, the less the number ofdice throws, the lower the chance of collision.

In other words, generators are assigned to 24-bitcounter_hi buckets everysecond, and thus they will not collide with each other as long as their bucketsdiffer, even if each generates a bunch of IDs. 24-bit random numbers usuallycollide if millions of instances are generated, but the one-second interval ofcounter_hi renewals decreases the number of trials drastically. Nevertheless,counter_hi is refreshed every second to prevent potential attackers fromexploiting this field as a generator's fingerprint.

Even within the same bucket, the generators will not collide as long as initialcounter_lo values are sufficiently distant from each other. Such a near matchprobability, if tried only once a millisecond, is much lower thanthe simplebirthday collision probability calculated over all the IDs generated within amillisecond.entropy provides additional protection in the extremely rarecases where bothcounter_hi andcounter_lo collide, but it is primarilyintended to ensure a certain level of unguessability of consecutive IDsgenerated by a single generator.

License

This work is licensed under aCreative Commons Attribution 4.0 International(CC BY 4.0) License.


[8]ページ先頭

©2009-2025 Movatter.jp