Hash collision

From Wikipedia, the free encyclopedia

Hash function phenomenon

John Smith and Sandra Dee share the same hash value of 02, causing a hash collision.

In computer science, a hash collision or hash clash^[1] occurs when two distinct pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function, which takes a data input and returns a fixed-length string of bits.^[2]

Although hash algorithms, especially cryptographic hash algorithms, have been created with the intent of being collision resistant, they can still sometimes map different data to the same hash (by virtue of the pigeonhole principle). Malicious users can take advantage of this to mimic, access, or alter data.^[3]

Due to the possible negative applications of hash collisions in data management and computer security (in particular, cryptographic hash functions), collision avoidance has become an important topic in computer security.

Background

Hash collisions can be unavoidable, depending on the number of objects in a set and on whether the bit string they are mapped to is long enough. For a set of $n$ objects, if $n$ is greater than $|R|$, where $R$ is the set of possible hash values, a hash collision is guaranteed to occur.^[4]
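The guarantee can be checked on a deliberately tiny hash space. The following is a minimal sketch, not a realistic hash function: `tiny_hash`, `first_collision`, and the choice of 16 hash values are assumptions made for illustration.

```python
def tiny_hash(data: bytes, buckets: int = 16) -> int:
    """Toy hash (illustrative assumption): sum of the bytes modulo `buckets`."""
    return sum(data) % buckets


def first_collision(inputs, buckets: int = 16):
    """Return (first_input, second_input, shared_hash) for the first pair of
    distinct inputs that share a hash value, or None if there is no such pair."""
    seen = {}  # hash value -> first input that produced it
    for item in inputs:
        h = tiny_hash(item, buckets)
        if h in seen and seen[h] != item:
            return seen[h], item, h
        seen.setdefault(h, item)
    return None


# 17 distinct inputs but only 16 possible hash values (n > |R|),
# so at least two inputs must share a hash value.
inputs = [f"key-{i}".encode() for i in range(17)]
result = first_collision(inputs)
```

Because there are more inputs than hash values, `result` cannot be `None` here, whichever 17 distinct inputs are chosen.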

Another reason hash collisions are likely at some point stems from the birthday paradox in mathematics. This problem looks at the probability that at least two people in a group of $n$ randomly chosen people share a birthday.^[5] This idea has led to what has been called the birthday attack. The premise of this attack is that it is difficult to find a person whose birthday matches one specific birthday, such as your own, but the probability of finding any two people with matching birthdays is much higher. Bad actors can use this approach to make it simpler to find any pair of inputs whose hash values collide, rather than searching for an input that matches a specific hash value.^[6]
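The size of the birthday effect can be computed directly. The sketch below assumes that values are drawn uniformly at random from `space` possibilities, which is an idealization of both real birthdays and real hash functions. The function name `collision_probability` is hypothetical.

```python
def collision_probability(n: int, space: int) -> float:
    """Probability that at least two of n uniformly random values,
    drawn from `space` equally likely possibilities, coincide."""
    if n > space:
        return 1.0  # pigeonhole principle: a collision is certain
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th value must avoid the i values already drawn.
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct


birthday = collision_probability(23, 365)          # about 0.507
hash_32_bit = collision_probability(77_000, 2**32)  # close to 0.5
```

With 365 possible birthdays, 23 people are enough for a better-than-even chance of a match. Under the same assumption, roughly 77,000 random inputs give close to a 50% chance of a collision in a 32-bit hash space, far fewer than the 2^32 possible values.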

The impact of collisions depends on the application. When hash functions and fingerprints are used to identify similar data, such as homologous DNA sequences or similar audio files, the functions are designed so as to maximize the probability of collision between distinct but similar data, using techniques like locality-sensitive hashing.^[7] Checksums, on the other hand, are designed to minimize the probability of collisions between similar inputs, without regard for collisions between very different inputs.^[8] Instances where bad actors attempt to create or find hash collisions are known as collision attacks.^[9]

In practice, security-related applications use cryptographic hash algorithms, which are designed to be long enough for random matches to be unlikely, fast enough that they can be used anywhere, and safe enough that it would be extremely hard to find collisions.^[8]
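As a small illustration of the fixed output length, the sketch below uses Python's standard `hashlib` module. The helper name `digest_bits` is an assumption made for this example.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes) -> int:
    """Length in bits of the digest that `algorithm` produces for `data`."""
    return len(hashlib.new(algorithm, data).digest()) * 8


# The digest length depends on the algorithm, not on the input length.
short_input = digest_bits("sha256", b"John Smith")
long_input = digest_bits("sha256", b"x" * 10_000)
```

Both calls return 256. The longer the digest, the larger the set of possible hash values, and the less likely a random match becomes.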

Collision resolution

Main article: Hash table § Collision resolution

Because hash collisions are inevitable, hash tables have mechanisms for dealing with them, known as collision resolution. Two of the most common strategies are open addressing and separate chaining. Cache-conscious collision resolution is another strategy that has been discussed for string hash tables.

Open addressing

Main article: Open addressing

Cells in the hash table are assigned one of three states in this method: occupied, empty, or deleted. If a hash collision occurs, the table is probed to find an alternate cell that is marked empty, and the record is placed there. Several probing schemes exist, including linear probing, double hashing, and quadratic probing.^[10] Open addressing is also known as closed hashing.^[11]
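The three cell states and the probing step can be sketched with linear probing, the simplest of the schemes listed. This is an illustrative sketch only: the class name `LinearProbingTable` and the sentinel objects are assumptions, the table has a fixed capacity and never resizes, and Python's built-in `hash` stands in for the hash function.

```python
_EMPTY = object()    # cell has never held a record
_DELETED = object()  # cell held a record that was later removed


class LinearProbingTable:
    def __init__(self, capacity: int = 8):
        self._cells = [_EMPTY] * capacity

    def _probe(self, key):
        """Yield cell indices starting at the key's home cell, wrapping around."""
        start = hash(key) % len(self._cells)
        for step in range(len(self._cells)):
            yield (start + step) % len(self._cells)

    def put(self, key, value):
        first_free = None
        for i in self._probe(key):
            cell = self._cells[i]
            if cell is _EMPTY:
                if first_free is None:
                    first_free = i
                break  # key cannot appear beyond an empty cell
            if cell is _DELETED:
                if first_free is None:
                    first_free = i  # remember the slot, keep looking for key
            elif cell[0] == key:
                self._cells[i] = (key, value)  # overwrite existing record
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self._cells[first_free] = (key, value)

    def get(self, key):
        for i in self._probe(key):
            cell = self._cells[i]
            if cell is _EMPTY:
                break
            if cell is not _DELETED and cell[0] == key:
                return cell[1]
        raise KeyError(key)

    def delete(self, key):
        for i in self._probe(key):
            cell = self._cells[i]
            if cell is _EMPTY:
                break
            if cell is not _DELETED and cell[0] == key:
                self._cells[i] = _DELETED  # keep later records reachable
                return
        raise KeyError(key)


table = LinearProbingTable(capacity=8)
for k, v in [(1, "a"), (9, "b"), (17, "c")]:  # all three map to cell 1
    table.put(k, v)
table.delete(9)
```

The deleted state matters because lookups stop at the first empty cell. After `delete(9)`, the record for key 17 is still found because the probe passes over the deleted cell instead of stopping there.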

Separate chaining

Further information: Hash table § Separate chaining

This strategy allows more than one record to be "chained" to a cell of the hash table. If two records are directed to the same cell, both go into that cell as a linked list. This handles the collision efficiently, since records with the same hash value can share a cell, but it has disadvantages. Keeping track of many lists is difficult and can make whatever tool is being used very slow.^[10] Separate chaining is also known as open hashing.^[12]
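A minimal sketch of the idea follows. For brevity it uses a Python list for each chain rather than a true linked list, and the class name `ChainedTable` and its methods are assumptions made for illustration.

```python
class ChainedTable:
    def __init__(self, buckets: int = 8):
        # One chain per cell; every chain starts out empty.
        self._buckets = [[] for _ in range(buckets)]

    def _chain(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite existing record
                return
        chain.append((key, value))  # colliding records share the cell

    def get(self, key):
        for existing_key, value in self._chain(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def longest_chain(self) -> int:
        return max(len(chain) for chain in self._buckets)


table = ChainedTable(buckets=8)
for k, v in [(1, "a"), (9, "b"), (17, "c")]:  # all three map to cell 1
    table.put(k, v)
```

Every lookup walks the chain of its cell, so the cost grows with `longest_chain()`. This is the slowdown described above when many records collide.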

Cache-conscious collision resolution

Although it is much less used than the previous two, Askitis & Zobel proposed the cache-conscious collision resolution method in 2005.^[13] The idea is similar to separate chaining, although it does not technically involve chained lists. Instead, the items that collide are stored in a contiguous list. This is better suited to string hash tables, and its usefulness for numeric values is still unknown.^[10]
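The contrast with chained lists can be sketched as follows. This is only an illustration of storing colliding strings contiguously, not a reproduction of the published method: the class name `ContiguousBucketSet`, the 2-byte length prefix, and the use of `zlib.crc32` as the hash function are all assumptions.

```python
import zlib


class ContiguousBucketSet:
    """Set of byte strings. Each slot keeps its colliding strings back to
    back in one bytearray, each preceded by a 2-byte big-endian length."""

    def __init__(self, slots: int = 8):
        self._slots = [bytearray() for _ in range(slots)]

    def _slot(self, s: bytes) -> bytearray:
        return self._slots[zlib.crc32(s) % len(self._slots)]

    @staticmethod
    def _scan(buf: bytearray):
        """Yield each stored string by walking the buffer from start to end."""
        pos = 0
        while pos < len(buf):
            length = int.from_bytes(buf[pos:pos + 2], "big")
            pos += 2
            yield bytes(buf[pos:pos + length])
            pos += length

    def __contains__(self, s: bytes) -> bool:
        return any(item == s for item in self._scan(self._slot(s)))

    def add(self, s: bytes) -> None:
        if len(s) > 0xFFFF:
            raise ValueError("string too long for a 2-byte length prefix")
        if s not in self:
            # Append in place; no per-item node or pointer is allocated.
            self._slot(s).extend(len(s).to_bytes(2, "big") + s)


words = ContiguousBucketSet(slots=1)  # one slot forces every string to collide
for w in [b"cat", b"dog", b"cat"]:
    words.add(w)
```

A lookup scans one contiguous buffer instead of following a pointer for each colliding item. That sequential access pattern is what makes the approach friendlier to processor caches.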

References

[1] Thomas, Cormen (2009), Introduction to Algorithms, MIT Press, p. 253, ISBN 978-0-262-03384-8
[2] Stapko, Timothy (2008), "Embedded Security", Practical Embedded Security, Elsevier, pp. 83–114, doi:10.1016/b978-075068215-2.50006-9, ISBN 9780750682152, retrieved 2021-12-08
[3] Schneier, Bruce. "Cryptanalysis of MD5 and SHA: Time for a New Standard". Computerworld. Archived from the original on 2016-03-16. Retrieved 2016-04-20. "Much more than encryption algorithms, one-way hash functions are the workhorses of modern cryptography."
[4] Cybersecurity and Applied Mathematics. 2016. doi:10.1016/c2015-0-01807-x. ISBN 9780128044520.
[5] Soltanian, Mohammad Reza Khalifeh (10 November 2015). Theoretical and Experimental Methods for Defending Against DDoS Attacks. ISBN 978-0-12-805399-7. OCLC 1162249290.
[6] Conrad, Eric; Misenar, Seth; Feldman, Joshua (2016), "Domain 3: Security Engineering (Engineering and Management of Security)", CISSP Study Guide, Elsevier, pp. 103–217, doi:10.1016/b978-0-12-802437-9.00004-7, ISBN 9780128024379, retrieved 2021-12-08
[7] Rajaraman, A.; Ullman, J. (2010). "Mining of Massive Datasets, Ch. 3".
[8] Al-Kuwari, Saif; Davenport, James H.; Bradford, Russell J. (2011). Cryptographic Hash Functions: Recent Design Trends and Security Notions. Inscrypt '10.
[9] Schema, Mike (2012). Hacking Web Apps.
[10] Nimbe, Peter; Ofori Frimpong, Samuel; Opoku, Michael (2014-08-20). "An Efficient Strategy for Collision Resolution in Hash Tables". International Journal of Computer Applications. 99 (10): 35–41. Bibcode:2014IJCA...99j..35N. doi:10.5120/17411-7990. ISSN 0975-8887.
[11] Kline, Robert. "Closed Hashing". CSC241 Data Structures and Algorithms. West Chester University. Retrieved 2022-04-06.
[12] "Open hashing or separate chaining". Log₂2.
[13] Askitis, Nikolas; Zobel, Justin (2005). Consens, M.; Navarro, G. (eds.). Cache-Conscious Collision Resolution in String Hash Tables. International Symposium on String Processing and Information Retrieval. String Processing and Information Retrieval SPIRE 2005. Lecture Notes in Computer Science. Vol. 3772. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 91–102. doi:10.1007/11575832_11. ISBN 978-3-540-29740-6.
