PEP 247 – API for Cryptographic Hash Functions

Author: A.M. Kuchling <amk at amk.ca>
Status:

Abstract

There are several different modules available that implement cryptographic hashing algorithms such as MD5 or SHA. This document specifies a standard API for such algorithms, to make it easier to switch between different implementations.

Specification

All hashing modules should present the same interface. Additional methods or variables can be added, but those described in this document should always be present.

Hash function modules define one function:

new([string]) (unkeyed hashes)

new([key], [string]) (keyed hashes)

Create a new hashing object and return it. The first form is for hashes that are unkeyed, such as MD5 or SHA. For keyed hashes such as HMAC, key is a required parameter containing a string giving the key to use. In both cases, the optional string parameter, if supplied, will be immediately hashed into the object’s starting state, as if obj.update(string) was called.
After creating a hashing object, arbitrary strings can be fed into the object using its update() method, and the hash value can be obtained at any time by calling the object’s digest() method.
Arbitrary additional keyword arguments can be added to this function, but if they’re not supplied, sensible default values should be used. For example, rounds and digest_size keywords could be added for a hash function which supports a variable number of rounds and several different output sizes, and they should default to values believed to be secure.
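The constructor semantics described above can be sketched in Python 3 as follows. This is only an illustration built on the standard library's hashlib, not a reference implementation: the function name new mirrors this specification, the choice of SHA-256 is an assumption, and Python 3 takes bytes where this document speaks of strings.

```python
import hashlib

def new(string=None):
    """Hypothetical module-level constructor for an unkeyed hash (SHA-256 here)."""
    obj = hashlib.sha256()
    if string is not None:
        # Same effect as calling obj.update(string) right after creation.
        obj.update(string)
    return obj

# Supplying the initial data to new() ...
a = new(b"abc")

# ... is equivalent to creating an empty object and updating it.
b = new()
b.update(b"abc")

same = a.digest() == b.digest()
print(same)  # True
```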

Hash function modules define one variable:

digest_size

An integer value; the size of the digest produced by the hashing objects created by this module, measured in bytes. You could also obtain this value by creating a sample object and accessing its digest_size attribute, but it can be convenient to have this value available from the module. Hashes with a variable output size will set this variable to None.

Hashing objects require a single attribute:

digest_size

This attribute is identical to the module-level digest_size variable: the size, in bytes, of the digest produced by the hashing object. If the hash has a variable output size, this output size must be chosen when the hashing object is created, and this attribute must contain the selected size. Therefore, None is not a legal value for this attribute.
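For a variable-size hash, the split between the module-level value and the per-object attribute might look like the sketch below. It assumes Python 3 and uses hashlib.blake2b as a stand-in for a variable-output algorithm; the module layout, the new function, and the default of 32 bytes are hypothetical choices made for illustration.

```python
import hashlib

# Module-level value: None, because the output size is not fixed by the module.
digest_size = None

def new(string=None, digest_size=32):
    """Hypothetical constructor: the output size is chosen at creation time."""
    obj = hashlib.blake2b(digest_size=digest_size)
    if string is not None:
        obj.update(string)
    return obj

h = new(b"abc", digest_size=20)
print(h.digest_size)    # 20: the object always reports a concrete size
print(len(h.digest()))  # 20
```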

Hashing objects require the following methods:

copy()

Return a separate copy of this hashing object. An update to this copy won’t affect the original object.
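One common use of copy() is hashing several inputs that share a prefix without rehashing the prefix each time. The following Python 3 sketch uses hashlib.sha256, whose objects provide a copy() method with these semantics; the byte strings are made-up sample data.

```python
import hashlib

# Hash the shared prefix once.
prefix = hashlib.sha256(b"common header|")

# Each copy continues independently from the prefix state.
first = prefix.copy()
first.update(b"record one")

second = prefix.copy()
second.update(b"record two")

# The original is unaffected by updates made to its copies.
unchanged = prefix.digest() == hashlib.sha256(b"common header|").digest()
print(unchanged)  # True
```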

digest()

Return the hash value of this hashing object as a string containing 8-bit data. The object is not altered in any way by this function; you can continue updating the object after calling this function.

hexdigest()

Return the hash value of this hashing object as a string containing hexadecimal digits. Lowercase letters should be used for the digits a through f. Like the .digest() method, this method mustn’t alter the object.
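The relationship between digest() and hexdigest(), and the requirement that neither alters the object, can be checked with a short Python 3 sketch. It assumes hashlib.md5 is available in the running interpreter; in Python 3 digest() returns bytes rather than a string.

```python
import hashlib

m = hashlib.md5(b"abc")

raw = m.digest()       # 16 bytes of binary data
hexed = m.hexdigest()  # 32 lowercase hexadecimal digits

print(hexed)               # 900150983cd24fb0d6963f7d28e17f72
print(raw.hex() == hexed)  # True

# Neither call altered the object, so hashing can continue.
m.update(b"def")
continued = m.hexdigest() == hashlib.md5(b"abcdef").hexdigest()
print(continued)           # True
```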

update(string)

Hash string into the current state of the hashing object. update() can be called any number of times during a hashing object’s lifetime.
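Because update() accumulates state, feeding data in pieces gives the same result as hashing the concatenation in one call. A minimal Python 3 sketch with hashlib.sha256 and made-up chunks:

```python
import hashlib

chunks = [b"The quick ", b"brown fox ", b"jumps"]

# Feed the data piece by piece.
incremental = hashlib.sha256()
for chunk in chunks:
    incremental.update(chunk)

# Hash the same data in a single call.
one_shot = hashlib.sha256(b"".join(chunks))

match = incremental.digest() == one_shot.digest()
print(match)  # True
```

This is what makes it practical to hash a large file by reading it in blocks rather than loading it into memory at once.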

Hashing modules can define additional module-level functions or object methods and still be compliant with this specification.

Here’s an example, using a module named MD5:

    >>> from Crypto.Hash import MD5
    >>> m = MD5.new()
    >>> m.digest_size
    16
    >>> m.update('abc')
    >>> m.digest()
    '\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
    >>> m.hexdigest()
    '900150983cd24fb0d6963f7d28e17f72'
    >>> MD5.new('abc').digest()
    '\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'

Rationale

The digest size is measured in bytes, not bits, even though hash algorithm sizes are usually quoted in bits; MD5 is a 128-bit algorithm and not a 16-byte one, for example. This is because, in the sample code I looked at, the length in bytes is often needed (to seek ahead or behind in a file; to compute the length of an output string) while the length in bits is rarely used. Therefore, the burden will fall on the few people actually needing the size in bits, who will have to multiply digest_size by 8.
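As a small illustration of why the byte count is the handier unit, the Python 3 sketch below splits a record into its payload and a trailing digest using digest_size directly, and derives the size in bits by multiplication. The record layout is a made-up example, not something this document prescribes.

```python
import hashlib

h = hashlib.sha256(b"payload")

# The size in bits, for the few callers that need it.
size_in_bits = h.digest_size * 8
print(size_in_bits)  # 256

# A record laid out as payload followed by its digest can be split
# using the byte length without any conversion.
record = b"payload" + h.digest()
body = record[:-h.digest_size]
trailer = record[-h.digest_size:]
print(trailer == hashlib.sha256(body).digest())  # True
```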

It’s been suggested that the update() method would be better named append(). However, that method is really causing the current state of the hashing object to be updated, and update() is already used by the md5 and sha modules included with Python, so it seems simplest to leave the name update() alone.

The order of the constructor’s arguments for keyed hashes was a sticky issue. It wasn’t clear whether the key should come first or second. It’s a required parameter, and the usual convention is to place required parameters first, but that also means that the string parameter moves from the first position to the second. It would be possible to get confused and pass a single argument to a keyed hash, thinking that you’re passing an initial string to an unkeyed hash, but it doesn’t seem worth making the interface for keyed hashes more obscure to avoid this potential error.
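The key-first ordering, and the mistake it makes possible, can be seen with the standard library's hmac module in Python 3. This is a sketch only: the digestmod argument belongs to that module's own signature rather than to this specification, and the key and message values are invented.

```python
import hmac

key = b"secret key"
message = b"abc"

# Key first, then the optional initial data.
keyed = hmac.new(key, message, "sha256")

# Omitting the initial data and calling update() later gives the same result.
stepwise = hmac.new(key, digestmod="sha256")
stepwise.update(message)
print(stepwise.digest() == keyed.digest())  # True

# The potential confusion: a single positional argument is taken as the
# key, not as data to hash, so this computes a different value.
only_key = hmac.new(message, digestmod="sha256")
only_key_matches = only_key.digest() == keyed.digest()
print(only_key_matches)  # False
```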

Changes

2001-09-17: Renamed clear() to reset(); added digest_size attribute to objects; added .hexdigest() method.

2001-09-20: Removed reset() method completely.

2001-09-28: Set digest_size to None for variable-size hashes.

Acknowledgements

Thanks to Aahz, Andrew Archibald, Rich Salz, Itamar Shtull-Trauring, and the readers of the python-crypto list for their comments on this PEP.

Copyright

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-0247.rst

Last modified: 2025-02-01 08:59:27 GMT