11 år 11 år

Hash functions

DefinitionFixed length output from function given arbritary length input. Can be keyed or un-keyed. Output often called "digest". Function is one way since the output is fixed (many inputs can go to the same output)

Properties

  1. One-wayness (preimage resistance): Given output (and key) it is difficult (practical/impossible) to find the input.
  2. Second preimage (weak collision resistance): Given a specific input (and key) it is difficult to find another input that has the same output.
  3. (Strong) collision resistance: Difficult to find any two inputs that have the same output (easier to find any two inputs than to find another input then one input is already given). When this is satisfied, the 2. is also satisfied. It is possible to make a function such that one is valid without number 2, but it would be artificial.

Additional requirements "Certificational weakness" are important when used for cryptographic purposes.

  • Avalanche property: Changing any one bit in input leads to about half of output bits to change
  • Local one-wayness: No input bits can be guessed based on output bits. Basically number 1 of the previous properties, but applied for individual bits

We also want the algorithm to be fast, especially when used to verify integrity of huge amounts of information.

Constructing a hash function

One way to generate these one-way functions is to use a difficult problem like in asymmetric ciphers (factorization, discrete logarithms, lattice) but they tend to be very slow. The Merkle-Damgård construction is a classical design (MD4/MD5) using a compression function that takes a fixed input and returns a shorter output. It can typically be a symmetric block cipher repeated as many times as unnecessary to run through the whole message always using the last output as the key for the next round. These block ciphers typically have fewer rounds than normal block ciphers to speed it up.


Unfortunately using a strong block cipher does not guarantee a good hash function. MD4 and MD5 are 128 bit and has been broken, remember the strong collision resistance. MD5 is 4 rounds and MD4 is 3 rounds but otherwise equal. SHA-1 (Secure Hash Algorithm) is 160 bit and is not considered secure since collisions has been found. SHA-2 is 256 and can be 512 bit. A contest to design SHA-3 is in progress.

Applications of hash functions

  • Data integrity: comparing your calculation of the hash based on the file/information you got with the hash generated at the source. Un-keyed hash functions are used and speed is important.
  • Authentication: Message authentication Codes (MAC): Keyed hash functions are used to compute the hash. Both ends needs to know the key to verify the hash, and only the trusted parties can generate an accepted hash.
  • Digital signatures: Almost the same as the last one, but based on asymmetric keys. Anyone knowing the public key can verify the signature, but only the one possessing the private key can sign (signing with the deciphering transformation). One problem with signing the original message is that the signature will be as long as the message (encipherment and decipherment is a abjection). This is time and storage consuming. Another problem is that a forger could make up a signature, compute encipher it with the targets public key and hope for a meaningful message. Very unlikely, but with enough computational power it could be done. The solution is to hash the message and then sign the hash (un-keyed hash). Second preimage resistant hash algorithms make forgery very difficult. The original message plus the hash could then be encrypted for even more security.

How to sign: Alice signs a message verifiable by all possessing Alice's public key


Compare with encipherment: Anyone sending Alice a secret message only Alice can read