DEV Community

Kyle Schneider
Kyle Schneider

Posted on

The (non-secure) MD5 Hash Algorithm

DISCLAIMER: This hash algorithm is not considered secure in 2023 and should never be implemented to protect any data and only be used as an exercise.

Notation & Terminology

  • "Bit" refers to the simple 1 or 0 quantity that computers deal in.
  • "Byte" refers to a collection of 8 bits.
  • "Word" refers to a collection of 32 bits (equivalently, 4 bytes). In later examples, long binary numbers will be written with one word per line.

For Boolean X, the symbol ~X refers to the opposite of X.

Recall the logical binary operations and, or, and xor:

P Q and or xor
0 0 0 0 0
1 0 0 1 1
0 1 0 1 1
1 1 1 1 0

For a binary number X, let X <<< s denote the binary number obtained by circularly shifting/rotating X to the left by s bit positions.
For example:
If X = 111011 and s = 3, then X <<< s = 011111.

Step 1: Append Padding Bits

The message will be extended so that it's length is 64 less than a multiple of 512, i.e., congruent to 448=512-64 modulo 512. There will always be padding applied, even in the case that the message is already congruent to 448 modulo 512.
To perform the padding, add one bit of value "1" to the end of the message followed by the required number of "0" bits such that the new length is congruent to 448 modulo 512. For a given message, at least one bit will be appended and at most 512 bits are appended.

For example, the message "kyle" in binary is

01101011 01111001 01101100 01100101 00001010
Enter fullscreen mode Exit fullscreen mode

which has 40 bits. Therefore, we append a single "1" followed by 407 "0" bits, giving a total of 40 + 1 + 407 = 448 = 512-64 bits:

01101011 01111001 01101100 01100101 
00001010 10000000 00000000 00000000
...
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
// 448 bits
Enter fullscreen mode Exit fullscreen mode

Step 2: Append Length

Append a 64-bit representation of the length of the original message to the result of Step 1. The binary of "kyle" was of length 40, which is 101000 in binary. Therefore we append the 64-bit extension of this number:

01101011 01111001 01101100 01100101 
00001010 10000000 00000000 00000000
...
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00101000
// 512 bits
Enter fullscreen mode Exit fullscreen mode

Now the number of bits in the message is a multiple of 512. Equivalently, it is comprised of a multiple of 16 (32-bit) words.

(In the unlikely case that the message has a length greater than 2^64 bits, simply take just the low-order bits of the length, i.e., lop the front off.)

From here, define M[i] to be the i-th word of the message thus far, for 0 ≤ i ≤ N-1, where N is a multiple of 16 such that 16 * N is the length of the message thus far. Hence M[0] is the first 32-bits, M[1] is the second 32-bits, and so on.

Step 3: Initialize MD Buffer

There are four words, A, B, C, and D, which are used to compute the message digest. These values are initialized to the following hexadecimal values:

  • A: 01 23 45 67
  • B: 89 ab cd ef
  • C: fe dc ba 98
  • D: 76 54 32 10

Step 4: Process Message in 16-Word Blocks

Define the following four functions which take three words as input and returns a single word:

  • F(X,Y,Z) = (X and Y) or (~X and Z)
  • G(X,Y,Z) = (X and Z) or (Y and ~Z)
  • H(X,Y,Z) = X xor Y xor Z
  • I(X,y,Z) = Y xor (X or ~Z)

Note that the functions apply the Boolean logic bit-wise.

Define

T[i]4294967296sin(i) for integer 1i64. T[i] \equiv \lfloor 4294967296 * |\sin(i)| \rfloor \textrm{ for integer } 1\leq i\leq64.

Now for the actual algorithm, directly from R. Rivest's documentation [1]:

Do the following:

/* Process each 16-word block. */
For i = 0 to N/16-1 do

  /* Copy block i into X. */
  For j = 0 to 15 do
    Set X[j] to M[i*16+j].
  end /* of loop on j */

  /* Save A as AA, B as BB, C as CC, and D as DD. */
  AA = A
  BB = B
  CC = C
  DD = D

  /* Round 1. */
  /* Let [abcd k s i] denote the operation
    a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
  /* Do the following 16 operations. */
  [ABCD 0 7 1] [DABC 1 12 2] [CDAB 2 17 3] [BCDA 3 22 4]
  [ABCD 4 7 5] [DABC 5 12 6] [CDAB 6 17 7] [BCDA 7 22 8]
  [ABCD 8 7 9] [DABC 9 12 10] [CDAB 10 17 11] [BCDA 11 22 12]
  [ABCD 12 7 13] [DABC 13 12 14] [CDAB 14 17 15] [BCDA 15 22 16]

  /* Round 2. */
  /* Let [abcd k s i] denote the operation
    a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
  /* Do the following 16 operations. */
  [ABCD 1 5 17] [DABC 6 9 18] [CDAB 11 14 19] [BCDA 0 20 20]
  [ABCD 5 5 21] [DABC 10 9 22] [CDAB 15 14 23] [BCDA 4 20 24]
  [ABCD 9 5 25] [DABC 14 9 26] [CDAB 3 14 27] [BCDA 8 20 28]
  [ABCD 13 5 29] [DABC 2 9 30] [CDAB 7 14 31] [BCDA 12 20 32]

  /* Round 3. */
  /* Let [abcd k s t] denote the operation
    a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
  /* Do the following 16 operations. */
  [ABCD 5 4 33] [DABC 8 11 34] [CDAB 11 16 35] [BCDA 14 23 36]
  [ABCD 1 4 37] [DABC 4 11 38] [CDAB 7 16 39] [BCDA 10 23 40]
  [ABCD 13 4 41] [DABC 0 11 42] [CDAB 3 16 43] [BCDA 6 23 44]
  [ABCD 9 4 45] [DABC 12 11 46] [CDAB 15 16 47] [BCDA 2 23 48]

  /* Round 4. */
  /* Let [abcd k s t] denote the operation
    a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
  /* Do the following 16 operations. */
  [ABCD 0 6 49] [DABC 7 10 50] [CDAB 14 15 51] [BCDA 5 21 52]
  [ABCD 12 6 53] [DABC 3 10 54] [CDAB 10 15 55] [BCDA 1 21 56]
  [ABCD 8 6 57] [DABC 15 10 58] [CDAB 6 15 59] [BCDA 13 21 60]
  [ABCD 4 6 61] [DABC 11 10 62] [CDAB 2 15 63] [BCDA 9 21 64]

  /* Then perform the following additions. (That is increment each
  of the four registers by the value it had before this block
  was started.) */
  A = A + AA
  B = B + BB
  C = C + CC
  D = D + DD

 end /* of loop on i */
Enter fullscreen mode Exit fullscreen mode

The diagram below may assist in understanding the process used in the [ABCD k s i] operation:

Image description

Step 5: Output

After completing the 4 rounds per 512-bit block of the message, the final output will begin with the low-order word A and end with the high-order word D. By concatenation, the final output would read DCBA, a 128-bit value.

Sensitivity to Initial Conditions

One of the important features of a hash algorithm is for two similar inputs to have very different outputs. For example, compare the MD5 hash of "kyle"

kyle: 4b75751e170e00f56886726c3f46eecd
Enter fullscreen mode Exit fullscreen mode

with the MD5 hash of "Kyle" (capital K)

Kyle: e8b579fe36f15209c6f167396a46b04e
Enter fullscreen mode Exit fullscreen mode

References:
[1] R. Rivest The MD5 Message-Digest Algorithm. MIT Laboratory for Computer Science and RSA Data Security, Inc. April 1992.

Top comments (0)