Why is a Hash Function Irreversible?

#hash #rust

Introduction

Hash functions are commonly used for verifying data integrity and cryptographic purposes. In this article, I will implement the widely-used SHA-256 hash function and examine why it is irreversible (i.e., why the original value cannot be restored).

The conclusion is that the irreversibility of hash functions is achieved through information loss.

The Rust implementation of SHA-256 for this article is available here:

https://github.com/akira-19/algorithms_rust/tree/main/sha-256

The Flow of SHA-256

The irreversibility of SHA-256 can be understood by analyzing its flow. Let’s hash the string "msg" as an example.

First, convert the string msg into its character codes (hexadecimal representation).
Next, format the message into a 64-byte block. Add 0x80 just after the original message, and append the binary length of the original message to the last 8 bytes of the block. Since msg is 3 bytes, its binary length is 24 bits, represented as 0x18 in hexadecimal. Pad the gap between 0x80 and the message length with zeros.
Split the 64-byte block into 4-byte segments. For example, the first segment 6d 73 67 80 becomes 6d736780 (in decimal: 1836279680).

Calculate the value W, which includes irreversible processing. A part of sample Rust code snippet is as follows.

 fn calc_w(msg: [u32; 16]) -> [u32; 64] {
   let mut w = vec![0; 64];

   for i in 0..16 {
       w[i] = msg[i];
   }

   for i in 16..64 {
       w[i] = lower_sigma_1(w[i - 2])
           .wrapping_add(w[i - 7])
           .wrapping_add(lower_sigma_0(w[i - 15]))
           .wrapping_add(w[i - 16]);
   }

   w
 }

 fn lower_sigma_1(x: u32) -> u32 {
   rotr(x, 17) ^ rotr(x, 19) ^ shr(x, 10)
 }

 fn rotr(x: u32, n: u32) -> u32 {
   (x >> n) | (x << (32 - n))
 }

 fn shr(x: u32, n: u32) -> u32 {
   x >> n
 }

The calc_w function calls the lower_sigma_1 function, which in turn calls rotr. Let’s visualize how this works.

As an example, let x = 31 and n = 3. In binary, 31 is 11111. Since x and n are unsigned 32-bit integers, x is represented as 000...000011111 (27 leading zeros).
The operation x >> n shifts x to the right by n bits, adding 000 to the left and removing the last 111.
Similarly, x << (32 - n) shifts x to the left by 32 - 3 = 29 bits. The first three bits become 111, followed by zeros.
Finally, the bitwise OR of steps 6 and 7 results in a value like 111000...0011.

The information loss occurs in the shr operation. In step 6, the bits shifted out to the right cannot be recovered.

Additionally, in lower_sigma_1, the operation rotr(x, 17) ^ rotr(x, 19) results in information loss. To simplify, this process involves XORing 10000 shifted right by 1 bit with 10000 shifted right by 2 bits, yielding 01000 ^ 00100 = 00000. This result could also originate from an original value of 00000, making the original value unrecoverable.

By repeating such operations, it becomes impossible to restore the original value.