DEV Community

OULD AMARA Amine

Hashing Algorithms and creating a simple file integrity monitor (FIM)

The CIA triad

The CIA triad stands for Confidentiality, Integrity, and Availability. These are the three pillars of every security infrastructure, and they represent the goals that security experts work to ensure in their company. Here’s what each one means in simple terms:

  • Confidentiality is keeping data private and hidden from people who are not supposed to see it. A simple example is the data exchanged between a client and a server in an online store (passwords, credit card information, preferences...).

  • Integrity is maintaining the consistency and trustworthiness of data: making sure it doesn’t change when it’s not supposed to and, if it does, that the user knows about it. This is what we will cover in this tutorial. We will build a simple FIM (File Integrity Monitor) that uses hashing algorithms to keep tabs on changes (writes) made to the data and triggers a warning when such changes happen, so that the user can take the necessary precautions.

  • Availability is ensuring that systems remain online and available for those who need them.

Hashing Algorithms

A hashing algorithm, or cryptographic hash function, is an algorithm that takes an input of arbitrary size and produces a fixed-size output called a hash value, or just “hash”. In the most basic cases, the hash of a password can be stored instead of the password itself and later used to verify the user. A secure hash function has the following properties:

  • Non-reversibility. It is very hard to recover the original input (a password, for example) from its hash.
  • Diffusion. The slightest change to the input produces an entirely different output, which makes it harder to infer anything about the input from the hash.
  • Determinism. A given input must always produce the same hash value.
  • Collision resistance. It should be hard to find two different inputs that hash to the same value.
  • Non-predictability. The hash value should not be predictable from the input without actually computing it.

There are many hashing algorithms. In this post, we will use the SHA-256 hash function, which is still approved as a secure algorithm.
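The determinism, diffusion, and fixed-size properties are easy to observe from the command line. The following is a minimal sketch, assuming a Linux system with GNU coreutils' sha256sum available; hash_string is a hypothetical helper introduced only for this illustration.

```bash
#!/usr/bin/env bash
# Hash a string with sha256sum and keep only the hash column.
# (hash_string is a hypothetical helper used for this illustration.)
hash_string() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

h1=$(hash_string "hello")   # first run
h2=$(hash_string "hello")   # same input again
h3=$(hash_string "hellp")   # one character changed

echo "hello : $h1"
echo "hello : $h2"
echo "hellp : $h3"

# Determinism: the same input gives the same hash
[ "$h1" = "$h2" ] && echo "same input, same hash"
# Diffusion: a one-character change gives a different hash
[ "$h1" != "$h3" ] && echo "small change, different hash"
# Fixed size: 256 bits printed as 64 hexadecimal characters
echo "hash length: ${#h1}"
```

Comparing the first and third lines of output side by side shows how little the two hashes have in common even though the inputs differ by a single character.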

FIM (File Integrity Monitor)

File Integrity Monitoring (FIM) is a security practice that consists of verifying the integrity of operating system and application software files by comparing them to a trusted "baseline" to determine whether tampering or fraud has occurred. This is mainly done using hashing algorithms.
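The baseline comparison can be tried by hand before writing any script, since sha256sum can verify files against a list of hashes it recorded earlier. This is only a sketch of the idea, assuming GNU coreutils; the file names config.txt and baseline.sha256 are hypothetical.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory (all file names here are hypothetical)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "original content" > config.txt

# 1) Record the trusted baseline: one "<hash>  <file>" line per file
sha256sum config.txt > baseline.sha256

# 2) Verify against the baseline: succeeds while the file is untouched
sha256sum --check --quiet baseline.sha256 && before="unchanged"

# 3) Tamper with the file, then verify again: the check now fails
echo "tampered content" > config.txt
if sha256sum --check --quiet baseline.sha256 2>/dev/null; then
    after="unchanged"
else
    after="changed"
fi

echo "before: $before, after: $after"
```

This built-in check reports a modified or missing file, but it says nothing about newly created files, which is one reason to write a dedicated monitor.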

Coding our basic FIM

In our application, the input will be each file in the directory that we would like to monitor for changes, and the output will be its hash, the file's digital thumbprint. The hashes are stored in a file and later compared to newly calculated ones: if the two hashes of a file are equal, the file has not changed; otherwise, it has been modified. We will also cover the cases where a file is deleted or a new file is created.
The chart below summarizes how the script we are about to see works:
[Image: chart of the script's workflow]
Now for the code. This step-by-step guide is written in Bash (the Bourne Again SHell), a widely used shell and scripting language for automating tasks, but you can also find the Python and PowerShell versions on the GitHub page.

# User input: 1 collects a new baseline, 2 starts monitoring
echo -ne "Would you like to\n   1) Collect a new .baseline\nOr\n   2) Proceed with the previously recorded one\n   [ 1 | 2 ] ? "
# -r keeps backslashes in the answer literal
read -r ans

First we get the user's input. Easy enough, right?
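In the script, any answer other than 1 falls through to the monitoring branch. If stricter handling is wanted, the answer could be validated first. The sketch below is one possible way to do that; parse_choice and the mode names are hypothetical and not part of the original script.

```bash
#!/usr/bin/env bash
# Map the raw answer to a mode name; return non-zero for anything else.
# (parse_choice is a hypothetical helper, not part of the original script.)
parse_choice() {
    case "$1" in
        1) echo "collect" ;;
        2) echo "monitor" ;;
        *) return 1 ;;
    esac
}

# Try a few sample answers, including an invalid one
for answer in 1 2 x; do
    if mode=$(parse_choice "$answer"); then
        echo "'$answer' -> $mode"
    else
        echo "'$answer' -> invalid, expected 1 or 2"
    fi
done
```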

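function calculate_file_hash(){
    # $1: path of the file to hash
    local filepath=$1
    local filehash
    # sha256sum prints "<hash>  <path>"; keep only the hash
    # (the path is quoted so that names containing spaces work)
    filehash=$(sha256sum "$filepath" | cut -d ' ' -f 1)
    # print the pair in the form filepath|filehash
    echo "$filepath|$filehash"
}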

Here we create a function that calculates the hash of the file whose path is passed as its argument and prints the result as a filepath|filehash pair.
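To see what the function produces, it can be called on a sample file. This is a small sketch that repeats the function in condensed form so it runs on its own; the temporary directory and notes.txt are hypothetical.

```bash
#!/usr/bin/env bash
# Same idea as the function above, repeated so the example runs on its own
calculate_file_hash() {
    local filehash
    filehash=$(sha256sum "$1" | cut -d ' ' -f 1)
    echo "$1|$filehash"
}

# Hypothetical sample file in a temporary directory
sample_dir=$(mktemp -d)
echo "some data" > "$sample_dir/notes.txt"

pair=$(calculate_file_hash "$sample_dir/notes.txt")
echo "$pair"

# Split the pair back into its two fields
pair_path=${pair%|*}     # everything before the last |
pair_hash=${pair##*|}    # everything after the last |
echo "path: $pair_path"
echo "hash: $pair_hash"
```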

First case scenario: collecting the baseline

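if [ "$ans" = "1" ]; then
    # delete the old baseline if it exists, then create an empty one
    # (a file whose name starts with a . is hidden on Linux-based systems)
    rm -f .baseline.txt
    > .baseline.txt

    # filling in the .baseline.txt file with filepath|filehash pairs
    # ($monitoring_dir must already hold the path of the directory to monitor)
    for entry in "$monitoring_dir"/*
    do
        # hash regular files only; sha256sum cannot hash a directory
        [ -f "$entry" ] || continue
        res=$(calculate_file_hash "$entry")
        echo "$res" >> .baseline.txt
    done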

In this part, the user has decided to collect a new baseline. The old one is deleted if it exists, and the filepath|filehash pairs are stored in the newly created .baseline.txt file using the calculate_file_hash function.
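To make the resulting file format concrete, here is a self-contained sketch of the same collection step run against a throwaway directory. The directory, its files, and the temporary baseline file are all hypothetical stand-ins for the monitored directory and .baseline.txt.

```bash
#!/usr/bin/env bash
# Hypothetical directory to monitor, created just for this example
monitoring_dir=$(mktemp -d)
echo "alpha" > "$monitoring_dir/a.txt"
echo "beta"  > "$monitoring_dir/b.txt"
mkdir "$monitoring_dir/subdir"          # directories are skipped below

baseline=$(mktemp)                      # stands in for .baseline.txt

for entry in "$monitoring_dir"/*
do
    [ -f "$entry" ] || continue         # hash regular files only
    filehash=$(sha256sum "$entry" | cut -d ' ' -f 1)
    echo "$entry|$filehash" >> "$baseline"
done

# Each line has the form filepath|filehash
cat "$baseline"
entries=$(wc -l < "$baseline")
echo "entries recorded: $entries"
```

Note that the glob only covers files directly inside the monitored directory, so anything inside subdir is not part of this baseline.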

else
    # creating a dictionary with filepath as key and filehash as value
    declare -A path_hash_dict
    # read the baseline line by line so that paths containing spaces stay intact
    while IFS= read -r line
    do
        path=$( echo "$line" | cut -d '|' -f1 )
        hash=$( echo "$line" | cut -d '|' -f2- )
        path_hash_dict["$path"]=$hash
    done < .baseline.txt

Second case scenario: the user wants to start monitoring the files. First we create a dictionary (a Bash associative array) where each key is a file path and its value is that file's hash. This gives easy access to the data stored in the .baseline.txt file.
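As a variation on the loop above, each line can also be split with Bash parameter expansion instead of cut. The sketch below splits on the last | of the line, so a path containing spaces (or even a | of its own) still parses. It assumes Bash 4 or later for associative arrays, and the baseline content with its shortened hashes is made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical baseline content; the hashes are shortened placeholders
baseline=$(mktemp)
printf '%s\n' \
    '/tmp/demo/a.txt|aaa111' \
    '/tmp/demo/my notes.txt|bbb222' > "$baseline"

declare -A path_hash_dict
while IFS= read -r line
do
    path=${line%|*}      # everything before the last |
    hash=${line##*|}     # everything after the last |
    path_hash_dict["$path"]=$hash
done < "$baseline"

# Look up a path that contains a space
key='/tmp/demo/my notes.txt'
echo "entries: ${#path_hash_dict[@]}"
echo "hash of '$key': ${path_hash_dict[$key]}"
```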

    while true
    do
        sleep 1
        # checking if a file has been deleted
        for key in "${!path_hash_dict[@]}"; do
            if [ ! -f "$key" ]; then
                echo -e "A file has been REMOVED ! FILE NAME : $key"
            fi
        done

        for file in "$monitoring_dir"/*
        do
            # skip directories and the unexpanded pattern of an empty directory
            [ -f "$file" ] || continue
            hash=$(sha256sum "$file" | cut -d ' ' -f 1)
            # ${...+set} expands to "set" only if the key exists in the dictionary
            if [ -z "${path_hash_dict[$file]+set}" ]; then
                # the path is not a key of the dictionary: the file is new
                echo -e "A file has been CREATED ! FILE NAME : $file"
            elif [ "$hash" != "${path_hash_dict[$file]}" ]; then
                # the hash differs from the baseline: the file was modified
                echo -e "A file has been CHANGED ! FILE NAME : $file"
                ls -la "$file"
            fi
        done
    done

fi


Let the monitoring start! In this infinite while loop, if a key in our dictionary doesn't correspond to a file in the monitored directory, that file has been deleted.
If a file's path is not among the keys of our dictionary, a new file has been created in the monitored directory.
Lastly, we calculate the hash of each file and compare it to the hash stored in the dictionary. If they're different, the file has been modified.
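The three checks can be exercised without waiting on the infinite loop. The following sketch runs a single pass against a throwaway directory in which one file is edited, one is deleted, and one is created. It assumes Bash 4 or later; the directory, the file names, and the report variable are hypothetical and exist only for this demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical monitored directory with two files
monitoring_dir=$(mktemp -d)
echo "one" > "$monitoring_dir/kept.txt"
echo "two" > "$monitoring_dir/gone.txt"

# Baseline held directly in the dictionary: path -> hash
declare -A path_hash_dict
for file in "$monitoring_dir"/*; do
    path_hash_dict["$file"]=$(sha256sum "$file" | cut -d ' ' -f 1)
done

# Simulate the three kinds of change
echo "edited" >> "$monitoring_dir/kept.txt"   # modification
rm "$monitoring_dir/gone.txt"                 # deletion
echo "new" > "$monitoring_dir/added.txt"      # creation

# One pass of the checks performed inside the infinite loop
report=""
for key in "${!path_hash_dict[@]}"; do
    # a baseline path with no file behind it was removed
    [ -f "$key" ] || report+="REMOVED ${key##*/}"$'\n'
done
for file in "$monitoring_dir"/*; do
    hash=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ -z "${path_hash_dict[$file]+set}" ]; then
        # not a key of the dictionary: created after the baseline
        report+="CREATED ${file##*/}"$'\n'
    elif [ "$hash" != "${path_hash_dict[$file]}" ]; then
        # hash differs from the baseline: modified
        report+="CHANGED ${file##*/}"$'\n'
    fi
done
printf '%s' "$report"
```

Collecting the messages in a variable instead of printing them immediately is a choice made here only to keep the example easy to inspect; the script itself prints each warning as soon as it is detected.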


Find a more complete version of this script on GitHub. You can also find the Python and PowerShell versions there.

Credit where credit's due,

  • This post was inspired by Josh Madakor's YouTube video. Check out his YouTube channel for cybersecurity-related content.
  • Some lines were taken from this article about cryptographic hash functions.
