Hafiz Shamnad

Posted on Mar 7

Day 19 — How I Built a File Integrity Monitor in Python to Detect File Tampering

#python #linux #cybersecurity #devops

What if one of your critical system files changed right now?

Would you notice?

Attackers rarely need to install complex malware. Often they simply modify existing files — a web server script, a cron job, or a configuration file.

That tiny change can be the difference between a secure system and a persistent backdoor.

This is exactly the problem File Integrity Monitoring solves.

Introduction

Every system administrator or security engineer has faced the same unsettling question at some point: has this file been tampered with? Whether it's a config file silently modified by malware, a binary swapped out during a supply-chain attack, or a web asset defaced by an intruder — unauthorized file changes are one of the most common indicators of compromise.

FIM (File Integrity Monitor) is a lightweight, terminal-native Python tool that answers that question definitively. It uses cryptographic hashing to create a trusted snapshot of your files and then compares future states against that snapshot — flagging any modifications, deletions, or suspicious new additions.

This post walks through how FIM works, how to use it, what features it offers, and the security principles that power it under the hood.

The Security Problem: Why File Integrity Monitoring Matters

Before diving into the code, it's worth understanding why this kind of tool exists in the first place.

Detecting Unauthorized Changes

Attackers who compromise a system often need to modify files to maintain persistence: think webshells dropped into /var/www, backdoors inserted into system binaries, or cron jobs quietly added to /etc/cron.d. A file integrity monitor acts as a silent watchdog. It doesn't prevent the change, but it makes the change much harder to hide, provided the baseline itself stays trustworthy.

Compliance Requirements

Many security frameworks mandate file integrity monitoring as a control. PCI-DSS (Requirement 11.5), HIPAA, and NIST 800-53 all reference it as a critical baseline measure. Tools like Tripwire and AIDE have been the enterprise standard for years, but they're heavy, complex to configure, and overkill for a developer's machine, a small server, or a CTF lab environment. FIM fills that gap.

Forensic Baseline

In incident response, one of the first things investigators need is a clean baseline — what did the system look like before the incident? A pre-breach FIM snapshot is invaluable for answering that question quickly.

Architecture Overview

FIM follows a simple pipeline:

Directory → File Scanner → Hash Generator → Baseline Database → Integrity Checker

Files are recursively scanned
Cryptographic hashes are generated
Baseline fingerprints are stored
Future scans compare current hashes against the baseline
Differences are reported

How It Works

FIM operates on a simple but cryptographically sound two-phase model.

Phase 1 — Baseline Creation

When you run FIM with the --init flag, it walks the target directory recursively, computes a cryptographic hash of every file, and stores the results in a baseline.json file. This JSON file is your trusted ground truth — a fingerprint of the directory at a known-good point in time.

Each entry in the baseline captures:

The file path — the full path to the file
The hash — a fixed-length digest derived from the file's contents
The algorithm — SHA-256 by default, but configurable
File metadata — size, last-modified timestamp, and permissions

Phase 2 — Integrity Scanning

When you run FIM with the --scan flag, it recomputes hashes of all current files and compares them against the stored baseline. It then reports three categories of change:

Modified — the file still exists, but its hash no longer matches
Deleted — the file was in the baseline but is no longer present on disk
New — the file exists on disk but was not in the baseline

This three-way classification is important. A legitimate software update might add new files; that's different from a file being secretly modified. FIM gives you the full picture.

Cryptographic Hashing: The Engine Behind FIM

At the core of file integrity monitoring is a single idea: a cryptographic hash function produces a deterministic, fixed-length "fingerprint" of any input. Change even a single bit of the input, and the output changes completely and unpredictably. This property — called the avalanche effect — is what makes hashing ideal for tamper detection.
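To make the avalanche effect concrete, here is a minimal sketch using only hashlib. The two sample inputs are made up for illustration and are not taken from FIM's source; they differ in a single character.

```python
import hashlib

# Two hypothetical inputs that differ in exactly one character.
original = b"allow_admin = 0"
tampered = b"allow_admin = 1"

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(tampered).hexdigest()

# Count how many hex characters differ between the two digests.
differing = sum(1 for a, b in zip(digest_a, digest_b) if a != b)

print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex characters differ")
```

Running this should show two digests with no visible relationship to each other, which is exactly why a hash mismatch is a reliable signal that file contents changed.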

FIM supports three hashing algorithms, each with different trade-offs:

SHA-256 (Default)

SHA-256 is part of the SHA-2 family, standardized by NIST. It produces a 256-bit (64-character hex) digest and is the gold standard for general-purpose integrity checking. It's used in TLS certificates, code signing, and Bitcoin. For most use cases, SHA-256 is the right choice: fast, widely supported, and collision-resistant for all practical purposes.

MD5

MD5 is fast and produces a compact 128-bit digest. However, it is cryptographically broken — researchers have demonstrated practical collision attacks, meaning two different files can produce the same MD5 hash. FIM includes MD5 for legacy compatibility and speed in low-risk contexts, but it should not be relied upon in security-sensitive scenarios. Use SHA-256 or BLAKE2 instead.

BLAKE2b

BLAKE2 is a modern hashing algorithm designed to be faster than SHA-256 in software while maintaining strong security guarantees. It's the recommended choice when you need high throughput, for example when scanning large directories with many big files. BLAKE2b is used in libsodium and Argon2, and its sibling BLAKE2s is used in WireGuard.

Summary:

Algorithm	Digest Size	Speed	Security	Best For
SHA-256	256-bit	Fast	Strong	General use (default)
MD5	128-bit	Very fast	Broken (collisions)	Legacy / low-stakes
BLAKE2b	512-bit	Very fast	Very Strong	Large-scale scanning
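The digest sizes in the table can be checked directly from hashlib. This is a small sketch, not part of FIM itself. Relative speed depends heavily on the CPU and the Python build, so it is not measured here.

```python
import hashlib

# digest_size is reported in bytes; multiply by 8 to get bits.
digest_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha256", "md5", "blake2b")
}

for name, bits in digest_bits.items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{name}: {bits}-bit digest, {bits // 4} hex characters")
```

If throughput matters for your workload, it is worth timing the three algorithms on your own files rather than relying on a general ranking.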

Code explanation

1. Shebang and Header

#!/usr/bin/env python3

This tells Linux/Unix systems to run the script using Python 3 from the system environment.

The large ASCII block is simply branding and documentation for the tool:

FIM - File Integrity Monitor v2.0
Developed by Hafiz Shamnad
SHA-256 | MD5 | Blake2 Hashing Security

It has no functional effect, but improves CLI UX.

2. Importing Required Libraries

import os
import sys
import hashlib
import json
import argparse
import time
import datetime
import platform
import stat
from pathlib import Path

These are all standard Python libraries.

What each one does:

Library	Purpose
`os`	filesystem traversal
`sys`	Python runtime info
`hashlib`	cryptographic hashing
`json`	storing baseline database
`argparse`	command-line arguments
`time`	delays in watch mode
`datetime`	timestamps
`platform`	system info for banner
`stat`	file permission extraction
`pathlib`	path handling

No external dependencies means the tool runs on any Python 3.8+ installation without extra setup.

3. ANSI Color Class (Terminal Styling)

class C:

This class defines ANSI escape codes used for terminal colors.

Example:

RED = "\033[31m"
GREEN = "\033[32m"
RESET = "\033[0m"

These allow output like:

[ERROR] printed in red
[SUCCESS] printed in green

Purpose:

improves CLI readability
highlights security events

For example:

MODIFIED files → yellow
DELETED files → red
NEW files → blue
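A minimal sketch of how such a color class might be used. Only RED, GREEN, and RESET appear in the snippets above; the YELLOW and BLUE codes, the STATUS_COLORS mapping, and the colorize() helper are my own assumptions for illustration.

```python
class C:
    """ANSI escape codes (assumed subset of the real class)."""
    RED = "\033[31m"
    GREEN = "\033[32m"
    YELLOW = "\033[33m"
    BLUE = "\033[34m"
    RESET = "\033[0m"


# Hypothetical mapping from change type to color, following the list above.
STATUS_COLORS = {"MODIFIED": C.YELLOW, "DELETED": C.RED, "NEW": C.BLUE}


def colorize(status, path):
    """Wrap the status label in its color, then reset the styling."""
    color = STATUS_COLORS.get(status, C.RESET)
    return f"{color}[{status}]{C.RESET} {path}"


print(colorize("MODIFIED", "config.php"))
print(colorize("NEW", "backdoor.php"))
```

Always ending with RESET keeps the color from bleeding into whatever the terminal prints next.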

4. Global Configuration

BASELINE_FILE = "baseline.json"
REPORT_FILE   = "fim_report.txt"

These define:

where the baseline database is stored
where exported reports are written

The baseline file contains trusted fingerprints of files.

5. Banner Printing

def print_banner():

This function prints the startup banner with:

tool name
developer name
hashing algorithms
system OS
Python version

Example output:

File Integrity Monitor v2.0
Developed by Hafiz Shamnad
Linux · Python 3.11

It uses the color class to make the UI look professional.

6. Helper Functions

Several small utilities simplify the code.

tag()

def tag(label, color, text):

Creates labeled messages like:

[ERROR] Baseline not found

section()

Prints section headers such as:

──────────────
CREATE BASELINE
──────────────

progress_bar()

Displays scanning progress:

██████████░░░░░░ 65%

This improves UX when hashing many files.
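A progress bar like this can be built with integer arithmetic alone. The signature below (done, total, width) is an assumption on my part, not necessarily what FIM uses.

```python
def progress_bar(done, total, width=20):
    """Return a text progress bar for `done` out of `total` items."""
    if total <= 0:
        return "░" * width + " 0%"
    # Clamp so the bar never under- or overflows.
    done = max(0, min(done, total))
    filled = width * done // total
    percent = 100 * done // total
    return "█" * filled + "░" * (width - filled) + f" {percent}%"


print(progress_bar(65, 100))
```

Integer division keeps the number of filled cells and the printed percentage consistent with each other.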

human_size()

Converts bytes into readable sizes.

Example:

2048 → 2 KB
1048576 → 1 MB

Useful when showing file metadata.
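One possible way to write this helper, assuming 1024-based units and the output style of the examples above. The real function may round or label units differently.

```python
def human_size(num_bytes):
    """Convert a byte count into a short human-readable string."""
    size = float(num_bytes)
    units = ("B", "KB", "MB", "GB", "TB")
    for unit in units[:-1]:
        if size < 1024:
            # .4g drops trailing zeros, so 2.0 prints as "2".
            return f"{size:.4g} {unit}"
        size /= 1024
    # Anything larger is reported in the biggest unit.
    return f"{size:.1f} {units[-1]}"


print(human_size(2048))
print(human_size(1048576))
```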

file_meta()

def file_meta(path):

Collects metadata about files:

file size
last modification time
permissions

Example returned structure:

{
  "size": 1024,
  "modified": "2026-03-07 12:33:11",
  "permissions": "0o644"
}

This metadata is stored in the baseline.
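Here is a rough sketch of how that structure could be produced with os.stat and the stat module. The field names follow the example above; the timestamp format is assumed from it.

```python
import datetime
import os
import stat


def file_meta(path):
    """Return size, modification time, and permission bits for a path."""
    info = os.stat(path)
    modified = datetime.datetime.fromtimestamp(info.st_mtime)
    return {
        "size": info.st_size,
        "modified": modified.strftime("%Y-%m-%d %H:%M:%S"),
        # S_IMODE keeps only the permission bits, for example 0o644.
        "permissions": oct(stat.S_IMODE(info.st_mode)),
    }
```

Metadata is a useful secondary signal: a permission change such as a file becoming executable shows up here even when the contents, and therefore the hash, stay the same.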

7. Hashing Engine

ALGORITHMS = {
    "sha256": lambda: hashlib.sha256(),
    "md5": lambda: hashlib.md5(),
    "blake2b": lambda: hashlib.blake2b(),
}

This dictionary allows dynamic selection of hashing algorithms.

Users can run:

--algo sha256
--algo md5
--algo blake2b

Hashing a File

def hash_file(path, algorithm="sha256"):

Steps:

Create hash object
Open file in binary mode
Read file in chunks
Update hash

while chunk := f.read(8192):
    h.update(chunk)

Reading chunks prevents memory overload for large files.

Finally:

return h.hexdigest()

This returns the final cryptographic fingerprint.

Example:

9f86d081884c7d659a2feaa0c55ad015...
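Putting the fragments above together, a complete version of the function might look like this. The chunk_size parameter is my addition for illustration, and the walrus operator requires Python 3.8 or newer.

```python
import hashlib

ALGORITHMS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
    "blake2b": hashlib.blake2b,
}


def hash_file(path, algorithm="sha256", chunk_size=8192):
    """Hash a file in fixed-size chunks so it never sits fully in memory."""
    h = ALGORITHMS[algorithm]()
    with open(path, "rb") as f:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Opening the file in binary mode matters: text mode could translate line endings or fail on undecodable bytes, and the hash would no longer reflect the exact bytes on disk.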

8. Directory Scanning Engine

def scan_directory(directory, algorithm="sha256", extensions=None):

This function performs the core filesystem analysis.

Steps:

1. Walk through the directory

for root, _, files in os.walk(directory):

os.walk() recursively traverses folders.

2. Filter by extension

If the user specifies:

--ext .php .html

only those files are scanned.

3. Hash each file

file_hash = hash_file(path, algorithm)

4. Store results

Each file is saved as:

results[path] = {
    "hash": file_hash,
    "algorithm": algorithm,
    "size": size,
    "modified": modified,
    "permissions": perms
}

This becomes part of the baseline database.
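The four steps can be sketched as a single function. This is a simplified version under my own assumptions: hashing is inlined to keep the example self-contained, unreadable files are silently skipped, and only part of the metadata is recorded.

```python
import hashlib
import os


def scan_directory(directory, algorithm="sha256", extensions=None):
    """Walk a directory tree and return a dict of per-file fingerprints."""
    results = {}
    for root, _, files in os.walk(directory):
        for name in files:
            # Skip files whose extension was not requested.
            suffix = os.path.splitext(name)[1].lower()
            if extensions and suffix not in extensions:
                continue
            path = os.path.join(root, name)
            try:
                h = hashlib.new(algorithm)
                with open(path, "rb") as f:
                    while chunk := f.read(8192):
                        h.update(chunk)
                size = os.path.getsize(path)
            except OSError:
                # Assumption: unreadable files are skipped, not reported.
                continue
            results[path] = {
                "hash": h.hexdigest(),
                "algorithm": algorithm,
                "size": size,
            }
    return results
```

Calling scan_directory("/var/www", extensions={".php", ".html"}) would return only the PHP and HTML files under that tree. In a real monitor, files that cannot be read are probably worth surfacing as warnings rather than dropping.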

9. Creating the Baseline

def create_baseline(directory):

This function performs Phase 1 of file integrity monitoring.

Steps:

Scan the directory
Generate hashes
Collect metadata
Store results

The baseline structure looks like:

{
 "_meta": {
   "created": "2026-03-07",
   "algorithm": "sha256",
   "file_count": 120
 },
 "files": {
   "/var/www/index.php": {
      "hash": "...",
      "size": 1200
   }
 }
}

Finally it writes this to:

baseline.json

This file becomes the trusted snapshot.
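Writing that structure to disk takes only a few lines with the json module. The helper name save_baseline() and its parameters are hypothetical; it takes already-scanned results so the sketch stays short.

```python
import datetime
import json


def save_baseline(files, algorithm="sha256", path="baseline.json"):
    """Write scan results plus a small metadata header to a JSON file."""
    baseline = {
        "_meta": {
            "created": datetime.date.today().isoformat(),
            "algorithm": algorithm,
            "file_count": len(files),
        },
        "files": files,
    }
    with open(path, "w", encoding="utf-8") as f:
        # sort_keys gives a stable file layout that diffs cleanly.
        json.dump(baseline, f, indent=2, sort_keys=True)
    return baseline
```

Storing the algorithm in the header lets a later scan hash with the same function, since comparing a SHA-256 baseline against BLAKE2b digests would flag every file as modified.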

10. Integrity Checking

def check_integrity(directory):

This is Phase 2.

The function:

Loads baseline.json
Scans current files
Compares hashes

Three outcomes are detected.

Modified files

Hash mismatch.

[MODIFIED] config.php

Deleted files

File existed before but is missing now.

[DELETED] login.php

New files

File exists but not in baseline.

[NEW] backdoor.php
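The three outcomes fall out naturally from set operations on the file paths. The compare() helper below is a sketch of the idea rather than the actual check_integrity() code, and the sample hashes are placeholders.

```python
def compare(baseline_files, current_files):
    """Classify paths as modified, deleted, or new."""
    old, new = set(baseline_files), set(current_files)
    return {
        # Present in both scans, but the hash no longer matches.
        "modified": sorted(
            p for p in old & new
            if baseline_files[p]["hash"] != current_files[p]["hash"]
        ),
        # In the baseline only.
        "deleted": sorted(old - new),
        # On disk only.
        "new": sorted(new - old),
    }


baseline = {"config.php": {"hash": "aaa"}, "login.php": {"hash": "bbb"}}
current = {"config.php": {"hash": "ccc"}, "backdoor.php": {"hash": "ddd"}}
changes = compare(baseline, current)
print(changes)
```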

11. Exporting Reports

def _export_report():

Creates a forensic report:

fim_report_20260307_134512.txt

Report includes:

modified files
deleted files
new files
timestamps

Useful for incident response documentation.
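A sketch of how the timestamped filename and report body could be generated. The function names and the exact report layout are assumptions; only the filename pattern comes from the example above.

```python
import datetime


def report_filename(now=None):
    """Build a name like fim_report_YYYYMMDD_HHMMSS.txt."""
    now = now or datetime.datetime.now()
    return f"fim_report_{now:%Y%m%d_%H%M%S}.txt"


def format_report(changes, now=None):
    """Render a dict of modified/deleted/new path lists as plain text."""
    now = now or datetime.datetime.now()
    lines = [f"FIM report generated {now:%Y-%m-%d %H:%M:%S}"]
    for category in ("modified", "deleted", "new"):
        for path in changes.get(category, []):
            lines.append(f"[{category.upper()}] {path}")
    return "\n".join(lines) + "\n"


stamp = datetime.datetime(2026, 3, 7, 13, 45, 12)
print(report_filename(stamp))
print(format_report({"new": ["backdoor.php"]}, stamp))
```

Putting the timestamp in the filename means repeated exports never overwrite each other, which matters when the reports are evidence.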

12. File Verification

def verify_file(filepath):

Allows checking a single file.

Example:

python fim.py /etc --verify /etc/passwd

The tool:

hashes the file
compares it to baseline
prints result

Example output:

✔ Hash matches baseline

✘ Hash differs from baseline

13. Watch Mode (Continuous Monitoring)

def watch_mode(directory, interval=30):

This runs integrity checks repeatedly.

Example:

python fim.py /var/www --watch

Workflow:

Scan #1
Scan #2
Scan #3

If a file changes:

2 issues detected!

This approximates real-time monitoring.
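The polling loop itself is simple. In this sketch, check is any callable that returns the number of issues found. The max_scans and sleep parameters are my own additions so the loop can be exercised without waiting; they are not FIM options.

```python
import time


def watch(check, interval=30, max_scans=None, sleep=time.sleep):
    """Call check() repeatedly, pausing `interval` seconds between scans."""
    scan_number = 0
    try:
        while max_scans is None or scan_number < max_scans:
            scan_number += 1
            issues = check()
            status = "clean" if issues == 0 else f"{issues} issue(s) detected!"
            print(f"Scan #{scan_number}: {status}")
            # Do not sleep after the final scan.
            if max_scans is None or scan_number < max_scans:
                sleep(interval)
    except KeyboardInterrupt:
        # Ctrl+C ends the loop cleanly instead of printing a traceback.
        print("Stopped by user")
    return scan_number
```

For a quick dry run, watch(lambda: 0, interval=1, max_scans=3) prints three clean scans and returns.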

14. Entry Point

The script's entry point ties the pieces above together.

Steps:

Print banner
Parse CLI arguments
Call the appropriate function

Example flow:

User command
      ↓
argparse parses flags
      ↓
init / scan / watch / verify
      ↓
FIM executes requested operation
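The flow above could be wired up with argparse roughly like this. The flag names come from this post, but the details are assumptions on my part: an optional positional directory defaulting to the current directory, and the five modes being mutually exclusive. The real parser may be organized differently.

```python
import argparse


def build_parser():
    """Build a command-line parser mirroring the flags described here."""
    parser = argparse.ArgumentParser(
        prog="fim.py", description="File Integrity Monitor"
    )
    # Assumption: the directory is optional and defaults to ".".
    parser.add_argument("directory", nargs="?", default=".")

    # Assumption: exactly one mode must be chosen per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--init", action="store_true", help="create baseline")
    mode.add_argument("--scan", action="store_true", help="compare to baseline")
    mode.add_argument("--watch", action="store_true", help="rescan in a loop")
    mode.add_argument("--info", action="store_true", help="show baseline info")
    mode.add_argument("--verify", metavar="FILE", help="check a single file")

    parser.add_argument("--interval", type=int, default=30, metavar="N")
    parser.add_argument(
        "--algo", choices=("sha256", "md5", "blake2b"), default="sha256"
    )
    parser.add_argument("--ext", nargs="+", metavar="EXT")
    parser.add_argument("--export", action="store_true")
    return parser


args = build_parser().parse_args(["--watch", "/var/www", "--interval", "60"])
print(args.directory, args.interval)
```

From here, dispatch is a chain of checks on args.init, args.scan, args.watch, args.info, and args.verify, each calling the matching function.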

Installation

FIM has no external dependencies — it runs on Python 3.8+ using only the standard library.

# Clone or download fim.py, then run directly:
python fim.py --help

That's it. No pip install, no virtual environment, no configuration files.

Usage Guide

Creating a Baseline

Before FIM can detect anything, you need to establish a trusted baseline of your directory. Run this once on a known-clean system:

python fim.py --init /var/www/html

This scans the directory, hashes every file with SHA-256, and saves the results to baseline.json.

With a different hashing algorithm:

python fim.py --init /etc --algo blake2b

Filtering by file type:

python fim.py --init /var/www --ext .php .html .js .py

Filtering by extension is useful when you only care about specific file types — for example, monitoring only PHP and HTML files in a web root, ignoring uploaded media assets.

Scanning for Changes

Once a baseline exists, run a scan at any time to detect changes:

python fim.py --scan /var/www/html

FIM will compare current file hashes against the baseline and print a color-coded report. Modified files appear in yellow, deleted files in red, and new files in blue.

Export the scan results to a file:

python fim.py --scan /var/www/html --export

This generates a timestamped report (e.g., fim_report_20260307_143022.txt) useful for auditing or incident documentation.

Verifying a Single File

Sometimes you just need to quickly check one specific file:

python fim.py --verify /etc/passwd

FIM hashes the file, displays its metadata, and checks whether the hash matches the baseline — all in one shot.

Viewing Baseline Info

To inspect the metadata of an existing baseline without running a full scan:

python fim.py --info .

This shows when the baseline was created, how many files are indexed, the total size, and which algorithm was used.

Continuous Watch Mode

For active monitoring, FIM can run in a loop, rescanning on a configurable interval and alerting you as soon as a scan picks up a change:

python fim.py --watch /var/www --interval 60

This rescans every 60 seconds and prints a timestamped status line. A clean scan shows green; any detected changes show red with a count of issues. Press Ctrl+C to stop.

Watch mode is particularly useful when you're doing live incident response and want real-time awareness of filesystem changes.

Feature Breakdown

Feature	Flag	Description
Create baseline	`--init`	Hash all files and save as trusted snapshot
Scan for changes	`--scan`	Compare current state against baseline
Single file verify	`--verify FILE`	Hash and check one file immediately
Baseline info	`--info`	Show metadata about the saved baseline
Continuous watch	`--watch`	Polling-based monitoring loop
Watch interval	`--interval N`	Set rescan interval in seconds (default: 30)
Algorithm choice	`--algo`	Choose sha256 / md5 / blake2b
Extension filter	`--ext`	Only scan specific file extensions
Export report	`--export`	Save results to a timestamped `.txt` file

Practical Use Cases

Web server protection: Set up a baseline of /var/www and run a watch scan. A webshell dropped by an attacker will show up as a new file on the next scan, as long as its extension is not excluded by an --ext filter.

Config file auditing: Monitor /etc to catch unauthorized changes to SSH configs, sudoers, cron jobs, and PAM modules.

Software supply chain checks: Before deploying a software update, scan the package against a known-good baseline to verify no files were tampered with in transit.

Post-incident forensics: Run a scan against a pre-breach baseline to identify exactly which files were touched during an intrusion.

CTF challenges: In competition environments, FIM is a quick way to track what a binary or script is modifying on the filesystem.

Limitations and Honest Caveats

FIM is a practical tool, but it's not a replacement for enterprise solutions in high-stakes production environments. File integrity monitoring is most powerful when combined with logging, intrusion detection systems, and strict access controls.

The baseline.json file is only as trustworthy as the system it was created on. If an attacker already has access when you run --init, or if they subsequently modify both the target files and the baseline, FIM cannot detect the deception. In hardened environments, the baseline should be stored on read-only media, a separate host, or protected with access controls.

FIM also doesn't monitor in real time at the kernel level the way auditd or inotify-based tools do; it's snapshot-based. The --watch mode approximates real-time monitoring through polling, which introduces a detection window of up to one scan interval plus the time the scan itself takes. A file that is changed and then restored between two scans goes unnoticed.

Building a file integrity monitor teaches several important security concepts:

Cryptographic hashing for tamper detection
Filesystem traversal and metadata collection
Security baselining techniques
Incident detection logic
CLI security tooling in Python

Conclusion

FIM demonstrates that powerful security tooling doesn't require heavyweight frameworks or complex configuration. At its heart, it's a practical application of one idea: cryptographic hashes make file tampering detectable. By combining that principle with a clean terminal UI, multi-algorithm support, and a range of monitoring modes, FIM becomes a genuinely useful tool for developers, sysadmins, and security practitioners who want visibility into their filesystems without the overhead of an enterprise solution.

Security often starts with visibility. If you don't know when your files change, you don't know when your system is compromised.

The full source code is a single self-contained Python file. Drop it on any system with Python 3.8+ and you're ready to go.
