Hafiz Shamnad

Day 20 — How I Built a File Integrity Monitor in Python to Detect File Tampering

What if one of your critical system files changed right now?

Would you notice?

Attackers rarely need to install complex malware. Often they simply modify existing files — a web server script, a cron job, or a configuration file.

That tiny change can be the difference between a secure system and a persistent backdoor.

This is exactly the problem File Integrity Monitoring solves.

Introduction

Every system administrator or security engineer has faced the same unsettling question at some point: has this file been tampered with? Whether it's a config file silently modified by malware, a binary swapped out during a supply-chain attack, or a web asset defaced by an intruder — unauthorized file changes are one of the most common indicators of compromise.

FIM (File Integrity Monitor) is a lightweight, terminal-native Python tool built to answer that question. It uses cryptographic hashing to create a trusted snapshot of your files, then compares future states against that snapshot and flags any modifications, deletions, or suspicious new additions.

This post walks through how FIM works, how to use it, what features it offers, and the security principles that power it under the hood.


The Security Problem: Why File Integrity Monitoring Matters

Before diving into the code, it's worth understanding why this kind of tool exists in the first place.

Detecting Unauthorized Changes

Attackers who compromise a system often need to modify files to maintain persistence: think webshells dropped into /var/www, backdoors inserted into system binaries, or cron jobs quietly added to /etc/cron.d. A file integrity monitor acts as a silent watchdog. It doesn't prevent the change, but as long as the baseline itself stays trustworthy, it makes the change very hard to hide.

Compliance Requirements

Many security frameworks call for file integrity monitoring or change detection as a control. PCI DSS requires it explicitly (Requirement 11.5 in v3.2.1, renumbered 11.5.2 in v4.0), NIST SP 800-53 covers it under control SI-7, and HIPAA's integrity requirements are commonly met with it. Tools like Tripwire and AIDE have been the enterprise standard for years, but they're heavy, complex to configure, and overkill for a developer's machine, a small server, or a CTF lab environment. FIM fills that gap.

Forensic Baseline

In incident response, one of the first things investigators need is a clean baseline — what did the system look like before the incident? A pre-breach FIM snapshot is invaluable for answering that question quickly.


Architecture Overview

FIM follows a simple pipeline:

Directory → File Scanner → Hash Generator → Baseline Database → Integrity Checker

  1. Files are recursively scanned
  2. Cryptographic hashes are generated
  3. Baseline fingerprints are stored
  4. Future scans compare current hashes against the baseline
  5. Differences are reported

How It Works

FIM operates on a simple but cryptographically sound two-phase model.

Phase 1 — Baseline Creation

When you run FIM with the --init flag, it walks the target directory recursively, computes a cryptographic hash of every file, and stores the results in a baseline.json file. This JSON file is your trusted ground truth — a fingerprint of the directory at a known-good point in time.

Each entry in the baseline captures:

  • The file path — the full path to the file
  • The hash — a fixed-length digest derived from the file's contents
  • The algorithm — SHA-256 by default, but configurable
  • File metadata — size, last-modified timestamp, and permissions

Phase 2 — Integrity Scanning

When you run FIM with the --scan flag, it recomputes hashes of all current files and compares them against the stored baseline. It then reports three categories of change:

  • Modified — the file still exists, but its hash no longer matches
  • Deleted — the file was in the baseline but is no longer present on disk
  • New — the file exists on disk but was not in the baseline

This three-way classification is important. A legitimate software update might add new files; that's different from a file being secretly modified. FIM gives you the full picture.


Cryptographic Hashing: The Engine Behind FIM

At the core of file integrity monitoring is a single idea: a cryptographic hash function produces a deterministic, fixed-length "fingerprint" of any input. Change even a single bit of the input, and the output changes completely and unpredictably. This property — called the avalanche effect — is what makes hashing ideal for tamper detection.
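To make the avalanche effect concrete, here is a small sketch that flips a single bit of an input and counts how many output bits change. The sample string and the differing_bits helper are illustrative assumptions, not part of FIM.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Hypothetical config line; any bytes would do
original = b"PermitRootLogin no"

# Flip exactly one bit (the lowest bit of the last byte)
tampered = bytearray(original)
tampered[-1] ^= 0x01

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(bytes(tampered)).digest()

print(d1.hex())
print(d2.hex())
print(f"{differing_bits(d1, d2)} of {len(d1) * 8} output bits differ")
```

For a well-designed hash you would expect roughly half of the 256 output bits to differ, even though only one input bit changed.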

FIM supports three hashing algorithms, each with different trade-offs:

SHA-256 (Default)

SHA-256 is part of the SHA-2 family, standardized by NIST. It produces a 256-bit (64-character hex) digest and is the gold standard for general-purpose integrity checking. It's used in TLS certificates, software release checksums, and Bitcoin. For most use cases, SHA-256 is the right choice: fast, widely supported, and collision-resistant for all practical purposes.

MD5

MD5 is fast and produces a compact 128-bit digest. However, it is cryptographically broken — researchers have demonstrated practical collision attacks, meaning two different files can produce the same MD5 hash. FIM includes MD5 for legacy compatibility and speed in low-risk contexts, but it should not be relied upon in security-sensitive scenarios. Use SHA-256 or BLAKE2 instead.

BLAKE2b

BLAKE2 is a modern hashing algorithm designed to be faster than SHA-256 in software while maintaining strong security guarantees. It's a good choice when you need high throughput, for example when scanning large directories with many big files. The BLAKE2 family is used in WireGuard (BLAKE2s), libsodium (BLAKE2b), and numerous other modern cryptographic systems.

Summary:

Algorithm   Digest Size   Speed       Security      Best For
SHA-256     256-bit       Fast        Strong        General use (default)
MD5         128-bit       Very fast   Broken        Legacy / low-stakes
BLAKE2b     512-bit       Very fast   Very Strong   Large-scale scanning
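The digest sizes in the table can be checked directly against Python's hashlib. This is only a quick sanity check; the DIGEST_BITS name is made up for the example.

```python
import hashlib

# Read each algorithm's digest length from hashlib instead of hard-coding it
DIGEST_BITS = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha256", "md5", "blake2b")
}

for name, bits in DIGEST_BITS.items():
    # Each hex character encodes 4 bits
    print(f"{name:8} {bits:4d}-bit  {bits // 4} hex characters")
```

Note that 512 bits is the default for hashlib.blake2b(); the constructor also accepts a smaller digest_size if shorter fingerprints are preferred.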

Code explanation

1. Shebang and Header

#!/usr/bin/env python3

This tells Linux/Unix systems to run the script with the first python3 interpreter found on the PATH.

The large ASCII block is simply branding and documentation for the tool:

FIM - File Integrity Monitor v2.0
Developed by Hafiz Shamnad
SHA-256 | MD5 | Blake2 Hashing Security

It has no functional effect, but improves CLI UX.


2. Importing Required Libraries

import os
import sys
import hashlib
import json
import argparse
import time
import datetime
import platform
import stat
from pathlib import Path

These are all standard Python libraries.

What each one does:

Library     Purpose
os          filesystem traversal
sys         Python runtime info
hashlib     cryptographic hashing
json        storing baseline database
argparse    command-line arguments
time        delays in watch mode
datetime    timestamps
platform    system info for banner
stat        file permission extraction
pathlib     path handling

Because there are no external dependencies, the tool runs on any Python 3.8+ installation (3.8 is needed for the := operator used in the hashing loop).


3. ANSI Color Class (Terminal Styling)

class C:

This class defines ANSI escape codes used for terminal colors.

Example:

RED = "\033[31m"
GREEN = "\033[32m"
RESET = "\033[0m"

These allow output like:

[ERROR] printed in red
[SUCCESS] printed in green

Purpose:

  • improves CLI readability
  • highlights security events

For example:

MODIFIED files → yellow
DELETED files → red
NEW files → blue

4. Global Configuration

BASELINE_FILE = "baseline.json"
REPORT_FILE   = "fim_report.txt"

These define:

  • where the baseline database is stored
  • where exported reports are written

The baseline file contains trusted fingerprints of files.


5. Banner Printing

def print_banner():

This function prints the startup banner with:

  • tool name
  • developer name
  • hashing algorithms
  • system OS
  • Python version

Example output:

File Integrity Monitor v2.0
Developed by Hafiz Shamnad
Linux · Python 3.11

It uses the color class to make the UI look professional.


6. Helper Functions

Several small utilities simplify the code.

tag()

def tag(label, color, text):

Creates labeled messages like:

[ERROR] Baseline not found
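Only the signature of tag() is shown above, so the following is a guess at how such a helper could work together with the color class. It assumes the function returns the formatted string (the real one may print directly), and the YELLOW and BLUE codes are standard ANSI values added for the example.

```python
class C:
    """A subset of ANSI escape codes; the real class may define more."""
    RED = "\033[31m"
    GREEN = "\033[32m"
    YELLOW = "\033[33m"
    BLUE = "\033[34m"
    RESET = "\033[0m"

def tag(label, color, text):
    """Return '[LABEL] text' with the bracketed label wrapped in a color."""
    # RESET right after the label keeps the message text in the default color
    return f"{color}[{label}]{C.RESET} {text}"

print(tag("ERROR", C.RED, "Baseline not found"))
print(tag("MODIFIED", C.YELLOW, "config.php"))
print(tag("NEW", C.BLUE, "backdoor.php"))
```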

section()

Prints section headers such as:

──────────────
CREATE BASELINE
──────────────

progress_bar()

Displays scanning progress:

████████████░░░░ 65%

This improves UX when hashing many files.
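A progress bar like this takes only a few lines. The sketch below is one possible version; the parameter names and the default width are assumptions, and it returns the string instead of drawing in place.

```python
def progress_bar(done, total, width=16):
    """Render a text bar such as '████░░░░ 50%' for done out of total items."""
    # Treat an empty job as complete to avoid dividing by zero
    fraction = done / total if total else 1.0
    filled = int(width * fraction)
    bar = "█" * filled + "░" * (width - filled)
    return f"{bar} {fraction:.0%}"

print(progress_bar(5, 10, width=10))  # █████░░░░░ 50%
```

In a real scan loop you would typically print this with end="\r" so each update overwrites the previous one.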


human_size()

Converts bytes into readable sizes.

Example:

2048 → 2 KB
1048576 → 1 MB

Useful when showing file metadata.
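One way to write such a converter, assuming 1024-based units and whole-number rounding as in the examples above (the actual helper may format differently):

```python
def human_size(num_bytes):
    """Convert a byte count into a short string using 1024-based units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        # Stop at the first unit that keeps the number below 1024,
        # or at TB when the value is too large for anything smaller
        if size < 1024 or unit == "TB":
            return f"{size:.0f} {unit}"
        size /= 1024

print(human_size(2048))     # 2 KB
print(human_size(1048576))  # 1 MB
```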


file_meta()

def file_meta(path):

Collects metadata about files:

  • file size
  • last modification time
  • permissions

Example returned structure:

{
  "size": 1024,
  "modified": "2026-03-07 12:33:11",
  "permissions": "0o644"
}

This metadata is stored in the baseline.
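A sketch of how this metadata could be collected with os.stat and the stat module. The key names and formats follow the example structure above; everything else is an assumption about the implementation.

```python
import datetime
import os
import stat

def file_meta(path):
    """Return size, last-modified time, and permission bits for one file."""
    st = os.stat(path)
    modified = datetime.datetime.fromtimestamp(st.st_mtime)
    return {
        "size": st.st_size,
        "modified": modified.strftime("%Y-%m-%d %H:%M:%S"),
        # S_IMODE strips the file-type bits, leaving e.g. 0o644
        "permissions": oct(stat.S_IMODE(st.st_mode)),
    }

print(file_meta(__file__ if "__file__" in globals() else os.curdir))
```

Keep in mind that metadata is easy to forge (timestamps can be reset after an edit), so it complements the hash rather than replacing it.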


7. Hashing Engine

ALGORITHMS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
    "blake2b": hashlib.blake2b,
}

This dictionary allows dynamic selection of hashing algorithms.

Users can run:

--algo sha256
--algo md5
--algo blake2b

Hashing a File

def hash_file(path, algorithm="sha256"):

Steps:

  1. Create hash object
  2. Open file in binary mode
  3. Read file in chunks
  4. Update hash

while chunk := f.read(8192):
    h.update(chunk)

Reading chunks prevents memory overload for large files.

Finally:

return h.hexdigest()

This returns the final cryptographic fingerprint.

Example:

9f86d081884c7d659a2feaa0c55ad015...
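Putting the fragments above together, a complete hash_file() might look like the sketch below. The chunk_size parameter and the absence of error handling are assumptions; the real function may catch permission errors or unreadable files.

```python
import hashlib

ALGORITHMS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
    "blake2b": hashlib.blake2b,
}

def hash_file(path, algorithm="sha256", chunk_size=8192):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = ALGORITHMS[algorithm]()          # 1. create the hash object
    with open(path, "rb") as f:          # 2. binary mode: hash raw bytes
        while chunk := f.read(chunk_size):   # 3. read until EOF (empty bytes)
            h.update(chunk)              # 4. feed each chunk to the hash
    return h.hexdigest()
```

Calling hash_file("/etc/passwd") returns a 64-character hex string, and hash_file("/etc/passwd", "blake2b") a 128-character one. Memory use stays constant no matter how large the file is, because only one chunk is held at a time.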

8. Directory Scanning Engine

def scan_directory(directory, algorithm="sha256", extensions=None):

This function performs the core filesystem analysis.

Steps:

1. Walk through the directory

for root, _, files in os.walk(directory):

os.walk() recursively traverses folders.


2. Filter by extension

If the user specifies:

--ext .php .html

only those files are scanned.


3. Hash each file

file_hash = hash_file(path, algorithm)

4. Store results

Each file is saved as:

results[path] = {
    "hash": file_hash,
    "algorithm": algorithm,
    "size": size,
    "modified": mtime,  # a local name that does not shadow the time module
    "permissions": perms
}

This becomes part of the baseline database.
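The four steps can be combined into one function. This is a simplified sketch rather than FIM's actual code: it hashes inline via hashlib.new(), records only hash, algorithm, and size, lowercases extensions before comparing, and silently skips unreadable files. All of those choices are assumptions.

```python
import hashlib
import os

def scan_directory(directory, algorithm="sha256", extensions=None):
    """Walk a directory tree and return {path: {"hash", "algorithm", "size"}}."""
    # Normalize once so ".PHP" and ".php" are treated alike
    wanted = {e.lower() for e in extensions} if extensions else None
    results = {}
    for root, _, files in os.walk(directory):
        for name in files:
            if wanted and os.path.splitext(name)[1].lower() not in wanted:
                continue
            path = os.path.join(root, name)
            try:
                h = hashlib.new(algorithm)
                with open(path, "rb") as f:
                    while chunk := f.read(8192):
                        h.update(chunk)
                size = os.path.getsize(path)
            except OSError:
                # Unreadable or vanished file: skipped here, a real tool might log it
                continue
            results[path] = {
                "hash": h.hexdigest(),
                "algorithm": algorithm,
                "size": size,
            }
    return results
```

For example, scan_directory("/var/www", extensions=[".php", ".html"]) would return entries only for PHP and HTML files under that tree.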


9. Creating the Baseline

def create_baseline(directory):

This function performs Phase 1 of file integrity monitoring.

Steps:

  1. Scan the directory
  2. Generate hashes
  3. Collect metadata
  4. Store results

The baseline structure looks like:

{
  "_meta": {
    "created": "2026-03-07",
    "algorithm": "sha256",
    "file_count": 120
  },
  "files": {
    "/var/www/index.php": {
      "hash": "...",
      "size": 1200
    }
  }
}

Finally it writes this to:

baseline.json

This file becomes the trusted snapshot.
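Writing the baseline is mostly a matter of wrapping the scan results in the structure shown above and serializing it. In this sketch, save_baseline is a hypothetical helper name and the files argument is whatever the scanning step returned.

```python
import datetime
import json

def save_baseline(files, algorithm="sha256", baseline_path="baseline.json"):
    """Wrap scan results in a _meta/files structure and write them as JSON."""
    baseline = {
        "_meta": {
            "created": datetime.date.today().isoformat(),
            "algorithm": algorithm,
            "file_count": len(files),
        },
        "files": files,
    }
    with open(baseline_path, "w", encoding="utf-8") as f:
        # sort_keys gives a stable file that diffs cleanly between runs
        json.dump(baseline, f, indent=2, sort_keys=True)
    return baseline
```

A call such as save_baseline({"/var/www/index.php": {"hash": "...", "size": 1200}}) produces a file with the same shape as the example.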


10. Integrity Checking

def check_integrity(directory):

This is Phase 2.

The function:

  1. Loads baseline.json
  2. Scans current files
  3. Compares hashes

Three outcomes are detected.

Modified files

Hash mismatch.

[MODIFIED] config.php

Deleted files

File existed before but is missing now.

[DELETED] login.php

New files

File exists but not in baseline.

[NEW] backdoor.php
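The three outcomes map naturally onto set operations over the file paths. The sketch below shows that comparison logic in isolation; compare_snapshots is a hypothetical name, and both arguments are assumed to be dictionaries keyed by path with a "hash" field, like the "files" section of the baseline.

```python
def compare_snapshots(baseline_files, current_files):
    """Classify paths as modified, deleted, or new by comparing two snapshots."""
    baseline_paths = set(baseline_files)
    current_paths = set(current_files)
    return {
        # Present in both, but the fingerprint changed
        "modified": sorted(
            p for p in baseline_paths & current_paths
            if baseline_files[p]["hash"] != current_files[p]["hash"]
        ),
        # In the baseline, missing on disk
        "deleted": sorted(baseline_paths - current_paths),
        # On disk, unknown to the baseline
        "new": sorted(current_paths - baseline_paths),
    }

baseline = {"config.php": {"hash": "aaa"}, "login.php": {"hash": "bbb"}}
current = {"config.php": {"hash": "zzz"}, "backdoor.php": {"hash": "ccc"}}
print(compare_snapshots(baseline, current))
# {'modified': ['config.php'], 'deleted': ['login.php'], 'new': ['backdoor.php']}
```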

11. Exporting Reports

def _export_report():

Creates a forensic report:

fim_report_20260307_134512.txt

Report includes:

  • modified files
  • deleted files
  • new files
  • timestamps

Useful for incident response documentation.
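A report writer along these lines could be as simple as the following sketch. The function name, arguments, and report layout are assumptions; only the timestamped filename pattern comes from the example above.

```python
import datetime
import os

def export_report(changes, out_dir=".", prefix="fim_report"):
    """Write changes to a timestamped text file and return its path."""
    now = datetime.datetime.now()
    path = os.path.join(out_dir, f"{prefix}_{now:%Y%m%d_%H%M%S}.txt")
    lines = [f"FIM report generated {now:%Y-%m-%d %H:%M:%S}", ""]
    for label in ("modified", "deleted", "new"):
        for file_path in changes.get(label, []):
            lines.append(f"[{label.upper()}] {file_path}")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

Here changes is assumed to be a dictionary with "modified", "deleted", and "new" lists of paths. Embedding the timestamp in the filename means repeated exports never overwrite an earlier report.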


12. File Verification

def verify_file(filepath):

Allows checking a single file.

Example:

python fim.py /etc --verify /etc/passwd

The tool:

  1. hashes the file
  2. compares it to baseline
  3. prints result

Example output:

✔ Hash matches baseline

or

✘ Hash differs from baseline
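Single-file verification follows the same three steps. This sketch assumes the baseline uses the _meta/files layout shown earlier and that paths are stored exactly as they are passed in; the baseline_path parameter and the returned status strings are made up for the example.

```python
import hashlib
import json

def verify_file(filepath, baseline_path="baseline.json"):
    """Return 'match', 'mismatch', or 'not in baseline' for a single file."""
    with open(baseline_path, encoding="utf-8") as f:
        baseline = json.load(f)
    entry = baseline["files"].get(filepath)
    if entry is None:
        return "not in baseline"
    # Re-hash with the algorithm the baseline was built with
    h = hashlib.new(baseline["_meta"]["algorithm"])
    with open(filepath, "rb") as f:
        while chunk := f.read(8192):
            h.update(chunk)
    return "match" if h.hexdigest() == entry["hash"] else "mismatch"
```

Reusing the algorithm recorded in the baseline matters: comparing a BLAKE2b digest against a stored SHA-256 digest would always report a mismatch.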

13. Watch Mode (Continuous Monitoring)

def watch_mode(directory, interval=30):

This runs integrity checks repeatedly.

Example:

python fim.py /var/www --watch

Workflow:

Scan #1
Scan #2
Scan #3

If a file changes:

2 issues detected!

This approximates real-time monitoring.
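The polling loop behind a watch mode can be sketched as follows. To keep the example self-contained and testable, the integrity check is passed in as a callable that returns the number of issues, and a max_scans limit is added; both are assumptions, since the real watch_mode(directory, interval=30) presumably scans the directory itself and runs until interrupted.

```python
import time

def watch(check, interval=30, max_scans=None, sleep=time.sleep):
    """Call check() repeatedly, printing one status line per scan."""
    scan_no = 0
    try:
        while max_scans is None or scan_no < max_scans:
            scan_no += 1
            issues = check()
            status = "clean" if issues == 0 else f"{issues} issues detected!"
            print(f"Scan #{scan_no}: {status}")
            # Skip the final sleep when a scan limit has been reached
            if max_scans is None or scan_no < max_scans:
                sleep(interval)
    except KeyboardInterrupt:
        print("Stopped by user")
    return scan_no

# Demo with a stub check that always reports a clean state
watch(lambda: 0, interval=0, max_scans=2)
```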


14. Main Entry Point

The entry point ties the pieces together.

Steps:

  1. Print banner
  2. Parse CLI arguments
  3. Call the appropriate function

Example flow:

User command
      ↓
argparse parses flags
      ↓
init / scan / watch / verify
      ↓
FIM executes requested operation
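The argument-parsing step could be wired up roughly like this. The flag names come from this post, but how they are declared (a positional directory defaulting to the current directory, mode flags in a mutually exclusive group, the defaults) is an assumption, not a copy of the real fim.py.

```python
import argparse

def build_parser():
    """Build a parser for the flags described in this post (layout is assumed)."""
    p = argparse.ArgumentParser(prog="fim.py", description="File Integrity Monitor")
    p.add_argument("directory", nargs="?", default=".", help="directory to monitor")

    # Exactly one operating mode per invocation
    mode = p.add_mutually_exclusive_group(required=True)
    mode.add_argument("--init", action="store_true", help="create the baseline")
    mode.add_argument("--scan", action="store_true", help="compare against the baseline")
    mode.add_argument("--watch", action="store_true", help="rescan in a loop")
    mode.add_argument("--info", action="store_true", help="show baseline metadata")
    mode.add_argument("--verify", metavar="FILE", help="check a single file")

    p.add_argument("--algo", choices=["sha256", "md5", "blake2b"], default="sha256")
    p.add_argument("--ext", nargs="+", metavar="EXT", help="only scan these extensions")
    p.add_argument("--interval", type=int, default=30, help="seconds between watch scans")
    p.add_argument("--export", action="store_true", help="write a timestamped report")
    return p

args = build_parser().parse_args(["--watch", "/var/www", "--interval", "60"])
print(args.directory, args.watch, args.interval)  # /var/www True 60
```

With this layout, the main function would inspect args.init, args.scan, args.watch, args.info, and args.verify in turn and call the matching operation.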

Installation

FIM has no external dependencies — it runs on Python 3.8+ using only the standard library.

# Clone or download fim.py, then run directly:
python fim.py --help

That's it. No pip install, no virtual environment, no configuration files.


Usage Guide

Creating a Baseline

Before FIM can detect anything, you need to establish a trusted baseline of your directory. Run this once on a known-clean system:

python fim.py --init /var/www/html

This scans the directory, hashes every file with SHA-256, and saves the results to baseline.json.

With a different hashing algorithm:

python fim.py --init /etc --algo blake2b

Filtering by file type:

python fim.py --init /var/www --ext .php .html .js .py

Filtering by extension is useful when you only care about specific file types — for example, monitoring only PHP and HTML files in a web root, ignoring uploaded media assets.


Scanning for Changes

Once a baseline exists, run a scan at any time to detect changes:

python fim.py --scan /var/www/html

FIM will compare current file hashes against the baseline and print a color-coded report. Modified files appear in yellow, deleted files in red, and new files in blue.

Export the scan results to a file:

python fim.py --scan /var/www/html --export

This generates a timestamped report (e.g., fim_report_20250307_143022.txt) useful for auditing or incident documentation.


Verifying a Single File

Sometimes you just need to quickly check one specific file:

python fim.py --verify /etc/passwd

FIM hashes the file, displays its metadata, and checks whether the hash matches the baseline — all in one shot.


Viewing Baseline Info

To inspect the metadata of an existing baseline without running a full scan:

python fim.py --info .

This shows when the baseline was created, how many files are indexed, the total size, and which algorithm was used.


Continuous Watch Mode

For active monitoring, FIM can run in a loop, rescanning on a configurable interval and alerting you on the next scan after something changes:

python fim.py --watch /var/www --interval 60

This rescans every 60 seconds and prints a timestamped status line. A clean scan shows green; any detected changes show red with a count of issues. Press Ctrl+C to stop.

Watch mode is particularly useful when you're doing live incident response and want near-real-time awareness of filesystem changes.


Feature Breakdown

Feature              Flag            Description
Create baseline      --init          Hash all files and save as trusted snapshot
Scan for changes     --scan          Compare current state against baseline
Single file verify   --verify FILE   Hash and check one file immediately
Baseline info        --info          Show metadata about the saved baseline
Continuous watch     --watch         Polling-based monitoring loop
Watch interval       --interval N    Set rescan interval in seconds (default: 30)
Algorithm choice     --algo          Choose sha256 / md5 / blake2b
Extension filter     --ext           Only scan specific file extensions
Export report        --export        Save results to a timestamped .txt file

Practical Use Cases

Web server protection: Set up a baseline of /var/www and run a watch scan. Any webshell dropped by an attacker will show up as a new file on the next scan.

Config file auditing: Monitor /etc to catch unauthorized changes to SSH configs, sudoers, cron jobs, and PAM modules.

Software supply chain checks: Before deploying a software update, scan the package against a known-good baseline to verify no files were tampered with in transit.

Post-incident forensics: Run a scan against a pre-breach baseline to identify exactly which files were touched during an intrusion.

CTF challenges: In competition environments, FIM is a quick way to track what a binary or script is modifying on the filesystem.



Limitations and Honest Caveats

FIM is a practical tool, but it's not a replacement for enterprise solutions in high-stakes production environments. File integrity monitoring is most powerful when combined with logging, intrusion detection systems, and strict access controls.

The baseline.json file is only as trustworthy as the system it was created on. If an attacker already has access when you run --init, or if they subsequently modify both the target files and the baseline, FIM cannot detect the deception. In hardened environments, the baseline should be stored on read-only media, a separate host, or protected with access controls.

FIM also doesn't monitor in real time at the kernel level the way auditd or inotify-based tools do; it's snapshot-based. The --watch mode approximates real-time monitoring through polling, which introduces a detection window of up to one scan interval (plus the time a scan takes). A change that is made and reverted between two scans goes unnoticed.


Building a file integrity monitor teaches several important security concepts:

  • Cryptographic hashing for tamper detection
  • Filesystem traversal and metadata collection
  • Security baselining techniques
  • Incident detection logic
  • CLI security tooling in Python

Conclusion

FIM demonstrates that powerful security tooling doesn't require heavyweight frameworks or complex configuration. At its heart, it's a practical application of one idea: cryptographic hashes make file tampering detectable. By combining that principle with a clean terminal UI, multi-algorithm support, and a range of monitoring modes, FIM becomes a genuinely useful tool for developers, sysadmins, and security practitioners who want visibility into their filesystems without the overhead of an enterprise solution.

Security often starts with visibility. If you don't know when your files change, you don't know when your system is compromised.

The full source code is a single self-contained Python file. Drop it on any system with Python 3.8+ and you're ready to go.

