I work on a legacy PHP application that runs on AWS EC2. The application is deployed from a deploy server with rsync. In this environment, I needed a practical way to detect file tampering on application servers.
Existing tools did not fit this deployment model well, so I built a small Go tool called kekkai and open-sourced it. In this post, I want to explain not only the design choices, but also the implementation and operational details that mattered in practice.
https://github.com/catatsuy/kekkai
The environment
This application has these characteristics:
- it runs on AWS EC2
- it is a legacy PHP application
- dependencies are installed on a deploy server
- the application is deployed with `rsync`
This is a common setup for older PHP applications. I wanted a solution that fits this environment instead of assuming container images or immutable deployments.
The basic model
The model is simple.
First, the deploy server calculates hashes for files and creates a manifest. The manifest can be stored either in S3 or in a local file. Then the application server verifies local files against that manifest.
The tool has two main commands:
- `generate`: create a manifest from the current files
- `verify`: compare current files with the manifest
I wanted the data flow to stay easy to understand. The deploy server creates the trusted data, and the application server only reads it and verifies local files.
Manifest structure
The manifest contains these values for each file:
- path
- SHA-256 hash
- file size
It also contains the exclude rules used at generation time.
I wanted the manifest itself to describe what should be checked. I did not want verification behavior to depend on extra local configuration on the application server.
This is also why verify does not accept additional exclude rules. If the application server is compromised, I do not want it to be able to silently skip more files.
Why I only hash file contents
I only check file contents. I do not check timestamps or other metadata.
The reason is simple: metadata changes too easily. Normal operational work can change timestamps even when the file contents are still the same. If a tool alerts on that, it creates noisy alerts, and eventually people stop trusting the alerts.
This is also why I did not want an approach that archives the whole source tree into a tar file and hashes that tar file. A tar file can change for reasons that do not mean the application code was tampered with. I wanted the tool to fail only when the content of an actual file changed.
How generate works
The generate command walks the target directory and creates manifest entries one by one.
For regular files, it reads the file, calculates a SHA-256 hash, and stores the path, hash, and file size in the manifest.
Exclude rules are applied at this stage. I made this choice on purpose. The deploy server is the side that creates trusted data, so exclude handling should be fixed there.
After all entries are collected, the manifest is written either to a local file or to S3.
I also made generate flexible enough to work even when some excluded directories do not exist on the deploy server. That helps in real deployment environments where some paths only exist on application servers.
How verify works
The verify command loads the manifest first. Then it walks the target directory and compares each current file with the manifest entry.
It checks:
- whether the path exists in the manifest
- whether the file type matches
- whether the file size matches
- whether the calculated hash matches
It also detects files that exist in the manifest but are missing on disk.
When verification fails, the command exits with a non-zero status. It also writes error details to standard error, including the path that failed. This makes the tool easy to integrate with monitoring systems.
How symlinks are handled in Go
Symlinks needed special handling.
kekkai does not follow symlinks. Instead, it verifies the symlink itself.
The implementation is roughly like this:
- use `os.Lstat` to check whether the entry is a symlink
- use `os.Readlink` to read the target path string
- add a `symlink:` prefix to that string
- calculate the SHA-256 hash of that prefixed string
- during verification, check both the file type and the stored hash
This lets the tool detect:
- a changed symlink target path
- a type change between a regular file and a symlink
- added or removed symlinks
This design is intentional. If the symlink target path stays the same but the target file contents change, that is outside the scope of this check. I accepted that trade-off because I wanted predictable behavior and simple logic.
I also do not cache symlink verification results. The hashed input is only a short string, so the cost is small.
Why I support both S3 and local files
The manifest itself must be protected.
If an attacker can modify both the application files and the manifest, verification becomes meaningless. That is why the main production model stores the manifest in S3 instead of next to the application files.
At the same time, I also wanted local file output. Without that, even simple tests would require AWS credentials. So the tool supports both:
- S3 for production
- local files for testing and development
I also recommend being careful with local manifest output. If you deploy that manifest into the same target directory, verify can fail because the manifest itself appears as an unexpected file.
Protecting the manifest with S3 and IAM
Using S3 also makes it easier to separate permissions.
The application side only needs GetObject.
The deploy side only needs PutObject.
That separation is useful because the deploy server and the application servers have different roles. If needed, S3 features such as versioning can also help protect the manifest further.
I also recommend keeping `--base-path` fixed in production and managing it explicitly. Since `--base-path` and `--app-name` become part of the S3 object key, this helps avoid accidentally overwriting production data.
Why I chose SHA-256
For this kind of verification, I needed a hash function with the right security properties. I did not want to use a weak fast hash that would make it easier to replace a file with another input that matches the stored hash.
In security terms, the important property here is second-preimage resistance.
I considered SHA-256 and SHA-512. I chose SHA-256 because it is standard, well known, and easy to justify. I also did not see a meaningful advantage from SHA-512 for source-code-sized files in this use case.
How I reduced production load
Performance was the hardest practical problem.
Hashing a large codebase uses CPU, memory, and I/O. If the verification tool itself harms production stability, that defeats the purpose. Because of that, I added several controls.
GOMAXPROCS and workers
First, I rely on normal Go controls such as GOMAXPROCS.
kekkai also has a --workers option to control how many files are hashed in parallel. By default, it uses the same value as GOMAXPROCS.
This helps, but it is not enough. Even with one worker, the process can still keep one CPU core busy when many files are processed.
I/O rate limiting with golang.org/x/time/rate
To make the tool safer in production, I added I/O rate limiting with golang.org/x/time/rate.
Instead of only limiting concurrency, I also limit how fast the tool reads file data. This makes it possible to slow verification down on purpose and reduce the production impact.
The core idea is simple:
- create a limiter
- read file data in chunks
- wait on the limiter before each chunk
- write the chunk into the hasher
This approach gave me the most flexible control. In practice, this mattered more than worker limits alone.
kekkai exposes this through the --rate-limit option. Of course, if the value is too small, verification will become very slow, so this needs to be tuned carefully.
Cache
I also added a local cache to make repeated verification faster.
The cache stores file metadata and can skip hash calculation when mtime, ctime, and file size have not changed. Here, ctime means file change time, not creation time.
I know that metadata-based skipping is not a perfect security check by itself. That is why the cache is only an optimization layer.
There is also some risk that the cache file itself could be tampered with. Because of that, the default behavior is to recalculate hashes with a 10% probability even when the cache says the file is unchanged. This probability can be changed with --verify-probability. If it is set to 0, hash recalculation is skipped as long as the cache metadata still matches.
The cache also includes the hash of the cache file itself. If tampering is detected, the cache is disabled. Also, files under /tmp may eventually be deleted, so the cache can be rebuilt naturally over time.
Go implementation notes
I also made a few implementation choices to reduce overhead in Go itself.
When hashing many files, I do not want to allocate a new hasher for every file if I can avoid it. So I reuse hash.Hash with Reset() instead of calling sha256.New() every time.
The same idea applies to buffers. I reuse the io.CopyBuffer buffer for each worker, instead of allocating a new buffer per file.
This matters because sha256.New() is not free, and repeated allocations across many files and workers increase GC cost and cache misses.
One important detail is that hash.Hash is not goroutine-safe. So if hashing is done in parallel, each worker needs its own hasher and buffer.
How I run it on the deploy server
In production, the deploy process is implemented with shell scripts. The deploy server installs dependencies first and then runs rsync.
Because of that, I run kekkai generate at the end of the deploy script.
A typical command looks like this:
```shell
kekkai generate --target /var/www/app \
  --s3-bucket 'kekkai-test' \
  --base-path production \
  --app-name kekkai \
  --exclude ".git/**"
```
This stores the manifest as production/kekkai/manifest.json in the specified S3 bucket.
At this stage, it is important to list every directory that must be ignored, such as log directories or NFS mount points. Since exclude rules are stored in the manifest, mistakes here will affect later verification.
How I run it on application servers
The minimum command on an application server is simple:
```shell
kekkai verify --target /var/www/app \
  --s3-bucket 'kekkai-test' \
  --base-path production \
  --app-name kekkai
```
In real production, I also care about these points:
- the application server must not be able to write to S3
- I want alerts on failure
- I want to limit load on EC2
- I want to use the cache to reduce execution time
So the application side only gets s3:GetObject.
Monitoring and alerts
I run verification as a periodic check from our monitoring system.
If I alert on a single failure, I may get alerts during deployment. That would create false positives and reduce trust in alerts. So I only alert after repeated failures.
Timeout is also important. Full verification can take several minutes, so the monitoring side needs a longer timeout than a normal health check.
Why I use systemd-run
I also use systemd-run when running kekkai verify.
The reason is simple: I do not want this check to run with strong privileges or compete too aggressively with the main application.
A real example looks like this:
```shell
systemd-run --quiet --wait --pipe --collect \
  -p Type=oneshot \
  -p CPUQuota=25% -p CPUWeight=50 \
  -p PrivateTmp=no -p User=nobody \
  /bin/bash -lc \
  'nice -n 10 ionice -c2 -n7 /usr/local/bin/kekkai verify --s3-bucket kekkai-test --app-name app --base-path production --target /var/www/app --use-cache --rate-limit 10485760 2>&1'
```
There are several reasons for this setup.
- `User=nobody` makes the command run as a low-privilege user
- `nice` and `ionice` reduce CPU and I/O priority
- `CPUQuota` and `CPUWeight` reduce CPU usage further through cgroup control
- `PrivateTmp=no` is necessary if I want to use the cache in `/tmp`
That last point is easy to miss. If `PrivateTmp=no` is not set, the process gets its own private `/tmp`, and the cache file cannot be reused across runs.
Go 1.25 also matters in this context. Before Go 1.25, even when cgroup CPU limits were applied, GOMAXPROCS could still reflect the host machine's CPU count. Go 1.25 made the runtime cgroup-aware by default, so I target Go 1.25 or later in kekkai.
Alert contents
When verification fails, kekkai writes the error to standard error, including the affected path.
Some monitoring systems include standard output in notifications, so I redirect standard error to standard output when needed. That way, a notification to Slack or another channel can include the actual file path that failed verification.
This makes investigation much faster.
Production results
In production, the application has about 17,000 files including dependencies.
- manifest generation takes a few seconds
- verification takes about 4 to 5 minutes with `--rate-limit 10485760` (10 MB/s)
- with `--use-cache`, a cache hit can reduce that to about 25 seconds
- verification runs once per hour on application servers
This difference is intentional. I want generate to finish quickly as part of deployment, but I want verify to run slowly and safely on production servers. Even if verification takes about five minutes, running it once per hour is enough for this use case.
Final thoughts
I did not want to build a large security platform. I wanted a small tool that fits a specific real-world environment: a legacy PHP application on EC2, deployed with rsync, with a deploy server and application servers playing different roles.
That focus shaped both the design and the implementation: content-only hashing, strict exclude rules, explicit symlink handling, S3 and IAM for manifest protection, local cache with probabilistic re-verification, rate limiting, and safe execution with systemd-run.
If you work with a similar deployment model, this approach may be useful for you too. I have open-sourced the tool on GitHub as catatsuy/kekkai.