Introduction
In an era where data breaches and file tampering incidents make headlines, securing file transfers between systems has become a fundamental requirement in system design. While there are many mechanisms available to protect data in transit - such as encryption, digital signatures, and mutual authentication - these typically focus on who is sending or receiving the data.
But what about what is being sent? How can you ensure that a file received at the destination is exactly the same as the one sent from the source system, with zero modifications - accidental or malicious?
This is where checksums and hashes play a powerful role.
🧩 The Integrity Challenge
Even when the transport itself is secure (HTTPS, SFTP, or a VPN tunnel), a file can still be altered or corrupted:
- During staging or transit through intermediate systems.
- Due to network glitches or disk write issues.
- Because of insider threats or misconfigured scripts.
So, we need a mechanism to verify the integrity of the file, regardless of the identity or origin of the sender.
✅ Enter Checksums and Hashing
A checksum is a digital fingerprint of your file. It's generated by a mathematical algorithm (for cryptographic checksums, a hash function) that takes in a file of any size and outputs a fixed-length string. With a cryptographic hash, changing even a single bit of the file produces a completely different checksum, making it a reliable way to detect accidental corruption and, provided the reference checksum itself is trustworthy, tampering.
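To make the "single bit" claim concrete, here is a minimal Python sketch using the standard `hashlib` module. The sample bytes are made up for illustration; the point is only that two inputs differing by one bit produce unrelated digests.

```python
import hashlib

# Hypothetical file content, used only for illustration.
original = b"quarterly-report,2024,42\n"

# Flip the lowest bit of the first byte to simulate a one-bit corruption.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

digest_original = hashlib.sha256(original).hexdigest()
digest_corrupted = hashlib.sha256(corrupted).hexdigest()

print(digest_original)
print(digest_corrupted)
print("match:", digest_original == digest_corrupted)  # match: False
```

Both digests are 64 hexadecimal characters long, regardless of the input size.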
Historically, algorithms like CRC32 and MD5 were popular for checksum generation. However:
- CRC32 is fast, but its output is only 32 bits long, so collisions are easy to find. It was designed to catch accidental errors and is not cryptographically secure.
- MD5 has been broken and is vulnerable to collision attacks, making it unsuitable for security-sensitive applications.
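As a rough comparison, the sketch below computes all three values for the same input with Python's standard library (`zlib.crc32`, `hashlib.md5`, `hashlib.sha256`). The input bytes are an arbitrary example, and MD5 may be disabled on some hardened (FIPS-mode) Python builds.

```python
import hashlib
import zlib

# Arbitrary sample input, used only for illustration.
data = b"The same input, three different algorithms.\n"

crc32_hex = format(zlib.crc32(data), "08x")    # 32-bit checksum  -> 8 hex chars
md5_hex = hashlib.md5(data).hexdigest()        # 128-bit digest   -> 32 hex chars
sha256_hex = hashlib.sha256(data).hexdigest()  # 256-bit digest   -> 64 hex chars

for name, value in (("CRC32", crc32_hex), ("MD5", md5_hex), ("SHA-256", sha256_hex)):
    print(f"{name:8} {len(value) * 4:3} bits  {value}")
```

The output length matters: a 32-bit checksum has only about 4.3 billion possible values, which is why CRC32 is suited to catching random errors rather than resisting deliberate changes.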
🔐 SHA-256 - A Modern, Secure Approach
Today, SHA-256 (part of the SHA-2 family) is the industry standard for cryptographic hashing:
- It generates a 256-bit (32-byte) hash value.
- It has no known practical collision attacks and is widely used in TLS/SSL certificates, blockchain, and file integrity systems.
- It's ideal for internal systems where encryption or digital signatures might be overkill.
In our scenario, we'll explore using SHA-256 for securing file transfers between two on-prem or internal systems that are already within a trusted network.
🏗️ High-Level Flow:
- Source system computes the SHA-256 hash of the file.
- It sends both the file and its hash to the destination system.
- The destination system recomputes the SHA-256 hash of the received file.
- It compares the new hash with the original.
- If they match ✅: File is intact.
- If they differ ❌: File has been tampered with or corrupted.
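The flow above can be sketched end to end in a few lines of Python. This is an illustration under assumptions, not the scripts used later in this post: the `send` and `receive` helpers, the temporary "source" and "destination" folders, and the sample content are all hypothetical, and the "transfer" is just a local file copy.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def send(source_file: Path, destination_dir: Path) -> None:
    """Source side: hash the file, then deliver the file and its hash."""
    hash_file = destination_dir / (source_file.name + ".sha256")
    shutil.copy(source_file, destination_dir / source_file.name)
    hash_file.write_text(sha256_of_file(source_file) + "\n", encoding="ascii")


def receive(received_file: Path) -> bool:
    """Destination side: recompute the hash and compare it with the one sent."""
    hash_file = received_file.with_name(received_file.name + ".sha256")
    expected = hash_file.read_text(encoding="ascii").strip().lower()
    return sha256_of_file(received_file) == expected


with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp, "source")
    destination_dir = Path(tmp, "destination")
    source_dir.mkdir()
    destination_dir.mkdir()

    sample = source_dir / "sample.txt"
    sample.write_bytes(b"id,amount\n1,100\n")

    send(sample, destination_dir)
    received = destination_dir / "sample.txt"
    intact_result = receive(received)

    # Simulate corruption at the destination, then check again.
    received.write_bytes(b"id,amount\n1,900\n")
    corrupted_result = receive(received)

print("intact:", intact_result)               # intact: True
print("after corruption:", corrupted_result)  # after corruption: False
```

Reading the file in chunks keeps memory use flat, which matters once the files being transferred grow to gigabytes.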
🧪 Example: Generating SHA-256 Hash
You can use a variety of tools - Python, PowerShell, Linux utilities, or even C# - to generate the hash. For my use case, I had to develop PowerShell scripts:
# Save the code to a file: generate_hash.ps1
param (
    [Parameter(Mandatory = $true)]
    [string]$File,
    [string]$HashFile
)

# Check that the input exists and is a file, not a directory
if (-not (Test-Path -Path $File -PathType Leaf)) {
    Write-Error "Input file '$File' not found."
    exit 1
}

# Set default hash file if not provided
if (-not $HashFile) {
    $HashFile = "$File.sha256"
}

try {
    # Compute SHA256 hash; -ErrorAction Stop turns failures into
    # terminating errors so the catch block below actually runs
    $hash = Get-FileHash -Path $File -Algorithm SHA256 -ErrorAction Stop
    $hash.Hash | Out-File -FilePath $HashFile -Encoding ascii -Force
    Write-Output "SHA256 hash written to '$HashFile'"
    exit 0
}
catch {
    Write-Error "Failed to generate hash: $_"
    exit 2
}
# Sample Usage
PS D:\GithubRepo\blog_posts\file_hashing> .\generate_hash.ps1 -File .\sample.txt
SHA256 hash written to '.\sample.txt.sha256'
PS D:\GithubRepo\blog_posts\file_hashing> cat .\sample.txt.sha256
C8447B1EC04E6D24B5B07D40126F30D7F6295B1F95E450EF64E4DEB627CADBC6
🔍 Verifying SHA-256 Hash on the Destination
Once the file is received, verification is straightforward.
# Save the code to a file: verify_hash.ps1
param (
    [string]$File,
    [string]$HashFile
)

if (-not $File -or -not $HashFile) {
    Write-Output "Usage: .\verify_hash.ps1 -File <input_file> -HashFile <hash_file>"
    exit 1
}
if (-not (Test-Path $File)) {
    Write-Output "File '$File' not found."
    exit 1
}
if (-not (Test-Path $HashFile)) {
    Write-Output "Hash file '$HashFile' not found."
    exit 1
}

# Compute current hash
$computedHash = (Get-FileHash -Path $File -Algorithm SHA256).Hash.Trim()

# Read expected hash (first line only; an empty file yields an empty string)
$expectedHash = Get-Content $HashFile | Select-Object -First 1
$expectedHash = "$expectedHash".Trim()

# Compare, and return an exit code that calling jobs can act on
if ($computedHash.ToUpper() -eq $expectedHash.ToUpper()) {
    Write-Output "Hash verified successfully."
    exit 0
} else {
    Write-Output "Hash mismatch!"
    Write-Output "Expected: $expectedHash"
    Write-Output "Actual: $computedHash"
    exit 2
}
# Sample Usage
PS D:\GithubRepo\blog_posts\file_hashing> .\verify_hash.ps1 -File .\sample.txt -HashFile .\sample.txt.sha256
Hash verified successfully.
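If the destination is not a Windows machine, the same check is easy to reproduce elsewhere. The Python sketch below is one possible equivalent, not part of the scripts above; the helper names are hypothetical. It normalizes case because `Get-FileHash` emits uppercase hex while `hashlib` and `sha256sum` emit lowercase, and it takes only the first token of the line so that the `sha256sum` layout (`digest  filename`) is accepted too.

```python
import hashlib
import tempfile
from pathlib import Path


def read_expected_hash(hash_file: Path) -> str:
    """Return the digest from the first line of a hash file, lower-cased."""
    first_line = hash_file.read_text(encoding="ascii").splitlines()[0]
    # Take the first token so "DIGEST" and "digest  filename" both work.
    return first_line.split()[0].lower()


def file_matches(file_path: Path, hash_file: Path) -> bool:
    """Recompute the SHA-256 of file_path and compare it with hash_file."""
    actual = hashlib.sha256(file_path.read_bytes()).hexdigest()
    return actual == read_expected_hash(hash_file)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp, "sample.txt")
    sample.write_bytes(b"hello from the source system\n")
    digest = hashlib.sha256(sample.read_bytes()).hexdigest()

    # Mimic the PowerShell output: uppercase hex followed by CRLF.
    ps_style = Path(tmp, "sample.txt.sha256")
    ps_style.write_bytes(digest.upper().encode("ascii") + b"\r\n")

    # Mimic the sha256sum output: lowercase hex, two spaces, file name.
    gnu_style = Path(tmp, "sample.gnu.sha256")
    gnu_style.write_text(f"{digest}  sample.txt\n", encoding="ascii")

    ps_ok = file_matches(sample, ps_style)
    gnu_ok = file_matches(sample, gnu_style)

print("PowerShell-style hash file:", ps_ok)  # True
print("sha256sum-style hash file:", gnu_ok)  # True
```

This version reads the whole file into memory for brevity; for large files, feed the hash in chunks instead.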
🔒 What About Security?
While this approach verifies integrity, on its own it does not provide:
- Confidentiality - File content is still in plaintext unless encrypted separately.
- Authentication - Anyone can generate a hash if they get the file, so an attacker who can modify the file in transit can also replace the hash that travels with it.
If your systems operate in an untrusted network or involve external parties, consider:
- Using HMAC-SHA256 (a hash-based message authentication code computed with a shared secret key)
- Digitally signing the hash using a private key
- Encrypting the payload with PGP, or the channel with TLS
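For the HMAC option, a minimal Python sketch with the standard `hmac` module looks like this. The key and payloads are placeholders invented for the example; in practice the secret would come from a vault or protected configuration and would never be sent alongside the file.

```python
import hashlib
import hmac

# Hypothetical shared secret, for illustration only.
secret_key = b"demo-key-do-not-use-in-production"


def sign(data: bytes, key: bytes) -> str:
    """Return the HMAC-SHA256 tag of data as lowercase hex."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, received_tag: str) -> bool:
    """Check a received tag using a constant-time comparison."""
    return hmac.compare_digest(sign(data, key), received_tag.lower())


payload = b"id,amount\n1,100\n"
tag = sign(payload, secret_key)

tampered = b"id,amount\n1,900\n"
# Without the key, an attacker can only recompute a plain SHA-256.
forged_tag = hashlib.sha256(tampered).hexdigest()

genuine_ok = verify(payload, secret_key, tag)
tampered_ok = verify(tampered, secret_key, tag)
forged_ok = verify(tampered, secret_key, forged_tag)

print("genuine payload:", genuine_ok)             # True
print("tampered payload, old tag:", tampered_ok)  # False
print("tampered payload, forged tag:", forged_ok) # False
```

Unlike a plain hash, the tag cannot be regenerated by someone who only has the file, which is what closes the authentication gap described above.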
⚙️ When Is This Approach Ideal?
Use SHA-256 checksum validation when:
- Systems are internal, trusted, and firewalled.
- You only care about integrity, not identity or secrecy.
- Lightweight and fast verification is a priority.
This can be especially useful in:
- Automated data pipelines
- ETL jobs transferring CSV/JSON/XML files
- Batch file movement between microservices or legacy systems
📦 Bonus Tip: Automate the Workflow
- Generate the hash on the source during export.
- Transmit both .data and .data.sha256.
- Use scheduled jobs or file watchers to verify and move files to staging or quarantine based on checksum match.
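A scheduled job for the last step could look roughly like the Python sketch below. The folder names (`incoming`, `staging`, `quarantine`), the `process_incoming` helper, and the sample files are assumptions made for this illustration. It also assumes each hash file arrives only after its data file has been fully written; a real pipeline would need its own way to guarantee that.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def process_incoming(incoming: Path, staging: Path, quarantine: Path) -> dict:
    """Move each verified file to staging and each mismatch to quarantine."""
    results = {}
    for hash_file in sorted(incoming.glob("*.sha256")):
        data_file = hash_file.with_suffix("")  # "x.data.sha256" -> "x.data"
        if not data_file.exists():
            continue  # data file not delivered yet; retry on the next run
        tokens = hash_file.read_text(encoding="ascii").split()
        expected = tokens[0].lower() if tokens else ""
        target = staging if sha256_of_file(data_file) == expected else quarantine
        shutil.move(str(data_file), str(target / data_file.name))
        shutil.move(str(hash_file), str(target / hash_file.name))
        results[data_file.name] = target.name
    return results


with tempfile.TemporaryDirectory() as tmp:
    incoming, staging, quarantine = (
        Path(tmp, name) for name in ("incoming", "staging", "quarantine")
    )
    for folder in (incoming, staging, quarantine):
        folder.mkdir()

    # One intact file and one whose content changed after hashing.
    good = incoming / "good.data"
    good.write_bytes(b"intact content\n")
    Path(str(good) + ".sha256").write_text(sha256_of_file(good) + "\n", encoding="ascii")

    bad = incoming / "bad.data"
    bad.write_bytes(b"original content\n")
    Path(str(bad) + ".sha256").write_text(sha256_of_file(bad) + "\n", encoding="ascii")
    bad.write_bytes(b"modified content\n")

    results = process_incoming(incoming, staging, quarantine)
    staged = sorted(p.name for p in staging.iterdir())
    quarantined = sorted(p.name for p in quarantine.iterdir())

print(results)  # {'bad.data': 'quarantine', 'good.data': 'staging'}
```

Moving the hash file together with its data file keeps the evidence in one place when someone later investigates a quarantined transfer.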
🧭 Final Thoughts
SHA-256-based file integrity verification offers a simple, reliable, and effective way to secure file transfers in controlled environments. It doesn't replace encryption or authentication but serves as a complementary measure that gives you confidence your files haven't been tampered with.