DEV Community

Radhika

10 Non-Cryptographic Hashing Use Cases (MD5/Similar) in PHP

The Message-Digest Algorithm 5 (MD5) is a hash function that produces a 128-bit digest, usually written as 32 hexadecimal characters. MD5 is considered cryptographically broken: practical collision attacks rule it out for digital signatures, and its speed makes it a poor choice for password hashing. It still has valid, non-critical use cases in PHP.

Cache Key Generation

Hashing a complex database query string, a URL, or an object's state to generate a short, consistent key for storing and retrieving the result in a memory cache (like Redis or Memcached).

This example uses MD5 to turn a complex data structure (like an array of query parameters) into a short, consistent cache key.

PHP

function generate_cache_key(array $data): string {
    // Convert the array to a consistent string format (JSON is good for this)
    $data_string = json_encode($data);

    // Hash the string to get a short, fixed-length key
    return 'cache_' . md5($data_string);
}

$query_params = ['user_id' => 101, 'page' => 2, 'sort_by' => 'name'];
$cache_key = generate_cache_key($query_params);

echo "Cache Key: " . $cache_key . "\n";
// Prints "Cache Key: cache_" followed by 32 hexadecimal characters

Notes:

Purpose: Provides a deterministic fingerprint for a large or complex input, which is essential for cache lookups. The same input must always produce the same key.

Why MD5 is OK: Cache keys are non-sensitive, and an accidental MD5 collision between two real inputs is vanishingly unlikely. A deliberate collision would at worst return the wrong cached entry, which is usually low severity. If keys are built from untrusted input and serving the wrong entry would matter, use a stronger hash such as SHA-256.

Implementation Detail: Always serialize the input consistently (e.g., sort array keys before JSON encoding). Otherwise the same logical input can produce different hashes, and therefore cache misses.
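The serialization point can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names normalize_for_hash() and canonical_cache_key() are hypothetical, and it assumes the input is a plain nested array that json_encode() can handle.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: recursively sort array keys so that logically
// equal inputs serialize to the same JSON string. Lists (keys 0..n-1)
// are already in key order, so ksort() leaves their order unchanged.
function normalize_for_hash(array $data): array {
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $data[$key] = normalize_for_hash($value);
        }
    }
    ksort($data);
    return $data;
}

// Hypothetical helper: cache key built from the normalized input.
function canonical_cache_key(array $data): string {
    $json = json_encode(normalize_for_hash($data), JSON_THROW_ON_ERROR);
    return 'cache_' . md5($json);
}

$a = ['user_id' => 101, 'page' => 2, 'sort_by' => 'name'];
$b = ['sort_by' => 'name', 'page' => 2, 'user_id' => 101];

// Same parameters in a different order now map to the same key.
var_dump(canonical_cache_key($a) === canonical_cache_key($b)); // bool(true)

// Without normalization the JSON strings differ, so the hashes differ.
var_dump(md5(json_encode($a)) === md5(json_encode($b))); // bool(false)
```

Sorting before encoding means callers do not have to agree on parameter order, which is the usual source of avoidable cache misses.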

Gravatar (or Similar Avatar) Generation

Hashing a user's email address to create a globally unique ID for fetching their avatar image from a service like Gravatar. The hash is intentionally exposed and not considered sensitive.

MD5 is used to hash an email address, which is the standard practice for generating a Gravatar URL.

PHP

function get_gravatar_url(string $email, int $size = 80): string {
    // Email must be trimmed and converted to lowercase before hashing
    $clean_email = trim(strtolower($email));
    $hash = md5($clean_email);

    // Construct the Gravatar URL
    return "https://www.gravatar.com/avatar/{$hash}?s={$size}";
}

$user_email = "User.Name@Example.com ";
$avatar_url = get_gravatar_url($user_email);

echo "Avatar URL: " . $avatar_url . "\n";
// Prints "Avatar URL: https://www.gravatar.com/avatar/<32 hexadecimal characters>?s=80"

Notes:

Purpose: Creates a unique, public identifier derived from a user's email address.

Why MD5 is OK: The hash is meant to be public and appears in the URL. It is not a way to hide the address, though: anyone who already knows or can guess an email can compute the same hash and confirm that it matches. The hash is simply the identifier Gravatar uses to link an email to a profile.

Requirement: Hashing must be consistent (trimmed and lowercase) to ensure the same email always resolves to the same image.

File Naming for Uploads

Using the hash of a file's content (or a combination of the user ID and timestamp) to give an uploaded file a unique, non-conflicting name on the server's file system, preventing users from overwriting each other's files.

Hashing the file's binary content ensures that even if two users upload files with the same name, they won't conflict if their contents are different.

PHP

$uploaded_file_path = '/tmp/user_upload_12345.jpg';

// Calculate the hash of the file's content
$content_hash = md5_file($uploaded_file_path);

// Take the extension from the original file name (hard-coded here for the example;
// in a real upload handler it comes from the client and should be validated)
$extension = pathinfo('original_file_name.jpg', PATHINFO_EXTENSION);

// Create a unique filename based on the hash
$new_filename = "{$content_hash}.{$extension}";

echo "New Filename: " . $new_filename . "\n";
// Output example: New Filename: 4894d30626354b39527f4d2222383c27.jpg

Notes:

Purpose: Gives each distinct file content its own file name on the server's storage system. Two uploads with identical content map to the same name, which is harmless here because the bytes are the same.

PHP Function: Uses md5_file(), which reads the file as a stream instead of loading the entire content into memory, making it suitable for large files.

Alternative: Prefix the hash with a user ID or timestamp if every upload needs its own name even when the content is identical. For most practical purposes the content hash alone is distinctive enough.

Static Asset Versioning (Cache Busting)

Hashing the content of a static file (e.g., a CSS or JavaScript bundle) and appending it to the filename (e.g., app.1a2b3c4d.css). When the file content changes, the hash changes, forcing the user's browser to download the new version instead of using a cached old version.

This demonstrates how a hash of the asset's content is used in the URL to "bust" the browser cache when the file changes.

PHP

$css_file_path = '/var/www/html/assets/style.css';
$css_hash = md5_file($css_file_path);

// In your HTML template, you would use this path:
$asset_url = "/assets/style.{$css_hash}.css";

echo "Cache-Busted URL: " . $asset_url . "\n";
// Output example: Cache-Busted URL: /assets/style.90c8b3d68a2f5e4e8c1b6a7d5f0e3c1a.css

Notes:

Purpose: Ensures users always load the latest version of a file (CSS, JS, images) when its content has changed.

Mechanism: Browsers cache files based on their URL. By embedding the hash, a change in the file content results in a new URL, forcing the browser to perform a new download.

Why MD5 is OK: This is a speed/convenience optimization, not a security feature. The file content is already public, so the hash is non-sensitive.
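Embedding the hash in the file name only works if a file with that name actually exists, or if the web server rewrites the request back to the real file. A variant that needs no server configuration is to append the hash as a query string. The sketch below assumes that approach; versioned_asset_url() and its parameters are hypothetical names, and the demo writes a throwaway file to the system temp directory.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the asset path with a short content hash
// appended as a query string, e.g. "/assets/style.css?v=<8 hex chars>".
function versioned_asset_url(string $doc_root, string $asset_path, int $hash_len = 8): string {
    $file = rtrim($doc_root, '/') . $asset_path;
    if (!is_file($file)) {
        return $asset_path; // missing file: fall back to the plain path
    }
    return $asset_path . '?v=' . substr(md5_file($file), 0, $hash_len);
}

// Demo with a throwaway file standing in for a real stylesheet.
$root = rtrim(sys_get_temp_dir(), '/');
$path = '/demo_style_' . bin2hex(random_bytes(4)) . '.css';

file_put_contents($root . $path, 'body { color: black; }');
$before = versioned_asset_url($root, $path);

file_put_contents($root . $path, 'body { color: navy; }');
$after = versioned_asset_url($root, $path);

echo $before, "\n", $after, "\n";
var_dump($before !== $after); // bool(true): new content, new URL

unlink($root . $path);
```

A truncated hash is enough here because the only requirement is that the URL changes when the content does.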

Detecting Duplicate Uploads

Calculating the hash of a file before or during upload and checking if that hash already exists in the database. This prevents storing multiple identical copies of the same image or document.

The example below checks whether an uploaded file (referenced by its temporary path) is already in the system.

PHP

$uploaded_file_path = '/tmp/new_upload.pdf';

// 1. Calculate the hash of the new file
$file_hash = md5_file($uploaded_file_path);

// 2. Simulate checking the database
$db_records = [
    'a1b2c3d4e5f60718293a4b5c6d7e8f90' => 'document_A.pdf', // placeholder value (32 hex characters)
    // ... other files
];

if (array_key_exists($file_hash, $db_records)) {
    echo "DUPLICATE FOUND! File already exists as: " . $db_records[$file_hash] . "\n";
} else {
    echo "New unique file. Hash is: " . $file_hash . "\n";
    // Code to save the file and hash to DB here...
}

Notes:

Purpose: Saves storage space and avoids unnecessary processing by immediately identifying redundant uploads.

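Collision Risk: The chance of two different, honestly produced files sharing an MD5 hash by accident is statistically negligible. Deliberately colliding files can be constructed, however, so if untrusted users could benefit from one file being mistaken for another, use SHA-256 (e.g., hash_file('sha256', $path)) instead.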

Efficiency: Calculating the hash and performing a database index lookup is much faster than running complex content analysis or comparison.

URL Shortening

Using the hash of a long URL as the basis for a short identifier. A simple sequential ID is often preferred, but a hash lets you derive the short code directly from the URL without a counter.

Hashing the long URL to produce a short segment. The full 32-character hash is too long for a short link, so the example keeps only the first 8 characters. Truncation makes collisions realistic, so uniqueness is not guaranteed and the code must be checked against existing entries.

PHP

function shorten_url(string $long_url): string {
    // Take the first 8 characters of the hash for a shorter key
    return substr(md5($long_url), 0, 8);
}

$long_url = "https://www.example.com/very/long/article?id=12345&user=abc";
$short_key = shorten_url($long_url);

echo "Shortened Key: " . $short_key . "\n";
// Prints "Shortened Key: " followed by 8 hexadecimal characters

Notes:

Purpose: Converts a long, variable-length string into a fixed-length string to create a compact identifier.

Practicality: A hash gives the same key for the same URL, but it does not guarantee different keys for different URLs, especially after truncation. Most production URL shorteners use a sequential ID or a random string instead, for better control over key length and database performance.

Security: This is fine as the URL content is typically public.
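An 8-character hex prefix has only 16^8 (about 4.3 billion) possible values, so by the birthday bound a collision becomes more likely than not after roughly 77,000 stored URLs. One way to cope, sketched below, is to check each code against existing entries and lengthen the prefix when a different URL already owns it. The store_short_url() function and the in-memory $table are hypothetical stand-ins for a database table with a unique index.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: assigns a short code to $long_url, storing the
// mapping in $table (short code => long URL). On a collision with a
// different URL, it retries with a longer prefix of the same hash.
function store_short_url(array &$table, string $long_url, int $min_len = 8): string {
    $hash = md5($long_url);
    for ($len = $min_len; $len <= strlen($hash); $len++) {
        $code = substr($hash, 0, $len);
        if (!isset($table[$code])) {
            $table[$code] = $long_url; // free slot: claim it
            return $code;
        }
        if ($table[$code] === $long_url) {
            return $code; // this URL was stored earlier
        }
        // A different URL owns this code: lengthen the prefix and retry.
    }
    throw new RuntimeException('Full MD5 collision; cannot assign a code.');
}

$table = [];
$first = store_short_url($table, 'https://www.example.com/very/long/article?id=12345');
$again = store_short_url($table, 'https://www.example.com/very/long/article?id=12345');
var_dump($first === $again); // bool(true): same URL, same code

// Force collisions with a 1-character prefix: 20 URLs cannot fit into
// 16 one-character codes, so some codes must come out longer.
$tiny = [];
for ($i = 0; $i < 20; $i++) {
    store_short_url($tiny, "https://www.example.com/page/{$i}", 1);
}
var_dump(count($tiny)); // int(20): every URL still got its own code
```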

Data Integrity Check (Non-Security Critical)

Calculating a hash (checksum) for a large file or data transfer and comparing it after transmission to confirm that no bits were flipped due to a network error. This is for accidental corruption only.

The example generates a checksum for a data block before saving it, then verifies that the block was not corrupted when retrieved later.

PHP

$data = "The quick brown fox jumps over the lazy dog.";

// Save the data and its checksum
$original_checksum = md5($data);
// ... Data is stored/transferred ...

// Later, retrieve the data and check it
$retrieved_data = "The quick brown fox jumps over the lazy dog."; 
$retrieved_checksum = md5($retrieved_data);

if ($original_checksum === $retrieved_checksum) {
    echo "Data integrity verified. Checksums match.\n";
} else {
    echo "ERROR: Data corruption detected!\n";
}

Notes:

Purpose: To quickly detect accidental changes, like bit rot on a disk or an error during an internal data transfer between services.

Limitation: It is a weak integrity check. Because of the MD5 collision vulnerability, you should not use this to prove that a file has not been maliciously altered. Stronger hashes like SHA-256 are preferred for public integrity verification.
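Switching to SHA-256 is a small change in PHP, because the built-in hash() function takes the algorithm name as its first argument. The sketch below wraps it in a hypothetical checksum() helper so the algorithm can be chosen in one place; the tampered string is made up for the demonstration.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: checksum with a configurable algorithm.
// hash('md5', $data) returns the same value as md5($data).
function checksum(string $data, string $algo = 'sha256'): string {
    return hash($algo, $data);
}

$data = "The quick brown fox jumps over the lazy dog.";
$stored_checksum = checksum($data);

// Later: an intact copy and a copy with one character changed.
$intact   = "The quick brown fox jumps over the lazy dog.";
$tampered = "The quick brown fox jumps over the lazy cog.";

var_dump(checksum($intact) === $stored_checksum);   // bool(true)
var_dump(checksum($tampered) === $stored_checksum); // bool(false)

echo strlen($stored_checksum), "\n"; // 64 hex characters for SHA-256
```

For files, hash_file('sha256', $path) plays the same role that md5_file($path) plays for MD5.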

Database Index/Sharding Key

Hashing a key (like a user ID) to determine which database shard or partition the data should be stored on, ensuring a balanced distribution across multiple servers.

Using the hash to determine the target table or shard for a record, often by taking the modulus of the hash's numeric representation.

PHP

$user_id = 456789;
$num_shards = 4;

// Convert the user ID (or a hash of it) to a numeric value
// We'll use the user ID directly for simplicity, but a hash can be used.
$shard_index = ($user_id % $num_shards);

$table_name = "user_data_shard_" . $shard_index;

echo "User ID {$user_id} belongs to table: " . $table_name . "\n";
// Output example: User ID 456789 belongs to table: user_data_shard_1

Notes:

Purpose: Ensures even distribution of data across a set of partitioned (sharded) databases, preventing single database servers from becoming bottlenecks.

Hashing Role: Hashing a key (like a username or object ID) before applying the modulus operation provides a more random, uniform distribution than using a simple ID.

Requirement: The hashing function must be consistently used for both read and write operations.
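The example above applies the modulus to the raw ID. A hash-based variant might look like the sketch below, which works for string keys such as usernames as well. shard_for_key() is a hypothetical name, and using the first 7 hex digits of the MD5 is an assumption made so the value fits in a PHP integer even on 32-bit builds.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: maps any string key to a shard index in
// the range 0 .. $num_shards - 1.
function shard_for_key(string $key, int $num_shards): int {
    if ($num_shards < 1) {
        throw new InvalidArgumentException('num_shards must be at least 1');
    }
    // First 7 hex digits = 28 bits, small enough for any PHP int.
    $bucket = hexdec(substr(md5($key), 0, 7));
    return $bucket % $num_shards;
}

// The same key always lands on the same shard.
var_dump(shard_for_key('alice', 4) === shard_for_key('alice', 4)); // bool(true)

// Count how 1000 sample keys spread across 4 shards.
$counts = array_fill(0, 4, 0);
for ($i = 0; $i < 1000; $i++) {
    $counts[shard_for_key("user_{$i}", 4)]++;
}
print_r($counts);
```

Note that with plain modulus sharding, changing the number of shards reassigns most keys, so the shard count has to be treated as part of the scheme.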

Message Queue Deduplication

Hashing the content of a message being sent to a queue to quickly check if a message with the identical content has already been processed or is currently waiting, preventing duplicate processing.

Hashing the message content to check for duplicates before adding it to a queue.

PHP

$message_content = json_encode(['action' => 'send_email', 'target' => 'alice@example.com']);

// Generate a unique fingerprint for the message
$message_fingerprint = md5($message_content);

// Simulate checking if the fingerprint exists in a set/queue lookup
$processed_fingerprints = [
    'a5b4c3d2e1f0a9b8c7d6e5f4a3b2c1d0'
];

// Strict comparison: hex strings such as "0e12..." must not be compared as numbers
if (!in_array($message_fingerprint, $processed_fingerprints, true)) {
    echo "Message is unique. Adding to queue with fingerprint: " . $message_fingerprint . "\n";
    // Code to add to queue and processed_fingerprints array
} else {
    echo "Message is a duplicate. Skipping.\n";
}

Notes:

Purpose: A fast filter to prevent systems from performing redundant work (e.g., sending the same email twice).

Requirement: Like caching, the message content must be serialized consistently (sorted keys, etc.) to ensure that two identical logical messages produce the same hash.

Why MD5 is OK: It's used for speed and consistency. An accidental collision, which would cause a message to be wrongly skipped, is extremely unlikely, and the fingerprint is not relied on as a security control.
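Searching a list with in_array() gets slower as the list grows. A common alternative is to store fingerprints as array keys, so the lookup is a single isset() call. The sketch below assumes flat messages (only top-level keys are sorted) and uses plain arrays as stand-ins for the queue and the fingerprint store; message_fingerprint() and enqueue_once() are hypothetical names.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: fingerprint that ignores top-level key order.
function message_fingerprint(array $message): string {
    ksort($message);
    return md5(json_encode($message, JSON_THROW_ON_ERROR));
}

// Hypothetical helper: adds $message to $queue unless an identical
// message was seen before. Returns true if it was enqueued.
function enqueue_once(array &$seen, array &$queue, array $message): bool {
    $fingerprint = message_fingerprint($message);
    if (isset($seen[$fingerprint])) {
        return false; // duplicate: skip
    }
    $seen[$fingerprint] = true; // fingerprint stored as a key
    $queue[] = $message;
    return true;
}

$seen = [];
$queue = [];

var_dump(enqueue_once($seen, $queue, ['action' => 'send_email', 'target' => 'alice@example.com'])); // bool(true)
var_dump(enqueue_once($seen, $queue, ['target' => 'alice@example.com', 'action' => 'send_email'])); // bool(false)
var_dump(enqueue_once($seen, $queue, ['action' => 'send_email', 'target' => 'bob@example.com']));   // bool(true)
var_dump(count($queue)); // int(2)
```

In a real system the $seen set would typically live in a shared store with an expiry, so that fingerprints do not accumulate forever.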

Generating Unique Session IDs (Weak/Legacy)

While not recommended for modern, secure systems, MD5 was historically used to combine a user's IP address, timestamp, and a random number to generate a relatively unique, non-guessable session identifier before better, cryptographically secure methods were standard.

⚠️ This is a discouraged practice. This code shows how MD5 was historically used by combining various environmental factors to create a unique ID.

PHP

// ⚠️ NOTE: Use PHP's built-in session functions or a secure library instead.
$remote_ip = '203.0.113.45'; // $_SERVER['REMOTE_ADDR']
$timestamp = time();
$random_num = bin2hex(random_bytes(16)); // Secure random number

// Combine inputs and hash them
$session_data = $remote_ip . $timestamp . $random_num;
$legacy_session_id = md5($session_data);

echo "Legacy Session ID (DO NOT USE): " . $legacy_session_id . "\n";
// Output example: Legacy Session ID (DO NOT USE): 53a2f8b17b2d4c0e6f9a3d1c2b0a9e8d

Notes:

Historical Context: MD5 was once a common, fast way to derive a seemingly random, fixed-length string from varying inputs.

Modern Risk: Hashing adds no unpredictability. If the inputs (IP address, timestamp, a weak random number) can be guessed, an attacker can recompute the same MD5 and predict the session ID.

Modern Solution: Always use PHP's native session management or cryptographically secure functions like random_bytes() combined with bin2hex() for generating tokens, which are designed to be unpredictable and resistant to attacks.
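For comparison, the modern approach needs no hashing at all. A minimal sketch, assuming a 32-byte token is long enough for the application (generate_token() is a hypothetical name):

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns a hex token built from
// cryptographically secure random bytes (2 hex characters per byte).
function generate_token(int $bytes = 32): string {
    return bin2hex(random_bytes($bytes));
}

$token = generate_token();

echo strlen($token), "\n";      // 64
var_dump(ctype_xdigit($token)); // bool(true): hex characters only
```

Because all of the token's unpredictability comes from random_bytes(), mixing in the IP address or timestamp and hashing the result would add nothing.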

Conclusion

The analysis of MD5 use cases reveals a clear distinction between its historical application and its appropriate, non-cryptographic role in modern PHP and web development.

While MD5 is cryptographically broken and must never be used for security-critical functions like password hashing, its speed and fixed-length output make it perfectly suited for generating non-sensitive, deterministic fingerprints (checksums).

The ten discussed applications—from Cache Key Generation and File Naming to Asset Versioning and Deduplication—demonstrate that MD5 retains significant value as a fast utility tool for operational efficiency and data organization.

For website development, understanding this distinction is crucial for building robust and performant applications. Experienced developers can use non-secure hashes for:

Performance Optimization: Creating fast cache keys to improve application response times.

Infrastructure Management: Generating unique file identifiers to manage large volumes of user uploads efficiently.

Deployment Reliability: Using hashes in asset versioning to ensure clients always see the latest code, eliminating common caching issues.

By following industry best practices, using deliberately slow, salted algorithms like Bcrypt for security (passwords) and fast, non-secure algorithms like MD5 or CRC32 for utility (caching), developers can deliver secure, high-performing websites that meet contemporary standards.
