Patryk Sowiński

Posted on Apr 22 • Edited on May 30

Architecture vs Brute-Force: What I Learned Benchmarking KDFs for my Thesis

#security #cryptography #performance #thesis

Choosing a key derivation function (KDF) is a trade-off between the time a legitimate user waits for verification and the cost an attacker pays to brute-force a password. For my bachelor’s thesis project, Vaulton, I wanted to look past theoretical recommendations and gather empirical data on how these algorithms perform across different hardware tiers.

I benchmarked Argon2id, Bcrypt, and PBKDF2 on devices ranging from an Intel integrated GPU (iGPU) to an RTX 5080 to find where each one breaks down in practice. This post covers the data from those tests, the hardware bottlenecks I identified, and how different architectures respond to memory-hard functions.

The Hardness of the Problem

PBKDF2-SHA256: A NIST-recommended algorithm that is purely compute-bound. Its biggest weakness today is its negligible memory requirement, which makes it highly parallelizable on GPUs.
Bcrypt: Based on the Blowfish cipher, it needs about 4 KB of fast memory for its S-boxes, which complicates efficient GPU implementation. It remains a competitive option, but it is limited by its age and its fixed memory footprint, which cannot be raised as attacker hardware improves.
Argon2id: The hybrid variant of Argon2, winner of the Password Hashing Competition (2015). It is designed to be memory-hard, forcing attackers to dedicate physical RAM to every parallel guess, which significantly limits the parallelism of GPUs and ASICs.
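The low-memory point about PBKDF2 is easy to see in code. Below is a minimal Python sketch (standard library only) that computes the first 32-byte output block of PBKDF2-HMAC-SHA256 by hand, following RFC 8018, and checks it against `hashlib.pbkdf2_hmac`. Between iterations each guess carries only a running XOR and the previous HMAC output (32 bytes each), which is why a GPU can run a very large number of guesses side by side. The helper name `pbkdf2_sha256_block1` is illustrative; Bcrypt and Argon2id are not in the standard library and are not shown.

```python
import hashlib
import hmac


def pbkdf2_sha256_block1(password: bytes, salt: bytes, iterations: int) -> bytes:
    """First 32-byte block of PBKDF2-HMAC-SHA256 (RFC 8018, section 5.2), by hand."""
    # U_1 = HMAC(P, S || INT(1)); INT(1) is the block index as 4 big-endian bytes
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    acc = int.from_bytes(u, "big")  # running XOR of all U_i
    for _ in range(iterations - 1):
        # U_i = HMAC(P, U_{i-1}); only this 32-byte value and `acc` carry over
        u = hmac.new(password, u, hashlib.sha256).digest()
        acc ^= int.from_bytes(u, "big")
    return acc.to_bytes(32, "big")


if __name__ == "__main__":
    pw, salt = b"correct horse battery staple", b"example-salt"
    ours = pbkdf2_sha256_block1(pw, salt, 10_000)
    # hashlib's dklen defaults to the digest size (32 bytes for SHA-256)
    print(ours == hashlib.pbkdf2_hmac("sha256", pw, salt, 10_000))  # True
```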

Test Environment

Benchmarks were performed across three distinct hardware configurations representing various levels of compute power and memory architectures.

ID	GPU	GPU Memory (type)	CPU
System 1	Intel® Arc™ Graphics (iGPU)	8018 MB (shared)	Intel® Core™ Ultra 5 125H
System 2	NVIDIA GeForce GTX 1660 Ti	6143 MB (dedicated VRAM)	Intel® Core™ i5-9300HF
System 3	NVIDIA GeForce RTX 5080	16302 MB (dedicated VRAM)	Intel® Core™ Ultra 7 265K

Note: Attacker throughput was measured using Hashcat 7.1.2 (offline attack model, single device).

Methodology: Normalizing for User Cost

When choosing parameters for a password manager, the goal is to maximize security while keeping the user experience consistent. I tuned each algorithm's parameters so that, within each of three "UX profiles" (Low, Medium, and High), all three algorithms take roughly the same verification time on a mid-range laptop (System 2).

Profile	Algorithm	Hashcat mode	Parameters	Verification Time (System 2)
Low	PBKDF2	10900	250,000 iterations	0.157s
	Bcrypt	3200	Cost Factor 11	0.186s
	Argon2id	34000	64 MB RAM	0.178s
Medium	PBKDF2	10900	500,000 iterations	0.315s
	Bcrypt	3200	Cost Factor 12	0.371s
	Argon2id	34000	128 MB RAM	0.370s
High	PBKDF2	10900	1,000,000 iterations	0.628s
	Bcrypt	3200	Cost Factor 13	0.743s
	Argon2id	34000	256 MB RAM	0.741s

Note: All Argon2id tests used 3 iterations and a parallelism factor of 1.
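Because PBKDF2's cost grows linearly with its iteration count (the table shows 0.157 s, 0.315 s and 0.628 s as iterations double), a single timed probe is enough to calibrate it to a target verification time. The sketch below assumes Python's `hashlib.pbkdf2_hmac`; the function name, probe size and rounding are arbitrary choices. Bcrypt's cost factor is the base-2 logarithm of its key-setup rounds, so each +1 doubles the time, matching the 0.186 s / 0.371 s / 0.743 s progression; Argon2id would need a third-party library and is not shown.

```python
import hashlib
import os
import time


def calibrate_pbkdf2_iterations(target_seconds: float,
                                probe_iterations: int = 20_000,
                                rounds: int = 3) -> int:
    """Estimate the PBKDF2-HMAC-SHA256 iteration count that takes roughly
    `target_seconds` on this machine, by timing a short probe and scaling
    it linearly (PBKDF2 cost is proportional to the iteration count)."""
    password, salt = b"calibration-password", os.urandom(16)
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", password, salt, probe_iterations)
        # Keep the fastest run to reduce noise from other processes
        best = min(best, time.perf_counter() - start)
    per_iteration = max(best, 1e-9) / probe_iterations
    estimate = int(target_seconds / per_iteration)
    # Round down to a multiple of 10,000, but never go below the probe size
    return max(probe_iterations, estimate // 10_000 * 10_000)


if __name__ == "__main__":
    # Verification-time targets of the three UX profiles (measured on System 2)
    for profile, target in (("Low", 0.157), ("Medium", 0.315), ("High", 0.628)):
        print(f"{profile}: ~{calibrate_pbkdf2_iterations(target):,} iterations")
```

Unless it runs on hardware comparable to System 2, the iteration counts it prints will differ from the table; the point is to calibrate on the device that actually performs the derivation.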

Benchmark Results (Throughput in H/s)

The following table presents the raw attacker throughput (hashes per second, H/s) for each algorithm at each normalized profile on all three systems.

Hardware	Algorithm	Low	Medium	High
System 1 (iGPU)	PBKDF2-SHA256	1726	755	382
	Bcrypt	34	17	9
	Argon2id	17	4	1
System 2 (1660 Ti)	PBKDF2-SHA256	3998	2054	996
	Bcrypt	346	175	87
	Argon2id	360	101	26
System 3 (RTX 5080)	PBKDF2-SHA256	25236	12612	6271
	Bcrypt	2390	1195	598
	Argon2id	1164	332	87

Baseline Comparison (SHA-256)

For context, the raw SHA-256 throughput (Baseline, Hashcat mode 1400) on these systems was:

iGPU: 1084 MH/s
GTX 1660 Ti: 2329 MH/s
RTX 5080: 15263 MH/s
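The baseline also allows a rough sanity check of the PBKDF2 numbers. Assuming each PBKDF2-HMAC-SHA256 iteration costs two SHA-256 compressions (inner and outer hash, with the keyed HMAC states precomputed once per guess) and that mode 1400 spends one compression per short candidate, raw SHA-256 speed implies a ceiling of baseline / (2 × iterations). This is a back-of-the-envelope model under those assumptions, not how Hashcat derives its figures.

```python
# Values from the tables above
BASELINE_SHA256_HS = {"iGPU": 1084e6, "GTX 1660 Ti": 2329e6, "RTX 5080": 15263e6}  # mode 1400
PBKDF2_HIGH_HS = {"iGPU": 382, "GTX 1660 Ti": 996, "RTX 5080": 6271}  # 1,000,000 iterations

# Assumption: one PBKDF2-HMAC-SHA256 iteration = 2 SHA-256 compressions
# (inner + outer hash, keyed HMAC states precomputed once per guess)
COMPRESSIONS_PER_ITERATION = 2


def pbkdf2_ceiling(baseline_hs: float, iterations: int) -> float:
    """Rough upper bound on PBKDF2 guesses/s implied by raw SHA-256 speed."""
    return baseline_hs / (COMPRESSIONS_PER_ITERATION * iterations)


if __name__ == "__main__":
    for device, baseline in BASELINE_SHA256_HS.items():
        ceiling = pbkdf2_ceiling(baseline, 1_000_000)
        measured = PBKDF2_HIGH_HS[device]
        print(f"{device}: ceiling ~{ceiling:.0f} H/s, measured {measured} H/s "
              f"({measured / ceiling:.0%})")
```

Under this model the High-profile PBKDF2 results land at roughly 70% (iGPU), 86% (GTX 1660 Ti) and 82% (RTX 5080) of the ceiling, so the two sets of measurements are broadly consistent with each other.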

Analysis: The Attacker's Bottleneck

Figure 1: Comparison of security horizon across different password hashing algorithms.

The Resiliency of Bcrypt

In raw H/s terms, Bcrypt remains relatively efficient for attackers compared to the memory-hard Argon2id. On the RTX 5080 it sustains roughly 6.9x higher throughput than Argon2id at the High profile (598 H/s vs 87 H/s).

As Figure 1 shows, that throughput gap translates directly into the security horizon of a master password. For a given keyspace, Bcrypt holds out for an estimated 39 years, while Argon2id pushes the same keyspace to 268 years (a factor of about 6.9, matching the throughput gap). Bcrypt is clearly competitive compared with PBKDF2, but Argon2id's memory hardness puts it a generation ahead.
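The security horizon in Figure 1 is, at its core, keyspace divided by guess rate. The post does not state the keyspace, so the sketch below keeps it as a parameter; the function names and the choice of a 365.25-day year are assumptions. Whatever the keyspace, the ratio of two horizons equals the inverse ratio of the throughputs, which is why 268 / 39 and 598 / 87 both come out at about 6.87.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def horizon_years(keyspace: float, guesses_per_second: float,
                  average_case: bool = False) -> float:
    """Years for one attacker device to exhaust `keyspace` at a fixed guess rate.

    With average_case=True, assume the password is found after half the keyspace.
    """
    candidates = keyspace / 2 if average_case else keyspace
    return candidates / guesses_per_second / SECONDS_PER_YEAR


def horizon_ratio(rate_fast: float, rate_slow: float) -> float:
    """How many times longer the horizon is against the slower algorithm.

    The keyspace cancels out, so this holds for any keyspace."""
    return rate_fast / rate_slow
```

For example, `horizon_ratio(598, 87)` compares Bcrypt and Argon2id on the RTX 5080 at the High profile without needing to know the keyspace at all.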

The Memory Wall of Argon2id

The real advantage of Argon2id is not raw throughput but GPU resistance through memory hardness. PBKDF2 scales quickly with hardware power, jumping from 996 H/s on the 1660 Ti to 6271 H/s on the 5080 (about 6.3x) at the High profile, whereas Argon2id only goes from 26 H/s to 87 H/s (about 3.3x) because it forces a hardware bottleneck.

On the High profile (256 MB), the RTX 5080 manages roughly 72 times fewer guesses per second against Argon2id (87 H/s) than against PBKDF2 (6271 H/s), even though the legitimate user pays a comparable verification cost (0.741 s vs 0.628 s on System 2).
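The ratios quoted in this section come straight from the throughput table. A small Python sketch that encodes the table and recomputes them (the dictionary layout and the function names are illustrative):

```python
# Attacker throughput (H/s) from the benchmark table, ordered Low, Medium, High
THROUGHPUT = {
    "iGPU": {
        "PBKDF2": (1726, 755, 382), "Bcrypt": (34, 17, 9), "Argon2id": (17, 4, 1),
    },
    "GTX 1660 Ti": {
        "PBKDF2": (3998, 2054, 996), "Bcrypt": (346, 175, 87), "Argon2id": (360, 101, 26),
    },
    "RTX 5080": {
        "PBKDF2": (25236, 12612, 6271), "Bcrypt": (2390, 1195, 598), "Argon2id": (1164, 332, 87),
    },
}
PROFILES = ("Low", "Medium", "High")


def rate(system: str, algorithm: str, profile: str) -> int:
    return THROUGHPUT[system][algorithm][PROFILES.index(profile)]


def scaling_factor(algorithm: str, profile: str,
                   old: str = "GTX 1660 Ti", new: str = "RTX 5080") -> float:
    """How many times more guesses/s the newer GPU gets for the same target."""
    return rate(new, algorithm, profile) / rate(old, algorithm, profile)


def attacker_gap(system: str, profile: str,
                 fast: str = "PBKDF2", slow: str = "Argon2id") -> float:
    """How many times fewer guesses/s the attacker gets against `slow` vs `fast`."""
    return rate(system, fast, profile) / rate(system, slow, profile)


if __name__ == "__main__":
    for algorithm in ("PBKDF2", "Bcrypt", "Argon2id"):
        print(f"{algorithm}: 1660 Ti -> 5080 scaling {scaling_factor(algorithm, 'High'):.1f}x")
    print(f"RTX 5080 PBKDF2/Argon2id gap: {attacker_gap('RTX 5080', 'High'):.0f}x")
```

At the High profile this gives scaling factors of about 6.3x (PBKDF2), 6.9x (Bcrypt) and 3.3x (Argon2id), and a PBKDF2-to-Argon2id gap of about 72x on the RTX 5080.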

Figure 2: Hardware scaling factor between GTX 1660 Ti and RTX 5080.

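As Figure 2 shows, throwing more modern compute at the problem yields much smaller gains for Argon2id. The more RAM each guess requires, the less the attacker's raw compute matters, because they quickly hit a VRAM bandwidth and capacity wall. Within a single card the effect is also visible: doubling Argon2id's memory roughly doubles the user's verification time, yet it cuts RTX 5080 throughput by about 3.5x to 3.8x per step (1164 → 332 → 87 H/s).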
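Capacity alone already caps how many Argon2id guesses a card can run at once. Dividing the memory from the test-environment table by the per-guess memory of each profile gives an upper bound; this sketch treats "MB" consistently and ignores what the driver and Hashcat itself reserve, so real numbers are lower. A PBKDF2 guess, by contrast, needs only a few dozen bytes of working state, so memory capacity never becomes its limit.

```python
# GPU memory from the test-environment table (MB) and Argon2id memory per guess (MB)
GPU_MEMORY_MB = {"iGPU (shared)": 8018, "GTX 1660 Ti": 6143, "RTX 5080": 16302}
ARGON2ID_MEMORY_MB = {"Low": 64, "Medium": 128, "High": 256}


def max_concurrent_guesses(device_mb: int, per_guess_mb: int) -> int:
    """Upper bound on Argon2id instances that fit in device memory at once.

    Real attacks fit fewer (driver and kernel buffers also need memory), and
    memory bandwidth can limit throughput before capacity does.
    """
    return device_mb // per_guess_mb


if __name__ == "__main__":
    for device, mb in GPU_MEMORY_MB.items():
        bounds = {p: max_concurrent_guesses(mb, m) for p, m in ARGON2ID_MEMORY_MB.items()}
        print(device, bounds)
```

On the RTX 5080 this gives at most 254, 127 and 63 simultaneous guesses for the Low, Medium and High profiles, versus 95, 47 and 23 on the GTX 1660 Ti.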

Architectural Anomalies: The 1660 Ti Parity

An unexpected result occurred in the "Low" profile tests on the GTX 1660 Ti, where Argon2id (360 H/s) reached parity with Bcrypt (346 H/s).

This parity was absent in the other test environments: on both the iGPU and the RTX 5080, Argon2id was roughly half as fast as Bcrypt at the same profile (17 vs 34 H/s and 1164 vs 2390 H/s). This suggests that the 1660 Ti sits at a point of convergence where Bcrypt's computational bottleneck and Argon2id's memory bottleneck happen to align, producing nearly identical throughput. On shared-memory or very high-compute systems, Argon2id's memory hardness appears to become the dominant bottleneck much earlier.

Conclusion

For modern applications requiring high security against GPU-based attacks:

PBKDF2 is a poor fit for systems that need to stop parallelized attacks, because it has no meaningful memory requirement.
Bcrypt is a solid secondary choice but lacks the tunable memory hardness of newer algorithms.
Argon2id is the superior option. By forcing a significant memory footprint (e.g., 256 MB), it substantially reduces the massive parallelization advantage of high-end GPUs.

In my implementation for Vaulton, I provide two tiered options for password security:

Standard: Argon2id (128 MB, 3 iterations, parallelism 1): a balance of performance and security.
Hardened: Argon2id (256 MB, 4 iterations, parallelism 1): maximum memory hardness for high-value protection.
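One way to express the two tiers in code is with the third-party `argon2-cffi` package, whose `PasswordHasher` takes `time_cost`, `memory_cost` (in KiB) and `parallelism`. This is a hypothetical sketch, not Vaulton's implementation: `Argon2idTier` and `make_hasher` are illustrative names, and it assumes "MB" in the tiers means MiB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argon2idTier:
    """One Argon2id parameter set (memory assumed to be MiB, as in the tiers above)."""
    name: str
    memory_mib: int
    iterations: int
    parallelism: int

    @property
    def memory_kib(self) -> int:
        # Argon2's memory parameter m is expressed in KiB (RFC 9106)
        return self.memory_mib * 1024


STANDARD = Argon2idTier("Standard", memory_mib=128, iterations=3, parallelism=1)
HARDENED = Argon2idTier("Hardened", memory_mib=256, iterations=4, parallelism=1)


def make_hasher(tier: Argon2idTier):
    """Build an argon2-cffi PasswordHasher for a tier (needs `pip install argon2-cffi`)."""
    from argon2 import PasswordHasher, Type  # third-party, imported lazily

    return PasswordHasher(
        time_cost=tier.iterations,
        memory_cost=tier.memory_kib,
        parallelism=tier.parallelism,
        type=Type.ID,
    )
```

With the package installed, `make_hasher(STANDARD).hash(password)` returns an encoded string that records m, t and p, and `.verify(encoded, password)` checks a candidate against it.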

These configurations are intended to maximize the computational and physical cost of a brute-force attack, forcing even high-end hardware like the RTX 5080 to operate at a fraction of its potential throughput due to memory bottlenecks. On low-memory mobile devices, higher memory settings (e.g., 256MB) may introduce noticeable latency or fail under constrained environments, so adaptive parameter tuning per device should be considered for commercial products.
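Adaptive tuning can be as simple as stepping down from the strongest memory cost until one fits the device's available memory with some headroom. The sketch below is a hypothetical policy: the candidate list, the 25% headroom and the function name are assumptions, and measuring available memory is platform-specific and left out.

```python
from typing import Optional, Sequence

# Hypothetical Argon2id memory costs in MiB, ordered strongest first
CANDIDATE_MEMORY_MIB = (256, 128, 64)


def choose_memory_mib(available_mib: float, headroom: float = 0.25,
                      candidates: Sequence[int] = CANDIDATE_MEMORY_MIB) -> Optional[int]:
    """Return the largest candidate that leaves `headroom` of the available
    memory free, or None if even the smallest one does not fit."""
    budget = available_mib * (1 - headroom)
    for memory in candidates:  # assumes candidates are sorted descending
        if memory <= budget:
            return memory
    return None
```

In a password manager the chosen parameters must be stored with the account and reused on every device rather than re-chosen per login, because all clients have to reproduce the same key.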

Discussion

In systems with a zero-knowledge architecture, the responsibility for these heavy KDF calculations shifts to the client (e.g., a browser or mobile app), while the server may only perform a lightweight rehash of the authentication verifier before storing it in the database.
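A minimal sketch of that split, assuming Python's standard library: `hashlib.scrypt` stands in for Argon2id as the memory-hard client-side KDF (Argon2 is not in the standard library), HMAC-SHA256 with fixed labels (a simplified stand-in for HKDF) separates an encryption key from an authentication verifier, and the server stores a salted SHA-256 of the verifier. All names and labels are hypothetical. Because the verifier is already a high-entropy 256-bit value, a fast salted hash is assumed to be sufficient on the server, though some designs run a KDF there too.

```python
import hashlib
import hmac
import os


def client_derive(password: bytes, account_salt: bytes, n: int = 2**14) -> tuple[bytes, bytes]:
    """Client side: run the expensive, memory-hard KDF locally, then split the result.

    hashlib.scrypt stands in for Argon2id; with r=8 its memory use is
    about 128 * r * n bytes (16 MiB for n=2**14).
    """
    master = hashlib.scrypt(password, salt=account_salt, n=n, r=8, p=1, dklen=32)
    # Label-separated sub-keys: the encryption key never leaves the device,
    # only the authentication verifier is sent to the server.
    encryption_key = hmac.new(master, b"vault-encryption", hashlib.sha256).digest()
    auth_verifier = hmac.new(master, b"server-auth", hashlib.sha256).digest()
    return encryption_key, auth_verifier


def server_store(auth_verifier: bytes) -> tuple[bytes, bytes]:
    """Server side: lightweight salted rehash of the verifier before storage."""
    server_salt = os.urandom(16)
    return server_salt, hashlib.sha256(server_salt + auth_verifier).digest()


def server_check(auth_verifier: bytes, server_salt: bytes, stored: bytes) -> bool:
    """Recompute the rehash and compare in constant time."""
    candidate = hashlib.sha256(server_salt + auth_verifier).digest()
    return hmac.compare_digest(candidate, stored)


if __name__ == "__main__":
    enc_key, verifier = client_derive(b"correct horse battery staple", b"user@example.com")
    salt, stored = server_store(verifier)
    print(server_check(verifier, salt, stored))  # True
```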

How are you handling the trade-off between client-side performance and high-security KDF parameters? Are there specific edge cases or mobile device limitations you've encountered when requiring significant amounts of RAM for client-side hashing?

I am particularly interested in feedback on the trade-off between higher security (stronger parameters) and user cost (UX degradation).

Cover image by Athena Sandrini on Pexels, with minor edits.

DEV Community