Building Own Stream Cipher: Part 2 - RC4: From Ubiquity to Collapse and What It Taught Us About Trust


For years, RC4 was everywhere. Browsers used it to secure TLS connections. VPNs depended on it. Wi-Fi (through WEP) shipped it into millions of homes and offices. It was the cipher trusted with the internet’s most private conversations.

And yet RC4 is now prohibited in TLS (RFC 7465), deprecated elsewhere, and remembered mostly as a cautionary tale. Why? Because RC4’s flaws weren’t just “theoretical”: they were exploited in the wild.

What RC4 Is

RC4 is a stream cipher designed by Ron Rivest in 1987. Unlike block ciphers (AES, DES), which process data in fixed-size blocks, a stream cipher generates a continuous keystream of bytes. Encryption is simply:

ciphertext = plaintext ⊕ keystream

Decryption is the same operation again (XOR undoes itself).
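
To make the XOR relationship concrete, here is a minimal sketch. The `xor_bytes` helper and the fixed five-byte keystream are made up for illustration; a real stream cipher derives the keystream from a key.

```python
def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    """XOR each data byte with the keystream byte at the same position."""
    if len(keystream) < len(data):
        raise ValueError("keystream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, keystream))

# Hypothetical fixed keystream, for illustration only.
keystream = bytes([0x3A, 0x91, 0x5C, 0x07, 0xE2])
plaintext = b"HELLO"

ciphertext = xor_bytes(plaintext, keystream)   # encrypt
recovered = xor_bytes(ciphertext, keystream)   # decrypt: same operation again
print(ciphertext.hex())
print(recovered)
```

Applying the same function twice with the same keystream returns the original bytes, which is why a stream cipher needs no separate decryption routine.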

Under the hood, RC4 maintains a 256-byte permutation array and two counters. It has two steps:

KSA (Key Scheduling Algorithm): shuffles the array using the key.
PRGA (Pseudo-Random Generation Algorithm): walks the array, swaps entries, and emits one keystream byte per step.

Its code was so simple it could be written in under 20 lines of C. That simplicity, combined with speed, made RC4 wildly popular.

Why RC4 Is Insecure

Over time, cryptanalysis revealed that RC4’s simplicity came at a price:

Biased output.

The first keystream bytes aren’t uniform: some values occur more often than others. Given enough encryptions of the same plaintext under different keys, those biases leak it; this is what powered practical attacks on TLS session cookies.

Weak key scheduling.

RC4’s initialisation leaks information about the key into the first keystream bytes. In WEP, where each packet’s RC4 key was (IV || shared secret) and the IV travelled in the clear, those leaks let attackers recover the Wi-Fi key just by listening.

No nonce discipline.

RC4 takes only a key, with no nonce or IV input, and doesn’t define how to avoid keystream reuse. If the same keystream is ever used twice:

C1 ⊕ C2 = P1 ⊕ P2

which immediately leaks structure from both messages.

No integrity.

RC4 provides confidentiality only. Flip a bit in the ciphertext, and the same bit flips in the plaintext. Without authentication (MAC/AEAD), traffic is malleable.
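
The last two problems are properties of XOR itself, so they can be sketched without RC4 at all. In the example below a random keystream stands in for RC4 output, and the messages and the `xor_bytes` helper are invented for illustration. It also assumes the attacker knows where the amount sits in the message.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position (stops at the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical messages of equal length; a random keystream stands in for RC4.
p1 = b"PAY ALICE $100"
p2 = b"PAY BOB   $900"
keystream = os.urandom(len(p1))

# Keystream reuse: the keystream cancels and only the two plaintexts remain.
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)
assert xor_bytes(c1, c2) == xor_bytes(p1, p2)

# Malleability: turn "$100" into "$900" without ever seeing the keystream.
pos = p1.index(b"1")                      # assumed known to the attacker
tampered = bytearray(c1)
tampered[pos] ^= ord("1") ^ ord("9")      # flip exactly the bits that differ
print(xor_bytes(bytes(tampered), keystream))  # b'PAY ALICE $900'
```

Nothing in the decryption step can tell the tampered ciphertext from a genuine one, which is the gap a MAC or an AEAD mode closes.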

RC4 in Python (for learning only)

Here’s a minimal implementation you can run to see how RC4 works:

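def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is its own inverse)."""
    if not key:
        raise ValueError("key must not be empty")
    # Key Scheduling Algorithm (KSA): permute S under control of the key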
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-Random Generation Algorithm (PRGA)
    i = j = 0
    out = bytearray()
    for b in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        k = S[(S[i] + S[j]) % 256]
        out.append(b ^ k)
    return bytes(out)

# Demo
key = b"secret"
pt  = b"HELLO WORLD"
ct  = rc4(key, pt)
print(ct.hex())
print(rc4(key, ct))  # recovers b'HELLO WORLD'

This is for educational purposes only. RC4 should never be used in new systems.
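
One way to see why is to measure a bias yourself. A well-known result (due to Mantin and Shamir) is that the second keystream byte is 0 about twice as often as a uniform source would produce. The sketch below is a rough experiment, not a rigorous test: the helper names, the 16-byte key length, and the trial count are arbitrary choices, and the exact count varies from run to run.

```python
import os

def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n RC4 keystream bytes for the given key."""
    S = list(range(256))
    j = 0
    for i in range(256):                       # KSA
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):                         # PRGA
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def count_second_byte_zero(trials: int, key_len: int = 16) -> int:
    """Count how often the second keystream byte is 0 across random keys."""
    return sum(
        rc4_keystream(os.urandom(key_len), 2)[1] == 0 for _ in range(trials)
    )

trials = 20_000
observed = count_second_byte_zero(trials)
print(f"observed zeros: {observed}, uniform expectation: {trials / 256:.0f}")
```

On a typical run the observed count lands near double the uniform expectation. A passive attacker who sees the same plaintext byte encrypted under many keys can exploit exactly this kind of skew.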

Real-World Consequences: The TJX Breach

The flaws weren’t academic. In 2005–2007, attackers sat in the parking lot of T.J. Maxx and Marshalls stores, captured WEP traffic, and cracked the Wi-Fi key.

Why it worked: WEP used RC4 with a 24-bit IV. The IV was public and repeated quickly. With RC4’s weak key scheduling, attackers could recover the WEP key statistically from captured traffic.
What happened next: Once on the store Wi-Fi (which bridged directly into the corporate LAN), they installed sniffers on point-of-sale systems.
Impact: Tens of millions of card numbers stolen, hundreds of millions in damages, and years of mandated audits.

This wasn’t a zero-day exploit. It was RC4’s known weaknesses, amplified by WEP’s design, turned into a turnkey key-recovery attack.
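
How quickly does a 24-bit IV repeat? A back-of-the-envelope birthday calculation gives a feel for it. The sketch assumes IVs are picked uniformly at random, which is a simplification: implementations that used a counter instead simply wrap after 2^24 packets. The function name and packet counts are illustrative.

```python
import math

IV_BITS = 24                  # WEP IV length
IV_SPACE = 2 ** IV_BITS       # 16,777,216 possible IVs

def collision_probability(packets: int, space: int = IV_SPACE) -> float:
    """Probability that at least two of `packets` uniformly random IVs coincide."""
    # log P(no collision) = sum over k of log(1 - k / space)
    log_no_collision = sum(math.log1p(-k / space) for k in range(packets))
    return 1.0 - math.exp(log_no_collision)

for n in (1_000, 5_000, 20_000):
    print(f"{n:>6} packets -> P(IV repeat) = {collision_probability(n):.3f}")
```

Under that assumption a repeat is more likely than not after roughly five thousand packets, which a busy network produces in very little time.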

The Lesson: Trust, and How We Lost It

RC4 wasn’t obscure. It was arguably the most widely deployed stream cipher on earth for nearly two decades. Entire industries put their secrets in its hands.

Why?

It was fast and simple.
It was popular: “everyone else uses it, so it must be safe.”
It seemed unbroken: no obvious attacks at first.

But popularity and convenience are not proof of strength. By the time practical attacks emerged, RC4 was so deeply embedded that ripping it out took years—and in some cases, major breaches happened first.

This raises the harder question:

👉 If we were wrong about RC4, how do we know we’re not wrong about the ciphers we trust today?

What RC4 teaches us about trust

Popularity is not evidence. A cipher’s ubiquity reflects convenience, not security.
Scrutiny is the real defence. We should trust algorithms that have endured years of open, hostile analysis—not just ones that are fast and widely deployed.
Agility is part of security. Systems must be designed to deprecate broken primitives quickly. The real sin of WEP and TLS with RC4 was not just using them—it was keeping them long after cracks appeared.
AEAD is the modern baseline. Today’s ciphers (AES-GCM, ChaCha20-Poly1305) don’t just encrypt. They define nonce handling and provide authentication by design, closing the gaps that RC4 left wide open.
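
To show what “authentication by design” buys, here is a small standard-library sketch of the integrity half. It is not AES-GCM or ChaCha20-Poly1305: it uses a plain HMAC-SHA256 tag over the nonce and ciphertext as a stand-in, and the function names, key sizes, and random “ciphertext” are assumptions for illustration. In real systems, use an AEAD construction from a vetted library rather than assembling one by hand.

```python
import hashlib
import hmac
import os

def tag_message(mac_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over nonce || ciphertext (nonce is fixed-length)."""
    return hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()

def verify_message(mac_key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Constant-time tag check; meant to run before any decryption happens."""
    return hmac.compare_digest(tag_message(mac_key, nonce, ciphertext), tag)

mac_key = os.urandom(32)
nonce = os.urandom(12)
ciphertext = os.urandom(16)        # stand-in for the output of some cipher
tag = tag_message(mac_key, nonce, ciphertext)

tampered = bytearray(ciphertext)
tampered[0] ^= 0x01                # the same single-bit flip RC4 could not detect

print(verify_message(mac_key, nonce, ciphertext, tag))       # True
print(verify_message(mac_key, nonce, bytes(tampered), tag))  # False
```

A receiver that rejects any message whose tag fails to verify never decrypts tampered data, so a flipped bit becomes a detected error instead of a silent change.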

Closing Thought

RC4 teaches us that security isn’t a one-time decision. It’s a continuous process of skepticism, open testing, and agility.

The real mistake wasn’t just RC4’s flaws. It was the industry’s misplaced trust in a cipher that had never earned it.

If we want to avoid repeating that story, we have to treat trust not as blind faith but as something earned—through open design, relentless cryptanalysis, and the ability to adapt when we learn that yesterday’s certainty was wrong.