zuka3
The Math Behind RSA #1: Primes and Prime Factorization

This article has a more math-focused version with formal proofs on Folio.

Every time you visit an HTTPS website, send a message on Signal, or push to GitHub, RSA (or a close relative) is working behind the scenes. But what actually makes RSA secure? It comes down to one beautifully simple asymmetry: multiplying two primes is trivial; factoring their product is not.

This is the first article in a four-part series where we build up the mathematics behind RSA from scratch. No prior number theory knowledge required -- just curiosity and a bit of Python.

The Fundamental Asymmetry

Try this in your head. Multiply:

127 \times 311 = \text{?}

Hard to do mentally, but a computer handles it in nanoseconds. Now go backwards -- factor this number into its two prime components:

39497 = \text{?} \times \text{?}

Even knowing the answer is two primes, this is significantly harder. You would need to try dividing by candidate primes until something works. And that gap -- easy to multiply, hard to factor -- is the entire foundation of RSA.
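To make that gap concrete, here is a minimal trial-division sketch (the function name is mine) that recovers the two factors of 39497 by brute force:

```python
def trial_factor(n):
    """Find the smallest prime factor of n by trial division."""
    d = 2
    while d * d <= n:          # only need to search up to sqrt(n)
        if n % d == 0:
            return d, n // d   # first hit is necessarily prime
        d += 1
    return n, 1                # no divisor found: n itself is prime

print(trial_factor(39497))
# (127, 311)
```

This finishes instantly for a five-digit number, but the loop can run up to sqrt(n) times -- utterly infeasible once n has hundreds of digits.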

Let's see it in Python:

import time

p = 2**521 - 1   # a known Mersenne prime (157 digits)
q = 2**607 - 1   # another Mersenne prime (183 digits)

start = time.time()
n = p * q
elapsed = time.time() - start

print(f"n has {len(str(n))} digits")
print(f"Multiplication took {elapsed:.6f} seconds")
# n has 340 digits
# Multiplication took 0.000001 seconds

That multiplication finishes in microseconds. Factoring a 340-digit number back into its two prime factors? With current algorithms and hardware, it would take longer than the age of the universe.

What Are Prime Numbers, Really?

A prime number $p$ is an integer greater than 1 whose only positive divisors are 1 and itself. The first few:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, \ldots

The Fundamental Theorem of Arithmetic tells us that every integer $n > 1$ can be written as a unique product of primes (up to ordering). For example:

2024 = 2^3 \times 11 \times 23

This uniqueness is what makes factoring a well-defined problem -- there is exactly one answer to find.
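For small numbers we can compute this factorization directly; the sketch below (helper name mine) peels off prime factors in increasing order:

```python
def prime_factorization(n):
    """Return the prime factorization of n as a {prime: exponent} dict."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:      # divide out each prime completely
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:                  # whatever remains is a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors

print(prime_factorization(2024))
# {2: 3, 11: 1, 23: 1}
```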

How Many Primes Are There?

A natural question: if RSA needs large primes, are there enough of them? The answer is a resounding yes.

The Prime Number Theorem states that the number of primes less than or equal to $x$, denoted $\pi(x)$, is approximately:

\pi(x) \sim \frac{x}{\ln x}

This means roughly 1 in every $\ln x$ numbers near $x$ is prime. For a 1024-bit number (about 308 digits), $\ln(2^{1024}) \approx 710$, so roughly 1 in every 710 numbers of that size is prime.

import math

bits = 1024
x = 2 ** bits
density = 1 / (bits * math.log(2))
print(f"Approximately 1 in {round(1 / density)} numbers near 2^{bits} is prime")
# Approximately 1 in 710 numbers near 2^1024 is prime

That is an astronomically large supply. There are more 1024-bit primes than atoms in the observable universe.
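We can put a rough number on that supply by applying the same Prime Number Theorem estimate (the helper is my own back-of-the-envelope sketch, not an exact count):

```python
import math

def approx_prime_count(bits):
    """Estimate pi(2^bits) via the Prime Number Theorem: x / ln(x)."""
    # Integer division keeps the huge numbers exact; the estimate is rough anyway.
    return 2 ** bits // round(bits * math.log(2))

# Primes with exactly 1024 bits: pi(2^1024) - pi(2^1023)
primes_1024_bit = approx_prime_count(1024) - approx_prime_count(1023)
print(f"Roughly 10^{round(math.log10(primes_1024_bit))} primes have exactly 1024 bits")
# far more than the ~10^80 atoms in the observable universe
```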

Testing for Primality: The Miller-Rabin Test

RSA key generation needs to find large primes quickly. Trial division -- dividing by every number up to $\sqrt{n}$ -- is hopelessly slow for 300-digit numbers. Instead, we use probabilistic primality tests.

The most widely used is the Miller-Rabin primality test. Here is the core idea.

The Math

Fermat's Little Theorem says that if $p$ is prime and $a$ is not divisible by $p$, then:

a^{p-1} \equiv 1 \pmod{p}

Miller-Rabin strengthens this. Write $n - 1 = 2^s \cdot d$ where $d$ is odd. Then for a prime $n$, any base $a$ must satisfy one of:

a^d \equiv 1 \pmod{n}

or

a^{2^r \cdot d} \equiv -1 \pmod{n} \quad \text{for some } 0 \le r < s

If neither condition holds for some base $a$, then $n$ is definitely composite. If every base we try satisfies one of them, $n$ is probably prime. Each round of testing with a random base has at most a $\frac{1}{4}$ chance of being fooled by a composite, so $k$ rounds give a false positive probability of at most $4^{-k}$.

Python Implementation

import random

def miller_rabin(n, k=20):
    """
    Miller-Rabin primality test.
    Returns True if n is probably prime, False if definitely composite.
    k rounds -> false positive probability at most 4^(-k).
    """
    if n < 2:
        return False
    if n == 2 or n == 3:
        return True
    if n % 2 == 0:
        return False

    # Write n - 1 as 2^s * d with d odd
    s, d = 0, n - 1
    while d % 2 == 0:
        s += 1
        d //= 2

    # Run k rounds of testing
    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)  # a^d mod n (Python's built-in modular exponentiation)

        if x == 1 or x == n - 1:
            continue

        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # definitely composite

    return True  # probably prime

# Test it
print(miller_rabin(2**61 - 1))    # True  (Mersenne prime)
print(miller_rabin(2**67 - 1))    # False (not prime: 147573952589676412927 = 193707721 * 761838257287)
print(miller_rabin(561))           # False (Carmichael number, fools Fermat test but not Miller-Rabin)

With 20 rounds, the probability of a false positive is $4^{-20} \approx 10^{-12}$ -- far less likely than a hardware error flipping a bit in your computation.

Generating Large Primes

With Miller-Rabin in hand, generating a prime of a desired bit length is straightforward: pick random odd numbers and test until one passes.

import secrets

def generate_prime(bits, k=20):
    """Generate a random prime with the specified number of bits."""
    while True:
        # Generate a random odd number with the MSB set (ensures correct bit length)
        candidate = secrets.randbits(bits)
        candidate |= (1 << (bits - 1)) | 1  # set MSB and LSB

        if miller_rabin(candidate, k):
            return candidate

# Generate a 1024-bit prime
p = generate_prime(1024)
print(f"Found a {p.bit_length()}-bit prime")
print(f"First 20 digits: {str(p)[:20]}...")
print(f"Last 20 digits:  ...{str(p)[-20:]}")

Because roughly 1 in 710 numbers near $2^{1024}$ is prime, and we only test odd candidates (which doubles the density), we expect to test about 355 candidates on average. Each Miller-Rabin test is fast (modular exponentiation is $O(\log^2 n \cdot \log \log n)$ with efficient multiplication algorithms), so generating a 1024-bit prime typically takes well under a second.

In practice, implementations apply small optimizations like checking divisibility by small primes (3, 5, 7, ...) before running Miller-Rabin, which filters out most composites cheaply.

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def generate_prime_optimized(bits, k=20):
    """Prime generation with small-prime sieving."""
    while True:
        candidate = secrets.randbits(bits)
        candidate |= (1 << (bits - 1)) | 1

        # Quick check: skip if divisible by any small prime
        # (a large candidate can never itself equal one of the small primes)
        if any(candidate % sp == 0 for sp in SMALL_PRIMES):
            continue

        if miller_rabin(candidate, k):
            return candidate

How Hard Is Factoring, Actually?

So multiplication is easy and factoring is hard -- but how hard exactly?

The Best Known Algorithm: GNFS

The fastest known algorithm for factoring large numbers is the General Number Field Sieve (GNFS). Its heuristic running time for factoring an integer nn is:

L_n\left[\frac{1}{3},\ \left(\frac{64}{9}\right)^{1/3}\right] = \exp\left(\left(\frac{64}{9}\right)^{1/3} (\ln n)^{1/3} (\ln \ln n)^{2/3}\right)

This is sub-exponential but super-polynomial -- it grows faster than any polynomial in logn\log n but slower than a true exponential. In practice, this means that each additional digit in the number you are trying to factor makes the problem harder, but not uniformly so.
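To get a feel for this growth, we can plug key sizes into the exponent of the GNFS formula (the precise statement carries an o(1) correction that I drop here, so these are only order-of-magnitude operation counts; the helper name is mine):

```python
import math

def gnfs_log10_ops(bits):
    """Rough log10 of GNFS operations for n of the given bit length (o(1) term ignored)."""
    ln_n = bits * math.log(2)
    exponent = (64 / 9) ** (1 / 3) * ln_n ** (1 / 3) * math.log(ln_n) ** (2 / 3)
    return exponent / math.log(10)  # convert natural-log exponent to base 10

for bits in (512, 1024, 2048):
    print(f"{bits}-bit n: ~10^{gnfs_log10_ops(bits):.0f} operations")
```

Even ignoring constants, doubling the key size from 1024 to 2048 bits adds roughly nine orders of magnitude of work.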

Factoring Records

The largest RSA-type number factored so far is RSA-250 (250 decimal digits, 829 bits), achieved in 2020. It required approximately 2700 CPU core-years of computation.

RSA Key Lengths and Security

Here is a rough picture of how factoring difficulty scales with key size:

| RSA Key Size | Status | Estimated GNFS Effort |
| --- | --- | --- |
| 512 bits (155 digits) | Broken (1999) | Hours on modern hardware |
| 768 bits (232 digits) | Broken (2009) | ~2000 CPU core-years |
| 1024 bits (309 digits) | Threatened | ~$10^6$ core-years |
| 2048 bits (617 digits) | Secure | ~$10^{20}$ core-years |
| 4096 bits (1234 digits) | Very secure | ~$10^{40}$ core-years |

This is why 2048-bit RSA keys are the current minimum recommendation, and why security-conscious applications use 4096-bit keys.

A Note on Quantum Computing

Shor's algorithm can factor integers in polynomial time on a quantum computer, which would break RSA entirely. This is why the cryptographic community is actively developing post-quantum algorithms. But as of today, no quantum computer has factored anything beyond trivially small numbers, so RSA remains safe for now.

Putting It All Together

Let's tie everything from this article into a mini RSA key parameter generator:

import secrets
import random
import time

def miller_rabin(n, k=20):
    if n < 2:
        return False
    if n == 2 or n == 3:
        return True
    if n % 2 == 0:
        return False
    s, d = 0, n - 1
    while d % 2 == 0:
        s += 1
        d //= 2
    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def generate_prime(bits):
    while True:
        candidate = secrets.randbits(bits)
        candidate |= (1 << (bits - 1)) | 1
        if miller_rabin(candidate):
            return candidate

# Generate RSA parameters
KEY_BITS = 2048
start = time.time()
p = generate_prime(KEY_BITS // 2)
q = generate_prime(KEY_BITS // 2)
n = p * q
elapsed = time.time() - start

print(f"Generated two {KEY_BITS // 2}-bit primes in {elapsed:.2f}s")
print(f"Product n has {n.bit_length()} bits ({len(str(n))} digits)")
print(f"Good luck factoring that!")

Running this produces a 2048-bit modulus in about a second. Factoring it back? That would take roughly 102010^{20} CPU core-years. The asymmetry is staggering, and it is exactly what keeps your data safe.

Key Takeaways

  • The core asymmetry: multiplying two $k$-digit primes takes $O(k^2)$ time or better; factoring their product is sub-exponential via GNFS.
  • Primes are abundant: the Prime Number Theorem guarantees a vast supply of large primes.
  • Miller-Rabin lets us find large primes efficiently with negligible error probability.
  • Factoring is hard but not proven hard: no one has shown that factoring cannot be done in polynomial time (we have not even proved $P \neq NP$), so the hardness of factoring remains an assumption -- but one that has held up for decades.

Note: A version of this article with formal theorem/proof environments is available on Folio.


Next up in the series: **The Math Behind RSA #2: Modular Arithmetic and Euler's Theorem** -- where we'll build the algebraic machinery that makes RSA encryption and decryption actually work. We'll cover modular inverses, Euler's totient function, and the beautiful identity that ties it all together.
