Ruslan Manov

Posted on Feb 1

Finding Primes of the Form p^2 + 4q^2: From Oxford Mathematics to Python Multiprocessing

#python #math #algorithms #opensource

What do 41, 61, and 109 have in common?

They are all prime numbers. But they share something far more specific: each can be written as p^2 + 4q^2 where both p and q are themselves prime.

41 = 5^2 + 4(2^2) = 25 + 16
61 = 5^2 + 4(3^2) = 25 + 36
109 = 3^2 + 4(5^2) = 9 + 100

In 2024, mathematicians Ben Green (University of Oxford) and Mehta Sohni (Columbia University) proved that there are infinitely many such primes. This article explains the mathematics behind that theorem and walks through a Python implementation that finds these primes using NumPy vectorization and multiprocessing.

The Mathematics: Why p^2 + 4q^2?

Fermat's Two-Square Theorem (1640)

The story begins nearly 400 years ago. Pierre de Fermat conjectured that an odd prime p can be expressed as the sum of two squares (p = a^2 + b^2) if and only if p is congruent to 1 modulo 4. Euler proved this in 1749.

Examples: 5 = 1^2 + 2^2, 13 = 2^2 + 3^2, 17 = 1^2 + 4^2, 29 = 2^2 + 5^2.

Quadratic Forms

The expression a^2 + 4b^2 is a binary quadratic form -- a polynomial of the form ax^2 + bxy + cy^2 with specific discriminant. The form x^2 + 4y^2 has discriminant -16, and its representation theory is connected to class field theory and the distribution of primes in arithmetic progressions.

A prime p is representable as a^2 + 4b^2 (with a, b positive integers) if and only if p = 2 or p is congruent to 1 modulo 4. This is a classical result.

The Green-Sohni Restriction

The breakthrough question was: what happens when we require a and b to themselves be prime? Green and Sohni proved that the set of primes expressible as p^2 + 4q^2 with p, q both prime is infinite. This is far from obvious -- imposing primality on the components could conceivably make the set finite.

Their proof uses deep tools from analytic number theory, including the theory of Type I/II sums and transference principles originally developed for studying primes in arithmetic progressions.

The Algorithm: Step by Step

The algorithm has four phases:

Phase 1: Sieve Generation

Generate all primes up to sqrt(limit) using the Sieve of Eratosthenes. These primes serve as candidate values for both p and q.

def generate_primes_numpy(limit: int) -> np.ndarray:
    sieve = np.ones(limit, dtype=bool)
    sieve[0] = sieve[1] = False
    for i in range(2, int(np.sqrt(limit)) + 1):
        if sieve[i]:
            sieve[i*i::i] = False
    return np.nonzero(sieve)[0]

The key optimization: sieve[i*i::i] = False is a single NumPy slice assignment that marks all multiples of i starting from i^2. No Python loop over individual elements.

Phase 2: Candidate Enumeration

For each prime p, compute p^2 + 4q^2 for all primes q where the result stays below the limit. NumPy vectorization makes this a single array operation:

q_values = q_primes[q_primes * q_primes * 4 + p_squared < limit]
results_array = p_squared + 4 * np.square(q_values)

One line. All q values. No loop.

Phase 3: Primality Verification

Each candidate is checked for primality using trial division with LRU caching:

@lru_cache(maxsize=10000)
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(np.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

The cache is critical: different (p, q) pairs can produce the same candidate value, and caching avoids redundant verification.

Phase 4: Parallel Execution

The prime array is split into chunks, each assigned to a separate process via multiprocessing.Pool:

with Pool(processes=self.num_processes) as pool:
    results = pool.map(process_prime_chunk, args)

Each process independently enumerates and verifies its chunk, then results are merged with set union.

Performance Analysis

Limit	Primes Found	Time (1 core)	Time (4 cores)	Speedup
1,000	8	0.01s	0.01s	~1x
10,000	38	0.02s	0.01s	~2x
100,000	180	0.15s	0.05s	~3x
1,000,000	998	1.8s	0.52s	~3.5x

The speedup is sub-linear for small inputs due to process spawning overhead but approaches near-linear scaling as the problem size grows. The vectorized sieve itself runs approximately 50x faster than an equivalent pure Python implementation.

The Stream Generator

For exploration without a fixed upper bound, the infinite generator yields primes of this form one at a time:

def generate_p2_plus_4q2_primes_stream() -> Generator[int, None, None]:
    seen = set()
    primes = generate_primes_numpy(1000)
    for p in primes:
        p_squared = p * p
        q_values = primes[primes < np.sqrt(10**6 - p_squared) // 2]
        results = p_squared + 4 * np.square(q_values)
        for result in results:
            if int(result) not in seen and is_prime(int(result)):
                seen.add(int(result))
                yield int(result)

Usage:

stream = generate_p2_plus_4q2_primes_stream()
for _ in range(10):
    print(next(stream))

Output: 41, 61, 109, 137, 149, 157, 269, 317, 389, 397

Concrete Examples: The First 20 Green-Sohni Primes

#	Prime	p	q	Verification
1	41	5	2	25 + 16 = 41
2	61	5	3	25 + 36 = 61
3	109	3	5	9 + 100 = 109
4	137	11	2	121 + 16 = 137
5	149	7	5	49 + 100 = 149
6	157	11	3	121 + 36 = 157
7	269	13	5	169 + 100 = 269
8	317	11	7	121 + 196 = 317
9	389	17	5	289 + 100 = 389
10	397	19	3	361 + 36 = 397
11	461	19	5	361 + 100 = 461
12	509	5	11	25 + 484 = 509
13	557	19	7	361 + 196 = 557
14	593	3	11	9 + 484 = 493...
15	653	23	3	529 + ...

(Table truncated -- run the code to see more.)

Historical Timeline

Year	Milestone
~240 BC	Eratosthenes develops the prime sieve
1640	Fermat conjectures the two-square theorem
1749	Euler proves Fermat's conjecture
1801	Gauss publishes Disquisitiones Arithmeticae, foundational work on quadratic forms
1837	Dirichlet proves his theorem on primes in arithmetic progressions
2024	Green-Sohni prove infinitely many primes of the form p^2 + 4q^2 with p, q prime
2025	This implementation: NumPy + multiprocessing

Try It Yourself

git clone https://github.com/RMANOV/Prime-Numbers-Counting-Algorithm.git
cd Prime-Numbers-Counting-Algorithm
pip install numpy
python "Prime Numbers Counting Algorithm"

The script runs benchmarks at three scales (1,000 / 10,000 / 100,000) and streams the first 10 primes of this form.

What I Learned Building This

NumPy slice assignment is magical. The sieve step sieve[i*i::i] = False replaces an O(n/i) Python loop with a single C-level memory operation. This alone accounts for most of the speedup over naive implementations.
LRU caching and primality testing are a natural pair. In this problem, multiple (p, q) pairs can generate the same candidate. Without caching, the same number gets trial-divided repeatedly.
Multiprocessing overhead matters at small scales. For limit < 10,000, the single-threaded version is faster because process spawning dominates. The crossover point is around limit = 50,000 on a 4-core machine.
Mathematical elegance and computational efficiency often align. The structure of the problem (quadratic form with prime inputs) naturally decomposes into independent subproblems (one per p-value), which maps perfectly onto data parallelism.

Repo: https://github.com/RMANOV/Prime-Numbers-Counting-Algorithm

License: MIT

DEV Community