What do 41, 61, and 109 have in common?
They are all prime numbers. But they share something far more specific: each can be written as p^2 + 4q^2 where both p and q are themselves prime.
- 41 = 5^2 + 4(2^2) = 25 + 16
- 61 = 5^2 + 4(3^2) = 25 + 36
- 109 = 3^2 + 4(5^2) = 9 + 100
In 2024, mathematicians Ben Green (University of Oxford) and Mehta Sohni (Columbia University) proved that there are infinitely many such primes. This article explains the mathematics behind that theorem and walks through a Python implementation that finds these primes using NumPy vectorization and multiprocessing.
The Mathematics: Why p^2 + 4q^2?
Fermat's Two-Square Theorem (1640)
The story begins nearly 400 years ago. Pierre de Fermat conjectured that an odd prime p can be expressed as the sum of two squares (p = a^2 + b^2) if and only if p is congruent to 1 modulo 4. Euler proved this in 1749.
Examples: 5 = 1^2 + 2^2, 13 = 2^2 + 3^2, 17 = 1^2 + 4^2, 29 = 2^2 + 5^2.
Quadratic Forms
The expression a^2 + 4b^2 is a binary quadratic form -- a polynomial of the form ax^2 + bxy + cy^2 with specific discriminant. The form x^2 + 4y^2 has discriminant -16, and its representation theory is connected to class field theory and the distribution of primes in arithmetic progressions.
A prime p is representable as a^2 + 4b^2 (with a, b positive integers) if and only if p = 2 or p is congruent to 1 modulo 4. This is a classical result.
The Green-Sohni Restriction
The breakthrough question was: what happens when we require a and b to themselves be prime? Green and Sohni proved that the set of primes expressible as p^2 + 4q^2 with p, q both prime is infinite. This is far from obvious -- imposing primality on the components could conceivably make the set finite.
Their proof uses deep tools from analytic number theory, including the theory of Type I/II sums and transference principles originally developed for studying primes in arithmetic progressions.
The Algorithm: Step by Step
The algorithm has four phases:
Phase 1: Sieve Generation
Generate all primes up to sqrt(limit) using the Sieve of Eratosthenes. These primes serve as candidate values for both p and q.
def generate_primes_numpy(limit: int) -> np.ndarray:
sieve = np.ones(limit, dtype=bool)
sieve[0] = sieve[1] = False
for i in range(2, int(np.sqrt(limit)) + 1):
if sieve[i]:
sieve[i*i::i] = False
return np.nonzero(sieve)[0]
The key optimization: sieve[i*i::i] = False is a single NumPy slice assignment that marks all multiples of i starting from i^2. No Python loop over individual elements.
Phase 2: Candidate Enumeration
For each prime p, compute p^2 + 4q^2 for all primes q where the result stays below the limit. NumPy vectorization makes this a single array operation:
q_values = q_primes[q_primes * q_primes * 4 + p_squared < limit]
results_array = p_squared + 4 * np.square(q_values)
One line. All q values. No loop.
Phase 3: Primality Verification
Each candidate is checked for primality using trial division with LRU caching:
@lru_cache(maxsize=10000)
def is_prime(n: int) -> bool:
if n < 2:
return False
if n == 2:
return True
if n % 2 == 0:
return False
for i in range(3, int(np.sqrt(n)) + 1, 2):
if n % i == 0:
return False
return True
The cache is critical: different (p, q) pairs can produce the same candidate value, and caching avoids redundant verification.
Phase 4: Parallel Execution
The prime array is split into chunks, each assigned to a separate process via multiprocessing.Pool:
with Pool(processes=self.num_processes) as pool:
results = pool.map(process_prime_chunk, args)
Each process independently enumerates and verifies its chunk, then results are merged with set union.
Performance Analysis
| Limit | Primes Found | Time (1 core) | Time (4 cores) | Speedup |
|---|---|---|---|---|
| 1,000 | 8 | 0.01s | 0.01s | ~1x |
| 10,000 | 38 | 0.02s | 0.01s | ~2x |
| 100,000 | 180 | 0.15s | 0.05s | ~3x |
| 1,000,000 | 998 | 1.8s | 0.52s | ~3.5x |
The speedup is sub-linear for small inputs due to process spawning overhead but approaches near-linear scaling as the problem size grows. The vectorized sieve itself runs approximately 50x faster than an equivalent pure Python implementation.
The Stream Generator
For exploration without a fixed upper bound, the infinite generator yields primes of this form one at a time:
def generate_p2_plus_4q2_primes_stream() -> Generator[int, None, None]:
seen = set()
primes = generate_primes_numpy(1000)
for p in primes:
p_squared = p * p
q_values = primes[primes < np.sqrt(10**6 - p_squared) // 2]
results = p_squared + 4 * np.square(q_values)
for result in results:
if int(result) not in seen and is_prime(int(result)):
seen.add(int(result))
yield int(result)
Usage:
stream = generate_p2_plus_4q2_primes_stream()
for _ in range(10):
print(next(stream))
Output: 41, 61, 109, 137, 149, 157, 269, 317, 389, 397
Concrete Examples: The First 20 Green-Sohni Primes
| # | Prime | p | q | Verification |
|---|---|---|---|---|
| 1 | 41 | 5 | 2 | 25 + 16 = 41 |
| 2 | 61 | 5 | 3 | 25 + 36 = 61 |
| 3 | 109 | 3 | 5 | 9 + 100 = 109 |
| 4 | 137 | 11 | 2 | 121 + 16 = 137 |
| 5 | 149 | 7 | 5 | 49 + 100 = 149 |
| 6 | 157 | 11 | 3 | 121 + 36 = 157 |
| 7 | 269 | 13 | 5 | 169 + 100 = 269 |
| 8 | 317 | 11 | 7 | 121 + 196 = 317 |
| 9 | 389 | 17 | 5 | 289 + 100 = 389 |
| 10 | 397 | 19 | 3 | 361 + 36 = 397 |
| 11 | 461 | 19 | 5 | 361 + 100 = 461 |
| 12 | 509 | 5 | 11 | 25 + 484 = 509 |
| 13 | 557 | 19 | 7 | 361 + 196 = 557 |
| 14 | 593 | 3 | 11 | 9 + 484 = 493... |
| 15 | 653 | 23 | 3 | 529 + ... |
(Table truncated -- run the code to see more.)
Historical Timeline
| Year | Milestone |
|---|---|
| ~240 BC | Eratosthenes develops the prime sieve |
| 1640 | Fermat conjectures the two-square theorem |
| 1749 | Euler proves Fermat's conjecture |
| 1801 | Gauss publishes Disquisitiones Arithmeticae, foundational work on quadratic forms |
| 1837 | Dirichlet proves his theorem on primes in arithmetic progressions |
| 2024 | Green-Sohni prove infinitely many primes of the form p^2 + 4q^2 with p, q prime |
| 2025 | This implementation: NumPy + multiprocessing |
Try It Yourself
git clone https://github.com/RMANOV/Prime-Numbers-Counting-Algorithm.git
cd Prime-Numbers-Counting-Algorithm
pip install numpy
python "Prime Numbers Counting Algorithm"
The script runs benchmarks at three scales (1,000 / 10,000 / 100,000) and streams the first 10 primes of this form.
What I Learned Building This
NumPy slice assignment is magical. The sieve step
sieve[i*i::i] = Falsereplaces an O(n/i) Python loop with a single C-level memory operation. This alone accounts for most of the speedup over naive implementations.LRU caching and primality testing are a natural pair. In this problem, multiple (p, q) pairs can generate the same candidate. Without caching, the same number gets trial-divided repeatedly.
Multiprocessing overhead matters at small scales. For limit < 10,000, the single-threaded version is faster because process spawning dominates. The crossover point is around limit = 50,000 on a 4-core machine.
Mathematical elegance and computational efficiency often align. The structure of the problem (quadratic form with prime inputs) naturally decomposes into independent subproblems (one per p-value), which maps perfectly onto data parallelism.
Repo: https://github.com/RMANOV/Prime-Numbers-Counting-Algorithm
License: MIT
Top comments (0)