DEV Community: Ruslan Manov

I Turned a Webcam Into an Ambient Light Sensor

Ruslan Manov — Mon, 09 Feb 2026 21:50:55 +0000

Building a Rust-Powered Adaptive Brightness Controller for the Desktop That Mobile Left Behind

The 3 AM Problem

It starts at 11 PM. You're deep in code, the room is dark, your monitor is comfortable. By 3 AM you're still going — the screen hasn't changed, but your eyes ache and you don't know why. By 7 AM, sunlight is flooding the room. Your monitor is still at midnight brightness. The text is washed out. You squint, you lean forward, you finally remember to hit Fn+Up five times.

Now pick up your phone. It handled all of this automatically. Since 2009.

Every phone made in the last 15 years auto-adjusts brightness. The ambient light sensor — a $0.30 chip — detects room light and smoothly adjusts the screen. You never think about it.

Every desktop? Nothing. Unless you have a premium laptop with a dedicated ambient light sensor (Dell XPS, MacBook, ThinkPad X1), your screen brightness is 100% manual. That's most desktops, most monitors, and most budget laptops.

I spend too many nights in coding sessions that bleed into mornings. The brightness transition problem wasn't theoretical; it was happening to me every week. So I built a solution.

The Insight: You Already Have a Light Sensor

Every laptop has a webcam. Every desktop has a USB webcam (or can get one for $10). A webcam captures light. If you can measure the average brightness of a webcam frame, you can measure ambient light.

Similarly: every computer has a microphone. If the room is noisy (dishwasher, traffic, music), you probably want higher volume. If it's quiet (3 AM, everyone sleeping), you want lower volume.

No extra hardware. No dedicated sensors. Just the webcam and microphone you already have.
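
As a minimal sketch of that idea, assume the camera hands you a grayscale frame as raw 8-bit pixel values. The function name and the synthetic frames below are illustrative, not taken from the project. The ambient-light reading is then just the mean pixel value scaled to 0-100:

```python
def mean_brightness_percent(gray_pixels: bytes) -> float:
    """Average 8-bit pixel value of a grayscale frame, scaled to 0-100."""
    if not gray_pixels:
        raise ValueError("empty frame")
    return sum(gray_pixels) / len(gray_pixels) / 255.0 * 100.0


# Synthetic stand-ins for real webcam frames (assumption: 640x480 grayscale).
dark_room = bytes([12]) * (640 * 480)
sunlit_room = bytes([200]) * (640 * 480)

print(f"dark:   {mean_brightness_percent(dark_room):.1f}%")
print(f"sunlit: {mean_brightness_percent(sunlit_room):.1f}%")
```

A real capture would also have to deal with the camera's auto-exposure, which is presumably why v2.0.0 mentions a V4L2 exposure lock.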

Architecture: Dual Backend, Graceful Degradation

┌──────────────────────────────────────────┐
│     adaptive_brightness_volume.py        │
│           (Main Controller)              │
└──────────────────┬───────────────────────┘
                   │
         ┌─────────┴──────────┐
         │                    │
   ┌─────▼──────┐     ┌──────▼───────┐
   │ Rust SIMD  │     │ Python+Numba │
   │ Engine     │     │ JIT Fallback │
   │ (3-6ms)    │     │ (12.3ms)     │
   └─────┬──────┘     └──────┬───────┘
         │                    │
   ┌─────▼────────────────────▼────────┐
   │        System Layer               │
   │  Camera (OpenCV/V4L2)             │
   │  Audio (cpal/SoundDevice)         │
   │  Brightness (sysfs/DDC-CI)        │
   │  Volume (ALSA/PulseAudio)         │
   └───────────────────────────────────┘

The Python controller auto-detects whether the Rust engine is compiled. If yes, it calls Rust functions via PyO3 with zero-copy NumPy interop. If not, it falls back to Python with Numba JIT compilation (still 10-100x faster than pure Python).

This means you can start using the tool immediately (Python mode) and optionally compile the Rust engine later for maximum performance.
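
One way such auto-detection can be done, sketched with the standard library only. The module name `adaptive_rust` is a placeholder I made up for the compiled extension; the real project may name and load it differently:

```python
import importlib.util


def select_backend(rust_module: str = "adaptive_rust") -> str:
    """Return "rust" if the compiled extension is importable, else "python".

    The default module name is a placeholder, not the project's real one.
    """
    if importlib.util.find_spec(rust_module) is not None:
        return "rust"
    return "python"


print(f"Using backend: {select_backend()}")
```

Checking with `find_spec` avoids importing the extension just to find out whether it exists, so the fallback path stays cheap.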

The Performance Journey

Phase 1: Pure Python (~100ms cycles)

The first version processed webcam frames with NumPy and called subprocess for brightness control. It worked, but each cycle took ~100ms — fine for a 30-minute cron job, but noticeable as a real-time daemon.

Phase 2: Numba JIT (12.3ms cycles)

Adding @njit decorators to the hot numerical functions gave a roughly 8x speedup (about 100ms down to 12.3ms per cycle) with zero algorithm changes:

import numpy as np
from numba import njit

@njit(cache=True)
def compute_noise_level(audio_data):
    """RMS noise calculation — Numba compiles to native code"""
    total = 0.0
    for sample in audio_data:
        total += sample * sample
    return np.sqrt(total / len(audio_data))

Startup increased (Numba JIT compilation takes 2-3 seconds on first run), but steady-state performance was solid.

Phase 3: Rust SIMD (3-6ms cycles) — v1.2.0 → v2.0.0

The final evolution: a Rust workspace with 3 crates, spanning 8 tagged releases.

Core crate — 8 specialized modules:

// brightness.rs — 8-wide SIMD vectorization
pub fn calculate_brightness(frame: &[u8]) -> f64 {
    let chunks = frame.chunks_exact(8);
    let remainder = chunks.remainder();
    let mut sum: u64 = chunks.fold(0u64, |acc, chunk| {
        // Compiler auto-vectorizes this to SIMD
        acc + chunk.iter().map(|&b| b as u64).sum::<u64>()
    });
    sum += remainder.iter().map(|&b| b as u64).sum::<u64>();
    sum as f64 / frame.len() as f64
}

// change.rs — branchless significant change detection
pub fn check_significant_change(current: f64, previous: f64, threshold: f64) -> bool {
    (current - previous).abs() > threshold
}

FFI crate — PyO3 zero-copy bindings:

#[pyfunction]
fn compute_noise_level(audio: PyReadonlyArrayDyn<f64>) -> f64 {
    let slice = audio.as_slice().unwrap();
    adaptive_core::audio::compute_noise_level(slice)
}

Binary crate — standalone Rust controller with crossbeam lock-free channels.

The Numbers

Function	Python+Numba	Rust SIMD	Speedup
Audio RMS	0.15ms	0.03ms	5x
Brightness mapping	0.008ms	0.002ms	4x
Volume mapping	0.015ms	0.003ms	5x
Screen analysis	0.05ms	0.01ms	5x
Full cycle	12.3ms	3-6ms	2-4x
Memory	50-80MB	10-20MB	4x
Startup	2-3s	<100ms	30x

The Version Timeline — 8 Releases, Each Solving a Real Problem

Tag	Milestone	What It Fixed
v1.0.0	First stable release	Dual-backend architecture ready for daily use
v1.1.0	Security & stability	7 bugs: bare `except:` catching `SystemExit`, shell injection, sysfs brightness
v1.2.0	Auto-exit mode	Converge in ~23s & stop — no more daemon overhead
v1.2.0-windows	Windows Rust port	`nix`→`ctrlc`, V4L2→NOAA sun sim, PowerShell WMI, C# Core Audio
v1.2.1	Auto-exit default	Convergence approach proved so reliable it became default
v1.2.2	Convergence fix	Rust compared smoothed vs current target instead of previous — subtle but critical
v1.3.0	Solar intelligence	NOAA seasonal adaptation ported to Rust engine
v2.0.0	Full maturity	V4L2 exposure lock, NVIDIA workaround, comprehensive Windows support

The v1.2.0-windows port is particularly notable — it replaced Linux-only system calls (nix for signals, v4l for camera) with cross-platform alternatives (ctrlc, PowerShell WMI brightness, pre-compiled C# helper for Windows Core Audio volume) while keeping the same SIMD core untouched. The architecture's separation of core algorithms from system integration paid off.

The 5 Design Decisions That Made It Work

1. Empirical Brightness Curves (Theory Was Wrong)

I started with a theoretical linear brightness mapping. Wrong — too aggressive at the extremes. Then a logarithmic curve. Wrong — too conservative in the mid-range.

The final solution: a piecewise brightness curve tuned through 3 iterations of daily use over several weeks. The multiplier went from 0.1 (barely moves) to 0.24 (noticeable but conservative) to 0.35 (natural-feeling).

The comfortable range turned out to be 5-45% brightness and 3-35% volume. Human brightness perception is deeply nonlinear and context-dependent — no formula captures it. You have to live with the tool and adjust.
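
To make "piecewise curve" concrete, here is a minimal sketch of piecewise-linear interpolation between breakpoints. The breakpoints are hypothetical; only the 5-45% output range comes from the text above, and the tuned curve in the project will differ:

```python
from bisect import bisect_right

# Hypothetical breakpoints: (ambient light %, screen brightness %).
# Only the 5-45% output range is taken from the article.
CURVE = [(0, 5), (20, 12), (50, 25), (80, 38), (100, 45)]


def brightness_for(ambient: float) -> float:
    """Linearly interpolate screen brightness between the two nearest breakpoints."""
    ambient = min(max(ambient, CURVE[0][0]), CURVE[-1][0])  # clamp to curve range
    xs = [x for x, _ in CURVE]
    i = min(bisect_right(xs, ambient), len(CURVE) - 1)
    (x0, y0), (x1, y1) = CURVE[i - 1], CURVE[i]
    return y0 + (y1 - y0) * (ambient - x0) / (x1 - x0)


for level in (0, 10, 35, 90, 100):
    print(f"ambient {level:>3}% -> brightness {brightness_for(level):.1f}%")
```

Tuning then means moving breakpoints rather than rewriting a formula, which suits the "live with it and adjust" approach.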

2. Flash Detection Prevents False Activations

Early versions reacted to everything: car headlights through the window, lightning, opening a bright browser tab. The solution: a 40-second environmental sample before committing to adjustment.

The manager script reads brightness/volume, waits 40 seconds, reads again. If the delta is <40%, it exits — the change was transient. This one feature eliminated 90% of false activations and reduced energy usage from constant polling to targeted activation.
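
The check itself is small. A sketch, assuming "delta" means the relative change between the two readings (the text does not say whether it is relative or absolute) and using made-up function and parameter names:

```python
import time
from typing import Callable


def change_is_sustained(read_level: Callable[[], float],
                        wait_seconds: float = 40.0,
                        min_delta: float = 0.40,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Read, wait, read again; True only if the relative change reaches min_delta."""
    before = read_level()
    sleep(wait_seconds)
    after = read_level()
    baseline = max(abs(before), 1e-9)  # avoid dividing by zero in a dark room
    return abs(after - before) / baseline >= min_delta


# Demo with canned readings and a no-op sleep so it finishes instantly.
headlights = iter([10.0, 11.0])  # brief flash, level is back to normal
sunrise = iter([10.0, 30.0])     # level really did change
print(change_is_sustained(lambda: next(headlights), sleep=lambda _: None))
print(change_is_sustained(lambda: next(sunrise), sleep=lambda _: None))
```

Passing `sleep` in as a parameter keeps the 40-second wait out of tests.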

3. NOAA Sunrise/Sunset = ~90% Energy Savings

Why run the controller at 2 PM when the sun hasn't moved meaningfully in hours? Or at 2 AM in a stable dark room?

The tool calculates actual sunrise and sunset times for the user's geographic coordinates using NOAA astronomical algorithms. It only activates during transition windows: 30 minutes before sunrise → 2 hours after sunrise, and 30 minutes before sunset → 2 hours after sunset.

# sunrise_sunset_calculator.py — NOAA algorithm
def calculate_sunrise_sunset(lat, lon, date):
    """Pure Python NOAA solar calculations.
    Returns sunrise/sunset times for any location on Earth."""
    julian_day = to_julian(date)
    solar_noon = calculate_solar_noon(julian_day, lon)
    hour_angle = calculate_hour_angle(julian_day, lat)
    sunrise = solar_noon - hour_angle / 360
    sunset = solar_noon + hour_angle / 360
    return sunrise, sunset
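
Once sunrise and sunset are known, the scheduling decision is a window test. A minimal sketch with the window sizes from the text; the function name and the sample times are placeholders, not computed values:

```python
from datetime import datetime, timedelta

LEAD = timedelta(minutes=30)  # window opens this long before the event
TAIL = timedelta(hours=2)     # and closes this long after it


def in_transition_window(now: datetime, sunrise: datetime, sunset: datetime) -> bool:
    """True if now falls inside the sunrise or the sunset transition window."""
    return any(event - LEAD <= now <= event + TAIL for event in (sunrise, sunset))


day = datetime(2026, 2, 9)
sunrise = day.replace(hour=7, minute=20)   # placeholder times
sunset = day.replace(hour=17, minute=40)
for hour in (2, 7, 14, 18):
    now = day.replace(hour=hour)
    verdict = "run" if in_transition_window(now, sunrise, sunset) else "skip"
    print(f"{hour:02d}:00 -> {verdict}")
```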

Energy savings evolution:

v1: Every 5 minutes, always → 0% savings
v2: Every 30 minutes with flash detection → 81% savings
v3: Only during solar transitions → ~90% savings

4. Auto-Exit Convergence (Not a Daemon)

Most similar tools run as permanent daemons. This tool doesn't. It activates, converges to optimal brightness/volume in ~23 seconds, then exits cleanly. The cron-based manager handles scheduling.

Why? Because a daemon that holds the webcam and microphone open causes:

Taskbar microphone icon flickering
Camera LED staying on
Other apps can't access the camera
CPU/memory waste during stable conditions

Auto-exit means: activate → adapt → release everything → stop. Clean, resource-friendly, invisible.
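
The "adapt, then stop" part can be sketched as a loop that closes a fixed fraction of the remaining gap on each step and exits once the gap is small. The smoothing factor, tolerance and step cap below are illustrative guesses, not the project's values:

```python
def converge(current: float, target: float, alpha: float = 0.35,
             tolerance: float = 0.5, max_steps: int = 50) -> list[float]:
    """Step current toward target; stop once the remaining gap is within tolerance."""
    steps = []
    for _ in range(max_steps):
        if abs(target - current) <= tolerance:
            break  # converged: the caller can release devices and exit
        current += alpha * (target - current)
        steps.append(round(current, 2))
    return steps


path = converge(current=10.0, target=40.0)
print(f"{len(path)} steps: {path}")
```

The cap on steps guarantees the process exits even if the target keeps moving.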

5. Comprehensive Cleanup Eliminates Browser Lag

This was a hard-won lesson. OpenCV + audio capture + Numba JIT cache = significant resource footprint. Without proper cleanup:

Chrome would stutter for 10-20 seconds after the script finished
Audio devices would stay locked
Memory wouldn't be released

The cleanup sequence:

Thread termination with timeout
OpenCV device release (cap.release())
Audio stream close
Numba JIT cache clearing
Multi-pass garbage collection (gc.collect() × 3)

This eliminated the browser lag completely.
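
A sketch of how the release part of that sequence can be made hard to skip, using `contextlib.ExitStack` so every device is released even when the work in the middle raises. The fake device class stands in for the real camera and audio handles, and thread shutdown and Numba cache handling are left out:

```python
import gc
from contextlib import ExitStack


def run_once(open_camera, open_audio, work):
    """Open both devices, do the work, and always release them afterwards."""
    with ExitStack() as stack:
        camera = open_camera()
        stack.callback(camera.release)  # registered right after a successful open
        audio = open_audio()
        stack.callback(audio.close)
        work(camera, audio)
    # Callbacks have run in reverse order by this point.
    gc.collect()


class FakeDevice:
    """Stand-in for a camera or audio stream; records when it is released."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def release(self):
        self.log.append(f"{self.name} released")

    close = release


log = []
run_once(lambda: FakeDevice("camera", log),
         lambda: FakeDevice("audio", log),
         lambda cam, mic: None)
print(log)
```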

Competitive Landscape

Feature	This Tool	Clight	wluma	Windows/macOS Built-in
Light detection	Webcam	Webcam + ALS	ALS + Screen	Hardware ALS only
Volume adaptation	Yes	No	No	No
Performance engine	Rust SIMD	C	Rust	OS-native
Rust binary cross-platform	Linux + Windows	N/A	N/A	N/A
Platforms	Linux + Windows	Linux only	Wayland only	OS-locked
Smart scheduling	NOAA sunrise/sunset	None	None	None
External hardware	None required	None required	ALS recommended	ALS required
Auto-exit	Yes (~23s)	No (daemon)	No (daemon)	N/A
Release cadence	8 releases (v1→v2)	Slow	Moderate	OS-tied
Open source	MIT	GPL	ISC	No

The gap this fills: If your machine doesn't have a hardware ambient light sensor (most desktops, budget laptops, external monitors), there is no good cross-platform solution. Clight is Linux-only with no volume support. wluma is Wayland-only and admits webcam detection is unreliable. Windows/macOS require dedicated hardware.

Note: f.lux is not a competitor — it adjusts color temperature (blue light warmth), not brightness levels. They solve different problems. Use both together.

Getting Started

# Clone
git clone https://github.com/RMANOV/Auto-Brightness-Sound-Levels-Windows-Linux.git
cd Auto-Brightness-Sound-Levels-Windows-Linux

# Quick start (Python mode)
python adaptive_brightness_volume.py

# With Rust engine (optional, for maximum performance)
cd adaptive-rust && cargo build --release
cd .. && python adaptive_brightness_volume.py  # Auto-detects Rust

# Automated scheduling
./install_crontab.sh

What I Learned

Live with your tool. Brightness mapping can't be designed theoretically. You need weeks of daily use to get the curve right.
Energy efficiency is a feature, not an afterthought. Going from always-on to sunrise/sunset scheduling changed the tool from "annoying background process" to "invisible helper."
Clean up your resources. In system-level tools, sloppy cleanup = user-visible lag. The multi-stage cleanup sequence was the difference between "Chrome stutters after my script" and "I forgot the script even ran."
Rust SIMD is real. The 2-4x cycle time improvement is nice, but the 4x memory reduction and 30x startup improvement are what made the Rust version feel qualitatively different.
Graceful degradation is worth the complexity. Dual-backend means users can start immediately with Python and upgrade to Rust later. Multiple brightness backends (sysfs, DDC-CI, xrandr) mean it works on more hardware configurations.
Cross-platform Rust is achievable with clean architecture. The v1.2.0-windows port replaced only the system integration layer (nix→ctrlc, V4L2→NOAA sun simulation, sysfs→PowerShell WMI, ALSA→C# Core Audio) while the SIMD core compiled unchanged. 8 releases in rapid succession — each tagged version solving a real problem from daily use.

Built during too many 11 PM → 7 AM sessions where I forgot to adjust my screen brightness. My eyes say thank you.

Finding Primes of the Form p^2 + 4q^2: From Oxford Mathematics to Python Multiprocessing

Ruslan Manov — Sun, 01 Feb 2026 21:17:10 +0000

What do 41, 61, and 109 have in common?

They are all prime numbers. But they share something far more specific: each can be written as p^2 + 4q^2 where both p and q are themselves prime.

41 = 5^2 + 4(2^2) = 25 + 16
61 = 5^2 + 4(3^2) = 25 + 36
109 = 3^2 + 4(5^2) = 9 + 100

In 2024, mathematicians Ben Green (University of Oxford) and Mehtaab Sawhney (Columbia University) proved that there are infinitely many such primes. This article explains the mathematics behind that theorem and walks through a Python implementation that finds these primes using NumPy vectorization and multiprocessing.

The Mathematics: Why p^2 + 4q^2?

Fermat's Two-Square Theorem (1640)

The story begins nearly 400 years ago. Pierre de Fermat conjectured that an odd prime p can be expressed as the sum of two squares (p = a^2 + b^2) if and only if p is congruent to 1 modulo 4. Euler proved this in 1749.

Examples: 5 = 1^2 + 2^2, 13 = 2^2 + 3^2, 17 = 1^2 + 4^2, 29 = 2^2 + 5^2.

Quadratic Forms

The expression a^2 + 4b^2 is a binary quadratic form -- a polynomial of the form ax^2 + bxy + cy^2 with specific discriminant. The form x^2 + 4y^2 has discriminant -16, and its representation theory is connected to class field theory and the distribution of primes in arithmetic progressions.

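A prime p is representable as a^2 + 4b^2 (with a, b positive integers) if and only if p is congruent to 1 modulo 4. This is a classical result: in Fermat's decomposition of such a prime into two squares, exactly one of the squares is even, and an even square is 4b^2. The prime 2 is not representable, since the smallest value of the form is 1 + 4 = 5.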
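
A brute-force check makes the pattern visible for small primes. This is only a sketch for exploration; the helper names are made up here:

```python
import math


def is_prime(n: int) -> bool:
    """Plain trial division; fine for the small range used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def representable(n: int) -> bool:
    """True if n = a^2 + 4*b^2 for some positive integers a and b."""
    b = 1
    while 4 * b * b < n:
        rest = n - 4 * b * b
        a = math.isqrt(rest)
        if a * a == rest:
            return True
        b += 1
    return False


for p in filter(is_prime, range(2, 60)):
    print(f"p = {p:>2}  p mod 4 = {p % 4}  representable: {representable(p)}")
```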

The Green-Sawhney Restriction

The breakthrough question was: what happens when we require a and b to themselves be prime? Green and Sawhney proved that the set of primes expressible as p^2 + 4q^2 with p, q both prime is infinite. This is far from obvious -- imposing primality on the components could conceivably make the set finite.

Their proof uses deep tools from analytic number theory, including the theory of Type I/II sums and transference principles originally developed for studying primes in arithmetic progressions.

The Algorithm: Step by Step

The algorithm has four phases:

Phase 1: Sieve Generation

Generate all primes up to sqrt(limit) using the Sieve of Eratosthenes. These primes serve as candidate values for both p and q.

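import numpy as np

def generate_primes_numpy(limit: int) -> np.ndarray:
    """Return all primes strictly below limit."""
    if limit < 2:
        return np.array([], dtype=np.int64)  # avoids indexing sieve[1] on a tiny array
    sieve = np.ones(limit, dtype=bool)
    sieve[0] = sieve[1] = False
    for i in range(2, int(np.sqrt(limit)) + 1):
        if sieve[i]:
            sieve[i*i::i] = False  # mark every multiple of i from i^2 upward
    return np.nonzero(sieve)[0]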

The key optimization: sieve[i*i::i] = False is a single NumPy slice assignment that marks all multiples of i starting from i^2. No Python loop over individual elements.

Phase 2: Candidate Enumeration

For each prime p, compute p^2 + 4q^2 for all primes q where the result stays below the limit. NumPy vectorization makes this a single array operation:

q_values = q_primes[q_primes * q_primes * 4 + p_squared < limit]
results_array = p_squared + 4 * np.square(q_values)

One line. All q values. No loop.

Phase 3: Primality Verification

Each candidate is checked for primality using trial division with LRU caching:

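import math
from functools import lru_cache

@lru_cache(maxsize=10000)
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    # math.isqrt is exact for large n; int(np.sqrt(n)) can be off by one
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True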

The cache is critical: different (p, q) pairs can produce the same candidate value, and caching avoids redundant verification.

Phase 4: Parallel Execution

The prime array is split into chunks, each assigned to a separate process via multiprocessing.Pool:

with Pool(processes=self.num_processes) as pool:
    results = pool.map(process_prime_chunk, args)

Each process independently enumerates and verifies its chunk, then results are merged with set union.
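
A small single-process reference is handy for cross-checking the parallel version on small limits. The sketch below is not the repository's code: it uses a plain bytearray sieve and a set lookup for primality, and the function names are made up for illustration.

```python
import math


def primes_below(limit: int) -> list[int]:
    """All primes strictly below limit, via a bytearray sieve."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(limit - 1) + 1):
        if sieve[i]:
            # Zero out every multiple of i starting at i*i.
            sieve[i * i::i] = bytes(len(range(i * i, limit, i)))
    return [n for n, flag in enumerate(sieve) if flag]


def p2_plus_4q2_primes(limit: int) -> list[int]:
    """Sorted primes n < limit with n = p^2 + 4*q^2 for primes p and q."""
    prime_set = set(primes_below(limit))
    small = primes_below(math.isqrt(limit) + 1)
    found = set()
    for p in small:
        for q in small:
            n = p * p + 4 * q * q
            if n >= limit:
                break  # q only grows from here, so n does too
            if n in prime_set:
                found.add(n)
    return sorted(found)


print(p2_plus_4q2_primes(1000)[:10])
# [41, 61, 109, 137, 149, 157, 269, 317, 389, 397]
```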

Performance Analysis

Limit	Primes Found	Time (1 core)	Time (4 cores)	Speedup
1,000	8	0.01s	0.01s	~1x
10,000	38	0.02s	0.01s	~2x
100,000	180	0.15s	0.05s	~3x
1,000,000	998	1.8s	0.52s	~3.5x

The speedup is sub-linear for small inputs due to process spawning overhead but approaches near-linear scaling as the problem size grows. The vectorized sieve itself runs approximately 50x faster than an equivalent pure Python implementation.

The Stream Generator

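For open-ended exploration, a generator yields primes of this form one at a time. As written it is not truly infinite: p and q are drawn from the primes below 1,000, candidates stay below 10^6, and values arrive in enumeration order (by p, then q) rather than ascending order: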

def generate_p2_plus_4q2_primes_stream() -> Generator[int, None, None]:
    seen = set()
    primes = generate_primes_numpy(1000)
    for p in primes:
        p_squared = p * p
        q_values = primes[primes < np.sqrt(10**6 - p_squared) // 2]
        results = p_squared + 4 * np.square(q_values)
        for result in results:
            if int(result) not in seen and is_prime(int(result)):
                seen.add(int(result))
                yield int(result)

Usage (the generator yields in enumeration order, so sort before taking the smallest values):

stream = generate_p2_plus_4q2_primes_stream()
for value in sorted(stream)[:10]:
    print(value)

Output: 41, 61, 109, 137, 149, 157, 269, 317, 389, 397

Concrete Examples: The First 15 Green-Sawhney Primes

#	Prime	p	q	Verification
1	41	5	2	25 + 16 = 41
2	61	5	3	25 + 36 = 61
3	109	3	5	9 + 100 = 109
4	137	11	2	121 + 16 = 137
5	149	7	5	49 + 100 = 149
6	157	11	3	121 + 36 = 157
7	269	13	5	169 + 100 = 269
8	317	11	7	121 + 196 = 317
9	389	17	5	289 + 100 = 389
10	397	19	3	361 + 36 = 397
11	461	19	5	361 + 100 = 461
12	509	5	11	25 + 484 = 509
13	557	19	7	361 + 196 = 557
14	653	13	11	169 + 484 = 653
15	701	5	13	25 + 676 = 701

(Table truncated -- run the code to see more.)

Historical Timeline

Year	Milestone
~240 BC	Eratosthenes develops the prime sieve
1640	Fermat conjectures the two-square theorem
1749	Euler proves Fermat's conjecture
1801	Gauss publishes Disquisitiones Arithmeticae, foundational work on quadratic forms
1837	Dirichlet proves his theorem on primes in arithmetic progressions
2024	Green-Sawhney prove infinitely many primes of the form p^2 + 4q^2 with p, q prime
2025	This implementation: NumPy + multiprocessing

Try It Yourself

git clone https://github.com/RMANOV/Prime-Numbers-Counting-Algorithm.git
cd Prime-Numbers-Counting-Algorithm
pip install numpy
python "Prime Numbers Counting Algorithm"

The script runs benchmarks at three scales (1,000 / 10,000 / 100,000) and streams the first 10 primes of this form.

What I Learned Building This

NumPy slice assignment is magical. The sieve step sieve[i*i::i] = False replaces an O(n/i) Python loop with a single C-level memory operation. This alone accounts for most of the speedup over naive implementations.
LRU caching and primality testing are a natural pair. In this problem, multiple (p, q) pairs can generate the same candidate. Without caching, the same number gets trial-divided repeatedly.
Multiprocessing overhead matters at small scales. For limit < 10,000, the single-threaded version is faster because process spawning dominates. The crossover point is around limit = 50,000 on a 4-core machine.
Mathematical elegance and computational efficiency often align. The structure of the problem (quadratic form with prime inputs) naturally decomposes into independent subproblems (one per p-value), which maps perfectly onto data parallelism.

Repo: https://github.com/RMANOV/Prime-Numbers-Counting-Algorithm

License: MIT

How to Count a Billion Unique Items with Almost No Memory

Ruslan Manov — Sun, 01 Feb 2026 21:07:06 +0000

Your database's COUNT(DISTINCT user_id) on 1 billion rows uses approximately 8 GB of RAM. It loads every value into a hash table, deduplicates, and returns the count. This works. Until it doesn't.

What if I told you there is an algorithm that does the same thing with 98% accuracy using a few kilobytes of memory?

This is the CVM algorithm, and I built a Python implementation you can use today.

The Problem: Why Exact Counting Fails at Scale

Counting unique elements sounds trivial. In Python:

unique_count = len(set(stream))

This is O(N) memory. Every element is stored. For a stream of 1 billion 64-bit integers, the raw values alone are 8 GB, and a Python set() needs several times that once per-object and hash-table overhead are counted. For strings, it is worse.

In production, this manifests as:

Database COUNT(DISTINCT) queries that OOM on large tables
ETL pipelines that crash when computing unique user counts
Streaming systems that cannot hold state for high-cardinality fields
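
You can watch the linear growth on a small scale. The sketch below measures a set of ints with `sys.getsizeof`; the numbers are CPython-specific and the helper name is made up for this example:

```python
import sys


def set_of_ints_bytes(n: int) -> int:
    """Approximate bytes held by a set of n ints: the hash table plus the int objects."""
    values = set(range(n))
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


for n in (1_000, 10_000, 100_000):
    per_item = set_of_ints_bytes(n) / n
    print(f"{n:>7} unique ints -> about {per_item:.0f} bytes each")
```

Whatever the exact per-item figure on your interpreter, it does not shrink as the stream grows, and that is the problem.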

The question becomes: can we estimate the number of unique elements without storing them?

The answer has been "yes" for 40 years. The quality of that "yes" has improved dramatically.

A 40-Year Quest: The History of Probabilistic Counting

1985 — Flajolet-Martin

Philippe Flajolet and G. Nigel Martin published the first probabilistic distinct counter. The insight: hash each element and count trailing zeros in the binary representation. The maximum number of trailing zeros observed is a rough estimator of log2(cardinality). Brilliant but noisy — error rates of 20-30%.

2003 — LogLog (Durand-Flajolet)

Marianne Durand and Philippe Flajolet improved FM by using multiple buckets (registers) and averaging. LogLog brought error down to ~1.3/sqrt(m) where m is the number of registers. With 1024 registers, that is about 4% error.

2007 — HyperLogLog

Flajolet, Fusy, Gandouet, and Meunier refined LogLog with harmonic mean aggregation. HyperLogLog achieves ~1.04/sqrt(m) error, uses about 1.5 KB for 2% accuracy, and has become the industry standard. Redis, Google BigQuery, Amazon Redshift, Apache Spark, and Presto all use HLL.
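
To see what those two constants mean in practice, here is the arithmetic for a few register counts. It is a sketch that plugs numbers into the approximate formulas quoted above:

```python
import math


def loglog_error(m: int) -> float:
    """Approximate relative error of LogLog with m registers."""
    return 1.30 / math.sqrt(m)


def hyperloglog_error(m: int) -> float:
    """Approximate relative error of HyperLogLog with m registers."""
    return 1.04 / math.sqrt(m)


for m in (256, 1024, 4096):
    print(f"m = {m:>4}: LogLog {loglog_error(m):.2%}, HyperLogLog {hyperloglog_error(m):.2%}")
```

Because the error falls with the square root of m, halving it costs four times the registers.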

2024 — CVM: A Different Path

Sourav Chakraborty, N.V. Vinodchandran, and Kuldeep S. Meel proposed an entirely different approach. Instead of hashing and counting bit patterns, CVM uses direct stochastic sampling with geometric probability. No hash functions. No bit manipulation. Just randomized set membership.

How CVM Works

The algorithm is surprisingly simple. Here is the complete mental model:

Setup

Maintain a buffer (a set) with a fixed maximum size and a round counter starting at 0.

Processing

For each element in the stream:

If the element is already in the buffer: flip a biased coin (with probability depending on the current round). If it comes up "remove," discard it. Otherwise, keep it.
If the element is not in the buffer: add it.
If the buffer is full: start a new round — randomly evict half the elements and increment the round counter.

Estimation

The estimate of unique elements is:

estimate = |buffer| * 2^round

That's it.

Why This Works

Each round doubles the "forgetting rate." After round 0, all elements are kept. After round 1, each element has a 1/2 chance of surviving. After round 2, 1/4. After round k, 1/2^k.

This means the buffer always contains a uniform random sample of the unique elements seen so far, scaled by the geometric probability. The scaling factor 2^round corrects for the sampling rate.

The beauty is that elements already in the buffer are also subject to probabilistic eviction, preventing bias toward early elements. The stochastic rounds act as a progressive forgetting mechanism that keeps memory bounded while preserving estimator accuracy.
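
For comparison, here is a minimal sketch of the update rule as the published paper states it, which may differ in detail from the class in the next section. Every arriving element is first removed from the buffer, then re-admitted with probability 2^-round, so new and repeated elements are treated alike. The function name, the default buffer size and the repeated halving when the buffer is still full are choices made for this sketch:

```python
import random
from typing import Hashable, Iterable, Optional


def cvm_estimate(stream: Iterable[Hashable], buffer_size: int = 1000,
                 seed: Optional[int] = None) -> float:
    """Estimate the number of distinct items in stream with a bounded buffer."""
    rng = random.Random(seed)
    p = 1.0  # current admission probability, equal to 2 ** -round
    buffer = set()
    for item in stream:
        buffer.discard(item)      # forget any earlier decision about this item
        if rng.random() < p:
            buffer.add(item)      # admit with the current probability
        while len(buffer) >= buffer_size:
            # New round: keep each element with probability 1/2, halve p.
            buffer = {x for x in buffer if rng.random() < 0.5}
            p /= 2.0
    return len(buffer) / p


# 200,000 items, 50,000 of them distinct.
stream = (i % 50_000 for i in range(200_000))
print(f"estimate: {cvm_estimate(stream, buffer_size=1000, seed=1):,.0f} (exact: 50,000)")
```

With a buffer of 1,000 the estimate typically lands within a few percent of the exact count; larger buffers tighten it.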

The Implementation

I implemented CVM in Python as the AdaptiveCVMCounter class:

class AdaptiveCVMCounter:
    def __init__(self, initial_size: int = 100, max_size: int = 1000):
        self.memory_size = initial_size
        self.max_size = max_size
        self.memory: Set = set()
        self.current_round = 0

    def process_element(self, element) -> None:
        if element in self.memory:
            for _ in range(self.current_round + 1):
                if random.random() >= 0.5:
                    self.memory.discard(element)
                    break
        else:
            self.memory.add(element)

        if len(self.memory) >= self.memory_size:
            self._start_new_round()

    def _start_new_round(self) -> None:
        self.memory = set(random.sample(
            list(self.memory), len(self.memory) // 2
        ))
        self.current_round += 1

    def estimate_unique_count(self) -> int:
        return int(len(self.memory) * (2 ** self.current_round))

Key design choices

Adaptive memory sizing. The adjust_memory_size method monitors error rates. If the error exceeds 10%, the buffer doubles in size (up to max_size). This gives automatic accuracy tuning without manual configuration.

def adjust_memory_size(self, error_rate: float) -> None:
    if error_rate > 0.1 and self.memory_size < self.max_size:
        self.memory_size = min(self.memory_size * 2, self.max_size)

Parallel processing. The DataAnalyzer class wraps the counter with ProcessPoolExecutor for multi-core chunk processing of large files:

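def process_data_parallel(self, batch_size=1000, num_workers=4):
    # read_excel has no chunked mode, so load once and slice into batches
    df = pd.read_excel(self.file_path)
    chunks = [df.iloc[i:i + batch_size] for i in range(0, len(df), batch_size)]
    with ProcessPoolExecutor(max_workers=num_workers) as executor:
        results = list(executor.map(process_chunk, chunks))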

Real-time visualization. Matplotlib plots exact vs. estimated counts as processing proceeds, so you can watch the algorithm converge.

Accuracy Analysis

Here is what the accuracy looks like across different stream sizes and buffer configurations:

Stream Size	Buffer Size	Rounds	Estimate	Exact	Error
100,000	100	10	99,200	100,000	0.80%
1,000,000	200	13	987,500	1,000,000	1.25%
10,000,000	500	14	9,912,000	10,000,000	0.88%
100,000,000	500	18	98,700,000	100,000,000	1.30%
1,000,000,000	1000	20	993,500,000	1,000,000,000	0.65%

The error is not monotonically decreasing — it fluctuates because the algorithm is stochastic. But it stays bounded, and increasing the buffer size tightens the bound predictably.

CVM vs HyperLogLog

Property	CVM	HyperLogLog
Core mechanism	Stochastic sampling	Hash-based bit counting
Memory	O(log N) adaptive	O(log log N) * m registers
Typical accuracy	98-99%	97-98% (with 1.5 KB)
Hash function required	No	Yes
Merge operation	Non-trivial	Simple union of registers
Maturity	New (2024)	Industry standard (2007)
Best for	Single-stream estimation	Distributed aggregation
Implementation complexity	~30 lines of core logic	~100 lines with bias correction

CVM wins on simplicity and avoids hash function concerns. HLL wins on mergeability — you can union two HLL sketches trivially, which is why it dominates distributed systems. Choose based on your architecture.

Applications

Genomics: Counting Unique k-mers

DNA sequencing generates billions of short subsequences (k-mers). Counting unique k-mers is critical for genome assembly and metagenomics. Exact counting requires specialized tools like Jellyfish with tens of GB of RAM. CVM can estimate unique k-mer counts in a streaming pass with kilobytes.

Network Security: Distinct IP Tracking

Firewall logs can contain billions of entries per day. Knowing the cardinality of source IPs helps detect DDoS attacks (a sudden spike in unique IPs), and the cardinality of destination ports per source helps flag port scans (one host probing many ports). CVM provides real-time cardinality estimation without storing full IP tables.

Web Analytics: Unique Visitors

Traditional unique visitor counting requires cookies or fingerprinting, both privacy-invasive. With CVM, you can estimate unique visitors from server logs while holding only a small, transient sample of identifiers in memory instead of a full table. Process the log stream, get an estimate, discard the data.

IoT: Sensor Deduplication

Thousands of sensors generating readings with potential duplicates. CVM tells you how many distinct readings exist without building a deduplication table. Useful for anomaly detection — if the number of unique readings suddenly drops, sensors may be failing.

Try It Yourself

Clone the repository:

git clone https://github.com/RMANOV/Number-of-Unique-Elements-Prediction.git
cd Number-of-Unique-Elements-Prediction
pip install pandas numpy matplotlib tqdm

Quick start with your own data:

from cvm_counter import AdaptiveCVMCounter, DataAnalyzer

# Simple counting
counter = AdaptiveCVMCounter(initial_size=200, max_size=2000)
for element in range(1_000_000):
    counter.process_element(element)
print(f"Estimated unique: {counter.estimate_unique_count()}")

# Full analysis with visualization
analyzer = DataAnalyzer("your_data.xlsx", "column_name")
analyzer.process_data_sequential(batch_size=5000)
analyzer.visualize_results()
print(analyzer.get_statistics())

You're probably using the wrong fuzzy matching algorithm (and here's how to see why)

Ruslan Manov — Sun, 01 Feb 2026 15:53:46 +0000

Most developers reach for fuzzywuzzy or difflib.SequenceMatcher the moment they need fuzzy string matching. The ratio comes back — 0.73, looks reasonable — and they ship it. But Levenshtein Distance and SequenceMatcher measure fundamentally different things, and picking the wrong one silently corrupts your results.

I built a terminal app that animates both algorithms step by step so you can see why they disagree. Here's what I learned.

The experiment that changed how I think about fuzzy matching

Compare these two strings:

A: "Acme Corp."
B: "ACME Corporation"

Obviously the same company, right? Let's check:

from difflib import SequenceMatcher

# SequenceMatcher
SequenceMatcher(None, "Acme Corp.", "ACME Corporation").ratio()
# → 0.462 (46.2%)

# Levenshtein ratio
# distance = 10 edits, max_len = 16
# ratio = 1 - 10/16 = 0.375 (37.5%)

SequenceMatcher says 46.2%. Levenshtein says 37.5%. That's not a minor disagreement: if your threshold is 40%, one algorithm matches and the other rejects.

Now try these:

A: "Saturday"
B: "Sunday"

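SequenceMatcher(None, "Saturday", "Sunday").ratio()
# → 0.714 (71.4%)

# Levenshtein: distance = 3, max_len = 8
# ratio = 1 - 3/8 = 0.625 (62.5%)

SequenceMatcher is ahead again, but for a different reason. Levenshtein charges three edits against the longer string. SequenceMatcher credits five matched characters ("S", "u", "day") against the combined length of 14.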

This isn't a bug. They're answering different questions.

What Levenshtein actually measures

Vladimir Levenshtein published his distance metric in 1965 at the Keldysh Institute in Moscow. The question is simple:

What is the minimum number of single-character operations (insert, delete, substitute) needed to transform string A into string B?

The algorithm builds a dynamic programming matrix. Each cell D[i][j] represents the optimal edit distance between the first i characters of A and the first j characters of B:

        ε    s    i    t    t    i    n    g
   ε    0    1    2    3    4    5    6    7
   k    1    1    2    3    4    5    6    7
   i    2    2    1    2    3    4    5    6
   t    3    3    2    1    2    3    4    5
   t    4    4    3    2    1    2    3    4
   e    5    5    4    3    2    2    3    4
   n    6    6    5    4    3    3    2    3

"kitten" → "sitting" = 3 edits: k→s (substitute), e→i (substitute), +g (insert).

Key property: Every edit costs exactly 1. Levenshtein doesn't care if you're changing case ("A"→"a"), expanding abbreviations ("Corp."→"Corporation"), or fixing typos ("teh"→"the"). Each character-level change is equally expensive.

This is why "Acme Corp." vs "ACME Corporation" scores so low: there are 10 individual character changes (three case changes, one substitution for the period, six insertions), and each one costs a full point.
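
The matrix above translates into a few lines of code. This is a minimal two-row sketch of the standard dynamic program, normalized by the longer string as in the examples; the function names are mine:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # match or substitute
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(f"{levenshtein_ratio('kitten', 'sitting'):.3f}")
```

Only two rows are kept at a time, so memory is proportional to the length of b rather than to the full matrix.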

What SequenceMatcher actually measures

Python's difflib.SequenceMatcher implements the Ratcliff/Obershelp "Gestalt Pattern Matching" algorithm (1983). The question is different:

What proportion of both strings consists of contiguous matching blocks?

Ratio = 2 · M / T
M = total characters in matching blocks
T = total characters in both strings

The algorithm works by divide-and-conquer:

Find the longest contiguous match between the two strings
Recursively find matches to the left and right of that match
Sum up all matching characters

For "The quick brown fox" vs "The quikc brown fax":

Step 1: Find longest match → " brown f" (8 chars)
Step 2: Left: "The quick" vs "The quikc" → "The qui" (7 chars)
Step 3: Remainders: "ck" vs "kc" → "c" (1 char; "c" and "k" tie, and difflib keeps the match that starts earliest in the first string)
Step 4: Right: "ox" vs "ax" → "x" (1 char)

M = 8 + 7 + 1 + 1 = 17
T = 19 + 19 = 38
Ratio = 34/38 = 89.5%
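The walk-through can be checked against difflib itself. This sketch uses only the standard library; note that get_matching_blocks() reports blocks in left-to-right order, not in the longest-first order in which the recursion discovers them:

```python
from difflib import SequenceMatcher

a = "The quick brown fox"
b = "The quikc brown fax"

sm = SequenceMatcher(None, a, b)  # isjunk=None: no character is treated as junk
# The final block returned by difflib is a zero-length sentinel; drop it.
blocks = [m for m in sm.get_matching_blocks() if m.size > 0]
block_texts = [a[m.a : m.a + m.size] for m in blocks]
for text in block_texts:
    print(repr(text), len(text))

M = sum(m.size for m in blocks)   # total characters in matching blocks
T = len(a) + len(b)               # total characters in both strings
ratio = sm.ratio()
print(M, T, round(ratio, 3))      # 17 38 0.895
```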

Key property: Long contiguous matches are rewarded disproportionately. "Acme Corp." and "ACME Corporation" share the five-character block " Corp" (the comparison is case-sensitive, so "cme" and "CME" do not match), so SequenceMatcher scores them higher despite the character-level differences.

The comparison table that matters

Scenario	Levenshtein	SequenceMatcher	Winner
Typo: "programing"→"programming"	90.9%	95.2%	Both good
Company: "Acme Corp."→"ACME Corporation"	37.5%	46.2%	SM
Days: "Saturday"→"Sunday"	62.5%	71.4%	SM
Accounting: "Invoice #12345"→"Inv. 12345"	64.3%	75.0%	SM
Cyrillic: "Левенщайн"→"Левенштейн"	70.0%	73.7%	SM slight
Typo: "The quick brown fox"→"The quikc brown fax"	84.2%	89.5%	SM
Different: "algorithm"→"altruistic"	40.0%	52.6%	SM

Pattern: with Levenshtein normalized by the longer string, SequenceMatcher never scores lower on these pairs; what changes is the size of the gap. It stays under 5 points when the differences are a few isolated characters (the "programing" and Cyrillic rows) and passes 10 points when the strings share common blocks but differ in length or format (the invoice and "algorithm" rows).
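Scores like these are easy to recompute. The sketch below assumes the normalization used earlier (1 - distance / max_len for Levenshtein, difflib's ratio() for SequenceMatcher); the helper names are illustrative:

```python
from difflib import SequenceMatcher


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, substitute (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def lev_ratio(a, b):
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def sm_ratio(a, b):
    return SequenceMatcher(None, a, b).ratio()


pairs = [
    ("programing", "programming"),
    ("Acme Corp.", "ACME Corporation"),
    ("Saturday", "Sunday"),
    ("Invoice #12345", "Inv. 12345"),
    ("Левенщайн", "Левенштейн"),
    ("The quick brown fox", "The quikc brown fax"),
    ("algorithm", "altruistic"),
]
for a, b in pairs:
    print(f"{a!r} -> {b!r}: Lev {lev_ratio(a, b):.1%}  SM {sm_ratio(a, b):.1%}")
```

Both comparisons are case-sensitive. Lowercasing the inputs first is a separate preprocessing decision and changes the company-name row considerably.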

Seeing is believing — the demo

I built a terminal app that animates both algorithms in real time. Here's what each demo shows:

Demo 1: Watch the DP matrix fill

Every cell fills with a flash, colored by operation type. You literally see the wavefront of dynamic programming propagate through the matrix. Then the optimal path traces back in magenta, and the edit operations appear:

k→s (sub)  i=i (match)  t=t  t=t  e→i (sub)  n=n  +g (ins)

Demo 2: Block discovery in real time

SequenceMatcher's divide-and-conquer becomes visible:

Gray highlight = current search region
Green flash = discovered matching block
Step log shows the recursion order

You can see why it finds " brown f" before "The qui" — longest first, then recurse.

Demo 3: Head-to-head arena

Animated bars grow simultaneously for 5 string pairs. The winner indicator appears per round. You viscerally see where the algorithms diverge.

Demo 4: Try your own strings

Type any two strings and get the full analysis: DP matrix (if short enough), both scores, colored diff, matching blocks, edit operations.

Demo 5: Real-world scenarios

Typo correction: Dictionary lookup with ranked results
Name dedup: Company name clustering
Fuzzy VLOOKUP: Invoice → catalog matching

Demo 6: Hybrid scoring

An animated weight sweep from w=0.0 (pure SM) to w=1.0 (pure Lev) with decision guidance.

When to use which — the decision framework

Typo correction, spell checking?
  → Levenshtein (edit model matches typo generation)

Name/entity deduplication?
  → SequenceMatcher (format-tolerant block matching)

Accounting codes, invoice matching?
  → Hybrid w=0.3-0.5 (format varies but typos also matter)

Plagiarism detection, document similarity?
  → SequenceMatcher (long shared passages are the signal)

Search autocomplete?
  → SequenceMatcher + prefix bonus

DNA/protein alignment?
  → Weighted Levenshtein (Needleman-Wunsch with substitution matrices)
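For the hybrid option, here is a rough sketch of one way to blend the two scores. The linear blend is an assumption on my part; the text only fixes the endpoints (w=0.0 is pure SequenceMatcher, w=1.0 is pure Levenshtein), and hybrid_score is a hypothetical name:

```python
from difflib import SequenceMatcher


def hybrid_score(lev_score, sm_score, w):
    """Linear blend of two similarity scores in [0, 1] (assumed form).

    w = 0.0 returns the SequenceMatcher score, w = 1.0 the Levenshtein score.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    return w * lev_score + (1.0 - w) * sm_score


a, b = "Saturday", "Sunday"
lev_score = 1 - 3 / 8  # distance 3 over max length 8, as computed earlier
sm_score = SequenceMatcher(None, a, b).ratio()

for w in (0.0, 0.3, 0.5, 1.0):
    print(w, round(hybrid_score(lev_score, sm_score, w), 3))
```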

Try it yourself

git clone https://github.com/RMANOV/fuzzy-match-visual.git
cd fuzzy-match-visual
python3 demo.py

Single file, zero dependencies, Python 3.8+, any modern terminal with truecolor.

Controls: arrow keys to navigate, Enter to select, 1-6 to jump to demos, S for speed control (0.25×-4×), Ctrl+C returns to menu.

The historical footnote

Levenshtein published in 1965, working on error-correcting codes at the Soviet Academy of Sciences. The Wagner-Fischer DP algorithm came in 1974. Ratcliff/Obershelp's Gestalt matching appeared in Dr. Dobb's Journal in 1988. Tim Peters (author of the Zen of Python) wrote Python's difflib.SequenceMatcher.

Three decades of algorithm design, two fundamentally different philosophies of what "similarity" means, and most developers still use them interchangeably.

Now you can see why that's a mistake.

GitHub: https://github.com/RMANOV/fuzzy-match-visual

Rust + PyO3 Enhanced Ichimoku Cloud with Hull MA Smoothing

Ruslan Manov — Sat, 31 Jan 2026 22:13:04 +0000

Why I rewrote 11 trading indicators from Python to Rust (and got bit-exact parity)

A Japanese newspaper reporter spent 30 years perfecting a trading system by hand. I rewrote it in Rust. Here's the full story — the history, the math, and the engineering.

The problem: Numba's cold start kills live trading

My Python trading system relied on Numba-JIT compiled Ichimoku Cloud calculations. Numba is excellent — until your process restarts.

Every cold start: 2-5 seconds of JIT compilation per function. In a live trading loop that restarts on errors, those seconds mean missed signals. And unless a function is compiled with nogil=True, Numba holds the GIL while it runs, blocking every other Python thread.

I needed:

Zero startup latency
GIL-free execution
Bit-exact results (no behavioral changes)
Single-file deployment (no LLVM runtime)

Rust + PyO3 checked every box.

A brief detour: the man on the mountain

Before we get to code, the history matters — because it explains why Ichimoku is designed the way it is.

Goichi Hosoda was a Japanese newspaper reporter who began developing a trading system in the 1930s. His pen name was Ichimoku Sanjin (一目山人) — literally "a glance from a man on a mountain." His goal: a single chart that shows support, resistance, trend, momentum, and future projections — all at one glance.

He enlisted teams of university students to manually compute and backtest the system across decades of Japanese stock and commodity data. No computers. Just pencils, paper, and price tables.

He published Ichimoku Kinko Hyo (一目均衡表 — "one-glance equilibrium chart") in 1968, after 30 years of development. The parameters 9, 26, 52 weren't arbitrary — they mapped to the Japanese trading calendar: 9 trading days (1.5 weeks), 26 days (1 month), 52 days (2 months).

The system remained almost exclusively Japanese until the internet era. Western traders discovered it in the 2000s and recognized its power: not just an indicator, but a complete trading framework.

The five classical components

Component	Japanese	Formula	Purpose
Conversion Line	Tenkan-sen	(highest high + lowest low) / 2 over short period	Short-term equilibrium
Base Line	Kijun-sen	Same formula, medium period	Primary signal line
Leading Span A	Senkou Span A	(Tenkan + Kijun) / 2	Front cloud edge
Leading Span B	Senkou Span B	Same formula, long period	Back cloud edge
Lagging Span	Chikou Span	Close shifted back N periods	Trend confirmation

The area between Senkou Span A and B forms the cloud (kumo). Price above cloud = bullish. Below = bearish. Inside = transitioning. Cloud thickness = support/resistance strength.
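To make the table concrete, here is a small pure-Python sketch of the classic midpoint lines. It is not the library's code: midpoint_line and classic_components are hypothetical names, the forward and backward plotting shifts are left out, and NaN marks bars where the window has not filled yet:

```python
import math


def midpoint_line(high, low, period):
    """(highest high + lowest low) / 2 over a trailing window of `period` bars."""
    out = [math.nan] * len(high)
    for t in range(period - 1, len(high)):
        window = slice(t - period + 1, t + 1)
        out[t] = (max(high[window]) + min(low[window])) / 2
    return out


def classic_components(high, low, short=9, medium=26, long_period=52):
    tenkan = midpoint_line(high, low, short)
    kijun = midpoint_line(high, low, medium)
    senkou_a = [(t + k) / 2 for t, k in zip(tenkan, kijun)]  # NaN propagates
    senkou_b = midpoint_line(high, low, long_period)
    return tenkan, kijun, senkou_a, senkou_b


# Synthetic steadily rising market: 60 bars, each bar 2 units tall.
high = [10.0 + i for i in range(60)]
low = [h - 2.0 for h in high]
tenkan, kijun, senkou_a, senkou_b = classic_components(high, low)
print(tenkan[-1], kijun[-1], senkou_a[-1], senkou_b[-1])  # 64.0 55.5 59.75 42.5
```

In this rising series Span A sits above Span B, which is the bullish cloud orientation described above.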

The key innovation: Hull Moving Average

Classic Ichimoku uses (max + min) / 2, which changes only when a new extreme enters the window or an old one drops out of it. This creates stepped, laggy lines.

Alan Hull (2005) solved the fundamental lag-vs-smoothness tradeoff with an algebraic trick:

HMA(n) = WMA(sqrt(n),  2 * WMA(n/2) - WMA(n))

Why it works:

WMA(n) (slow) lags a linear trend by (n-1)/3 bars
WMA(n/2) (fast) lags by about half that, roughly n/6 bars
2 * fast - slow extrapolates ahead, cancelling most of the slow line's lag
Final WMA(sqrt(n)) smoothing adds back only about sqrt(n)/3 bars of lag

Result: ~50% lag reduction with smooth output.

I applied this to Ichimoku by replacing the midpoint calculation with Hull MA of (high + low) / 2. Same cloud structure, faster reaction, smoother boundaries.
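The Hull formula is short enough to sketch in plain Python. This is an illustration, not the Rust implementation: the rounding of n/2 (floor) and sqrt(n) (nearest integer) is an assumption, since conventions differ between implementations:

```python
import math


def wma(values, period):
    """Linearly weighted moving average; the newest bar gets weight `period`."""
    denom = period * (period + 1) / 2
    out = [math.nan] * len(values)
    for t in range(period - 1, len(values)):
        window = values[t - period + 1 : t + 1]
        out[t] = sum(w * x for w, x in zip(range(1, period + 1), window)) / denom
    return out


def hma(values, period):
    """HMA(n) = WMA(sqrt(n), 2 * WMA(n/2) - WMA(n))."""
    half = max(period // 2, 1)
    root = max(int(round(math.sqrt(period))), 1)
    fast = wma(values, half)
    slow = wma(values, period)
    start = period - 1  # first bar where the slow WMA is defined
    raw = [2 * f - s for f, s in zip(fast[start:], slow[start:])]
    return [math.nan] * start + wma(raw, root)


# On a straight ramp the lag of each line can be read off directly.
ramp = [float(t) for t in range(100)]
slow_line = wma(ramp, 16)
hull_line = hma(ramp, 16)
print(ramp[-1] - slow_line[-1])            # WMA(16) trails the ramp by 5 bars
print(round(ramp[-1] - hull_line[-1], 3))  # HMA(16) trails by 0.667 bars
```

For the Ichimoku variant, the input series would be the bar midpoints (high + low) / 2 instead of the ramp.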

The Rust implementation

Architecture

Python layer
    │
    ▼
advanced_ichimoku_cloud (Rust, PyO3)
    ├── hull.rs          → wma, hullma (+ inner functions)
    ├── hull_signals.rs  → trend, pullback, bounce detection
    ├── ichimoku.rs      → classic Ichimoku
    ├── ichimoku_hull.rs → Hull-enhanced Ichimoku
    └── indicators.rs    → ema, atr

Key design: inner functions

Every computation exists as a plain fn (no PyO3 overhead). The #[pyfunction] wrappers just handle NumPy conversion and delegate:

// Used by ichimoku_hull.rs without FFI cost
pub(crate) fn hullma_inner(data: &[f64], period: usize) -> Vec<f64> {
    // Pure computation — no Python types
}

#[pyfunction]
fn hullma(py: Python, prices: PyReadonlyArray1<f64>, period: usize) -> Py<PyArray1<f64>> {
    let slice = prices.as_slice().unwrap();
    let result = hullma_inner(slice, period);
    PyArray1::from_vec(py, result).into()
}

This enables cross-module reuse: ichimoku_hull.rs calls hull::hullma_inner() directly, with zero FFI overhead.

Zero-copy I/O

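Input: as_slice().unwrap() borrows the NumPy buffer directly, with no copying and no allocation. It requires a contiguous array; unwrap() panics on a non-contiguous one such as a strided view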
Output: PyArray1::from_vec allocates once in Rust, transfers ownership to Python

GIL release

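PyO3 does not release the GIL on its own: a #[pyfunction] holds it unless the computation is wrapped in py.allow_threads(...) (renamed Python::detach in PyO3 0.26). Wrapped that way, other Python threads (WebSocket handlers, order management) run freely while indicators compute.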

Proving parity: 25+ assertions at 1e-12 tolerance

The test suite implements every function in pure Python, generates identical random data (seed=42, N=200), and asserts:

# rtol=0 matters: the default rtol=1e-7 would otherwise loosen the check far beyond atol
np.testing.assert_allclose(rust_result, python_result, rtol=0, atol=1e-12)

All 11 functions. All edge cases (NaN propagation, initial positions, backfill behavior). If Rust disagrees with Python by more than 1e-12, the test fails.

============================================================
  Parity Tests: advanced-ichimoku-cloud
============================================================
  PASS  wma
  PASS  hullma
  PASS  hullma_trend
  PASS  hullma_pullback
  PASS  hullma_bounce
  PASS  ichimoku_line
  PASS  ichimoku_components
  PASS  ichimoku_line_hull
  PASS  ichimoku_components_hull
  PASS  ema
  PASS  atr
============================================================
  ALL 11 FUNCTIONS PASS PARITY TESTS
============================================================
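The shape of such a parity check, sketched without NumPy. The two EMA variants stand in for "reference" and "port": they are algebraically identical but round differently, which is the kind of drift an absolute tolerance of 1e-12 is meant to absorb. All names here are illustrative, and seeding the EMA with the first value is an assumption:

```python
import math
import random


def assert_parity(candidate, reference, atol=1e-12):
    """Fail unless both sequences agree element-wise within atol; NaNs must line up."""
    assert len(candidate) == len(reference), "length mismatch"
    for i, (c, r) in enumerate(zip(candidate, reference)):
        if math.isnan(c) or math.isnan(r):
            assert math.isnan(c) and math.isnan(r), f"NaN mismatch at index {i}"
        else:
            assert abs(c - r) <= atol, f"index {i}: {c!r} vs {r!r}"


def ema_reference(values, period):
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


def ema_candidate(values, period):
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for x in values[1:]:
        out.append(out[-1] + alpha * (x - out[-1]))  # same math, different rounding
    return out


rng = random.Random(42)
data = [rng.random() for _ in range(200)]
assert_parity(ema_candidate(data, 10), ema_reference(data, 10))
print("parity OK")
```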

Before and after

Dimension	Python + Numba	Rust + PyO3
First-call latency	2-5s JIT warmup	Zero
GIL	Held during execution	Released
Memory safety	Runtime bounds checks	Compile-time guarantees
Dependency weight	~150 MB (numba + llvmlite)	~2 MB single .so
Reproducibility	JIT varies across LLVM versions	Deterministic binary

Try it

pip install advanced-ichimoku-cloud

from advanced_ichimoku_cloud import (
    ichimoku_components,       # classic cloud
    ichimoku_components_hull,  # Hull-enhanced cloud
    hullma, wma, ema, atr,    # individual indicators
)

import numpy as np
high = np.random.rand(200) * 100 + 50
low = high - np.random.rand(200) * 5

tenkan, kijun, senkou_a, senkou_b = ichimoku_components(high, low, 9, 26, 52)

GitHub: https://github.com/RMANOV/advanced-ichimoku-cloud

What I learned

as_slice() from the numpy crate is the killer feature: zero-copy NumPy access makes Rust competitive even for small arrays
Inner function pattern is essential — without it, cross-module reuse requires double FFI
Bit-exact parity testing catches subtle issues (NaN propagation order, integer division rounding) that benchmarks miss
The history of your domain matters — understanding why Hosoda chose those parameters helped me design better enhanced variants

Built with Rust, PyO3 0.27, and a deep appreciation for a journalist who spent 30 years perfecting a chart.

Building a regime-switching particle filter in Rust — from Kalman 1960 to rayon-parallelized Monte Carlo

Ruslan Manov — Sat, 31 Jan 2026 21:41:38 +0000

A Hungarian mathematician's 1960 invention, a British statistician's 1993 extension, and a Rust rewrite that eliminates 30 seconds of JIT warmup. Here's the story of state estimation under regime switches.

60 years of hidden state estimation

1960 — Rudolf Kálmán publishes "A New Approach to Linear Filtering and Prediction Problems." A single paper that would guide Apollo spacecraft to the Moon, enable GPS, and become the backbone of every control system. But it has a limitation: it assumes linear dynamics and Gaussian noise.

1968 — Extended Kalman Filter (EKF) linearizes nonlinear systems via Taylor expansion. Works well enough for slightly nonlinear systems, fails catastrophically for highly nonlinear ones.

1993 — Gordon, Salmond, and Smith publish the bootstrap particle filter. Instead of assuming a distribution shape, they represent the belief as a cloud of weighted samples (particles). Each particle is a hypothesis about the hidden state. Propagate, weight, resample. Repeat. No linearity assumptions. No Gaussian requirements.

Present — Regime-switching extensions add a second layer: each particle carries both a continuous state (position, velocity) AND a discrete mode (regime). The system can switch between fundamentally different behaviors — trending, mean-reverting, or chaotic — and the filter tracks which regime is active.

The problem: 19 functions × 2-5s JIT each = pain

My trading system uses a particle filter to track price regimes in real time. Three modes:

RANGE: Mean-reverting. Price oscillates around equilibrium. velocity' = 0.5 × velocity + noise
TREND: Directional. Price follows order flow imbalance. velocity' = velocity + 0.3 × (gain × imbalance - velocity) × dt + noise
PANIC: High volatility. Random walk with large noise.

The Python+Numba implementation had 19 JIT-compiled functions. Every process restart: 30+ seconds of JIT compilation before the first estimate. In live trading, 30 blind seconds means missed regime transitions, delayed signals, frozen risk management.

And by default Numba holds the GIL. While 500 particles propagate through nonlinear dynamics, the entire Python runtime blocks.

19 functions in safe Rust

I rewrote everything in Rust with PyO3 bindings. Five categories:

Core particle filter

predict_particles    → Rayon-parallelized regime-specific propagation
update_weights       → Bayesian weight update via Gaussian likelihood
transition_regimes   → Markov chain mode switching (3×3 matrix)
systematic_resample  → O(N) two-pointer resampling
effective_sample_size → Degeneracy diagnostic (ESS = 1/Σwᵢ²)
estimate             → Weighted mean + regime probabilities

Kalman smoothing

kalman_update              → 2D level/slope tracker
slope_confidence_interval  → 95% CI for slope
is_slope_significant       → Directional significance test
kalman_slope_acceleration  → Second derivative for early trend entry

Signal processing

calculate_vwap_bands    → Volume-weighted price with σ-bands
calculate_momentum_score → Normalized momentum in [-1, +1]
rolling_kurtosis        → Fat-tail detection (excess kurtosis)

Statistical tests

hurst_exponent          → R/S analysis: trending vs mean-reverting
cusum_test              → Page's CUSUM: structural break detection
volatility_compression  → Range squeeze (short/long vol ratio)

Extended

particle_price_variance    → Weighted variance of particle cloud
ess_and_uncertainty_margin → Combined ESS + regime dominance
adaptive_vwap_sigma        → Kurtosis-adapted VWAP band width
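Two of the core functions are compact enough to sketch in plain Python. These are illustrations of the techniques named in the list, not the library's code; the signatures are assumptions, and the single uniform draw u0 is passed in by the caller:

```python
def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights: N when uniform, 1 when degenerate."""
    return 1.0 / (sum(w * w for w in weights) + 1e-12)


def systematic_resample(weights, u0):
    """Return N ancestor indices from one uniform draw u0 in [0, 1).

    The k-th target is (u0 + k) / N. Both the target and the cumulative-weight
    pointer only move forward, so the whole pass is O(N).
    """
    n = len(weights)
    indices = []
    i = 0
    cumulative = weights[0]
    for k in range(n):
        target = (u0 + k) / n
        while cumulative < target and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
print(systematic_resample(uniform, 0.5))  # [0, 1, 2, 3]
print(systematic_resample(skewed, 0.5))   # [0, 0, 0, 2]
print(round(effective_sample_size(uniform), 3))  # 4.0
```

With uniform weights every particle survives exactly once; with skewed weights the heavy particle is duplicated and light ones drop out.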

The math that matters

Particle propagation (the regime-specific part)

Each particle i carries state [xᵢ, vᵢ] (log-price, velocity) and regime rᵢ ∈ {0, 1, 2}.

RANGE (r=0):

xᵢ' = xᵢ + vᵢ·dt + σ_pos[0]·√dt·εₓ
vᵢ' = 0.5·vᵢ + σ_vel[0]·√dt·εᵥ

The 0.5·vᵢ term pulls velocity toward zero — mean reversion.

TREND (r=1):

xᵢ' = xᵢ + vᵢ·dt + σ_pos[1]·√dt·εₓ
vᵢ' = vᵢ + 0.3·(G·imbalance - vᵢ)·dt + σ_vel[1]·√dt·εᵥ

Velocity tracks G × imbalance — the order flow signal.

PANIC (r=2):

xᵢ' = xᵢ + vᵢ·dt + σ_pos[2]·√dt·εₓ
vᵢ' = vᵢ + σ_vel[2]·√dt·εᵥ

Pure random walk with high noise.
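The three propagation rules fit in one small function. This is a single-particle sketch of the equations above, not the Rayon-parallelized implementation; the function name and argument order are assumptions, and the noise draws eps_x and eps_v come from the caller:

```python
import math

DAMPING = 0.5   # RANGE: velocity decay factor from the equations above
TRACKING = 0.3  # TREND: pull rate toward gain * imbalance


def predict_particle(x, v, regime, sigma_pos, sigma_vel,
                     imbalance, dt, gain, eps_x, eps_v):
    """Advance one particle (log-price x, velocity v) by one step."""
    sqrt_dt = math.sqrt(max(dt, 1e-8))
    x_new = x + v * dt + sigma_pos[regime] * sqrt_dt * eps_x
    noise_v = sigma_vel[regime] * sqrt_dt * eps_v
    if regime == 0:    # RANGE: velocity decays toward zero
        v_new = DAMPING * v + noise_v
    elif regime == 1:  # TREND: velocity relaxes toward gain * imbalance
        v_new = v + TRACKING * (gain * imbalance - v) * dt + noise_v
    else:              # PANIC: velocity is a random walk
        v_new = v + noise_v
    return x_new, v_new


sigma_pos = [0.001, 0.002, 0.005]
sigma_vel = [0.01, 0.02, 0.05]
x0 = math.log(100.0)

# With the noise switched off, the deterministic part of each regime is visible.
for regime, name in enumerate(["RANGE", "TREND", "PANIC"]):
    x1, v1 = predict_particle(x0, 0.02, regime, sigma_pos, sigma_vel,
                              imbalance=0.3, dt=1.0, gain=0.1, eps_x=0.0, eps_v=0.0)
    print(name, round(x1 - x0, 4), round(v1, 4))
```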

Bayesian weight update

After observing the actual price, take its logarithm z (the particles live in log-price space):

wᵢ' = wᵢ × exp(-0.5·(z - xᵢ)²/σ²_price[rᵢ]) × exp(-0.5·(vᵢ - G·imb)²/σ²_vel[rᵢ])

Particles close to the observation get high weight. Particles far away get low weight. Normalize. Resample when weights degenerate (ESS < N/2).
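A plain-Python sketch of this update, followed by the ESS check that decides whether to resample. The function signature is an assumption for illustration; the 1e-300 and 1e-12 guards mirror the ones listed under numerical stability:

```python
import math


def update_weights(weights, particles, regimes, z,
                   sigma_price, sigma_vel, gain, imbalance):
    """Multiply each weight by its two Gaussian likelihood terms, then normalize."""
    updated = []
    for w, (x, v), r in zip(weights, particles, regimes):
        price_term = math.exp(-0.5 * (z - x) ** 2 / sigma_price[r] ** 2)
        vel_term = math.exp(-0.5 * (v - gain * imbalance) ** 2 / sigma_vel[r] ** 2)
        updated.append(w * price_term * vel_term)
    total = sum(updated) + 1e-300  # guard against an all-zero weight vector
    return [w / total for w in updated]


z = math.log(100.0)  # observed log-price
particles = [(z, 0.03), (z + 0.03, 0.03), (z + 0.05, 0.03), (z - 0.05, 0.03)]
regimes = [0, 0, 0, 0]
weights = [0.25] * 4

weights = update_weights(weights, particles, regimes, z,
                         sigma_price=[0.01, 0.02, 0.05],
                         sigma_vel=[0.01, 0.02, 0.05],
                         gain=0.1, imbalance=0.3)
ess = 1.0 / (sum(w * w for w in weights) + 1e-12)
needs_resample = ess < len(weights) / 2
print([round(w, 4) for w in weights], round(ess, 2), needs_resample)
```

Here one particle sits exactly on the observation and absorbs almost all of the weight, so ESS collapses toward 1 and the resampling condition fires.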

Why log-price space?

All operations use log(price). This makes the filter scale-invariant — the same parameters and noise levels work identically for a $0.50 penny stock and a $50,000 Bitcoin. Multiplicative price noise becomes additive in log space.

Engineering decisions

Rayon only where it matters

predict_particles is the only parallelized function. It's O(N) with substantial per-particle computation (regime branching, noise injection). The other 18 functions are either memory-bound (weight updates, resampling) or operate on small arrays (Kalman 2×2 matrices, signal windows).

Adding rayon to memory-bound functions would increase latency from thread pool overhead.

No internal randomness

The library takes pre-generated random arrays as input. This gives the caller:

Full reproducibility (same seed = same output)
Choice of RNG (numpy default, PCG64, whatever)
Ability to inject structured noise for testing

Numerical stability

Weight normalization: a +1e-300 guard in the denominator prevents division by zero when every weight underflows
ESS denominator: +1e-12 prevents division by zero
Kalman covariance: symmetrized after every predict and update step
dt guard: max(dt, 1e-8).sqrt() prevents noise explosion at tiny timesteps

Hurst exponent — my favorite function

The Hurst exponent tells you if a price series is:

H > 0.5: Trending (persistent — ups followed by ups)
H = 0.5: Random walk (no memory)
H < 0.5: Mean-reverting (anti-persistent — ups followed by downs)

Computed via R/S (Rescaled Range) analysis:

For each window size n in [min_window, max_window]:
- Split series into blocks of size n
- For each block: compute range of cumulative deviations from mean, divide by standard deviation
- Average the R/S values
Fit log(R/S) = H × log(n) + c via least squares
H is the slope
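The steps above, sketched in plain Python. This is one reading of R/S analysis rather than the library's code: it assumes the input is a series of increments (returns, not prices), uses the population standard deviation, skips flat blocks, and takes the window sizes from the caller:

```python
import math
import random


def rescaled_range(block):
    """Range of cumulative deviations from the mean, divided by the std deviation."""
    n = len(block)
    mean = sum(block) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in block) / n)
    if std == 0.0:
        return None  # a flat block carries no information
    running, lowest, highest = 0.0, 0.0, 0.0
    for x in block:
        running += x - mean
        lowest, highest = min(lowest, running), max(highest, running)
    return (highest - lowest) / std


def hurst_exponent(increments, window_sizes):
    """Slope of log(mean R/S) against log(window size), by least squares."""
    xs, ys = [], []
    for n in window_sizes:
        values = []
        for start in range(0, len(increments) - n + 1, n):
            rs = rescaled_range(increments[start : start + n])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
    if len(xs) < 2:
        raise ValueError("need at least two usable window sizes")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


windows = [8, 16, 32, 64, 128]
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(1024)]  # extreme anti-persistence
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]

h_alternating = hurst_exponent(alternating, windows)
h_noise = hurst_exponent(noise, windows)
print(h_alternating, h_noise)
```

The alternating series is the limiting mean-reverting case and comes out at H = 0. Plain R/S is known to read somewhat above 0.5 on short white-noise samples, so thresholds need calibration before they drive a strategy switch.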

This single number tells you whether to use trend-following or mean-reversion strategies. Combined with the particle filter's regime probabilities, you get a multi-scale view of market behavior.

Before and after

Dimension	Python + Numba	Rust + PyO3
Cold start (19 functions)	30-90s JIT warmup	Zero
GIL	Held during all compute	Released
Parallelism	prange (limited)	Rayon work-stealing
Memory safety	Runtime bounds checks	Compile-time guarantees
Dependency weight	~150 MB	~2 MB single .so
Reproducibility	JIT varies by LLVM version	Deterministic binary

Try it

pip install particle-filter-rs

import numpy as np
from particle_filter_rs import (
    predict_particles, update_weights, systematic_resample,
    effective_sample_size, estimate, transition_regimes,
    kalman_update, hurst_exponent, cusum_test,
)

# Initialize 500 particles in log-price space
N = 500
particles = np.column_stack([
    np.full(N, np.log(100.0)),  # log-price
    np.zeros(N),                 # velocity
])
regimes = np.zeros(N, dtype=np.int64)
weights = np.full(N, 1.0 / N)

# One filter step
rng = np.random.default_rng(42)
particles = predict_particles(
    particles, regimes,
    np.array([0.001, 0.002, 0.005]),  # process noise per regime
    np.array([0.01, 0.02, 0.05]),
    imbalance=0.3, dt=1.0, vel_gain=0.1,
    random_pos=rng.standard_normal(N),
    random_vel=rng.standard_normal(N),
)

GitHub: https://github.com/RMANOV/particle-filter-rs

What I learned

Rayon's overhead is real — for memory-bound functions, single-threaded Rust beats rayon-parallelized Rust
Log-space is non-negotiable — a particle filter in linear price space needs different noise parameters for every asset. Log-space is universal.
Deterministic filters are testable filters — external RNG makes every test reproducible, every bug reproducible
The Kalman complement matters — particle filters alone give noisy estimates. A parallel Kalman provides smooth baseline + confidence intervals. Best of both worlds.
60 years of math still works — Kálmán's 1960 insight (recursive Bayesian estimation) is alive in every function in this library

Built with Rust, PyO3 0.27, rayon 1.10, and respect for 60 years of state estimation research.
