Building a zero-infra job queue with SQLite (and stress-testing it to 14 jobs/sec)

#python #sqlite #architecture #opensource

I needed to trigger automation scripts on an old Android phone from a cloud VPS, without opening ports or spinning up Redis. So I built a job queue out of Flask and SQLite.

Here is a deep dive into what it is, how SQLite handles the concurrency, and what happened when I tried to stress-test it in production.

What it is (and what it isn't)

Intent Bus is designed to be the missing link between simple cron jobs and heavy enterprise message brokers.

What it aims for:

Giving indie hackers and home-lab enthusiasts actual execution guarantees (retries, dead-letter queues) without needing to deploy or maintain broker infrastructure.
Running anywhere. You can deploy the server on a free cloud tier, and run workers on a Raspberry Pi, a Termux phone, or a cheap VPS.

What it is not:

It is not a Kafka replacement. It is not meant for microservices processing 10,000 transactions per second. If you have enterprise scale, use enterprise tools.

The Core Features

Cron is strictly fire-and-forget, with no coordination and silent failures. To fix that, Intent Bus provides:

Feature	Description
Atomic Locking	Prevents double-claiming. A cryptographic lease token is issued to the claiming worker.
Reliable Delivery	If a worker crashes or drops offline, the lease expires and the job requeues with exponential backoff.
Dead-Letter Queue	Failed jobs (e.g., after 3 attempts) are archived for inspection rather than deleted.
Capability Routing	Workers advertise what they can do; jobs require what they need (e.g., require a worker with `ffmpeg`).
Priority Queues	High-priority intents are always claimed and processed first.

The SQLite Atomic Lock

The biggest challenge with using SQLite for a queue is concurrent writes. If 40 workers poll the server at the exact same millisecond, they can't all claim the same job.

Intent Bus solves this by strictly enforcing SQLite's WAL (Write-Ahead Logging) mode and relying on the UPDATE ... RETURNING clause introduced in SQLite 3.35.0.

When a worker asks for a job, the server executes a single atomic transaction:

UPDATE intents
SET
    status = 'claimed',
    claimed_at = :now,
    claim_expires_at = :lease_exp,
    claim_token = :token,
    claim_attempts = claim_attempts + 1
WHERE id = (
    SELECT id FROM intents 
    WHERE status = 'open' AND run_at <= :now
    ORDER BY priority DESC, created_at ASC
    LIMIT 1
)
RETURNING id, payload, claim_token;

Because SQLite serializes writes, this guarantees that only one worker will ever receive the ephemeral claim_token required to fulfill or fail that specific job.

The Stress Test: A Tale of Two Deployments

I wanted to see exactly where SQLite would choke under concurrent worker load, so I wrote an enterprise-style stress harness.

Attempt 1: The PythonAnywhere Disaster
I originally deployed the architecture to PythonAnywhere's free tier. The results were catastrophic.

When I launched the high-intensity concurrent test, it didn't even manage to start. I stepped it down to the medium-intensity test (15 concurrent workers fighting for 500 jobs), and it threw 100% network errors.

Why? Because PythonAnywhere's free tier routes requests through a single-threaded Gunicorn worker. The concurrent polling requests bottlenecked at the WSGI layer, queued up, and timed out before SQLite even knew what hit it. It wasn't a database lock issue; it was a server concurrency issue.

Attempt 2: Docker on Render
I moved the exact same SQLite file and Flask app to a basic Docker container on Render, configured with multiple application threads (--threads 4).

Render's free tier aggressively puts your container to sleep when idle, meaning the very first request takes about 30 seconds to wake the server up (the dreaded cold start). But once it's awake? It handles the load beautifully.

Here is the benchmark data from the Render Docker deployment:

Configuration	Workers	Jobs	Success	P99 Latency	Throughput
Light	5	50	100%	0.594s	3.72 j/s
Medium	15	500	98.75%	0.517s	13.27 j/s
Heavy	40	2000	99.01%	2.586s	13.62 j/s

Even with 40 concurrent workers furiously hammering the single SQLite file for write locks, there were 0 network errors, 0 lost leases, and 0 rate limit crashes. It comfortably sustained over 13 jobs per second.

Limits & The Theoretical Future

Right now, the system is tuned for safety. Payloads are strictly limited to 8KB to prevent database bloat, and busy_timeout PRAGMAs are set to allow workers to wait gracefully for locks.

Could it go further? In theory, yes. If you moved off a free cloud container onto a dedicated VPS with NVMe storage, and tuned SQLite for pure speed (e.g., PRAGMA synchronous=OFF), you could probably push 50-100 jobs a second. Furthermore, the architecture is simple enough that you could easily swap the get_db() call to point to PostgreSQL if you needed horizontal scaling.

But honestly, if you hit that scale, you should just bite the bullet and deploy Redis. The entire philosophy of Intent Bus is avoiding that infrastructure jump until it is absolutely necessary.

The server runs on Docker, Raspberry Pi, or any free cloud tier. Workers run anywhere that speaks HTTP — including Termux.