DEV Community

Cover image for Building a zero-infra job queue with SQLite (and stress-testing it to 14 jobs/sec)
D_SECURITY
D_SECURITY

Posted on

Building a zero-infra job queue with SQLite (and stress-testing it to 14 jobs/sec)

I needed to trigger automation scripts on an old Android phone from a cloud VPS, without opening ports or spinning up Redis. So I built a job queue out of Flask and SQLite.

Here is a deep dive into what it is, how SQLite handles the concurrency, and what happened when I tried to stress-test it in production.

What it is (and what it isn't)

Intent Bus is designed to be the missing link between simple cron jobs and heavy enterprise message brokers.

What it aims for:

  • Giving indie hackers and home-lab enthusiasts actual execution guarantees (retries, dead-letter queues) without needing to deploy or maintain broker infrastructure.
  • Running anywhere. You can deploy the server on a free cloud tier, and run workers on a Raspberry Pi, a Termux phone, or a cheap VPS.

What it is not:

  • It is not a Kafka replacement. It is not meant for microservices processing 10,000 transactions per second. If you have enterprise scale, use enterprise tools.

The Core Features

Cron is strictly fire-and-forget, with no coordination and silent failures. To fix that, Intent Bus provides:

Feature Description
Atomic Locking Prevents double-claiming. A cryptographic lease token is issued to the claiming worker.
Reliable Delivery If a worker crashes or drops offline, the lease expires and the job requeues with exponential backoff.
Dead-Letter Queue Failed jobs (e.g., after 3 attempts) are archived for inspection rather than deleted.
Capability Routing Workers advertise what they can do; jobs require what they need (e.g., require a worker with ffmpeg).
Priority Queues High-priority intents are always claimed and processed first.

The SQLite Atomic Lock

The biggest challenge with using SQLite for a queue is concurrent writes. If 40 workers poll the server at the exact same millisecond, they can't all claim the same job.

Intent Bus solves this by strictly enforcing SQLite's WAL (Write-Ahead Logging) mode and relying on the UPDATE ... RETURNING clause introduced in SQLite 3.35.0.

When a worker asks for a job, the server executes a single atomic transaction:

UPDATE intents
SET
    status = 'claimed',
    claimed_at = :now,
    claim_expires_at = :lease_exp,
    claim_token = :token,
    claim_attempts = claim_attempts + 1
WHERE id = (
    SELECT id FROM intents 
    WHERE status = 'open' AND run_at <= :now
    ORDER BY priority DESC, created_at ASC
    LIMIT 1
)
RETURNING id, payload, claim_token;
Enter fullscreen mode Exit fullscreen mode

Because SQLite serializes writes, this guarantees that only one worker will ever receive the ephemeral claim_token required to fulfill or fail that specific job.

The Stress Test: A Tale of Two Deployments

I wanted to see exactly where SQLite would choke under concurrent worker load, so I wrote an enterprise-style stress harness.

Attempt 1: The PythonAnywhere Disaster
I originally deployed the architecture to PythonAnywhere's free tier. The results were catastrophic.

When I launched the high-intensity concurrent test, it didn't even manage to start. I stepped it down to the medium-intensity test (15 concurrent workers fighting for 500 jobs), and it threw 100% network errors.

Why? Because PythonAnywhere's free tier routes requests through a single-threaded Gunicorn worker. The concurrent polling requests bottlenecked at the WSGI layer, queued up, and timed out before SQLite even knew what hit it. It wasn't a database lock issue; it was a server concurrency issue.

Attempt 2: Docker on Render
I moved the exact same SQLite file and Flask app to a basic Docker container on Render, configured with multiple application threads (--threads 4).

Render's free tier aggressively puts your container to sleep when idle, meaning the very first request takes about 30 seconds to wake the server up (the dreaded cold start). But once it's awake? It handles the load beautifully.

Here is the benchmark data from the Render Docker deployment:

Configuration Workers Jobs Success P99 Latency Throughput
Light 5 50 100% 0.594s 3.72 j/s
Medium 15 500 98.75% 0.517s 13.27 j/s
Heavy 40 2000 99.01% 2.586s 13.62 j/s

Even with 40 concurrent workers furiously hammering the single SQLite file for write locks, there were 0 network errors, 0 lost leases, and 0 rate limit crashes. It comfortably sustained over 13 jobs per second.

Limits & The Theoretical Future

Right now, the system is tuned for safety. Payloads are strictly limited to 8KB to prevent database bloat, and busy_timeout PRAGMAs are set to allow workers to wait gracefully for locks.

Could it go further? In theory, yes. If you moved off a free cloud container onto a dedicated VPS with NVMe storage, and tuned SQLite for pure speed (e.g., PRAGMA synchronous=OFF), you could probably push 50-100 jobs a second. Furthermore, the architecture is simple enough that you could easily swap the get_db() call to point to PostgreSQL if you needed horizontal scaling.

But honestly, if you hit that scale, you should just bite the bullet and deploy Redis. The entire philosophy of Intent Bus is avoiding that infrastructure jump until it is absolutely necessary.

The server runs on Docker, Raspberry Pi, or any free cloud tier. Workers run anywhere that speaks HTTP — including Termux.

I would love to hear your feedback. Has anyone else experimented with using SQLite for job queueing? What edge cases did you run into?

Top comments (0)