DEV Community: LIKKI SAMARTH REDDY

I Thought SQLite Was Fast—Until 50 AI Agents Started Writing at Once

LIKKI SAMARTH REDDY — Thu, 02 Jul 2026 16:08:26 +0000

A real-world engineering experiment on checkpointing, persistence, and why your storage backend matters more than your AI model.

Everyone talks about LLMs.

Bigger models.
Better prompts.
Smarter agents.

But almost nobody talks about what happens after an AI agent has been running for hours.

Where does its state live?
How do you resume after a crash?
How do hundreds of agents save progress simultaneously?
What happens when persistence becomes the bottleneck?

Those questions led me to build Living AI—an experimental checkpointing engine for long-running AI agents.

Its purpose isn't to replace agent frameworks.

It's to solve a simpler problem:

Persist agent state quickly without letting storage stall execution.

The Experiment

I built Living AI around a few simple ideas:

Pluggable storage backends
In-memory hot cache
Compression
Recovery API
Execution-budgeted persistence
Built-in performance metrics

The architecture looks like this:

Agent
   │
   ▼
Checkpoint Engine
   │
   ├── Compress State
   ├── Update Hot Cache
   └── Persist to Storage
          │
          ├── SQLite
          └── Redis-compatible Store

Instead of benchmarking models, I benchmarked the infrastructure beneath them.
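To make the "pluggable storage backends" idea concrete, here is a minimal sketch of what such a seam could look like. The `CheckpointStore` protocol, the `MemoryStore` and `DemoSQLiteStore` classes, and the `checkpoint` helper are all hypothetical names invented for illustration; they are not Living AI's actual API.

```python
import sqlite3
from typing import Optional, Protocol


class CheckpointStore(Protocol):
    """Hypothetical backend interface; the real project may define it differently."""

    def put(self, key: str, blob: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...


class MemoryStore:
    """Dictionary-backed store, useful as a baseline in benchmarks."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self._data[key] = blob

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class DemoSQLiteStore:
    """SQLite-backed store with the same two methods."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints "
            "(key TEXT PRIMARY KEY, blob BLOB NOT NULL)"
        )

    def put(self, key: str, blob: bytes) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO checkpoints (key, blob) VALUES (?, ?)",
                (key, blob),
            )

    def get(self, key: str) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT blob FROM checkpoints WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def checkpoint(store: CheckpointStore, agent_id: str, state: bytes) -> None:
    # The caller depends only on the protocol, so backends can be swapped freely.
    store.put(agent_id, state)


for backend in (MemoryStore(), DemoSQLiteStore()):
    checkpoint(backend, "agent-1", b"serialized state")
    print(type(backend).__name__, backend.get("agent-1"))
```

Because `checkpoint` only sees the protocol, a benchmark can run the same workload against each backend without touching the calling code.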

Benchmark #1 — Single Agent

First I tested a single long-running workflow.

Workload

500 checkpoints
250 KB state
Forced cache eviction

Results

✅ Average save: 13.45 ms

✅ Hot-cache recovery: 1.11 ms

✅ Cold recovery (SQLite + decompression): 1.48 ms

At this point everything looked healthy.

Benchmark #2 — Then I Added 50 Concurrent Agents

This is where things became interesting.

Configuration:

50 concurrent agents
1000 checkpoint attempts
SQLite backend
50 ms persistence budget

Results:

Metric	SQLite
Successful writes	5
Timed-out writes	995
Average write latency	282 ms
Maximum latency	735 ms

At first glance, this looked terrible.

Then I realized something important.

The checkpoint engine wasn't slow.

SQLite had become the bottleneck.

Because SQLite allows only one writer at a time, concurrent checkpoint requests began waiting on database locks.

The engine's timeout policy skipped slow persistence attempts rather than blocking agent execution.

That trade-off kept the agents responsive under load.
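The single-writer behaviour is easy to reproduce with nothing but the standard library. The sketch below opens two connections to the same database file, lets the first one take the write lock, and shows that the second is refused. The file name, table layout, and function name are assumptions chosen for the demo, and `timeout=0` is used so the second writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked() -> bool:
    """Return True if a second connection cannot start a write transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "agents.db")
        # isolation_level=None means autocommit, so BEGIN is issued explicitly.
        # timeout=0 disables the busy wait, so lock conflicts surface at once.
        writer_a = sqlite3.connect(path, timeout=0, isolation_level=None)
        writer_b = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            writer_a.execute("CREATE TABLE checkpoints (agent TEXT, blob BLOB)")
            writer_a.execute("BEGIN IMMEDIATE")  # first agent takes the write lock
            writer_a.execute("INSERT INTO checkpoints VALUES ('a', x'00')")
            try:
                writer_b.execute("BEGIN IMMEDIATE")  # second agent wants to write too
            except sqlite3.OperationalError as exc:
                return "locked" in str(exc)
            return False
        finally:
            writer_a.close()
            writer_b.close()


print("second writer blocked:", second_writer_is_blocked())
```

With a non-zero `timeout`, the second writer would wait for the lock instead of failing, which is the queueing that a persistence budget then cuts short.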

The Real Test

Without changing the checkpoint engine...

Without changing compression...

Without changing agent logic...

I swapped only the storage backend.

I replaced SQLite with a Redis-compatible store.

Exactly the same workload.

Exactly the same checkpoint engine.

Here were the results.

Metric	SQLite	Redis-compatible
SLA compliance (writes within 50 ms)	0.5%	100%
Average Write	282 ms	0.64 ms
p99 Write	735 ms	1.23 ms

That single experiment completely changed where I thought the optimization effort should go.

The bottleneck wasn't checkpointing.

It wasn't serialization.

It wasn't caching.

It was storage contention.
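For readers who want to compute the same kind of figures from their own runs, here is a small sketch of how SLA compliance and a p99 can be derived from raw write latencies. The function names are hypothetical, the percentile uses the nearest-rank method (the benchmark suite may use a different definition), and the sample data is synthetic, shaped only to mirror the 5-in-1000 ratio reported above.

```python
def sla_compliance(latencies_ms, budget_ms):
    """Percentage of writes that finished within the budget."""
    within = sum(1 for ms in latencies_ms if ms <= budget_ms)
    return 100 * within / len(latencies_ms)


def percentile(latencies_ms, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(latencies_ms)
    # ceil(pct / 100 * n) computed in integer arithmetic to avoid float rounding
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Synthetic sample: 5 fast writes and 995 slow ones, not measured data.
sample = [12.0] * 5 + [300.0] * 995
print("compliance %:", sla_compliance(sample, budget_ms=50))
print("p99 ms:", percentile(sample, 99))
```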

One More Surprise

After removing storage contention, another bottleneck appeared.

Compression.

Large checkpoints (~793 KB) spent far more time compressing data than writing it.

In other words:

Once storage became fast enough, CPU work became the limiting factor.

That's exactly the kind of bottleneck you want to discover through benchmarking.
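A quick way to see this effect on your own hardware is to time compression alone at a few payload sizes. The sketch below is an assumption-laden stand-in: it uses `zlib` at level 6 (the codec the project describes for its compression layer), and the payload is half random bytes and half repetitive bytes, which is only a rough proxy for real agent state. The function name is hypothetical.

```python
import os
import time
import zlib


def time_compression(size_bytes, level=6):
    """Compress a synthetic payload and return (elapsed_ms, compressed_size)."""
    half = size_bytes // 2
    # Half incompressible, half highly compressible: a loose stand-in for state.
    payload = os.urandom(half) + b"x" * (size_bytes - half)
    start = time.perf_counter()
    blob = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert zlib.decompress(blob) == payload  # sanity check: lossless round trip
    return elapsed_ms, len(blob)


for kib in (50, 250, 793):
    elapsed, size = time_compression(kib * 1024)
    print(f"{kib} KiB -> {size} bytes in {elapsed:.2f} ms")
```

The absolute numbers depend heavily on the machine and on how compressible the state is, so treat them as relative indicators rather than a reproduction of the results above.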

What Living AI Is

Living AI is an experiment in building infrastructure for long-running AI systems.

Current components include:

Pluggable persistence layer
Compression abstraction
Hot memory cache
Recovery API
Performance metrics
Benchmark suite

Rather than optimizing prompts, the focus is on making agent execution more resilient and observable.

Lessons Learned

This project reinforced three engineering lessons:

1. Architecture matters more than micro-optimizations.

A clean storage abstraction made it possible to compare backends without rewriting checkpoint logic.

2. Benchmarks often reveal a different bottleneck than you expect.

I started by trying to optimize checkpointing.

I ended up learning much more about storage systems.

3. Infrastructure deserves as much attention as models.

As AI agents become longer-lived and more autonomous, persistence, recovery, and state management become increasingly important parts of the stack.

What's Next

I'm currently exploring:

Additional storage backends
Faster compression algorithms
Async persistence queues
Framework integrations
Larger-scale reproducible benchmarks

If you're building AI agents, workflow engines, or distributed systems, I'd love to hear how you're approaching checkpointing and recovery.

The code is still evolving, and feedback from other engineers would be incredibly valuable.

Check the code: LivingAI

Why AI Agents Need a 50ms SLA Checkpoint Engine (and How We Built One)

LIKKI SAMARTH REDDY — Thu, 02 Jul 2026 16:01:00 +0000

Building AI agents that survive production is a different problem than building AI agents that work in development.

In development, your agent runs once, on your machine, with no concurrent users and a database that responds in milliseconds. In production, you have fifty agents running simultaneously, conversation histories that grow to hundreds of kilobytes, and a database that occasionally locks, times out, or becomes briefly unavailable.

Most agent frameworks were not designed for this reality. And the gap shows up in one specific place: checkpointing.

The silent killer in production agent architectures

Checkpointing is how an agent saves its state between steps. Every major framework does it. LangGraph has SqliteSaver and PostgresSaver. CrewAI has its own persistence layer. The OpenAI Agents SDK has thread state management.

What almost none of them account for is what happens when the database is slow.

The standard implementation looks roughly like this:

async def save_checkpoint(state):
    await database.write(state)  # blocks until complete
    continue_execution()

Under normal conditions, this is fine. Under concurrent load with SQLite, this is catastrophic. SQLite uses file-level write locking. When fifty agents try to write simultaneously, they queue behind each other. Write latencies spike from under a millisecond to over seven hundred milliseconds. Your agent, which was supposed to respond in two seconds, is now waiting three quarters of a second just to save its state at each step.

We ran this exact scenario. Fifty concurrent agents, one thousand total writes, payloads growing from five to one hundred kilobytes as conversation histories accumulated. With SQLite as the backing store, average write latency was 282ms. The p99 was 735ms. SLA compliance, defined as completing the write within 50ms, was 0.5%.

That is not a configuration problem. That is a fundamental architectural mismatch between SQLite's single-writer model and concurrent agent workloads.

The architecture we built

Living AI is our open-source solution to this problem. The core insight is that checkpointing should never block the agent execution thread, regardless of what the database is doing.

The architecture has three components.

The first is a hot RAM cache. When an agent saves state, it writes synchronously to an in-process LRU cache with a configurable TTL. This write is always sub-millisecond because it never touches disk or network. Reads check this cache first. In a running agent, the most recent state is almost always in the cache, which means the common read path resolves in microseconds.

The second is a budgeted durable write. After updating the RAM cache, the engine attempts to write to the backing database. This write runs inside asyncio.wait_for with a hard timeout, fifty milliseconds by default. If the database cannot complete the write within budget, the engine drops the write, logs it as a missed checkpoint, and continues. The agent is never held up for longer than the budget.

The third is a self-describing compression layer. Every state blob is compressed with zlib at level six and prepended with a one-byte codec header. The header value 0x00 means uncompressed, 0x01 means zlib. This detail matters more than it sounds: it means you can change the compression algorithm to zstd in the future without breaking any existing checkpoints. Old blobs read their own header and decompress correctly regardless of the current default.
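The header scheme is simple enough to sketch in a few lines. The two header values and the zlib level come from the description above; the function and constant names are hypothetical, and this is an illustration of the idea rather than the library's own code.

```python
import zlib

CODEC_RAW = 0x00   # header byte: body is stored uncompressed
CODEC_ZLIB = 0x01  # header byte: body is zlib-compressed


def encode(state: bytes, compress: bool = True, level: int = 6) -> bytes:
    """Prefix the (optionally compressed) state with a one-byte codec header."""
    if compress:
        return bytes([CODEC_ZLIB]) + zlib.compress(state, level)
    return bytes([CODEC_RAW]) + state


def decode(blob: bytes) -> bytes:
    """Read the header and pick the matching decoder, whatever the current default is."""
    if not blob:
        raise ValueError("empty blob has no codec header")
    codec, body = blob[0], blob[1:]
    if codec == CODEC_RAW:
        return body
    if codec == CODEC_ZLIB:
        return zlib.decompress(body)
    raise ValueError(f"unknown codec header: {codec:#04x}")


state = b'{"step": 6, "history": []}' * 100
assert decode(encode(state)) == state
assert decode(encode(state, compress=False)) == state
```

Adding a new codec then means reserving a new header byte and adding one branch to `decode`; blobs written earlier keep decoding through their existing branch.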

The ordering of the first two components is the critical design decision. The RAM cache is updated before the database write is attempted. This means even if every database write times out, the agent still has access to its current state through the cache, and crash recovery still works. We stress-tested this directly: in our hyperscale test with 150 concurrent agents and 0.77MB state payloads, 99.3% of database writes timed out, and recovery success rate was 100% across the full run of 1500 checkpoints.
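The cache-first ordering and the write budget can be sketched together as follows. `BudgetedCheckpointer` and its `write` callback are hypothetical names; the cache here is a plain LRU without the TTL mentioned above, and the sketch makes no claim to match the engine's internals.

```python
import asyncio
from collections import OrderedDict


class BudgetedCheckpointer:
    """Update an in-process LRU cache first, then attempt a time-boxed durable write."""

    def __init__(self, write, budget_s=0.050, max_entries=1024):
        self._write = write            # hypothetical async callable: (key, blob) -> None
        self._budget_s = budget_s
        self._max_entries = max_entries
        self._cache = OrderedDict()
        self.missed = 0                # count of durable writes dropped for exceeding budget

    async def save(self, key, blob):
        # Step 1: the cache is updated before any I/O is attempted.
        self._cache[key] = blob
        self._cache.move_to_end(key)
        while len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict least recently used entry
        # Step 2: the durable write gets a hard time budget.
        try:
            await asyncio.wait_for(self._write(key, blob), timeout=self._budget_s)
            return True
        except asyncio.TimeoutError:
            self.missed += 1
            return False

    def load(self, key):
        return self._cache.get(key)


async def slow_write(key, blob):
    await asyncio.sleep(0.2)  # stands in for a contended database


async def main():
    cp = BudgetedCheckpointer(slow_write, budget_s=0.05)
    durable = await cp.save("agent-1", b"state")
    print("durable:", durable, "missed:", cp.missed, "cached:", cp.load("agent-1"))


asyncio.run(main())
```

One caveat worth knowing: `asyncio.wait_for` can only cancel a write that yields to the event loop. A blocking driver call made directly inside a coroutine cannot be interrupted this way and would need to run in an executor.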

What the benchmark numbers actually show

We ran three test tiers and want to be transparent about what each one measures.

The single-agent benchmark, which is what the README headline numbers come from, uses the SQLite store with one writer and 50KB compressed blobs. Checkpoint write latency at p50 is 0.3ms, at p95 is 0.8ms, at p99 is approximately 1ms. Hot cache reads resolve in around 4 microseconds.

The production workload test uses 50 concurrent agents, 1000 total writes, and payloads growing from 5KB to 100KB. This is where the SQLite versus Redis comparison becomes meaningful:

Metric	SQLite	Redis
SLA compliance within 50ms	0.5%	100%
Average write latency	282ms	0.64ms
p99 write latency	735ms	1.23ms
Recovery success rate	100%	100%

The hyperscale test uses 150 concurrent agents, 1500 total writes, and 0.77MB payloads representing large context windows with long histories and extensive tool call records:

Metric	SQLite	Redis
SLA compliance within 50ms	0.7%	100%
p99 write latency (successful writes)	above 800ms	62ms
Recovery success rate	100%	100%
p99 recovery read latency	8.84ms	6.61ms
Total execution time	85.81s	68.54s

One honest observation about the Redis hyperscale p99 of 62ms: this is slightly above the 50ms SLA, and all writes still completed because asyncio loop scheduling allowed them through. The bottleneck at this scale is not the database. It is CPU. Compressing a 0.77MB blob with zlib is a CPU-bound operation that, if run inline in a coroutine, stalls the asyncio event loop until it finishes. At that payload size, the compression itself takes approximately 40ms, which leaves little budget for I/O. Teams hitting this ceiling have two options: switch to zstd, which typically compresses significantly faster, or offload compression to a process pool executor. We will add both as configuration options in a future release.
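The executor option can be sketched with the standard library alone. `compress_off_loop` and `demo` are hypothetical names, and the payload is synthetic; the point is only that the event loop stays free to run other agents while a worker does the compression.

```python
import asyncio
import zlib
from concurrent.futures import Executor, ProcessPoolExecutor


async def compress_off_loop(data: bytes, executor: Executor, level: int = 6) -> bytes:
    """Run zlib.compress in the given executor instead of on the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, zlib.compress, data, level)


async def demo() -> None:
    data = b"tool-call record " * 50_000  # synthetic payload, roughly 0.8 MB
    with ProcessPoolExecutor(max_workers=2) as pool:
        blob = await compress_off_loop(data, pool)
    print(len(data), "->", len(blob), "bytes")


if __name__ == "__main__":
    asyncio.run(demo())
```

A process pool has its own cost: the payload is pickled and copied to the worker and the result copied back, so it is worth measuring whether the offload pays for itself at your payload sizes.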

The important pattern across all three tiers is that recovery success rate is 100% regardless of SLA compliance. The two metrics are independent because recovery reads from the RAM cache, not the database. SLA compliance tells you how much of your state made it to durable storage. Recovery success tells you whether your agents can resume after a crash. Both matter, but they are not the same number.

How Living AI fits with LangGraph and CrewAI

Living AI is not a replacement for agent frameworks. LangGraph handles graph compilation, conditional routing, state schemas, and the execution model that makes complex multi-agent workflows possible. CrewAI handles crew orchestration, role assignment, and agent collaboration. These are problems Living AI does not solve and does not try to solve.

What Living AI adds is the production reliability layer that sits underneath the framework:

Your agent logic
    ↓
LangGraph / CrewAI / OpenAI Agents
    ↓
Living AI runtime
    ↓
Redis / PostgreSQL / SQLite

The framework decides where the agent goes. Living AI makes sure it gets there reliably, can recover if it crashes, and leaves a complete execution record for debugging and compliance.

The adapter layer makes this composable. Each framework adapter is a thin translation layer that maps framework execution events to Living AI's ExecutionNode model. The core runtime has zero framework dependencies. Swapping from LangGraph to CrewAI does not change how checkpointing, recovery, or replay works.

The replay capability

Crash recovery is the obvious use case. But the more interesting capability for day-to-day development is replay.

When an agent produces a wrong answer, or books the wrong flight, or sends the wrong message, the question you want to answer is: what exactly happened, and why did the model make that decision? With a standard observability tool, you have logs. You can see what happened. But you cannot re-run the execution with the exact same inputs to reproduce and debug the behavior.

Living AI stores every prompt, every response, every tool call, and every intermediate state in an append-only execution graph. The replay engine can re-execute any recorded run in four modes.

FULL replay re-executes every node from scratch, making real API calls and tool invocations. FROM_NODE replay re-executes from a specific node, skipping the work that preceded it. MOCK_TOOLS replay is the most useful for debugging: it re-runs the LLM reasoning with recorded tool responses served from the execution history, so you can iterate on prompt changes without making real API calls or triggering real side effects. COUNTERFACTUAL replay re-executes with modified input at a specific node, letting you test what would have happened if a particular tool had returned a different value.

The MOCK_TOOLS mode is what makes Living AI useful beyond just crash recovery. If a customer reports that the AI booked the wrong flight, you can replay that exact execution, inspect the LLM's reasoning at each step with the recorded context, and identify where the decision went wrong, all without touching a live system.
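To illustrate the idea behind serving recorded tool responses, here is a minimal sketch. `RecordedToolCall`, `MockToolbox`, and `pick_flight` are hypothetical names invented for this example and are not part of the Living AI API; the flight data is made up.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RecordedToolCall:
    name: str
    args: tuple
    result: Any


class MockToolbox:
    """Serves recorded results in order instead of invoking the real tools."""

    def __init__(self, history):
        self._history = list(history)
        self._cursor = 0

    def call(self, name, *args):
        if self._cursor >= len(self._history):
            raise LookupError(f"no recorded call left for {name}{args}")
        recorded = self._history[self._cursor]
        # If the agent asks for something different, the replay has diverged.
        if (recorded.name, recorded.args) != (name, args):
            raise LookupError(f"replay diverged at call {self._cursor}: {name}{args}")
        self._cursor += 1
        return recorded.result


def pick_flight(toolbox):
    """Stand-in for agent reasoning that depends on a tool result."""
    flights = toolbox.call("search_flights", "SFO", "JFK")
    return flights[0]


history = [RecordedToolCall("search_flights", ("SFO", "JFK"), ["UA100", "DL200"])]
print(pick_flight(MockToolbox(history)))
```

Failing loudly on divergence is a deliberate choice in this sketch: if a prompt change makes the agent call a different tool, silently returning a mismatched recording would hide exactly the behaviour you are trying to debug.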

Getting started

The core library has zero runtime dependencies. Everything uses the Python standard library.

pip install livingai

For Redis:

pip install "livingai[redis]"

For PostgreSQL:

pip install "livingai[postgres]"

A minimal crash recovery example:

import asyncio
from livingai import (
    CheckpointEngine, SQLiteStore, ExecutionNode,
    RecoveryEngine, NodeType, Status
)

async def main():
    engine = CheckpointEngine(SQLiteStore("agent.db"))

    # Record a completed prompt step together with the serialized agent state.
    step = ExecutionNode(
        execution_id="run-1",
        type=NodeType.PROMPT,
        status=Status.SUCCESS,
        output="plan ready"
    )
    await engine.save(step, state=b"serialized agent state")

    # Record a completed tool call, a side effect that must not be repeated.
    charge = ExecutionNode(
        execution_id="run-1",
        type=NodeType.TOOL,
        status=Status.SUCCESS,
        output={"receipt": "R-1"}
    )
    await engine.save(charge)

    # Build a fresh engine over the same database file and ask for a recovery plan.
    recovery = RecoveryEngine(CheckpointEngine(SQLiteStore("agent.db")))
    plan = await recovery.plan("run-1")

    print("resume from:", plan.resume_node_id)
    print("skip effects:", len(plan.skipped_nodes))

asyncio.run(main())

The skip effects line is the one that matters. Tool nodes are marked non-idempotent by default. The recovery engine will never re-run them. If your agent charged a card on step six and crashed on step eight, the card is not charged again on recovery.
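The rule can be illustrated with a toy planner. Everything here is an assumption made for the sketch: the `Node` and `Plan` dataclasses, the `plan_recovery` function, and the exact semantics of "skipped" are invented for illustration and only borrow the `resume_node_id` and `skipped_nodes` field names from the example above. It is not the recovery engine's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    kind: str                 # "prompt" or "tool"
    succeeded: bool
    idempotent: bool = True   # tool nodes would default to False in this sketch's callers


@dataclass
class Plan:
    resume_node_id: Optional[str]
    skipped_nodes: List[str]


def plan_recovery(nodes):
    """Resume at the first unfinished node; never re-run finished side effects."""
    skipped = []
    for node in nodes:
        if not node.succeeded:
            return Plan(node.node_id, skipped)
        if not node.idempotent:
            # Completed side effect (for example a card charge): keep its recorded output.
            skipped.append(node.node_id)
    return Plan(None, skipped)  # nothing left to resume


run = [
    Node("plan", "prompt", succeeded=True),
    Node("charge-card", "tool", succeeded=True, idempotent=False),
    Node("send-receipt", "prompt", succeeded=False),
]
plan = plan_recovery(run)
print("resume from:", plan.resume_node_id)
print("skip effects:", plan.skipped_nodes)
```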

The examples directory in the repository has five runnable demos covering crash recovery, MOCK_TOOLS debugging, cost tracking, and the LangGraph adapter. None of them require an LLM API key or network access.

Choosing a store for your workload

One lesson from the benchmarks worth making explicit: the right store depends on your concurrency level, not your preference.

SQLite is the right default for local development and single-agent workloads. It requires zero configuration, ships with Python, and performs well under low concurrency. The single-agent benchmark numbers, with p99 at approximately 1ms, are real and achievable in this scenario.

Redis is the right choice for production workloads with multiple concurrent agents. The switch is one import change and a connection URL. No agent logic changes. No core configuration changes. In our 50-agent production workload test, SLA compliance went from 0.5% to 100%.

PostgreSQL is the right choice when you need long-term durable storage with query capabilities, cost aggregation across runs, and the ability to reconstruct execution history after a process restart that evicted the Redis cache.

You can also layer them: Redis as the hot tier for active executions, PostgreSQL as the cold tier for historical records. This is the configuration we recommend for teams running agents at scale.

The project is Apache-2.0 licensed and completely open source.

GitHub: github.com/likkisamarthreddy/livingai

If you are running agents in production and have hit reliability problems we have not covered here, open a GitHub Discussion. We are actively building the next milestone, which is a FastAPI cloud backend with a web-based replay UI, and real production feedback is shaping what gets built.