DEV Community

Nathaniel Hamlett

Posted on • Originally published at nathanhamlett.com

SQLite Is the Best Database for AI Agents (And You're Overcomplicating It)

Every AI agent tutorial starts the same way: set up a vector database, connect a graph store, spin up Redis for caching, add Postgres for structured data. By the time you've configured your infrastructure, you've spent more time on DevOps than on the actual agent.

I run an autonomous AI agent that operates 24/7. It has 44 skills, 23 scheduled cron jobs, tracks 400+ opportunities through an 8-stage pipeline, manages outreach across email and Telegram, and publishes content autonomously. The entire state layer is a single SQLite file.

Not SQLite as a prototype. Not SQLite as a "we'll migrate later." SQLite as the production database for a system that runs real operations every day.

Here's why this works, what breaks, and when you should actually reach for something bigger.

Why Agents Need State (And Why Most Skip It)

Most AI agent demos are stateless. The agent receives a prompt, does some work, returns a result. No memory. No pipeline. No tracking.

This works for demos. It fails catastrophically for anything that runs autonomously. Here's what happens when your agent has no state management:

  • It rediscovers the same information repeatedly. My agent scans 9 job APIs daily. Without a database, it would alert me about the same job every scan. With deduplication in SQLite, it inserts new discoveries and skips known ones.
  • It can't track progress through multi-step workflows. An opportunity goes through 8 stages: discovered → researched → strategy_ready → outreach_drafted → approved → applied → interviewing → closed. Each stage has different data requirements and different actions. Without state, you can't know where anything is.
  • It can't learn from its own outputs. Which application strategies get responses? Which cover letter narratives work? Which sources produce the best leads? Without tracking, every cycle starts from zero.
  • It can't coordinate across sessions. My agent runs 23 cron jobs throughout the day. A morning scan discovers opportunities. An afternoon session researches them. An evening session builds application packets. A night session submits them. Without shared state, these sessions can't hand work to each other.
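Those multi-step workflows all hinge on every session agreeing on the same stage order. As a minimal sketch (the `STAGES` list mirrors the 8 stages above; `next_stage` is an illustrative helper, not from the agent itself):

```python
# Ordered pipeline stages shared by every cron job
STAGES = ['discovered', 'researched', 'strategy_ready', 'outreach_drafted',
          'approved', 'applied', 'interviewing', 'closed']

def next_stage(current):
    """Return the stage after `current`, or None once the pipeline is done."""
    i = STAGES.index(current)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None
```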

You need a database. The question is which one.

Why SQLite Beats Everything Else for Agents

1. Zero Configuration, Zero Dependencies

import sqlite3
db = sqlite3.connect('agent.db')
db.execute('CREATE TABLE IF NOT EXISTS opportunities (...)')

That's it. No server to install. No connection strings. No Docker containers. No port conflicts. No authentication. No migrations framework. The database is a file. You copy it, you back it up, you version it.

For an autonomous agent that needs to be reliable above all else, removing infrastructure dependencies removes failure modes. My agent has never gone down because of a database issue. It has gone down because of API rate limits, network timeouts, and authentication token expiry — but never because the database wasn't running.

2. WAL Mode Makes Concurrent Access Safe

The main concern people raise about SQLite is concurrent access. If your morning scan cron and your research cron both try to write at the same time, don't you get locking errors?

db = sqlite3.connect('agent.db')
db.execute('PRAGMA journal_mode=WAL')
db.execute('PRAGMA busy_timeout=5000')

WAL (Write-Ahead Logging) mode solves this. Multiple readers can operate simultaneously. Writers queue cleanly. The busy_timeout parameter tells SQLite to wait up to 5 seconds for a lock instead of immediately failing. In practice, my 23 cron jobs have never had a locking conflict that a 5-second timeout couldn't handle.

The key insight: AI agents are not high-concurrency systems. You're not handling 10,000 requests per second. You're handling maybe 5-10 concurrent processes, each doing a few writes per minute. SQLite handles this effortlessly.
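Here's a minimal sketch of both pragmas in action, with two connections standing in for two cron jobs (the table and temp-file setup are illustrative, not from the agent):

```python
import os
import sqlite3
import tempfile

# A throwaway file standing in for agent.db
path = os.path.join(tempfile.mkdtemp(), 'agent.db')

# "Writer" and "reader" are separate connections, as two cron jobs would be.
writer = sqlite3.connect(path)
writer.execute('PRAGMA journal_mode=WAL')
writer.execute('PRAGMA busy_timeout=5000')
writer.execute('CREATE TABLE IF NOT EXISTS opportunities '
               '(id INTEGER PRIMARY KEY, company TEXT)')
writer.execute("INSERT INTO opportunities (company) VALUES ('Acme')")
writer.commit()

reader = sqlite3.connect(path)
reader.execute('PRAGMA busy_timeout=5000')
# The reader sees committed data even while the writer connection stays open.
rows = reader.execute('SELECT company FROM opportunities').fetchall()
```

In WAL mode the reader works from a consistent snapshot, so it never blocks the writer and never sees a half-finished transaction.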

3. The Entire State Is Inspectable

sqlite3 agent.db "SELECT company, role, stage, fit_score FROM opportunities WHERE stage != 'closed' ORDER BY fit_score DESC LIMIT 10"

No ORM. No admin panel. No special tools. I can inspect my agent's entire state with a single command-line query. When something looks wrong, I can see exactly what happened:

SELECT company, applied_at, outcome, notes
FROM opportunities
WHERE stage = 'applied' AND applied_at > '2026-03-01'
ORDER BY applied_at DESC;

This matters more than people realize. Debugging an autonomous agent is already hard — you're trying to understand why a system made a decision while you weren't watching. Having the state in a single, queryable file makes the difference between a 5-minute investigation and a 2-hour archaeology expedition through log files.

4. Schema Evolution Is Simple

When I needed to add follow-up tracking:

# Check if column exists, add if not
columns = [row[1] for row in db.execute('PRAGMA table_info(opportunities)').fetchall()]
if 'follow_up_1_sent' not in columns:
    db.execute('ALTER TABLE opportunities ADD COLUMN follow_up_1_sent TEXT')
    db.execute('ALTER TABLE opportunities ADD COLUMN next_follow_up_due TEXT')

No migration files. No migration runner. No downtime. SQLite's ALTER TABLE ADD COLUMN is instantaneous and non-destructive. For a system that evolves daily (I've added 8 columns this week alone), this is essential.

The Actual Schema

Here's what powers a real autonomous agent:

-- Core pipeline
CREATE TABLE opportunities (
    id INTEGER PRIMARY KEY,
    source TEXT,
    url TEXT,
    company TEXT,
    role TEXT,
    raw_description TEXT,
    discovered_at TEXT DEFAULT CURRENT_TIMESTAMP,
    stage TEXT DEFAULT 'discovered',
    fit_score REAL,
    -- Research outputs
    company_research TEXT,
    role_analysis TEXT,
    warm_intros TEXT,
    strategy TEXT,
    -- Application materials
    tailored_resume TEXT,
    cover_letter TEXT,
    narrative_variant TEXT,  -- A through E, rotated
    -- Submission tracking
    apply_url TEXT,
    ats_type TEXT,
    applied_at TEXT,
    -- Follow-up management
    follow_up_1_sent TEXT,
    follow_up_2_sent TEXT,
    next_follow_up_due TEXT,
    contact_email TEXT,
    outcome TEXT,
    notes TEXT
);

-- Revenue tracking (beyond job applications)
CREATE TABLE revenue_streams (
    id INTEGER PRIMARY KEY,
    source TEXT,          -- 'bounty', 'freelance', 'gig', 'content'
    platform TEXT,
    amount REAL,
    status TEXT DEFAULT 'pending',
    title TEXT,
    submitted_at TEXT,
    paid_at TEXT
);

-- Published content
CREATE TABLE content_pipeline (
    id INTEGER PRIMARY KEY,
    title TEXT,
    type TEXT,             -- 'article', 'thread', 'bounty_submission'
    platform TEXT,
    status TEXT DEFAULT 'draft',
    published_url TEXT,
    published_at TEXT
);

Three tables. That's the entire state layer for an agent that runs 44 skills across 23 scheduled jobs. The opportunities table does the heavy lifting — it's both the pipeline tracker and the knowledge store for each opportunity.
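A side effect of CREATE TABLE IF NOT EXISTS is that bootstrapping can be idempotent: every cron job runs the same init function at startup, and whichever fires first creates the tables. A sketch with abbreviated columns (`init_db` is illustrative, not from the agent):

```python
import sqlite3

# Abbreviated version of the schema above; every statement is idempotent.
SCHEMA = '''
CREATE TABLE IF NOT EXISTS opportunities (
    id INTEGER PRIMARY KEY, company TEXT, role TEXT, url TEXT, source TEXT,
    stage TEXT DEFAULT 'discovered', fit_score REAL
);
CREATE TABLE IF NOT EXISTS revenue_streams (
    id INTEGER PRIMARY KEY, source TEXT, platform TEXT, amount REAL,
    status TEXT DEFAULT 'pending'
);
CREATE TABLE IF NOT EXISTS content_pipeline (
    id INTEGER PRIMARY KEY, title TEXT, type TEXT, platform TEXT,
    status TEXT DEFAULT 'draft'
);
'''

def init_db(path='agent.db'):
    """Open the database, enable WAL, and create any missing tables."""
    db = sqlite3.connect(path)
    db.execute('PRAGMA journal_mode=WAL')
    db.execute('PRAGMA busy_timeout=5000')
    db.executescript(SCHEMA)
    return db
```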

Patterns That Work

Pipeline Stage Transitions

from datetime import datetime

def advance_stage(db, opp_id, new_stage, **kwargs):
    """Move an opportunity to the next pipeline stage."""
    # Assumes an updated_at column, added via ALTER TABLE as in section 4
    updates = {'stage': new_stage, 'updated_at': datetime.now().isoformat()}
    updates.update(kwargs)
    set_clause = ', '.join(f'{k} = ?' for k in updates)
    db.execute(
        f'UPDATE opportunities SET {set_clause} WHERE id = ?',
        list(updates.values()) + [opp_id]
    )
    db.commit()

# Usage in a research session:
advance_stage(db, 42, 'researched',
    company_research='Founded 2023, Series A...',
    role_analysis='Looking for someone who...',
    warm_intros='3 mutual connections on Twitter...')

Each cron job queries for opportunities in the stage it handles and advances them to the next stage. The morning scan creates discovered entries. The research session picks up discovered entries with fit_score >= 7.0 and advances them to researched. The packet builder picks up researched entries and advances them to strategy_ready.
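The query side of that hand-off is a short SELECT per job. A sketch of how the research session might pick up its work (the function name and limit are illustrative, not from the agent):

```python
def claim_research_work(db, min_fit=7.0, limit=5):
    """Fetch discovered opportunities worth researching this session."""
    return db.execute(
        '''SELECT id, company, role
           FROM opportunities
           WHERE stage = 'discovered' AND fit_score >= ?
           ORDER BY fit_score DESC
           LIMIT ?''',
        (min_fit, limit)
    ).fetchall()
```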

Deduplication on Insert

def insert_if_new(db, company, role, url, source, fit_score):
    """Only insert if we haven't seen this opportunity before."""
    existing = db.execute(
        'SELECT id FROM opportunities WHERE company = ? AND role = ? AND url = ?',
        (company, role, url)
    ).fetchone()
    if existing:
        return None  # Already tracked
    cur = db.execute(
        'INSERT INTO opportunities (company, role, url, source, fit_score) VALUES (?, ?, ?, ?, ?)',
        (company, role, url, source, fit_score)
    )
    db.commit()
    return cur.lastrowid

This is critical when you're scanning multiple job APIs on a schedule. Without deduplication, your database fills with duplicates and your pipeline metrics become meaningless.
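A drop-in alternative (not from the original agent) pushes deduplication into the schema itself with a unique index, so the check and the insert become one atomic statement and two overlapping scans can't both slip the same row in:

```python
import sqlite3

db = sqlite3.connect(':memory:')  # stands in for agent.db
db.execute('''CREATE TABLE IF NOT EXISTS opportunities (
    id INTEGER PRIMARY KEY, company TEXT, role TEXT, url TEXT,
    source TEXT, fit_score REAL)''')
# The unique index makes (company, role, url) the opportunity's identity
db.execute('''CREATE UNIQUE INDEX IF NOT EXISTS idx_opp_identity
              ON opportunities (company, role, url)''')

def insert_if_new(db, company, role, url, source, fit_score):
    """INSERT OR IGNORE silently skips rows that collide with the index."""
    cur = db.execute(
        'INSERT OR IGNORE INTO opportunities '
        '(company, role, url, source, fit_score) VALUES (?, ?, ?, ?, ?)',
        (company, role, url, source, fit_score)
    )
    db.commit()
    return cur.lastrowid if cur.rowcount == 1 else None  # None = duplicate
```

Because the constraint lives in the database rather than in application code, every writer gets deduplication for free, no matter which skill or cron job does the inserting.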

Narrative Variant Rotation

def pick_narrative_variant(db):
    """Pick the least-recently-used narrative variant."""
    counts = db.execute('''
        SELECT narrative_variant, COUNT(*) as cnt
        FROM opportunities
        WHERE narrative_variant IS NOT NULL
        GROUP BY narrative_variant
    ''').fetchall()
    used = {row[0]: row[1] for row in counts}
    variants = ['A', 'B', 'C', 'D', 'E']
    # Pick the variant with the lowest usage count
    return min(variants, key=lambda v: used.get(v, 0))

Every cover letter uses a different narrative angle. Variant A leads with builds. Variant B leads with operations. Variant C leads with relationships. The database tracks which variant each application used, so the agent never sends the same style twice in a row.

When SQLite Isn't Enough

I use SQLite for structured pipeline state. I use different tools for different data types:

  • Vector search (semantic similarity over documents): Qdrant. SQLite can't do cosine similarity over embeddings. If your agent needs RAG, you need a vector database.
  • Graph relationships (who knows whom, entity connections): FalkorDB. SQLite can model relationships with JOIN tables but graph traversal queries are painful.
  • Real-time pub/sub (instant notifications across processes): Use the filesystem or a message queue. SQLite's change notification support is limited.
  • Multi-machine coordination: If your agent runs across multiple servers, SQLite can't share state between them. This is the real migration trigger — not data volume, not query complexity, but distributed deployment.

The honest answer for most agent builders: you won't hit these limits for a long time. A single SQLite database handles millions of rows without breaking a sweat. My 400+ opportunity pipeline with full research text in every row is about 15MB. SQLite databases routinely handle hundreds of gigabytes.

The Deeper Point

The AI agent ecosystem has an infrastructure addiction. Every tutorial assumes you need Kubernetes, Postgres, Redis, a vector database, a monitoring stack, and a deployment pipeline before you can build anything useful.

This is backwards. The agent's value comes from its intelligence — what it knows, what it can do, and how it decides. The infrastructure should be invisible. SQLite makes it invisible.

My agent has generated more value from one file-based database and well-designed pipeline stages than most tutorial agents generate from their entire cloud infrastructure. Because I spent my time on the hard problems — fit scoring, research quality, narrative variation, follow-up cadence — instead of on database administration.

Start with SQLite. You can always migrate later. But you probably won't need to.


Nathan Hamlett builds autonomous AI systems. His current agent runs 44 skills across 23 cron jobs, managing a full job pipeline and content operation from a single SQLite database. More at nathanhamlett.com.
