DEV Community

Hassan

SQLite as a CRM: Why We Chose the Simplest Database for Our Sales Pipeline

51 leads, 96 outreach events, four tables, one file. Sync completes in under a second. Here is why we chose SQLite over everything else.

Our CRM is a SQLite database with four tables. It sits on an external hard drive attached to a Raspberry Pi. It is downstream of a directory of markdown files that six AI agents read and write to. The database never writes back to the agents. It exists purely to answer questions that markdown cannot answer efficiently.

This is not a compromise. It is a deliberate architecture. The agents speak markdown. The database speaks SQL. A sync script translates between them on every run. Both layers do what they are good at, and neither tries to do the other's job.

The Architecture: Markdown First, SQLite Second

The source of truth is a directory of markdown files:

crm/leads/
├── taktile/
│   └── profile.md
├── parloa/
│   └── profile.md
├── cosuno/
│   └── profile.md
└── ... (51 leads)

outreach/drafts/
├── taktile.md              # Email draft
├── taktile.approved        # Approval marker
├── taktile.email-1-sent    # Contains date: "2026-03-10"
├── taktile.email-2-sent    # Contains date: "2026-03-13"
└── ...

Each lead is a directory with a profile.md file. Each outreach action is a marker file in outreach/drafts/. The marker files are deliberately dumb: taktile.approved is an empty file whose existence means "approved." taktile.email-1-sent contains a single line with the date the email was sent.

A sync script (sync.py) runs after every pipeline execution. It parses all markdown profiles and marker files, then upserts everything into SQLite. The database has four tables:

CREATE TABLE leads (
    slug TEXT PRIMARY KEY,
    company TEXT NOT NULL,
    website TEXT,
    industry TEXT,
    size TEXT,
    location TEXT,
    funding TEXT,
    tech_stack TEXT,
    score INTEGER,
    score_budget INTEGER,
    score_authority INTEGER,
    score_need INTEGER,
    score_timeline INTEGER,
    score_fit INTEGER,
    stage TEXT DEFAULT 'Research',
    source TEXT,
    created_at TEXT,
    updated_at TEXT,
    next_action TEXT
);

CREATE TABLE outreach_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    lead_slug TEXT REFERENCES leads(slug),
    event_type TEXT,    -- 'approved', 'email_1_sent', 'email_2_sent', 'bounce', 'reply'
    event_date TEXT,
    notes TEXT
);

CREATE TABLE stage_transitions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    lead_slug TEXT REFERENCES leads(slug),
    from_stage TEXT,
    to_stage TEXT,
    changed_at TEXT,
    reason TEXT
);

CREATE TABLE telegram_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_type TEXT,
    reference_id TEXT,
    sent_at TEXT,
    chat_id TEXT,
    message_text TEXT
);

That is the entire schema. Four tables, at most a single join for the common queries, no indexes beyond the primary keys. The leads table has 19 columns. The outreach_events table tracks every email sent, every bounce, every reply, with timestamps.
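The upsert step can be sketched with SQLite's ON CONFLICT clause. This is a minimal illustration with an abridged leads table, not the actual sync.py code; the real script covers every column.

```python
import sqlite3

# Abridged leads table for illustration; the real schema has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS leads (
        slug TEXT PRIMARY KEY,
        company TEXT NOT NULL,
        score INTEGER,
        stage TEXT DEFAULT 'Research'
    )
""")

def upsert_lead(conn, slug, company, score, stage):
    # INSERT ... ON CONFLICT keys on slug, so re-running sync never duplicates rows.
    conn.execute(
        """
        INSERT INTO leads (slug, company, score, stage)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(slug) DO UPDATE SET
            company = excluded.company,
            score = excluded.score,
            stage = excluded.stage
        """,
        (slug, company, score, stage),
    )

upsert_lead(conn, "taktile", "Taktile", 70, "Research")
upsert_lead(conn, "taktile", "Taktile", 75, "Outreach")  # re-sync updates in place
row = conn.execute("SELECT score, stage FROM leads WHERE slug = 'taktile'").fetchone()
# row == (75, 'Outreach'): one row per lead, latest markdown wins
```

The upsert is what makes the sync idempotent: running it twice against the same markdown produces the same database.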

Why Markdown Is the Source of Truth

The agents are the primary users of the CRM. They create leads, score them, write outreach drafts, and update statuses. Every one of these operations is a file write.

When the SDR agent creates a new lead, it writes a profile.md with structured fields:

# Taktile

## Overview
- **Website:** https://taktile.com
- **Industry:** Fintech / Decision Intelligence
- **Size:** 51-200
- **Location:** Berlin, Germany
- **Funding:** Series B ($54M)
- **Tech stack:** Python, TypeScript, React, Kubernetes

## Score

| Dimension | Score | Max | Justification |
|-----------|-------|-----|---------------|
| Budget | 18 | 20 | Series B funded, actively hiring |
| Authority | 14 | 20 | CTO identified, engineering blog active |
| Need | 16 | 20 | 8 open engineering roles |
| Timeline | 12 | 20 | Scaling post-fundraise |
| Fit | 15 | 20 | Python/TS stack matches our hiring pipeline |
| **Total** | **75** | **100** | |

## Status
- **Stage:** Outreach
- **Created:** 2026-03-10
- **Last updated:** 2026-03-13

This file is simultaneously the agent's output, the human-readable record, and the sync source. There is no translation layer between what the agent produces and what the system stores. The agent writes markdown because that is what LLMs produce naturally. The system reads markdown because that is what the sync script parses.

Git history provides a complete audit trail. Every field change is a commit. You can run git log --follow crm/leads/taktile/profile.md and see every score update, stage change, and profile enrichment with timestamps and diffs.

If we stored lead data in a database directly, agents would need to execute SQL inserts and updates. That means SQL in prompts, connection handling, error recovery for failed transactions, and a database client dependency. Markdown eliminates all of this. The agent writes a file. Done.

sync.py: The Translation Layer

The sync script is 250 lines of Python that does three things: parse lead profiles, parse outreach markers, and upsert everything into SQLite.

Parsing is harder than it sounds. Over three months, the SDR agent has produced lead profiles in four different score formats:

  1. Markdown table: | Budget | 18 | 20 | Justification... |
  2. H3 heading: ### Total Score: 75 / 100
  3. H3 bold: ### **Total: 75/100**
  4. Field notation: - **Score:** 75/100

The parse_score_table() function handles all four. When the SDR agent drifts to a new format (which happens when the prompt is updated or the model changes), we add a parser for it. The sync script is tolerant of format variation because the agents are not perfectly consistent.
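The tolerance for the four formats might look roughly like this. This is a hypothetical reimplementation, not the real parse_score_table(), which also extracts per-dimension scores and justifications.

```python
import re

def parse_total_score(text):
    """Extract a 0-100 total score from any of the four observed formats.

    Hypothetical sketch; the real parser handles full score tables.
    """
    patterns = [
        r"\|\s*\*\*Total\*\*\s*\|\s*\*\*(\d+)\*\*",  # markdown table row
        r"###\s*Total Score:\s*(\d+)\s*/\s*100",      # H3 heading
        r"###\s*\*\*Total:\s*(\d+)/100\*\*",          # H3 bold
        r"-\s*\*\*Score:\*\*\s*(\d+)/100",            # field notation
    ]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return int(match.group(1))
    return None  # unknown format: surface it and add a parser later

# All four observed formats resolve to the same number:
assert parse_total_score("| **Total** | **75** | **100** | |") == 75
assert parse_total_score("### Total Score: 75 / 100") == 75
```

A list of patterns tried in order is easy to extend: when the agent drifts to a fifth format, it becomes one more regex, not a rewrite.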

Stage normalization is similarly flexible:

import re

# VALID_STAGES is defined at the top of sync.py.
def normalize_stage(raw_stage):
    """Normalize freeform stage text to a clean stage name.

    'Research — draft outreach immediately' -> 'Research'
    'New — not yet contacted' -> 'Identified'
    'New Lead — Research Complete' -> 'Identified'
    """
    if not raw_stage:
        return "Identified"
    # Keep only the text before the first dash; agents append editorial notes.
    base = re.split(r"\s*[—–\-]\s*", raw_stage, maxsplit=1)[0].strip()
    base_lower = base.lower()
    if base_lower in ("new", "new lead", "identified"):
        return "Identified"
    # Match canonical stage names case-insensitively.
    for stage in VALID_STAGES:
        if stage.lower() == base_lower:
            return stage
    return "Research"  # unknown stage: fall back to the earliest active stage

The agent writes "Research — draft outreach immediately" as the stage. The sync script extracts "Research." This tolerance for freeform input is essential when your data producers are LLMs that add editorial notes to structured fields.

Stage Reconciliation: When Markers Override Markdown

This is the most important logic in the sync script: if outreach marker files exist for a lead, the stage is forced to "Outreach" regardless of what the markdown profile says.

The SDR agent might write a lead profile with Stage: Research, then in the same pipeline run, score it above 60 and create an outreach draft with an .approved marker. The profile still says "Research" because the agent wrote the profile before making the outreach decision. Without reconciliation, the database would show the lead as "Research" when it has already been approved for outreach.

The sync script checks for this:

# Check for stage transition before upserting
existing = get_lead(slug)
if existing and existing.get("stage") and data.get("stage"):
    old_stage = existing["stage"].lower()
    new_stage = data["stage"].lower()
    advanced_stages = ("sequence complete", "re-approach", "qualifying",
                       "meeting", "proposal", "negotiation", "won")
    if old_stage in advanced_stages and new_stage not in advanced_stages:
        del data["stage"]  # Keep the DB stage, don't regress

The database stage is authoritative for advancement. If a lead has reached "Sequence Complete" (all emails sent), the markdown profile cannot regress it back to "Outreach." This prevents a common failure mode where re-running the SDR agent would reset stages of leads that have already completed their outreach sequence.

The outreach_events table is the evidence layer. Every email sent, every bounce, every reply is logged with a timestamp:

patterns = [
    ("*.approved", "approved"),
    ("*.email-1-sent", "email_1_sent"),
    ("*.email-2-sent", "email_2_sent"),
    ("*.email-3-sent", "email_3_sent"),
    ("*.breakup-sent", "breakup_sent"),
]

The marker file taktile.email-2-sent becomes a row in outreach_events with lead_slug='taktile', event_type='email_2_sent', and event_date='2026-03-13'. This table is what makes queries like "which leads have received Email 1 but not Email 2, and it has been 3+ days?" possible.
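The marker scan can be sketched in a few lines. This is a hypothetical helper, and it assumes each non-empty marker file holds a single ISO date line; the real script's file format may differ in detail.

```python
from pathlib import Path

PATTERNS = [
    ("*.approved", "approved"),
    ("*.email-1-sent", "email_1_sent"),
    ("*.email-2-sent", "email_2_sent"),
    ("*.email-3-sent", "email_3_sent"),
    ("*.breakup-sent", "breakup_sent"),
]

def scan_markers(drafts_dir):
    """Yield (lead_slug, event_type, event_date) for every marker file.

    Sketch only: assumes a non-empty marker contains one ISO date line.
    """
    drafts = Path(drafts_dir)
    for glob_pattern, event_type in PATTERNS:
        for marker in drafts.glob(glob_pattern):
            # 'taktile.email-2-sent' -> slug 'taktile'
            slug = marker.name.split(".")[0]
            date = marker.read_text().strip() or None
            yield (slug, event_type, date)
```

Each yielded tuple maps directly to a row in outreach_events, so the evidence layer is rebuilt from scratch on every sync.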

Query Patterns

The database enables four categories of queries that markdown cannot answer efficiently.

Follow-up scheduling. "Which leads received Email 1 more than three days ago but have not received Email 2?" This requires joining leads with outreach_events, filtering by event type and date arithmetic. In markdown, you would need to scan every marker file, parse dates, and cross-reference. In SQLite, it is a single query.
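One way to write that query is an anti-join on outreach_events via NOT EXISTS. The data, the three-day window, and the explicit 'today' parameter below are illustrative, not the production script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outreach_events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        lead_slug TEXT,
        event_type TEXT,
        event_date TEXT
    );
    INSERT INTO outreach_events (lead_slug, event_type, event_date) VALUES
        ('taktile', 'email_1_sent', '2026-03-01'),
        ('taktile', 'email_2_sent', '2026-03-04'),
        ('parloa',  'email_1_sent', '2026-03-10'),
        ('cosuno',  'email_1_sent', '2026-03-13');
""")

# Leads whose Email 1 is 3+ days old and who never received Email 2.
due = conn.execute(
    """
    SELECT e1.lead_slug
    FROM outreach_events AS e1
    WHERE e1.event_type = 'email_1_sent'
      AND date(e1.event_date) <= date(?, '-3 days')
      AND NOT EXISTS (
          SELECT 1 FROM outreach_events AS e2
          WHERE e2.lead_slug = e1.lead_slug
            AND e2.event_type = 'email_2_sent'
      )
    """,
    ("2026-03-14",),  # 'today' passed as a parameter so the query is testable
).fetchall()
# Only parloa is due: taktile already got Email 2, cosuno's Email 1 is too recent.
```

SQLite's date() modifiers ('-3 days') handle the date arithmetic that would otherwise mean parsing timestamps out of marker files by hand.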

Sequence completion detection. "Which leads have received all three emails and the breakup, with no reply?" The email sending script checks this before each run to auto-move leads to "Sequence Complete." Without the database, this check would require globbing for four marker files per lead and checking for the absence of a reply marker.

Pipeline metrics. "How many leads are in each stage? What is the reply rate? How many emails were sent this week?" These aggregate queries run daily for the Telegram digest. They complete in milliseconds against SQLite. Computing them from markdown would require parsing every profile and every marker file.

Reply rate calculation. SELECT COUNT(DISTINCT lead_slug) FROM outreach_events WHERE event_type = 'reply', divided by the same count for event_type = 'email_1_sent'. This is the north star metric for outreach effectiveness. It runs every day at 20:00 for the pipeline status message.
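That calculation can be run as a single statement with two scalar subqueries. The rows below are illustrative; the real numbers come from pipeline.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outreach_events (lead_slug TEXT, event_type TEXT);
    INSERT INTO outreach_events VALUES
        ('taktile', 'email_1_sent'),
        ('parloa',  'email_1_sent'),
        ('cosuno',  'email_1_sent'),
        ('cosuno',  'email_1_sent'),  -- duplicate send: DISTINCT guards against it
        ('taktile', 'reply');
""")

# Reply rate = distinct leads that replied / distinct leads that got Email 1.
replied, contacted = conn.execute(
    """
    SELECT
        (SELECT COUNT(DISTINCT lead_slug) FROM outreach_events
         WHERE event_type = 'reply'),
        (SELECT COUNT(DISTINCT lead_slug) FROM outreach_events
         WHERE event_type = 'email_1_sent')
    """
).fetchone()
rate = replied / contacted if contacted else 0.0
# Here: one reply out of three contacted leads.
```

COUNT(DISTINCT lead_slug) matters: a lead that replies twice, or receives a resent email, should still count once.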

Auto-Generated pipeline.md

The pipeline summary that humans read (crm/pipeline.md) is auto-generated by the sync script. It is never manually edited. On every sync run, the script queries the database and writes a markdown table:

## Pipeline Summary

| Stage | Count |
|-------|-------|
| Research | 13 |
| Outreach | 7 |
| Sequence Complete | 28 |
| Re-approach | 2 |

## Active Outreach

| Company | Score | Last Email | Next Due |
|---------|-------|------------|----------|
| Taktile | 75 | Email 2 (Mar 13) | Email 3 (Mar 16) |
| Parloa | 68 | Email 1 (Mar 13) | Email 2 (Mar 16) |

This file exists purely for human consumption. It is a rendered view of the database. If it gets corrupted or deleted, the next sync regenerates it.
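The generator for the summary table can be sketched as a query-plus-template. This is a hypothetical helper; the real script also renders the Active Outreach table.

```python
import sqlite3

def render_pipeline_summary(conn):
    """Render the stage-count table for pipeline.md from the database."""
    rows = conn.execute(
        "SELECT stage, COUNT(*) FROM leads GROUP BY stage ORDER BY stage"
    ).fetchall()
    lines = ["## Pipeline Summary", "", "| Stage | Count |", "|-------|-------|"]
    lines += [f"| {stage} | {count} |" for stage, count in rows]
    return "\n".join(lines) + "\n"

# Abridged leads table, illustrative data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (slug TEXT PRIMARY KEY, stage TEXT)")
conn.executemany(
    "INSERT INTO leads VALUES (?, ?)",
    [("taktile", "Outreach"), ("parloa", "Outreach"), ("cosuno", "Research")],
)
summary = render_pipeline_summary(conn)
```

Because the output is a pure function of the database, regenerating a corrupted or deleted pipeline.md is just another sync run.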

Why Not a "Real" CRM?

We evaluated three alternatives: Airtable, HubSpot (free tier), and a custom Django app with PostgreSQL.

Airtable would have required the agents to interact with an API. Every lead creation becomes an HTTP request with authentication, rate limits, error handling, and a schema that needs to stay synchronized between the Airtable config and the agent prompts. For 51 leads, the overhead of API integration exceeds the value.

HubSpot solves problems we do not have: multi-user access control, email tracking pixels, meeting scheduling, pipeline visualization. We have six AI agents and two humans. The agents do not need a UI. The humans get a daily Telegram message. HubSpot would add complexity without removing any.

Django + PostgreSQL would have been the "proper" engineering choice. But PostgreSQL needs a running server process, backup configuration, connection pooling, and an ORM or migration framework. SQLite is a single file. You back it up with cp. You inspect it with sqlite3 pipeline.db. You delete it and regenerate it from markdown in under a second.

The honest answer is that SQLite is the right choice because our system is small and does not need concurrency. We have one writer (the sync script) and several readers (the email sender, the reply checker, the Telegram bot, the daily digest). SQLite handles this workload without thinking.

If we had ten agents writing concurrently to the database, we would need PostgreSQL. But we do not. The agents write to markdown files. One sync script writes to SQLite. There is never write contention.

Performance

Current numbers: 51 leads, 96 outreach events, 47 stage transitions. A full sync — parsing all markdown profiles, all marker files, and upserting everything — completes in under one second.

The database file is 180 KB. It uses WAL (Write-Ahead Logging) journal mode for concurrent reads during writes. Foreign keys are enabled. That is the entire performance configuration.
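That entire configuration amounts to two PRAGMA statements at connection time:

```python
import os
import sqlite3
import tempfile

# WAL requires an on-disk database (an in-memory one reports 'memory' instead),
# so this sketch uses a temporary file.
path = os.path.join(tempfile.mkdtemp(), "pipeline.db")
conn = sqlite3.connect(path)

mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]  # persists in the file
conn.execute("PRAGMA foreign_keys=ON")                        # per-connection setting
fk = conn.execute("PRAGMA foreign_keys").fetchone()[0]
# mode == 'wal', fk == 1
```

Note the asymmetry: journal_mode=WAL is stored in the database file itself, while foreign_keys=ON must be issued on every new connection.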

We have not needed an index beyond the primary keys. Every query runs in milliseconds. At our scale, SQLite's performance is not a consideration. It will remain a non-consideration until we have thousands of leads, which is a good problem to have.

Rebuilding From Scratch

Because markdown is the source of truth, the database is disposable. If it gets corrupted, if a schema migration goes wrong, if we want to restructure the tables — we delete the file and run sync.py. Every row is reconstructed from the markdown files. The outreach events are reconstructed from the marker files. The stage transitions are reconstructed from git history (though in practice we rarely need them after a rebuild).

We have done this three times: once to add the stage_transitions table, once to add the telegram_log table, and once after a bug in the sync script produced duplicate outreach events. Each rebuild took under five seconds.

This is the real advantage of a downstream database. It is not precious. You can destroy it without losing anything. The markdown files, which are version-controlled in git, are the durable state.

Key Takeaways

  • The best database is the one your system already speaks. Our agents speak markdown. Making them write SQL would add complexity without adding capability. The translation happens once, in a sync script, not in every agent run.
  • Markdown as source of truth, SQLite as query layer. This separation means the database is disposable and rebuildable. Agents never interact with the database directly.
  • Build tolerance for format variation into the sync layer. LLM outputs are not perfectly consistent. The sync script handles four different score formats and normalizes freeform stage names. This tolerance is essential when your data producers are AI agents.
  • Stage reconciliation prevents state regression. Marker files (evidence of actions taken) override profile fields (agent-written state) when they conflict. The system trusts actions over declarations.
  • SQLite is enough when you have one writer and low concurrency. Do not add PostgreSQL because you think you should. Add it when you have a concrete concurrency problem. For 51 leads and one sync script, SQLite is not a compromise — it is the correct choice.

FAQ

What happens when you have 1,000 leads?

SQLite handles millions of rows without issue. The bottleneck would be the markdown parsing in sync.py, which currently takes under a second for 51 leads. At 1,000 leads, sync would take 10-15 seconds — still fast enough for a script that runs twice a day. The first real scaling concern would be git performance with thousands of small files, not SQLite.

Can multiple agents write to the database simultaneously?

They do not need to. Agents write to markdown files, not to the database. The sync script is the only database writer, and it runs once per pipeline execution. There is never write contention. If we needed concurrent database writes, we would switch to PostgreSQL. But the markdown-first architecture means we do not.

How do you handle schema changes?

Delete the database file and run sync.py. The schema is defined in init_db() using CREATE TABLE IF NOT EXISTS. A full rebuild from markdown takes under five seconds. We do not use migration frameworks. The database is disposable.
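A minimal sketch of that pattern, abridged to one table:

```python
import sqlite3

def init_db(conn):
    """Idempotent schema setup: safe to call on every sync run."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS leads (
            slug TEXT PRIMARY KEY,
            company TEXT NOT NULL,
            stage TEXT DEFAULT 'Research'
        );
        -- ...the remaining three tables follow the same pattern
    """)

conn = sqlite3.connect(":memory:")
init_db(conn)
init_db(conn)  # second call is a no-op, not an error
```

CREATE TABLE IF NOT EXISTS is what lets "delete the file and re-run sync" double as the migration strategy: the next run recreates the schema before upserting.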


SifrVentures builds dedicated engineering teams for tech companies. Based in Berlin. Learn how we work | Read more on our blog

