Do you really need a vector database? For local AI agents, SQLite handles embeddings just fine — and it's already on your machine.
This article explores a practical approach for local-first AI tools. For context on MCP transports and how agents communicate, see Understanding MCP Server Transports.
The Problem with "Production-Grade" Vector Databases
When building AI agents that use embeddings (for RAG, semantic search, or memory), the default advice is: "Use a vector database like Pinecone, Weaviate, or Chroma."
But for local agents — tools running on your machine, personal assistants, or development prototypes — this advice creates unnecessary complexity:
What you need:                 What you're told to set up:
┌────────────────────┐         ┌──────────────────────────┐
│ Store ~10K vectors │         │ Docker container         │
│ Query occasionally │    →    │ Separate database server │
│ Works offline      │         │ API keys and auth        │
│ Easy to backup     │         │ Cloud dependencies       │
└────────────────────┘         └──────────────────────────┘
There's a mismatch between the problem and the solution.
SQLite as an Embedding Store
SQLite is a serverless, file-based database that ships with most operating systems. It turns out it's perfectly capable of storing and querying embeddings for local use cases.
Basic Approach: Store Embeddings as BLOBs
The simplest approach stores embeddings as binary data:
CREATE TABLE embeddings (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    embedding BLOB NOT NULL,
    metadata JSON,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
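Before inserting anything you need a Connection and the table itself. A minimal setup sketch, assuming a JDBC SQLite driver such as org.xerial:sqlite-jdbc on the classpath (the helper name openEmbeddingStore is mine):

```kotlin
import java.sql.Connection
import java.sql.DriverManager

// Open (or create) the database file and ensure the table exists.
// Assumes a JDBC SQLite driver (e.g. org.xerial:sqlite-jdbc) on the classpath.
fun openEmbeddingStore(path: String): Connection {
    val conn = DriverManager.getConnection("jdbc:sqlite:$path")
    conn.createStatement().use { stmt ->
        stmt.execute(
            """
            CREATE TABLE IF NOT EXISTS embeddings (
                id INTEGER PRIMARY KEY,
                content TEXT NOT NULL,
                embedding BLOB NOT NULL,
                metadata JSON,
                created_at DATETIME DEFAULT CURRENT_TIMESTAMP
            )
            """.trimIndent()
        )
    }
    return conn
}
```

Pass ":memory:" instead of a file path when you want a throwaway database for tests.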
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.sql.Connection
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonPrimitive

fun storeEmbedding(
    conn: Connection,
    content: String,
    embedding: FloatArray,
    metadata: Map<String, Any>? = null
) {
    // Pack the floats as little-endian bytes: 4 bytes per dimension
    val embeddingBlob = ByteBuffer
        .allocate(embedding.size * 4)
        .order(ByteOrder.LITTLE_ENDIAN)
        .apply { embedding.forEach { putFloat(it) } }
        .array()
    conn.prepareStatement(
        "INSERT INTO embeddings (content, embedding, metadata) VALUES (?, ?, ?)"
    ).use { stmt ->
        stmt.setString(1, content)
        stmt.setBytes(2, embeddingBlob)
        // kotlinx.serialization has no serializer for Map<String, Any>,
        // so build the JSON by hand, preserving number/boolean types
        stmt.setString(3, metadata?.let { m ->
            JsonObject(m.mapValues { (_, v) ->
                when (v) {
                    is Boolean -> JsonPrimitive(v)
                    is Number -> JsonPrimitive(v)
                    else -> JsonPrimitive(v.toString())
                }
            }).toString()
        })
        stmt.executeUpdate()
    }
}

fun getEmbedding(blob: ByteArray): FloatArray {
    val buffer = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN)
    return FloatArray(blob.size / 4) { buffer.getFloat() }
}
Similarity Search: Brute Force Works
For local agents with thousands (not millions) of vectors, brute-force cosine similarity is fast enough:
data class SearchResult(
    val id: Int,
    val content: String,
    val metadata: String?,
    val similarity: Double
)

fun searchSimilar(
    conn: Connection,
    queryEmbedding: FloatArray,
    topK: Int = 5
): List<SearchResult> {
    val results = mutableListOf<SearchResult>()
    conn.prepareStatement(
        "SELECT id, content, embedding, metadata FROM embeddings"
    ).use { stmt ->
        stmt.executeQuery().use { rs ->
            while (rs.next()) {
                val storedVec = getEmbedding(rs.getBytes("embedding"))
                val similarity = cosineSimilarity(queryEmbedding, storedVec)
                results.add(
                    SearchResult(
                        id = rs.getInt("id"),
                        content = rs.getString("content"),
                        metadata = rs.getString("metadata"),
                        similarity = similarity
                    )
                )
            }
        }
    }
    return results.sortedByDescending { it.similarity }.take(topK)
}

fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Dimension mismatch: ${a.size} vs ${b.size}" }
    // Single pass over both arrays, no intermediate allocations
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += (a[i] * b[i]).toDouble()
        normA += (a[i] * a[i]).toDouble()
        normB += (b[i] * b[i]).toDouble()
    }
    return dot / (kotlin.math.sqrt(normA) * kotlin.math.sqrt(normB))
}
Performance Reality Check
How fast is brute-force search? Faster than you'd expect:
| Vectors | Dimensions | Search Time | Memory |
|---|---|---|---|
| 1,000 | 1536 | ~5ms | ~6 MB |
| 10,000 | 1536 | ~50ms | ~60 MB |
| 100,000 | 1536 | ~500ms | ~600 MB |
For a local agent querying a personal knowledge base, 50ms is imperceptible. You don't need approximate nearest neighbors (ANN) algorithms until you're well past 100K vectors.
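These numbers are easy to reproduce yourself. A rough, self-contained timing sketch (figures vary by machine and JVM warm-up; bruteForceScan is a name I've made up):

```kotlin
import kotlin.random.Random
import kotlin.system.measureTimeMillis

// Time a brute-force cosine-similarity scan over n random vectors
// of the given dimensionality; returns elapsed milliseconds.
fun bruteForceScan(n: Int, dims: Int): Long {
    val rng = Random(42)
    val vectors = Array(n) { FloatArray(dims) { rng.nextFloat() } }
    val query = FloatArray(dims) { rng.nextFloat() }
    var best = -1.0
    return measureTimeMillis {
        for (v in vectors) {
            var dot = 0.0
            var normV = 0.0
            var normQ = 0.0
            for (i in 0 until dims) {
                dot += (query[i] * v[i]).toDouble()
                normV += (v[i] * v[i]).toDouble()
                normQ += (query[i] * query[i]).toDouble()
            }
            val sim = dot / (kotlin.math.sqrt(normV) * kotlin.math.sqrt(normQ))
            if (sim > best) best = sim
        }
    }
}

fun main() {
    println("10K x 1536d scanned in ${bruteForceScan(10_000, 1536)}ms")
}
```

On a typical laptop this lands in the tens of milliseconds for 10K vectors, consistent with the table above.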
SQLite-VSS: When You Need More Speed
For larger datasets, sqlite-vss adds vector search capabilities to SQLite using Facebook's FAISS library:
-- Load the extension
.load ./vector0
.load ./vss0

-- Create a virtual table for vector search
CREATE VIRTUAL TABLE vss_embeddings USING vss0(
    embedding(1536)  -- OpenAI embedding dimensions
);

-- Insert vectors
INSERT INTO vss_embeddings (rowid, embedding)
SELECT id, embedding FROM embeddings;

-- Search with approximate nearest neighbors
SELECT rowid, distance
FROM vss_embeddings
WHERE vss_search(embedding, ?)
LIMIT 10;
This gives you sub-millisecond searches even at 1M+ vectors, while keeping everything in a single SQLite file.
Real-World Example: Local-First AI Tools
Many local-first AI tools use SQLite for codebase indexing and embeddings storage. The pattern is common — a single database file holds everything the agent needs:
┌───────────────────────────────────────────────────────┐
│                    Local AI Agent                     │
├───────────────────────────────────────────────────────┤
│                                                       │
│  ┌─────────────┐   ┌─────────────┐   ┌────────────┐   │
│  │ Embeddings  │   │    Chat     │   │   Config   │   │
│  │  (vectors)  │   │   History   │   │  & State   │   │
│  └──────┬──────┘   └──────┬──────┘   └─────┬──────┘   │
│         │                 │                │          │
│         └─────────────────┼────────────────┘          │
│                           │                           │
│                  ┌────────▼────────┐                  │
│                  │    SQLite DB    │                  │
│                  │  (single file)  │                  │
│                  └─────────────────┘                  │
│                           │                           │
│                   ~/.agent/data.db                    │
└───────────────────────────────────────────────────────┘
Everything lives in one file. Backup? Copy the file. Reset? Delete it. Move to another machine? Take it with you.
Advantages of SQLite for Embeddings
1. Zero Infrastructure
# Pinecone setup:
pip install pinecone-client
# Create account, get API key, configure environment...
# SQLite setup:
# Nothing. It's already there.
No servers, no Docker, no accounts, no API keys. Your agent works offline, on airplanes, and in restricted networks.
2. Single-File Simplicity
Your entire knowledge base is one .db file:
~/.my-agent/
├── config.json
└── knowledge.db ← Everything: embeddings, chat history, metadata
This makes debugging trivial. You can open the database with any SQLite client and inspect exactly what's stored.
3. Atomic Transactions
SQLite gives you ACID transactions for free:
fun updateDocument(
    conn: Connection,
    docId: Int,
    newContent: String,
    newEmbedding: FloatArray
) {
    conn.autoCommit = false
    try {
        // doc_id lives inside the metadata JSON, not in its own column
        conn.prepareStatement(
            "DELETE FROM embeddings WHERE json_extract(metadata, '$.doc_id') = ?"
        ).use { stmt ->
            stmt.setInt(1, docId)
            stmt.executeUpdate()
        }
        storeEmbedding(conn, newContent, newEmbedding, mapOf("doc_id" to docId))
        conn.commit()
    } catch (e: Exception) {
        conn.rollback()
        throw e
    } finally {
        conn.autoCommit = true
    }
    // Either both statements succeed or both roll back
}
No partial updates, no orphaned embeddings, no consistency issues.
4. Rich Querying
Combine vector search with SQL's full power. Pre-filter with SQL, then compute similarity in your application:
fun searchRecentNonArchived(
    conn: Connection,
    queryEmbedding: FloatArray,
    topK: Int = 10
): List<SearchResult> {
    val candidates = mutableListOf<SearchResult>()
    // SQL handles filtering — only fetch what matches
    conn.prepareStatement("""
        SELECT id, content, embedding, metadata
        FROM embeddings
        WHERE created_at > datetime('now', '-7 days')
          -- IS NOT TRUE is NULL-safe: rows with no metadata,
          -- or no 'archived' key, still pass the filter
          AND json_extract(metadata, '$.archived') IS NOT TRUE
    """).use { stmt ->
        stmt.executeQuery().use { rs ->
            while (rs.next()) {
                val storedVec = getEmbedding(rs.getBytes("embedding"))
                candidates.add(
                    SearchResult(
                        id = rs.getInt("id"),
                        content = rs.getString("content"),
                        metadata = rs.getString("metadata"),
                        similarity = cosineSimilarity(queryEmbedding, storedVec)
                    )
                )
            }
        }
    }
    return candidates.sortedByDescending { it.similarity }.take(topK)
}
Try doing that with a pure vector database: most offer only limited metadata filters, not arbitrary SQL, ahead of the vector search.
5. Portable and Debuggable
# Inspect your embeddings
sqlite3 knowledge.db "SELECT COUNT(*) FROM embeddings"
# Backup before experimenting
cp knowledge.db knowledge.db.backup
# Share with a colleague
scp knowledge.db user@other-machine:~/
Disadvantages and Limitations
1. No Native Vector Operations
SQLite doesn't understand vectors natively. You're either:
- Computing similarity in application code (fine for <100K vectors)
- Using sqlite-vss extension (adds complexity)
// This works, but it's not "native"
val similarities = rows.map { row ->
    row to cosineSimilarity(query, getEmbedding(row.embedding))
}
2. Memory Constraints
Brute-force search loads all vectors into memory. At 1M vectors with 1536 dimensions:
1,000,000 × 1,536 × 4 bytes = ~6 GB
That's too much for most laptops. You'll need sqlite-vss or a different approach.
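A common mitigation, mentioned under "quantization" below, is to shrink the vectors themselves. A sketch of symmetric int8 quantization, which trades a little precision for roughly a 4x memory reduction (the helper names QuantizedVec, quantize, and dequantize are mine):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Symmetric int8 quantization: one float scale per vector plus one
// byte per dimension, roughly a 4x memory reduction vs float32.
data class QuantizedVec(val scale: Float, val data: ByteArray)

fun quantize(v: FloatArray): QuantizedVec {
    val maxAbs = v.maxOf { abs(it) }.coerceAtLeast(1e-12f)
    val scale = maxAbs / 127f
    return QuantizedVec(scale, ByteArray(v.size) { i ->
        (v[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    })
}

fun dequantize(q: QuantizedVec): FloatArray =
    FloatArray(q.data.size) { i -> q.data[i] * q.scale }

fun main() {
    val v = floatArrayOf(0.5f, -1.0f, 0.25f)
    val restored = dequantize(quantize(v))
    // Values come back close to the originals, but not exact
    for (i in v.indices) check(abs(v[i] - restored[i]) < 0.01f)
    println("1 byte per dimension instead of 4, small reconstruction error")
}
```

At 1M vectors of 1536 dimensions this brings the scan working set from ~6 GB down to ~1.5 GB. Whether the accuracy loss is acceptable depends on your embeddings; measure recall before committing.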
3. No Built-in Sharding
SQLite is single-file by design. If you need to scale across machines, you're on your own:
SQLite:
┌─────────┐
│ One DB  │  ← Can't split across servers
└─────────┘

Distributed vector DB:
┌─────────┐  ┌─────────┐  ┌─────────┐
│ Shard 1 │  │ Shard 2 │  │ Shard 3 │  ← Built-in distribution
└─────────┘  └─────────┘  └─────────┘
4. Write Concurrency
SQLite uses file-level locking. Multiple processes writing simultaneously will block:
// Process A: Writing embeddings...
// Process B: Waiting for lock...
// Process C: Waiting for lock...
For single-user local agents, this is rarely a problem. For multi-user services, it's a showstopper.
5. No Managed Updates or Filtering Indexes
Dedicated vector databases offer features like:
- Automatic re-indexing when embeddings change
- Pre-filtering indexes for metadata queries
- Quantization for memory efficiency
With SQLite, you implement these yourself or go without.
When to Use SQLite for Embeddings
| Use Case | SQLite? | Why |
|---|---|---|
| Personal knowledge base | Yes | Thousands of docs, single user |
| Local coding assistant | Yes | Codebase fits in memory |
| Development prototype | Yes | Fast iteration, no setup |
| Chat memory for agent | Yes | Small dataset, needs persistence |
| Production RAG service | No | Need scale, concurrency, managed ops |
| Multi-tenant application | No | Need isolation, distribution |
| Real-time search at scale | No | Need millisecond latency at 10M+ vectors |
When NOT to Use SQLite
Don't use SQLite when:
- You have more than ~500K embeddings
- Multiple services need to write concurrently
- You need sub-10ms search latency at scale
- Your application is multi-tenant
- You're building a service that needs to scale horizontally
In these cases, use a proper vector database: Pinecone for managed simplicity, Weaviate for open-source flexibility, or pgvector if you're already on PostgreSQL.
Practical Implementation Tips
1. Index Your Metadata
CREATE INDEX idx_metadata_type ON embeddings(json_extract(metadata, '$.type'));
CREATE INDEX idx_created_at ON embeddings(created_at);
Pre-filter with SQL before computing similarities.
2. Batch Your Inserts
fun storeEmbeddingsBatch(
    conn: Connection,
    items: List<Triple<String, FloatArray, Map<String, Any>?>>
) {
    conn.prepareStatement(
        "INSERT INTO embeddings (content, embedding, metadata) VALUES (?, ?, ?)"
    ).use { stmt ->
        items.forEach { (content, embedding, metadata) ->
            val blob = ByteBuffer
                .allocate(embedding.size * 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .apply { embedding.forEach { putFloat(it) } }
                .array()
            stmt.setString(1, content)
            stmt.setBytes(2, blob)
            // As in storeEmbedding: Map<String, Any> has no kotlinx.serialization
            // serializer, so build the JsonObject by hand
            // (imports: kotlinx.serialization.json.JsonObject, JsonPrimitive)
            stmt.setString(3, metadata?.let { m ->
                JsonObject(m.mapValues { (_, v) ->
                    when (v) {
                        is Boolean -> JsonPrimitive(v)
                        is Number -> JsonPrimitive(v)
                        else -> JsonPrimitive(v.toString())
                    }
                }).toString()
            })
            stmt.addBatch()
        }
        stmt.executeBatch()
    }
}
3. Use WAL Mode for Better Concurrency
val conn = DriverManager.getConnection("jdbc:sqlite:knowledge.db").apply {
    createStatement().use { stmt ->
        stmt.execute("PRAGMA journal_mode=WAL")    // Write-ahead logging
        stmt.execute("PRAGMA synchronous=NORMAL")  // Faster writes, still safe
    }
}
4. Consider Dimensionality Reduction
If memory is tight, reduce embedding dimensions:
// Using the Smile library for dimensionality reduction.
// Note: Smile's PCA works on double[][] rather than FloatArray, and the
// exact method names vary between Smile versions; check yours.
import smile.projection.PCA

// Reduce 1536 → 512 dimensions (loses some accuracy)
val pca = PCA.fit(originalEmbeddings).getProjection(512)
val reducedEmbeddings = originalEmbeddings.map { pca.project(it) }
The Bottom Line
SQLite is a legitimate choice for embedding storage in local AI agents. It's not a hack or a compromise — it's the right tool for single-user, offline-capable, moderate-scale use cases.
The key insight: most local agents don't need distributed vector databases. They need simple, reliable, file-based storage that works without infrastructure. SQLite delivers exactly that.
Start with plain SQLite and brute-force search. If you outgrow it (you probably won't for local use), add sqlite-vss. Only reach for Pinecone or Weaviate when you're building a service, not a tool.
Resources
- SQLite Documentation
- sqlite-vss — Vector similarity search for SQLite
- OpenAI Embeddings Guide
- FAISS — The library behind sqlite-vss
- pgvector — PostgreSQL alternative if you need more power
- Chroma — Embedded vector database (SQLite under the hood!)