DEV Community

Zrcic

SQLite as a Vector Database — Yes, Really

Do you really need a vector database? For local AI agents, SQLite handles embeddings just fine — and it's already on your machine.

This article explores a practical approach for local-first AI tools. For context on MCP transports and how agents communicate, see Understanding MCP Server Transports.

The Problem with "Production-Grade" Vector Databases

When building AI agents that use embeddings (for RAG, semantic search, or memory), the default advice is: "Use a vector database like Pinecone, Weaviate, or Chroma."

But for local agents — tools running on your machine, personal assistants, or development prototypes — this advice creates unnecessary complexity:

What you need:                    What you're told to set up:
┌─────────────────────┐          ┌─────────────────────────────┐
│ Store ~10K vectors  │          │ Docker container            │
│ Query occasionally  │    →     │ Separate database server    │
│ Works offline       │          │ API keys and auth           │
│ Easy to backup      │          │ Cloud dependencies          │
└─────────────────────┘          └─────────────────────────────┘

There's a mismatch between the problem and the solution.

SQLite as an Embedding Store

SQLite is a serverless, file-based database that ships with most operating systems. It turns out it's perfectly capable of storing and querying embeddings for local use cases.

Basic Approach: Store Embeddings as BLOBs

The simplest approach stores embeddings as binary data:

CREATE TABLE embeddings (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    embedding BLOB NOT NULL,   -- raw little-endian float32 bytes
    metadata TEXT,             -- JSON string, queryable via json_extract()
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.sql.Connection
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonPrimitive

fun storeEmbedding(
    conn: Connection,
    content: String,
    embedding: FloatArray,
    metadata: Map<String, Any>? = null
) {
    // Pack floats into a little-endian byte array (4 bytes per float)
    val embeddingBlob = ByteBuffer
        .allocate(embedding.size * 4)
        .order(ByteOrder.LITTLE_ENDIAN)
        .apply { embedding.forEach { putFloat(it) } }
        .array()

    // kotlinx.serialization can't serialize Map<String, Any> directly
    // (Any has no serializer), so convert values to JSON primitives by type
    val metadataJson = metadata?.let { m ->
        JsonObject(m.mapValues { (_, v) ->
            when (v) {
                is Number -> JsonPrimitive(v)
                is Boolean -> JsonPrimitive(v)
                else -> JsonPrimitive(v.toString())
            }
        }).toString()
    }

    conn.prepareStatement(
        "INSERT INTO embeddings (content, embedding, metadata) VALUES (?, ?, ?)"
    ).use { stmt ->
        stmt.setString(1, content)
        stmt.setBytes(2, embeddingBlob)
        stmt.setString(3, metadataJson)
        stmt.executeUpdate()
    }
}

fun getEmbedding(blob: ByteArray): FloatArray {
    val buffer = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN)
    return FloatArray(blob.size / 4) { buffer.getFloat() }
}
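The blob format can be verified without a database. Here is a minimal round-trip sketch; `encodeEmbedding` is my name for the inline packing logic above, not a function from the listing:

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Mirror of the storage logic: floats → little-endian bytes → floats
fun encodeEmbedding(embedding: FloatArray): ByteArray = ByteBuffer
    .allocate(embedding.size * 4)
    .order(ByteOrder.LITTLE_ENDIAN)
    .apply { embedding.forEach { putFloat(it) } }
    .array()

fun decodeEmbedding(blob: ByteArray): FloatArray {
    val buffer = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN)
    return FloatArray(blob.size / 4) { buffer.getFloat() }
}

fun main() {
    val original = floatArrayOf(0.1f, -0.5f, 2.25f, 0f)
    val blob = encodeEmbedding(original)
    val restored = decodeEmbedding(blob)

    check(blob.size == original.size * 4)    // 4 bytes per float
    check(restored.contentEquals(original))  // lossless round trip
    println("Round trip OK: ${restored.toList()}")
}
```

Float-to-byte packing is lossless, so the decoded array is bit-identical to what was stored.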

Similarity Search: Brute Force Works

For local agents with thousands (not millions) of vectors, brute-force cosine similarity is fast enough:

data class SearchResult(
    val id: Int,
    val content: String,
    val metadata: String?,
    val similarity: Double
)

fun searchSimilar(
    conn: Connection,
    queryEmbedding: FloatArray,
    topK: Int = 5
): List<SearchResult> {
    val results = mutableListOf<SearchResult>()

    conn.prepareStatement(
        "SELECT id, content, embedding, metadata FROM embeddings"
    ).use { stmt ->
        stmt.executeQuery().use { rs ->
            while (rs.next()) {
                val storedVec = getEmbedding(rs.getBytes("embedding"))
                val similarity = cosineSimilarity(queryEmbedding, storedVec)
                results.add(
                    SearchResult(
                        id = rs.getInt("id"),
                        content = rs.getString("content"),
                        metadata = rs.getString("metadata"),
                        similarity = similarity
                    )
                )
            }
        }
    }

    return results.sortedByDescending { it.similarity }.take(topK)
}

fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Dimension mismatch: ${a.size} vs ${b.size}" }
    // Single pass, no intermediate lists
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (kotlin.math.sqrt(normA) * kotlin.math.sqrt(normB))
}
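It's worth sanity-checking the similarity function against vectors whose geometry is known in advance. A self-contained sketch (with its own copy of the function so it runs standalone):

```kotlin
import kotlin.math.sqrt

fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0; var normA = 0.0; var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

fun main() {
    val x = floatArrayOf(1f, 0f)
    check(cosineSimilarity(x, x) == 1.0)                      // same direction → 1
    check(cosineSimilarity(x, floatArrayOf(0f, 1f)) == 0.0)   // orthogonal → 0
    check(cosineSimilarity(x, floatArrayOf(-1f, 0f)) == -1.0) // opposite → -1
    println("cosine sanity checks passed")
}
```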

Performance Reality Check

How fast is brute-force search? Faster than you'd expect:

Vectors    Dimensions    Search Time    Memory
1,000      1536          ~5 ms          ~6 MB
10,000     1536          ~50 ms         ~60 MB
100,000    1536          ~500 ms        ~600 MB

For a local agent querying a personal knowledge base, 50ms is imperceptible. You don't need approximate nearest neighbors (ANN) algorithms until you're well past 100K vectors.
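These numbers are easy to reproduce with a self-contained benchmark. The figures are machine-dependent and JVM warm-up matters, so treat a single run as a rough signal, not a measurement:

```kotlin
import kotlin.math.sqrt
import kotlin.random.Random
import kotlin.system.measureTimeMillis

fun cosine(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0; var na = 0.0; var nb = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb))
}

fun main() {
    val dims = 1536
    // Synthetic corpus; a real agent would load these from the embeddings table
    val corpus = List(10_000) { FloatArray(dims) { Random.nextFloat() - 0.5f } }
    val query = FloatArray(dims) { Random.nextFloat() - 0.5f }

    var top: List<Double> = emptyList()
    val ms = measureTimeMillis {
        top = corpus.map { cosine(query, it) }.sortedDescending().take(5)
    }
    check(top.size == 5)
    println("Brute-force scan of ${corpus.size} vectors took ${ms}ms")
}
```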

SQLite-VSS: When You Need More Speed

For larger datasets, sqlite-vss adds vector search capabilities to SQLite using Facebook's FAISS library:

-- Load the extension
.load ./vector0
.load ./vss0

-- Create a virtual table for vector search
CREATE VIRTUAL TABLE vss_embeddings USING vss0(
    embedding(1536)  -- OpenAI embedding dimensions
);

-- Insert vectors
INSERT INTO vss_embeddings (rowid, embedding)
SELECT id, embedding FROM embeddings;

-- Search with approximate nearest neighbors
SELECT rowid, distance
FROM vss_embeddings
WHERE vss_search(embedding, ?)
LIMIT 10;

This gives you fast approximate-nearest-neighbor searches even at 1M+ vectors, while keeping everything in a single SQLite file. (Note: sqlite-vss's author has since shifted development to a successor, sqlite-vec, which drops the FAISS dependency; it's worth evaluating for new projects.)

Real-World Example: Local-First AI Tools

Many local-first AI tools use SQLite for codebase indexing and embeddings storage. The pattern is common — a single database file holds everything the agent needs:

┌───────────────────────────────────────────────────────┐
│                    Local AI Agent                     │
├───────────────────────────────────────────────────────┤
│                                                       │
│  ┌─────────────┐   ┌─────────────┐   ┌────────────┐  │
│  │  Embeddings │   │    Chat     │   │   Config   │  │
│  │  (vectors)  │   │   History   │   │  & State   │  │
│  └──────┬──────┘   └──────┬──────┘   └─────┬──────┘  │
│         │                 │                │         │
│         └─────────────────┼────────────────┘         │
│                           │                          │
│                  ┌────────▼────────┐                 │
│                  │    SQLite DB    │                 │
│                  │  (single file)  │                 │
│                  └─────────────────┘                 │
│                           │                          │
│                  ~/.agent/data.db                    │
└───────────────────────────────────────────────────────┘

Everything lives in one file. Backup? Copy the file. Reset? Delete it. Move to another machine? Take it with you.

Advantages of SQLite for Embeddings

1. Zero Infrastructure

# Pinecone setup:
pip install pinecone-client
# Create account, get API key, configure environment...

# SQLite setup:
# Nothing. It's already there.

No servers, no Docker, no accounts, no API keys. Your agent works offline, on airplanes, and in restricted networks.

2. Single-File Simplicity

Your entire knowledge base is one .db file:

~/.my-agent/
├── config.json
└── knowledge.db    ← Everything: embeddings, chat history, metadata

This makes debugging trivial. You can open the database with any SQLite client and inspect exactly what's stored.

3. Atomic Transactions

SQLite gives you ACID transactions for free:

fun updateDocument(
    conn: Connection,
    docId: Int,
    newContent: String,
    newEmbedding: FloatArray
) {
    conn.autoCommit = false
    try {
        // doc_id lives inside the metadata JSON, not in its own column
        conn.prepareStatement(
            "DELETE FROM embeddings WHERE json_extract(metadata, '$.doc_id') = ?"
        ).use { stmt ->
            stmt.setInt(1, docId)
            stmt.executeUpdate()
        }
        storeEmbedding(conn, newContent, newEmbedding, mapOf("doc_id" to docId))
        conn.commit()  // either both statements succeed or both roll back
    } catch (e: Exception) {
        conn.rollback()
        throw e
    } finally {
        conn.autoCommit = true
    }
}

No partial updates, no orphaned embeddings, no consistency issues.

4. Rich Querying

Combine vector search with SQL's full power. Pre-filter with SQL, then compute similarity in your application:

fun searchRecentNonArchived(
    conn: Connection,
    queryEmbedding: FloatArray,
    topK: Int = 10
): List<SearchResult> {
    val candidates = mutableListOf<SearchResult>()

    // SQL handles filtering — only fetch what matches.
    // COALESCE treats rows with no metadata as "not archived"; a bare
    // `!= true` would silently drop them, because NULL comparisons
    // evaluate to NULL. (json_extract returns 1 for JSON true.)
    conn.prepareStatement("""
        SELECT id, content, embedding, metadata
        FROM embeddings
        WHERE created_at > datetime('now', '-7 days')
          AND COALESCE(json_extract(metadata, '$.archived'), 0) != 1
    """).use { stmt ->
        stmt.executeQuery().use { rs ->
            while (rs.next()) {
                val storedVec = getEmbedding(rs.getBytes("embedding"))
                candidates.add(
                    SearchResult(
                        id = rs.getInt("id"),
                        content = rs.getString("content"),
                        metadata = rs.getString("metadata"),
                        similarity = cosineSimilarity(queryEmbedding, storedVec)
                    )
                )
            }
        }
    }

    return candidates.sortedByDescending { it.similarity }.take(topK)
}

Try doing that with a pure vector database. Most offer metadata filters, but few let you express them in full SQL, with date arithmetic, JSON extraction, and joins.

5. Portable and Debuggable

# Inspect your embeddings
sqlite3 knowledge.db "SELECT COUNT(*) FROM embeddings"

# Backup before experimenting
cp knowledge.db knowledge.db.backup

# Share with a colleague
scp knowledge.db user@other-machine:~/

Disadvantages and Limitations

1. No Native Vector Operations

SQLite doesn't understand vectors natively. You're either:

  • Computing similarity in application code (fine for <100K vectors)
  • Using sqlite-vss extension (adds complexity)
// This works, but it's not "native"
val similarities = rows.map { row ->
    row to cosineSimilarity(query, getEmbedding(row.embedding))
}

2. Memory Constraints

Brute-force search loads all vectors into memory. At 1M vectors with 1536 dimensions:

1,000,000 × 1,536 × 4 bytes = ~6 GB

That's too much for most laptops. You'll need sqlite-vss or a different approach.
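The arithmetic generalizes, so it's worth budgeting memory before committing to brute force. A one-liner sketch (the helper name is mine):

```kotlin
// float32 storage: count × dimensions × 4 bytes
fun embeddingBytes(count: Long, dims: Int): Long = count * dims * 4L

fun main() {
    for (count in listOf(1_000L, 10_000L, 100_000L, 1_000_000L)) {
        val gb = embeddingBytes(count, 1536) / 1e9
        println("%,d vectors @ 1536 dims -> %.2f GB".format(count, gb))
    }
}
```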

3. No Built-in Sharding

SQLite is single-file by design. If you need to scale across machines, you're on your own:

SQLite:
┌─────────┐
│ One DB  │  ← Can't split across servers
└─────────┘

Distributed vector DB:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │  ← Built-in distribution
└─────────┘ └─────────┘ └─────────┘

4. Write Concurrency

SQLite uses file-level locking. Multiple processes writing simultaneously will block:

// Process A: Writing embeddings...
// Process B: Waiting for lock...
// Process C: Waiting for lock...

For single-user local agents, this is rarely a problem. For multi-user services, it's a showstopper.
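If you do occasionally hit contention locally, SQLite's busy_timeout pragma makes a blocked writer wait and retry instead of failing immediately:

```sql
-- Wait up to 5 seconds for a lock before returning SQLITE_BUSY
PRAGMA busy_timeout = 5000;
```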

5. No Managed Updates or Filtering Indexes

Dedicated vector databases offer features like:

  • Automatic re-indexing when embeddings change
  • Pre-filtering indexes for metadata queries
  • Quantization for memory efficiency

With SQLite, you implement these yourself or go without.
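As a taste of what "implement it yourself" looks like, here is a crude scalar quantization sketch: float32 down to int8 for a 4x memory saving, at the cost of precision. This is my illustration, not a standard library API:

```kotlin
import kotlin.math.abs

// Scale each vector so its largest absolute value maps to 127,
// then store one byte per component plus the scale for dequantizing.
data class Quantized(val scale: Float, val data: ByteArray)

fun quantize(v: FloatArray): Quantized {
    val maxAbs = v.maxOf { abs(it) }.takeIf { it > 0f } ?: 1f
    val bytes = ByteArray(v.size) { i -> (v[i] / maxAbs * 127).toInt().toByte() }
    return Quantized(maxAbs / 127f, bytes)
}

fun dequantize(q: Quantized): FloatArray =
    FloatArray(q.data.size) { i -> q.data[i] * q.scale }

fun main() {
    val original = floatArrayOf(0.8f, -0.4f, 0.05f, 0f)
    val restored = dequantize(quantize(original))
    // Reconstruction is approximate; error bounded by one quantization
    // step (maxAbs / 127, here 0.8 / 127)
    original.zip(restored.toList()).forEach { (a, b) ->
        check(abs(a - b) <= 0.8f / 127f + 1e-6f)
    }
    println("max error = ${original.indices.maxOf { abs(original[it] - restored[it]) }}")
}
```

Real systems use smarter schemes (product quantization, per-block scales), but the trade-off is the same: less memory, slightly noisier similarities.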

When to Use SQLite for Embeddings

Use Case                    SQLite?   Why
Personal knowledge base     Yes       Thousands of docs, single user
Local coding assistant      Yes       Codebase fits in memory
Development prototype       Yes       Fast iteration, no setup
Chat memory for agent       Yes       Small dataset, needs persistence
Production RAG service      No        Needs scale, concurrency, managed ops
Multi-tenant application    No        Needs isolation, distribution
Real-time search at scale   No        Needs millisecond latency at 10M+ vectors

When NOT to Use SQLite

Don't use SQLite when:

  • You have more than ~500K embeddings
  • Multiple services need to write concurrently
  • You need sub-10ms search latency at scale
  • Your application is multi-tenant
  • You're building a service that needs to scale horizontally

In these cases, use a proper vector database: Pinecone for managed simplicity, Weaviate for open-source flexibility, or pgvector if you're already on PostgreSQL.

Practical Implementation Tips

1. Index Your Metadata

CREATE INDEX idx_metadata_type ON embeddings(json_extract(metadata, '$.type'));
CREATE INDEX idx_created_at ON embeddings(created_at);

Pre-filter with SQL before computing similarities.

2. Batch Your Inserts

fun storeEmbeddingsBatch(
    conn: Connection,
    items: List<Triple<String, FloatArray, Map<String, Any>?>>
) {
    conn.prepareStatement(
        "INSERT INTO embeddings (content, embedding, metadata) VALUES (?, ?, ?)"
    ).use { stmt ->
        items.forEach { (content, embedding, metadata) ->
            val blob = ByteBuffer
                .allocate(embedding.size * 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .apply { embedding.forEach { putFloat(it) } }
                .array()
            stmt.setString(1, content)
            stmt.setBytes(2, blob)
            stmt.setString(3, metadata?.let { Json.encodeToString(it) })
            stmt.addBatch()
        }
        stmt.executeBatch()
    }
}

3. Use WAL Mode for Better Concurrency

val conn = DriverManager.getConnection("jdbc:sqlite:knowledge.db").apply {
    createStatement().use { stmt ->
        stmt.execute("PRAGMA journal_mode=WAL")    // Write-ahead logging
        stmt.execute("PRAGMA synchronous=NORMAL")  // Faster writes, still safe
    }
}

4. Consider Dimensionality Reduction

If memory is tight, reduce embedding dimensions:

// Using the Smile library for dimensionality reduction
// (API as in Smile 2.x; originalEmbeddings is an Array<DoubleArray> here)
import smile.projection.PCA

// Reduce 1536 → 512 dimensions (trades some recall for ~3x less memory)
val pca = PCA.fit(originalEmbeddings).getProjection(512)
val reducedEmbeddings = originalEmbeddings.map { pca.project(it) }

The Bottom Line

SQLite is a legitimate choice for embedding storage in local AI agents. It's not a hack or a compromise — it's the right tool for single-user, offline-capable, moderate-scale use cases.

The key insight: most local agents don't need distributed vector databases. They need simple, reliable, file-based storage that works without infrastructure. SQLite delivers exactly that.

Start with plain SQLite and brute-force search. If you outgrow it (you probably won't for local use), add sqlite-vss. Only reach for Pinecone or Weaviate when you're building a service, not a tool.

