Do you really need a vector database? For local AI agents, SQLite handles embeddings just fine — and it's already on your machine.
This article explores a practical approach for local-first AI tools. For context on MCP transports and how agents communicate, see Understanding MCP Server Transports.
The Problem with "Production-Grade" Vector Databases
When building AI agents that use embeddings (for RAG, semantic search, or memory), the default advice is: "Use a vector database like Pinecone, Weaviate, or Chroma."
But for local agents — tools running on your machine, personal assistants, or development prototypes — this advice creates unnecessary complexity:
What you need:                 What you're told to set up:
┌────────────────────┐         ┌──────────────────────────┐
│ Store ~10K vectors │         │ Docker container         │
│ Query occasionally │    →    │ Separate database server │
│ Works offline      │         │ API keys and auth        │
│ Easy to backup     │         │ Cloud dependencies       │
└────────────────────┘         └──────────────────────────┘
There's a mismatch between the problem and the solution.
SQLite as an Embedding Store
SQLite is a serverless, file-based database that ships with most operating systems. It turns out it's perfectly capable of storing and querying embeddings for local use cases.
Basic Approach: Store Embeddings as BLOBs
The simplest approach stores embeddings as binary data:
CREATE TABLE embeddings (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    embedding BLOB NOT NULL,
    metadata JSON,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
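Before inserting anything you need a Connection and the table itself. A minimal setup sketch, assuming a JDBC SQLite driver such as org.xerial:sqlite-jdbc on the classpath (the helper name openEmbeddingStore is mine):

```kotlin
import java.sql.Connection
import java.sql.DriverManager

// Open (or create) the database file and ensure the table exists.
// Assumes a JDBC SQLite driver (e.g. org.xerial:sqlite-jdbc) on the classpath.
fun openEmbeddingStore(path: String): Connection {
    val conn = DriverManager.getConnection("jdbc:sqlite:$path")
    conn.createStatement().use { stmt ->
        stmt.execute(
            """
            CREATE TABLE IF NOT EXISTS embeddings (
                id INTEGER PRIMARY KEY,
                content TEXT NOT NULL,
                embedding BLOB NOT NULL,
                metadata JSON,
                created_at DATETIME DEFAULT CURRENT_TIMESTAMP
            )
            """.trimIndent()
        )
    }
    return conn
}
```

Pass ":memory:" instead of a file path when you want a throwaway database for tests.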
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.sql.Connection
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonPrimitive

fun storeEmbedding(
    conn: Connection,
    content: String,
    embedding: FloatArray,
    metadata: Map<String, Any>? = null
) {
    // Pack the floats as little-endian bytes: 4 bytes per dimension
    val embeddingBlob = ByteBuffer
        .allocate(embedding.size * 4)
        .order(ByteOrder.LITTLE_ENDIAN)
        .apply { embedding.forEach { putFloat(it) } }
        .array()
    conn.prepareStatement(
        "INSERT INTO embeddings (content, embedding, metadata) VALUES (?, ?, ?)"
    ).use { stmt ->
        stmt.setString(1, content)
        stmt.setBytes(2, embeddingBlob)
        // kotlinx.serialization has no serializer for Map<String, Any>,
        // so build the JSON by hand, preserving number/boolean types
        stmt.setString(3, metadata?.let { m ->
            JsonObject(m.mapValues { (_, v) ->
                when (v) {
                    is Boolean -> JsonPrimitive(v)
                    is Number -> JsonPrimitive(v)
                    else -> JsonPrimitive(v.toString())
                }
            }).toString()
        })
        stmt.executeUpdate()
    }
}

fun getEmbedding(blob: ByteArray): FloatArray {
    val buffer = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN)
    return FloatArray(blob.size / 4) { buffer.getFloat() }
}
Similarity Search: Brute Force Works
For local agents with thousands (not millions) of vectors, brute-force cosine similarity is fast enough:
data class SearchResult(
    val id: Int,
    val content: String,
    val metadata: String?,
    val similarity: Double
)

fun searchSimilar(
    conn: Connection,
    queryEmbedding: FloatArray,
    topK: Int = 5
): List<SearchResult> {
    val results = mutableListOf<SearchResult>()
    conn.prepareStatement(
        "SELECT id, content, embedding, metadata FROM embeddings"
    ).use { stmt ->
        stmt.executeQuery().use { rs ->
            while (rs.next()) {
                val storedVec = getEmbedding(rs.getBytes("embedding"))
                val similarity = cosineSimilarity(queryEmbedding, storedVec)
                results.add(
                    SearchResult(
                        id = rs.getInt("id"),
                        content = rs.getString("content"),
                        metadata = rs.getString("metadata"),
                        similarity = similarity
                    )
                )
            }
        }
    }
    return results.sortedByDescending { it.similarity }.take(topK)
}

fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Dimension mismatch: ${a.size} vs ${b.size}" }
    // Single pass over both arrays, no intermediate allocations
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += (a[i] * b[i]).toDouble()
        normA += (a[i] * a[i]).toDouble()
        normB += (b[i] * b[i]).toDouble()
    }
    return dot / (kotlin.math.sqrt(normA) * kotlin.math.sqrt(normB))
}
Performance Reality Check
How fast is brute-force search? Faster than you'd expect:
| Vectors | Dimensions | Search Time | Memory |
|---|---|---|---|
| 1,000 | 1536 | ~5ms | ~6 MB |
| 10,000 | 1536 | ~50ms | ~60 MB |
| 100,000 | 1536 | ~500ms | ~600 MB |
For a local agent querying a personal knowledge base, 50ms is imperceptible. You don't need approximate nearest neighbors (ANN) algorithms until you're well past 100K vectors.
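These numbers are easy to reproduce yourself. A rough, self-contained timing sketch (figures vary by machine and JVM warm-up; bruteForceScan is a name I've made up):

```kotlin
import kotlin.random.Random
import kotlin.system.measureTimeMillis

// Time a brute-force cosine-similarity scan over n random vectors
// of the given dimensionality; returns elapsed milliseconds.
fun bruteForceScan(n: Int, dims: Int): Long {
    val rng = Random(42)
    val vectors = Array(n) { FloatArray(dims) { rng.nextFloat() } }
    val query = FloatArray(dims) { rng.nextFloat() }
    var best = -1.0
    return measureTimeMillis {
        for (v in vectors) {
            var dot = 0.0
            var normV = 0.0
            var normQ = 0.0
            for (i in 0 until dims) {
                dot += (query[i] * v[i]).toDouble()
                normV += (v[i] * v[i]).toDouble()
                normQ += (query[i] * query[i]).toDouble()
            }
            val sim = dot / (kotlin.math.sqrt(normV) * kotlin.math.sqrt(normQ))
            if (sim > best) best = sim
        }
    }
}

fun main() {
    println("10K x 1536d scanned in ${bruteForceScan(10_000, 1536)}ms")
}
```

On a typical laptop this lands in the tens of milliseconds for 10K vectors, consistent with the table above.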
SQLite-VSS: When You Need More Speed
For larger datasets, sqlite-vss adds vector search capabilities to SQLite using Facebook's FAISS library:
-- Load the extension
.load ./vector0
.load ./vss0

-- Create a virtual table for vector search
CREATE VIRTUAL TABLE vss_embeddings USING vss0(
    embedding(1536)  -- OpenAI embedding dimensions
);

-- Insert vectors
INSERT INTO vss_embeddings (rowid, embedding)
SELECT id, embedding FROM embeddings;

-- Search with approximate nearest neighbors
SELECT rowid, distance
FROM vss_embeddings
WHERE vss_search(embedding, ?)
LIMIT 10;
This gives you sub-millisecond searches even at 1M+ vectors, while keeping everything in a single SQLite file.
Real-World Example: Local-First AI Tools
Many local-first AI tools use SQLite for codebase indexing and embeddings storage. The pattern is common — a single database file holds everything the agent needs:
┌───────────────────────────────────────────────────────┐
│                    Local AI Agent                     │
├───────────────────────────────────────────────────────┤
│                                                       │
│  ┌─────────────┐   ┌─────────────┐   ┌────────────┐   │
│  │ Embeddings  │   │    Chat     │   │   Config   │   │
│  │  (vectors)  │   │   History   │   │  & State   │   │
│  └──────┬──────┘   └──────┬──────┘   └─────┬──────┘   │
│         │                 │                │          │
│         └─────────────────┼────────────────┘          │
│                           │                           │
│                  ┌────────▼────────┐                  │
│                  │    SQLite DB    │                  │
│                  │  (single file)  │                  │
│                  └─────────────────┘                  │
│                           │                           │
│                   ~/.agent/data.db                    │
└───────────────────────────────────────────────────────┘
Everything lives in one file. Backup? Copy the file. Reset? Delete it. Move to another machine? Take it with you.
Advantages of SQLite for Embeddings
1. Zero Infrastructure
# Pinecone setup:
pip install pinecone-client
# Create account, get API key, configure environment...
# SQLite setup:
# Nothing. It's already there.
No servers, no Docker, no accounts, no API keys. Your agent works offline, on airplanes, and in restricted networks.
2. Single-File Simplicity
Your entire knowledge base is one .db file:
~/.my-agent/
├── config.json
└── knowledge.db ← Everything: embeddings, chat history, metadata
This makes debugging trivial. You can open the database with any SQLite client and inspect exactly what's stored.
3. Atomic Transactions
SQLite gives you ACID transactions for free:
fun updateDocument(
    conn: Connection,
    docId: Int,
    newContent: String,
    newEmbedding: FloatArray
) {
    conn.autoCommit = false
    try {
        // doc_id lives inside the metadata JSON, not in its own column
        conn.prepareStatement(
            "DELETE FROM embeddings WHERE json_extract(metadata, '$.doc_id') = ?"
        ).use { stmt ->
            stmt.setInt(1, docId)
            stmt.executeUpdate()
        }
        storeEmbedding(conn, newContent, newEmbedding, mapOf("doc_id" to docId))
        conn.commit()
    } catch (e: Exception) {
        conn.rollback()
        throw e
    } finally {
        conn.autoCommit = true
    }
    // Either both statements succeed or both roll back
}
No partial updates, no orphaned embeddings, no consistency issues.
4. Rich Querying
Combine vector search with SQL's full power. Pre-filter with SQL, then compute similarity in your application:
fun searchRecentNonArchived(
    conn: Connection,
    queryEmbedding: FloatArray,
    topK: Int = 10
): List<SearchResult> {
    val candidates = mutableListOf<SearchResult>()
    // SQL handles filtering — only fetch what matches
    conn.prepareStatement("""
        SELECT id, content, embedding, metadata
        FROM embeddings
        WHERE created_at > datetime('now', '-7 days')
          -- IS NOT TRUE is NULL-safe: rows with no metadata,
          -- or no 'archived' key, still pass the filter
          AND json_extract(metadata, '$.archived') IS NOT TRUE
    """).use { stmt ->
        stmt.executeQuery().use { rs ->
            while (rs.next()) {
                val storedVec = getEmbedding(rs.getBytes("embedding"))
                candidates.add(
                    SearchResult(
                        id = rs.getInt("id"),
                        content = rs.getString("content"),
                        metadata = rs.getString("metadata"),
                        similarity = cosineSimilarity(queryEmbedding, storedVec)
                    )
                )
            }
        }
    }
    return candidates.sortedByDescending { it.similarity }.take(topK)
}
Try doing that with a pure vector database: most offer only limited metadata filters, not arbitrary SQL, ahead of the vector search.
5. Portable and Debuggable
# Inspect your embeddings
sqlite3 knowledge.db "SELECT COUNT(*) FROM embeddings"
# Backup before experimenting
cp knowledge.db knowledge.db.backup
# Share with a colleague
scp knowledge.db user@other-machine:~/
Disadvantages and Limitations
1. No Native Vector Operations
SQLite doesn't understand vectors natively. You're either:
- Computing similarity in application code (fine for <100K vectors)
- Using sqlite-vss extension (adds complexity)
// This works, but it's not "native"
val similarities = rows.map { row ->
    row to cosineSimilarity(query, getEmbedding(row.embedding))
}
2. Memory Constraints
Brute-force search loads all vectors into memory. At 1M vectors with 1536 dimensions:
1,000,000 × 1,536 × 4 bytes = ~6 GB
That's too much for most laptops. You'll need sqlite-vss or a different approach.
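A common mitigation, mentioned under "quantization" below, is to shrink the vectors themselves. A sketch of symmetric int8 quantization, which trades a little precision for roughly a 4x memory reduction (the helper names QuantizedVec, quantize, and dequantize are mine):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Symmetric int8 quantization: one float scale per vector plus one
// byte per dimension, roughly a 4x memory reduction vs float32.
data class QuantizedVec(val scale: Float, val data: ByteArray)

fun quantize(v: FloatArray): QuantizedVec {
    val maxAbs = v.maxOf { abs(it) }.coerceAtLeast(1e-12f)
    val scale = maxAbs / 127f
    return QuantizedVec(scale, ByteArray(v.size) { i ->
        (v[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    })
}

fun dequantize(q: QuantizedVec): FloatArray =
    FloatArray(q.data.size) { i -> q.data[i] * q.scale }

fun main() {
    val v = floatArrayOf(0.5f, -1.0f, 0.25f)
    val restored = dequantize(quantize(v))
    // Values come back close to the originals, but not exact
    for (i in v.indices) check(abs(v[i] - restored[i]) < 0.01f)
    println("1 byte per dimension instead of 4, small reconstruction error")
}
```

At 1M vectors of 1536 dimensions this brings the scan working set from ~6 GB down to ~1.5 GB. Whether the accuracy loss is acceptable depends on your embeddings; measure recall before committing.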
3. No Built-in Sharding
SQLite is single-file by design. If you need to scale across machines, you're on your own:
SQLite:
┌─────────┐
│ One DB  │  ← Can't split across servers
└─────────┘

Distributed vector DB:
┌─────────┐  ┌─────────┐  ┌─────────┐
│ Shard 1 │  │ Shard 2 │  │ Shard 3 │  ← Built-in distribution
└─────────┘  └─────────┘  └─────────┘
4. Write Concurrency
SQLite uses file-level locking. Multiple processes writing simultaneously will block:
// Process A: Writing embeddings...
// Process B: Waiting for lock...
// Process C: Waiting for lock...
For single-user local agents, this is rarely a problem. For multi-user services, it's a showstopper.
5. No Managed Updates or Filtering Indexes
Dedicated vector databases offer features like:
- Automatic re-indexing when embeddings change
- Pre-filtering indexes for metadata queries
- Quantization for memory efficiency
With SQLite, you implement these yourself or go without.
When to Use SQLite for Embeddings
| Use Case | SQLite? | Why |
|---|---|---|
| Personal knowledge base | Yes | Thousands of docs, single user |
| Local coding assistant | Yes | Codebase fits in memory |
| Development prototype | Yes | Fast iteration, no setup |
| Chat memory for agent | Yes | Small dataset, needs persistence |
| Production RAG service | No | Need scale, concurrency, managed ops |
| Multi-tenant application | No | Need isolation, distribution |
| Real-time search at scale | No | Need millisecond latency at 10M+ vectors |
When NOT to Use SQLite
Don't use SQLite when:
- You have more than ~500K embeddings
- Multiple services need to write concurrently
- You need sub-10ms search latency at scale
- Your application is multi-tenant
- You're building a service that needs to scale horizontally
In these cases, use a proper vector database: Pinecone for managed simplicity, Weaviate for open-source flexibility, or pgvector if you're already on PostgreSQL.
Practical Implementation Tips
1. Index Your Metadata
CREATE INDEX idx_metadata_type ON embeddings(json_extract(metadata, '$.type'));
CREATE INDEX idx_created_at ON embeddings(created_at);
Pre-filter with SQL before computing similarities.
2. Batch Your Inserts
fun storeEmbeddingsBatch(
    conn: Connection,
    items: List<Triple<String, FloatArray, Map<String, Any>?>>
) {
    conn.prepareStatement(
        "INSERT INTO embeddings (content, embedding, metadata) VALUES (?, ?, ?)"
    ).use { stmt ->
        items.forEach { (content, embedding, metadata) ->
            val blob = ByteBuffer
                .allocate(embedding.size * 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .apply { embedding.forEach { putFloat(it) } }
                .array()
            stmt.setString(1, content)
            stmt.setBytes(2, blob)
            // As in storeEmbedding: Map<String, Any> has no kotlinx.serialization
            // serializer, so build the JsonObject by hand
            // (imports: kotlinx.serialization.json.JsonObject, JsonPrimitive)
            stmt.setString(3, metadata?.let { m ->
                JsonObject(m.mapValues { (_, v) ->
                    when (v) {
                        is Boolean -> JsonPrimitive(v)
                        is Number -> JsonPrimitive(v)
                        else -> JsonPrimitive(v.toString())
                    }
                }).toString()
            })
            stmt.addBatch()
        }
        stmt.executeBatch()
    }
}
3. Use WAL Mode for Better Concurrency
val conn = DriverManager.getConnection("jdbc:sqlite:knowledge.db").apply {
    createStatement().use { stmt ->
        stmt.execute("PRAGMA journal_mode=WAL")    // Write-ahead logging
        stmt.execute("PRAGMA synchronous=NORMAL")  // Faster writes, still safe
    }
}
4. Consider Dimensionality Reduction
If memory is tight, reduce embedding dimensions:
// Using the Smile library for dimensionality reduction.
// Note: Smile's PCA works on double[][] rather than FloatArray, and the
// exact method names vary between Smile versions; check yours.
import smile.projection.PCA

// Reduce 1536 → 512 dimensions (loses some accuracy)
val pca = PCA.fit(originalEmbeddings).getProjection(512)
val reducedEmbeddings = originalEmbeddings.map { pca.project(it) }
The Bottom Line
SQLite is a legitimate choice for embedding storage in local AI agents. It's not a hack or a compromise — it's the right tool for single-user, offline-capable, moderate-scale use cases.
The key insight: most local agents don't need distributed vector databases. They need simple, reliable, file-based storage that works without infrastructure. SQLite delivers exactly that.
Start with plain SQLite and brute-force search. If you outgrow it (you probably won't for local use), add sqlite-vss. Only reach for Pinecone or Weaviate when you're building a service, not a tool.
Resources
- SQLite Documentation
- sqlite-vss — Vector similarity search for SQLite
- OpenAI Embeddings Guide
- FAISS — The library behind sqlite-vss
- pgvector — PostgreSQL alternative if you need more power
- Chroma — Embedded vector database (SQLite under the hood!)