Introduction
In this post, we'll break down four vector storage patterns supported by LM-Kit.
LM-Kit reduces the complexity of embedding storage with a unified, developer-friendly interface that supports both instant prototyping and scalable deployment.
Each of the four patterns is tailored to a different stage of your project:
- In-Memory: Ideal for fast prototyping and low-volume tasks with zero setup.
- Built-In Vector DB: Self-contained file-based storage for local tools or offline apps.
- Qdrant Vector Store: External high-performance DB for cloud or large-scale deployments.
- Custom IVectorStore: Build your own backend to integrate with proprietary systems.
All methods use the same DataSource abstraction, so you can switch storage backends without changing your code logic.
Modern AI apps, from semantic search to retrieval-augmented generation, rely on embeddings to turn text or images into dense vectors. LM-Kit gives you a few ways to handle those vectors: you can compute them in memory when needed or store them for the long haul. Each approach has tradeoffs in speed, complexity, and cost.
If you're new to embeddings, check out our Embeddings Glossary. Or better yet, talk to LM-Kit Maestro, our free chatbot generator, available on GitHub at LM-Kit/Maestro.
The Four Patterns
LM-Kit supports four main embedding storage patterns, ranging from ephemeral in-memory use to persistent vector databases, so you can match your infrastructure to the needs of your app.
- On-the-Fly (In-Memory) Embeddings
- Persistent Storage with an External Vector Database
  - Prebuilt Qdrant Vector Store
  - Custom Vector Store via IVectorStore
- Persistent Storage with LM-Kit's Built-In Vector DB
LM-Kit uses a DataSource class to manage all of this. It's your main tool for embedding storage, representing a collection that can handle anything from text and documents to images or web pages. A DataSource contains multiple Section elements, and each section holds TextPartition¹ objects (basically embedding vectors). Metadata can be attached to both DataSource and Section structures.
- DataSource (can include metadata)
  - → Sections (can include metadata)
    - → TextPartitions (store dense vectors)
¹ The "Text" in TextPartition is a bit misleading now. These partitions can handle images too.
On-the-Fly Embeddings (In-Memory)
How it works
When you need an embedding, LM-Kit calls the model, gets the vector, and keeps it in RAM for immediate use. Nothing is written to disk. No external service needed.
Code Example
Create an in-memory vector database and perform retrieval:
// Define some strings from which we want to generate embeddings
string[] examples =
{
    "How do I bake a chocolate cake?",
    "What is the recipe for chocolate cake?",
    "I want to make a chocolate cake.",
    "Chocolate cake is delicious.",
    "How do I cook pasta?",
    "I need instructions to bake a cake.",
    "Baking requires precise measurements.",
    "I like vanilla ice cream.",
    "The weather is sunny today.",
    "What is the capital of France?",
    "Paris is a beautiful city.",
    "How can I improve my coding skills?",
    "Programming requires practice."
};
// Load the embedding model
var model = LM.LoadFromModelID("nomic-embed-text");
// Specify optional metadata to attach to the new collection (a.k.a. DataSource)
var collectionMetadata = new MetadataCollection
{
    { "description", "my description" },
    { "another-pair-key", "another-pair-value" }
};
// Create a new in-memory vector database
var collection = DataSource.CreateInMemoryDataSource(
    "my-collection",
    model,
    collectionMetadata);
// Compute embeddings to insert into the collection
var embedder = new Embedder(model);
List<DataSource.VectorEntry> vectorEntries = new List<DataSource.VectorEntry>();
// Run multithreaded embedding on the list of examples
var embeddings = embedder.GetEmbeddings(examples);
for (int index = 0; index < examples.Length; index++)
{
    vectorEntries.Add(new DataSource.VectorEntry(
        vector: embeddings[index],
        payload: examples[index]));
}
const string SectionIdentifier = "my-section-identifier";
// Specify optional metadata to attach to the new section.
// Note: a single collection can contain multiple sections.
var sectionMetadata = new MetadataCollection
{
    { "description", "my description" },
    { "another-pair-key", "another-pair-value" }
};
// Add the computed embedding vectors to the collection
collection.Upsert(
    SectionIdentifier,
    vectorEntries,
    sectionMetadata);
// Now perform search
// Build the query vector, reusing the embedder created above
string query = "How do I bake a chocolate cake?";
var queryVector = embedder.GetEmbeddings(query);
// Search for similar vectors across partitions
var similarPartitions = VectorSearch.FindMatchingPartitions(
    [collection],
    model,
    queryVector);
// Do something with the search results...
This example demonstrated how to set up an in-memory vector store using DataSource.CreateInMemoryDataSource, generate embeddings with a model, and organize them into a section with optional metadata. It finishes by performing a semantic search using a query vector. Everything runs locally in RAM, making it ideal for fast prototyping, real-time classification, or experimentation without persistent storage.
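To build intuition for what a similarity search over an in-memory collection does conceptually, here is a minimal brute-force sketch in plain C#. It does not use LM-Kit, and it is not a description of how VectorSearch.FindMatchingPartitions is implemented. The helper names CosineSimilarity and TopK and the toy 3-dimensional vectors are assumptions made purely for illustration; real embeddings have hundreds of dimensions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two vectors of equal length.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // Treat zero vectors as dissimilar to everything.
    if (normA == 0 || normB == 0) return 0;
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Score every stored vector against the query and keep the k best matches.
List<(string Payload, double Score)> TopK(
    IReadOnlyList<(string Payload, float[] Vector)> entries, float[] query, int k)
{
    return entries
        .Select(e => (e.Payload, Score: CosineSimilarity(e.Vector, query)))
        .OrderByDescending(r => r.Score)
        .Take(k)
        .ToList();
}

// Hypothetical toy data standing in for real embeddings.
var corpus = new List<(string Payload, float[] Vector)>
{
    ("chocolate cake recipe", new float[] { 0.9f, 0.1f, 0.0f }),
    ("pasta instructions",    new float[] { 0.6f, 0.6f, 0.1f }),
    ("weather report",        new float[] { 0.0f, 0.1f, 0.9f }),
};
float[] queryVector = { 1.0f, 0.0f, 0.0f };

var results = TopK(corpus, queryVector, 2);
foreach (var (payload, score) in results)
    Console.WriteLine($"{score:F3}  {payload}");
```

A linear scan like this is perfectly adequate for the small collections that in-memory storage targets; once a collection grows, an indexed store becomes the better fit.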
In-Memory Embeddings: When and Why to Use Them
Aspect | Details |
---|---|
Best When | • Rapid prototyping or experimentation • One-shot queries (no need to reuse later) • Small datasets that fit in memory • Real-time classification/semantic search |
Upsides | • Zero infrastructure or setup • Instant start • No file I/O overhead |
Downsides | • Lost on restart • Not suitable for large collections • No sharing across processes |
Serialization Note
Although referred to as "in-memory", any DataSource instance can be serialized to a file or stream using the Serialize() method. It can then be fully restored into memory with the Deserialize() method.
This provides flexibility to:
- Save your in-memory collections between sessions
- Store intermediate states during experimentation
- Transfer embeddings across environments without requiring an external vector database
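The exact signatures of Serialize() and Deserialize() and LM-Kit's on-disk format are not shown in this post, so the sketch below does not call them. Instead, it illustrates the general idea of round-tripping payloads and vectors through a stream using only the .NET base class library. The helpers WriteEntries and ReadEntries and the binary layout are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Write a count, then each payload followed by its vector dimension and values.
void WriteEntries(Stream stream, IReadOnlyList<(string Payload, float[] Vector)> entries)
{
    // leaveOpen: true so the caller keeps ownership of the stream.
    using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
    writer.Write(entries.Count);
    foreach (var (payload, vector) in entries)
    {
        writer.Write(payload);        // length-prefixed string
        writer.Write(vector.Length);  // vector dimension
        foreach (float value in vector)
            writer.Write(value);
    }
}

// Read the entries back in the same order they were written.
List<(string Payload, float[] Vector)> ReadEntries(Stream stream)
{
    using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
    int count = reader.ReadInt32();
    var entries = new List<(string Payload, float[] Vector)>(count);
    for (int i = 0; i < count; i++)
    {
        string payload = reader.ReadString();
        var vector = new float[reader.ReadInt32()];
        for (int j = 0; j < vector.Length; j++)
            vector[j] = reader.ReadSingle();
        entries.Add((payload, vector));
    }
    return entries;
}

// Hypothetical toy data standing in for real embeddings.
var original = new List<(string Payload, float[] Vector)>
{
    ("How do I bake a chocolate cake?", new float[] { 0.12f, -0.53f, 0.88f }),
    ("How do I cook pasta?",            new float[] { 0.40f, 0.07f, -0.31f }),
};

using var buffer = new MemoryStream();
WriteEntries(buffer, original);
buffer.Position = 0; // rewind before reading back
var restored = ReadEntries(buffer);

bool identical = original.Count == restored.Count
    && original.Zip(restored, (a, b) => a.Payload == b.Payload && a.Vector.SequenceEqual(b.Vector))
               .All(same => same);
Console.WriteLine($"Round trip preserved {restored.Count} entries: {identical}");
```

Swapping the MemoryStream for a FileStream would give the "save between sessions" behavior described above, which is the same trade LM-Kit's own serialization offers without any external database.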
Persistent Storage with an External Vector DB
For production use or anything large-scale, you'll want to persist your vectors in a proper database. LM-Kit supports this through the IVectorStore abstraction.
Qdrant Vector Store (Prebuilt)
LM-Kit offers an out-of-the-box integration with Qdrant via the QdrantEmbeddingStore class. Qdrant is an open-source, high-performance vector database that supports HNSW indexing and advanced payload filtering.
QdrantEmbeddingStore is a simple implementation of the IVectorStore interface. It has been open-sourced and is available as part of the dedicated LM-Kit.NET.Data.Connectors.Qdrant package.
The source code for this package is hosted in the LM-Kit.NET Data Connectors GitHub repository.
Additional prebuilt vector store integrations will be added progressively to the same repository. If you require a specific implementation on a short timeline, feel free to reach out to our team.
// Initialize the store.
// This assumes a local Qdrant instance started with: docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
// See this tutorial to set up a local Qdrant environment: https://qdrant.tech/documentation/quickstart/
// Port 6334 is Qdrant's gRPC port; 6333 serves the REST API.
var store = new QdrantEmbeddingStore(new Uri("http://localhost:6334"));
var model = LM.LoadFromModelID("nomic-embed-text");
var collection = DataSource.CreateVectorStoreDataSource(store, "my-collection", model);
Qdrant Vector Store: When and Why to Use It
Aspect | Details |
---|---|
Best When | • Production-scale semantic search • Cloud or distributed deployments • Need advanced filtering by metadata • Sharing embeddings across multiple services |
Upsides | • Battle-tested, open-source vector DB • High performance (HNSW indexing) • Powerful metadata filtering • Scales horizontally |
Downsides | • Requires standing up a Qdrant instance • Network latency vs local • Additional operational overhead |
Custom Vector Store with IVectorStore
If you're building your own backend or want to hook into existing systems, just implement the IVectorStore interface.
public interface IVectorStore
{
    public Task<bool> CollectionExistsAsync(string collectionIdentifier, CancellationToken cancellationToken = default);
    public Task CreateCollectionAsync(string collectionIdentifier, uint vectorSize, CancellationToken cancellationToken = default);
    public Task DeleteCollectionAsync(string collectionIdentifier, CancellationToken cancellationToken = default);
    public Task UpsertAsync(string collectionIdentifier, string id, float[] vectors, MetadataCollection metadata, CancellationToken cancellationToken = default);
    public Task DeleteFromMetadataAsync(string collectionIdentifier, MetadataCollection metadata, CancellationToken cancellationToken = default);
    public Task UpdateMetadataAsync(string collectionIdentifier, string id, MetadataCollection metadata, bool clearFirst, CancellationToken cancellationToken = default);
    public Task<MetadataCollection?> GetMetadataAsync(string collectionIdentifier, string id, CancellationToken cancellationToken = default);
    public Task<List<VectorRecord>> RetrieveFromMetadataAsync(string collectionIdentifier, MetadataCollection metadata, bool getVector, bool getMetadata, CancellationToken cancellationToken = default);
    public Task<List<VectorRecord>> SearchSimilarVectorsAsync(string collectionIdentifier, float[] vector, uint limit, bool getVector, bool getMetadata, CancellationToken cancellationToken = default);
}
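Before wiring up a real backend, it can help to prototype the bookkeeping behind a few of these members. The sketch below is a hedged illustration, not an IVectorStore implementation: the members of MetadataCollection and VectorRecord are not shown in this post, so plain dictionaries stand in for them, the methods are synchronous, and vector-size validation is omitted. It also assumes that a metadata filter matches a record when every key/value pair in the filter is present on that record; check the interface documentation before relying on that. All helper names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// collection identifier -> (record id -> record)
var collections = new Dictionary<string, Dictionary<string, (float[] Vector, Dictionary<string, string> Metadata)>>();

// Rough counterpart of CollectionExistsAsync.
bool CollectionExists(string collectionId) => collections.ContainsKey(collectionId);

// Rough counterpart of CreateCollectionAsync (vector size check omitted).
void CreateCollection(string collectionId) =>
    collections[collectionId] = new Dictionary<string, (float[] Vector, Dictionary<string, string> Metadata)>();

// Rough counterpart of UpsertAsync: insert a new record or replace an existing id.
void Upsert(string collectionId, string id, float[] vector, Dictionary<string, string> metadata) =>
    collections[collectionId][id] = (vector, metadata);

// Assumed filter semantics: every pair in the filter must be present on the record.
bool Matches(Dictionary<string, string> recordMetadata, Dictionary<string, string> filter) =>
    filter.All(pair => recordMetadata.TryGetValue(pair.Key, out var value) && value == pair.Value);

// Rough counterpart of RetrieveFromMetadataAsync, returning only record ids.
List<string> RetrieveIdsFromMetadata(string collectionId, Dictionary<string, string> filter) =>
    collections[collectionId]
        .Where(r => Matches(r.Value.Metadata, filter))
        .Select(r => r.Key)
        .ToList();

// Rough counterpart of DeleteFromMetadataAsync; returns the number of removed records.
int DeleteFromMetadata(string collectionId, Dictionary<string, string> filter)
{
    var ids = RetrieveIdsFromMetadata(collectionId, filter);
    foreach (var id in ids)
        collections[collectionId].Remove(id);
    return ids.Count;
}

CreateCollection("my-collection");
Upsert("my-collection", "1", new float[] { 0.9f, 0.1f }, new Dictionary<string, string> { ["topic"] = "baking" });
Upsert("my-collection", "2", new float[] { 0.8f, 0.2f }, new Dictionary<string, string> { ["topic"] = "baking" });
Upsert("my-collection", "3", new float[] { 0.1f, 0.9f }, new Dictionary<string, string> { ["topic"] = "weather" });

var bakingFilter = new Dictionary<string, string> { ["topic"] = "baking" };
var bakingIds = RetrieveIdsFromMetadata("my-collection", bakingFilter);
int deleted = DeleteFromMetadata("my-collection", bakingFilter);
int remaining = collections["my-collection"].Count;
Console.WriteLine($"Matched {bakingIds.Count}, deleted {deleted}, remaining {remaining}");
```

A production store would wrap this logic in the async members above, honor the CancellationToken, and delegate the similarity search to the backend's own index.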
Custom Vector Store: When and Why to Build Your Own
Aspect | Details |
---|---|
Best When | • Integrating with proprietary systems • Using an existing vector DB not yet supported by LM-Kit • Need custom indexing or sharding logic • Full control over storage/retrieval |
Upsides | • Total flexibility • Can leverage existing infrastructure • Tailored to your exact requirements |
Downsides | • You own the implementation and maintenance • More upfront development effort |
Persistent Storage with LM-Kit's Built-In Vector DB
When you need durable embedding storage without deploying an external service, LM-Kit's built-in vector database is your go-to solution. Think of it as a SQLite for dense vectors: a self-contained, file-based engine optimized for storing and querying embeddings at scale. Designed to handle millions of vectors on a single node, it delivers low-latency insertions, deletions and searches even as your dataset grows.
Under the hood, it stores vectors and metadata in an optimized file format and provides two clear APIs for managing and querying the data:
- DataSource.CreateFileDataSource(path, name, model, metadata, overwrite: true): Initialize or overwrite a local vector store at the specified file path.
- DataSource.LoadFromFile(path, model, readOnly: true): Reopen an existing store for querying (read-only, as shown here) or for modification.
These methods let you insert, delete, and search embeddings entirely on disk. This makes the built-in store ideal for rapid prototyping, desktop tools, or any scenario where you want portable, versionable vector storage without standing up a full vector-DB cluster.
Now, let's dive into some source code to see how LM-Kit's built-in vector storage works in practice, from creating a local database to querying it later.
Creating and Populating a Local Vector Database
// Define some strings from which we want to generate embeddings
string[] examples =
{
    "How do I bake a chocolate cake?",
    "What is the recipe for chocolate cake?",
    "I want to make a chocolate cake.",
    "Chocolate cake is delicious.",
    "How do I cook pasta?",
    "I need instructions to bake a cake.",
    "Baking requires precise measurements.",
    "I like vanilla ice cream.",
    "The weather is sunny today.",
    "What is the capital of France?",
    "Paris is a beautiful city.",
    "How can I improve my coding skills?",
    "Programming requires practice."
};
// Load the embedding model
var model = LM.LoadFromModelID("nomic-embed-text");
// Specify optional metadata to attach to the new collection (a.k.a. DataSource)
var collectionMetadata = new MetadataCollection
{
    { "description", "my description" },
    { "another-pair-key", "another-pair-value" }
};
// Create a new local vector database (overwriting if it already exists)
const string CollectionPath = "d:\\collection.ds";
var collection = DataSource.CreateFileDataSource(
    CollectionPath,
    "my-collection",
    model,
    collectionMetadata,
    overwrite: true);
// Compute embeddings to insert into the collection
var embedder = new Embedder(model);
List<DataSource.VectorEntry> vectorEntries = new List<DataSource.VectorEntry>();
// Run multithreaded embedding on the list of examples
var embeddings = embedder.GetEmbeddings(examples);
for (int index = 0; index < examples.Length; index++)
{
    vectorEntries.Add(new DataSource.VectorEntry(
        vector: embeddings[index],
        payload: examples[index]));
}
const string SectionIdentifier = "my-section-identifier";
// Specify optional metadata to attach to the new section.
// Note: a single collection can contain multiple sections.
var sectionMetadata = new MetadataCollection
{
    { "description", "my description" },
    { "another-pair-key", "another-pair-value" }
};
// Add the computed embedding vectors to the collection
collection.Upsert(
    SectionIdentifier,
    vectorEntries,
    sectionMetadata);
// Close the database
collection.Dispose();
Loading and Querying an Existing Database
// Load the embedding model
var model = LM.LoadFromModelID("nomic-embed-text");
const string CollectionPath = "d:\\collection.ds";
// Load our previously created database in read-only mode (sufficient for querying)
var collection = DataSource.LoadFromFile(
    CollectionPath,
    model,
    readOnly: true);
// Build the query vector
string query = "How do I bake a chocolate cake?";
var queryVector = new Embedder(model).GetEmbeddings(query);
// Search for similar vectors across partitions
var similarPartitions = VectorSearch.FindMatchingPartitions(
    [collection],
    model,
    queryVector);
// Do something with the search results...
Together, these two snippets show the full lifecycle of LM-Kit's built-in vector storage: how to create, populate, persist, and later reload your embedding collection for querying, all without relying on external infrastructure.
🎁 A CEO's Modest Proposal for the Brave
I'm offering a gift to anyone who manages to implement a faster .NET version of the two snippets above without using LM-Kit.
LM-Kit's Built-In Vector DB: The Unsung Hero
Aspect | Details |
---|---|
Best When | • Desktop or local applications • Offline-first tools • Medium-scale datasets (up to millions of vectors) • No external database infrastructure available |
Upsides | • Zero external dependencies • File-based portability • Fast local queries • Version control friendly |
Downsides | • Single-machine scale • Not designed for distributed queries • Limited compared to specialized vector DBs |
Conclusion
Embedding Storage Methods Compared
🔧 Method | ✅ Best For | 💾 Persistence | 📈 Scale | 🌐 Infra Required |
---|---|---|---|---|
In-Memory | Quick tests, small-scale prototyping | Temporary (can serialize manually) | Low | None |
Built-In Vector DB | Local apps, offline tools, medium-scale use | Yes (file-based) | Medium (single machine) | None |
Qdrant Vector Store | High-scale, distributed or cloud deployments | Yes | High | Qdrant instance |
Custom via IVectorStore | Custom backends, proprietary infra | Yes (you implement it) | Varies | Your own infrastructure |
Each method serves a purpose. Use in-memory embeddings for quick tests or when you're feeding results into something immediately. If you need persistence but want to keep things simple and local, go with LM-Kit's built-in vector DB. For large-scale or distributed systems, Qdrant is a solid external option. And if your stack is special, you can always bring your own vector store.
The best part? DataSource is unified. You can switch between these options without rewriting your code: just plug in a different backend and you're set.
Let us know what you're building. And if you're doing something wild with embeddings, we want to hear about it. ✨