
Loïc Carrère
Originally published at lm-kit.com

🧬 Four Local Vector Storage Patterns for C# Developers

Introduction

In this post, we'll break down four vector storage patterns supported by LM-Kit.

LM-Kit simplifies the complexity of embedding storage by offering a unified, developer-friendly interface that supports both instant prototyping and scalable deployment.

Each of the four patterns is tailored to a different stage of your project:

  • In-Memory: Ideal for fast prototyping and low-volume tasks with zero setup.
  • Built-In Vector DB: Self-contained file-based storage for local tools or offline apps.
  • Qdrant Vector Store: External high-performance DB for cloud or large-scale deployments.
  • Custom IVectorStore: Build your own backend to integrate with proprietary systems.

All methods use the same DataSource abstraction, so you can switch storage backends without changing your code logic.

Modern AI apps, from semantic search to retrieval-augmented generation, rely on embeddings to turn text or images into dense vectors. LM-Kit gives you a few ways to handle those vectors: you can compute them in memory when needed or store them for the long haul. Each approach has tradeoffs in speed, complexity, and cost.

If you're new to embeddings, check out our Embeddings Glossary. Or better yet, talk to LM-Kit Maestro, our free chatbot generator (GitHub: LM-Kit/Maestro).

The Four Patterns

LM-Kit supports four main embedding storage patterns, ranging from ephemeral in-memory use to persistent vector databases, so you can match your infrastructure to the needs of your app.

  1. On-the-Fly (In-Memory) Embeddings
  2. Persistent Storage with an External Vector Database
    • Prebuilt Qdrant Vector Store
    • Custom Vector Store via IVectorStore
  3. Persistent Storage with LM-Kit's Built-In Vector DB

LM-Kit uses a DataSource class to manage all of this. It's your main tool for embedding storage: a DataSource represents a collection that can hold anything from text and documents to images or web pages. Each DataSource contains multiple Section elements, and each Section holds TextPartition¹ objects (basically embedding vectors). Metadata can be attached to both DataSource and Section structures.

  • DataSource (can include metadata)
    • Sections (can include metadata)
      • TextPartitions (store dense vectors)

¹ The "Text" in TextPartition is a bit misleading now. These partitions can handle images too.
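To make the hierarchy concrete, here is a minimal sketch built from hypothetical stand-in types (CollectionSketch, SectionSketch, PartitionSketch). They are illustrative only and are not LM-Kit's actual classes, whose members may differ; metadata is modeled as a plain string dictionary as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Build one collection with one section that holds two partitions.
var collection = new CollectionSketch("my-collection");
collection.Metadata["description"] = "recipes";

var section = new SectionSketch("my-section-identifier");
section.Metadata["source"] = "faq.txt";
section.Partitions.Add(new PartitionSketch("How do I bake a cake?", new float[] { 0.1f, 0.9f }));
section.Partitions.Add(new PartitionSketch("How do I cook pasta?", new float[] { 0.8f, 0.2f }));
collection.Sections.Add(section);

Console.WriteLine(
    $"{collection.Identifier}: {collection.Sections.Count} section(s), " +
    $"{collection.CountPartitions()} partition(s)");

// Hypothetical stand-in for a TextPartition: a payload plus its dense vector.
public sealed record PartitionSketch(string Payload, float[] Vector);

// Hypothetical stand-in for a Section: metadata plus a list of partitions.
public sealed class SectionSketch
{
    public SectionSketch(string identifier) => Identifier = identifier;
    public string Identifier { get; }
    public Dictionary<string, string> Metadata { get; } = new();
    public List<PartitionSketch> Partitions { get; } = new();
}

// Hypothetical stand-in for a DataSource: metadata plus a list of sections.
public sealed class CollectionSketch
{
    public CollectionSketch(string identifier) => Identifier = identifier;
    public string Identifier { get; }
    public Dictionary<string, string> Metadata { get; } = new();
    public List<SectionSketch> Sections { get; } = new();

    // Total number of partitions across all sections.
    public int CountPartitions() => Sections.Sum(s => s.Partitions.Count);
}
```

The point of the three levels is that metadata lives on the collection and its sections, while the vectors themselves sit at the leaves.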

Figure: LM-Kit Vector Storage Architecture

On-the-Fly Embeddings (In-Memory)

How it works

When you need an embedding, LM-Kit calls the model, gets the vector, and keeps it in RAM for immediate use. Nothing is written to disk. No external service needed.

Code Example

Create an in-memory vector database and perform retrieval:

// Define some strings from which we want to generate embeddings
string[] examples =
{
    "How do I bake a chocolate cake?",
    "What is the recipe for chocolate cake?",
    "I want to make a chocolate cake.",
    "Chocolate cake is delicious.",
    "How do I cook pasta?",
    "I need instructions to bake a cake.",
    "Baking requires precise measurements.",
    "I like vanilla ice cream.",
    "The weather is sunny today.",
    "What is the capital of France?",
    "Paris is a beautiful city.",
    "How can I improve my coding skills?",
    "Programming requires practice."
};

// Load the embedding model
var model = LM.LoadFromModelID("nomic-embed-text");

// Specify optional metadata to attach to the new collection (a.k.a. DataSource)
var collectionMetadata = new MetadataCollection
{
    { "description", "my description" },
    { "another-pair-key", "another-pair-value" }
};

// Create a new in-memory vector database
var collection = DataSource.CreateInMemoryDataSource(
    "my-collection",
    model,
    collectionMetadata);

// Compute embeddings to insert into the collection
var embedder = new Embedder(model);
List<DataSource.VectorEntry> vectorEntries = new List<DataSource.VectorEntry>();

// Run multithreaded embedding on the list of examples
var embeddings = embedder.GetEmbeddings(examples);

for (int index = 0; index < examples.Length; index++)
{
    vectorEntries.Add(new DataSource.VectorEntry(
        vector: embeddings[index],
        payload: examples[index]));
}

const string SectionIdentifier = "my-section-identifier";

// Specify optional metadata to attach to the new section.
// Note: a single collection can contain multiple sections.
var sectionMetadata = new MetadataCollection
{
    { "description", "my description" },
    { "another-pair-key", "another-pair-value" }
};

// Add the computed embedding vectors to the collection
collection.Upsert(
    SectionIdentifier,
    vectorEntries,
    sectionMetadata);

// Now perform search
// Build the query vector
string query = "How do I bake a chocolate cake?";
var queryVector = embedder.GetEmbeddings(query); // reuse the embedder created above

// Search for similar vectors across partitions
var similarPartitions = VectorSearch.FindMatchingPartitions(
    [collection],
    model,
    queryVector);

// Do something with the search results...

This example demonstrates how to set up an in-memory vector store using DataSource.CreateInMemoryDataSource, generate embeddings with a model, and organize them into a section with optional metadata. It finishes by performing a semantic search with a query vector. Everything runs locally in RAM, which makes this pattern ideal for fast prototyping, real-time classification, or experimentation without persistent storage.
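For intuition about what a semantic search over in-memory vectors involves, here is a minimal brute-force sketch based on cosine similarity. It is not LM-Kit's implementation: SimilaritySketch, Cosine, and TopK are hypothetical names, and the three-dimensional vectors are toy values standing in for real embeddings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy 3-dimensional vectors standing in for real embeddings.
var entries = new List<(string Payload, float[] Vector)>
{
    ("How do I bake a chocolate cake?", new float[] { 0.9f, 0.1f, 0.0f }),
    ("How do I cook pasta?",            new float[] { 0.6f, 0.4f, 0.1f }),
    ("The weather is sunny today.",     new float[] { 0.0f, 0.1f, 0.9f }),
};
float[] queryVector = { 1.0f, 0.0f, 0.0f };

// Print the two entries closest to the query, best match first.
foreach (var (payload, score) in SimilaritySketch.TopK(entries, queryVector, k: 2))
{
    Console.WriteLine($"{score:F3}  {payload}");
}

public static class SimilaritySketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
    public static float Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0f;
        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    // Score every entry against the query and keep the k highest scores.
    public static List<(string Payload, float Score)> TopK(
        IEnumerable<(string Payload, float[] Vector)> entries, float[] query, int k)
    {
        return entries
            .Select(e => (e.Payload, Score: Cosine(e.Vector, query)))
            .OrderByDescending(r => r.Score)
            .Take(k)
            .ToList();
    }
}
```

A linear scan like this is fine for small collections that fit in memory; larger collections are where indexed stores become worthwhile.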

In-Memory Embeddings: When and Why to Use Them

Best when:
  • Rapid prototyping or experimentation
  • One-shot queries (no need to reuse later)
  • Small datasets that fit in memory
  • Real-time classification/semantic search

Upsides:
  • Zero infrastructure or setup
  • Instant start
  • No file I/O overhead

Downsides:
  • Lost on restart
  • Not suitable for large collections
  • No sharing across processes

Serialization Note

Although referred to as "in-memory", any DataSource instance can be serialized to a file or stream using the Serialize() method. It can then be fully restored into memory with the Deserialize() method.

This provides flexibility to:

  • Save your in-memory collections between sessions
  • Store intermediate states during experimentation
  • Transfer embeddings across environments without requiring an external vector database
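The exact signatures of Serialize() and Deserialize() and the on-disk format are not shown in this post, so the following is only a generic sketch of the underlying idea: writing payloads and vectors to a stream and reading them back. VectorStreamSketch and its layout (entry count, then payload, vector length, and values per entry) are assumptions made for illustration, not LM-Kit's format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

var original = new List<(string Payload, float[] Vector)>
{
    ("Chocolate cake is delicious.", new float[] { 0.9f, 0.1f }),
    ("Paris is a beautiful city.",   new float[] { 0.2f, 0.8f }),
};

// Round-trip through a MemoryStream; a FileStream would work the same way.
using var stream = new MemoryStream();
VectorStreamSketch.Write(stream, original);
stream.Position = 0;
var restored = VectorStreamSketch.Read(stream);

Console.WriteLine($"Restored {restored.Count} entries; first payload: {restored[0].Payload}");

public static class VectorStreamSketch
{
    // Layout (assumed for this sketch): count, then payload + length + floats per entry.
    public static void Write(Stream stream, IReadOnlyList<(string Payload, float[] Vector)> entries)
    {
        using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        writer.Write(entries.Count);
        foreach (var (payload, vector) in entries)
        {
            writer.Write(payload);
            writer.Write(vector.Length);
            foreach (float value in vector) writer.Write(value);
        }
    }

    // Reads entries back in the same order they were written.
    public static List<(string Payload, float[] Vector)> Read(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        int count = reader.ReadInt32();
        var entries = new List<(string Payload, float[] Vector)>(count);
        for (int i = 0; i < count; i++)
        {
            string payload = reader.ReadString();
            var vector = new float[reader.ReadInt32()];
            for (int j = 0; j < vector.Length; j++) vector[j] = reader.ReadSingle();
            entries.Add((payload, vector));
        }
        return entries;
    }
}
```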

Persistent Storage with an External Vector DB

For production use or anything large-scale, you'll want to persist your vectors in a proper database. LM-Kit supports this through the IVectorStore abstraction.

Qdrant Vector Store (Prebuilt)

LM-Kit offers an out-of-the-box integration with Qdrant via the QdrantEmbeddingStore class. Qdrant is an open-source, high-performance vector database that supports HNSW indexing and advanced payload filtering.

QdrantEmbeddingStore is a simple implementation of the IVectorStore interface. It has been open-sourced and is available as part of the dedicated LM-Kit.NET.Data.Connectors.Qdrant package.

The source code for this package is hosted in the LM-Kit.NET Data Connectors GitHub repository.

Additional prebuilt vector store integrations will be added progressively to the same repository. If you require a specific implementation on a short timeline, feel free to reach out to our team.

// Initializing store
// This assumes a local Qdrant instance started with: docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
// See the Qdrant quickstart to set up a local environment: https://qdrant.tech/documentation/quickstart/
var store = new QdrantEmbeddingStore(new Uri("http://localhost:6334"));
var model = LM.LoadFromModelID("nomic-embed-text");
var collection = DataSource.CreateVectorStoreDataSource(store, "my-collection", model);

Qdrant Vector Store: When and Why to Use It

Best when:
  • Production-scale semantic search
  • Cloud or distributed deployments
  • Need advanced filtering by metadata
  • Sharing embeddings across multiple services

Upsides:
  • Battle-tested, open-source vector DB
  • High performance (HNSW indexing)
  • Powerful metadata filtering
  • Scales horizontally

Downsides:
  • Requires standing up a Qdrant instance
  • Network latency vs local
  • Additional operational overhead

Custom Vector Store with IVectorStore

If you're building your own backend or want to hook into existing systems, just implement the IVectorStore interface.

public interface IVectorStore
{
    public Task<bool> CollectionExistsAsync(string collectionIdentifier, CancellationToken cancellationToken = default);
    public Task CreateCollectionAsync(string collectionIdentifier, uint vectorSize, CancellationToken cancellationToken = default);
    public Task DeleteCollectionAsync(string collectionIdentifier, CancellationToken cancellationToken = default);
    public Task UpsertAsync(string collectionIdentifier, string id, float[] vectors, MetadataCollection metadata, CancellationToken cancellationToken = default);
    public Task DeleteFromMetadataAsync(string collectionIdentifier, MetadataCollection metadata, CancellationToken cancellationToken = default);
    public Task UpdateMetadataAsync(string collectionIdentifier, string id, MetadataCollection metadata, bool clearFirst, CancellationToken cancellationToken = default);
    public Task<MetadataCollection?> GetMetadataAsync(string collectionIdentifier, string id, CancellationToken cancellationToken = default);
    public Task<List<VectorRecord>> RetrieveFromMetadataAsync(string collectionIdentifier, MetadataCollection metadata, bool getVector, bool getMetadata, CancellationToken cancellationToken = default);
    public Task<List<VectorRecord>> SearchSimilarVectorsAsync(string collectionIdentifier, float[] vector, uint limit, bool getVector, bool getMetadata, CancellationToken cancellationToken = default);
}

Custom Vector Store: When and Why to Build Your Own

Best when:
  • Integrating with proprietary systems
  • Using an existing vector DB not yet supported by LM-Kit
  • Need custom indexing or sharding logic
  • Full control over storage/retrieval

Upsides:
  • Total flexibility
  • Can leverage existing infrastructure
  • Tailored to your exact requirements

Downsides:
  • You own the implementation and maintenance
  • More upfront development effort
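To give a feel for the work a custom backend involves, here is a simplified in-memory sketch that mirrors a few of the method names from the interface above. It deliberately does not implement IVectorStore: InMemoryStoreSketch and RecordSketch are hypothetical, a plain Dictionary<string, string> stands in for MetadataCollection, and the getVector/getMetadata flags are omitted. A real implementation would need to cover every interface member.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var store = new InMemoryStoreSketch();
await store.CreateCollectionAsync("demo", vectorSize: 2);
await store.UpsertAsync("demo", "a", new float[] { 1f, 0f }, new() { ["topic"] = "cake" });
await store.UpsertAsync("demo", "b", new float[] { 0f, 1f }, new() { ["topic"] = "weather" });

var hits = await store.SearchSimilarVectorsAsync("demo", new float[] { 0.9f, 0.1f }, limit: 1);
Console.WriteLine($"Best match: {hits[0].Id} ({hits[0].Metadata["topic"]})");

// Hypothetical record type; the real VectorRecord may look different.
public sealed record RecordSketch(string Id, float[] Vector, Dictionary<string, string> Metadata);

public sealed class InMemoryStoreSketch
{
    private sealed record Collection(uint VectorSize, ConcurrentDictionary<string, RecordSketch> Records);
    private readonly ConcurrentDictionary<string, Collection> _collections = new();

    public Task<bool> CollectionExistsAsync(string collectionIdentifier, CancellationToken cancellationToken = default)
        => Task.FromResult(_collections.ContainsKey(collectionIdentifier));

    public Task CreateCollectionAsync(string collectionIdentifier, uint vectorSize, CancellationToken cancellationToken = default)
    {
        // No-op if the collection already exists.
        _collections.TryAdd(collectionIdentifier, new Collection(vectorSize, new()));
        return Task.CompletedTask;
    }

    public Task UpsertAsync(string collectionIdentifier, string id, float[] vectors,
        Dictionary<string, string> metadata, CancellationToken cancellationToken = default)
    {
        var collection = GetCollection(collectionIdentifier);
        if (vectors.Length != collection.VectorSize)
            throw new ArgumentException("Vector size does not match the collection.");
        collection.Records[id] = new RecordSketch(id, vectors, metadata); // insert or overwrite
        return Task.CompletedTask;
    }

    public Task<List<RecordSketch>> SearchSimilarVectorsAsync(string collectionIdentifier, float[] vector,
        uint limit, CancellationToken cancellationToken = default)
    {
        // Brute-force scan ranked by cosine similarity; a real backend would use an index.
        var results = GetCollection(collectionIdentifier).Records.Values
            .OrderByDescending(r => Cosine(r.Vector, vector))
            .Take((int)limit)
            .ToList();
        return Task.FromResult(results);
    }

    private Collection GetCollection(string id) =>
        _collections.TryGetValue(id, out var c) ? c : throw new KeyNotFoundException($"Unknown collection '{id}'.");

    private static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < Math.Min(a.Length, b.Length); i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Even this reduced version shows where the effort goes: collection bookkeeping, dimension checks, and a search strategy are all yours to write and maintain.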

Persistent Storage with LM-Kit's Built-In Vector DB

When you need durable embedding storage without deploying an external service, LM-Kit's built-in vector database is your go-to solution. Think of it as a SQLite for dense vectors: a self-contained, file-based engine optimized for storing and querying embeddings at scale. Designed to handle millions of vectors on a single node, it delivers low-latency insertions, deletions and searches even as your dataset grows.

Under the hood, it stores vectors and metadata in an optimized file format and provides two clear APIs for managing and querying the data:

  • DataSource.CreateFileDataSource(path, name, model, metadata, overwrite: true) - Create a local vector store at the specified file path, overwriting any existing file.
  • DataSource.LoadFromFile(path, model, readOnly: true) - Reopen an existing store; read-only mode is sufficient when you only need to query it.

These methods let you insert, delete, and search embeddings entirely on disk. This makes the built-in store ideal for rapid prototyping, desktop tools, or any scenario where you want portable, versionable vector storage without standing up a full vector-DB cluster.

Now, let's dive into some source code to see how LM-Kit's built-in vector storage works in practice, from creating a local database to querying it later.

Creating and Populating a Local Vector Database

// Define some strings from which we want to generate embeddings
string[] examples =
{
    "How do I bake a chocolate cake?",
    "What is the recipe for chocolate cake?",
    "I want to make a chocolate cake.",
    "Chocolate cake is delicious.",
    "How do I cook pasta?",
    "I need instructions to bake a cake.",
    "Baking requires precise measurements.",
    "I like vanilla ice cream.",
    "The weather is sunny today.",
    "What is the capital of France?",
    "Paris is a beautiful city.",
    "How can I improve my coding skills?",
    "Programming requires practice."
};

// Load the embedding model
var model = LM.LoadFromModelID("nomic-embed-text");

// Specify optional metadata to attach to the new collection (a.k.a. DataSource)
var collectionMetadata = new MetadataCollection
{
    { "description", "my description" },
    { "another-pair-key", "another-pair-value" }
};

// Create a new local vector database (overwriting if it already exists)
const string CollectionPath = "d:\\collection.ds";
var collection = DataSource.CreateFileDataSource(
    CollectionPath,
    "my-collection",
    model,
    collectionMetadata,
    overwrite: true);

// Compute embeddings to insert into the collection
var embedder = new Embedder(model);
List<DataSource.VectorEntry> vectorEntries = new List<DataSource.VectorEntry>();

// Run multithreaded embedding on the list of examples
var embeddings = embedder.GetEmbeddings(examples);

for (int index = 0; index < examples.Length; index++)
{
    vectorEntries.Add(new DataSource.VectorEntry(
        vector: embeddings[index],
        payload: examples[index]));
}

const string SectionIdentifier = "my-section-identifier";

// Specify optional metadata to attach to the new section.
// Note: a single collection can contain multiple sections.
var sectionMetadata = new MetadataCollection
{
    { "description", "my description" },
    { "another-pair-key", "another-pair-value" }
};

// Add the computed embedding vectors to the collection
collection.Upsert(
    SectionIdentifier,
    vectorEntries,
    sectionMetadata);

// Close the database
collection.Dispose();

Loading and Querying an Existing Database

// Load the embedding model
var model = LM.LoadFromModelID("nomic-embed-text");
const string CollectionPath = "d:\\collection.ds";

// Load our previously created database in read-only mode (sufficient for querying)
var collection = DataSource.LoadFromFile(
    CollectionPath,
    model,
    readOnly: true);

// Build the query vector
string query = "How do I bake a chocolate cake?";
var queryVector = new Embedder(model).GetEmbeddings(query);

// Search for similar vectors across partitions
var similarPartitions = VectorSearch.FindMatchingPartitions(
    [collection],
    model,
    queryVector);

// Do something with the search results...

Together, these two snippets show the full lifecycle of LM-Kit's built-in vector storage: how to create, populate, persist, and later reload your embedding collection for querying, all without relying on external infrastructure.

🎁 A CEO's Modest Proposal for the Brave

I'm offering a gift to anyone who manages to implement a faster .NET version of the two snippets above without using LM-Kit.

LM-Kit's Built-In Vector DB: The Unsung Hero

Best when:
  • Desktop or local applications
  • Offline-first tools
  • Medium-scale datasets (up to millions of vectors)
  • No external database infrastructure available

Upsides:
  • Zero external dependencies
  • File-based portability
  • Fast local queries
  • Version control friendly

Downsides:
  • Single-machine scale
  • Not designed for distributed queries
  • Limited compared to specialized vector DBs

Conclusion

Embedding Storage Methods Compared

  • In-Memory: best for quick tests and small-scale prototyping. Persistence: temporary (can serialize manually). Scale: low. Infrastructure required: none.
  • Built-In Vector DB: best for local apps, offline tools, and medium-scale use. Persistence: yes (file-based). Scale: medium (single machine). Infrastructure required: none.
  • Qdrant Vector Store: best for high-scale, distributed, or cloud deployments. Persistence: yes. Scale: high. Infrastructure required: a Qdrant instance.
  • Custom via IVectorStore: best for custom backends and proprietary infrastructure. Persistence: yes (you implement it). Scale: varies. Infrastructure required: your own.

Each method serves a purpose. Use in-memory embeddings for quick tests or when you're feeding results into something immediately. If you need persistence but want to keep things simple and local, go with LM-Kit's built-in vector DB. For large-scale or distributed systems, Qdrant is a solid external option. And if your stack is special, you can always bring your own vector store.

The best part? DataSource is unified. You can switch between these options without rewriting your code: just plug in a different backend and you're set.

Let us know what you're building. And if you're doing something wild with embeddings, we want to hear about it. ✨