David Au Yeung

Forget Your RAG: Build Your Own LLM Wiki in C# with Ollama + Kimi (Step‑by‑Step Guide)

Introduction

Happy coding! Today I want to share a practical AI tutorial, .NET style: real code, a simple architecture, and a result you can run on your own machine.

When many developers start building AI knowledge assistants, the first idea is usually RAG. That is a good choice in many cases, but not always the simplest one. Sometimes, your knowledge base is actually small, focused, and stable. In that lovely situation, building a small LLM Wiki can be a more elegant choice.

Let me tell you a short, lovely story.

Imagine a small tea house run by a kind grandmother. Every morning, her grandchildren ask the same questions:

  • Which tea is good for a cold day?
  • Which dessert has nuts?
  • What is the house special today?

Now imagine two ways to help them.

  1. The first way is to hire a fast librarian who runs to a giant archive room every time a child asks a question. That is a bit like RAG.
  2. The second way is to write one beautiful, well-organized family handbook and keep it on the table all the time. That is a bit like an LLM Wiki.

If the tea house menu is small and stable, the handbook is often faster, simpler, and more reliable. But if the tea house becomes a huge restaurant chain with thousands of daily updates, you will eventually want the librarian too.

That is the heart of LLM Wiki vs RAG.

In this tutorial, we will build an open-source local LLM Wiki demo in C# using:

  • .NET 8
  • Ollama
  • OllamaSharp
  • a local markdown-based wiki structure
  • the kimi-k2.6:cloud Ollama model

By the end, you will have:

  • a console-based demo app
  • a LocalWiki folder with markdown files
  • document ingestion logic
  • index maintenance logic
  • wiki question-answering logic
  • a clear understanding of when to use a wiki instead of RAG

What We Are Building

We are building a small local AI workflow like this:

  1. Prepare a source document.
  2. Send that document to an LLM through Ollama.
  3. Ask the LLM to return structured wiki content.
  4. Save the result as markdown in a local wiki folder.
  5. Update Index.md automatically.
  6. Ask questions against the wiki content.
  7. Let the model answer using the wiki instead of the raw source file.

This is not a full vector database pipeline. It is intentionally lighter.

LLM Wiki vs RAG

Before we code, let us understand the design choice.

What is RAG?

Retrieval-Augmented Generation (RAG) improves LLM output by retrieving relevant information from external sources before generating a response. Instead of depending only on training data, the model gets fresh context from documents, databases, or web content.

Common characteristics of RAG:

  • dynamic retrieval at question time
  • usually based on chunking + embeddings + vector search
  • good for large and changing knowledge bases
  • better for source attribution and enterprise-scale search
  • more infrastructure and tuning effort

What is an LLM Wiki?

An LLM Wiki is a curated, structured markdown knowledge base written so the model can reason over it directly. Rather than retrieving many chunks at runtime, you keep important domain knowledge in concise markdown files and pass those files into the model context.

Common characteristics of an LLM Wiki:

  • markdown-first knowledge organization
  • compact, human-readable, and model-friendly
  • great for small and stable domains
  • simple to maintain for focused use cases
  • low setup cost compared with a full RAG pipeline

When should you use a Wiki instead of RAG?

A wiki is a strong choice when:

  • your knowledge base is relatively small
  • your content changes slowly
  • you want to ship fast
  • you prefer markdown and file-based maintenance
  • you want fewer moving parts
  • you are building a demo, prototype, internal assistant, or focused domain tool

RAG is usually better when:

  • you have a very large knowledge base
  • documents change frequently
  • you need deep retrieval across many domains
  • you need richer source attribution and search behavior
  • your content does not fit comfortably into context

A practical rule of thumb

Start with an LLM Wiki if your knowledge is:

  • focused
  • stable
  • not too large
  • curated by humans or AI into clean markdown

Move to RAG when your knowledge becomes:

  • too large
  • too dynamic
  • too fragmented
  • too cross-domain

And yes, a hybrid approach is often best:

  • use a wiki for stable core knowledge
  • use RAG for large or fast-changing content

References and background reading

The ideas in this tutorial align with public explanations of RAG and LLM Wiki tradeoffs, including:

  • Wikipedia on Retrieval-Augmented Generation
  • AWS explanation of RAG and external knowledge grounding
  • MindStudio and 99helpers comparisons of LLM Wiki vs RAG for smaller knowledge bases

Prerequisites

Before we begin, make sure you have the following:

  • Visual Studio or another C# IDE
  • .NET 8 SDK
  • Ollama installed locally
  • the Kimi model available through Ollama
  • a Windows terminal or PowerShell

To use the model from this demo, run:

ollama run kimi-k2.6:cloud

If Ollama is already running, that is enough for the demo.
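
If you want to double-check which models Ollama currently knows about, you can list them first:

ollama list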

Step 1: Create the Console App

If you want to build this from scratch in a new workspace, start with:

dotnet new console -n LLMWikiDemo
cd LLMWikiDemo

In my case, I added the feature to an existing .NET 8 console app, but the same idea applies to a fresh app.

Step 2: Add the NuGet Package

Add OllamaSharp so your C# application can talk to Ollama:

dotnet add package OllamaSharp

In my workspace, the project file already contains:

<PackageReference Include="OllamaSharp" Version="5.4.25" />

Step 3: Define the Architecture

To keep the system architecture simple, we use these parts:

  • Program.cs as the entry point
  • LlmWikiManager.cs as the core service
  • LocalWiki/ as the generated markdown wiki
  • SCHEMA.md to define the wiki rules
  • Index.md as the article catalog
  • Articles/ for wiki pages
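
On disk, the project and the generated wiki look roughly like this (illustrative layout; the LocalWiki folder is created next to the compiled binary at runtime):

MyPlaygroundApp/
  Program.cs
  Utils/
    LlmWikiManager.cs

LocalWiki/              (created under AppContext.BaseDirectory)
  SCHEMA.md
  Index.md
  Articles/
    csharp-ai-development.md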

The flow is:

Source Document
    -> LlmWikiManager.IngestDocumentAsync(...)
    -> Ollama / Kimi model
    -> JSON-like structured response
    -> Markdown article
    -> Index.md update
    -> QueryWikiAsync(...)
    -> Answer from wiki context

Step 4: Create the Wiki Manager

The main logic lives in Utils/LlmWikiManager.cs.

This class is responsible for:

  • creating the wiki folders
  • creating SCHEMA.md and Index.md
  • sending prompts to Ollama
  • parsing the model response
  • writing article markdown
  • reading article markdown back for Q&A

The manager constructor

We configure the Ollama endpoint and default model here:

public LlmWikiManager(
    string ollamaApiUrl = "http://localhost:11434",
    string modelName = "kimi-k2.6:cloud",
    string? wikiRootDirectory = null)
{
    _ollamaClient = new OllamaApiClient(new Uri(ollamaApiUrl))
    {
        SelectedModel = modelName
    };

    _wikiDirectory = wikiRootDirectory ?? Path.Combine(AppContext.BaseDirectory, "LocalWiki");
    _schemaFilePath = Path.Combine(_wikiDirectory, "SCHEMA.md");
    _indexFilePath = Path.Combine(_wikiDirectory, "Index.md");
    _articlesDirectory = Path.Combine(_wikiDirectory, "Articles");
}

This means the wiki is generated near the app runtime folder, and the model defaults to kimi-k2.6:cloud.
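
For the constructor to compile, the class needs matching fields. Here is a minimal skeleton (field names taken from the snippet above; the methods are filled in over the next steps):

using System.Text;
using System.Text.Json;
using OllamaSharp;

public class LlmWikiManager
{
    private readonly OllamaApiClient _ollamaClient;
    private readonly string _wikiDirectory;
    private readonly string _schemaFilePath;
    private readonly string _indexFilePath;
    private readonly string _articlesDirectory;

    // Constructor (above) plus EnsureWikiStructureAsync, IngestDocumentAsync,
    // QueryWikiAsync, and the private helpers shown in the following steps.
}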

Step 5: Bootstrap the Wiki Structure

We want the app to create the wiki structure automatically if it does not exist.

public async Task EnsureWikiStructureAsync()
{
    Directory.CreateDirectory(_wikiDirectory);
    Directory.CreateDirectory(_articlesDirectory);

    if (!File.Exists(_schemaFilePath))
    {
        await File.WriteAllTextAsync(_schemaFilePath, DefaultSchema);
    }

    if (!File.Exists(_indexFilePath))
    {
        var initialIndex = "# Local Wiki Index" + Environment.NewLine + Environment.NewLine;
        await File.WriteAllTextAsync(_indexFilePath, initialIndex);
    }
}

This is one of the reasons the wiki approach feels nice. It is file-based, easy to inspect, and easy to explain.
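
The DefaultSchema constant referenced above is just a markdown string; a trimmed-down sketch looks like this (the full generated SCHEMA.md appears in Step 12):

private const string DefaultSchema = """
    # LLM Wiki Schema

    ## Ingestion Workflow
    1. Read new source document.
    2. Summarize key points with technical accuracy.
    3. Create or update one markdown article in `Articles/`.
    4. Ensure `Index.md` contains one bullet link per article.
    5. Keep filenames kebab-case and `.md`.

    ## Query Workflow
    1. Use only article/index content provided as context.
    2. Answer concisely and cite article filenames used.
    3. If context is insufficient, say information is insufficient.
    """;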

Step 6: Stream Text from Ollama

OllamaSharp returns generation results as an async stream, so we collect the response text like this:

private async Task<string> GenerateTextAsync(string prompt)
{
    var builder = new StringBuilder();

    await foreach (var chunk in _ollamaClient.GenerateAsync(prompt))
    {
        if (!string.IsNullOrWhiteSpace(chunk?.Response))
        {
            builder.Append(chunk.Response);
        }
    }

    return builder.ToString();
}

This is a small but important detail. If you try to await the async sequence as if it were a single Task, the build will fail; you must consume it with await foreach.

Step 7: Ingest a Source Document

Now comes the most interesting part.

When a source document arrives, we:

  1. read the file
  2. read the schema
  3. ask the model to return structured wiki data
  4. parse the result
  5. create a markdown article
  6. update Index.md

The ingestion prompt

The ingestion prompt in LlmWikiManager.cs looks like this:

var ingestionPrompt = $"""
    You are an AI assistant maintaining a markdown wiki.

    Schema:
    ---
    {schemaContent}
    ---

    Process this source document and respond with ONLY one JSON object using these keys:
    - summaryTitle: concise title for the article
    - summaryFilename: kebab-case markdown filename
    - summaryContent: full markdown article body
    - indexUpdate: one markdown bullet linking article from Index.md

    Source filename: {sourceFileName}
    Source document:
    ---
    {sourceContent}
    ---
 """;

This is a practical pattern:

  • tell the model who it is
  • give it the wiki rules
  • give it the source content
  • enforce a structured output format
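
With this prompt, a well-behaved response from the model looks roughly like this (illustrative values, matching the sample run shown later in this article):

{
  "summaryTitle": "C# for AI Development",
  "summaryFilename": "csharp-ai-development.md",
  "summaryContent": "# C# for AI Development\n\nC# is a strong choice for building AI solutions...",
  "indexUpdate": "- [C# for AI Development](Articles/csharp-ai-development.md)"
}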

The ingestion method

A simplified view of the ingestion code:

public async Task<LlmWikiIngestionResult> IngestDocumentAsync(string sourceFilePath)
{
    await EnsureWikiStructureAsync();

    var sourceContent = await File.ReadAllTextAsync(sourceFilePath);
    var schemaContent = await File.ReadAllTextAsync(_schemaFilePath);
    var sourceFileName = Path.GetFileName(sourceFilePath);

    // Build ingestionPrompt (shown above) from schemaContent, sourceFileName, and sourceContent.
    var responseText = await GenerateTextAsync(ingestionPrompt);
    var parsed = TryParseIngestionPayload(responseText);

    var summaryTitle = !string.IsNullOrWhiteSpace(parsed?.SummaryTitle)
        ? parsed!.SummaryTitle!
        : $"Summary of {Path.GetFileNameWithoutExtension(sourceFileName)}";

    var summaryFilename = NormalizeSummaryFilename(parsed?.SummaryFilename ?? string.Empty, summaryTitle);

    var summaryContent = !string.IsNullOrWhiteSpace(parsed?.SummaryContent)
        ? parsed!.SummaryContent!
        : BuildFallbackSummaryContent(sourceFileName, sourceContent);

    var summaryFilePath = Path.Combine(_articlesDirectory, summaryFilename);
    await File.WriteAllTextAsync(summaryFilePath, summaryContent);

    var normalizedRelativeArticlePath = $"Articles/{summaryFilename}";
    var indexUpdate = !string.IsNullOrWhiteSpace(parsed?.IndexUpdate)
        ? parsed!.IndexUpdate!.Trim()
        : $"- [{summaryTitle}]({normalizedRelativeArticlePath})";

    // Append indexUpdate to Index.md if it is not already listed, then return an LlmWikiIngestionResult.
}

Why fallback logic matters

Real LLMs are helpful, but sometimes they do not follow the JSON contract perfectly.

That is why the demo includes:

  • JSON extraction from mixed text
  • deserialization attempts
  • fallback summary generation
  • filename normalization
  • index deduplication

This makes the POC stronger and more production-minded than a fragile one-shot prompt demo.
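
As an example of that defensiveness, the filename normalization can be as small as the following sketch (one possible implementation, not necessarily the exact code in the repository; it assumes System.Linq is imported):

private static string NormalizeSummaryFilename(string proposedFilename, string fallbackTitle)
{
    // Prefer the model's suggestion; otherwise derive a name from the title.
    var candidate = string.IsNullOrWhiteSpace(proposedFilename) ? fallbackTitle : proposedFilename;

    // Lowercase, swap spaces/underscores for hyphens, drop anything unexpected.
    var cleaned = new string(candidate
        .Trim()
        .ToLowerInvariant()
        .Replace(' ', '-')
        .Replace('_', '-')
        .Where(c => char.IsLetterOrDigit(c) || c is '-' or '.')
        .ToArray());

    // Guarantee a markdown extension.
    return cleaned.EndsWith(".md", StringComparison.Ordinal) ? cleaned : cleaned + ".md";
}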

Step 8: Parse the LLM Response Safely

Structured AI output is never perfect, so the code extracts the JSON object defensively.

private static string? ExtractJsonObject(string text)
{
    if (string.IsNullOrWhiteSpace(text))
    {
        return null;
    }

    var start = text.IndexOf('{');
    var end = text.LastIndexOf('}');

    if (start < 0 || end <= start)
    {
        return null;
    }

    return text[start..(end + 1)];
}

Then we try to deserialize it:

private static LlmWikiIngestionPayload? TryParseIngestionPayload(string responseText)
{
    var rawJson = ExtractJsonObject(responseText);
    if (rawJson is null)
    {
        return null;
    }

    try
    {
        return JsonSerializer.Deserialize<LlmWikiIngestionPayload>(rawJson, new JsonSerializerOptions
        {
            PropertyNameCaseInsensitive = true
        });
    }
    catch
    {
        return null;
    }
}

The lesson here is worth repeating: treat LLM output like user input. Trust it carefully, never blindly.
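
For reference, the payload type can be a plain DTO whose property names match the JSON keys requested in the ingestion prompt (a minimal sketch):

public sealed class LlmWikiIngestionPayload
{
    public string? SummaryTitle { get; set; }
    public string? SummaryFilename { get; set; }
    public string? SummaryContent { get; set; }
    public string? IndexUpdate { get; set; }
}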

Step 9: Query the Wiki

Once the wiki exists, we can query it.

The app reads:

  • SCHEMA.md
  • Index.md
  • the latest markdown articles

Then it builds a prompt for the model.

public async Task<string> QueryWikiAsync(string question)
{
    await EnsureWikiStructureAsync();

    var schemaContent = await File.ReadAllTextAsync(_schemaFilePath);
    var indexContent = await File.ReadAllTextAsync(_indexFilePath);
    var articleContext = await BuildArticleContextAsync(maxArticles: 6, maxCharsPerArticle: 2400);

    var queryPrompt = $"""
        You are an AI assistant answering questions from a local markdown wiki.

        Rules:
        - Use only the provided wiki context.
        - If information is insufficient, explicitly say so.
        - Keep response concise.
        - Cite article filenames used as sources.

        Schema:
        ---
        {schemaContent}
        ---

        Index:
        ---
        {indexContent}
        ---

        Article context:
        ---
        {articleContext}
        ---

        Question: {question}
     """;

    var answer = (await GenerateTextAsync(queryPrompt)).Trim();

    if (string.IsNullOrWhiteSpace(answer))
    {
        return "Insufficient information in the wiki context to answer this question.";
    }

    return answer;
}

This is a wiki-style answer flow, not a vector retrieval flow.
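
BuildArticleContextAsync is not shown above. A simple version, sketched under the assumption that the most recently written articles are the most relevant, just reads the newest files and trims them to the character budget:

private async Task<string> BuildArticleContextAsync(int maxArticles, int maxCharsPerArticle)
{
    var builder = new StringBuilder();

    // Most recently modified articles first.
    var articleFiles = Directory.GetFiles(_articlesDirectory, "*.md")
        .OrderByDescending(File.GetLastWriteTimeUtc)
        .Take(maxArticles);

    foreach (var articleFile in articleFiles)
    {
        var content = await File.ReadAllTextAsync(articleFile);

        // Keep each article within a rough character budget.
        if (content.Length > maxCharsPerArticle)
        {
            content = content[..maxCharsPerArticle];
        }

        builder.AppendLine($"### {Path.GetFileName(articleFile)}");
        builder.AppendLine(content);
        builder.AppendLine();
    }

    return builder.ToString();
}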

Step 10: Wire Everything in Program.cs

The entry point demonstrates the entire POC.

using MyPlaygroundApp.Utils;

class Program
{
    static async Task Main(string[] args)
    {
        Console.WriteLine("Open-Source LLM Wiki Demo (.NET + Ollama)");
        Console.WriteLine("------------------------------------------");
        Console.WriteLine();

        var wikiManager = new LlmWikiManager(modelName: "kimi-k2.6:cloud");
        await wikiManager.EnsureWikiStructureAsync();

        var demoInputDirectory = Path.Combine(AppContext.BaseDirectory, "DemoInput");
        Directory.CreateDirectory(demoInputDirectory);

        var sourceDocumentPath = Path.Combine(demoInputDirectory, "csharp-ai-benefits.txt");
        var sourceDocumentContent = """
            C# is effective for AI solution development due to strong typing, mature tooling, and excellent package management.
            The .NET ecosystem enables production-ready services, background workers, APIs, and cloud integration.
            Developers can integrate local LLMs using Ollama and open-source libraries such as OllamaSharp and Semantic Kernel.
            This approach is useful for private knowledge workflows like a local markdown wiki with ingestion and Q&A.
        """;

        await File.WriteAllTextAsync(sourceDocumentPath, sourceDocumentContent);

        Console.WriteLine("Ingesting source document into local wiki...");
        var ingestionResult = await wikiManager.IngestDocumentAsync(sourceDocumentPath);
        Console.WriteLine($"Article created: {ingestionResult.SummaryFilename}");
        Console.WriteLine($"Wiki location: {wikiManager.WikiDirectory}");
        Console.WriteLine();

        const string question = "What are the benefits of using C# for local LLM wiki development?";
        Console.WriteLine($"Question: {question}");
        var answer = await wikiManager.QueryWikiAsync(question);
        Console.WriteLine();
        Console.WriteLine("Answer:");
        Console.WriteLine(answer);

        Console.WriteLine();
        Console.WriteLine("Demo complete.");
    }
}

This makes a good entry point because you can run it and immediately see the full loop.

Step 11: Run the Demo

From the solution root, run:

dotnet run --project .\MyPlaygroundApp\MyPlaygroundApp.csproj

If Ollama is running and kimi-k2.6:cloud is available, the demo should:

  • create the wiki structure
  • ingest the sample source document
  • write a markdown article
  • update the index
  • answer a question from the wiki context
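
The console output should look roughly like this (illustrative; the answer text will vary from run to run):

Open-Source LLM Wiki Demo (.NET + Ollama)
------------------------------------------

Ingesting source document into local wiki...
Article created: csharp-ai-development.md
Wiki location: ...\bin\Debug\net8.0\LocalWiki

Question: What are the benefits of using C# for local LLM wiki development?

Answer:
C# offers strong typing, mature tooling, and reliable package management, and the .NET ecosystem supports production-ready services with local LLM integration through Ollama and OllamaSharp. (Source: csharp-ai-development.md)

Demo complete.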

Step 12: Show the Generated Wiki Files

Here are example results generated by the demo.

Index.md

# Local Wiki Index

- [C# for AI Development](Articles/csharp-ai-development.md)

SCHEMA.md

# LLM Wiki Schema

## Goal
Maintain a local markdown wiki from source documents.

## Ingestion Workflow
1. Read new source document.
2. Summarize key points with technical accuracy.
3. Create or update one markdown article in `Articles/`.
4. Ensure `Index.md` contains one bullet link per article.
5. Keep filenames kebab-case and `.md`.

## Query Workflow
1. Use only article/index content provided as context.
2. Answer concisely and cite article filenames used.
3. If context is insufficient, say information is insufficient.

csharp-ai-development.md

# C# for AI Development

C# is a strong choice for building AI solutions, offering robust language features and a mature ecosystem.

## Key Advantages

- **Strong Typing & Tooling**: Static typing reduces runtime errors, while mature IDEs and debugging tools improve developer productivity.
- **Package Management**: NuGet provides reliable dependency management for complex projects.
- **Production-Ready Infrastructure**: The .NET ecosystem supports building scalable services, background workers, APIs, and seamless cloud integration.

## Local LLM Integration

Developers can run local large language models via **Ollama**, using open-source libraries such as:

- **OllamaSharp**: A C# client for interacting with Ollama.
- **Semantic Kernel**: Microsoft's SDK for integrating AI services into applications.

## Use Cases

This stack is particularly effective for **private knowledge workflows**, such as local markdown wikis with document ingestion and question-answering capabilities, ensuring data remains on-premise.

These examples make the demo much more concrete and easier to trust.

Why This POC Is Good

This POC is simple, but it teaches many important engineering ideas:

  • local model integration in C#
  • markdown-based AI knowledge design
  • AI-assisted ingestion workflow
  • safe structured output parsing
  • file-based knowledge management
  • query-time synthesis from curated context

It is a very good educational bridge between:

  • plain prompt engineering
  • file-based knowledge systems
  • full RAG architectures

Limitations of This POC

Let's be honest: this demo is useful, but it is still a demo.

Current limitations:

  • it is console-based only
  • article selection is simple and file-based
  • there is no embedding or semantic ranking layer
  • article context size is manually bounded
  • source attribution is prompt-based, not enforced by retrieval metadata
  • large wiki collections may require a hybrid or full RAG architecture later

That is okay. Simplicity is a feature here.

How to Extend It Later

If you want to continue, you can add:

  • command-line arguments for custom ingestion and queries
  • ASP.NET Core Minimal API endpoints
  • Blazor UI for article management
  • stronger JSON schema validation
  • better article ranking before query
  • Semantic Kernel orchestration
  • hybrid wiki + RAG support
  • file watchers for automatic wiki refresh

A nice next-step architecture could be:

  • LLM Wiki for curated stable knowledge
  • RAG for large and dynamic long-tail knowledge

That gives the best of both worlds.

Conclusion

We built a practical LLM Wiki POC in C# using .NET 8, Ollama, OllamaSharp, markdown files, and the kimi-k2.6:cloud model.

More importantly, we learned a design lesson that is easy to forget in modern AI work:

Not every problem needs RAG first.

Sometimes, the best answer is a clean, curated, lovingly maintained wiki.

If your knowledge is small, stable, and important, a markdown-first LLM Wiki can be wonderfully effective. It is easy to understand, easy to debug, easy to version-control, and easy to teach.

Then, when your world grows larger, you can invite RAG to the party.

Happy coding, and may your markdown always stay tidy.

Love C# & AI!
