David Au Yeung

Forget Your RAG: Build Your Own LLM Wiki in C# with Ollama + Kimi (Step‑by‑Step Guide)

Introduction

Happy coding! Today I want to share a practical AI tutorial, .NET style: real code, a simple architecture, and a result you can run on your own machine.

When many developers start building AI knowledge assistants, the first idea is usually RAG. That is a good choice in many cases, but not always the simplest one. Sometimes, your knowledge base is actually small, focused, and stable. In that lovely situation, building a small LLM Wiki can be a more elegant choice.

Let me tell you a short, lovely story.

Imagine a small tea house run by a kind grandmother. Every morning, her grandchildren ask the same questions:

  • Which tea is good for a cold day?
  • Which dessert has nuts?
  • What is the house special today?

Now imagine two ways to help them.

  1. The first way is to hire a fast librarian who runs to a giant archive room every time a child asks a question. That is a bit like RAG.
  2. The second way is to write one beautiful, well-organized family handbook and keep it on the table all the time. That is a bit like an LLM Wiki.

If the tea house menu is small and stable, the handbook is often faster, simpler, and more reliable. But if the tea house becomes a huge restaurant chain with thousands of daily updates, you will eventually want the librarian too.

That is the heart of LLM Wiki vs RAG.

In this tutorial, we will build an open-source local LLM Wiki demo in C# using:

  • .NET 8
  • Ollama
  • OllamaSharp
  • a local markdown-based wiki structure
  • the kimi-k2.6:cloud Ollama model

By the end, you will have:

  • a console-based demo app
  • a LocalWiki folder with markdown files
  • document ingestion logic
  • index maintenance logic
  • wiki question-answering logic
  • a clear understanding of when to use a wiki instead of RAG

What We Are Building

We are building a small local AI workflow like this:

  1. Prepare a source document.
  2. Send that document to an LLM through Ollama.
  3. Ask the LLM to return structured wiki content.
  4. Save the result as markdown in a local wiki folder.
  5. Update Index.md automatically.
  6. Ask questions against the wiki content.
  7. Let the model answer using the wiki instead of the raw source file.

This is not a full vector database pipeline. It is intentionally lighter.

LLM Wiki vs RAG

Before we code, let us understand the design choice.

What is RAG?

Retrieval-Augmented Generation (RAG) improves LLM output by retrieving relevant information from external sources before generating a response. Instead of depending only on training data, the model gets fresh context from documents, databases, or web content.

Common characteristics of RAG:

  • dynamic retrieval at question time
  • usually based on chunking + embeddings + vector search
  • good for large and changing knowledge bases
  • better for source attribution and enterprise-scale search
  • more infrastructure and tuning effort

What is an LLM Wiki?

An LLM Wiki is a curated, structured markdown knowledge base written so the model can reason over it directly. Rather than retrieving many chunks at runtime, you keep important domain knowledge in concise markdown files and pass those files into the model context.

Common characteristics of an LLM Wiki:

  • markdown-first knowledge organization
  • compact, human-readable, and model-friendly
  • great for small and stable domains
  • simple to maintain for focused use cases
  • low setup cost compared with a full RAG pipeline

When should you use a Wiki instead of RAG?

A wiki is a strong choice when:

  • your knowledge base is relatively small
  • your content changes slowly
  • you want to ship fast
  • you prefer markdown and file-based maintenance
  • you want fewer moving parts
  • you are building a demo, prototype, internal assistant, or focused domain tool

RAG is usually better when:

  • you have a very large knowledge base
  • documents change frequently
  • you need deep retrieval across many domains
  • you need richer source attribution and search behavior
  • your content does not fit comfortably into context

A practical rule of thumb

Start with an LLM Wiki if your knowledge is:

  • focused
  • stable
  • not too large
  • curated by humans or AI into clean markdown

Move to RAG when your knowledge becomes:

  • too large
  • too dynamic
  • too fragmented
  • too cross-domain

And yes, a hybrid approach is often best:

  • use a wiki for stable core knowledge
  • use RAG for large or fast-changing content

References and background reading

The ideas in this tutorial align with public explanations of RAG and LLM Wiki tradeoffs, including:

  • Wikipedia on Retrieval-Augmented Generation
  • AWS explanation of RAG and external knowledge grounding
  • MindStudio and 99helpers comparisons of LLM Wiki vs RAG for smaller knowledge bases

Prerequisites

Before we begin, make sure you have the following:

  • Visual Studio or another C# IDE
  • .NET 8 SDK
  • Ollama installed locally
  • the Kimi model available through Ollama
  • a Windows terminal or PowerShell

To use the model from this demo, run:

ollama run kimi-k2.6:cloud

If Ollama is already running, that is enough for the demo.
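
If you want to double-check which models Ollama currently knows about, you can list them first:

ollama list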

Step 1: Create the Console App

If you want to build this from scratch in a new workspace, start with:

dotnet new console -n LLMWikiDemo
cd LLMWikiDemo

In my case, I added the feature to an existing .NET 8 console app, but the same idea applies to a fresh app.

Step 2: Add the NuGet Package

Add OllamaSharp so your C# application can talk to Ollama:

dotnet add package OllamaSharp

In my workspace, the project file already contains:

<PackageReference Include="OllamaSharp" Version="5.4.25" />

Step 3: Define the Architecture

To keep the system architecture simple, we use these parts:

  • Program.cs as the entry point
  • LlmWikiManager.cs as the core service
  • LocalWiki/ as the generated markdown wiki
  • SCHEMA.md to define the wiki rules
  • Index.md as the article catalog
  • Articles/ for wiki pages
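
On disk, the project and the generated wiki look roughly like this (illustrative layout; the LocalWiki folder is created next to the compiled binary at runtime):

MyPlaygroundApp/
  Program.cs
  Utils/
    LlmWikiManager.cs

LocalWiki/              (created under AppContext.BaseDirectory)
  SCHEMA.md
  Index.md
  Articles/
    csharp-ai-development.md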

The flow is:

Source Document
    -> LlmWikiManager.IngestDocumentAsync(...)
    -> Ollama / Kimi model
    -> JSON-like structured response
    -> Markdown article
    -> Index.md update
    -> QueryWikiAsync(...)
    -> Answer from wiki context

Step 4: Create the Wiki Manager

The main logic lives in Utils/LlmWikiManager.cs.

This class is responsible for:

  • creating the wiki folders
  • creating SCHEMA.md and Index.md
  • sending prompts to Ollama
  • parsing the model response
  • writing article markdown
  • reading article markdown back for Q&A

The manager constructor

We configure the Ollama endpoint and default model here:

public LlmWikiManager(
    string ollamaApiUrl = "http://localhost:11434",
    string modelName = "kimi-k2.6:cloud",
    string? wikiRootDirectory = null)
{
    _ollamaClient = new OllamaApiClient(new Uri(ollamaApiUrl))
    {
        SelectedModel = modelName
    };

    _wikiDirectory = wikiRootDirectory ?? Path.Combine(AppContext.BaseDirectory, "LocalWiki");
    _schemaFilePath = Path.Combine(_wikiDirectory, "SCHEMA.md");
    _indexFilePath = Path.Combine(_wikiDirectory, "Index.md");
    _articlesDirectory = Path.Combine(_wikiDirectory, "Articles");
}

This means the wiki is generated near the app runtime folder, and the model defaults to kimi-k2.6:cloud.
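
For the constructor to compile, the class needs matching fields. Here is a minimal skeleton (field names taken from the snippet above; the methods are filled in over the next steps):

using System.Text;
using System.Text.Json;
using OllamaSharp;

public class LlmWikiManager
{
    private readonly OllamaApiClient _ollamaClient;
    private readonly string _wikiDirectory;
    private readonly string _schemaFilePath;
    private readonly string _indexFilePath;
    private readonly string _articlesDirectory;

    // Constructor (above) plus EnsureWikiStructureAsync, IngestDocumentAsync,
    // QueryWikiAsync, and the private helpers shown in the following steps.
}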

Step 5: Bootstrap the Wiki Structure

We want the app to create the wiki structure automatically if it does not exist.

public async Task EnsureWikiStructureAsync()
{
    Directory.CreateDirectory(_wikiDirectory);
    Directory.CreateDirectory(_articlesDirectory);

    if (!File.Exists(_schemaFilePath))
    {
        await File.WriteAllTextAsync(_schemaFilePath, DefaultSchema);
    }

    if (!File.Exists(_indexFilePath))
    {
        var initialIndex = "# Local Wiki Index" + Environment.NewLine + Environment.NewLine;
        await File.WriteAllTextAsync(_indexFilePath, initialIndex);
    }
}

This is one of the reasons the wiki approach feels nice. It is file-based, easy to inspect, and easy to explain.
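
The DefaultSchema constant referenced above is just a markdown string; a trimmed-down sketch looks like this (the full generated SCHEMA.md appears in Step 12):

private const string DefaultSchema = """
    # LLM Wiki Schema

    ## Ingestion Workflow
    1. Read new source document.
    2. Summarize key points with technical accuracy.
    3. Create or update one markdown article in `Articles/`.
    4. Ensure `Index.md` contains one bullet link per article.
    5. Keep filenames kebab-case and `.md`.

    ## Query Workflow
    1. Use only article/index content provided as context.
    2. Answer concisely and cite article filenames used.
    3. If context is insufficient, say information is insufficient.
    """;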

Step 6: Stream Text from Ollama

OllamaSharp returns generation results as an async stream, so we collect the response text like this:

private async Task<string> GenerateTextAsync(string prompt)
{
    var builder = new StringBuilder();

    await foreach (var chunk in _ollamaClient.GenerateAsync(prompt))
    {
        if (!string.IsNullOrWhiteSpace(chunk?.Response))
        {
            builder.Append(chunk.Response);
        }
    }

    return builder.ToString();
}

This is a small but important detail. If you try to await the async sequence as if it were a single Task, the build will fail; you must consume it with await foreach.

Step 7: Ingest a Source Document

Now comes the most interesting part.

When a source document arrives, we:

  1. read the file
  2. read the schema
  3. ask the model to return structured wiki data
  4. parse the result
  5. create a markdown article
  6. update Index.md

The ingestion prompt

The ingestion prompt in LlmWikiManager.cs looks like this:

var ingestionPrompt = $"""
    You are an AI assistant maintaining a markdown wiki.

    Schema:
    ---
    {schemaContent}
    ---

    Process this source document and respond with ONLY one JSON object using these keys:
    - summaryTitle: concise title for the article
    - summaryFilename: kebab-case markdown filename
    - summaryContent: full markdown article body
    - indexUpdate: one markdown bullet linking article from Index.md

    Source filename: {sourceFileName}
    Source document:
    ---
    {sourceContent}
    ---
 """;

This is a practical pattern:

  • tell the model who it is
  • give it the wiki rules
  • give it the source content
  • enforce a structured output format
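
With this prompt, a well-behaved response from the model looks roughly like this (illustrative values, matching the sample run shown later in this article):

{
  "summaryTitle": "C# for AI Development",
  "summaryFilename": "csharp-ai-development.md",
  "summaryContent": "# C# for AI Development\n\nC# is a strong choice for building AI solutions...",
  "indexUpdate": "- [C# for AI Development](Articles/csharp-ai-development.md)"
}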

The ingestion method

A simplified view of the ingestion code:

public async Task<LlmWikiIngestionResult> IngestDocumentAsync(string sourceFilePath)
{
    await EnsureWikiStructureAsync();

    var sourceContent = await File.ReadAllTextAsync(sourceFilePath);
    var schemaContent = await File.ReadAllTextAsync(_schemaFilePath);
    var sourceFileName = Path.GetFileName(sourceFilePath);

    // Build ingestionPrompt (shown above) from schemaContent, sourceFileName, and sourceContent.
    var responseText = await GenerateTextAsync(ingestionPrompt);
    var parsed = TryParseIngestionPayload(responseText);

    var summaryTitle = !string.IsNullOrWhiteSpace(parsed?.SummaryTitle)
        ? parsed!.SummaryTitle!
        : $"Summary of {Path.GetFileNameWithoutExtension(sourceFileName)}";

    var summaryFilename = NormalizeSummaryFilename(parsed?.SummaryFilename ?? string.Empty, summaryTitle);

    var summaryContent = !string.IsNullOrWhiteSpace(parsed?.SummaryContent)
        ? parsed!.SummaryContent!
        : BuildFallbackSummaryContent(sourceFileName, sourceContent);

    var summaryFilePath = Path.Combine(_articlesDirectory, summaryFilename);
    await File.WriteAllTextAsync(summaryFilePath, summaryContent);

    var normalizedRelativeArticlePath = $"Articles/{summaryFilename}";
    var indexUpdate = !string.IsNullOrWhiteSpace(parsed?.IndexUpdate)
        ? parsed!.IndexUpdate!.Trim()
        : $"- [{summaryTitle}]({normalizedRelativeArticlePath})";

    // Append indexUpdate to Index.md if it is not already listed, then return an LlmWikiIngestionResult.
}

Why fallback logic matters

Real LLMs are helpful, but sometimes they do not follow the JSON contract perfectly.

That is why the demo includes:

  • JSON extraction from mixed text
  • deserialization attempts
  • fallback summary generation
  • filename normalization
  • index deduplication

This makes the POC stronger and more production-minded than a fragile one-shot prompt demo.
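
As an example of that defensiveness, the filename normalization can be as small as the following sketch (one possible implementation, not necessarily the exact code in the repository; it assumes System.Linq is imported):

private static string NormalizeSummaryFilename(string proposedFilename, string fallbackTitle)
{
    // Prefer the model's suggestion; otherwise derive a name from the title.
    var candidate = string.IsNullOrWhiteSpace(proposedFilename) ? fallbackTitle : proposedFilename;

    // Lowercase, swap spaces/underscores for hyphens, drop anything unexpected.
    var cleaned = new string(candidate
        .Trim()
        .ToLowerInvariant()
        .Replace(' ', '-')
        .Replace('_', '-')
        .Where(c => char.IsLetterOrDigit(c) || c is '-' or '.')
        .ToArray());

    // Guarantee a markdown extension.
    return cleaned.EndsWith(".md", StringComparison.Ordinal) ? cleaned : cleaned + ".md";
}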

Step 8: Parse the LLM Response Safely

Structured AI output is never perfect, so the code extracts the JSON object defensively.

private static string? ExtractJsonObject(string text)
{
    if (string.IsNullOrWhiteSpace(text))
    {
        return null;
    }

    var start = text.IndexOf('{');
    var end = text.LastIndexOf('}');

    if (start < 0 || end <= start)
    {
        return null;
    }

    return text[start..(end + 1)];
}

Then we try to deserialize it:

private static LlmWikiIngestionPayload? TryParseIngestionPayload(string responseText)
{
    var rawJson = ExtractJsonObject(responseText);
    if (rawJson is null)
    {
        return null;
    }

    try
    {
        return JsonSerializer.Deserialize<LlmWikiIngestionPayload>(rawJson, new JsonSerializerOptions
        {
            PropertyNameCaseInsensitive = true
        });
    }
    catch
    {
        return null;
    }
}

The lesson here is worth repeating: treat LLM output like user input. Trust it carefully, never blindly.
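
For reference, the payload type can be a plain DTO whose property names match the JSON keys requested in the ingestion prompt (a minimal sketch):

public sealed class LlmWikiIngestionPayload
{
    public string? SummaryTitle { get; set; }
    public string? SummaryFilename { get; set; }
    public string? SummaryContent { get; set; }
    public string? IndexUpdate { get; set; }
}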

Step 9: Query the Wiki

Once the wiki exists, we can query it.

The app reads:

  • SCHEMA.md
  • Index.md
  • the latest markdown articles

Then it builds a prompt for the model.

public async Task<string> QueryWikiAsync(string question)
{
    await EnsureWikiStructureAsync();

    var schemaContent = await File.ReadAllTextAsync(_schemaFilePath);
    var indexContent = await File.ReadAllTextAsync(_indexFilePath);
    var articleContext = await BuildArticleContextAsync(maxArticles: 6, maxCharsPerArticle: 2400);

    var queryPrompt = $"""
        You are an AI assistant answering questions from a local markdown wiki.

        Rules:
        - Use only the provided wiki context.
        - If information is insufficient, explicitly say so.
        - Keep response concise.
        - Cite article filenames used as sources.

        Schema:
        ---
        {schemaContent}
        ---

        Index:
        ---
        {indexContent}
        ---

        Article context:
        ---
        {articleContext}
        ---

        Question: {question}
     """;

    var answer = (await GenerateTextAsync(queryPrompt)).Trim();

    if (string.IsNullOrWhiteSpace(answer))
    {
        return "Insufficient information in the wiki context to answer this question.";
    }

    return answer;
}

This is a wiki-style answer flow, not a vector retrieval flow.
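
BuildArticleContextAsync is not shown above. A simple version, sketched under the assumption that the most recently written articles are the most relevant, just reads the newest files and trims them to the character budget:

private async Task<string> BuildArticleContextAsync(int maxArticles, int maxCharsPerArticle)
{
    var builder = new StringBuilder();

    // Most recently modified articles first.
    var articleFiles = Directory.GetFiles(_articlesDirectory, "*.md")
        .OrderByDescending(File.GetLastWriteTimeUtc)
        .Take(maxArticles);

    foreach (var articleFile in articleFiles)
    {
        var content = await File.ReadAllTextAsync(articleFile);

        // Keep each article within a rough character budget.
        if (content.Length > maxCharsPerArticle)
        {
            content = content[..maxCharsPerArticle];
        }

        builder.AppendLine($"### {Path.GetFileName(articleFile)}");
        builder.AppendLine(content);
        builder.AppendLine();
    }

    return builder.ToString();
}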

Step 10: Wire Everything in Program.cs

The entry point demonstrates the entire POC.

using MyPlaygroundApp.Utils;

class Program
{
    static async Task Main(string[] args)
    {
        Console.WriteLine("Open-Source LLM Wiki Demo (.NET + Ollama)");
        Console.WriteLine("------------------------------------------");
        Console.WriteLine();

        var wikiManager = new LlmWikiManager(modelName: "kimi-k2.6:cloud");
        await wikiManager.EnsureWikiStructureAsync();

        var demoInputDirectory = Path.Combine(AppContext.BaseDirectory, "DemoInput");
        Directory.CreateDirectory(demoInputDirectory);

        var sourceDocumentPath = Path.Combine(demoInputDirectory, "csharp-ai-benefits.txt");
        var sourceDocumentContent = """
            C# is effective for AI solution development due to strong typing, mature tooling, and excellent package management.
            The .NET ecosystem enables production-ready services, background workers, APIs, and cloud integration.
            Developers can integrate local LLMs using Ollama and open-source libraries such as OllamaSharp and Semantic Kernel.
            This approach is useful for private knowledge workflows like a local markdown wiki with ingestion and Q&A.
        """;

        await File.WriteAllTextAsync(sourceDocumentPath, sourceDocumentContent);

        Console.WriteLine("Ingesting source document into local wiki...");
        var ingestionResult = await wikiManager.IngestDocumentAsync(sourceDocumentPath);
        Console.WriteLine($"Article created: {ingestionResult.SummaryFilename}");
        Console.WriteLine($"Wiki location: {wikiManager.WikiDirectory}");
        Console.WriteLine();

        const string question = "What are the benefits of using C# for local LLM wiki development?";
        Console.WriteLine($"Question: {question}");
        var answer = await wikiManager.QueryWikiAsync(question);
        Console.WriteLine();
        Console.WriteLine("Answer:");
        Console.WriteLine(answer);

        Console.WriteLine();
        Console.WriteLine("Demo complete.");
    }
}

This makes a good entry point because you can run it and immediately see the full loop.

Step 11: Run the Demo

From the solution root, run:

dotnet run --project .\MyPlaygroundApp\MyPlaygroundApp.csproj

If Ollama is running and kimi-k2.6:cloud is available, the demo should:

  • create the wiki structure
  • ingest the sample source document
  • write a markdown article
  • update the index
  • answer a question from the wiki context
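
The console output should look roughly like this (illustrative; the answer text will vary from run to run):

Open-Source LLM Wiki Demo (.NET + Ollama)
------------------------------------------

Ingesting source document into local wiki...
Article created: csharp-ai-development.md
Wiki location: ...\bin\Debug\net8.0\LocalWiki

Question: What are the benefits of using C# for local LLM wiki development?

Answer:
C# offers strong typing, mature tooling, and reliable package management, and the .NET ecosystem supports production-ready services with local LLM integration through Ollama and OllamaSharp. (Source: csharp-ai-development.md)

Demo complete.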

Step 12: Show the Generated Wiki Files

Here are example results generated by the demo.

Index.md

# Local Wiki Index

- [C# for AI Development](Articles/csharp-ai-development.md)

SCHEMA.md

# LLM Wiki Schema

## Goal
Maintain a local markdown wiki from source documents.

## Ingestion Workflow
1. Read new source document.
2. Summarize key points with technical accuracy.
3. Create or update one markdown article in `Articles/`.
4. Ensure `Index.md` contains one bullet link per article.
5. Keep filenames kebab-case and `.md`.

## Query Workflow
1. Use only article/index content provided as context.
2. Answer concisely and cite article filenames used.
3. If context is insufficient, say information is insufficient.

csharp-ai-development.md

# C# for AI Development

C# is a strong choice for building AI solutions, offering robust language features and a mature ecosystem.

## Key Advantages

- **Strong Typing & Tooling**: Static typing reduces runtime errors, while mature IDEs and debugging tools improve developer productivity.
- **Package Management**: NuGet provides reliable dependency management for complex projects.
- **Production-Ready Infrastructure**: The .NET ecosystem supports building scalable services, background workers, APIs, and seamless cloud integration.

## Local LLM Integration

Developers can run local large language models via **Ollama**, using open-source libraries such as:

- **OllamaSharp**: A C# client for interacting with Ollama.
- **Semantic Kernel**: Microsoft's SDK for integrating AI services into applications.

## Use Cases

This stack is particularly effective for **private knowledge workflows**, such as local markdown wikis with document ingestion and question-answering capabilities, ensuring data remains on-premise.

These examples make the demo much more concrete and easier to trust.

Why This POC Is Good

This POC is simple, but it teaches many important engineering ideas:

  • local model integration in C#
  • markdown-based AI knowledge design
  • AI-assisted ingestion workflow
  • safe structured output parsing
  • file-based knowledge management
  • query-time synthesis from curated context

It is a very good educational bridge between:

  • plain prompt engineering
  • file-based knowledge systems
  • full RAG architectures

Limitations of This POC

Let's be honest: this demo is useful, but it is still a demo.

Current limitations:

  • it is console-based only
  • article selection is simple and file-based
  • there is no embedding or semantic ranking layer
  • article context size is manually bounded
  • source attribution is prompt-based, not enforced by retrieval metadata
  • large wiki collections may require a hybrid or full RAG architecture later

That is okay. Simplicity is a feature here.

How to Extend It Later

If you want to continue, you can add:

  • command-line arguments for custom ingestion and queries
  • ASP.NET Core Minimal API endpoints
  • Blazor UI for article management
  • stronger JSON schema validation
  • better article ranking before query
  • Semantic Kernel orchestration
  • hybrid wiki + RAG support
  • file watchers for automatic wiki refresh

A nice next-step architecture could be:

  • LLM Wiki for curated stable knowledge
  • RAG for large and dynamic long-tail knowledge

That gives the best of both worlds.

Conclusion

We built a practical LLM Wiki POC in C# using .NET 8, Ollama, OllamaSharp, markdown files, and the kimi-k2.6:cloud model.

More importantly, we learned a design lesson that is easy to forget in modern AI work:

Not every problem needs RAG first.

Sometimes, the best answer is a clean, curated, lovingly maintained wiki.

If your knowledge is small, stable, and important, a markdown-first LLM Wiki can be wonderfully effective. It is easy to understand, easy to debug, easy to version-control, and easy to teach.

Then, when your world grows larger, you can invite RAG to the party.

Happy coding, and may your markdown always stay tidy.

Love C# & AI!
