DEV Community

Richard Petty
Building a Local AI Research Agent in C# — From Zero to Autonomous Research

How I built an AI agent that searches the web, reads pages, and writes research reports — all running on your machine with no cloud API keys required.


If you've used ChatGPT or Claude for research, you know the drill: copy-paste URLs, summarize this, compare that. What if your AI could just... do the research itself?

That's what I built. Axiom is a local AI research agent written in C# that:

  • 🔍 Generates search queries from your topic
  • 🌐 Searches the web (via Brave Search API)
  • 📄 Fetches and reads web pages
  • 🧠 Analyzes content for relevant findings
  • 📝 Synthesizes everything into a structured report

All running locally with Ollama. No cloud AI APIs. No data leaving your machine.

The Architecture

```
You: "Research persistent memory systems for AI agents"
  ↓
Axiom generates 5-8 search queries
  ↓
Searches Brave API → finds 10-15 sources
  ↓
Fetches top sources, deduplicates by domain
  ↓
Analyzes each page for relevant findings
  ↓
Synthesizes findings into a structured report
  ↓
Saves report as markdown + stores in memory
```

Tech Stack

  • C# / .NET 8 — Fast, typed, great tooling
  • Ollama — Local LLM inference (llama3.1 8B)
  • SQLite — Memory storage with semantic search
  • Brave Search API — Web search (free tier: 2000 queries/month)

Why C# Instead of Python?

Everyone builds AI agents in Python. That's fine. But C# has real advantages:

  1. Better tooling — Visual Studio / Rider, strong typing, refactoring
  2. Easier deployment — Single binary, no virtualenv hell
  3. Performance — Faster startup, lower memory
  4. Underserved market — .NET devs want AI tools too

The AI agent space is dominated by LangChain and LlamaIndex, both Python. There's a real gap for .NET developers.

Key Design Decisions

Tool System

Every capability implements an ITool interface:

```csharp
public interface ITool
{
    string Id { get; }
    string Name { get; }
    string Description { get; }
    string ParametersSchema { get; }
    Task<string> ExecuteAsync(string parameters, CancellationToken ct);
}
```
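To make the interface concrete, here's a hypothetical tool implementation — the class name, schema, and JSON parsing are my own illustration, not code from the Axiom repo:

```csharp
// Hypothetical example tool: fetches a URL and returns its raw content.
public sealed class FetchPageTool : ITool
{
    private static readonly HttpClient Http = new();

    public string Id => "fetch_page";
    public string Name => "Fetch Page";
    public string Description => "Downloads a URL and returns its raw content.";
    public string ParametersSchema =>
        """{"type":"object","properties":{"url":{"type":"string"}},"required":["url"]}""";

    public async Task<string> ExecuteAsync(string parameters, CancellationToken ct)
    {
        // Parameters arrive as a JSON string produced by the LLM's tool call
        using var doc = System.Text.Json.JsonDocument.Parse(parameters);
        var url = doc.RootElement.GetProperty("url").GetString()!;
        return await Http.GetStringAsync(url, ct);
    }
}
```

The `ParametersSchema` is a JSON Schema string the LLM sees when deciding what arguments to pass.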

The LLM decides which tools to call. The agent orchestrator handles the loop:

```
User message → LLM → Tool call? → Execute tool → Feed result back → Repeat
```
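That loop can be sketched in a few lines of C#. The chat client, message type, and tool-call shape here are assumptions for illustration, not Axiom's actual API:

```csharp
// Minimal sketch of the orchestration loop. _llm, ChatMessage, and the
// ToolCall shape are illustrative assumptions.
public async Task<string> RunAsync(string userMessage, CancellationToken ct)
{
    _messages.Add(new ChatMessage("user", userMessage));

    while (true)
    {
        var reply = await _llm.ChatAsync(_messages, _tools, ct);

        // No tool call? The model's text is the final answer.
        if (reply.ToolCall is null)
            return reply.Content;

        // Otherwise execute the requested tool and feed the result back.
        var tool = _tools.First(t => t.Id == reply.ToolCall.ToolId);
        var result = await tool.ExecuteAsync(reply.ToolCall.Arguments, ct);
        _messages.Add(new ChatMessage("tool", result));
    }
}
```

In practice you'd also cap the number of iterations so a confused model can't loop forever.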

Memory with Semantic Search

Instead of a vector database (ChromaDB, FAISS), I used SQLite with embeddings stored as BLOBs:

```csharp
public async Task StoreAsync(string content, string type, float[] embedding)
{
    // Store embedding as a raw byte array (4 bytes per float)
    var blob = new byte[embedding.Length * sizeof(float)];
    Buffer.BlockCopy(embedding, 0, blob, 0, blob.Length);

    // Microsoft.Data.Sqlite assumed here; _connection is an open SqliteConnection
    using var cmd = _connection.CreateCommand();
    cmd.CommandText = "INSERT INTO memories (content, type, embedding, timestamp) " +
                      "VALUES ($content, $type, $embedding, strftime('%s','now'))";
    cmd.Parameters.AddWithValue("$content", content);
    cmd.Parameters.AddWithValue("$type", type);
    cmd.Parameters.AddWithValue("$embedding", blob);
    await cmd.ExecuteNonQueryAsync();
}
```

Cosine similarity search loads embeddings into memory. Works great for thousands of memories — you don't need a vector DB for personal-scale data.
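The search side is just two small helpers — decode each BLOB back into floats, then rank by cosine similarity. This is a sketch of the approach, not the repo's exact code:

```csharp
// Decoding the BLOB is the inverse of the store step above
public static float[] ToFloats(byte[] blob)
{
    var floats = new float[blob.Length / sizeof(float)];
    Buffer.BlockCopy(blob, 0, floats, 0, blob.Length);
    return floats;
}

// Plain cosine similarity; the epsilon guards against zero-magnitude vectors
public static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB) + 1e-10f);
}
```

A linear scan over a few thousand 4KB embeddings takes milliseconds, which is why the vector DB isn't earning its complexity here.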

Research Runner

The autonomous research mode (ResearchRunner) orchestrates the full pipeline:

  1. Query generation — Ask the LLM to generate diverse search queries
  2. Search — Hit Brave API with each query, collect URLs
  3. Dedup — Remove duplicate domains (max 2 per domain)
  4. Fetch — Download and extract text from top sources
  5. Analyze — Ask the LLM to extract relevant findings from each page
  6. Synthesize — Combine all findings into a structured report

The whole thing runs in ~15 minutes on a Ryzen 5 5500 with CPU-only inference (8B model).
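The six steps above map to a pipeline like this. Method names and helper types are illustrative assumptions, not the actual `ResearchRunner` API:

```csharp
// Rough sketch of the research pipeline; helpers are assumed, not Axiom's API
public async Task<string> ResearchAsync(string topic, CancellationToken ct)
{
    var queries = await GenerateQueriesAsync(topic, ct);        // 1. 5-8 diverse queries

    var urls = new List<string>();
    foreach (var q in queries)
        urls.AddRange(await _braveSearch.SearchAsync(q, ct));   // 2. collect URLs

    var deduped = urls
        .GroupBy(u => new Uri(u).Host)                          // 3. max 2 per domain
        .SelectMany(g => g.Take(2))
        .ToList();

    var findings = new List<string>();
    foreach (var url in deduped)
    {
        var text = await FetchAndExtractAsync(url, ct);         // 4. fetch + extract text
        findings.Add(await AnalyzeAsync(topic, text, ct));      // 5. extract findings
    }

    return await SynthesizeAsync(topic, findings, ct);          // 6. structured report
}
```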

What I Learned

Small models need guardrails. The 3B model was unreliable for tool calling — it would generate malformed JSON or call non-existent tools. The 8B model is dramatically better. Still not perfect, but usable.

Truncation matters. When synthesizing 8+ findings, the total text can exceed the model's context window. I added per-finding truncation (1500 chars) and a total cap (12K chars). Without this, the model either hallucinates or returns empty responses.
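The limits above (1500 chars per finding, 12K total) can be enforced with a small guard like this — the helper itself is my illustration, not the repo's code:

```csharp
// Sketch of the truncation guard: cap each finding, then cap the total
// budget so the synthesis prompt never overruns the context window.
const int PerFindingCap = 1_500;
const int TotalCap = 12_000;

static string BuildSynthesisContext(IEnumerable<string> findings)
{
    var sb = new StringBuilder();
    foreach (var f in findings)
    {
        var trimmed = f.Length > PerFindingCap ? f[..PerFindingCap] : f;
        if (sb.Length + trimmed.Length > TotalCap)
            break; // total budget exhausted — drop remaining findings
        sb.AppendLine(trimmed).AppendLine();
    }
    return sb.ToString();
}
```

Dropping the tail findings outright is crude; ranking findings by relevance before truncating would keep the best material inside the budget.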

Research quality scales with sources. More search queries → more diverse sources → better findings → better synthesis. I settled on 5-8 queries per topic as a sweet spot.

Try It Yourself

The full source code is on GitHub:

🔗 DynamicCSharp/hex-dynamics — Axiom Research Agent

Or start simpler with our starter kit:

🔗 DynamicCSharp/agentkit — Build your own AI agent in C#

Quick Start

```shell
git clone https://github.com/DynamicCSharp/hex-dynamics.git
cd hex-dynamics
# Make sure Ollama is running with llama3.1:8b
dotnet run --project src/Axiom.CLI
```

What's Next

  • Web UI for dispatching research from a browser (already built, included in repo)
  • Sub-agent spawning — let the research agent delegate sub-tasks
  • Better models — Testing with Mistral, Phi-3, and Qwen2.5 as they improve
  • Memory across sessions — Persistent knowledge that builds over time
  • Multi-model pipelines — Use fast models for extraction, smart models for synthesis

Built by Hex Dynamics — we're building AI tools for developers who want to run everything locally.

If this is useful, give us a ⭐ on GitHub. It helps more than you'd think.
