DEV Community

Ted Murray


I Built an Agentic Infrastructure Platform in 42 Days. I'm a Windows Sysadmin.

I want to tell you something that still surprises me.

On February 3rd, 2026, I paid for my first AI subscription. I'm 42, a Windows systems administrator for 15+ years, and my GitHub history before this year is mostly simple bash and PowerShell scripts I wrote for myself. I have a 2-year associate degree — enough to clear the HR checkbox, not enough to impress anyone in a developer room.

42 days later, I had built what I now understand is called agentic infrastructure — a three-layer platform where Claude AI agents have persistent memory across sessions, coordinate with each other through structured handoffs, enforce their own filesystem permissions, and run nightly pipelines that distill knowledge from every session into a growing, searchable knowledge base.

I didn't plan this. I didn't follow a tutorial. I didn't know the term "context engineering" when I started. I just had a homelab, a problem, and a new tool that turned out to be more powerful than I realized.

This is the story of how that happened.


It Started With a Backup Script

My homelab is a small fleet of servers — an Unraid NAS running 77+ Docker containers, a TrueNAS backup server, a Debian test box, and a dedicated AI workstation I call claudebox. I manage it the way most homelabbers do: scripts, wikis, too many browser tabs, and institutional memory that lives entirely in my head.

The first thing I used Claude for was writing a backup script. Nothing revolutionary — I wanted something that would stop my Docker containers, rsync the appdata, restart them, and notify me if anything went wrong.

Claude wrote it in minutes. Tested it. Fixed an edge case I hadn't thought of. Done.
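
The shape of that script matters more than the script itself: stop first so nothing writes mid-copy, sync, then restart no matter what happened. Here's a minimal Python sketch of that sequencing — the container names and paths are placeholders, not the real script from my setup:

```python
import subprocess

CONTAINERS = ["nextcloud", "postgres"]   # placeholder container names
SRC = "/mnt/user/appdata/"               # placeholder source path
DEST = "/mnt/backup/appdata/"            # placeholder destination

def backup():
    """Stop containers, rsync a quiesced copy, then always restart."""
    try:
        for c in CONTAINERS:
            # quiesce writes before copying anything
            subprocess.run(["docker", "stop", c], check=True)
        # -a preserves perms and mtimes; --delete keeps the mirror exact
        subprocess.run(["rsync", "-a", "--delete", SRC, DEST], check=True)
    finally:
        for c in CONTAINERS:
            # restart even if the stop or rsync step failed
            subprocess.run(["docker", "start", c], check=True)
```

The `finally` block is the part I cared about: a failed backup should never leave the services down.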

That was supposed to be the end of it.

Around that same time I was watching YouTube videos about OpenClaw — an open-source AI assistant that had been getting attention in the homelab community. I understood the general idea: an AI agent you could run yourself, in your own environment. But I didn't actually know what an "agent" was in any technical sense. I just knew that what I saw in those videos wasn't quite what I wanted.

So I did something that turned out to be the key decision of this whole project: instead of installing OpenClaw, I asked Claude what I actually needed.

I described my setup. My goals. What I was frustrated with. We brainstormed. Claude walked me through the real options — what OpenClaw does, what LibreChat does, what MCP is, what Claude Desktop with the right integrations could become. By the end of that conversation I had a clearer picture of what I was trying to build than I would have gotten from any tutorial.

That brainstorm session became the blueprint. Not a copy of someone else's setup. Mine.

I started with Claude Desktop on a dedicated mini PC, remote access via Guacamole so I could reach it from anywhere, and a handful of MCP servers to give Claude real infrastructure access. That first working version was already more useful than anything I'd seen in those YouTube videos.

Then I thought: what if Claude could just... know my setup? Not just during this session — persistently. What if instead of copying context into every conversation, it had memory? What if it could query my monitoring dashboards, read my Docker configs, check on running services, and remember what we decided last week?

That question turned into six weeks of building.


What I Built

The project is called homelab-agent and it's open source on GitHub. Let me describe what it actually is, because "homelab assistant" undersells it.

Layer 1: Claude With Real Infrastructure Access

The foundation is Claude Desktop with MCP (Model Context Protocol) servers — structured tool integrations that give Claude direct, programmatic access to infrastructure instead of copy-paste workflows.

I connected Claude to Netdata (real-time system metrics), Grafana (dashboards and alerts), my Unraid and TrueNAS APIs, GitHub, and a custom HTTP server I wrote that handles shell commands, file reads, and process management on the host. I also added SearXNG — a self-hosted meta-search engine — so Claude can search the web without calling home to Google.
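
To give a feel for what that custom HTTP server does, here's a toy sketch of the idea — two JSON endpoints, one for shell commands and one for file reads. This is not the actual server from the repo: it has no auth, should only ever bind to localhost, and the endpoint names are made up for illustration.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class ToolHandler(BaseHTTPRequestHandler):
    """Minimal JSON-over-HTTP tool endpoint (illustrative only, no auth)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        req = json.loads(self.rfile.read(length))
        if self.path == "/run":
            # run a shell command and return its output and exit code
            out = subprocess.run(req["cmd"], shell=True,
                                 capture_output=True, text=True)
            body = {"stdout": out.stdout, "stderr": out.stderr,
                    "code": out.returncode}
        elif self.path == "/read":
            # return a file's contents
            with open(req["path"]) as f:
                body = {"content": f.read()}
        else:
            self.send_error(404)
            return
        data = json.dumps(body).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

# HTTPServer(("127.0.0.1", 8777), ToolHandler).serve_forever()
```

Wrap something like this in an MCP tool definition and Claude can call it directly instead of asking you to paste command output.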

The result: Claude stops being a chatbot you explain things to and becomes an operator that already knows your setup.

Layer 2: A Self-Hosted AI Platform

On top of that foundation, I run a Docker-based service stack: LibreChat (a self-hosted multi-provider chat UI), Authelia for SSO, SWAG as the reverse proxy, and observability tooling (Grafana, InfluxDB, Loki). This gives household access to multiple AI providers through a single interface with one login — not just my Claude Desktop session.

Layer 3: The Multi-Agent Engine

This is where things got interesting.

I use Claude Code — Anthropic's AI coding tool — as a multi-agent platform. Different "agents" handle different domains: one for homelab operations, one for development, one for research, one for memory management. Each agent has its own context file (CLAUDE.md) that scopes what it knows and what it's allowed to do.

And each agent has memory.


The Memory Problem

Here's the thing about AI agents: they're stateless by default. Every new conversation starts from zero. You re-explain your setup, your preferences, what you decided last week. It's like having a brilliant contractor who forgets everything between visits.

I designed a four-tier memory system to solve this:

Session tier — Every conversation is automatically summarized to disk. Semantic search makes past sessions retrievable.

Working tier — Agents promote important decisions and findings to structured markdown files with YAML frontmatter: creation date, expiry (90 days), tags, tier classification.

Distilled tier — A headless Claude Code agent runs every night at 4 AM. It reviews working memory notes, applies a "would this matter in 3 months?" filter, and commits qualifying entries to a git-backed permanent knowledge base.

Core context — A 40-line always-visible block injected at every session start via a hook. User profile, active projects, key constraints, recent decisions. Never scrolls out of context.

Knowledge flows upward through tiers automatically. By Monday morning, Claude already knows about Friday's Docker stack change and Saturday's monitoring alert. There's no manual curation required.
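
To make the working tier concrete, here's a sketch of what writing one of those notes looks like. The frontmatter field names are illustrative — the real schema lives in the repo — but the shape (creation date, 90-day expiry, tags, tier) is the point:

```python
from datetime import date, timedelta
from pathlib import Path

def write_working_note(title, body, tags, memory_dir="memory/working"):
    """Persist a working-tier memory note as markdown with YAML frontmatter."""
    today = date.today()
    frontmatter = "\n".join([
        "---",
        f"created: {today.isoformat()}",
        f"expires: {(today + timedelta(days=90)).isoformat()}",  # 90-day expiry
        f"tags: [{', '.join(tags)}]",
        "tier: working",
        "---",
    ])
    slug = title.lower().replace(" ", "-")
    path = Path(memory_dir) / f"{today.isoformat()}-{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{frontmatter}\n\n# {title}\n\n{body}\n")
    return path
```

Because every note is a dated, tagged markdown file on disk, the nightly distillation agent can walk the directory, check expiry, and decide what graduates to the permanent knowledge base.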

I found out later that the ICLR 2026 MemAgents workshop — a machine learning research conference — was specifically organized around this problem: "principled memory substrates for agentic systems." Academics wrote papers about it. I had accidentally built a working implementation.


The Permission Problem

Once Claude has filesystem write access, you need to think carefully about what it can touch.

I built the Agent Workspace Protocol: declarative AGENT_WORKSPACE.md marker files at seven filesystem roots that define what access is allowed. Each agent also has a manifest declaring what it claims to need. An edit can only proceed if both the workspace marker and the agent manifest agree — stricter of the two wins.
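
The "stricter of the two wins" rule is simple to state and simple to implement. A minimal sketch, assuming three ordered access levels (the real protocol's level names may differ):

```python
# Hypothetical permission levels, ordered least to most permissive.
LEVELS = ["none", "read", "write"]

def effective_access(workspace_grant: str, agent_claim: str) -> str:
    """Two-party model: the stricter of the workspace marker's grant
    and the agent manifest's claim wins."""
    return min(workspace_grant, agent_claim, key=LEVELS.index)

def can_edit(workspace_grant: str, agent_claim: str) -> bool:
    """An edit proceeds only when both parties agree on write access."""
    return effective_access(workspace_grant, agent_claim) == "write"
```

So `can_edit("write", "write")` is `True`, but `can_edit("write", "read")` is `False` — a workspace can't grant more than an agent claims, and an agent can't claim more than a workspace grants.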

An hourly background job (Python script, PM2 cron) validates all markers, auto-commits any tracking drift in git-backed directories, cross-references manifests against markers for conflicts, and emits structured security events to InfluxDB and Loki tagged with CIA-triad classifications (confidentiality, integrity, availability).

There's also rogue agent detection wired in — disabled while it calibrates a baseline from two weeks of normal operation, then it'll flag agents that suddenly start touching paths they've never touched before.
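
The core of that detection is just set difference against the baseline. A sketch of the idea (the real implementation adds severity scoring and event emission):

```python
def new_paths(baseline: dict, observed: dict) -> dict:
    """Flag paths an agent touched that never appeared during the
    baseline period. Both args map agent name -> set of touched paths."""
    flags = {}
    for agent, paths in observed.items():
        novel = paths - baseline.get(agent, set())
        if novel:
            flags[agent] = novel
    return flags
```

An agent that has only ever touched `/etc/docker` suddenly writing under `~/.ssh` shows up immediately — and an agent with no baseline at all gets everything flagged, which is exactly what you want for something new.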

I'm familiar with identity-based access control from years of managing Office 365, Entra ID, and Azure — scoped roles, least-privilege policies, who can touch what. I applied that same thinking here. I haven't seen this filesystem-level two-party model in any comparable AI project. It emerged because I thought about what could go wrong.


The Self-Healing Documentation Problem

Infrastructure docs rot. Services get added, configs get changed, and the documentation lags behind until it's actively misleading.

I have a doc-health agent that runs weekly (full scan, Claude Opus) and nightly (delta scan, Claude Sonnet). It checks for drift between docs and reality, coverage gaps for new services, stale references to changed infrastructure, and leaked internal IPs or API keys. It auto-commits mechanical fixes (index entries) and surfaces everything else as a report.

The interesting part is the feedback loop: when agents modify infrastructure, they append to a daily-touched-files.json tracker. When a writer agent runs, it updates docs for changed components and triggers a targeted re-scan to confirm its own work. The nightly scan catches anything remaining and resets the tracker.
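
The tracker itself is nothing fancy — an append-on-touch, drain-on-scan JSON file. A minimal sketch of that loop (entry fields are illustrative):

```python
import json
from pathlib import Path

TRACKER = Path("daily-touched-files.json")  # per-day change log

def record_touch(path: str, agent: str):
    """Called after an agent modifies infrastructure: append to the tracker."""
    entries = json.loads(TRACKER.read_text()) if TRACKER.exists() else []
    entries.append({"path": path, "agent": agent})
    TRACKER.write_text(json.dumps(entries, indent=2))

def drain_tracker():
    """Nightly scan: return the day's touched files and reset the tracker."""
    entries = json.loads(TRACKER.read_text()) if TRACKER.exists() else []
    TRACKER.write_text("[]")
    return entries
```

The writer agent consumes the list to know which docs to update; the nightly scan drains whatever's left and starts the next day clean.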

The system verifies its own corrections.


What This Actually Took

The honest answer: I don't write code in the traditional sense. Claude writes the code. I provide the vision, the infrastructure instincts, the architectural decisions, and the problem framing.

What I brought was 15 years of Windows systems administration — thinking about failure modes, permissions models, backup strategies, retention policies, operational health monitoring. Every design pattern in homelab-agent traces back to something I've seen break in a production environment.

The AI cost tracking pipeline (Claude Code session logs → Telegraf → InfluxDB → Grafana) exists because I've always metered infrastructure costs. The nightly backup script with stop-rsync-restart sequencing exists because I've seen live copies get corrupted. The two-party permission model exists because I've managed multi-admin environments where whoever touches something last owns it.
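
The glue in that cost pipeline is just reshaping session stats into InfluxDB line protocol for Telegraf to ship. A sketch with assumed field names (the measurement name and keys here are mine, not the repo's):

```python
def to_line_protocol(session: dict) -> str:
    """Format one Claude Code session summary as an InfluxDB line-protocol
    point: measurement,tags fields timestamp(ns)."""
    tags = f"agent={session['agent']},model={session['model']}"
    # the 'i' suffix marks integer fields in line protocol
    fields = (f"input_tokens={session['input_tokens']}i,"
              f"output_tokens={session['output_tokens']}i,"
              f"cost_usd={session['cost_usd']}")
    return f"claude_session,{tags} {fields} {session['ts_ns']}"
```

Feed points like that into InfluxDB and a Grafana panel of cost per agent per day is a five-minute dashboard.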

I didn't learn systems thinking in 42 days. I've been developing it for 15 years. I just found a medium that let it show.


What I Didn't Expect

I expected to build a homelab assistant. I didn't expect to be ahead of academic research on agent memory systems.

I expected to learn some Docker and maybe a little Python. I didn't expect to end up designing permission models and self-healing architectures for which I can't find equivalents in any comparable project.

I expected AI to be a productivity tool. I didn't expect it to be a creative medium — one where infrastructure instincts and systems thinking translate directly into novel technical designs.

Anthropic's own internal research, published in December 2025, found that 27% of Claude-assisted work consists of tasks that wouldn't have happened otherwise — work that's too exploratory, too niche, or too cost-prohibitive without AI assistance. Every component of homelab-agent is in that 27%. This project doesn't exist in any form without Claude.


The Repository

homelab-agent is open source and documented as a reference architecture. It's not a one-click installer — it's a documented system you can understand and adapt. The README explains all three layers. There's a getting-started guide with explicit stopping points if you want Layer 1 without the full stack. Component deep-dives cover every service with configuration details and design decisions.

If you want to jump in without reading everything, the repo has an index.md — a machine-readable navigation file designed to be handed directly to an AI assistant. Point Claude at it and say "help me figure out which components to adopt based on my setup." It'll ask about your hardware, your existing services, and your goals, then map a path through the docs. That's the intended on-ramp.

If you're a homelabber who wants Claude to actually know your setup, start there.
If you're an AI infrastructure builder looking at agent memory patterns or permission models, the architecture docs are the interesting part.
If you're hiring for agentic infrastructure roles and you made it this far — hi.

homelab-agent on GitHub →


This is the first post in a series. Next: the memory architecture in depth — how four tiers of knowledge accumulation work together and what it looks like to build memory for AI systems from first principles.
