Hamdi (KHELIL) LION

🤖 Kubernetes as Your AI Control Plane: Running Claude and Ollama Agents with kagent 🧠

👋 Hey there

If you run a homelab, you already have a Kubernetes cluster sitting there humming away. So why send every AI workload off to a SaaS dashboard when you can run agents right next to your pods, with the same kubectl and GitOps flow you already trust?

That is exactly what kagent gives us. It is a CNCF sandbox project (started at Solo.io, built by folks from the Istio world) that turns AI agents into first class Kubernetes workloads. Agents, models, and tools all become custom resources you apply with kubectl.

👉 The big idea: kagent puts one clean abstraction between your agents and your models, so you can point the same agent at Claude in the cloud or at Ollama running in your own cluster, and switch by changing a single field.

In this guide we will build a small but real inference stack in a homelab cluster, wired to both.

🧱 What we will cover

  • ✅ What the kagent inference stack actually looks like
  • ✅ Installing kagent with Helm
  • ✅ Wiring up Claude for cloud inference (the ModelConfig resource)
  • ✅ Deploying Ollama inside the cluster for fully local, on prem inference
  • ✅ Building agents that use each model with real Kubernetes tools
  • ✅ Swapping models with a one line change
  • ✅ Adding a human approval gate so your agent never nukes the wrong namespace

🗺️ The inference stack, layer by layer

Before we touch YAML, here is the mental model. In kagent your "inference stack" is just a few layers, each a Kubernetes resource or component:

  • ✅ Model provider: where the tokens come from. Claude via the Anthropic API, or Ollama serving a local model.
  • ✅ ModelConfig (CRD): tells kagent which provider, which model, and which secret holds the key.
  • ✅ Agent (CRD): a system prompt plus a set of tools plus a reference to a ModelConfig.
  • ✅ Tools over MCP: kagent ships a tool server that exposes Kubernetes actions as tools your agent can call.
  • ✅ Controller and runtime: the Go controller watches the CRDs, and the runtime (Agent Development Kit) runs the agent loop.
  • ✅ Storage: a bundled PostgreSQL instance for agent state (swap it for your own in production).

The nice part is that the Agent resource does not care where the model lives. It just references a ModelConfig by name. Cloud or local, the agent spec looks the same.

🧰 Prerequisites

Nothing exotic, just homelab basics:

  • ✅ A running Kubernetes cluster (k3s, kind, minikube, or your bare metal cluster all work)
  • ✅ kubectl and helm installed and pointing at that cluster
  • ✅ An Anthropic API key for the Claude part (grab one from the Anthropic console)
  • ✅ Enough CPU and RAM on a node for a local model (an 8B model like llama3.1 is happy with a couple of cores and around 8Gi of memory)

Heads up on versions: everything below was verified against kagent v0.9 (CRD apiVersion kagent.dev/v1alpha2), current Claude model ids claude-haiku-4-5 and claude-sonnet-4-5, and the Ollama llama3.1 model, all checked in July 2026. kagent moves fast, so if a field looks different, check the release notes.

📦 Step 1: Install kagent

kagent installs as two Helm charts: one for the CRDs, one for the controller and friends. Since v0.7, the kmcp tool subproject comes bundled, so you get the built in Kubernetes tools out of the box.

First, install the CRDs:

helm install kagent-crds oci://ghcr.io/kagent-dev/kagent/helm/kagent-crds \
  --namespace kagent \
  --create-namespace

Now install kagent itself. We will set Anthropic as the default provider so we get a working Claude backed model config the moment the pods come up:

export ANTHROPIC_API_KEY="your-anthropic-key-here"

helm install kagent oci://ghcr.io/kagent-dev/kagent/helm/kagent \
  --namespace kagent \
  --set providers.default=anthropic \
  --set providers.anthropic.apiKey="$ANTHROPIC_API_KEY"

Give it a minute, then check the pods:

kubectl get pods -n kagent

You should see the controller, the UI, the kagent tool server, and a bundled PostgreSQL pod.
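If you would rather block until everything is up instead of re-running get pods, kubectl wait can do the polling for you. This is a minimal sketch, assuming every pod in the namespace is long running (a completed Job pod would never report Ready) and that five minutes is enough on your hardware. The helper name wait_for_kagent is hypothetical, not something kagent ships:

```shell
# Hypothetical helper: block until every pod in a namespace reports Ready.
# $1 = namespace (defaults to "kagent"), $2 = timeout (300s is an assumption, tune it).
wait_for_kagent() {
  kubectl wait pods --all \
    --namespace "${1:-kagent}" \
    --for=condition=Ready \
    --timeout="${2:-300s}"
}
```

Call wait_for_kagent right after the Helm install. It exits non-zero if the timeout expires, which makes it handy in a bootstrap script.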

Want the dashboard? Port forward the UI:

kubectl port-forward -n kagent svc/kagent-ui 8080:8080

Then open http://localhost:8080.

Homelab tip: the default install ships a bundled postgres:18 pod so you can play right away. For anything you actually care about, point kagent at your own PostgreSQL instead.

☁️ Step 2: Wire up Claude for cloud inference

Because we passed the Anthropic provider at install time, kagent already created a default ModelConfig for us. In v0.9 that default uses claude-haiku-4-5, which is fast and cheap and perfect for quick cluster questions.

Let us look at what landed:

kubectl get modelconfigs -n kagent

Now let us be explicit and add our own ModelConfig that pins a stronger model, claude-sonnet-4-5, for the heavier reasoning tasks. First store the key in a Secret (kagent reads the model key from a Secret you control):

export ANTHROPIC_API_KEY="your-anthropic-key-here"

kubectl create secret generic kagent-anthropic -n kagent \
  --from-literal ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY"

Then create the ModelConfig that references it:

apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: claude-sonnet
  namespace: kagent
spec:
  provider: Anthropic
  model: claude-sonnet-4-5
  apiKeySecret: kagent-anthropic
  apiKeySecretKey: ANTHROPIC_API_KEY
  anthropic: {}

Apply it:

kubectl apply -f claude-modelconfig.yaml

That is your cloud inference layer done. Two model configs now live in the cluster: a fast Haiku default and a Sonnet config for the deeper work.

Note on model ids: claude-haiku-4-5 and claude-sonnet-4-5 are convenience aliases that resolve to the latest dated snapshot. If you want a fully pinned version for reproducibility, use the dated form such as claude-haiku-4-5-20251001.
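If you decide to pin a snapshot later, a merge patch changes just the model field without re-applying the whole manifest. A minimal sketch, assuming the ModelConfig lives in the kagent namespace; pin_model is a hypothetical helper name:

```shell
# Hypothetical helper: set spec.model on an existing ModelConfig with a merge patch.
# $1 = ModelConfig name, $2 = model id (alias or dated snapshot).
pin_model() {
  kubectl patch modelconfigs "$1" --namespace kagent --type merge \
    --patch "{\"spec\":{\"model\":\"$2\"}}"
}
```

For example, pin_model NAME claude-haiku-4-5-20251001, where NAME is whatever kubectl get modelconfigs -n kagent shows for your Haiku config. If the manifest lives in Git, make the same change there so your GitOps tool does not revert it.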

🏠 Step 3: Deploy Ollama for on prem inference

Now the fun part for homelabbers: running the model yourself, no tokens leaving your network.

We will run Ollama as a normal Deployment, give it a PersistentVolumeClaim so the downloaded model survives restarts (models are big, you do not want to re download on every reschedule), and expose it with a Service.

Create the namespace:

kubectl create ns ollama

Apply the Ollama stack:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models
  namespace: ollama
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      name: ollama
  template:
    metadata:
      labels:
        name: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - name: http
              containerPort: 11434
              protocol: TCP
          resources:
            requests:
              cpu: "2"
              memory: 8Gi
            limits:
              memory: 12Gi
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  type: ClusterIP
  selector:
    name: ollama
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP

Wait for the pod to be ready:

kubectl get pods -n ollama -w

Once it is running, pull a model. This matters: kagent drives agents through tool calling, so you must use a model that supports function calling. Plain llama3 does not do this reliably, but llama3.1 and newer (or qwen2.5 and above) do. We will use llama3.1:

kubectl -n ollama exec deploy/ollama -- ollama pull llama3.1

Confirm it landed:

kubectl -n ollama exec deploy/ollama -- ollama list
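To double check the tool calling requirement before wiring the model into kagent, you can print its model card. This is a sketch under one assumption: recent Ollama releases list the model's capabilities (including tools) in the output of ollama show, while older releases may not print that section. The helper name show_model is hypothetical:

```shell
# Hypothetical helper: print the model card for a model already pulled into the pod.
# $1 = model name (defaults to llama3.1). Look for "tools" in the capabilities section.
show_model() {
  kubectl -n ollama exec deploy/ollama -- ollama show "${1:-llama3.1}"
}
```

Run show_model for the default, or show_model qwen2.5 if you pulled that one instead.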

Your local model is now reachable inside the cluster at:

http://ollama.ollama.svc.cluster.local

(The Service listens on port 80 and forwards to the container on 11434, so no port number is needed in the URL.)
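Before pointing kagent at that URL, it is worth confirming that the Service really answers from inside the cluster. A minimal sketch, assuming your nodes can pull the public curlimages/curl image; /api/tags is Ollama's endpoint for listing local models, and check_ollama_in_cluster is a hypothetical helper name:

```shell
# Hypothetical helper: call Ollama's model list endpoint from a throwaway pod.
# The pod is removed as soon as curl exits (--rm with --restart=Never).
check_ollama_in_cluster() {
  kubectl run ollama-check --namespace ollama \
    --rm -i --restart=Never \
    --image=curlimages/curl -- \
    curl -s http://ollama.ollama.svc.cluster.local/api/tags
}
```

The JSON that comes back should include an entry for llama3.1. An empty models list means the pull did not land on the mounted volume.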

🔌 Step 4: Tell kagent about your local model

Same ModelConfig pattern as Claude, just a different provider and a host that points at the in cluster Ollama Service.

One small quirk: Ollama does not need an API key, but the ModelConfig field still expects a secret reference. So we create a throwaway placeholder secret (Ollama ignores the value):

kubectl create secret generic kagent-ollama -n kagent \
  --from-literal OLLAMA_API_KEY=ollama-placeholder

Now the ModelConfig:

apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: ollama-llama31
  namespace: kagent
spec:
  provider: Ollama
  model: llama3.1
  apiKeySecret: kagent-ollama
  apiKeySecretKey: OLLAMA_API_KEY
  ollama:
    host: http://ollama.ollama.svc.cluster.local

Apply it:

kubectl apply -f ollama-modelconfig.yaml

Check that both cloud and local configs are now registered:

kubectl get modelconfigs -n kagent

You now have a hybrid inference stack: Claude for the heavy lifting, Ollama for private, offline friendly work, both described the same declarative way.

🧠 Step 5: Build an agent that uses your models

Here is where the abstraction pays off. An Agent is just a system prompt, a set of tools, and a modelConfig reference. kagent ships a built in tool server called kagent-tool-server that exposes Kubernetes actions as tools, so our agent can actually inspect the cluster.

Let us create a Claude backed cluster helper:

apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: cluster-helper-claude
  namespace: kagent
spec:
  description: A friendly Kubernetes helper backed by Claude.
  type: Declarative
  declarative:
    modelConfig: claude-sonnet
    systemMessage: |
      You are a friendly and careful Kubernetes helper.

      # Instructions
      - Use the available tools to answer questions about the cluster.
      - If a question is unclear, ask for clarification before running any tool.
      - Never invent an answer. If you are unsure, say so.

      # Response format
      - Always format your response as Markdown.
      - Summarize the actions you took and explain the result.
    tools:
      - type: McpServer
        mcpServer:
          name: kagent-tool-server
          kind: RemoteMCPServer
          apiGroup: kagent.dev
          toolNames:
            - k8s_get_resources
            - k8s_get_available_api_resources
            - k8s_describe_resource

Apply it:

kubectl apply -f agent-claude.yaml

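👉 Now the magic trick. To run the exact same agent on your local model instead of Claude, the only functional change is one field: modelConfig. (The twin below also gets its own name and description so both agents can live side by side.) Here is the on prem twin: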

apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: cluster-helper-local
  namespace: kagent
spec:
  description: The same helper, running fully local on Ollama.
  type: Declarative
  declarative:
    modelConfig: ollama-llama31
    systemMessage: |
      You are a friendly and careful Kubernetes helper.

      # Instructions
      - Use the available tools to answer questions about the cluster.
      - If a question is unclear, ask for clarification before running any tool.
      - Never invent an answer. If you are unsure, say so.

      # Response format
      - Always format your response as Markdown.
      - Summarize the actions you took and explain the result.
    tools:
      - type: McpServer
        mcpServer:
          name: kagent-tool-server
          kind: RemoteMCPServer
          apiGroup: kagent.dev
          toolNames:
            - k8s_get_resources
            - k8s_get_available_api_resources
            - k8s_describe_resource

Apply it:

kubectl apply -f agent-local.yaml

Same prompt, same tools, different brain. That is the whole point of putting a ModelConfig in front of your agents.
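If you would rather flip an existing agent in place than keep a twin around, the same swap can be done with a merge patch. A minimal sketch, assuming the Agent CRD's plural resource name is agents in the kagent.dev group; switch_model is a hypothetical helper name:

```shell
# Hypothetical helper: repoint an existing Agent at another ModelConfig.
# Only spec.declarative.modelConfig changes; the prompt and tools are left alone.
# $1 = Agent name, $2 = ModelConfig name.
switch_model() {
  kubectl patch agents.kagent.dev "$1" --namespace kagent --type merge \
    --patch "{\"spec\":{\"declarative\":{\"modelConfig\":\"$2\"}}}"
}
```

For example, switch_model cluster-helper-claude ollama-llama31 moves that agent to the local model, and passing claude-sonnet moves it back. In a GitOps setup, change the field in the manifest instead so the repo stays the source of truth.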

💬 Step 6: Talk to your agents

You can chat through the dashboard, but the CLI is quicker for a test. Grab the kagent CLI:

brew install kagent

or

curl https://raw.githubusercontent.com/kagent-dev/kagent/refs/heads/main/scripts/get-kagent | bash

List your agents:

kagent get agent

Ask the Claude agent something:

kagent invoke -t "What API resources are available in my cluster?" --agent cluster-helper-claude

Then ask the local one a cluster question as well and compare how it handles the tool calls:

kagent invoke -t "List the pods in the kagent namespace and tell me if any are unhealthy." --agent cluster-helper-local

The local agent keeps every token inside your homelab, which is a lovely thing when you are poking at private workloads.
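For a fair side by side comparison, it helps to send the identical prompt to both agents. A small sketch that uses only the kagent invoke flags shown above; ask_all is a hypothetical helper name:

```shell
# Hypothetical helper: send one prompt to several agents, one after another.
# $1 = prompt, remaining arguments = agent names.
ask_all() {
  prompt=$1
  shift
  for agent in "$@"; do
    echo "=== $agent ==="
    kagent invoke -t "$prompt" --agent "$agent"
  done
}
```

Try ask_all "List the pods in the kagent namespace and tell me if any are unhealthy." cluster-helper-claude cluster-helper-local and read the two answers back to back.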

🛡️ Step 7: Add a safety gate (highly recommended)

Giving an AI agent access to your cluster is great until it decides to be too helpful. kagent has a built in Human in the Loop feature: mark any tool as needing approval, and the agent pauses for your yes or no before running it.

Here is an agent that can read freely but must ask before it changes anything:

apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: cluster-operator
  namespace: kagent
spec:
  description: A Kubernetes operator agent with approval gates on destructive actions.
  type: Declarative
  declarative:
    modelConfig: claude-sonnet
    systemMessage: |
      You are a Kubernetes operator assistant.
      Before making any change, explain what you plan to do and why.
    tools:
      - type: McpServer
        mcpServer:
          name: kagent-tool-server
          kind: RemoteMCPServer
          apiGroup: kagent.dev
          toolNames:
            - k8s_get_resources
            - k8s_describe_resource
            - k8s_apply_manifest
            - k8s_delete_resource
          requireApproval:
            - k8s_apply_manifest
            - k8s_delete_resource

Apply it:

kubectl apply -f agent-operator.yaml

Now the read tools run freely, but the moment the agent tries to apply or delete something, the UI shows Approve and Reject buttons. If you reject, your reason is fed back to the model as context. A few lines of YAML, a lot of peace of mind.

Good to know: every tool you list in requireApproval must also appear in toolNames, and kagent validates this when the resource is created.
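You can exercise that validation without creating anything by asking the API server for a dry run. This is a sketch, assuming the check is enforced at admission time (CRD validation or a webhook); if it only happens later during reconciliation, a dry run will not surface it. validate_manifest is a hypothetical helper name:

```shell
# Hypothetical helper: have the API server validate a manifest without persisting it.
# $1 = path to the manifest file.
validate_manifest() {
  kubectl apply --dry-run=server -f "$1"
}
```

Running validate_manifest agent-operator.yaml before the real apply is a cheap habit, especially when you edit the toolNames and requireApproval lists by hand.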

Why this pattern is nice for a homelab

  • ✅ One workflow: agents, models, and tools are all just kubectl apply, so GitOps and PR reviews work exactly like the rest of your cluster.
  • ✅ No lock in: swap Claude for Ollama, or route different agents to different models, without rewriting anything.
  • ✅ Cost control: send routine questions to a local model, save the cloud model for the hard stuff.
  • ✅ Privacy by default: keep sensitive workloads on the on prem model, no data leaves the house.
  • ✅ Guardrails built in: approval gates and clear observability instead of a black box.

🚀 What is next

A few directions to keep exploring from here:

  • ✅ Point kagent at an external PostgreSQL and enable memory so your agents remember context across sessions.
  • ✅ Try the Go runtime for faster agent cold starts by setting runtime: go in the declarative spec.
  • ✅ Add your own MCP tools with kmcp so agents can talk to your internal APIs.
  • ✅ Put a GPU node under Ollama and jump to a bigger local model for better tool use.

🦆 Bonus: drive it all from Goose

Once your stack is up, you do not have to live inside the kagent UI. Goose is a local, open source AI agent (CLI and desktop, now part of the Linux Foundation Agentic AI Foundation) that speaks MCP. That means it can plug straight into what you just built, in two complementary ways.

Think of it like this: Goose needs a brain (a model provider) and it can borrow tools (MCP extensions). Your cluster can supply both.

🔑 First, give Goose a brain (your deployed models)

Install Goose from the official install page (https://goose-docs.ai/docs/getting-started/installation), then run the config wizard:

goose configure

Point it at Claude, using the same Anthropic key you gave kagent. Goose will ask for your API key and store it in your system keyring:

┌   goose-configure
│
◇  What would you like to configure?
│  Configure Providers
│
◇  Which model provider should we use?
│  Anthropic
│
◇  Enter a model from that provider:
│  claude-sonnet-4-5
└  Configuration saved successfully

Prefer to stay fully local and reuse the Ollama you deployed in Step 3? Goose runs on your workstation, so first port forward the in cluster Ollama Service (it listens on port 80 and targets the container on 11434):

kubectl -n ollama port-forward svc/ollama 11434:80
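With the port forward running in another terminal, a quick request confirms that Ollama is reachable from your workstation. A minimal sketch; /api/tags is Ollama's endpoint for listing local models, and check_ollama_local is a hypothetical helper name:

```shell
# Hypothetical helper: list the models Ollama serves through the port forward.
# $1 = host:port (defaults to localhost:11434, the port forward target above).
check_ollama_local() {
  curl -s "http://${1:-localhost:11434}/api/tags"
}
```

If check_ollama_local prints JSON that mentions llama3.1, Goose will be able to reach the model on the same address.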

Then configure Goose to use Ollama. If you do not set a host, Goose defaults to localhost:11434, which is exactly where the port forward lands:

┌   goose-configure
│
◇  What would you like to configure?
│  Configure Providers
│
◇  Which model provider should we use?
│  Ollama
│
◇  Enter a model from that provider:
│  llama3.1
└  Configuration saved successfully

👉 One thing to keep in mind: Goose leans heavily on tool calling, so pick a model that is good at it. llama3.1 and up or qwen2.5 handle light tasks, but for serious agent loops Claude will be a lot more reliable.

🔗 Then, hand Goose your kagent agents as tools

kagent exposes every running agent through an MCP server built into the control plane, over the Streamable HTTP transport. So Goose can discover and call your kagent agents as if they were tools.

Port forward the kagent control plane:

kubectl port-forward -n kagent svc/kagent-controller 8083:8083

Add it to Goose as a remote extension:

goose configure

┌   goose-configure
│
◇  What would you like to configure?
│  Add Extension
│
◇  What type of extension would you like to add?
│  Remote Extension (Streamable HTTP)
│
◇  What would you like to call this extension?
│  kagent-agents
│
◇  What is the Streamable HTTP endpoint URI?
│  http://localhost:8083/mcp
│
◇  Please set the timeout for this tool (in secs):
│  300
└  Added kagent-agents extension

That writes an entry to ~/.config/goose/config.yaml that looks like this:

extensions:
  kagent-agents:
    enabled: true
    type: streamable_http
    name: kagent-agents
    uri: http://localhost:8083/mcp
    timeout: 300

The kagent MCP server hands Goose two tools: list_agents to discover what is available, and invoke_agent to run a specific agent by name (with session support for follow ups). Start a session:

goose session

and ask something like:

List my kagent agents, then use the cluster-helper-local agent to check for unhealthy pods in the kagent namespace.

Goose will call list_agents, pick cluster-helper-local, and delegate the cluster work to it through kagent. You end up with Goose as the friendly front door, your kagent agents as specialized workers, and Claude or Ollama as the brain, all wired together through open protocols.

Note: this kagent endpoint currently supports Streamable HTTP, not SSE. If you put kagent behind a gateway you can skip the port forward and point Goose at the public URL instead.
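If Goose cannot see the extension, you can poke the endpoint by hand with an MCP initialize request. This is a sketch based on the MCP Streamable HTTP transport, not on anything kagent documents: the protocol version string and the client name are assumptions, and the server may answer with plain JSON or with an event stream. mcp_initialize is a hypothetical helper name:

```shell
# Hypothetical helper: send an MCP "initialize" JSON-RPC request over Streamable HTTP.
# $1 = endpoint URL (defaults to the port forwarded kagent endpoint from above).
# The Accept header lists both response types the transport allows.
mcp_initialize() {
  curl -s -X POST "${1:-http://localhost:8083/mcp}" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-check","version":"0.0.1"}}}'
}
```

Any JSON-RPC response that carries server info means the endpoint is alive. A connection error usually points at the port forward rather than at Goose.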

🎁 Wrapping Up

We went from an empty homelab cluster to a working, hybrid AI inference stack: kagent as the control plane, Claude for cloud inference, and Ollama running fully local, all described as plain Kubernetes resources. The best part is how boring it feels, in a good way. Agents are just workloads, models are just config, and swapping between cloud and on prem is a one line change.

If you have been waiting for a reason to bring agentic AI into your cluster without handing everything to a SaaS, this is a really pleasant place to start.

Happy clustering and stay safe! 🧑‍🚀
