DEV Community

rdmnl

Your AI Agents Deserve the Same Ops Treatment as Your Microservices

A few months ago I was looking at how our team was actually running AI agents in production. One was a Python script in a tmux session on someone's laptop. Another was a cron job with no timeout. A third had no cost limits — it had quietly burned through $800 in API calls over a weekend because it got stuck in a loop.

None of this would fly for a microservice. We'd never ship a service with no health checks, no resource limits, and no way to roll back a bad deploy. But agents were getting a free pass because they felt different somehow. They're AI, not "real" infrastructure.

I don't think that's a good enough reason.


The thing is, agents are just workloads

Strip away the LLM part and an agent is a long-running process that consumes resources, has a health state, needs to scale, and requires configuration management. That's just a service. Kubernetes already knows how to manage services.

The missing piece was a way to tell Kubernetes what an agent is — not in terms of CPU and memory, but in terms of model, system prompt, and tool access.

So I built a Kubernetes operator that does exactly that: ARKONIS — Agentic Reconciler for Kubernetes, Operator-Native Inference System.


What it actually looks like

You define an agent the same way you define a Deployment:

```yaml
apiVersion: arkonis.dev/v1alpha1
kind: ArkonisDeployment
metadata:
  name: research-agent
spec:
  replicas: 3
  model: claude-sonnet-4-20250514
  systemPrompt: |
    You are a research agent. Gather and summarise information
    accurately. Always cite your sources.
  limits:
    maxTokensPerCall: 8000
    maxConcurrentTasks: 5
    timeoutSeconds: 120
```

```bash
kubectl apply -f research-agent.yaml
kubectl get aodep
# NAME             MODEL                      REPLICAS   READY   AGE
# research-agent   claude-sonnet-4-20250514   3          3       45s
```

Three agent pods, managed by Kubernetes. Scale to 10:

```bash
kubectl patch aodep research-agent --type=merge -p '{"spec":{"replicas":10}}'
```

GitOps, RBAC, namespaces, kubectl — all of it works without modification because agents are just Kubernetes resources now.


The part I'm most proud of: semantic health checks

Standard liveness probes check if a process is responding to HTTP. That's fine for a web server. For an LLM, a process can be "alive" while producing complete nonsense.

ARKONIS adds a semantic probe type — a secondary LLM call that validates whether the agent is actually working:

```yaml
livenessProbe:
  type: semantic
  intervalSeconds: 60
  validatorPrompt: "Reply with exactly one word: HEALTHY"
```

If the agent fails that check, the pod gets pulled from routing until it recovers. Same as any other failing health check, except the health check understands what "healthy" actually means for an LLM.


Token limits that you can't accidentally delete

This was the $800 problem. The fix: limits live in the infrastructure, not in application code.

```yaml
limits:
  maxTokensPerCall: 8000
  maxConcurrentTasks: 5
  timeoutSeconds: 120
```

The operator injects these as environment variables into every agent pod it creates. A developer can't remove them by editing the wrong file. A misconfigured prompt can't cause an infinite loop that runs until your credit card gets declined.


Rolling back a bad system prompt

Change a prompt → open a PR → merge → kubectl apply. Roll back → git revert → kubectl apply. The full history of who changed what prompt and when is in git, same as any other infrastructure change.

This sounds obvious until you've had to figure out why an agent started behaving differently last Tuesday and nobody can remember what changed.


Multi-agent pipelines without the glue code

This is the one that surprised me the most when it actually worked. You can chain agents together declaratively:

```yaml
apiVersion: arkonis.dev/v1alpha1
kind: ArkonisPipeline
metadata:
  name: research-then-summarize
spec:
  input:
    topic: "AI in healthcare"
  steps:
    - name: research
      arkonisDeployment: research-agent
      inputs:
        prompt: "Research this topic: {{ .pipeline.input.topic }}"
    - name: summarize
      arkonisDeployment: summarizer-agent
      dependsOn: [research]
      inputs:
        prompt: "Summarize these findings: {{ .steps.research.output }}"
  output: "{{ .steps.summarize.output }}"
```

The operator handles the queue, waits for each step to complete, passes the output to the next step, and updates the pipeline status. I tested this locally and watched kubectl get aopipe -w go from Running to Succeeded while two separate LLMs did their thing. It's a bit surreal.
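The placeholder syntax above is Go template syntax, and resolving it against the pipeline state is straightforward with the standard library. A sketch, assuming the step outputs live in a nested map — this is illustrative, not necessarily how ARKONIS's engine is built:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderStep resolves {{ .steps.<name>.output }}-style placeholders against
// the accumulated pipeline context using text/template. missingkey=error
// makes a reference to a step that hasn't run yet fail loudly.
func renderStep(tmpl string, ctx map[string]any) (string, error) {
	t, err := template.New("step").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, ctx); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	ctx := map[string]any{
		"steps": map[string]any{
			"research": map[string]any{"output": "three key findings about AI in healthcare"},
		},
	}
	out, err := renderStep("Summarize these findings: {{ .steps.research.output }}", ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```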


Try it

Prerequisites: Docker, kind, kubectl, Go 1.25+

```bash
git clone https://github.com/arkonis-dev/arkonis-operator.git
cd arkonis-operator
make dev ANTHROPIC_API_KEY=sk-ant-...
```

That one command creates a kind cluster, builds both Docker images, deploys Redis and the operator inside the cluster, and sets up the API key secret. When it finishes, deploy an agent:

```bash
kubectl apply -f config/samples/arkonis_v1alpha1_arkonisdeployment.yaml
kubectl get aodep -w
```

To see the LLM actually respond, submit a task:

```bash
kubectl exec -it -n agent-infra redis-0 -- \
  redis-cli XADD agent-tasks '*' prompt "What is the capital of France? One sentence."

kubectl exec -it -n agent-infra redis-0 -- \
  redis-cli XREAD COUNT 10 STREAMS agent-tasks-results 0
# "The capital of France is Paris."
```

Honest caveats

This is v0.3.0, single contributor, early alpha. What's shipped:

  • All five CRDs with full controllers (ArkonisDeployment, ArkonisService, ArkonisConfig, ArkonisPipeline, ArkonisMemory)
  • Redis Streams task queue with consumer groups, dead-letter queue, and retry/backoff
  • Persistent memory backends: in-context, Redis, and vector store (Qdrant)
  • Semantic liveness probes
  • Typed artifact passing between pipeline steps with JSON schema validation

What isn't done yet:

  • Parallel pipeline steps — steps without dependencies still run sequentially
  • KEDA autoscaling on queue depth — CPU is the wrong signal for agent workloads and queue depth is the right one, but it isn't wired up yet
  • Only Anthropic for now — the provider interface exists for OpenAI/Gemini but no implementations yet

I'm writing this because I think the problem is real and worth solving, not because the solution is finished.


If any of this sounds familiar — agents running in tmux sessions, no cost controls, prompt changes deployed by SSHing into a box — I'd genuinely like to hear how you're handling it. And if you want to contribute, CONTRIBUTING.md has the setup.

GitHub: arkonis-dev/arkonis-operator
Docs: arkonis.dev

Top comments (2)

klement Gunndu

The semantic health check via a secondary LLM call is the part that clicks — standard liveness probes never catch an agent that's technically running but producing garbage. Ran into the exact same runaway cost problem with an agent stuck in a retry loop over a weekend.

Arda

print("nice tool!")