What I Learned at the CNCF Montreal KubeCon NA 2025 Recap

#ai #community #kubernetes

On December 10th, the Cloud Native Montreal community hosted a recap of KubeCon NA 2025, which was held in Atlanta. Rather than a traditional conference, this was a community-driven evening of lightning talks and reflections on where the cloud-native ecosystem is heading.

Instead of focusing on slides or announcements, the event emphasized patterns and lessons emerging across the ecosystem — from AI agents and observability to GitOps and energy-aware infrastructure.

Here are the key takeaways that stood out.

Cloud Native Is Becoming AI-Native

One recurring theme was that AI workloads are now first-class citizens in cloud-native environments.

Traditional observability answers questions like:

Is the service up?
Is latency within SLOs?

AI systems introduce new operational questions:

What prompt triggered this behavior?
Which model call was expensive?
Why did this agent take a specific action?

Tools such as OpenLLMetry extend OpenTelemetry with instrumentation for LLM and agent workflows, while OpenCost provides visibility into Kubernetes and cloud spend across workloads, teams, and environments.
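To make those questions concrete, here is a minimal, standard-library-only sketch of the kind of data such instrumentation captures per model call. It is not the OpenLLMetry or OpenCost API; the `llm_span` helper, the model names, the prices, and the token counts are all hypothetical and stand in for real model calls.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

# Hypothetical prices in USD per 1,000 tokens; real prices vary by provider.
PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.01}


@dataclass
class LLMSpan:
    """One recorded model call, similar in spirit to a tracing span."""
    name: str
    model: str
    prompt: str
    team: str
    attributes: dict = field(default_factory=dict)
    duration_s: float = 0.0

    @property
    def cost_usd(self) -> float:
        tokens = self.attributes.get("total_tokens", 0)
        return tokens / 1000 * PRICE_PER_1K_TOKENS[self.model]


RECORDED = []  # in-memory stand-in for a tracing backend


@contextmanager
def llm_span(name, model, prompt, team):
    """Record the prompt, model, owner, and duration of one model call."""
    span = LLMSpan(name=name, model=model, prompt=prompt, team=team)
    start = time.perf_counter()
    try:
        yield span
    finally:
        span.duration_s = time.perf_counter() - start
        RECORDED.append(span)


def most_expensive(spans):
    """Answer 'which model call was expensive?'."""
    return max(spans, key=lambda s: s.cost_usd)


def cost_by_team(spans):
    """Aggregate spend per owning team."""
    totals = {}
    for s in spans:
        totals[s.team] = totals.get(s.team, 0.0) + s.cost_usd
    return totals


# Usage: token counts are set by hand here instead of calling a real model.
with llm_span("summarize", "small-model", "Summarize this incident", "platform") as s:
    s.attributes["total_tokens"] = 800
with llm_span("plan", "large-model", "Plan a rollback", "sre") as s:
    s.attributes["total_tokens"] = 2000

worst = most_expensive(RECORDED)
print(worst.name, "| prompt:", worst.prompt)
print(cost_by_team(RECORDED))
```

Because every span keeps the prompt next to the model and token count, "what prompt triggered this?" and "which call was expensive?" become lookups over the same records.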

The takeaway is clear:

You can’t scale AI systems you can’t observe or financially understand.

Observability Is Shifting From Dashboards to Agents

Observability is evolving beyond dashboards and alerts toward agent-assisted operations.

Instead of engineers manually correlating metrics, logs, and recent deployments, emerging tools aim to:

Perform root-cause analysis
Triage alerts
Recommend remediation steps

Projects like k8sgpt, HolmesGPT, and Seraph suggest a future where observability systems don’t just surface data; they actively reason over it.

Tools highlighted in this space:

k8sgpt: AI-native Kubernetes troubleshooting
HolmesGPT / Seraph: automated root cause analysis and alert mitigation
Emerging agent-based platforms: agents that correlate logs, metrics, deployments, and incidents to assist on-call engineers and reduce alert fatigue
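The correlation step can be illustrated with a small rule-based sketch. This is not how k8sgpt, HolmesGPT, or Seraph work internally. It only shows the idea of linking an alert to recent deployments, and the `Alert` and `Deployment` types and the 30-minute window are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    service: str
    version: str
    deployed_at: datetime


@dataclass
class Alert:
    service: str
    message: str
    fired_at: datetime


def suspect_deployments(alert, deployments, window=timedelta(minutes=30)):
    """Deployments to the alerting service that landed shortly before the alert, newest first."""
    candidates = [
        d for d in deployments
        if d.service == alert.service
        and timedelta(0) <= alert.fired_at - d.deployed_at <= window
    ]
    return sorted(candidates, key=lambda d: d.deployed_at, reverse=True)


def triage(alert, deployments):
    """Turn the correlation into a short recommendation for the on-call engineer."""
    suspects = suspect_deployments(alert, deployments)
    if suspects:
        latest = suspects[0]
        return (f"{alert.service}: '{alert.message}' fired after deploy "
                f"{latest.version}; consider a rollback")
    return (f"{alert.service}: '{alert.message}' has no recent deploy; "
            f"check dependencies and logs")


# Usage with made-up data
now = datetime(2025, 12, 10, 12, 0)
deploys = [
    Deployment("checkout", "v41", now - timedelta(hours=5)),
    Deployment("checkout", "v42", now - timedelta(minutes=10)),
    Deployment("search", "v7", now - timedelta(minutes=5)),
]
alert = Alert("checkout", "p99 latency above SLO", now)
print(triage(alert, deploys))
```

An agentic tool would replace the fixed rule with model-driven reasoning over many more signals, but the inputs and the shape of the output are similar.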

This doesn’t replace engineers, but it changes the workflow: less time searching for signals, more time making informed decisions.

Abstraction Helps — but Security Must Follow

Another major topic was Cyclops, an open-source platform that simplifies Kubernetes by replacing raw YAML with structured, form-based abstractions.

Cyclops introduces:

Modules — logical groupings of all Kubernetes resources an application needs
Templates — mappings that translate module inputs into valid Kubernetes manifests

How Cyclops works with Helm:

Helm charts define the Kubernetes resources (Deployments, Services, Ingress, etc.) using templated YAML.

Cyclops wraps those Helm charts and exposes their values as validated forms instead of free-text YAML edits.
Users fill in forms, and Cyclops renders the underlying Helm templates into valid Kubernetes manifests.
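The form-to-manifest flow can be sketched in a few lines. This is a simplified illustration, not Cyclops or Helm code: the schema, the validation rules, and the single-Deployment template are assumptions, and Python's `string.Template` stands in for Helm's templating.

```python
import re
from string import Template

# Hypothetical form schema: field name -> (expected type, validity check)
FORM_SCHEMA = {
    "name": (str, lambda v: re.fullmatch(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?", v) is not None),
    "image": (str, lambda v: ":" in v and not v.endswith(":latest")),
    "replicas": (int, lambda v: 1 <= v <= 10),
}

# Stand-in for a chart template; a real chart would contain many resources.
MANIFEST_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app: $name
  template:
    metadata:
      labels:
        app: $name
    spec:
      containers:
        - name: $name
          image: $image
""")


def validate_form(values):
    """Return a list of human-readable errors; an empty list means the form is valid."""
    errors = []
    for field_name, (expected_type, check) in FORM_SCHEMA.items():
        if field_name not in values:
            errors.append(f"{field_name}: missing")
        elif type(values[field_name]) is not expected_type:
            errors.append(f"{field_name}: expected {expected_type.__name__}")
        elif not check(values[field_name]):
            errors.append(f"{field_name}: invalid value {values[field_name]!r}")
    return errors


def render(values):
    """Render the manifest only if the form input passes validation."""
    errors = validate_form(values)
    if errors:
        raise ValueError("; ".join(errors))
    return MANIFEST_TEMPLATE.substitute(values)


# Usage
print(render({"name": "web", "image": "nginx:1.27", "replicas": 3}))
```

The point of the design is that invalid input is rejected before any YAML exists, instead of being discovered when the manifest is applied to the cluster.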

Cyclops also supports AI-driven operations through a Model Context Protocol (MCP) server, allowing agents to manage applications using natural language rather than direct cluster access.

The key lesson here was caution rather than blind automation:

Code generated by AI should be treated as untrusted.

Security risks still apply. As abstraction increases, guardrails, validation, and testing become even more critical.
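One way to treat generated output as untrusted is to run it through a policy check before anything reaches the cluster. The sketch below assumes the manifest has already been parsed into a Python dict, and the specific rules (allowed kinds, no privileged containers, pinned image tags, required resource limits) are illustrative choices, not a complete security policy.

```python
def check_manifest(manifest, allowed_kinds=("Deployment", "Service", "ConfigMap")):
    """Return a list of policy violations for a manifest given as a parsed dict."""
    violations = []

    kind = manifest.get("kind")
    if kind not in allowed_kinds:
        violations.append(f"kind {kind!r} is not allowed")

    # Pod spec location for Deployment-like resources; absent for other kinds.
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    if pod_spec.get("hostNetwork"):
        violations.append("hostNetwork is not allowed")

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if container.get("securityContext", {}).get("privileged"):
            violations.append(f"container {name}: privileged mode is not allowed")
        # Look at the last path segment only, so a registry port is not mistaken for a tag.
        image_ref = container.get("image", "").rsplit("/", 1)[-1]
        if ":" not in image_ref or image_ref.endswith(":latest"):
            violations.append(f"container {name}: image must be pinned to a tag")
        if "limits" not in container.get("resources", {}):
            violations.append(f"container {name}: resource limits are required")

    return violations


# Usage: a manifest as an AI agent might produce it (made-up example)
generated = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "image": "registry:5000/app",
         "securityContext": {"privileged": True}},
    ]}}},
}
problems = check_manifest(generated)
if problems:
    print("Rejected:")
    for p in problems:
        print(" -", p)
else:
    print("Safe to hand to the deployment pipeline")
```

In practice a check like this would sit alongside admission controllers and tests rather than replace them.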

GitOps Works Best When Designed for Teams

A practical GitOps case study highlighted that repository structure matters as much as tooling.

Key principles discussed:

Align configuration structure with team ownership
Centralize configuration while keeping environments explicit
Keep related files close together (“proximity matters”)
Optimize for developer experience, not just correctness

With Argo CD, deployments become automated, auditable, and consistent, but only when GitOps is treated as both a technical and an organizational design problem.
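As a rough illustration of these principles, the sketch below derives one Argo CD Application per team, app, and environment from a single ownership map. The repository URL, team names, directory layout, and the assumption that each team has a matching Argo CD project are all hypothetical; the talk did not prescribe this exact structure.

```python
# Hypothetical repository and ownership data
REPO_URL = "https://example.com/org/platform-config.git"
OWNERSHIP = {"payments": ["checkout", "billing"], "search": ["indexer"]}
ENVIRONMENTS = ["staging", "production"]


def config_path(team, app, env):
    """Team-owned directory, with each environment explicit and close to the app it configures."""
    return f"teams/{team}/{app}/envs/{env}"


def argocd_application(team, app, env):
    """Build an Argo CD Application as a dict, ready to be serialized to YAML."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": f"{app}-{env}",
            "namespace": "argocd",
            "labels": {"team": team},
        },
        "spec": {
            "project": team,  # assumes an Argo CD project exists per team
            "source": {
                "repoURL": REPO_URL,
                "targetRevision": "main",
                "path": config_path(team, app, env),
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": f"{app}-{env}",
            },
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


def all_applications():
    """One Application per (team, app, environment) combination."""
    return [
        argocd_application(team, app, env)
        for team, apps in OWNERSHIP.items()
        for app in apps
        for env in ENVIRONMENTS
    ]


# Usage
for application in all_applications():
    print(application["metadata"]["name"], "->", application["spec"]["source"]["path"])
```

Keeping ownership in one place means a new app or environment is a one-line change, and the resulting paths show at a glance which team owns what.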