Why Agentic AI Changes Everything for Brisbane DevOps Teams — An Infrastructure Perspective
Agentic AI is not a smarter chatbot. It is a fundamentally different workload — and if your current infrastructure stack was designed around traditional LLM inference, you are already behind. The AMD and Red Hat announcement of May 2026 makes this explicit: agentic AI systems that reason, plan, execute multistep tasks, and continuously call external tools require a new infrastructure stack built from the ground up. For Brisbane DevOps and platform engineering teams running Kubernetes-based workloads on OpenShift, EKS, or GKE, this is not a future problem. It is a now problem.
What Happened
AMD and Red Hat Draw a Hard Line
Red Hat's May 2026 blog post, published in partnership with AMD, makes a clear architectural argument: traditional inference infrastructure — optimised for single-shot, stateless model calls — is structurally mismatched to agentic workloads. Agents call models repeatedly in loops, fan out across tools and data sources, maintain state across multistep reasoning chains, and must run continuously and cost-effectively at scale.
The implication is direct. The infrastructure stack that got your team through the first wave of LLM integration — a GPU node here, a FastAPI wrapper there, maybe a Helm chart for a model server — will not survive contact with production agentic workloads. Red Hat is positioning OpenShift AI alongside AMD Instinct accelerators as the answer, but the deeper message is architectural, not vendor-specific.
Why It Matters for DevOps and Platform Engineering
Agents Are a Continuous Workload, Not a Batch Job
The single most disruptive characteristic of agentic AI for infrastructure teams is continuity. Traditional inference is stateless and bursty — a request comes in, a model responds, the compute is released. Agents are stateful and long-running. They maintain context, hold tool call queues, and re-enter reasoning loops. This breaks standard autoscaling assumptions built into KEDA and Kubernetes HPA, because the signal for "busy" is no longer just QPS — it is reasoning depth, tool latency, and memory pressure across multi-turn sessions.
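To make that concrete, here is a minimal sketch of a custom-metrics exporter using the prometheus_client library. The metric names and the runtime object are assumptions, not a standard; the point is that KEDA's Prometheus scaler (or the Prometheus Adapter for HPA) can then scale on signals like active session count and tool queue depth instead of raw request rate.

```python
# Minimal custom-metrics exporter for agent workloads (illustrative sketch).
# Exposes the scaling signals that QPS-based autoscaling misses.
import time

from prometheus_client import Gauge, start_http_server

# Hypothetical metric names -- align them with whatever your agent runtime tracks.
ACTIVE_SESSIONS = Gauge(
    "agent_active_sessions", "Long-running agent sessions currently holding state"
)
REASONING_DEPTH = Gauge(
    "agent_reasoning_depth", "Current reasoning loop iteration", ["session_id"]
)
TOOL_QUEUE_LENGTH = Gauge(
    "agent_tool_queue_length", "Pending tool calls awaiting execution"
)

def report(runtime):
    """Poll an (assumed) agent runtime object and publish its state."""
    ACTIVE_SESSIONS.set(len(runtime.sessions))
    TOOL_QUEUE_LENGTH.set(runtime.tool_queue.qsize())
    for session in runtime.sessions.values():
        REASONING_DEPTH.labels(session_id=session.id).set(session.loop_depth)

if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes http://pod:9100/metrics
    while True:
        # report(agent_runtime)  # wire this to your real runtime object
        time.sleep(5)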
Observability Gaps Become Critical Failures
Your existing Prometheus and Grafana dashboards were not built for agent trace visibility. An agent that silently loops, fails a tool call, or degrades in reasoning quality will not trigger a CPU alert. OpenTelemetry is the right foundation here, but instrumentation must extend into the agent framework layer — LangChain, LlamaIndex, or AutoGen — capturing span-level tool invocations, token consumption per reasoning step, and inter-agent message latency. Without this, your SRE team is flying blind on a workload that can burn GPU budget in minutes.
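The pattern looks roughly like this. It is a hand-rolled sketch against the OpenTelemetry Python API, not any framework's official integration; call_tool and the token fields are placeholders for whatever your agent runtime actually exposes.

```python
# Span-level tracing for an agent's tool invocations (sketch).
# Framework integrations for LangChain, LlamaIndex, and AutoGen vary,
# so treat this as the pattern, not a drop-in.
from opentelemetry import trace

tracer = trace.get_tracer("agent.instrumentation")

def traced_tool_call(tool_name: str, args: dict, step: int) -> str:
    with tracer.start_as_current_span("agent.tool_call") as span:
        span.set_attribute("agent.tool.name", tool_name)
        span.set_attribute("agent.reasoning.step", step)
        result = call_tool(tool_name, args)  # your tool dispatcher (assumed)
        # Record token spend per reasoning step so dashboards can chart it.
        span.set_attribute("agent.tokens.prompt", result.prompt_tokens)
        span.set_attribute("agent.tokens.completion", result.completion_tokens)
        return result.output
```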
GitOps and IaC Assumptions Break at the Model Layer
Teams managing infrastructure with ArgoCD and Terraform are accustomed to declarative, idempotent deployments. Agent systems introduce non-determinism at the application layer that bleeds into infrastructure decisions. Model versions, context window sizes, and tool configurations all affect infrastructure sizing — and they change frequently. Your IaC pipeline needs to treat model configuration as a first-class infrastructure variable, not an application concern left to the data science team.
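A back-of-envelope example of why: KV cache memory for a transformer grows linearly with context window and concurrency, so a context-window bump in a values file is a GPU capacity decision. The model dimensions below are illustrative and the formula is approximate; real serving engines like vLLM page and quantise the cache.

```python
# Why model config is an infrastructure variable: a rough KV-cache
# sizing estimate. Numbers are illustrative, not a sizing guide.

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, concurrent_seqs: int,
                 bytes_per_param: int = 2) -> float:  # 2 bytes = fp16
    # 2x for keys and values, per layer, per token, per sequence.
    total = (2 * layers * kv_heads * head_dim
             * bytes_per_param * context_len * concurrent_seqs)
    return total / 2**30

# Example with Llama-3-8B-like dimensions: prints 16.0 GiB. Doubling
# the context window in a values file doubles this figure -- a GPU
# capacity decision, not an application detail.
print(kv_cache_gib(layers=32, kv_heads=8, head_dim=128,
                   context_len=8192, concurrent_seqs=16))
```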
Impact on Brisbane Teams
The Local Talent and Tooling Gap
Brisbane's DevOps market is deep in Kubernetes and cloud-native tooling, but MLOps and LLMOps expertise remains genuinely scarce. According to Seek data from Q1 2026, ML/AI engineering roles in Queensland were advertised at three times the rate of available candidates. The practical consequence: most Brisbane platform teams will be asked to run agentic AI infrastructure before they have hired anyone with production LLMOps experience. That means the platform engineering team owns this problem whether they signed up for it or not.
OpenShift Shops Have a Head Start
Brisbane enterprises running Red Hat OpenShift — common across Queensland government, resources, and financial services — have a concrete advantage here. The AMD and Red Hat integration ships with OpenShift AI operator support, meaning the serving layer, model registry, and pipeline orchestration are available within an existing GitOps-managed cluster. For teams already using ArgoCD and OpenShift Pipelines, the onboarding path to a production-grade agentic AI stack is shorter than starting from raw EKS.
FinOps Complexity Multiplies
Agentic workloads are unpredictable in cost by design. A single agent session can trigger dozens of model calls, each with variable token depth. Brisbane teams with mature FinOps practices — tagging strategies in AWS or Azure, budget alerts, Spot instance policies — will need to extend those controls to cover per-agent-session cost attribution, as sketched below. Without this, the first production agentic workload will blow a cloud budget in a way that is genuinely difficult to explain to a CFO.
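Here is what session-level attribution means in practice. The token prices are placeholder assumptions; swap in your provider's rates or your amortised self-hosted GPU cost. What matters is that the unit of attribution is the session, not the individual request.

```python
# Per-agent-session cost attribution (sketch). Prices are assumed values.
from collections import defaultdict
from dataclasses import dataclass

PROMPT_PRICE_PER_1K = 0.003      # USD, placeholder
COMPLETION_PRICE_PER_1K = 0.015  # USD, placeholder

@dataclass
class ModelCall:
    session_id: str
    prompt_tokens: int
    completion_tokens: int

def cost_by_session(calls: list[ModelCall]) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for c in calls:
        totals[c.session_id] += (
            c.prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
            + c.completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
        )
    return dict(totals)

# One session, thirty model calls: the per-request view hides the real spend.
calls = [ModelCall("sess-42", 1800, 400) for _ in range(30)]
print(cost_by_session(calls))  # {'sess-42': 0.342}
```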
What To Do Now
Audit Your Current Stack Against Agentic Requirements
This week, map your existing infrastructure against three agentic AI requirements: stateful workload support (do your Kubernetes configs handle long-running sessions?), deep observability (does your OpenTelemetry instrumentation reach the model and tool call layer?), and cost attribution (can you tag and track spend per agent session?). Identify the gaps before a project lands on your backlog.
Instrument One Agent Framework End-to-End
Pick one agent framework — LangChain is the most common entry point — and instrument it fully with OpenTelemetry, exporting spans to your existing Grafana stack. This gives your SRE team a realistic picture of what agentic workload visibility actually looks like before you are under production pressure.
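A minimal version of that wiring might look like the following, assuming an OTel collector reachable at otel-collector:4317 feeding your Grafana stack. The callback is a simplified sketch: it is not concurrency-safe, the token_usage key is provider-dependent, and purpose-built instrumentation libraries may serve you better in production.

```python
# Export LangChain LLM-call spans via OTLP (sketch).
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from langchain_core.callbacks import BaseCallbackHandler

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True)))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("langchain.agent")

class SpanHandler(BaseCallbackHandler):
    """Opens a span per LLM call; attach via callbacks=[SpanHandler()].

    Note: chat models fire on_chat_model_start instead of on_llm_start,
    so override that too for chat-based agents.
    """
    def on_llm_start(self, serialized, prompts, **kwargs):
        self._span = tracer.start_span("llm.call")
        self._span.set_attribute("llm.prompt_count", len(prompts))

    def on_llm_end(self, response, **kwargs):
        # token_usage is a common llm_output key but not guaranteed.
        usage = (response.llm_output or {}).get("token_usage", {})
        self._span.set_attribute("llm.tokens.total", usage.get("total_tokens", 0))
        self._span.end()
```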
Treat Model Config as Infrastructure
Add model version, context window size, and tool manifest to your Terraform or Helm values files today. Enforce review through your existing GitOps pipeline. This costs almost nothing now and prevents a class of production incidents later.
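One cheap way to enforce that review is a pipeline gate that fails when a values file lacks the model block. The key names below are illustrative; match them to whatever schema your team agrees on. Requires PyYAML.

```python
# CI gate (sketch): fail the pipeline if a Helm values file is missing
# model configuration. Run as: python check_model_config.py values.yaml
import sys

import yaml

REQUIRED_KEYS = ["model.name", "model.version",
                 "model.contextWindow", "model.toolManifest"]  # illustrative

def missing_keys(values: dict, keys: list[str]) -> list[str]:
    out = []
    for dotted in keys:
        node = values
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                out.append(dotted)
                break
            node = node[part]
    return out

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        values = yaml.safe_load(f) or {}
    gaps = missing_keys(values, REQUIRED_KEYS)
    if gaps:
        sys.exit(f"values file missing model config: {', '.join(gaps)}")
```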
Engage the AMD-Red Hat Stack Specifically
If your team runs OpenShift, the OpenShift AI operator with AMD Instinct GPU support is worth a proof-of-concept this quarter. The integration is mature enough for non-production validation, and the architectural patterns it enforces — model serving via vLLM, pipeline orchestration via Kubeflow Pipelines — are transferable even if you later diverge from the vendor stack.
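Because vLLM exposes an OpenAI-compatible API, even the smoke test for a proof-of-concept is portable. A sketch, with the route URL and model name as assumptions to be replaced with your cluster's values:

```python
# Smoke-test a vLLM endpoint (e.g. served via OpenShift AI) using the
# standard OpenAI client, since vLLM speaks the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://vllm-demo.apps.example.cluster/v1",  # OpenShift route (assumed)
    api_key="not-needed-for-unauthenticated-poc",
)
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whatever model the server loaded
    messages=[{"role": "user", "content": "Reply with OK if you are serving."}],
    max_tokens=8,
)
print(resp.choices[0].message.content, resp.usage.total_tokens)
```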
Frequently Asked Questions
What is the difference between agentic AI and traditional LLM inference infrastructure?
Traditional LLM inference handles stateless, single-turn requests — one input, one output, compute released. Agentic AI infrastructure must support stateful, multistep reasoning loops where a model calls tools, processes results, re-enters reasoning, and maintains context across many model invocations in a single session. This requires different autoscaling, observability, and cost management approaches.
Does Kubernetes support agentic AI workloads out of the box?
Not without modification. Standard Kubernetes autoscaling (HPA, KEDA) is optimised for request-based scaling signals. Agentic workloads require custom metrics — reasoning session depth, tool call queue length, memory pressure — to scale correctly. Teams need to instrument their agent frameworks with OpenTelemetry and expose custom metrics to the Kubernetes control plane.
How does OpenShift AI fit into an agentic AI infrastructure stack?
OpenShift AI provides a Kubernetes-native operator stack covering model serving (via vLLM), model registry, pipeline orchestration (Kubeflow Pipelines), and GPU resource management. For teams already running OpenShift with ArgoCD-based GitOps, it extends the existing declarative infrastructure model to the AI serving layer, reducing the operational gap between platform engineering and ML workloads.
What should Brisbane DevOps teams prioritise first when preparing for agentic AI?
Observability first. Before you optimise compute or redesign autoscaling, instrument your agent framework with OpenTelemetry so you can see what is actually happening inside a running agent session. Blind optimisation of agentic infrastructure is expensive and slow. Visibility gives your team the data to make every subsequent decision correctly.