Open Craft

Posted on Jun 11 • Originally published at ocraft.id

AI Workflow Automation for Operations Teams: It's Not a Platform, It's Plumbing

#ai #automation #devops #productivity

Most vendor material on this topic opens with transformation narratives. I'll start somewhere else: the ticket queue, the approval chain, the three Slack threads duplicating the same status update. That's where operations automation lives. It's not glamorous. It's plumbing.

This piece covers what AI workflow automation actually looks like for enterprise ops teams, which use cases pay off fastest, how to sequence a rollout without creating a mess, and whether the work is worth it.

1. What operations automation actually means for enterprises

Not a unified AI platform that rewrites how your company operates.

A layer of lightweight agents and triggered scripts that handle the repetitive handoffs between systems your team already uses: ticketing, ERP, CRM, communication channels, approval workflows.

The distinction matters because it changes what you build. A "platform" framing leads to months of architecture work before anything runs in production. A "plumbing" framing means you ship a working bot that routes escalation tickets in week two, then extend it.

AI automation earns its keep in three areas:

Structured data handoffs: moving information between systems with a defined schema (PO line items into ERP, incident metadata into Jira, shift changes into the HR system)
Decision routing: classifying inbound requests and sending them to the right queue, team, or escalation path without a human doing the first-pass sort
Status aggregation: pulling from five sources and generating one summary instead of five people writing five updates
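As a rough illustration of the status aggregation case, here is a minimal sketch of the deterministic step that could run before any model is asked to write the summary: collapse updates from several sources down to the latest one per item. The `StatusUpdate` shape and the `aggregateStatus` name are assumptions made for this example, and it assumes timestamps are ISO 8601 strings in UTC so they compare correctly as plain strings.

```typescript
// Hypothetical shape for one update pulled from a source system.
type StatusUpdate = {
  source: string; // e.g. "jira", "slack", "email"
  item: string; // the thing being reported on, e.g. a ticket ID
  status: string;
  updatedAt: string; // ISO 8601, UTC (assumed so string comparison works)
};

// Keep only the most recent update per item, then render one line per item.
function aggregateStatus(updates: StatusUpdate[]): string {
  const latest = new Map<string, StatusUpdate>();
  for (const update of updates) {
    const seen = latest.get(update.item);
    if (seen === undefined || update.updatedAt > seen.updatedAt) {
      latest.set(update.item, update);
    }
  }
  return Array.from(latest.values())
    .sort((a, b) => a.item.localeCompare(b.item))
    .map((u) => `${u.item}: ${u.status} (via ${u.source})`)
    .join("\n");
}
```

The resulting digest can be posted as-is or handed to a model as grounded context for a prose summary.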

It does not reliably replace judgment calls that require political context, novel situation handling, or accountability that can't be delegated.

2. High-impact use cases across operations

These are the use cases where teams consistently see payback in under 90 days. None require a custom model.

Incident triage and routing. An LLM reads inbound support or ops tickets, classifies severity and category, and writes the Jira issue with relevant fields populated. A human still reviews the critical path. The AI handles the upstream 80%: reading, classifying, and filling in fields.

Approval workflow summarization. Procurement, IT change management, and HR workflows all involve a human reading a document and approving or rejecting it. An agent can summarize the document, flag policy deviations, and surface the key decision point, cutting review time from 20 minutes to two.

Runbook execution. For well-documented operational procedures, an agent can walk the steps, call APIs, and log results. The output is an audit trail. This is particularly useful for overnight or weekend on-call scenarios.
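A minimal sketch of that pattern, assuming each runbook step can be wrapped in an async function that returns a short result string. The `RunbookStep` and `AuditEntry` types and the `runRunbook` name are illustrative, not from any library; real steps would call your own APIs.

```typescript
// Hypothetical types for illustration; adapt to your own runbook format.
type RunbookStep = {
  name: string;
  run: () => Promise<string>; // would wrap a real API call in practice
};

type AuditEntry = {
  step: string;
  status: "ok" | "failed";
  detail: string;
  at: string; // ISO timestamp taken when the step started
};

// Walk the steps in order, record every outcome, and stop at the first
// failure so a human can pick up from a known point.
async function runRunbook(steps: RunbookStep[]): Promise<AuditEntry[]> {
  const audit: AuditEntry[] = [];
  for (const step of steps) {
    const at = new Date().toISOString();
    try {
      const detail = await step.run();
      audit.push({ step: step.name, status: "ok", detail, at });
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      audit.push({ step: step.name, status: "failed", detail, at });
      break;
    }
  }
  return audit;
}
```

Stopping at the first failure is a design choice for this sketch; some runbooks may prefer to continue and report every failed step.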

Lead and vendor data enrichment. Inbound form fills or vendor submissions that require manual lookup in three systems get an automated enrichment pass before they hit a human queue. Clearbit and similar APIs are common data sources; the AI layer handles synthesis and scoring.

3. A practical rollout: where to start

Step 1: map the handoffs, not the workflows

Before writing code, spend two hours with an ops team lead listing every recurring manual step that moves data or status between systems. You want handoffs, not full workflows. A handoff looks like: "Someone reads a form submission and creates a Jira ticket." That's automatable. "Someone decides whether we take on a new enterprise client" is not.

End with a ranked list, shortest time-to-value first.
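If the list lives in a spreadsheet export or a small script, the ranking can be sketched like this. The `Handoff` fields are assumptions for illustration, and estimated build effort is used here as a stand-in for time-to-value.

```typescript
// Hypothetical shape for one row of the handoff inventory.
type Handoff = {
  description: string;
  estimatedBuildDays: number; // rough effort to automate
  manualMinutesPerWeek: number; // time the team spends on it today
};

// Sort by estimated build effort (shortest first); break ties by putting
// the handoff that consumes more manual time first.
function rankHandoffs(handoffs: Handoff[]): Handoff[] {
  return [...handoffs].sort(
    (a, b) =>
      a.estimatedBuildDays - b.estimatedBuildDays ||
      b.manualMinutesPerWeek - a.manualMinutesPerWeek
  );
}
```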

Step 2: build one agent, end-to-end

Pick the top item and ship it. Here's a minimal ops triage agent using the Anthropic SDK:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function triageTicket(rawTicketText: string) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 512,
    messages: [
      {
        role: "user",
        content: `You are an operations triage agent. Classify this ticket and return JSON.

Ticket:
${rawTicketText}

Return: { "category": string, "severity": "low"|"medium"|"high"|"critical", "suggested_owner": string, "summary": string }`,
      },
    ],
  });

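  // The first content block should be text; treat anything else as a failure.
  const block = response.content[0];
  if (!block || block.type !== "text") {
    throw new Error("Expected a text block in the model response");
  }
  // Models sometimes wrap JSON in prose or code fences, so parse only the
  // outermost {...} span.
  const start = block.text.indexOf("{");
  const end = block.text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in the model response");
  }
  return JSON.parse(block.text.slice(start, end + 1));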
}

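Run it against a sample ticket. The command assumes triage.ts also calls triageTicket() with the input below and prints the result: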

$ npx ts-node triage.ts

Input: "Production DB replica lag > 5 minutes, alerts firing since 03:40 UTC"

Output:
{
  "category": "infrastructure/database",
  "severity": "high",
  "suggested_owner": "platform-oncall",
  "summary": "DB replica lag exceeding threshold since 03:40 UTC"
}

The agent doesn't page anyone. It writes the structured record. A human or a downstream automation decides what happens next. That separation is what makes this safe to ship.
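Because a downstream automation may consume that record, it can help to check the parsed JSON before writing it anywhere. The sketch below is one way to do that with a TypeScript type guard; the field names come from the prompt above, while the `TriageResult` type and `isTriageResult` function are names introduced here for illustration.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

type TriageResult = {
  category: string;
  severity: Severity;
  suggested_owner: string;
  summary: string;
};

const SEVERITIES: readonly string[] = ["low", "medium", "high", "critical"];

// Type guard: true only if the parsed value has every expected field
// and the severity is one of the four allowed values.
function isTriageResult(value: unknown): value is TriageResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const severity = v["severity"];
  return (
    typeof v["category"] === "string" &&
    typeof severity === "string" &&
    SEVERITIES.includes(severity) &&
    typeof v["suggested_owner"] === "string" &&
    typeof v["summary"] === "string"
  );
}
```

A record that fails the check can go to a human queue instead of being written as-is.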

Step 3: wire it to your actual data path

Once the classification logic is stable, connect it to your real inbound channel. If tickets come in via email:

# Simulate an inbound ticket by POSTing it to the webhook relay
curl -X POST https://your-ops-relay.internal/tickets \
  -H "Content-Type: application/json" \
  -d '{"raw_text": "...", "source": "email", "received_at": "2026-06-11T03:40:00Z"}'

The relay calls triageTicket(), writes the result to your ticketing system's API (Jira and ServiceNow expose REST endpoints; Linear exposes a GraphQL API), and logs the AI output alongside the original for auditability.
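The relay's core logic can be sketched as a single function with its three dependencies injected, which keeps it testable without a network or a real ticketing system. Every name here (`handleInbound`, `RelayDeps`, and the field names on the audit record) is hypothetical; in practice `triage` would be triageTicket and `createTicket` would wrap your ticketing system's API.

```typescript
// Hypothetical types for illustration only.
type InboundTicket = { raw_text: string; source: string; received_at: string };

type Triage = {
  category: string;
  severity: string;
  suggested_owner: string;
  summary: string;
};

type AuditRecord = { ticketId: string; inbound: InboundTicket; ai: Triage };

type RelayDeps = {
  triage: (rawText: string) => Promise<Triage>; // e.g. triageTicket
  createTicket: (triage: Triage) => Promise<string>; // returns the new ticket ID
  audit: (record: AuditRecord) => void; // e.g. append to a structured log
};

// Classify the inbound ticket, create the downstream ticket, and store the
// original text next to the AI output so later audits can compare them.
async function handleInbound(
  inbound: InboundTicket,
  deps: RelayDeps
): Promise<string> {
  const ai = await deps.triage(inbound.raw_text);
  const ticketId = await deps.createTicket(ai);
  deps.audit({ ticketId, inbound, ai });
  return ticketId;
}
```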

Step 4: instrument before you expand

Before building agent two, add structured logging to agent one. You want classification accuracy, time from inbound to routed, and human override rate. Without this, you're flying blind when something goes wrong.

A simple log schema:

{
  "ticket_id": "OPS-1042",
  "ai_category": "infrastructure/database",
  "human_override": false,
  "override_category": null,
  "latency_ms": 840,
  "model": "claude-sonnet-4-6",
  "timestamp": "2026-06-11T03:41:22Z"
}

As a rule of thumb, a human override rate above 15% means the model is miscategorizing enough to warrant prompt tuning or an audit of the categories and examples you give it. Below 5%, you can probably extend to the next handoff.
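Those two thresholds can be checked directly from the log. The sketch below assumes log entries carry the `human_override` field from the schema above; the `assessOverrides` name, the verdict labels, and the "keep-watching" band between 5% and 15% are assumptions added for illustration.

```typescript
// Only the fields this check needs, taken from the log schema above.
type TriageLog = { ticket_id: string; human_override: boolean };

type Verdict = "no-data" | "tune-prompt" | "keep-watching" | "ready-to-extend";

// Compute the share of tickets where a human changed the AI's category and
// map it onto the 15% and 5% thresholds.
function assessOverrides(logs: TriageLog[]): { rate: number; verdict: Verdict } {
  if (logs.length === 0) return { rate: 0, verdict: "no-data" };
  const overrides = logs.filter((log) => log.human_override).length;
  const rate = overrides / logs.length;
  if (rate > 0.15) return { rate, verdict: "tune-prompt" };
  if (rate < 0.05) return { rate, verdict: "ready-to-extend" };
  return { rate, verdict: "keep-watching" };
}
```

With only a handful of tickets the rate is noisy, so a minimum sample size before acting on the verdict is worth considering.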

4. Measuring ROI and avoiding common pitfalls

What to measure

ROI on ops automation is not hard to quantify if you log the right things from day one:

Time reclaimed per handoff: clock how long the manual version takes, then subtract the human review time after automation
Error rate before and after: misrouted tickets, missed escalations, duplicate entries
Queue cycle time: time from inbound to resolved, by category
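The time-reclaimed figure is simple arithmetic once both durations are measured. A minimal sketch, with hypothetical field names and a weekly volume that you would supply from your own logs:

```typescript
// Hypothetical measurement record for one handoff.
type HandoffTiming = {
  manualMinutes: number; // measured before automation
  reviewMinutes: number; // human review still needed after automation
  volumePerWeek: number; // how many times the handoff happens per week
};

// Manual time minus remaining review time, multiplied by weekly volume.
// Clamped at zero so a slower automated path never reports negative savings.
function minutesReclaimedPerWeek(timing: HandoffTiming): number {
  const perHandoff = Math.max(0, timing.manualMinutes - timing.reviewMinutes);
  return perHandoff * timing.volumePerWeek;
}
```

Using the approval example from earlier (20 minutes manual, 2 minutes of review) at an assumed 50 approvals per week, this gives 900 minutes, or 15 hours, reclaimed per week.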

Most teams see a 30-60% reduction in queue cycle time for structured handoffs within the first quarter. The gains are real. They're just not the kind you put in a press release.

Pitfalls that reliably show up

Overpromising scope to stakeholders. An agent that routes tickets is not an agent that resolves incidents. Conflating the two creates expectation debt you'll spend months unwinding. Define the automation boundary in writing before you go to production.

No human override path. Every automated decision needs a one-click override and an audit log. Not because the AI is unreliable, but because compliance, incident retrospectives, and edge cases will require it. Build the override UI before you go live, not after the first escalation.
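On the data side, an override can be as small as one function that updates the log entry and emits an audit record. The sketch below reuses the field names from the log schema earlier in this post; `applyOverride`, `OverrideAudit`, and the reviewer identifier are hypothetical.

```typescript
// Subset of the log schema shown earlier.
type TriageLogEntry = {
  ticket_id: string;
  ai_category: string;
  human_override: boolean;
  override_category: string | null;
};

// Hypothetical audit record: who changed what, and when.
type OverrideAudit = {
  ticket_id: string;
  from: string;
  to: string;
  by: string;
  at: string; // ISO timestamp
};

// Return an updated copy of the entry plus an audit record. The original
// entry is not mutated, so the AI's first answer stays on file.
function applyOverride(
  entry: TriageLogEntry,
  newCategory: string,
  reviewer: string,
  now: Date = new Date()
): { updated: TriageLogEntry; audit: OverrideAudit } {
  const updated: TriageLogEntry = {
    ...entry,
    human_override: true,
    override_category: newCategory,
  };
  const audit: OverrideAudit = {
    ticket_id: entry.ticket_id,
    from: entry.ai_category,
    to: newCategory,
    by: reviewer,
    at: now.toISOString(),
  };
  return { updated, audit };
}
```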

Prompt drift. The prompt you wrote in week one will not hold up in month six. As ticket vocabulary changes, as new systems come online, as team structure shifts, the agent's classification will degrade without maintenance. Schedule a quarterly prompt review the same way you'd schedule dependency updates.

Stalling after agent two. The first agent is interesting. The second is useful. Agents three through ten are where the real operational leverage is, and they're tedious to build. Teams that treat this as a project rather than a practice stall out after agent two. Assign an owner, not just a project.

Treating the LLM as a database. If you're asking the model to recall specific facts about past tickets, customers, or incidents, you will get hallucinations. Route retrieval to a real database or a vector store with grounded context. The LLM handles synthesis and generation; structured data stays in structured systems. See the Anthropic docs on tool use for how to wire this.
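One way to picture that split: look the records up first, then give the model only what was found. This sketch uses an in-memory object as a stand-in for a real database or vector store, and `PastTicket`, `buildGroundedPrompt`, and the prompt wording are all assumptions for illustration. It does not use the tool use API itself.

```typescript
// Hypothetical record shape; a real system would query a database instead.
type PastTicket = { id: string; summary: string; resolution: string };

// Fetch the requested records from the store and build a prompt that
// contains only retrieved facts, so the model synthesizes rather than recalls.
function buildGroundedPrompt(
  question: string,
  ticketIds: string[],
  store: Record<string, PastTicket | undefined>
): string {
  const found = ticketIds
    .map((id) => store[id])
    .filter((ticket): ticket is PastTicket => ticket !== undefined);
  const context =
    found.length > 0
      ? found
          .map((t) => `- ${t.id}: ${t.summary} (resolution: ${t.resolution})`)
          .join("\n")
      : "(no matching records)";
  return [
    "Answer using only the records below. If they do not contain the answer, say so.",
    "",
    "Records:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```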

Takeaway

The payoff comes not from buying a platform but from treating this as plumbing: find one manual handoff, ship an agent that handles it with a human override, log everything, and extend. Eliminating five or ten recurring friction points across a quarter is where the real business case is. If you want to see how this maps to a broader AI transformation program, the full framework is on the Opencraft blog.
