Lars Winstand

Posted on Jun 10 • Originally published at standardcompute.com

I think the best OpenAI API alternative for customer email is a 4-step draft workflow, not an “AI employee”

#ai #api #automation #developers

I clicked a Reddit thread because the title was so bad I assumed the post would be useless.

It was basically: fire your staff, replace them with OpenClaw.

The post had a score of 0. The comments were roasting it. Fair enough.

But buried inside the worst possible framing was a genuinely solid pattern for customer email automation:

read inbound email
fetch live pricing / order / inventory data
draft a reply
escalate weird cases

That’s it.

Not an AI employee.
Not a digital teammate.
Not a fake support rep with “human-level reasoning.”

Just a narrow workflow with live system access.

And honestly, that’s the part a lot of teams miss.

If you’re building support automation, the useful unit is usually not “replace support.” It’s “automate the boring 30% safely.”

The useful idea: shrink the task surface

The line from the thread that actually mattered was this:

“The hard part is the employee has to look up our system for product pricing, orders, inventory, etc. Now OpenClaw can do all of that with CLI and MCP.”

That’s the whole game.

The breakthrough is not that OpenClaw became an employee.
The breakthrough is that someone reduced the job to tasks agents can actually do reliably:

classify intent
look up order status
look up pricing
check inventory
draft a response
hand off when confidence is low

That maps cleanly to function calling and MCP.

It does not map cleanly to “handle customer relationships like a human.”

That second version is how you get hallucinated discounts, fake shipping updates, and apology emails about orders that never existed.
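Here is a minimal sketch of that reduced task surface, assuming a closed set of intents where each intent maps to the only tools it may call. The intent labels and helper names are hypothetical; the tool names follow the examples used later in this post.

```typescript
// Hypothetical intent labels; adjust to your own categories.
type Intent = "order_status" | "pricing" | "inventory" | "other";

// Tool names as used in the examples in this post.
type ToolName = "get_order_status" | "lookup_price" | "check_inventory";

// Each intent may call only the tools listed here.
// "other" gets no tools, which forces a handoff to a human.
const TOOLS_BY_INTENT: Record<Intent, readonly ToolName[]> = {
  order_status: ["get_order_status"],
  pricing: ["lookup_price"],
  inventory: ["check_inventory"],
  other: [],
};

function allowedTools(intent: Intent): readonly ToolName[] {
  return TOOLS_BY_INTENT[intent];
}

// An intent with no permitted tools cannot be answered from live data.
function needsHandoff(intent: Intent): boolean {
  return allowedTools(intent).length === 0;
}
```

Keeping the mapping as data means widening the task surface later is a one-line change you can review, not a prompt rewrite.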

Why MCP makes this much more real

For support, the big failure mode is obvious: the model makes stuff up.

Pricing is not in the model.
Your ERP is not in the model.
Today’s inventory count is definitely not in the model.

MCP and function calling address that by letting the model query your systems directly instead of guessing.

For a customer email workflow, that means the model can do this:

read the email
decide which tools it needs
call get_order_status(order_id)
call lookup_price(sku)
write a draft using actual data

That is a completely different reliability profile from “answer from prompt context and vibes.”
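The steps above imply a small dispatcher on your side: the model names a tool and passes JSON arguments, and your code decides whether and how to run it. This is a rough sketch with stubbed handlers and made-up data. The `ToolCall` shape and `runToolCall` helper are assumptions for illustration, not any SDK's types.

```typescript
// A tool call as a model might emit it: a name plus JSON-encoded arguments.
type ToolCall = { name: string; arguments: string };

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Stub handlers returning made-up data; real ones would query your systems of record.
const handlers: Record<string, ToolHandler> = {
  get_order_status: (args) => ({ orderId: args.order_id, shipmentStatus: "in_transit" }),
  lookup_price: (args) => ({ sku: args.sku, price: 19.99, currency: "USD" }),
};

type ToolOutcome =
  | { ok: true; name: string; result: unknown }
  | { ok: false; name: string; error: string };

function runToolCall(call: ToolCall): ToolOutcome {
  // Only run tools we registered ourselves; anything else is rejected.
  const known = Object.prototype.hasOwnProperty.call(handlers, call.name);
  if (!known) return { ok: false, name: call.name, error: "unknown tool" };

  let args: unknown;
  try {
    args = JSON.parse(call.arguments);
  } catch {
    return { ok: false, name: call.name, error: "arguments are not valid JSON" };
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, name: call.name, error: "arguments must be a JSON object" };
  }
  const result = handlers[call.name](args as Record<string, unknown>);
  return { ok: true, name: call.name, result };
}
```

A failed outcome is a natural escalation signal: if the model asks for a tool that does not exist or passes malformed arguments, the email goes to a human.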

The safest version does not send anything

This is the rollout pattern I’d recommend to almost everyone:

inbound email comes in
model classifies it
tools fetch live data
model writes a Gmail draft
human reviews and sends

Stopping at draft creation is the key move.

It turns a risky automation project into a review workflow.

That gives you:

side-by-side comparison with human replies
an audit trail
a way to measure accuracy before auto-send
a clean path to enable automation only for low-risk categories

If you skip this phase and go straight to autonomous sending, you’re basically volunteering to debug trust in production.

A practical architecture

Here’s the version I’d actually build.

Step	What happens
1. Classify	Detect intent: order status, pricing question, refund request, inventory check, escalation
2. Retrieve	Call Shopify, NetSuite, Postgres, internal APIs, or CLI tools via MCP/function calling
3. Draft	Generate a Gmail draft with the retrieved facts
4. Review / send	Human approves at first; later auto-send only for safe categories

That’s boring.

Which is why it works.

Example: tool-based support drafting

If you’re using an OpenAI-compatible API, the call shape is straightforward.

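{
  "model": "gpt-5.4",
  "tools": [
    {
      "type": "function",
      "name": "get_order_status",
      "description": "Return shipping and fulfillment status for an order",
      "parameters": {
        "type": "object",
        "properties": { "order_id": { "type": "string" } },
        "required": ["order_id"]
      }
    },
    {
      "type": "function",
      "name": "lookup_price",
      "description": "Return current pricing for a SKU",
      "parameters": {
        "type": "object",
        "properties": { "sku": { "type": "string" } },
        "required": ["sku"]
      }
    },
    {
      "type": "function",
      "name": "check_inventory",
      "description": "Return current inventory for a SKU",
      "parameters": {
        "type": "object",
        "properties": { "sku": { "type": "string" } },
        "required": ["sku"]
      }
    }
  ],
  "input": "Customer asks whether order 18422 has shipped and whether SKU-A13 is in stock. Draft a reply."
}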

The important part is not the exact model name.

The important part is the contract:

tools return structured facts
the model drafts from those facts
the workflow decides whether to escalate

Example function contracts

This is the level of specificity I’d use.

type OrderStatusArgs = {
  orderId: string;
};

type OrderStatusResult = {
  orderId: string;
  fulfillmentStatus: "unfulfilled" | "partial" | "fulfilled";
  shipmentStatus: "not_shipped" | "in_transit" | "delivered";
  trackingNumber?: string;
  estimatedDelivery?: string;
};

type PriceLookupArgs = {
  sku: string;
  customerTier?: string;
  currency?: string;
};

type PriceLookupResult = {
  sku: string;
  price: number;
  currency: string;
  discountApplied?: string;
};

If your tools return fuzzy text blobs, your drafts will be fuzzy too.

If your tools return strict structured data, your support pipeline gets much easier to reason about.
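Types alone do not protect you at runtime, so it can help to check a tool's output before the model ever sees it. Here is a small sketch of a validator for the `OrderStatusResult` shape above. The `parseOrderStatus` helper is hypothetical, and a schema library would do the same job with less code.

```typescript
// Same shape as OrderStatusResult above, repeated so this sketch stands alone.
type OrderStatusResult = {
  orderId: string;
  fulfillmentStatus: "unfulfilled" | "partial" | "fulfilled";
  shipmentStatus: "not_shipped" | "in_transit" | "delivered";
  trackingNumber?: string;
  estimatedDelivery?: string;
};

const FULFILLMENT = ["unfulfilled", "partial", "fulfilled"] as const;
const SHIPMENT = ["not_shipped", "in_transit", "delivered"] as const;

function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === "string" && allowed.some((a) => a === value);
}

function isOptionalString(value: unknown): value is string | undefined {
  return value === undefined || typeof value === "string";
}

// Returns the typed result, or null if the payload does not match the contract.
function parseOrderStatus(payload: unknown): OrderStatusResult | null {
  if (typeof payload !== "object" || payload === null) return null;
  const { orderId, fulfillmentStatus, shipmentStatus, trackingNumber, estimatedDelivery } =
    payload as Record<string, unknown>;
  if (typeof orderId !== "string") return null;
  if (!isOneOf(FULFILLMENT, fulfillmentStatus)) return null;
  if (!isOneOf(SHIPMENT, shipmentStatus)) return null;
  if (!isOptionalString(trackingNumber) || !isOptionalString(estimatedDelivery)) return null;
  return { orderId, fulfillmentStatus, shipmentStatus, trackingNumber, estimatedDelivery };
}
```

A `null` here is another cheap escalation trigger: if the order system returns something unexpected, no draft gets written from it.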

Gmail draft creation is underrated

A lot of teams think support automation has to be all-or-nothing.

It doesn’t.

You can create drafts and keep humans in the loop while you evaluate quality.

Minimal Node example:

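import { google } from "googleapis";

// Assumes `auth` is an already-authorized OAuth2 client for the support mailbox
// (for example, one granted the gmail.compose scope). Auth setup is not shown here.
const gmail = google.gmail({ version: "v1", auth });

// Gmail expects the raw RFC 2822 message encoded as URL-safe base64.
function toBase64Url(str: string) {
  return Buffer.from(str, "utf-8")
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Creates a draft in the authorized mailbox. Nothing is sent.
async function createDraft(to: string, subject: string, body: string) {
  // Headers, a blank line, then the body, joined with CRLF.
  const mime = [
    `To: ${to}`,
    `Subject: ${subject}`,
    "Content-Type: text/plain; charset=utf-8",
    "",
    body
  ].join("\r\n");

  const raw = toBase64Url(mime);

  const res = await gmail.users.drafts.create({
    userId: "me",
    requestBody: {
      message: { raw }
    }
  });

  // The draft ID is handy for an audit trail.
  return res.data.id ?? null;
}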

That one implementation detail changes the rollout strategy completely.
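One caveat with the minimal example: it drops `to` and `subject` straight into the headers. Subjects with non-ASCII characters should be encoded per RFC 2047, and stray line breaks in a header value could inject extra headers. Here is a small hedged sketch of a helper you could apply to the subject before building the MIME string. The function names are mine, and it ignores the 75-character limit on a single encoded-word, so very long subjects would need splitting.

```typescript
// Remove CR/LF so a value cannot smuggle extra headers into the message.
function stripLineBreaks(value: string): string {
  return value.replace(/[\r\n]+/g, " ").trim();
}

// Encode a header value as an RFC 2047 base64 "encoded-word" when it
// contains anything outside printable ASCII; plain ASCII passes through.
function encodeHeaderValue(value: string): string {
  const clean = stripLineBreaks(value);
  if (/^[\x20-\x7e]*$/.test(clean)) return clean;
  const b64 = Buffer.from(clean, "utf-8").toString("base64");
  return `=?UTF-8?B?${b64}?=`;
}
```

In the draft example you would then write `Subject: ${encodeHeaderValue(subject)}`. The recipient address needs its own validation, which this sketch does not attempt.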

A local service shape I’d actually ship

If I were wiring this up quickly, I’d split it into three services:

support-router/
  src/classify.ts
  src/policy.ts
mcp-tools/
  src/order-status.ts
  src/pricing.ts
  src/inventory.ts
gmail-drafter/
  src/create-draft.ts

Then the pipeline becomes:

incoming email -> classify -> call tools -> generate draft -> human review

And later:

incoming email -> classify -> call tools -> auto-send for safe intents -> escalate everything else
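The difference between those two pipelines can live in one small policy function, which is roughly what I would put in `src/policy.ts`. This is a sketch under assumptions: the intent labels, the `RolloutMode` switch, and the choice of safe intents are all illustrative.

```typescript
// Hypothetical intent labels, loosely following the categories in this post.
type Intent = "order_status" | "inventory" | "pricing" | "refund" | "other";

type RolloutMode = "draft_only" | "auto_send_safe";
type Action = "create_draft" | "auto_send" | "escalate";

// Intents treated as safe to auto-send in the later pipeline (an assumption).
const SAFE_INTENTS: readonly Intent[] = ["order_status", "inventory"];

function decideAction(mode: RolloutMode, intent: Intent): Action {
  // First pipeline: every reply stops at a draft for human review.
  if (mode === "draft_only") return "create_draft";
  // Later pipeline: auto-send safe intents, escalate everything else.
  return SAFE_INTENTS.indexOf(intent) !== -1 ? "auto_send" : "escalate";
}
```

Switching from the first pipeline to the second is then a config change, and rolling back is the same change in reverse.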

What I would automate first

I would start with the lowest-risk, highest-repeat categories.

Category	Good candidate?	Why
Order status	Yes	Usually structured, easy to verify, low ambiguity
Inventory check	Yes	Pull from a source of truth and answer directly
Basic pricing	Yes, with guardrails	Fine if pricing rules are clean and customer-specific exceptions are handled
Refund disputes	Not first	Higher risk, policy-heavy, emotional context matters
Wholesale account issues	Not first	Contract terms and negotiated pricing create failure risk
Angry escalation emails	No	Tone and judgment matter more than speed

This is the part that gets lost in the “AI employee” pitch.

Support is not one task.
It’s a pile of tasks with very different risk levels.

Treating them all the same is a design mistake.

The big vendors already chose the boring answer

This is the funny part.

If you listen to the loudest AI people online, everyone is building autonomous workers.

If you look at what Intercom and Zendesk actually sell, they’re mostly building scoped support systems with grounding, simulation, and escalation.

That tells you a lot.

The market already voted.

The winning pattern is not “general AI employee.”
It’s “tight workflow with live data and handoff.”

Cost is where DIY gets weird fast

This is also where a lot of agent projects fall apart.

Support automation sounds cheap until every step hits the most expensive model.

A sane pipeline uses different models for different jobs:

cheap model for intent classification
retrieval/tool layer for facts
stronger model for customer-facing drafting
human review for edge cases

That’s much better than using one giant premium model for every token of every email.

And this is exactly why OpenAI-compatible routing matters.

If your code already talks to an OpenAI-style API, you can swap providers or route between models without rebuilding the whole stack.
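To make that concrete, here is a sketch of per-step model routing against an OpenAI-style `/chat/completions` endpoint, which is the route most OpenAI-compatible providers expose. The base URL, API key, and the smaller model's name are placeholders I made up. The function only builds the request, so you can inspect it before sending anything.

```typescript
type Step = "classify" | "draft";

// Placeholder model names; substitute whatever your provider actually exposes.
const MODEL_BY_STEP: Record<Step, string> = {
  classify: "small-fast-model",
  draft: "gpt-5.4",
};

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Swapping providers means changing only these two values.
type ProviderConfig = { baseUrl: string; apiKey: string };

// Builds (but does not send) a chat completion request for the given step.
function buildChatRequest(provider: ProviderConfig, step: Step, messages: ChatMessage[]) {
  return {
    // Trim trailing slashes so "…/v1" and "…/v1/" behave the same.
    url: provider.baseUrl.replace(/\/+$/, "") + "/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + provider.apiKey,
      },
      body: JSON.stringify({ model: MODEL_BY_STEP[step], messages }),
    },
  };
}
```

Sending it is then `fetch(url, init)` with the values returned here. If you use the OpenAI SDK instead, the equivalent move is setting its `baseURL` and `apiKey` options.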

For teams running automations all day in n8n, Make, Zapier, OpenClaw, or custom workers, that flexibility matters a lot.

Per-token billing punishes experimentation.
It also punishes long-running agent workflows with lots of intermediate steps.

That’s one reason Standard Compute is interesting here: it gives you an OpenAI-compatible endpoint with flat monthly pricing, so you can run this kind of multi-step workflow without doing token math every five minutes.

That is a much better fit for agent pipelines than treating every classification, tool call, retry, and draft pass like a billing event you need to babysit.

What I’d build with Standard Compute

If I were implementing this stack today with Standard Compute, I’d do something like:

keep my existing OpenAI SDK client
point it at Standard Compute’s API
route cheap classification to a smaller model
route nuanced draft generation to GPT-5.4 or Claude Opus 4.6
keep Gmail drafts as the default output
only auto-send after weeks of parallel review

That gives you:

predictable monthly cost
freedom to test routing strategies
no per-token anxiety while iterating
compatibility with existing agent/automation code

For support workflows, that’s a real advantage because the architecture is inherently multi-step.

A rollout plan that won’t blow up trust

If you want this to work in production, I’d do it in phases.

Phase 1: draft only

classify inbound email
fetch data from systems of record
create Gmail drafts
compare draft quality with human replies

Phase 2: auto-send only for safest intents

order status
inventory checks
simple pricing questions

Phase 3: confidence-based routing

auto-send only when every required tool call returned complete data and classifier confidence clears a threshold you set
escalate everything else
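A minimal sketch of that gate, assuming your classifier exposes a confidence score between 0 and 1 and each tool result carries a completeness flag. Both are assumptions about your setup, and the 0.9 threshold is illustrative, not a recommendation.

```typescript
type ToolResult = { name: string; complete: boolean };

type Candidate = {
  confidence: number;       // hypothetical classifier score in [0, 1]
  requiredTools: string[];  // tools this intent needs before a reply is trustworthy
  toolResults: ToolResult[];
};

const MIN_CONFIDENCE = 0.9; // illustrative; tune it against your review data

function shouldAutoSend(c: Candidate): boolean {
  // Written this way so NaN or a missing score is rejected too.
  if (!(c.confidence >= MIN_CONFIDENCE)) return false;
  // No required tools means there is no live data to ground the reply.
  if (c.requiredTools.length === 0) return false;
  // Every required tool must have returned a complete result.
  return c.requiredTools.every((tool) =>
    c.toolResults.some((r) => r.name === tool && r.complete)
  );
}

function routeReply(c: Candidate): "auto_send" | "escalate" {
  return shouldAutoSend(c) ? "auto_send" : "escalate";
}
```

The gate fails closed: anything missing, partial, or uncertain goes to a human.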

Phase 4: continuous evaluation

Track:

draft acceptance rate
human edit distance
escalation rate
incorrect factual statements
customer satisfaction by category

If you don’t have those metrics, you’re not operating a support automation system.
You’re just hoping.
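Three of those metrics fall out of data you already have in a draft-first rollout: the draft, what the human actually sent, and whether the case was escalated. Here is a rough sketch, assuming you log one record per email. The `ReviewedReply` shape is hypothetical, and plain Levenshtein distance is only one reasonable stand-in for "human edit distance".

```typescript
type ReviewedReply = {
  draft: string;      // what the model wrote
  sent: string;       // what the human actually sent ("" if escalated)
  escalated: boolean; // true if the case was handed off instead
};

// Levenshtein distance over UTF-16 code units, keeping only two rows in memory.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

function summarize(replies: ReviewedReply[]) {
  const answered = replies.filter((r) => !r.escalated);
  // A draft counts as accepted only if it was sent unchanged.
  const accepted = answered.filter((r) => r.draft === r.sent).length;
  const totalEdits = answered.reduce((sum, r) => sum + editDistance(r.draft, r.sent), 0);
  return {
    escalationRate: replies.length ? (replies.length - answered.length) / replies.length : 0,
    draftAcceptanceRate: answered.length ? accepted / answered.length : 0,
    meanEditDistance: answered.length ? totalEdits / answered.length : 0,
  };
}
```

Incorrect factual statements and customer satisfaction need human labels or survey data, so they are left out of this sketch.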

The main thing I think people get wrong

The mistake is trying to automate all of support at once.

That usually means automating trust away.

A customer asking whether order 18422 shipped is not the same problem as a wholesale buyer disputing negotiated pricing.

One is a retrieval problem.
The other is a judgment problem.

Good agent systems respect that difference.
Bad ones flatten everything into “the model will handle it.”

It won’t. Not reliably.

My take

The Reddit post had terrible framing.

But the implementation idea inside it was good.

The best setup for customer email on an OpenAI API alternative is usually not a full AI employee.
It’s a bounded workflow:

read inbound email
fetch live data through MCP or function calls
draft the reply
escalate edge cases

That is much less glamorous than “replace your team.”

It is also much closer to something I’d actually trust in production.

And if you’re running this as an always-on automation, predictable cost matters almost as much as model quality.

That’s why the combo I like is:

narrow workflow design
OpenAI-compatible API calls
multi-model routing
draft-first rollout
flat-cost infrastructure like Standard Compute for the actual agent runtime

That’s not a flashy story.

It’s just the version that survives contact with real support operations.
