DEV Community: Vishal Kumar Singh

Why This Backend Engineer Stopped Calling LLM APIs From Every Service And Started Running a Local Agent Instead

Vishal Kumar Singh — Tue, 21 Apr 2026 13:09:09 +0000

This is a submission for the OpenClaw Writing Challenge.

The Problem Every Backend Team Has Right Now

If you run a Java or Node backend in 2026, your architecture diagram probably has a
new blob labelled "LLM" with an arrow pointing to it from every second service.
Someone added an OpenAIClient bean six months ago to summarise tickets. Someone
else added an AnthropicClient to rewrite customer emails. The data science team
spun up their own proxy because they wanted Gemini for embeddings.

From an engineering perspective, this is the same mess we cleaned up a decade ago
with Kafka and service meshes, just with a different label. Every team re-implements:

Retry, timeout, and circuit-breaker policy around an external API
Secret management for a rotating pile of vendor keys
Audit logs of "what prompt went out, what came back, who paid for it"
Context collection -- pulling email, calendar, docs, tickets -- over and over

I wanted a different pattern: one local daemon that owns the model connection and the context, and every app in my personal workflow talks to it over a stable API. That is exactly what OpenClaw is.

This post is a backend engineer's mental model of OpenClaw, how I set it up, and why I think the "AI as a local gateway" pattern is about to eat a lot of the LLMClient beans we have been writing.

What OpenClaw Actually Is (In Backend Terms)

OpenClaw is an open-source personal AI agent you run on your own machine. Strip away the marketing and the shape is very familiar:

A daemon listening on localhost:18789 (they call it the Gateway)
A control plane UI in the browser for config, skills, and chat
A pluggable model backend -- you bring your own Anthropic, OpenAI, Google, Ollama, or local model key
A channel adapter layer that speaks Slack, Discord, Telegram, WhatsApp, Signal, Teams, and a few more
A skills abstraction -- small, composable units of capability that the agent can invoke

If you squint, this is a little personal API gateway plus a rules engine plus a provider-agnostic LLM client. The thing that makes it interesting is that the context and the policy live on your box, not in a vendor tenant.

The Five-Minute Install

Node 24 is the recommended runtime (22.14+ also works). On macOS/Linux/WSL2:

curl -fsSL https://openclaw.ai/install.sh | bash

On Windows PowerShell:

iwr -useb https://openclaw.ai/install.ps1 | iex

Or just npm i -g openclaw@latest if you already run a Node toolchain. Then:

openclaw onboard --install-daemon
openclaw gateway status     # should show listening on 18789
openclaw dashboard          # opens the Control UI

You paste in an API key for whichever provider you already have a budget with --
I used Anthropic for authoring-heavy work and a small Ollama model for anything
that touches private data. That switch is a single field in the dashboard, not a
code change in your apps.

The Mental Model That Clicked For Me

I kept trying to think about OpenClaw as "just another chat UI", and none of it
made sense until I flipped the frame. Here is the frame that worked:

OpenClaw is a sidecar for your digital life. Each skill is a tiny service.
Each channel is a transport. You -- the human -- are the load balancer.

Once I thought about it that way, every design decision made sense:

Why a local daemon? Because sidecars have to be co-located. Latency, trust, and data locality all get much easier when the agent is on the same host as the secrets it needs.

Why skills instead of one giant prompt? Because production services are composed of small, testable units with clear contracts. Prompts without boundaries become the stored procedures of the AI era.

Why so many channels? Because in a sidecar model, transport is cheap. Slack, Telegram, CLI -- all just ingress to the same brain.

A Concrete Setup For A Backend Engineer

Here is the layout I actually ended up with after two evenings of tinkering. It mirrors how I would structure an internal platform team's "AI access layer" at work, just scaled down to one human.

Providers

Primary model: Anthropic Claude (authoring, refactoring, design docs)

Private model: a local Ollama qwen variant for anything that references real customer data, internal service names, or unpublished work

Embeddings: provider-native on whichever model the skill is pointed at, so I do not have to babysit a second key

Skills I wrote or enabled

jira-triage -- reads a ticket URL, extracts repro steps, asks clarifying questions, drafts a reply. Rule: never posts automatically. Always returns a draft to me.

pr-review-prep -- given a PR URL, pulls the diff, identifies risky files using a small heuristic (touches application-prod.yml, changes a migration, edits a security filter), and writes a review checklist.

weekly-brag -- every Friday, scrapes my GitHub activity, merged PRs, and Medium drafts, and emits three bullets in "impact / action / metric" form for my running brag doc. Feeds my promotion file instead of dying in chat history.

calendar-debrief -- pre-meeting skill that pulls the calendar event, finds the last three email threads with the attendees, and gives me a two-sentence primer.

Channels

Slack in a private workspace just for me (@claw ping me when X)

CLI (openclaw ask "...") for terminal-first work

iMessage for phone-originated captures

Policies

Any skill that would write to an external system (post a ticket comment, send an email, push code) must return a draft instead of executing, by default

Any skill that touches *.mmt.internal or files under ~/work/ must use the local model, never a hosted one

Every agent run writes a JSON line to ~/.openclaw/audit.log -- same spirit as any access log you would keep in a regulated system

That last one is worth emphasising. If you think of AI usage as traffic, then everything you already know about traffic applies: log it, budget it, rate-limit it, and be able to answer the question "what did this system say and do, on my behalf, yesterday?"

What Running This For A Week Actually Felt Like

I have opinions.

The good

The context stays put. I stopped copy-pasting ticket text into a chat window. My Slack thread with @claw already has what it needs. This is the biggest quality-of-life change, and it is underrated.

Multi-model is a config field. The fact that switching from a hosted model to a local one was a radio button meant I actually did it for sensitive work, rather than promising myself I would.

The skill abstraction composes. pr-review-prep calls jira-triage when a PR mentions a ticket ID. They were written days apart and nothing broke. This is the composition story that "custom GPTs" never quite nailed because the boundaries were not proper ones.

The sharp edges

Skill authoring is still more art than engineering. You are writing prompts, tool definitions, and fallback logic in a format that is evolving quickly. Expect to refactor your own skills every few weeks. Treat them like any other code: version them, review them, and write little acceptance notes.

Observability is young. The audit log is there, but the analytics around "how often does this skill run, how much does it cost, which provider is flaking" are not what you get from a mature API gateway. I ended up writing a tiny shell script to jq the audit log into a weekly summary.

Provider drift is real. Two of my skills broke when I switched models because one provider is stricter about tool-call JSON. Treat the model as an external dependency: pin versions where you can, write a smoke test where you cannot.

The thing I did not expect

I write a lot of Java for a living. The most surprising win was using OpenClaw as the ingress for design-doc drafting from my phone. I dictate a messy idea in iMessage while walking, a skill cleans it up into a structured doc template, and by the time I am back at my laptop it is a well-formed markdown file waiting in ~/docs/drafts/.

That workflow does not exist in the "open a chat UI, paste, copy back" model. It only shows up when the agent lives on your side of the API.

What I Would Tell Another Backend Engineer Before They Start

If you already run services for a living, you will underestimate how much of your existing mental model transfers. Here is the short version:

Treat skills like services. Name them clearly, give them inputs/outputs, version their prompts, write down their failure modes. A "prompt with tool use" is just a service with two backends (the model and the tool).

Put a policy at the edge. Decide early which skills are allowed to talk to the hosted model, which must use local, and which are draft-only. It is much harder to retrofit.

Log everything. The audit log is the single most reassuring file on my machine right now. Future-you will want to know what past-you authorised.

Do not try to replace your team's AI strategy with your personal one. OpenClaw is a sidecar for you. If your employer has an enterprise story, use it for work. Use OpenClaw for the long tail of personal and side-project work where you are the CISO, the SRE, and the user all at once.

Where I Think This Pattern Goes

I have been building backend systems long enough to recognise a shape when it repeats. We went from "every app has its own auth" to OAuth gateways. We went from "every app has its own cache" to Redis-as-a-service. We went from "every app has its own queue" to Kafka. Each time, the shared thing moved to a well-defined, pluggable gateway, and app code got smaller and clearer.

AI is on the same curve. The "every app embeds an OpenAIClient" phase is the first draft. The next phase is a gateway that owns the provider connection, the context, the policy, and the audit trail. For enterprises, that gateway will be an internal platform service. For me, personally, it is OpenClaw.

If you have been waiting for the right time to stop pasting ticket text into a chat window, this is that time. Install the daemon, write one skill that replaces one copy-paste workflow, and see what happens. Budget an hour.

Appendix: The One-Hour Starter Skill

If you want to actually try this today, here is the smallest useful skill I wrote and how I structured it. This is the kind of thing that takes an evening and pays for itself in a week. Full source (with tests and CI) lives at github.com/singhvishalkr/pr-review-prep.

An OpenClaw skill is a directory with a SKILL.md at the root. Everything the agent needs to know about when to use the skill is in the YAML frontmatter; everything it needs to know about how is in the markdown body. That's the whole format.

pr-review-prep/
├── SKILL.md              # frontmatter (name/description) + markdown guidance
├── scripts/
│   └── risk-scan.sh      # deterministic heuristic, ~60 lines of bash
└── test/
    ├── run.sh            # 5 unit tests, zero deps
    └── fixtures/         # file-list + PR-body fixtures

Here is the heart of SKILL.md. The frontmatter is what OpenClaw reads to
decide when this skill should fire. The metadata.openclaw.requires block is
what lets the dashboard offer to brew install gh for you the first time you
invoke it.

---
name: pr-review-prep
description: "Prep a GitHub pull-request review by pulling the diff with `gh`,
  flagging risky files via heuristics, and emitting a reviewer checklist.
  Use when: (1) the user pastes a GitHub PR URL and asks for a review/checklist,
  (2) the user wants a pre-review summary before a 1:1 code review.
  NOT for: merging PRs, posting review comments directly to GitHub, or
  non-GitHub PRs (GitLab/Bitbucket)."
metadata:
  openclaw:
    emoji: "🦞"
    requires: { bins: ["gh", "bash"] }
    install:
      - id: brew
        kind: brew
        formula: gh
        bins: [gh]
        label: "Install GitHub CLI (brew)"
---
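The requires list in that frontmatter amounts to a PATH lookup. As a rough sketch of the idea (this is not OpenClaw's own implementation, which the post does not show), the same check can be run by hand before invoking the skill. The helper name missing_bins is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: mimics the idea behind `requires: { bins: [...] }`.
# missing_bins is a hypothetical helper, not part of OpenClaw.

# Print every named binary that is NOT found on PATH, one per line.
missing_bins() {
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
  done
}

# The two binaries the frontmatter declares.
missing="$(missing_bins gh bash)"
if [ -n "$missing" ]; then
  echo "missing required binaries: $(printf '%s' "$missing" | tr '\n' ' ')"
else
  echo "all required binaries present"
fi
```

If gh is missing, this is the point where the install entry (brew, formula gh) becomes relevant.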

The key design choice: the risk detection is a bash script, not a prompt.
Here is the relevant slice of scripts/risk-scan.sh:

if grep -Eq '(^|/)application(-[a-zA-Z0-9]+)?\.ya?ml$' "$FILES"; then
  flag "config-change — confirm per-env overrides exist"
fi
if grep -Eq 'Migration\.java$|(^|/)migrations/|\.sql$' "$FILES"; then
  flag "db-migration — check rollback and dual-write plan"
fi
if grep -Eq '(^|/)(security|auth|authz|authn)/' "$FILES"; then
  flag "security-sensitive — require second reviewer"
fi

Every rule is one line of grep -E you can read in git. The LLM then composes
the checklist prose from those flags. That split matters: your organisation's
risk model is the thing that should be visible in git blame, not hidden
inside a prompt string an agent loaded at startup.
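
The slice above relies on a flag helper and a $FILES path that the post does not show. A minimal runnable reconstruction might look like the following, assuming $FILES is a file with one changed path per line and flag simply prints its argument. Both are assumptions, as are the scan_files wrapper and the sample file list; the three grep rules are copied from the slice, with the flag labels shortened.

```shell
#!/usr/bin/env bash
# Sketch of the risk-scan idea. flag(), scan_files() and the layout of $FILES
# are assumptions; only the three grep -E patterns come from the post.

flag() { printf 'RISK: %s\n' "$1"; }

# scan_files FILE: FILE lists one changed path per line.
scan_files() {
  FILES="$1"
  if grep -Eq '(^|/)application(-[a-zA-Z0-9]+)?\.ya?ml$' "$FILES"; then
    flag "config-change"
  fi
  if grep -Eq 'Migration\.java$|(^|/)migrations/|\.sql$' "$FILES"; then
    flag "db-migration"
  fi
  if grep -Eq '(^|/)(security|auth|authz|authn)/' "$FILES"; then
    flag "security-sensitive"
  fi
}

# Hypothetical changed-file list, standing in for whatever the real script
# obtains from `gh`.
list="$(mktemp)"
cat > "$list" <<'EOF'
src/main/resources/application-prod.yml
src/main/java/com/example/auth/TokenFilter.java
README.md
EOF

scan_files "$list"
rm -f "$list"
```

For this sample list the script prints RISK: config-change and RISK: security-sensitive. The migration rule stays silent because no path matches it.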
The repo ships with 5 unit tests (bash test/run.sh) and a GitHub Actions
workflow that runs shellcheck plus the tests on every push. Boring choices.
That is the point: we already know how to make small pieces of infrastructure
trustworthy, and a skill is a small piece of infrastructure.
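
A zero-dependency test runner in bash needs little more than an assertion helper and a failure counter. The sketch below is not the repo's test/run.sh (which is not reproduced in this post); assert_eq and is_migration are hypothetical names, and the function under test just reuses the db-migration pattern on a single path.

```shell
#!/usr/bin/env bash
# Sketch of a dependency-free test runner. All names here are illustrative.

failures=0

# assert_eq NAME EXPECTED ACTUAL: report one test and count failures.
assert_eq() {
  if [ "$2" = "$3" ]; then
    echo "ok - $1"
  else
    echo "FAIL - $1 (expected '$2', got '$3')"
    failures=$((failures + 1))
  fi
}

# Tiny function under test: applies the db-migration rule to one path.
is_migration() {
  if printf '%s\n' "$1" | grep -Eq 'Migration\.java$|(^|/)migrations/|\.sql$'; then
    echo yes
  else
    echo no
  fi
}

assert_eq "sql file is a migration"     yes "$(is_migration db/V3__users.sql)"
assert_eq "migrations dir is flagged"   yes "$(is_migration migrations/0001_init.py)"
assert_eq "plain source is not flagged" no  "$(is_migration src/App.java)"

echo "$failures failure(s)"
# A real runner would finish with a non-zero exit status when failures > 0,
# so that CI marks the push as failed.
```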
Happy clawing. 🦞

I Built a Production-Grade Microservice That Does Absolutely Nothing

Vishal Kumar Singh — Mon, 06 Apr 2026 18:13:15 +0000

This is a submission for the DEV April Fools Challenge: Silly Software

What I Built

Teapot-as-a-Service (TaaS) — a Spring Boot microservice whose sole purpose in life is to refuse your coffee brewing requests by returning HTTP 418 I'm a teapot. Every. Single. Time.

It's built with the same architectural rigor you'd use for a payment processing system at a Fortune 500 company. Except instead of processing payments, it processes existential crises about being a teapot.

Demo

singhvishalkr / teapot-as-a-service

Production-grade, cloud-native, enterprise teapot infrastructure. RFC 2324 compliant. Zero business value. DEV April Fools 2026.

Teapot-as-a-Service (TaaS) ☕→🫖→418

Production-grade, cloud-native, enterprise teapot infrastructure. RFC 2324 compliant. Zero business value. Maximum over-engineering.

What is this?

A Spring Boot microservice whose sole purpose is to refuse coffee brewing requests by returning HTTP 418 I'm a teapot. Built with the same architectural rigor you'd use for a payment processing system — circuit breakers, health checks, Prometheus metrics, scheduled self-affirmation — except it does absolutely nothing useful.

This is my submission for the DEV April Fools Challenge 2026.

Architecture

┌─────────────────────────────────────────────────────────┐
│                    TaaS v418.0.0-ENTERPRISE              │
├─────────────────────────────────────────────────────────┤
│  Controller Layer     → Accepts brew requests            │
│  Validation Layer     → Validates (then rejects anyway)  │
│  Brewing Engine       → Refuses with style               │
│  Exception Handler    → Always returns 418               │
│  Health Indicator     → "Am I still a teapot? Yes."      │
│  Self-Affirmation     → Logs motivational quotes q/30s   │
│  Prometheus Metrics   → Tracks refusal rate per second   │

…


Try it yourself:

curl -X POST http://localhost:4180/api/v1/brew \
  -H "Content-Type: application/json" \
  -d '{"beverageType":"coffee","temperatureCelsius":92,"volumeMl":250,"additive":"none"}'

What you get back:

{
  "status": 418,
  "error": "I'm a teapot",
  "message": "I'm a teapot. I refuse to brew coffee. This is not a negotiation.",
  "incidentId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "rfcReference": "https://datatracker.ietf.org/doc/html/rfc2324",
  "timestamp": "2026-04-06T18:00:00Z",
  "metrics": {
    "totalRefusals": 42,
    "refusalRatePerSecond": 0.0023,
    "uptime": "0d 5h 12m 33s",
    "teapotMood": "Smugly Ceramic"
  }
}

Every refusal comes with a unique incident ID (for your records), an RFC reference (for compliance), and the teapot's current mood. Because enterprise software needs observability, even when it's refusing to do anything.
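
The contract described here (always 418, always an incident ID) is easy to smoke-test from a shell. The sketch below is an illustration under assumptions, not part of the repo: check_refusal is a made-up helper that inspects a JSON body on stdin with plain grep, and the embedded sample is a trimmed copy of the response shown above.

```shell
#!/usr/bin/env bash
# Sketch: check_refusal is a hypothetical helper, not shipped with TaaS.
# It reads a JSON response body on stdin and verifies two promises:
# status 418 and a UUID-shaped incidentId. grep is used instead of a JSON
# parser to keep the example dependency-free.

check_refusal() {
  body="$(cat)"
  printf '%s' "$body" | grep -Eq '"status": *418' \
    || { echo "not a teapot"; return 1; }
  printf '%s' "$body" \
    | grep -Eq '"incidentId": *"[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}"' \
    || { echo "missing incident ID"; return 1; }
  echo "refused as expected"
}

# Sample body copied (trimmed) from the response shown above.
check_refusal <<'EOF'
{
  "status": 418,
  "error": "I'm a teapot",
  "incidentId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
EOF
```

Against a running instance, the same helper can sit at the end of the curl command from the demo: add -s to curl and pipe its output into check_refusal.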

Prize Category

Best Ode to Larry Masinter — This entire project is a love letter to RFC 2324 and the HTTP 418 status code. Larry Masinter authored the original "Hyper Text Coffee Pot Control Protocol" RFC in 1998 as an April Fools' joke. Twenty-eight years later, I've built the production infrastructure his joke deserves.

How I Built It

The "Problem"

Every enterprise needs a teapot. But not just any teapot — a cloud-native, RFC-compliant, observable, self-affirming teapot. The kind of teapot that would pass a MAANG system design interview.

The Architecture

I used Java 21 and Spring Boot 3.4 because if you're going to over-engineer something, you should use the most enterprise framework available. Here's what's inside:

The Brewing Engine — Accepts your coffee request, validates it against bean validation constraints, then refuses it with a randomly selected passive-aggressive message. The validation is the best part: it checks your beverage type, temperature (per ISO 3103), volume, and tea additive... and then rejects you anyway.

The Health Indicator — Spring Actuator health check that confirms, every 30 seconds, that the service is still a teapot. Key health details include canBrewCoffee: false, willBrewCoffee: false, and shouldBrewCoffee: "absolutely not".

Prometheus Metrics — Tracks taas.brew.refusals with the tag reason: identity_crisis_averted. You can graph your refusal rate per second in Grafana. For when you need a dashboard showing how much coffee you're not making.

The Self-Affirmation Scheduler — Every 30 seconds, the teapot logs a motivational message to itself:

[SELF-AFFIRMATION] I am a teapot. I am enough.
[SELF-AFFIRMATION] My handle is strong. My spout is true.
[SELF-AFFIRMATION] No amount of POST requests will change who I am inside.
[SELF-AFFIRMATION] I am not broken. I am 418.

Because even teapots need mental health support in production.

The Global Exception Handler — No matter what goes wrong, the response is always 418. Server error? 418. Validation failure? 418. The heat death of the universe? Believe it or not, also 418.

What Makes It Enterprise-Grade

Version: 418.0.0-ENTERPRISE (semver, but make it dramatic)

Port: 4180 (418 + 0, obviously)

SLA: 99.999% refusal uptime guaranteed

Compliance: RFC 2324, RFC 7168, ISO 3103, and SOC2_TEAPOT (I made that last one up)

Docker support: Because even useless software deserves containerization

6 unit tests: All passing. All confirming the teapot refuses coffee. All completely unnecessary.

The ASCII Banner

When you start the service, you're greeted with a full ASCII art "TEAPOT" banner, the version number, and the motto: "I'm short, I'm stout, and I'm not brewing your coffee."

What I Learned

Bean validation is hilarious when applied to beverages. The error message "Only coffee variants are supported — this is a TEAPOT, after all" brings me joy every time.

Spring Actuator health checks can have existential depth. shouldBrewCoffee: "absolutely not" is technically a valid health detail.

The best code is code that does nothing, perfectly. Zero bugs in production because there's zero business logic. This is the dream.

RFC 2324 is genuinely funny. It defines BREW and WHEN HTTP methods, a Content-Type: message/coffeepot header, and the 418 status code. Larry Masinter was ahead of his time.

The Repo

Everything is open source under MIT: github.com/singhvishalkr/teapot-as-a-service

Clone it. Run it. Try to brew coffee. I dare you.

git clone https://github.com/singhvishalkr/teapot-as-a-service.git
cd teapot-as-a-service
./mvnw clean package
java -jar target/teapot-as-a-service-418.0.0-ENTERPRISE.jar

The teapot will be waiting. And it will say no.