<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: WaveAssist</title>
    <description>The latest articles on DEV Community by WaveAssist (@waveassist).</description>
    <link>https://dev.to/waveassist</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3552599%2F1df0f745-587c-4e57-9f64-879ffef8edf7.png</url>
      <title>DEV Community: WaveAssist</title>
      <link>https://dev.to/waveassist</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/waveassist"/>
    <language>en</language>
    <item>
      <title>Coding Didn't Die. Prompting Became Coding.</title>
      <dc:creator>WaveAssist</dc:creator>
      <pubDate>Sat, 02 May 2026 09:26:05 +0000</pubDate>
      <link>https://dev.to/waveassist/coding-didnt-die-prompting-became-coding-2jcm</link>
      <guid>https://dev.to/waveassist/coding-didnt-die-prompting-became-coding-2jcm</guid>
<description>&lt;p&gt;The story everyone's telling is that AI killed the coder. That "anyone can build software now" and that the four-decade run of programming as a craft is over.&lt;/p&gt;

&lt;p&gt;The story is backwards.&lt;/p&gt;

&lt;p&gt;What actually happened is the opposite. The act of &lt;strong&gt;making AI do something useful has quietly turned into coding&lt;/strong&gt;. The skills didn't die. They migrated.&lt;/p&gt;




&lt;h2&gt;The Fake Dichotomy&lt;/h2&gt;

&lt;p&gt;The pitch was: people who don't code will leapfrog those who do, because you can "just ask the AI."&lt;/p&gt;

&lt;p&gt;What happened in practice: those people hit a wall the moment they need the same output &lt;strong&gt;twice&lt;/strong&gt;. The coders didn't.&lt;/p&gt;

&lt;p&gt;Why? Because every instinct a coder has is exactly what it takes to make an LLM produce something reliable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Decompose the problem.&lt;/li&gt;
&lt;li&gt;Name the variables.&lt;/li&gt;
&lt;li&gt;Version the change.&lt;/li&gt;
&lt;li&gt;Test the edge case.&lt;/li&gt;
&lt;li&gt;Refactor when it gets ugly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompting without those instincts is vibes. Prompting with them is programming.&lt;/p&gt;




&lt;h2&gt;Structure Is the Coder's Native Language, and LLMs Reward It&lt;/h2&gt;

&lt;p&gt;The prompts that work aren't the eloquent ones. They're the ones with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit inputs&lt;/li&gt;
&lt;li&gt;Explicit outputs&lt;/li&gt;
&lt;li&gt;Explicit constraints&lt;/li&gt;
&lt;li&gt;Explicit failure modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Schemas beat paragraphs. Step-by-step beats "think carefully." Typed JSON beats "return the answer." Contracts beat vibes.&lt;/p&gt;

&lt;p&gt;Every one of those is a habit a coder already has. The people who write the best prompts in 2026 are the ones who've been writing function signatures for a decade.&lt;/p&gt;
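&lt;p&gt;As an illustration of that contract mindset, here's a minimal sketch of a schema-first prompt, written the way you'd write a function signature. The task and field names are invented for the example, not any real API:&lt;/p&gt;

```python
import json

# A schema-first prompt: explicit input, explicit output shape,
# explicit failure mode. All names here are illustrative.
OUTPUT_SCHEMA = {
    "sentiment": "one of: positive, neutral, negative",
    "summary": "string, max 2 sentences",
    "needs_escalation": "boolean",
}

def build_prompt(ticket_text):
    # The prompt is a contract, not an essay.
    return (
        "You are one triage step in a pipeline.\n"
        "Input ticket:\n" + ticket_text + "\n\n"
        "Return ONLY a JSON object with exactly these keys:\n"
        + json.dumps(OUTPUT_SCHEMA, indent=2) + "\n"
        'If the ticket is unreadable, return {"sentiment": "neutral", '
        '"summary": "", "needs_escalation": false}.'
    )

print(build_prompt("App crashes on login since v2.3"))
```

&lt;p&gt;Nothing eloquent about it. It's a function signature wearing a prompt's clothes.&lt;/p&gt;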




&lt;h2&gt;Vibe Coding Doesn't Scale, and Coders Already Know Why&lt;/h2&gt;

&lt;p&gt;Prompts that "just work" on day 1 break on day 10. The model gets updated. An input gets weird. A downstream tool changes format. Suddenly the magic stops.&lt;/p&gt;

&lt;p&gt;Coders have lived this exact cycle with untyped code, global variables, flaky tests, and no version control. They know the fix: &lt;strong&gt;structure, types, tests, version pinning, observability.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now they're applying it to prompts. And the prompts that survive contact with production are the ones that got the coder treatment.&lt;/p&gt;

&lt;p&gt;Forty years of software engineering lessons didn't expire when the LLM showed up. They got a new domain to conquer.&lt;/p&gt;




&lt;h2&gt;The Industry's Own Framing Confirms It&lt;/h2&gt;

&lt;p&gt;Listen to how the people actually doing this work are talking about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Andrej Karpathy.&lt;/strong&gt; His term for the new workflow isn't "prompting." It's &lt;strong&gt;agentic engineering&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight… easily the biggest change to my basic coding workflow in 2 decades."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Thomas Ptacek (fly.io, June 2025):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;LLMs "devour schlep, and clear a path to the important stuff, where your judgement and values really matter."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Kent Beck, 52 years in:&lt;/strong&gt; augmented coding&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"changes programming but doesn't eliminate it. Developers make more consequential decisions per hour."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;None of them say the skill is obsolete. They say the skill &lt;strong&gt;compounded&lt;/strong&gt;. The boilerplate got cheaper. The judgment got more valuable.&lt;/p&gt;




&lt;h2&gt;The Real Inversion&lt;/h2&gt;

&lt;p&gt;The people most threatened by "AI writes the code now" aren't senior engineers.&lt;/p&gt;

&lt;p&gt;They're the people who thought &lt;strong&gt;prompting was a shortcut around engineering discipline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Those people are finding out that getting an AI to work reliably requires exactly what getting anything to work reliably has always required: decomposition, contracts, tests, iteration, structure.&lt;/p&gt;

&lt;p&gt;The moat didn't disappear. It moved one level up.&lt;/p&gt;




&lt;h2&gt;WaveAssist's Bet on This&lt;/h2&gt;

&lt;p&gt;This is the architecture we bet on.&lt;/p&gt;

&lt;p&gt;We don't ship you a prompt box and wish you luck. We also don't compile your intent down into a wall of generated code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/"&gt;WaveAssist&lt;/a&gt; &lt;strong&gt;embeds AI calls inside deterministic code&lt;/strong&gt;. The orchestration is a typed pipeline you can test, version, and observe. The intelligence is the LLM call sitting at exactly the right step, with structured JSON going in and structured JSON coming out.&lt;/p&gt;

&lt;p&gt;Still AI. Still intelligent. Just not vibes.&lt;/p&gt;
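&lt;p&gt;A minimal sketch of that shape, with a stubbed model call standing in for a real client (none of these names are WaveAssist's actual API):&lt;/p&gt;

```python
import json

# "AI call inside deterministic code": the orchestration is plain code,
# the model is called at exactly one scoped step. call_llm is a stub.
def call_llm(prompt):
    # A real version would hit a chat-completion endpoint.
    return json.dumps({"summary": "Fixes token refresh on expiry", "risk": "low"})

def review_step(diff_text):
    # Structured JSON goes in, structured JSON comes out,
    # or the step fails loudly instead of drifting silently.
    prompt = "Summarize this diff. Return JSON with keys summary and risk:\n" + diff_text
    data = json.loads(call_llm(prompt))
    assert set(data) == {"summary", "risk"}, "contract violation"
    return data

print(review_step("auth.py: refresh token before expiry check"))
```

&lt;p&gt;The pipeline is testable and versionable because everything around the model call is ordinary code.&lt;/p&gt;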

&lt;p&gt;These are the disciplines that make software work. They worked before LLMs. They work with LLMs. They will work when LLMs are ten times smarter.&lt;/p&gt;

&lt;p&gt;Coders didn't lose. The rest of the world just signed up for their job.&lt;/p&gt;




&lt;h2&gt;The Bottom Line&lt;/h2&gt;

&lt;p&gt;Coding was never about typing curly braces.&lt;/p&gt;

&lt;p&gt;It was about thinking in &lt;strong&gt;structure&lt;/strong&gt;. About turning a fuzzy intent into something that runs the same way twice.&lt;/p&gt;

&lt;p&gt;That skill just became the single most valuable skill in the AI era.&lt;/p&gt;

&lt;p&gt;The craft isn't dying. It's eating prompting whole.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>You Can't Nudge a Painting: The Two Shapes of AI Output</title>
      <dc:creator>WaveAssist</dc:creator>
      <pubDate>Sat, 02 May 2026 09:23:56 +0000</pubDate>
      <link>https://dev.to/waveassist/you-cant-nudge-a-painting-the-two-shapes-of-ai-output-1pj9</link>
      <guid>https://dev.to/waveassist/you-cant-nudge-a-painting-the-two-shapes-of-ai-output-1pj9</guid>
      <description>&lt;p&gt;There are two shapes of AI output: &lt;strong&gt;paintings&lt;/strong&gt; and &lt;strong&gt;blueprints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Paintings are finished. To change one, you regenerate it.&lt;/p&gt;

&lt;p&gt;Blueprints are structured. To change one, you edit a part.&lt;/p&gt;

&lt;p&gt;Both shapes are useful. They just do different jobs.&lt;/p&gt;




&lt;h2&gt;The Painting Analogy&lt;/h2&gt;

&lt;p&gt;Commission a painter. Get a painting.&lt;/p&gt;

&lt;p&gt;Now ask the painter for the same painting, but with the tree slightly to the left.&lt;/p&gt;

&lt;p&gt;What you get back is a &lt;strong&gt;different painting&lt;/strong&gt;. The sky is a different blue. The horizon shifted. The brushstrokes don't match. You didn't edit the first painting. You commissioned a new one and hoped the artist remembered.&lt;/p&gt;

&lt;p&gt;You can't nudge a painting.&lt;/p&gt;

&lt;p&gt;That's one shape AI output can take, and a lot of valuable AI output has it.&lt;/p&gt;




&lt;h2&gt;The Other Shape: Blueprints&lt;/h2&gt;

&lt;p&gt;A blueprint is structured. It's measured, labeled, and broken into parts you can change one at a time.&lt;/p&gt;

&lt;p&gt;Code is a blueprint. HTML is a blueprint. A Figma file is a blueprint. A typed function with input and output schemas is a blueprint. A spreadsheet with formulas is a blueprint.&lt;/p&gt;

&lt;p&gt;You can move one wall on a blueprint without redrawing the house. That's the property software needs to be useful.&lt;/p&gt;
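&lt;p&gt;A toy sketch of the property in code (the slide format and render function are invented for the example):&lt;/p&gt;

```python
# Blueprint shape: structured data you can edit one part at a time.
deck = [
    {"title": "Q3 Results", "body": "Revenue up 12%"},
    {"title": "Roadmap", "body": "Ship v2 in October"},
]

def render(slides):
    # Deterministic rendering: same data in, same output out.
    return "\n\n".join(s["title"].upper() + "\n" + s["body"] for s in slides)

before = render(deck)
deck[1]["body"] = "Ship v2 in November"  # nudge one part
after = render(deck)

# Only the edited slide changed; the first slide is byte-identical.
assert before.split("\n\n")[0] == after.split("\n\n")[0]
```

&lt;p&gt;Try that with a generated image and you get a new painting instead.&lt;/p&gt;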




&lt;h2&gt;When You Want a Painting&lt;/h2&gt;

&lt;p&gt;Painting shape is the right answer when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The output is consumed once and never iterated on.&lt;/li&gt;
&lt;li&gt;Uniqueness is the value: a unique image, a fresh draft, a creative spark.&lt;/li&gt;
&lt;li&gt;The cost of regenerating is low and the cost of building structure is high.&lt;/li&gt;
&lt;li&gt;You're using AI as a starting point, not a system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Generate an image of a person walking in the mountains." Image generators ship paintings on purpose. You either like it or you regenerate. That's the job.&lt;/li&gt;
&lt;li&gt;"Draft three taglines for the launch." You're picking, not editing.&lt;/li&gt;
&lt;li&gt;"Write a poem about my dog." Uniqueness &lt;strong&gt;is&lt;/strong&gt; the deliverable.&lt;/li&gt;
&lt;li&gt;"One off Python script to clean this CSV." Throwaway code, not a maintained artifact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's nothing wrong with paintings. A huge amount of valuable creative work is one shot. Most photographs are paintings. Most logo concepts, brand directions, and moodboards start as paintings.&lt;/p&gt;

&lt;p&gt;If you tried to make every AI output editable, you'd be building infrastructure for things that don't need it.&lt;/p&gt;




&lt;h2&gt;When You Want a Blueprint&lt;/h2&gt;

&lt;p&gt;Blueprint shape is the right answer when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You'll iterate on it (most product work).&lt;/li&gt;
&lt;li&gt;It composes with other systems: pipelines, builds, deployments.&lt;/li&gt;
&lt;li&gt;It needs to be auditable, versionable, diffable.&lt;/li&gt;
&lt;li&gt;It runs many times and needs to behave predictably.&lt;/li&gt;
&lt;li&gt;Multiple people collaborate on it over time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what &lt;strong&gt;editable&lt;/strong&gt; means. It's the quiet reason software works at all.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When Figma ships a design, you can move a box. The rest stays.&lt;/li&gt;
&lt;li&gt;When a spreadsheet ships a number, you can change a cell and dependents recalculate deterministically.&lt;/li&gt;
&lt;li&gt;When an engineer ships code, you can change a function and the rest keeps working.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Painting shape AI fails this. You can't reliably move a box on a generated image. You can't regenerate a freeform email to change one sentence and keep the rest untouched. That's not a flaw of paintings. It's the wrong shape for jobs that need editability.&lt;/p&gt;




&lt;h2&gt;Claude Design Made the Bet Public&lt;/h2&gt;

&lt;p&gt;You can date the moment the blueprint side became unmissable. &lt;strong&gt;April 17, 2026.&lt;/strong&gt; Anthropic launched &lt;strong&gt;Claude Design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It looked like a slide deck tool. It wasn't. It was a public statement of an architectural bet. Claude Design's trick wasn't a bigger model. It was a different artifact: the LLM writes &lt;strong&gt;HTML, CSS, and JavaScript&lt;/strong&gt;, not images. The output renders into slides, documents, and interfaces, but what you're shipping is &lt;strong&gt;code&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Want to change a word? Change a string. Re-render. Everything else stays identical. Same logo. Same fonts. Same colors. Editable, reviewable, diffable, version-controllable. Because it &lt;strong&gt;is&lt;/strong&gt; code.&lt;/p&gt;

&lt;p&gt;That's blueprint shape, applied to a job the whole industry had quietly assumed was a painting job.&lt;/p&gt;




&lt;h2&gt;The Industry Ships Both&lt;/h2&gt;

&lt;p&gt;Claude Design didn't invent the pattern. It named it. The blueprint side has been stacking wins for a while:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code, Cursor, GitHub Copilot&lt;/strong&gt; edit source code in place. Typed, linted, diffable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel's v0&lt;/strong&gt; turns text or screenshots into React + Tailwind components, not pixels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tldraw's "Make Real"&lt;/strong&gt; turns a canvas sketch into HTML + CSS + JS you can iterate on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic's Skills&lt;/strong&gt; (December 2025) codify whole capabilities as folders of instructions and scripts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The painting side is alive and well in parallel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DALL-E, Imagen, Midjourney&lt;/strong&gt; for images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freeform Claude / GPT / Gemini&lt;/strong&gt; for prose, ideation, exploration.&lt;/li&gt;
&lt;li&gt;Most "give me a quick draft" interactions across every chat product.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Frontier labs aren't picking sides. They're shipping both shapes on purpose. The bet got sharper. &lt;strong&gt;When the work needs editing, ship a blueprint. When the work needs uniqueness, ship a painting.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;Now Apply the Lens to AI Agents&lt;/h2&gt;

&lt;p&gt;The same choice exists for agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Painting shape agent.&lt;/strong&gt; A chatbot that "helps you review PRs." Each response is unique. The output looks finished but can't be diffed against last week's, can't be audited, can't be composed into a larger system. Fine for a one-off question. Wrong shape for production review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blueprint shape agent.&lt;/strong&gt; &lt;a href="https://dev.to/assistants/gitzoid"&gt;GitZoid&lt;/a&gt;, running on every PR webhook with a deterministic schema. Editable. Composable. Predictable. You can open the pipeline, change a prompt, re-run, and get a predictable delta. You can point it at a new repo. You can fork it.&lt;/p&gt;

&lt;p&gt;Same intelligence. Different shape.&lt;/p&gt;

&lt;p&gt;The choice depends on what you're using the agent for. If you want a one-off creative second opinion on a tricky PR, painting shape is fine. If you want a reliable review on every PR for the next year, you need blueprint shape.&lt;/p&gt;

&lt;p&gt;This is what &lt;a href="https://dev.to/"&gt;WaveAssist&lt;/a&gt; is built for. Agents like GitZoid, GitDigest, WavePredict, WaveContent, and the rest are blueprint shaped on purpose. Not because painting shape is bad. Because the work we ship is the kind that needs blueprint shape.&lt;/p&gt;




&lt;h2&gt;The Bottom Line&lt;/h2&gt;

&lt;p&gt;The difference between AI that feels magical and AI that's load-bearing isn't model quality. It's whether the artifact is a painting or a blueprint, and which one the job actually needed.&lt;/p&gt;

&lt;p&gt;Use paintings where uniqueness is the point.&lt;br&gt;
Build blueprints where structure is the point.&lt;br&gt;
Don't confuse the two.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Deterministic vs Agentic: The Quiet Architectural Bet Every AI Agent Company Is Making</title>
      <dc:creator>WaveAssist</dc:creator>
      <pubDate>Sat, 02 May 2026 09:21:13 +0000</pubDate>
      <link>https://dev.to/waveassist/deterministic-vs-agentic-the-quiet-architectural-bet-every-ai-agent-company-is-making-33p</link>
      <guid>https://dev.to/waveassist/deterministic-vs-agentic-the-quiet-architectural-bet-every-ai-agent-company-is-making-33p</guid>
      <description>&lt;p&gt;Every "AI agent" product on the market is making one of two architectural bets, and the founders usually can't articulate which. The bet decides whether your agent costs &lt;strong&gt;cents or dollars per run&lt;/strong&gt;, whether it works the same way twice, and whether it will still be running a year from now.&lt;/p&gt;

&lt;p&gt;It's worth naming.&lt;/p&gt;




&lt;h2&gt;The Two Camps&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fat harness, agentic.&lt;/strong&gt; The LLM decides every step at runtime. Every run is a fresh plan. Every plan costs tokens. Every step is an opportunity for the model to go somewhere new.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code, Cursor, Copilot agents (open-ended coding work)&lt;/li&gt;
&lt;li&gt;LangGraph (reasoning over a graph)&lt;/li&gt;
&lt;li&gt;CrewAI (agents organized by role, +280% adoption in 2025)&lt;/li&gt;
&lt;li&gt;AutoGPT (autonomous loops)&lt;/li&gt;
&lt;li&gt;OpenAI's AgentKit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Thin harness, deterministic.&lt;/strong&gt; The LLM designs the pipeline once, at build time. Then &lt;strong&gt;code runs forever&lt;/strong&gt;. The model gets called only for the specific steps that actually need judgment. The trigger is deterministic. The data sources are scoped. The output goes somewhere predefined.&lt;/p&gt;

&lt;p&gt;Examples (and this list is nowhere near complete):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HubSpot Breeze&lt;/strong&gt; (Customer Agent on ticket creation, Prospecting Agent on deal stage, Data Agent in workflows)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack&lt;/strong&gt; (daily channel recaps, thread summaries, scheduled digests)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linear Triage Intelligence&lt;/strong&gt; (auto triage, auto assign, duplicate detection, label suggestions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Asana Smart Fields&lt;/strong&gt; (AI generated custom field values on task creation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notion AI&lt;/strong&gt; (summarize a page, "Ask Notion" Q&amp;amp;A across the workspace, meeting note formatting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ClickUp&lt;/strong&gt; (Project Manager Agent auto assigning tasks, Meeting Notetaker, Auto Prioritize)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zoom AI Companion&lt;/strong&gt; (post meeting summaries, action items, decisions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Teams Intelligent Recap&lt;/strong&gt; (AI notes, chapters, speaker timelines, follow ups)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WaveAssist&lt;/strong&gt; (GitZoid, GitDigest, WavePredict, WaveContent, and the rest of the assistant catalog, each running on a defined trigger and a defined job)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Look at almost any major SaaS shipping in 2026. The "AI features" tab is, with rare exception, a list of thin harness agents. Each one has a defined trigger, a scoped input, a predictable output, and a job small enough that the model can't wander out of it.&lt;/p&gt;

&lt;p&gt;One approach puts the intelligence &lt;strong&gt;inside&lt;/strong&gt; the loop. The other puts the intelligence &lt;strong&gt;behind&lt;/strong&gt; the loop.&lt;/p&gt;

&lt;p&gt;Neither is wrong. They're built for different jobs. The fat harness list above is short because the work is hard. The thin harness list is long because the work is everywhere.&lt;/p&gt;




&lt;h2&gt;When Fat Harness Wins&lt;/h2&gt;

&lt;p&gt;Fat harness is the right answer when the work is novel, open-ended, or impossible to script in advance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coding is the canonical example.&lt;/strong&gt; Every bug is different. Every codebase has its own conventions. Every fix requires reading files, running tests, reasoning about the output, and changing course. You can't write a deterministic pipeline for "fix the bug." You need an agent that re-reads the situation at every step. That's why Claude Code, Cursor, and the Copilot agents are all fat harness, and that's why they work as well as they do.&lt;/p&gt;

&lt;p&gt;The same applies to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open-ended research and analysis ("dig into this question and tell me what you find")&lt;/li&gt;
&lt;li&gt;Investigative tasks where the next step depends on the last step's output&lt;/li&gt;
&lt;li&gt;Tasks where the human doesn't fully know what they want until they see what's possible&lt;/li&gt;
&lt;li&gt;One-off jobs where the cost of building a pipeline exceeds the cost of running the agent twice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you tried to make Claude Code thin harness, you'd be writing a pipeline for a problem you can't predict. The whole point is that the model gets to plan as it goes, react to what it finds, and re-plan when reality doesn't match.&lt;/p&gt;

&lt;p&gt;Fat harness is genuinely awesome at this kind of work. It's not a worse architecture. It's a different one.&lt;/p&gt;




&lt;h2&gt;When Thin Harness Wins&lt;/h2&gt;

&lt;p&gt;Thin harness is the right answer when the work is &lt;strong&gt;scoped, repeated, and triggered&lt;/strong&gt;. A defined job, a defined trigger, a defined output. Run it a thousand times this year.&lt;/p&gt;

&lt;p&gt;Almost every AI agent shipping inside production SaaS today is thin harness. Once you know the shape, you start seeing it everywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HubSpot Breeze (CRM).&lt;/strong&gt; When a ticket is created, the &lt;strong&gt;Customer Agent&lt;/strong&gt; runs. When a deal hits a stage, the &lt;strong&gt;Prospecting Agent&lt;/strong&gt; runs. The &lt;strong&gt;Data Agent&lt;/strong&gt; runs as a workflow step on a schedule. HubSpot's workflow engine handles the trigger, the data, and the routing. The LLM is called at the one step that needs judgment ("draft a reply to this ticket using these knowledge base articles"). Same shape, every fire. Charged per result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slack summaries.&lt;/strong&gt; Daily recaps. Channel summaries. Thread catch-ups. Each one is a fixed function: defined input (this channel, this date range), defined output (a structured summary), defined schedule (every morning). Slack's published number for time saved is over a million working hours. None of that comes from a fat harness loop replanning what to do each morning. It comes from the same pipeline running, reliably, on millions of channels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ClickUp task agents.&lt;/strong&gt; The Project Manager Agent auto assigns tasks based on owner expertise when a task is created. Meeting Notetaker turns a transcript into action items. Auto Prioritize sorts a backlog. Each agent has one job, scoped to one trigger. ClickUp's own docs draw the line clearly: "automations are deterministic and fast. AI agents are flexible but come with token costs that add up at scale." That cost arithmetic is why most production work in their stack runs deterministic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WaveAssist agents.&lt;/strong&gt; Every Monday at 9am, &lt;strong&gt;GitDigest&lt;/strong&gt; reads the week's diffs and writes five role-specific summaries. On every PR webhook, &lt;strong&gt;GitZoid&lt;/strong&gt; reads the diff and posts a review. &lt;strong&gt;WavePredict&lt;/strong&gt; runs forecasts on a schedule. &lt;strong&gt;WaveContent&lt;/strong&gt; drafts on a brief. The full catalog (these are examples, not the whole list) follows the same shape: a defined trigger, a scoped job, the same pipeline running every fire. The expensive thinking happened once, when each pipeline was designed. Every run after is structured.&lt;/p&gt;

&lt;p&gt;The pattern is the same across all four:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The trigger is deterministic (event, schedule, webhook).&lt;/li&gt;
&lt;li&gt;The data sources are scoped, not "anything the agent can find."&lt;/li&gt;
&lt;li&gt;The LLM is called at one or two specific steps where judgment is genuinely needed.&lt;/li&gt;
&lt;li&gt;The orchestration, the routing, the validation, and the side effects are code.&lt;/li&gt;
&lt;li&gt;The agent runs thousands of times per day with predictable cost and predictable shape.&lt;/li&gt;
&lt;/ul&gt;
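&lt;p&gt;The pattern above can be sketched in a few lines. The helper names and the stubbed model call are stand-ins for real integrations, not actual APIs:&lt;/p&gt;

```python
import json

# Thin harness: deterministic trigger and routing, scoped data,
# one model call where judgment is genuinely needed.
def fetch_diffs(repo):
    # Scoped input, not "anything the agent can find".
    return "auth.py: refactor session handling"

def call_llm(prompt):
    # Stubbed so the sketch runs; a real client hits a model endpoint.
    return json.dumps({"highlights": ["session handling refactor"]})

def weekly_digest(repo):
    diffs = fetch_diffs(repo)                           # 1. scoped data source
    prompt = 'Return JSON {"highlights": [...]} for these diffs:\n' + diffs
    data = json.loads(call_llm(prompt))                 # 2. the one judgment step
    return {"repo": repo, "highlights": data["highlights"]}  # 3. routing is code

# A cron job or webhook calls this the same way on every fire:
print(weekly_digest("example/repo"))
```

&lt;p&gt;Everything except one function call is ordinary, testable code.&lt;/p&gt;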




&lt;h2&gt;The Reliability Argument&lt;/h2&gt;

&lt;p&gt;Thin harness isn't a stylistic preference for repeated work. The reliability data forces it.&lt;/p&gt;

&lt;p&gt;The top coding models (GPT-5, Claude Opus 4.1) score about &lt;strong&gt;70% on SWE-bench Verified&lt;/strong&gt; and collapse to &lt;strong&gt;23% Pass@1 on SWE-Bench Pro&lt;/strong&gt; (Sept 2025, arXiv:2509.16941), the long-horizon, multi-file variant. On commercial subsets the score drops under 20%.&lt;/p&gt;

&lt;p&gt;Agentic loops on long, repeated tasks fail most of the time.&lt;/p&gt;

&lt;p&gt;You cannot run a business on 23%. You can, however, run a business on a deterministic pipeline that calls a 70% reliable model for &lt;strong&gt;one well scoped step&lt;/strong&gt;, validates the output, and retries. The surface area where the model can fail is smaller by construction.&lt;/p&gt;
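&lt;p&gt;That validate-and-retry wrapper is a few lines of deterministic code. A hedged sketch, with a stub standing in for the model client (a real one is nondeterministic, which is exactly why the loop exists):&lt;/p&gt;

```python
import json

# Shrink the failure surface: one scoped call, validated, retried,
# with a deterministic fallback. call_llm is a placeholder.
VALID_LABELS = {"bug", "feature", "question"}

def call_llm(prompt):
    return '{"label": "bug"}'  # stubbed model output

def classify(issue_text, retries=3):
    prompt = ('Label this issue. Return JSON '
              '{"label": "bug" | "feature" | "question"}:\n' + issue_text)
    for _ in range(retries):
        try:
            data = json.loads(call_llm(prompt))
            if data.get("label") in VALID_LABELS:
                return data["label"]  # schema-valid on this attempt
        except json.JSONDecodeError:
            pass  # malformed output counts as a failed attempt
    return "needs_human_review"  # deterministic fallback, never a silent failure

print(classify("App crashes on login"))
```

&lt;p&gt;The model only ever sees one small, checkable job. The pipeline absorbs the rest.&lt;/p&gt;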

&lt;p&gt;That's why scoped agents (Breeze's Customer Agent, Slack's Summarizer, GitZoid's PR review) hit reliability numbers a fat harness loop never will. They ask the model to do exactly one thing it's good at, in a context it can't wander out of.&lt;/p&gt;




&lt;h2&gt;The Skeptical Read&lt;/h2&gt;

&lt;p&gt;The people with no stock in the outcome are waving the same flag for repeated production work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simon Willison&lt;/strong&gt; (Jan 2025):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I think we are going to see a lot more froth about agents in 2025, but I expect the results will be a great disappointment to most of the people who are excited about this term."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Hamel Husain:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Be deeply skeptical of features that promise full automation without human validation… this stacking of abstractions often hides flaws behind a high score."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Neither is saying agents are useless. They're saying that betting your &lt;strong&gt;production reliability&lt;/strong&gt; on a fat harness loop is, today, a bad trade.&lt;/p&gt;




&lt;h2&gt;The Anthropic Signal: Structure Over Slop&lt;/h2&gt;

&lt;p&gt;Watch what the labs &lt;strong&gt;build&lt;/strong&gt;, not what they say.&lt;/p&gt;

&lt;p&gt;Anthropic, the lab most associated with "agents" in the public imagination, keeps quietly making the same architectural choice: &lt;strong&gt;prefer structured, editable artifacts over freeform generation&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; is fat harness, but it emits &lt;strong&gt;code&lt;/strong&gt;. Not vibes, not pseudocode. A real diff you can run, test, and revert.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Design&lt;/strong&gt; generates production-ready &lt;strong&gt;HTML, CSS, and JavaScript&lt;/strong&gt;, not images. The output is an editable artifact you can deploy to Vercel, hand to Claude Code, or open in a browser. The lab made an explicit bet that structured code beats generated pixels for design work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Skills&lt;/strong&gt; (December 2025) are folders of &lt;strong&gt;instructions, scripts, and resources&lt;/strong&gt; that agents load dynamically, shipped as a cross industry standard with Atlassian, Canva, Cloudflare, Figma, Notion, Ramp, and Sentry. Not a smarter agent. Codified, file backed building blocks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is consistent. Even when Anthropic ships fat harness, the &lt;strong&gt;artifact&lt;/strong&gt; is structured. Even when they ship a creative tool, the output is code. The bet is that production AI runs on structure, not on the model's mood.&lt;/p&gt;

&lt;p&gt;Thin harness is one expression of the same bet, applied to the agent itself: make the workflow a structured artifact, and call the LLM only where you need its judgment.&lt;/p&gt;




&lt;h2&gt;The WaveAssist Bet&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/"&gt;WaveAssist&lt;/a&gt; picked thin harness, on purpose, because the work we're built for is the work that runs every day, every Monday, every webhook, every commit.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run the intelligence once.&lt;/strong&gt; The model helps you design the pipeline at build time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run code forever.&lt;/strong&gt; The pipeline itself is compiled, versioned, and cheap to execute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable cost.&lt;/strong&gt; You're not paying the LLM to replan every Monday at 9am.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable behavior.&lt;/strong&gt; Same inputs, same outputs. No drift because the model woke up feeling creative.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable uptime.&lt;/strong&gt; Code doesn't change its mind. Nodes run. Schedules fire. Webhooks hit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every agent we ship (GitZoid, GitDigest, WavePredict, WaveContent, SentimentRadar, PatternAnalyser, and the rest) is a &lt;strong&gt;compiled pipeline, not a runtime loop&lt;/strong&gt;. The expensive part happened once, at the start. Everything after is deterministic.&lt;/p&gt;

&lt;p&gt;We didn't pick thin because fat is bad. We picked thin because the work we're shipping is &lt;strong&gt;repeated production&lt;/strong&gt;, and that's the same architecture HubSpot, Slack, ClickUp, and Anthropic's Skills team picked for the same reason.&lt;/p&gt;




&lt;h2&gt;The Bottom Line&lt;/h2&gt;

&lt;p&gt;The agent space isn't splitting into winners and losers on model quality. It's splitting on architecture.&lt;/p&gt;

&lt;p&gt;Pick by job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Open-ended, novel, exploratory.&lt;/strong&gt; Fat harness. Claude Code is the canonical example.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeated, scheduled, scoped.&lt;/strong&gt; Thin harness. Almost every production agent inside SaaS today.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mistake isn't picking one camp. It's using a fat harness loop where a thin harness pipeline would do, or shipping a thin harness for a job that genuinely needs a model in the loop.&lt;/p&gt;

&lt;p&gt;Pick your bet. Then ask your vendor which one they made, and whether it matches the work you're paying them to do. If they can't answer clearly, that &lt;strong&gt;is&lt;/strong&gt; the answer.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
