The Fear Is Real. The Framing Is Wrong.
In 2026, the most common question I get from agency leaders isn't "which AI tool should we use?" It's "should I be worried about my team's jobs?" That fear is understandable. It's also pointed at the wrong target. The actual threat to agency teams isn't a reasoning model writing copy. It's the six hours a week each person spends on tasks that require no judgment at all: reformatting briefs, pulling performance data, organizing research, writing first-draft outlines that everyone rewrites anyway.
McKinsey's research on the future of work found that automation and AI are more likely to augment work by eliminating repetitive tasks than to replace workers entirely, allowing employees to focus on higher-value creative and strategic activities (McKinsey, "The Future of Work After COVID-19"). That finding matches what we've seen building automation pipelines for agencies. The displacement isn't happening at the creative or strategic layer. It's happening at the administrative layer, and that's exactly where it should happen.
The problem is that most agencies are either ignoring this shift entirely or adopting AI in a way that creates new busywork: prompting, reviewing, correcting, re-prompting. That's not productivity. That's just a different kind of overhead.
What AI Actually Does Well in an Agency Context
Let's be specific. An LLM is good at tasks with a clear input-output structure and a high tolerance for iteration. Research synthesis: give it ten URLs and ask for a structured summary. First-draft outlines: give it a brief and a target audience, get back a skeleton. Reformatting content across channels: take a long-form article and produce a LinkedIn post, an email subject line, and a tweet thread. These tasks share a common property. They require pattern recognition and text manipulation, not judgment about what a specific client actually needs.
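To make that concrete, here is a minimal sketch of the reformatting-across-channels step, assuming the OpenAI Python SDK and an API key in the environment. The model name and prompt wording are placeholders, not recommendations; the point is the shape of the task, one clean input fanned out into several structured outputs.

```python
# Minimal sketch: turn one long-form article into three channel-specific assets.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
# model name and prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

CHANNEL_PROMPTS = {
    "linkedin_post": "Rewrite the article below as a 150-word LinkedIn post with a hook in the first line.",
    "email_subject": "Write one email subject line (under 60 characters) for the article below.",
    "tweet_thread": "Turn the article below into a 5-tweet thread, one tweet per line.",
}

def repurpose(article_text: str) -> dict[str, str]:
    """Return one derivative asset per channel from a single source article."""
    outputs = {}
    for channel, instruction in CHANNEL_PROMPTS.items():
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder; use whatever model your stack standardizes on
            messages=[
                {"role": "system", "content": instruction},
                {"role": "user", "content": article_text},
            ],
        )
        outputs[channel] = response.choices[0].message.content
    return outputs
```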
Where the reasoning layer breaks down is anywhere client context matters. A content strategist who has worked with a B2B SaaS client for two years knows things no prompt can capture: the founder's communication style, the topics that have historically underperformed with their audience, the competitive sensitivities that make certain angles off-limits. An LLM doesn't know any of that unless someone feeds it in explicitly, and even then, it can't weigh those factors the way a person who has sat in the quarterly review meetings can.
This is the distinction that gets lost in most AI coverage. The question isn't "can AI do this task?" It's "does this task require judgment that lives in a person's head?" If the answer is yes, the pipeline needs a human in the loop. If the answer is no, automating it is just good operations.
How We Actually Structure the Work
When we build automation pipelines for agency workflows, we start by mapping every recurring task against two axes: how much does it vary week to week, and how much does it require client-specific knowledge? Tasks that score low on both axes are candidates for full automation. Tasks that score high on either axis need a person involved at the input stage, the review stage, or both.
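In practice that mapping is simple enough to express as a triage rule. The sketch below assumes the team has already assigned 1-to-5 scores for each axis during the audit; the thresholds and task names are illustrative, not our production rubric.

```python
# Rough sketch of the two-axis triage we run on recurring tasks.
# Scores are assigned by the team during the audit, not computed;
# the thresholds here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    weekly_variability: int       # 1 = identical every week, 5 = different every time
    client_knowledge_needed: int  # 1 = none, 5 = lives entirely in someone's head

def triage(task: Task) -> str:
    if task.weekly_variability <= 2 and task.client_knowledge_needed <= 2:
        return "full automation"
    if task.client_knowledge_needed >= 4:
        return "human owns it; automate only the data gathering"
    return "automate with a human review checkpoint"

for t in [
    Task("weekly performance report", 1, 2),
    Task("monthly content calendar synthesis", 3, 5),
    Task("brief reformatting", 2, 3),
]:
    print(f"{t.name}: {triage(t)}")
```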
A practical example: competitive research for a monthly content calendar. The data-gathering step (pulling recent articles, identifying trending topics, flagging competitor content) is fully automatable using tools like Perplexity's API or a web-scraping node in n8n. The synthesis step (deciding which of those trends actually matter for this client's positioning) requires a strategist. So we automate the first step and hand off a structured brief to the person doing the second. The strategist spends twenty minutes on judgment instead of two hours on data collection.
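Here is roughly what the data-gathering half looks like as code, assuming Perplexity's OpenAI-compatible chat completions endpoint. The model name and output format are assumptions to adapt, and the same step could just as easily be an HTTP or scraping node inside n8n.

```python
# Sketch of the automatable half of the competitive-research step.
# Uses Perplexity's OpenAI-compatible chat endpoint; the model name and
# output schema are placeholders, not a prescription.
import os
import requests

def gather_competitor_signals(competitors: list[str], topic: str) -> str:
    prompt = (
        f"For each of these companies: {', '.join(competitors)}, list articles "
        f"published in the last 30 days about {topic}. Return a markdown table "
        "with columns: company, headline, date, URL, one-line summary."
    )
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
        json={
            "model": "sonar",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```

The returned table becomes the structured brief handed to the strategist, who does the judgment work the pipeline can't: deciding which of those signals matter for this client's positioning.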
That's the architecture. Not "AI does everything" and not "AI assists with everything." It's a deliberate split based on where judgment is actually required. We've written about how fragmented tech stacks make this kind of split harder to maintain in practice, and the same principle applies here: when your tools don't talk to each other, the automation layer breaks and the work falls back on people.
The Pricing Lesson That Changed How I Think About Complexity
We price our automation builds by pipeline complexity, not by integration count. A contact scorer with four agents running a straightforward fetch-score-format cycle sits at one price point. An RFP intelligence build with five agents across two conditional phases sits at a higher one. Phase 1 decides whether to even write a response before Phase 2 invests the tokens to generate it. The price difference reflects three times more system prompt engineering, twice the test surface, and a conditional architecture that most teams wouldn't build from scratch because the branching logic is genuinely hard to get right.
I mention this because it illustrates something important about AI adoption that agencies miss. The value isn't in the number of tools you connect. It's in the decision logic that sits between them. A pipeline that blindly generates an RFP response for every inbound request wastes tokens and produces mediocre output. A pipeline that first evaluates whether the opportunity is worth pursuing, and only then generates the response, produces better work and costs less to run. That conditional architecture is where the real engineering lives, and it's not something an off-the-shelf AI tool gives you.
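A stripped-down sketch of that gate is below. The function names, the scoring stand-in, and the 0.6 threshold are illustrative assumptions, not the production build; in the real pipeline Phase 1 is an LLM call with a strict rubric and structured output, and Phase 2 is the token-heavy multi-agent drafting chain.

```python
# Go/no-go gate: Phase 1 scores the opportunity cheaply, and Phase 2
# (the expensive generation step) only runs if the score clears a threshold.
def evaluate_opportunity(rfp_text: str) -> float:
    """Phase 1: cheap fit scoring. The keyword heuristic here is only a
    stand-in for an LLM call with a budget/service-line/timeline rubric."""
    signals = ["content strategy", "marketing automation", "retainer"]
    hits = sum(1 for s in signals if s in rfp_text.lower())
    return hits / len(signals)

def generate_response(rfp_text: str) -> str:
    """Phase 2: the token-heavy drafting pass, run only when Phase 1 says go."""
    # Placeholder for the multi-agent generation chain.
    return f"[draft response for RFP of {len(rfp_text)} characters]"

def rfp_pipeline(rfp_text: str, threshold: float = 0.6) -> str | None:
    score = evaluate_opportunity(rfp_text)
    if score < threshold:
        return None  # log the decline and notify the BD lead instead of drafting
    return generate_response(rfp_text)
```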
Implementation: Where Agencies Actually Get Stuck
The most common failure mode I see is agencies automating the wrong layer first. They build a content generation pipeline before they've solved for brief quality. The output is mediocre, they blame the LLM, and they conclude that AI "doesn't work for creative." What actually happened is that garbage went in and garbage came out. The automation exposed a process problem that already existed; it just made it faster and more visible.
Start with the input layer. Before you automate any output, ask: is the information going into this process clean, consistent, and complete? For most agencies, the answer is no. Client briefs are inconsistent. Research is stored in different formats across different people. Campaign data lives in three platforms that don't share a schema. Fixing those problems first makes every downstream automation more reliable. It also makes the team's work better even without any AI involved.
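One concrete version of that input-layer fix is to validate every brief against a minimum schema before it enters any pipeline. The field names and thresholds below are assumptions to adapt; the point is that incomplete briefs bounce back to the account team instead of flowing downstream.

```python
# Input-layer check: incomplete or thin briefs never reach the generation step.
# Required fields and the minimum objective length are illustrative assumptions.
REQUIRED_FIELDS = ["client", "audience", "objective", "key_message", "channels", "deadline"]

def validate_brief(brief: dict) -> list[str]:
    """Return a list of problems; an empty list means the brief can proceed."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not brief.get(f)]
    objective = brief.get("objective", "")
    if objective and len(objective.split()) < 10:
        problems.append("objective is too thin to brief an automated step")
    return problems

issues = validate_brief({"client": "Acme SaaS", "audience": "RevOps leads", "objective": "grow pipeline"})
print(issues)  # flags the missing key_message, channels, and deadline, plus the thin objective
```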
The second failure mode is skipping the review step because the output looks good. An LLM can produce confident, well-structured text that is factually wrong or strategically misaligned. We've seen this in our own builds: a pipeline that summarizes competitor positioning can miss a recent product launch because the source data was stale. The automation didn't fail technically. It produced a clean output from bad inputs. A person reviewing that output for thirty seconds would catch it. Removing that review step to save time is how agencies ship errors to clients.
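One lightweight way to keep that review step honest is to route every output through a reviewer with a freshness flag attached, so the thirty seconds go to the riskiest part first. A sketch, assuming each source in the pipeline carries an ISO 8601 retrieval timestamp with a timezone:

```python
# Flag outputs whose source data is older than a freshness window, so the
# reviewer's attention lands on the stale-data cases described above.
# The 14-day window and the "retrieved_at" field name are assumptions.
from datetime import datetime, timedelta, timezone

def stale_sources(sources: list[dict], max_age_days: int = 14) -> list[str]:
    """Return the URLs of sources older than the freshness window."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    return [
        s["url"]
        for s in sources
        if datetime.fromisoformat(s["retrieved_at"]) < cutoff
    ]
```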
This approach works well for high-volume, repeatable tasks with clear success criteria. It breaks down when the task requires real-time market awareness, nuanced client relationship knowledge, or creative risk-taking that an LLM will consistently sand down toward the average. Know which category your work falls into before you build the pipeline.
What the Teams Who Get This Right Look Like
Agencies that implement this thoughtfully don't look like they've replaced anyone. They look like they've given their best people more time to do the work those people are actually good at. The account manager who used to spend Friday afternoons pulling weekly reports now spends that time on client calls. The content strategist who used to write first drafts now reviews and elevates them. The project manager who used to chase status updates now has a dashboard that surfaces blockers automatically.
None of those people are doing less work. They're doing different work. The administrative layer that used to consume a meaningful portion of their week now runs in the background, and the output lands in their inbox already formatted. That's the actual productivity gain: not fewer people, but the same people operating closer to the ceiling of what they're capable of.
If you want to see what this looks like at the automation infrastructure level, the builds we catalog at ForgeWorkflows are organized around exactly this principle: pipelines that handle the structured, repeatable work so the people running them can focus on the parts that require judgment. We also document our quality standards at our BQS methodology page for anyone who wants to understand how we evaluate whether a pipeline is actually ready to run unsupervised.
What We'd Do Differently
We'd audit for hidden judgment calls before automating anything. The tasks that look purely mechanical almost always contain one or two moments where a person is making a micro-decision they don't even notice. Those moments are where automated pipelines produce outputs that are technically correct but contextually wrong. We now map those decision points explicitly before writing a single node, and we build review checkpoints around them rather than assuming the LLM will handle them.
We'd build the feedback loop into the pipeline from day one. Most automation builds we've seen treat the pipeline as finished once it runs without errors. The ones that actually improve over time have a mechanism for capturing when the output was wrong and why. That doesn't have to be complex: a simple Slack message asking "was this output usable?" with a yes/no button generates enough signal to identify which steps need tightening. We added this retroactively to several builds and wish we'd started with it.
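Here is a minimal version of that feedback ping using the Slack Web API via slack_sdk. The channel name, action IDs, and the handler that records the button click (an n8n webhook or a small Bolt app, for instance) are placeholders to adapt, not part of a specific build.

```python
# "Was this output usable?" ping posted at the end of a pipeline run.
# Assumes a Slack bot token in the environment; the channel and action IDs
# are placeholders, and button clicks are handled elsewhere.
import os
from slack_sdk import WebClient

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def request_feedback(run_id: str, summary: str) -> None:
    slack.chat_postMessage(
        channel="#automation-feedback",  # placeholder channel
        text=f"Pipeline run {run_id}: was this output usable?",
        blocks=[
            {"type": "section", "text": {"type": "mrkdwn", "text": f"*Run {run_id}*\n{summary}"}},
            {
                "type": "actions",
                "elements": [
                    {"type": "button", "text": {"type": "plain_text", "text": "Yes"},
                     "action_id": "feedback_yes", "value": run_id},
                    {"type": "button", "text": {"type": "plain_text", "text": "No"},
                     "action_id": "feedback_no", "value": run_id},
                ],
            },
        ],
    )
```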
We'd be more honest with clients about what the automation can't do. Early on, we undersold the limitations because we didn't want to undermine confidence in the build. That backfired. When a pipeline produced a mediocre output in an edge case, clients were surprised. Now we document the failure modes explicitly during handoff: here's what this pipeline handles well, here's where it will need a human override, and here's how to tell the difference. That transparency has made every client relationship easier to manage.