Raunak Kathuria

Posted on • Originally published at raunakkathuria.substack.com

The Agentic Stack

The wrong question

Most leaders ask: "Should we use AI?"

That's the wrong level. Everyone's using AI — in their IDE, their code review tool, their incident runbook.

The better question is: What's the architecture?

When AI stops being a feature and starts being an execution layer — something that does work rather than assists with work — your system's structure changes. How you define capability changes. How intent enters the system changes.

That's what the agentic stack is about.


The Agentic Stack

The agentic stack has three layers:

Claw is the new unit of architecture. A claw is a bounded execution unit — a focused capability with its own role, context, tools, and constraints. It's not a microservice. It doesn't expose an API endpoint. It does work: it checks Slack, reads files, calls APIs, drafts responses, and takes action.

Skill is the new programming language. A skill is the reusable instruction layer that tells a claw what to do, how to do it, what rules to follow, what tools to use, and what output is expected.

Prompt is the new protocol. A prompt is how intent enters the system — not a rigid API contract, but a natural language instruction that activates claws and routes work through skills.

Three layers. Each one maps to something you already know.
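The mapping is easier to see as code. A minimal sketch in Python, where every class and field name is hypothetical, not a real framework:

```python
from dataclasses import dataclass, field

# Illustrative only: these names are made up to make the three layers
# concrete, not taken from any actual agent library.

@dataclass
class Skill:
    """The reusable instruction layer: what to do and what rules apply."""
    name: str
    instructions: str
    rules: list[str] = field(default_factory=list)

@dataclass
class Claw:
    """A bounded execution unit: a role plus the skills and tools it may use."""
    role: str
    skills: dict[str, Skill]
    tools: list[str]

    def handle(self, prompt: str) -> str:
        """The prompt is the protocol: intent arrives, a skill is activated."""
        for skill in self.skills.values():
            if skill.name in prompt:
                return f"{self.role}: applying {skill.name}"
        return "escalate: no matching skill"

review = Skill("code-review", "Review PRs against team standards.")
reviewer = Claw("reviewer", {"code-review": review}, ["read_pr_diff"])
print(reviewer.handle("a PR was opened, run code-review"))  # reviewer: applying code-review
```

The point of the sketch is the shape, not the implementation: the claw owns the role and tools, the skill owns the instructions, and the prompt is just intent passing through.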


What this replaces — and what it doesn't

Here's where most explanations go wrong: they frame this as replacement.

It isn't.

Services still define how systems run. Databases still store state. APIs still integrate third-party tools. None of that goes away.

What changes is the layer on top.

Before the agentic stack, work that crossed system boundaries required a human: a developer to write the glue code, a manager to coordinate the steps, an analyst to interpret the output.

After, a claw handles it — reading inputs from one system, applying a skill's logic, taking action in another.

A backend engineer still builds the services. But a claw does the work that used to live in Notion docs, Jira comments, and Slack threads.


A concrete example

Meet Priya. She's a Senior Engineering Manager at a company scaling its platform team.

Every week, Priya's team reviews pull requests from junior engineers. The process: she assigns reviewers, engineers leave feedback, authors iterate, someone approves or requests changes. It takes three to five days per PR. Senior engineers spend two to three hours daily just reviewing.

The old way:

  1. PR opened → Priya manually assigns reviewer based on who knows the area
  2. Reviewer reads 300–500 lines of code
  3. Leaves comments → author reads them, guesses at intent → revises
  4. Reviewer re-reads, re-approves or re-requests changes
  5. Repeat until it's good enough

That's six to eight human touchpoints per PR. Across 15 PRs a week, that's roughly 90 to 120 touchpoints.

The new way with the agentic stack:

A claw watches for new PRs. Its skill encodes the team's review standards — naming conventions, test coverage requirements, security anti-patterns, documentation expectations. When a PR opens, the claw reviews it against the skill's logic, posts structured feedback, flags high-risk changes for senior review, and routes the rest.

Priya's senior engineers see only the PRs that need a human decision. Everything else moves without them.

The claw did the work. The skill encoded the judgment. The prompt — "a PR was opened" — was the trigger.


What it looks like in practice

Here's a simplified skill definition for a code review claw:

```yaml
name: code-review
role: >
  Review pull requests against team engineering standards.
  Flag violations. Approve clean PRs. Escalate anything touching auth or payments.

context:
  - repo: standards/engineering-guidelines.md
  - file: .github/CODEOWNERS

tools:
  - read_pr_diff
  - post_review_comment
  - request_human_review
  - approve_pr

constraints:
  - Never approve PRs touching /src/auth without human sign-off
  - Always flag missing tests as blocking
  - Post feedback in plain English, not just line references

output:
  - structured review with verdict: approve | changes_requested | escalate
```

The skill is the logic. The claw is the executor. The PR opening is the prompt.

No new service to build. No new API to maintain. You're composing behaviour, not writing code.
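Under the hood, an executor could turn the skill's constraints into guards that run before any tool is called. A hedged sketch, with the skill inlined as a plain dict and hypothetical field names:

```python
# Hypothetical executor sketch: the skill's constraints become guards
# checked before any tool (approve_pr, post_review_comment) would fire.

skill = {
    "name": "code-review",
    "human_signoff_paths": ["/src/auth"],  # mirrors the first constraint above
    "missing_tests_block": True,           # mirrors the second
}

def decide(pr: dict) -> str:
    """Return the skill's verdict: approve | changes_requested | escalate."""
    guarded = tuple(skill["human_signoff_paths"])
    if any(f.startswith(guarded) for f in pr["files"]):
        return "escalate"            # never auto-approve auth changes
    if skill["missing_tests_block"] and not pr["has_tests"]:
        return "changes_requested"   # missing tests are blocking
    return "approve"

print(decide({"files": ["/src/auth/login.py"], "has_tests": True}))   # escalate
print(decide({"files": ["/src/api/users.py"], "has_tests": False}))   # changes_requested
```

Note what's missing: no web server, no queue, no deployment. The verdict logic lives in the skill; the executor only enforces it.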


What changes for engineering leaders

Your senior engineers stop being the default reviewers. Their judgment gets encoded into skills — and applied at scale, consistently, without their calendar being blocked. They shift from doing reviews to defining what a good review looks like.

Code standards become executable, not aspirational. The difference is whether the standard is applied or just stated.

Your junior engineers get faster feedback loops. Instead of waiting three days for a senior to review their work, they get structured feedback within minutes — and it's the same feedback a senior would give, because the skill was written by a senior. They learn faster and ship faster.

You get visibility into work that was previously invisible. Every claw action is logged. Every decision is traceable. You can see exactly what was reviewed, what was flagged, what was approved, and why — without sitting in every meeting.
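That traceability can be as simple as one structured record per claw action. Every field name here is illustrative:

```python
import json
from datetime import datetime, timezone

# Illustrative audit record: one structured log line per claw action,
# so every verdict stays reviewable after the fact. All values are
# hypothetical examples.
record = {
    "claw": "code-review",
    "pr": 1287,                          # hypothetical PR number
    "verdict": "changes_requested",
    "reason": "new endpoint has no tests",
    "skill": "code-review@2025-01-10",   # which version of the judgment ran
    "at": datetime.now(timezone.utc).isoformat(),
}
print(json.dumps(record))
```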


How it compares to traditional automation

You've probably built automation before: CI pipelines, bots, cron jobs, Zapier flows. They're great at deterministic work — run tests, send a notification, trigger a deploy.

The agentic stack handles judgment-based work. Not "if PR opened, ping #engineering" but "read this PR, apply our standards, decide what to do." The skill carries the judgment. The claw executes it. The line between automation and decision-making gets blurry — in a useful way.

They're not in competition. Your CI pipeline still runs your tests. A claw now does the first-pass review. The human approves the verdict. It's layers, not replacement.
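The contrast is easiest to see side by side. A sketch, where `llm` is a placeholder callable standing in for whatever model backs the claw:

```python
# Traditional automation: a fixed if/then rule. Deterministic, no judgment.
def ci_bot(event: dict) -> str:
    if event["type"] == "pr_opened":
        return "ping #engineering"
    return "ignore"

# Agentic: the skill's standards go into the prompt; the model supplies
# the judgment. llm is a hypothetical callable, not a real API.
def review_claw(event: dict, llm) -> str:
    prompt = f"Apply our review standards to this diff:\n{event['diff']}"
    return llm(prompt)  # approve | changes_requested | escalate

fake_llm = lambda prompt: "changes_requested"   # stand-in for a model call
print(ci_bot({"type": "pr_opened"}))            # ping #engineering
print(review_claw({"diff": "…"}, fake_llm))     # changes_requested
```

Same trigger, different layer: the bot matches on the event type, the claw reasons over the event's content.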


How to start

  1. Pick one workflow where humans are the bottleneck — code review, incident triage, sprint planning prep, deployment approvals
  2. Write down exactly what a good human would do in that workflow — step by step, with the rules they apply
  3. That's your first skill. The claw is the executor. The trigger is your first prompt
  4. Run it alongside your human process for two weeks. Compare outputs
  5. Trust the parts that are right. Fix the parts that aren't
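Step 4 can be as lightweight as a shadow-mode tally: run the claw on the same PRs humans review, record its verdicts without acting on them, and measure agreement. A sketch with hypothetical verdict functions:

```python
# Shadow mode sketch: the claw's verdicts are compared against human
# verdicts on the same PRs before the claw is trusted to act alone.

def agreement_rate(prs, claw_verdict, human_verdict) -> float:
    """Fraction of PRs where the claw's verdict matched the human's."""
    matches = sum(1 for pr in prs if claw_verdict(pr) == human_verdict(pr))
    return matches / len(prs)

prs = [{"has_tests": True}, {"has_tests": False}, {"has_tests": True}]
claw = lambda pr: "approve" if pr["has_tests"] else "changes_requested"
human = lambda pr: "approve" if pr["has_tests"] else "changes_requested"
print(f"agreement: {agreement_rate(prs, claw, human):.0%}")  # agreement: 100%
```

Where the two disagree is exactly where the skill needs editing — that disagreement list is your backlog for step 5.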

You're not replacing anyone. You're encoding the judgment that already exists in your best people — and making it available to the whole team, all the time.


Claw is the architecture. Skill is the language. Prompt is the protocol. That's the stack your team is building on — whether you've named it yet or not.


What part of your workflow still requires a human because nobody's encoded the judgment yet? Drop a comment — I'm genuinely curious what's still stuck.
