<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Swrly</title>
    <description>The latest articles on DEV Community by Swrly (@swrly).</description>
    <link>https://dev.to/swrly</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3862452%2Fe6202e6b-2cf9-4a2e-8a40-f5aaba5aafb8.png</url>
      <title>DEV Community: Swrly</title>
      <link>https://dev.to/swrly</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/swrly"/>
    <language>en</language>
    <item>
      <title>Multi-Agent Systems: A Practical Guide for Engineering Teams</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Tue, 14 Apr 2026 03:00:14 +0000</pubDate>
      <link>https://dev.to/swrly/multi-agent-systems-a-practical-guide-for-engineering-teams-4l2p</link>
      <guid>https://dev.to/swrly/multi-agent-systems-a-practical-guide-for-engineering-teams-4l2p</guid>
      <description>&lt;p&gt;Multi-agent systems are not a new research concept. They have been studied in academia for decades. What is new is that they are now cheap and fast enough to run in production, and the engineering patterns for building them are still being figured out in real time.&lt;/p&gt;

&lt;p&gt;This guide is for engineering teams who have built a single AI agent, hit its limits, and are trying to understand what "multi-agent" actually means in practice — not the theory, but the concrete decisions: when to split into multiple agents, how to pass context between them, how to handle failures, and how to know when you have added too many moving parts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Single Agents Break Down
&lt;/h2&gt;

&lt;p&gt;Before going multi-agent, understand why single agents fail. The primary failure modes are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context window saturation.&lt;/strong&gt; A single agent doing a complex task accumulates context as it works: the initial input, tool call results, intermediate reasoning, partial outputs. For long tasks — analyzing a large codebase, processing a long document, doing research across many sources — the context fills up, the agent starts losing earlier information, and output quality degrades. Splitting the task into smaller agents, each with a fresh focused context, sidesteps this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt complexity beyond reliability.&lt;/strong&gt; When you stuff too many responsibilities into one agent's system prompt — analyze, then decide, then write, then format, then validate — you are asking one prompt to reliably govern five different behaviors. Reliability degrades as complexity increases. Each responsibility added to a system prompt is another thing that can go wrong, and the interactions between responsibilities are hard to test.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sequential bottlenecks.&lt;/strong&gt; A single agent is sequential. If you need to do three things that do not depend on each other, the single agent does them one at a time. Three agents running in parallel finish in roughly the time of the slowest one, close to a threefold speedup when the subtasks are similar in size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lack of specialization.&lt;/strong&gt; A generalist agent is mediocre at everything. A specialist agent — configured with the right tools, the right context, the right system prompt for one job — is genuinely good at that job. The cost of specialization is the coordination overhead between specialists.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Patterns
&lt;/h2&gt;

&lt;p&gt;Multi-agent systems follow a small number of structural patterns. Knowing the patterns helps you make the right architectural choice for your use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pipeline
&lt;/h3&gt;

&lt;p&gt;Each agent's output is the next agent's input. Sequential, no branching. Use this when each step depends on the previous step's output and the steps are naturally sequential.&lt;/p&gt;

&lt;p&gt;Example: Research pipeline — Agent 1 searches the web for sources, Agent 2 reads and summarizes each source, Agent 3 synthesizes the summaries into a structured report.&lt;/p&gt;

&lt;p&gt;The pipeline is simple to reason about and debug. The tradeoff is that it is only as fast as the slowest step, and a failure at any stage stops the whole pipeline.&lt;/p&gt;
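
&lt;p&gt;A minimal sketch of the pattern, assuming a hypothetical &lt;code&gt;run_agent(name, task)&lt;/code&gt; helper that wraps your LLM calls. The pipeline is just function composition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def run_agent(name: str, task: str) -&amp;gt; str:
    """Hypothetical helper: runs one agent with its own prompt and a fresh context."""
    ...

def research_pipeline(topic: str) -&amp;gt; str:
    sources = run_agent("searcher", f"Find sources on: {topic}")
    summaries = run_agent("summarizer", f"Summarize each source:\n{sources}")
    # each agent's output is the next agent's entire input
    return run_agent("synthesizer", f"Write a structured report from:\n{summaries}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;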

&lt;h3&gt;
  
  
  Parallel Fan-Out with Join
&lt;/h3&gt;

&lt;p&gt;A coordinator dispatches the same task (or different subtasks) to multiple agents simultaneously. A join node collects all outputs before passing them to the next step. Use this when you have independent subtasks that can run concurrently.&lt;/p&gt;

&lt;p&gt;Example: PR review — Code quality reviewer, security scanner, and test coverage analyzer all run in parallel on the same PR diff. A join node collects all three reviews. A decision agent synthesizes them into a final verdict.&lt;/p&gt;

&lt;p&gt;This is the most impactful pattern for throughput. Three reviewers running in parallel take only as long as the slowest one, which cuts review time by up to two-thirds compared to sequential execution.&lt;/p&gt;
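
&lt;p&gt;A minimal sketch with &lt;code&gt;asyncio&lt;/code&gt;, again assuming a hypothetical &lt;code&gt;run_agent&lt;/code&gt; helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def run_agent(name: str, task: str) -&amp;gt; str:
    """Hypothetical async helper around your LLM client."""
    ...

async def review_pr(diff: str) -&amp;gt; str:
    # fan out: three specialists run concurrently on the same diff
    reviews = await asyncio.gather(
        run_agent("code_quality", diff),
        run_agent("security", diff),
        run_agent("test_coverage", diff),
    )
    # join: collect all three reviews, then synthesize a verdict
    return await run_agent("decision", "\n\n".join(reviews))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The join is the &lt;code&gt;gather&lt;/code&gt; call itself: nothing downstream runs until all three reviewers have returned.&lt;/p&gt;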

&lt;h3&gt;
  
  
  Supervisor with Workers
&lt;/h3&gt;

&lt;p&gt;One agent (the supervisor) breaks a high-level task into subtasks and dispatches them to specialized worker agents. The supervisor collects results and either composes a final output or continues assigning work until the goal is achieved.&lt;/p&gt;

&lt;p&gt;Example: Research assistant — Supervisor receives "write me a competitive analysis of the top 5 CRM tools." It dispatches five agents, one per CRM, each tasked with gathering specific information about one competitor. Supervisor collects the five reports and composes the final comparison.&lt;/p&gt;

&lt;p&gt;This pattern handles variable-length tasks well. The supervisor can add more workers if the initial set does not cover the task, or retire workers that have finished. The tradeoff is that the supervisor itself needs to be reliable — if it misassigns subtasks or loses track of what has been done, the whole system degrades.&lt;/p&gt;
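
&lt;p&gt;Sketched under the same assumptions, with a hypothetical &lt;code&gt;decompose&lt;/code&gt; call standing in for the supervisor's planning step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

async def run_agent(name: str, task: str) -&amp;gt; str: ...

async def decompose(goal: str) -&amp;gt; list[str]:
    """Hypothetical supervisor call: returns one subtask per worker."""
    ...

async def supervise(goal: str) -&amp;gt; str:
    subtasks = await decompose(goal)  # e.g. one subtask per CRM tool
    reports = await asyncio.gather(
        *(run_agent("worker", task) for task in subtasks)
    )
    # the supervisor composes the final output from the worker reports
    return await run_agent("supervisor", f"{goal}\n\n" + "\n\n".join(reports))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;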

&lt;h3&gt;
  
  
  Critic and Reviser
&lt;/h3&gt;

&lt;p&gt;Two agents in a loop: a generator and a critic. The generator produces output; the critic evaluates it against defined criteria; if it fails, the output goes back to the generator with the critic's notes. Loop repeats until the critic approves or a maximum iteration count is hit.&lt;/p&gt;

&lt;p&gt;Example: Blog post drafting — Writer agent produces a draft, Editor agent checks it against brand guidelines and quality criteria, returns specific revision notes if it fails. Writer revises. Loop continues until the Editor approves or three revision cycles have elapsed.&lt;/p&gt;

&lt;p&gt;This pattern is useful for quality control on subjective output. The tradeoff is that it can loop more than intended — set a hard iteration cap and a timeout or you will burn tokens and time on endless revision loops that never converge.&lt;/p&gt;
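
&lt;p&gt;A sketch of the loop with the cap built in, where &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;critique&lt;/code&gt; are hypothetical LLM calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def generate(task: str, notes: str = "") -&amp;gt; str: ...

def critique(draft: str) -&amp;gt; tuple[bool, str]:
    """Hypothetical critic call: returns (approved, revision_notes)."""
    ...

def critic_loop(task: str, max_iterations: int = 3) -&amp;gt; str:
    draft = generate(task)
    for _ in range(max_iterations):    # hard cap: the loop cannot run forever
        approved, notes = critique(draft)
        if approved:
            return draft
        draft = generate(task, notes)  # revise with the critic's notes
    return draft                       # best-effort output if the cap is hit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;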

&lt;h3&gt;
  
  
  Event-Driven Routing
&lt;/h3&gt;

&lt;p&gt;A trigger agent receives an event, classifies it, and routes it to the appropriate specialist agent. The specialist handles it and potentially triggers downstream agents.&lt;/p&gt;

&lt;p&gt;Example: Support triage — Trigger receives an incoming support ticket, classification agent determines the category (billing issue, technical bug, feature request, account problem), routes to the appropriate specialist agent (billing agent, engineering triage agent, product feedback agent).&lt;/p&gt;

&lt;p&gt;This pattern is excellent for high-volume inbound processing where you cannot predict what will arrive. The tradeoff is that the routing logic needs to be reliable — if the classifier puts a billing issue in the engineering queue, downstream handling will be wrong.&lt;/p&gt;
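
&lt;p&gt;The routing itself can be a plain lookup with an explicit fallback, sketched here with hypothetical handlers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def classify(ticket: str) -&amp;gt; str: ...        # hypothetical LLM classification
def handle_billing(ticket: str) -&amp;gt; str: ...  # hypothetical specialist agents
def handle_bug(ticket: str) -&amp;gt; str: ...
def handle_feedback(ticket: str) -&amp;gt; str: ...
def escalate_to_human(ticket: str) -&amp;gt; str: ...

SPECIALISTS = {
    "billing issue": handle_billing,
    "technical bug": handle_bug,
    "feature request": handle_feedback,
}

def triage(ticket: str) -&amp;gt; str:
    category = classify(ticket)
    # unknown categories degrade safely instead of landing in the wrong queue
    handler = SPECIALISTS.get(category, escalate_to_human)
    return handler(ticket)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;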

&lt;h2&gt;
  
  
  Coordination: The Hard Part
&lt;/h2&gt;

&lt;p&gt;Coordination overhead is the cost of going multi-agent. Every time you split work between two agents, you need to define:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The interface.&lt;/strong&gt; What does Agent A produce, and what does Agent B expect? Vague interfaces — "summarize the findings" — work for demos but fail in production because Agent A might structure its output in ways Agent B does not handle. Explicit output formats (JSON with defined fields, structured text with clear headers) make the interface robust.&lt;/p&gt;
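
&lt;p&gt;As a sketch, the downstream agent can validate the contract before spending any tokens on the payload (field names here are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

REQUIRED_FIELDS = {"summary", "findings", "confidence"}  # illustrative contract

def parse_upstream(raw: str) -&amp;gt; dict:
    """Reject Agent A's output early if it does not match the interface."""
    data = json.loads(raw)  # raises on malformed JSON
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"upstream output missing fields: {missing}")
    return data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;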

&lt;p&gt;&lt;strong&gt;What context passes.&lt;/strong&gt; Each agent gets its own context window. Passing too much context between agents (every upstream agent's full output) defeats the purpose of splitting contexts. Passing too little context means downstream agents lack information they need. The right amount: each agent receives the minimum context required to do its job, formatted clearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who handles failures.&lt;/strong&gt; If Agent B receives Agent A's output and cannot make sense of it, what happens? Define a fallback for each interface: retry with a broader prompt, pass to a human, log and continue with a default. Undefined failure modes become production incidents.&lt;/p&gt;
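
&lt;p&gt;A minimal sketch of that fallback logic, reusing the validation above and a hypothetical escalation hook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def log_for_human_review(payload) -&amp;gt; None: ...  # hypothetical escalation hook

def call_with_fallback(agent, payload, retries: int = 2, default=None):
    """Retry the interface a bounded number of times, then degrade explicitly."""
    for attempt in range(retries + 1):
        try:
            return agent(payload)
        except ValueError:  # e.g. the interface validation above failing
            if attempt == retries:
                log_for_human_review(payload)
                return default  # log and continue with a known-safe default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;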

&lt;p&gt;&lt;strong&gt;How state is shared.&lt;/strong&gt; If multiple agents need to read and write shared state — a scratchpad, a running summary, a task queue — you need a shared data layer. In Swrly, the scratchpad tools (&lt;code&gt;swrly_scratchpad_set&lt;/code&gt; and &lt;code&gt;swrly_scratchpad_get&lt;/code&gt;) provide shared key-value storage accessible to all agents in a run. Agents can write intermediate results and other agents can read them without the orchestrator shuttling data between nodes.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Many Agents Is Too Many?
&lt;/h2&gt;

&lt;p&gt;There is no formula, but there are warning signs.&lt;/p&gt;

&lt;p&gt;You have too many agents when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need a diagram of the agents to understand what any single agent does&lt;/li&gt;
&lt;li&gt;Debugging a failure requires tracing through more than four agent outputs to find the root cause&lt;/li&gt;
&lt;li&gt;The coordination overhead (prompt engineering, context passing, retry logic) exceeds the time saved by splitting&lt;/li&gt;
&lt;li&gt;Most of your agents spend more time reading upstream context than actually doing work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You probably need more agents when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A single agent's system prompt is longer than 2,000 words and covers more than three distinct responsibilities&lt;/li&gt;
&lt;li&gt;A single agent's runs are slow because it is doing sequential work that could be parallelized&lt;/li&gt;
&lt;li&gt;Output quality is inconsistent in ways that correlate with task complexity — the agent does well on simple instances but degrades on complex ones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical target for most production workflows: 3–7 agents. Below 3, you are probably dealing with a task that is fine as a single agent. Above 7, you are introducing coordination overhead that requires careful justification.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability Is Not Optional
&lt;/h2&gt;

&lt;p&gt;Multi-agent systems fail in non-obvious ways. An agent upstream of a failure might appear to succeed — it produced output — but the output quality is low enough that the downstream agent cannot use it effectively. The downstream agent fails. The root cause is two nodes upstream.&lt;/p&gt;

&lt;p&gt;Without observability at every node, you cannot diagnose this. You see a failed run and a cryptic error message from the final node, with no visibility into what the intermediate agents actually produced.&lt;/p&gt;

&lt;p&gt;Good multi-agent observability gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Status per node: succeeded, failed, timed out, skipped&lt;/li&gt;
&lt;li&gt;Full input and output logged at each node&lt;/li&gt;
&lt;li&gt;Duration and token cost per node&lt;/li&gt;
&lt;li&gt;A visual representation of which path execution took through the graph&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is table stakes, not a nice-to-have. Building a multi-agent system without per-node observability is like building a microservice architecture without logs. You will eventually debug a production incident by reading tea leaves.&lt;/p&gt;

&lt;p&gt;In Swrly, every run produces a full execution trace visible in the run overlay on the canvas. You can click any node after a run and see exactly what it received as input and what it produced as output. When something goes wrong, the investigation starts at the failed node and works backward through its inputs — not through the whole system.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Concrete Starting Point
&lt;/h2&gt;

&lt;p&gt;If you are new to multi-agent systems, start with the parallel fan-out with join pattern. It is the easiest to reason about, produces the most obvious throughput gain, and has the simplest failure model.&lt;/p&gt;

&lt;p&gt;Pick a task you currently run as a single agent that has multiple independent review dimensions. Code review is the canonical example, but it applies equally to document analysis, content review, data validation, or research synthesis.&lt;/p&gt;

&lt;p&gt;Split the single agent into three focused specialists. Give each one a clear, narrow responsibility. Run them in parallel. Collect their outputs with a join node. Pass the combined output to a synthesis agent that makes the final call.&lt;/p&gt;

&lt;p&gt;Run that for a month. Measure output quality, throughput, and cost. Then decide what to add — whether that is another specialist dimension, a critic loop on the synthesizer, or a supervisor layer for more complex task decomposition.&lt;/p&gt;

&lt;p&gt;Multi-agent systems compound. The value is not in any individual agent; it is in the composition. But you build the composition one well-designed step at a time.&lt;/p&gt;

&lt;p&gt;You can start with several &lt;a href="https://dev.to/templates/marketplace"&gt;pre-built multi-agent templates&lt;/a&gt; in Swrly — including a parallel PR reviewer and a research synthesis workflow — or &lt;a href="https://swrly.com/sign-up" rel="noopener noreferrer"&gt;sign up free&lt;/a&gt; and build your own.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>How to Cut AI Agent Costs by 80% (Without Sacrificing Quality)</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Tue, 14 Apr 2026 03:00:10 +0000</pubDate>
      <link>https://dev.to/swrly/how-to-cut-ai-agent-costs-by-80-without-sacrificing-quality-1a6g</link>
      <guid>https://dev.to/swrly/how-to-cut-ai-agent-costs-by-80-without-sacrificing-quality-1a6g</guid>
      <description>&lt;p&gt;When we first put Swrly's PR review workflow into production, it cost roughly $4.20 per review. That sounds fine until you realize a mid-sized engineering team opens 200–300 PRs a month. That is $840–$1,260 a month for one workflow. We had four more like it in the queue.&lt;/p&gt;

&lt;p&gt;We did not have a scale problem. We had a cost architecture problem.&lt;/p&gt;

&lt;p&gt;Three months later, after applying the patterns below, the same PR review workflow costs $0.38 per run — a reduction of just over 90%. Quality is objectively better. Speed is higher. We learned most of this the hard way, which means you do not have to.&lt;/p&gt;

&lt;p&gt;Here is what actually moves the needle on AI agent costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Sources of Cost
&lt;/h2&gt;

&lt;p&gt;Before you can cut costs, you need to know where they come from. For most agent pipelines, the breakdown looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model selection&lt;/strong&gt; — using Opus for every step, including ones that do not require it, is the single largest source of overspend. It accounts for 40–60% of total token cost in unoptimized pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context bloat&lt;/strong&gt; — most pipelines pass the full upstream context to every downstream step. In practice, 60–70% of those tokens are irrelevant to what the current step needs to do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry loops&lt;/strong&gt; — when an agent fails and retries with the same full context, you pay for the failed attempt in full. Three retries with a 4,000-token context cost 4x the tokens of one successful call. Without guardrails, this compounds quietly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform markup&lt;/strong&gt; — some platforms charge per-token fees on top of your LLM provider bill. These markups range from 2x to 5x the raw inference cost. They are rarely disclosed clearly on pricing pages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams try to fix costs by switching to cheaper models across the board. That is the wrong instinct — you end up degrading quality on steps that actually need reasoning power and saving almost nothing on the steps that do not.&lt;/p&gt;

&lt;p&gt;The right approach is targeted: fix the architecture, not just the model selection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 1: BYOK — Eliminate the Platform Markup First
&lt;/h2&gt;

&lt;p&gt;This one change can reduce your total AI spend by 50–80% before you touch a single workflow.&lt;/p&gt;

&lt;p&gt;Most AI agent platforms bundle orchestration and inference into one bill. They act as a pass-through to your LLM provider, add a markup, and call it "credits" or "compute units." The math is rarely transparent. What looks like a $49/month plan ends up costing $300+ once token charges are applied.&lt;/p&gt;

&lt;p&gt;BYOK (bring your own key) routes inference directly from the platform to your LLM provider. There is no intermediary markup. A call that costs $0.015 on the Anthropic API costs $0.015 in your Swrly workflow — not $0.045 after a 3x platform markup.&lt;/p&gt;

&lt;p&gt;The compounding effect matters here. If you run 1,000 agent executions per month and each uses an average of 5,000 tokens:&lt;/p&gt;

&lt;p&gt;| Billing model | Token cost per 1M tokens | Monthly token cost |&lt;/p&gt;
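
&lt;p&gt;To make the compounding concrete, here is the arithmetic under illustrative assumptions: the 1,000 runs at 5,000 tokens each from above, and a hypothetical direct price of $3.00 per million tokens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;runs_per_month = 1_000
tokens_per_run = 5_000
direct_price_per_m = 3.00  # hypothetical direct API price, $ per 1M tokens

monthly_tokens = runs_per_month * tokens_per_run  # 5,000,000 tokens
direct_cost = monthly_tokens / 1_000_000 * direct_price_per_m  # $15.00
marked_up_cost = direct_cost * 3  # $45.00 under the 3x markup from above

print(f"BYOK: ${direct_cost:.2f}/mo vs marked up: ${marked_up_cost:.2f}/mo")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A 2x markup implies a 50% saving from going direct and a 5x markup implies 80%, which is where the range above comes from.&lt;/p&gt;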

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>beginners</category>
      <category>ai</category>
    </item>
    <item>
      <title>Best LangChain Alternatives in 2026 (Honest Comparison)</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Tue, 14 Apr 2026 02:59:32 +0000</pubDate>
      <link>https://dev.to/swrly/best-langchain-alternatives-in-2026-honest-comparison-kp9</link>
      <guid>https://dev.to/swrly/best-langchain-alternatives-in-2026-honest-comparison-kp9</guid>
      <description>&lt;p&gt;LangChain was the right tool at the right time. When it launched in 2022, connecting LLMs to tools and memory was non-trivial — LangChain made it tractable. That mattered. A lot of teams built real things with it.&lt;/p&gt;

&lt;p&gt;But as agent workflows have grown in complexity, the cracks have become harder to ignore. Debugging a chain six abstractions deep is miserable. The framework's rapid release cadence means APIs you relied on last month are now deprecated. Adding memory to a multi-agent setup requires reading four pages of documentation and hoping the example hasn't rotted. Teams that built on LangChain often find themselves fighting the framework as much as the actual problem.&lt;/p&gt;

&lt;p&gt;None of this means LangChain is bad. It means it was built for a problem that has since evolved. If you are evaluating alternatives — whether because you are starting fresh, migrating, or just tired of &lt;code&gt;langchain_community&lt;/code&gt; being a mystery box — this post is for you.&lt;/p&gt;

&lt;p&gt;We will cover five paths: LangGraph, CrewAI, Zapier/n8n, Swrly, and rolling your own. We will be direct about the tradeoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Look for in a LangChain Alternative
&lt;/h2&gt;

&lt;p&gt;Before comparing options, agree on what matters for your use case. The wrong criteria lead to the wrong choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual vs code.&lt;/strong&gt; Some teams want to define workflows in Python. Others want a canvas where non-engineers can see and modify the flow. Neither is inherently better — it depends on who owns the workflow. If your workflows are owned by a product manager or solutions engineer, code-first will create a handoff problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability.&lt;/strong&gt; Agent runs fail in subtle ways. An LLM returns plausible-looking output that is structurally wrong. A tool call succeeds but returns empty data. A loop runs 47 times instead of 3. You need to see exactly what happened — which nodes ran, what each one received, and what it returned. "It worked in dev" is not a production posture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost model.&lt;/strong&gt; LangChain is open source and free. Some alternatives charge per run, per seat, or per agent. Others let you bring your own API keys (BYOK) so the model costs go directly to your provider account and you are not paying a markup. If you are running hundreds of agent executions per day, the cost model matters as much as the feature set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team collaboration.&lt;/strong&gt; Can two people work on the same workflow? Can a junior engineer make a change without breaking the production version? Version control, branching, and role-based access are easy to overlook until the moment you need them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Debugging ergonomics.&lt;/strong&gt; When something goes wrong, can you see exactly what the LLM said, what tool calls it made, and what it got back — without adding custom logging everywhere? This is where LangChain is most painful, and where the alternatives differ most sharply.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangGraph
&lt;/h2&gt;

&lt;p&gt;LangGraph is LangChain's own response to the criticism. Rather than chains, it uses explicit directed graphs — you define nodes and edges, including cycles, and state flows through them. This makes the execution model transparent in a way that vanilla LangChain is not.&lt;/p&gt;

&lt;p&gt;It is a meaningful improvement. Cycles are a first-class concept, so retry loops and iterative refinement are straightforward. State is explicit and typed, so you always know what data is available at each node. The graph structure maps more directly to how teams think about agent workflows.&lt;/p&gt;

&lt;p&gt;The limitations are inherited from LangChain. It is Python-only. Debugging still requires reading Python tracebacks and adding logging by hand. Deployment is your responsibility. If you want to run LangGraph in production, you are setting up LangServe or LangGraph Cloud, managing infrastructure, and writing your own observability.&lt;/p&gt;

&lt;p&gt;LangGraph is the right call if you are a Python team that wants explicit state management and is comfortable owning the operational side. It is not a good fit if you want a managed runtime, a visual interface, or non-engineer participation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# LangGraph: define state and a simple two-node graph
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;State&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;pr_diff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# call Claude, parse output, return updated state
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;notify_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# post to Slack based on verdict
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;review_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;notify_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notify&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean code. But you still own the runner, the queue, the error handling, and the observability stack.&lt;/p&gt;

&lt;p&gt;For a deeper comparison, see our &lt;a href="https://dev.to/compare/langchain"&gt;LangChain vs Swrly breakdown&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  CrewAI
&lt;/h2&gt;

&lt;p&gt;CrewAI takes a role-based approach: you define agents by role (researcher, writer, reviewer), assign them tools, and let a "crew" collaborate on a task. The mental model is intuitive — most teams already think about AI workflows in terms of roles.&lt;/p&gt;

&lt;p&gt;Setup is simpler than LangChain for common patterns. A research-and-write pipeline that would take 200 lines of LangChain code takes 40 lines of CrewAI YAML. The framework handles agent-to-agent communication and task sequencing.&lt;/p&gt;

&lt;p&gt;The rough edges appear at scale. Task routing is sequential by default — parallel execution requires explicit configuration and is less battle-tested. The YAML-first approach works well for standard patterns but becomes awkward when you need conditional branching, loops, or dynamic tool selection. Debugging is better than LangChain but still requires reading logs rather than inspecting a structured trace.&lt;/p&gt;

&lt;p&gt;CrewAI is strongest for linear multi-agent pipelines with well-defined roles and predictable data flow. It is weaker for complex branching workflows, real-time observability, or non-engineer ownership.&lt;/p&gt;

&lt;p&gt;See also: our full &lt;a href="https://dev.to/compare/crewai"&gt;CrewAI comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zapier and n8n
&lt;/h2&gt;

&lt;p&gt;Zapier and n8n are automation platforms that added AI capabilities. They were not built for agent orchestration, but they work for simple flows: trigger on event, call an LLM, write the result somewhere.&lt;/p&gt;

&lt;p&gt;Zapier's strength is integrations. If you need to connect a form submission to a GPT call to a Google Sheet, Zapier is probably the fastest path. No code, large integration library, well-documented. The ceiling is low though — it is not designed for multi-agent coordination, complex conditions, or long-running workflows.&lt;/p&gt;

&lt;p&gt;n8n is the self-hosted alternative with more flexibility. You get a visual node editor, branching logic, and the ability to write JavaScript for custom nodes. It handles more complexity than Zapier but still was not designed with LLM-native workflows in mind. AI nodes feel bolted on rather than native.&lt;/p&gt;

&lt;p&gt;Both tools are appropriate when the AI call is a single step in a broader automation — not when you are orchestrating multiple agents with interdependencies. If you need agents that use tools, accumulate context, and hand off state to each other, you will hit the ceiling quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Swrly
&lt;/h2&gt;

&lt;p&gt;We built Swrly because we kept running into the same problems across different teams: LangChain workflows that were hard to debug, Python scripts that only one engineer could modify, and agent pipelines that collapsed in production because nobody could see what was happening inside them.&lt;/p&gt;

&lt;p&gt;Swrly is a visual drag-and-drop agent orchestration platform. You build workflows on a canvas — no code required for standard patterns. Each node is an agent, integration, condition, loop, or trigger. Edges define the data flow. When a workflow runs, you watch it happen on the canvas in real time.&lt;/p&gt;

&lt;p&gt;The key design choices:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BYOK (Bring Your Own Keys).&lt;/strong&gt; Your LLM costs go directly to your provider account. We do not take a margin on inference. You use your Claude Code subscription, and agent runs are charged to it. This matters at scale — teams running 500+ daily agent executions save significantly compared to platforms that mark up API costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;346+ MCP tools across 51 integrations.&lt;/strong&gt; GitHub, Slack, Linear, Jira, Notion, PostgreSQL, MySQL, Redis, Stripe, Twilio, Telegram, Bluesky, and more. Agents can call any of these tools without you writing integration code. Connect your accounts in Settings, and the tools are available in the agent builder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production observability.&lt;/strong&gt; Every run is logged with full input/output per node, timestamps, token usage, and tool call traces. When something breaks, you click into the run history and see exactly what happened — no log scraping required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;26 workflow templates.&lt;/strong&gt; Common patterns — PR review, bug triage, content pipeline, customer support routing — are available as one-click starting points in the &lt;a href="https://dev.to/templates/marketplace"&gt;template marketplace&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual without being limited.&lt;/strong&gt; Condition branching, parallel execution, loop nodes (with configurable exit conditions and iteration caps), approval gates, and cross-swirl triggers are all available on the canvas. For teams that need code, every workflow is exportable.&lt;/p&gt;

&lt;p&gt;Here is the same PR reviewer workflow from above, built visually in Swrly versus the LangGraph equivalent:&lt;/p&gt;

&lt;p&gt;| Step | LangGraph | Swrly |&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
    </item>
    <item>
      <title>Model Context Protocol (MCP) Explained: Why It Matters in 2026</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Tue, 14 Apr 2026 02:59:29 +0000</pubDate>
      <link>https://dev.to/swrly/model-context-protocol-mcp-explained-why-it-matters-in-2026-1c7i</link>
      <guid>https://dev.to/swrly/model-context-protocol-mcp-explained-why-it-matters-in-2026-1c7i</guid>
      <description>&lt;p&gt;In 2024, every AI framework shipped its own way to call tools. LangChain had one approach. AutoGen had another. The Claude API had a third. If you wanted to write a GitHub tool that worked across all three, you wrote it three times — and maintained three versions.&lt;/p&gt;

&lt;p&gt;That problem is solved now. &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; is an open standard from Anthropic that defines a single way for AI models to discover and call external tools. By 2026 it has been adopted across Claude Code, Cursor, and GitHub Copilot, and is supported in OpenAI's tooling as well. It is the closest thing the AI tooling ecosystem has to a standard.&lt;/p&gt;

&lt;p&gt;This post explains what MCP actually is, how it works under the hood, when you should use it, and when it is overkill. If you are building anything involving AI agents and external tools, this is worth understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem MCP Solves
&lt;/h2&gt;

&lt;p&gt;Before MCP, every team building AI-powered tooling faced the same problem: there was no standard interface for exposing tools to language models.&lt;/p&gt;

&lt;p&gt;Say you wanted an agent that could create a GitHub pull request. You would write a function that calls the GitHub API, wrap it in whatever tool-calling schema the model expected, and test it. Then someone asks for the same capability in a different framework. You write it again. Different schema, different transport, different error handling conventions — same underlying API call.&lt;/p&gt;

&lt;p&gt;Multiply this across the entire ecosystem. GitHub tools, Slack tools, database tools, monitoring tools — each one reimplemented for each framework. The result was a fragmented landscape where the same work was being done hundreds of times, with no shared discovery mechanism, no standard error format, and no way to compose tools across tool providers.&lt;/p&gt;

&lt;p&gt;MCP addresses all three of these. It defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;How servers expose tools&lt;/strong&gt; — a standard JSON schema format for describing tool inputs and outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How clients discover tools&lt;/strong&gt; — a standard handshake where clients ask servers to list their available tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How calls are made&lt;/strong&gt; — a standard request/response format using JSON-RPC 2.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build a GitHub MCP server once. Any MCP client — Claude Code, Cursor, Swrly, your custom agent — can connect to it and use its tools without modification.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MCP Actually Is
&lt;/h2&gt;

&lt;p&gt;MCP is a &lt;strong&gt;client-server protocol&lt;/strong&gt; built on JSON-RPC 2.0. The structure is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP servers&lt;/strong&gt; expose tools, resources, and prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP clients&lt;/strong&gt; (your agent or IDE) connect to one or more servers, discover what is available, and call tools during model execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transports&lt;/strong&gt; define the communication channel between client and server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best analogy is the &lt;strong&gt;Language Server Protocol (LSP)&lt;/strong&gt;, which standardized how code editors communicate with language analysis tools. Before LSP, every editor (VS Code, Vim, Emacs) had to implement its own TypeScript integration, its own Python type checker integration, and so on. LSP moved that complexity to the server side. MCP does the same thing for AI tool integrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Two Transport Modes
&lt;/h3&gt;

&lt;p&gt;MCP supports two HTTP transport modes (alongside the local stdio transport covered later), and understanding both is important because they are not interchangeable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SSE (Server-Sent Events)&lt;/strong&gt; was the original transport. The client opens an HTTP connection, the server streams events back. This transport is required for SDK v1.x clients like Claude Code SDK v1 — those clients use the &lt;code&gt;query()&lt;/code&gt; API and cannot establish bidirectional HTTP streams. If you are using &lt;code&gt;@anthropic-ai/claude-code&lt;/code&gt; at version 1.x, you need an SSE endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTTP Streamable&lt;/strong&gt; is the newer transport introduced for SDK v2+. It uses standard HTTP POST requests with streaming responses. It supports authorization headers, which SSE traditionally cannot (the browser &lt;code&gt;EventSource&lt;/code&gt; API does not allow setting request headers). This transport is better for production server-to-server communication.&lt;/p&gt;

&lt;p&gt;In practice, production MCP servers often expose both endpoints — &lt;code&gt;/sse&lt;/code&gt; for backward compatibility with v1 clients and &lt;code&gt;/mcp&lt;/code&gt; for v2+ clients. SSE endpoints typically cannot enforce Authorization header checks because legacy clients cannot send them, so you need to handle authentication at the tool level instead.&lt;/p&gt;
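
&lt;p&gt;If you serve MCP from Python, the official SDK's &lt;code&gt;FastMCP&lt;/code&gt; lets you pick the transport at startup. A minimal sketch (transport names match recent SDK versions; verify against yours):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from mcp.server.fastmcp import FastMCP

mcp = FastMCP("github-tools")

# choose the transport your clients can actually speak
mcp.run(transport="sse")                # legacy SSE endpoint for v1 clients
# mcp.run(transport="streamable-http")  # HTTP Streamable for v2+ clients
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;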

&lt;h3&gt;
  
  
  How Tool Discovery Works
&lt;/h3&gt;

&lt;p&gt;When a client connects to an MCP server, it makes a &lt;code&gt;tools/list&lt;/code&gt; request. The server responds with an array of tool definitions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"github_create_pr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Creates a pull request on GitHub"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"inputSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Repository owner"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"repo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Repository name"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PR title"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PR description"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"head"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Branch to merge from"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"base"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Branch to merge into"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"repo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"head"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"base"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model receives this schema and knows how to call the tool. When it decides to create a PR, it emits a &lt;code&gt;tools/call&lt;/code&gt; request with a JSON object matching the &lt;code&gt;inputSchema&lt;/code&gt;. The server executes the tool and returns the result. The model incorporates the result into its reasoning.&lt;/p&gt;

&lt;p&gt;This is the core loop: discover, call, incorporate.&lt;/p&gt;
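
&lt;p&gt;The &lt;code&gt;tools/call&lt;/code&gt; half of that loop is another JSON-RPC 2.0 message, sketched here as the Python dict a client would serialize (argument values are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "github_create_pr",
        "arguments": {  # must match the tool's inputSchema
            "owner": "acme",
            "repo": "api",
            "title": "Fix rate limiting",
            "head": "fix/rate-limit",
            "base": "main",
        },
    },
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;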

&lt;h2&gt;
  
  
  A Minimal MCP Server
&lt;/h2&gt;

&lt;p&gt;Here is what a minimal MCP server looks like using the official TypeScript SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;McpServer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/mcp.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StdioServerTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/server/stdio.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;McpServer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github-tools&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github_get_pr&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Fetch details of a pull request&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Repository owner&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Repository name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;pr_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Pull request number&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pr_number&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s2"&gt;`https://api.github.com/repos/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/pulls/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;pr_number&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;Accept&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.github+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`GitHub API error: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;mergeable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mergeable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioServerTransport&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a complete, working MCP server. It exposes one tool. Any MCP client can connect to it via stdio and call &lt;code&gt;github_get_pr&lt;/code&gt;. The Zod schema is automatically converted to the JSON Schema format that &lt;code&gt;tools/list&lt;/code&gt; returns.&lt;/p&gt;

&lt;p&gt;The HTTP transport version adds a few lines to set up an Express (or Hono) server and swap &lt;code&gt;StdioServerTransport&lt;/code&gt; for &lt;code&gt;SSEServerTransport&lt;/code&gt; or &lt;code&gt;StreamableHTTPServerTransport&lt;/code&gt;, but the tool definition code is identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  When MCP Is Worth It
&lt;/h2&gt;

&lt;p&gt;MCP adds real value in specific situations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You are building a platform or a reusable agent system.&lt;/strong&gt; If multiple agents — or multiple teams — need access to the same tool, an MCP server is the right abstraction. You define the tool once, expose it over the network, and every agent connects to it. Updates to the tool propagate to all consumers without any redeployment of agent code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You have a shared tool library.&lt;/strong&gt; Teams that are building dozens of agent workflows benefit from an MCP server that centralizes all their integrations — GitHub, Slack, Jira, database queries. Individual agents connect to the shared server and use whatever subset of tools they need. This mirrors how a shared library works in traditional software but with the discovery mechanism built in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools run in separate processes for isolation or security.&lt;/strong&gt; MCP servers can run as separate processes, separate containers, or even separate machines. If your database tools need different credentials than your Slack tools, you can run them as separate MCP servers. The client connects to both and routes tool calls to the right server. This is significantly cleaner than embedding all your tool logic in a single agent process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You want local-first agent setups.&lt;/strong&gt; Running tools locally — against a local database, a local file system, a local development environment — is straightforward with stdio transport. The MCP server runs as a subprocess, the agent process spawns it, and they communicate over stdin/stdout. No network, no auth, no infrastructure.&lt;/p&gt;
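
&lt;p&gt;A hedged sketch of what that looks like from the client side with the TypeScript SDK (the server path and the tool's argument names are assumptions based on the example above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawns the MCP server as a subprocess; all traffic flows over stdin/stdout.
const transport = new StdioClientTransport({
  command: "node",
  args: ["./github-server.js"], // hypothetical path to the server above
});

const client = new Client({ name: "local-agent", version: "1.0.0" });
await client.connect(transport);

const { tools } = await client.listTools(); // the tools/list discovery call
const result = await client.callTool({
  name: "github_get_pr",
  arguments: { owner: "octocat", repo: "hello-world", number: 1 }, // args assumed
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;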

&lt;h2&gt;
  
  
  When MCP Is Overkill
&lt;/h2&gt;

&lt;p&gt;MCP adds complexity. For some use cases, that complexity does not pay off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One-off agents with one or two tools.&lt;/strong&gt; If you are building a script that calls the Stripe API and sends a Slack message, you do not need MCP. Write two functions, call them from your agent, move on. The overhead of running an MCP server, handling tool discovery, and managing the connection is not justified for two endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rapid prototyping where schemas change frequently.&lt;/strong&gt; MCP tool schemas are defined at server startup. Every time you change a tool's input or output, you need to restart the server and reconnect any clients. When you are iterating quickly on what a tool should do, this friction adds up. A direct function call is easier to change. Add MCP once the interface has stabilized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Situations where the model needs to compose tools dynamically.&lt;/strong&gt; MCP is designed around a static list of tools discovered at connection time. If you need to generate tools programmatically at runtime — for example, generating a unique tool per database table — MCP can handle it, but it requires careful design to avoid listing hundreds of tools in every &lt;code&gt;tools/list&lt;/code&gt; response, which wastes context tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Gotchas in 2026
&lt;/h2&gt;

&lt;p&gt;MCP is straightforward once you understand these rough edges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transport mismatch causes silent failures.&lt;/strong&gt; The most common issue when connecting an MCP client to a server is transport incompatibility. If your client uses SDK v1 and expects SSE, and your server only exposes a Streamable HTTP endpoint, the connection will fail — often with an unhelpful error. Always confirm which transport version your client SDK requires and check that the server exposes the correct endpoint. When in doubt, expose both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool discovery cost scales with tool count.&lt;/strong&gt; Every MCP client requests the full &lt;code&gt;tools/list&lt;/code&gt; on connection. If your server exposes 300 tools, that list goes into the model's context on every connection, and a 300-tool schema can consume 5,000-10,000 tokens per session before the model does any work — which translates directly into latency and spend at GPT-4 or Claude Sonnet rates. Two mitigations: apply tool filtering so each agent only sees the tools it is allowed to use, or split tools across multiple smaller MCP servers and only connect agents to the servers they need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP server instances must be per-connection for concurrent agents.&lt;/strong&gt; A common architecture mistake is using a singleton MCP server instance shared across all connections. This works fine for one agent at a time but causes "already connected" and state corruption errors when multiple agents run concurrently. The correct pattern is a factory function — each incoming SSE connection instantiates its own &lt;code&gt;McpServer&lt;/code&gt; object, isolated from all others. The underlying tool implementations can share infrastructure (database connections, HTTP clients), but the MCP server object itself must be per-connection.&lt;/p&gt;
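
&lt;p&gt;A minimal sketch of the factory pattern with the SSE transport (the Express routes and the &lt;code&gt;registerTools&lt;/code&gt; helper are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";

// Factory: every connection gets its own McpServer instance. Shared
// infrastructure (DB pools, HTTP clients) lives outside and is passed in.
function buildServer(deps) {
  const server = new McpServer({ name: "tools", version: "1.0.0" });
  registerTools(server, deps); // hypothetical helper that defines the tools
  return server;
}

const transports = {};

app.get("/sse", async (_req, res) =&gt; {
  const transport = new SSEServerTransport("/messages", res);
  transports[transport.sessionId] = transport;
  res.on("close", () =&gt; delete transports[transport.sessionId]);
  await buildServer(sharedDeps).connect(transport); // fresh instance per connection
});

app.post("/messages", async (req, res) =&gt; {
  const transport = transports[String(req.query.sessionId)];
  if (transport) await transport.handlePostMessage(req, res, req.body);
  else res.status(400).send("unknown session");
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;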

&lt;p&gt;&lt;strong&gt;Authorization on SSE endpoints is limited.&lt;/strong&gt; Browser-based SSE clients rely on &lt;code&gt;EventSource&lt;/code&gt;, which cannot set custom headers. This means an &lt;code&gt;Authorization: Bearer &amp;lt;token&amp;gt;&lt;/code&gt; header never reaches the server from those clients. Production SSE servers typically authenticate at the tool level (passing credentials as tool arguments or reading them from environment variables) rather than at the transport level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool name collisions across servers.&lt;/strong&gt; When a client connects to multiple MCP servers, tool names from all servers appear in the same namespace. If two servers both define a tool called &lt;code&gt;create_issue&lt;/code&gt; (one for Jira, one for Linear), you have a collision. Adopt a consistent naming convention upfront: &lt;code&gt;jira_create_issue&lt;/code&gt;, &lt;code&gt;linear_create_issue&lt;/code&gt;. This is especially important when building platforms where third-party MCP servers may be added over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The MCP Ecosystem Today
&lt;/h2&gt;

&lt;p&gt;The MCP ecosystem has grown faster than most expected. By early 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Over 10,000 public MCP servers are listed in community registries&lt;/li&gt;
&lt;li&gt;Major IDE platforms — Cursor, VS Code with Copilot, JetBrains AI — all support MCP natively&lt;/li&gt;
&lt;li&gt;Claude Code integrates MCP servers via the &lt;code&gt;.claude/settings.json&lt;/code&gt; configuration file&lt;/li&gt;
&lt;li&gt;OpenAI's tool calling format has aligned closely enough with MCP that adapters are trivial&lt;/li&gt;
&lt;li&gt;AWS, Cloudflare, and Vercel all offer managed MCP server hosting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The adoption pattern followed the same arc as LSP: slow uptake in the first year while the tooling ecosystem matured, then rapid adoption once the major IDEs and frameworks committed to the standard. The key difference from previous AI tool standards is that MCP is genuinely simple. The spec fits in a single document. A working server is under 50 lines of TypeScript. That simplicity is not an accident — it is why it won.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Swrly Uses MCP
&lt;/h2&gt;

&lt;p&gt;Swrly is built on MCP end-to-end. Every integration — GitHub, Slack, Linear, Jira, PagerDuty, Stripe, and 45 others — is exposed as a set of MCP tools through a single server at port 3002. When an agent workflow runs, the runner service connects to the MCP server and passes the available tools to the Claude Code SDK. The model sees the tools, decides which ones to call, and the MCP server executes them.&lt;/p&gt;

&lt;p&gt;The platform currently exposes &lt;strong&gt;346 tools across 51 connectors&lt;/strong&gt;. Presenting all 346 tools to every agent would be wasteful and would dilute the model's attention. Instead, each agent node in a workflow has a &lt;code&gt;selectedTools&lt;/code&gt; configuration — an explicit list of the MCP tool names that agent is allowed to use. The runner filters the &lt;code&gt;tools/list&lt;/code&gt; response to only include those tools before passing it to the model.&lt;/p&gt;
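
&lt;p&gt;The filtering step itself is small. Conceptually (with the surrounding runner code and variable names assumed), it is little more than:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Restrict the discovered tool list to the agent's allowlist
// before handing it to the model.
const { tools } = await mcpClient.listTools();
const allowed = new Set(agentNode.selectedTools);
const visibleTools = tools.filter((tool) =&gt; allowed.has(tool.name));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;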

&lt;p&gt;This per-agent tool restriction serves two purposes. First, it keeps context lean — an agent that only needs &lt;code&gt;github_list_pr_files&lt;/code&gt; and &lt;code&gt;github_get_content&lt;/code&gt; does not need to see the 50 Stripe tools. Second, it enforces least-privilege access — a code review agent cannot call &lt;code&gt;stripe_create_refund&lt;/code&gt; even if the model tries.&lt;/p&gt;

&lt;p&gt;The MCP server runs as a separate Docker container from the main web app and the runner service. Tools that need to make outbound HTTP requests include SSRF protection — all URLs are checked against a blocklist before the request is made. Credentials (API keys, OAuth tokens) are stored encrypted in the database, decrypted at the MCP server level, and never passed through the agent's context.&lt;/p&gt;
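
&lt;p&gt;To illustrate the shape of that SSRF guard, here is a naive version (a real implementation would also resolve DNS and re-check the resulting IP, since an attacker-controlled hostname can point at a private address):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const BLOCKED_HOSTS = new Set(["localhost", "127.0.0.1", "169.254.169.254"]);
const PRIVATE_RANGE = /^(10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)/;

function assertSafeUrl(raw) {
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== "http:" &amp;&amp; url.protocol !== "https:") {
    throw new Error(`blocked protocol: ${url.protocol}`);
  }
  if (BLOCKED_HOSTS.has(url.hostname) || PRIVATE_RANGE.test(url.hostname)) {
    throw new Error(`blocked host: ${url.hostname}`);
  }
  return url;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;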

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MCP won because it solved a real problem simply. The AI tool integration ecosystem was fragmented, duplicative, and incompatible across frameworks. MCP gave it a common language — not a complicated enterprise spec, but a thin JSON-RPC protocol with a clear handshake and a standard schema format.&lt;/p&gt;

&lt;p&gt;If you are building a one-off agent script, you can ignore MCP entirely. Write functions, call them directly, ship it. But if you are building agent infrastructure — a platform, a shared tool library, a multi-team automation system — MCP is the right foundation. The ecosystem is there, the tooling is mature, and the alternative (custom integration code per framework) is worse.&lt;/p&gt;

&lt;p&gt;Understanding the transport modes, the tool discovery cost at scale, and the per-connection instance requirement will save you real debugging time. The gotchas are not subtle once you know them.&lt;/p&gt;

&lt;p&gt;For a working reference implementation, the &lt;a href="https://github.com/swrlyai" rel="noopener noreferrer"&gt;Swrly MCP server&lt;/a&gt; demonstrates the SSE + Streamable HTTP dual-transport pattern, per-connection instance isolation, and per-agent tool filtering at production scale.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>What BYOK Really Means for AI Platform Costs</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Sun, 12 Apr 2026 22:07:59 +0000</pubDate>
      <link>https://dev.to/swrly/what-byok-really-means-for-ai-platform-costs-34b4</link>
      <guid>https://dev.to/swrly/what-byok-really-means-for-ai-platform-costs-34b4</guid>
      <description>&lt;p&gt;If you have evaluated AI agent platforms recently, you have probably noticed the pricing pages are designed to be confusing. There is a platform fee, plus credits, plus per-token charges, plus overages. You sign up for $19/month and end up paying $300 because your agents were chatty.&lt;/p&gt;

&lt;p&gt;This is the problem BYOK solves, and it is the core of how Swrly handles pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Cost Problem
&lt;/h2&gt;

&lt;p&gt;Most AI platforms bundle two separate things into one bill: the &lt;strong&gt;orchestration layer&lt;/strong&gt; (workflow management, integrations, UI) and the &lt;strong&gt;AI compute&lt;/strong&gt; (LLM inference, token usage). Bundling them together lets platforms charge unpredictable per-token markups that scale with your usage.&lt;/p&gt;

&lt;p&gt;Here is what the landscape looks like today:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relevance AI&lt;/strong&gt; charges $19/month for the Pro plan, but that includes a limited number of credits. Each agent run consumes credits based on token usage. Complex workflows with multiple agent steps can burn through credits in days. Overages are billed per-credit, and the cost per credit varies by model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CrewAI&lt;/strong&gt; starts at $200/month for the Enterprise plan. It includes "unlimited" agents, but inference costs are billed separately based on the LLM provider and token volume. The total monthly cost depends entirely on how many tokens your agents consume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangSmith/LangChain&lt;/strong&gt; charges per trace for observability, and you still pay your LLM provider separately. The traces add up — a busy team running thousands of workflows can see observability costs alone exceed $100/month.&lt;/p&gt;

&lt;p&gt;In every case, the total cost is &lt;strong&gt;unpredictable until you get the bill&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Swrly's BYOK Model
&lt;/h2&gt;

&lt;p&gt;Swrly separates the two costs completely:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI compute:&lt;/strong&gt; You pay Anthropic directly for your Claude Code subscription. $20/month for Pro. That is between you and Anthropic — Swrly never touches it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration:&lt;/strong&gt; You pay Swrly for the platform. Free for individuals, $49/month for Pro, $99/month for Teams. Fixed price, no per-token charges, no credit system.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The mechanism is straightforward. In Swrly's Settings page, you paste your Claude Code session token. Swrly encrypts it using &lt;strong&gt;AES-256-GCM&lt;/strong&gt; (the same encryption standard used by banks and government systems) and stores the encrypted token in the database. When a workflow runs, the token is decrypted ephemerally in memory, used for that execution, and discarded. It is never written to logs, never stored in plaintext, and never accessible to Swrly staff.&lt;/p&gt;
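
&lt;p&gt;For readers who want to see the shape of that scheme, here is a generic AES-256-GCM round trip using Node's built-in &lt;code&gt;crypto&lt;/code&gt; module (a sketch of the technique, not Swrly's actual implementation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// key: a 32-byte Buffer held in a KMS or environment secret.
function encryptToken(token, key) {
  const iv = randomBytes(12); // GCM standard 96-bit nonce
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(token, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte integrity tag
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptToken(blob, key) {
  const buf = Buffer.from(blob, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, buf.subarray(0, 12));
  decipher.setAuthTag(buf.subarray(12, 28));
  return Buffer.concat([
    decipher.update(buf.subarray(28)),
    decipher.final(), // throws if the ciphertext or tag was tampered with
  ]).toString("utf8");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;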

&lt;p&gt;Your key, your costs, your control.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Comparison
&lt;/h2&gt;

&lt;p&gt;Here is what a team of 5 engineers running approximately 1,000 agent workflow executions per month can expect to pay:&lt;/p&gt;

&lt;p&gt;| | &lt;strong&gt;Swrly + Claude Code&lt;/strong&gt; | &lt;strong&gt;Relevance AI&lt;/strong&gt; | &lt;strong&gt;CrewAI&lt;/strong&gt; |&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building AI Workflows for DevOps Teams</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Sun, 12 Apr 2026 22:07:56 +0000</pubDate>
      <link>https://dev.to/swrly/building-ai-workflows-for-devops-teams-51p2</link>
      <guid>https://dev.to/swrly/building-ai-workflows-for-devops-teams-51p2</guid>
      <description>&lt;p&gt;DevOps teams are some of the best candidates for multi-agent automation. The work is repetitive, high-stakes, and built on integrations between tools that already have APIs. When your PagerDuty fires at 2 AM, the response is always the same sequence: check the alert, pull the recent deploys, look at the error logs, assess severity, notify the right people. Every step is manual, every step costs time, and every step is something an agent can do.&lt;/p&gt;

&lt;p&gt;This guide walks through four production-ready DevOps workflows you can build with Swrly. We will go deep on the first one — Incident Response — and then sketch the other three so you can adapt them to your stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case 1: Incident Response
&lt;/h2&gt;

&lt;p&gt;This is the workflow teams ask about most. The premise is straightforward: when an incident fires, an AI agent triages it before a human even opens their laptop.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Workflow Does
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;A PagerDuty webhook fires when a new incident is created&lt;/li&gt;
&lt;li&gt;The workflow pulls incident details from PagerDuty and recent errors from Sentry in parallel&lt;/li&gt;
&lt;li&gt;An AI analyst agent correlates the signals: PagerDuty alert details, Sentry stack traces, and recent GitHub commits&lt;/li&gt;
&lt;li&gt;A second AI agent drafts a response runbook with mitigation steps&lt;/li&gt;
&lt;li&gt;A condition node checks severity&lt;/li&gt;
&lt;li&gt;Critical incidents go to #incidents-critical on Slack; everything else goes to #ops-log&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By the time the on-call engineer opens Slack, there is already a root cause hypothesis, a list of affected components, and step-by-step mitigation instructions waiting for them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Walking Through the Build
&lt;/h3&gt;

&lt;p&gt;Start by creating a new swirl and dragging a &lt;strong&gt;Trigger&lt;/strong&gt; node onto the canvas. Set the trigger type to &lt;strong&gt;Webhook&lt;/strong&gt;. Copy the generated URL and configure it as a PagerDuty webhook — under your PagerDuty service, add a webhook subscription for the &lt;code&gt;incident.triggered&lt;/code&gt; event type.&lt;/p&gt;

&lt;p&gt;Next, add a &lt;strong&gt;PagerDuty integration node&lt;/strong&gt; connected to the trigger. Configure it with the &lt;code&gt;pagerduty_get_incident&lt;/code&gt; action and pass &lt;code&gt;{{trigger.payload.incident_id}}&lt;/code&gt; as the incident ID parameter. This fetches the full incident details including title, description, service, urgency, and assigned escalation policy.&lt;/p&gt;

&lt;p&gt;Now add two integration nodes in parallel — this is where Swrly's directed graph model shines. Connect the PagerDuty node to both a &lt;strong&gt;Sentry&lt;/strong&gt; node (using &lt;code&gt;sentry_list_issues&lt;/code&gt; with the query &lt;code&gt;is:unresolved&lt;/code&gt;) and a &lt;strong&gt;GitHub&lt;/strong&gt; node (using &lt;code&gt;github_list_commits&lt;/code&gt; to pull recent commits from your main repository). These run simultaneously, cutting the data-gathering time in half.&lt;/p&gt;

&lt;p&gt;Connect both integration nodes to an &lt;strong&gt;Agent&lt;/strong&gt; node called "Incident Analyst." This is the core of the workflow. Here is the system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a senior SRE performing incident triage.

PagerDuty incident details:
{{get_incident.output}}

Recent Sentry errors:
{{list_issues.output}}

Recent GitHub commits:
{{list_commits.output}}

Correlate these signals to identify the most likely root cause.
Output a structured analysis with:
1. Root cause hypothesis (with confidence level)
2. Affected components
3. Timeline of events
4. Severity estimate (1 = critical, 2 = major, 3 = minor)

Output the severity as a number in a field called `severity`.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent receives data from three sources — PagerDuty, Sentry, and GitHub — and produces a structured analysis. Set &lt;code&gt;maxTurns&lt;/code&gt; to 15 and enable &lt;code&gt;accumulateContext&lt;/code&gt; so the agent retains all upstream data.&lt;/p&gt;

&lt;p&gt;Add a second agent, "Incident Responder," connected to the analyst. Its prompt takes the analyst's output and produces a step-by-step runbook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Based on the analysis, produce:
1. A step-by-step runbook to resolve the issue
2. Immediate mitigation actions (first 15 minutes)
3. Communication template for stakeholders
4. Estimated time to recovery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, add a &lt;strong&gt;Condition&lt;/strong&gt; node after the responder. Set the field to &lt;code&gt;output&lt;/code&gt;, the operator to &lt;code&gt;contains&lt;/code&gt;, and the value to &lt;code&gt;severity: 1&lt;/code&gt;. Connect the true branch to a Slack node posting to &lt;code&gt;#incidents-critical&lt;/code&gt; and the false branch to a Slack node posting to &lt;code&gt;#ops-log&lt;/code&gt;.&lt;/p&gt;
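
&lt;p&gt;Expressed as a (hypothetical) config object rather than UI settings, that condition boils down to a single rule:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "type": "condition",
  "field": "output",
  "operator": "contains",
  "value": "severity: 1",
  "onTrue": "slack-incidents-critical",
  "onFalse": "slack-ops-log"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;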

&lt;h3&gt;
  
  
  Why This Works
&lt;/h3&gt;

&lt;p&gt;The key insight is that incident triage is pattern matching. Given an alert, some errors, and some recent changes, an experienced SRE correlates the signals and forms a hypothesis. An AI agent does the same thing, but in 30 seconds instead of 10 minutes. The human still makes the final call — the agent just gives them a running start.&lt;/p&gt;

&lt;p&gt;The condition node adds intelligent routing. Critical incidents get immediate visibility in the high-urgency channel. Lower-severity issues are logged but do not wake anyone up unnecessarily.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration Highlights
&lt;/h3&gt;

&lt;p&gt;This workflow uses four integrations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PagerDuty&lt;/strong&gt; (&lt;code&gt;pagerduty_get_incident&lt;/code&gt;) — Fetches full incident details including metadata, assignments, and escalation policy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentry&lt;/strong&gt; (&lt;code&gt;sentry_list_issues&lt;/code&gt;) — Pulls recent unresolved errors with stack traces and frequency data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt; (&lt;code&gt;github_list_commits&lt;/code&gt;) — Lists recent commits with messages, authors, and timestamps for change correlation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack&lt;/strong&gt; (&lt;code&gt;slack_send_message&lt;/code&gt;) — Delivers the triage report to the right channel based on severity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All four are available in Swrly's integration library. Connect them once in Settings, and every workflow in your workspace can use them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case 2: Deployment Monitoring
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; Cron schedule, every 30 minutes&lt;/p&gt;

&lt;p&gt;After every deployment, you want to know if things are healthy. This workflow runs on a timer and checks for signs of trouble.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cron trigger fires every 30 minutes&lt;/li&gt;
&lt;li&gt;Sentry integration node fetches errors from the last 30 minutes (&lt;code&gt;firstSeen:-30m&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Agent node analyzes error volume, new error types, and spike patterns&lt;/li&gt;
&lt;li&gt;Condition node checks if the agent flagged any anomalies&lt;/li&gt;
&lt;li&gt;If yes: Slack message to #deploys with the analysis and a recommendation to investigate or rollback&lt;/li&gt;
&lt;li&gt;If no: silent — no noise when things are healthy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why it is useful:&lt;/strong&gt; Most deployment monitoring is threshold-based. "Alert if error rate exceeds 5%." But threshold alerts miss slow-burn regressions and novel error types. An AI agent can identify patterns that static rules miss: "Three new TypeError exceptions appeared in the auth module 20 minutes after the last deploy. These were not present in the previous 24 hours."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration highlights:&lt;/strong&gt; Sentry for error data, Slack for notifications. Add a Datadog node if you want to correlate with infrastructure metrics. Add a GitHub node with &lt;code&gt;github_list_deployments&lt;/code&gt; to tie errors to specific deploys.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Case 3: Security Audit
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; Daily cron, 6 AM&lt;/p&gt;

&lt;p&gt;Security reviews should not be a quarterly event. This workflow runs every morning and checks for security-relevant changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Daily cron trigger at 6 AM&lt;/li&gt;
&lt;li&gt;Two parallel integration nodes: GitHub (&lt;code&gt;github_list_commits&lt;/code&gt; for the last 24 hours) and Sentry (&lt;code&gt;sentry_list_issues&lt;/code&gt; for new unresolved errors)&lt;/li&gt;
&lt;li&gt;Security Analyst agent reviews commits for security-sensitive changes: auth logic modifications, dependency updates, config file changes, new API endpoints, cryptographic code&lt;/li&gt;
&lt;li&gt;Security Reporter agent formats findings into a structured report with severity ratings&lt;/li&gt;
&lt;li&gt;Condition node checks if any critical findings exist&lt;/li&gt;
&lt;li&gt;Critical findings: Slack to #security and PagerDuty incident creation&lt;/li&gt;
&lt;li&gt;Non-critical findings: Slack to #security-log for the daily digest&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why it is useful:&lt;/strong&gt; The security audit template is one of the most popular in Swrly's template library. It catches the kinds of changes that slip through code review: a dependency bump that introduces a known CVE, an environment variable accidentally logged in a new error handler, an auth middleware accidentally removed from a route. The agent checks every commit, every day, without getting tired or distracted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tips for production:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set the GitHub integration to scan all repositories in your organization, not just one&lt;/li&gt;
&lt;li&gt;Add Snyk or Dependabot integration nodes for automated vulnerability database checks&lt;/li&gt;
&lt;li&gt;Adjust the analyst's max turns based on your daily commit volume — 20 turns handles most teams&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use Case 4: Standup Reports
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; Daily cron, 8 AM (or Monday/Wednesday/Friday for async standups)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cron trigger fires before standup&lt;/li&gt;
&lt;li&gt;Three parallel integration nodes: GitHub (commits and PRs from the last 24 hours), Linear or Jira (recently updated issues), and Slack (messages from #engineering, optional)&lt;/li&gt;
&lt;li&gt;Report Writer agent synthesizes everything into a standup-format summary: what shipped, what is in progress, what is blocked&lt;/li&gt;
&lt;li&gt;Slack message to #standup with the report&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why it is useful:&lt;/strong&gt; Standup reports are a tax on engineering time. Every developer spends 5-10 minutes remembering what they did yesterday and writing it up. Multiply by team size and frequency, and it adds up. An AI agent that reads the actual work artifacts — commits, PRs, issue updates — produces a more accurate summary than memory-based self-reporting, and it takes zero developer time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add a condition node to flag blockers and route them to #engineering-leads&lt;/li&gt;
&lt;li&gt;Run it weekly instead of daily for sprint summaries&lt;/li&gt;
&lt;li&gt;Add a Notion integration to auto-update a team wiki page&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Triggers Work
&lt;/h2&gt;

&lt;p&gt;All four workflows above use one of two trigger types.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Webhook triggers&lt;/strong&gt; fire when an external service sends an HTTP POST to the workflow's unique URL. The payload is available to all downstream nodes via &lt;code&gt;{{trigger.payload.*}}&lt;/code&gt; template variables. PagerDuty, GitHub, Sentry, Stripe, and most modern SaaS tools support webhook configuration. The webhook URL is generated automatically when you add a trigger node — no server configuration required.&lt;/p&gt;
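
&lt;p&gt;For example, given a simplified, hypothetical PagerDuty-style payload, the template resolution looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST body received at the trigger URL:
  { "incident_id": "PT4KHLK", "event": "incident.triggered" }

Downstream node parameter:
  {{trigger.payload.incident_id}}  resolves to  "PT4KHLK"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;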

&lt;p&gt;&lt;strong&gt;Cron triggers&lt;/strong&gt; fire on a schedule using standard cron syntax. &lt;code&gt;0 6 * * *&lt;/code&gt; runs daily at 6 AM UTC. &lt;code&gt;*/30 * * * *&lt;/code&gt; runs every 30 minutes. &lt;code&gt;0 8 * * 1-5&lt;/code&gt; runs at 8 AM on weekdays only. Cron triggers do not carry a payload, so downstream nodes rely on integration calls to fetch current data rather than event-driven input.&lt;/p&gt;

&lt;p&gt;Choose webhooks for event-driven workflows where something needs to happen immediately after an external event. Choose cron for polling-based workflows that check on things periodically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The fastest way to start is to clone a template. Go to the template gallery in Swrly and search for "Incident Response," "Security Audit," or "Standup Report." Click "Use Template" to clone it into your workspace. Then customize the integration connections, adjust the agent prompts for your stack, and run a test.&lt;/p&gt;

&lt;p&gt;All templates are free and available on every plan. You can modify them however you want — add nodes, change prompts, swap integrations, add condition branches. They are starting points, not locked configurations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Clone the Incident Response Template
&lt;/h2&gt;

&lt;p&gt;Sign up at &lt;a href="https://swrly.com" rel="noopener noreferrer"&gt;swrly.com&lt;/a&gt;, navigate to Templates, and search for "Incident Response." One click clones it into your workspace. Connect your PagerDuty, Sentry, GitHub, and Slack integrations, and you have a production-ready incident triage pipeline in under 5 minutes.&lt;/p&gt;

&lt;p&gt;If your team is already drowning in alerts and spending too much time on manual triage, this is the highest-ROI workflow to automate first. Let the agents do the correlation. Let the humans make the decisions.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>devops</category>
      <category>ai</category>
    </item>
    <item>
      <title>Swrly vs Building Your Own Agent Framework</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Fri, 10 Apr 2026 16:47:04 +0000</pubDate>
      <link>https://dev.to/swrly/swrly-vs-building-your-own-agent-framework-db4</link>
      <guid>https://dev.to/swrly/swrly-vs-building-your-own-agent-framework-db4</guid>
      <description>&lt;p&gt;Every engineering team building with LLMs hits the same fork in the road. Do you use a platform, or do you build your own agent framework? The answer is not obvious, and anyone who tells you it is probably selling something.&lt;/p&gt;

&lt;p&gt;We built Swrly because we hit the limits of DIY agent orchestration ourselves. But we are not going to pretend it is the right choice for every team. Here is an honest breakdown.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Build-vs-Buy Question
&lt;/h2&gt;

&lt;p&gt;The decision is not really about cost. It is about what your team should be spending its time on.&lt;/p&gt;

&lt;p&gt;If your core product &lt;em&gt;is&lt;/em&gt; an AI agent system -- if orchestration logic is your competitive advantage -- then building your own framework makes sense. You need full control over execution, memory, and tool dispatch because those are the things that differentiate you.&lt;/p&gt;

&lt;p&gt;If your team uses AI agents to automate internal workflows, review PRs, triage incidents, or augment existing processes, then building orchestration infrastructure from scratch is a distraction. You would not build your own CI/CD system either.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Get with Swrly
&lt;/h2&gt;

&lt;p&gt;Swrly is an opinionated platform. That means you trade some flexibility for a lot of velocity. Here is what you get out of the box:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual builder with 8 node types.&lt;/strong&gt; These include agent nodes, integration nodes, condition nodes, triggers, loops, approvals, and joins. You drag them onto a canvas, connect them, and you have a directed workflow. No Python glue code. No YAML files. The topology is the source of truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger system.&lt;/strong&gt; Five trigger types -- manual, webhook, cron, API, and event-based. Your workflows can fire on a GitHub push, a Slack message, a cron schedule, or an API call from your existing CI/CD pipeline. This is plumbing that takes weeks to build correctly and months to make reliable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability from day one.&lt;/strong&gt; Every agent run streams logs in real time. You can see which node is executing, what it produced, how long it took, and where it failed. When a 6-node workflow produces bad output, you click on the node that went wrong instead of reading through a 500-line trace.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;47 integrations, 350+ MCP tools.&lt;/strong&gt; GitHub, Slack, Linear, Jira, PagerDuty, Datadog, Sentry, databases, HTTP endpoints. Each integration is a set of pre-built tools that agents can call. No writing API wrapper code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero infrastructure.&lt;/strong&gt; No Redis clusters to manage, no BullMQ queues to tune, no worker processes to scale. You configure your agents, connect your API keys via BYOK, and run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan enforcement and RBAC.&lt;/strong&gt; Usage limits, role-based access, API key management, workspace isolation. The kind of boring operational stuff that takes a surprising amount of time to build.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Get Rolling Your Own
&lt;/h2&gt;

&lt;p&gt;There are real advantages to building a custom framework. We would be dishonest to ignore them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full control over the execution engine.&lt;/strong&gt; Swrly runs agents through the Claude Code SDK. If you need to use a different model provider, or you need custom execution semantics like speculative branching or dynamic agent spawning, you need your own runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom memory and state management.&lt;/strong&gt; Swrly provides context accumulation and scratchpad-based state. If your use case requires long-term vector memory, retrieval-augmented generation, or custom knowledge graphs, a platform's state model may not be enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deep framework integration.&lt;/strong&gt; If your team already has a LangChain or CrewAI codebase with months of investment, migrating to a platform has real costs. Sometimes the right move is to keep building on what you have.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Niche execution patterns.&lt;/strong&gt; Multi-turn conversations, adversarial agent debates, agent-as-judge evaluation loops -- these patterns do not map cleanly to a DAG-based workflow builder. Custom code gives you the freedom to model these directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No vendor dependency.&lt;/strong&gt; Your orchestration logic lives in your repo, runs on your infra, and does not depend on a third-party service being available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision Framework
&lt;/h2&gt;

&lt;p&gt;Here is a table we use internally when talking to teams evaluating Swrly:&lt;/p&gt;

&lt;p&gt;| Factor | Swrly | DIY Framework |&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>architecture</category>
      <category>ai</category>
    </item>
    <item>
      <title>5 AI Workflows Every DevOps Team Should Automate</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Fri, 10 Apr 2026 16:47:01 +0000</pubDate>
      <link>https://dev.to/swrly/5-ai-workflows-every-devops-team-should-automate-31b9</link>
      <guid>https://dev.to/swrly/5-ai-workflows-every-devops-team-should-automate-31b9</guid>
      <description>&lt;p&gt;DevOps teams spend a shocking amount of time on tasks that follow predictable patterns. Review this PR. Triage this alert. Summarize this deploy. Check these dependency updates. The patterns are consistent enough for AI agents to handle, but most teams have not automated them because wiring up the integrations is tedious.&lt;/p&gt;

&lt;p&gt;Here are five workflows we see DevOps teams build in Swrly within their first week. Each one replaces hours of toil per week with a workflow that runs in minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. PR Review Pipeline
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does.&lt;/strong&gt; When a pull request is opened, an agent reads the diff, checks for common issues (security patterns, missing tests, style violations, performance concerns), and posts a structured review comment on the PR. A second agent summarizes the changes for the team lead in Slack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nodes you would use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger node&lt;/strong&gt; -- webhook trigger, fired by GitHub's &lt;code&gt;pull_request.opened&lt;/code&gt; event&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent node&lt;/strong&gt; ("Code Reviewer") -- reads the diff via GitHub MCP tools, produces a structured review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Condition node&lt;/strong&gt; -- branches on severity: if critical issues found, route to the notification path&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration node&lt;/strong&gt; -- posts the review as a PR comment via &lt;code&gt;github_create_review&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent node&lt;/strong&gt; ("PR Summarizer") -- generates a plain-language summary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration node&lt;/strong&gt; -- sends the summary to a Slack channel via &lt;code&gt;slack_post_message&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters.&lt;/strong&gt; Code review is a bottleneck on every team. This does not replace human review -- it augments it. The AI catches the mechanical stuff (unused imports, missing error handling, hardcoded secrets) so human reviewers can focus on architecture and logic. We have a template for this one: "Ship It -- PR Review Pipeline."&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Incident Triage
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does.&lt;/strong&gt; When PagerDuty or Datadog fires an alert, an agent pulls recent logs, checks for similar past incidents, and posts a triage summary with recommended next steps. If the severity is high enough, it pages the on-call engineer with context already assembled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nodes you would use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger node&lt;/strong&gt; -- webhook trigger from PagerDuty or Datadog&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent node&lt;/strong&gt; ("Incident Analyst") -- fetches logs via the Datadog or Sentry MCP tools, analyzes the error pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent node&lt;/strong&gt; ("History Checker") -- searches past incident records for similar patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Condition node&lt;/strong&gt; -- routes based on severity level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration node&lt;/strong&gt; -- posts triage summary to the incidents Slack channel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration node&lt;/strong&gt; -- if critical, creates a Linear issue and pages the on-call via PagerDuty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters.&lt;/strong&gt; The first 10 minutes of an incident are usually spent gathering context. An agent can do that gathering in 30 seconds. By the time the on-call engineer opens their laptop, they have a summary of what is failing, what changed recently, and whether this has happened before. Mean time to resolution drops significantly when you eliminate the "what is even happening" phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Deploy Notification Digest
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does.&lt;/strong&gt; After a deployment completes, an agent collects all the commits included in the release, groups them by category (features, fixes, chores), generates a human-readable changelog, and posts it to Slack and optionally updates a Notion page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nodes you would use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger node&lt;/strong&gt; -- API trigger called from your deploy script or GitHub Action&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent node&lt;/strong&gt; ("Changelog Writer") -- uses &lt;code&gt;github_list_commits&lt;/code&gt; and &lt;code&gt;github_compare_commits&lt;/code&gt; to gather the diff between the previous and current release tags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration node&lt;/strong&gt; -- posts the formatted changelog to Slack via &lt;code&gt;slack_post_message&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration node&lt;/strong&gt; -- updates a Notion database with the release record via &lt;code&gt;notion_create_page&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters.&lt;/strong&gt; Nobody writes good release notes manually. They either skip it entirely or write something vague like "various fixes and improvements." An agent that reads the actual commits and PR descriptions produces genuinely useful changelogs. Product managers and support teams actually know what shipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Log Anomaly Detection
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does.&lt;/strong&gt; On a cron schedule, an agent queries your logging system for the past hour, identifies patterns that deviate from normal (error rate spikes, new error types, unusual request patterns), and surfaces anything worth investigating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nodes you would use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger node&lt;/strong&gt; -- cron trigger, runs every hour&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent node&lt;/strong&gt; ("Log Analyst") -- queries Datadog or your logging endpoint via HTTP tools, compares current patterns against baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Condition node&lt;/strong&gt; -- if anomalies detected, route to notification path; otherwise, log a clean summary and exit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration node&lt;/strong&gt; -- posts anomaly report to the monitoring Slack channel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration node&lt;/strong&gt; -- creates a Linear issue for anything that needs investigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters.&lt;/strong&gt; Most monitoring tools are good at threshold-based alerts. Error rate above 5 percent, latency above 500ms. They are less good at detecting subtle pattern shifts -- a new error type appearing at low volume, a gradual increase in a specific endpoint's latency, or a sudden drop in traffic from one region. An AI agent can spot these patterns because it reads logs the way a human would, just faster and more consistently.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Dependency Update Reviewer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does.&lt;/strong&gt; When Dependabot or Renovate opens a PR with dependency updates, an agent reviews the changelogs of the updated packages, checks for breaking changes, evaluates whether your codebase uses any affected APIs, and posts a risk assessment on the PR.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nodes you would use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger node&lt;/strong&gt; -- webhook trigger on PRs from the bot user&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent node&lt;/strong&gt; ("Dependency Analyst") -- reads the PR diff to identify which packages are being updated and to which versions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent node&lt;/strong&gt; ("Changelog Reader") -- fetches the changelogs and release notes for each updated package via HTTP tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Condition node&lt;/strong&gt; -- routes based on risk level: major version bumps get extra scrutiny&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration node&lt;/strong&gt; -- posts a risk assessment comment on the PR via &lt;code&gt;github_create_review&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval node&lt;/strong&gt; -- for high-risk updates, pauses the workflow until a human approves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it matters.&lt;/strong&gt; Dependency updates are one of those tasks that everyone knows they should do but nobody wants to do. The reason is that evaluating whether an update is safe takes real effort -- reading changelogs, checking for breaking changes, testing. An agent cannot run your test suite, but it can do the research step and tell you "this major version bump removes an API you use in 3 files" before you waste time on a merge that will break the build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Each of these workflows takes 15 to 30 minutes to build in Swrly's visual builder. The PR review pipeline and incident triage workflows are the most common starting points because they deliver immediate, visible value.&lt;/p&gt;

&lt;p&gt;The key insight is that none of these workflows replace human judgment. They handle the mechanical parts -- gathering context, reading diffs, checking changelogs, formatting summaries -- so that humans can make better decisions faster.&lt;/p&gt;

&lt;p&gt;If you are running a DevOps team and you are not automating at least the PR review and incident triage workflows, you are leaving hours on the table every week. Start with one, see the results, and expand from there.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>How Swrly Works: Agent Teams vs Anonymous Chains</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Fri, 10 Apr 2026 16:46:23 +0000</pubDate>
      <link>https://dev.to/swrly/how-swrly-works-agent-teams-vs-anonymous-chains-57c4</link>
      <guid>https://dev.to/swrly/how-swrly-works-agent-teams-vs-anonymous-chains-57c4</guid>
      <description>&lt;p&gt;If you have worked with LLM frameworks like LangChain, CrewAI, or AutoGen, you have probably experienced this: your "agents" are really just sequential function calls with shared state. They do not have names. They do not have specialties. They are anonymous steps in a chain, and when something goes wrong, good luck figuring out which step failed and why.&lt;/p&gt;

&lt;p&gt;Swrly takes a fundamentally different approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Anonymous Chains
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks model workflows as linear pipelines. You define a series of steps, each one receiving the output of the previous one. The framework handles the plumbing. It looks clean in a tutorial, but in production it creates real problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No role clarity.&lt;/strong&gt; Every step is interchangeable. There is no concept of "this agent is a code reviewer" versus "this agent is a project manager." They are all just prompt-plus-tool combinations in a list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging is painful.&lt;/strong&gt; When a 7-step chain produces bad output, you have to trace through every step to find where things went wrong. There are no names, no boundaries, no clear ownership.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context bleed.&lt;/strong&gt; Because agents share a single context window, earlier steps can pollute later ones. A verbose data-fetching step consumes tokens that your reasoning step needs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No conditional logic.&lt;/strong&gt; Linear chains are linear. If you need branching — "if the review passes, deploy; if it fails, notify" — you are writing custom Python glue code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not orchestration. It is scripting with extra steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Swrly's Approach: Named Specialist Agents
&lt;/h2&gt;

&lt;p&gt;In Swrly, every agent in a workflow is a &lt;strong&gt;named specialist&lt;/strong&gt; with a clear identity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A distinct role&lt;/strong&gt; — "Code Reviewer," "PR Summarizer," "Deployment Manager"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A tailored system prompt&lt;/strong&gt; — instructions specific to that agent's job&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Its own tool access&lt;/strong&gt; — each agent gets only the integrations it needs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configurable behavior&lt;/strong&gt; — max turns, context accumulation, output format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you look at a Swrly workflow, you can immediately understand what each agent does and why it is there. This is not a cosmetic difference. It changes how you design, debug, and maintain your automations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Visual Builder
&lt;/h2&gt;

&lt;p&gt;Swrly workflows are built on a &lt;strong&gt;drag-and-drop canvas&lt;/strong&gt;, not in YAML files or Python scripts. The builder gives you a spatial view of your entire workflow — which agents connect to which, where conditions branch, and how data flows through the system.&lt;/p&gt;

&lt;p&gt;You add nodes by dragging them from the sidebar or using the command palette (&lt;code&gt;Cmd+K&lt;/code&gt;). You connect them by drawing edges between ports. You configure each node by clicking it and editing its properties in the side panel.&lt;/p&gt;

&lt;p&gt;This is not a toy. The builder supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Undo/redo&lt;/strong&gt; with full state history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-layout&lt;/strong&gt; for organizing complex workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Viewport persistence&lt;/strong&gt; so your canvas position is saved between sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resizable configuration panels&lt;/strong&gt; for comfortable editing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keyboard shortcuts&lt;/strong&gt; and full accessibility support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything you build visually maps directly to the execution model. What you see is what runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Six Node Types
&lt;/h2&gt;

&lt;p&gt;Swrly workflows are composed of six node types, each serving a distinct purpose:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent nodes&lt;/strong&gt; are the core. Each one wraps a Claude-powered agent with a system prompt, tool access, and configurable parameters like &lt;code&gt;maxTurns&lt;/code&gt; and &lt;code&gt;accumulateContext&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration nodes&lt;/strong&gt; connect to external services without needing an agent. Send a Slack message, create a Jira ticket, or fetch data from an API — directly, with no LLM call required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Condition nodes&lt;/strong&gt; enable branching logic. Define rules like "if the previous output contains APPROVED" or "if the sentiment score is above 0.8," and the workflow splits into separate paths. Six operators are supported: &lt;code&gt;equals&lt;/code&gt;, &lt;code&gt;not_equals&lt;/code&gt;, &lt;code&gt;contains&lt;/code&gt;, &lt;code&gt;not_contains&lt;/code&gt;, &lt;code&gt;greater_than&lt;/code&gt;, and &lt;code&gt;less_than&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger nodes&lt;/strong&gt; start workflows automatically. Configure a webhook URL, a cron schedule, or an event listener, and the workflow runs without manual intervention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Loop nodes&lt;/strong&gt; repeat a section of the workflow over a list of items — process each file in a PR, each row in a spreadsheet, or each message in a queue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approval nodes&lt;/strong&gt; pause execution and wait for a human to review and approve before continuing. Critical for workflows where you want a human in the loop before deployment or notification.&lt;/p&gt;

&lt;h2&gt;
  
  
  37 Integrations, 318 MCP Tools
&lt;/h2&gt;

&lt;p&gt;Swrly agents do not just talk to each other. They interact with the tools your team already uses. The platform ships with &lt;strong&gt;37 integrations&lt;/strong&gt; and &lt;strong&gt;318 MCP tools&lt;/strong&gt; spanning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Development:&lt;/strong&gt; GitHub, GitLab, Bitbucket, Vercel, AWS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project management:&lt;/strong&gt; Jira, Linear, Asana, Notion, Airtable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication:&lt;/strong&gt; Slack, Discord, Twilio, Microsoft Teams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring:&lt;/strong&gt; PagerDuty, Sentry, Datadog, PostHog&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI/ML:&lt;/strong&gt; OpenAI, Hugging Face&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Productivity:&lt;/strong&gt; Google Workspace, Confluence, Trello&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each integration exposes specific tools — &lt;code&gt;github_create_pr&lt;/code&gt;, &lt;code&gt;slack_send_message&lt;/code&gt;, &lt;code&gt;linear_create_issue&lt;/code&gt; — that agents can call during execution. You configure which tools each agent has access to, keeping the principle of least privilege.&lt;/p&gt;

&lt;h2&gt;
  
  
  BYOK: Your Claude Subscription, Not Ours
&lt;/h2&gt;

&lt;p&gt;Here is something that matters a lot in production: &lt;strong&gt;Swrly does not charge you for AI usage.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most platforms charge per-token or per-credit on top of the platform fee. With Swrly, you bring your own Claude Code subscription. You pay Anthropic directly for your AI usage ($20/month for Claude Pro). Swrly charges separately for orchestration features ($0-$99/month depending on your plan).&lt;/p&gt;

&lt;p&gt;You paste your Claude Code session token in Settings. Swrly encrypts it with &lt;strong&gt;AES-256-GCM&lt;/strong&gt;, stores it securely, and uses it ephemerally during execution. Your key is never logged, never shared, and never used outside your workflow runs.&lt;/p&gt;

&lt;p&gt;Two separate, predictable bills. No surprises.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-Time Observability
&lt;/h2&gt;

&lt;p&gt;When a workflow runs, you do not stare at a loading spinner. Swrly provides a &lt;strong&gt;real-time execution overlay&lt;/strong&gt; directly on the canvas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each node lights up as it executes — pending, running, completed, or failed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server-Sent Events (SSE)&lt;/strong&gt; stream status updates with no polling&lt;/li&gt;
&lt;li&gt;Click any completed node to see its full output, token usage, and execution time&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;run history&lt;/strong&gt; panel shows every past execution with timestamps and status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a log file you parse after the fact. It is a live view of your workflow doing its job.&lt;/p&gt;
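
&lt;p&gt;To give a feel for the mechanism, this is roughly what consuming such a stream looks like in a browser client (the endpoint path and event shape here are hypothetical, not Swrly's documented API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Hypothetical run-status stream consumed with the browser's EventSource.
const events = new EventSource(`/api/runs/${runId}/events`);

events.onmessage = (e) =&gt; {
  const { nodeId, status } = JSON.parse(e.data); // pending | running | completed | failed
  paintNode(nodeId, status); // hypothetical canvas-overlay update
};

events.onerror = () =&gt; events.close();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;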

&lt;h2&gt;
  
  
  Example: A 3-Agent PR Review Workflow
&lt;/h2&gt;

&lt;p&gt;Here is a concrete workflow you can build in Swrly in about 10 minutes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Webhook Trigger&lt;/strong&gt; — listens for &lt;code&gt;pull_request.opened&lt;/code&gt; events from GitHub&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Reviewer Agent&lt;/strong&gt; — receives the PR diff, analyzes code quality, security issues, and style violations. Uses &lt;code&gt;github_get_content&lt;/code&gt; and &lt;code&gt;github_list_pr_files&lt;/code&gt; tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Condition Node&lt;/strong&gt; — checks if the reviewer's output contains "APPROVED"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack Notification (approved branch)&lt;/strong&gt; — posts to &lt;code&gt;#shipped&lt;/code&gt;: "PR #42 approved by Swrly"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slack Notification (revision branch)&lt;/strong&gt; — posts to &lt;code&gt;#reviews&lt;/code&gt;: "PR #42 needs changes" with the reviewer's feedback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Code Reviewer agent's system prompt might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a senior code reviewer. Analyze the PR diff for:
- Security vulnerabilities
- Performance issues
- Code style violations
- Missing error handling

End your review with either APPROVED or NEEDS_REVISION followed by
a summary of your findings.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No Python. No YAML. No deployment scripts. You build it visually, test it with a real PR, and it runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Is This For?
&lt;/h2&gt;

&lt;p&gt;Swrly is built for &lt;strong&gt;engineering teams&lt;/strong&gt; that want to automate complex, multi-step workflows involving AI reasoning. If you are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manually reviewing PRs and posting summaries to Slack&lt;/li&gt;
&lt;li&gt;Writing one-off scripts to triage bugs across Jira and GitHub&lt;/li&gt;
&lt;li&gt;Building internal tools that need LLM reasoning in the middle&lt;/li&gt;
&lt;li&gt;Running AI-powered QA pipelines that need human approval gates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then Swrly replaces the glue code, the cron jobs, and the fragile scripts with a visual, observable, maintainable system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Building
&lt;/h2&gt;

&lt;p&gt;Swrly is free to start. No credit card required. Sign up at &lt;a href="https://swrly.com" rel="noopener noreferrer"&gt;swrly.com&lt;/a&gt;, create your first swirl, and see what named specialist agents can do for your team.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>BYOK Isn't Just About Cost — It's About Control</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Fri, 10 Apr 2026 16:46:20 +0000</pubDate>
      <link>https://dev.to/swrly/byok-isnt-just-about-cost-its-about-control-2pcb</link>
      <guid>https://dev.to/swrly/byok-isnt-just-about-cost-its-about-control-2pcb</guid>
      <description>&lt;p&gt;When most platforms talk about Bring Your Own Key, they frame it as a cost optimization. Use your own API keys, pay the model providers directly, avoid the markup. That is a real benefit. But it is not the most important one.&lt;/p&gt;

&lt;p&gt;The real reason BYOK matters is control. Control over your credentials. Control over your data flow. Control over your audit trail. For any team that cares about security or compliance, BYOK is not a nice-to-have pricing feature. It is a fundamental architecture decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key Problem
&lt;/h2&gt;

&lt;p&gt;Here is the default model for most AI platforms: you sign up, and the platform gives you access to AI models through its own API keys. Your prompts and data flow through the platform's credentials. The platform handles billing, rate limiting, and access control on your behalf.&lt;/p&gt;

&lt;p&gt;This is convenient. It is also a security liability.&lt;/p&gt;

&lt;p&gt;When you use a platform's shared API keys, your data is processed under their account with their model provider. You are trusting the platform to not log your prompts, to not commingle your usage with other customers, and to handle key rotation, access controls, and incident response for credentials that touch your data.&lt;/p&gt;

&lt;p&gt;For a side project, this is fine. For a company handling customer data, financial information, or anything regulated, it introduces risk that is difficult to quantify and impossible to audit from the outside.&lt;/p&gt;

&lt;p&gt;You cannot verify what happens to your data once it leaves your application and enters the platform's credential scope. You cannot enforce your own retention policies on the model provider's side because the API key is not yours. And if the platform's credentials are compromised, every customer is affected, not just the one whose data was targeted.&lt;/p&gt;

&lt;h2&gt;
  
  
  What BYOK Actually Means
&lt;/h2&gt;

&lt;p&gt;BYOK means your API keys, your account, your control surface. When an AI agent runs a task on your behalf, it authenticates with the model provider using credentials that belong to you. The platform orchestrates the workflow, but the data flows through your billing relationship with the model provider.&lt;/p&gt;

&lt;p&gt;This is a meaningful architectural difference, not just a billing redirect.&lt;/p&gt;

&lt;p&gt;With BYOK, your usage appears in your own model provider dashboard. You can see exactly which models were called, how many tokens were consumed, and when. You can set your own rate limits, spending caps, and alerting thresholds directly with the provider. If you need to revoke access, you rotate your own key. The platform never held it persistently in the first place.&lt;/p&gt;

&lt;p&gt;At Swrly, BYOK tokens are encrypted at rest with AES-256-GCM and stored in ephemeral Redis keys that are consumed atomically via GETDEL. The platform never stores your key persistently. It exists in memory only for the duration of the workflow execution, encrypted in transit between services. When the run completes, the key is gone.&lt;/p&gt;
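
&lt;p&gt;The pattern itself is simple enough to sketch. Here is a minimal illustration of encrypt-then-GETDEL using the &lt;code&gt;cryptography&lt;/code&gt; and &lt;code&gt;redis&lt;/code&gt; Python libraries. This is an illustration of the technique, not Swrly's actual implementation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch of the ephemeral-token pattern: encrypt with AES-256-GCM,
# store in Redis with a TTL, consume exactly once with GETDEL.
# Illustrative only; names and key handling are simplified.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import redis

r = redis.Redis()
MASTER_KEY = bytes.fromhex(os.environ["TOKEN_ENCRYPTION_KEY"])  # 32 bytes for AES-256

def store_byok_token(run_id, api_key, ttl_seconds=300):
    nonce = os.urandom(12)  # standard GCM nonce size
    ciphertext = AESGCM(MASTER_KEY).encrypt(nonce, api_key.encode(), None)
    r.set(f"byok:{run_id}", nonce + ciphertext, ex=ttl_seconds)

def consume_byok_token(run_id):
    blob = r.getdel(f"byok:{run_id}")  # atomic read-and-delete (Redis 6.2+)
    if blob is None:
        raise KeyError("token already consumed or expired")
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(MASTER_KEY).decrypt(nonce, ciphertext, None).decode()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;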

&lt;h2&gt;
  
  
  Security Implications
&lt;/h2&gt;

&lt;p&gt;BYOK changes your threat model in three concrete ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduced blast radius.&lt;/strong&gt; If the orchestration platform is compromised, attackers do not get a master key that accesses every customer's model provider account. Each customer's credentials are isolated, ephemeral, and encrypted. Compromising one customer's key does not expose anyone else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auditable data flow.&lt;/strong&gt; When your agent calls Claude or GPT-4 using your API key, that call appears in your Anthropic or OpenAI dashboard. You have a first-party record of what data was sent to the model, when, and by which key. This is not a log provided by the orchestration platform that you have to trust. It is a log from the model provider that you can verify independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key lifecycle control.&lt;/strong&gt; You decide when to rotate keys, how to scope them, and what permissions they carry. If your security policy requires monthly key rotation, you rotate on your schedule without depending on the platform's rotation cadence. If you need to restrict a key to specific models or set spending limits, you do it in your provider's dashboard.&lt;/p&gt;

&lt;p&gt;These are not theoretical benefits. They are the kind of concrete controls that security teams ask about during vendor reviews and that auditors check during compliance assessments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compliance Made Simpler
&lt;/h2&gt;

&lt;p&gt;SOC2 Type II requires demonstrating that you control access to sensitive systems and that you can produce evidence of that control over time. AI model API keys are sensitive credentials. They provide access to systems that process your data.&lt;/p&gt;

&lt;p&gt;When you use a platform's shared keys, the compliance burden for those credentials falls on the platform. You have to trust their controls, review their SOC2 report, and hope their key management practices meet your auditor's expectations. You are adding a dependency to your compliance program that you cannot directly verify.&lt;/p&gt;

&lt;p&gt;With BYOK, the credentials are yours. You manage them in your own secrets manager. You rotate them according to your own policy. You can produce evidence of key creation, rotation, and revocation from your own systems. The orchestration platform is just a workflow engine. It does not hold the keys to your kingdom.&lt;/p&gt;

&lt;p&gt;This simplification matters for GDPR as well. When your data is processed using your own API credentials, the data processing relationship is between you and the model provider. The orchestration platform is a processor, not a sub-processor that independently accesses AI services on your behalf. This is a cleaner data flow to document and a simpler relationship to audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Swrly Does Differently
&lt;/h2&gt;

&lt;p&gt;Most platforms that claim BYOK support simply store your API key in their database and use it on your behalf. That is barely better than using their key. Your credential is still persisted on their infrastructure, still accessible to their application code, still at risk if their database is compromised.&lt;/p&gt;

&lt;p&gt;Swrly treats BYOK as a security primitive, not a feature checkbox. Keys are encrypted with AES-256-GCM before they touch any storage layer. They are stored in ephemeral Redis keys with automatic expiration. The GETDEL operation ensures that a key can only be consumed once. The encryption key supports rotation via a previous-key fallback mechanism, so key rotation does not require re-encrypting all stored credentials simultaneously.&lt;/p&gt;
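
&lt;p&gt;The previous-key fallback is worth making concrete, since it is what lets you rotate the encryption key without a bulk re-encryption. A schematic sketch, again illustrative rather than Swrly's actual code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Schematic previous-key fallback: try the current master key first,
# then the previous one during a rotation window. Illustrative only.
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def decrypt_with_rotation(blob, current_key, previous_key=None):
    nonce, ciphertext = blob[:12], blob[12:]
    for key in (k for k in (current_key, previous_key) if k is not None):
        try:
            return AESGCM(key).decrypt(nonce, ciphertext, None)
        except InvalidTag:  # wrong key: GCM authentication fails
            continue
    raise ValueError("no configured key can decrypt this token")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;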

&lt;p&gt;At the platform level, Swrly does not have a "master" API key for any model provider. There is no shared credential that processes customer data. Every model call uses the customer's own key. This is not a configuration option. It is the only mode of operation.&lt;/p&gt;

&lt;p&gt;This architecture means that Swrly's infrastructure, if fully compromised, does not expose a single model provider credential in plaintext. The encrypted ephemeral tokens are useless without the encryption key, and even with it, the GETDEL semantics mean most tokens no longer exist by the time anyone could attempt to decrypt them.&lt;/p&gt;

&lt;p&gt;BYOK is usually the last item on a feature comparison spreadsheet, positioned as a cost optimization for high-volume users. It should be the first question you ask any AI platform that will process your data. Not "do you support BYOK" but "is BYOK the only way your platform operates." The answer tells you everything about how seriously they take your security.&lt;/p&gt;

</description>
      <category>security</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Hidden Cost of AI Agent Sprawl</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Fri, 10 Apr 2026 16:45:43 +0000</pubDate>
      <link>https://dev.to/swrly/the-hidden-cost-of-ai-agent-sprawl-533l</link>
      <guid>https://dev.to/swrly/the-hidden-cost-of-ai-agent-sprawl-533l</guid>
      <description>&lt;p&gt;It starts innocently. Someone on the team writes a script that uses Claude to summarize Slack threads. It runs on a cron job. It works. A week later, someone else writes a Lambda function that reviews PRs with GPT-4. Then another developer builds a Jupyter notebook that generates weekly reports from your analytics data and emails them out.&lt;/p&gt;

&lt;p&gt;Six months later, you have 15 AI-powered scripts running across 4 different environments, using 3 different model providers, with no shared configuration, no centralized logs, and no one person who knows what all of them do.&lt;/p&gt;

&lt;p&gt;This is agent sprawl. And it is happening at every company that has adopted AI tooling without a plan for managing it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Sprawl Happens
&lt;/h2&gt;

&lt;p&gt;Agent sprawl follows the same pattern as the microservice sprawl of the 2010s, just faster. AI makes it trivially easy to build useful automations. A developer can go from idea to working prototype in an afternoon. The barrier to creating a new agent is so low that nobody thinks to check whether a similar one already exists.&lt;/p&gt;

&lt;p&gt;The sprawl accelerates because there is no natural pressure to consolidate. Each script works fine in isolation. The developer who built it knows how it works. It runs on their preferred platform. It uses whatever model they are most comfortable with. From any individual perspective, there is no problem to solve.&lt;/p&gt;

&lt;p&gt;The problems are systemic, and they only become visible when you zoom out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Symptoms
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Nobody knows what is running.&lt;/strong&gt; Ask your team lead to list every AI-powered automation currently active in your organization. They cannot do it. Some are cron jobs on EC2 instances. Some are GitHub Actions. Some are Slack apps running on someone's side project Heroku account. There is no inventory because there was never a reason to build one, until something breaks and you need to find it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Failures are silent.&lt;/strong&gt; A cron job that summarizes support tickets stops working because the model API changed its response format. The script throws an error, the cron job silently fails, and nobody notices for two weeks because the output was going to a Slack channel that three people check. There is no alerting because the script was never set up with monitoring. It was a quick hack that became permanent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Costs are invisible.&lt;/strong&gt; Each individual script costs a few dollars a month in API calls. But when you have 15 of them, some running more frequently than intended, some retrying on errors without backoff, some sending the same data to the model multiple times because of bugs, the aggregate cost creeps up. We have talked to teams spending $800 per month on scattered AI API calls that they could not account for because the billing was spread across personal API keys, team accounts, and company credit cards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security and compliance are an afterthought.&lt;/strong&gt; Each script has its own API keys, stored in environment variables, dotfiles, or sometimes hardcoded. There is no rotation policy. There is no audit trail of what data each agent accesses. When your security team asks which AI tools have access to customer data, the honest answer is "we do not know." For any team pursuing SOC2 or handling regulated data, this is disqualifying.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Nobody Tracks
&lt;/h2&gt;

&lt;p&gt;The biggest cost of agent sprawl is not the API bills. It is the organizational overhead.&lt;/p&gt;

&lt;p&gt;When a new team member joins, they have to discover the existing automations through tribal knowledge. When something breaks, debugging requires tracking down the original author and hoping they remember how the script works. When you want to extend an existing automation, it is often easier to build a new one from scratch than to find and modify the original.&lt;/p&gt;

&lt;p&gt;This is the same technical debt pattern that engineering teams have fought for decades, just applied to AI. And just like with microservices, the answer is not to stop building things. It is to build them in a way that is manageable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Orchestration Actually Means
&lt;/h2&gt;

&lt;p&gt;Orchestration does not mean "make everything more complex." It means giving your AI automations the same infrastructure discipline you give your application code. Specifically, it means four things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A single place to see what is running.&lt;/strong&gt; Every agent, every workflow, every scheduled execution, visible in one dashboard. Not spread across AWS Lambda, Heroku, cron tabs, and GitHub Actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured execution with logs.&lt;/strong&gt; Every run produces a trace with inputs, outputs, durations, and token costs per step. When something fails, you know what failed, when, and why. When something succeeds, you have a record of what it did.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Centralized credential management.&lt;/strong&gt; API keys stored once, encrypted, with access controls. Not scattered across environment variables in 15 different runtime environments. Rotation happens in one place. Audit trails come for free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Defined ownership and boundaries.&lt;/strong&gt; Each workflow belongs to a workspace. Each agent has a named role. When someone asks "what AI tools access our customer database," you can answer with a query instead of a scavenger hunt.&lt;/p&gt;
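
&lt;p&gt;To make "structured execution with logs" concrete: a single run trace might look something like the record below. The shape is hypothetical, not any particular platform's schema, but it shows the minimum you want per step.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical trace record; field names and values are illustrative.
trace = {
    "run_id": "run_8f3a",
    "workflow": "support-ticket-triage",
    "workspace": "platform-team",
    "status": "completed",
    "started_at": "2026-04-10T09:00:00Z",
    "steps": [
        {"node": "fetch_tickets", "status": "completed",
         "duration_ms": 1840, "tokens": {"input": 0, "output": 0}},
        {"node": "summarizer_agent", "status": "completed",
         "duration_ms": 6210, "tokens": {"input": 4312, "output": 688}},
    ],
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;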

&lt;h2&gt;
  
  
  A Better Pattern
&lt;/h2&gt;

&lt;p&gt;The alternative to sprawl is not "build fewer things." It is "build things in a place where they can be found, observed, and managed."&lt;/p&gt;

&lt;p&gt;When a developer on your team has an idea for a new AI automation, the workflow should be: open the orchestration platform, check if a similar agent already exists, build or extend the workflow in the visual builder, and deploy it with the same observability and credential management as everything else.&lt;/p&gt;

&lt;p&gt;The individual agents stay simple. The developer still gets to move fast. But the result is a workflow that the whole team can see, debug, and maintain instead of a script on someone's laptop that everyone forgot about until it stopped working.&lt;/p&gt;

&lt;p&gt;Agent sprawl is not a technology problem. It is an organizational one. And like most organizational problems in engineering, the fix is better tooling and clearer defaults, not more process documents that nobody reads.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why AI Agents Need Workflows, Not Just Prompts</title>
      <dc:creator>Swrly</dc:creator>
      <pubDate>Fri, 10 Apr 2026 16:45:34 +0000</pubDate>
      <link>https://dev.to/swrly/why-ai-agents-need-workflows-not-just-prompts-29ef</link>
      <guid>https://dev.to/swrly/why-ai-agents-need-workflows-not-just-prompts-29ef</guid>
      <description>&lt;p&gt;You build an agent. You give it a system prompt, a few tools, and access to an API. It works. You show the demo. Everyone is impressed. Then you try to put it in production and everything falls apart.&lt;/p&gt;

&lt;p&gt;This is the story of almost every team that has tried to ship AI agents in the last two years. The single-agent, single-prompt pattern is a great starting point. It is a terrible architecture for anything that needs to be reliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Single-Agent Ceiling
&lt;/h2&gt;

&lt;p&gt;A single-prompt agent is essentially a loop: take input, think, call tools, return output. Frameworks like LangChain and CrewAI make this loop easy to set up. You can get a working prototype in an afternoon. The problem is that prototypes and production systems have fundamentally different requirements.&lt;/p&gt;
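
&lt;p&gt;That loop, stripped to its essentials, fits in a dozen lines, which is exactly why prototypes come together so fast. In this schematic sketch, &lt;code&gt;call_model&lt;/code&gt; and &lt;code&gt;run_tool&lt;/code&gt; are placeholders for whatever model client and tool layer you use:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Schematic single-prompt agent loop: think, call tools, repeat.
# call_model and run_tool are placeholders, not a real framework API.
def run_agent(system_prompt, user_input, max_steps=10):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
    for _ in range(max_steps):
        reply = call_model(messages)           # one LLM call
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply.get("tool_call") is None:     # no tool requested: done
            return reply["content"]
        result = run_tool(reply["tool_call"])  # execute the requested tool
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded max_steps without finishing")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;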

&lt;p&gt;A prototype needs to work once. A production system needs to work every time, fail gracefully when it cannot, tell you what happened either way, and do all of this at scale without anyone babysitting it.&lt;/p&gt;

&lt;p&gt;Single-prompt agents cannot meet these requirements because they operate as monoliths. One prompt does everything: reasoning, tool selection, error handling, output formatting. When the task gets complex, you end up stuffing more instructions into the prompt, which makes the agent less reliable, not more. You are fighting the context window instead of designing for the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Breaks in Production
&lt;/h2&gt;

&lt;p&gt;Here are the concrete failure modes we see when teams try to run single-agent architectures in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No retry boundaries.&lt;/strong&gt; When a single agent fails at step 4 of a 7-step task, you have two options: restart from scratch or accept the failure. There is no way to retry just the step that failed because there are no steps. It is one big prompt execution. For a workflow that takes 3 minutes and costs $0.50 in tokens per run, restarting from scratch on every transient error gets expensive fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context window pollution.&lt;/strong&gt; An agent that fetches data, analyzes it, makes a decision, and takes action is doing four distinct jobs in one context window. The raw data from the fetch step consumes tokens that the analysis step needs. The analysis output stays in context when you only need the decision. By the time you get to the action step, you are working with a bloated context that makes the agent slower and less focused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No observability.&lt;/strong&gt; When a single agent produces bad output, you get one blob of text and a list of tool calls. Which reasoning step went wrong? Was it the data gathering? The analysis? The decision logic? You have to read through the entire trace manually and hope you can spot where the reasoning diverged. For a team running hundreds of agent executions per day, this does not scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No branching logic.&lt;/strong&gt; Real workflows have conditions. If the code review passes, merge the PR. If it fails, post a comment and notify the author. If the CI is still running, wait and check again. A single-prompt agent handles this through in-context reasoning, which means the branching logic is implicit, untestable, and invisible. You cannot look at a system diagram and understand the flow because there is no diagram. There is just a prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No handoffs.&lt;/strong&gt; Different tasks require different expertise. A data analysis task needs different tools, different context, and a different system prompt than a code generation task. When one agent does both, it is mediocre at each. When you split them into separate agents, you need a way to pass context between them. A single-agent architecture has no concept of handoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Workflows Change the Equation
&lt;/h2&gt;

&lt;p&gt;A directed workflow addresses each of these failure modes by making the implicit structure of your agent's task explicit.&lt;/p&gt;

&lt;p&gt;Instead of one agent doing everything, you have multiple specialist agents connected by edges in a graph. Each agent has a clear role, a focused prompt, and access to only the tools it needs. The workflow engine handles routing, retries, and context passing between them.&lt;/p&gt;

&lt;p&gt;This is not a new idea. Software engineering figured this out decades ago with microservices, Unix pipes, and CI/CD pipelines. The principle is the same: small, focused units of work composed into larger systems. The composition layer gives you the observability, retry logic, and branching that the individual units do not need to know about.&lt;/p&gt;

&lt;p&gt;With a workflow-based architecture, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retry boundaries at every node.&lt;/strong&gt; If the data fetch agent fails, retry it without re-running the analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolated context windows.&lt;/strong&gt; Each agent sees only what it needs, not the entire conversation history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual observability.&lt;/strong&gt; Every node in the graph has a status, a duration, an input, and an output. You can see exactly where things went wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit branching.&lt;/strong&gt; Condition nodes route execution based on upstream outputs, and the branches are visible in the graph.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composable handoffs.&lt;/strong&gt; Agent A's output becomes Agent B's input through a well-defined interface.&lt;/li&gt;
&lt;/ul&gt;
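
&lt;p&gt;The first of these is the easiest to see in code. Conceptually, a workflow engine wraps every node in something like the following (a schematic sketch; &lt;code&gt;TransientError&lt;/code&gt; stands in for whatever the engine treats as retryable):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Schematic node-level retry: a transient failure re-runs one node,
# not the whole workflow. Purely illustrative.
import time

class TransientError(Exception):
    pass

def run_node(node, inputs, max_retries=3, backoff_s=2.0):
    for attempt in range(1, max_retries + 1):
        try:
            return node.execute(inputs)
        except TransientError:
            if attempt == max_retries:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;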

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;Consider a PR review workflow. Without orchestration, you might write a single agent with a prompt like: "Review this PR. Check the code quality, verify the tests pass, check for security issues, and either approve it or request changes."&lt;/p&gt;

&lt;p&gt;That works for simple PRs. But for a real codebase, you want separate concerns. A code quality reviewer that focuses on style, patterns, and maintainability. A security scanner that checks for known vulnerability patterns. A test coverage analyzer that verifies the changed code is tested. And a decision agent that takes all three reviews and decides whether to approve, request changes, or flag for human review.&lt;/p&gt;

&lt;p&gt;In Swrly, this is a four-agent workflow with a join node. The three reviewers run in parallel. The join node waits for all three to complete. The decision agent receives their combined output and makes the final call. A condition node routes the result: approve the PR, post review comments, or send a Slack message to the team lead.&lt;/p&gt;
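
&lt;p&gt;The shape of that graph is small enough to write out. This is a conceptual sketch of the topology with made-up node names, not Swrly's storage format:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Conceptual adjacency list for the review graph; names are illustrative.
graph = {
    "quality_reviewer":  ["join"],            # the three reviewers run in parallel
    "security_scanner":  ["join"],
    "coverage_analyzer": ["join"],
    "join":              ["decision_agent"],  # waits for all three to complete
    "decision_agent":    ["route_result"],
    "route_result":      ["approve_pr", "post_comments", "notify_lead"],  # condition node
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;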

&lt;p&gt;Each agent is focused. Each agent can be tested independently. Each agent can be swapped out without touching the others. And when the security scanner takes too long, you can see it immediately in the execution overlay instead of wondering why the whole thing is slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Composability Unlock
&lt;/h2&gt;

&lt;p&gt;The real power of workflows is not just reliability. It is composability.&lt;/p&gt;

&lt;p&gt;Once you have a library of well-defined agents, you can compose them into new workflows without writing new code. Your PR review agents can be reused in a deployment pipeline. Your data analysis agent can feed into a reporting workflow or a monitoring alert. Your Slack notification agent works in every workflow that needs to notify someone.&lt;/p&gt;

&lt;p&gt;This is the same unlock that microservices gave to backend engineering and that CI/CD pipelines gave to DevOps. Small, reusable units composed through a declarative graph. The individual pieces stay simple. The composition layer handles the complexity.&lt;/p&gt;

&lt;p&gt;Single-prompt agents will always have a place for simple, one-shot tasks. But the moment you need reliability, observability, or composition, you need workflows. The ceiling is real, and the only way through it is to stop pretending that one prompt can do everything.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
