DEV Community: Rohith Singh

The missing layer between you and your AI agent

Rohith Singh — Tue, 21 Apr 2026 20:43:31 +0000

Three agents, four terminal windows, no idea which one's waiting on me. That was my setup three weeks ago.

I run Claude Code, Codex, and OpenCode depending on what I'm building. Each one behaves differently, and after a while you just reach for the right one instinctively: Claude Code for deep context across files, Codex for something faster and surgical, OpenCode when I want to experiment with open source models on the same problem.

But the agent is only part of the setup. There's everything around it too: where it runs, how you know when it needs you, how you review what it did, how you manage three of them without losing track. Most people are running powerful agents inside a terminal that hasn't changed in decades. You kick off a task, come back 15 minutes later, and it's been waiting on a permission prompt. Two agents going at once and you're hoping you remember which window is which.

That's the gap. I've been using Warp as that environment, specifically as an ADE, an Agentic Development Environment. A terminal runs commands. An ADE is built around the assumption that an agent is doing most of the work and you're steering it.

(For context: over 1M Claude Code and Codex sessions have already been run inside Warp, across 700K+ monthly active developers.)

Warp doesn't replace Claude Code, Codex, or OpenCode. I still run all three. It's the environment they run inside, nothing about your existing agent workflow changes.

TL;DR

If you're already running CLI agents daily and just want the highlights, here's what changes:

What	Why it matters
Vertical tabs + metadata	See all running agents and their state instantly, no tab-hunting
Tab Configs (`.toml`)	Save your entire multi-agent workspace, reopen it with one click
Notifications	Agents ping you when they need attention, stop watching the terminal
Code review panel	Review diffs and send inline corrections directly to the running agent
Rich input (`Ctrl+G`)	Multiline prompts, `@file` references, image attachments, voice input
Oz cloud agents	Agents that run in the background without your machine open
Warp Drive	Workflows, Rules, MCP servers, Prompts, synced across your team
Session sharing	Share a link from the tab menu, anyone can monitor or steer the agent

Setting up

Download Warp from warp.dev/download. Runs natively on Windows, Mac, and Linux. On Windows you need Windows 10+ and Git for Windows, no WSL required.

Make sure your agents are installed:

# Claude Code, requires a paid Claude plan or API credits
npm install -g @anthropic-ai/claude-code

# OpenCode
npm install -g opencode-ai

# Codex
npm install -g @openai/codex

Verify them:

claude --version
opencode --version
codex --version

Codebase indexing + AGENTS.md

Before running any agent, let Warp index your project first.

Navigate to your project directory and start an agent session. On first run a dialog pops up asking if you want to index the codebase, confirm it. You can track the status anytime under Settings → AI → Codebase Indexing, which shows "Synced" once it's done.

After indexing, Warp generates an AGENTS.md file at your project root, persistent context for every agent session. Your stack, conventions, commands, things the agent should know before touching anything. The filename must be all caps for Warp to pick it up automatically. If you already have a CLAUDE.md in your project, it carries over as-is.

Vertical tabs + Tab Configs

Enable vertical tabs from Settings → Appearance → Tabs → Use vertical tab layout.

New tab: Ctrl+Shift+T. Once vertical tabs are on, your sessions live in a sidebar instead of stacked at the top, showing which agent is running, which branch you're on, working directory, and whether the agent is active, waiting for input, or idle. All visible without clicking into anything.

Tab Configs are how you save this setup permanently. Each config is a .toml file stored in ~/.warp/tab_configs/. To create one: click the + button in the tab bar → New tab config, or hit Ctrl+Alt+Shift+T. Warp creates the file and opens it for editing.

The default looks like this:

# Warp Tab Config
# Stored in ~/.warp/tab_configs/, rename this file and edit anytime!
name = "My Tab Config"

[[panes]]
id = "main"
type = "terminal"
# directory = "~/code/my-project"
commands = []

Here's what mine actually looks like for a multi-agent setup:

name = "Agent Workspace"
color = "blue"

[[panes]]
id = "root"
split = "horizontal"
children = ["claude", "opencode"]

[[panes]]
id = "claude"
type = "terminal"
directory = "C:\\Users\\rohit\\projects\\content-machine"
commands = ["claude"]
is_focused = true

[[panes]]
id = "opencode"
type = "terminal"
directory = "C:\\Users\\rohit\\projects\\content-machine"
commands = ["opencode"]

This opens two panes side by side, both pointing at the same project, each launching their agent automatically. One click and the entire workspace is live.
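If you keep several of these layouts around, the file can also be generated. Below is a minimal sketch, assuming only the fields shown in the config above (name, id, split, children, type, directory, commands); the helper names `renderTabConfig` and `AgentPane` are hypothetical and not part of Warp.

```typescript
// Hypothetical helper: renders a Tab Config using only the fields shown above.
interface AgentPane {
  id: string; // pane id, e.g. "claude"
  command: string; // command the pane runs when it opens
}

// Assumption: JSON string escaping is a safe way to produce a TOML basic
// string for ordinary names and paths (quotes and backslashes get escaped).
function tomlString(value: string): string {
  return JSON.stringify(value);
}

function renderTabConfig(name: string, directory: string, agents: AgentPane[]): string {
  const children = agents.map((agent) => tomlString(agent.id)).join(", ");
  const lines: string[] = [
    "name = " + tomlString(name),
    "",
    "[[panes]]",
    'id = "root"',
    'split = "horizontal"',
    "children = [" + children + "]",
  ];
  for (const agent of agents) {
    lines.push(
      "",
      "[[panes]]",
      "id = " + tomlString(agent.id),
      'type = "terminal"',
      "directory = " + tomlString(directory),
      "commands = [" + tomlString(agent.command) + "]"
    );
  }
  return lines.join("\n") + "\n";
}
```

Each backslash in a Windows path has to be written as `\\` inside a TOML basic string, which is what the escaping step takes care of.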

Pane type can be "terminal", "agent" for Warp's Agent Mode, or "cloud" for a cloud pane with no local shell. You can also parameterize configs with {{branch_name}} syntax, Warp prompts you to fill in values when the tab opens, which pairs well with git worktrees. If you'd rather not write the TOML manually, type /skills in Agent Mode → update-tab-config and describe what you want in plain English. Oz writes it for you.

Running parallel agents

With Tab Configs set up, running multiple agents in parallel is just opening the right tabs.

The workflow I use: give each agent a different task on an isolated git worktree so they're not touching the same files simultaneously.

# Tab 1, Claude Code on feature work
cd ~/projects/myapp
claude
# "Refactor src/auth/ to use async/await instead of callbacks"

# Tab 2, Codex writing tests in parallel
cd ~/projects/myapp-worktree
codex
# "Write comprehensive tests for src/auth/ covering edge cases"

Create one Tab Config that opens both side by side, each pointing at different worktrees. Next time you sit down, one click and both agents are already running.

Notifications

Before this I had two modes: watch the terminal, or walk away and miss things. Warp fixes this with a unified notification center. Setup is one-time per agent:

Claude Code: run claude inside Warp and a notification chip appears at the bottom. One click installs the plugin. Manual install: github.com/warpdotdev/claude-code-warp
Codex: Warp sets it up automatically the first time you run it. Nothing manual required.
OpenCode: add the plugin from github.com/warpdotdev/opencode-warp to your opencode.json

Worth noting: notifications currently work for Claude Code, Codex, and OpenCode. Gemini CLI and a few other supported harnesses don't have notification support yet.

When you have multiple agents running, the vertical tab sidebar shows an attention-needed indicator when any session needs input, click it to jump directly there.

Code review

Agents get you to about 80% on most tasks. The rest is you, reviewing what it did, catching what it got wrong, redirecting it.

Warp has a code review panel built in. After the agent makes changes, open it from the top right toggle (enable Show code review button in Settings → Features if you don't see it). Inline diff per file. Leave a comment on any line and it sends directly to the running agent session. The agent picks it up and iterates, no starting over, no copy-pasting feedback.

Rich input + agent toolbar

Ctrl+G opens Warp's rich input editor for any agent session, a proper text editor, not a terminal prompt. Multiline prompts, @filename to attach file context inline, image attachments, /prompts for saved templates. Ctrl+Shift+Space attaches selected blocks or text directly as agent context without retyping anything.

Every CLI agent session gets a toolbar at the top of the pane, same one regardless of whether you're running Claude Code, Codex, or OpenCode. From there you can toggle the rich input editor, attach images, open the file explorer, open code review, and enable voice transcription. All one click, nothing to switch away from.

The image attachment is particularly useful for UI debugging, drop a screenshot of what's broken directly into the session instead of describing it in text.

Oz: agents that run without you

Local agents run on your machine, in your terminal. Oz adds a second layer, agents that run in Warp's cloud infrastructure, in the background, without your machine needing to be open.

Zach Lloyd wrote about how to think about deploying coding agents at scale, worth reading if you're thinking beyond local sessions. The short version: Oz is the answer for teams who don't want to build their own orchestration layer.

The practical difference:

Parallel work without parallel machines: kick off a bug fix, a refactor, and a PR review simultaneously
Remote steering: start a task, close your laptop, check in from your phone or browser later. Share a session link from the tab's option menu → Share Session, and anyone with the link can monitor or steer the agent
Triggers: connect Oz to GitHub, Linear, or Slack so agents fire automatically when issues are filed or PRs are opened

Cloud agents share the same context as local ones, same AGENTS.md, same codebase index, same Warp Drive rules. An agent you start locally and hand off to the cloud picks up exactly where it left off.

Warp Drive: shared context for everything

Warp Drive is a side panel (Ctrl+Shift+\) where you store and sync Workflows, Prompts, Notebooks, Rules, and Environment Variables, personally or across a team.

Three things I actually use daily:

Workflows are saved commands with parameterized arguments, a deploy command, a test runner with flags, a git flow. Searchable, shareable, fillable with Shift+Tab. Your team can share them so everyone runs the same patterns.

Rules are persistent context for agents. Global rules apply to every session. Project rules live in AGENTS.md or WARP.md and apply automatically when you're in that project. A rule like "always use async/await, never callbacks" means you stop repeating it in every prompt.

MCP Servers connect your agents to external tools, Linear for tickets, GitHub for PRs, Figma, Sentry, Slack. Add them under Warp Drive → MCP Servers → + Add. Once running, your agent can read, write, and reason about those systems without leaving the terminal. MCP configs are shareable across a team, onboarding becomes handing someone a Warp invite.

My actual setup

Three vertical tabs, Claude Code on the dev branch, OpenCode on a worktree for whatever parallel task I've got going, plain shell for git and the dev server. AGENTS.md covers the stack and conventions. A handful of Rules for things I was repeating in every prompt. GitHub and Linear MCP servers so the agent can actually look at issues and PRs without me copy-pasting.

The Tab Config TOML is maybe 15 lines. The AGENTS.md took a few sessions to refine. The whole thing took about an hour to set up properly.

What I notice day to day isn't any single feature. It's that I stopped context-switching. The agent runs, I do something else, a notification tells me when it needs me, I leave a code review comment, it keeps going. That loop, which used to require four windows and a lot of manual attention, now just happens.

The agents didn't change. The environment did. Turns out that was most of the problem. Warp is the agentic development environment born out of the terminal. Download Warp for free today at → https://go.warp.dev/rohittdevtohoa

Kimi K2 Thinking vs. Claude 4.5 Sonnet vs. GPT-5.1 Codex: Tested the best models for agentic coding

Rohith Singh — Fri, 14 Nov 2025 13:20:00 +0000

Three new AI coding models dropped in the past two months. Claude Sonnet 4.5 with extended thinking on September 29. GPT-5 Codex with unified reasoning on September 23. Kimi K2 Thinking with 1T parameters on November 6-7. All three claim to handle complex coding tasks better than anything before them.

The benchmarks say they're close. I wanted to see what that means for actual development work. So I gave all three the same prompts for two hard problems in my observability platform: statistical anomaly detection and distributed alert deduplication. Same codebase, same requirements, same IDE setup.

Full code's on github.com/rohittcodes/tracer if you want to dig in. Fair warning: it's an evaluation harness I built for this, not a polished product. Expect rough edges.

TL;DR

Test 1 - Advanced Anomaly Detection: GPT-5 Codex was the only one that shipped working code. Claude and Kimi both had critical bugs that would crash in production.

Test 2 - Distributed Alert Deduplication: Codex won again with actual integration. Claude had solid architecture but didn't wire it up. Kimi had clever ideas but broken duplicate-detection logic.

The kicker: Codex cost me $0.95 total vs Claude's $1.68. That's 43% cheaper for code that actually works.

The Official Benchmarks (For What They're Worth)

Model	SWE-bench Verified	GPQA Diamond	Context Window	Released
Claude Sonnet 4.5	77.2% (82.0% parallel)	83.4%	200K	Sept 29, 2025
GPT-5 Codex	74.5%	89.4%	400K (128K out)	Sept 15, 2025
Kimi K2 Thinking	71.3%	-	256K	Nov 6-7, 2025

Pricing:

Claude: $3/M input, $15/M output
GPT-5: $1.25/M input, $10/M output
Kimi: $0.60/M input, $2.50/M output

How I Tested This

I gave all three models identical prompts for two hard problems in an observability platform: statistical anomaly detection and distributed alert deduplication. Not toy problems, the kind of stuff that needs deep reasoning about edge cases and system architecture.

I set up everything in Cursor IDE, and tracked token usage, time, code quality, and whether it actually integrated with the existing codebase. That last part turned out to matter way more than I expected.

Quick note on the tooling: Codex CLI has gotten way better since I last used it. Streams reasoning, resumes sessions reliably, and shows you cached token usage. Claude Code is still the most polished, with inline critiques, replayable steps, and clean thinking traces. Kimi CLI feels early. No easy way to see the model's reasoning, context fills up faster, and cost tracking is basically non-existent (just a dashboard number). Made iteration painful.

Test 1: Statistical Anomaly Detection

The challenge: Build a system that learns baseline error rates, uses z-scores and moving averages, catches rate-of-change spikes, and handles 100k+ logs/minute with under 10ms latency.

Claude's Attempt

Time: 11m 23s | Cost: $1.20 | +3,178 lines across 7 files

Commit 05dbf00

Claude went all-in. Statistical detector with z-score, EWMA, and rate-of-change checks. Extensive docs. Synthetic benchmarks. It looked impressive.

Then I actually ran it.

The calculateRateOfChange() function returns Infinity when the previous window is zero, and the alert formatter calls toFixed() on it. Instant RangeError crash. The baseline isn't actually rolling; the circular buffer drops old samples, but RunningStats keeps everything, so it can't adapt to regime changes. Unit tests use Math.random(), making the whole suite non-deterministic. Oh, and none of this is wired into the actual processor pipeline.
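One way to keep a non-finite rate of change away from numeric formatting is to handle the zero-baseline case explicitly. This is a minimal sketch with hypothetical function names, not the code from the commit:

```typescript
// Hypothetical names; not taken from the Tracer repository.
function rateOfChange(previous: number, current: number): number {
  if (previous === 0) {
    // Nothing to divide by: either nothing changed or the increase is unbounded.
    return current === 0 ? 0 : Number.POSITIVE_INFINITY;
  }
  return (current - previous) / previous;
}

function describeRateOfChange(rate: number): string {
  if (!isFinite(rate)) {
    // Use words instead of pushing Infinity/NaN through number formatting.
    return "spike from a zero baseline";
  }
  return (rate * 100).toFixed(1) + "%";
}
```

The formatter only ever sees finite numbers, so the alert text stays meaningful even when the previous window was empty.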

Cool prototype. Completely broken for production.

GPT-5 Codex's Attempt

Tokens: 86,714 input (+ 1.5M cached) / 40,805 output (29,056 reasoning)

Time: 18m | Cost: $0.35 | +157 net lines across 4 files

Commit 878f313

Codex actually integrated it. Modified the existing AnomalyDetector class, wired it into index.ts. It runs in production immediately.

The edge case handling is solid, checks for Number.POSITIVE_INFINITY and uses a descriptive string instead of crashing on toFixed(). The baseline is truly rolling with circular buffers and incremental statistics (sum, sum-of-squares) that update in O(1). Time buckets align on wall-clock boundaries for predictability. Tests are deterministic with controlled bucket emissions.
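To make "truly rolling" concrete: a minimal sketch of a fixed-size window that keeps a running sum and sum of squares, and subtracts each sample as it is evicted. The class and method names are hypothetical; this is the general technique, not Codex's actual code.

```typescript
// Hypothetical sketch; names are not from the Tracer repository.
class RollingStats {
  private buffer: number[] = [];
  private next = 0; // index of the oldest sample once the window is full
  private sum = 0;
  private sumSq = 0;
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // O(1): add the new sample and remove the evicted one from the totals.
  push(value: number): void {
    if (this.buffer.length < this.capacity) {
      this.buffer.push(value);
    } else {
      const evicted = this.buffer[this.next];
      this.sum -= evicted;
      this.sumSq -= evicted * evicted;
      this.buffer[this.next] = value;
      this.next = (this.next + 1) % this.capacity;
    }
    this.sum += value;
    this.sumSq += value * value;
  }

  mean(): number {
    return this.buffer.length === 0 ? 0 : this.sum / this.buffer.length;
  }

  // Population standard deviation of the samples currently in the window.
  stdDev(): number {
    const n = this.buffer.length;
    if (n === 0) return 0;
    const m = this.sum / n;
    // Clamp at zero: rounding error can push the variance slightly negative.
    return Math.sqrt(Math.max(0, this.sumSq / n - m * m));
  }
}
```

Because old samples leave the totals, the baseline follows a regime change after one window's worth of data. On very long streams the running sums can accumulate floating-point drift, so recomputing them from the buffer now and then is a reasonable safeguard.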

There are trade-offs. The bucket approach is simpler but slightly less flexible than circular buffers. It extended the existing class instead of creating a new one, which couples statistical detection to threshold logic. Documentation is minimal compared to Claude's novel-length bundle.

But here's the thing: this code ships. Right now. As-is.

Kimi's Attempt

Time: ~20m | Cost: ~$0.25 (estimated) | +2,800 lines

Commit ed72f3f

Kimi tried to support both streaming logs and batch metrics. Added MAD and EMA-based detection. Ambitious.

The fundamentals are broken, though. It updates the baseline before checking the new value, making the z-score effectively zero. Real anomalies never fire. There's a TypeScript compilation error: DEFAULT_METRIC_WINDOW_SECONDS is used before declaration. Rate-of-change divides by previousValue without checking for zero, the same Infinity crash as Claude. Tests reuse the same log object in tight loops, never seeing realistic patterns. Nothing's integrated.
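The ordering issue is easy to see in a small sketch: score the new value against the baseline as it stood before the value arrived, then fold the value in. This uses Welford's online mean/variance update; all names are hypothetical and the threshold of 3 is an assumption.

```typescript
// Hypothetical sketch of "score first, then learn".
interface Baseline {
  count: number;
  mean: number;
  m2: number; // running sum of squared deviations from the mean
}

function zScore(b: Baseline, value: number): number {
  if (b.count < 2) return 0; // not enough history to judge
  const std = Math.sqrt(b.m2 / (b.count - 1));
  if (std === 0) {
    // Flat baseline: any deviation at all is infinitely unusual.
    if (value === b.mean) return 0;
    return value > b.mean ? Number.POSITIVE_INFINITY : Number.NEGATIVE_INFINITY;
  }
  return (value - b.mean) / std;
}

// Welford's update: folds one value into the running mean and variance.
function update(b: Baseline, value: number): void {
  b.count += 1;
  const delta = value - b.mean;
  b.mean += delta / b.count;
  b.m2 += delta * (value - b.mean);
}

function observe(b: Baseline, value: number, threshold: number = 3): boolean {
  const anomalous = Math.abs(zScore(b, value)) > threshold; // score first
  update(b, value); // then learn from the value
  return anomalous;
}
```

Swapping the two lines in `observe` lets the spike pull the mean and variance toward itself before it is scored, which shrinks its z-score and hides the anomaly.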

This doesn't even compile.

Round 1 Quick Compare

	Claude	GPT-5	Kimi
Integrated?	No	Yes	No
Edge cases?	Crashes	Handled	Crashes
Tests work?	Non-deterministic	Yes	Unrealistic
Ships?	No	Yes	No
Cost	$1.20	$0.35	~$0.25

Codex pulled ahead because it was the only one that shipped working, integrated code.

Tool Router Integration

I wanted to dogfood Tool Router, which is in beta. It lets you add any Composio app and loads tools from the appropriate toolkits only when the task context needs them, which cuts your MCP context bloat by a mile.

Before kicking off Test 2, I integrated everything through our tool router MCP that we ship with Tracer. Quick refresher on why I bother with it: Tool Router exposes all of a user's connected apps as ready-to-call tools for any agent. One OAuth handshake per user, and the AI SDK gets a unified surface instead of me hand-wiring Slack, Jira, PagerDuty, and whatever comes next.

What that buys me in practice:

Unified access with per-user auth: one router for 500+ apps, and each session only sees the integrations that the user actually connected.
No redeploys, SDK-native: new connections show up instantly with proper params/schemas, so agents can call them without glue code.

(Also: this is the exact service backing Rube MCP on the backend.) The helper that spins it up lives in packages/ai/src/composio-client.ts:

export class ComposioClient {
  constructor(config: ToolRouterConfig) {
    this.apiKey = config.apiKey;
    this.userId = config.userId || 'tracer-system';
    this.toolkits = config.toolkits || ['slack', 'gmail'];

    this.composio = new Composio({
      apiKey: this.apiKey,
      provider: new OpenAIAgentsProvider(),
    }) as any;
  }

  async createMCPClient() {
    const session = await this.getSession();

    return await experimental_createMCPClient({
      transport: {
        type: 'http',
        url: session.mcpUrl,
        headers: session.sessionId
          ? { 'X-Session-Id': session.sessionId }
          : undefined,
      },
    });
  }
}

With that in place, any LLM can drop in the same Slack/Jira/PagerDuty hooks without me juggling tokens. Swap the toolkit list and any agent, or even an internal automation, gets the same stabilized catalog.

Test 2: Distributed Alert Deduplication

The challenge: Fix race conditions when multiple processors detect the same anomaly. Handle ≤3s clock skew and processor crashes. Prevent duplicate alerts when processors fire within 5 seconds of each other.

Claude's Take

Time: 7m 1s | Cost: $0.48 | +1,439 lines across 4 files

Commit a7cac5c

Claude designed a three-layer architecture: L1 cache, L2 advisory locks + DB query, L3 unique constraints. Handles clock skew with database NOW() instead of processor timestamps. PostgreSQL advisory locks auto-release on connection close, handling crashes gracefully. The test suite is 493 lines covering cache hits, lock contention, clock skew, and crashes.

Same problem as test 1: not integrated into apps/processor/src/index.ts. The L1 cache uses Math.abs(ageMs), which doesn't account for clock skew (though L2 catches it). Advisory lock key is service:alertType without a timestamp, causing unnecessary serialization. The unique constraint blocks all duplicate active alerts, not just within the 5-second window.

Great architecture. Still just a prototype.

GPT-5's Take

Tokens: 44,563 input (+ 1.99M cached) / 39,792 output (30,464 reasoning)

Time: ~20m | Cost: $0.60 | +166 net lines across 6 files

Commit 6d9cf3b

Codex integrated it. Modified the existing processAlert function and wired in deduplication. Uses a reservation-based approach with a dedicated alert_dedupe table with expiration, simpler than advisory locks, and easier to reason about. Transaction-based coordination with FOR UPDATE locks for serialization. Handles clock skew with database NOW(). Crashes are handled through a transaction rollback that clears reservations automatically.
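The reservation idea can be sketched without a database. Below is an in-memory stand-in for a dedupe table with expiring entries; the class name, method name, and key format are hypothetical, and the real implementation relies on SQL transactions rather than a JavaScript object.

```typescript
// Hypothetical in-memory model of a reservation table with expiry.
class DedupeReservations {
  // Object.create(null) avoids collisions with inherited keys like "toString".
  private expiresAt: Record<string, number> = Object.create(null);

  // Returns true if the caller won the reservation and should send the alert.
  // nowMs should come from one shared clock (the article uses database NOW()).
  tryReserve(key: string, nowMs: number, ttlMs: number): boolean {
    this.purgeExpired(nowMs);
    if (key in this.expiresAt) {
      return false; // another processor already reserved this alert
    }
    this.expiresAt[key] = nowMs + ttlMs;
    return true;
  }

  // Stale entries are dropped on each insert, so no background job is needed.
  private purgeExpired(nowMs: number): void {
    const keys = Object.keys(this.expiresAt);
    for (let i = 0; i < keys.length; i++) {
      if (this.expiresAt[keys[i]] <= nowMs) {
        delete this.expiresAt[keys[i]];
      }
    }
  }
}
```

A single-threaded object like this has no races; in the database version the uniqueness guarantee has to come from the transaction and conflict handling, which is exactly where the remaining edge case lives.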

There's a minor race condition in the ON CONFLICT clause where both processors can pass the WHERE check before either commits. No background cleanup for expired alert_dedupe entries (though stale entries get cleaned up on each insert). The dedupe key includes projectId, treating the same service+type across projects as different; it might be intentional, but worth noting.

Production-ready except for that small ON CONFLICT fix.

Kimi's Take

Time: ~20m | Cost: ~$0.25 (estimated) | +185 net lines across 7 files

Commit 31aa8f5

Kimi actually integrated this one. Modified processAlert and wired in deduplication. Uses discrete 5-second time buckets, simpler than a reservation table. Atomic upsert with database-native ON CONFLICT DO UPDATE to handle races. Implements exponential backoff retry logic.
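For reference, a minimal sketch of what a discrete 5-second bucket key looks like. The function names and key format are hypothetical, not Kimi's actual code:

```typescript
// Hypothetical sketch of a time-bucketed dedupe key.
const BUCKET_MS = 5000;

// Floors a timestamp to the start of its 5-second bucket.
function bucketStart(timestampMs: number): number {
  return Math.floor(timestampMs / BUCKET_MS) * BUCKET_MS;
}

// Alerts with the same service, type, and bucket share one key.
function dedupeKey(service: string, alertType: string, timestampMs: number): string {
  return [service, alertType, String(bucketStart(timestampMs))].join(":");
}
```

Two alerts at 10,100 ms and 14,900 ms share a key, but alerts at 14,900 ms and 15,100 ms do not, even though they are only 200 ms apart. A bucket scheme usually has to check the neighbouring bucket as well to cover that boundary.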

Critical bugs, though. Duplicate detection compares createdAt timestamps, which are identical for simultaneous inserts, and returns the wrong isDuplicate flag. The retry logic calculates a new bucket but never uses it, passes the same timestamp, and hits the same conflict again. The severity update SQL is unnecessarily complex.

Good approach, broken execution.

Round 2 Quick Compare

	Claude	GPT-5	Kimi
Integrated?	No	Yes	Yes
Approach	Advisory locks	Reservation table	Time buckets
Critical bugs?	None (but not wired)	Minor race	Duplicate detection broken
Cost	$0.48	$0.60	~$0.25

Codex won again with cleaner integration and fewer showstopper bugs.

The Money

Total across both tests:

Claude: $1.68
GPT-5 Codex: $0.95 (43% cheaper)
Kimi: ~$0.51 (estimated from aggregate)

Codex is cheaper despite using more tokens. Claude's extended thinking and higher output costs ($15/M vs $10/M) kill you. Codex's cached reads (1.5M+ tokens) bring costs way down. Kimi's CLI only shows aggregate project spend, so I had to estimate per-test costs.
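The 43% figure follows directly from the per-test costs reported above. A small sketch of the arithmetic (Kimi is left out because its per-test numbers are estimates):

```typescript
// Per-test costs in USD, copied from the two test sections.
const claudeCosts = [1.2, 0.48];
const codexCosts = [0.35, 0.6];

function total(costs: number[]): number {
  let sum = 0;
  for (let i = 0; i < costs.length; i++) {
    sum += costs[i];
  }
  return sum;
}

// How much cheaper `candidate` is than `baseline`, as a percentage of baseline.
function percentCheaper(baseline: number, candidate: number): number {
  return ((baseline - candidate) / baseline) * 100;
}

const claudeTotal = total(claudeCosts); // about 1.68
const codexTotal = total(codexCosts); // about 0.95
const savings = Math.round(percentCheaper(claudeTotal, codexTotal)); // 43
```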

What I Actually Learned

Codex won both tests by shipping production-ready code with the fewest critical bugs. Claude made better architectures, Kimi had clever ideas, but Codex was the only one consistently delivering working code.

Why Codex won:

Actually integrates code instead of creating parallel prototypes
Catches edge cases everyone else misses (that Infinity.toFixed() bug bit both Claude and Kimi)
Both implementations are production-ready
43% cheaper than Claude

The downsides:

Less comprehensive documentation than Claude
Minor ON CONFLICT race in test 2
Takes longer (18-20m vs Claude's 7-11m), but worth it for code that works

When to Use Claude Sonnet 4.5

Best for architecture design and documentation. The thinking is genuinely excellent; the three-layer defense in test 2 shows real distributed systems understanding. Documentation is thorough (7 files for test 1). Fast execution at 7-11 minutes. The extended thinking mode with self-reflection produces well-reasoned solutions.

But it doesn't integrate anything. You get prototypes that need serious wiring. Critical bugs in both tests. More expensive at $1.68. Over-engineered (3,178 lines vs Codex's 157 net).

Use it when: You want a thoughtful architecture review or documentation pass, and you're okay spending time wiring it in and fixing bugs.

When to Use Kimi K2 Thinking

Best for creative solutions and alternative approaches. Time buckets in test 2, MAD/EMA attempts in test 1 show creative thinking. Actually integrates code like Codex. Good test coverage. Probably the cheapest (though CLI doesn't expose usage).

But there are critical bugs in core logic everywhere. Broken duplicate detection and retry in test 2. Baseline update order issues in test 1. CLI limitations (no cost visibility, context fills fast). Fundamental logic errors prevent the code from working.

Use it when: You want creative ideation and can afford to refactor the output. Budget extra time to harden everything.

Bottom Line

I'm shipping production work with GPT-5 Codex. It delivers integrated code that handles edge cases, costs 43% less than Claude, and needs minimal polish. Claude's my go-to for architecture reviews or documentation, even though I know I'll spend time wiring it in and chasing bugs. Kimi's the wild card, creative and cheap, but the logic bugs mean I budget serious refactoring time.

The real insight? All three models generate impressive-looking code. But only Codex consistently ships. Claude designs better but doesn't integrate. Kimi has clever ideas but introduces showstoppers. For real-world development where you need working code fast, Codex is the practical choice.

How to ship apps faster with full-stack Claude Code setup (Skills, MCP, Plugins)

Rohith Singh — Sun, 09 Nov 2025 15:26:07 +0000

While most agentic coding tools like Codex, Cursor, and Windsurf are adding SDKs and plugin APIs, Anthropic’s Claude Code is trying to do something a bit different. They’ve been quietly building a complete stack: Skills for domain context, Plugins for modular workflows, and MCPs for tool integrations, all connected through one environment.

I wanted to see how that actually works when you build something real.

So I picked a project I’ve been planning for a while. Luno is a personal finance platform. It includes payment integrations, cron jobs for bill reminders, an agentic chatbot (wired with Tool Router for calling tools like Gmail, Notion, Stripe, etc, integrated inside the app), household sharing, subscription tracking, and analytics.

The goal was simple: test the entire Claude Code setup. Skills, Plugins, MCP servers, Sub Agents, and slash commands, and see if it really helps speed up real-world development or just adds more setup overhead.

TL;DR

Built Luno in 2-3 days using Claude Code's full stack. Setup took a day (creating Skills, configuring Rube MCP). After that, features were shipped in 30-60 minutes instead of 8-10 hours of manual work. Cost: $12.67 for Claude Code usage (~15.5M input, 174k output tokens) + a Cursor Pro account for routine CRUD. Context7 MCP was critical in setup and development by pulling the right docs in‑session.

The infrastructure works, but requires upfront investment.

Skills taught Claude Code my patterns for workflows; Rube MCP connected everything through one server; the dev‑toolkit plugin handled security/testing/reviews. Tool Router powers the agentic chatbot. Built by hand, a project like this would usually take 2-3 months and 200+ hours.

You can find the repository here, don't forget to drop a star!

Quick look: Here’s what the dashboard looks like:

Day One: The Setup

I started by creating Skills for CC. Not because I love documentation, but because I was tired of having to explain the same patterns over and over. "Use TanStack Query for data fetching." "RLS policies for multi-tenant data." "Error boundaries here, not there."

I asked Claude Code to generate Skills for my workflow:

Feature development patterns
Database architecture (Supabase)
Tool Router integration with AI SDK
Analytics pipeline patterns
Design-to-code workflow

They're just markdown files in .claude/skills/. Nothing fancy. But here's the thing: once I had them, Claude stopped generating code that looked like it came from a tutorial. It started generating code that looked like my code.

Then I set up Rube MCP. The problem with MCPs is that they eat your context window; multiple servers = less space for the model to think. Rube connects to 500+ apps through a single MCP server. GitHub, Linear, Figma, Supabase, all through one connection. It manages a sandbox environment for tool actions and stores data there, so your context window stays free. In parallel, Context7 MCP removed a ton of context‑switching by fetching authoritative docs directly in the session.

I already had my dev-toolkit plugin from last month (when Anthropic dropped plugin support). 16 specialised agents, 10+ slash commands, MCP integrations. Things like /security-scan for OWASP reviews, /test for running tests with coverage reports. I wanted to stress-test it on something real.

Setup took a day. Then I started building.

Building Luno, my personal finance platform (The Messy Part)

I gave Claude a prompt: "Build the database schema for a personal finance platform. Transactions, accounts, categories, budgets, goals, household sharing with invitations."

It generated the complete Supabase schema. Foreign keys, indexes, RLS policies. The schema made sense because the Skills taught Claude how to structure databases. But then I looked closer, and it forgot the indexes on the household invitations table. The token and email columns needed indexes for performance. Had to point that out.

First lesson: Skills help consistency, but you still need to review.

For the UI, I passed Claude a Figma design link to get some ideas and build on top of it. I'd already set up my theme using tweakcn, so the implementation was based on that existing setup rather than an exact match of the Figma design. The designs don't match, but the UI came out clean and consistent.

Authentication was manual, and Supabase Auth handles most of it anyway.

Then I hit a wall: cost. Claude's pricing was adding up fast. I was generating a lot of code, and at ~$3 per million tokens, it gets expensive quickly. So I switched to Cursor for routine feature work, transaction management, budget tracking, and basic CRUD. Cursor's $20/month subscription made more sense for that stuff.

I came back to Claude Code when I needed to integrate Composio's Tool Router with the AI SDK for the chatbot. The docs weren't clear on some patterns, and I kept getting the integration wrong. I used Context7 MCP to fetch the actual AI SDK docs and Tool Router examples.

What is Tool Router (and why use it)?

Tool Router exposes connected apps as callable tools for your AI agent, without hand‑wiring each integration. You connect once (per user), and the AI SDK gets a unified tool surface.

Unified access + per‑user auth: One router for 500+ apps, with tools automatically scoped to each user’s connections.
Zero redeploys + AI‑SDK native: New connections appear as tools immediately; tools already come with parameters/schemas for direct calls.

In Luno, that meant email/calendar/issue flows existed only if the user had connected those apps, with no special‑case code or per‑app SDKs.

Btw, this is what powers Rube MCP on the backend.

Claude pulled the documentation and showed me the pattern:

// Initialize Tool Router MCP client
const mcpClient = await createToolRouterMCPClient(user.id)

if (mcpClient) {
  // Get tools from MCP client (AI SDK format)
  const mcpToolSet = await mcpClient.tools()

  // Combine with database tools
  const allTools = {
    ...dbTools,
    ...Object.fromEntries(
      Object.entries(mcpToolSet).map(([name, tool]) =>
        [`toolRouter_${name}`, tool]
      )
    )
  }

  // Use with streamText
  const result = streamText({
    model,
    messages: modelMessages,
    tools: allTools,
  })
}

Once I had the pattern, it clicked. Tool Router creates a session per user, exposes all connected apps as MCP tools, handles auth per user, and returns tools in AI SDK format. So if a user connects Gmail, Calendar, and Notion, the chatbot automatically gets those tools. No code changes. Dynamic tool access based on what the user has connected.

That's what makes Tool Router powerful. It's not just connecting apps, it's exposing those connections as tools your AI can use directly.

The RLS Policies That Took Three Tries

The household invitation system was probably the most complex part. Invitations expire after 7 days, need email templates, and proper permission checks. Claude got the schema right on the first try. But the RLS policies? Three iterations.

First version: members could see invitations, but couldn't check ownership hierarchy properly. Second version: fixed the hierarchy but broke the permission logic for expired invitations. Third version: finally worked. The policy checked ownership, membership, and expiration all in the right order.
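If application code is easier to read than SQL, the three checks can be sketched as a plain function. This only illustrates the ordering, it is not the actual Postgres policy; the `Invitation` shape, the field names, and the rule that owners can still see expired invitations are all assumptions made for the example:

```typescript
// Hypothetical invitation record; the real schema is not shown in this post.
interface Invitation {
  householdOwnerId: string;
  householdMemberIds: string[];
  createdAt: Date;
}

const INVITE_TTL_MS = 7 * 24 * 60 * 60 * 1000; // invitations expire after 7 days

function canViewInvitation(userId: string, invite: Invitation, now: Date): boolean {
  // 1. Ownership: the owner sees every invitation, expired or not (assumed rule).
  if (invite.householdOwnerId === userId) return true;
  // 2. Membership: users outside the household never see invitations.
  if (!invite.householdMemberIds.includes(userId)) return false;
  // 3. Expiration: members only see invitations that are still valid.
  return now.getTime() - invite.createdAt.getTime() < INVITE_TTL_MS;
}
```

Swapping steps 1 and 3 would hide expired invitations from the owner as well, which is the kind of ordering mistake that is easy to make in a policy expression.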

This is where having the plugin helped. I ran /security-scan and it caught issues I would've missed:

Trending updates weren't locked down properly
Anonymous tracking needed server‑issued signed tokens with TTL
The queue work needed proper batching
Long queries should be precomputed on a schedule
SQL indexes were missing on some aggregations

Fixed all of these before deploying. The security‑reviewer agent checks for OWASP Top 10, suggests fixes, and validates implementations.

Payment Integration and Cron Jobs

Lemon Squeezy integration was straightforward. The Skill had patterns for webhook handling and subscription management. Claude generated webhook handlers with proper signature verification on the first try.
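For reference, signature verification of this kind usually comes down to a few lines. A minimal sketch, assuming the provider signs the raw request body with HMAC-SHA256 and sends the hex digest in a header (check the Lemon Squeezy webhook docs for the exact header name and encoding); `verifyWebhookSignature` is a hypothetical helper, not the code Claude generated:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true when `signatureHex` is the HMAC-SHA256 hex digest of `rawBody` under `secret`.
function verifyWebhookSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The important detail is to verify against the raw body exactly as received, before any JSON parsing, since re-serialising the payload can change the bytes and invalidate the signature.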

Cron jobs for bill reminders were more interesting. I set up Supabase Edge Functions that:

Check upcoming bills
Send email reminders via Resend
Update notification preferences
Handle timezone conversions

The timezone handling part needed manual fixes. Claude generated code that worked, but didn't account for daylight saving time properly. Had to correct that myself.
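One way to avoid that class of bug is to never store or add UTC offsets by hand and instead ask the runtime what the wall-clock hour is in the user's IANA time zone. A minimal sketch using the built-in `Intl.DateTimeFormat`; `localHour` and `isReminderDue` are hypothetical helpers, not the fix that shipped in Luno:

```typescript
// Hour of day (0-23) that `instant` falls on in the given IANA time zone.
function localHour(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hour12: false,
  }).formatToParts(instant);
  const hour = parts.find((part) => part.type === "hour");
  if (!hour) throw new Error("formatter returned no hour part");
  // Some engines render midnight as "24" in 24-hour mode, so normalise it.
  return Number(hour.value) % 24;
}

// A reminder is due when the user's wall clock shows the chosen hour.
function isReminderDue(instant: Date, timeZone: string, reminderHour: number): boolean {
  return localHour(instant, timeZone) === reminderHour;
}

// 9:00 in New York is 14:00 UTC in winter (EST) but 13:00 UTC in summer (EDT).
console.log(isReminderDue(new Date("2025-01-15T14:00:00Z"), "America/New_York", 9)); // true
console.log(isReminderDue(new Date("2025-07-15T13:00:00Z"), "America/New_York", 9)); // true
console.log(isReminderDue(new Date("2025-07-15T14:00:00Z"), "America/New_York", 9)); // false
```

A fixed "UTC minus 5" offset would send the summer reminder an hour late; resolving the hour through the time zone name lets the runtime's time zone database handle the switch.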

What Actually Shipped

After 2-3 days, I had a production‑ready finance platform: transactions with categories, budgets with alerts, household sharing (7‑day invitations), subscription tracking, analytics, an agentic chatbot (Tool Router), automated bill reminders, and Lemon Squeezy payments.

The analytics dashboard was the last piece. Claude generated working code, but it split queries that should have been combined and missed opportunities to memoise. I optimised those manually.

Dev-Toolkit Plugin (Quick Context)

Since I keep mentioning it, this plugin (built when Anthropic launched plugins) bundles the day‑to‑day work, security reviews, testing, and system design into slash commands and specialised agents.

Core agents: security reviewer (OWASP), performance/load tester, compliance/testing/architecture specialists.
Core commands: /test, /code-review, /security-scan, /deploy, /monitor.

You can install it: rohittcodes/claude-plugin-suite

The plugin meant I could run security reviews and code standardization continuously instead of at the end. Caught a lot of issues early.

The Real Numbers

Let's talk money and time, because that's what actually matters.

Setup (day 1): ~1 day total (Skills ≈4h, Rube MCP ≈30m, Supabase ≈2h).

Development (2-3 days):

Claude Code: $12.67 (architecture, complex integrations, Tool Router)
- Usage: ~15.5M input tokens, ~173k output tokens
Cursor: Pro account (routine CRUD, UI polish)
Rube MCP: Free tier
Total: $12.67 + Cursor Pro

Time saved: 200+ hours. A project like this typically takes 2-3 months of solid work.

But here's the context: that $12.67 is after I switched to Cursor Pro for routine work. If I'd used Claude Code for everything, it would've been closer to $50-60. The cost management part is real; you need to be strategic about when you use the expensive model.

Would I Do It Again?

Absolutely. But if I were starting over, I'd approach a few things differently. I'd create Skills on day zero, before writing a single line of code. The consistency they bring matters more than moving fast in the beginning, and every feature afterwards benefits from having those patterns established. The plugin would be there from the start, too; catching security issues and enforcing code standards continuously is far better than fixing problems at the end.

I'd also budget more realistically. If you're building something serious, plan for $50-100 a month in AI costs. It's still cheaper than hiring someone or spending months of your own time, but it's not free. And Context7 MCP would be non-negotiable from the start: having documentation accessible in-session instead of constantly context-switching to docs is a massive productivity unlock.

The thing is, you still review code. You still make architecture decisions. You still fix edge cases. Claude Code handles the boring stuff (boilerplate, migrations, and configuration) so that you can focus on the hard problems: architecture, security, performance, and user experience. That upfront day spent on Skills and setup? It paid for itself ten times over in consistency and speed.

Final Thoughts

Claude Code's infrastructure is real. It's not hype. But it requires upfront investment in Skills, plugins, and MCP configuration. Once that's done, development becomes more conversational. "Build a subscription tracking feature with email reminders" actually works.

The value isn't replacing developers, it's handling the boring stuff so you can focus on what matters: the architecture, the security, the performance, the user experience. Luno took me 2-3 days, not 3 months. It's production‑ready with proper protection, error handling, and testing. That's the difference the full stack makes.

If you're building with Claude Code, invest in the infrastructure first. Create your Skills. Set up your MCPs properly. Build or install plugins that match your workflow. The upfront cost is worth it.

How to Connect Salesforce to OpenAI Agent Builder

Rohith Singh — Sat, 25 Oct 2025 03:48:20 +0000

OpenAI's Agent Builder gives you a straightforward way to build and deploy AI agents, combining models, tools, and logic into one visual workspace. This no-code design lets you focus on how your agent should work rather than dealing with the underlying infrastructure.

Sales teams often spend hours managing leads, updating contacts, and juggling follow-ups, repetitive tasks that take time away from closing deals. By connecting Agent Builder to external platforms like Salesforce through an MCP server (such as Rube), you can create agents that handle these tasks automatically. The MCP handles authentication, API calls, and data formatting, letting your agent focus on workflow logic rather than infrastructure.

In this guide, we’ll build a Salesforce Agent using the Rube MCP. This setup allows your agent to manage contacts, update deals, and interact with leads automatically, so you can spend more time closing deals instead of managing data.

What is Agent Builder?

Agent Builder is OpenAI’s visual, no-code platform for designing, building and deploying AI workflows. Instead of writing lines of code, you can simply drag and drop components onto a canvas and connect them to define how your Agent should behave. Each component, or “node”, has a specific purpose. Some of them handle requests, others run the logic and enforce security rules and some connect to external systems via MCPs.

With Agent Builder, you can create complex, multi-step workflows without worrying about the infrastructure or API layer. It also integrates with ChatKit widgets to display results in an interactive interface, and comes with built-in evaluation tools to test performance and identify bottlenecks. For a Salesforce workflow, this means you can automate repetitive tasks, manage leads and contacts, and orchestrate multi-step processes with minimal setup, all visually.

To learn more about Agent Builder, you can check out this how-to guide.

Why Rube MCP matters to your Salesforce workflow

When your MCP Client uses multiple MCP servers in a workflow, connecting them all directly can quickly consume the LLM’s context window, slowing down or breaking your workflow. Rube MCP (Model Context Protocol) solves this by acting as a single entry point for all your tool connections. Your agent communicates with Rube, which handles authentication, API calls, and data formatting for each external tool.

For Salesforce, using an MCP implementation like Rube means you don’t have to write custom OAuth flows or manage API keys for every endpoint. Rube provides a unified interface for hundreds of apps, including Salesforce, Gmail, Notion, and more. It also dynamically loads only the tools needed for a given context, keeping the agent’s workflow efficient and reducing the chance of context overload. This setup allows your Salesforce agent to focus on logic and decision-making, while the MCP handles the complexities of API integration and token management.

How to Add Salesforce to Agent Builder

Before we start building our Salesforce Workflow, we need to connect Rube MCP with Agent Builder. This will allow our agent to communicate with Salesforce securely through Rube’s unified MCP interface.

Step 1: Set up Rube MCP

Go to Rube, and open your Dashboard.
Navigate to Apps → Marketplace.
Search for Salesforce and click Enable App.
Choose the Recommended Composio approach, select the required scopes, and click Setup.
Add your Salesforce domain, click Connect, and authorize the app in the Salesforce authorization window.

I’ve attached a video you can follow. Note that you won’t see the scope/domain steps in the demo because Salesforce was already enabled on my account.
After the app is enabled, go back to the Rube dashboard and click Install Rube Anywhere.
In the modal, select Agent Builder and copy the MCP URL; you’ll need this in Agent Builder.

Scroll a little further and click Generate Token to create an access token. Copy the generated token; this is the value you’ll paste into Agent Builder (Authorization → Access token / API key).

Step 2: Connect Rube MCP inside Agent Builder

Open Agent Builder and create a new workflow by clicking “Create.”
You’ll see a canvas with two default nodes: Start and Agent. Delete the connecting edge between them for now; we’ll set up our workflow logic later.
Click on the Agent node and rename it to “Salesforce Agent”. Then, in the Instructions field, enter the following:
```
You are a helpful assistant.
```
We’ll refine this agent later with Guardrails and Logical Nodes for better control.

In the agent node’s configuration panel, click the “+” icon beside Tools. From the dropdown, select “MCP Server” → then click “+ Server”.

Paste the MCP URL you copied from Rube earlier. In the Authorisation field, choose Access Token / API Key and paste your generated token. Give your server a name like “Rube” and click Connect.

Building the Salesforce Agentic workflow

With the Salesforce connection ready inside Agent Builder, it’s time to make our agent actually do something useful, like creating or updating leads automatically. But before jumping straight to actions, we’ll make sure our workflow is secure, context-aware, and can correctly interpret what users want to do.

Step 1: Add GuardRails

Let’s start by protecting the workflow from misuse.

From the toolbar, drag the GuardRails node onto your canvas and connect it with the Start node.
Give it a label like “GuardRails” and enable the Jailbreak setting. This helps your workflow detect and block prompt injection attempts or malicious instructions.

Optionally, you can enable “Sensitive data” checks if your workflow will deal with customer details like emails or phone numbers.

Step 2: Create an Intent Classifier

Next, we’ll teach the agent to understand what a user is asking, whether it’s to create a new lead, update contact info, or close an opportunity.

Add a new Agent node to the canvas and connect it to the “Pass” output of your GuardRails node, and label it “Intent Classifier”.

Give the agent this instruction:

Understand what the user wants to do in Salesforce. Classify the intent into one of these categories: "create_lead", "update_lead", or "close_deal".

Change the Output format to JSON, then open Advanced Settings → JSON Schema and paste this schema:

{
  "type": "object",
  "properties": {
    "classification": {
      "type": "string",
      "enum": [
        "create_lead",
        "update_lead",
        "close_deal"
      ],
      "description": "classification of user's intent",
      "default": ""
    }
  },
  "additionalProperties": false,
  "required": [
    "classification"
  ],
  "title": "response_schema"
}

Update the schema and connect the Fail output of the GuardRails node to an End node.

Step 3: Add Logic Routing

We’ll now decide where each request should go depending on the classification output.

Drag an If/Else node onto the canvas, connect it to the Intent Classifier, and set the Case name to “isValid”.

In the expression field, paste:

input.output_parsed.classification == "create_lead" || 
input.output_parsed.classification == "update_lead" || 
input.output_parsed.classification == "close_deal"

Connect the Else output to the End node so unrecognized intents terminate safely.
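To make the schema and the If/Else expression concrete, here is roughly what the two nodes do together, written as ordinary code. This is a sketch of the logic only, not how Agent Builder implements it; `parseClassification`, `route`, and the return labels are hypothetical names:

```typescript
const INTENTS = ["create_lead", "update_lead", "close_deal"] as const;
type Intent = (typeof INTENTS)[number];

// Mirrors the schema: an object with a `classification` string taken from the enum.
function parseClassification(raw: string): Intent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const value = (data as Record<string, unknown>)["classification"];
  const known = typeof value === "string" && (INTENTS as readonly string[]).includes(value);
  return known ? (value as Intent) : null;
}

// Mirrors the If/Else node: known intents go to the agent, everything else ends.
function route(raw: string): "salesforce_agent" | "end" {
  return parseClassification(raw) === null ? "end" : "salesforce_agent";
}

console.log(route('{"classification":"create_lead"}')); // "salesforce_agent"
console.log(route('{"classification":"delete_everything"}')); // "end"
```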

Step 4: Connect Salesforce Agent Node

Finally, connect the “If/Else” node to your Salesforce Agent node. In the Salesforce node’s configuration, update the instruction as:

You are a Salesforce CRM assistant. Perform the user’s intended action in Salesforce 
based on this classification: {{input.output_parsed.classification}}. 
Use the following user input to complete the task: {{workflow.input_as_text}}
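The double-brace placeholders are filled in from earlier nodes before the instruction reaches the model. A rough sketch of that substitution, assuming simple dot-path lookup; `renderTemplate` and the sample context are hypothetical and say nothing about how Agent Builder resolves variables internally:

```typescript
// Replace each {{path.to.value}} with the matching value from `context`.
// Placeholders whose path cannot be resolved are left untouched.
function renderTemplate(template: string, context: Record<string, unknown>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, path: string) => {
    let current: unknown = context;
    for (const key of path.split(".")) {
      if (typeof current !== "object" || current === null || !(key in current)) return match;
      current = (current as Record<string, unknown>)[key];
    }
    return String(current);
  });
}

// Illustrative values only.
const sampleContext = {
  input: { output_parsed: { classification: "create_lead" } },
  workflow: { input_as_text: "Add John Doe as a new lead" },
};

console.log(
  renderTemplate("Intent: {{input.output_parsed.classification}}. Text: {{workflow.input_as_text}}", sampleContext)
); // "Intent: create_lead. Text: Add John Doe as a new lead"
```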

That’s it, you’ve built a Salesforce-specific, guard-railed workflow that can understand user goals, route them intelligently, and perform the right CRM operation securely.

An Example Interaction

Click on the Preview button in the top-right corner of the canvas. This will open a sidebar with a chat interface where you can test the workflow by typing in natural language inputs like:

Add John Doe as a new lead with the title Sales Manager at Acme Corp.

Note: If Rube doesn’t detect your Salesforce account in the Preview panel, you can pass the domain name in the chat and ask the agent to authorize you. Click the generated URL and log in with your Salesforce account.

If everything is set up correctly, your agent should classify the intent, route it through the workflow, and create the record in Salesforce via Rube MCP. You’ll see the confirmation and logs right inside the preview panel.

What else can you build with this setup?

This setup is just the starting point. Once your Salesforce connection is stable, you can layer on additional automations to fit your sales workflow:

Multi-agent workflows: Combine Salesforce with Gmail or Slack to auto-send follow-ups when a lead status changes.
Deal tracking: Build a workflow that checks for deals stuck in the same stage for too long and notifies your team.
Weekly summaries: Have the agent generate and send a performance summary to your inbox every Monday morning.

Since the agent already runs through Rube MCP, connecting any new app is just a matter of enabling it in your Rube dashboard, no extra OAuth scopes or schema mapping.

Conclusion

Agent Builder with Rube MCP makes Salesforce automation accessible to everyone. You can build sophisticated CRM workflows without writing code, handle complex authentication automatically, and focus on the business logic that matters.

The combination of visual workflow design, built-in security, and seamless Salesforce integration creates a powerful platform for sales automation. Whether you're managing leads, updating opportunities, or orchestrating multi-step processes, this setup scales from simple tasks to complex enterprise workflows.

How to Automate HubSpot CRM Using OpenAI Agent Builder

Rohith Singh — Thu, 16 Oct 2025 14:21:28 +0000

OpenAI's Agent Builder provides you with the most straightforward set of tools to build and deploy AI Agents with ease. It brings models, tools (including MCPs, Web Search, Sub Agents, etc.), and logic into a single visual workspace, allowing you to focus on what your agent should do instead of worrying about the underlying infrastructure.

In this guide, we'll build a CRM agent that can help you manage your contacts and deals in HubSpot CRM, so you can focus on the things that matter most to you. Before we begin, let’s understand what an Agent Builder is and why you would use one.

What is Agent Builder?

Agent Builder is a visual, no-code platform for designing, building, testing, and deploying AI workflows through a drag-and-drop interface. It includes a set of nodes, such as Agent, FileSearch, GuardRails, MCPs, and Logical Nodes. You can drag and drop these nodes on the canvas, configure them with the required inputs, and connect them with edges to create your own agentic workflow. The nodes are designed to compose logically, so you can build a complex agentic workflow with ease.

Check out this blog post on how to build agents with Agent Builder.

You can also use the ChatKit Widgets to output/display the results in a widget and let your users interact with it to get the results. You can learn more about the complete Agent Kit here: OpenAI Agent Kit, which describes building and deploying AI Agents from scratch. You can also play with the templates provided by OpenAI to get started with the Agent Builder.

They also provide a comprehensive Evaluation Tool to test the performance of your Agent and identify bottlenecks, so you don't need anything further to test your workflows.

Rube MCP

If you use a lot of MCPs in your day-to-day workflow and connect them all directly to your client, they will quickly eat up the context window. Rube solves this by providing a single point of entry for all your MCPs: you connect to Rube, and it handles the authentication and API calls for you.

It is a universal MCP Server built on top of Composio's existing toolkit infrastructure. Instead of writing custom OAuth flows for each MCP or managing API keys, you can use Rube as a unified MCP interface for more than 500 apps, including HubSpot, Salesforce, Google Sheets, Airtable, Notion, etc., which work seamlessly with the Agent Builder and most other MCP Clients.

You can ask Rube to authenticate you with the app (if your app works on OAuth2); it'll handle the OAuth flow for you and get the access token. If your app works on an API key, you can just provide the API key to Rube, and it'll use it to make the API calls for you.

How to Add HubSpot to Agent Builder

Before we start building our CRM agent, we need to have a few things in place:

Step 1: Set up Rube MCP

Go to Rube. Scroll down and click “Install Rube Anywhere”

In the modal, select Agent Builder. Copy the MCP URL. We’ll paste this into Agent Builder soon.
Scroll a little further and hit the "Generate Token" button to generate the access token.

Copy that token; we’ll use it to authorise Agent Builder.

Step 2: Connect Rube MCP inside Agent Builder

Open Agent Builder and create a new workflow by clicking on “Create”.
It will redirect you to a new page with a canvas and a toolbar on the top left, and two nodes on the canvas: "Start" and "Agent." We'll be using the same Agent node later for our multi-agent workflow. Delete the edge between both nodes by clicking on the edge and hitting the delete icon on the configuration panel.

Click the "Agent" node, name the agent "HubSpot CRM Agent," and give the agent instructions as below:
```
You are a CRM agent that can help me manage my contacts and deals in HubSpot CRM.
```
Note: We'll be editing this node later with the inputs for adding guardrails and logical nodes to the agent.
Click on the "+" icon beside Tools in the agent node's configuration panel. It will open a dropdown; select "MCP Server" → click the "+ Server" button in the dialogue that appears.
Paste the MCP URL you copied from Rube, and in the Authorisation field, choose Access token / API Key and paste the token you generated.

Give the server a label like “Rube” and hit Connect.

We’ve now successfully connected HubSpot (via Rube MCP) to Agent Builder.

Building a Secure Workflow Using Agent Builder

Now that we have an Agent Node in our workflow, we need to add some guardrails and logical nodes to ensure users don't jailbreak the agent through prompt injection or other malicious attacks.

Step 1: Adding GuardRails

Select the GuardRails node from the toolbar and drag it onto the canvas. Connect it with the Start node, and in the configuration panel, give GuardRails a label of "GuardRails.” Enable "Jailbreak" to detect and prevent the agent from being used for malicious purposes.

Step 2: Add Intent Classification

Now, create a Classification Agent for the workflow to classify the user's intent and route the request to the appropriate node. For this guide, we'll focus on creating a simple classification agent to classify the user's intent to create, update, or delete a contact and route the request to the appropriate node.
Drag and drop the Agent node from the toolbar, connect it with the "Pass" option of the Guardrails node, and in the configuration panel, give the Agent a label of "Classification Agent." Give the agent instructions as below:
```
Classify the user’s intent into one of the following categories: "create_contact", "update_contact", or "delete_contact".
```

For the output format, select "JSON," and in the JSON schema field, add the following schema by going to Advanced Settings:

{
  "type": "object",
  "properties": {
    "classification": {
      "type": "string",
      "enum": [
        "create_contact",
        "delete_contact",
        "update_contact"
      ],
      "description": "classification of user intent",
      "default": ""
    }
  },
  "additionalProperties": false,
  "required": [
    "classification"
  ],
  "title": "response_schema"
}

Update the schema by clicking the "Update" button in Advanced Settings. For the "Fail" option, we will use the "End" node to terminate the workflow, ensuring that if the GuardRails node fails, the workflow will conclude.

Step 3: Route Logic

Now, we need to use the output variables from the Classification Agent in the next node, which will be a logical node to route the request to the appropriate node based on the user's intent.
- Drag and drop the "If/else" node from the toolbar, connect it with the "Classification Agent" node, and in the configuration panel, give a CaseName to the logic: "isValid." Then pass this expression in the next input field:
```
input.output_parsed.classification == "create_contact" || input.output_parsed.classification == "update_contact" || input.output_parsed.classification == "delete_contact"
```
- For the "Else" option, connect it with the "End" node to end the workflow.
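The same intent names now appear in three places: the agent instruction, the schema enum, and the routing expression. If one of them drifts, the classifier can never emit the value the router is waiting for, and those requests quietly fall through to Else. A small sketch of a consistency check; `findUnroutableIntents` and the sample lists are hypothetical:

```typescript
// Intents the router checks for that the schema enum can never produce.
function findUnroutableIntents(
  schemaEnum: readonly string[],
  routedIntents: readonly string[]
): string[] {
  const allowed = new Set(schemaEnum);
  return routedIntents.filter((intent) => !allowed.has(intent));
}

const routedIntents = ["create_contact", "update_contact", "delete_contact"];

// Names line up: nothing is unroutable.
console.log(findUnroutableIntents(["create_contact", "update_contact", "delete_contact"], routedIntents)); // []

// One enum value was spelled differently: the router's "delete_contact" can never match.
console.log(findUnroutableIntents(["create_contact", "update_contact", "remove_contact"], routedIntents)); // ["delete_contact"]
```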

Now, we need to use the "HubSpot CRM Agent" we created at the start of the workflow and connect it with the "isValid" option of the "If/else" node.
- Click on the "HubSpot CRM Agent" node, and in the configuration panel, give the agent instructions as below:
```
You are a HubSpot CRM assistant, perform user action based on user's intention: {{input.output_parsed.classification}}. User input: {{workflow.input_as_text}} 
```

And that's it, you have a secured workflow that can classify the user's intent and route the request to the appropriate node based on the user's intention.

Testing the workflow

Click on the "Preview" button in the top right corner of the canvas. It will open a sidebar with a chat interface. You can test the workflow by typing in the chat input and seeing the results.

Evaluating the performance and publishing the workflow

To determine how your workflow is performing, you can use the "Evaluation" tool in the top right corner of the canvas by clicking the "Evaluate" button.

For deploying the agent, you can click the "Publish" button, give a name to your workflow, and click on the "Publish" button to deploy the workflow.

Taking things further with ChatKit

You can add more nodes to the workflow to enhance its robustness and further secure it according to your needs. You can also create ChatKit widgets to display the results and let your users interact with them.

OpenAI also provides you with the feature of publishing apps to OpenAI, which other users can access to use your workflow. You can learn about it here: ChatKit Overview, ChatKit Widgets, Custom ChatKit, Voice Agents (speech-to-speech architecture)

Conclusion

In this guide, we built a HubSpot CRM Agent using OpenAI’s Agent Builder, integrating it with Rube MCP and adding guardrails, logical nodes, and intent classification to make the workflow secure and efficient. In practice, when I was working with the MCP node, getting agents to produce payloads that matched each MCP tool's schema became quite complex. Transform nodes can help stabilize the input, but using them can also make the workflow significantly more complex.

Nevertheless, Agent Builders are powerful because they let you design complex AI workflows visually, combine multiple tools and APIs, and focus on the logic of your agents rather than the underlying infrastructure. They make it easier to automate repetitive tasks, connect with various services, and quickly iterate on your agent’s behavior without writing a lot of boilerplate code.

Improving your coding workflow with Claude Code Plugins

Rohith Singh — Tue, 14 Oct 2025 14:09:16 +0000

I've been using Claude Code for a while now, and it keeps getting better and better. Whether it's the developer experience, the models Anthropic brings in, or the continuous improvements, everything has been solid. When Anthropic dropped plugin support on October 9, 2025, I was curious to see what they'd built. It's actually pretty useful for the development setup I prefer.

You know how you always end up with this messy setup of slash commands, custom agents, and MCP servers scattered across different projects? And then, when your teammate asks, "How do I set up the same thing on my machine?" you realise you have no idea how to recreate your own setup. Well, plugins solve that problem. They let you bundle all your customisations into shareable packages that install with a single command. Think of it like packaging your favourite tools and features into a single file, which you can share with your team or use in your own workflow. I've been using them for a few days now, and it's been a game-changer for my workflow. In this post, I'll be going over what plugins are, how to configure them, and how I've adopted them in my workflow.

TL;DR

Claude Code plugins are shareable packages that bundle slash commands, specialised agents, MCP servers, and hooks into single installable units.
They solve the "how do I set up the same agentic workflow for my setup" problem by letting teams standardise their agentic development setups.
The ecosystem is exploding with community marketplaces offering everything from DevOps automation to complete development stacks.
Installation is pretty straightforward. You need to add a marketplace, browse for plugins in that marketplace, and install them. I've been using them for a few days now, and they've genuinely improved my workflow.

A quick look at my plugin-powered workflow

If you’re like me and want to see things running before the how and why, here’s a short demo of my actual setup, with plugins, Sub Agents, Slash commands, and Rube MCP working together with Linear.

What Are Claude Code Plugins?

Alright, let's get technical for a second. According to the official docs, plugins are basically lightweight packages that bundle together:

Slash commands: Your custom shortcuts for stuff you do all the time with Claude Code
Subagents: Specialized AI agents that handle specific tasks (think of a database management flow, API testing, etc.)
MCP servers: The standardized way to connect Claude Code to external tools and data sources
Hooks: Custom behaviors that trigger at specific points in your workflow

How Plugins Work

Plugins use a standardized JSON configuration format that defines:

Metadata: Name, version, description, author
Components: Which slash commands, agents, MCP servers, and hooks to include
Dependencies: Other plugins or tools required for the setup
Configuration: Default settings and environment variables you’d need

The plugin system is built on top of Claude Code's existing extension points. So as a contributor, it's not reinventing the wheel - it's just making everything more organised and shareable.
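As a quick illustration of the metadata part, here is a tiny check that a manifest object carries the four fields listed above. This is a sketch only: the field list comes from this post, the exact shape of each value is not assumed, and `missingMetadataFields` is a hypothetical helper rather than anything Claude Code ships:

```typescript
// Metadata fields named in the list above (assumed, not the official schema).
const REQUIRED_METADATA = ["name", "version", "description", "author"] as const;

// Returns the metadata fields that are absent or empty in a parsed manifest.
function missingMetadataFields(manifest: Record<string, unknown>): string[] {
  return REQUIRED_METADATA.filter((field) => {
    const value = manifest[field];
    return value === undefined || value === null || value === "";
  });
}

// Illustrative manifest; the values are made up.
const sampleManifest: Record<string, unknown> = {
  name: "example-plugin",
  version: "0.1.0",
  description: "",
};

console.log(missingMetadataFields(sampleManifest)); // ["description", "author"]
```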

According to Anthropic, a standard plugin structure should look like this:

enterprise-plugin/
├── .claude-plugin/           # Metadata directory
│   └── plugin.json           # Required: plugin manifest
├── commands/                 # Default command location
│   ├── status.md
│   └── logs.md
├── agents/                   # Default agent location
│   ├── security-reviewer.md
│   ├── performance-tester.md
│   └── compliance-checker.md
├── hooks/                    # Hook configurations
│   ├── hooks.json            # Main hook config
│   └── security-hooks.json   # Additional hooks
├── .mcp.json                 # MCP server definitions
├── scripts/                  # Hook and utility scripts
│   ├── security-scan.sh
│   ├── format-code.py
│   └── deploy.js
├── LICENSE                   # License file
└── CHANGELOG.md              # Version history

How MCPs Can Power Plugin Integrations

The best thing I personally like about plugins is that they also let you share MCP configs (.mcp.json) within marketplaces. In my setup, I'm using several MCP integrations that handle different parts of my workflow:

Rube: This has become one of my go-to MCP server choices these days. Instead of manually configuring each MCP server, Rube provides a unified interface to discover, connect, and manage 500+ app integrations. I can browse and connect to any supported app, manage API keys securely, and orchestrate workflows across multiple services.
Vercel MCP: For deployment automation and project management. It connects directly to my Vercel projects, so I can deploy, manage domains, and check deployment status without leaving Claude Code.
Airtable MCP: For data management and project tracking. I can create records, update databases, and manage structured data directly through slash commands without switching between tools.

These MCPs work together seamlessly. I can deploy to Vercel, create Linear issues for any problems, and manage everything through Rube's interface. This is what makes my plugin setup work; it's not just about the plugins themselves, but the ecosystem of integrations they can access through MCPs.

Plugin vs Marketplace vs Individual Components

To understand how this all fits together, let’s first try to understand individual pieces:

Individual Components: Single slash commands, agents, or MCP servers, which you can configure manually for your setup
Plugins: Bundled collections of related components that work together like a package
Marketplaces: Repositories that host multiple plugins with discovery and installation tools

And now let’s say you're working on a web app that needs deployment automation. A DevOps plugin might give you:

A /deploy slash command for secure, one-command deployments
A specialized agent that knows your infrastructure inside out
MCP servers that connect to your cloud providers
Hooks that run security scans before every deployment

All of this can be installed with a single command, so that you don’t need to copy-paste configs or set every little piece manually between projects.

Why Would You Even Want to Use Plugins?

Based on Anthropic's announcement, plugins solve some real problems that most of us face daily. Engineering leads can create standardised setups that everyone uses. Everyone gets the same tools, same configurations, same shortcuts, with no additional setup.

If you maintain open source projects, you can now ship slash commands that help developers use your stuff correctly, which can help reduce the endless GitHub issues about setup problems. Just install the plugin and everything works the way it's supposed to with Claude Code. You know that debugging setup you spent weeks perfecting for your agentic workflow? Now you can package it and share it with your team (or anyone who has access to the project). The same goes for deployment pipelines, testing harnesses, whatever. Instead of everyone figuring out their own way to do things, you can share the good stuff.

Instead of manually connecting to every service you use, plugins handle the MCP server setup with proper security and configuration—one less thing to mess up. If you're building frameworks or leading technical teams, you can package all your customisations together. Think of it like a starter template, but way more powerful.

The Plugin Ecosystem and Marketplace

The plugin ecosystem is already exploding with community-driven marketplaces. I've been going through these community marketplaces, and some of them are genuinely impressive:

Community Marketplaces

Seth Hobson has been doing something special for the past few months. He's curated over 80 specialized sub-agents that you can install instantly. We're talking about sub-agents for database management, API testing, and code review: a whole production-ready system. It's like having a team of on-demand specialists.
My Personal Plugin Marketplace is a comprehensive collection that includes 16 specialized agents, 10+ slash commands, and MCP integrations for 500+ app connections that I personally love to use. It's designed to give you everything you need for DevOps, testing, security, languages, and architecture in one package.
Jeremy Longshore's Claude Code Plugins is a comprehensive marketplace and educational hub that's particularly impressive for its breadth. With 20+ plugin packs covering everything from DevOps automation to AI/ML engineering, crypto trading tools, and even creator studio workflows, it's one of the most diverse collections available. What sets it apart is the educational focus - it's not just about installing plugins, but understanding how they work. The marketplace includes detailed learning paths, templates for building your own plugins, and comprehensive documentation. It's like a complete ecosystem for both plugin users and creators.
Dan's marketplace focuses on the practical stuff - DevOps automation, documentation generation, project management. The kind of tools that actually save you time instead of just looking cool.
The AITMPL marketplace is interesting because it provides complete development stacks. Think "I want to build a React app with Stripe integration" and they've got a plugin that sets up everything you need.

How Marketplaces Work

The cool part about the plugin ecosystem is that creating your own marketplace is dead simple. You just need a git repo with a properly formatted .claude-plugin/marketplace.json file. The docs walk you through the format, and honestly, it's not that complicated.

A marketplace is essentially a GitHub repository with:

A .claude-plugin/marketplace.json file at the repository root
Plugin directories with their own plugin.json files
README files for documentation
Version tags for releases
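
As a rough illustration of that layout, here is a minimal Python sketch that scaffolds a marketplace repository in a temporary directory. The field names (name, owner, plugins, source), the location of plugin.json inside a .claude-plugin folder, and the my-marketplace and devops-tools names are all assumptions for illustration; check the official docs for the exact format before relying on any of it.

```python
import json
import tempfile
from pathlib import Path


def scaffold_marketplace(root: Path, marketplace: str, plugin: str) -> Path:
    """Create a minimal marketplace layout under root; return the manifest path."""
    # Marketplace manifest: .claude-plugin/marketplace.json at the repo root.
    manifest_dir = root / ".claude-plugin"
    manifest_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "name": marketplace,
        "owner": {"name": "example-owner"},  # hypothetical owner
        "plugins": [
            {
                "name": plugin,
                "source": f"./plugins/{plugin}",  # assumed relative-path source
                "description": "Example plugin entry",
            }
        ],
    }
    manifest_path = manifest_dir / "marketplace.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))

    # Each plugin directory carries its own plugin.json
    # (assumed location: .claude-plugin/ inside the plugin directory).
    plugin_meta_dir = root / "plugins" / plugin / ".claude-plugin"
    plugin_meta_dir.mkdir(parents=True, exist_ok=True)
    plugin_meta = {"name": plugin, "version": "0.1.0", "description": "Example plugin"}
    (plugin_meta_dir / "plugin.json").write_text(json.dumps(plugin_meta, indent=2))

    # README for documentation.
    (root / "README.md").write_text(f"# {marketplace}\n")
    return manifest_path


with tempfile.TemporaryDirectory() as tmp:
    path = scaffold_marketplace(Path(tmp), "my-marketplace", "devops-tools")
    manifest = json.loads(path.read_text())
    created = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )
    print("\n".join(created))
```

Version tags are left out here because they live in git rather than in the file tree.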

Setting Up Plugins in Claude Code

Alright, let's get you set up. The installation process is actually pretty straightforward:

1. Add a Marketplace

First, you need to add a marketplace to your Claude Code installation:

/plugin marketplace add user-or-org/repo-name

For example, to add Anthropic's official marketplace:

/plugin marketplace add anthropics/claude-code

2. Browse and Install

Once you've added a marketplace, just use the /plugin menu to browse what's available. The interface is clean and shows you exactly what each plugin includes - no surprises.

You can also install specific plugins directly:

/plugin install plugin-name

3. Managing Your Plugins

Here's the smart part - plugins are designed to be toggled on and off. Working on a database-heavy project? Enable your database plugin. Switching to frontend work? Disable it to keep your context clean. No more bloated setups.

You can manage your plugins with:

/plugin list          # See all installed plugins
/plugin enable name   # Enable a specific plugin
/plugin disable name  # Disable a specific plugin
/plugin remove name   # Remove a plugin entirely

Current Limitations

While plugins are genuinely useful, the ecosystem is still young and has some rough edges. I've run into plugin management issues on my Windows setup where the TUI shows inconsistent states and there's no clear way to remove failed installations. I've documented these issues in detail on GitHub in case you run into similar problems. Despite these challenges, the benefits still outweigh the current limitations.

My Personal Setup (and how you can use the same)

Let me walk you through exactly what I'm running and how you can set up the same thing for your setup or workflow. I've been iterating on this for a few days after the release, and I think I've got something that actually works. The setup includes a set of:

16 specialised sub agents (DevOps, Testing, Security, Architecture).
10+ slash commands, including /deploy, /test, code-review, etc.
8+ MCP integrations for all the tasks I need in my dev workflow.

If you want to replicate the setup, you can install the marketplace and then the plugin using:

/plugin marketplace add rohittcodes/claude-plugin-suite
/plugin install claude-plugin-suite@claude-plugin-suite

That’s it, your Claude Code instance now has a full CI/CD, testing and automation setup baked in. One of the most powerful aspects of the plugin system is that you can create your own. I built my Claude Plugin Suite because I couldn't find exactly what I needed with MCPs, and it's been gratifying.

If you want to explore the ecosystem, check out the official documentation, explore the community marketplaces, and see what works for your setup.

Wrapping Up

Claude Code plugins are genuinely useful. They take Claude Code from being a powerful Agentic Coding tool to something that actually adapts to how you work. The plugin ecosystem is still young, but it's already showing promise for me. You've got community-driven marketplaces, official Anthropic support, and a growing collection of plugins that solve actual problems. Whether you're working solo and want to streamline your workflow, or you're leading a team and need to standardize practices, plugins give you a path to a more efficient, consistent development experience.

Claude Sonnet 4.5 vs. GPT-5 Codex: Best model for agentic coding

Rohith Singh — Tue, 07 Oct 2025 13:53:58 +0000

OpenAI, with the release of GPT-5 Codex, has added major upgrades to Codex, the CLI agent, including longer context handling, better reasoning, and support for multi-hour autonomous sessions. Around the same time, Anthropic released Claude Sonnet 4.5, branding it as their best coding model yet. It can now run coding tasks for over 30 hours continuously, handle tools more reliably, and maintain stronger memory and context awareness throughout.

Many tech influencers have shared their opinions about both releases. There’s a YT video by Theo saying that this release by Claude is the best one to date. X is buzzing with takes like “Claude 4.5 Sonnet just refactored my entire codebase”, “GPT-5 is still better at refactoring codes”, “Sonnet 4.5 isn’t cringe”, and “It’s better at designing UI”, and what-not.

At the same time, there was also a v2 release for Claude Code, and Codex started working on HTTP-streamable support for MCPs. So, with all this happening in the AI space, I decided to test things out by building something cool using both models, and trust me, this one is the best I’ve done to date.

We’ll be comparing both side by side, and by the end, you’ll know exactly what to choose if you want to ship products that scale, stay secure, and move fast while keeping costs in mind. All the code for this comparison can be found here: github.com/rohittcodes/fashion-hub. This is actually a fork of the create-t3-turbo Turborepo template.

Demo of the App We Built Using Both Models

Before diving into the comparisons, here’s a quick look at what we actually built. The UI and core features were generated collaboratively using Claude Sonnet 4.5 and GPT-5 Codex, running through the same MCP-powered setup.

How We'll Compare These Models and Agents

Before proceeding, here’s how I’ll test both models using the same MCP-powered setup employed across all my builds. This will help maintain consistency across Codex and Claude when evaluating speed, reasoning, accuracy, and context retention. The whole setup that I've planned includes:

Asking for ideation and Setup: The setup will include the app schema, laying out the whole project structure, and working on the repository structure.
Cloning a Fashion E-commerce App UI
Building the recommendation pipeline for the E-commerce App

Before we dive in, let's take a moment to understand what these models and agents truly bring to the table.

Metric	Claude Sonnet 4.5 + Claude Code	GPT-5 Codex + Codex
Context & Memory	Automatically pulls context and maintains tool state across sessions. See docs for contextual prompting.	Long-context reasoning tuned for coding workflows; supports extended sessions.
Tool / MCP orchestration	Sub‑agents isolate context and tools per task.	Tight tool protocol integration with dynamic tool usage.
Error recovery / robustness	Good recovery from tool/state resets, especially with specialized agents.	Strong at iterative correction loops and large refactors.
Coding & execution	Excellent planning/architecture; can stumble on subtle async/edge cases.	Very strong execution/debugging; often ships working code in complex domains.
Steerability / prompting burden	May need more guidance for multi‑step tool chains.	More steerable out‑of‑box; less micromanagement required.
Session length / persistence	Handles long sessions; benefits from task‑scoped agents.	Built for multi‑hour autonomous execution.
Cost / efficiency	Costlier for very long runs with large contexts.	Often more efficient on large coding tasks.
Best for	Orchestration, design oversight, documentation, multi‑tool workflows.	Generation, debugging, refactoring, shipping features reliably.

TL;DR

Claude Sonnet 4.5 + Claude Code: Great at planning, system design, multi‑tool orchestration, and UI fidelity. Struggled more with lint fixes and schema edge cases in this project.
GPT‑5 Codex + Codex: Strongest at iterative execution, refactoring, and debugging; reliably shipped a working recommendation pipeline with minimal lint errors.
Cost efficiency in my runs: Claude used far more tokens for UI and lint-fixing; Codex stayed leaner and fixed issues faster in the code.
Pick Claude if you want design oversight, documentation, and multi‑tool orchestration. Pick Codex if you're going to grind through bugs, refactors, and ship features fast.
My choices for this repository are: Codex for implementation loops, and Claude for architecture notes and UI polishing.

Why MCPs Even Matter Here

MCP (Model Context Protocol) standardizes how agents call external tools and retain context across them. In this build, we relied on:

Rube MCP as the unified bridge to GitHub (repo init/commits), Neon (managed Postgres), Notion (planning docs), and Figma to pull frames/tokens for UI cloning reference
Context7 MCP to search documentation (e.g., Gemini API notes) directly inside the session

This made tool calls reproducible and auditable, allowing both agents to operate consistently across the same environment.

Setting up MCP servers with Claude Code and Codex

Most MCP servers are built for single-purpose integrations, one for GitHub, another for Notion, and one more for Slack. That’s fine until your workflow spans multiple tools. Then it gets chaotic: juggling multiple MCPs, shrinking model context, and hitting client limits on the number of servers you can attach.

Rube MCP addresses this issue with a universal connector layer, supporting over 500 apps through Composio’s integration stack. Instead of managing 10 separate MCP servers, you connect to Rube once and orchestrate multi‑tool workflows. Want to “take new GitHub issues and post them in Slack”? You can run it entirely through Rube without needing to figure out which server does what. Explore toolkits: https://docs.composio.dev/toolkits/introduction.

Quick Setup:

Visit https://rube.app and click “Add to Claude Code”.
Copy the command, run it in your terminal, then use /mcp in Claude Code to authenticate Rube.
For Codex, generate a new auth token from the Rube app, then add the Rube MCP server to your config.toml file using that token. You can follow the steps mentioned here to use streamable HTTP MCP servers with Codex.

Ideation and Setup

The idea was to build something that actually showcases complex context handling, including how both models and agents handle it, and how the agents can autonomously handle multiple MCP tool calls through a single prompt.

So, I decided to build everything inside a monorepo, letting both agents work on the same large codebase for faster feature development and debugging. Initially, I wanted to include a scalable try-on feature that could handle outfit generation for many users. But as Anthropic’s costs spiked, I had to drop that part and instead focused on building a recommendation engine that suggests outfit combinations.

The initial documentation and timeline for the project were generated using Cursor with Notion, utilising Rube MCP. Once the initial planning was complete, I began working on setting up the project using GitHub and Neon DB integrations with Rube MCP. (yeah, it can interact with all these apps)

The initial setup proceeded smoothly, with no major issues. I used create-t3-turbo to set up the monorepo, which includes a tRPC API, Better Auth, Drizzle, and Expo.

The prompts for these actions were simple, and the agents did a good job with them. Here's a sample of the prompt I used:

Using Rube MCP: create a Neon Postgres database, initialize a GitHub repo with the first commit, and generate a Notion planning page with milestones, and according to the tasks in `IDEATION.md` generate a plan for the project. Return the commands run, files changed, and any follow-ups.

UI Cloning

I actually had a UI planned for the application, but I’m bad at design, and it didn't look very exciting. I decided to get the initial UI idea from Figma and let the agents handle their own UI building. Surprisingly, they did a good job at it.

The same prompt was used for UI cloning in both models, and they performed fairly well in this task. The task was not to capture the exact UI, but to get an initial idea of the UI, and the agents did a commendable job at it. When the goal is an accurate clone, Anthropic’s LLMs tend to perform better at replicating Figma designs. You can read more about it here: Claude code with MCP is all you need

Clone this [Link to the UI] from Figma, polished fashion e‑commerce UI for web (Next.js) and mobile (Expo): product grid/detail, cart, checkout, profile, wishlist, onboarding. Use brand style (soft pink accents, rounded‑XL, subtle gradients/shadows), add accessible focus states and good contrast, keep lint to a minimum, and build reusable primitives from packages/ui. Output how to run both apps.

Codex was initially unable to capture the design correctly, making abrupt errors with the flex properties of the design. As a result, the products weren’t visible in the catalogue for the Expo app.

Our end user interface in the demo was designed by Sonnet 4.5 as you saw above in the intro, and honestly, it nailed the vibe: pixel-perfect layouts, smooth gradients, and a clean component hierarchy across both web and mobile. So I stuck with the same.

Cost of cloning the UI:

Claude Sonnet 4.5 + Claude Code: ~5M tokens (UI cloning + iterations)
GPT‑5 Codex + Codex: ~250k tokens (UI cloning + iterations)

Refactoring and Fixing the Lint Errors

I then wanted to build a try-on feature for the application, so I started with the setup. I first tried local LLMs to help orchestrate the AR try-on (I tested a list of models and noted what went wrong), but they did not perform well. The fallback plan was to use an image generation model, Gemini 2.5 Flash Image (Nano Banana), to generate the try-on outfits. For the initial setup, I used Cursor to create the DB schema and start on the try-on pipeline, knowing it’d definitely produce lint errors and a bunch of bugs, which I planned to fix with one of the models.

Once everything was set up, I found a lot of bugs and lint errors, as expected. I prompted Claude Code to fix them, but it failed here. The lint errors came from the schema setup, where the relations between the tables needed fixing, something Cursor (Auto mode) had missed. I later gave Codex the same prompt, and the first thing it did was pull context from the monorepo about the current state of the project; then it started fixing those relations in the schema.

Although the cost up to that point was already fairly high, I had another feature in mind that would involve some real coding and show how both models perform when building a feature: a recommendation pipeline that uses content similarity, implemented with specific algorithms within the API layer itself. So rather than pushing on with the try-on pipeline just to see how the models would hold up under those constraints, I focused on shipping the recommendation work.

During this detour, Context7 MCP handled Gemini API/doc lookups directly in-session, and Rube MCP updated the Notion plan with decisions and blocked items to keep context aligned across tools.

Cost of refactoring and fixing the lint errors:

Claude Sonnet 4.5 + Claude Code: ~4M tokens (Lint fixing)
GPT‑5 Codex + Codex: ~100k tokens (Lint fixing)

Building the Recommendation Pipeline

For the recommendation pipeline, both models were given the same prompt to build the required DB schema and the UI for both web and mobile apps.

Build an AI-powered Recommendation Engine in TypeScript that surfaces personalized product suggestions (For You, Similar Items, Trending) using a hybrid model (collaborative + content-based) with real-time learning from user interactions (views/cart/purchase/wishlist). Ship polished, accessible UI components, do not reference any design file. Match Fashion Hub’s brand style (soft pink accents, rounded-XL, subtle gradients, soft shadows, clean typography). The schema for the DB should be setup according to the current state of the project.

Claude Code took approximately 10 minutes to set up the schema, build the API layer, and the UI for both web and mobile apps. The total number of tokens used was around 1,189,670. However, it struggled with setting up the schema relations, which later caused issues with the API layer of the recommendation pipeline; that’s a significant oversight in terms of designing scalable and secure applications.

Codex took almost 25 minutes to set up the schema and build the API layer and the UI for the web. I was exhausted by that time, so I had to drop the UI for the Expo app. Tokens: ~309k. But it set up the schema relations correctly and built the API layer with minimal lint errors.

The commit for the Claude Code feature generation can be found here: commit/5b193ef7d1ee8218649d6a266b475572ac0dc262
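
For anyone unfamiliar with the "hybrid (collaborative + content-based)" approach named in the prompt, here is a minimal Python sketch of the idea. It is not the code either agent generated (the project itself is TypeScript); the feature vectors, the interaction weights, and the alpha blend factor are all illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical interaction weights: stronger signals count more.
EVENT_WEIGHTS = {"view": 1.0, "wishlist": 2.0, "cart": 3.0, "purchase": 5.0}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def item_vectors(events):
    """Build per-item user-interaction vectors from (user, item, event) tuples."""
    vectors = defaultdict(lambda: defaultdict(float))
    for user, item, event in events:
        vectors[item][user] += EVENT_WEIGHTS[event]
    return vectors


def similar_items(target, features, events, alpha=0.5, top_k=3):
    """Rank items by alpha * content similarity + (1 - alpha) * collaborative similarity."""
    collab = item_vectors(events)
    scores = {}
    for item in features:
        if item == target:
            continue
        content_sim = cosine(features[target], features[item])
        collab_sim = cosine(collab.get(target, {}), collab.get(item, {}))
        scores[item] = alpha * content_sim + (1 - alpha) * collab_sim
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy catalogue and interaction log (made up for this example).
features = {
    "pink-dress": {"dress": 1.0, "pink": 1.0},
    "pink-skirt": {"skirt": 1.0, "pink": 1.0},
    "black-boots": {"boots": 1.0, "black": 1.0},
}
events = [
    ("u1", "pink-dress", "purchase"),
    ("u1", "black-boots", "cart"),
    ("u2", "pink-dress", "view"),
    ("u2", "pink-skirt", "wishlist"),
]

print(similar_items("pink-dress", features, events, alpha=1.0))  # content only
print(similar_items("pink-dress", features, events, alpha=0.0))  # collaborative only
```

With alpha at 1.0 the ranking follows shared attributes; at 0.0 it follows who interacted with what. A production version would push this aggregation into SQL or a scheduled job rather than computing it per request.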

Security Flaws and Issues in the Recommendation Pipeline (both models)

From the code we generated using Sonnet 4.5: trending updates were not locked down, and anonymous tracking did not use a server-issued, signed, HttpOnly token mapped to an opaque ID with TTL/cleanup, which leaves it open to spoofing and cross-user pollution. Heavy queue work was not processed in a proper batching layer, long queries were not precomputed on a schedule, and SQL-side aggregation and indexes were missing.
On the Codex side, there was an issue with long-running queries, which is definitely undesirable in a serverless environment. The UI was missing for the Expo app, while the web app had a partial implementation of these features that was still not fully functional.
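
To make the anonymous-tracking point concrete, a server-issued, signed anonymous ID with a TTL could look roughly like the Python sketch below. The secret handling, the token layout, and the one-day TTL are assumptions for illustration only, not what the generated code does; in a real app the token would be sent as an HttpOnly cookie and the secret would come from server config.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # assumption: loaded from server config in practice
TTL_SECONDS = 24 * 60 * 60            # assumption: one-day lifetime


def _sign(payload: bytes) -> str:
    """HMAC-SHA256 signature over the payload, base64url-encoded."""
    digest = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()


def issue_token(now: Optional[float] = None) -> str:
    """Return '<opaque_id>.<expires_at>.<signature>' for an anonymous visitor."""
    now = time.time() if now is None else now
    opaque_id = secrets.token_urlsafe(16)
    expires_at = int(now) + TTL_SECONDS
    payload = f"{opaque_id}.{expires_at}"
    return f"{payload}.{_sign(payload.encode())}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the opaque ID if the signature is valid and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        opaque_id, expires_at, signature = token.split(".")
        expiry = int(expires_at)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{opaque_id}.{expires_at}".encode())
    if not hmac.compare_digest(signature, expected):
        return None  # tampered or signed with a different key
    if now >= expiry:
        return None  # expired
    return opaque_id


token = issue_token()
print(verify_token(token) is not None)
```

Because the client never chooses its own ID, one visitor cannot write tracking events into another visitor's history, and expired IDs can be cleaned up on a schedule.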

Comparing the Cost

Sonnet 4.5 cost me around $10.26 with 18M input and 117k output tokens, with a lot of lint errors but great UI design fidelity.

LMAO.. You def need to improve this Sam..

Till then, this is what I have:

GPT-5 Codex used around 600k input tokens and 103k output tokens. At the listed pricing ($1.25 per 1 million input tokens and $10 per 1 million output tokens), I put that at around $2.50, with a clean-looking UI.
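
The estimate is simple per-token arithmetic, sketched in Python below using only the token counts and prices quoted in this section. Note that these rounded figures on their own work out to roughly $1.78; the gap to the ~$2.50 estimate may come from rounding or from usage not reflected in these counts, so treat both numbers as approximations.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate cost in dollars from token counts and per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Token counts and prices as quoted above.
codex_cost = estimate_cost(600_000, 103_000, 1.25, 10.00)
print(f"GPT-5 Codex estimate: ${codex_cost:.2f}")
```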

Developer Experience Feedback

One significant issue with Codex is its developer experience (DX). There’s no clear way to track usage or cost. There’s an OAuth login, so why not generate an API key and show total usage or cost on the dashboard? Currently, it only displays the current session cost, and I couldn’t even verify if it has been deducted from my account.

They do show context usage via session IDs, which is excellent, but overall visibility is poor. The ? command for shortcuts no longer works.
Even the docs feel outdated and not synced with the latest features. The DX used to be better a few weeks ago.

The first thing that came to my mind was the OpenAI team vibe-coding this thing??

Final Thoughts

Today, Codex is the more practical choice for shipping features. It handles larger contexts well, iterates faster on code, fixes lint issues with fewer retries, and, in my runs, costs less than Sonnet 4.5. Claude Code has felt inconsistent in longer sessions, but Sonnet 4.5 remains strong at planning, architecture, and UI fidelity. It’s the best Sonnet so far and cheaper than Opus, just not as reliable as Codex for heavy refactors in this repo.

DX on Codex isn’t perfect, but performance and cost make up for it right now. If you depend on LLMs day to day, I’d pick Codex for the longer run if the DX improves. If you care about perfect UI and architecture guidance, bring in Sonnet 4.5 for design and docs, then let Codex implement and harden.

10 best MCP servers to take your ChatGPT experience to the next level

Rohith Singh — Thu, 25 Sep 2025 15:11:10 +0000

OpenAI recently added first-class support for MCP (Model Context Protocol) servers in ChatGPT’s Developer Mode, and that’s a pretty big deal for developers, and even for people who just use ChatGPT for day-to-day tasks. Instead of treating ChatGPT as a read-only assistant that only suggests what to click, Developer Mode lets the model interact with external tools and services through standardised MCP servers. In practice, that means ChatGPT can fetch live data, call APIs, and crucially, propose actions that you can confirm before they run.

See the community launch thread for MCP tools in Developer Mode, and the official docs here.

What is an MCP Server btw?

MCP servers expose discrete capabilities (like reading data, running an action, or returning structured results) in a predictable format the model can use. You can point ChatGPT at a hosted MCP server or host one locally yourself, wire up OAuth or scoped API keys, and grant the assistant limited, auditable access to your apps. Developer Mode keeps the human in the loop: writes require explicit confirmation, and read calls are visible in the conversation.

You can learn more about MCP servers here: https://modelcontextprotocol.io
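
To give a feel for that "predictable format": MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method. The Python sketch below builds such a request and adds a simple confirmation gate for write actions. The tool names (list_invoices, create_refund) and the WRITE_TOOLS set are hypothetical, and the gate only illustrates the human-in-the-loop idea, not how ChatGPT implements it.

```python
import json
from itertools import count

_request_ids = count(1)
WRITE_TOOLS = {"create_refund"}  # hypothetical: tools that change state


def build_tool_call(name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def prepare_call(name, arguments, confirm):
    """Return the request, or None if a write action was not confirmed.

    confirm is a callback (name, arguments) -> bool standing in for the user.
    """
    if name in WRITE_TOOLS and not confirm(name, arguments):
        return None
    return build_tool_call(name, arguments)


def deny_all(name, arguments):
    """Stand-in for a user who declines every write."""
    return False


read_request = prepare_call("list_invoices", {"status": "unpaid"}, deny_all)
write_request = prepare_call("create_refund", {"charge": "ch_example"}, deny_all)

print(json.dumps(read_request, indent=2))  # reads go through without confirmation
print(write_request)                       # the unconfirmed write is blocked
```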

(btw, if you’re not familiar, Composio lets you connect AI agents to 250+ apps and handles authentication to the external APIs, so you don’t need to manage the auth layer yourself. You can learn more about the toolkits here.)

For anyone building tooling, internal automations, or just trying to make their workflows less clicky, MCPs are the practical bridge between natural language and actual side-effecting operations. Below I’ve rounded up the top MCP servers that bring the most immediate, day-to-day value to developer workflows, the ones I’d try first if I were wiring ChatGPT into my stack.

How to connect ChatGPT with MCP servers

In ChatGPT, open Settings → Connectors → Advanced Settings → Developer Mode
Enable Developer Mode. You’ll see an option to add connectors in the chat input.
Add or point to an MCP server. Many servers publish their own quick-start commands if you’re running them.
For any action that writes data, ChatGPT will ask you to confirm before proceeding with that action.

You can follow along with the configuration for Rube MCP below; other connections need the same or similar steps.

Note: Keep secrets like API keys scoped and use least-privilege keys. If you’re not very familiar with MCPs, I’d suggest treating them as production automation tools, because any connected app can run real actions against your tools.

1. Rube MCP: A Universal MCP server for all your apps

Most MCP servers connect you to a single tool - GitHub, Notion, Slack, etc. That's fine if you only need one or two, but it quickly gets messy when you're working with several MCP servers. Some MCP clients even limit how many MCP servers you can add, because once you stack up too many of them, the model's context window gets smaller and even harder to work with.

Rube MCP by Composio solves that by giving you a universal place to manage them all, so instead of switching between separate MCP servers, you simply connect to Rube once and get access to 500+ apps through Composio's integration layer. That includes Slack, Notion, GitHub, Linear and plenty more like them.

So next time, if you wanna run a prompt like this: "Take new GitHub issues, and post them in Slack", you can run it entirely in Rube without any extra configuration, and you don't have to think about which server does what.

If you want to get started with Rube MCP in your ChatGPT interface, here's a short demo for the same:

Resources to get started:
https://rube.app
https://github.com/ComposioHQ/Rube

2. Stripe

Stripe provides an official MCP server that lets you manage your payments right inside an MCP client. Normally, if you want to check payments, send a refund, or pull some quick information, you have to log into the Stripe dashboard, click around and maybe even copy data somewhere else. With the official MCP support, you can do the same things directly inside ChatGPT.

You can ask it for stuff like:

Show me all unpaid invoices this week
What's today's revenue so far?

Behind the scenes, the server just exposes a set of Stripe’s API endpoints in a format that ChatGPT can call. You connect it with an API key, and from then on you don’t need to switch between tabs whenever you want quick payment info.

If you spend time in support or billing, this cuts out a lot of back-and-forth. Instead of opening Stripe for every little thing, you just stay in your chat window. You can read more about the MCP server here.

3. Cloudflare Observability

Cloudflare Observability MCP server allows an MCP client to access performance and uptime metrics for websites and applications. Instead of manually checking dashboards, developers can query latency, error rates, or traffic patterns directly through their AI agent.

This is useful for monitoring system health, detecting issues early, or comparing metrics across environments without switching between multiple tools.

Example queries you might run:

Fetch uptime stats for a specific domain
List recent error logs
Compare traffic spikes over the past 24 hours

Resources:
https://github.com/cloudflare/mcp-server-cloudflare
https://developers.cloudflare.com

4. ThoughtSpot

ThoughtSpot’s MCP server brings analytics and reporting capabilities into your MCP client. Rather than navigating complex BI tools, you can request data summaries or perform searches in plain language. It’s particularly handy for quickly checking business metrics, generating insights for reports, or exploring datasets without leaving your workflow.

Example use cases:

Retrieve sales metrics for a particular product
Identify top-performing regions or categories
Generate simple tables or summaries for analysis

Resources to get started with:
https://agent.thoughtspot.app
https://github.com/thoughtspot/mcp-server

5. Carbon Voice

Carbon Voice exposes productivity and communication-related functions to your AI Agent/MCP Client. You can access notes, reminders, or task-related information without switching apps. This is useful for staying organized, automating task updates, or querying ongoing action items directly in the chat.

Possible actions you can perform:

List upcoming tasks
Summarize meeting notes
Send notifications to team members

Docs: https://www.getcarbon.app/mcp/get-started-with-mcp

6. Zine

Zine provides memory and context management capabilities for AI agents. Your MCP Client can store, retrieve, or update contextual information across interactions from various apps/tools. This helps maintain continuity in conversations or workflows, especially for long-running projects or multi-step automations.

You can perform actions like storing key project decisions in Notion, retrieving context from previous interactions on Twitter, tracking your progress on tasks in Linear over time, and a lot more.

Resources to get started:
https://www.zine.ai
https://www.youtube.com/watch?v=Qd7EkwzJbJg

7. Needle

Needle is focused on knowledge retrieval and RAG (retrieval-augmented generation) functions. ChatGPT can pull structured information from internal or external knowledge bases efficiently. This can be valuable for research, customer support, or creating documentation without manually searching multiple sources.

This is especially valuable for:

Customer support teams needing quick answers from internal documentation
Researchers pulling references from multiple knowledge repositories

Example queries you can try:

Search our internal knowledge base for all troubleshooting steps related to 502 errors
Summarize the top 3 FAQs from our product docs

Resources to get started: https://docs.needle.app/docs/guides/mcp/needle-mcp-server

8. Fireflies

Fireflies provides an MCP server that connects your AI agent to meeting intelligence. Instead of manually digging through call recordings or transcripts, your MCP client can fetch summaries, action items, or highlights directly.

With this server, you can:

Retrieve transcripts of past meetings
Generate summaries or follow-up notes
Search across conversations for specific topics or decisions

This is especially useful for teams that run multiple customer or internal calls daily and want meeting data to flow into their broader workflows (like syncing with Notion, Slack, or project tools).

Resources to get started:
https://guide.fireflies.ai/articles/8272956938-learn-about-the-fireflies-mcp-server-model-context-protocol
https://fireflies.ai/blog/fireflies-mcp-server

9. Webflow

The Webflow MCP server lets your MCP client or AI agent interact directly with Webflow projects. Instead of switching into the Webflow dashboard for every update, you can query collections, modify CMS items, and trigger publishes programmatically.

You can perform typical actions like:

Listing CMS collections
Creating or updating CMS entries (e.g., blog posts, product data, case studies)
Publishing a site update
Running batch updates across multiple projects

This makes it easier to keep content and design workflows consistent without manually clicking through the Webflow UI.

Resources:
https://developers.webflow.com/data/docs/ai-tools
https://github.com/webflow/mcp-server

10. Apify

Apify MCP server provides web scraping and automation capabilities. Your MCP client (ChatGPT in our case) can request structured data from websites, automate repetitive tasks, or extract content from multiple sources. It can be useful for market research, data collection, or monitoring online content without manual effort.

For example, you could pull product listings from an e-commerce site, collect user reviews or ratings, or monitor price changes over time. This allows you to gather insights and track online trends efficiently without manually visiting each site.

Quickstart guide: https://docs.apify.com/platform/integrations/mcp

Final Thoughts

These were the handpicked MCP servers I’d start with today, not because they’re flashy but because they solve everyday bottlenecks I face. The case for MCPs is simple: they give LLMs a standard, auditable way to take action with least privilege access. That means you keep control while still getting real leverage. For anyone who wants to explore more MCP servers or follow up on similar tools, you can go through this page: https://www.remote-mcp.com/

If you want one place to start and stay, use Rube MCP. One connection gives you access to hundreds of apps through a single server, with unified auth and consistent safety prompts for writes. It keeps your context tidy (no juggling multiple servers), scales as your stack grows, and lets you mix tools like GitHub, Slack, Stripe, and Notion without reconfiguring every time. Start with Rube, add only what you need, and keep the human-in-the-loop by design.

Claude Code vs Codex: Dev Workflow Comparison

Rohith Singh — Mon, 15 Sep 2025 03:57:55 +0000

For the past few days, there has been a lot of hype around OpenAI's Codex. At the same time, Claude Code has been evolving day by day into a very capable AI agent, with features like subagents, slash commands, MCP support, and much more. While I still prefer Claude Code, I thought it would be interesting to see how both of them perform on the same tasks. People say Codex + GPT-5 produces code closer to what a human would write, so let's test them out.

Before we begin: Codex has introduced support for stdio-based MCPs, but it still lacks direct support for HTTP endpoints. So to make sure our MCPs work, I've written a simple proxy layer over the stdio support so that Codex can use MCPs like Figma, Jira, GitHub, and more. You can find the code here: rube-mcp-adapter-auth.js

So I ran a real build using Figma MCP for UI cloning and a separate coding challenge. And as always both agents got identical prompts, same setup.

All the code from this comparison can be found here: github.com/rohittcodes/claude-vs-codex.

TL;DR

Don't have time? Here's what happened:

Figma cloning: Claude Code captured the design better but missed the yellow theme and a few details; Codex created its own version but was faster and cheaper
Job scheduler: Claude Code provided more reasoning steps and structured code; Codex was concise and faster
Overall: Claude Code is better for complex, detailed tasks with multiple steps. Codex is more efficient for straightforward code generation, with its own way of writing code.
UX/DX: Codex felt simpler to set up and use (HTTP-based MCPs aside); Claude’s developer experience felt deeper once you get used to it.
Cost: Claude Code used more tokens overall (Figma: 6,232,242; Scheduler: 234,772) vs Codex (Figma: 1,499,455; Scheduler: 72,579)

Introduction

Claude Code comes with native MCP support and extensive context windows. Codex recently added stdio-based MCP support, but it still doesn't support HTTP endpoints for MCPs directly. By the way, if you don't know what MCPs are, you can read about them here.

Instead of benchmarks, I wanted a practical comparison: build something devs can recognize. So, the tasks I picked were:

Figma UI cloned into a working frontend
A lightweight job scheduler with timezone handling

All within one day, with me just prompting.

How I tested them

I ran both agents through identical challenges:

Tools: Rube MCP + Figma
Languages: TypeScript
Measure: Token usage, time, code quality, dev experience
Both agents got the same prompts to keep it fair.

Rube MCP - Universal MCP Server

Rube MCP (by Composio) is the universal connection layer for MCP toolkits like Figma, Jira, GitHub, and more. Explore toolkits: docs.composio.dev/toolkits/introduction.

How to connect:

Go to rube.composio.dev
Click "Add to Claude Code"

Copy the command, run it in your terminal, then run /mcp to authenticate your Rube MCP server. Once done, you can start using the tools.

For Codex, we’ll reuse the same auth token via the proxy layer: set up the rube-mcp-adapter-auth.js file from the repo. See the Codex config docs here if you want more control over the Codex setup. For now, your config.toml should contain:

[mcp_servers.rube]
command = "node"
args = ["your-path-to/rube-mcp-adapter-auth.js"]

Coding Comparison

Round 1: Figma design cloning

I picked a complex landing page from Figma Community and asked both agents to recreate it using Next.js and TypeScript. You can find the Figma design here.

Prompt:

Recreate the Figma landing page at [FIGMA_URL] in Next.js + TypeScript using TailwindCSS v4 only (no config file).
Follow a modular structure (components/layout/*, components/ui/*, components/sections/*), ensure pixel-accurate fidelity (typography, spacing, shadows, colors), and make it fully responsive (desktop, tablet, mobile).
No inline styles or third-party UI libraries.
Extract reusable components for repeated Figma elements, and enforce strict TypeScript types (no any).
Goal: A clean, maintainable, production-ready codebase that mirrors the Figma design as close as possible.

I wasn’t building the full developer platform here, just cloning a large landing page to see how close each agent could get.

Claude Code results

Claude Code (Sonnet 4) delivered a working Next.js app but missed the yellow theme entirely. It captured the design structure to some extent and even exported images from the Figma design, but the visual accuracy was disappointing. The layout was there but colors, spacing, and typography were noticeably different from the original.

Tokens: 6,232,242, a lot more than Codex
Time: Longer due to more iterations
Design fidelity: Partial - missed key theme elements

Codex results

Codex (GPT-5 Medium) created its own version of the landing page. It didn't replicate the theme, layout, or components from the original design. Instead, it built a decent-looking landing page from scratch with no image exports. The result was functional but completely different from the Figma design.

Tokens: fewer than Claude Code (i.e., 1,499,455 tokens)
Time: ~10 minutes
Design fidelity: None - created original design

Claude Code captured more of the original design but missed critical elements. Codex was faster and cheaper but ignored the design brief entirely.

Round 2: Job scheduler challenge

It took me a while to decide on the second task. It may not be the best one, but it's what I have for now. PS: suggest some ideas for future posts.

I threw a complex TypeScript challenge at both agents: build a timezone-aware cron scheduler with persistence and catch-up execution. This tests system design, timezone handling, and production-ready code structure.

Prompt:

Build a lightweight job scheduler in TypeScript (Node.js) with the following requirements:
- Supports cron-style expressions (e.g., "0 9 * * 1" = every Monday at 9AM).
- Must be timezone-aware: jobs scheduled in "America/New_York" vs "Asia/Kolkata" should trigger at correct local times even if the server runs in UTC.
- Implement a persistence layer (SQLite or JSON file) so scheduled jobs survive restarts.
- On startup, the scheduler must detect missed jobs (e.g., if server was down at scheduled time) and run catch-up executions.
- Provide a clean TypeScript interface with addJob, removeJob, listJobs methods.
- Include at least one example job (e.g., log "Hello World" daily at 9 AM in two different timezones).
- Must be written in modular, production-ready TypeScript (no any, no inline hacks).
- Optimize for readability + maintainability.

You can run both projects by cloning the repo here.
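To make the brief concrete, here is a minimal std-only Rust sketch of its two trickiest parts: matching a cron expression in a job's local time, and finding the runs missed during downtime. It is not the code either agent produced (their solutions are TypeScript). The names `Cron`, `month_day`, and `missed_runs` are made up for illustration, only `*` and single numbers are parsed, and each timezone is modeled as a fixed UTC offset. That last part is an assumption: a real scheduler needs a tz database to handle DST in zones like America/New_York.

```rust
/// One cron field: `*` (any value) or a single number.
/// Ranges, lists and steps are left out of this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Field {
    Any,
    Exact(u32),
}

impl Field {
    fn parse(s: &str) -> Option<Field> {
        if s == "*" {
            Some(Field::Any)
        } else {
            s.parse::<u32>().ok().map(Field::Exact)
        }
    }

    fn matches(self, value: u32) -> bool {
        match self {
            Field::Any => true,
            Field::Exact(n) => n == value,
        }
    }
}

/// Five fields: minute, hour, day of month, month, day of week (0 = Sunday).
#[derive(Debug, Clone, Copy)]
struct Cron([Field; 5]);

impl Cron {
    fn parse(expr: &str) -> Option<Cron> {
        let parts: Vec<&str> = expr.split_whitespace().collect();
        if parts.len() != 5 {
            return None;
        }
        let mut fields = [Field::Any; 5];
        for (slot, part) in fields.iter_mut().zip(parts) {
            *slot = Field::parse(part)?;
        }
        Some(Cron(fields))
    }

    /// `utc_min`: minutes since the Unix epoch. `offset_min`: the job's
    /// assumed fixed UTC offset in minutes (e.g. +330 for Asia/Kolkata).
    /// All five fields must match, a simplification of real cron rules.
    fn matches(&self, utc_min: i64, offset_min: i64) -> bool {
        let local = utc_min + offset_min;
        let days = local.div_euclid(1440);
        let minute_of_day = local.rem_euclid(1440) as u32;
        let (month, day) = month_day(days);
        let weekday = (days + 4).rem_euclid(7) as u32; // 1970-01-01 was a Thursday
        let values = [minute_of_day % 60, minute_of_day / 60, day, month, weekday];
        self.0.iter().zip(values).all(|(field, value)| field.matches(value))
    }
}

/// Month (1-12) and day (1-31) for a count of days since 1970-01-01,
/// using the standard days-to-civil arithmetic for the Gregorian calendar.
fn month_day(days: i64) -> (u32, u32) {
    let z = days + 719_468;
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    (month as u32, day as u32)
}

/// Catch-up: every minute in (last_seen, now] at which the job should have fired.
fn missed_runs(cron: &Cron, offset_min: i64, last_seen: i64, now: i64) -> Vec<i64> {
    ((last_seen + 1)..=now)
        .filter(|&t| cron.matches(t, offset_min))
        .collect()
}

fn main() {
    let cron = Cron::parse("0 9 * * 1").expect("valid expression");
    // 2025-09-15 (a Monday) 00:00 UTC, in minutes since the epoch.
    let monday_utc: i64 = 20_346 * 1_440;
    // 09:00 in Asia/Kolkata (UTC+5:30) is 03:30 UTC.
    println!("{}", cron.matches(monday_utc + 210, 330));
    // 09:00 in New York during daylight time (UTC-4) is 13:00 UTC.
    println!("{}", cron.matches(monday_utc + 780, -240));
    // Server down from Sunday 12:00 UTC to Tuesday 12:00 UTC.
    println!("{:?}", missed_runs(&cron, 330, monday_utc - 720, monday_utc + 2_160));
}
```

Scanning minute by minute is the simplest correct way to find missed runs; a production scheduler would jump straight to the next matching time instead.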

Claude Code results

Claude Code delivered a comprehensive solution with extensive documentation and reasoning steps. It provided detailed explanations, helpful comments on the key parts of the code, and built-in test cases. The implementation was thorough, with proper error handling, graceful shutdown, and a production-ready structure.

Tokens: 234,772. Higher token usage due to detailed explanations
Time: Longer due to comprehensive approach
Code quality: Production-ready with extensive documentation

Codex results

Codex was more concise and direct. It built a modular, timezone-aware cron scheduler with JSON persistence and catch-up functionality. The solution was clean and functional but with less verbose explanations. It focused on getting the job done efficiently.

Tokens: 72,579. Lower token usage, with more concise output
Time: ~15 minutes
Code quality: Clean and functional

Both delivered working solutions. Claude Code provided more educational value and comprehensive documentation, while Codex was more efficient and direct.

What it cost (tokens + time)

Numbers vary by task complexity, but relative behavior was consistent:

Figma task: Claude Code used significantly more tokens due to detailed reasoning and image exports; Codex was more efficient
Scheduler task: Claude Code provided comprehensive documentation but higher token usage; Codex was concise and faster
Overall: Claude Code (Sonnet 4) used roughly 3-4× the tokens of Codex (GPT-5 Medium)

Exact usage so far, Figma: Claude Code 6,232,242; Codex 1,499,455. Scheduler: Claude Code 234,772; Codex 72,579.
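Dividing those exact counts gives the per-task multiplier. Here's a small Rust sketch of that arithmetic; the `Usage` struct and `ratio` helper are just illustrative names, and the numbers are the ones reported above.

```rust
/// Token counts for one task, as reported in this post.
struct Usage {
    task: &'static str,
    claude_code: u64,
    codex: u64,
}

/// How many times more tokens Claude Code used than Codex.
fn ratio(claude_code: u64, codex: u64) -> f64 {
    claude_code as f64 / codex as f64
}

fn main() {
    let runs = [
        Usage { task: "Figma", claude_code: 6_232_242, codex: 1_499_455 },
        Usage { task: "Scheduler", claude_code: 234_772, codex: 72_579 },
    ];
    for run in &runs {
        println!("{}: {:.2}x", run.task, ratio(run.claude_code, run.codex));
    }
}
```

That works out to about 4.16x on the Figma task and about 3.23x on the scheduler.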

Conclusion

Both can build apps with MCPs in a single day, but they approach tasks differently:

Claude Code strengths

Better design fidelity with Figma (when it follows instructions)
More comprehensive documentation and reasoning
Production-ready code structure
Educational value with detailed explanations

Codex strengths

Faster raw generation
More cost-effective token usage
Direct, concise solutions
Good for "get something running" quickly

As for my take: use Codex if you want a prototype fast and cheap, or when design fidelity isn't critical. Use Claude Code if you care about maintainability, documentation, and production readiness. For design-heavy tasks, Claude Code is also the better pick, but it can miss key elements (like the yellow theme), though that may have been due to the recent performance issues with Claude.

I built my complete side-project in a day using Claude Code and MCP, now you know why they don't hire jr devs anymore

Rohith Singh — Thu, 28 Aug 2025 14:10:27 +0000

I've been vibe coding since almost before Karpathy named it vibe coding. It's not that I set out to, but that's how things work these days: you can ship a product in a single day. With the progress of MCP servers, things have been getting better and better. I do almost all of my work with MCPs and Claude Code. Not because I'm lazy (or, as some people would say, "skill issues"), but because I can do 10x more work in a single day, just by reviewing the code and making changes.

This post was inspired by Gareth Dwyer's blog, where he mentioned using Claude Code for almost everything he does, such as shipping products and building websites. I tried the same, and it blew my mind. Let's discuss my experience.

TL;DR

I built a complete MVP for an invoice management platform in one day using Claude Code and MCPs. What normally takes weeks of jumping between tools, Claude Code handled automatically from database setup to email testing. MCPs connected everything I needed without leaving my terminal. It was great.

The entire build flow cost me around $3.65 (Sonnet 4: $3.63 + Haiku: $0.02). That’s roughly 5.8M tokens processed, plus some manual configuration, for less than the price of a latte.

The build is open sourced and can be found here: https://github.com/rohittcodes/linea. You’re free to contribute as well (it’s still in early development).

Intro

I've been using Claude Code for months, but I was sceptical. Could it replace my usual development workflow? Meaning, I'm accustomed to switching between VS Code, GitHub, Figma, my database dashboard, Slack, and email. You know the drill.

Then MCP (Model Context Protocol) servers came along. Think of them as bridges that let Claude Code communicate directly with all your tools, without having to hop between apps.

I happen to use Rube, a universal MCP server from Composio. It’s a bit of a shameless plug, but we kinda made it for this purpose. A single MCP server with only seven tools that can communicate with any app on demand without the OAuth fuss.

Deciding on the build

So I decided to put this to the test. Instead of building another to-do app, I wanted to create something I'd use, like an invoice management platform for freelancers.

Here's the thing: I've built similar apps before. Usually takes me 2–3 weeks of solid work. Setup, authentication, UI components, PDF generation, email integration... It’s a lot of moving parts.

With Claude Code and MCPs? I decided to give myself one day and see what happens.

The Setup (and the actual build)

I started with a simple prompt to Claude Code. Nothing fancy:

Build me an invoice management app. Next.js, Postgres(Neon), Prisma, authentication with Next-Auth/Auth.js, PDF generation, email sending. Make it look professional.

That's it. No detailed specs, no wireframes, no technical architecture documents. Just a simple request.

Setting Up Rube MCP With Claude Code

Rube is a universal MCP server that exposes a whole list of toolkits to your AI agents. You can reach toolkits like GitHub, Figma, Linear, and many more using just Rube. Normally, the more MCP servers you add to your AI workflows, the more of the context window their tool definitions take up, which hurts complex workflows. That's the problem Rube is meant to solve. Setting up Rube is child’s play.

Visit the Rube page: https://rube.composio.dev
Click the installation button and select Claude Code

Copy the installation command and run it in your terminal (make sure Claude Code is already installed)
And done! You can just run Claude and ask the Rube MCP to do things for you. Run the /mcp command to make sure you are connected to the MCP server. If not, click on the server and authenticate yourself with Rube using the generated link.

How It Went Down

Claude Code immediately started working. The first thing I did was authorise Rube MCP and connect the GitHub toolkit. I asked it to create a new repository and open a PR, and it just worked.

I connected the Figma toolkit and asked it to analyse some designs I had lying around (I just mentioned I wanted it to "look professional"), and it extracted a complete design system. Colours, fonts, spacing: everything perfectly organised into CSS variables.

You can compare the Figma design for the templates here: https://www.figma.com/community/file/1265787783615420446/invoice-design-kit-brix-agency

The best thing about the Figma MCP was that it started with a detailed analysis and plan, then it began building the components, followed by pages... (blah, blah), and it was completed.

Meanwhile, it spun up a Postgres database using Neon MCP. No manual configuration required, eliminating the need to copy and paste connection strings. Just done.

I'm sitting here drinking my coffee, watching Claude Code work like a junior developer on steroids who works 24/7 for me and never gets tired.

The Real Magic

By lunch, I had a working authentication system. Email magic links, session management, the whole thing. I didn't write a single line of auth code myself.

Then came the fun part, building the actual invoice features. I asked it to research what users want in invoice tools, and it used a web search tool to gather feedback about pain points in existing solutions.

The crazy part? Most things just worked. Although some Tailwind configuration issues required manual fixes, the overall experience was significantly smoother than I expected.

It worked?

I had something that looked like a good product (I won't say the perfect/real product, you still gotta make some manual changes):

Clean dashboard with analytics
Client management system
Invoicing with multiple templates
PDF generation that looked good
Email sending that worked on the first try

I kept waiting for something to break, for some edge case to surface, for the usual development pain points to kick in. Fortunately, it was fine for the most part.

The Cost Factor

Alright, let’s talk money, because building fast is great, but if it burns a hole in my pocket, that’s not sustainable.

Yesterday’s entire build session cost me $3.65. That’s it.

Claude Sonnet 4: $3.63 (did all the heavy lifting)
Claude Haiku 3.5: $0.02 (handled the quick, lightweight stuff)

To put it in perspective: I pushed around 5.8 million tokens through Claude in a single day. That’s database setup, repository creation, design parsing from Figma, authentication scaffolding, PDF generation, and even wiring email, all done for under four bucks.
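For a rough sense of the blended rate, here's the arithmetic as a tiny Rust sketch. The token count is the approximate "around 5.8 million" figure from above, so treat the result as a ballpark; `usd_per_million` is just an illustrative helper name.

```rust
/// Blended price in USD per one million tokens.
fn usd_per_million(total_usd: f64, tokens: f64) -> f64 {
    total_usd / (tokens / 1_000_000.0)
}

fn main() {
    let total_usd = 3.63 + 0.02; // Sonnet 4 + Haiku, from the breakdown above
    let tokens = 5_800_000.0; // "around 5.8 million", so this is approximate
    println!("total: ${:.2}", total_usd);
    println!("blended: ${:.2} per 1M tokens", usd_per_million(total_usd, tokens));
}
```

That comes out to roughly $0.63 per million tokens across the whole session.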

Honestly, that’s great. If I compare it to the old-school workflow, spinning up infra, wrangling boilerplate, burning hours in dashboards, that’s not just cheaper, it’s ridiculously more efficient. For less than the price of a Starbucks latte, I shipped an MVP.

You’d easily have to shell out a few thousand dollars to get this built the usual way. Compared to that, this is nothing.

So yeah, the numbers make the story even better: this isn’t some expensive gimmick, it’s a genuinely cost-effective dev workflow.

About the product

A pretty solid invoice management platform, with these MVP features:

User authentication (magic links)
Client management with contact details
A couple of invoice templates
PDF generation
Email sending
Basic dashboard
Revenue tracking

The tech stack? Next.js 14, PostgreSQL, Prisma, NextAuth.js, Tailwind CSS. Pretty standard stuff.

But here's the thing that surprised me: I didn't have to think about most of the setup. The database schemas were created by Claude Code (I just provided a prompt, and it did the rest). API endpoints? Done. Email configuration? I had to intervene a bit, but it was still way faster than manual setup.

But Wait, Am I Just Getting Lazy?

Look, I get it. There's this voice in my head too, saying "you're not really coding anymore" or "you're losing your skills." But here's the thing: I'm not outsourcing the thinking. I'm still making the architectural decisions, reviewing code, and debugging when things get tricky.

Claude Code handles the repetitive stuff—the boilerplate. The "create another CRUD endpoint" tasks that we all do, but nobody enjoys.

When things get complex—such as security, data modelling, and performance optimisation—I still dive in and code it myself. But for everything else? Why waste time when I could be shipping?

What This Means for Us Developers

Honestly? This is where development is heading. Not replacing developers: we're still needed for the thinking, the architecture, the complex problem-solving.

But the grunt work? The setup, configuration, and deployment pipeline stuff? Claude Code and MCPs can handle that pretty well, and faster than manual setup. It's like having a skilled junior developer who is well-versed in the frameworks.

What's Next for the product

I will continue to build upon this workflow. The invoice platform I made is solid. I might clean it up and release it properly. I'm also going to build a few more products with this workflow.

Here’s a demo of what the interactions look like at the development stage. There are a few more features queued up; I'll release them when everything fits the vibe.

Final Thoughts

I'm not saying Claude Code and MCPs will replace traditional development overnight. There are still plenty of cases where you need to get your hands dirty with code. But for shipping products fast? For getting ideas from concept to reality in hours instead of weeks? This workflow is pretty useful.

The invoice platform I built in one day would have taken me at least 2–3 weeks the old way. And that's with me being pretty experienced with the tech stack. And don’t forget to drop a star here: https://github.com/rohittcodes/linea.

Can You Build AI Agents in Rust? Yep, and Here’s How I Did it

Rohith Singh — Tue, 19 Aug 2025 16:19:01 +0000

Everyone's building AI agents these days, and everyone's teaching you how to do it in Python or JavaScript. Nothing wrong with Python. It's fast to prototype with and has a mature ecosystem. But I wanted to try something different. What if we could build a multi-agent system that orchestrates different specialised agents, each connected to real-world tools via MCP (Model Context Protocol), and what if we built it in Rust?

That’s exactly why I built Codepilot, a multi-agent system that can handle Linear project management, GitHub repository operations, and Supabase tasks, all through a beautiful terminal UI.

It’s a fun side project, and if you’re curious and want to try things with Rust, maybe you'll find this useful. The source code is available on my GitHub here: rohittcodes/codepilot.

Why a Multi-Agent System and why Rust in particular?

Traditional AI agents are great, but they often struggle when you need to handle multiple domains or complex workflows. What if you want to:

Create a GitHub issue and link it to a Linear project.
Query your Supabase database and create a summary report.
Manage repositories across different services.

A multi-agent system solves this by having specialized agents that can collaborate and orchestrate complex workflows.

credits: Langchain

And why Rust??

Rust isn’t the usual go-to for AI, but it has some killer benefits on its side:

Performance: Zero-cost abstractions and memory safety mean your agent runs fast without eating resources
Type Safety: Errors can be caught at compile time, not when your agent’s halfway through a task.
Ecosystem Potential: Although the AI ecosystem is more mature in Python, Rust’s async/await model and strict typing make it ideal for agents to juggle between multiple tools, APIs, or tasks.

And now, if you wish to build something fast, reliable, and scalable, Rust becomes a solid choice there. So, before we dive deep into building it, let’s start with the basics.

What is an AI Agent by the way?

An AI agent is a program that can understand your intent and take actions on your behalf. Think of it as one of your intelligent assistants that doesn't just chat, it does things. In our case, the agent understands when you're asking about Linear issues, GitHub repositories, or Supabase data, and then calls the appropriate APIs to retrieve the information, combining it with natural language responses.

Agentic Architectures (credits: Langchain)

One of the key insights here is that LLMs excel at understanding intent (what you want to do), but struggle to access real-time information. By combining LLMs with APIs, we can create a program that automates tasks for you, eliminating the need for manual effort. Now you can get the best of both worlds: natural language understanding plus real-time information access.

Getting started with the Rust AI Agents

Alright, let’s build the thing. I didn’t want to overthink the setup: just a plain Rust binary project, a few crates to make async work easier, and enough structure to plug in the tools.

Setting up the project

First, we create a new Rust binary project (not a lib):

cargo new codepilot
cd codepilot

Add these dependencies to your Cargo.toml:

[dependencies]
anyhow = "1"
chrono = { version = "0.4", features = ["serde"] }
dotenv = "0.15"
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
swarms-rs = "0.1.9"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
reqwest = { version = "0.11", features = ["json"] }
crossterm = "0.27"
tui = "0.19"
ratatui = "0.24"
regex = "1.10"

Building my first agent in Rust

The core idea is simple: a multi-agent system made of specialized agents, each with tools the LLM can call, where the LLM decides which agent to use based on the user’s query. Here’s the basic structure:

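// src/agents/linear.rs
pub struct LinearAgent {
    agent: Box<dyn Agent>,          // the LLM-backed agent
    linear_client: LinearMCPClient, // talks to the Linear MCP server
    available_tools: Vec<ToolInfo>, // tools discovered at startup
}

impl LinearAgent {
    pub async fn new(api_key: String, config: &Config) -> Result<Self> {
        // Initialize agent with dynamic tool discovery
        todo!("covered in the Dynamic Tool Discovery section")
    }
}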

Integrating Composio MCP Servers

To connect each agent with real-world APIs, we use Composio MCP Integration. These servers expose authenticated API actions your agents can call, without you having to handwrite integrations.

For Codepilot, I’ve set up MCP servers for:

GitHub: To handle repos, issues, PRs, etc.
Linear: For project and issue management
Supabase: For querying and updating data

You can create your own MCP server configs in a few clicks.

How to add your own MCP config in Composio

If you want to integrate your tools (or replicate what I’ve done), here’s the flow using the new Composio dashboard:

Log in and go to MCP Configs.
Click “Create MCP Config”
Give it a name (like linear-agent or github-bot)
Choose the toolkit (e.g., Linear, GitHub, Supabase)
Select how you want to handle authentication.
Paste your API keys or use OAuth to connect.
Pick the tools you want your agent to have access to
Hit “Create MCP Server”. This opens a dialog where you can copy the MCP server URL. Paste it into the .env file under an appropriate variable name.
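Once the URLs are in the environment, reading them can look like the std-only sketch below. The variable names (`LINEAR_MCP_URL`, `GITHUB_MCP_URL`, `SUPABASE_MCP_URL`) and the `McpUrls` struct are assumptions for illustration, not necessarily what Codepilot uses. It also assumes the .env file has already been loaded into the process environment, for example by the dotenv crate from the dependency list.

```rust
use std::env;

/// MCP server URLs for the three agents (hypothetical variable names).
#[derive(Debug, PartialEq)]
struct McpUrls {
    linear: String,
    github: String,
    supabase: String,
}

/// Builds the config from any key lookup, so it can be tested without
/// touching the real process environment.
fn load_mcp_urls(get: impl Fn(&str) -> Option<String>) -> Result<McpUrls, String> {
    let require = |name: &str| {
        get(name)
            .filter(|value| !value.trim().is_empty())
            .ok_or_else(|| format!("missing environment variable {name}"))
    };
    Ok(McpUrls {
        linear: require("LINEAR_MCP_URL")?,
        github: require("GITHUB_MCP_URL")?,
        supabase: require("SUPABASE_MCP_URL")?,
    })
}

fn main() {
    // Look each key up in the real process environment.
    match load_mcp_urls(|name| env::var(name).ok()) {
        Ok(urls) => println!("{urls:?}"),
        Err(message) => eprintln!("{message}"),
    }
}
```

Failing fast with the name of the missing variable saves a confusing connection error later, when an agent first tries to reach its MCP server.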

Codepilot Architecture:

The architecture is built around three core principles:

Specialized Agents: Each agent (Linear, GitHub, Supabase) is an expert in its domain.
MCP Integration: All agents connect to the MCP tools via Composio Integrations.
Intelligent Orchestration: A central orchestrator that routes queries to the right agent.

The Core Components

pub struct MultiAgentOrchestrator {
    linear_agent: LinearAgent,
    github_agent: GitHubAgent,
    supabase_agent: SupabaseAgent,
}

pub struct LinearAgent {
    agent: Box<dyn Agent>,
    linear_client: LinearMCPClient,
    available_tools: Vec<ToolInfo>,
}

Each agent is a specialist who knows how to work with its specific domain tools, fetched dynamically from MCP Servers.

Dynamic Tool Discovery

One of the coolest features is that each agent discovers its tools dynamically from MCP servers:

impl LinearAgent {
    pub async fn new(api_key: String, config: &Config) -> Result<Self> {
        let linear_client = LinearMCPClient::new(config);

        // Get dynamic tools from the MCP server
        let tools_response = linear_client.get_tools().await?;
        let available_tools: Vec<ToolInfo> = tools_response
            .iter()
            .map(|tool| ToolInfo {
                name: tool["name"].as_str().unwrap_or("unknown").to_string(),
                description: tool["description"].as_str().unwrap_or("No description").to_string(),
                input_schema: tool["inputSchema"].clone(),
            })
            .collect();

        // Create dynamic system prompt with actual tool descriptions
        let tools_description = available_tools
            .iter()
            .map(|tool| format!("- {}: {}", tool.name, tool.description))
            .collect::<Vec<_>>()
            .join("\n");
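
        // The rest of `new` (not shown here) builds the agent from
        // `tools_description` and returns `Ok(Self { .. })`.
    }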
}

This means no hardcoded operations: the agents automatically adapt to whatever tools are available on their MCP servers!

What's happening here?

The system uses pure LLM-based tool selection with intelligent fallbacks. When you ask a question, here's what happens:

True LLM-Based Tool Selection

The agents use a sophisticated approach where the LLM analyses your request and mentions specific tools:

// 1. LLM analyzes the query and mentions specific tools
let llm_response = self.agent.run(query.to_string()).await?;
// LLM says: "I would use GITHUB_LIST_REPOSITORIES to fetch your repositories"

// 2. Parse LLM guidance to extract tool selection.
// Lowercase both sides so the match is case-insensitive.
let guidance_lower = llm_response.to_lowercase();
for tool in &self.available_tools {
    if guidance_lower.contains(&tool.name.to_lowercase()) {
        // LLM mentioned this tool - execute it
        // (`arguments` is assumed to be prepared earlier; not shown in this excerpt)
        return self.execute_tool(tool, arguments).await;
    }
}

// 3. If LLM doesn't mention tools, provide a clear error message
return Ok("I don't have a tool for that request. Available tools are: [list tools]".to_string());
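If you want to play with the matching step on its own, here's a self-contained sketch that runs without the agent or MCP client. `select_tool` and the trimmed-down `ToolInfo` are illustrative, not Codepilot's actual code, and `GITHUB_CREATE_ISSUE_COMMENT` is a hypothetical tool name used only to show the overlap problem.

```rust
/// Trimmed-down tool descriptor for this sketch.
#[derive(Debug, PartialEq)]
struct ToolInfo {
    name: String,
    description: String,
}

/// Returns the tool whose name appears in the LLM's reply, ignoring case.
/// When several names match, the longest wins, so a tool whose name is a
/// prefix of another one does not shadow it.
fn select_tool<'a>(llm_response: &str, tools: &'a [ToolInfo]) -> Option<&'a ToolInfo> {
    let response = llm_response.to_lowercase();
    tools
        .iter()
        .filter(|tool| response.contains(&tool.name.to_lowercase()))
        .max_by_key(|tool| tool.name.len())
}

fn tool(name: &str, description: &str) -> ToolInfo {
    ToolInfo {
        name: name.to_string(),
        description: description.to_string(),
    }
}

fn main() {
    let tools = vec![
        tool("GITHUB_LIST_REPOSITORIES", "List repositories"),
        tool("GITHUB_CREATE_ISSUE", "Create an issue"),
        tool("GITHUB_CREATE_ISSUE_COMMENT", "Comment on an issue"), // hypothetical
    ];
    let reply = "I would use GITHUB_LIST_REPOSITORIES to fetch your repositories";
    match select_tool(reply, &tools) {
        Some(found) => println!("selected {} ({})", found.name, found.description),
        None => println!("I don't have a tool for that request."),
    }
}
```

Substring matching on the reply is fragile by nature, which is why the strict system prompt in the next sections matters: the model has to name the tool exactly.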

Constrained Agent Configuration

To prevent the LLM from calling internal tools, we use a constrained configuration:

let agent = client
    .agent_builder()
    .agent_name("GitHubAgent")
    .system_prompt(system_prompt)
    .user_name("User")
    .max_loops(1)  // Single loop to prevent tool calling
    .temperature(0.1)  // Focused responses
    .max_tokens(2048)  // Shorter responses
    .build();

Clear Tool Constraints

The system prompt explicitly constrains the LLM to use only the available MCP tools:

let system_prompt = format!(
    "You are a GitHub agent. You can ONLY use these GitHub MCP tools:

{}

CRITICAL: You are NOT allowed to use any other tools.
You can ONLY mention and use the tools listed above.

When a user asks you something:
1. Look at the list of tools above
2. Find the most appropriate tool for their request
3. Mention the exact tool name you would use
4. Explain why you chose that tool

Example responses:
- 'I would use GITHUB_LIST_REPOSITORIES to fetch your repositories'
- 'I would use GITHUB_CREATE_ISSUE to create a new issue'

If no tool matches the request, say: 'I don't have a tool for that request. Available tools are: [list tools]'

Remember: ONLY use tools from the list above. Never use any other tools.",
    tools_description
);

When you ask "List all my GitHub repositories", the system:

Orchestrator LLM → "USE_GITHUB_AGENT"
GitHub Agent LLM → "I would use GITHUB_LIST_REPOSITORIES to fetch your repositories."
Tool Execution → Executes GITHUB_LIST_REPOSITORIES with proper arguments.
Result → "LLM Analysis: [reasoning] + GitHub Operation: [tool execution result]"
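The first hop of that flow, turning the orchestrator's reply into an agent choice, can be sketched like this. Only `USE_GITHUB_AGENT` appears in the flow above; the other two directive strings, the `AgentKind` enum, and the `route` function are assumptions made by analogy.

```rust
/// Which specialized agent should handle the query.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentKind {
    Linear,
    GitHub,
    Supabase,
}

/// Directive strings the orchestrator LLM is expected to emit.
/// Only USE_GITHUB_AGENT comes from the post; the rest are assumed.
const DIRECTIVES: [(&str, AgentKind); 3] = [
    ("USE_LINEAR_AGENT", AgentKind::Linear),
    ("USE_GITHUB_AGENT", AgentKind::GitHub),
    ("USE_SUPABASE_AGENT", AgentKind::Supabase),
];

/// Finds the first known directive in the reply, ignoring case.
fn route(orchestrator_reply: &str) -> Option<AgentKind> {
    let reply = orchestrator_reply.to_uppercase();
    for (directive, kind) in DIRECTIVES {
        if reply.contains(directive) {
            return Some(kind);
        }
    }
    None
}

fn main() {
    println!("{:?}", route("USE_GITHUB_AGENT"));
    println!("{:?}", route("Sorry, I can't help with that."));
}
```

Returning an `Option` keeps the "no agent matched" case explicit, so the caller has to decide what to tell the user instead of silently picking a default agent.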

Understanding the code

At the core of everything here is the MultiAgentOrchestrator struct, which wires everything together:

pub struct MultiAgentOrchestrator {
    linear_agent: LinearAgent,
    github_agent: GitHubAgent,
    supabase_agent: SupabaseAgent,
}

Each agent here resides in its own module, making it easy to plug in or swap out components. The LLM is guided by a system prompt that tells it exactly what tools are available and how to use them. Something like:

You are a multi-agent orchestrator. You have access to these agents:
- Linear Agent: Project management and issue tracking
- GitHub Agent: Repository and code management
- Supabase Agent: Database operations and queries

Based on the user's query, determine which agent(s) to use and provide a helpful response.

Pretty clean, right? You get reasoning, tool usage, and a conversational reply, all in a single setup.

Demo of what I’ve built and how things work (High Level)

Here’s what the interaction looks like:

The multi-agent system intelligently routes each query to the appropriate agent, then combines the LLM's conversational response with real data.

Conclusion

This was a fun little project to work on, given the usual Python-heavy agent world. Rust isn't traditionally the go-to for these AI workflows, but it's surprisingly good at handling real-world agent logic once you get past the initial hurdles. The type system gives you confidence, async works well enough, and once you have your tools in place, everything is quite simple to plug in.

Not production-ready yet, but as a weekend project and a way to learn, I'd say building something like this is totally worth it. Again, the complete source code is here: rohittcodes/codepilot. Try it out and let me know what you come up with.

How I Used Claude to Create and Assign Issues in Linear

Rohith Singh — Tue, 12 Aug 2025 17:19:14 +0000

In my previous posts, I showed how I used Claude with Composio's MCP layer to skip dashboards and manage tools like Neon and Supabase from a Claude session window. I also shared how I automated my day-to-day Jira tasks using the same approach. So if you're interested, check out that post too.

Linear and Jira both handle project management, but Linear's focus is on fast, modern issue tracking, perfect for developers who want a smooth experience. Still, even in Linear, opening the UI every time you want to create a bug, assign tasks, or update statuses can get old fast.

So, I used the Linear MCP server from Composio, connected it to Claude Code, and now I can manage Linear projects straight from the terminal: no UI, no endless clicking.

What is MCP?

This time, let’s briefly explain MCPs with a use-case lens:

Think of MCPs as a way to turn APIs into something that Claude can “understand” and “use”, like plugging tools into some AI Agent’s brain.

If you want more background on it, check out my Jira blog or Anthropic’s MCP overview.

What can a Linear MCP Server do?

Let’s say you’re in a flow, ideating or writing code, and you suddenly think:

“I should create a bug ticket and assign it to someone in the frontend team.”

With Linear MCP and Claude, you just type:

“Create a bug in the Payments project called ‘Fix refund edge case crash’ and assign it to @alex.”

… and it’s done.

No switching tabs. No forms, no remembering project IDs.

What you can do with Linear MCP:

Create issues using LINEAR_CREATE_LINEAR_ISSUE
Update issue status, title, priority with LINEAR_UPDATE_ISSUE
Delete issues when no longer relevant using LINEAR_DELETE_LINEAR_ISSUE
Fetch issue details on demand with LINEAR_GET_LINEAR_ISSUE

There are many more tools you can use; see this docs page for the full list.
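
If you ever call these tools from your own code instead of through Claude, the four tool names listed above map neatly onto an enum. A minimal sketch follows. The `LinearAction` type and the `is_destructive` helper are assumptions for illustration and are not part of Composio's SDK:

```rust
/// The four Linear actions listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LinearAction {
    Create,
    Update,
    Delete,
    Get,
}

impl LinearAction {
    /// Tool name exactly as listed in this post.
    fn tool_name(self) -> &'static str {
        match self {
            LinearAction::Create => "LINEAR_CREATE_LINEAR_ISSUE",
            LinearAction::Update => "LINEAR_UPDATE_ISSUE",
            LinearAction::Delete => "LINEAR_DELETE_LINEAR_ISSUE",
            LinearAction::Get => "LINEAR_GET_LINEAR_ISSUE",
        }
    }

    /// Hypothetical guard: flags actions that remove data, so a wrapper
    /// can ask for confirmation before running them.
    fn is_destructive(self) -> bool {
        matches!(self, LinearAction::Delete)
    }
}
```

Keeping the names in one place means a typo in a tool name shows up as a compile error at the call site instead of a failed request.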

Why use Composio for this?

Let’s say you’re building a productivity AI or just want to let Claude manage your Linear workspace without building everything yourself. If you connect directly to Linear’s API or its MCP, you’d still need to handle:

OAuth flows or personal access tokens.
Managing sessions and tokens.
Keeping everything updated with Linear’s API changes.

Composio handles all of that for you. It acts as an integration layer that hosts MCP specs and handles auth, so all you have to do is pick Linear from Composio’s integrations list and start prompting.

What we’ll be covering

In this post, we’ll go through:

What a Linear MCP is and how it works
How to set it up using Composio
Using the MCP server with Claude Code in your terminal

How to set up the Linear MCP using Claude Code

You can set up a Composio MCP in two ways:

Option 1: Quick Setup via Composio MCP page

Head over to the Composio MCP page for Linear.
Switch to the Claude tab.
Click Generate, then copy the generated command.
Paste and run it in your terminal.

npx @composio/mcp@latest setup "https://mcp.composio.dev/partner/composio/linear/mcp?customerId=[your-customer-id]&agent=claude" "linear-vbusm8-8" --client claude

Copy the config to your local project:

cp ~/.config/claude/claude_desktop_config.json .mcp.json

Start Claude, and ask it to authenticate you with the Linear MCP. It’ll generate an auth URL to authenticate and authorize your client.

I could have saved a few tokens if I had passed a more direct prompt, i.e., asking it to initiate the connection using the Linear MCP.

Option 2: Use the Composio Dashboard

If you want to set up scopes, test actions, or run more advanced flows:

Head over to the Composio Dashboard.
Navigate to MCP Configs, then hit Create MCP Config.
Give the MCP config a name, pick Linear from the list of toolkits, and select how you want to handle authentication. Then select the tools you want to use from the list.
In the integration step, look for Linear in the MCP Configs page, and proceed by clicking Create MCP.
Once that’s done, you’ll be prompted to connect your Linear account. A new tab will open where you can log in and grant the necessary permissions.
After that, a modal will appear with a ready-to-run npx command. Copy and run it in your terminal to use the MCP with Claude Code.

Once everything’s connected, test it right in the Playground in Composio first.

Example:

Create a bug in the Growth project titled "Login button unresponsive", and add "Users can't click the button on mobile."

Using the Linear MCP Server with Claude Code

Now that it’s all set up, try prompting Claude with things like:

“Create a bug in the Billing project with priority High.”
“Assign this issue to Emily and label it urgent.”
“List all tasks due this week.”

You can run this from Claude Code, Cursor, Windsurf, or your own CLI wrapper using HTTP.
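
For the "own CLI wrapper" route, the part you'd write yourself is mostly the request body. MCP tool calls are JSON-RPC 2.0 requests with the method `tools/call`, so a std-only sketch of building one looks like this. The helper names are made up, the argument key `title` in the usage note is a guess (check the tool's schema for the real fields), and the HTTP transport, session initialization, and auth are left out entirely:

```rust
/// Escapes a string so it can sit inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: builds the body of an MCP `tools/call` request.
/// Only string-valued arguments are supported, to keep the sketch short.
fn tools_call_body(id: u64, tool: &str, arguments: &[(&str, &str)]) -> String {
    let args: Vec<String> = arguments
        .iter()
        .map(|(key, value)| format!("\"{}\":\"{}\"", json_escape(key), json_escape(value)))
        .collect();

    // Doubled braces are literal braces in `format!`.
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"{}\",\"arguments\":{{{}}}}}}}",
        id,
        json_escape(tool),
        args.join(",")
    )
}
```

Something like `tools_call_body(1, "LINEAR_CREATE_LINEAR_ISSUE", &[("title", "Login button unresponsive")])` would give you the JSON to POST. In a real wrapper you'd likely reach for a JSON library instead of hand-rolled escaping.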

Conclusion

This post isn’t about Linear’s UI being bad (it’s super cool). It’s that sometimes I just want to think in tasks, not in a typical dashboard. If you’re building your own AI workflows or just want a more natural way to manage issues, give the Linear MCP a try.

The best part? Once you’ve set up one MCP, doing the same for other tools like GitHub, Supabase, or Notion feels like a simple 5-minute job.