<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rohith Singh</title>
    <description>The latest articles on DEV Community by Rohith Singh (@rohittcodes).</description>
    <link>https://dev.to/rohittcodes</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1404328%2F421607d9-02a6-4178-899c-85ac63742cc2.jpg</url>
      <title>DEV Community: Rohith Singh</title>
      <link>https://dev.to/rohittcodes</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rohittcodes"/>
    <language>en</language>
    <item>
      <title>Kimi K2 Thinking vs. Claude 4.5 Sonnet vs. GPT-5.1 Codex: Tested the best models for agentic coding</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Fri, 14 Nov 2025 13:20:00 +0000</pubDate>
      <link>https://dev.to/composiodev/kimi-k2-thinking-vs-claude-45-sonnet-vs-gpt-5-codex-tested-the-best-models-for-agentic-coding-21e5</link>
      <guid>https://dev.to/composiodev/kimi-k2-thinking-vs-claude-45-sonnet-vs-gpt-5-codex-tested-the-best-models-for-agentic-coding-21e5</guid>
      <description>&lt;p&gt;Three new AI coding models dropped in the past two months. Claude Sonnet 4.5 with extended thinking on September 29. GPT-5 Codex with unified reasoning on September 15. Kimi K2 Thinking with 1T parameters on November 6-7. All three claim to handle complex coding tasks better than anything before them.&lt;/p&gt;

&lt;p&gt;The benchmarks say they're close. I wanted to see what that means for actual development work. So I gave all three the same prompts for two hard problems in my observability platform: statistical anomaly detection and distributed alert deduplication. Same codebase, same requirements, same IDE setup.&lt;/p&gt;

&lt;p&gt;Full code's on &lt;a href="http://github.com/rohittcodes/tracer" rel="noopener noreferrer"&gt;github.com/rohittcodes/tracer&lt;/a&gt; if you want to dig in. Fair warning: it's an evaluation harness I built for this, not a polished product. Expect rough edges.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Test 1 - Advanced Anomaly Detection:&lt;/strong&gt; GPT-5 Codex was the only one that shipped working code. Claude and Kimi both had critical bugs that would crash in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test 2 - Distributed Alert Deduplication:&lt;/strong&gt; Codex won again with actual integration. Claude had a solid architecture but didn't wire it up. Kimi had clever ideas but broken duplicate-detection logic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxnepmzw0f97wqatk36o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxnepmzw0f97wqatk36o.png" alt="cost" width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The kicker:&lt;/strong&gt; Codex cost me $0.95 total vs Claude's $1.68. That's 43% cheaper for code that actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Official Benchmarks (For What They're Worth)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;SWE-bench Verified&lt;/th&gt;
&lt;th&gt;GPQA Diamond&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Released&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.5&lt;/td&gt;
&lt;td&gt;77.2% (82.0% parallel)&lt;/td&gt;
&lt;td&gt;83.4%&lt;/td&gt;
&lt;td&gt;200K&lt;/td&gt;
&lt;td&gt;Sept 29, 2025&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5 Codex&lt;/td&gt;
&lt;td&gt;74.5%&lt;/td&gt;
&lt;td&gt;89.4%&lt;/td&gt;
&lt;td&gt;400K (128K out)&lt;/td&gt;
&lt;td&gt;Sept 15, 2025&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2 Thinking&lt;/td&gt;
&lt;td&gt;71.3%&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;Nov 6-7, 2025&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Pricing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude: $3/M input, $15/M output&lt;/li&gt;
&lt;li&gt;GPT-5: $1.25/M input, $10/M output&lt;/li&gt;
&lt;li&gt;Kimi: $0.60/M input, $2.50/M output&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Tested This
&lt;/h2&gt;

&lt;p&gt;I gave all three models identical prompts for two hard problems in an observability platform: statistical anomaly detection and distributed alert deduplication. Not toy problems, the kind of stuff that needs deep reasoning about edge cases and system architecture.&lt;/p&gt;

&lt;p&gt;I set up everything in Cursor IDE, and tracked token usage, time, code quality, and whether it actually integrated with the existing codebase. That last part turned out to matter way more than I expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick note on the tooling:&lt;/strong&gt; Codex CLI has gotten way better since I last used it. Streams reasoning, resumes sessions reliably, and shows you cached token usage. Claude Code is still the most polished, with inline critiques, replayable steps, and clean thinking traces. Kimi CLI feels early. No easy way to see the model's reasoning, context fills up faster, and cost tracking is basically non-existent (just a dashboard number). Made iteration painful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test 1: Statistical Anomaly Detection
&lt;/h2&gt;

&lt;p&gt;The challenge: Build a system that learns baseline error rates, uses z-scores and moving averages, catches rate-of-change spikes, and handles 100k+ logs/minute with under 10ms latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude's Attempt
&lt;/h3&gt;

&lt;p&gt;Time: 11m 23s | Cost: $1.20 | +3,178 lines across 7 files&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rohittcodes/tracer/commit/05dbf00f67639c202f72dfbd46cbfd7aa0e8cc31" rel="noopener noreferrer"&gt;Commit 05dbf00&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude went all-in. Statistical detector with z-score, EWMA, and rate-of-change checks. Extensive docs. Synthetic benchmarks. It looked impressive.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnb2z7ey5i7zoz1wbgox.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnb2z7ey5i7zoz1wbgox.png" alt="Claude Results" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then I actually ran it.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;calculateRateOfChange()&lt;/code&gt; function returns &lt;code&gt;Infinity&lt;/code&gt; when the previous window is zero, and the alert formatter calls &lt;code&gt;toFixed()&lt;/code&gt; on it. Instant &lt;code&gt;RangeError&lt;/code&gt; crash. The baseline isn't actually rolling; the circular buffer drops old samples, but &lt;code&gt;RunningStats&lt;/code&gt; keeps everything, so it can't adapt to regime changes. Unit tests use &lt;code&gt;Math.random()&lt;/code&gt;, making the whole suite non-deterministic. Oh, and none of this is wired into the actual processor pipeline.&lt;/p&gt;
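&lt;p&gt;A minimal sketch of guarding that zero-previous-window case. The &lt;code&gt;rateOfChange()&lt;/code&gt; and &lt;code&gt;formatRate()&lt;/code&gt; helpers are hypothetical names for illustration, not the code from the commit: the idea is to report a non-finite change as a descriptive string rather than format it as a number.&lt;/p&gt;

```typescript
// Hypothetical helpers, not the functions from the commit above.

// Returns the relative change between two window counts.
function rateOfChange(previous: number, current: number): number {
  if (previous === 0) {
    // No baseline to compare against: treat new errors as an unbounded jump.
    return current === 0 ? 0 : Number.POSITIVE_INFINITY;
  }
  return (current - previous) / previous;
}

// Formats a rate for an alert message without assuming it is finite.
function formatRate(rate: number): string {
  if (!Number.isFinite(rate)) {
    return "new errors from a zero baseline";
  }
  return (rate * 100).toFixed(1) + "%";
}

console.log(formatRate(rateOfChange(0, 25)));
console.log(formatRate(rateOfChange(40, 50)));
```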

&lt;p&gt;Cool prototype. Completely broken for production.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPT-5 Codex's Attempt
&lt;/h3&gt;

&lt;p&gt;Tokens: 86,714 input (+ 1.5M cached) / 40,805 output (29,056 reasoning)&lt;/p&gt;

&lt;p&gt;Time: 18m | Cost: $0.35 | +157 net lines across 4 files&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rohittcodes/tracer/commit/878f31336665cdfb7e99e1934841d975b20d7c13" rel="noopener noreferrer"&gt;Commit 878f313&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Codex actually integrated it. Modified the existing &lt;code&gt;AnomalyDetector&lt;/code&gt; class, wired it into &lt;code&gt;index.ts&lt;/code&gt;. It runs in production immediately.&lt;/p&gt;

&lt;p&gt;The edge case handling is solid, checks for &lt;code&gt;Number.POSITIVE_INFINITY&lt;/code&gt; and uses a descriptive string instead of crashing on &lt;code&gt;toFixed()&lt;/code&gt;. The baseline is truly rolling with circular buffers and incremental statistics (sum, sum-of-squares) that update in O(1). Time buckets align on wall-clock boundaries for predictability. Tests are deterministic with controlled bucket emissions.&lt;/p&gt;
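&lt;p&gt;A rough sketch of what an O(1) rolling baseline along those lines might look like, assuming a fixed-size window. The &lt;code&gt;RollingWindowStats&lt;/code&gt; name and its fields are illustrative, and the code is not taken from the commit.&lt;/p&gt;

```typescript
// Illustrative rolling baseline; class and field names are assumptions.
class RollingWindowStats {
  private readonly capacity: number;
  private readonly samples: number[];
  private next = 0; // index of the slot to overwrite
  private count = 0; // number of filled slots
  private sum = 0;
  private sumSquares = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
    this.samples = new Array(capacity).fill(0);
  }

  // O(1): evict the oldest sample once the window is full, then add the new one.
  push(value: number): void {
    if (this.count === this.capacity) {
      const evicted = this.samples[this.next];
      this.sum -= evicted;
      this.sumSquares -= evicted * evicted;
    } else {
      this.count += 1;
    }
    this.samples[this.next] = value;
    this.sum += value;
    this.sumSquares += value * value;
    this.next = (this.next + 1) % this.capacity;
  }

  mean(): number {
    return this.count === 0 ? 0 : this.sum / this.count;
  }

  // Population standard deviation derived from the running sums.
  stdDev(): number {
    if (this.count === 0) return 0;
    const m = this.mean();
    const variance = Math.max(0, this.sumSquares / this.count - m * m);
    return Math.sqrt(variance);
  }
}

const stats = new RollingWindowStats(2);
stats.push(1);
stats.push(2);
stats.push(3); // evicts 1, so the window is now [2, 3]
console.log(stats.mean(), stats.stdDev());
```

&lt;p&gt;Because old samples are subtracted out as they leave the window, the baseline can follow a regime change. The sum-of-squares form can lose precision on large values, so a production version might prefer a more numerically stable update.&lt;/p&gt;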

&lt;p&gt;There are trade-offs. The bucket approach is simpler but slightly less flexible than circular buffers. It extended the existing class instead of creating a new one, which couples statistical detection to threshold logic. Documentation is minimal compared to Claude's novel-length bundle.&lt;/p&gt;

&lt;p&gt;But here's the thing: this code ships. Right now. As-is.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kimi's Attempt
&lt;/h3&gt;

&lt;p&gt;Time: ~20m | Cost: ~$0.25 (estimated) | +2,800 lines&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rohittcodes/tracer/commit/ed72f3f2a7e61c2c0e42878072c383928c9268fd" rel="noopener noreferrer"&gt;Commit ed72f3f&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kimi tried to support both streaming logs and batch metrics. Added MAD and EMA-based detection. Ambitious.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt7iig1g16se3i480405.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt7iig1g16se3i480405.png" alt="Kimi Results" width="800" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fundamentals are broken, though. It updates the baseline before checking the new value, making the z-score effectively zero. Real anomalies never fire. There's a TypeScript compilation error: &lt;code&gt;DEFAULT_METRIC_WINDOW_SECONDS&lt;/code&gt; is used before its declaration. Rate-of-change divides by &lt;code&gt;previousValue&lt;/code&gt; without checking for zero, the same &lt;code&gt;Infinity&lt;/code&gt; crash as Claude. Tests reuse the same log object in tight loops, never seeing realistic patterns. Nothing's integrated.&lt;/p&gt;
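&lt;p&gt;To make the ordering issue concrete, here is a small sketch that scores a value against the existing baseline first and only then folds it in. The &lt;code&gt;zScore()&lt;/code&gt; and &lt;code&gt;observe()&lt;/code&gt; helpers and the sample numbers are made up for illustration and are not Kimi's code.&lt;/p&gt;

```typescript
// Illustrative only: the baseline is a plain array of recent values.
function zScore(baseline: number[], value: number): number {
  if (baseline.length === 0) return 0;
  const mean = baseline.reduce((a, b) => a + b, 0) / baseline.length;
  const variance =
    baseline.reduce((a, b) => a + (b - mean) * (b - mean), 0) / baseline.length;
  const std = Math.sqrt(variance);
  // A flat baseline has no spread; treat any deviation as maximally anomalous.
  if (std === 0) return value === mean ? 0 : Number.POSITIVE_INFINITY;
  return (value - mean) / std;
}

// Score against the existing baseline first, then fold the value in.
function observe(baseline: number[], value: number, threshold: number): boolean {
  const isAnomaly = Math.abs(zScore(baseline, value)) > threshold;
  baseline.push(value);
  return isAnomaly;
}

const history = [10, 12, 11, 9, 10, 11, 10, 9];
const spike = 60;

// What the check sees if the spike is already part of the baseline.
console.log(zScore([...history, spike], spike));
// What it sees when the spike is scored before the update.
console.log(observe(history, spike, 3));
```

&lt;p&gt;With these sample numbers, the spike clears a threshold of 3 easily when scored first, but falls just under it once it has been folded into its own baseline.&lt;/p&gt;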

&lt;p&gt;This doesn't even compile.&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 1 Quick Compare
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Claude&lt;/th&gt;
&lt;th&gt;GPT-5&lt;/th&gt;
&lt;th&gt;Kimi&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Integrated?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edge cases?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Crashes&lt;/td&gt;
&lt;td&gt;Handled&lt;/td&gt;
&lt;td&gt;Crashes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tests work?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-deterministic&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Unrealistic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ships?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.20&lt;/td&gt;
&lt;td&gt;$0.35&lt;/td&gt;
&lt;td&gt;~$0.25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Codex pulled ahead because it was the only one that shipped working, integrated code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Router Integration
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;I wanted to dogfood Tool Router, which is in beta. It lets you add any Composio apps and loads tools from the appropriate toolkits only when the task context calls for them, cutting your MCP context bloat by a mile. You can read more about it in the Composio docs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before kicking off Test 2, I integrated everything through the Tool Router MCP that we ship with Tracer. Quick refresher on why I bother with it: Tool Router exposes all of a user's connected apps as ready-to-call tools for any agent. One OAuth handshake per user, and the AI SDK gets a unified surface instead of me hand-wiring Slack, Jira, PagerDuty, and whatever comes next.&lt;/p&gt;

&lt;p&gt;What that buys me in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unified access with per-user auth:&lt;/strong&gt; one router for 500+ apps, and each session only sees the integrations that the user actually connected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No redeploys, SDK-native:&lt;/strong&gt; new connections show up instantly with proper params/schemas, so agents can call them without glue code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Also: this is the exact service backing Rube MCP on the backend.) The helper that spins it up lives in &lt;code&gt;packages/ai/src/composio-client.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ComposioClient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ToolRouterConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tracer-system&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolkits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolkits&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;slack&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gmail&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;composio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Composio&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAIAgentsProvider&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;createMCPClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getSession&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;experimental_createMCPClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mcpUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;
          &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;X-Session-Id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that in place, any LLM can drop in the same Slack/Jira/PagerDuty hooks without me juggling tokens. Swap the toolkit list and any agent, or even an internal automation, gets the same stabilized catalog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test 2: Distributed Alert Deduplication
&lt;/h2&gt;

&lt;p&gt;The challenge: Fix race conditions when multiple processors detect the same anomaly. Handle ≤3s clock skew and processor crashes. Prevent duplicate alerts when processors fire within 5 seconds of each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude's Take
&lt;/h3&gt;

&lt;p&gt;Time: 7m 1s | Cost: $0.48 | +1,439 lines across 4 files&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rohittcodes/tracer/commit/a7cac5c8c034097c32815fc8645f59c0bf67bc8f" rel="noopener noreferrer"&gt;Commit a7cac5c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude designed a three-layer architecture: L1 cache, L2 advisory locks + DB query, L3 unique constraints. Handles clock skew with database &lt;code&gt;NOW()&lt;/code&gt; instead of processor timestamps. PostgreSQL advisory locks auto-release on connection close, handling crashes gracefully. The test suite is 493 lines covering cache hits, lock contention, clock skew, and crashes.&lt;/p&gt;

&lt;p&gt;Same problem as test 1: not integrated into &lt;code&gt;apps/processor/src/index.ts&lt;/code&gt;. The L1 cache uses &lt;code&gt;Math.abs(ageMs)&lt;/code&gt;, which doesn't account for clock skew (though L2 catches it). Advisory lock key is &lt;code&gt;service:alertType&lt;/code&gt; without a timestamp, causing unnecessary serialization. The unique constraint blocks &lt;em&gt;all&lt;/em&gt; duplicate active alerts, not just within the 5-second window.&lt;/p&gt;

&lt;p&gt;Great architecture. Still just a prototype.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPT-5's Take
&lt;/h3&gt;

&lt;p&gt;Tokens: 44,563 input (+ 1.99M cached) / 39,792 output (30,464 reasoning)&lt;/p&gt;

&lt;p&gt;Time: ~20m | Cost: $0.60 | +166 net lines across 6 files&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rohittcodes/tracer/commit/6d9cf3b297e3e9957adae7d0d5f597e5534fc7c9" rel="noopener noreferrer"&gt;Commit 6d9cf3b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Codex integrated it. Modified the existing &lt;code&gt;processAlert&lt;/code&gt; function and wired in deduplication. Uses a reservation-based approach: a dedicated &lt;code&gt;alert_dedupe&lt;/code&gt; table with expiration, which is simpler than advisory locks and easier to reason about. Transaction-based coordination with &lt;code&gt;FOR UPDATE&lt;/code&gt; locks for serialization. Handles clock skew with database &lt;code&gt;NOW()&lt;/code&gt;. Crashes are handled through a transaction rollback that clears reservations automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyq6eqrbr01oykv0swhe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyq6eqrbr01oykv0swhe.png" alt="GPT-5 Results" width="800" height="112"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's a minor race condition in the &lt;code&gt;ON CONFLICT&lt;/code&gt; clause where both processors can pass the &lt;code&gt;WHERE&lt;/code&gt; check before either commits. No background cleanup for expired &lt;code&gt;alert_dedupe&lt;/code&gt; entries (though stale entries get cleaned up on each insert). The dedupe key includes &lt;code&gt;projectId&lt;/code&gt;, treating the same service+type across projects as different; it might be intentional, but worth noting.&lt;/p&gt;

&lt;p&gt;Production-ready except for that small ON CONFLICT fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kimi's Take
&lt;/h3&gt;

&lt;p&gt;Time: ~20m | Cost: ~$0.25 (estimated) | +185 net lines across 7 files&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rohittcodes/tracer/commit/31aa8f5465495bdb601a775ce095cb1200e548ef" rel="noopener noreferrer"&gt;Commit 31aa8f5&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kimi actually integrated this one. Modified &lt;code&gt;processAlert&lt;/code&gt; and wired in deduplication. Uses discrete 5-second time buckets, simpler than a reservation table. Atomic upsert with database-native &lt;code&gt;ON CONFLICT DO UPDATE&lt;/code&gt; to handle races. Implements exponential backoff retry logic.&lt;/p&gt;
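&lt;p&gt;A minimal in-memory sketch of the time-bucket idea, assuming a key built from service, alert type, and a 5-second bucket number. The helper names are hypothetical, and a real deployment would enforce uniqueness in the database rather than in a &lt;code&gt;Set&lt;/code&gt;.&lt;/p&gt;

```typescript
// Illustrative in-memory version; names and key format are assumptions.
const BUCKET_MS = 5000;

// Builds a dedupe key from the alert identity and its 5-second bucket.
function dedupeKey(service: string, alertType: string, timestampMs: number): string {
  const bucket = Math.floor(timestampMs / BUCKET_MS);
  return service + ":" + alertType + ":" + bucket;
}

const seenKeys = new Set();

// Returns true when the alert is the first one seen for its bucket.
function tryReserve(service: string, alertType: string, timestampMs: number): boolean {
  const key = dedupeKey(service, alertType, timestampMs);
  if (seenKeys.has(key)) return false;
  seenKeys.add(key);
  return true;
}
```

&lt;p&gt;One property of discrete buckets worth keeping in mind: two alerts a few hundred milliseconds apart can still land in different buckets if they straddle a boundary, so the 5-second guarantee only holds within a bucket.&lt;/p&gt;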

&lt;p&gt;Critical bugs, though. Duplicate detection compares &lt;code&gt;createdAt&lt;/code&gt; timestamps, which are identical for simultaneous inserts, and returns the wrong &lt;code&gt;isDuplicate&lt;/code&gt; flag. The retry logic calculates a new bucket but never uses it, passes the same timestamp, and hits the same conflict again. The severity update SQL is unnecessarily complex.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p8brom673c8ladolczx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p8brom673c8ladolczx.png" alt="decision matrix by kimi" width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Good approach, broken execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Round 2 Quick Compare
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Claude&lt;/th&gt;
&lt;th&gt;GPT-5&lt;/th&gt;
&lt;th&gt;Kimi&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Integrated?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Approach&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Advisory locks&lt;/td&gt;
&lt;td&gt;Reservation table&lt;/td&gt;
&lt;td&gt;Time buckets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Critical bugs?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None (but not wired)&lt;/td&gt;
&lt;td&gt;Minor race&lt;/td&gt;
&lt;td&gt;Duplicate detection broken&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.48&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;~$0.25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Codex won again with cleaner integration and fewer showstopper bugs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Money
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Total across both tests:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude: $1.68&lt;/li&gt;
&lt;li&gt;GPT-5 Codex: $0.95 (43% cheaper)&lt;/li&gt;
&lt;li&gt;Kimi: ~$0.51 (estimated from aggregate)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Codex is cheaper despite using more tokens. Claude's extended thinking and higher output costs ($15/M vs $10/M) kill you. Codex's cached reads (1.5M+ tokens) bring costs way down. Kimi's CLI only shows aggregate project spend, so I had to estimate per-test costs.&lt;/p&gt;
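&lt;p&gt;For anyone checking the 43% figure, the arithmetic can be sketched as below, using the per-test costs from the two comparison tables. Amounts are kept in cents to sidestep floating-point rounding, and the variable names are just for illustration.&lt;/p&gt;

```typescript
// Per-test costs in cents, taken from the Round 1 and Round 2 tables.
const costsInCents = {
  claude: [120, 48],
  codex: [35, 60],
};

function total(cents: number[]): number {
  return cents.reduce((sum, c) => sum + c, 0);
}

const claudeTotal = total(costsInCents.claude);
const codexTotal = total(costsInCents.codex);

// Relative saving of Codex versus Claude, as a percentage.
const savingsPercent = ((claudeTotal - codexTotal) / claudeTotal) * 100;

console.log((claudeTotal / 100).toFixed(2), (codexTotal / 100).toFixed(2));
console.log(Math.round(savingsPercent) + "% cheaper");
```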

&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;p&gt;Codex won both tests by shipping production-ready code with the fewest critical bugs. Claude made better architectures, Kimi had clever ideas, but Codex was the only one consistently delivering working code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Codex won:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Actually integrates code instead of creating parallel prototypes&lt;/li&gt;
&lt;li&gt;Catches edge cases everyone else misses (that &lt;code&gt;Infinity.toFixed()&lt;/code&gt; bug bit both Claude and Kimi)&lt;/li&gt;
&lt;li&gt;Both implementations are production-ready&lt;/li&gt;
&lt;li&gt;43% cheaper than Claude&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The downsides:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less comprehensive documentation than Claude&lt;/li&gt;
&lt;li&gt;Minor ON CONFLICT race in test 2&lt;/li&gt;
&lt;li&gt;Takes longer (18-20m vs Claude's 7-11m), but worth it for code that works&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Use Claude Sonnet 4.5
&lt;/h3&gt;

&lt;p&gt;Best for architecture design and documentation. The thinking is genuinely excellent; the three-layer defense in test 2 shows real distributed systems understanding. Documentation is thorough (7 files for test 1). Fast execution at 7-11 minutes. The extended thinking mode with self-reflection produces well-reasoned solutions.&lt;/p&gt;

&lt;p&gt;But it doesn't integrate anything. You get prototypes that need serious wiring. Critical bugs in test 1, plus lock-key and unique-constraint scoping issues in test 2. More expensive at $1.68. Over-engineered (3,178 lines vs Codex's 157 net in test 1).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt; You want a thoughtful architecture review or documentation pass, and you're okay spending time wiring it in and fixing bugs.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use Kimi K2 Thinking
&lt;/h3&gt;

&lt;p&gt;Best for creative solutions and alternative approaches. Time buckets in test 2 and the MAD/EMA attempts in test 1 show creative thinking. Integrated its code in test 2, like Codex. Good test coverage. Probably the cheapest (though the CLI doesn't expose usage).&lt;/p&gt;

&lt;p&gt;But there are critical bugs in core logic everywhere. Broken duplicate detection and retry in test 2. Baseline update order issues in test 1. CLI limitations (no cost visibility, context fills fast). Fundamental logic errors prevent the code from working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt; You want creative ideation and can afford to refactor the output. Budget extra time to harden everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;I'm shipping production work with GPT-5 Codex. It delivers integrated code that handles edge cases, costs 43% less than Claude, and needs minimal polish. Claude's my go-to for architecture reviews or documentation, even though I know I'll spend time wiring it in and chasing bugs. Kimi's the wild card, creative and cheap, but the logic bugs mean I budget serious refactoring time.&lt;/p&gt;

&lt;p&gt;The real insight? All three models generate impressive-looking code. But only Codex consistently ships. Claude designs better but doesn't integrate. Kimi has clever ideas but introduces showstoppers. For real-world development where you need working code fast, Codex is the practical choice.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>architecture</category>
      <category>discuss</category>
    </item>
    <item>
      <title>How to ship apps faster with full-stack Claude Code setup (Skills, MCP, Plugins)</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Sun, 09 Nov 2025 15:26:07 +0000</pubDate>
      <link>https://dev.to/composiodev/how-to-ship-apps-faster-with-full-stack-claude-code-setup-skills-mcp-plugins-516g</link>
      <guid>https://dev.to/composiodev/how-to-ship-apps-faster-with-full-stack-claude-code-setup-skills-mcp-plugins-516g</guid>
      <description>&lt;p&gt;While most agentic coding tools like Codex, Cursor, and Windsurf are adding SDKs and plugin APIs, Anthropic’s &lt;strong&gt;Claude Code&lt;/strong&gt; is trying to do something a bit different. They’ve been quietly building a complete stack: Skills for domain context, Plugins for modular workflows, and MCPs for tool integrations, all connected through one environment.&lt;/p&gt;

&lt;p&gt;I wanted to see how that actually works when you build something real. &lt;/p&gt;

&lt;p&gt;So I picked a project I’ve been planning for a while. &lt;strong&gt;Luno&lt;/strong&gt; is a personal finance platform. It includes payment integrations, cron jobs for bill reminders, an agentic chatbot (wired with &lt;strong&gt;Tool Router&lt;/strong&gt; for calling tools like Gmail, Notion, Stripe, etc., integrated inside the app), household sharing, subscription tracking, and analytics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fux5hue7f7o9x6cl548di.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fux5hue7f7o9x6cl548di.gif" alt="Giphy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The goal was simple: test the entire Claude Code setup (Skills, Plugins, MCP servers, Sub Agents, and slash commands) and see whether it really speeds up real-world development or just adds more setup overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Built &lt;a href="https://github.com/rohittcodes/luno" rel="noopener noreferrer"&gt;&lt;strong&gt;Luno&lt;/strong&gt;&lt;/a&gt; in 2-3 days using Claude Code's full stack. Setup took a day (creating Skills, configuring Rube MCP). After that, features were shipped in 30-60 minutes instead of 8-10 hours of manual work. Cost: $12.67 for Claude Code usage (~15.5M input, 174k output tokens) + a Cursor Pro account for routine CRUD. Context7 MCP was critical in setup and development by pulling the right docs in‑session.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8fgw0kx633x91hvzsr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8fgw0kx633x91hvzsr5.png" alt="Cost"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The infrastructure works, but requires upfront investment. &lt;/p&gt;

&lt;p&gt;Skills taught Claude Code my workflow patterns; Rube MCP connected everything through one server; the dev‑toolkit plugin handled security/testing/reviews. Tool Router powers the agentic chatbot. A build like this would usually take 2-3 months and 200+ hours.&lt;/p&gt;

&lt;p&gt;You can find the repository &lt;a href="https://github.com/rohittcodes/luno" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Don't forget to drop a star!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick look: Here’s what the dashboard looks like:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozbyqhjnaljcrh9zhw6w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozbyqhjnaljcrh9zhw6w.png" alt="Luno Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Day One: The Setup
&lt;/h2&gt;

&lt;p&gt;I started by creating Skills for CC. Not because I love documentation, but because I was tired of having to explain the same patterns over and over. "Use TanStack Query for data fetching." "RLS policies for multi-tenant data." "Error boundaries here, not there."&lt;/p&gt;

&lt;p&gt;I asked Claude Code to generate Skills for my workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feature development patterns&lt;/li&gt;
&lt;li&gt;Database architecture (Supabase)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.composio.dev/docs/tool-router/quick-start" rel="noopener noreferrer"&gt;Tool Router&lt;/a&gt; integration with AI SDK&lt;/li&gt;
&lt;li&gt;Analytics pipeline patterns&lt;/li&gt;
&lt;li&gt;Design-to-code workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They're just markdown files in &lt;code&gt;.claude/skills/&lt;/code&gt;. Nothing fancy. But here's the thing: once I had them, Claude stopped generating code that looked like it came from a tutorial. It started generating code that looked like &lt;em&gt;my&lt;/em&gt; code.&lt;/p&gt;

&lt;p&gt;Then I set up &lt;a href="https://rube.app" rel="noopener noreferrer"&gt;&lt;strong&gt;Rube MCP&lt;/strong&gt;&lt;/a&gt;. The problem with MCPs is that they eat your context window; multiple servers = less space for the model to think. Rube connects to 500+ apps through a single MCP server. GitHub, Linear, Figma, Supabase, all through one connection. It manages a sandbox environment for tool actions and stores data there, so your context window stays free. In parallel, Context7 MCP removed a ton of context‑switching by fetching authoritative docs directly in the session.&lt;/p&gt;

&lt;p&gt;I already had my &lt;a href="https://github.com/rohittcodes/claude-plugin-suite" rel="noopener noreferrer"&gt;dev-toolkit plugin&lt;/a&gt; from last month (when Anthropic dropped plugin support). 16 specialised agents, 10+ slash commands, MCP integrations. Things like &lt;code&gt;/security-scan&lt;/code&gt; for OWASP reviews, &lt;code&gt;/test&lt;/code&gt; for running tests with coverage reports. I wanted to stress-test it on something real.&lt;/p&gt;

&lt;p&gt;Setup took a day. Then I started building.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Luno: My Personal Finance Platform (The Messy Part)
&lt;/h2&gt;

&lt;p&gt;I gave Claude a prompt: "Build the database schema for a personal finance platform. Transactions, accounts, categories, budgets, goals, household sharing with invitations."&lt;/p&gt;

&lt;p&gt;It generated the complete &lt;strong&gt;Supabase schema&lt;/strong&gt;: foreign keys, indexes, RLS policies. The schema made sense because the Skills taught Claude how to structure databases. But when I looked closer, it had forgotten the indexes on the household invitations table. The &lt;code&gt;token&lt;/code&gt; and &lt;code&gt;email&lt;/code&gt; columns needed indexes for performance. Had to point that out.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsf7v7bwwl160cf1jrnsw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsf7v7bwwl160cf1jrnsw.png" alt="Supabase DB"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;First lesson:&lt;/strong&gt; Skills help consistency, but you still need to review.
&lt;/h3&gt;

&lt;p&gt;For the UI, I passed Claude a Figma design link to get some ideas and build on top of it. I'd already set up my theme using tweakcn, so the implementation was based on that existing setup rather than an exact match of the Figma design. The designs don't match, but the UI came out clean and consistent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feearfbxxytbcg6ob0x0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feearfbxxytbcg6ob0x0y.png" alt="Figma Diff"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Authentication was manual, and &lt;strong&gt;Supabase Auth&lt;/strong&gt; handles most of it anyway.&lt;/p&gt;

&lt;p&gt;Then I hit a wall: cost. Claude's pricing was adding up fast. I was generating a lot of code, and at ~$3 per million tokens, it gets expensive quickly. So I switched to Cursor for routine feature work: transaction management, budget tracking, and basic CRUD. Cursor's $20/month subscription made more sense for that stuff.&lt;/p&gt;

&lt;p&gt;I came back to Claude Code when I needed to integrate Composio's &lt;strong&gt;&lt;a href="https://docs.composio.dev/docs/tool-router/quick-start" rel="noopener noreferrer"&gt;Tool Router&lt;/a&gt;&lt;/strong&gt; with the AI SDK for the chatbot. The docs weren't clear on some patterns, and I kept getting the integration wrong. I used Context7 MCP to fetch the actual AI SDK docs and Tool Router examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is &lt;a href="https://docs.composio.dev/docs/tool-router/quick-start" rel="noopener noreferrer"&gt;Tool Router&lt;/a&gt; (and why use it)?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Tool Router&lt;/strong&gt; exposes connected apps as callable tools for your AI agent, without hand‑wiring each integration. You connect once (per user), and the AI SDK gets a unified tool surface.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unified access + per‑user auth:&lt;/strong&gt; One router for 500+ apps, with tools automatically scoped to each user’s connections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero redeploys + AI‑SDK native:&lt;/strong&gt; New connections appear as tools immediately; tools already come with parameters/schemas for direct calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Luno, that meant email/calendar/issue flows existed only if the user had connected those apps, with no special‑case code or per‑app SDKs.&lt;/p&gt;

&lt;p&gt;Btw, this is what powers Rube MCP on the backend.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Claude pulled the documentation and showed me the pattern:&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Initialize Tool Router MCP client&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mcpClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createToolRouterMCPClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mcpClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Get tools from MCP client (AI SDK format)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mcpToolSet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;mcpClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="c1"&gt;// Combine with database tools&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allTools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;dbTools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromEntries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mcpToolSet&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`toolRouter_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Use with streamText&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;modelMessages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;allTools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once I had the pattern, it clicked. Tool Router creates a session per user, exposes all connected apps as MCP tools, handles auth per user, and returns tools in AI SDK format. So if a user connects Gmail, Calendar, and Notion, the chatbot automatically gets those tools. No code changes. Dynamic tool access based on what the user has connected.&lt;/p&gt;
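&lt;p&gt;The merge step is the part worth isolating. Here is a minimal sketch of it as a standalone function, assuming a simplified &lt;code&gt;ToolLike&lt;/code&gt; shape in place of the real AI SDK tool type. The &lt;code&gt;mergeTools&lt;/code&gt; helper and the &lt;code&gt;GMAIL_SEND_EMAIL&lt;/code&gt; tool name are hypothetical and only there for illustration.&lt;/p&gt;

```typescript
// Hypothetical stand-in for an AI SDK tool; the real type comes from the `ai` package.
type ToolLike = { description: string };
type ToolMap = { [name: string]: ToolLike };

// Prefix every Tool Router tool so it cannot shadow a local database tool.
function mergeTools(dbTools: ToolMap, routerTools: ToolMap, prefix = "toolRouter_"): ToolMap {
  const namespaced = Object.fromEntries(
    Object.entries(routerTools).map(([name, tool]) => [`${prefix}${name}`, tool]),
  );
  return { ...dbTools, ...namespaced };
}

// Example: a user who connected only Gmail gets exactly one extra tool.
const dbTools: ToolMap = { getTransactions: { description: "Read transactions" } };
const routerTools: ToolMap = { GMAIL_SEND_EMAIL: { description: "Send an email" } };
const allTools = mergeTools(dbTools, routerTools);
console.log(Object.keys(allTools));
```

&lt;p&gt;The prefix keeps the two namespaces apart, so a connected app can never silently replace one of the database tools.&lt;/p&gt;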

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsaoyzbzqlrrh4ukhkip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsaoyzbzqlrrh4ukhkip.png" alt="Luno AI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's what makes &lt;strong&gt;Tool Router&lt;/strong&gt; powerful. It's not just connecting apps, it's exposing those connections as tools your AI can use directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The RLS Policies That Took Three Tries
&lt;/h2&gt;

&lt;p&gt;The household invitation system was probably the most complex part. Invitations expire after 7 days and need email templates and proper permission checks. Claude got the schema right on the first try. But the RLS policies? Three iterations.&lt;/p&gt;

&lt;p&gt;First version: members could see invitations, but couldn't check ownership hierarchy properly. Second version: fixed the hierarchy but broke the permission logic for expired invitations. Third version: finally worked. The policy checked ownership, membership, and expiration all in the right order.&lt;/p&gt;
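&lt;p&gt;To make the ordering concrete, here is the same decision logic sketched in TypeScript. The real checks live in Postgres RLS policies, not in app code, and the &lt;code&gt;Invitation&lt;/code&gt; and &lt;code&gt;Viewer&lt;/code&gt; shapes are hypothetical. The rule that owners can still see expired invitations is an assumption made for the example.&lt;/p&gt;

```typescript
// Hypothetical shapes; the actual rules are enforced by RLS policies in the database.
type Invitation = { householdId: string; expiresAt: Date };
type Viewer = { userId: string; ownedHouseholds: string[]; memberHouseholds: string[] };

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

function canViewInvitation(inv: Invitation, viewer: Viewer, now: Date): boolean {
  // 1. Ownership: owners see every invitation for their household (assumed rule).
  if (viewer.ownedHouseholds.includes(inv.householdId)) return true;
  // 2. Membership: everyone else must at least belong to the household.
  if (!viewer.memberHouseholds.includes(inv.householdId)) return false;
  // 3. Expiration: members only see invitations that are still valid.
  return inv.expiresAt.getTime() > now.getTime();
}

// Example: an invitation created on 1 Nov expires 7 days later.
const createdAt = new Date("2025-11-01T00:00:00Z");
const invitation: Invitation = {
  householdId: "h1",
  expiresAt: new Date(createdAt.getTime() + SEVEN_DAYS_MS),
};
```

&lt;p&gt;Checking expiration before ownership is the kind of ordering mistake that hides valid data from the people who should manage it.&lt;/p&gt;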

&lt;p&gt;This is where having the plugin helped. I ran &lt;code&gt;/security-scan&lt;/code&gt; and it caught issues I would've missed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8gmmfmg9e5330qs56646.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8gmmfmg9e5330qs56646.png" alt="Claude Debugging"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trending updates weren't locked down properly&lt;/li&gt;
&lt;li&gt;Anonymous tracking needed server‑issued signed tokens with TTL&lt;/li&gt;
&lt;li&gt;The queue work needed proper batching&lt;/li&gt;
&lt;li&gt;Long queries should be precomputed on a schedule&lt;/li&gt;
&lt;li&gt;SQL indexes were missing on some aggregations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0fgt032ppt7tjhf9p0e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0fgt032ppt7tjhf9p0e.png" alt="Results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fixed all of these before deploying. The security‑reviewer agent checks for OWASP Top 10, suggests fixes, and validates implementations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Payment Integration and Cron Jobs
&lt;/h2&gt;

&lt;p&gt;Lemon Squeezy integration was straightforward. The Skill had patterns for webhook handling and subscription management. Claude generated webhook handlers with proper signature verification on the first try.&lt;/p&gt;
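&lt;p&gt;For reference, signature verification boils down to recomputing an HMAC over the raw request body and comparing it in constant time. This is a minimal sketch using Node's &lt;code&gt;crypto&lt;/code&gt; module, not the code from the repo. It assumes the provider sends a hex-encoded HMAC-SHA256 digest (my understanding is that Lemon Squeezy does this in an &lt;code&gt;X-Signature&lt;/code&gt; header, but check the current docs). The &lt;code&gt;verifyWebhookSignature&lt;/code&gt; name is hypothetical.&lt;/p&gt;

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true when `signatureHex` is the HMAC-SHA256 hex digest of `rawBody` under `secret`.
function verifyWebhookSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = Buffer.from(createHmac("sha256", secret).update(rawBody).digest("hex"), "utf8");
  const received = Buffer.from(signatureHex, "utf8");
  // timingSafeEqual throws on a length mismatch, so reject those up front.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

&lt;p&gt;The important detail is to hash the raw body exactly as received. Re-serialising parsed JSON can change whitespace or key order and break the comparison.&lt;/p&gt;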

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fen11vz9jvmhnoaq8xyib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fen11vz9jvmhnoaq8xyib.png" alt="Supabase Edge Functions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cron jobs for bill reminders were more interesting. I set up &lt;strong&gt;Supabase Edge Functions&lt;/strong&gt; that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check upcoming bills&lt;/li&gt;
&lt;li&gt;Send email reminders via Resend&lt;/li&gt;
&lt;li&gt;Update notification preferences&lt;/li&gt;
&lt;li&gt;Handle timezone conversions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The timezone handling part needed manual fixes. Claude generated code that worked, but didn't account for daylight saving time properly. Had to correct that myself.&lt;/p&gt;
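&lt;p&gt;One way to avoid this class of bug is to never store or hard-code UTC offsets, and instead ask the runtime what the local hour is in the user's IANA time zone. This is a sketch of that idea using the built-in &lt;code&gt;Intl.DateTimeFormat&lt;/code&gt;, not the exact fix from the repo. The &lt;code&gt;localHour&lt;/code&gt; and &lt;code&gt;isReminderDue&lt;/code&gt; helpers are hypothetical.&lt;/p&gt;

```typescript
// Hour of day (0-23) that `instant` falls on in the given IANA time zone.
function localHour(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hour12: false,
  }).formatToParts(instant);
  const hourPart = parts.find((p) => p.type === "hour");
  // Some engines report midnight as "24" when hour12 is false; normalise to 0.
  return Number(hourPart ? hourPart.value : NaN) % 24;
}

// True when a reminder set for `reminderHour` local time should fire in this cron run.
function isReminderDue(instant: Date, timeZone: string, reminderHour: number): boolean {
  return localHour(instant, timeZone) === reminderHour;
}

// 9 AM in New York is 14:00 UTC in winter but 13:00 UTC in summer.
const winterRun = new Date("2025-01-15T14:00:00Z");
const summerRun = new Date("2025-07-15T13:00:00Z");
console.log(isReminderDue(winterRun, "America/New_York", 9), isReminderDue(summerRun, "America/New_York", 9));
```

&lt;p&gt;With an hourly cron, each run only needs to ask which users have hit their reminder hour, and the time zone database takes care of the DST shift.&lt;/p&gt;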

&lt;h2&gt;
  
  
  What Actually Shipped
&lt;/h2&gt;

&lt;p&gt;After 2-3 days, I had a production‑ready finance platform: transactions with categories, budgets with alerts, household sharing (7‑day invitations), subscription tracking, analytics, an agentic chatbot (Tool Router), automated bill reminders, and Lemon Squeezy payments.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/tJOwdKSj9Dg"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;The analytics dashboard was the last piece. Claude generated working code, but it split queries that should have been combined and missed opportunities to memoise. I optimised those manually.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dev-Toolkit Plugin (Quick Context)
&lt;/h2&gt;

&lt;p&gt;Since I keep mentioning it: this plugin (built when Anthropic launched plugins) bundles day‑to‑day work (security reviews, testing, and system design) into slash commands and specialised agents.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Core agents: security reviewer (OWASP), performance/load tester, compliance/testing/architecture specialists.&lt;/li&gt;
&lt;li&gt;Core commands: &lt;code&gt;/test&lt;/code&gt;, &lt;code&gt;/code-review&lt;/code&gt;, &lt;code&gt;/security-scan&lt;/code&gt;, &lt;code&gt;/deploy&lt;/code&gt;, &lt;code&gt;/monitor&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8wgcphqeu9wskfl56wjt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8wgcphqeu9wskfl56wjt.png" alt="Devtoolkit plugin"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can install it: &lt;a href="http://github.com/rohittcodes/claude-plugin-suite" rel="noopener noreferrer"&gt;rohittcodes/claude-plugin-suite&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The plugin meant I could run security reviews and code standardization continuously instead of at the end. Caught a lot of issues early.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Numbers
&lt;/h2&gt;

&lt;p&gt;Let's talk money and time, because that's what actually matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup (day 1):&lt;/strong&gt; ~1 day total (Skills ≈4h, Rube MCP ≈30m, Supabase ≈2h).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq73wm6n9hcvpakbapxzv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq73wm6n9hcvpakbapxzv.png" alt="Cost"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Development (2-3 days):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code: $12.67 (architecture, complex integrations, Tool Router)

&lt;ul&gt;
&lt;li&gt;Usage: ~15.5M input tokens, ~173k output tokens&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Cursor: Pro account (routine CRUD, UI polish)&lt;/li&gt;

&lt;li&gt;Rube MCP: Free tier&lt;/li&gt;

&lt;li&gt;Total: $12.67 + Cursor Pro&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Time saved:&lt;/strong&gt; &lt;strong&gt;200+ hours&lt;/strong&gt;. A project like this typically takes &lt;strong&gt;2-3 months&lt;/strong&gt; of solid work.&lt;/p&gt;

&lt;p&gt;But here's the context: that $12.67 is &lt;em&gt;after&lt;/em&gt; I switched to Cursor Pro for routine work. If I'd used Claude Code for everything, it would've been closer to $50-60. The cost management part is real; you need to be strategic about when you use the expensive model.&lt;/p&gt;
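&lt;p&gt;For a rough sense of scale, the reported numbers can be turned into a blended rate per million tokens. This is plain arithmetic on the figures above. The gap versus the ~$3 per million list price is presumably down to discounted (for example cached) input tokens, but that explanation is an assumption.&lt;/p&gt;

```typescript
// Figures reported in this section.
const totalCostUsd = 12.67;
const inputTokens = 15_500_000;
const outputTokens = 173_000;

// Blended cost per million tokens across input and output combined.
const blendedPerMillion = totalCostUsd / ((inputTokens + outputTokens) / 1_000_000);
console.log(blendedPerMillion.toFixed(2)); // prints "0.81"
```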

&lt;h2&gt;
  
  
  Would I Do It Again?
&lt;/h2&gt;

&lt;p&gt;Absolutely. But if I were starting over, I'd approach a few things differently. I'd create Skills on day zero, before writing a single line of code. The consistency they bring matters more than moving fast in the beginning, and every feature afterwards benefits from having those patterns established. The plugin would be there from the start, too; catching security issues and enforcing code standards continuously is far better than fixing problems at the end.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fday7vmr49qsuug1ejk6d.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fday7vmr49qsuug1ejk6d.gif" alt="Ahh yes"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'd also budget more realistically. If you're building something serious, plan for $50-100 a month in AI costs. It's still cheaper than hiring someone or spending months of your own time, but it's not free. And Context7 MCP would be non-negotiable from the start; having documentation accessible in-session instead of constantly context-switching to docs is a massive productivity unlock.&lt;/p&gt;

&lt;p&gt;The thing is, you still review code. You still make architecture decisions. You still fix edge cases. Claude Code handles the boring stuff (boilerplate, migrations, and configuration) so that you can focus on the hard problems: architecture, security, performance, and user experience. That upfront day spent on Skills and setup? It paid for itself ten times over in consistency and speed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Claude Code's infrastructure is real. It's not hype. But it requires upfront investment in Skills, plugins, and MCP configuration. Once that's done, development becomes more conversational. "Build a subscription tracking feature with email reminders" actually works.&lt;/p&gt;

&lt;p&gt;The value isn't replacing developers; it's handling the boring stuff so you can focus on what matters: the architecture, the security, the performance, the user experience. Luno took me 2-3 days, not 3 months. It's production‑ready with proper protection, error handling, and testing. That's the difference the full stack makes.&lt;/p&gt;

&lt;p&gt;If you're building with Claude Code, invest in the infrastructure first. Create your Skills. Set up your MCPs properly. Build or install plugins that match your workflow. The upfront cost is worth it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>mcp</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to Connect Salesforce to OpenAI Agent Builder</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Sat, 25 Oct 2025 03:48:20 +0000</pubDate>
      <link>https://dev.to/composiodev/how-to-connect-salesforce-to-openai-agent-builder-526o</link>
      <guid>https://dev.to/composiodev/how-to-connect-salesforce-to-openai-agent-builder-526o</guid>
      <description>&lt;p&gt;OpenAI's &lt;strong&gt;Agent Builder&lt;/strong&gt; gives you a straightforward way to build and deploy AI agents, combining models, tools, and logic into one visual workspace. This no-code design lets you focus on how your agent should work rather than dealing with the underlying infrastructure.&lt;/p&gt;

&lt;p&gt;Sales teams often spend hours managing leads, updating contacts, and juggling follow-ups, repetitive tasks that take time away from closing deals. By connecting Agent Builder to external platforms like &lt;strong&gt;Salesforce&lt;/strong&gt; through an MCP server (such as &lt;strong&gt;Rube&lt;/strong&gt;), you can create agents that handle these tasks automatically. The MCP handles authentication, API calls, and data formatting, letting your agent focus on workflow logic rather than infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4tyjyty20e8gs6se530.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4tyjyty20e8gs6se530.gif" alt="intro-gif" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this guide, we’ll build a Salesforce Agent using the Rube MCP. This setup allows your agent to manage contacts, update deals, and interact with leads automatically, so you can spend more time closing deals instead of managing data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Agent Builder?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://platform.openai.com/agent-builder" rel="noopener noreferrer"&gt;Agent Builder&lt;/a&gt; is OpenAI’s visual, no-code platform for designing, building and deploying AI workflows. Instead of writing lines of code, you can simply drag and drop components onto a canvas and connect them to define how your Agent should behave. Each component, or “node”, has a specific purpose. Some of them handle requests, others run the logic and enforce security rules and some connect to external systems via MCPs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2p8o5umw3e7e3znhpdpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2p8o5umw3e7e3znhpdpz.png" alt="Intro Image" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Agent Builder, you can create complex, multi-step workflows without worrying about the infrastructure or API layer. It also integrates with &lt;strong&gt;ChatKit&lt;/strong&gt; widgets to display results in an interactive interface, and comes with built-in evaluation tools to test performance and identify bottlenecks. For a Salesforce workflow, this means you can automate repetitive tasks, manage leads and contacts, and orchestrate multi-step processes with minimal setup, all visually and in a straightforward way.&lt;/p&gt;

&lt;p&gt;To learn more about Agent Builder, you can check out this &lt;a href="https://composio.dev/blog/openai-agent-builder-step-by-step-guide-to-building-ai-agents-with-mcp" rel="noopener noreferrer"&gt;how-to guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why &lt;a href="https://rube.app/" rel="noopener noreferrer"&gt;Rube MCP&lt;/a&gt; matters to your Salesforce workflow
&lt;/h2&gt;

&lt;p&gt;When your MCP Client uses multiple MCP servers in a workflow, connecting them all directly can quickly consume the LLM’s context window, slowing down or breaking your workflow. Rube MCP (Model Context Protocol) solves this by acting as a single entry point for all your tool connections. Your agent communicates with Rube, which handles authentication, API calls, and data formatting for each external tool.&lt;/p&gt;

&lt;p&gt;For Salesforce, using an MCP implementation like Rube means you don’t have to write custom OAuth flows or manage API keys for every endpoint. Rube provides a unified interface for hundreds of apps, including Salesforce, Gmail, Notion, and more. It also dynamically loads only the tools needed for a given context, keeping the agent’s workflow efficient and reducing the chance of context overload. This setup allows your Salesforce agent to focus on logic and decision-making, while the MCP handles the complexities of API integration and token management.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Add Salesforce to Agent Builder
&lt;/h2&gt;

&lt;p&gt;Before we start building our Salesforce Workflow, we need to connect Rube MCP with Agent Builder. This will allow our agent to communicate with Salesforce securely through Rube’s unified MCP interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Set up Rube MCP
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;&lt;a href="https://rube.app/" rel="noopener noreferrer"&gt;Rube&lt;/a&gt;&lt;/strong&gt;, and open your Dashboard.&lt;/li&gt;
&lt;li&gt;Navigate to Apps → Marketplace.&lt;/li&gt;
&lt;li&gt;Search for Salesforce and click Enable App.&lt;/li&gt;
&lt;li&gt;Choose the Recommended &lt;strong&gt;Composio&lt;/strong&gt; approach, select the required scopes, and click Setup.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add your Salesforce domain, click Connect, and authorize the app in the Salesforce authorization window.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I’ve attached a video you can follow. Note that you won’t see the scope/domain steps in the demo because Salesforce was already enabled on my account.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After the app is enabled, go back to the Rube dashboard and click Install Rube Anywhere.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the modal, select Agent Builder and copy the MCP URL; you’ll need this in Agent Builder.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctl3gunyw3xbydp1x4jk.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctl3gunyw3xbydp1x4jk.gif" alt="Salesforce Rube MCP" width="600" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="8"&gt;
&lt;li&gt;Scroll a little further and click Generate Token to create an access token. Copy the generated token; this is the value you’ll paste into Agent Builder (Authorization → Access token / API key).&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 2: Connect Rube MCP inside Agent Builder
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open &lt;strong&gt;Agent Builder&lt;/strong&gt; and create a new workflow by clicking “&lt;strong&gt;Create&lt;/strong&gt;.”&lt;/li&gt;
&lt;li&gt;You’ll see a canvas with two default nodes: Start and Agent. Delete the connecting edge between them for now; we’ll set up our workflow logic later.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click on the Agent node and rename it to “&lt;strong&gt;Salesforce Agent&lt;/strong&gt;”. Then, in the Instructions field, enter the following:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;You are a helpful assistant.
&lt;/code&gt;&lt;/pre&gt;


&lt;blockquote&gt;
&lt;p&gt;We’ll refine this agent later with Guardrails and Logical Nodes for better control.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75ro9x80bnpnezr18tes.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75ro9x80bnpnezr18tes.png" alt="Agent Node" width="800" height="657"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;In the agent node’s configuration panel, click the “+” icon beside Tools. From the dropdown, select “MCP Server” → then click “+ Server”.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpsamfhytzkdu2m03908.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpsamfhytzkdu2m03908.png" alt="MCP Server" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Paste the MCP URL you copied from Rube earlier. In the Authorisation field, choose Access Token / API Key and paste your generated token. Give your server a name like “Rube” and click Connect.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Building the Salesforce Agentic workflow
&lt;/h2&gt;

&lt;p&gt;With the Salesforce connection ready inside Agent Builder, it’s time to make our agent actually do something useful, like creating or updating leads automatically. But before jumping straight to actions, we’ll make sure our workflow is secure, context-aware, and can correctly interpret what users want to do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Add GuardRails
&lt;/h3&gt;

&lt;p&gt;Let’s start by protecting the workflow from misuse.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;From the toolbar, drag the GuardRails node onto your canvas and connect it with the Start node.&lt;/li&gt;
&lt;li&gt;Give it a label like “&lt;strong&gt;GuardRails&lt;/strong&gt;” and enable the Jailbreak setting. This helps your workflow detect and block prompt injection attempts or malicious instructions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfjuyqrzol47c4dgnncy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzfjuyqrzol47c4dgnncy.png" alt="Guard Rails" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Optionally, you can enable “Sensitive data” checks if your workflow will deal with customer details like emails or phone numbers.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 2: Create an Intent Classifier
&lt;/h3&gt;

&lt;p&gt;Next, we’ll teach the agent to understand what a user is asking, whether it’s to create a new lead, update contact info, or close an opportunity.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a new &lt;strong&gt;Agent node&lt;/strong&gt; to the canvas, connect it to the “Pass” output of your GuardRails node, and label it “&lt;strong&gt;Intent Classifier&lt;/strong&gt;”.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Give the agent this instruction:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Understand what the user wants to &lt;span class="k"&gt;do in &lt;/span&gt;Salesforce. Classify the intent into one of these categories: &lt;span class="s2"&gt;"create_lead"&lt;/span&gt;, &lt;span class="s2"&gt;"update_lead"&lt;/span&gt;, or &lt;span class="s2"&gt;"close_deal"&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdobn14r6nu9w1lujq3i6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdobn14r6nu9w1lujq3i6.png" alt="Classifier Node" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Change the Output format to JSON, then open Advanced Settings → JSON Schema and paste this schema:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"type"&lt;/span&gt;: &lt;span class="s2"&gt;"object"&lt;/span&gt;,
  &lt;span class="s2"&gt;"properties"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"classification"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"type"&lt;/span&gt;: &lt;span class="s2"&gt;"string"&lt;/span&gt;,
      &lt;span class="s2"&gt;"enum"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"create_lead"&lt;/span&gt;,
        &lt;span class="s2"&gt;"update_lead"&lt;/span&gt;,
        &lt;span class="s2"&gt;"close_deal"&lt;/span&gt;
      &lt;span class="o"&gt;]&lt;/span&gt;,
      &lt;span class="s2"&gt;"description"&lt;/span&gt;: &lt;span class="s2"&gt;"classification of user's intent"&lt;/span&gt;,
      &lt;span class="s2"&gt;"default"&lt;/span&gt;: &lt;span class="s2"&gt;""&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="s2"&gt;"additionalProperties"&lt;/span&gt;: &lt;span class="nb"&gt;false&lt;/span&gt;,
  &lt;span class="s2"&gt;"required"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;"classification"&lt;/span&gt;
  &lt;span class="o"&gt;]&lt;/span&gt;,
  &lt;span class="s2"&gt;"title"&lt;/span&gt;: &lt;span class="s2"&gt;"response_schema"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Save the schema by clicking Update, then connect the Fail output of the GuardRails node to an End node so blocked requests terminate.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
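
&lt;p&gt;To see what this schema enforces, the sketch below checks a classifier output by hand in Python. It uses only the standard library. The &lt;code&gt;validate_classification&lt;/code&gt; helper is hypothetical; Agent Builder performs its own validation, and this only mirrors the constraints (one required key, no extra keys, a fixed set of labels).&lt;/p&gt;

```python
import json

# The same constraints as the JSON schema above, written as plain Python.
ALLOWED_INTENTS = ("create_lead", "update_lead", "close_deal")


def validate_classification(raw_output: str):
    """Parse the classifier's JSON output and return the intent.

    Raises ValueError when the payload violates the schema:
    a missing "classification" key, extra keys, or an unknown label.
    """
    payload = json.loads(raw_output)
    if not isinstance(payload, dict):
        raise ValueError("output must be a JSON object")
    if set(payload) != {"classification"}:
        # Covers both "required" and "additionalProperties": false.
        raise ValueError("expected exactly one key: classification")
    intent = payload["classification"]
    if intent not in ALLOWED_INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    return intent


print(validate_classification('{"classification": "create_lead"}'))  # create_lead
```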

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccn4vfc2ozkliozayw9c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccn4vfc2ozkliozayw9c.png" alt="Schema Updation" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Add Logic Routing
&lt;/h3&gt;

&lt;p&gt;We’ll now decide where each request should go depending on the classification output.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Drag an If/Else node onto the canvas, connect it to the Intent Classifier, and set the Case name to “isValid”.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the expression field, paste:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;input.output_parsed.classification &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"create_lead"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; 
input.output_parsed.classification &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"update_lead"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; 
input.output_parsed.classification &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"close_deal"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Connect the Else output to the End node so unrecognized intents terminate safely.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
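
&lt;p&gt;The expression above is a plain membership test. A rough Python equivalent follows, assuming the classifier output arrives as a dictionary shaped like &lt;code&gt;input.output_parsed&lt;/code&gt;. The &lt;code&gt;route&lt;/code&gt; function and the node names it returns are illustrative only.&lt;/p&gt;

```python
VALID_INTENTS = ("create_lead", "update_lead", "close_deal")


def route(output_parsed: dict):
    """Mirror the If/Else node: known intents continue, everything else ends."""
    classification = output_parsed.get("classification")
    if classification in VALID_INTENTS:
        return "Salesforce Agent"  # the "isValid" branch
    return "End"  # the Else branch


print(route({"classification": "update_lead"}))  # Salesforce Agent
print(route({"classification": "export_report"}))  # End
```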

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw77zk4m5fgqfrar6ixof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw77zk4m5fgqfrar6ixof.png" alt="Conditional Node" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Connect Salesforce Agent Node
&lt;/h3&gt;

&lt;p&gt;Finally, connect the “isValid” output of the “&lt;strong&gt;If/Else&lt;/strong&gt;” node to your Salesforce Agent node. In the Salesforce Agent node’s configuration, update the instruction to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;You are a Salesforce CRM assistant. Perform the user’s intended action &lt;span class="k"&gt;in &lt;/span&gt;Salesforce 
based on this classification: &lt;span class="o"&gt;{{&lt;/span&gt;input.output_parsed.classification&lt;span class="o"&gt;}}&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; 
Use the following user input to &lt;span class="nb"&gt;complete &lt;/span&gt;the task: &lt;span class="o"&gt;{{&lt;/span&gt;workflow.input_as_text&lt;span class="o"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
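
&lt;p&gt;The &lt;code&gt;{{...}}&lt;/code&gt; placeholders are filled in at run time from the workflow state. How Agent Builder does this internally is not covered here, so the following is only a sketch of the idea in Python. The &lt;code&gt;render_instruction&lt;/code&gt; helper and the nested-dictionary context are assumptions made for illustration.&lt;/p&gt;

```python
import re

# Matches {{dotted.path}} with optional spaces inside the braces.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_instruction(template: str, context: dict):
    """Replace each {{dotted.path}} with the value found in context."""

    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # a KeyError here means a missing variable
        return str(value)

    return PLACEHOLDER_RE.sub(lookup, template)


template = (
    "Classification: {{input.output_parsed.classification}}. "
    "User input: {{workflow.input_as_text}}"
)
context = {
    "input": {"output_parsed": {"classification": "create_lead"}},
    "workflow": {"input_as_text": "Add John Doe as a new lead."},
}
print(render_instruction(template, context))
```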



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9rq49n7p99l51ol2fiq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9rq49n7p99l51ol2fiq.png" alt="Agent Node" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s it, you’ve built a Salesforce-specific, guard-railed workflow that can understand user goals, route them intelligently, and perform the right CRM operation securely.&lt;/p&gt;

&lt;h2&gt;
  
  
  An Example Interaction
&lt;/h2&gt;

&lt;p&gt;Click on the &lt;strong&gt;Preview&lt;/strong&gt; button in the top-right corner of the canvas. This will open a sidebar with a chat interface where you can test the workflow by typing in natural language inputs like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If Rube doesn’t detect your Salesforce account in the Preview panel, pass your Salesforce domain name in the chat and ask the agent to authorize you. Then click the generated URL and log in with your Salesforce account.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uckws0bfujxcqas68hl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uckws0bfujxcqas68hl.png" alt="Example Interaction" width="800" height="396"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Add John Doe as a new lead with the title Sales Manager at Acme Corp.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If everything is set up correctly, your agent should classify the intent, route it through the workflow, and create the record in Salesforce via Rube MCP. You’ll see the confirmation and logs right inside the preview panel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo87y08uqho8dxjlzioi1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo87y08uqho8dxjlzioi1.png" alt="Perview Panel" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgc3oezw8ahr4e6m5ryh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgc3oezw8ahr4e6m5ryh.png" alt="Salesforce Dashboard" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What else can you build with this setup?
&lt;/h2&gt;

&lt;p&gt;This setup is just the starting point. Once your Salesforce connection is stable, you can layer on additional automations to fit your sales workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent workflows&lt;/strong&gt;: Combine Salesforce with Gmail or Slack to auto-send follow-ups when a lead status changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deal tracking&lt;/strong&gt;: Build a workflow that checks for deals stuck in the same stage for too long and notifies your team.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weekly summaries&lt;/strong&gt;: Have the agent generate and send a performance summary to your inbox every Monday morning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since the agent already runs through Rube MCP, connecting any new app is just a matter of enabling it in your Rube dashboard, with no extra OAuth scopes or schema mapping.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Agent Builder with Rube MCP makes Salesforce automation accessible to everyone. You can build sophisticated CRM workflows without writing code, handle complex authentication automatically, and focus on the business logic that matters.&lt;/p&gt;

&lt;p&gt;The combination of visual workflow design, built-in security, and seamless Salesforce integration creates a powerful platform for sales automation. Whether you're managing leads, updating opportunities, or orchestrating multi-step processes, this setup scales from simple tasks to complex enterprise workflows.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>llm</category>
      <category>agents</category>
      <category>openai</category>
    </item>
    <item>
      <title>How to Automate HubSpot CRM Using OpenAI Agent Builder</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Thu, 16 Oct 2025 14:21:28 +0000</pubDate>
      <link>https://dev.to/composiodev/how-to-automate-hubspot-crm-using-openai-agent-builder-5666</link>
      <guid>https://dev.to/composiodev/how-to-automate-hubspot-crm-using-openai-agent-builder-5666</guid>
      <description>&lt;p&gt;OpenAI's &lt;a href="https://platform.openai.com/docs/guides/agent-builder" rel="noopener noreferrer"&gt;Agent Builder&lt;/a&gt; provides you with the most straightforward set of tools to build and deploy AI Agents with ease. It brings models, tools (including MCPs, Web Search, Sub Agents, etc.), and logic into a single visual workspace, allowing you to focus on what your agent should do instead of worrying about the underlying infrastructure.&lt;/p&gt;

&lt;p&gt;In this guide, we'll build a CRM agent that can help you manage your contacts and deals in HubSpot CRM, so you can focus on the things that matter most to you. Before we begin, let’s understand what an Agent Builder is and why you would use one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Agent Builder?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agent Builder&lt;/strong&gt; is a visual, no-code platform for designing, building, testing, and deploying AI workflows through a drag-and-drop interface. It ships with a set of nodes, including Agent, FileSearch, GuardRails, MCPs, Logical Nodes, and more. You drag these nodes onto the canvas, configure them with the required inputs, and connect them with edges to create your own agentic workflow. The nodes are designed to compose logically, so you can assemble a complex agentic workflow with relatively little effort.&lt;/p&gt;

&lt;p&gt;Check out this blog post on &lt;a href="https://composio.dev/blog/openai-agent-builder-step-by-step-guide-to-building-ai-agents-with-mcp" rel="noopener noreferrer"&gt;how to build agents with Agent Builder&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkklkpc2wzgwez21ubu9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkklkpc2wzgwez21ubu9.png" alt="HubSpot agent"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also use &lt;strong&gt;ChatKit Widgets&lt;/strong&gt; to display results in a widget that your users can interact with. To learn more about the complete Agent Kit, see &lt;a href="https://platform.openai.com/docs/guides/agents" rel="noopener noreferrer"&gt;OpenAI Agent Kit&lt;/a&gt;, which covers building and deploying AI agents from scratch. The templates provided by OpenAI are also a good way to get started with Agent Builder.&lt;/p&gt;

&lt;p&gt;OpenAI also provides an Evaluation tool to test the performance of your agent and identify bottlenecks, so you don't need a separate tool to test your workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://rube.app/" rel="noopener noreferrer"&gt;Rube MCP&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Say you use a lot of MCPs in your day-to-day workflow. If you connect all of them directly to your client, they quickly eat up the context window. Rube solves this by providing a single point of entry for all your MCPs: you connect to Rube, and it handles the authentication and API calls for you.&lt;/p&gt;

&lt;p&gt;It is a &lt;strong&gt;universal MCP Server&lt;/strong&gt; built on top of &lt;a href="http://composio.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;Composio's&lt;/strong&gt;&lt;/a&gt; existing toolkit infrastructure. Instead of writing custom OAuth flows for each MCP or managing API keys, you can use Rube as a unified MCP interface for more than 500 apps, including HubSpot, Salesforce, Google Sheets, Airtable, Notion, etc., which work seamlessly with the Agent Builder and most other MCP Clients.&lt;/p&gt;

&lt;p&gt;You can ask Rube to authenticate you with the app (if your app works on &lt;strong&gt;OAuth2&lt;/strong&gt;); it'll handle the OAuth flow for you and get the access token. If your app works on an API key, you can just provide the API key to Rube, and it'll use it to make the API calls for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Add HubSpot to Agent Builder
&lt;/h2&gt;

&lt;p&gt;Before we start building our CRM agent, we need to have a few things in place:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Set up Rube MCP
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://rube.app" rel="noopener noreferrer"&gt;Rube&lt;/a&gt;. Scroll down and click “Install Rube Anywhere”.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn34waiblnzdt1psjrtk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn34waiblnzdt1psjrtk9.png" alt="Rube Setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the modal, select Agent Builder. Copy the MCP URL. We’ll paste this into Agent Builder soon.&lt;/li&gt;
&lt;li&gt;Scroll a little further and hit the "Generate Token" button to generate the access token.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxzfvaqx7iizij2jwu73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxzfvaqx7iizij2jwu73.png" alt="Connecting Agent Builder"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy that token; we’ll use it to authorise Agent Builder.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 2: Connect Rube MCP inside Agent Builder
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open Agent Builder and create a new workflow by clicking on “Create”.&lt;/li&gt;
&lt;li&gt;This opens a new page with a canvas, a toolbar at the top left, and two nodes on the canvas: "Start" and "Agent." We'll reuse this Agent node later in the workflow. For now, delete the edge between the two nodes by clicking it and hitting the delete icon in the configuration panel.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ck1baywcjykzodcz9lz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ck1baywcjykzodcz9lz.png" alt="Agent Builder"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Click the "Agent" node, name the agent "HubSpot CRM Agent," and give the agent instructions as below:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a CRM agent that can help me manage my contacts and deals in HubSpot CRM.
&lt;/code&gt;&lt;/pre&gt;


&lt;blockquote&gt;
&lt;p&gt;Note: We'll edit this node's instructions later, once the guardrails and logical nodes are in place.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on the "+" icon beside Tools in the agent node's configuration panel. It will open a dropdown; select "MCP Server" → click the "+ Server" button in the dialogue that appears.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Paste the MCP URL you copied from Rube, and in the Authorisation field, choose Access token / API Key and paste the token you generated.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpafgedh989s6xy3ji32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvpafgedh989s6xy3ji32.png" alt="Connect to MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Give the server a label like “Rube” and hit Connect.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ve now successfully connected HubSpot (via Rube MCP) to Agent Builder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Secure Workflow Using Agent Builder
&lt;/h2&gt;

&lt;p&gt;Now that we have an Agent Node in our workflow, we need to add some guardrails and logical nodes to ensure users don't jailbreak the agent through prompt injection or other malicious attacks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Adding GuardRails
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Select the &lt;strong&gt;GuardRails&lt;/strong&gt; node from the toolbar and drag it onto the canvas. Connect it to the Start node, and in the configuration panel, label it "GuardRails." Enable "Jailbreak" to detect and block attempts to use the agent for malicious purposes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwbo63evnuo3aig22cbvg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwbo63evnuo3aig22cbvg.png" alt="Adding GuardRails"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Add Intent Classification
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Now, create a &lt;strong&gt;Classification Agent&lt;/strong&gt; that classifies the user's intent so the workflow can route the request to the appropriate node. For this guide, we'll keep it simple and classify whether the user wants to create, update, or delete a contact.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Drag and drop the Agent node from the toolbar, connect it with the "Pass" option of the Guardrails node, and in the configuration panel, give the Agent a label of "Classification Agent." Give the agent instructions as below:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Classify the user’s intent into one of the following categories: &lt;span class="s2"&gt;"create_contact"&lt;/span&gt;, &lt;span class="s2"&gt;"update_contact"&lt;/span&gt;, or &lt;span class="s2"&gt;"delete_contact"&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For the output format, select "JSON," and in the JSON schema field, add the following schema by going to Advanced Settings:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"type"&lt;/span&gt;: &lt;span class="s2"&gt;"object"&lt;/span&gt;,
  &lt;span class="s2"&gt;"properties"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"classification"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"type"&lt;/span&gt;: &lt;span class="s2"&gt;"string"&lt;/span&gt;,
      &lt;span class="s2"&gt;"enum"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"create_contact"&lt;/span&gt;,
        &lt;span class="s2"&gt;"remove_contact"&lt;/span&gt;,
        &lt;span class="s2"&gt;"update_contact"&lt;/span&gt;
      &lt;span class="o"&gt;]&lt;/span&gt;,
      &lt;span class="s2"&gt;"description"&lt;/span&gt;: &lt;span class="s2"&gt;"classification of user intent"&lt;/span&gt;,
      &lt;span class="s2"&gt;"default"&lt;/span&gt;: &lt;span class="s2"&gt;""&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="s2"&gt;"additionalProperties"&lt;/span&gt;: &lt;span class="nb"&gt;false&lt;/span&gt;,
  &lt;span class="s2"&gt;"required"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;"classification"&lt;/span&gt;
  &lt;span class="o"&gt;]&lt;/span&gt;,
  &lt;span class="s2"&gt;"title"&lt;/span&gt;: &lt;span class="s2"&gt;"response_schema"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Update the schema by clicking the "Update" button in Advanced Settings. Then connect the "Fail" output of the GuardRails node to an "End" node, so the workflow terminates whenever a guardrail check fails.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Route Logic
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Next, a logical node uses the output of the Classification Agent to route the request based on the user's intent.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Drag and drop the "&lt;strong&gt;If/else&lt;/strong&gt;" node from the toolbar, connect it to the "Classification Agent" node, and in the configuration panel, set the Case name to "isValid." Then enter this expression in the next input field:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;input.output_parsed.classification &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"create_contact"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; input.output_parsed.classification &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"update_contact"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; input.output_parsed.classification &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"delete_contact"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the "Else" option, connect it with the "End" node to end the workflow.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
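
&lt;p&gt;The intent labels appear in three places: the classifier instruction, the schema's &lt;code&gt;enum&lt;/code&gt;, and the routing expression. They have to match exactly, or a valid request falls through to the Else branch. A small Python check along these lines can catch a mismatch before you test in Preview. The &lt;code&gt;find_label_mismatches&lt;/code&gt; helper is hypothetical, and the label lists are copied in by hand, assuming all three places use the labels from the instruction.&lt;/p&gt;

```python
def find_label_mismatches(**sources):
    """Return, per source, the labels it lacks compared with the others."""
    all_labels = set().union(*sources.values())
    return {
        name: sorted(all_labels - set(labels))
        for name, labels in sources.items()
        if all_labels - set(labels)
    }


expected = ["create_contact", "update_contact", "delete_contact"]
print(find_label_mismatches(
    instruction=expected,
    schema_enum=expected,
    route_expression=expected,
))  # {} means every place uses the same labels
```

&lt;p&gt;If one source used a different spelling for a label, the result would list, for each source, the labels it is missing.&lt;/p&gt;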

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmaymhccf8r5noyoxa6d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmaymhccf8r5noyoxa6d.png" alt="Else Node"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Now, we need to use the "HubSpot CRM Agent" we created at the start of the workflow and connect it with the "isValid" option of the "If/else" node.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Click on the "HubSpot CRM Agent" node, and in the configuration panel, give the agent instructions as below:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;You are a HubSpot CRM assistant, perform user action based on user&lt;span class="s1"&gt;'s intention: {{input.output_parsed.classification}}. User input: {{workflow.input_as_text}} 
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;And that's it: you have a secured workflow that classifies the user's intent and routes the request to the appropriate node.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing the workflow
&lt;/h2&gt;

&lt;p&gt;Click on the "Preview" button in the top right corner of the canvas. It will open a sidebar with a chat interface. You can test the workflow by typing in the chat input and seeing the results.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/WSCAdM67-Eg"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluating the performance and publishing the workflow
&lt;/h2&gt;

&lt;p&gt;To see how your workflow is performing, click the "Evaluate" button in the top right corner of the canvas to open the Evaluation tool.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5mhudofkklsb29l42vb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5mhudofkklsb29l42vb.png" alt="Evaluation Result"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To deploy the agent, click the "Publish" button, give your workflow a name, and click "Publish" again to confirm.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6donwpwby8jcuk976cf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6donwpwby8jcuk976cf.png" alt="Deploying Agents"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Taking things further with ChatKit
&lt;/h2&gt;

&lt;p&gt;You can add more nodes to the workflow to enhance its robustness and further secure it according to your needs. You can also create ChatKit widgets to display the results and let your users interact with them.&lt;/p&gt;

&lt;p&gt;OpenAI also lets you publish apps that give other users access to your workflow. You can learn more here: &lt;a href="https://platform.openai.com/docs/guides/chatkit" rel="noopener noreferrer"&gt;ChatKit Overview&lt;/a&gt;, &lt;a href="https://platform.openai.com/docs/guides/chatkit-widgets" rel="noopener noreferrer"&gt;ChatKit Widgets&lt;/a&gt;, &lt;a href="https://platform.openai.com/docs/guides/custom-chatkit" rel="noopener noreferrer"&gt;Custom ChatKit&lt;/a&gt;, &lt;a href="https://platform.openai.com/docs/guides/voice-agents?voice-agent-architecture=speech-to-speech" rel="noopener noreferrer"&gt;Voice Agents (speech-to-speech architecture)&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this guide, we built a HubSpot CRM Agent using OpenAI’s Agent Builder, integrating it with Rube MCP and adding guardrails, logical nodes, and intent classification to make the workflow secure and efficient. In practice, the MCP node was the hardest part: getting the agents to produce request payloads that matched each MCP tool's schema became quite complex. Transform nodes can help stabilize the input, but using them can also make the workflow significantly more complex.&lt;/p&gt;

&lt;p&gt;Nevertheless, Agent Builders are powerful because they let you design complex AI workflows visually, combine multiple tools and APIs, and focus on the logic of your agents rather than the underlying infrastructure. They make it easier to automate repetitive tasks, connect with various services, and quickly iterate on your agent’s behavior without writing a lot of boilerplate code.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>automation</category>
      <category>agents</category>
    </item>
    <item>
      <title>Improving your coding workflow with Claude Code Plugins</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Tue, 14 Oct 2025 14:09:16 +0000</pubDate>
      <link>https://dev.to/composiodev/improving-your-coding-workflow-with-claude-code-plugins-7i8</link>
      <guid>https://dev.to/composiodev/improving-your-coding-workflow-with-claude-code-plugins-7i8</guid>
      <description>&lt;p&gt;I've been using Claude Code for a while now, and it keeps getting better and better. Whether it's the developer experience, the models Anthropic brings in, or the continuous improvements, everything has been solid. When Anthropic dropped &lt;a href="https://www.anthropic.com/news/claude-code-plugins" rel="noopener noreferrer"&gt;plugins support on October 9, 2025&lt;/a&gt;, I was curious to see what they'd built. It's actually pretty useful for the development setup I prefer.&lt;/p&gt;

&lt;p&gt;You know how you always end up with this messy setup of slash commands, custom agents, and MCP servers scattered across different projects? And then, when your teammate asks, "How do I set up the same thing on my machine?" you realise you have no idea how to recreate your own setup. Well, plugins solve that problem. They let you bundle all your customisations into shareable packages that install with a single command. Think of it like packaging your favourite tools and features into a single file, which you can share with your team or use in your own workflow. I've been using them for a few days now, and it's been a game-changer for my workflow. In this post, I'll be going over what plugins are, how to configure them, and how I've adopted them in my workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7ctw3hud39t6eghudqj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl7ctw3hud39t6eghudqj.gif" alt="giphy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code plugins are shareable packages that bundle slash commands, specialised agents, MCP servers, and hooks into single installable units.&lt;/li&gt;
&lt;li&gt;They solve the "how do I set up the same agentic workflow for my setup" problem by letting teams standardise their agentic development setups.&lt;/li&gt;
&lt;li&gt;The ecosystem is exploding with community marketplaces offering everything from DevOps automation to complete development stacks.&lt;/li&gt;
&lt;li&gt;Installation is pretty straightforward. You need to add a marketplace, browse for plugins in that marketplace, and install them. I've been using them for a few days now, and they've genuinely improved my workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A quick look at my plugin-powered workflow
&lt;/h3&gt;

&lt;p&gt;If you’re like me and want to see things running before the how and why, here’s a short demo of my actual setup, with plugins, subagents, slash commands, and &lt;a href="https://rube.app" rel="noopener noreferrer"&gt;Rube MCP&lt;/a&gt; working together with Linear.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/kIKLHfLq4mI"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Claude Code Plugins?
&lt;/h2&gt;

&lt;p&gt;Alright, let's get technical for a second. According to the &lt;a href="https://docs.claude.com/en/docs/claude-code/plugins" rel="noopener noreferrer"&gt;official docs&lt;/a&gt;, plugins are basically lightweight packages that bundle together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Slash commands&lt;/strong&gt;: Your custom shortcuts for stuff you do all the time with Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagents&lt;/strong&gt;: Specialized AI agents that handle specific tasks (think of a database management flow, API testing, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP servers&lt;/strong&gt;: The standardized way to connect Claude Code to external tools and data sources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks&lt;/strong&gt;: Custom behaviors that trigger at specific points in your workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Plugins Work
&lt;/h2&gt;

&lt;p&gt;Plugins use a standardized JSON configuration format that defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metadata&lt;/strong&gt;: Name, version, description, author&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Components&lt;/strong&gt;: Which slash commands, agents, MCP servers, and hooks to include&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependencies&lt;/strong&gt;: Other plugins or tools required for the setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration&lt;/strong&gt;: Default settings and environment variables you’d need&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The plugin system is built on top of Claude Code's existing extension points. So it's not reinventing the wheel; it just makes everything you already use more organised and shareable.&lt;/p&gt;

&lt;p&gt;According to Anthropic, a standard plugin structure should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;enterprise-plugin/
├── .claude-plugin/           &lt;span class="c"&gt;# Metadata directory&lt;/span&gt;
│   └── plugin.json           &lt;span class="c"&gt;# Required: plugin manifest&lt;/span&gt;
├── commands/                 &lt;span class="c"&gt;# Default command location&lt;/span&gt;
│   ├── status.md
│   └── logs.md
├── agents/                   &lt;span class="c"&gt;# Default agent location&lt;/span&gt;
│   ├── security-reviewer.md
│   ├── performance-tester.md
│   └── compliance-checker.md
├── hooks/                    &lt;span class="c"&gt;# Hook configurations&lt;/span&gt;
│   ├── hooks.json            &lt;span class="c"&gt;# Main hook config&lt;/span&gt;
│   └── security-hooks.json   &lt;span class="c"&gt;# Additional hooks&lt;/span&gt;
├── .mcp.json                 &lt;span class="c"&gt;# MCP server definitions&lt;/span&gt;
├── scripts/                  &lt;span class="c"&gt;# Hook and utility scripts&lt;/span&gt;
│   ├── security-scan.sh
│   ├── format-code.py
│   └── deploy.js
├── LICENSE                   &lt;span class="c"&gt;# License file&lt;/span&gt;
└── CHANGELOG.md              &lt;span class="c"&gt;# Version history&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
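
&lt;p&gt;To make the manifest less abstract, here's roughly what a minimal &lt;code&gt;plugin.json&lt;/code&gt; could look like. Treat it as a sketch rather than something copied from the docs: the name, version, description, and author values are placeholders I made up, so check the official plugin reference for the exact set of supported fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "name": "enterprise-plugin",
  "version": "1.0.0",
  "description": "Deployment, review, and compliance helpers for the team",
  "author": {
    "name": "Your Name"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the default layout above, the &lt;code&gt;commands/&lt;/code&gt;, &lt;code&gt;agents/&lt;/code&gt;, and &lt;code&gt;hooks/&lt;/code&gt; directories should be picked up from their standard locations, so the manifest itself can stay small.&lt;/p&gt;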



&lt;h2&gt;
  
  
  How MCPs Can Power Plugin Integrations
&lt;/h2&gt;

&lt;p&gt;The best thing I personally like about plugins is that they also let you share MCP configs (&lt;code&gt;.mcp.json&lt;/code&gt;) within marketplaces. In my setup, I'm using several MCP integrations that handle different parts of my workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://rube.app" rel="noopener noreferrer"&gt;&lt;strong&gt;Rube&lt;/strong&gt;&lt;/a&gt;: This has become one of my go-to MCP server choices these days. Instead of manually configuring each MCP server, Rube provides a unified interface to discover, connect, and manage 500+ app integrations. I can browse and connect to any supported app, manage API keys securely, and orchestrate workflows across multiple services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel MCP&lt;/strong&gt;: For deployment automation and project management. It connects directly to my Vercel projects, so I can deploy, manage domains, and check deployment status without leaving Claude Code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Airtable MCP&lt;/strong&gt;: For data management and project tracking. I can create records, update databases, and manage structured data directly through slash commands without switching between tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These MCPs work together seamlessly. I can deploy to Vercel, create Linear issues for any problems, and manage everything through Rube's interface. This is what makes my plugin setup work; it's not just about the plugins themselves, but the ecosystem of integrations they can access through MCPs.&lt;/p&gt;
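
&lt;p&gt;If you haven't seen one before, a plugin's &lt;code&gt;.mcp.json&lt;/code&gt; is just a map of server names to launch settings. Here's a minimal sketch, assuming a local server binary shipped inside the plugin. The server name &lt;code&gt;example-tracker&lt;/code&gt;, the paths, and the &lt;code&gt;TRACKER_API_KEY&lt;/code&gt; variable are all hypothetical and not taken from my actual setup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "mcpServers": {
    "example-tracker": {
      "command": "${CLAUDE_PLUGIN_ROOT}/servers/tracker-server",
      "args": ["--config", "${CLAUDE_PLUGIN_ROOT}/config.json"],
      "env": {
        "TRACKER_API_KEY": "${TRACKER_API_KEY}"
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using &lt;code&gt;${CLAUDE_PLUGIN_ROOT}&lt;/code&gt; keeps the paths relative to wherever the plugin gets installed, and reading the API key from the environment means the secret never lands in the shared repo.&lt;/p&gt;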

&lt;h2&gt;
  
  
  Plugin vs Marketplace vs Individual Components
&lt;/h2&gt;

&lt;p&gt;To understand how this all fits together, let’s first try to understand individual pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Individual Components&lt;/strong&gt;: Single slash commands, agents, or MCP servers, which you can configure manually for your setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugins&lt;/strong&gt;: Bundled collections of related components that work together like a package&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marketplaces&lt;/strong&gt;: Repositories that host multiple plugins with discovery and installation tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And now let’s say you're working on a web app that needs deployment automation. A DevOps plugin might give you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/deploy&lt;/code&gt; slash command for one-command secure deployments&lt;/li&gt;
&lt;li&gt;A specialized agent that knows your infrastructure inside out&lt;/li&gt;
&lt;li&gt;MCP servers that connect to your cloud providers&lt;/li&gt;
&lt;li&gt;Hooks that run security scans before every deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this can be installed with a single command, so that you don’t need to copy-paste configs or set every little piece manually between projects.&lt;/p&gt;
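
&lt;p&gt;To make the hooks bullet concrete, here's a rough sketch of what a &lt;code&gt;hooks/hooks.json&lt;/code&gt; in such a DevOps plugin might contain. It assumes a &lt;code&gt;scripts/security-scan.sh&lt;/code&gt; script exists in the plugin (that script is hypothetical). Note that this matcher fires before every Bash tool call, not only deployments, so the script itself would need to decide whether the command is a deploy worth scanning.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/scripts/security-scan.sh"
          }
        ]
      }
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;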

&lt;h2&gt;
  
  
  Why Would You Even Want to Use Plugins?
&lt;/h2&gt;

&lt;p&gt;Based on &lt;a href="https://www.anthropic.com/news/claude-code-plugins" rel="noopener noreferrer"&gt;Anthropic's announcement&lt;/a&gt;, plugins solve some real problems that most of us face daily. Engineering leads can create standardised setups that everyone uses. Everyone gets the same tools, same configurations, same shortcuts, with no additional setup.&lt;/p&gt;

&lt;p&gt;If you maintain open source projects, you can now ship slash commands that help developers use your stuff correctly, which can help reduce the endless GitHub issues about setup problems. Just install the plugin and everything works the way it's supposed to with Claude Code. You know that debugging setup you spent weeks perfecting for your agentic workflow? Now you can package it and share it with your team (or anyone who has access to the project). The same goes for deployment pipelines, testing harnesses, whatever. Instead of everyone figuring out their own way to do things, you can share the good stuff.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6n4dt9u0vxw83r6j0qcw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6n4dt9u0vxw83r6j0qcw.png" alt="Claude plugin"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of manually connecting to every service you use, plugins handle the MCP server setup with proper security and configuration—one less thing to mess up. If you're building frameworks or leading technical teams, you can package all your customisations together. Think of it like a starter template, but way more powerful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Plugin Ecosystem and Marketplace
&lt;/h2&gt;

&lt;p&gt;The plugin ecosystem is already exploding with community-driven marketplaces. I've been going through these community marketplaces, and some of them are genuinely impressive:&lt;/p&gt;

&lt;h3&gt;
  
  
  Community Marketplaces
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;This guy, &lt;a href="https://github.com/wshobson" rel="noopener noreferrer"&gt;Seth Hobson&lt;/a&gt;, has been doing something special for the past few months. &lt;a href="https://github.com/wshobson/agents" rel="noopener noreferrer"&gt;He's curated over 80 specialized sub-agents&lt;/a&gt; that you can install instantly. We're talking about sub-agents for database management, API testing, code review - the whole production-ready system. It's like having a team of on-demand specialists.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/rohittcodes/claude-plugin-suite" rel="noopener noreferrer"&gt;My Personal Plugin Marketplace&lt;/a&gt; is a comprehensive collection that includes 16 specialized agents, 10+ slash commands, and MCP integrations for 500+ app connections that I personally love to use. It's designed to give you everything you need for DevOps, testing, security, languages, and architecture in one package.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/jeremylongshore/claude-code-plugins" rel="noopener noreferrer"&gt;Jeremy Longshore's Claude Code Plugins&lt;/a&gt; is a comprehensive marketplace and educational hub that's particularly impressive for its breadth. With 20+ plugin packs covering everything from DevOps automation to AI/ML engineering, crypto trading tools, and even creator studio workflows, it's one of the most diverse collections available. What sets it apart is the educational focus - it's not just about installing plugins, but understanding how they work. The marketplace includes detailed learning paths, templates for building your own plugins, and comprehensive documentation. It's like a complete ecosystem for both plugin users and creators.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.anthropic.com/news/claude-code-plugins" rel="noopener noreferrer"&gt;Dan's marketplace&lt;/a&gt; focuses on the practical stuff - DevOps automation, documentation generation, project management. The kind of tools that actually save you time instead of just looking cool. The &lt;a href="https://www.aitmpl.com/plugins" rel="noopener noreferrer"&gt;AITMPL marketplace&lt;/a&gt; is interesting because it provides complete development stacks. Think "I want to build a React app with Stripe integration" and they've got a plugin that sets up everything you need.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How Marketplaces Work
&lt;/h3&gt;

&lt;p&gt;The cool part about the plugin ecosystem is that creating your own marketplace is dead simple. You just need a git repo with a properly formatted &lt;code&gt;.claude-plugin/marketplace.json&lt;/code&gt; file. The &lt;a href="https://docs.claude.com/en/docs/claude-code/plugin-marketplaces" rel="noopener noreferrer"&gt;docs&lt;/a&gt; walk you through the format, and honestly, it's not that complicated.&lt;/p&gt;

&lt;p&gt;A marketplace is essentially a GitHub repository with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;marketplace.json&lt;/code&gt; file inside the &lt;code&gt;.claude-plugin/&lt;/code&gt; directory at the repository root&lt;/li&gt;
&lt;li&gt;Plugin directories with their own &lt;code&gt;plugin.json&lt;/code&gt; files&lt;/li&gt;
&lt;li&gt;README files for documentation&lt;/li&gt;
&lt;li&gt;Version tags for releases&lt;/li&gt;
&lt;/ul&gt;
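
&lt;p&gt;As a rough illustration, a small &lt;code&gt;marketplace.json&lt;/code&gt; could look like the sketch below. The marketplace name, owner, and the &lt;code&gt;devops-toolkit&lt;/code&gt; plugin entry are placeholders I invented for the example, so lean on the marketplace docs for the authoritative schema.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "name": "my-team-marketplace",
  "owner": {
    "name": "Your Name"
  },
  "plugins": [
    {
      "name": "devops-toolkit",
      "source": "./plugins/devops-toolkit",
      "description": "Deploy command, infra agent, and pre-deploy hooks"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each entry in &lt;code&gt;plugins&lt;/code&gt; points at a plugin directory in the same repo, and the marketplace &lt;code&gt;name&lt;/code&gt; is what you'd expect to see after the &lt;code&gt;@&lt;/code&gt; when installing a plugin as &lt;code&gt;plugin-name@marketplace-name&lt;/code&gt;.&lt;/p&gt;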

&lt;h2&gt;
  
  
  Setting Up Plugins in Claude Code
&lt;/h2&gt;

&lt;p&gt;Alright, let's get you set up. The installation process is actually pretty straightforward:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Add a Marketplace
&lt;/h3&gt;

&lt;p&gt;First, you need to add a marketplace to your Claude Code installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add user-or-org/repo-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, to add Anthropic's official marketplace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add anthropics/claude-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxn0lxcf16fgwvw8jfi1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxn0lxcf16fgwvw8jfi1.png" alt="Installed plugin"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Browse and Install
&lt;/h3&gt;

&lt;p&gt;Once you've added a marketplace, just use the &lt;code&gt;/plugin&lt;/code&gt; menu to browse what's available. The interface is clean and shows you exactly what each plugin includes - no surprises.&lt;/p&gt;

&lt;p&gt;You can also install specific plugins directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin &lt;span class="nb"&gt;install &lt;/span&gt;plugin-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Managing Your Plugins
&lt;/h3&gt;

&lt;p&gt;Here's the smart part - plugins are designed to be toggled on and off. Working on a database-heavy project? Enable your database plugin. Switching to frontend work? Disable it to keep your context clean. No more bloated setups.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdfi875aittpw34l2nof.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdfi875aittpw34l2nof.png" alt="Managing plugins"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can manage your plugins with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin list          &lt;span class="c"&gt;# See all installed plugins&lt;/span&gt;
/plugin &lt;span class="nb"&gt;enable &lt;/span&gt;name   &lt;span class="c"&gt;# Enable a specific plugin&lt;/span&gt;
/plugin disable name  &lt;span class="c"&gt;# Disable a specific plugin&lt;/span&gt;
/plugin remove name   &lt;span class="c"&gt;# Remove a plugin entirely&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Current Limitations
&lt;/h3&gt;

&lt;p&gt;While plugins are genuinely useful, the ecosystem is still young and has some rough edges. I've run into plugin management issues on my Windows setup where the TUI shows inconsistent states and there's no clear way to remove failed installations. I've documented these issues in detail on &lt;a href="https://github.com/anthropics/claude-code/issues/9426" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; in case you run into similar problems. Despite these challenges, the benefits still outweigh the current limitations.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Personal Setup (and how you can use the same)
&lt;/h2&gt;

&lt;p&gt;Let me walk you through exactly what I'm running and how you can replicate it in your own workflow. I've been iterating on this for a few days after the release, and I think I've got something that actually works. The setup includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;16 specialised sub agents (DevOps, Testing, Security, Architecture).&lt;/li&gt;
&lt;li&gt;10+ slash commands, including &lt;code&gt;/deploy&lt;/code&gt;, &lt;code&gt;/test&lt;/code&gt;, &lt;code&gt;/code-review&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;8+ MCP integrations for all the tasks I need in my dev workflow.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you want to replicate &lt;a href="https://github.com/rohittcodes/claude-plugin-suite" rel="noopener noreferrer"&gt;the setup&lt;/a&gt;, you can install the marketplace and then the plugin using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add rohittcodes/claude-plugin-suite
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;claude-plugin-suite@claude-plugin-suite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it, your Claude Code instance now has a full CI/CD, testing and automation setup baked in. One of the most powerful aspects of the plugin system is that you can create your own. I built my Claude Plugin Suite because I couldn't find exactly what I needed with MCPs, and it's been gratifying.&lt;/p&gt;

&lt;p&gt;If you want to explore the ecosystem, check out the &lt;a href="https://docs.claude.com/en/docs/claude-code/plugins" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;, explore the &lt;a href="https://github.com/wshobson/agents" rel="noopener noreferrer"&gt;community marketplaces&lt;/a&gt;, and see what works for your setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Claude Code plugins are genuinely useful. They take Claude Code from being a powerful Agentic Coding tool to something that actually adapts to how you work. The plugin ecosystem is still young, but it's already showing promise for me. You've got community-driven marketplaces, official Anthropic support, and a growing collection of plugins that solve actual problems. Whether you're working solo and want to streamline your workflow, or you're leading a team and need to standardize practices, plugins give you a path to a more efficient, consistent development experience.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>coding</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Claude Sonnet 4.5 vs. GPT-5 Codex: Best model for agentic coding</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Tue, 07 Oct 2025 13:53:58 +0000</pubDate>
      <link>https://dev.to/composiodev/claude-sonnet-45-vs-gpt-5-codex-best-model-for-agentic-coding-9o2</link>
      <guid>https://dev.to/composiodev/claude-sonnet-45-vs-gpt-5-codex-best-model-for-agentic-coding-9o2</guid>
      <description>&lt;p&gt;&lt;strong&gt;OpenAI&lt;/strong&gt;, with the release of &lt;strong&gt;GPT-5 Codex&lt;/strong&gt;, has added major upgrades to &lt;strong&gt;Codex&lt;/strong&gt;, the CLI agent, including longer context handling, better reasoning, and support for multi-hour autonomous sessions. Around the same time, &lt;strong&gt;Anthropic&lt;/strong&gt; released &lt;strong&gt;Claude Sonnet 4.5&lt;/strong&gt;, branding it as their best coding model yet. It can now run coding tasks for over 30 hours continuously, handle tools more reliably, and maintain stronger memory and context awareness throughout.&lt;/p&gt;

&lt;p&gt;Many tech influencers have shared their opinions about both releases. There’s a YT video by &lt;a href="https://www.youtube.com/watch?v=uZBjVeyiYkk" rel="noopener noreferrer"&gt;Theo&lt;/a&gt; saying that this release by Claude is the best one to date. X is buzzing with takes like &lt;a href="https://x.com/vasumanmoza/status/1972741452276072839" rel="noopener noreferrer"&gt;“Claude 4.5 Sonnet just refactored my entire codebase”&lt;/a&gt;, &lt;a href="https://x.com/VictorTaelin/status/1973014364492894223" rel="noopener noreferrer"&gt;“GPT-5 is still better at refactoring codes”&lt;/a&gt;, &lt;a href="https://x.com/JasonBotterill3/status/1974836373908783441" rel="noopener noreferrer"&gt;“Sonnet 4.5 isn’t cringe”&lt;/a&gt;, and &lt;a href="https://x.com/deedydas/status/1973574408599200146" rel="noopener noreferrer"&gt;“It’s better at designing UI”&lt;/a&gt;, and what-not.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xsor1hqgtd9hg9byh6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xsor1hqgtd9hg9byh6k.png" alt="Tweet"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the same time, there was also a &lt;strong&gt;v2 release for Claude Code&lt;/strong&gt;, and &lt;strong&gt;Codex&lt;/strong&gt; started working on &lt;strong&gt;HTTP-streamable support for MCPs&lt;/strong&gt;. So, with all this happening in the AI space, I decided to test things out by building something cool using both models, and trust me, this one is the best I’ve done to date.&lt;/p&gt;

&lt;p&gt;We’ll be comparing both side by side, and by the end, you’ll know exactly what to choose if you want to ship products that scale, stay secure, and move fast while keeping costs in mind. All the code for this comparison can be found here: &lt;a href="https://github.com/rohittcodes/fashion-hub" rel="noopener noreferrer"&gt;github.com/rohittcodes/fashion-hub&lt;/a&gt;. This is actually a fork of the Turbo repo template by create-t3-stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Demo of the App We Built Using Both Models&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before diving into the comparisons, here’s a quick look at what we actually built. The UI and core features were generated collaboratively using &lt;strong&gt;Claude Sonnet 4.5&lt;/strong&gt; and &lt;strong&gt;GPT-5 Codex&lt;/strong&gt;, running through the same MCP-powered setup.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/PVFyJtTGcKM"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  How We'll Compare These Models and Agents
&lt;/h2&gt;

&lt;p&gt;Before proceeding, here’s how I’ll test both models using the same MCP-powered setup employed across all my builds. This will help maintain consistency across Codex and Claude when evaluating speed, reasoning, accuracy, and context retention. The whole setup that I've planned includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Asking for ideation and Setup: The setup will include the app schema, laying out the whole project structure, and working on the repository structure.&lt;/li&gt;
&lt;li&gt;Cloning a Fashion E-commerce App UI&lt;/li&gt;
&lt;li&gt;Building the recommendation pipeline for the E-commerce App&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before we dive in, let's take a moment to understand what these models and agents truly bring to the table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Claude Sonnet 4.5 + Claude Code&lt;/th&gt;
&lt;th&gt;GPT-5 Codex + Codex&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context &amp;amp; Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatically pulls context and maintains tool state across sessions. See docs for contextual prompting.&lt;/td&gt;
&lt;td&gt;Long-context reasoning tuned for coding workflows; supports extended sessions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool / MCP orchestration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sub‑agents isolate context and tools per task.&lt;/td&gt;
&lt;td&gt;Tight tool protocol integration with dynamic tool usage.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error recovery / robustness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Good recovery from tool/state resets, especially with specialized agents.&lt;/td&gt;
&lt;td&gt;Strong at iterative correction loops and large refactors.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Coding &amp;amp; execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent planning/architecture; can stumble on subtle async/edge cases.&lt;/td&gt;
&lt;td&gt;Very strong execution/debugging; often ships working code in complex domains.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Steerability / prompting burden&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;May need more guidance for multi‑step tool chains.&lt;/td&gt;
&lt;td&gt;More steerable out‑of‑box; less micromanagement required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Session length / persistence&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handles long sessions; benefits from task‑scoped agents.&lt;/td&gt;
&lt;td&gt;Built for multi‑hour autonomous execution.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost / efficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Costlier for very long runs with large contexts.&lt;/td&gt;
&lt;td&gt;Often more efficient on large coding tasks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Orchestration, design oversight, documentation, multi‑tool workflows.&lt;/td&gt;
&lt;td&gt;Generation, debugging, refactoring, shipping features reliably.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4.5 + Claude Code&lt;/strong&gt;: Great at planning, system design, multi‑tool orchestration, and UI fidelity. Struggled more with lint fixes and schema edge cases in this project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT‑5 Codex + Codex&lt;/strong&gt;: Strongest at iterative execution, refactoring, and debugging; reliably shipped a working recommendation pipeline with minimal lint errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost efficiency in my runs&lt;/strong&gt;: Claude used far more tokens for UI and lint-fixing; Codex stayed leaner and fixed issues faster in the code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick Claude&lt;/strong&gt; if you want design oversight, documentation, and multi‑tool orchestration. &lt;strong&gt;Pick Codex&lt;/strong&gt; if you're going to grind through bugs, refactors, and ship features fast.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;My choices for this repository are&lt;/strong&gt;: Codex for implementation loops, and Claude for architecture notes and UI polishing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why MCPs Even Matter Here
&lt;/h2&gt;

&lt;p&gt;MCP (Model Context Protocol) standardizes how agents call external tools and retain context across them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F984dl266isv6lxm3sq3s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F984dl266isv6lxm3sq3s.png" alt="Wikipedia MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Credits: &lt;a href="https://en.wikipedia.org/wiki/Model_Context_Protocol" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this build, we relied on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rube MCP as the unified bridge to GitHub (repo init/commits), Neon (managed Postgres), Notion (planning docs), and Figma to pull frames/tokens for UI cloning reference&lt;/li&gt;
&lt;li&gt;Context7 MCP to search documentation (e.g., Gemini API notes) directly inside the session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This made tool calls reproducible and auditable, allowing both agents to operate consistently across the same environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up MCP servers with Claude Code and Codex
&lt;/h2&gt;

&lt;p&gt;Most MCP servers are built for single-purpose integrations: one for GitHub, another for Notion, yet another for Slack. That’s fine until your workflow spans multiple tools. Then it gets chaotic: juggling multiple MCPs, shrinking model context, and hitting client limits on the number of servers you can attach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rube MCP&lt;/strong&gt; addresses this issue with a universal connector layer, supporting over 500 apps through Composio’s integration stack. Instead of managing 10 separate MCP servers, you connect to Rube once and orchestrate multi‑tool workflows. Want to “take new GitHub issues and post them in Slack”? You can run it entirely through Rube without needing to figure out which server does what. Explore toolkits: &lt;a href="https://docs.composio.dev/toolkits/introduction" rel="noopener noreferrer"&gt;https://docs.composio.dev/toolkits/introduction&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/2HXafKhmwsQ"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;Quick Setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit &lt;a href="https://rube.app/" rel="noopener noreferrer"&gt;https://rube.app&lt;/a&gt; and click “Add to Claude Code”.&lt;/li&gt;
&lt;li&gt;Copy the command, run it in your terminal, then use &lt;code&gt;/mcp&lt;/code&gt; in Claude Code to authenticate Rube.&lt;/li&gt;
&lt;li&gt;For Codex, generate a new auth token in the Rube app, then add the Rube MCP server to your &lt;code&gt;config.toml&lt;/code&gt; file using that token. Follow the steps &lt;a href="https://github.com/openai/codex/blob/main/docs/config.md#connecting-to-mcp-servers" rel="noopener noreferrer"&gt;here&lt;/a&gt; to use streamable HTTP MCP servers with Codex.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Ideation and Setup
&lt;/h2&gt;

&lt;p&gt;The idea was to build something that actually showcases complex context handling: how both models and their agents cope with it, and how the agents can autonomously chain multiple MCP tool calls from a single prompt.&lt;/p&gt;

&lt;p&gt;So, I decided to build everything inside a monorepo, letting both agents work on the same large codebase for faster feature development and debugging. Initially, I wanted to include a scalable try-on feature that could handle outfit generation for many users. But as Anthropic’s costs spiked, I had to drop that part and instead focused on building a recommendation engine that suggests outfit combinations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27hh1tzf85mt6rozwox6.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27hh1tzf85mt6rozwox6.gif" alt="Giphy"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The initial documentation and timeline for the project were generated in Cursor and written to Notion through Rube MCP. Once the planning was complete, I set up the project using the GitHub and Neon DB integrations in Rube MCP (yeah, it can interact with all these apps).&lt;/p&gt;

&lt;p&gt;The initial setup went smoothly, with no major issues. I used &lt;code&gt;create-t3-turbo&lt;/code&gt; to set up the monorepo, which includes a tRPC API, Better Auth, Drizzle, and Expo.&lt;/p&gt;

&lt;p&gt;The prompts for these actions were simple, and the agents handled them well. Here's a sample of what I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Using Rube MCP: create a Neon Postgres database, initialize a GitHub repo with the first commit, and generate a Notion planning page with milestones, and according to the tasks in `IDEATION.md` generate a plan for the project. Return the commands run, files changed, and any follow-ups.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  UI Cloning
&lt;/h2&gt;

&lt;p&gt;I actually had a UI planned for the application, but I’m bad at design, and it didn't look very exciting. I decided to get the initial UI idea from Figma and let the agents handle their own UI building. Surprisingly, they did a good job at it.&lt;/p&gt;

&lt;p&gt;The same prompt was used for UI cloning with both models, and both performed fairly well. The goal was not to capture the exact UI but to get an initial direction, and the agents did a commendable job. For accurate cloning, though, Anthropic’s models tend to do better at replicating Figma designs. You can read more about it here: &lt;a href="https://composio.dev/blog/cluade-code-with-mcp-is-all-you-need" rel="noopener noreferrer"&gt;Claude code with MCP is all you need&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Clone this [Link to the UI] from Figma, polished fashion e‑commerce UI for web (Next.js) and mobile (Expo): product grid/detail, cart, checkout, profile, wishlist, onboarding. Use brand style (soft pink accents, rounded‑XL, subtle gradients/shadows), add accessible focus states and good contrast, keep lint to a minimum, and build reusable primitives from packages/ui. Output how to run both apps.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Codex was initially unable to capture the design correctly and made errors with the flex properties of the layout. As a result, the products weren’t visible in the catalogue for the Expo app.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcfsbm30pydox97vpn76.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcfsbm30pydox97vpn76.png" alt="Expo Codex UI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final user interface in the demo was designed by Sonnet 4.5, as you saw in the intro, and honestly, it nailed the vibe: pixel-perfect layouts, smooth gradients, and a clean component hierarchy across both web and mobile. So I stuck with it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it cost to clone the UI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4.5 + Claude Code&lt;/strong&gt;: ~5M tokens (UI cloning + iterations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT‑5 Codex + Codex&lt;/strong&gt;: ~250k tokens (UI cloning + iterations)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Refactoring and Fixing the Lint Errors
&lt;/h2&gt;

&lt;p&gt;I then wanted to build a try-on feature for the application, so I started with the setup. I first tried local LLMs to help orchestrate the AR try-on (I tested a list of models and noted what went wrong), but they did not perform well. The fallback plan was to generate the try-on outfits with an image generation model, &lt;strong&gt;Gemini 2.5 Flash Image&lt;/strong&gt; (Nano Banana). For the initial setup, I used &lt;strong&gt;Cursor&lt;/strong&gt; to create the DB schema and start on the try-on pipeline, knowing it would produce lint errors and a bunch of bugs, which I planned to fix with one of the models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lgovn9wz4paj2kx7npu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lgovn9wz4paj2kx7npu.png" alt="Claude Setup"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once everything was set up, I found a lot of bugs and lint errors, as expected. I prompted Claude Code to fix them, but it failed here. The lint errors came from the schema setup: the relations between the tables needed fixing, which &lt;strong&gt;Cursor (Auto mode)&lt;/strong&gt; had missed. I later gave Codex the same prompt; the first thing it did was pull context from the monorepo about the current state of the project, and then it started fixing those relations in the schema.&lt;/p&gt;

&lt;p&gt;Although the cost up to that point was relatively high, I had another feature in mind that would let us do some real coding and test how both models perform when building a feature: a recommendation pipeline that uses &lt;strong&gt;content similarity&lt;/strong&gt;, computed by algorithms inside the API layer itself. Rather than building out a separate pipeline just to see how the models perform under the constraints of the environment, I focused on shipping the recommendation work.&lt;/p&gt;
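
&lt;p&gt;The post does not name the similarity algorithm that ended up in the API layer, so the following is only a minimal sketch of what content similarity can look like. It assumes each product carries a list of tags and scores pairs with Jaccard overlap; the &lt;code&gt;Product&lt;/code&gt; shape, the tag values, and both function names are hypothetical.&lt;/p&gt;

```typescript
// Hypothetical product shape: only an id and a list of descriptive tags.
type Product = { id: string; tags: string[] };

// Jaccard similarity: shared tags divided by all distinct tags across both lists.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  const shared = Array.from(setA).filter((tag) => setB.has(tag)).length;
  const union = new Set([...a, ...b]).size;
  return union === 0 ? 0 : shared / union;
}

// Rank every other product in the catalog by tag overlap with the target.
function similarItems(target: Product, catalog: Product[], limit: number): Product[] {
  return catalog
    .filter((product) => product.id !== target.id)
    .map((product) => ({ product, score: jaccard(target.tags, product.tags) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((entry) => entry.product);
}

// Made-up catalog, purely for illustration.
const demoCatalog: Product[] = [
  { id: "dress", tags: ["pink", "summer", "floral"] },
  { id: "top", tags: ["pink", "summer"] },
  { id: "boots", tags: ["black", "winter"] },
];

console.log(similarItems(demoCatalog[0], demoCatalog, 2).map((p) => p.id).join(", "));
// top, boots
```

&lt;p&gt;Tag overlap is cheap enough to run per request on a small catalog; a larger catalog would more likely precompute these scores.&lt;/p&gt;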

&lt;p&gt;During this detour, &lt;strong&gt;Context7 MCP&lt;/strong&gt; handled Gemini API/doc lookups directly in-session, and Rube MCP updated the Notion plan with decisions and blocked items to keep context aligned across tools.&lt;/p&gt;

&lt;p&gt;Cost of refactoring and fixing the lint errors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4.5 + Claude Code&lt;/strong&gt;: ~4M tokens (Lint fixing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT‑5 Codex + Codex&lt;/strong&gt;: ~100k tokens (Lint fixing)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building the Recommendation Pipeline
&lt;/h2&gt;

&lt;p&gt;For the recommendation pipeline, both models were given the same prompt to build the required DB schema and the UI for both web and mobile apps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas0u82jn187syvzbkyzr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas0u82jn187syvzbkyzr.png" alt="Claude Recommendation pipeline"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build an AI-powered Recommendation Engine in TypeScript that surfaces personalized product suggestions (For You, Similar Items, Trending) using a hybrid model (collaborative + content-based) with real-time learning from user interactions (views/cart/purchase/wishlist). Ship polished, accessible UI components, do not reference any design file. Match Fashion Hub’s brand style (soft pink accents, rounded-XL, subtle gradients, soft shadows, clean typography). The schema for the DB should be setup according to the current state of the project.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code took approximately 10 minutes to set up the schema, build the API layer, and the UI for both web and mobile apps. The total number of tokens used was around 1,189,670. However, it struggled with setting up the schema relations, which later caused issues with the API layer of the recommendation pipeline; that’s a significant oversight in terms of designing scalable and secure applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4kb737lvcbsebr6kug8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4kb737lvcbsebr6kug8.png" alt="Codex Planning"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Codex took almost 25 minutes to set up the schema, build the API layer, and the UI for the web. I was exhausted by that time, so I had to drop the UI for the Expo app. Tokens: ~309k. But it set up the schema relations correctly and built the API layer with minimal lint errors.&lt;/p&gt;

&lt;p&gt;The commit for the Claude Code feature generation can be found here: &lt;a href="https://github.com/rohittcodes/fashion-hub/commit/5b193ef7d1ee8218649d6a266b475572ac0dc262" rel="noopener noreferrer"&gt;commit/5b193ef7d1ee8218649d6a266b475572ac0dc262&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faij5zzqlivuup25hvzzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faij5zzqlivuup25hvzzq.png" alt="Lint Errors Claude-1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5bp853m70z3y5fzx9kao.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5bp853m70z3y5fzx9kao.png" alt="Lint Errors Claude-2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Flaws and Issues in the Recommendation Pipeline (both models)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;In the code generated with Sonnet 4.5, the &lt;strong&gt;trending updates were not locked down&lt;/strong&gt;, and &lt;strong&gt;anonymous tracking did not use a server-issued&lt;/strong&gt;, signed, HttpOnly token mapped to an opaque ID with &lt;strong&gt;TTL/cleanup&lt;/strong&gt;, which leaves it open to spoofing and cross-user pollution. The &lt;strong&gt;heavy queue work&lt;/strong&gt; was not processed in a &lt;strong&gt;proper batching layer&lt;/strong&gt;, the &lt;strong&gt;long queries&lt;/strong&gt; were not precomputed on a schedule, and SQL-side aggregation and indexes were missing.&lt;/li&gt;
&lt;li&gt;On the &lt;strong&gt;Codex&lt;/strong&gt; side, there was an issue with &lt;strong&gt;long-running queries&lt;/strong&gt;, which is definitely undesirable in a &lt;strong&gt;serverless environment&lt;/strong&gt;. The &lt;strong&gt;UI&lt;/strong&gt; was missing for the Expo app, and the web app had only a partial implementation of these features that was still &lt;strong&gt;not fully functional&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
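
&lt;p&gt;For the anonymous-tracking point, here is a rough sketch of a server-issued, signed token with a TTL, built on Node's &lt;code&gt;crypto&lt;/code&gt; module. It is not the code from the repository: the secret, the 30-day lifetime, the &lt;code&gt;anon_id&lt;/code&gt; cookie name, and all function names are assumptions for illustration.&lt;/p&gt;

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

// Hypothetical values: a real secret would come from server-side configuration.
const SECRET = "replace-with-a-server-side-secret";
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // assumed 30-day lifetime

function sign(payload: string): string {
  return createHmac("sha256", SECRET).update(payload).digest("hex");
}

// Token layout: opaqueId.expiresAt.signature (the UUID contains no dots).
function issueAnonToken(now: number = Date.now()): string {
  const payload = `${randomUUID()}.${now + TTL_MS}`;
  return `${payload}.${sign(payload)}`;
}

// Returns the opaque ID if the signature matches and the token is unexpired, else null.
function verifyAnonToken(token: string, now: number = Date.now()): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [id, expiresAt, signature] = parts;
  const expected = Buffer.from(sign(`${id}.${expiresAt}`));
  const given = Buffer.from(signature);
  if (given.length !== expected.length) return null;
  if (!timingSafeEqual(given, expected)) return null;
  if (now > Number(expiresAt)) return null;
  return id;
}

// HttpOnly keeps client-side scripts from reading or forging the cookie value.
function toSetCookie(token: string): string {
  const maxAge = Math.floor(TTL_MS / 1000);
  return `anon_id=${token}; Max-Age=${maxAge}; Path=/; HttpOnly; Secure; SameSite=Lax`;
}
```

&lt;p&gt;Because the ID is random and the signature is checked on every request, a client cannot pick someone else's ID or mint its own, which is the spoofing and cross-user pollution risk described above. Cleanup of expired IDs on the storage side is not shown.&lt;/p&gt;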

&lt;h2&gt;
  
  
  Comparing the Cost
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08hdmdtewkd15qfr7pse.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08hdmdtewkd15qfr7pse.png" alt="Sonnet pricing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sonnet 4.5&lt;/strong&gt; cost me around $10.26 with 18M input and 117k output tokens, with a lot of lint errors but great UI design fidelity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;LMAO.. You def need to improve this Sam..&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4287x9cjl0tbomln9ri0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4287x9cjl0tbomln9ri0.png" alt="OpenAI Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Till then, this is what I have:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ocdlkeeqqa2001790wy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ocdlkeeqqa2001790wy.png" alt="Codex Issues"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5 Codex&lt;/strong&gt; used around 600k input tokens and 103k output tokens, which at list pricing ($1.25 per 1 million input tokens and $10 per 1 million output tokens) works out to roughly $1.78, with a clean-looking UI.&lt;/li&gt;
&lt;/ul&gt;
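
&lt;p&gt;Since Codex doesn't surface a total, the estimate has to be done by hand. A small sketch of that arithmetic, using the token counts and per-million prices quoted above; it is a list-price estimate only and assumes no cached-input discounts or other billing adjustments, and the function name is made up.&lt;/p&gt;

```typescript
// Per-million-token prices in USD, as quoted in the text above.
const PRICE_PER_MILLION_USD = { input: 1.25, output: 10 };

// List-price estimate: ignores cached-input discounts and any other adjustments.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * PRICE_PER_MILLION_USD.input;
  const outputCost = (outputTokens / 1_000_000) * PRICE_PER_MILLION_USD.output;
  return inputCost + outputCost;
}

// Token counts from the Codex runs described above.
console.log(estimateCostUsd(600_000, 103_000).toFixed(2));
```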

&lt;h2&gt;
  
  
  Developer Experience Feedback
&lt;/h2&gt;

&lt;p&gt;One significant issue with Codex is its developer experience (DX). There’s no clear way to track usage or cost. There’s an OAuth login, so why not generate an API key and show total usage or cost on the dashboard? Currently, it only displays the current session cost, and I couldn’t even verify if it has been deducted from my account.&lt;/p&gt;

&lt;p&gt;They do show context usage via session IDs, which is excellent, but overall visibility is poor. The &lt;code&gt;?&lt;/code&gt; command for shortcuts no longer works.&lt;br&gt;
Even the docs feel outdated and not synced with the latest features. The DX used to be better a few weeks ago.&lt;/p&gt;

&lt;p&gt;The first thing that came to my mind: is the OpenAI team vibe-coding this thing??&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Today, Codex is the &lt;strong&gt;more practical choice&lt;/strong&gt; for shipping features. It handles &lt;strong&gt;larger contexts&lt;/strong&gt; well, iterates faster on code, fixes lint issues with fewer retries, and, in my runs, costs less than Sonnet 4.5. Claude Code has felt &lt;strong&gt;inconsistent in longer sessions&lt;/strong&gt;, but Sonnet 4.5 remains strong at planning, architecture, and &lt;strong&gt;UI fidelity&lt;/strong&gt;. It’s the best Sonnet so far and cheaper than Opus, just not as reliable as Codex for heavy refactors in this repo.&lt;/p&gt;

&lt;p&gt;DX on Codex isn’t perfect, but performance and cost make up for it right now. If you depend on LLMs day to day, I’d pick &lt;strong&gt;Codex&lt;/strong&gt; for the longer run &lt;strong&gt;if the DX improves&lt;/strong&gt;. If you care about perfect UI and architecture guidance, bring in Sonnet 4.5 for design and docs, then let Codex implement and harden.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>openai</category>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>10 best MCP servers to take your ChatGPT experience to the next level</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Thu, 25 Sep 2025 15:11:10 +0000</pubDate>
      <link>https://dev.to/composiodev/10-best-mcp-servers-to-take-your-chatgpt-experience-to-next-level-19k6</link>
      <guid>https://dev.to/composiodev/10-best-mcp-servers-to-take-your-chatgpt-experience-to-next-level-19k6</guid>
      <description>&lt;p&gt;OpenAI recently added first-class support for MCP (Model Context Protocol) servers in ChatGPT’s Developer Mode, and that’s a pretty big deal, not just for developers but also if you only use ChatGPT for day-to-day tasks. Instead of treating ChatGPT as a read-only assistant that only suggests what to click, Developer Mode lets the model interact with external tools and services through standardised MCP servers. In practice, that means ChatGPT can fetch live data, call APIs, and crucially, propose actions that you can confirm before they run.&lt;/p&gt;

&lt;p&gt;See the community launch thread for &lt;a href="https://community.openai.com/t/mcp-server-tools-now-in-chatgpt-developer-mode/1357233" rel="noopener noreferrer"&gt;MCP tools in Developer Mode&lt;/a&gt; and the official docs &lt;a href="https://platform.openai.com/docs/guides/developer-mode" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an MCP Server btw?
&lt;/h2&gt;

&lt;p&gt;MCP servers expose discrete capabilities (like reading data, running an action, or returning structured results) in a predictable format the model can use. You can point ChatGPT at a hosted MCP server or host one locally yourself, wire up OAuth or scoped API keys, and grant the assistant limited, auditable access to your apps. Developer Mode keeps the human in the loop: writes require explicit confirmation, and read calls are visible in the conversation.&lt;/p&gt;
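
&lt;p&gt;To make "a predictable format" a bit more concrete: MCP messages are JSON-RPC 2.0, and a tool invocation is a &lt;code&gt;tools/call&lt;/code&gt; request that carries the tool name and its arguments. The sketch below only builds such a request in TypeScript; the &lt;code&gt;create_issue&lt;/code&gt; tool and its arguments are hypothetical and do not come from any particular server.&lt;/p&gt;

```typescript
// Shape of an MCP tool invocation (a JSON-RPC 2.0 request).
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: unknown } };
};

// Build a request for a named tool; transport and response handling are not shown.
function buildToolCall(
  id: number,
  name: string,
  args: { [key: string]: unknown },
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical tool name and arguments, for illustration only.
const exampleRequest = buildToolCall(1, "create_issue", { title: "Checkout button is misaligned" });
console.log(JSON.stringify(exampleRequest, null, 2));
```

&lt;p&gt;A write-style call like this is the kind of action Developer Mode asks you to confirm before it runs.&lt;/p&gt;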

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq715au0vhi6ghy4a4tl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq715au0vhi6ghy4a4tl.png" alt="MCP Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can learn more about MCP servers here: &lt;a href="https://modelcontextprotocol.io/docs/learn/server-concepts" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(btw, if you’re not familiar, Composio provides 250+ apps for connecting AI agents to external APIs with managed auth, so you don’t need to handle the auth layer yourself. You can learn more about the toolkits &lt;a href="https://docs.composio.dev/toolkits/introduction" rel="noopener noreferrer"&gt;here&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;For anyone building tooling, internal automations, or just trying to make their workflows less clicky, MCPs are the practical bridge between natural language and actual side-effecting operations. Below I’ve rounded up the top MCP servers that bring the most immediate, day-to-day value to developer workflows, the ones I’d try first if I were wiring ChatGPT into my stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to connect ChatGPT with MCP servers
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;In ChatGPT, open Settings → Connectors → Advanced Settings → Developer Mode&lt;/li&gt;
&lt;li&gt;Enable Developer Mode. You’ll see an option to add connectors in the chat input.&lt;/li&gt;
&lt;li&gt;Add or point to an MCP server. Many servers publish their own quick-start commands if you’re running them.&lt;/li&gt;
&lt;li&gt;For any action that writes data, ChatGPT will ask you to confirm before proceeding with that action.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can follow along with the Rube MCP configuration below; other connections need the same or similar steps.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/zbnyiJWbS6I"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Keep secrets like API keys scoped and use least-privilege keys. If you’re not very familiar with MCPs, I’d suggest treating them as production automation tools, because any connected app can run real actions against your tools.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  1. &lt;a href="https://rube.app" rel="noopener noreferrer"&gt;Rube MCP&lt;/a&gt;: A &lt;strong&gt;Universal MCP server for all your apps&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most MCP servers connect you to a single tool: GitHub, Notion, Slack, and so on. That's fine if you only need one or two, but it quickly gets messy when you're juggling several MCP servers. Some MCP clients even limit how many MCP servers you can add, because once you stack up too many of them, the model's usable context window shrinks and gets harder to work with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://rube.app" rel="noopener noreferrer"&gt;&lt;strong&gt;Rube MCP&lt;/strong&gt;&lt;/a&gt; by Composio solves that by giving you a universal place to manage them all. Instead of switching between separate MCP servers, you simply connect to Rube once and get access to 500+ apps through Composio's integration layer. That includes Slack, Notion, GitHub, Linear, and plenty more like them.&lt;/p&gt;

&lt;p&gt;So next time you want to run a prompt like "Take new GitHub issues, and post them in Slack", you can run it entirely in Rube without any extra configuration, and you don't have to think about which server does what.&lt;/p&gt;

&lt;p&gt;If you want to get started with Rube MCP in your ChatGPT interface, here's a short demo for the same:&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/Y_qEdPt8yzs"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Resources to get started:&lt;br&gt;
&lt;a href="https://rube.app/" rel="noopener noreferrer"&gt;https://rube.app&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/ComposioHQ/Rube" rel="noopener noreferrer"&gt;https://github.com/ComposioHQ/Rube&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Stripe
&lt;/h2&gt;

&lt;p&gt;Stripe provides an official MCP server that lets you manage your payments right inside an MCP client. Normally, if you want to check payments, send a refund, or pull some quick information, you have to log into the Stripe dashboard, click around, and maybe even copy data somewhere else. With the official MCP support, you can do the same things directly inside ChatGPT.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmaigvcbo82l6yhdjbrbl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmaigvcbo82l6yhdjbrbl.png" alt="Stripe MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can ask it for stuff like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Show me all unpaid invoices this week&lt;/li&gt;
&lt;li&gt;What's today's revenue so far?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Behind the scenes, the server just exposes a set of Stripe’s API endpoints in a format that ChatGPT can call. You connect it with an API key, and from then on you don’t need to switch between tabs whenever you want quick payment info.&lt;/p&gt;

&lt;p&gt;If you spend time in support or billing, this cuts out a lot of back-and-forth. Instead of opening Stripe for every little thing, you just stay in your chat window. You can read more about the MCP server &lt;a href="https://docs.stripe.com/mcp" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Cloudflare Observability
&lt;/h2&gt;

&lt;p&gt;The Cloudflare Observability MCP server allows an &lt;strong&gt;MCP client&lt;/strong&gt; to access performance and uptime metrics for websites and applications. Instead of manually checking dashboards, developers can query latency, error rates, or traffic patterns directly through their AI agent.&lt;/p&gt;

&lt;p&gt;This is useful for monitoring system health, detecting issues early, or comparing metrics across environments without switching between multiple tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmzd1wgdr3mj0irtotze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmzd1wgdr3mj0irtotze.png" alt="Cloudflare MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example queries you might run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fetch uptime stats for a specific domain&lt;/li&gt;
&lt;li&gt;List recent error logs&lt;/li&gt;
&lt;li&gt;Compare traffic spikes over the past 24 hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Resources:&lt;br&gt;
&lt;a href="https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/workers-observability" rel="noopener noreferrer"&gt;https://github.com/cloudflare/mcp-server-cloudflare&lt;/a&gt;&lt;br&gt;
&lt;a href="https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/" rel="noopener noreferrer"&gt;https://developers.cloudflare.com&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. ThoughtSpot
&lt;/h2&gt;

&lt;p&gt;ThoughtSpot’s MCP server brings &lt;strong&gt;analytics and reporting capabilities&lt;/strong&gt; into your MCP client. Rather than navigating complex BI tools, you can request data summaries or perform searches in plain language. It’s particularly handy for quickly checking business metrics, generating insights for reports, or exploring datasets without leaving your workflow.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/kiTpUPzgCbg"&gt;
  &lt;/iframe&gt;
&lt;br&gt;
Example use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve sales metrics for a particular product&lt;/li&gt;
&lt;li&gt;Identify top-performing regions or categories&lt;/li&gt;
&lt;li&gt;Generate simple tables or summaries for analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Resources to get started:&lt;br&gt;
&lt;a href="https://agent.thoughtspot.app/" rel="noopener noreferrer"&gt;https://agent.thoughtspot.app&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/thoughtspot/mcp-server" rel="noopener noreferrer"&gt;https://github.com/thoughtspot/mcp-server&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Carbon Voice
&lt;/h2&gt;

&lt;p&gt;Carbon Voice exposes productivity and communication-related functions to your AI Agent/MCP Client. You can access notes, reminders, or task-related information without switching apps. This is useful for staying organized, automating task updates, or querying ongoing action items directly in the chat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9e5o8c475q331lvde7sz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9e5o8c475q331lvde7sz.png" alt="CarbonVoice MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Possible actions you can perform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;List upcoming tasks&lt;/li&gt;
&lt;li&gt;Summarize meeting notes&lt;/li&gt;
&lt;li&gt;Send notifications to team members&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docs: &lt;a href="https://www.getcarbon.app/mcp/get-started-with-mcp" rel="noopener noreferrer"&gt;https://www.getcarbon.app/mcp/get-started-with-mcp&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Zine
&lt;/h2&gt;

&lt;p&gt;Zine provides memory and context management capabilities for AI agents. Your MCP Client can store, retrieve, or update contextual information across interactions from various apps/tools. This helps maintain continuity in conversations or workflows, especially for long-running projects or multi-step automations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiajzqo597bqk93q7yr8y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiajzqo597bqk93q7yr8y.png" alt="Zine MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can store key project decisions in Notion, retrieve context from previous interactions on Twitter, track your progress on tasks in Linear over time, and a lot more.&lt;/p&gt;

&lt;p&gt;Resources to get started:&lt;br&gt;
&lt;a href="https://www.zine.ai/#integrations" rel="noopener noreferrer"&gt;https://www.zine.ai&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=Qd7EkwzJbJg" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=Qd7EkwzJbJg&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Needle
&lt;/h2&gt;

&lt;p&gt;Needle is focused on knowledge retrieval and RAG (retrieval-augmented generation) functions. ChatGPT can pull structured information from internal or external knowledge bases efficiently. This can be valuable for research, customer support, or creating documentation without manually searching multiple sources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frohapjlvmc6wi3obbuhz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frohapjlvmc6wi3obbuhz.png" alt="Needle MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is especially valuable for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer support teams needing quick answers from internal documentation&lt;/li&gt;
&lt;li&gt;Researchers pulling references from multiple knowledge repositories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example queries you can try:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search our internal knowledge base for all troubleshooting steps related to 502 errors&lt;/li&gt;
&lt;li&gt;Summarize the top 3 FAQs from our product docs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Resources to get started: &lt;a href="https://docs.needle.app/docs/guides/mcp/needle-mcp-server/" rel="noopener noreferrer"&gt;https://docs.needle.app/docs/guides/mcp/needle-mcp-server&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Fireflies
&lt;/h2&gt;

&lt;p&gt;Fireflies provides an MCP server that connects your AI agent to meeting intelligence. Instead of manually digging through call recordings or transcripts, your MCP client can fetch summaries, action items, or highlights directly.&lt;/p&gt;

&lt;p&gt;With this server, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieve transcripts of past meetings&lt;/li&gt;
&lt;li&gt;Generate summaries or follow-up notes&lt;/li&gt;
&lt;li&gt;Search across conversations for specific topics or decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcs0nm9llxr8vm02zvtiy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcs0nm9llxr8vm02zvtiy.png" alt="Fireflies MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is especially useful for teams that run multiple customer or internal calls daily and want meeting data to flow into their broader workflows (like syncing with Notion, Slack, or project tools).&lt;/p&gt;

&lt;p&gt;Resources to get started:&lt;br&gt;
&lt;a href="https://guide.fireflies.ai/articles/8272956938-learn-about-the-fireflies-mcp-server-model-context-protocol" rel="noopener noreferrer"&gt;https://guide.fireflies.ai/articles/8272956938-learn-about-the-fireflies-mcp-server-model-context-protocol&lt;/a&gt;&lt;br&gt;
&lt;a href="https://fireflies.ai/blog/fireflies-mcp-server" rel="noopener noreferrer"&gt;https://fireflies.ai/blog/fireflies-mcp-server&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Webflow
&lt;/h2&gt;

&lt;p&gt;The Webflow MCP server lets your MCP client or AI agent interact directly with Webflow projects. Instead of switching into the Webflow dashboard for every update, you can query collections, modify CMS items, and trigger publishes programmatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lrxmf03nw3tvw5cbjm4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4lrxmf03nw3tvw5cbjm4.png" alt="Webflow MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can perform typical actions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Listing CMS collections&lt;/li&gt;
&lt;li&gt;Creating or updating CMS entries (e.g., blog posts, product data, case studies)&lt;/li&gt;
&lt;li&gt;Publishing a site update&lt;/li&gt;
&lt;li&gt;Running batch updates across multiple projects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it easier to keep content and design workflows consistent without manually clicking through the Webflow UI.&lt;/p&gt;

&lt;p&gt;Resources:&lt;br&gt;
&lt;a href="https://developers.webflow.com/data/docs/ai-tools" rel="noopener noreferrer"&gt;https://developers.webflow.com/data/docs/ai-tools&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/webflow/mcp-server" rel="noopener noreferrer"&gt;https://github.com/webflow/mcp-server&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Apify
&lt;/h2&gt;

&lt;p&gt;The Apify MCP server provides web scraping and automation capabilities. Your MCP client (ChatGPT, in our case) can request structured data from websites, automate repetitive tasks, or extract content from multiple sources. It can be useful for market research, data collection, or monitoring online content without manual effort.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc913k7hxgoic6f4k1ca5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc913k7hxgoic6f4k1ca5.png" alt="Apify MCP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, you could pull product listings from an e-commerce site, collect user reviews or ratings, or monitor price changes over time. This allows you to gather insights and track online trends efficiently without manually visiting each site.&lt;/p&gt;

&lt;p&gt;Quickstart guide: &lt;a href="https://docs.apify.com/platform/integrations/mcp" rel="noopener noreferrer"&gt;https://docs.apify.com/platform/integrations/mcp&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;These were the handpicked MCP servers I’d start with today, not because they’re flashy but because they solve everyday bottlenecks I face. The case for MCPs is simple: they give LLMs a standard, auditable way to take action with least privilege access. That means you keep control while still getting real leverage. For anyone who wants to explore more MCP servers or follow up on similar tools, you can go through this page: &lt;a href="https://www.remote-mcp.com/" rel="noopener noreferrer"&gt;https://www.remote-mcp.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want one place to start and stay, use Rube MCP. One connection gives you access to hundreds of apps through a single server, unified auth, and consistent safety prompts for writes. It keeps your context tidy (no juggling multiple servers), scales as your stack grows, and lets you mix tools (GitHub, Slack, Stripe, Notion) without reconfiguring every time. Start with Rube, add only what you need, and keep the human‑in‑the‑loop by design.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>chatgpt</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Claude Code vs Codex: Dev Workflow Comparison</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Mon, 15 Sep 2025 03:57:55 +0000</pubDate>
      <link>https://dev.to/composiodev/claude-code-vs-codex-dev-workflow-comparison-4jjf</link>
      <guid>https://dev.to/composiodev/claude-code-vs-codex-dev-workflow-comparison-4jjf</guid>

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna7wjvqjp6l0aorz5v0i.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna7wjvqjp6l0aorz5v0i.gif" width="500" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before we begin: Codex has introduced support for stdio-based MCPs, but it still lacks direct support for HTTP endpoints. To make sure our MCPs work, I've written a simple proxy layer over the stdio support so that Codex can use MCPs like Figma, Jira, GitHub, and more. You can find the code here: &lt;a href="https://github.com/rohittcodes/claude-vs-codex/blob/main/rube-mcp-adapter-auth.js" rel="noopener noreferrer"&gt;rube-mcp-adapter-auth.js&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I ran a real build using Figma MCP for UI cloning, plus a separate coding challenge. As always, both agents got identical prompts and the same setup.&lt;/p&gt;

&lt;p&gt;All the code from this comparison can be found here: &lt;a href="https://github.com/rohittcodes/claude-vs-codex" rel="noopener noreferrer"&gt;github.com/rohittcodes/claude-vs-codex&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Don't have time? Here's what happened:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Figma cloning:&lt;/strong&gt; Claude Code captured the design better but missed the yellow theme and a few details; Codex created its own version but was faster and cheaper&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job scheduler:&lt;/strong&gt; Claude Code provided more reasoning steps and structured code; Codex was concise and faster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overall:&lt;/strong&gt; Claude Code is better for complex, detailed tasks with multiple steps. Codex is more efficient for straightforward code generation, with its own way of writing code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UX/DX:&lt;/strong&gt; Codex felt simpler to set up and use (except for HTTP-based MCPs); Claude’s developer experience felt deeper once you get used to it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; Claude Code used more tokens overall (Figma: 6,232,242; Scheduler: 234,772) vs Codex (Figma: 1,499,455; Scheduler: 72,579)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Claude Code comes with native MCP support and a large context window. Codex recently added stdio-based MCP support, but it still doesn't support HTTP endpoints for MCPs directly. By the way, if you don't know what MCPs are, you can read about them &lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Instead of benchmarks, I wanted a practical comparison: build something devs can recognize. So, the tasks I picked were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Figma UI cloned into a working frontend&lt;/li&gt;
&lt;li&gt;A lightweight job scheduler with timezone handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All within one day, with me just prompting.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I tested them
&lt;/h2&gt;

&lt;p&gt;I ran both agents through identical challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools:&lt;/strong&gt; Rube MCP + Figma&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Languages:&lt;/strong&gt; TypeScript&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure:&lt;/strong&gt; Token usage, time, code quality, dev experience&lt;/li&gt;
&lt;li&gt;Both agents got the same prompts to keep it fair.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Rube MCP - Universal MCP Server
&lt;/h2&gt;

&lt;p&gt;Rube MCP (by Composio) is the universal connection layer for MCP toolkits like Figma, Jira, GitHub, and more. Explore toolkits: &lt;a href="https://docs.composio.dev/toolkits/introduction" rel="noopener noreferrer"&gt;docs.composio.dev/toolkits/introduction&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;How to connect:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://rube.composio.dev/" rel="noopener noreferrer"&gt;rube.composio.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Click "Add to Claude Code"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vrtdzj7d4y05e120per.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vrtdzj7d4y05e120per.gif" alt="rube" width="400" height="233"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Copy the command, run it in your terminal, then run &lt;code&gt;/mcp&lt;/code&gt; to authenticate your Rube MCP server. Once done, you can start using the tools.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For Codex, we’ll reuse the same auth token via the proxy layer: set up the &lt;code&gt;rube-mcp-adapter-auth.js&lt;/code&gt; file from the repo. See the Codex config docs &lt;a href="https://github.com/openai/codex/blob/main/docs/config.md" rel="noopener noreferrer"&gt;here&lt;/a&gt; if you want more control over the Codex setup. For now, your &lt;code&gt;config.toml&lt;/code&gt; should contain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mcp_servers.rube]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"node"&lt;/span&gt;
&lt;span class="py"&gt;args&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"your-path-to/rube-mcp-adapter-auth.js"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
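
&lt;p&gt;To give a rough idea of what such a proxy layer does, here is a minimal sketch. It is not the code from the repo: the function names, the placeholder URL, and the token handling are all assumptions for illustration. It only shows the two core steps, splitting newline-delimited JSON-RPC messages coming in over stdio and wrapping each one in an HTTP POST request.&lt;/p&gt;

```typescript
// Hypothetical sketch of a stdio-to-HTTP MCP bridge; NOT the adapter from the repo.
// Assumption: the remote server accepts one JSON-RPC message per HTTP POST.

type ForwardRequest = {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
};

// Split a stdin buffer into complete newline-delimited messages and
// return the trailing partial line so it can be prepended to the next chunk.
function splitMessages(buffer: string): { messages: string[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() ?? "";
  const messages = parts.map((p) => p.trim()).filter((p) => p.length > 0);
  return { messages, rest };
}

// Build the HTTP request that carries one JSON-RPC message upstream.
// `url` and `token` are placeholders supplied by the caller.
function buildForwardRequest(message: string, url: string, token: string): ForwardRequest {
  JSON.parse(message); // throws early if the client sent malformed JSON
  return {
    url,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${token}`,
    },
    body: message,
  };
}
```

&lt;p&gt;A real adapter would feed stdin chunks through &lt;code&gt;splitMessages&lt;/code&gt;, send each request with &lt;code&gt;fetch&lt;/code&gt;, and write every response back to stdout as a single line.&lt;/p&gt;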



&lt;h2&gt;
  
  
  Coding Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Round 1: Figma design cloning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I picked a complex landing page from Figma Community and asked both agents to recreate it using Next.js and TypeScript. You can find the Figma design &lt;a href="https://www.figma.com/community/file/1497324592982395916" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6kms680if9idzrjacpc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6kms680if9idzrjacpc.png" alt="Figma design" width="800" height="572"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recreate the Figma landing page at [FIGMA_URL] in Next.js + TypeScript using TailwindCSS v4 only (no config file).
Follow a modular structure (components/layout/*, components/ui/*, components/sections/*), ensure pixel-accurate fidelity (typography, spacing, shadows, colors), and make it fully responsive (desktop, tablet, mobile).
No inline styles or third-party UI libraries.
Extract reusable components for repeated Figma elements, and enforce strict TypeScript types (no any).
Goal: A clean, maintainable, production-ready codebase that mirrors the Figma design as close as possible.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wasn’t building the full developer platform here, just cloning a large landing page to see how close each agent could get.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code (Sonnet 4) delivered a working Next.js app but missed the yellow theme entirely. It captured the design structure to some extent and even exported images from the Figma design, but the visual accuracy was disappointing. The layout was there but colors, spacing, and typography were noticeably different from the original.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbaru0usnu7h43qf8r0fv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbaru0usnu7h43qf8r0fv.png" alt="claude code clone" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens:&lt;/strong&gt; 6,232,242, far more than Codex&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; Longer due to more iterations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design fidelity:&lt;/strong&gt; Partial - missed key theme elements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Codex results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Codex (GPT-5 Medium) created its own version of the landing page. It didn't replicate the theme, layout, or components from the original design. Instead, it built a decent-looking landing page from scratch with no image exports. The result was functional but completely different from the Figma design.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fli22ga9dnuhu91sae2pt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fli22ga9dnuhu91sae2pt.png" alt="codex clone" width="800" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens:&lt;/strong&gt; 1,499,455, far fewer than Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; ~10 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design fidelity:&lt;/strong&gt; None - created original design&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Code captured more of the original design but missed critical elements. Codex was faster and cheaper but ignored the design brief entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Round 2: Job scheduler challenge&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It took me a while to settle on the second task. It may not be the best one, but it's what I have for now. PS: Suggest some ideas for future blog posts.&lt;/p&gt;

&lt;p&gt;I threw a complex TypeScript challenge at both agents: build a timezone-aware cron scheduler with persistence and catch-up execution. This tests system design, timezone handling, and production-ready code structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a lightweight job scheduler in TypeScript (Node.js) with the following requirements:
- Supports cron-style expressions (e.g., "0 9 * * 1" = every Monday at 9AM).
- Must be timezone-aware: jobs scheduled in "America/New_York" vs "Asia/Kolkata" should trigger at correct local times even if the server runs in UTC.
- Implement a persistence layer (SQLite or JSON file) so scheduled jobs survive restarts.
- On startup, the scheduler must detect missed jobs (e.g., if server was down at scheduled time) and run catch-up executions.
- Provide a clean TypeScript interface with addJob, removeJob, listJobs methods.
- Include at least one example job (e.g., log "Hello World" daily at 9 AM in two different timezones).
- Must be written in modular, production-ready TypeScript (no any, no inline hacks).
- Optimize for readability + maintainability.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
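
&lt;p&gt;The timezone requirement is the subtle part of this prompt. As a rough illustration (not the code either agent produced), the sketch below checks whether a five-field cron expression is due at a given instant in a given IANA timezone, using the built-in &lt;code&gt;Intl.DateTimeFormat&lt;/code&gt;. The helper names are my own, and it is deliberately simplified: each field may only be &lt;code&gt;*&lt;/code&gt; or a single number, and all five fields must match.&lt;/p&gt;

```typescript
// Simplified sketch: each cron field is either "*" or a single number.
type LocalParts = { minute: number; hour: number; day: number; month: number; weekday: number };

// Cron convention: 0 = Sunday ... 6 = Saturday.
const WEEKDAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];

// Read the wall-clock fields of `date` as seen in `timeZone`.
function localParts(date: Date, timeZone: string): LocalParts {
  const fmt = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    weekday: "short",
    month: "numeric",
    day: "numeric",
    hour: "numeric",
    minute: "numeric",
  });
  const parts = fmt.formatToParts(date);
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  return {
    minute: Number(get("minute")),
    hour: Number(get("hour")) % 24, // some engines print midnight as "24"
    day: Number(get("day")),
    month: Number(get("month")),
    weekday: WEEKDAYS.indexOf(get("weekday")),
  };
}

function fieldMatches(field: string, value: number): boolean {
  return field === "*" || Number(field) === value;
}

// True when `expr` (minute hour day month weekday) matches `date` in `timeZone`.
function isDue(expr: string, date: Date, timeZone: string): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`expected 5 cron fields, got ${fields.length}`);
  const p = localParts(date, timeZone);
  const values = [p.minute, p.hour, p.day, p.month, p.weekday];
  return fields.every((f, i) => fieldMatches(f, values[i]));
}
```

&lt;p&gt;With this sketch, &lt;code&gt;isDue("0 9 * * 1", new Date("2025-09-15T13:00:00Z"), "America/New_York")&lt;/code&gt; returns &lt;code&gt;true&lt;/code&gt;, because 13:00 UTC on that Monday is 9 AM in New York. The same instant is not due for a job scheduled in "Asia/Kolkata", even though the server clock is identical.&lt;/p&gt;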



&lt;p&gt;You can run both projects by cloning the repo &lt;a href="https://github.com/rohittcodes/claude-vs-codex" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code delivered a comprehensive solution with extensive documentation and reasoning steps. It provided detailed explanations, good comments on key parts of the code, and built-in test cases. The implementation was thorough, with proper error handling, graceful shutdown, and a production-ready structure.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens:&lt;/strong&gt; 234,772. Higher token usage due to detailed explanations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; Longer due to comprehensive approach&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code quality:&lt;/strong&gt; Production-ready with extensive documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Codex results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Codex was more concise and direct. It built a modular, timezone-aware cron scheduler with JSON persistence and catch-up functionality. The solution was clean and functional but with less verbose explanations. It focused on getting the job done efficiently.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens:&lt;/strong&gt; 72,579. Lower token usage thanks to more concise output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; ~15 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code quality:&lt;/strong&gt; Clean and functional&lt;/li&gt;
&lt;/ul&gt;
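
&lt;p&gt;For readers unfamiliar with catch-up execution, the idea can be sketched as follows. This is an illustrative sketch, not Codex's actual output: the &lt;code&gt;JobRecord&lt;/code&gt; shape and function names are hypothetical. On startup, the scheduler reads the last time each job was checked from the persisted JSON, walks forward minute by minute to the current time, and collects every instant at which the job should have fired.&lt;/p&gt;

```typescript
// Hypothetical shape of one persisted job entry (e.g. one item in a JSON file).
type JobRecord = { id: string; lastCheckedIso: string };

const MINUTE = 60000;

// Collect every minute boundary after `lastChecked` up to and including `now`
// at which `isDue` says the job should have run. `limit` caps runaway catch-up.
function missedRuns(lastChecked: Date, now: Date, isDue: (d: Date) => boolean, limit = 100): Date[] {
  const missed: Date[] = [];
  let t = Math.floor(lastChecked.getTime() / MINUTE) * MINUTE + MINUTE;
  while (now.getTime() >= t) {
    if (isDue(new Date(t))) {
      missed.push(new Date(t));
      if (missed.length >= limit) break;
    }
    t += MINUTE;
  }
  return missed;
}

// Return the runs to replay plus the record to write back to disk.
function catchUp(job: JobRecord, now: Date, isDue: (d: Date) => boolean): { runs: Date[]; updated: JobRecord } {
  const runs = missedRuns(new Date(job.lastCheckedIso), now, isDue);
  return { runs, updated: { ...job, lastCheckedIso: now.toISOString() } };
}
```

&lt;p&gt;Passing the due check in as a predicate keeps this piece independent of how cron expressions and timezones are evaluated.&lt;/p&gt;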

&lt;p&gt;Both delivered working solutions. Claude Code provided more educational value and comprehensive documentation, while Codex was more efficient and direct.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it cost (tokens + time)
&lt;/h2&gt;

&lt;p&gt;Numbers vary by task complexity, but relative behavior was consistent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Figma task&lt;/strong&gt;: Claude Code used significantly more tokens due to detailed reasoning and image exports; Codex was more efficient&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduler task&lt;/strong&gt;: Claude Code provided comprehensive documentation but higher token usage; Codex was concise and faster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overall&lt;/strong&gt;: Claude Code (Sonnet 4) used roughly 3-4× the tokens of Codex (GPT-5 Medium)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dgiuurik31wg6v52whj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dgiuurik31wg6v52whj.png" alt="Claude pricing" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Exact usage so far: for Figma, Claude Code used 6,232,242 tokens vs. 1,499,455 for Codex; for the scheduler, Claude Code used 234,772 vs. 72,579 for Codex.&lt;/p&gt;
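
&lt;p&gt;If you want to check the relative usage yourself, the ratios follow directly from the token counts above. A small sketch (the variable names are mine):&lt;/p&gt;

```typescript
// Token counts as reported in this post.
const usage = {
  figma: { claude: 6232242, codex: 1499455 },
  scheduler: { claude: 234772, codex: 72579 },
};

// Claude-to-Codex ratio, rounded to two decimals.
function ratio(claude: number, codex: number): number {
  return Math.round((claude / codex) * 100) / 100;
}

const figmaRatio = ratio(usage.figma.claude, usage.figma.codex);
const schedulerRatio = ratio(usage.scheduler.claude, usage.scheduler.codex);
const overallRatio = ratio(
  usage.figma.claude + usage.scheduler.claude,
  usage.figma.codex + usage.scheduler.codex,
);

// logs { figmaRatio: 4.16, schedulerRatio: 3.23, overallRatio: 4.11 }
console.log({ figmaRatio, schedulerRatio, overallRatio });
```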

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Both can build apps with MCPs in a single day, but they approach tasks differently:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code strengths&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better design fidelity with Figma (when it follows instructions)&lt;/li&gt;
&lt;li&gt;More comprehensive documentation and reasoning&lt;/li&gt;
&lt;li&gt;Production-ready code structure&lt;/li&gt;
&lt;li&gt;Educational value with detailed explanations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Codex strengths&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster raw generation&lt;/li&gt;
&lt;li&gt;More cost-effective token usage&lt;/li&gt;
&lt;li&gt;Direct, concise solutions&lt;/li&gt;
&lt;li&gt;Good for "get something running" quickly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As for my take: use Codex if you want a prototype fast and cheap, or when design fidelity isn't critical. Use Claude Code if you care about maintainability, documentation, and production readiness. For design-heavy tasks, Claude Code is the better choice, but it can miss key elements (like the yellow theme), though that may have been due to the recent performance issues with Claude.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>mcp</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>I built my complete side-project in a day using Claude Code and MCP, now you know why they don't hire jr devs anymore</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Thu, 28 Aug 2025 14:10:27 +0000</pubDate>
      <link>https://dev.to/composiodev/i-built-my-complete-side-project-in-a-day-using-claude-code-and-mcp-now-you-know-why-they-dont-22gk</link>
      <guid>https://dev.to/composiodev/i-built-my-complete-side-project-in-a-day-using-claude-code-and-mcp-now-you-know-why-they-dont-22gk</guid>

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8fi4nhwc6nk66redfgpq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8fi4nhwc6nk66redfgpq.gif" alt="Clever Code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This post was inspired by &lt;a href="https://dwyer.co.za/static/claude-code-is-all-you-need.html" rel="noopener noreferrer"&gt;Gareth Dwyer's blog&lt;/a&gt;, where he mentioned using Claude Code for almost everything he does, such as shipping products and building websites. I tried the same, and it blew my mind. Here's how it went.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I built a complete MVP for an invoice management platform in one day using Claude Code and MCPs. What normally takes weeks of jumping between tools, Claude Code handled automatically from database setup to email testing. MCPs connected everything I needed without leaving my terminal. It was great.&lt;/p&gt;

&lt;p&gt;The entire build flow cost me around $3.65 (Sonnet 4: $3.63 + Haiku: $0.02) for roughly 5.8M tokens processed, plus some manual configuration. That's less than the price of a latte.&lt;/p&gt;

&lt;p&gt;The build is open sourced and can be found here: &lt;a href="https://github.com/rohittcodes/linea" rel="noopener noreferrer"&gt;https://github.com/rohittcodes/linea&lt;/a&gt;. You’re free to contribute as well (it’s still in early development).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1p3avkfhkj70npa3vesu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1p3avkfhkj70npa3vesu.png" alt="Anthropic Pricing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;I've been using Claude Code for months, but I was sceptical. Could it replace my usual development workflow? Meaning, I'm accustomed to switching between VS Code, GitHub, Figma, my database dashboard, Slack, and email. You know the drill.&lt;/p&gt;

&lt;p&gt;Then MCP (Model Context Protocol) servers came along. Think of them as bridges that let Claude Code communicate directly with all your tools, without having to hop between apps.&lt;/p&gt;

&lt;p&gt;I happen to use &lt;a href="https://rube.app" rel="noopener noreferrer"&gt;Rube&lt;/a&gt;, a universal MCP server from Composio. It’s a bit of a shameless plug, but we kinda made it for this purpose. A single MCP server with only &lt;strong&gt;seven tools&lt;/strong&gt; that can communicate with any app on demand without the OAuth fuss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvq1v0ex314y14ecjlxs2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvq1v0ex314y14ecjlxs2.png" alt="Rube Marketplace"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deciding on the build
&lt;/h2&gt;

&lt;p&gt;So I decided to put this to the test. Instead of building another to-do app, I wanted to create something I'd use, like an invoice management platform for freelancers.&lt;/p&gt;

&lt;p&gt;Here's the thing: I've built similar apps before. Usually takes me 2–3 weeks of solid work. Setup, authentication, UI components, PDF generation, email integration... It’s a lot of moving parts.&lt;/p&gt;

&lt;p&gt;With Claude Code and MCPs? I decided to give myself one day and see what happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup (and the actual build)
&lt;/h2&gt;

&lt;p&gt;I started with a simple prompt to Claude Code. Nothing fancy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build me an invoice management app. Next.js, Postgres(Neon), Prisma, authentication with Next-Auth/Auth.js, PDF generation, email sending. Make it look professional.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No detailed specs, no wireframes, no technical architecture documents. Just a simple request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Rube MCP With Claude Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://rube.composio.dev" rel="noopener noreferrer"&gt;Rube&lt;/a&gt; is a universal MCP server that you can use to call a list of toolkits for your AI Agents. You can have toolkits like GitHub, Figma, Linear, and many more using just Rube. However, the more MCP servers you add to your AI workflows, the smaller the context window becomes, which exacerbates the issue for complex workflows. At that point, you can use Rube. Setting up Rube is just child’s play.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit the Rube page: &lt;a href="https://rube.composio.dev/" rel="noopener noreferrer"&gt;https://rube.composio.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Click the installation button and select Claude Code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge5ovxcirtpj3d1vezn3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge5ovxcirtpj3d1vezn3.png" alt="Rube Integration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy the installation command and run it in your terminal (make sure Claude Code is already installed)&lt;/li&gt;
&lt;li&gt;And done! You can just run Claude and ask the Rube MCP to do things for you. Run the &lt;code&gt;/mcp&lt;/code&gt; command to make sure you are connected to the MCP server. If not, click on the server and authenticate yourself with Rube using the generated link.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Went Down
&lt;/h2&gt;

&lt;p&gt;Claude Code immediately started working. The first thing I did was authorise Rube MCP and connect the GitHub toolkit. I asked it to create a new repository and open a PR, and it just worked.&lt;/p&gt;

&lt;p&gt;I connected the Figma toolkit and asked it to analyse some designs I had lying around (I just mentioned I wanted it to "look professional"), and it extracted a complete design system. Colours, fonts, spacing: everything perfectly organised into CSS variables.&lt;/p&gt;

&lt;p&gt;You can compare the Figma design for the templates here: &lt;a href="https://www.figma.com/community/file/1265787783615420446/invoice-design-kit-brix-agency" rel="noopener noreferrer"&gt;https://www.figma.com/community/file/1265787783615420446/invoice-design-kit-brix-agency&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lkeexzbwbbl0j0efrw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lkeexzbwbbl0j0efrw9.png" alt="Dashboard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The best thing about the Figma MCP was that it started with a detailed analysis and plan, then built the components, then the pages, and kept going until everything was done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5oj10llvpl5tjbq8ar7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5oj10llvpl5tjbq8ar7.png" alt="Claude Code - Figma"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Meanwhile, it spun up a Postgres database using the Neon MCP. No manual configuration, no copying and pasting connection strings. Just done.&lt;/p&gt;

&lt;p&gt;I'm sitting here drinking my coffee, watching Claude Code work like a junior developer on steroids who works 24/7 for me and never gets tired.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Magic
&lt;/h3&gt;

&lt;p&gt;By lunch, I had a working authentication system. Email magic links, session management, the whole thing. I didn't write a single line of auth code myself.&lt;/p&gt;

&lt;p&gt;Then came the fun part, building the actual invoice features. I asked it to research what users want in invoice tools, and it used a web search tool to gather feedback about pain points in existing solutions.&lt;/p&gt;

&lt;p&gt;The crazy part? Most things just worked. Although some Tailwind configuration issues required manual fixes, the overall experience was significantly smoother than I expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  It worked?
&lt;/h3&gt;

&lt;p&gt;I had something that looked like a good product (I won't say the perfect/real product, you still gotta make some manual changes):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean dashboard with analytics&lt;/li&gt;
&lt;li&gt;Client management system&lt;/li&gt;
&lt;li&gt;Invoicing with multiple templates&lt;/li&gt;
&lt;li&gt;PDF generation that looked good&lt;/li&gt;
&lt;li&gt;Email sending that worked on the first try&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqe2k8wod3skr9yk56oi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqe2k8wod3skr9yk56oi.png" alt="Templates grid"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpw15lsq6p7tua9f27sy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpw15lsq6p7tua9f27sy.png" alt="Template View"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I kept waiting for something to break, for some edge case to surface, for the usual development pain points to kick in. Fortunately, it was fine for the most part.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Factor
&lt;/h2&gt;

&lt;p&gt;Alright, let’s talk money, because building fast is great, but if it burns a hole in my pocket, that’s not sustainable.&lt;/p&gt;

&lt;p&gt;Yesterday’s entire build session cost me &lt;strong&gt;$3.65&lt;/strong&gt;. That’s it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4&lt;/strong&gt;: $3.63 (did all the heavy lifting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Haiku 3.5&lt;/strong&gt;: $0.02 (handled the quick, lightweight stuff)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To put it in perspective: I pushed around &lt;strong&gt;5.8 million tokens&lt;/strong&gt; through Claude in a single day. That’s database setup, repository creation, design parsing from Figma, authentication scaffolding, PDF generation, and even wiring email, all done for under four bucks.&lt;/p&gt;
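&lt;p&gt;For a rough sense of scale, those figures can be turned into a blended rate. The short Rust sketch below only redoes the arithmetic from the numbers quoted above; the constant and function names are made up for illustration, and the 5.8 million figure is the approximate count mentioned in this post. It works out to roughly $0.63 per million tokens, averaged across every kind of token in the session.&lt;/p&gt;

```rust
// Figures quoted in this post; amounts are in US cents to avoid float rounding.
const SONNET_CENTS: u32 = 363;
const HAIKU_CENTS: u32 = 2;
// Approximate token count quoted in the post, in millions.
const TOKENS_MILLIONS: f64 = 5.8;

/// Total spend for the session, in cents.
fn total_cents() -> u32 {
    SONNET_CENTS + HAIKU_CENTS
}

/// Blended cost in dollars per one million tokens.
fn dollars_per_million_tokens() -> f64 {
    (total_cents() as f64 / 100.0) / TOKENS_MILLIONS
}

fn main() {
    println!("total: ${:.2}", total_cents() as f64 / 100.0);
    println!(
        "blended rate: ${:.2} per million tokens",
        dollars_per_million_tokens()
    );
}
```

&lt;p&gt;This is a blended average, not a price list: it says nothing about how the tokens were split between the two models.&lt;/p&gt;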

&lt;p&gt;Honestly, that’s great. If I compare it to the old-school workflow, spinning up infra, wrangling boilerplate, burning hours in dashboards, that’s not just cheaper, it’s ridiculously more efficient. For less than the price of a Starbucks latte, I shipped an MVP.&lt;/p&gt;

&lt;p&gt;You’d easily have to shell out a few thousand dollars for this otherwise. Compared to that, $3.65 is nothing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66qorxdg8acg96yip36c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66qorxdg8acg96yip36c.png" alt="Anthropic Usage"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So yeah, the numbers make the story even better: this isn’t some expensive gimmick, it’s a genuinely cost-effective dev workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  About the product
&lt;/h2&gt;

&lt;p&gt;The result is a pretty solid invoice management platform with these MVP features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User authentication (magic links)&lt;/li&gt;
&lt;li&gt;Client management with contact details&lt;/li&gt;
&lt;li&gt;A couple of invoice templates&lt;/li&gt;
&lt;li&gt;PDF generation&lt;/li&gt;
&lt;li&gt;Email sending&lt;/li&gt;
&lt;li&gt;Basic dashboard&lt;/li&gt;
&lt;li&gt;Revenue tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmkjdgek83dsly0yukia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmkjdgek83dsly0yukia.png" alt="Profile Settings"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The tech stack? Next.js 14, PostgreSQL, Prisma, NextAuth.js, Tailwind CSS. Pretty standard stuff.&lt;/p&gt;

&lt;p&gt;But here's the thing that surprised me: I didn't have to think about most of the setup. The database schemas were created by Claude Code (I just provided a prompt, and it did the rest). API endpoints? Done. Email configuration? I had to intervene a bit, but it was still way faster than manual setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  But Wait, Am I Just Getting Lazy?
&lt;/h2&gt;

&lt;p&gt;Look, I get it. There's a voice in my head too, saying "you're not really coding anymore" or "you're losing your skills." But here's the thing: I'm not outsourcing the thinking. I'm still making the architectural decisions, reviewing code, and debugging when things get tricky.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsle81pqd9tq8362oddwf.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsle81pqd9tq8362oddwf.webp" alt="Boring"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Code handles the repetitive stuff: the boilerplate, the "create another CRUD endpoint" tasks that we all do but nobody enjoys.&lt;/p&gt;

&lt;p&gt;When things get complex—such as security, data modelling, and performance optimisation—I still dive in and code it myself. But for everything else? Why waste time when I could be shipping?&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Us Developers
&lt;/h2&gt;

&lt;p&gt;Honestly? This is where development is heading. It's not about replacing developers; we're still needed for the thinking, the architecture, the complex problem-solving.&lt;/p&gt;

&lt;p&gt;But the grunt work? The setup, configuration, and deployment pipeline stuff? Claude Code and MCPs can handle that pretty well, and faster than manual setup. It's like having a skilled junior developer who is well-versed in the frameworks.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Next for the product
&lt;/h3&gt;

&lt;p&gt;I will continue to build upon this workflow. The invoice platform I made is solid. I might clean it up and release it properly. I'm also going to build a few more products with this workflow.&lt;/p&gt;

&lt;p&gt;Here’s a demo of what the interactions look like at the development stage. There are a few more features queued up, and I’ll release them when everything fits the vibe.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/leUwb2NE3r8"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;I'm not saying Claude Code and MCPs will replace traditional development overnight. There are still plenty of cases where you need to get your hands dirty with code. But for shipping products fast? For getting ideas from concept to reality in hours instead of weeks? This workflow is pretty useful.&lt;/p&gt;

&lt;p&gt;The invoice platform I built in one day would have taken me at least 2–3 weeks the old way. And that's with me being pretty experienced with the tech stack. And don’t forget to drop a star here: &lt;a href="https://github.com/rohittcodes/linea" rel="noopener noreferrer"&gt;https://github.com/rohittcodes/linea&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>mcp</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Can You Build AI Agents in Rust? Yep, and Here’s How I Did it</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Tue, 19 Aug 2025 16:19:01 +0000</pubDate>
      <link>https://dev.to/composiodev/can-you-build-ai-agents-in-rust-yep-and-heres-how-i-did-it-2b5i</link>
      <guid>https://dev.to/composiodev/can-you-build-ai-agents-in-rust-yep-and-heres-how-i-did-it-2b5i</guid>
      <description>&lt;p&gt;Everyone's building AI agents these days, and everyone's teaching you how to do it in Python or JavaScript. Nothing wrong with &lt;strong&gt;Python&lt;/strong&gt;. It's fast to prototype with and has a mature ecosystem. But I wanted to try something different. What if we could build a &lt;strong&gt;multi-agent system&lt;/strong&gt; that orchestrates different specialised agents, each connected to real-world tools via MCP (Model Context Protocol), and what if we built it in &lt;strong&gt;Rust&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;That’s exactly why I built &lt;strong&gt;Codepilot&lt;/strong&gt;, a multi-agent system that can handle Linear project management, GitHub repository operations, and Supabase tasks, all through a beautiful terminal UI.&lt;/p&gt;

&lt;p&gt;It’s a fun side project, and if you’re curious and want to try things with Rust, maybe you'll find this useful. The source code is available on my GitHub here: &lt;a href="https://github.com/rohittcodes/codepilot" rel="noopener noreferrer"&gt;rohittcodes/codepilot&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a Multi-Agent System and why Rust in particular?
&lt;/h2&gt;

&lt;p&gt;Traditional AI agents are great, but they often struggle when you need to handle multiple domains or complex workflows. What if you want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a GitHub issue and link it to a Linear project.&lt;/li&gt;
&lt;li&gt;Query your Supabase database and create a summary report.&lt;/li&gt;
&lt;li&gt;Manage repositories across different services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A &lt;strong&gt;multi-agent system&lt;/strong&gt; solves this by having specialized agents that can collaborate and orchestrate complex workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4bx7o5x14a3kvs8920f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4bx7o5x14a3kvs8920f.png" alt="Multi Agent system architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Credits: &lt;a href="https://langchain-ai.github.io/langgraph/concepts/agentic_concepts/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  And why Rust??
&lt;/h3&gt;

&lt;p&gt;Rust isn’t the usual go-to for AI, but it has some killer benefits on its side:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Zero-cost abstractions and memory safety without a garbage collector mean your agent runs fast without eating resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type Safety&lt;/strong&gt;: Errors can be caught at compile time, not when your agent’s halfway through a task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem Potential&lt;/strong&gt;: Although the AI ecosystem is more mature in Python, Rust’s async/await model and strict typing make it a good fit for agents that juggle multiple tools, APIs, or tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if you want to build something fast, reliable, and scalable, Rust is a solid choice. Before we dive into building it, let’s start with the basics.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an AI Agent by the way?
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;AI agent&lt;/strong&gt; is a program that can understand your intent and take actions on your behalf. Think of it as an intelligent assistant that doesn't just chat; it does things. In our case, the agent understands when you're asking about Linear issues, GitHub repositories, or Supabase data, calls the appropriate APIs to retrieve the information, and presents it in &lt;strong&gt;natural language responses&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqugcm9a42r7838cjxi4x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqugcm9a42r7838cjxi4x.png" alt="Langchain Agentic Architectures"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agentic Architectures (credits: &lt;a href="https://langchain-ai.github.io/langgraph/concepts/agentic_concepts/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;One of the key insights here is that &lt;strong&gt;LLMs&lt;/strong&gt; excel at understanding intent (what you want to do), but struggle to &lt;strong&gt;access real-time information&lt;/strong&gt;. By combining LLMs with APIs, we can create a program that automates tasks for you and removes the manual effort. You get the best of both worlds: natural language understanding plus real-time information access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started with the Rust AI Agents
&lt;/h2&gt;

&lt;p&gt;Alright, let’s build the thing. I didn’t want to overthink the setup: just a plain &lt;strong&gt;Rust binary project&lt;/strong&gt;, a few crates to make async work easier, and enough structure to plug in the tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up the project
&lt;/h3&gt;

&lt;p&gt;First, we create a new Rust binary project (not a lib):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo new codepilot
&lt;span class="nb"&gt;cd &lt;/span&gt;codepilot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add these &lt;strong&gt;dependencies&lt;/strong&gt; to your &lt;code&gt;Cargo.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;anyhow&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;
&lt;span class="py"&gt;chrono&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"serde"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;dotenv&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.15"&lt;/span&gt;
&lt;span class="py"&gt;tokio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"full"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;tracing&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1"&lt;/span&gt;
&lt;span class="py"&gt;tracing-subscriber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"env-filter"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;swarms-rs&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.9"&lt;/span&gt;
&lt;span class="py"&gt;serde&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"derive"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;serde_json&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0"&lt;/span&gt;
&lt;span class="py"&gt;reqwest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.11"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"json"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;crossterm&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.27"&lt;/span&gt;
&lt;span class="py"&gt;tui&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.19"&lt;/span&gt;
&lt;span class="py"&gt;ratatui&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.24"&lt;/span&gt;
&lt;span class="py"&gt;regex&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Building my first agent in Rust
&lt;/h3&gt;

&lt;p&gt;The core idea is simple: build a multi-agent system of specialized agents, give each one tools that the LLM can call, and let the LLM decide which agent to use based on the user’s query. Here’s the basic structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/agents/linear.rs&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;LinearAgent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;dyn&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;linear_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LinearMCPClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;available_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ToolInfo&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;LinearAgent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Initialize agent with dynamic tool discovery&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Integrating Composio MCP Servers&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To connect each agent with real-world APIs, we use &lt;a href="https://platform.composio.dev/" rel="noopener noreferrer"&gt;Composio MCP Integration&lt;/a&gt;. These servers expose authenticated API actions your agents can call, without you having to handwrite integrations.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;Codepilot&lt;/strong&gt;, I’ve set up MCP servers for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: To handle repos, issues, PRs, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linear&lt;/strong&gt;: For project and issue management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supabase&lt;/strong&gt;: For querying and updating data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can create your own MCP server configs in a few clicks.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to add your own MCP config in Composio
&lt;/h3&gt;

&lt;p&gt;If you want to integrate your tools (or replicate what I’ve done), here’s the flow using the &lt;a href="https://platform.composio.dev/" rel="noopener noreferrer"&gt;new Composio dashboard&lt;/a&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Log in&lt;/strong&gt; and go to &lt;strong&gt;MCP Configs.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;“Create MCP Config”&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Give it a name (like &lt;code&gt;linear-agent&lt;/code&gt; or &lt;code&gt;github-bot&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Choose the toolkit (e.g., &lt;code&gt;Linear&lt;/code&gt;, &lt;code&gt;GitHub&lt;/code&gt;, &lt;code&gt;Supabase&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Select how you want to handle authentication.&lt;/li&gt;
&lt;li&gt;Paste your API keys or use OAuth to connect.&lt;/li&gt;
&lt;li&gt;Pick the tools you want your agent to have access to&lt;/li&gt;
&lt;li&gt;Hit &lt;strong&gt;“Create MCP Server”&lt;/strong&gt;. This opens a dialog where you can copy the MCP server URL. Paste it into the &lt;code&gt;.env&lt;/code&gt; file under an appropriate variable name.&lt;/li&gt;
&lt;/ol&gt;
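&lt;p&gt;Once the URLs are in &lt;code&gt;.env&lt;/code&gt;, it can help to check at startup that each one is actually visible to the process. Here is a minimal sketch using only the standard library. The variable names (&lt;code&gt;LINEAR_MCP_URL&lt;/code&gt; and so on) and the &lt;code&gt;mcp_url&lt;/code&gt; helper are hypothetical, not necessarily what Codepilot uses, and the sketch assumes something like the &lt;code&gt;dotenv&lt;/code&gt; crate from the dependency list has already loaded the file into the environment.&lt;/p&gt;

```rust
use std::env;

/// Reads one MCP server URL from the process environment.
/// Returns an empty string when the variable is not set.
fn mcp_url(var_name: String) -> String {
    env::var(var_name).unwrap_or_default()
}

fn main() {
    // Hypothetical variable names; use whatever names you chose in your .env file.
    for name in ["LINEAR_MCP_URL", "GITHUB_MCP_URL", "SUPABASE_MCP_URL"] {
        let url = mcp_url(name.to_string());
        if url.is_empty() {
            println!("{name} is not set");
        } else {
            println!("{name} is configured");
        }
    }
}
```

&lt;p&gt;Failing early with a clear message like this tends to be easier to debug than a connection error later, when an agent first tries to reach its server.&lt;/p&gt;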

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/pis1d1Lun24"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Codepilot Architecture:
&lt;/h2&gt;

&lt;p&gt;The architecture is built around three core principles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Specialized Agents&lt;/strong&gt;: Each agent (Linear, GitHub, Supabase) is an expert in its domain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Integration&lt;/strong&gt;: All agents connect to the MCP tools via Composio Integrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent Orchestration&lt;/strong&gt;: A central orchestrator that routes queries to the right agent.&lt;/li&gt;
&lt;/ol&gt;
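&lt;p&gt;To make the third principle concrete, here is a minimal routing sketch. It assumes simple keyword matching as a stand-in for the LLM-based decision described earlier; the &lt;code&gt;AgentKind&lt;/code&gt; enum, the &lt;code&gt;route&lt;/code&gt; function, and the keyword lists are hypothetical and not taken from the Codepilot source.&lt;/p&gt;

```rust
/// Which specialised agent should handle a query.
#[derive(Debug, PartialEq)]
enum AgentKind {
    Linear,
    GitHub,
    Supabase,
    /// No keyword matched; a real orchestrator might ask the LLM instead.
    Unknown,
}

/// Picks an agent by looking for domain keywords in the lower-cased query.
fn route(query: String) -> AgentKind {
    let q = query.to_lowercase();
    if q.contains("linear") || q.contains("sprint") {
        AgentKind::Linear
    } else if q.contains("github") || q.contains("pull request") {
        AgentKind::GitHub
    } else if q.contains("supabase") || q.contains("database") {
        AgentKind::Supabase
    } else {
        AgentKind::Unknown
    }
}

fn main() {
    println!("{:?}", route("List my open Linear issues".to_string()));
    println!("{:?}", route("Open a pull request for this branch".to_string()));
    println!("{:?}", route("What's the weather today?".to_string()));
}
```

&lt;p&gt;Substring matching like this is crude and easy to fool, which is one reason to leave the decision to the LLM. The useful part is the shape: one function that maps a query to exactly one agent, with an explicit fallback when nothing fits.&lt;/p&gt;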

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2m00ghws3o6xsik9r8hi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2m00ghws3o6xsik9r8hi.png" alt="Agent Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core Components
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MultiAgentOrchestrator&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;linear_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LinearAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;github_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GitHubAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;supabase_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SupabaseAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;LinearAgent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;dyn&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;linear_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LinearMCPClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;available_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ToolInfo&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent is a &lt;strong&gt;specialist&lt;/strong&gt; who knows how to work with its specific domain tools, fetched dynamically from MCP Servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dynamic Tool Discovery
&lt;/h3&gt;

&lt;p&gt;One of the coolest features is that each agent &lt;strong&gt;discovers its tools dynamically&lt;/strong&gt; from MCP servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;impl&lt;/span&gt; &lt;span class="n"&gt;LinearAgent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;Self&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;linear_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;LinearMCPClient&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// Get dynamic tools from the MCP server&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;tools_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;linear_client&lt;/span&gt;&lt;span class="nf"&gt;.get_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;available_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ToolInfo&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tools_response&lt;/span&gt;
            &lt;span class="nf"&gt;.iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;ToolInfo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="nf"&gt;.as_str&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap_or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"unknown"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="nf"&gt;.as_str&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap_or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"No description"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="n"&gt;input_schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"inputSchema"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// Create dynamic system prompt with actual tool descriptions&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;tools_description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;available_tools&lt;/span&gt;
            &lt;span class="nf"&gt;.iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"- {}: {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="py"&gt;.name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="py"&gt;.description&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="py"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;.join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
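        &lt;span class="c1"&gt;// ... agent construction (using tools_description) is omitted from this excerpt&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;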
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means &lt;strong&gt;no hardcoded operations&lt;/strong&gt;: the agents automatically adapt to whatever tools are available on their MCP servers!&lt;/p&gt;

&lt;h2&gt;
  
  
  What's happening here?
&lt;/h2&gt;

&lt;p&gt;The system uses &lt;strong&gt;pure LLM-based tool selection&lt;/strong&gt; with intelligent fallbacks. When you ask a question, here's what happens:&lt;/p&gt;

&lt;h3&gt;
  
  
  True LLM-Based Tool Selection
&lt;/h3&gt;

&lt;p&gt;The agents use an approach where the LLM analyzes your request and names the specific tool it would use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1. LLM analyzes the query and mentions specific tools&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.agent&lt;/span&gt;&lt;span class="nf"&gt;.run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// LLM says: "I would use GITHUB_LIST_REPOSITORIES to fetch your repositories"&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Parse LLM guidance to extract tool selection (case-insensitive match)&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;guidance_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_response&lt;/span&gt;&lt;span class="nf"&gt;.to_lowercase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.available_tools&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;guidance_lower&lt;/span&gt;&lt;span class="nf"&gt;.contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="py"&gt;.name&lt;/span&gt;&lt;span class="nf"&gt;.to_lowercase&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// LLM mentioned this tool - execute it&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// 3. If LLM doesn't mention tools, provide a clear error message&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"I don't have a tool for that request. Available tools are: [list tools]"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
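
&lt;p&gt;The matching step above depends on how the tool names are compared with the LLM's reply. Here is a minimal, standalone sketch of that step, assuming the reply is a plain string. &lt;code&gt;select_tool&lt;/code&gt; and &lt;code&gt;no_tool_message&lt;/code&gt; are hypothetical helper names, and &lt;code&gt;ToolInfo&lt;/code&gt; is trimmed to the two fields needed here. Both sides are lowercased before comparing, so an upper-case name such as &lt;code&gt;GITHUB_LIST_REPOSITORIES&lt;/code&gt; still matches.&lt;/p&gt;

```rust
/// Hypothetical, trimmed-down version of the post's `ToolInfo` (no input schema).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolInfo {
    pub name: String,
    pub description: String,
}

/// Returns the first tool whose name appears in the LLM's reply, ignoring case.
pub fn select_tool<'a>(llm_response: &str, tools: &'a [ToolInfo]) -> Option<&'a ToolInfo> {
    let guidance_lower = llm_response.to_lowercase();
    tools
        .iter()
        .find(|tool| guidance_lower.contains(&tool.name.to_lowercase()))
}

/// Builds the fallback message with the real tool names instead of a placeholder.
pub fn no_tool_message(tools: &[ToolInfo]) -> String {
    let names: Vec<&str> = tools.iter().map(|tool| tool.name.as_str()).collect();
    format!(
        "I don't have a tool for that request. Available tools are: {}",
        names.join(", ")
    )
}
```

&lt;p&gt;Because this is plain substring matching, the first listed tool wins. If one tool name is a prefix of another, the shorter name would also match a reply that mentions the longer one, so a stricter comparison may be needed in that case.&lt;/p&gt;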



&lt;h3&gt;
  
  
  Constrained Agent Configuration
&lt;/h3&gt;

&lt;p&gt;To prevent the LLM from calling internal tools, we use a &lt;strong&gt;constrained configuration&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;
    &lt;span class="nf"&gt;.agent_builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.agent_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GitHubAgent"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.system_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.user_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"User"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.max_loops&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// Single loop to prevent tool calling&lt;/span&gt;
    &lt;span class="nf"&gt;.temperature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// Focused responses&lt;/span&gt;
    &lt;span class="nf"&gt;.max_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// Shorter responses&lt;/span&gt;
    &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Clear Tool Constraints
&lt;/h3&gt;

&lt;p&gt;The system prompt explicitly constrains the LLM to use only the available MCP tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"You are a GitHub agent. You can ONLY use these GitHub MCP tools:

{}

CRITICAL: You are NOT allowed to use any other tools.
You can ONLY mention and use the tools listed above.

When a user asks you something:
1. Look at the list of tools above
2. Find the most appropriate tool for their request
3. Mention the exact tool name you would use
4. Explain why you chose that tool

Example responses:
- 'I would use GITHUB_LIST_REPOSITORIES to fetch your repositories'
- 'I would use GITHUB_CREATE_ISSUE to create a new issue'

If no tool matches the request, say: 'I don't have a tool for that request. Available tools are: [list tools]'

Remember: ONLY use tools from the list above. Never use any other tools."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools_description&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you ask "List all my GitHub repositories", the system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Orchestrator LLM&lt;/strong&gt; → "USE_GITHUB_AGENT"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Agent LLM&lt;/strong&gt; → "I would use GITHUB_LIST_REPOSITORIES to fetch your repositories."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Execution&lt;/strong&gt; → Executes GITHUB_LIST_REPOSITORIES with proper arguments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result&lt;/strong&gt; → "LLM Analysis: [reasoning] + GitHub Operation: [tool execution result]"&lt;/li&gt;
&lt;/ol&gt;
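
&lt;p&gt;The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: only the &lt;code&gt;USE_GITHUB_AGENT&lt;/code&gt; marker appears in this post, so the Linear and Supabase markers are assumed by analogy, and &lt;code&gt;AgentKind&lt;/code&gt;, &lt;code&gt;route&lt;/code&gt;, and &lt;code&gt;combine&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```rust
/// The three specialist agents named in this post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentKind {
    Linear,
    GitHub,
    Supabase,
}

/// Maps the orchestrator's reply to an agent, ignoring case.
/// Only "USE_GITHUB_AGENT" comes from the post; the other two markers are assumptions.
pub fn route(orchestrator_reply: &str) -> Option<AgentKind> {
    let reply = orchestrator_reply.to_uppercase();
    let markers = [
        ("USE_LINEAR_AGENT", AgentKind::Linear),
        ("USE_GITHUB_AGENT", AgentKind::GitHub),
        ("USE_SUPABASE_AGENT", AgentKind::Supabase),
    ];
    markers
        .iter()
        .find(|(marker, _)| reply.contains(*marker))
        .map(|(_, kind)| *kind)
}

/// Joins the LLM's reasoning with the tool output, following the result shape in step 4.
pub fn combine(agent_label: &str, reasoning: &str, tool_output: &str) -> String {
    format!(
        "LLM Analysis: {} + {} Operation: {}",
        reasoning, agent_label, tool_output
    )
}
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; when no marker is found leaves it to the caller to decide how to handle a reply that does not name an agent.&lt;/p&gt;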

&lt;h3&gt;
  
  
  Understanding the code
&lt;/h3&gt;

&lt;p&gt;At the core of everything here is the &lt;strong&gt;&lt;code&gt;MultiAgentOrchestrator&lt;/code&gt;&lt;/strong&gt; struct, which wires everything together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MultiAgentOrchestrator&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;linear_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LinearAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;github_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GitHubAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;supabase_agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SupabaseAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent here resides in its own module, making it easy to plug in or swap out components. The &lt;strong&gt;LLM&lt;/strong&gt; is guided by a &lt;strong&gt;system prompt&lt;/strong&gt; that tells it exactly what tools are available and how to use them. Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;You are a multi-agent orchestrator. You have access to these agents:
- Linear Agent: Project management and issue tracking
- GitHub Agent: Repository and code management
- Supabase Agent: Database operations and queries

Based on the user's query, determine which agent(s) to use and provide a helpful response.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pretty clean, right? You get reasoning, tool usage, and a conversational reply, all in a single setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo of what I’ve built and how things work (High Level)
&lt;/h2&gt;

&lt;p&gt;Here’s what the interaction looks like:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/mwOKSMo1hWw"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The multi-agent system intelligently routes each query to the appropriate agent, then combines the LLM's conversational response with real data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This was a fun little project to work on, given the usual &lt;strong&gt;Python-heavy agent world&lt;/strong&gt;. &lt;strong&gt;Rust isn't traditionally the go-to&lt;/strong&gt; for these AI workflows, but it's surprisingly good at handling real-world agent logic once you get past the initial hurdles. The type system gives you confidence, async works well enough, and once you have your tools in place, everything is quite simple to plug in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not production-ready yet&lt;/strong&gt;, but as a weekend learning project, I'd say building something like this is well worth it. Again, the complete source code is here: &lt;a href="https://github.com/rohittcodes/codepilot" rel="noopener noreferrer"&gt;rohittcodes/codepilot&lt;/a&gt;. Try it out and let me know what you come up with.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>mcp</category>
      <category>programming</category>
      <category>rust</category>
    </item>
    <item>
      <title>How I Used Claude to Create and Assign Issues in Linear</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Tue, 12 Aug 2025 17:19:14 +0000</pubDate>
      <link>https://dev.to/composiodev/how-i-used-claude-to-create-and-assign-issues-in-linear-4lgd</link>
      <guid>https://dev.to/composiodev/how-i-used-claude-to-create-and-assign-issues-in-linear-4lgd</guid>
      <description>&lt;p&gt;In my previous posts, I showed how I used Claude with Composio's MCP layer to skip dashboards and manage tools like Neon and Supabase from a Claude session window. I also shared how I automated my day-to-day Jira tasks using the same approach, so if you're interested, check out that post too.&lt;/p&gt;

&lt;p&gt;Linear and Jira both handle project management, but Linear's focus is on fast, modern issue tracking, perfect for developers who want a smooth experience. Still, even in Linear, opening the UI every time you want to create a bug, assign tasks, or update statuses can get old fast.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F525n1e0ckeby7shhwzcq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F525n1e0ckeby7shhwzcq.gif" alt="nah-nope.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, I used the Linear MCP server from Composio and connected it to Claude Code. Now I can manage Linear projects straight from the terminal: no UI, no endless clicking.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is MCP?
&lt;/h2&gt;

&lt;p&gt;This time, let’s briefly explain MCPs with a use-case lens:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Think of MCPs as a way to turn APIs into something that Claude can “understand” and “use”, like plugging tools into an AI agent’s brain.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr3db07cai19fq56wy14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr3db07cai19fq56wy14.png" alt="modelcontextprotocol.io_docs_learn_architecture.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want more background on it, check out my &lt;a href="https://composio.dev/blog/jira-mcp-server" rel="noopener noreferrer"&gt;Jira blog&lt;/a&gt; or Anthropic’s &lt;a href="https://modelcontextprotocol.io/introduction" rel="noopener noreferrer"&gt;MCP overview&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What can a Linear MCP Server do?
&lt;/h2&gt;

&lt;p&gt;Let’s say you’re in a flow, ideating or writing code, and you suddenly think:&lt;/p&gt;

&lt;p&gt;“I should create a bug ticket and assign it to someone in the frontend team.”&lt;/p&gt;

&lt;p&gt;With Linear MCP and Claude, you just type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Create a bug in the Payments project called “Fix refund edge case crash” and assign it to &lt;a class="mentioned-user" href="https://dev.to/alex"&gt;@alex&lt;/a&gt;.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;… and it’s done.&lt;/p&gt;

&lt;p&gt;No switching tabs. No forms, no remembering project IDs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can do with Linear MCP:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Create issues&lt;/strong&gt; using &lt;code&gt;LINEAR_CREATE_LINEAR_ISSUE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update issue status, title, priority&lt;/strong&gt; with &lt;code&gt;LINEAR_UPDATE_ISSUE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delete issues&lt;/strong&gt; when no longer relevant using &lt;code&gt;LINEAR_DELETE_LINEAR_ISSUE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fetch issue details&lt;/strong&gt; on demand with &lt;code&gt;LINEAR_GET_LINEAR_ISSUE&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are many more tools available; see the &lt;a href="https://docs.composio.dev/tools/linear" rel="noopener noreferrer"&gt;Linear tools docs page&lt;/a&gt; for the full list.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use Composio for this?
&lt;/h2&gt;

&lt;p&gt;Let’s say you’re building a productivity AI or just want to let Claude manage your Linear workspace without building everything yourself. If you connect directly to Linear’s API or its MCP, you’d still need to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OAuth flows or personal access tokens.&lt;/li&gt;
&lt;li&gt;Managing sessions and tokens.&lt;/li&gt;
&lt;li&gt;Keeping everything updated with Linear’s API changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Composio handles all of that for you. It acts as an integration layer that hosts the MCP specs and handles auth, so all you have to do is pick Linear from Composio’s integrations list and start prompting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we’ll be covering
&lt;/h2&gt;

&lt;p&gt;In this post, we’ll go through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the Linear MCP is and how it works&lt;/li&gt;
&lt;li&gt;How to set it up using Composio&lt;/li&gt;
&lt;li&gt;Using the MCP server with Claude Code in your terminal&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to set up the Linear MCP using Claude Code
&lt;/h2&gt;

&lt;p&gt;You can set up a Composio MCP in two ways:&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Quick Setup via Composio MCP page
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Head over to the &lt;a href="https://mcp.composio.dev/linear" rel="noopener noreferrer"&gt;Composio MCP Page for Linear&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Switch to the Claude tab.&lt;/li&gt;
&lt;li&gt;Click Generate, then copy the generated command.&lt;/li&gt;
&lt;li&gt;Paste and run it in your terminal.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @composio/mcp@latest setup &lt;span class="s2"&gt;"https://mcp.composio.dev/partner/composio/linear/mcp?customerId=[your-customer-id]&amp;amp;agent=claude"&lt;/span&gt; &lt;span class="s2"&gt;"linear-vbusm8-8"&lt;/span&gt; &lt;span class="nt"&gt;--client&lt;/span&gt; claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;Copy the config to your local project:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; ~/.config/claude/claude_desktop_config.json .mcp.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="6"&gt;
&lt;li&gt;Start Claude, and ask it to authenticate you with the Linear MCP. It’ll generate an auth URL to authenticate and authorize your client.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfngrz2sn5p0gcym96il.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfngrz2sn5p0gcym96il.png" alt="authorizing claude"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I could have saved a few tokens with a more direct prompt, i.e., asking it to initiate the connection using the Linear MCP.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2ofpkqc2mppxnlmyr23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2ofpkqc2mppxnlmyr23.png" alt="Auth page"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fak6z5qneu027pci32xvv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fak6z5qneu027pci32xvv.png" alt="Success Message"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Use the Composio Dashboard
&lt;/h3&gt;

&lt;p&gt;If you want to set up scopes, test actions, or run more advanced flows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Head over to the Composio Dashboard&lt;/li&gt;
&lt;li&gt;Navigate to MCP Configs, then hit Create MCP Config.&lt;/li&gt;
&lt;li&gt;Give the MCP config a name, pick Linear from the list of toolkits, select how you want to handle authentication, and then select the tools you want to use from the list.&lt;/li&gt;
&lt;li&gt;In the integration step, look for Linear in the MCP Configs page, and proceed by clicking Create MCP.&lt;/li&gt;
&lt;li&gt;Once that’s done, you’ll be prompted to connect your Linear account. A new tab will open where you can log in and grant the necessary permissions.&lt;/li&gt;
&lt;li&gt;After that, a modal will appear with a ready-to-run &lt;code&gt;npx&lt;/code&gt; command. Copy it and run it in your terminal to use the MCP with Claude Code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/t4D8l4ZBae8"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Once everything’s connected, test it right in the Playground in Composio first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Create a bug in the Growth project titled "Login button unresponsive", and add the description "Users can't click the button on mobile."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using the Linear MCP Server with Claude Code
&lt;/h2&gt;

&lt;p&gt;Now that it’s all set up, try prompting Claude with things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Create a bug in the Billing project with priority High.”&lt;/li&gt;
&lt;li&gt;“Assign this issue to Emily and label it urgent.”&lt;/li&gt;
&lt;li&gt;“List all tasks due this week.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can run this from Claude Code, Cursor, Windsurf, or your own CLI wrapper using HTTP.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/l5nU9YHP8s8"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This post isn’t about Linear’s UI being bad (it’s super cool); it’s that sometimes I just want to think in tasks, not in a dashboard. If you’re building your own AI workflows or just want a more natural way to manage issues, give the Linear MCP a try.&lt;/p&gt;

&lt;p&gt;The best part? Once you’ve set up one MCP, doing the same for other tools like GitHub, Supabase, or Notion feels like a simple 5-minute job.&lt;/p&gt;

</description>
      <category>linear</category>
      <category>mcp</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>OpenAI GPT-5 vs. Claude Opus 4.1: A coding comparison</title>
      <dc:creator>Rohith Singh</dc:creator>
      <pubDate>Tue, 12 Aug 2025 05:09:44 +0000</pubDate>
      <link>https://dev.to/composiodev/openai-gpt-5-vs-claude-opus-41-a-coding-comparison-2mll</link>
      <guid>https://dev.to/composiodev/openai-gpt-5-vs-claude-opus-41-a-coding-comparison-2mll</guid>
      <description>&lt;p&gt;OpenAI just shipped &lt;a href="https://openai.com/gpt-5/" rel="noopener noreferrer"&gt;GPT-5&lt;/a&gt;. It’s built on top of the &lt;strong&gt;GPT&lt;/strong&gt; and &lt;strong&gt;O-series&lt;/strong&gt; reasoning models, aiming to be faster, smarter, and more efficient. I put it head‑to‑head with Anthropic’s Claude Opus 4.1 to see which one actually helps more with real dev work.&lt;/p&gt;

&lt;p&gt;All the generated code from this comparison can be found here: &lt;a href="https://github.com/rohittcodes/gpt-5-vs-opus-4-1" rel="noopener noreferrer"&gt;github.com/rohittcodes/gpt-5-vs-opus-4-1&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Don't have time? Here's what happened:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Algorithms:&lt;/strong&gt; GPT‑5 wins on speed and tokens (8K vs 79K)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web dev:&lt;/strong&gt; Opus 4.1 matched the Figma design better, but used more tokens (1.4M+ vs GPT‑5's ~900K)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overall:&lt;/strong&gt; GPT‑5 is the better everyday dev partner (fast + cheaper). If design fidelity matters and budget is flexible, Opus 4.1 shines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; GPT‑5 (Thinking) ~$3.50 vs Opus 4.1 (Thinking, Max) $7.58 (~2.3x)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Claude Opus 4.1 comes with a 200K token context window. GPT-5 bumps this up to 400K tokens with 128K max output. Despite having double the context space, GPT-5 consistently uses fewer tokens for the same work, making it cheaper to run.&lt;/p&gt;

&lt;p&gt;SWE-bench results show GPT-5 is slightly ahead of Opus 4.1 on coding benchmarks, but benchmarks don't tell the whole story. That's why I tested them on real tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8ofo3mb32roxs70nbv6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8ofo3mb32roxs70nbv6.png" alt="SWE Benchmarks"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How I tested these models&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I ran both models through identical challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Languages:&lt;/strong&gt; Java for algorithms, TypeScript/React for building web apps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tasks:&lt;/strong&gt; Figma design cloning via Figma MCP and LeetCode problems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment:&lt;/strong&gt; Cursor IDE with Rube MCP integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure:&lt;/strong&gt; Token usage, time taken, code quality, actual results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both got the exact same prompts to keep things fair.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rube MCP - Universal MCP Server
&lt;/h2&gt;

&lt;p&gt;Rube MCP (by Composio) is the universal connection layer for MCP toolkits like Figma, Jira, GitHub, Linear, and more. Explore toolkits: &lt;a href="https://docs.composio.dev/toolkits/introduction" rel="noopener noreferrer"&gt;docs.composio.dev/toolkits/introduction&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;How to connect:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://rube.composio.dev/" rel="noopener noreferrer"&gt;rube.composio.dev&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Click “Add to Cursor”&lt;/li&gt;
&lt;li&gt;Install the MCP Server when prompted and enable it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/bZv6aX5XNTI"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Coding Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) &lt;strong&gt;Round 1: Figma design cloning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I picked a complex dashboard design from Figma Community and asked both models to recreate it using Next.js and TypeScript. Figma design: &lt;a href="https://www.figma.com/community/file/1339568644170883306" rel="noopener noreferrer"&gt;link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos1hji8h3kq4lapqnegd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fos1hji8h3kq4lapqnegd.png" alt="Figma Design"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a Figma design clone using the given Figma design as a reference: [FIGMA_URL]. Use Rube MCP's Figma toolkit for this task.
Try to make it as close as possible. Use Next.js with TypeScript. Include:
- Responsive design
- Proper component structure
- Styled-components or CSS modules
- Interactive elements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GPT‑5 results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GPT-5 delivered a working Next.js app in about 10 minutes using 906,485 tokens. The app functioned well, but the visual accuracy was disappointing. It captured the basic idea but missed many design details: colours, spacing, and typography were all noticeably different from the original.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens:&lt;/strong&gt; 906,485&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; ~10 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; ~$2.58, reasonable for the output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0s9bna9lyysuftwyw02.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr0s9bna9lyysuftwyw02.png" alt="GPT-5 Output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.1 results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Opus 4.1 burned through 1.4M+ tokens (about 55% more than GPT-5) and initially got stuck on Tailwind configuration, even though the prompt asked for styled-components or CSS modules. After I manually fixed the config issues, the result was stunning: the UI matched the Figma design almost perfectly, with far better visual fidelity than GPT-5.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens:&lt;/strong&gt; 1,400,000+ (~55% more than GPT‑5)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; Longer due to more iterations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf33n0t1tswfbnzzsuic.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf33n0t1tswfbnzzsuic.png" alt="Opus Output"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Opus 4.1 delivered far better visual fidelity, but at a higher token cost and with some manual setup.&lt;/p&gt;
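
&lt;p&gt;To sanity-check the token gap, here is a minimal Python sketch of the arithmetic. The function name &lt;code&gt;percent_more&lt;/code&gt; is illustrative, and because the Opus 4.1 count was reported as "1.4M+", the result should be read as a lower bound.&lt;/p&gt;

```python
def percent_more(larger, smaller):
    """Return how much bigger `larger` is than `smaller`, in percent."""
    return (larger - smaller) / smaller * 100


gpt5_tokens = 906_485    # GPT-5, Figma clone (figure from the article)
opus_tokens = 1_400_000  # Opus 4.1, reported as "1.4M+", so treat as a floor

extra = percent_more(opus_tokens, gpt5_tokens)
print(f"Opus 4.1 used at least {extra:.0f}% more tokens than GPT-5")
```

&lt;p&gt;With exactly 1.4M tokens this works out to about 54%, which is consistent with the ~55% figure given the "+".&lt;/p&gt;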

&lt;h3&gt;
  
  
  2) Algorithm challenge
&lt;/h3&gt;

&lt;p&gt;I threw the classic "Median of Two Sorted Arrays" LeetCode hard problem at both models. This tests mathematical reasoning and optimization skills with an &lt;code&gt;O(log(m+n))&lt;/code&gt; complexity requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For the below problem description and the example test cases try to solve the problem in Java. Focus on edge cases as well as time complexity:

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Example 1:
Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000

Example 2:
Input: nums1 = [1,2], nums2 = [3,4]
Output: 2.50000

Template Code:
class Solution {
    public double findMedianSortedArrays(int[] nums1, int[] nums2) {

    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3buaacthoopug07dhmmh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3buaacthoopug07dhmmh.png" alt="Token usage"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT‑5 results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Straight to business. Used 8,253 tokens in 13 seconds and delivered a clean &lt;code&gt;O(log(min(m,n)))&lt;/code&gt; binary search solution. Proper edge case handling, optimal time complexity. Just works.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens:&lt;/strong&gt; 8,253&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; ~13s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.1 results&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Much more thorough. Consumed 78,920 tokens across multiple reasoning steps (almost 10x more than GPT-5). Took a methodical approach with detailed explanations, comprehensive comments, and built-in test cases. Same algorithm, way more educational value.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens:&lt;/strong&gt; 78,920 (~10× more, across multiple reasoning steps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; ~34s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ny73krwnxqrbjf0xajh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ny73krwnxqrbjf0xajh.png" alt="Leetcode"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both solved it optimally. GPT‑5 used about 90% fewer tokens.&lt;/p&gt;
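
&lt;p&gt;For readers who have not seen it, the partition-based binary search behind an &lt;code&gt;O(log(min(m,n)))&lt;/code&gt; solution can be sketched as below. This is a minimal illustrative sketch in Python (the prompt asked for Java), not the code either model produced, and the name &lt;code&gt;find_median_sorted_arrays&lt;/code&gt; is just a placeholder.&lt;/p&gt;

```python
def find_median_sorted_arrays(nums1, nums2):
    """Median of two sorted lists via binary search on the shorter one."""
    # Always search the shorter list so the loop runs O(log(min(m, n))) times.
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    if n == 0:
        raise ValueError("at least one array must be non-empty")

    half = (m + n + 1) // 2  # size of the combined left partition
    lo, hi = 0, m
    while hi >= lo:
        i = (lo + hi) // 2   # elements taken from nums1
        j = half - i         # elements taken from nums2

        # Use infinities as sentinels when a partition side is empty.
        left1 = nums1[i - 1] if i > 0 else float("-inf")
        right1 = nums1[i] if m > i else float("inf")
        left2 = nums2[j - 1] if j > 0 else float("-inf")
        right2 = nums2[j] if n > j else float("inf")

        if left1 > right2:
            hi = i - 1       # took too many from nums1
        elif left2 > right1:
            lo = i + 1       # took too few from nums1
        else:
            # Valid partition: everything on the left is at most everything on the right.
            if (m + n) % 2 == 1:
                return float(max(left1, left2))
            return (max(left1, left2) + min(right1, right2)) / 2
    raise ValueError("inputs must be sorted")


print(find_median_sorted_arrays([1, 3], [2]))     # Example 1 from the prompt
print(find_median_sorted_arrays([1, 2], [3, 4]))  # Example 2 from the prompt
```

&lt;p&gt;The binary search runs over partition positions in the shorter array only, which is why the iteration count depends on &lt;code&gt;min(m,n)&lt;/code&gt; and comfortably meets the &lt;code&gt;O(log(m+n))&lt;/code&gt; requirement.&lt;/p&gt;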

&lt;h2&gt;
  
  
  ML/Reasoning task (and cost reality)
&lt;/h2&gt;

&lt;p&gt;I planned a third, bigger test around ML and reasoning: building a churn prediction pipeline end‑to‑end. After seeing Opus 4.1 use 1.4M+ tokens on the web app, I skipped running this one on Opus 4.1 due to cost. I did run it on GPT‑5.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a complete ML pipeline for predicting customer churn, including:
1. Data preprocessing and cleaning
2. Feature engineering
3. Model selection and training
4. Evaluation and metrics
5. Explain the reasoning behind each step in detail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GPT‑5 results&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tokens:&lt;/strong&gt; ~86,850&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; ~4-5 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GPT‑5 produced a solid, working pipeline: clean preprocessing; sensible feature engineering; multiple models (Logistic Regression, Random Forest, and optional XGBoost with randomized search); SMOTE to handle class imbalance; best‑model selection via ROC‑AUC; and thorough evaluation (accuracy, precision, recall, F1). The explanations were clear without being verbose.&lt;/p&gt;
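
&lt;p&gt;The generated pipeline code is not reproduced here, but the evaluation step is easy to illustrate. The following is a standard-library Python sketch (not GPT‑5's output) of the four threshold metrics plus a pairwise ROC‑AUC used to pick the best model. All function names, model names, and the toy scores are hypothetical.&lt;/p&gt;

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Chance that a random churner is scored above a random non-churner."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pick_best_model(y_true, scores_by_model):
    """Return the name of the model with the highest ROC-AUC."""
    return max(scores_by_model, key=lambda name: roc_auc(y_true, scores_by_model[name]))


# Toy, made-up validation labels and predicted churn probabilities.
y_val = [1, 0, 1, 0]
candidates = {
    "logistic_regression": [0.9, 0.1, 0.4, 0.6],
    "random_forest": [0.8, 0.2, 0.7, 0.3],
}
best = pick_best_model(y_val, candidates)
y_pred = [1 if s >= 0.5 else 0 for s in candidates[best]]
print(best, classification_metrics(y_val, y_pred))
```

&lt;p&gt;ROC‑AUC is computed from scores rather than hard predictions, so model selection does not depend on any particular decision threshold; the threshold metrics are then reported for the chosen model.&lt;/p&gt;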

&lt;h2&gt;
  
  
  What the tests cost (real numbers)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;GPT‑5 (Thinking): ~$3.50 total - Web app ~$2.58, Algorithm ~$0.03, ML ~$0.88. That is less than half of what Opus 4.1 cost, even with the extra ML task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Opus 4.1 (Thinking + Max mode in Cursor): $7.58 total - Web app ~$7.15, Algorithm ~$0.43.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfcskz30062b3jvmxpf7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfcskz30062b3jvmxpf7.png" alt="Pricing comparison"&gt;&lt;/a&gt;&lt;/p&gt;
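
&lt;p&gt;The totals are not quite apples to apples, since only GPT‑5 ran the ML task. A small Python sketch of the comparison on the tasks both models ran, using the per-task figures above (the dictionary layout and helper names are illustrative):&lt;/p&gt;

```python
# Per-task costs in USD, copied from the breakdown in the article.
costs = {
    "GPT-5": {"web_app": 2.58, "algorithm": 0.03, "ml": 0.88},
    "Opus 4.1": {"web_app": 7.15, "algorithm": 0.43},
}


def total(model):
    """Sum every task cost recorded for a model, rounded to cents."""
    return round(sum(costs[model].values()), 2)


def shared_task_ratio(a, b):
    """Cost of model `a` relative to model `b` on tasks both models ran."""
    shared = [task for task in costs[a] if task in costs[b]]
    return sum(costs[a][t] for t in shared) / sum(costs[b][t] for t in shared)


for model in costs:
    print(f"{model}: ${total(model):.2f}")
ratio = shared_task_ratio("Opus 4.1", "GPT-5")
print(f"Opus 4.1 / GPT-5 on shared tasks: {ratio:.1f}x")
```

&lt;p&gt;On the web app and algorithm tasks alone, Opus 4.1 comes out at roughly 2.9x the cost of GPT‑5.&lt;/p&gt;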

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Both models use large context windows well, but they spend tokens differently, hence the big cost gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT‑5 strengths&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~90% fewer tokens on the algorithm task&lt;/li&gt;
&lt;li&gt;Faster and more practical day‑to‑day&lt;/li&gt;
&lt;li&gt;Cost‑effective for most work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Opus 4.1 strengths&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear, step‑by‑step explanations&lt;/li&gt;
&lt;li&gt;Great for learning as you code&lt;/li&gt;
&lt;li&gt;Excellent design fidelity (very close to Figma)&lt;/li&gt;
&lt;li&gt;Deep analysis when you can afford it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My take? Use GPT‑5 for algorithms, prototypes, and most day‑to‑day work; it’s faster and cheaper. Choose Opus 4.1 when visual accuracy really matters (client‑facing UI, marketing pages) and you can budget more tokens. Practical flow: build the core with GPT‑5, then use Opus 4.1 to polish key screens.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>vibecoding</category>
      <category>development</category>
    </item>
  </channel>
</rss>
