<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: chepy</title>
    <description>The latest articles on DEV Community by chepy (@chepy).</description>
    <link>https://dev.to/chepy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2489440%2Fcfa8ee1d-f22e-49b1-8002-de0eb7a5a8a4.png</url>
      <title>DEV Community: chepy</title>
      <link>https://dev.to/chepy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chepy"/>
    <language>en</language>
    <item>
      <title>The Hidden War of "AI Artifacts" — ChatGPT vs GitHub Copilot vs Claude vs Manus</title>
      <dc:creator>chepy</dc:creator>
      <pubDate>Wed, 19 Nov 2025 15:34:50 +0000</pubDate>
      <link>https://dev.to/chepy/the-hidden-war-of-ai-artifacts-chatgpt-vs-github-copilot-vs-claude-vs-manus-45eo</link>
      <guid>https://dev.to/chepy/the-hidden-war-of-ai-artifacts-chatgpt-vs-github-copilot-vs-claude-vs-manus-45eo</guid>
      <description>&lt;h2&gt;
  
  
  Why One Word Means Four Completely Different Things in 2025 AI UX
&lt;/h2&gt;

&lt;p&gt;If you think "Artifact" means the same thing across AI platforms… &lt;strong&gt;you're already outdated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In 2025, "Artifact" has become the most overloaded UX term in the AI ecosystem. Understanding its divergent meanings isn’t just trivia—it’s a superpower for building agentic apps, devtools, and AI workflows that actually work with (not against) each platform’s philosophy.&lt;/p&gt;

&lt;p&gt;Today, we’ll break down the four incompatible definitions of "Artifact" across leading tools (plus a nod to its historical roots) — and why this single word is quietly shaping the future of AI agent workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧨 First: The "Artifact" Misunderstanding Problem
&lt;/h2&gt;

&lt;p&gt;Ask 5 AI developers "What is an Artifact?" and you’ll get 8 answers. Every major AI company redefined the term to fit its product goals.&lt;/p&gt;

&lt;p&gt;Let’s start with a TL;DR cheat sheet (save this for your next integration):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Meaning of "Artifact"&lt;/th&gt;
&lt;th&gt;Layer of the Stack&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;td&gt;UI panel for complex multimodal outputs&lt;/td&gt;
&lt;td&gt;UX/UI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;Persistent creative workspace (notebook/editor)&lt;/td&gt;
&lt;td&gt;UX / Workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Copilot Agent&lt;/td&gt;
&lt;td&gt;Final deliverable (PR, diff, patch)&lt;/td&gt;
&lt;td&gt;Outcome&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manus&lt;/td&gt;
&lt;td&gt;Tool-execution capsule (automation step)&lt;/td&gt;
&lt;td&gt;Execution Layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Codex (Legacy)&lt;/td&gt;
&lt;td&gt;Raw model-generated code/file&lt;/td&gt;
&lt;td&gt;Raw Output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now let’s dive into the nuance—because these differences make or break your AI tool design.&lt;/p&gt;

&lt;h2&gt;
  
  
  🟦 1. ChatGPT Artifact: A Mini App Inside the Chat
&lt;/h2&gt;

&lt;p&gt;ChatGPT reimagined "Artifact" as &lt;strong&gt;a persistent visual panel for complex results&lt;/strong&gt;—not just fancy code blocks, but actual interactive UI surfaces.&lt;/p&gt;

&lt;p&gt;Think:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;HTML previews&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;React component renderings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dynamic charts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;File previews (PDFs, CSVs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tool output logs (e.g., "Browse the web" results)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multi-step workflow visualizations&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 Core idea: &lt;em&gt;"Not a message. Not a file. A little app window."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Why this matters: It turns ChatGPT from a "text chat" into a multimodal IDE. The platform’s philosophy here is clear: &lt;strong&gt;Prioritize clear, interactive results—even if it breaks the traditional chat paradigm&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  🟧 2. Claude Artifact: The Open-Ended Workspace
&lt;/h2&gt;

&lt;p&gt;Claude takes "Artifact" in a creative direction: &lt;strong&gt;a flexible workspace that acts as a notebook, editor, or canvas&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is the most open-ended definition of the four. A Claude Artifact can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A long-form document (e.g., a blog post draft)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A project plan with AI co-edits&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A design system sketch&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A running code sandbox&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A shared knowledge base&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A multi-page editor for collaborative work&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 Core idea: &lt;em&gt;"Hold your evolving work here. Let the AI co-edit it with you."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The contrast with ChatGPT is stark: ChatGPT leans into "structured, polished UI"; Claude leans into "freeform, iterative creation." Both work—just for different use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  🟩 3. GitHub Copilot Agent Artifact: The Final Deliverable
&lt;/h2&gt;

&lt;p&gt;This is where confusion hits hardest. For GitHub Copilot Agent, &lt;strong&gt;"Artifact" = the completed output at the end of a task&lt;/strong&gt;—nothing more, nothing less.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pull Requests (PRs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Code diffs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Patch files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Updated project files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Test result bundles&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Code transformations (e.g., "refactor this function")&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚨 Critical distinction: Copilot separates "process" from "product." Tool execution details (like what the agent did step-by-step) are called &lt;em&gt;Actions&lt;/em&gt;, &lt;em&gt;Action Traces&lt;/em&gt;, or &lt;em&gt;Execution Plans&lt;/em&gt;—&lt;strong&gt;only the end result is an Artifact&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;💡 Core idea: &lt;em&gt;"If you can merge it, ship it, or download it—it’s an Artifact."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This aligns with Copilot’s identity as a "developer automation engine": It’s all about delivering tangible, deployable outcomes.&lt;/p&gt;

&lt;h2&gt;
  
  
  🟥 4. Manus Artifact: The Execution Snapshot
&lt;/h2&gt;

&lt;p&gt;Manus takes the most developer-centric approach: &lt;strong&gt;a container for tool-execution output within a workflow run&lt;/strong&gt;—think of it as atomic evidence of what the agent actually did.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Browser tool results (e.g., "scraped this webpage")&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API call responses (JSON, XML)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Screenshots or HTML snapshots from a headless browser&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Intermediate data dumps in an agent chain&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Logs from a database query&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These Artifacts become building blocks for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Automated agent workflows&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Complex agent graphs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reproducible pipelines (critical for debugging)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 Core idea: &lt;em&gt;"A snapshot of one tool step in an automation."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It’s not a final PR (Copilot), a UI window (ChatGPT), or a workspace (Claude)—it’s the raw material of agent execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  🟫 5. OpenAI Codex (Legacy): The Original "Artifact"
&lt;/h2&gt;

&lt;p&gt;Before fancy UX systems, the earliest "Artifact" (from OpenAI Codex) was simple: &lt;strong&gt;whatever code the model generated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No UI, no workflow, no structure—just raw completions. Codex walked so the modern definitions could run.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧩 Why These Differences Exist (It’s Not Accidental)
&lt;/h2&gt;

&lt;p&gt;Every platform’s "Artifact" definition maps directly to its core identity. This is why the term diverged so drastically:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Core Identity&lt;/th&gt;
&lt;th&gt;"Artifact" = What Serves That Identity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT&lt;/td&gt;
&lt;td&gt;Multimodal AI UI/IDE&lt;/td&gt;
&lt;td&gt;UI panel for clear results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;Creative thought partner&lt;/td&gt;
&lt;td&gt;Flexible workspace for iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Copilot Agent&lt;/td&gt;
&lt;td&gt;Developer automation engine&lt;/td&gt;
&lt;td&gt;Final deployable deliverable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manus&lt;/td&gt;
&lt;td&gt;Agent workflow orchestrator&lt;/td&gt;
&lt;td&gt;Execution snapshot for pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;Code generator model&lt;/td&gt;
&lt;td&gt;Raw code output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;They’re solving different problems—so "Artifact" takes different shapes.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚔️ The Hidden UX War Behind "Artifact"
&lt;/h2&gt;

&lt;p&gt;The divergent "Artifact" definitions reveal a bigger battle: &lt;strong&gt;Who will own AI-native workflows?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;ChatGPT says: &lt;em&gt;"Put everything in a panel."&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Claude says: &lt;em&gt;"Put everything in a workspace."&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;GitHub says: &lt;em&gt;"Put everything in a PR."&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Manus says: &lt;em&gt;"Put everything in a tool graph."&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None are wrong—they’re just fighting for different parts of the AI stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔮 My 2026 Prediction: Coexistence, Not Replacement
&lt;/h2&gt;

&lt;p&gt;The industry won’t pick one "Artifact" definition. Instead, we’ll standardize around four clear mental models, each serving a distinct purpose:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;UI Artifact (ChatGPT)&lt;/strong&gt;: For presentation, visualization, and debugging.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Workspace Artifact (Claude)&lt;/strong&gt;: For creation, iteration, and co-editing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deliverable Artifact (Copilot)&lt;/strong&gt;: For engineering outputs (PRs, code).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Execution Artifact (Manus)&lt;/strong&gt;: For agent pipelines and reproducibility.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The winning tools will be those that combine all four seamlessly—e.g., a workspace (Claude) that feeds into a deliverable (Copilot) with execution logs (Manus) visualized in a UI panel (ChatGPT).&lt;/p&gt;
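
&lt;p&gt;To make the four mental models concrete, here is a minimal sketch that tags each artifact with its kind and chains them into one flow. The names &lt;code&gt;ARTIFACT_KINDS&lt;/code&gt; and &lt;code&gt;makeArtifact&lt;/code&gt;, and the payload fields, are hypothetical and do not come from any of these platforms.&lt;/p&gt;

```javascript
// The four mental models from the list above, as a lookup table.
// All names and payload shapes here are illustrative assumptions.
const ARTIFACT_KINDS = {
  ui: { example: 'ChatGPT', purpose: 'presentation' },
  workspace: { example: 'Claude', purpose: 'creation' },
  deliverable: { example: 'Copilot', purpose: 'engineering output' },
  execution: { example: 'Manus', purpose: 'pipeline step' },
};

// Build a tagged artifact, rejecting kinds outside the four models.
function makeArtifact(kind, payload) {
  if (!(kind in ARTIFACT_KINDS)) {
    throw new Error(`Unknown artifact kind: ${kind}`);
  }
  return { kind, payload };
}

// A workspace draft leads to tool runs, a deliverable, and a UI summary.
const pipeline = [
  makeArtifact('workspace', { draft: 'refactor plan' }),
  makeArtifact('execution', { tool: 'test-runner', log: '12 passed' }),
  makeArtifact('deliverable', { pullRequest: 'refactor-auth' }),
  makeArtifact('ui', { panel: 'test summary chart' }),
];

const kindsInOrder = pipeline.map((artifact) => artifact.kind);
console.log(kindsInOrder.join(' -> '));
// workspace -> execution -> deliverable -> ui
```

&lt;p&gt;Keeping the kind explicit means code that consumes an artifact never has to guess which platform's definition it is dealing with.&lt;/p&gt;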

&lt;h2&gt;
  
  
  ⛳ Final Thought for Developers
&lt;/h2&gt;

&lt;p&gt;The next time someone says, &lt;em&gt;"We need to support Artifacts,"&lt;/em&gt; stop and ask:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Which version?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ChatGPT? Claude? Copilot? Manus?&lt;/p&gt;

&lt;p&gt;This one word is no longer universal—it’s a map of the AI ecosystem’s divergent philosophies. Understanding that map is how you build world-class agent UX in 2025.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>cursor</category>
    </item>
    <item>
      <title>GAIA Super Agent SDK: Build GAIA-Benchmark-Ready Super Agents in Seconds, Not Weeks</title>
      <dc:creator>chepy</dc:creator>
      <pubDate>Mon, 17 Nov 2025 17:11:26 +0000</pubDate>
      <link>https://dev.to/chepy/gaia-super-agent-sdk-build-gaia-benchmark-ready-super-agents-in-seconds-not-weeks-2h4l</link>
      <guid>https://dev.to/chepy/gaia-super-agent-sdk-build-gaia-benchmark-ready-super-agents-in-seconds-not-weeks-2h4l</guid>
      <description>&lt;p&gt;Most "AI agent frameworks" look cool in diagrams, but the moment you try to run a serious benchmark like &lt;a href="https://arxiv.org/abs/2311.12983" rel="noopener noreferrer"&gt;&lt;strong&gt;GAIA&lt;/strong&gt;&lt;/a&gt;, you realize how much glue work is still on you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;wiring 10+ external APIs&lt;/li&gt;
&lt;li&gt;writing tool wrappers by hand&lt;/li&gt;
&lt;li&gt;juggling browser automation, search, sandbox, memory…&lt;/li&gt;
&lt;li&gt;maintaining your own benchmark runner &amp;amp; result logger&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's exactly the pain point &lt;strong&gt;GAIA Super Agent SDK&lt;/strong&gt; tries to remove.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Repo: &lt;a href="https://github.com/gaia-agent/gaia-agent" rel="noopener noreferrer"&gt;https://github.com/gaia-agent/gaia-agent&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This post walks through what the SDK actually gives you, how it's structured, and how you can use it today for both &lt;strong&gt;production agents&lt;/strong&gt; and &lt;strong&gt;GAIA Benchmark&lt;/strong&gt; runs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is GAIA Super Agent SDK?
&lt;/h2&gt;

&lt;p&gt;At its core, this repo is a &lt;strong&gt;TypeScript / Node.js SDK&lt;/strong&gt; that ships a &lt;strong&gt;pre-configured "Super Agent"&lt;/strong&gt; built on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI SDK v6 ToolLoopAgent&lt;/strong&gt; (the new &lt;code&gt;ai&lt;/code&gt; SDK tool-based agent loop)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ToolSDK.ai&lt;/strong&gt; integration for pulling in more tools&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;ReAct-style reasoning + acting&lt;/strong&gt; pattern with planning and verification baked in&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project description is very explicit:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"GAIA-benchmark-ready super agent built on AI SDK v6 ToolLoopAgent"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So instead of giving you a generic agent playground, this SDK is tuned for one clear mission:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Help you build agents that can seriously compete on the GAIA Benchmark, while still being usable as real production assistants.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;p&gt;The README is quite dense, so here's the feature set translated into plain English.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Zero-config, GAIA-ready agent
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A "Super Agent" that's immediately usable for GAIA tasks&lt;/li&gt;
&lt;li&gt;You don't start from a graph editor or a prompt; you start from a &lt;strong&gt;ready-to-run agent instance&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is very close to "install → add keys → run benchmark".&lt;/p&gt;




&lt;h3&gt;
  
  
  2. ReAct + Planning + Verification
&lt;/h3&gt;

&lt;p&gt;The agent doesn't just call tools randomly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses &lt;strong&gt;ReAct-style reasoning&lt;/strong&gt;: think → act → observe → think again&lt;/li&gt;
&lt;li&gt;Has a &lt;strong&gt;planning layer&lt;/strong&gt; for multi-step tasks&lt;/li&gt;
&lt;li&gt;Includes &lt;strong&gt;verification&lt;/strong&gt; to sanity-check final answers before returning them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combo is important for GAIA, where tasks often require multiple hops across search, browser, files, and code.&lt;/p&gt;
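
&lt;p&gt;For readers new to the pattern, the think → act → observe cycle can be sketched in a few lines. This is not the SDK's implementation: the tool registry, the scripted stand-in for the model, and &lt;code&gt;runReactLoop&lt;/code&gt; are all hypothetical names used for illustration.&lt;/p&gt;

```javascript
// Hypothetical tool registry; the real SDK wires tools differently.
const tools = {
  calculator: ({ a, b }) => a * b,
};

// A scripted stand-in for the model: it requests a tool call first,
// then answers once an observation is present in the history.
function scriptedModel(history) {
  const lastObservation = history.filter((h) => h.type === 'observation').pop();
  if (lastObservation === undefined) {
    return { type: 'act', tool: 'calculator', input: { a: 15, b: 23 } };
  }
  return { type: 'answer', text: `15 * 23 = ${lastObservation.value}` };
}

// think -> act -> observe, repeated until the model answers or steps run out.
function runReactLoop(model, maxSteps = 5) {
  const history = [];
  for (let step = 1; maxSteps >= step; step += 1) {
    const decision = model(history); // think
    if (decision.type === 'answer') {
      return { text: decision.text, steps: step, history };
    }
    const value = tools[decision.tool](decision.input); // act
    history.push({ type: 'observation', tool: decision.tool, value }); // observe
  }
  return { text: null, steps: maxSteps, history };
}

const result = runReactLoop(scriptedModel);
console.log(result.text); // 15 * 23 = 345
```

&lt;p&gt;A planning layer would run before this loop and a verifier after it. The step cap is what keeps a confused agent from looping forever.&lt;/p&gt;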




&lt;h3&gt;
  
  
  3. 18+ Built-in Tools with Official SDKs
&lt;/h3&gt;

&lt;p&gt;Tools are grouped into categories in the README:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core:&lt;/strong&gt; calculator, HTTP requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planning:&lt;/strong&gt; planner, verifier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search:&lt;/strong&gt; Tavily search, Exa search &amp;amp; content fetch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox:&lt;/strong&gt; code execution via E2B or Sandock&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser:&lt;/strong&gt; Steel, BrowserUse, or AWS browser agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory:&lt;/strong&gt; Mem0 or AWS AgentCore&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You get a serious "batteries-included" toolkit that covers most GAIA capabilities (search, browser, code, files, memory) without you having to wire everything manually.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Swappable Providers (One-Line Switch)
&lt;/h3&gt;

&lt;p&gt;A nice design choice: &lt;strong&gt;providers are swappable&lt;/strong&gt;, and the README gives both code and env-var ways to do it.&lt;/p&gt;

&lt;p&gt;For example (simplified):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createGaiaAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@gaia-agent/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createGaiaAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;search&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;exa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// instead of Tavily&lt;/span&gt;
    &lt;span class="na"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sandock&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// instead of E2B&lt;/span&gt;
    &lt;span class="na"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;browseruse&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via environment variables:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;GAIA_AGENT_SEARCH_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;exa
&lt;span class="nv"&gt;GAIA_AGENT_SANDBOX_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sandock
&lt;span class="nv"&gt;GAIA_AGENT_BROWSER_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;browseruse
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes it very easy to experiment: e.g. "What if I swap Tavily for Exa to compare search quality?" without touching agent logic.&lt;/p&gt;
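
&lt;p&gt;One plausible way such a switch could be resolved is sketched below. The environment variable names come from the shell example above. Everything else is an assumption: the helper name &lt;code&gt;resolveProviders&lt;/code&gt;, the precedence (explicit option, then env var, then default), and the &lt;code&gt;steel&lt;/code&gt; browser default. The SDK's real logic may differ.&lt;/p&gt;

```javascript
// Assumed defaults. Tavily and E2B are inferred from the code comments above;
// the browser default is a guess.
const DEFAULT_PROVIDERS = { search: 'tavily', sandbox: 'e2b', browser: 'steel' };

// Environment variable names taken from the shell example above.
const ENV_KEYS = {
  search: 'GAIA_AGENT_SEARCH_PROVIDER',
  sandbox: 'GAIA_AGENT_SANDBOX_PROVIDER',
  browser: 'GAIA_AGENT_BROWSER_PROVIDER',
};

// Hypothetical helper: explicit option first, then env var, then default.
function resolveProviders(options = {}, env = process.env) {
  const resolved = {};
  for (const kind of Object.keys(DEFAULT_PROVIDERS)) {
    resolved[kind] = options[kind] || env[ENV_KEYS[kind]] || DEFAULT_PROVIDERS[kind];
  }
  return resolved;
}

const providers = resolveProviders(
  { browser: 'browseruse' },
  { GAIA_AGENT_SEARCH_PROVIDER: 'exa' },
);
console.log(providers);
// { search: 'exa', sandbox: 'e2b', browser: 'browseruse' }
```

&lt;p&gt;Passing &lt;code&gt;env&lt;/code&gt; as a parameter keeps the helper easy to test without mutating &lt;code&gt;process.env&lt;/code&gt;.&lt;/p&gt;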




&lt;h3&gt;
  
  
  5. Tight GAIA Benchmark Integration
&lt;/h3&gt;

&lt;p&gt;This is the part that most other agent repos don't have.&lt;/p&gt;

&lt;p&gt;The SDK ships with a benchmark module and a set of pnpm scripts for running GAIA tasks with good ergonomics:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm benchmark            &lt;span class="c"&gt;# run validation set&lt;/span&gt;
pnpm benchmark &lt;span class="nt"&gt;--limit&lt;/span&gt; 10 &lt;span class="c"&gt;# smoke test with 10 tasks&lt;/span&gt;
pnpm benchmark:files      &lt;span class="c"&gt;# only file-based tasks&lt;/span&gt;
pnpm benchmark:code       &lt;span class="c"&gt;# only code-execution tasks&lt;/span&gt;
pnpm benchmark:search     &lt;span class="c"&gt;# search-heavy tasks&lt;/span&gt;
pnpm benchmark:browser    &lt;span class="c"&gt;# browser automation tasks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's even a &lt;code&gt;--stream&lt;/code&gt; mode to watch the agent "think" in real time while it solves GAIA tasks.&lt;/p&gt;




&lt;h3&gt;
  
  
  6. "Wrong Answers" Collection &amp;amp; Retry Loop
&lt;/h3&gt;

&lt;p&gt;One clever feature I really like: the wrong-answers pipeline.&lt;/p&gt;

&lt;p&gt;Workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run benchmarks → wrong answers are automatically logged to &lt;code&gt;benchmark-results/wrong-answers.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Inspect failures&lt;/li&gt;
&lt;li&gt;Retry only the failed tasks with:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm benchmark:wrong &lt;span class="nt"&gt;--verbose&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Keep iterating until that file is empty ("No wrong answers!")&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This turns GAIA from a one-shot evaluation into an iterative training ground for your agent architecture and prompts.&lt;/p&gt;
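
&lt;p&gt;The retry idea itself is simple enough to sketch independently of the SDK. The record shape (&lt;code&gt;taskId&lt;/code&gt;, &lt;code&gt;question&lt;/code&gt;, &lt;code&gt;expected&lt;/code&gt;) and the functions &lt;code&gt;retryWrongAnswers&lt;/code&gt; and &lt;code&gt;fakeRunTask&lt;/code&gt; are hypothetical. The actual schema of &lt;code&gt;wrong-answers.json&lt;/code&gt; may look different.&lt;/p&gt;

```javascript
// Re-run only the failed tasks, for a bounded number of rounds.
// Assumed record shape: { taskId, question, expected }.
async function retryWrongAnswers(wrongAnswers, runTask, maxRounds = 3) {
  let remaining = wrongAnswers;
  let rounds = 0;
  for (let round = 1; maxRounds >= round; round += 1) {
    if (remaining.length === 0) break;
    rounds = round;
    const stillWrong = [];
    for (const task of remaining) {
      const answer = await runTask(task);
      if (answer !== task.expected) stillWrong.push(task);
    }
    remaining = stillWrong;
  }
  return { remaining, rounds };
}

// Stand-in for a real agent call: task "b" succeeds only on its second try.
const attempts = {};
async function fakeRunTask(task) {
  attempts[task.taskId] = (attempts[task.taskId] || 0) + 1;
  if (task.taskId === 'b') {
    return attempts.b >= 2 ? task.expected : 'wrong';
  }
  return task.expected;
}

const outcome = retryWrongAnswers(
  [
    { taskId: 'a', question: '2 + 2?', expected: '4' },
    { taskId: 'b', question: 'Capital of France?', expected: 'Paris' },
  ],
  fakeRunTask,
);
outcome.then(({ remaining, rounds }) => {
  console.log(remaining.length === 0 ? 'No wrong answers!' : remaining, rounds);
});
```

&lt;p&gt;In practice you would change a prompt, tool, or provider between rounds. Re-running an unchanged agent mostly measures its nondeterminism.&lt;/p&gt;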




&lt;h3&gt;
  
  
  7. Rich Benchmark Result Schema
&lt;/h3&gt;

&lt;p&gt;Benchmark results capture more than just "correct / incorrect":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task id, question, level&lt;/li&gt;
&lt;li&gt;which tools were used&lt;/li&gt;
&lt;li&gt;duration&lt;/li&gt;
&lt;li&gt;number of steps / tool calls&lt;/li&gt;
&lt;li&gt;per-step details&lt;/li&gt;
&lt;/ul&gt;
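
&lt;p&gt;Fields like these lend themselves to quick aggregation. The sketch below uses made-up records, and the field names (&lt;code&gt;taskId&lt;/code&gt;, &lt;code&gt;level&lt;/code&gt;, &lt;code&gt;correct&lt;/code&gt;, &lt;code&gt;durationMs&lt;/code&gt;, &lt;code&gt;toolsUsed&lt;/code&gt;) are assumptions rather than the SDK's actual schema.&lt;/p&gt;

```javascript
// Made-up result records; field names are assumptions, not the real schema.
const results = [
  { taskId: 't1', level: 1, correct: true, durationMs: 4200, toolsUsed: ['search'] },
  { taskId: 't2', level: 2, correct: false, durationMs: 15800, toolsUsed: ['search', 'browser'] },
  { taskId: 't3', level: 2, correct: true, durationMs: 9100, toolsUsed: ['sandbox'] },
  { taskId: 't4', level: 1, correct: true, durationMs: 3000, toolsUsed: ['search', 'calculator'] },
];

// Group accuracy and total time by level, and count how often each tool appears.
function summarize(records) {
  const byLevel = {};
  const toolCounts = {};
  for (const record of records) {
    const level = byLevel[record.level] || { total: 0, correct: 0, durationMs: 0 };
    level.total += 1;
    level.correct += record.correct ? 1 : 0;
    level.durationMs += record.durationMs;
    byLevel[record.level] = level;
    for (const tool of record.toolsUsed) {
      toolCounts[tool] = (toolCounts[tool] || 0) + 1;
    }
  }
  return { byLevel, toolCounts };
}

const summary = summarize(results);
console.log(summary.byLevel);
console.log(summary.toolCounts);
```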

&lt;p&gt;So you can analyze, for example:&lt;/p&gt;

&lt;p&gt;"Where does my agent waste time?"&lt;br&gt;
"Which tools are over/under-used?"&lt;br&gt;
"Why does it fail certain levels or task types?"&lt;/p&gt;


&lt;h3&gt;
  
  
  8. TypeScript-First, Tree-Shaking-Friendly
&lt;/h3&gt;

&lt;p&gt;The SDK is written in TypeScript, exports ESM modules, and is designed to be tree-shakable.&lt;/p&gt;

&lt;p&gt;This matters if you want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ship it into a Next.js / Remix / edge environment&lt;/li&gt;
&lt;li&gt;avoid bundling tools you don't use&lt;/li&gt;
&lt;li&gt;keep everything typed end-to-end&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Quick Start (From the README, With Commentary)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Installation (npm)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @gaia-agent/sdk ai @ai-sdk/openai zod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Basic usage
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createGaiaAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@gaia-agent/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createGaiaAgent&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// reads config from env&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Calculate 15 * 23 and search for the latest AI papers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Environment variables
&lt;/h3&gt;

&lt;p&gt;Environment variables wire up your providers, for example:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your-openai-api-key&amp;gt;
&lt;span class="nv"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your-tavily-api-key&amp;gt;      &lt;span class="c"&gt;# search&lt;/span&gt;
&lt;span class="nv"&gt;E2B_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your-e2b-api-key&amp;gt;            &lt;span class="c"&gt;# sandbox&lt;/span&gt;
&lt;span class="nv"&gt;STEEL_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your-steel-api-key&amp;gt;        &lt;span class="c"&gt;# browser&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;There's also a dedicated Environment Variables Guide linked from the README if you want more combinations (Mem0, Exa, Sandock, BrowserUse, AWS AgentCore, etc.).&lt;/p&gt;


&lt;h2&gt;
  
  
  Extending the Agent (Custom Tools &amp;amp; ToolSDK)
&lt;/h2&gt;

&lt;p&gt;The SDK doesn't lock you into its default toolset.&lt;/p&gt;
&lt;h3&gt;
  
  
  Custom tools
&lt;/h3&gt;

&lt;p&gt;You can grab the default tool set and add your own:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createGaiaAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;getDefaultTools&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@gaia-agent/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createGaiaAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nf"&gt;getDefaultTools&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;weatherTool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Get weather&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;city&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// your own logic here&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;temp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cloudy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ToolSDK.ai ecosystem
&lt;/h3&gt;

&lt;p&gt;The README also shows how to plug into ToolSDK.ai, so you can pull tools from its packages and expose them as AI SDK tools inside GAIA Agent.&lt;/p&gt;

&lt;p&gt;That essentially turns this SDK into a hub: GAIA agent loop on top, tools from official providers + ToolSDK ecosystem underneath.&lt;/p&gt;




&lt;h2&gt;
  
  
  Docs &amp;amp; Developer Experience
&lt;/h2&gt;

&lt;p&gt;The repo already ships with a pretty rich docs structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick Start Guide&lt;/li&gt;
&lt;li&gt;ReAct + Planning Guide&lt;/li&gt;
&lt;li&gt;Reflection Guide&lt;/li&gt;
&lt;li&gt;Environment Variables&lt;/li&gt;
&lt;li&gt;GAIA Benchmark setup &amp;amp; tips&lt;/li&gt;
&lt;li&gt;Improving GAIA scores&lt;/li&gt;
&lt;li&gt;Provider comparison&lt;/li&gt;
&lt;li&gt;Testing guide (Vitest)&lt;/li&gt;
&lt;li&gt;Advanced usage and API reference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's also an automated npm publish workflow: merge to main → tests → version bump → publish → changelog. So the package on npm should stay relatively aligned with main.&lt;/p&gt;

&lt;p&gt;License: Apache 2.0.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Should You Use GAIA Super Agent SDK?
&lt;/h2&gt;

&lt;p&gt;This project makes the most sense if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want a serious GAIA benchmark agent without building everything from scratch&lt;/li&gt;
&lt;li&gt;You want a production-grade multi-tool assistant with browser + search + sandbox already wired&lt;/li&gt;
&lt;li&gt;You like the AI SDK v6 ecosystem and want an opinionated "super agent" built on top of its ToolLoopAgent&lt;/li&gt;
&lt;li&gt;You want to iterate on prompts / tools / providers rather than infra&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're just experimenting with agents for the first time, this might actually simplify the path: you get a working system on day 1, then peel back the layers and customize from there.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;I really like that this repo is not "yet another framework," but a concrete, GAIA-oriented super agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batteries-included tools&lt;/li&gt;
&lt;li&gt;Strong defaults&lt;/li&gt;
&lt;li&gt;Real benchmark runner&lt;/li&gt;
&lt;li&gt;Wrong-answer analysis loop&lt;/li&gt;
&lt;li&gt;Provider swapping via one line or env vars&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're playing with the GAIA Benchmark or building your own "do-anything" assistant with serious tooling, it's absolutely worth a try.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/gaia-agent/gaia-agent" rel="noopener noreferrer"&gt;https://github.com/gaia-agent/gaia-agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you ship something cool with it (e.g. a leaderboard entry, a product demo, or internal tooling), definitely tag the project or share a write-up — the GAIA agent ecosystem is still very young and fast-moving.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>benchmarking</category>
    </item>
  </channel>
</rss>
