Gabriel Anhaia

Posted on Apr 26

Copilot Agent Mode Is Both a Threat and a Gift to AI Coding

#ai #agents #githubcopilot #programming

Book: AI Agents Pocket Guide
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

On April 1, 2026, GitHub flipped a switch most teams missed. The Copilot cloud agent (formerly the coding agent) stopped being a pull-request-only feature. It can now work on a branch without opening a PR, research before it edits, and run multi-step plans across files. Three weeks later, on April 24, inline agent mode shipped to JetBrains in public preview, putting the same loop directly inside the editor's inline-chat surface.

That second one is the move. Agent mode used to live behind a chat panel or a cloud runner. It now lives next to the cursor, in the file you have open, with permission to edit other files and run terminal commands. It is closer to where you work than Cursor's Composer, and much closer than Claude Code or any other CLI agent, which run outside the editor entirely.

This is good for everyone who writes code with AI. It is also a problem.

What agent mode actually does now

Stripped to mechanics, the loop is this. You ask. The agent plans. It picks files to read. It edits. It runs your test command. If the tests fail, it iterates. If they pass, it stops. If it gets stuck on a tool boundary it cannot cross (say, an external API it does not have credentials for), it asks.
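The loop is easier to reason about when written down. What follows is a minimal sketch, not Copilot's implementation: the callables (plan, apply_edit, run_tests, can_proceed) are hypothetical stand-ins for whatever the agent actually uses, and the max_iterations cap is my assumption, since a real agent needs some stopping rule for tests that never go green.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    status: str  # "passed", "needs_input", or "gave_up"
    iterations: int
    log: List[str] = field(default_factory=list)


def run_agent_loop(
    plan: Callable[[str], List[str]],
    apply_edit: Callable[[str], None],
    run_tests: Callable[[], bool],
    can_proceed: Callable[[str], bool],
    request: str,
    max_iterations: int = 5,  # assumption: cap retries so the loop always ends
) -> LoopResult:
    """Plan, edit, test, and iterate until tests pass or the cap is hit."""
    log: List[str] = []
    for iteration in range(1, max_iterations + 1):
        for step in plan(request):
            if not can_proceed(step):
                # Tool boundary the agent cannot cross: stop and ask the user.
                log.append(f"blocked: {step}")
                return LoopResult("needs_input", iteration, log)
            apply_edit(step)
            log.append(f"applied: {step}")
        if run_tests():
            return LoopResult("passed", iteration, log)
        log.append("tests failed, iterating")
    return LoopResult("gave_up", max_iterations, log)
```

The three exits map to the three outcomes in the paragraph above: tests pass and it stops, it hits a boundary and asks, or it runs out of attempts.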

GitHub now ships three flavors of this:

Inline agent mode — runs inside the inline-chat box in JetBrains and VS Code. Smallest scope. Sees the file you have open and the files it picks. Closest to the cursor.
Agent mode in chat panel — the older one. Multi-file, multi-step, but you switch context to a side panel.
Cloud agent — runs on GitHub's infrastructure, on a branch, no IDE attached. Can be kicked off from an issue, a PR comment, or the dashboard. This is the one that competes most directly with autonomous CLI agents.

The cloud agent now has a model picker, self-review, built-in security scanning, and CLI handoff. That last one matters. CLI handoff means the cloud agent can finish a task and then drop you back to your terminal with the work-in-progress branch checked out — the boundary between "Copilot did this" and "I am doing this" is now a single command.

What it competes with

Comparisons published in early 2026 put the three roughly here:

Cursor — strongest at multi-file editing inside an IDE. Per the comparison above, Composer 2 is reported to handle cross-file refactors better than Copilot's agent mode in head-to-heads. Per-task model swaps. Standalone IDE means you give up VS Code extensions.
Claude Code — strongest at autonomous, multi-step work that crosses tool boundaries. Lives in your terminal, runs whatever your shell runs. Highest capability ceiling for "go fix the build, then update the docs, then open a PR."
GitHub Copilot — strongest at being everywhere. According to tech-insider.org's 2026 comparison (SWE-bench Verified, early 2026), Copilot reports a 56% solve rate vs Cursor's 52%. Native in VS Code, JetBrains, Neovim, Xcode. The only one with first-party access to your repo's full GitHub context: issues, PRs, Actions, Code Search.

The last point in that third bullet, first-party access to your GitHub context, is the part the others cannot copy.

The IDE agent / CLI agent split

The reason agent mode in an IDE feels different from agent mode in a terminal is not UI. It is context.

An IDE-attached agent sees, for free:

The file you have open and the cursor position
Your current selection
The language server's symbols, definitions, references, diagnostics
Your unsaved buffer state
Which tab you switched from
Recent edits in the workspace
The lint warnings the language server is showing right now

A CLI agent sees, for free:

Whatever you cat or grep
The output of commands it runs
Your environment variables
The exit codes from your shell

Both lists are useful. They are not the same list.

A repo task that is natural for an in-IDE agent looks like this:

Refactor the function under my cursor into a hook,
move the side-effect to useEffect, update the three
call sites, and fix the types so the build is green.

The agent reads your selection, resolves symbols and references through the language server, finds the call sites without grepping, and validates the fix against the type-checker output the IDE is already running. A CLI agent can do this too, but only by re-deriving information the IDE already has, which is slower and more fragile.

A repo task that is natural for a terminal agent looks like this:

The Postgres migration in main is failing on staging.
Tail the deploy logs, find the offending migration,
roll it back via the CLI, regenerate the schema diff,
and open a PR with a corrected migration plus a test
that would have caught this.

The agent runs kubectl logs, parses the output, runs flyway, runs pg_dump, opens a shell, runs the test command, opens a PR. The IDE has nothing to contribute here. The work happens in tools the IDE does not own.

The honest model for 2026 is two agents in your stack: one near the cursor, one near the shell. Keep the overlap small.
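One way to keep the overlap small is to decide by context: which of the two "for free" lists does the task lean on? Here is a rough sketch of that rule. The context labels and the pick_agent function are hypothetical names of mine, loosely transcribed from the two lists above, and the tie-breaking is an arbitrary choice.

```python
from typing import FrozenSet, Set

# Hypothetical labels for what each kind of agent sees without extra work.
IDE_CONTEXT: FrozenSet[str] = frozenset({
    "cursor_position",
    "selection",
    "language_server_symbols",
    "unsaved_buffers",
    "recent_edits",
    "diagnostics",
})
CLI_CONTEXT: FrozenSet[str] = frozenset({
    "file_contents",
    "command_output",
    "environment_variables",
    "exit_codes",
})


def pick_agent(needed: Set[str]) -> str:
    """Return "ide", "cli", or "either" based on which context the task needs more of."""
    ide_hits = len(needed & IDE_CONTEXT)
    cli_hits = len(needed & CLI_CONTEXT)
    if ide_hits > cli_hits:
        return "ide"
    if cli_hits > ide_hits:
        return "cli"
    return "either"


# The refactor-under-cursor task leans on editor state.
print(pick_agent({"selection", "language_server_symbols", "diagnostics"}))
# The failing-migration task leans on shell state.
print(pick_agent({"command_output", "exit_codes", "environment_variables"}))
```

Nobody needs to run this in practice. The point is that the question "which agent?" reduces to "which context?", and that question usually has a quick answer.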

The gift half

Copilot's push is a gift to the rest of the ecosystem for one reason. It applies pressure.

Before agent mode shipped broadly, every other AI coding tool could position itself as "the smart one." Cursor's pitch was "real agents, unlike Copilot." Claude Code's pitch was "real autonomy, unlike Copilot." The pitch worked because Copilot was a fancy autocomplete with a chat sidebar.

Copilot agent mode is now a real agent: multi-step, tool-using, self-correcting, sitting inside the editor most developers already pay for. The others have to ship something Copilot cannot. Cursor doubled down on Composer 2 and per-task model swaps. Anthropic shipped Claude Code's CLI-handoff and tightened the terminal-agent loop. It is hard to read those Q1 2026 releases (Composer 2, Claude Code CLI-handoff, JetBrains Junie's autonomous-mode push) as independent of Copilot's pressure.

For a developer choosing a stack, this is straightforwardly good. You get faster iteration, more model choice, lower prices on the entry tier, and three serious tools that have to keep being serious.

The threat half

The threat is that one of those tools sits inside the platform that owns your repo, your CI, your issues, and your pull requests.

GitHub Copilot's agent has access to context the others can match only with permission grants the user has to remember to give. Your private repo. Your private issues. Your security advisories. Your Actions logs. The history of who reviewed what. Your Code Search index across hundreds of repos. None of that is automatically available to Cursor, Claude Code, or anything else, because none of those are GitHub.

The agent that is wired into the platform will always have the cheapest, deepest read-path on your work. The Copilot for Jira rollout suggests that integration is maturing fast. In my read, the gap between "the platform's agent" and "an agent the platform tolerates" is going to widen.

This is not new. Slack's bots have always had context Telegram's bots cannot reach. Google's calendar agents see things Apple's cannot. The novel part is how much of a developer's day-to-day work now flows through one platform's agent surface, and how much that platform stands to gain by privileging its own tooling on top of that surface.

A reasonable defense is mundane. Build your workflow so the agent that owns the IDE is replaceable. Keep your prompts and context-loading rules version-controlled, not buried in a vendor's setting screen. Pick tools that can run from a terminal as well as from an IDE button. Treat agent permissions the way you treat OAuth scopes: minimum-necessary, audited, revocable.
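To make "minimum-necessary, audited, revocable" concrete, here is a small sketch of a grant object that checks scopes the way an OAuth-style gate would. Everything in it is illustrative: AgentGrant and the scope strings ("repo:read", "tests:run", "deploy:write") are names I made up, not any vendor's permission model.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass
class AgentGrant:
    agent: str
    scopes: FrozenSet[str]  # minimum-necessary: list only what the agent needs
    revoked: bool = False
    audit_log: List[Tuple[str, bool]] = field(default_factory=list)

    def allows(self, scope: str) -> bool:
        """Check a scope and record the decision, allowed or not."""
        allowed = (not self.revoked) and scope in self.scopes
        self.audit_log.append((scope, allowed))  # audited
        return allowed

    def revoke(self) -> None:
        """Turn every later check into a denial."""
        self.revoked = True  # revocable


grant = AgentGrant("ide-agent", frozenset({"repo:read", "tests:run"}))
print(grant.allows("repo:read"))     # True
print(grant.allows("deploy:write"))  # False
grant.revoke()
print(grant.allows("repo:read"))     # False
```

If a definition like this lives in your repo rather than in a vendor's settings screen, swapping the agent behind it is a config change instead of a migration.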

A small concrete example, both directions

Same task, twice. "Add structured logging to every public function in payments/ and update the tests."

In-IDE agent path:

Open payments/__init__.py. Run "Find all references"
to discover public exports. For each function, insert
a structured-log call at entry and exit. Run pytest -k
payments. Iterate on failing tests. Surface diff in
review panel.

The IDE-native moves: "find all references" through the language server, the diff review panel, the integrated test runner. The agent does not need to invent any of these — it calls them.
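Whichever agent does the work, the edit itself is small. The task says "a structured-log call at entry and exit"; one way to get that without touching function bodies is a decorator, which is my substitution here, not something the task prescribes. The log_calls name, the JSON field names, and the charge function are all hypothetical.

```python
import functools
import json
import logging

logger = logging.getLogger("payments")


def log_calls(func):
    """Emit one JSON log line on entry and one on exit (or on error)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info(json.dumps({"event": "enter", "function": func.__name__}))
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # Log the failure, then re-raise so callers see the same behavior.
            logger.error(json.dumps({
                "event": "error",
                "function": func.__name__,
                "error": type(exc).__name__,
            }))
            raise
        logger.info(json.dumps({"event": "exit", "function": func.__name__}))
        return result
    return wrapper


@log_calls
def charge(amount_cents: int) -> int:
    # Hypothetical payments function, here only to show the decorator applied.
    return amount_cents
```

An agent applying this adds one line above each public function, which keeps the diff easy to review in either tool.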

Terminal agent path:

git grep -l 'def ' payments/. Parse exports with ast.
For each, edit file via patch tool. Run pytest -k
payments in the workspace. Iterate. Open PR via the gh CLI
with a description summarizing the diff.

Same outcome, different toolset. The terminal agent does not need the language server because it has grep, ast, and a willingness to be slower. It also does not need an IDE to be running, which is what makes it the better fit for cron-driven, CI-driven, or remote work where no editor is open.
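The "parse exports with ast" step is the part worth seeing, since it is how the terminal agent replaces "find all references". A minimal sketch using Python's standard ast module follows. It assumes "public" means a top-level function whose name does not start with an underscore; it ignores __all__, methods, and re-exports, which a real pass would need to handle.

```python
import ast
from typing import List


def public_functions(source: str) -> List[str]:
    """Return names of top-level functions that do not start with an underscore."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body  # top level only: methods inside classes are skipped
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    ]


# Hypothetical module contents, standing in for a file under payments/.
example = '''
def charge(amount): ...
async def refund(charge_id): ...
def _sign(payload): ...
class Ledger:
    def post(self): ...
'''
print(public_functions(example))  # ['charge', 'refund']
```

No editor or language server is involved, which is why the same pass can run in CI or from a cron job.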

The mistake is picking one and forcing it into the other's job. Refactor-under-cursor is an IDE job. Production-incident triage is a terminal job. Treating Copilot agent mode as a Claude Code replacement, or vice versa, leaves both halves of your work worse than they need to be.

Where this leaves you in 2026

Pick the IDE agent that fits your editor. For most teams that is now Copilot: it is native in VS Code, JetBrains, Neovim, and Xcode, the entry-tier price is low, and agent mode is no longer behind. If your team needs aggressive multi-file refactors or maximum per-task model flexibility, Cursor still pulls ahead, at the cost of a different IDE.

Pick the CLI agent that fits your shell. For most teams in 2026 that remains Claude Code or one of its peers. The terminal agent is the one that handles the work the IDE cannot see — deploys, migrations, infra changes, multi-repo refactors. Do not expect the IDE agent to grow into that role; it is being optimized in a different direction.

The platform's agent is the platform's agent — use it for what it is good at, but keep enough of your workflow tool-agnostic that you can move when the integration tax goes up.

If this was useful

The patterns in the AI Agents Pocket Guide (tool boundaries, context scoping, hand-off between agents, when to give an agent autonomy and when to hold the leash) are what keep a multi-agent setup like this from collapsing into "whichever one I clicked first." Worth a read if your team is running more than one agent and the seams are starting to show.