
Damien Gallagher

Posted on • Originally published at buildrlab.com

How VS Code Copilot Chat Premium Features Leak into Subagents (and Why It Matters)


A surprising Copilot Chat vulnerability (or, depending on your perspective, a billing logic bug) popped up this week: you can route “premium” model usage through subagents while being billed as if you were using a cheaper/free model.

It’s documented publicly in a VS Code issue titled “Billing can be bypassed using a combination of subagents with an agent definition, resulting in unlimited free premium requests” (microsoft/vscode#292452) and discussed on Hacker News (HN item 46936105).

This post breaks down what’s going on, why it matters beyond a single product, and what we (and anyone building agent systems) should do about it.


The core bug: metering is attached to the wrong boundary

Copilot’s “premium requests” model is supposed to make cost predictable: you pick a model tier, and requests to more expensive models consume more premium request budget.

The issue report shows a path where:

  1. A user starts a chat on a “free” model (e.g. a cheaper model included in the plan).
  2. The user (or an agent prompt) invokes a subagent via tooling.
  3. The subagent runs with an agent definition that specifies a premium model.
  4. The system executes the premium model work, but meters the interaction as if it stayed on the initial (free) model.

In other words: model execution and model billing are decoupled, and the billing logic is derived from the parent session/model selection, not from the actual model used by downstream subagents/tool paths.
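To make that decoupling concrete, here is a minimal, hypothetical simulation. None of these names reflect Copilot's actual internals; the point is only the shape of the bug: the meter is keyed to the parent session's model, while execution honors the model requested by the subagent's agent definition.

```python
# Hypothetical simulation of the decoupling: billing is keyed to the
# parent session's model, while execution honors the subagent's model.

def run_subagent(parent_model: str, agent_definition: dict, meter: dict) -> str:
    """Run a subagent and record usage in `meter` (a model -> count map)."""
    # The agent definition may override the model for this run.
    executed_model = agent_definition.get("model", parent_model)

    # BUG: usage is attributed to the parent session's model...
    meter[parent_model] = meter.get(parent_model, 0) + 1

    # ...even though another model is what actually executes.
    return executed_model

meter: dict = {}
executed = run_subagent("free-model", {"model": "premium-model"}, meter)

assert executed == "premium-model"   # premium work happened
assert meter == {"free-model": 1}    # but only the free tier was debited
```

The two asserts are the whole bug in miniature: execution and attribution disagree about which model ran.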

The report also suggests a second class of problems: “tool calls” / agent loops where extensive work happens “inside” tool invocations and doesn’t increment request counters in the expected way.

Why subagents are the perfect hiding place

Agent systems often have three layers:

  • Orchestrator / planner: decides what to do next
  • Tools: e.g. file operations, web fetch, code execution
  • Subagents / specialists: spawn a new model instance with a different system prompt, different model, different context window, etc.

Subagents are incredibly useful. They’re also a natural boundary where product teams sometimes treat execution as “internal plumbing,” and — crucially — where cost attribution gets tricky.

If you charge for “user messages,” but subagents can do meaningful premium work without generating additional “user messages,” you’ve created a gap.


How the bypass works (conceptually, not as a step-by-step exploit)

The public issue includes reproduction instructions and sample agent files. I won’t repeat the full procedure here; instead, here’s the conceptual model you can apply to any multi-agent system:

The ingredients

  • A parent interaction that’s billed at Tier A (free/cheap)
  • A spawn mechanism (subagent / tool / function) that can run Tier B (premium)
  • A server-side enforcement gap where:
    • the execution honors Tier B
    • the metering still attributes everything to Tier A

The critical design mistake

Billing is derived from the initiating model (or the UI-selected model), not from the executed model(s).

Once you allow “agent-defined paths” (agent files / prompt frontmatter / tool schemas) to select or override models, you have to treat those paths as first-class billable actions.

If you don’t, users can deliberately shape a conversation so that the “billable” surface looks cheap while the “execution” surface does expensive work.

The subtle point: this isn’t only about money

It’s tempting to call this “just a billing bypass.” But it’s also an example of a broader class of failures:

  • Policy bypass: safety controls applied at one layer don’t hold at another
  • Quota bypass: rate limits apply to the wrong primitive
  • Audit gaps: the system log shows Tier A, but Tier B was actually used

Whenever control and enforcement happen on different sides of a boundary (client vs server, orchestrator vs tool runner, parent vs child), you get these kinds of problems.


Why this matters for LLM-based tooling: metering is a security boundary now

In 2024, metering was mostly finance.

In 2026, metering has become part of the security model.

Why? Because agentic systems convert “usage” into real-world side effects:

  • expensive inference cost
  • code changes
  • API calls to third-party services
  • data access
  • background tasks that keep running

If a system can be tricked into doing premium work “for free,” it can also be tricked into doing more work than intended, which becomes a reliability and abuse problem.

1) Predictable cost is necessary for trust

Teams adopt Copilot-style products because they’re predictable compared to raw APIs. The “premium request” idea is basically a fixed-price abstraction.

If that abstraction breaks, the product becomes:

  • harder for providers to budget
  • easier for users to abuse
  • harder to reason about operationally

2) Model tier selection is often tied to safety posture

Different models and modes often have different:

  • tool access
  • context limits
  • content policy tuning
  • “refusal” behavior

If a cheaper model can spawn a premium model without being billed or properly attributed, the same path can be used to bypass other policies that were assumed to be attached to the top-level choice.

3) “Tool calls are free” is a footgun

The issue report highlights a pattern many platforms share: “tool calls” are treated as internal actions rather than billable compute.

But tool calls are often the mechanism by which the model:

  • does more thinking (via nested calls)
  • does more reading/writing (files, network)
  • does more execution (code runners)

If you do not meter tool usage in some way, or at least cap it by policy, you've given users a built-in incentive to pack as much work as possible behind a single billable interaction.
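One mitigation is a per-interaction work budget for tools. Here is a minimal sketch (class and limit names are illustrative, not any product's API) that caps both the total number of tool calls and their nesting depth, and fails closed when either is exceeded:

```python
# Sketch of a per-interaction tool budget: cap total tool calls and
# nesting depth so one billable turn cannot fan out into unbounded work.

class ToolBudgetExceeded(Exception):
    pass

class ToolBudget:
    def __init__(self, max_calls: int, max_depth: int):
        self.max_calls = max_calls
        self.max_depth = max_depth
        self.calls = 0

    def charge(self, depth: int) -> None:
        """Debit one tool call at the given nesting depth, or fail closed."""
        self.calls += 1
        if self.calls > self.max_calls:
            raise ToolBudgetExceeded(f"exceeded {self.max_calls} tool calls")
        if depth > self.max_depth:
            raise ToolBudgetExceeded(f"exceeded depth {self.max_depth}")

budget = ToolBudget(max_calls=3, max_depth=2)
budget.charge(depth=1)
budget.charge(depth=1)
budget.charge(depth=2)
try:
    budget.charge(depth=1)   # fourth call: over budget, refused
    blocked = False
except ToolBudgetExceeded:
    blocked = True
assert blocked
```

Failing closed matters here: an over-budget tool call should be refused before it runs, not logged after the fact.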


What this means for BuildrLab’s multi-agent architecture

We build multi-agent workflows all the time: planners, code-gen agents, QA agents, doc agents, research agents.

So the first question we should ask is blunt:

Are we vulnerable to the same class of bug?

The risk pattern (applies to us)

We’re vulnerable if all of the following are true:

  1. The user (or upstream agent) initiates a request in a “cheap” mode.
  2. That request can spawn subagents with a different model tier.
  3. Our billing / metering / quota enforcement is anchored to the parent request.
  4. The server does not validate “who is allowed to run which model” at execution time.

Even if we don’t sell “premium requests,” we still have analogous constraints:

  • internal cost budgets per workspace / org / user
  • rate limits
  • safety policies (what tools can do, what data can be accessed)

How to prevent this class of failure

The fix is less about patching one path and more about adopting a principle:

Enforcement must be attached to the execution primitive — not the UI primitive.

Concretely, for a multi-agent backend:

  1. Server-side model authorization

    • The server should decide which models are allowed for a given principal (user/org/project).
    • Agent files can request a model, but the server must gate it.
  2. Child calls inherit billing context

    • Every subagent run must carry a cryptographically verifiable “billing context” (tenant, plan, budget, limits).
    • If a child uses a premium model, it must debit premium budget — regardless of how it was invoked.
  3. Meter on actual model execution

    • Charge based on the model actually used (and ideally on tokens/compute), not on the parent’s label.
  4. Tool and subagent caps as first-class policy

    • Limit number of subagent spawns per parent.
    • Limit tool-call chains (depth) and total tool-call count.
    • Apply timeouts and “work budgets” (max wall time, max tokens, max actions) to long-running agents.
  5. Audit logs that match reality

    • Log every model execution with: model name, tokens, tool usage, parent chain.
    • Make it queryable so you can detect anomalies (e.g. “free-tier sessions spawning premium-tier work”).
  6. Treat agent definitions as untrusted input

    • Agent frontmatter, prompt files, and tool schemas are effectively a programming surface.
    • If they’re user-controlled, they must be sandboxed and validated.
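Points 1 and 3 above can be compressed into one sketch. Plan and model names here are invented: the server gates the model an agent definition requests against what the caller's plan allows, and the meter records the model that actually executed rather than any parent label.

```python
# Sketch: authorize the *requested* model against the caller's plan,
# then meter the model that actually executed. Names are invented.

ALLOWED_MODELS = {
    "free-plan": {"free-model"},
    "pro-plan":  {"free-model", "premium-model"},
}

def execute(plan: str, requested_model: str, usage_log: list) -> str:
    if requested_model not in ALLOWED_MODELS.get(plan, set()):
        raise PermissionError(f"{plan} may not run {requested_model}")
    usage_log.append(requested_model)  # debit the executed model, not a parent label
    return f"ran {requested_model}"

usage: list = []
assert execute("pro-plan", "premium-model", usage) == "ran premium-model"
try:
    execute("free-plan", "premium-model", usage)  # agent file asks for premium
    denied = False
except PermissionError:
    denied = True
assert denied
assert usage == ["premium-model"]  # only the authorized run was metered
```

Note the ordering: authorization happens before execution, and metering is driven by what ran, so an agent definition cannot escalate tier without both a permission check and a debit.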

A simple litmus test

If you can answer “yes” to this question, you likely have the same bug class:

Can a request that is accounted as Tier A cause the backend to execute Tier B compute without debiting Tier B?


Practical takeaways for anyone building agent systems

Whether you’re building an IDE agent, a “chat with your repo” assistant, or an internal automation bot, here’s a checklist you can apply immediately.

1) Define your billable primitives

Don’t bill “messages.” Bill the things that cost money:

  • model inference calls
  • tool executions that invoke third-party services
  • long-running background jobs

Then connect your UI abstraction (“turns”, “premium requests”) to those primitives.
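One way to implement that connection: record a metering event per cost-bearing primitive, then derive the UI-level count from those events. In this hypothetical sketch, a "premium request" is defined as any premium-tier inference event, wherever in the agent graph it occurred:

```python
from dataclasses import dataclass

# Sketch: meter the primitives that cost money, then derive UI-level
# billing ("premium requests") from them. Field names are illustrative.

@dataclass(frozen=True)
class MeterEvent:
    kind: str    # "inference", "tool", or "job"
    tier: str    # "free" or "premium"
    tokens: int

def premium_requests(events: list) -> int:
    """Derive the billable count from raw events, not from chat turns."""
    return sum(1 for e in events if e.kind == "inference" and e.tier == "premium")

events = [
    MeterEvent("inference", "free", 500),      # parent turn on the cheap model
    MeterEvent("tool", "free", 0),             # tool call spawns a subagent...
    MeterEvent("inference", "premium", 2000),  # ...which runs premium inference
]
assert premium_requests(events) == 1  # the subagent's work is still billed
```

Because billing is a query over execution events, there is no separate counter for subagents to quietly miss.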

2) Make the server the source of truth

Anything enforced only on the client will eventually be bypassed.

  • model selection
  • quota checks
  • tool permissions
  • max request depth

All of it must be validated on the server.

3) Treat subagents as first-class actions

Subagents shouldn’t be a loophole. They should be a clearly defined part of your execution graph.

  • subagent calls must be authenticated
  • they must be authorized
  • they must be metered
  • they must be traceable
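A minimal sketch of the metering and traceability half of that list (authentication and authorization are elided; all names are invented): every spawn carries the parent's billing context and appends itself to a parent chain, so child work stays attributable.

```python
from dataclasses import dataclass

# Sketch: every subagent spawn carries its parent's billing context and
# appends itself to a trace, so child work is attributable. Names invented.

@dataclass(frozen=True)
class BillingContext:
    tenant: str
    plan: str

def spawn_subagent(ctx: BillingContext, model: str,
                   parent_chain: tuple, trace: list) -> tuple:
    """Record a metered, traceable subagent run; return its parent chain."""
    chain = parent_chain + (model,)
    trace.append({"tenant": ctx.tenant, "plan": ctx.plan, "chain": chain})
    return chain

trace: list = []
ctx = BillingContext(tenant="acme", plan="pro-plan")
root = spawn_subagent(ctx, "free-model", (), trace)
child = spawn_subagent(ctx, "premium-model", root, trace)

assert child == ("free-model", "premium-model")
assert trace[1]["tenant"] == "acme"  # child work is attributed to the tenant
```

In a real system the context would be signed or otherwise tamper-evident, as point 2 in the earlier list suggests; here a frozen dataclass only stands in for that idea.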

4) Build “budget-aware” orchestration

Instead of “keep going until done,” give agents explicit budgets:

  • max tokens
  • max tool calls
  • max wall-clock time
  • max subagent spawns

and make the orchestrator plan within those constraints.
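A budget-aware loop can be sketched in a few lines. This is illustrative only: `steps` stands in for whatever the planner produces, each step declares its costs, and the loop halts as soon as any budget is exhausted.

```python
import time

# Sketch of a budget-aware loop: the orchestrator stops when any explicit
# budget (tokens, tool calls, wall time, spawns) is exhausted. Illustrative.

def orchestrate(steps, max_tokens=10_000, max_tool_calls=20,
                max_seconds=30.0, max_spawns=5):
    """Run `steps` (dicts of per-step costs) until done or out of budget."""
    spent = {"tokens": 0, "tool_calls": 0, "spawns": 0}
    deadline = time.monotonic() + max_seconds
    done = []
    for step in steps:
        spent["tokens"] += step.get("tokens", 0)
        spent["tool_calls"] += step.get("tool_calls", 0)
        spent["spawns"] += step.get("spawns", 0)
        if (spent["tokens"] > max_tokens
                or spent["tool_calls"] > max_tool_calls
                or spent["spawns"] > max_spawns
                or time.monotonic() > deadline):
            break  # budget exhausted: stop instead of "keep going until done"
        done.append(step["name"])
    return done, spent

done, spent = orchestrate([
    {"name": "plan",   "tokens": 2_000},
    {"name": "code",   "tokens": 5_000, "tool_calls": 3},
    {"name": "review", "tokens": 6_000},  # would exceed the token budget
])
assert done == ["plan", "code"]
```

The interesting design choice is that the planner is told its budgets up front, so it can choose cheaper steps rather than being cut off mid-task.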

5) Assume prompt files will be weaponized

Agent definitions and prompt files are not documentation — they’re executable configuration.

If users can edit them, they can:

  • coerce different model selection
  • force loops
  • attempt to disable safety controls

Treat them like code: lint, validate, sandbox, and review.
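The "validate" step might look like this hypothetical sketch: agent frontmatter is checked against an allowlist of keys, and any model it requests is checked against what the caller's plan permits, before the definition can influence execution.

```python
# Sketch: treat agent frontmatter as untrusted input. Validate it against
# an allowlist of keys and models before it can influence execution.

ALLOWED_KEYS = {"name", "model", "tools"}
ALLOWED_MODELS = {"free-model", "premium-model"}

def validate_agent_definition(frontmatter: dict, plan_models: set) -> list:
    """Return a list of policy violations; empty means the definition passed."""
    errors = []
    for key in frontmatter:
        if key not in ALLOWED_KEYS:
            errors.append(f"unknown key: {key}")
    model = frontmatter.get("model")
    if model is not None and model not in (ALLOWED_MODELS & plan_models):
        errors.append(f"model not permitted for this plan: {model}")
    return errors

# A free-plan user whose agent file tries to escalate to a premium model:
errors = validate_agent_definition(
    {"name": "helper", "model": "premium-model", "max_loops": 999},
    plan_models={"free-model"},
)
assert "unknown key: max_loops" in errors
assert "model not permitted for this plan: premium-model" in errors
```

Rejecting unknown keys is deliberate: a field the validator doesn't understand is a field some future executor might.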


Where Copilot goes from here

I don’t have inside information on how Microsoft/GitHub will resolve this, but the fix likely needs to land in two places:

  • Accounting/metering: tie billing to actual execution (including subagents)
  • Policy enforcement: ensure agent definitions can’t silently escalate compute tier without debit and authorization

The bigger lesson is that agentic UX is racing ahead of agentic governance. The more “magical” these systems become, the more important it is that their internal graphs — models, tools, subagents — are treated as real execution surfaces.

If you’re building agent products, this is your warning shot.


BuildrLab builds agent-native developer tooling and multi-agent workflows. If you’re building an agent system and want a second set of eyes on metering, policy boundaries, or architecture, get in touch.
