AdmilsonCossa
Your AI Is Still Billing After the User Closed the Tab

That’s not a bug. It’s a missing owner.

The user closed the browser 30 seconds ago.

Your logs show the response was never delivered. But the LLM stream is still running. The vector search is still scanning. The rerankers are still scoring. The tool calls are still executing.

The invoice will arrive tomorrow.

This is not hypothetical. It is the default behavior of many AI backends today.

import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());
const openai = new OpenAI();

app.post("/chat", async (req, res) => {
  const stream = await openai.chat.completions.create({
    model: "gpt-4.1",
    stream: true,
    messages: req.body.messages,
  });

  // Nothing here ties the stream's lifetime to the client connection
  for await (const chunk of stream) {
    res.write(chunk.choices[0]?.delta?.content ?? "");
  }

  res.end();
});

This code looks completely reasonable.

It even works — until the user refreshes the page, closes the tab, loses signal, or navigates away.

At that moment:

  • the HTTP response is dead
  • the client no longer exists
  • the user no longer cares

But the async work often continues anyway.

The LLM may keep generating tokens. The vector search may keep scanning. Background tasks may continue running with no remaining consumer.

The work outlived the reason it existed.

That sentence is the real problem.


The real root cause: ownership is missing

Most async systems treat cancellation as an optional convention rather than a runtime guarantee.

You can pass an AbortController if you remember.

You can manually wire cleanup if every developer remembers.

You can hope every dependency correctly propagates cancellation.
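When everyone does remember, the manual version looks roughly like this. A self-contained sketch using only Node's built-in AbortController; streamTokens is an invented stand-in for any cooperatively cancellable producer:

```typescript
// Manual cancellation: one AbortController, passed by hand to every
// operation that should stop. It works, but only if every call site
// remembers to create it, pass it, and check it.
async function* streamTokens(signal: AbortSignal): AsyncGenerator<string> {
  for (const token of ["the", "work", "must", "stop"]) {
    if (signal.aborted) return; // cooperative cancellation check
    await new Promise((r) => setTimeout(r, 1));
    yield token;
  }
}

async function handle(): Promise<string[]> {
  const controller = new AbortController();
  const received: string[] = [];

  for await (const token of streamTokens(controller.signal)) {
    received.push(token);
    // Simulate the client disconnecting after two tokens
    if (received.length === 2) controller.abort("client_disconnected");
  }

  return received; // the producer stopped early: ["the", "work"]
}
```

This works, but notice how much of it is convention: nothing forces streamTokens to check the signal, and nothing forces the caller to wire the controller at all.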

But structurally, there is usually no single owner for the tree of async work created by a request.

A single AI request may spawn:

  • LLM streaming
  • vector search
  • rerankers
  • tool execution
  • background audit writes
  • metrics
  • cleanup handlers
  • retries
  • observability traces

Every one of these should stop when the user disconnects.

But native Promises do not provide ownership semantics.

So the work survives.
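The missing semantics are easy to demonstrate with nothing but the standard library: abandoning a promise does not stop the work behind it. In this sketch, the counter stands in for token generation:

```typescript
// Native promises have no owner: once started, the work runs to
// completion even when nobody will ever consume the result.
let tokensGenerated = 0;

async function generate(): Promise<void> {
  for (let i = 0; i < 5; i++) {
    await new Promise((r) => setTimeout(r, 2));
    tokensGenerated++; // side effect continues after abandonment
  }
}

function abandonedRequest(): void {
  // Start the work, then "disconnect": we simply stop referencing it.
  void generate();
  // There is no language-level way to reach in and stop that loop.
}

abandonedRequest();
// A moment later, tokensGenerated === 5 — all five "tokens" were
// produced for a consumer that no longer exists.
```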


The hidden cost of abandoned AI work

At small scale this is invisible.

At production scale this becomes expensive.

Imagine:

100,000 abandoned requests/day
× 3–5 seconds of unnecessary downstream execution
= millions of wasted tokens
= unnecessary GPU time
= avoidable API spend
= infrastructure pressure
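The "millions of wasted tokens" line follows from simple arithmetic. The generation rate below is an illustrative assumption (roughly 40 tokens/s), not a measurement:

```typescript
// Back-of-envelope waste estimate. All three inputs are assumptions —
// substitute your own traffic numbers and model throughput.
const abandonedPerDay = 100_000; // abandoned requests/day
const wastedSeconds = 4;         // midpoint of the 3–5 s range
const tokensPerSecond = 40;      // assumed generation rate

const wastedTokensPerDay = abandonedPerDay * wastedSeconds * tokensPerSecond;

console.log(wastedTokensPerDay); // 16000000 — sixteen million tokens/day
```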

This is not just a code-quality issue.

This is infrastructure waste.


The fix: one scope owns the work

Instead of manually wiring cancellation across unrelated async operations, WorkIt introduces one ownership boundary: the scope.

import { WorkIt } from "@workit/core";

// Inside a request handler: req, messages, input, and query come from context
await WorkIt.run(async (scope) => {
  // User disconnects → cancel everything
  req.on("close", () => {
    scope.cancel("client_disconnected");
  });

  // Each operation belongs to the scope
  const llm = scope.spawn("llm-stream", async (signal) =>
    streamLLM({
      messages,
      signal,
    })
  );

  const tools = scope.spawn("tools", async (signal) =>
    runTools({
      input,
      signal,
    })
  );

  const vector = scope.spawn("vector-search", async (signal) =>
    searchVectorDB({
      query,
      signal,
    })
  );

  // Wait for all child work
  return await scope.all([llm, tools, vector]);
});

Now the request owns the work.

When the client disconnects:

  1. scope.cancel("client_disconnected") fires
  2. every child operation receives cancellation
  3. streaming stops
  4. tool execution stops
  5. vector search stops
  6. cleanup handlers run
  7. the scope settles deterministically

No orphaned work.

No zombie tasks.

No invisible token burn.
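The steps above do not require magic. The core of the pattern can be sketched with nothing but AbortController and Promise.allSettled. This is an illustration of the ownership idea, not WorkIt's actual implementation; MiniScope is an invented name:

```typescript
// A minimal ownership scope: one controller owns the signal that every
// child receives, so a single cancel() reaches the whole tree of work.
class MiniScope {
  private controller = new AbortController();
  private children: Promise<unknown>[] = [];

  spawn<T>(name: string, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
    const child = fn(this.controller.signal);
    this.children.push(child);
    return child;
  }

  cancel(reason: string): void {
    this.controller.abort(reason); // every child's signal fires at once
  }

  async settled(): Promise<void> {
    // Deterministic settling: resolve only after all children finish
    await Promise.allSettled(this.children);
  }
}
```

Cooperative children that check their signal stop promptly after cancel(), and settled() gives the deterministic "scope settles" step. A real runtime layers cleanup handlers, structured errors, and observability on top of this core.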


This is not theory

We validated this behavior in the WorkIt test suite.

Scenario:

{
  "case": "AI streaming disconnect",
  "result": "LLM/tool/vector work stopped after cancellation",
  "post_cancel_work_ms": 0,
  "late_events": 0
}

The important part:

"late_events": 0

After cancellation, no additional downstream work completed.

The evidence is what did not continue running.


Why this matters

The AI ecosystem is rapidly moving toward:

  • streaming
  • agents
  • tool execution
  • multi-provider inference
  • background workflows
  • long-lived realtime sessions

All of these create trees of async work.

And most of them still lack clear ownership semantics.

The result is:

  • abandoned compute
  • leaking streams
  • runaway retries
  • zombie tool execution
  • incomplete shutdowns
  • hidden infrastructure cost

WorkIt treats async work as something that must have an owner.

When the owner disappears, the work disappears with it.


The deeper idea

This is not really about cancellation.

It is about lifecycle ownership.

The request should own the work it creates.

The WebSocket should own its subscriptions.

The agent should own its tools.

The stream should own its producers.

Without ownership, async systems slowly leak compute and complexity.


Try it

npm install @workit/core

import { WorkIt } from "@workit/core";

await WorkIt.run(async (scope) => {
  req.on("close", () => {
    scope.cancel("user_gone");
  });

  const result = await scope.spawn(
    "my-ai-call",
    async (signal) =>
      openai.chat.completions.create(
        {
          model: "gpt-4.1",
          stream: true,
          messages,
        },
        // The OpenAI Node SDK takes the abort signal as a request
        // option (second argument), not as a body field
        { signal }
      )
  );
});

The larger problem

Every senior engineer has seen some version of this question in production:

“Why is this still running?”

That question appears in:

  • AI streaming
  • WebSockets
  • Kafka consumers
  • background workers
  • multiplayer game loops
  • Discord bots
  • server-side rendering
  • agent runtimes

The problem is the same every time:

The work outlived the reason it existed.

WorkIt is an attempt to fix that at the runtime level.


GitHub:
https://github.com/WorkRuntime/workit

Article series:
https://dev.to/admilsoncossa/owned-async-work-in-typescript-ogp
