AdmilsonCossa
Your AI Is Still Billing After the User Closed the Tab

That’s not a bug. It’s a missing owner.

The user closed the browser 30 seconds ago.

Your logs show the response was never delivered. But the LLM stream is still running. The vector search is still scanning. The rerankers are still scoring. The tool calls are still executing.

The invoice will arrive tomorrow.

This is not hypothetical. It is the default behavior of many AI backends today.

import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());
const openai = new OpenAI();

app.post("/chat", async (req, res) => {
  const stream = await openai.chat.completions.create({
    model: "gpt-4.1",
    stream: true,
    messages: req.body.messages,
  });

  // Nothing here ties the stream's lifetime to the client connection
  for await (const chunk of stream) {
    res.write(chunk.choices[0]?.delta?.content ?? "");
  }

  res.end();
});

This code looks completely reasonable.

It even works — until the user refreshes the page, closes the tab, loses signal, or navigates away.

At that moment:

  • the HTTP response is dead
  • the client no longer exists
  • the user no longer cares

But the async work often continues anyway.

The LLM may keep generating tokens. The vector search may keep scanning. Background tasks may continue running with no remaining consumer.

The work outlived the reason it existed.

That sentence is the real problem.


The real root cause: ownership is missing

Most async systems treat cancellation as an optional convention rather than a runtime guarantee.

You can pass an AbortController if you remember.

You can manually wire cleanup if every developer remembers.

You can hope every dependency correctly propagates cancellation.
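When everyone does remember, the manual version looks roughly like this. A self-contained sketch using only Node's built-in AbortController; streamTokens is an invented stand-in for any cooperatively cancellable producer:

```typescript
// Manual cancellation: one AbortController, passed by hand to every
// operation that should stop. It works, but only if every call site
// remembers to create it, pass it, and check it.
async function* streamTokens(signal: AbortSignal): AsyncGenerator<string> {
  for (const token of ["the", "work", "must", "stop"]) {
    if (signal.aborted) return; // cooperative cancellation check
    await new Promise((r) => setTimeout(r, 1));
    yield token;
  }
}

async function handle(): Promise<string[]> {
  const controller = new AbortController();
  const received: string[] = [];

  for await (const token of streamTokens(controller.signal)) {
    received.push(token);
    // Simulate the client disconnecting after two tokens
    if (received.length === 2) controller.abort("client_disconnected");
  }

  return received; // the producer stopped early: ["the", "work"]
}
```

This works, but notice how much of it is convention: nothing forces streamTokens to check the signal, and nothing forces the caller to wire the controller at all.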

But structurally, there is usually no single owner for the tree of async work created by a request.

A single AI request may spawn:

  • LLM streaming
  • vector search
  • rerankers
  • tool execution
  • background audit writes
  • metrics
  • cleanup handlers
  • retries
  • observability traces

Every one of these should stop when the user disconnects.

But native Promises do not provide ownership semantics.

So the work survives.
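The missing semantics are easy to demonstrate with nothing but the standard library: abandoning a promise does not stop the work behind it. In this sketch, the counter stands in for token generation:

```typescript
// Native promises have no owner: once started, the work runs to
// completion even when nobody will ever consume the result.
let tokensGenerated = 0;

async function generate(): Promise<void> {
  for (let i = 0; i < 5; i++) {
    await new Promise((r) => setTimeout(r, 2));
    tokensGenerated++; // side effect continues after abandonment
  }
}

function abandonedRequest(): void {
  // Start the work, then "disconnect": we simply stop referencing it.
  void generate();
  // There is no language-level way to reach in and stop that loop.
}

abandonedRequest();
// A moment later, tokensGenerated === 5 — all five "tokens" were
// produced for a consumer that no longer exists.
```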


The hidden cost of abandoned AI work

At small scale this is invisible.

At production scale this becomes expensive.

Imagine:

100,000 abandoned requests/day
× 3–5 seconds of unnecessary downstream execution
= millions of wasted tokens
= unnecessary GPU time
= avoidable API spend
= infrastructure pressure
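The "millions of wasted tokens" line follows from simple arithmetic. The generation rate below is an illustrative assumption (roughly 40 tokens/s), not a measurement:

```typescript
// Back-of-envelope waste estimate. All three inputs are assumptions —
// substitute your own traffic numbers and model throughput.
const abandonedPerDay = 100_000; // abandoned requests/day
const wastedSeconds = 4;         // midpoint of the 3–5 s range
const tokensPerSecond = 40;      // assumed generation rate

const wastedTokensPerDay = abandonedPerDay * wastedSeconds * tokensPerSecond;

console.log(wastedTokensPerDay); // 16000000 — sixteen million tokens/day
```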

This is not just a code-quality issue.

This is infrastructure waste.


The fix: one scope owns the work

Instead of manually wiring cancellation across unrelated async operations, WorkIt introduces one ownership boundary: the scope.

import { WorkIt } from "@workit/core";

// Inside a request handler: req, messages, input, and query come from context
await WorkIt.run(async (scope) => {
  // User disconnects → cancel everything
  req.on("close", () => {
    scope.cancel("client_disconnected");
  });

  // Each operation belongs to the scope
  const llm = scope.spawn("llm-stream", async (signal) =>
    streamLLM({
      messages,
      signal,
    })
  );

  const tools = scope.spawn("tools", async (signal) =>
    runTools({
      input,
      signal,
    })
  );

  const vector = scope.spawn("vector-search", async (signal) =>
    searchVectorDB({
      query,
      signal,
    })
  );

  // Wait for all child work
  return await scope.all([llm, tools, vector]);
});

Now the request owns the work.

When the client disconnects:

  1. scope.cancel("client_disconnected") fires
  2. every child operation receives cancellation
  3. streaming stops
  4. tool execution stops
  5. vector search stops
  6. cleanup handlers run
  7. the scope settles deterministically

No orphaned work.

No zombie tasks.

No invisible token burn.
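The steps above do not require magic. The core of the pattern can be sketched with nothing but AbortController and Promise.allSettled. This is an illustration of the ownership idea, not WorkIt's actual implementation; MiniScope is an invented name:

```typescript
// A minimal ownership scope: one controller owns the signal that every
// child receives, so a single cancel() reaches the whole tree of work.
class MiniScope {
  private controller = new AbortController();
  private children: Promise<unknown>[] = [];

  spawn<T>(name: string, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
    const child = fn(this.controller.signal);
    this.children.push(child);
    return child;
  }

  cancel(reason: string): void {
    this.controller.abort(reason); // every child's signal fires at once
  }

  async settled(): Promise<void> {
    // Deterministic settling: resolve only after all children finish
    await Promise.allSettled(this.children);
  }
}
```

Cooperative children that check their signal stop promptly after cancel(), and settled() gives the deterministic "scope settles" step. A real runtime layers cleanup handlers, structured errors, and observability on top of this core.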


This is not theory

We validated this behavior in the WorkIt test suite.

Scenario:

{
  "case": "AI streaming disconnect",
  "result": "LLM/tool/vector work stopped after cancellation",
  "post_cancel_work_ms": 0,
  "late_events": 0
}

The important part:

"late_events": 0

After cancellation, no additional downstream work completed.

The evidence is what did not continue running.


Why this matters

The AI ecosystem is rapidly moving toward:

  • streaming
  • agents
  • tool execution
  • multi-provider inference
  • background workflows
  • long-lived realtime sessions

All of these create trees of async work.

And most of them still lack clear ownership semantics.

The result is:

  • abandoned compute
  • leaking streams
  • runaway retries
  • zombie tool execution
  • incomplete shutdowns
  • hidden infrastructure cost

WorkIt treats async work as something that must have an owner.

When the owner disappears, the work disappears with it.


The deeper idea

This is not really about cancellation.

It is about lifecycle ownership.

The request should own the work it creates.

The WebSocket should own its subscriptions.

The agent should own its tools.

The stream should own its producers.

Without ownership, async systems slowly leak compute and complexity.


Try it

npm install @workit/core

import { WorkIt } from "@workit/core";

await WorkIt.run(async (scope) => {
  req.on("close", () => {
    scope.cancel("user_gone");
  });

  const result = await scope.spawn(
    "my-ai-call",
    async (signal) =>
      openai.chat.completions.create(
        {
          model: "gpt-4.1",
          stream: true,
          messages,
        },
        // The OpenAI Node SDK takes the abort signal as a request
        // option (second argument), not as a body field
        { signal }
      )
  );
});

The larger problem

Every senior engineer has seen some version of this question in production:

“Why is this still running?”

That question appears in:

  • AI streaming
  • WebSockets
  • Kafka consumers
  • background workers
  • multiplayer game loops
  • Discord bots
  • server-side rendering
  • agent runtimes

The problem is the same every time:

The work outlived the reason it existed.

WorkIt is an attempt to fix that at the runtime level.


GitHub:
https://github.com/WorkRuntime/workit

Article series:
https://dev.to/admilsoncossa/owned-async-work-in-typescript-ogp
