Avi Fenesh
How AI workloads changed the queue I was already building

I did not start glide-mq because of AI.

I started it because I needed a queue, wanted mechanics I liked better than what was out there, and wanted it built on top of Valkey Glide.

Up through v0.13, that is basically what it was: a feature-rich queue.

Then I kept building AI systems on top of it.

That is where the shape started changing.

The queue itself was usually fine. The pain was everything around it. Long-running jobs that were not actually stuck. Streaming that wanted to be part of the job instead of a side channel. Budget checks that needed to happen before the spend, not after. Token-aware rate limits. Pause/resume because real flows sometimes need a human or have to wait for CI.

Different project. Same pile of glue.

After enough rounds, the pattern stops looking normal. The queue is doing the easy part. Everything AI-specific is leaking out around it.

That is what pushed glide-mq in a different direction.

AI workloads expose the wrong assumptions in normal queue design

Most queues were built for ordinary background work.

AI workloads are not ordinary background work.

They stream. They run long without being stuck. They spend money while they execute. They wait on humans. They hit limits based on tokens, not job count. They can fail semantically while every operational metric stays green.

If the queue does not understand that shape, the missing behavior does not disappear. It ends up patched on from the outside.

That is the part that started feeling wrong to me.

Not because queues are bad. Because a lot of the assumptions underneath them were built for a different class of workload.

1. Locks are the wrong model for long AI jobs

Most queues assume the worker proves it is alive by renewing a lock. Miss the renewal, retry the job.

That is reasonable for short jobs.

It is a bad model for LLM workloads.

A long generation is not dead. A reasoning-heavy call sitting on a hard prompt for 90 seconds is not dead. But a normal queue cannot tell the difference, so it retries, and now you are paying twice for the same work.

Nothing crashed. Nothing exploded. You just had the wrong execution model.

The usual workaround is increasing a global lock timeout.

That is also wrong.

A tiny classifier and a two-minute generation should not share the same timeout assumptions just because they live in the same queue.

So in glide-mq, lock duration is per job:

await queue.add("classify", { text: "short" }, { lockDuration: 10_000 });
await queue.add("research", { topic: "complex" }, { lockDuration: 180_000 });

Same queue. Different expectations.

That should be normal.
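The difference comes down to one decision: a job is retried when its lock expires without renewal, so the lock duration is what decides whether a 90-second generation counts as dead. A minimal sketch of that stall check (illustrative only, not glide-mq internals):

```typescript
// Illustrative sketch, not glide-mq internals: a job is considered
// stalled (and retried) once its lock expires without renewal.
function isStalled(startedAtMs: number, nowMs: number, lockDurationMs: number): boolean {
  return nowMs - startedAtMs > lockDurationMs;
}

// A reasoning-heavy call that has been running for 90 seconds:
const startedAt = 0;
const now = 90_000;

// Under a global 30s lock, it looks dead and gets retried (double spend).
isStalled(startedAt, now, 30_000); // true

// Under its own 180s lock, it is just a long job.
isStalled(startedAt, now, 180_000); // false
```

Same check, but the threshold belongs to the job, not the queue.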

2. Budget control belongs in the execution path

The ugly failures in agent systems are often not technical failures.

The requests succeed.
The responses are valid.
The logs look clean.
The dashboards stay green.

And meanwhile the system is stuck in some useless loop spending money.

That is why budget control does not belong only in dashboards, analytics, or some side service that tells you later what happened. By then the spend is already gone.

The queue is the choke point. Every step passes through it. That is where the budget check belongs.

So I put budgets on flows:

await flowProducer.add(
  {
    name: "pipeline",
    queueName: "ai",
    data: { topic: "research" },
    children: [
      { name: "search", queueName: "ai", data: {} },
      { name: "analyze", queueName: "ai", data: {} },
      { name: "draft", queueName: "ai", data: {} },
    ],
  },
  {
    budget: {
      maxTotalTokens: 50_000,
      maxTotalCost: 2.0,
      tokenWeights: { reasoning: 2.0 },
      onExceeded: "fail",
    },
  },
);

When a job reports usage, the budget check happens atomically in Valkey. If the flow is out of budget, the next step stops there.

Not later. Not after the invoice. At the point where it still matters.
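The weighted check itself can be sketched as a pure function (hypothetical shapes, for illustration only; glide-mq runs the real check atomically inside Valkey): each token category is multiplied by its weight before counting against the flow totals.

```typescript
// Illustrative sketch of the budget options shown above; the names
// mirror the flow options but the types are hypothetical.
type Budget = {
  maxTotalTokens: number;
  maxTotalCost: number;
  tokenWeights?: Record<string, number>;
};

type Usage = { tokens: Record<string, number>; cost: number };

function budgetExceeded(budget: Budget, spent: Usage, next: Usage): boolean {
  // Weight each token category (e.g. reasoning tokens count double).
  const weigh = (u: Usage) =>
    Object.entries(u.tokens).reduce(
      (sum, [category, n]) => sum + n * (budget.tokenWeights?.[category] ?? 1),
      0,
    );
  const totalTokens = weigh(spent) + weigh(next);
  const totalCost = spent.cost + next.cost;
  return totalTokens > budget.maxTotalTokens || totalCost > budget.maxTotalCost;
}
```

With `tokenWeights: { reasoning: 2.0 }`, a step that reports 6,000 reasoning tokens burns 12,000 from the budget, which is exactly why a flow can die mid-pipeline instead of after the invoice.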

3. Streaming should not be a second system

If the model is producing the result incrementally, that stream is part of the job.

Treating it as a separate system is already the smell.

But most queues only understand one shape: job starts, job finishes, here is the result.

LLMs do not behave like that.

So people bolt on pub/sub. Or WebSockets. Or SSE through some different route. Then reconnect logic. Then ordering. Then another pile of glue to keep the stream state and the job state from drifting apart.

Now the job lives in one place and the live output lives somewhere else.

That split is artificial.

So in glide-mq, the stream stays on the job:

const worker = new Worker(
  "ai",
  async (job) => {
    for await (const token of generateTokens(job.data.prompt)) {
      await job.streamChunk("token", token);
    }
    await job.streamChunk("done");
    return { completed: true };
  },
  { connection },
);

const chunks = await queue.readStream(jobId, { block: 5000 });

No extra pub/sub layer. No second system just to watch a job do its work. The stream is attached to the job, stored in Valkey, and resumable after disconnect.

That is the model that actually matches the workload.
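Resumability is the part worth spelling out. Assuming each stored chunk carries a monotonically increasing id (a hypothetical shape, for illustration), the client only has to remember the last id it processed; after a disconnect it skips everything it has already seen:

```typescript
// Hypothetical chunk shape, for illustration: each stored chunk carries
// a monotonically increasing id, so a client can resume after a disconnect.
type Chunk = { id: number; event: string; data?: string };

// Collect token text from a batch of chunks, skipping anything already
// seen before the disconnect, and stopping at the "done" marker.
function collectTokens(
  chunks: Chunk[],
  lastSeenId: number,
): { text: string; lastId: number; done: boolean } {
  let text = "";
  let lastId = lastSeenId;
  let done = false;
  for (const c of chunks) {
    if (c.id <= lastSeenId) continue; // already delivered before the disconnect
    lastId = c.id;
    if (c.event === "done") {
      done = true;
      break;
    }
    if (c.event === "token") text += c.data ?? "";
  }
  return { text, lastId, done };
}
```

Because the chunks live in Valkey next to the job, this cursor logic is all a reconnecting client needs; there is no separate broker whose state can drift from the job's.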

The rest is the same mismatch in different clothes

Once you accept that AI workloads are a different class of work, the rest stops looking like extra features and starts looking like missing queue behavior.

Pause/resume for human approval.
Wake the flow back up when CI finishes.
Fallbacks across models.
Rate limiting based on tokens instead of job count.
Usage tracking that does not break when the next model adds a new token category.

These are not edge cases. They are part of the shape of the work.
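Token-based rate limiting is a good example of why. The limit that matters is tokens per window, not jobs per window: one research job can cost more than a hundred classifiers. A minimal sketch of that decision (illustrative only, not glide-mq's actual throttle):

```typescript
// Illustrative sketch, not glide-mq's throttle: a fixed window tracks
// tokens spent, and jobs are admitted by estimated token cost, not count.
class TokenWindow {
  private used = 0;
  private windowStart: number;

  constructor(
    private readonly maxTokens: number,
    private readonly windowMs: number,
    now: number = Date.now(),
  ) {
    this.windowStart = now;
  }

  tryAdmit(estimatedTokens: number, now: number = Date.now()): boolean {
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now; // new window, reset the budget
      this.used = 0;
    }
    if (this.used + estimatedTokens > this.maxTokens) return false;
    this.used += estimatedTokens;
    return true;
  }
}
```

Under a job-count limiter both the classifier and the research job look identical; under a token window, the expensive job correctly waits.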

That is what v0.14 is really about

Not AI branding. Not pretending glide-mq started as an AI queue from day one.

It did not.

It started as the queue I wanted to have.

Then building real AI systems on top of it kept exposing the same gaps, and eventually patching around them started feeling like the wrong move.

So v0.14 moves those behaviors into the queue.

That is what changed.

glide-mq is still a queue. But v0.14 is where it started absorbing the behaviors that AI systems kept forcing into side systems.

  • Per-job lockDuration so long jobs stop fighting short ones.
  • job.reportUsage() so budgets and accounting live in the execution path.
  • job.streamChunk() so streaming stays attached to the job.
  • job.suspend() and queue.signal() for human-in-the-loop flows.
  • Ordered fallbacks.
  • Token-aware throttling.
  • Flow budgets that fail before the spend gets worse.

That is the direction now.

Not queue plus five things you will bolt on later anyway.

The queue should understand more of the workload it is running.

Final thought

I do not think this is only a glide-mq story.

I think AI workloads are exposing the wrong assumptions in a lot of older tooling.

The problem is not just that queues need a few more integrations. The problem is that many of the abstractions we still lean on were designed for ordinary jobs, requests, and background work. AI systems have a different shape, and when the abstraction does not match, the missing behavior leaks out into glue.

That is the part I stopped wanting to patch from the outside.


npm install glide-mq

GitHub | Examples | Docs
