I shipped an AI feature last fall that took an input document, called a large language model to extract structured data, called a second model to validate it, posted the results to a webhook, and then emailed the user. The whole thing took between 40 seconds and 3 minutes depending on the document size.
It worked perfectly in testing. It worked for the first hundred users in production. Then a network hiccup took out the LLM provider for 90 seconds during a busy afternoon, and I discovered the hard way that I had built a very expensive way to lose data.
My serverless function timed out. The retry was another full run from scratch, which hit the LLM a second time for tokens I had already paid for. Users saw errors. Some of them got two emails. A few of them got neither because the second run failed at a different step and the retry budget ran out.
I spent the next weekend rewriting the whole thing on top of a durable workflow engine. The problem was not that I had bad code. The problem was that I was using request-response infrastructure to run a multi-step, long-running, stateful process. That is not what serverless functions are for, and pretending it is leads to exactly the kind of failure I walked into.
This post is the guide I wish I had before I shipped that feature. It covers what durable workflows are, why AI features need them more than almost any other category of work, and how to choose between Inngest, Trigger.dev, and Vercel Workflow in 2026.
What Breaks When AI Meets Serverless
The default pattern for shipping a feature in 2026 looks something like this: Next.js or a similar framework, an API route that handles a request, some business logic, maybe a database call, and a response. This pattern is fast, cheap, and covers 90 percent of what most web apps do.
It also breaks in predictable ways when AI gets involved.
Timeouts. LLM calls are slow. A single Claude or GPT call is typically a few seconds. A chain of them can take minutes. Vercel raised the default function timeout to 300 seconds in 2025, which helps, but a multi-step agent can easily exceed that. If your function times out mid-run, you lose the work in progress and any external side effects you already triggered.
Retries. When an LLM provider has an outage or rate limits you, you need to retry. Naive retries cause duplicate emails, duplicate database writes, and duplicate bills. Smart retries require keeping track of which steps have already succeeded so you can resume from where you left off instead of starting over.
Cost. Every retry on an LLM call costs real money. A workflow that reruns from scratch on every failure can double or triple your AI costs during a bad day with a provider. For features where each run is cheap, this is tolerable. For agentic workflows that use 50,000 tokens per run, it is a budget problem.
Observability. When a multi-step AI workflow fails, you need to know which step failed, with what input, and with what output from the previous steps. Tracing this in a standard logging setup is painful. You end up grepping logs across multiple function invocations, trying to correlate request IDs that may not even exist on retries.
Concurrency. If a user kicks off ten AI workflows at once, you want to throttle them so you do not blow up your rate limits with your LLM provider. Standard serverless functions have no built-in way to do this without building your own queue, and LLM cost optimization in production depends on getting this right.
These are not edge cases. They are the default failure modes for any AI feature that does more than a single one-shot completion. The moment you chain two LLM calls together, or mix an LLM call with an external API, or run something that takes longer than a normal HTTP request, you are in workflow territory whether you planned for it or not.
What Durable Workflows Actually Are
The term "durable workflow" sounds like enterprise jargon, but the idea is simple.
A durable workflow is a function where each step is checkpointed. When a step succeeds, the result is persisted. If the workflow fails partway through, the engine resumes from the last successful step instead of starting over. The function can take minutes, hours, or days to complete. It can pause to wait for external events. It can sleep for a week and then resume. All of this is handled by the engine, not by you.
The programming model looks almost identical to normal async code. You write a function with steps. Each step is a regular async operation. The engine wraps each step to persist its result and provide the persisted result on replay if the step has already run.
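The core of that model is small enough to sketch. This is a toy illustration of checkpoint-and-replay, not the API of any specific engine; the in-memory `Checkpoints` map stands in for the engine's durable store:

```typescript
// Toy illustration of checkpoint-and-replay; not the API of any real engine.
// A completed step's result is persisted, and on a retry the stored result is
// returned instead of running the step again.
type Checkpoints = Map<string, unknown>;

async function step<T>(
  checkpoints: Checkpoints,
  name: string,
  fn: () => Promise<T>
): Promise<T> {
  if (checkpoints.has(name)) {
    // This step already succeeded on a previous attempt: replay its result.
    return checkpoints.get(name) as T;
  }
  const result = await fn();
  checkpoints.set(name, result); // a real engine makes this write durable
  return result;
}
```

On a retry, the engine re-invokes the whole workflow function, but every step that already succeeded resolves instantly from its checkpoint, so only the failed step actually re-executes.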
The magic is that failures become survivable. A network blip in step 3 of a 5 step workflow does not lose the work from steps 1 and 2. A provider outage does not double bill you. A deploy in the middle of a running workflow does not drop it on the floor. These are not optimizations. They are the baseline behavior.
This is the model Temporal popularized in the enterprise. What changed in 2026 is that the pattern finally got accessible to indie developers and small teams, with tools that work natively with Next.js, serverless functions, and modern TypeScript stacks. You no longer need a dedicated worker infrastructure to run durable workflows. You can run them on the same platform as the rest of your app.
Inngest: The Mature Choice
Inngest has been in the durable workflow space longer than most of the current competitors. It is a hosted service with a TypeScript SDK that defines workflows as functions with steps, using a familiar async pattern.
What it does well
The developer experience is polished. Defining a workflow looks like writing a regular async function with a few wrapper calls. You call step.run for operations that should be checkpointed, step.sleep for delays, and step.waitForEvent for waiting on external triggers. There is no special syntax to learn and the types are strong.
Event-driven triggers are a first class concept. Instead of calling a workflow directly, you emit an event, and Inngest decides which workflows should run based on event matching rules. This is the right pattern for anything that involves user actions triggering background work, and it composes cleanly as your app grows.
The local development story is good. Inngest has a local dev server that mirrors production behavior, so you can iterate on workflows without deploying. The dashboard shows you every run, every step, every input, every output. When something goes wrong, you can see exactly what happened and often just click to replay from a failed step.
Concurrency and rate limiting are built in. You can limit a workflow to process at most 5 runs concurrently per user, or throttle invocations to 10 per second per integration, or back off exponentially on retry. For AI features that need to stay under LLM rate limits, this is the feature you did not know you needed until you shipped without it.
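Putting those pieces together, here is a hedged sketch based on the Inngest v3 TypeScript SDK. The event name, step names, and the `extractData` and `sendReport` helpers are hypothetical app code; the SDK calls (`createFunction`, `step.run`, `step.sleep`, the `concurrency` option) follow the published v3 API:

```typescript
// Sketch only: event name, step names, and helpers are hypothetical.
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "my-app" });

// Hypothetical helpers standing in for real app code.
async function extractData(documentId: string): Promise<unknown> {
  /* LLM extraction call */ return {};
}
async function sendReport(userId: string, data: unknown): Promise<void> {}

export const processDocument = inngest.createFunction(
  {
    id: "process-document",
    // At most 5 concurrent runs per user, to stay under LLM rate limits.
    concurrency: { limit: 5, key: "event.data.userId" },
    retries: 3,
  },
  { event: "document/uploaded" }, // emitted elsewhere via inngest.send()
  async ({ event, step }) => {
    // Each step.run is checkpointed: completed steps are not re-run on retry.
    const extracted = await step.run("extract", () =>
      extractData(event.data.documentId)
    );
    await step.sleep("cool-down", "30s");
    await step.run("send-report", () =>
      sendReport(event.data.userId, extracted)
    );
  }
);
```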
Where it falls short
The hosted pricing can get expensive for high-volume workflows. Inngest charges based on step executions and concurrency, and both scale with how chatty your workflows are. For a workflow that checkpoints a lot of small steps, the bill adds up.
Self-hosting is possible but more involved than the managed service suggests. If you want to run Inngest on your own infrastructure to control costs or compliance, expect to spend time on the deployment.
The abstraction is opinionated about event-driven triggers. If your mental model is "call this workflow now and wait for the result," Inngest supports it but the ergonomics lean toward async event-driven patterns. This is usually the right pattern, but it can feel foreign if you are coming from a simpler background job queue.
When to pick it
Inngest is the right choice if you are building an event-driven system, care about first class concurrency controls, and want a polished managed service. It is also the choice with the longest track record, so if you are risk averse, it is the safe pick.
Trigger.dev: The Open Source Friendly Pick
Trigger.dev took a different path. It is open source, self hostable from day one, and focuses on making background jobs and workflows accessible with a minimum of ceremony. Version 3, which is the version you should be using in 2026, is a full rewrite that added durable execution and significantly improved the developer experience.
What it does well
The setup is the fastest of the three tools I tested. You install the SDK, define a task by wrapping a plain function, and it is ready to run. For quick prototyping or for developers who want to minimize the conceptual overhead of adopting a new tool, Trigger.dev is the lightest lift.
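As a sketch, a minimal v3 task might look like this. The task id, payload shape, and `summarize` helper are hypothetical; `task()` and the `retry` option come from the v3 SDK:

```typescript
// Sketch only: the task id, payload, and helper are hypothetical.
import { task } from "@trigger.dev/sdk/v3";

// Hypothetical helper standing in for a real LLM call.
async function summarize(text: string): Promise<string> {
  return `summary of ${text.length} chars`;
}

export const summarizeDocument = task({
  id: "summarize-document",
  retry: { maxAttempts: 3 }, // retries resume with checkpointed state
  run: async (payload: { text: string }) => {
    const summary = await summarize(payload.text);
    return { summary };
  },
});
```

Elsewhere in the app you kick it off with something like `await summarizeDocument.trigger({ text })`.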
The self-hosting story is first class. The open source version of Trigger.dev runs as a Docker container and has feature parity with the managed cloud product. For teams that need to own their infrastructure for compliance or cost reasons, this is a significant advantage over the more managed-first alternatives.
The dashboard is genuinely nice. You get a live view of running tasks, a history of past runs, the ability to replay from any step, and the tooling for debugging failed runs is polished. For AI workflows specifically, being able to see exactly what each LLM call received and returned is invaluable when you are tracking down a bad completion.
The SDK handles common AI patterns well. There is built in support for streaming responses, long running inference calls, and checkpointing expensive LLM outputs so you do not rerun them on retry. This is the kind of domain-specific polish that separates a tool that works for AI from a tool that was designed for AI.
Where it falls short
The platform is younger than Inngest. Some advanced features like sophisticated event matching, complex concurrency policies, and multi-tenant controls are either newer or still in development. For a simple AI workflow this does not matter. For a complex multi-tenant SaaS with intricate routing needs, it might.
The managed cloud pricing is competitive, but the product is still finding its positioning. I have seen the pricing adjusted several times in the last year, which is normal for a product at this stage but worth being aware of if you are trying to budget.
The ecosystem around triggers and integrations is smaller than Inngest's. Inngest has invested heavily in pre-built integrations with common services. Trigger.dev leans on you to wire up the integrations yourself, which is fine but slightly more work.
When to pick it
Trigger.dev is the right choice if you value open source, want the fastest possible setup, need to self host, or want a tool that was designed with AI workloads in mind from the start. It is especially strong for indie developers building one-person startups who want to control their infrastructure without managing it full time.
Vercel Workflow: The Native Vercel Pick
Vercel Workflow, sometimes called Vercel Workflow DevKit or WDK, is Vercel's answer to the durable workflow problem. It launched in 2025 and matured throughout 2026 as part of Vercel's broader push to own more of the backend runtime. It runs on Fluid Compute, integrates with the rest of the Vercel platform, and requires no separate infrastructure if you are already deploying on Vercel.
What it does well
The integration with the Vercel platform is seamless. If your app is already on Vercel, adding a workflow is a matter of creating a new function file with the workflow pattern. No separate service, no additional dashboard, no new billing relationship. Everything shows up in your existing Vercel project.
The programming model is clean. You write a workflow as a regular async function, mark steps that should be checkpointed, and the runtime handles persistence. The API feels like a natural extension of Next.js rather than an external tool bolted on.
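A heavily hedged sketch of that directive-based pattern follows. The `"use workflow"` and `"use step"` directives follow Vercel's published model for the Workflow DevKit, but the function names and step bodies here are hypothetical, and the exact API may differ from the current release:

```typescript
// Sketch only: names and bodies are hypothetical; the directive-based
// pattern follows Vercel's published Workflow DevKit model.
export async function processDocument(documentId: string) {
  "use workflow"; // marks this function as a durable workflow

  const extracted = await extractData(documentId);
  const validated = await validateData(extracted);
  await notifyUser(validated);
}

// Each step is checkpointed; a completed step is not re-run on retry.
async function extractData(documentId: string) {
  "use step";
  /* LLM extraction call goes here */
  return { documentId };
}

async function validateData(data: unknown) {
  "use step";
  return data;
}

async function notifyUser(data: unknown) {
  "use step";
  /* email or webhook side effect goes here */
}
```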
Cost efficiency is genuinely different. Because Vercel Workflow runs on Fluid Compute, you get the benefits of function instance reuse and active CPU pricing. For AI workflows that spend most of their time waiting on LLM responses, you are not paying for idle time the way you would with traditional serverless invocation counts.
The observability tie-in is strong. Workflow runs show up in the Vercel dashboard alongside your deployments, logs, and other platform metrics. When a workflow fails, you can trace it back to the specific deployment, look at the runtime logs, and see the preview environment context all in one place.
Where it falls short
It only works on Vercel. This is the obvious limitation and it is not going to change. If you are on AWS, Render, Fly, Cloudflare, or self hosted, Vercel Workflow is not available.
It is newer than the alternatives. Inngest and Trigger.dev have years of production usage across thousands of applications. Vercel Workflow is production-ready but has less battle-tested coverage of edge cases. For straightforward AI workflows this is fine. For complex orchestration with unusual patterns, you may run into rough edges.
The ecosystem of patterns, examples, and integrations is smaller. Inngest and Trigger.dev both have mature libraries of patterns for common use cases. Vercel Workflow is catching up but you will sometimes end up implementing things from first principles.
When to pick it
Vercel Workflow is the right choice if you are already on Vercel and want the tightest possible integration with your existing stack. For AI features that are part of a larger Next.js app, the zero-configuration setup and platform-native observability are hard to beat.
The Decision Framework
After running all three on real projects for the last few months, here is the framework I use to decide which one to reach for.
Are you on Vercel and shipping Next.js? Start with Vercel Workflow. The integration is seamless and the setup cost is effectively zero. If you hit a limitation, switching to one of the others is always an option, but most AI features do not hit those limits.
Do you need to self host? Trigger.dev is the pick. Inngest can be self hosted but the experience is more involved. Vercel Workflow is not an option off platform.
Is your workflow fundamentally event-driven? Inngest is the pick. The event routing and matching features are first class in a way the others are not. For systems where many different triggers can kick off related workflows, Inngest's model is the cleanest.
Are you optimizing for the fastest possible setup? Trigger.dev is the pick. The cognitive overhead is the lowest of the three, and for a solo developer trying to ship an AI feature quickly, this matters.
Do you care about long term track record and maturity? Inngest is the pick. It has been at this the longest and has the largest set of real-world production deployments to learn from.
For most of my current projects, I end up running Vercel Workflow for the AI features that live inside a Vercel-hosted app, and Trigger.dev for anything that needs to run off platform or where I want to control my own infrastructure. I have stopped reaching for Inngest on new projects mostly because the pricing for the kind of chatty workflows I write adds up faster than the alternatives.
Practical Patterns for AI Workflows
A few patterns I have learned the hard way that apply regardless of which tool you pick.
Checkpoint LLM calls aggressively. Every LLM call should be its own checkpointed step. If the call succeeds, you never want to run it again, because it costs money and the output is not deterministic anyway. Every durable workflow engine handles this well if you mark the step correctly.
Store the raw LLM output, not just the parsed version. When an LLM call succeeds but the parsing fails, you want to be able to fix the parser and replay without rerunning the LLM. This requires persisting the raw completion, not just the structured result you extracted from it.
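A minimal sketch of the idea, independent of any engine; the `RawStore` map and the `callLLM` and `parse` parameters are stand-ins for durable storage, a real LLM call, and your parser:

```typescript
// Sketch: persist the raw completion before parsing, so a parser bug can be
// fixed and replayed without paying for the LLM call again. RawStore stands
// in for a durable store; callLLM and parse are hypothetical app code.
type RawStore = Map<string, string>;

async function extractWithRawCheckpoint(
  store: RawStore,
  runId: string,
  callLLM: () => Promise<string>,
  parse: (raw: string) => unknown
): Promise<unknown> {
  // Reuse the stored raw completion if the LLM call already succeeded.
  let raw = store.get(runId);
  if (raw === undefined) {
    raw = await callLLM();
    store.set(runId, raw); // durable in a real engine
  }
  // Parsing failures can now be retried without re-calling the model.
  return parse(raw);
}
```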
Use the workflow engine's native rate limiting. Do not build your own throttling layer on top of a workflow engine. Every tool I have covered has built in primitives for this. Use them.
Design steps for idempotency. Even with durable workflows, steps can retry. If a step sends an email, sends a webhook, or charges a card, make sure running it twice has the same effect as running it once. Idempotency keys, deduplication tokens, and "has this been done already" checks all matter.
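One way to sketch the idempotency-key check; `sentKeys` stands in for a durable store such as a unique-constrained database column, and the key scheme is illustrative:

```typescript
// Sketch of an idempotent side-effect step: a dedup key recorded alongside
// the effect makes a retried step a no-op. sentKeys stands in for a durable
// store (e.g. a unique-constrained database column).
const sentKeys = new Set<string>();

async function sendEmailOnce(
  runId: string,
  stepName: string,
  send: () => Promise<void>
): Promise<boolean> {
  const key = `${runId}:${stepName}`; // stable across retries of the same run
  if (sentKeys.has(key)) return false; // already sent: retry is a no-op
  await send();
  sentKeys.add(key);
  return true;
}
```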
Keep step inputs small. Every step's inputs get persisted. If you pass a large payload to a step, you are paying to serialize, store, and deserialize that payload on every retry. Pass references to stored data rather than the data itself when possible.
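A toy sketch of the reference-passing pattern; `blobStore` and the key scheme stand in for real object storage such as S3 or Vercel Blob:

```typescript
// Sketch: store the large payload out of band and pass only a small
// reference into the step, so the engine persists an ID instead of the
// whole document. blobStore stands in for object storage.
const blobStore = new Map<string, string>();

function putDocument(doc: string): string {
  const ref = `doc_${blobStore.size + 1}`; // hypothetical key scheme
  blobStore.set(ref, doc);
  return ref; // this small string is what the step input should carry
}

function getDocument(ref: string): string {
  const doc = blobStore.get(ref);
  if (doc === undefined) throw new Error(`unknown ref ${ref}`);
  return doc;
}
```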
Log the prompts and the responses. For debugging AI workflows, the prompt-response pair is the source of truth. Log both, correlate them to the workflow run, and make sure you can replay any failed step with the exact same prompt that caused the failure. AI agent observability is the companion discipline that makes durable workflows debuggable in production.
The Honest Bottom Line
If you are shipping an AI feature that does more than a single one-shot completion, you need a durable workflow engine. The alternative is not "simpler code." The alternative is a production incident that you will write a blog post about, and the blog post will be shaped a lot like this one.
Inngest is mature and event-driven. Trigger.dev is open source and fast to adopt. Vercel Workflow is native to Vercel and uses Fluid Compute to keep costs down on long running AI workloads. All three are production ready and all three solve the core problem of multi-step, long-running, stateful AI work.
The wrong answer is to keep running AI workflows on plain serverless functions and hope that your users never hit a provider outage. The provider outage is coming. The only question is whether your code is ready for it.
I ended up migrating the feature that ate my weekend to a durable workflow in a single afternoon. The rewrite was smaller than the original implementation because most of the retry logic and state tracking I had built by hand got replaced by the engine. Six months later the feature has weathered three LLM provider incidents without dropping a single run. That is the whole pitch.
Pick a tool. Migrate your AI workflows. Get your weekends back.