I’ve been building .NET apps for about seven years, and I keep running into the same problem.
For background jobs, my default answer has usually been Hangfire.
And to be clear: I like Hangfire.
No shade at all. It is simple, reliable, and boring in the best way.
BackgroundJob.Enqueue(() => SendWelcomeEmail(userId));
Point it at SQL Server or Postgres, enqueue the job, ship it, go home.
But every time I need something more than fire-and-forget, I end up doing the same thing: slowly building a workflow engine on top of it.
Badly.
The flow that made me notice the pattern
A pretty common example: user signup.
A user signs up and the system needs to:
- create the user
- charge the card
- provision a tenant
- send a welcome email
- add them to the CRM
- wait 24 hours
- check whether they activated
- send a nudge if they did not
The first version is usually just one big handler.
await users.Create(...);
await stripe.Charge(...);
await tenants.Provision(...);
await email.SendWelcome(...);
await crm.Add(...);
It works in dev. It demos well. Everyone is happy.
Then reality shows up.
Stripe is slow. The CRM returns 500s. SendGrid rate-limits you. A deploy happens halfway through. Now you have users in every possible half-finished state:
- paid, but no tenant
- tenant created, but no email
- email sent, but no CRM record
- CRM record created, but activation nudge never scheduled
So you do the reasonable .NET thing: split it into jobs.
BackgroundJob.Enqueue(() => ProvisionTenant(userId));
BackgroundJob.Enqueue(() => SendWelcomeEmail(userId));
BackgroundJob.Enqueue(() => AddToCrm(userId));
That is better.
But then the next set of problems starts.
Jobs are not the same thing as workflows
Once the flow is split into jobs, the business process becomes harder to see.
A job can fail. A continuation can run. A delayed job can be scheduled. But the system does not really understand the whole flow.
What I usually want is something like:
- run step A
- if step A succeeds, run B and C in parallel
- wait for both
- pause for 24 hours
- resume later
- wait for an external event, but only up to 7 days
- retry one step with a custom policy
- show the whole thing as one workflow run
You can build all of this with Hangfire, a few tables, some state columns, job IDs, recurring polling jobs, and enough discipline.
I have done that.
It works.
It also feels like I am re-implementing the same missing abstraction every time.
The parts that always get awkward
These are the specific things I keep running into.
Delays and cancellations
“Wait 24 hours, then nudge the user if they have not activated” sounds simple.
With a delayed job, you either need to cancel it when the user activates, which means storing the job ID somewhere, or you let it fire and check the current state again.
Both approaches work, but neither feels like the primitive I actually wanted.
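For reference, here is roughly what those two options look like with Hangfire's `BackgroundJob.Schedule` and `BackgroundJob.Delete`. Treat it as a sketch: `NudgeJobs`, `ActivationNudge`, `IUserStore`, and `IEmailSender` are hypothetical names, the `int` user ID is an assumption, and it presumes Hangfire storage is already configured and `NudgeJobs` can be resolved by Hangfire's job activator.

```csharp
using System;
using Hangfire;

// Hypothetical app services, named after the calls used elsewhere in this post.
public interface IUserStore { bool IsActivated(int userId); }
public interface IEmailSender { void SendActivationNudge(int userId); }

public sealed class NudgeJobs
{
    private readonly IUserStore _users;
    private readonly IEmailSender _email;

    public NudgeJobs(IUserStore users, IEmailSender email)
    {
        _users = users;
        _email = email;
    }

    // Option 2: let the delayed job fire, but re-check current state first.
    public void SendNudgeIfNotActivated(int userId)
    {
        if (_users.IsActivated(userId)) return;
        _email.SendActivationNudge(userId);
    }
}

public static class ActivationNudge
{
    // Option 1: keep the job ID returned by Schedule so the job can be deleted later.
    // The caller has to persist it somewhere, for example next to the user record.
    public static string ScheduleFor(int userId)
    {
        return BackgroundJob.Schedule<NudgeJobs>(
            jobs => jobs.SendNudgeIfNotActivated(userId),
            TimeSpan.FromHours(24));
    }

    // Call this when the user activates, passing the job ID stored at signup.
    public static bool Cancel(string storedJobId)
    {
        return BackgroundJob.Delete(storedJobId);
    }
}
```

In practice the two options tend to be combined: delete the job when you can, and still re-check state inside the job in case the delete was missed.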
What I wanted was:
await ctx.Sleep(TimeSpan.FromHours(24));
var activated = await ctx.Step("check-activation", () => users.IsActivated(userId));
if (!activated)
{
    await ctx.Step("send-nudge", () => email.SendActivationNudge(userId));
}
External events
Sometimes the next step should not run after a timer.
It should run when something happens.
For example:
- user activated
- payment succeeded
- document was signed
- webhook arrived
- admin approved the request
You can model this manually with database state, polling, and background jobs.
But again, that is the point. You are building the workflow runtime yourself.
Resuming after failure
This is the big one.
If the process dies after charging the card but before provisioning the tenant, I do not want to charge the card again.
I want the system to resume from the last durable checkpoint.
Not “restart the method and hope everything is idempotent.”
Not “read logs and reconstruct what happened.”
Not “add another status column and retry from there.”
I want the runtime to know:
- create-user completed
- charge-card completed
- provision-tenant did not complete
- resume from provision-tenant
That is the missing piece.
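A minimal sketch of that checkpoint idea, under heavy assumptions: the `Dictionary` stands in for a durable table in SQL Server or Postgres, and `Step` and `RunSignup` are hypothetical helpers written for this example, not an existing library. The workflow body is replayed from the top on every attempt, and any step that already has a recorded result is skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Stand-in for a durable table keyed by (workflow run, step name).
var checkpoints = new Dictionary<string, string>();
var chargeCalls = 0;

// Hypothetical step helper: run the body once, record its result, and on any
// later attempt return the recorded result instead of running the body again.
async Task<T> Step<T>(string runId, string name, Func<Task<T>> body)
{
    var key = $"{runId}/{name}";
    if (checkpoints.TryGetValue(key, out var saved))
        return JsonSerializer.Deserialize<T>(saved)!;

    var value = await body();
    checkpoints[key] = JsonSerializer.Serialize(value);
    return value;
}

// The workflow is replayed from the top on every attempt.
async Task<string> RunSignup(string runId, bool crashBeforeProvisioning)
{
    var userId = await Step(runId, "create-user", () => Task.FromResult("user_123"));

    var chargeId = await Step(runId, "charge-card", () =>
    {
        chargeCalls++; // the side effect that must not be repeated
        return Task.FromResult("ch_1");
    });

    if (crashBeforeProvisioning)
        throw new InvalidOperationException("process died before provision-tenant");

    var tenantId = await Step(runId, "provision-tenant", () => Task.FromResult("tenant_1"));
    return $"{userId}:{chargeId}:{tenantId}";
}

// First attempt dies after charge-card has been recorded.
try { await RunSignup("run-1", crashBeforeProvisioning: true); }
catch (InvalidOperationException) { /* simulated crash */ }

// Second attempt replays the method; completed steps come from the checkpoints.
var outcome = await RunSignup("run-1", crashBeforeProvisioning: false);
Console.WriteLine($"{outcome}, charge calls: {chargeCalls}");
// prints "user_123:ch_1:tenant_1, charge calls: 1"
```

The sketch only covers the case where the step result was recorded before the crash. A crash between the side effect and the write to `checkpoints` is still unprotected.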
Observability
Hangfire’s dashboard is good for jobs.
But when the business process spans multiple jobs, I usually want to see the flow, not just the individual jobs.
Something like:
SignupWorkflow / user_123
  ✓ create-user          120ms
  ✓ charge-card          2.4s   retried once
  ✓ provision-tenant     8.7s
  ✓ send-welcome-email   300ms
  ✓ add-to-crm           1.1s
  ⏱ waiting 24h
That is the kind of view I always end up wanting later.
What I keep wishing existed
What I want is not really another job runner.
Hangfire already exists. Quartz exists. Coravel exists. Hosted services exist.
What I want is durable workflows-as-code for normal ASP.NET Core apps.
Something that feels closer to Temporal, Azure Durable Functions, Inngest, or Trigger.dev, but fits the boring .NET deployment model a lot of teams already have:
- ASP.NET Core app
- SQL Server or Postgres
- containers, App Service, or Kubernetes
- no extra cluster to operate
- decent local development story
In my head, the API looks something like this:
[Workflow]
public sealed class UserSignupWorkflow
{
    // users, stripe, provisioner, email, and crm are assumed to be injected
    // services; the constructor is omitted for brevity.
    public async Task<SignupResult> Run(WorkflowContext ctx, SignupRequest request)
    {
        var user = await ctx.Step("create-user", () => users.Create(request.Email, request.Password));

        var charge = await ctx.Step(
            "charge-card",
            new StepOptions
            {
                Retries = 3,
                Backoff = TimeSpan.FromSeconds(5)
            },
            () => stripe.Charge(user.Id, request.PaymentToken));

        var tenant = await ctx.Step("provision-tenant", () => provisioner.CreateTenant(user.Id));

        await ctx.Parallel(
            ctx.Step("send-welcome-email", () => email.SendWelcome(user.Id)),
            ctx.Step("add-to-crm", () => crm.Add(user.Id))
        );

        await ctx.Sleep(TimeSpan.FromHours(24));

        var activated = await ctx.Step("check-activation", () =>
            users.IsActivated(user.Id));
        if (!activated)
        {
            await ctx.Step("send-activation-nudge", () =>
                email.SendActivationNudge(user.Id));
        }

        // Or instead, wait for a signal:
        // await ctx.WaitForSignal<UserActivated>();

        return new SignupResult(user.Id, tenant.Id);
    }
}
And starting it would be something boring:
await workflows.Start<UserSignupWorkflow>(new SignupRequest(...));
The runtime would persist each completed step.
If the app crashes after charge-card, the workflow resumes from the last recorded checkpoint. It should not charge the card again.
If the workflow sleeps for 24 hours, that sleep survives restarts and deploys.
If a step fails, retries are configured on that step.
If I need to wait for an external event, that should be a first-class thing too.
var activated = await ctx.WaitForSignal<UserActivated>(
    timeout: TimeSpan.FromDays(7));
Yes, I know this is hard
The moment you say “resume where it left off,” the sharp edges appear.
You have to think about:
- deterministic replay
- idempotency
- duplicate side effects
- workflow versioning
- serialization compatibility
- locking
- timers
- external events
- cancellation
- observability
- safe deployments
This is not magic.
The honest version is probably closer to:
Resume from the last durable checkpoint.
That wording matters.
If a card charge succeeds but the result is never recorded, you still have a hard distributed systems problem. No library can make side effects magically exactly-once.
But a good workflow runtime can make the common path much safer and much less hand-rolled.
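The usual way to narrow that gap is to make the side effect itself idempotent, for example by sending the same idempotency key on every attempt of a step (Stripe's API accepts one). The sketch below uses a fake in-memory provider to show the idea. `Charge`, the key format, and the counters are all hypothetical and not part of any real SDK.

```csharp
using System;
using System.Collections.Generic;

// Fake payment provider that remembers charges by idempotency key.
var chargesByKey = new Dictionary<string, string>();
var moneyMovements = 0;

string Charge(string idempotencyKey, int amountCents)
{
    // Duplicate request: return the original charge instead of charging again.
    if (chargesByKey.TryGetValue(idempotencyKey, out var existing))
        return existing;

    moneyMovements++; // stands in for actually moving amountCents
    var chargeId = $"ch_{moneyMovements}";
    chargesByKey[idempotencyKey] = chargeId;
    return chargeId;
}

// Derive the key from stable workflow identity (run + step name),
// not from a random value generated per attempt.
var stepKey = "signup-run-1/charge-card";

var first = Charge(stepKey, 4900); // succeeds, but imagine the process dies before recording it
var retry = Charge(stepKey, 4900); // the resumed workflow repeats the step with the same key

Console.WriteLine($"{first} {retry} movements={moneyMovements}");
// prints "ch_1 ch_1 movements=1"
```

This moves the deduplication to the provider, so it only helps for side effects whose target supports something like it.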
“Why not just use Temporal?”
This is the obvious answer.
And honestly, Temporal is probably the gold standard here.
If you are solving this problem at serious scale, or workflows are central to your business, Temporal is probably the thing to look at first.
But for many .NET teams, especially smaller teams, Temporal feels like a big jump.
You need to learn the programming model. You need to run or pay for the platform. You need to understand task queues, activities, workflows, signals, determinism, versioning, and the operational model.
That may be completely worth it.
But it is not the same level of adoption as:
dotnet add package Something.Workflows
and pointing it at the database you already use.
“Why not Azure Durable Functions?”
Azure Durable Functions is probably the closest thing in the .NET ecosystem.
Orchestrators, activities, durable timers, external events — the model is very close to what I want.
The issue is the runtime.
If your app already lives in Azure Functions, great.
But if your app is a normal ASP.NET Core service running in containers, App Service, or Kubernetes, adopting Durable Functions means changing how that part of your system is built and deployed.
What I keep wanting is basically:
The Durable Functions programming model without having to live inside the Functions runtime.
“What about Wolverine, MassTransit, NServiceBus?”
These are all valid answers depending on the problem.
Wolverine in particular is really interesting. Durable inbox/outbox, messaging, handlers, sagas, Postgres, SQL Server, Marten — it solves a lot of real problems.
MassTransit and NServiceBus also have mature saga patterns.
But sagas usually mean writing explicit state machines. That is powerful, but it is not the same thing as writing the business flow as a straight-line async method and letting the runtime persist progress between steps.
That is the category difference I keep coming back to.
Messaging frameworks solve messaging very well.
I am talking about durable execution.
The gap I think exists
Maybe I am wrong, but the gap I keep seeing is this:
Durable workflows for .NET apps that run inside a normal ASP.NET Core application, use boring storage like SQL Server or Postgres, and do not require adopting a separate workflow platform.
Not a Hangfire replacement.
Not a Temporal killer.
Not a whole new cloud platform.
Just a practical workflow layer for the kind of business processes a lot of SaaS apps eventually grow into.
Where I am with this
I have been thinking about building it.
Very early.
No big announcement. No funding. No dramatic launch.
Just trying to figure out whether this is a real pain or whether I am overfitting to my own projects.
The rough idea is called RelayFx for now, although I am not married to the name.
The thing I would want from an MVP:
- workflows written in C#
- step-level persistence
- durable timers
- external signals/events
- SQL Server and Postgres support
- local dashboard
- basic retries and backoff
- visibility into each workflow run
- safe-ish workflow versioning story
Hosted version later, maybe.
But first I want to know if the pain is real.
Questions for other .NET devs
If you have dealt with long-running workflows in .NET, I would love to hear what you used.
A few specific questions:
- Am I wrong that this is a gap?
- What do you use today when Hangfire is not enough?
- Have you used Temporal’s .NET SDK in production? How was it?
- Have you built this manually with jobs, tables, polling, and state columns?
- What hurt the most?
- Is there an obvious library I am missing?
I put up a small waitlist at relayfx.dev because I am seriously considering building this.
But mostly, I want the criticism.
If this already exists, I want to know.
If the idea is bad, I would rather find out now.
And if you have built this the hard way before, I really want to hear where it got painful.
Roast me.