Saqueib Ansari

Posted on Apr 28 • Originally published at qcode.in

Laravel tenant onboarding works better as a workflow than a controller action

#laravel #multitenancy #queues #architecture

Creating a tenant in Laravel looks simple when the demo path is just Tenant::create() followed by a redirect. That illusion lasts right up until onboarding starts touching billing, custom domains, role assignment, workspace defaults, seed data, email, and audit logs that all succeed or fail on different timelines.

That is the moment when “create tenant” stops being a CRUD action and becomes a workflow.

I think teams get this wrong because the first version often works fine inside one controller action. You validate the request, create a tenant row, maybe create an owner user, maybe dispatch a couple of jobs, and call it done. Then the product grows. Provisioning gets slower. External systems get involved. One step succeeds, another times out, a third retries twice, and suddenly you have half-created accounts sitting in production with no trustworthy story for recovery.

The practical fix is to stop treating tenant onboarding like a single request-response event. Model it as a tracked workflow with explicit steps, state transitions, retries, failure handling, and operator visibility.

That is the real lesson behind a strong Laravel tenant onboarding workflow: partial success is not an edge case. It is the default shape of real provisioning. If you do not design for that, operational debt starts on day one.

The controller-action version works until provisioning becomes distributed

A lot of Laravel SaaS apps start here, because it is the most obvious implementation.

public function store(CreateTenantRequest $request)
{
    $tenant = Tenant::create([
        'name' => $request->string('name'),
        'slug' => $request->string('slug'),
    ]);

    $owner = User::create([
        'tenant_id' => $tenant->id,
        'name' => $request->string('owner_name'),
        'email' => $request->string('owner_email'),
    ]);

    $owner->assignRole('owner');

    SeedTenantDefaults::dispatch($tenant->id);
    SendWelcomeEmail::dispatch($owner->id);

    return response()->json([
        'tenant_id' => $tenant->id,
        'status' => 'created',
    ], 201);
}

There is nothing inherently wrong with this when onboarding is tiny, synchronous, and fully local.

The problem is that onboarding almost never stays that small.

Very quickly, tenant creation starts involving things like:

provisioning a billing customer
creating a subscription or trial
reserving or validating a domain
attaching feature flags or plans
generating default roles and permissions
seeding templates, settings, and starter content
sending invitation or verification email
writing audit events
notifying internal systems or analytics pipelines

At that point, your controller is no longer “creating a tenant.” It is kicking off a distributed set of operations with different latency, failure, and retry characteristics.

What breaks first

The first failure is usually not catastrophic. It is annoying.

The tenant row exists, but billing setup failed.

Or the billing customer exists, but the domain record did not get created.

Or the seed job partly ran, then the welcome email retried three times, then the admin UI says the workspace exists even though the owner never received access.

None of those failures is rare. They are exactly how real systems fail.

Why this becomes operational debt fast

If onboarding is modeled as one controller action plus a few detached jobs, you usually lose three important things:

a reliable source of truth for current onboarding state
a clean way to retry only the failed step
operator visibility into what already happened and what should happen next

That is how half-created tenants turn into support tickets, manual scripts, and “just run this SQL plus artisan command” cleanup rituals.

A workflow model gives you a place to store reality

The first real improvement is conceptual, not technical: treat onboarding as an entity with state, not as a side effect of tenant creation.

Instead of “we created a tenant,” think in terms of:

an onboarding attempt started
specific provisioning steps were scheduled
some steps completed
some are waiting
some failed
the workflow is either completed, retryable, blocked, or canceled

That means you usually want a persistent onboarding record.

Schema::create('tenant_onboardings', function (Blueprint $table) {
    $table->id();
    $table->foreignId('tenant_id')->nullable()->constrained();
    $table->string('status');
    $table->string('requested_by_email');
    $table->json('input');
    $table->timestamp('started_at')->nullable();
    $table->timestamp('completed_at')->nullable();
    $table->timestamp('failed_at')->nullable();
    $table->text('failure_reason')->nullable();
    $table->timestamps();
});

This record is not busywork. It gives your system a place to store the actual story of provisioning.

What that record should answer

At minimum, your onboarding model should let you answer:

who requested the tenant
which tenant, if any, has already been created
what status the onboarding is in right now
which step failed last
whether the workflow is safe to retry
when onboarding completed or failed

Without that, every downstream job is making local decisions without a shared control plane.

Status should be explicit, not inferred from side effects

A common mistake is to infer onboarding status from the presence of rows elsewhere:

if tenant exists, onboarding succeeded
if subscription exists, billing step succeeded
if domain exists, DNS step succeeded

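That looks clever and quickly becomes messy: a row's presence shows that a write happened, not whether the step finished, is still in flight, or is safe to retry.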

You want explicit workflow state instead:

pending
running
awaiting_external_confirmation
failed_retryable
failed_manual_review
completed

Those statuses communicate intent much better than scattered inference from ten other tables.
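One way to keep these statuses explicit in code is a backed enum. This is a minimal sketch assuming PHP 8.1 or later; the enum name OnboardingStatus and its helper methods are illustrative and not taken from any package.

```php
<?php

declare(strict_types=1);

// Hypothetical enum: the case values mirror the statuses listed above.
enum OnboardingStatus: string
{
    case Pending = 'pending';
    case Running = 'running';
    case AwaitingExternalConfirmation = 'awaiting_external_confirmation';
    case FailedRetryable = 'failed_retryable';
    case FailedManualReview = 'failed_manual_review';
    case Completed = 'completed';

    // True once the workflow has nothing left to do.
    public function isTerminal(): bool
    {
        return $this === self::Completed;
    }

    // True when the system may retry without a human involved.
    public function canAutoRetry(): bool
    {
        return $this === self::FailedRetryable;
    }

    // True when an operator has to look at the onboarding.
    public function needsOperator(): bool
    {
        return $this === self::FailedManualReview;
    }
}
```

If the status column is cast to an enum like this on the Eloquent model, an unknown status string fails loudly when the model is read instead of slipping through as an arbitrary value.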

Break onboarding into tracked steps with different failure semantics

This is where the design gets real. Not every onboarding step behaves the same way, so do not model them as if they do.

Some steps are transactional and local. Some are asynchronous and remote. Some can be retried safely. Some should never be repeated blindly.

A strong Laravel tenant onboarding workflow splits steps according to those realities.

A useful step breakdown

For a typical SaaS app, onboarding may look something like this:

create tenant record
create owner account
attach plan or trial
provision billing customer
seed default workspace data
assign default roles and permissions
configure domain or subdomain
send onboarding email
emit audit and analytics events
mark onboarding complete

That does not mean everything must run serially. It means every step should be named, tracked, and reasoned about explicitly.

Not all failures deserve the same status

This is where teams often stay too naive.

If sending a welcome email fails, should onboarding be marked failed? Maybe not.

If billing customer creation fails, should the tenant still be considered active? Often no.

If domain verification is pending on user DNS changes, is that a failure? Definitely not.

That means each step should carry its own completion and blocking semantics.

A practical step model

Schema::create('tenant_onboarding_steps', function (Blueprint $table) {
    $table->id();
    $table->foreignId('tenant_onboarding_id')->constrained();
    $table->string('step');
    $table->string('status');
    $table->unsignedInteger('attempts')->default(0);
    $table->timestamp('started_at')->nullable();
    $table->timestamp('completed_at')->nullable();
    $table->timestamp('failed_at')->nullable();
    $table->text('last_error')->nullable();
    $table->json('meta')->nullable();
    $table->timestamps();
});

Now you can track step-level state without pretending the whole workflow is one binary success/failure event.

The right execution model is orchestration, not controller glue

Once onboarding becomes a workflow, you need something to orchestrate it.

That does not require a huge workflow engine on day one, but it does require more than a controller dispatching unrelated jobs and hoping for the best.

The orchestration layer should decide:

which step runs next
which steps can run in parallel
what counts as blocking
when to retry
when to stop and escalate
when the workflow is complete

A simple application service is a good start

You can start with a focused coordinator class.

final class StartTenantOnboarding
{
    public function handle(array $input): TenantOnboarding
    {
        $onboarding = TenantOnboarding::create([
            'status' => 'pending',
            'requested_by_email' => $input['owner_email'],
            'input' => $input,
            'started_at' => now(),
        ]);

        RunTenantOnboardingWorkflow::dispatch($onboarding->id);

        return $onboarding;
    }
}

Then let the workflow runner manage step progression.

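use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

final class RunTenantOnboardingWorkflow implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Only the ID is carried on the queue, so each run can load fresh state.
    public function __construct(public int $onboardingId) {}

    public function handle(TenantOnboardingCoordinator $coordinator): void
    {
        // advance() must be safe to call repeatedly for the same onboarding.
        $coordinator->advance($this->onboardingId);
    }
}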

This is already better than stuffing everything into a controller, because orchestration now has a home.

The coordinator should be idempotent

This matters a lot.

Queue retries, duplicate dispatches, and partial step completion will happen. Your coordinator should be safe to re-enter.

That usually means:

checking current workflow state before acting
skipping already completed steps
using unique constraints or step markers to prevent duplicate side effects
making external provisioning calls idempotent where possible

If the workflow runner is not idempotent, retries become dangerous instead of helpful.
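The re-entry rule can be sketched without any framework code. The example below is a simplified, in-memory stand-in: the function name advanceOnboarding and the array-based state are assumptions for illustration, and in a real app the state would live in tenant_onboarding_steps.

```php
<?php

declare(strict_types=1);

/**
 * Runs pending steps in order and skips anything already completed.
 *
 * @param array<string, string>   $state step name => status
 * @param array<string, callable> $steps step name => callable performing the side effect
 * @return array<string, string>  updated state
 */
function advanceOnboarding(array $state, array $steps): array
{
    foreach ($steps as $name => $run) {
        if (($state[$name] ?? 'pending') === 'completed') {
            continue; // already done: re-entry must not repeat the side effect
        }

        try {
            $run();
            $state[$name] = 'completed';
        } catch (RuntimeException $e) {
            $state[$name] = 'failed_retryable';
            break; // stop here; later steps wait for the next run
        }
    }

    return $state;
}
```

Calling the function a second time with the returned state only runs the step that failed and anything after it. Completed steps are never executed twice, which is the property queue retries depend on.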

Treat external systems as eventually successful, eventually failed, or eventually manual

This is where onboarding designs often become unrealistic. Teams assume external steps behave like local method calls.

They do not.

Billing, domains, email, and third-party provisioning each have different kinds of uncertainty. A clean workflow acknowledges that.

Three external outcomes you should model

For most external onboarding steps, the result is not just success or failure. It is usually one of these:

completed: the external system confirmed the action
retryable failure: the step failed in a way that may succeed later
waiting/manual: the step cannot proceed automatically yet

Domain onboarding is a perfect example.

You may create a domain record successfully, but actual verification depends on DNS changes the customer has not made yet. That is not a failed workflow. It is a workflow waiting on external action.

Example: billing plus domain steps

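final class ProvisionBillingCustomerStep
{
    // BillingGateway is a placeholder name for whatever wraps your billing provider.
    public function __construct(private BillingGateway $billing) {}

    public function handle(TenantOnboarding $onboarding): StepResult
    {
        try {
            // Assumes the `input` attribute is cast to an array on the model.
            $customerId = $this->billing->createCustomer([
                'email' => $onboarding->input['owner_email'],
                'tenant_name' => $onboarding->input['tenant_name'],
            ]);

            // Assumes the tenant row was already created by an earlier step.
            $onboarding->tenant->update(['billing_customer_id' => $customerId]);

            return StepResult::completed();
        } catch (TemporaryProviderException $e) {
            // Transient provider problem: let the workflow retry later.
            return StepResult::retryable($e->getMessage());
        } catch (PermanentProviderException $e) {
            // Retrying will not help: hand it to an operator.
            return StepResult::manualReview($e->getMessage());
        }
    }
}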

That is a much more useful contract than just throwing exceptions and letting queue retries guess what to do.
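The StepResult type used above is not defined in this post, so here is one possible shape for it, together with a domain step that returns a waiting result instead of a failure. Everything in this sketch is an assumption for illustration: the outcome strings, the waiting() factory, the ConfigureDomainStep class, and the closure that stands in for a real DNS check.

```php
<?php

declare(strict_types=1);

// Hypothetical value object describing how a step ended.
final class StepResult
{
    private function __construct(
        public readonly string $outcome,
        public readonly ?string $message = null,
    ) {}

    public static function completed(): self
    {
        return new self('completed');
    }

    public static function retryable(string $message): self
    {
        return new self('failed_retryable', $message);
    }

    public static function manualReview(string $message): self
    {
        return new self('failed_manual_review', $message);
    }

    public static function waiting(string $message): self
    {
        return new self('awaiting_external_confirmation', $message);
    }
}

// Hypothetical domain step: $dnsVerified stands in for a real DNS lookup.
final class ConfigureDomainStep
{
    public function __construct(private \Closure $dnsVerified) {}

    public function handle(string $domain): StepResult
    {
        // Unverified DNS is a waiting state, not a failure.
        return ($this->dnsVerified)($domain)
            ? StepResult::completed()
            : StepResult::waiting("DNS for {$domain} is not verified yet");
    }
}
```

With a result type like this, the coordinator can map each outcome to a step status directly rather than interpreting exception classes after the fact.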

Manual review is not architectural failure

Teams sometimes resist explicit manual-review states because they want the workflow to feel “fully automated.” That is fantasy for many real onboarding systems.

If a tax configuration mismatch, billing fraud check, or domain verification issue requires human intervention, model that honestly.

A system that says “manual review needed” is much healthier than one that keeps retrying a hopeless step until the logs become noise.

The case-study lesson: partial success needs recovery paths, not blame

This is the part most teams only learn after they get burned.

Imagine this realistic onboarding path:

tenant row created
owner account created
seed data succeeded
billing customer creation timed out after provider-side success
retry is unsafe because a second customer may be created
domain step never started because billing is considered blocking
support sees a tenant that “exists” but cannot tell whether onboarding is safe to resume

That is not a weird edge case. It is exactly the kind of case that happens once onboarding touches remote systems.

What a good workflow lets you do here

A good workflow model lets you:

inspect exact completed and incomplete steps
confirm whether billing customer creation is idempotent
rerun only the blocked step
avoid reseeding or recreating the tenant
leave an audit trail of who resumed what and why

That is the difference between workflow-based onboarding and controller-based onboarding.
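For the billing timeout in the scenario above, one common mitigation is a deterministic idempotency key derived from the onboarding and step. The sketch below assumes the billing provider deduplicates requests by such a key, which you would need to confirm for your provider. The idempotencyKey helper and FakeBillingProvider class are illustrative stand-ins, not a real SDK.

```php
<?php

declare(strict_types=1);

// The same onboarding and step always yield the same key.
function idempotencyKey(int $onboardingId, string $step): string
{
    return hash('sha256', "tenant-onboarding:{$onboardingId}:{$step}");
}

// In-memory stand-in for a provider that deduplicates by key (an assumption).
final class FakeBillingProvider
{
    /** @var array<string, array{id: string, email: string}> */
    private array $customersByKey = [];

    public function createCustomer(string $email, string $key): string
    {
        // A repeated key returns the original customer instead of creating another.
        if (!isset($this->customersByKey[$key])) {
            $this->customersByKey[$key] = [
                'id' => 'cus_' . (count($this->customersByKey) + 1),
                'email' => $email,
            ];
        }

        return $this->customersByKey[$key]['id'];
    }

    public function customerCount(): int
    {
        return count($this->customersByKey);
    }
}
```

If the provider offers no such guarantee, the safer path is the one described above: mark the step for operator decision and look up the customer before creating anything.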

Recovery should be designed before production pain forces it

Every onboarding step should have one of these answers:

safe to retry automatically
safe to retry manually
must not retry; requires operator decision
compensatable by rollback

If your system cannot answer that for each step, it is not really production-ready onboarding.
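A lightweight way to enforce that rule is to declare a policy per step and fail a check when any step lacks one. This is a sketch under assumptions: the RetryPolicy enum and the stepsWithoutPolicy helper are hypothetical, and the step names are examples only.

```php
<?php

declare(strict_types=1);

// Hypothetical enum mirroring the four answers listed above.
enum RetryPolicy: string
{
    case AutoRetry = 'auto_retry';
    case ManualRetry = 'manual_retry';
    case OperatorDecision = 'operator_decision';
    case Compensate = 'compensate';
}

/**
 * Returns the steps that have no declared policy.
 *
 * @param list<string>               $steps
 * @param array<string, RetryPolicy> $policies step name => policy
 * @return list<string>
 */
function stepsWithoutPolicy(array $steps, array $policies): array
{
    return array_values(array_filter(
        $steps,
        fn (string $step): bool => !isset($policies[$step]),
    ));
}
```

Running a check like this in a test keeps new steps from shipping without an explicit recovery answer.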

Operator visibility is part of the product, not an afterthought

If onboarding can fail partially, someone needs to see where and why.

This is why I strongly recommend building at least a minimal internal onboarding status view early.

What operators should be able to see

A useful admin screen for onboarding should show:

tenant name and requested owner
current workflow status
each step with status and last attempt
last error message per failed step
whether automatic retry is pending
whether manual action is required
audit notes or resume history

That screen is often more valuable than clever internal abstractions, because it reduces panic when onboarding fails in production.

A small response shape for internal status APIs

{
  "onboarding_id": 481,
  "tenant_id": 102,
  "status": "failed_retryable",
  "steps": [
    {"step": "create_tenant", "status": "completed"},
    {"step": "create_owner", "status": "completed"},
    {"step": "provision_billing_customer", "status": "failed_retryable", "last_error": "timeout from provider"},
    {"step": "seed_defaults", "status": "completed"},
    {"step": "configure_domain", "status": "pending"}
  ]
}

That tells the truth in seconds. Logs alone do not.
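A response like the one above can be assembled from the workflow row and its step rows with very little code. The sketch below uses plain arrays in place of Eloquent models; the function name onboardingStatusPayload and the input shape are assumptions for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Builds the internal status payload from workflow and step data.
 *
 * @param list<array{step: string, status: string, last_error?: ?string}> $steps
 * @return array<string, mixed>
 */
function onboardingStatusPayload(int $onboardingId, ?int $tenantId, string $status, array $steps): array
{
    return [
        'onboarding_id' => $onboardingId,
        'tenant_id' => $tenantId,
        'status' => $status,
        'steps' => array_map(
            // Drop null fields so only failed steps carry last_error.
            fn (array $s): array => array_filter(
                [
                    'step' => $s['step'],
                    'status' => $s['status'],
                    'last_error' => $s['last_error'] ?? null,
                ],
                fn ($value): bool => $value !== null,
            ),
            $steps,
        ),
    ];
}
```

In a Laravel controller, the returned array could be passed straight to response()->json().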

Keep the workflow strict about what “complete” means

This is an easy place to get sloppy.

Teams sometimes mark onboarding complete as soon as the tenant can technically log in. That may be fine for some products. For others, it creates long-lived half-configured accounts that look active but are missing critical setup.

Completion should match product reality.

Define blocking vs non-blocking steps clearly

For example, you might decide:

Blocking before complete:

tenant record created
owner account created
billing customer provisioned
required roles created
minimum seed data installed

Non-blocking after complete:

welcome email sent
analytics event delivered
optional templates imported
custom domain verified

That is a product decision as much as a technical one.

If you do not define it clearly, engineers will each make their own assumption and the workflow will become inconsistent over time.
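Writing the blocking list down in code is one way to keep that decision in a single place. This is a minimal sketch: the BLOCKING_STEPS constant, the step names, and the isOnboardingComplete function are illustrative and should follow whatever your product actually decides.

```php
<?php

declare(strict_types=1);

// Hypothetical list of steps that must finish before onboarding counts as complete.
const BLOCKING_STEPS = [
    'create_tenant',
    'create_owner',
    'provision_billing_customer',
    'create_roles',
    'seed_defaults',
];

/**
 * @param array<string, string> $stepStatuses step name => status
 */
function isOnboardingComplete(array $stepStatuses): bool
{
    foreach (BLOCKING_STEPS as $step) {
        // A missing step row is treated the same as a pending one.
        if (($stepStatuses[$step] ?? 'pending') !== 'completed') {
            return false;
        }
    }

    // Non-blocking steps (email, analytics, custom domain) are ignored here.
    return true;
}
```

A pending welcome email does not hold completion back under this rule, while a failed billing step does.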

Completion should be auditable

When onboarding changes a customer’s ability to access paid product features, completion should leave an audit trail.

You want to know:

when the workflow completed
which version of the workflow logic ran
whether completion was automatic or operator-assisted
what non-blocking steps were still pending

This becomes especially important in B2B SaaS products where support, billing, and success teams all care about the same tenant lifecycle.

A practical Laravel implementation path that is strong without being overbuilt

You do not need a heavyweight orchestration platform immediately. You do need more structure than controller glue and background hope.

A practical setup looks like this:

Start with these building blocks

tenant_onboardings table for workflow-level state
tenant_onboarding_steps table for step-level tracking
a coordinator class to advance the workflow
one job that re-enters the coordinator safely
step classes with explicit result types
internal admin visibility for inspection and retry

That gives you most of the value early.

Add these next if complexity grows

As onboarding expands, add:

step dependency rules
retry backoff policies per step type
workflow versioning when steps change over time
webhook or polling completion hooks for external systems
operator controls for resume, skip, or cancel
alerting when workflows remain stuck too long

This is a better growth path than jumping straight from a controller action to a giant workflow engine nobody understands.
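As an example of one item from that list, per-step-type retry backoff can begin as a small pure function. The sketch below uses capped exponential backoff; the step types, base delays, and caps are made-up values for illustration, and backoffSeconds is a hypothetical helper rather than a framework API.

```php
<?php

declare(strict_types=1);

// Returns the delay in seconds before the given attempt (1-based) is retried.
function backoffSeconds(string $stepType, int $attempt): int
{
    // Illustrative base delays and caps per step type, in seconds.
    $policies = [
        'external' => ['base' => 30, 'cap' => 900],
        'local' => ['base' => 5, 'cap' => 60],
    ];
    $policy = $policies[$stepType] ?? ['base' => 10, 'cap' => 300];

    // Double the delay on each attempt, but never exceed the cap.
    $delay = $policy['base'] * 2 ** max(0, $attempt - 1);

    return (int) min($policy['cap'], $delay);
}
```

Laravel queued jobs can define their own backoff, so a function like this could feed that value once the job knows which step it is running.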

Do not push domain logic into the controller layer

Keep the controller tiny.

public function store(CreateTenantRequest $request, StartTenantOnboarding $start)
{
    $onboarding = $start->handle($request->validated());

    return response()->json([
        'onboarding_id' => $onboarding->id,
        'status' => $onboarding->status,
    ], 202);
}

That 202 Accepted is meaningful. It tells the truth: onboarding has started, not finished.

That is already a healthier contract than returning 201 Created and pretending the whole system is done.

The rule of thumb that saves pain later

Tenant onboarding in Laravel should feel less like “create a record” and more like “run a tracked provisioning process.”

That shift sounds heavier, but it is actually what keeps the system simpler once the product becomes real.

If you want one practical rule, use this:

The moment tenant creation touches more than one asynchronous or externally dependent step, stop modeling it as a controller action.

Model it as a workflow with explicit state, tracked steps, retries, and operator visibility.

Because provisioning rarely fails all at once. It fails halfway. And if your system has no durable story for halfway, onboarding debt starts accumulating immediately.

Read the full post on QCode: https://qcode.in/7-laravel-tenant-onboarding-should-be-a-workflow-not-a-controller-action/
