tarik haddadi

Posted on Jun 22

The Hidden Architecture Behind AI SaaS: Lessons From Building an Enterprise Automation Platform

#ai #architecture #saas #softwareengineering

Building an AI-powered SaaS platform taught me something I underestimated at the beginning:

The hard part is not calling an LLM.

The hard part is making AI work inside a real business environment.

At first, everything looks manageable.

You think:

API keys are just generated secrets.
SSO is just connecting an identity provider.
Billing is just plugging in Stripe.
Monitoring is just adding dashboards.
Deployment is just Docker and Kubernetes.
AI is just calling OpenAI, Mistral, Anthropic or another provider.

Then the platform starts becoming real.

And every “simple” topic turns into an architectural system.

API Keys Are Not Just API Keys

At the beginning, an API key looks like this:

generate key
store hash
return secret once

But in a real SaaS environment, API keys quickly become:

scopes
expiration
revocation
tenant boundaries
audit logs
rate limits
privileged access
plan-based access
surface-level permissions

A key should not only answer:

Is this key valid?

It should answer:

Who owns this key?
Which tenant does it belong to?
Which resources can it access?
Which actions can it perform?
Which plan allows this action?
When does it expire?
Can it be revoked?
Can it be audited?

That is the moment you realize that API keys are part of your authorization model, not just your authentication model.
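To make the shift from "is this key valid?" to an authorization decision concrete, here is a minimal sketch. The record fields, the `issue_key` and `authorize` names, the 90-day default lifetime and the scope strings are all illustrative assumptions, not a description of any particular platform. Plan checks and audit logging are left out to keep it short.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ApiKeyRecord:
    key_hash: str          # only the hash is persisted, never the secret
    tenant_id: str
    owner_id: str
    scopes: frozenset
    expires_at: datetime
    revoked: bool = False


def hash_key(secret: str) -> str:
    # A fast hash is acceptable here only because the secret is high-entropy random data.
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def issue_key(tenant_id: str, owner_id: str, scopes, ttl_days: int = 90):
    secret = secrets.token_urlsafe(32)  # returned to the caller exactly once
    expires_at = datetime.now(timezone.utc) + timedelta(days=ttl_days)
    record = ApiKeyRecord(hash_key(secret), tenant_id, owner_id, frozenset(scopes), expires_at)
    return secret, record


def authorize(record: ApiKeyRecord, presented_secret: str, tenant_id: str, scope: str, now=None):
    """Return (allowed, reason) so that every decision can be written to an audit log."""
    now = now or datetime.now(timezone.utc)
    if not hmac.compare_digest(record.key_hash, hash_key(presented_secret)):
        return False, "invalid_key"
    if record.revoked:
        return False, "revoked"
    if now >= record.expires_at:
        return False, "expired"
    if record.tenant_id != tenant_id:
        return False, "wrong_tenant"
    if scope not in record.scopes:
        return False, "missing_scope"
    return True, "ok"
```

Returning a reason alongside the boolean is a deliberate choice in this sketch: the same value can feed the audit trail and the error response.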

SSO Is Not Just Login

SSO looks simple until you deal with tenants.

Connecting Keycloak, Google, Microsoft, or any OIDC provider is not the hardest part.

The hard part is deciding what you trust.

Do you trust the email domain?

Do you trust the subject claim?

Do you trust groups?

Do you trust roles coming from the external IdP?

Can a tenant admin map roles?

Can a tenant admin accidentally create a platform admin?

What happens when a user belongs to multiple tenants?

What happens when the token is valid, but issued for another audience?

Real SSO architecture becomes:

issuer validation
audience validation
nonce validation
PKCE
role mapping
tenant membership
identity authority
fallback prevention
session isolation
external IdP configuration
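A few of the checks listed above can be sketched as a single function that runs after the token signature has already been verified by an OIDC library. Everything named here (`TenantIdpConfig`, `resolve_identity`, the role names, the `groups` claim) is a hypothetical illustration. The claim names `iss`, `aud`, `nonce`, `exp` and `sub` are the standard ID token claims.

```python
import time
from dataclasses import dataclass, field

# Roles a tenant admin may hand out. Platform-level roles are deliberately absent,
# so an external group can never be mapped to platform admin.
ALLOWED_TENANT_ROLES = {"tenant_admin", "editor", "viewer"}


class IdentityRejected(Exception):
    pass


@dataclass(frozen=True)
class TenantIdpConfig:
    tenant_id: str
    issuer: str
    audience: str
    role_mapping: dict = field(default_factory=dict)  # external group -> tenant role


def resolve_identity(claims: dict, config: TenantIdpConfig, expected_nonce: str, now=None) -> dict:
    """Validate already signature-verified ID token claims for one tenant."""
    now = time.time() if now is None else now
    if claims.get("iss") != config.issuer:
        raise IdentityRejected("issuer mismatch")
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if config.audience not in audiences:
        raise IdentityRejected("audience mismatch")
    if claims.get("nonce") != expected_nonce:
        raise IdentityRejected("nonce mismatch")
    if claims.get("exp", 0) <= now:
        raise IdentityRejected("token expired")
    subject = claims.get("sub")
    if not subject:
        raise IdentityRejected("missing subject")

    roles = set()
    for group in claims.get("groups", []):
        role = config.role_mapping.get(group)
        if role in ALLOWED_TENANT_ROLES:  # silently drop anything outside the allow-list
            roles.add(role)

    # The identity is keyed by issuer + subject, never by email alone,
    # and is always scoped to the tenant that owns this IdP configuration.
    return {"subject": f"{config.issuer}|{subject}", "tenant_id": config.tenant_id, "roles": sorted(roles)}
```

The allow-list is what answers the "can a tenant admin accidentally create a platform admin?" question in this sketch: the mapping can say anything, but only tenant-scoped roles survive.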

In enterprise SaaS, login is only the entry point.

The real question is:

Can identity be trusted across users, tenants, projects and execution contexts?

AI Usage Is Not Just Calling a Model

Calling a model is easy.

Operating AI is not.

Once AI becomes part of a product, you need to think about:

token consumption
cost visibility
provider usage
model usage
rate limits
latency
retries
timeouts
fallbacks
tool calls
traceability
prompt governance
data boundaries

For a demo, a model response is enough.

For a business platform, you need to answer:

Which tenant used the model?
Which workflow triggered it?
Which user started the execution?
Which provider was used?
How many tokens were consumed?
How much did it cost?
Was the output reviewed?
Can the result be traced?
Can the process be repeated?

That is where AI stops being a feature and becomes an operational system.
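One way to make those questions answerable is to record a usage event for every model call and derive cost from it. The sketch below is only an illustration: the `UsageEvent` fields, the provider and model names, and the prices per 1,000 tokens are made-up assumptions, not real provider pricing.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical price table: (provider, model) -> (input price, output price) per 1,000 tokens.
PRICES = {
    ("provider-a", "model-small"): (Decimal("0.0005"), Decimal("0.0015")),
    ("provider-b", "model-large"): (Decimal("0.0030"), Decimal("0.0150")),
}


@dataclass(frozen=True)
class UsageEvent:
    tenant_id: str
    workflow_id: str
    user_id: str
    provider: str
    model: str
    input_tokens: int
    output_tokens: int


def cost_of(event: UsageEvent) -> Decimal:
    input_price, output_price = PRICES[(event.provider, event.model)]
    return (event.input_tokens * input_price + event.output_tokens * output_price) / 1000


def cost_by_tenant(events) -> dict:
    totals = defaultdict(Decimal)  # Decimal() is zero
    for event in events:
        totals[event.tenant_id] += cost_of(event)
    return dict(totals)
```

`Decimal` is used instead of floats so that small per-call amounts do not drift when summed across many events.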

Billing Is Not Just Stripe

Stripe can process payments.

But Stripe does not define your product model for you.

A serious SaaS needs to connect billing to:

plans
quotas
capabilities
feature gates
tenant limits
token limits
storage limits
execution limits
subscription status
license keys
deployment mode

If your product can be deployed as:

managed SaaS
customer cloud
on-prem
BYOC

then billing becomes more than payment.

It becomes commercial governance.

The system needs to understand:

What is the customer allowed to use?
Where is the product deployed?
Is the subscription active?
Is this an enterprise contract?
Is Stripe even involved?
Is there a license key?
What happens when quotas are exceeded?

This is where pricing, architecture and runtime enforcement meet.
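As a rough sketch of runtime enforcement, the check below combines plan, deployment mode and commercial status into one decision. The rule that managed SaaS relies on subscription status while every other mode relies on a license key is an assumption made for the example, as are all class, field and feature names.

```python
from dataclasses import dataclass
from enum import Enum


class DeploymentMode(Enum):
    MANAGED_SAAS = "managed_saas"
    CUSTOMER_CLOUD = "customer_cloud"
    ON_PREM = "on_prem"
    BYOC = "byoc"


@dataclass(frozen=True)
class Plan:
    name: str
    features: frozenset
    monthly_token_limit: int


@dataclass(frozen=True)
class Entitlement:
    plan: Plan
    mode: DeploymentMode
    subscription_active: bool = False  # synced from the payment provider
    license_valid: bool = False        # result of a license key check


def is_commercially_active(ent: Entitlement) -> bool:
    # Assumption: only managed SaaS goes through the payment provider.
    if ent.mode is DeploymentMode.MANAGED_SAAS:
        return ent.subscription_active
    return ent.license_valid


def check(ent: Entitlement, feature: str, tokens_used: int, tokens_requested: int):
    """Return (allowed, reason); meant to be called by the backend before executing."""
    if not is_commercially_active(ent):
        return False, "inactive"
    if feature not in ent.plan.features:
        return False, "feature_not_in_plan"
    if tokens_used + tokens_requested > ent.plan.monthly_token_limit:
        return False, "quota_exceeded"
    return True, "ok"
```

What happens on `quota_exceeded` (hard stop, soft warning, overage billing) is a product decision that this sketch leaves open.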

Kubernetes Does Not Automatically Mean Scalable

Using Kubernetes does not automatically make a platform scalable.

A real execution platform needs to think about workloads.

Some jobs are lightweight.

Some jobs run AI calls.

Some jobs process files.

Some jobs generate documents.

Some jobs ingest knowledge.

Some jobs run long workflows.

That means you start separating:

queues
workers
lanes
timeouts
resource limits
probes
autoscaling
storage
network policies
observability

At some point, “deployment” becomes an execution architecture.

You need to know:

Which queue is saturated?
Which worker is failing?
Which jobs are delayed?
Which execution lane is overloaded?
Which process consumes the most memory?
Which tenant creates the most load?

Without that visibility, scaling is mostly guessing.
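The idea of separating workloads into lanes, each with its own limits, can be sketched in memory as follows. The lane names, timeouts, depth limits and the job-kind mapping are hypothetical values chosen for illustration. A real platform would back this with an actual queueing system rather than Python deques.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str
    timeout_s: int   # per-job timeout the worker should enforce
    max_depth: int   # beyond this, the lane is treated as saturated


# Hypothetical lanes and limits.
LANES = {
    "light": Lane("light", timeout_s=30, max_depth=1000),
    "ai": Lane("ai", timeout_s=120, max_depth=200),
    "files": Lane("files", timeout_s=600, max_depth=100),
    "long": Lane("long", timeout_s=3600, max_depth=50),
}

JOB_KIND_TO_LANE = {
    "webhook": "light",
    "llm_call": "ai",
    "file_processing": "files",
    "document_generation": "files",
    "knowledge_ingest": "files",
    "workflow": "long",
}


class Dispatcher:
    def __init__(self):
        self.queues = {name: deque() for name in LANES}

    def enqueue(self, tenant_id: str, kind: str):
        """Route a job to its lane; return None when the lane is saturated."""
        lane = LANES[JOB_KIND_TO_LANE[kind]]
        queue = self.queues[lane.name]
        if len(queue) >= lane.max_depth:
            return None  # caller applies backpressure instead of piling up work
        queue.append((tenant_id, kind))
        return lane

    def saturation(self) -> dict:
        # Fraction of each lane's capacity in use: "which lane is overloaded?"
        return {name: len(q) / LANES[name].max_depth for name, q in self.queues.items()}

    def load_by_tenant(self) -> Counter:
        # Queued jobs per tenant: "which tenant creates the most load?"
        return Counter(tenant for q in self.queues.values() for tenant, _ in q)
```

Keeping lanes separate means a burst of long file jobs fills its own queue without delaying lightweight work.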

Observability Is Not Optional

When automation becomes part of business operations, monitoring is not a technical bonus.

It is part of the product.

You need metrics for:

queue depth
execution success rate
execution failures
average duration
P95 / P99 latency
AI token usage
provider usage
storage usage
auth failures
webhook failures
backup status
SLA / SLO
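As a small worked example, several of these metrics can be derived from raw execution records. The sketch below assumes each record is a `(duration_seconds, succeeded)` pair and uses the nearest-rank method for percentiles, which is one common definition among several. The function names are illustrative.

```python
def percentile(values, pct: int):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def summarize(executions) -> dict:
    """executions: non-empty iterable of (duration_seconds, succeeded) pairs."""
    records = list(executions)
    durations = [duration for duration, _ in records]
    successes = sum(1 for _, ok in records if ok)
    return {
        "success_rate": successes / len(records),
        "failures": len(records) - successes,
        "average_duration": sum(durations) / len(durations),
        "p95": percentile(durations, 95),
        "p99": percentile(durations, 99),
    }
```

In production these numbers usually come from a metrics system's histograms rather than from raw lists, but the definitions are the same.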

For engineers, observability answers:

What is broken?

For leadership, observability answers:

Where is value created?
Where is time saved?
Where is cost increasing?
Which process is failing?
Which team is adopting the platform?

That is a different level of visibility.

Configuration Eventually Becomes a Product Surface

At first, environment variables are enough.

Then customers ask for different settings.

Different providers.

Different limits.

Different identity configurations.

Different storage.

Different security policies.

Different integrations.

Different deployment models.

And suddenly, redeploying for every change becomes unacceptable.

That is when configuration needs to move into an admin surface.

Not everything should be editable from the UI, of course.

But a serious platform needs to distinguish:

bootstrap configuration
runtime configuration
tenant configuration
secret-backed configuration
platform-managed configuration
customer-managed configuration

The more enterprise your product becomes, the more your back office becomes part of the product itself.
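The distinction between configuration layers can be sketched with a simple lookup chain in which tenant settings override platform runtime settings, which in turn override bootstrap values. The key names and the rule that bootstrap keys can never be overridden by a tenant are assumptions for illustration.

```python
from collections import ChainMap

# Hypothetical keys that are fixed at deploy time and must never be tenant-editable.
BOOTSTRAP_ONLY_KEYS = frozenset({"database_url", "secret_store_url"})


def effective_config(bootstrap: dict, platform_runtime: dict, tenant_overrides: dict) -> ChainMap:
    """Resolve configuration with precedence tenant > platform runtime > bootstrap."""
    illegal = BOOTSTRAP_ONLY_KEYS.intersection(tenant_overrides)
    if illegal:
        raise ValueError(f"tenant may not override: {sorted(illegal)}")
    # ChainMap looks keys up left to right, so the first mapping wins.
    return ChainMap(tenant_overrides, platform_runtime, bootstrap)


bootstrap = {"database_url": "postgres://db.internal/app", "llm_provider": "provider-a"}
platform_runtime = {"max_tokens_per_run": 4000}
tenant_overrides = {"llm_provider": "provider-b"}

config = effective_config(bootstrap, platform_runtime, tenant_overrides)
```

Here `config["llm_provider"]` resolves to the tenant's choice, while `config["database_url"]` still comes from bootstrap and cannot be changed from an admin surface.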

The Real SaaS + AI Correlation

The biggest lesson is that these systems cannot be designed in isolation.

They are connected.

Business model ↔ Plans
Plans ↔ Capabilities
Capabilities ↔ Roles
Roles ↔ Access control
Access control ↔ Security
Security ↔ Trust
AI usage ↔ Cost visibility
Workflows ↔ Measurable outcomes
Infrastructure ↔ Reliability
Observability ↔ Better decisions

If one part is weak, the entire platform becomes harder to operate.

A workflow engine without governance becomes risky.

AI without metering becomes expensive.

SSO without tenant isolation becomes dangerous.

Kubernetes without observability becomes blind.

Billing without runtime enforcement becomes cosmetic.

Admin features without backend enforcement become security theater.

The Real Questions

For CEOs, the question is not only:

Can we use AI?

It is:

Can we recover operational capacity, measure the impact and scale it safely?

For CTOs, the question is not only:

Can we build this?

It is:

Can we govern it, secure it, deploy it, monitor it and maintain it across real environments?

For Heads of AI, the question is not only:

Which model should we use?

It is:

How do we turn AI from isolated experiments into controlled business execution?

Final Thought

The hardest part of building AI SaaS is not the prompt.

It is not the first demo.

It is not the first integration.

The hard part is making identity, data, permissions, costs, infrastructure, workflows, observability and user experience move together.

That is where AI becomes enterprise-ready.

And that is where SaaS architecture becomes a serious discipline.

Top comments (1)

Lolo • Jun 30

This matches exactly what pushed me to build a gateway instead of wiring providers directly, especially the token consumption / cost visibility / provider usage section. Once you have multiple providers, per-token billing becomes unmanageable to reason about. I went with credit-based pricing instead, so cost visibility is just 'credits remaining,' with no need to track per-provider token math. It solves a fraction of what you're describing, but that piece alone justified building it.