
tarik haddadi


The Hidden Architecture Behind AI SaaS: Lessons From Building an Enterprise Automation Platform

Building an AI-powered SaaS platform taught me something I underestimated at the beginning:

The hard part is not calling an LLM.

The hard part is making AI work inside a real business environment.

At first, everything looks manageable.

You think:

  • API keys are just generated secrets.
  • SSO is just connecting an identity provider.
  • Billing is just plugging in Stripe.
  • Monitoring is just adding dashboards.
  • Deployment is just Docker and Kubernetes.
  • AI is just calling OpenAI, Mistral, Anthropic or another provider.

Then the platform starts becoming real.

And every “simple” topic turns into an architectural system.

API Keys Are Not Just API Keys

At the beginning, an API key looks like this:

generate key
store hash
return secret once

But in a real SaaS environment, API keys quickly become:

scopes
expiration
revocation
tenant boundaries
audit logs
rate limits
privileged access
plan-based access
surface-level permissions

A key should not only answer:

Is this key valid?

It should answer:

Who owns this key?
Which tenant does it belong to?
Which resources can it access?
Which actions can it perform?
Which plan allows this action?
When does it expire?
Can it be revoked?
Can it be audited?

That is the moment you realize that API keys are part of your authorization model, not just your authentication model.
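A minimal sketch of what those questions can look like in code. The record fields, scope names and reason strings are illustrative assumptions, not a reference design, and plain SHA-256 is assumed to be acceptable only because the secret is a long random string.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ApiKeyRecord:
    """Hypothetical stored form of a key: only the hash is persisted."""
    key_hash: str
    owner_id: str
    tenant_id: str
    scopes: frozenset
    expires_at: datetime
    revoked: bool = False


def hash_key(secret: str) -> str:
    # Assumption: keys are high-entropy random strings, so a fast hash is enough.
    return hashlib.sha256(secret.encode()).hexdigest()


def issue_key(owner_id, tenant_id, scopes, ttl_days=90):
    """Generate a key, store its hash, and return the secret exactly once."""
    secret = secrets.token_urlsafe(32)
    record = ApiKeyRecord(
        key_hash=hash_key(secret),
        owner_id=owner_id,
        tenant_id=tenant_id,
        scopes=frozenset(scopes),
        expires_at=datetime.now(timezone.utc) + timedelta(days=ttl_days),
    )
    return secret, record


def authorize(record, presented_secret, tenant_id, scope, now=None):
    """Return (allowed, reason). The reason is what you would write to the audit log."""
    now = now or datetime.now(timezone.utc)
    # Constant-time comparison avoids leaking how much of the hash matched.
    if not hmac.compare_digest(record.key_hash, hash_key(presented_secret)):
        return False, "invalid_key"
    if record.revoked:
        return False, "revoked"
    if now >= record.expires_at:
        return False, "expired"
    if record.tenant_id != tenant_id:
        return False, "wrong_tenant"
    if scope not in record.scopes:
        return False, "missing_scope"
    return True, "ok"
```

Only the first check is authentication. Everything after it is authorization, which is why the key ends up living in the same model as roles and plans.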

SSO Is Not Just Login

SSO looks simple until you deal with tenants.

Connecting Keycloak, Google, Microsoft, or any OIDC provider is not the hardest part.

The hard part is deciding what you trust.

Do you trust the email domain?

Do you trust the subject claim?

Do you trust groups?

Do you trust roles coming from the external IdP?

Can a tenant admin map roles?

Can a tenant admin accidentally create a platform admin?

What happens when a user belongs to multiple tenants?

What happens when the token is valid, but issued for another audience?

Real SSO architecture becomes:

issuer validation
audience validation
nonce validation
PKCE
role mapping
tenant membership
identity authority
fallback prevention
session isolation
external IdP configuration
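A few of these checks can be sketched on already-decoded claims. This assumes signature verification happened upstream in a proper OIDC library. `TenantIdpConfig`, the `groups` claim and the role names are hypothetical; real IdPs expose group membership under different claim names.

```python
from dataclasses import dataclass, field


@dataclass
class TenantIdpConfig:
    """Hypothetical per-tenant IdP settings; field names are illustrative."""
    tenant_id: str
    issuer: str
    audience: str
    # Maps external group names to roles inside this tenant only.
    role_mapping: dict = field(default_factory=dict)


# Roles a tenant admin may hand out through mapping.
# Platform-level roles are deliberately absent.
TENANT_ASSIGNABLE_ROLES = {"viewer", "editor", "tenant_admin"}


def resolve_identity(claims: dict, config: TenantIdpConfig, expected_nonce: str) -> dict:
    """Validate decoded claims and map external groups to tenant-scoped roles."""
    if claims.get("iss") != config.issuer:
        raise PermissionError("unexpected issuer")

    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if config.audience not in audiences:
        raise PermissionError("token issued for another audience")

    if claims.get("nonce") != expected_nonce:
        raise PermissionError("nonce mismatch")

    subject = claims.get("sub")
    if not subject:
        raise PermissionError("missing subject")

    mapped = {
        config.role_mapping[group]
        for group in claims.get("groups", [])
        if group in config.role_mapping
    }
    # Even a misconfigured mapping cannot produce a platform admin.
    roles = mapped & TENANT_ASSIGNABLE_ROLES

    return {
        "tenant_id": config.tenant_id,
        # Identity is the (issuer, subject) pair, not the email address.
        "subject": (config.issuer, subject),
        "roles": sorted(roles),
    }
```

The allowlist is the part that answers "can a tenant admin accidentally create a platform admin?": the mapping is tenant-controlled input, so the backend filters it rather than trusting it.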

In enterprise SaaS, login is only the entry point.

The real question is:

Can identity be trusted across users, tenants, projects and execution contexts?

AI Usage Is Not Just Calling a Model

Calling a model is easy.

Operating AI is not.

Once AI becomes part of a product, you need to think about:

token consumption
cost visibility
provider usage
model usage
rate limits
latency
retries
timeouts
fallbacks
tool calls
traceability
prompt governance
data boundaries

For a demo, a model response is enough.

For a business platform, you need to answer:

Which tenant used the model?
Which workflow triggered it?
Which user started the execution?
Which provider was used?
How many tokens were consumed?
How much did it cost?
Was the output reviewed?
Can the result be traced?
Can the process be repeated?

That is where AI stops being a feature and becomes an operational system.
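One way to make those questions answerable is to write a usage record for every model call. The sketch below is an assumption about what such a record could contain. The provider name, model name and prices are placeholders, not real pricing.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens as (input, output).
# Real prices vary by provider and model and change over time.
PRICE_PER_1K = {
    ("provider-a", "model-small"): (0.50, 1.50),
}


@dataclass(frozen=True)
class UsageRecord:
    """One row per model call: who, what, through which provider, at what size."""
    tenant_id: str
    workflow_id: str
    user_id: str
    provider: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        price_in, price_out = PRICE_PER_1K[(self.provider, self.model)]
        return (self.input_tokens * price_in + self.output_tokens * price_out) / 1000


def cost_by_tenant(records):
    """Aggregate cost per tenant; the same pattern works per workflow or per user."""
    totals = defaultdict(float)
    for record in records:
        totals[record.tenant_id] += record.cost_usd
    return dict(totals)
```

With records like these, "which tenant used the model" and "how much did it cost" become queries instead of investigations.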

Billing Is Not Just Stripe

Stripe can process payments.

But Stripe does not define your product model for you.

A serious SaaS needs to connect billing to:

plans
quotas
capabilities
feature gates
tenant limits
token limits
storage limits
execution limits
subscription status
license keys
deployment mode

If your product can be deployed as:

managed SaaS
customer cloud
on-prem
BYOC

then billing becomes more than payment.

It becomes commercial governance.

The system needs to understand:

What is the customer allowed to use?
Where is the product deployed?
Is the subscription active?
Is this an enterprise contract?
Is Stripe even involved?
Is there a license key?
What happens when quotas are exceeded?

This is where pricing, architecture and runtime enforcement meet.
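A rough sketch of that meeting point, assuming the runtime checks a normalized entitlement instead of asking Stripe directly. The `Entitlement` shape, the source names and the feature names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY_INACTIVE = "deny_inactive"
    DENY_FEATURE = "deny_feature"
    DENY_QUOTA = "deny_quota"


@dataclass(frozen=True)
class Entitlement:
    """Hypothetical view of what a customer may use, whatever produced it:
    a Stripe subscription, an enterprise contract, or a license key."""
    source: str  # e.g. "stripe", "contract", "license_key"
    active: bool
    features: frozenset
    monthly_execution_limit: int


def check_execution(entitlement, feature, executions_this_month):
    """Decide at runtime whether one more execution of `feature` is allowed."""
    if not entitlement.active:
        return Decision.DENY_INACTIVE
    if feature not in entitlement.features:
        return Decision.DENY_FEATURE
    if executions_this_month >= entitlement.monthly_execution_limit:
        return Decision.DENY_QUOTA
    return Decision.ALLOW
```

The enforcement code never asks whether Stripe is involved. An on-prem deployment with a license key and a managed SaaS tenant go through the same check, only the source of the entitlement differs.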

Kubernetes Does Not Automatically Mean Scalable

Using Kubernetes does not automatically make a platform scalable.

A real execution platform needs to think about workloads.

Some jobs are lightweight.

Some jobs run AI calls.

Some jobs process files.

Some jobs generate documents.

Some jobs ingest knowledge.

Some jobs run long workflows.

That means you start separating:

queues
workers
lanes
timeouts
resource limits
probes
autoscaling
storage
network policies
observability

At some point, “deployment” becomes an execution architecture.

You need to know:

Which queue is saturated?
Which worker is failing?
Which jobs are delayed?
Which execution lane is overloaded?
Which process consumes the most memory?
Which tenant creates the most load?

Without that visibility, scaling is mostly guessing.
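As an in-memory illustration only, here is how job kinds could be routed into separate lanes and how saturation and per-tenant load could be read back. The lane names, limits and job kinds are assumptions. A real platform would back this with a message broker and export the numbers as metrics.

```python
from collections import Counter, deque
from dataclasses import dataclass

# Hypothetical lanes with their own timeout and depth budget.
LANES = {
    "light": {"timeout_s": 30, "max_depth": 1000},
    "ai": {"timeout_s": 120, "max_depth": 200},
    "heavy": {"timeout_s": 1800, "max_depth": 50},
}

# Hypothetical mapping from job kind to lane.
JOB_KIND_TO_LANE = {
    "webhook": "light",
    "llm_call": "ai",
    "file_processing": "heavy",
    "document_generation": "heavy",
    "knowledge_ingest": "heavy",
}


@dataclass(frozen=True)
class Job:
    tenant_id: str
    kind: str


class LaneRouter:
    def __init__(self):
        self.queues = {name: deque() for name in LANES}

    def enqueue(self, job: Job) -> str:
        """Put the job on the lane for its kind and return the lane name."""
        lane = JOB_KIND_TO_LANE[job.kind]
        self.queues[lane].append(job)
        return lane

    def saturation(self) -> dict:
        """Queue depth as a fraction of each lane's budget (1.0 means full)."""
        return {
            name: len(queue) / LANES[name]["max_depth"]
            for name, queue in self.queues.items()
        }

    def load_by_tenant(self, lane: str) -> Counter:
        """Count waiting jobs per tenant on one lane."""
        return Counter(job.tenant_id for job in self.queues[lane])
```

Separating lanes means a burst of document generation fills the heavy lane without delaying lightweight jobs, and the saturation numbers show which lane needs more workers.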

Observability Is Not Optional

When automation becomes part of business operations, monitoring is not a technical bonus.

It is part of the product.

You need metrics for:

queue depth
execution success rate
execution failures
average duration
P95 / P99 latency
AI token usage
provider usage
storage usage
auth failures
webhook failures
backup status
SLA / SLO

For engineers, observability answers:

What is broken?

For leadership, observability answers:

Where is value created?
Where is time saved?
Where is cost increasing?
Which process is failing?
Which team is adopting the platform?

That is a different level of visibility.
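Some of the metrics listed above are simple to derive once executions are recorded. The sketch below computes success rate, average duration and P95 / P99 with the nearest-rank method. The input shape is an assumption, and in production these numbers usually come from a metrics backend rather than from application code.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(executions):
    """executions: list of (succeeded, duration_seconds) tuples."""
    durations = [duration for _, duration in executions]
    successes = sum(1 for succeeded, _ in executions if succeeded)
    return {
        "success_rate": successes / len(executions),
        "avg_duration": sum(durations) / len(durations),
        "p95": percentile(durations, 95),
        "p99": percentile(durations, 99),
    }
```

The average hides slow outliers. P95 and P99 are the numbers that tell you what the unluckiest executions experienced.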

Configuration Eventually Becomes a Product Surface

At first, environment variables are enough.

Then customers ask for different settings.

Different providers.

Different limits.

Different identity configurations.

Different storage.

Different security policies.

Different integrations.

Different deployment models.

And suddenly, redeploying for every change becomes unacceptable.

That is when configuration needs to move into an admin surface.

Not everything should be editable from the UI, of course.

But a serious platform needs to distinguish:

bootstrap configuration
runtime configuration
tenant configuration
secret-backed configuration
platform-managed configuration
customer-managed configuration

The more enterprise your product becomes, the more your back office becomes part of the product itself.
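A small sketch of the distinction between platform-managed and customer-managed settings, assuming tenant overrides are layered over platform defaults. The keys and the editable set are hypothetical.

```python
from collections import ChainMap

# Hypothetical platform defaults.
PLATFORM_DEFAULTS = {
    "llm_provider": "provider-a",
    "max_upload_mb": 25,
    "session_minutes": 60,
}

# Hypothetical set of keys a customer admin may change from the admin surface.
CUSTOMER_EDITABLE = {"llm_provider", "session_minutes"}


class TenantConfig:
    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._overrides = {}  # tenant_id -> {key: value}

    def set(self, tenant_id, key, value):
        """Apply a tenant override at runtime, without a redeploy."""
        if key not in CUSTOMER_EDITABLE:
            # Enforced in the backend, not only hidden in the UI.
            raise PermissionError(f"{key} is platform-managed")
        self._overrides.setdefault(tenant_id, {})[key] = value

    def effective(self, tenant_id):
        """Tenant overrides win; everything else falls back to platform defaults."""
        return dict(ChainMap(self._overrides.get(tenant_id, {}), self._defaults))
```

Bootstrap and secret-backed values are left out on purpose: in this sketch they would stay in the environment or a secret store and never pass through the admin surface.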

The Real SaaS + AI Correlation

The biggest lesson is that these systems cannot be designed in isolation.

They are connected.

Business model ↔ Plans
Plans ↔ Capabilities
Capabilities ↔ Roles
Roles ↔ Access control
Access control ↔ Security
Security ↔ Trust
AI usage ↔ Cost visibility
Workflows ↔ Measurable outcomes
Infrastructure ↔ Reliability
Observability ↔ Better decisions

If one part is weak, the entire platform becomes harder to operate.

A workflow engine without governance becomes risky.

AI without metering becomes expensive.

SSO without tenant isolation becomes dangerous.

Kubernetes without observability becomes blind.

Billing without runtime enforcement becomes cosmetic.

Admin features without backend enforcement become security theater.

The Real Questions

For CEOs, the question is not only:

Can we use AI?

It is:

Can we recover operational capacity, measure the impact and scale it safely?

For CTOs, the question is not only:

Can we build this?

It is:

Can we govern it, secure it, deploy it, monitor it and maintain it across real environments?

For Heads of AI, the question is not only:

Which model should we use?

It is:

How do we turn AI from isolated experiments into controlled business execution?

Final Thought

The hardest part of building AI SaaS is not the prompt.

It is not the first demo.

It is not the first integration.

The hard part is making identity, data, permissions, costs, infrastructure, workflows, observability and user experience move together.

That is where AI becomes enterprise-ready.

And that is where SaaS architecture becomes a serious discipline.
