Yoganand Govind

The Full Architecture Behind Autowired.ai. Multi-Tenant AI SaaS on AWS Serverless

Over the last four weeks, I've written about individual pieces of Autowired.ai's architecture: the event-driven document pipeline, Bedrock cost optimisation, and the DynamoDB single-table design. This post is the whole picture — how everything fits together, why the system is organised the way it is, and the specific tradeoffs behind each major decision.

Autowired.ai is a document extraction SaaS. Users define extraction schemas, submit document batches, and receive structured data back. Multiple tenants share the same infrastructure. AI inference and OCR are the dominant cost drivers. Batches can contain hundreds of documents and take minutes to process. Every design decision flows from those constraints.

How the Codebase Is Organised

The repo is a Turborepo monorepo — two Next.js apps (marketing site, product) and a shared packages layer for API handlers, database access, and tests. Infrastructure is defined entirely in AWS CDK TypeScript, split into six purpose-separated stacks:

[Figure: repo structure]

Six stacks instead of one is an operational choice, not an aesthetic one. When I need to update the processing pipeline, CloudFormation shouldn't be touching the database or API Gateway. Stack separation gives independent deployment units with isolated change sets. The cost is cross-stack dependency management — more on that shortly.

The Request Flow

Two flows make up the system. The submission path and the processing path are deliberately decoupled.

Submission — synchronous, returns immediately:

[Figure: submission flow]

The API handler never touches the processing pipeline. Its job is to accept the request, write the initial records, and return. Done.
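A minimal sketch of that contract, assuming a hypothetical `submitBatch` handler and a `RecordStore` interface standing in for the DynamoDB write. The record shape, the `PENDING` status, and the injected `newId` function are illustrative assumptions, not the production code:

```typescript
// Hypothetical request and record shapes, for illustration only.
interface SubmitRequest {
  tenantId: string;
  documentKeys: string[];
}

interface BatchRecord {
  tenantId: string;
  batchId: string;
  status: "PENDING";
  documentCount: number;
}

// Stand-in for the DynamoDB write.
interface RecordStore {
  put(record: BatchRecord): void;
}

interface ApiResponse {
  statusCode: number;
  body: string;
}

// Accept the request, write the initial record, return 202.
// Nothing here starts or waits on the processing pipeline.
function submitBatch(
  req: SubmitRequest,
  store: RecordStore,
  newId: () => string
): ApiResponse {
  const batchId = newId();
  store.put({
    tenantId: req.tenantId,
    batchId,
    status: "PENDING",
    documentCount: req.documentKeys.length,
  });
  return { statusCode: 202, body: JSON.stringify({ batchId }) };
}
```

The handler only depends on the store, so it has no way to reach the pipeline even by accident.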

Processing — fully async, triggered by S3:

[Figure: processing flow]

Triggering from S3, rather than invoking Step Functions directly from the API, is intentional. If Step Functions has a transient issue at upload time, the S3 event is queued and retried automatically. The API has already returned 202, so the client doesn't know or care.

The Architecture Diagram

[Figure: architecture diagram]

Six Stacks — The Decision That Paid Off Most

The CDK stack separation looks like over-engineering until you first need to deploy a pipeline change without touching the API layer, or roll back a processing change without affecting the database.

The stacks and what they own:

  • DatabaseStack: DynamoDB single-table, GSI definitions, PITR, TTL
  • StorageStack: S3 buckets, lifecycle rules, S3 event notifications
  • ProcessingStack: Step Functions state machine, DocumentProcessorLambda, SQS queues + DLQs, ScheduledBatchLambda
  • BedrockStack: Bedrock Guardrails (topic policies, content filters)
  • APIStack: API Gateway, all API Lambda handlers, Clerk JWT auth
  • MonitoringStack: CloudWatch dashboards, alarms on DLQ depth, Lambda errors, state machine failures

The tricky part: StorageStack needs the Step Functions ARN from ProcessingStack to wire S3 event notifications, but ProcessingStack needs the S3 bucket from StorageStack. That is a circular dependency.

The solution: compute the state machine ARN deterministically from the naming convention rather than importing it as a CDK cross-stack export:

const stateMachineArn = `arn:aws:states:${region}:${account}:stateMachine:autowire-batch-processing-${stage}`;

It's a deliberate tradeoff: the naming convention becomes load-bearing infrastructure. Any rename of the state machine requires updating both stacks. I documented this explicitly in the CDK code. It's the right tradeoff for avoiding a circular dependency, but it has to be owned, not left implicit.
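One way to keep the convention in a single place is a small shared helper that both stacks import. This is a sketch under assumptions: the function names and the shared constant are hypothetical, and only the ARN format and the `autowire-batch-processing-${stage}` name come from the line above.

```typescript
// Hypothetical shared module imported by both StorageStack and ProcessingStack,
// so a rename only has to happen here.
const STATE_MACHINE_NAME_PREFIX = "autowire-batch-processing";

// The name ProcessingStack would give the state machine.
function batchStateMachineName(stage: string): string {
  return `${STATE_MACHINE_NAME_PREFIX}-${stage}`;
}

// The ARN StorageStack computes instead of importing a cross-stack export.
function batchStateMachineArn(region: string, account: string, stage: string): string {
  return `arn:aws:states:${region}:${account}:stateMachine:${batchStateMachineName(stage)}`;
}
```

If ProcessingStack sets the state machine name from `batchStateMachineName(stage)` and StorageStack calls `batchStateMachineArn(...)`, the two cannot drift apart silently.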

Tenant Isolation: Structural, Not Advisory

Everything lives in a single DynamoDB table. Tenant isolation is enforced at the key structure level — not the application layer.

[Figure: key structure]

tenantId is in every partition key. A DynamoDB query physically cannot return another tenant's data without the correct tenantId. There's no filter to forget, no middleware layer to misconfigure. I covered this in detail in Week 4.
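As an illustration of tenant-scoped keys, here is a sketch of key builders and a query shape. The `TENANT#`, `BATCH#`, and `DOC#` prefixes and the helper names are assumptions made for this example, not the actual schema:

```typescript
// Illustrative key builders. The exact prefixes are assumptions.
interface PrimaryKey {
  PK: string;
  SK: string;
}

function batchKey(tenantId: string, batchId: string): PrimaryKey {
  return { PK: `TENANT#${tenantId}`, SK: `BATCH#${batchId}` };
}

function documentKey(tenantId: string, batchId: string, documentId: string): PrimaryKey {
  return { PK: `TENANT#${tenantId}`, SK: `BATCH#${batchId}#DOC#${documentId}` };
}

// A DynamoDB Query targets exactly one partition key value, so the caller
// has to supply a tenantId to build the query at all.
function batchDocumentsQuery(tenantId: string, batchId: string) {
  return {
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :sk)",
    ExpressionAttributeValues: {
      ":pk": `TENANT#${tenantId}`,
      ":sk": `BATCH#${batchId}#DOC#`,
    },
  };
}
```

Because every builder takes `tenantId` as a required argument, a tenant-less read fails at compile time rather than at review time.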

Three GSIs handle the access patterns that the primary key can't:

  • GSI1: User lookup by email + workflow listing sorted by date
  • GSI2: Status-based filtering for the processing pipeline (sparse index — only active statuses write GSI attributes)
  • GSI3: Direct batch lookup by batchId alone — decouples Step Functions from the main table key structure
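The sparse behaviour of GSI2 can be sketched as a function that decides whether an item carries the index attributes at all. The `GSI2PK`/`GSI2SK` attribute names, the key format, and the choice of `PENDING` and `PROCESSING` as the active statuses are assumptions for illustration:

```typescript
type DocumentStatus = "PENDING" | "PROCESSING" | "SUCCEEDED" | "FAILED";

// Assumption: which statuses count as "active" is illustrative.
function isActiveStatus(status: DocumentStatus): boolean {
  return status === "PENDING" || status === "PROCESSING";
}

// Returns the GSI2 key attributes to store on the item, or none at all.
// DynamoDB only indexes items that carry the index's key attributes,
// so omitting them keeps finished documents out of GSI2.
function gsi2Attributes(
  tenantId: string,
  status: DocumentStatus,
  updatedAt: string
): { [attribute: string]: string } {
  if (!isActiveStatus(status)) {
    return {};
  }
  return {
    GSI2PK: `TENANT#${tenantId}#STATUS#${status}`,
    GSI2SK: updatedAt,
  };
}
```

Leaving the attributes off new writes is only half of it: when a document moves to a finished status, the update also has to remove the existing GSI attributes, otherwise the item stays in the index.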

The Document Processor: Three Stages

The DocumentProcessorLambda inside the Step Functions Map state does three things in sequence:

  1. Textract extracts structured fields from the document. For standard invoices and forms, it reliably gets 70–80% of target fields with high confidence.
  2. Bedrock gap-fill extracts the fields that Textract couldn't get. Only the OCR sections for missing fields are sent to Bedrock — not the full document.
  3. Bedrock verify validates the combined output (Textract results + gap-fill), assigns final confidence scores, and flags fields for review.
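The hand-off between stages 1 and 2 comes down to deciding which target fields still need the LLM. A minimal sketch, assuming a hypothetical `ExtractedField` shape and a confidence threshold on Textract's 0 to 100 scale (the threshold value is an assumption, not a figure from the pipeline):

```typescript
// Hypothetical shape for a field returned by the Textract stage.
interface ExtractedField {
  name: string;
  value: string;
  confidence: number; // 0..100, as Textract reports it
}

// Target fields that Textract either missed or returned below the threshold.
// Only these go on to the Bedrock gap-fill step.
function fieldsNeedingGapFill(
  targetFields: string[],
  textractFields: ExtractedField[],
  minConfidence: number
): string[] {
  const confident: { [name: string]: boolean } = {};
  for (const field of textractFields) {
    if (field.confidence >= minConfidence) {
      confident[field.name] = true;
    }
  }
  return targetFields.filter((name) => confident[name] !== true);
}
```

When Textract covers most of the schema, the list returned here is short, and the gap-fill prompt shrinks with it.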

This architecture is why Bedrock costs came in significantly lower than a naive "send everything to the LLM" approach. I broke down how I got to ~40% cost reduction in Week 3.

The Lambda configuration matters here:

  • ARM64 (Graviton2): ~20% cheaper than x86 per GB-second. I/O-bound workloads like this see comparable or better latency.
  • 1GB RAM: Lambda CPU allocation scales with memory. More memory means faster JSON parsing of large Textract responses — the bottleneck isn't memory, it's CPU.
  • 5-minute timeout: Textract on multi-page PDFs + two Bedrock calls can approach 30 seconds per document on a cold path. 5 minutes gives real headroom.
  • X-Ray tracing: Non-negotiable. When a batch runs 3x slower than expected, distributed traces show exactly where the time went.

Concurrency: maxConcurrency: 10 Is a Contract, Not a Guess

The Map state processes up to 10 documents simultaneously. This number isn't a default I left in place — it's a deliberate decision based on AWS service quota limits for Textract and Bedrock.

Running 50 concurrent document processors would exhaust per-account Bedrock concurrency limits, trigger 429 throttling, and ironically make the batch slower as retry backoff compounds. Ten concurrent workers stay below quota limits while providing meaningful throughput on large batches.

At 10 concurrent workers and ~15 seconds per document (Textract + two Bedrock calls), a 100-document batch takes ~150 seconds. A 500-document batch takes ~750 seconds — just over 12 minutes. That's completely acceptable for a background batch processing job.
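The arithmetic behind those numbers, as a rough estimator. It assumes every document takes about the same time and ignores retries and cold starts, so treat it as a planning figure rather than a guarantee; the function name is hypothetical:

```typescript
// Rough wall-clock estimate: documents run in waves of `maxConcurrency`.
function estimateBatchSeconds(
  documentCount: number,
  maxConcurrency: number,
  secondsPerDocument: number
): number {
  const waves = Math.ceil(documentCount / maxConcurrency);
  return waves * secondsPerDocument;
}

const hundredDocs = estimateBatchSeconds(100, 10, 15); // 150
const fiveHundredDocs = estimateBatchSeconds(500, 10, 15); // 750
```

Doubling `maxConcurrency` halves the estimate, which is why the constant is worth revisiting whenever quotas change.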

When I raise Textract and Bedrock quotas, I raise this number. It's a named constant in the CDK definition, not buried in application code.

Failure Handling First

Every failure mode in the pipeline was designed before the happy path:

Per-document failures: addCatch routes to MarkDocumentFailed. One corrupted PDF writes an error to DynamoDB, and the Map state moves to the next document. The batch completes with mixed SUCCEEDED/FAILED document statuses.
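The effect of a per-iteration catch can be shown with an in-process analogy. This is not Step Functions code: `processBatch` and `processOne` are hypothetical names, and the loop only mimics the behaviour of a Map state whose iterations each route errors to a failure handler.

```typescript
type DocumentOutcome = "SUCCEEDED" | "FAILED";

interface DocumentResult {
  documentId: string;
  status: DocumentOutcome;
  error?: string;
}

// A throwing document is recorded as FAILED and the loop moves on,
// so one bad input never aborts the rest of the batch.
function processBatch(
  documentIds: string[],
  processOne: (documentId: string) => void
): DocumentResult[] {
  return documentIds.map((documentId): DocumentResult => {
    try {
      processOne(documentId);
      return { documentId, status: "SUCCEEDED" };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { documentId, status: "FAILED", error: message };
    }
  });
}
```

The result list is what ends up as the mixed SUCCEEDED/FAILED statuses on the batch.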

Webhook failures: Isolated in a separate SQS queue. maxReceiveCount: 5 (more than the document queue's 3 — external endpoints are flakier than internal Lambdas). batchSize: 1 on the consumer, so each webhook delivery retries independently.

S3 at-least-once delivery: S3IngestionLambda uses attribute_not_exists(PK) on every DynamoDB write. A second delivery of the same S3 event fails the condition check and is dropped as a duplicate.
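The semantics of that conditional write can be modelled without the SDK. The sketch below uses an in-memory stand-in for the table; `InMemoryTable`, `ingestS3Object`, and the key format are hypothetical. With the real SDK, the duplicate case surfaces as a `ConditionalCheckFailedException` that the handler has to catch.

```typescript
// In-memory stand-in for a DynamoDB table, used only to illustrate a put
// guarded by attribute_not_exists(PK).
class InMemoryTable {
  private items: { [key: string]: object } = {};

  // Returns true if the item was written, false if the key already existed.
  putIfAbsent(pk: string, sk: string, item: object): boolean {
    const key = `${pk}|${sk}`;
    if (Object.prototype.hasOwnProperty.call(this.items, key)) {
      return false;
    }
    this.items[key] = item;
    return true;
  }
}

// Hypothetical ingestion step: safe to call twice for the same S3 event.
function ingestS3Object(
  table: InMemoryTable,
  tenantId: string,
  objectKey: string
): "CREATED" | "DUPLICATE" {
  const written = table.putIfAbsent(`TENANT#${tenantId}`, `DOC#${objectKey}`, { objectKey });
  return written ? "CREATED" : "DUPLICATE";
}
```

Treating the duplicate as a normal return value rather than an error keeps redelivered events out of the error metrics.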

Scheduled batch triggers: EventBridge retryAttempts: 2. A transient failure during a Lambda cold start won't silently skip a scheduled run.

State machine timeout: 24 hours. A stalled execution terminates rather than running indefinitely.

Both DLQs retain messages for 14 days. A CloudWatch alarm on DLQ depth > 0 fires immediately when a message dead-letters — that's the signal to investigate.

Cost Decisions Baked Into the Architecture

  • PAY_PER_REQUEST DynamoDB: Batch submissions are bursty. Provisioned capacity would mean paying for idle throughput continuously.
  • ARM64 Lambdas everywhere: No reason to default to x86 for Node.js Lambda workloads. The cost difference is ~20% per GB-second across every invocation.
  • S3 lifecycle rules: Batch documents transition to Glacier after 3 days and expire after 6 months. Temporary processing artifacts are deleted after 24 hours. Without this, S3 Standard storage accumulates indefinitely.
  • Sparse GSI2: Only documents in active processing statuses write GSI attributes. Completed documents (the majority at any given time) don't appear in the index. Keeps GSI storage and write amplification costs low.
  • Textract-first extraction: Bedrock is only invoked for what Textract can't do — gap-fill and verification. Dramatically lower token consumption compared to sending full OCR to a foundation model for every field.

The Things I'd Do Differently

RETAIN on DynamoDB everywhere from day one. The table uses the RETAIN removal policy, so test environment teardowns can't accidentally delete data. I added this after a near-miss during early development. Should have been the default.

Define the state machine naming convention explicitly in documentation. The ARN determinism trick that avoids the circular dependency is non-obvious. Anyone reading the CDK code later will wonder why the ARN is hardcoded. Leave a comment that explains the tradeoff, not just the implementation.

Instrument DynamoDB access patterns from the first week. I added CloudWatch metrics on the query patterns partway through development. Adding them earlier would have caught a GSI design issue sooner.

Wrapping Up

The architecture isn't novel — it uses Step Functions, DynamoDB, SQS, S3, and Bedrock in fairly standard ways. What makes it work in production is the intentionality: why 10 concurrent workers, why 3 retries for documents but 5 for webhooks, why ARM64 at 1GB RAM, why the state machine ARN is computed rather than exported, why GSI3 exists.

If you've been following this series, you've seen each piece in isolation. This is how they fit:

Week 1: Why Step Functions, EventBridge, and SQS — all three in the same system
Week 2: The S3-triggered async pipeline and failure isolation
Week 3: How I reduced Bedrock costs by 40%
Week 4: DynamoDB single-table design and the three GSIs

Next week: why I chose AWS CDK over Terraform for this stack
