DEV Community: Yoganand Govind

The Full Architecture Behind Autowired.ai. Multi-Tenant AI SaaS on AWS Serverless

Yoganand Govind — Sun, 05 Jul 2026 11:52:48 +0000

Over the last four weeks, I've written about individual pieces of Autowired.ai's architecture: the event-driven document pipeline, Bedrock cost optimisation, and the DynamoDB single-table design. This post is the whole picture — how everything fits together, why the system is organised the way it is, and the specific tradeoffs behind each major decision.

Autowired.ai is a document extraction SaaS. Users define extraction schemas, submit document batches, and receive structured data back. Multiple tenants share the same infrastructure. AI inference and OCR are the dominant cost drivers. Batches can contain hundreds of documents and take minutes to process. Every design decision flows from those constraints.

How the Codebase Is Organised

The repo is a Turborepo monorepo: two Next.js apps (marketing site, product) and a shared packages layer for API handlers, database access, and tests. Infrastructure is defined entirely in AWS CDK TypeScript, split into six purpose-separated stacks.

Six stacks instead of one is an operational choice, not an aesthetic one. When I need to update the processing pipeline, CloudFormation shouldn't be touching the database or API Gateway. Stack separation gives independent deployment units with isolated change sets. The cost is cross-stack dependency management — more on that shortly.

The Request Flow

Two flows make up the system. The submission path and the processing path are deliberately decoupled.

Submission — synchronous, returns immediately:

The API handler never touches the processing pipeline. Its job is to accept the request, write the initial records, and return. Done.

Processing — fully async, triggered by S3:

The S3 trigger instead of a direct Step Functions invocation from the API is intentional. If Step Functions has a transient issue at upload time, the S3 event queues and retries automatically. The API already returned 202 — the client doesn't know or care.

Six Stacks — The Decision That Paid Off Most

The CDK stack separation looks like over-engineering until you first need to deploy a pipeline change without touching the API layer, or roll back a processing change without affecting the database.

The stacks and what they own:

DatabaseStack: DynamoDB single-table, GSI definitions, PITR, TTL
StorageStack: S3 buckets, lifecycle rules, S3 event notifications
ProcessingStack: Step Functions state machine, DocumentProcessorLambda, SQS queues + DLQs, ScheduledBatchLambda
BedrockStack: Bedrock Guardrails (topic policies, content filters)
APIStack: API Gateway, all API Lambda handlers, Clerk JWT auth
MonitoringStack: CloudWatch dashboards, alarms on DLQ depth, Lambda errors, state machine failures

The tricky part: StorageStack needs the Step Functions ARN from ProcessingStack to wire S3 event notifications, but ProcessingStack needs the S3 bucket from StorageStack — circular dependency.

The solution: compute the state machine ARN deterministically from the naming convention rather than importing it as a CDK cross-stack export:

const stateMachineArn = `arn:aws:states:${region}:${account}:stateMachine:autowire-batch-processing-${stage}`;

It's a deliberate tradeoff: the naming convention becomes load-bearing infrastructure. Any rename of the state machine requires updating both stacks. I documented this explicitly in the CDK code. It's the right tradeoff for avoiding the circular dependency, but it has to be owned, not left implicit.
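One way to keep that convention in a single place is a small helper that both stacks call. This is a minimal sketch, not the project's actual code: the names `stateMachineName`, `buildStateMachineArn`, and `ArnParts` are hypothetical, and the region, account, and stage values are placeholders. Only the ARN format and the state machine name pattern come from the snippet above.

```typescript
// Hypothetical helper: names and parameter shape are assumptions for illustration.
interface ArnParts {
  region: string;
  account: string;
  stage: string;
}

// The state machine name is the load-bearing convention shared by both stacks.
function stateMachineName(stage: string): string {
  return `autowire-batch-processing-${stage}`;
}

// Builds the ARN from the naming convention instead of a cross-stack export.
function buildStateMachineArn({ region, account, stage }: ArnParts): string {
  return `arn:aws:states:${region}:${account}:stateMachine:${stateMachineName(stage)}`;
}

// Placeholder values, not real account details.
const exampleArn = buildStateMachineArn({
  region: "eu-west-1",
  account: "123456789012",
  stage: "dev",
});
```

With a helper like this, a rename touches one function rather than two string literals in separate stacks.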

Tenant Isolation: Structural, Not Advisory

Everything lives in a single DynamoDB table. Tenant isolation is enforced at the key structure level — not the application layer.

tenantId is in every partition key. A DynamoDB query physically cannot return another tenant's data without the correct tenantId. There's no filter to forget, no middleware layer to misconfigure. I covered this in detail in Week 4.

Three GSIs handle the access patterns that the primary key can't:

GSI1: User lookup by email + workflow listing sorted by date
GSI2: Status-based filtering for the processing pipeline (sparse index — only active statuses write GSI attributes)
GSI3: Direct batch lookup by batchId alone — decouples Step Functions from the main table key structure

The Document Processor: Three Stages

The DocumentProcessorLambda inside the Step Functions Map state does three things in sequence:

Textract extracts structured fields from the document. For standard invoices and forms, it reliably gets 70–80% of target fields with high confidence.
Bedrock gap-fill extracts the fields that Textract couldn't get. Only the OCR sections for missing fields are sent to Bedrock — not the full document.
Bedrock verify validates the combined output (Textract results + gap-fill), assigns final confidence scores, and flags fields for review.

This architecture is why Bedrock costs came in significantly lower than a naive "send everything to the LLM" approach. I broke down how I got to ~40% cost reduction in Week 3.

The Lambda configuration matters here:

ARM64 (Graviton2): ~20% cheaper than x86 per GB-second. I/O-bound workloads like this see comparable or better latency.
1GB RAM: Lambda CPU allocation scales with memory. More memory means faster JSON parsing of large Textract responses — the bottleneck isn't memory, it's CPU.
5-minute timeout: Textract on multi-page PDFs + two Bedrock calls can approach 30 seconds per document on a cold path. 5 minutes gives real headroom.
X-Ray tracing: Non-negotiable. When a batch runs 3x slower than expected, distributed traces show exactly where the time went.

Concurrency: maxConcurrency: 10 Is a Contract, Not a Guess

The Map state processes up to 10 documents simultaneously. This number isn't a default I left in place — it's a deliberate decision based on AWS service quota limits for Textract and Bedrock.

Running 50 concurrent document processors would exhaust per-account Bedrock concurrency limits, trigger 429 throttling, and ironically make the batch slower through retry backoff compounding. 10 concurrent workers stays below quota limits while providing meaningful throughput on large batches.

At 10 concurrent workers and ~15 seconds per document (Textract + two Bedrock calls), a 100-document batch runs as roughly 10 waves of 15 seconds, or ~150 seconds. A 500-document batch takes ~750 seconds, about 12.5 minutes. That's completely acceptable for a background batch processing job.
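The arithmetic behind those numbers can be written as a rough estimator. This is a back-of-the-envelope sketch: it treats the Map state as running fixed waves of `concurrency` documents, which is a simplification, and the function name `estimateBatchSeconds` is hypothetical.

```typescript
// Rough estimate: documents are processed in waves of `concurrency`,
// and each wave takes about `secondsPerDocument`.
function estimateBatchSeconds(
  documentCount: number,
  concurrency: number,
  secondsPerDocument: number
): number {
  if (concurrency < 1) {
    throw new Error("concurrency must be at least 1");
  }
  const waves = Math.ceil(documentCount / concurrency);
  return waves * secondsPerDocument;
}

const smallBatchSeconds = estimateBatchSeconds(100, 10, 15); // 10 waves * 15 s = 150
const largeBatchSeconds = estimateBatchSeconds(500, 10, 15); // 50 waves * 15 s = 750
```

Throttling retries and cold starts are not modelled, so real batches will drift from this estimate.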

When I raise Textract and Bedrock quotas, I raise this number. It's a named constant in the CDK definition, not buried in application code.

Failure Handling First

Every failure mode in the pipeline was designed before the happy path:

Per-document failures: addCatch routes to MarkDocumentFailed. One corrupted PDF writes an error to DynamoDB, and the Map state moves to the next document. The batch completes with mixed SUCCEEDED/FAILED document statuses.

Webhook failures: Isolated in a separate SQS queue. maxReceiveCount: 5 (more than the document queue's 3 — external endpoints are flakier than internal Lambdas). batchSize: 1 on the consumer, so each webhook delivery retries independently.

S3 at-least-once delivery: S3IngestionLambda uses attribute_not_exists(PK) on every DynamoDB write. A second delivery of the same S3 event fails the condition check and is handled as a no-op.

Scheduled batch triggers: EventBridge retryAttempts: 2. Transient Lambda cold start won't silently skip a scheduled run.

State machine timeout: 24 hours. A stalled execution terminates rather than running indefinitely.

Both DLQs retain messages for 14 days. A CloudWatch alarm on DLQ depth > 0 fires immediately when a message dead-letters — that's the signal to investigate.

Cost Decisions Baked Into the Architecture

PAY_PER_REQUEST DynamoDB: Batch submissions are bursty. Provisioned capacity would mean paying for idle throughput continuously.
ARM64 Lambdas everywhere: No reason to default to x86 for Node.js Lambda workloads. The cost difference is ~20% per GB-second across every invocation.
S3 lifecycle rules: Batch documents transition to Glacier after 3 days, expire after 6 months. Temp processing artifacts delete after 24 hours. Without this, S3 Standard storage accumulates indefinitely.
Sparse GSI2: Only documents in active processing statuses write GSI attributes. Completed documents (the majority at any given time) don't appear in the index. Keeps GSI storage and write amplification costs low.
Textract-first extraction: Bedrock is only invoked for what Textract can't do — gap-fill and verification. Dramatically lower token consumption compared to sending full OCR to a foundation model for every field.

The Things I'd Do Differently

RETAIN on DynamoDB everywhere from day one. The table uses the RETAIN removal policy, so test environment teardowns can't accidentally delete data. I added this after a near-miss during early development. Should have been the default.

Define the state machine naming convention explicitly in documentation. The ARN determinism trick that avoids the circular dependency is non-obvious. Anyone reading the CDK code later will wonder why the ARN is hardcoded. Leave a comment that explains the tradeoff, not just the implementation.

Instrument DynamoDB access patterns from the first week. I added CloudWatch metrics on the query patterns partway through development. Earlier would have caught a GSI design issue sooner.

Wrapping Up

The architecture isn't novel — it uses Step Functions, DynamoDB, SQS, S3, and Bedrock in fairly standard ways. What makes it work in production is the intentionality: why 10 concurrent workers, why 3 retries for documents but 5 for webhooks, why ARM64 at 1GB RAM, why the state machine ARN is computed rather than exported, why GSI3 exists.

If you've been following this series, you've seen each piece in isolation. This is how they fit:

Week 1: Why Step Functions, EventBridge, and SQS — all three in the same system
Week 2: The S3-triggered async pipeline and failure isolation
Week 3: How I reduced Bedrock costs by 40%
Week 4: DynamoDB single-table design and the three GSIs

Next week: why I chose AWS CDK over Terraform for this stack

DynamoDB Single-Table Design for Multi-Tenant SaaS

Yoganand Govind — Sun, 21 Jun 2026 21:38:04 +0000

The DynamoDB Table Behind Autowired.ai — Single-Table Design for Multi-Tenant SaaS

DynamoDB single-table design has a reputation for being hard to learn and easy to get wrong. That reputation is deserved.

The AWS documentation makes it sound mechanical: define your access patterns, map them to partition keys and sort keys, add GSIs where needed, and done. What it doesn't tell you is that the decisions you make before your table has a single item are effectively permanent. Changing a partition key structure in a table with production data is a migration with backfill, dual-write phases, and cutover risk. You don't iterate on the DynamoDB schema the way you iterate on application code.

This post is the data model behind Autowired.ai – the specific key patterns, the three GSIs and why each one exists, and the decisions I'd make differently if I started over.

The Domain and Why Single-Table Made Sense

The entity hierarchy for Autowired.ai:

Tenant
  └── Project
       └── Workflow (extraction schema + confidence thresholds)
            └── Batch (submitted document set)
                 └── Document (file + extraction result)

User (linked to Tenant)

Most operations are tenant-scoped. The access patterns are predictable and unlikely to change fundamentally — list projects for a tenant, list workflows for a project sorted by date, get a batch by ID, list documents for a batch, and filter by processing status.

A single-table design made sense here for the same reason it makes sense for most serverless SaaS at this scale: one table, one set of CloudWatch metrics, one backup configuration, one IAM policy. PAY_PER_REQUEST billing means you're not pre-provisioning capacity across multiple tables and paying for idle throughput on the ones that don't get hit often.

The tradeoff is real, though: no ad hoc queries, no schema evolution without migration, and a data model that only makes sense if you document it.

Primary Key Design: Tenant Isolation by Structure

The PK/SK pattern follows the entity hierarchy directly:

Entity       PK                                    SK
──────────────────────────────────────────────────────────────────────
Tenant       TENANT#<tenantId>                     METADATA
User         USER#<userId>                         PROFILE
User↔Tenant  TENANT#<tenantId>                     USER#<userId>
Project      TENANT#<tenantId>                     PROJECT#<projectId>
Workflow     TENANT#<tenantId>#PROJECT#<projectId> WORKFLOW#<workflowId>
Batch        TENANT#<tenantId>#PROJECT#<projectId> BATCH#<batchId>
Document     TENANT#<tenantId>#BATCH#<batchId>     DOCUMENT#<documentId>

The compound PK (TENANT#<tenantId>#PROJECT#<projectId>) is the decision that matters most here.

Tenant isolation is structural, not advisory. Because tenantId is embedded in every PK, a DynamoDB query physically cannot return results across tenant boundaries. There's no middleware to configure, no application-layer filter to apply consistently, and no trust that every code path remembered to add the right FilterExpression. The key structure enforces it.
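The structural guarantee is easiest to keep when keys are only ever built through helpers that require a tenantId. A minimal sketch, assuming helper names of my own choosing (`tenantPk`, `projectScopedPk`, `batchScopedPk`, `workflowKey`, `documentKey` are all hypothetical); the key formats follow the PK/SK table above.

```typescript
// Hypothetical key builders. Every function takes tenantId first,
// so there is no code path that produces a PK without one.
const tenantPk = (tenantId: string): string => `TENANT#${tenantId}`;

const projectScopedPk = (tenantId: string, projectId: string): string =>
  `${tenantPk(tenantId)}#PROJECT#${projectId}`;

const batchScopedPk = (tenantId: string, batchId: string): string =>
  `${tenantPk(tenantId)}#BATCH#${batchId}`;

// Full primary key for a Workflow item.
const workflowKey = (tenantId: string, projectId: string, workflowId: string) => ({
  PK: projectScopedPk(tenantId, projectId),
  SK: `WORKFLOW#${workflowId}`,
});

// Full primary key for a Document item.
const documentKey = (tenantId: string, batchId: string, documentId: string) => ({
  PK: batchScopedPk(tenantId, batchId),
  SK: `DOCUMENT#${documentId}`,
});
```

Centralising the string formats also means a future key migration has one place to change.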

An API handler listing workflows must supply both tenantId (from the authenticated session context via Clerk) and projectId (from the request path). The query:

const result = await dynamodb.query({
  TableName: TABLE_NAME,
  KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
  ExpressionAttributeValues: {
    ":pk": `TENANT#${tenantId}#PROJECT#${projectId}`,
    ":prefix": "WORKFLOW#",
  },
});

Returns only that tenant-project's workflows. There's no way to accidentally return another tenant's data without constructing the wrong PK — which requires the wrong tenantId to be in the session, not a missing filter clause.

Each workflow also stores confidenceThreshold and reviewThreshold — the two values that control document status transitions in the processing pipeline. A document whose extraction confidence falls below confidenceThreshold is FAILED. One that falls between reviewThreshold and confidenceThreshold is REVIEW_REQUIRED. These thresholds are per workflow, stored in the workflow item, and passed through to Step Functions at batch execution time.
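The status transitions can be sketched as a small pure function. This is an illustration under assumptions the text does not spell out: that confidenceThreshold is the lower bound and reviewThreshold the upper one, that scores at or above reviewThreshold become SUCCEEDED, and that a score exactly on a boundary gets the better status. The names `classifyDocument` and `Thresholds` are hypothetical.

```typescript
type DocumentStatus = "SUCCEEDED" | "REVIEW_REQUIRED" | "FAILED";

interface Thresholds {
  confidenceThreshold: number; // below this: FAILED
  reviewThreshold: number; // assumed to be >= confidenceThreshold
}

function classifyDocument(confidence: number, t: Thresholds): DocumentStatus {
  if (t.confidenceThreshold > t.reviewThreshold) {
    // Assumption of this sketch, not a documented rule of the system.
    throw new Error("confidenceThreshold must not exceed reviewThreshold");
  }
  if (confidence < t.confidenceThreshold) {
    return "FAILED";
  }
  if (confidence < t.reviewThreshold) {
    return "REVIEW_REQUIRED";
  }
  return "SUCCEEDED";
}
```

For example, with thresholds of 0.6 and 0.85, a score of 0.7 would land in REVIEW_REQUIRED under these assumptions.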

Three GSIs, Three Distinct Problems

The main table's PK structure handles hierarchical queries well. For everything else, there are GSIs. Each one exists for a specific reason.

GSI1 — Email Lookup + Date-Sorted Workflow Listing

GSI1 solves two access patterns that the primary key can't:

User lookup by email. Clerk manages auth, but there are operations (invitations, notifications, and admin lookups) where you need to find a user record by email, not by Clerk's userId. The PK has users keyed by userId. GSI1 maps email to GSI1PK and USER#<userId> to GSI1SK, giving O(1) lookup by email.

Workflows sorted by last updated date. The frontend shows workflows in reverse chronological order. The PK query can return all workflows for a project but can't sort them. GSI1 maps TENANT#<tenantId>#PROJECT#<projectId> to GSI1PK and an ISO 8601 timestamp to GSI1SK, enabling:

const result = await dynamodb.query({
  TableName: TABLE_NAME,
  IndexName: "GSI1",
  KeyConditionExpression: "GSI1PK = :pk",
  ExpressionAttributeValues: {
    ":pk": `TENANT#${tenantId}#PROJECT#${projectId}`,
  },
  ScanIndexForward: false, // most recently updated first
});

One GSI serving two patterns works here because the two item types use different GSI1PK formats — there's no collision. The downside: it's not obvious why GSI1 exists when you read the CDK definition without comments. Document the patterns each GSI serves, not just the attribute names.

GSI2 — Status-Based Filtering (Processing Pipeline)

GSI2 exists entirely for the processing pipeline — the Step Functions state machine and Lambdas that update document status.

The pipeline needs to ask, "What documents in this batch are in PROCESSING status?" and "What batches for this tenant are FAILED?" Neither query is possible from the main table's PK structure.

GSI2 composite keys:

For documents:  GSI2PK = TENANT#<tenantId>#BATCH#<batchId>#STATUS#<status>
For batches:    GSI2PK = TENANT#<tenantId>#STATUS#<status>
                GSI2SK = <ISO 8601 timestamp>

REVIEW_REQUIRED is a first-class status here — not a variant of FAILED. It represents documents where the Bedrock extraction completed successfully but the combined confidence score fell below the reviewThreshold. These need human review, not reprocessing. Conflating them with FAILED would make it impossible to route them correctly.

The sparse index pattern: Only documents in interesting statuses (PROCESSING, FAILED, REVIEW_REQUIRED) write GSI2 attributes. Completed items have their GSI2 attributes removed after a retention window. Items that don't write GSI2 attributes simply don't appear in the index.

This keeps GSI2 lean. In a system processing thousands of documents per day, the vast majority will be in terminal SUCCEEDED status. If every document item included GSI2 attributes, the index would be orders of magnitude larger than necessary. Sparse index design means the GSI contains only the items you'd actually query.
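In code, a sparse index comes down to conditionally including the GSI attributes when an item is written. A minimal sketch for document items, assuming hypothetical names (`gsi2AttributesForDocument`, `WRITES_GSI2`); it does not model the retention window mentioned above and simply omits the attributes for SUCCEEDED items.

```typescript
type DocStatus = "PROCESSING" | "FAILED" | "REVIEW_REQUIRED" | "SUCCEEDED";

// Which statuses should appear in GSI2. Terminal SUCCEEDED items stay out of the index.
const WRITES_GSI2: Record<DocStatus, boolean> = {
  PROCESSING: true,
  FAILED: true,
  REVIEW_REQUIRED: true,
  SUCCEEDED: false,
};

interface Gsi2Attributes {
  GSI2PK?: string;
  GSI2SK?: string;
}

// Returns the attributes to merge into the item. An empty object means
// the item carries no GSI2 keys and therefore never appears in GSI2.
function gsi2AttributesForDocument(
  tenantId: string,
  batchId: string,
  status: DocStatus,
  updatedAtIso: string
): Gsi2Attributes {
  if (!WRITES_GSI2[status]) {
    return {};
  }
  return {
    GSI2PK: `TENANT#${tenantId}#BATCH#${batchId}#STATUS#${status}`,
    GSI2SK: updatedAtIso,
  };
}
```

When a document moves to a non-indexed status, the update also has to remove GSI2PK and GSI2SK from the stored item; leaving stale values behind would keep it in the index.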

GSI3 — Direct Batch Lookup by batchId

GSI3 exists to solve a coupling problem.

The Step Functions state machine receives a batchId in its execution input. That's it. It doesn't receive tenantId or projectId, which means it can't construct the primary key TENANT#<tenantId>#PROJECT#<projectId> needed to query the main table.

Without GSI3, the state machine would need to carry the full PK context in every execution payload, coupling the Step Functions definition to the DynamoDB key structure. Any change to the key structure would require updating the state machine input format.

GSI3 maps batchId to GSI3PK with a fixed GSI3SK = "BATCH" for point lookups:

const result = await dynamodb.query({
  TableName: TABLE_NAME,
  IndexName: "GSI3",
  KeyConditionExpression: "GSI3PK = :batchId AND GSI3SK = :sk",
  ExpressionAttributeValues: {
    ":batchId": batchId,
    ":sk": "BATCH",
  },
});

The state machine calls this lookup once at initialization, gets the full batch record (including tenantId, projectId, workflowId, thresholds), and then uses the main table PK for all subsequent operations.

This pattern — a GSI that gives a downstream consumer its native lookup identifier rather than forcing it to understand the data model — comes up repeatedly in event-driven architectures. The processing pipeline is a consumer of DynamoDB, not an owner of the key structure.

Operational Patterns Worth Mentioning

Extraction results in DynamoDB. Each Document item holds the full extraction output — field values, per-field confidence scores, and the Bedrock verification result. DynamoDB's 400KB item limit covers most extraction results. For documents with many fields or long extracted values that approach the limit, the payload goes to S3 and the Document item holds the reference key. DynamoDB remains the authoritative record for status and metadata; S3 handles the overflow.

Conditional writes for idempotency. S3 event notifications are delivered at least once. The S3IngestionLambda uses attribute_not_exists(PK) on every PutItem: if the document record already exists, DynamoDB rejects the write with a ConditionalCheckFailedException, which the Lambda treats as a no-op before exiting cleanly. No duplicate records, no reprocessing.

await dynamodb.putItem({
  TableName: TABLE_NAME,
  Item: documentRecord,
  ConditionExpression: "attribute_not_exists(PK)",
});
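A conditional PutItem that hits an existing item is rejected with an error named ConditionalCheckFailedException, so the no-op has to be an explicit catch. A minimal sketch, with `put` standing in for whatever performs the real SDK call; the helper names `isDuplicateDelivery` and `putOnce` are hypothetical.

```typescript
// True when the error is DynamoDB's "condition not met" rejection.
function isDuplicateDelivery(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "ConditionalCheckFailedException"
  );
}

// `put` is an injected function that performs the conditional write
// (hypothetical signature, used here to keep the sketch SDK-free).
async function putOnce(put: () => Promise<unknown>): Promise<"created" | "duplicate"> {
  try {
    await put();
    return "created";
  } catch (err) {
    if (isDuplicateDelivery(err)) {
      // Second delivery of the same S3 event: the record already exists.
      return "duplicate";
    }
    // Anything else is a real failure and should surface to the caller.
    throw err;
  }
}
```

Swallowing only this one error type matters: throttling or validation errors should still fail the invocation so the event is retried.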

TransactWriteItems for batch initialisation. Creating a batch involves writing the batch record, writing all document records, and updating the project's batch count. TransactWriteItems ensures all writes succeed or none do. A partial write – a batch record with no documents or documents with no parent batch – would leave the pipeline in an inconsistent state that's hard to recover from.

PITR is non-optional. Point-in-time recovery is enabled. DynamoDB holds the authoritative record for extraction results, batch status, workflow configuration, and tenant data. A bug in the processing pipeline corrupting document records is recoverable within 35 days with PITR. Without it, it's not recoverable at all.

TTL for ephemeral data. Temporary processing state and short-lived tokens use the ttl attribute. DynamoDB deletes expired items within ~48 hours — eventually consistent, not guaranteed exact. For compliance use cases requiring exact deletion on a schedule, a Lambda on a cron or explicit deletes are more appropriate than TTL alone.
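DynamoDB TTL expects the attribute to be a Number holding a Unix epoch timestamp in seconds, not milliseconds, which is an easy mistake with JavaScript dates. A minimal sketch of computing the value; the function name `ttlEpochSeconds` is hypothetical.

```typescript
// Converts "now + hoursToLive" into epoch seconds for the `ttl` attribute.
// Date.getTime() returns milliseconds, so it has to be divided by 1000.
function ttlEpochSeconds(now: Date, hoursToLive: number): number {
  return Math.floor(now.getTime() / 1000 + hoursToLive * 3600);
}

// Example: an item that should expire roughly 24 hours from now.
const expiresAt = ttlEpochSeconds(new Date(), 24);
```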

Lessons Learned

Define all access patterns before touching a key. Not 80% — all of them. Access patterns you discover after the fact require GSIs, which have write amplification costs. Access patterns you can't fit into any GSI require scans.

Encode tenant ownership in the PK. Tenant isolation via key structure is more reliable than any application-layer enforcement. The query physically cannot cross tenant boundaries if tenantId is in every partition key.

Sparse indexes are a feature. Only write GSI attributes on items that need to appear in that index. For a processing pipeline where most items are in terminal SUCCEEDED status, keeping active-status items in a sparse index is a significant operational win.

The DynamoDB schema is a public API contract. Treat the first design with the same care you'd give a public API. You won't get to iterate on it cheaply.

RAG Architecture for SaaS Applications Using Amazon Bedrock

Yoganand Govind — Sun, 14 Jun 2026 17:53:28 +0000

My first real production batch on Autowired.ai cost 3x what I'd budgeted.

200 documents. One real customer test. And an AWS bill that showed me exactly how wrong my mental model was.

The initial architecture felt reasonable: Textract handles OCR, Bedrock extracts all the fields from the OCR output. Simple. Straightforward. And expensive at scale because I was sending every document's full OCR text to a frontier model for every field on every page.

This post is the story of how I got to ~40% cost reduction. The biggest win wasn't a configuration change. It was rethinking what Bedrock should actually be doing.

What I Was Actually Paying For

Before touching anything, I instrumented every Bedrock call to log input tokens, output tokens, model ID, cache hit/miss, and latency. One week of real data:

System prompt tokens — ~35% of every input. The verification schema, field definitions, and output format instructions are completely static and present in full on every single call.
Full OCR context — I was sending the complete Textract response to Bedrock. For an invoice targeting 10 specific fields, maybe 30% of that OCR content was actually relevant. I was paying for the other 70%.
Model tier mismatch — Claude Sonnet for everything, including structured form extraction where fixed-field invoices have consistent, predictable layouts. Sonnet is ~5x Haiku pricing.
No result caching — Documents from the same vendor, same template, same layout — fresh Bedrock call every time.

Instrument first. Always. You can't optimise what you haven't measured.

The Change That Moved the Needle Most

The original flow was: Textract OCR → send full OCR to Bedrock → Bedrock extracts all fields.

The problem with this: Textract is actually very good at extracting structured fields from well-formatted documents—dates, totals, invoice numbers, and line items from tables. It's purpose-built for this. I was using a foundation model to re-derive values that a specialised OCR service had already extracted correctly.

The new architecture has three stages:

Instead of sending a full 3-page invoice OCR to Bedrock for all 10 fields, you're sending the following:

Targeted OCR sections for the 2–3 fields Textract couldn't get (gap-fill call)
The combined extraction result for validation (verification call)

For structured documents like standard invoices, Textract reliably gets 70–80% of target fields. Bedrock only handles the hard cases and confirms the final output. Token count drops significantly.

For complex unstructured documents — contracts, freeform text — Textract's confidence is lower across more fields, so Bedrock handles more. The architecture adapts naturally based on Textract's per-field confidence scores.

This single shift produced the largest cost reduction of everything I did.

Prompt Caching: The Fastest Configuration Win

Both the gap-fill and verification system prompts are static per workflow type. The field schema, output format, confidence rules — none of it changes between documents in a batch.

Amazon Bedrock supports prompt caching on certain models. Marking the system prompt block as cacheable means the first call in a parallel batch wave pays the cache write cost (~25% premium on that portion), and every subsequent call in the same 5-minute window hits the cache at ~10% of the normal input token price.

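import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const bedrockClient = new BedrockRuntimeClient({});

const response = await bedrockClient.send(
  new InvokeModelCommand({
    modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
    contentType: "application/json",
    body: JSON.stringify({
      // Required by the Anthropic Messages API on Bedrock
      anthropic_version: "bedrock-2023-05-31",
      system: [
        {
          type: "text",
          text: systemPrompt,
          // Marks the static system prompt as a cacheable prefix
          cache_control: { type: "ephemeral" },
        },
      ],
      messages: [
        {
          role: "user",
          content: [{ type: "text", text: documentContext }],
        },
      ],
      max_tokens: 1024,
    }),
  })
);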

For a 10-document parallel batch wave, the system prompt is written to the cache once, and the other nine calls read it from cache. At 1,100-token system prompts across hundreds of documents per batch, this adds up.
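The effect on the system-prompt portion of the bill can be estimated with a one-line formula. This sketch uses the approximate multipliers quoted above (about 1.25x for the cache write, about 0.10x per cache read) and assumes exactly one call in the window pays the write; calls that start before the first write completes may miss the cache, so treat the result as a best case. The function name `cachedPromptCostRatio` is hypothetical.

```typescript
// Relative cost of the system-prompt tokens across one cache window,
// compared with paying full price on every call (1.0 = no saving).
function cachedPromptCostRatio(
  callsInWindow: number,
  writeMultiplier: number = 1.25,
  readMultiplier: number = 0.1
): number {
  if (callsInWindow < 1) {
    throw new Error("callsInWindow must be at least 1");
  }
  // One call pays the write premium; the rest pay the discounted read price.
  const cachedCost = writeMultiplier + (callsInWindow - 1) * readMultiplier;
  return cachedCost / callsInWindow;
}

// Ten calls in one window: (1.25 + 9 * 0.1) / 10, roughly 0.215 of the uncached cost.
const ratioForTenCalls = cachedPromptCostRatio(10);
```

This ratio applies only to the cached prefix, not to the per-document context or output tokens, which is why the overall input saving is much smaller than the prefix saving.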

Measured impact: ~20% reduction in input token costs for standard batch workloads.

Sending the Right Context, Not All the Context

For the gap-fill call, you don't need the full Textract output — just the OCR blocks covering the fields Textract couldn't extract. Textract returns confidence scores per field and per block, so you can filter:

function filterOcrForMissingFields(
  textractResult: TextractOutput,
  missingFields: string[]
): string {
  const relevantBlocks = textractResult.blocks
    .filter(block => isBlockRelevantToFields(block, missingFields))
    .map(block => block.text)
    .join("\n");

  return relevantBlocks;
}

For the verification call, you don't need the raw OCR at all — you need the combined extraction result (Textract values + gap-fill values) structured as a validation input. Sending the full OCR here is wasted tokens.

I also audited both system prompts. After removing redundant formatting instructions already encoded in the output schema, verbose field descriptions that could be tightened, and defensive edge-case handling that never triggered, each went from ~2,400 tokens to ~1,100 tokens. Zero accuracy impact.

Measured impact: ~15% reduction in per-invocation input token count.

Testing the Smaller Model — Per Task Type

Not per system. Per task type.

The gap-fill task and the verification task have different complexity profiles. Gap-fill on structured forms (filling in 2–3 missing fields from a standard invoice) is a simpler task than gap-fill on an unstructured contract. Verification (validating already-extracted values) is generally easier than deriving them from raw text.

I tested Haiku vs. Sonnet on 50 representative documents per workflow type:

Structured form gap-fill + verification: Haiku within 2% accuracy of Sonnet. Switched to Haiku. ~5x cost reduction on these workflows.
Unstructured document gap-fill: Haiku 8–12% less accurate than Sonnet. Kept Sonnet. The quality gap mattered.
Verification on structured forms: Haiku performed well — validating extracted values is easier than extracting them. Switched.

Model selection is now a per-workflow config, not a system-wide setting:

function selectModelForTask(workflow: Workflow, task: "gap-fill" | "verify"): string {
  if (workflow.complexityTier === "structured") {
    return "anthropic.claude-3-haiku-20240307-v1:0";
  }
  return "anthropic.claude-3-5-sonnet-20241022-v2:0";
}

Measured impact: ~15% overall cost reduction, concentrated in high-volume structured extraction workflows.

Application-Layer Result Caching

Beyond Bedrock's 5-minute prompt cache, I added document-level result caching at the application layer.

Cache key:

Extraction schema ID + version hash
Hash of normalised Textract output (whitespace-normalised)
Confidence threshold settings

Cache hit → return the stored extraction result. No gap-fill call, no verification call.

In production, cache hit rates for first-run batches are low. But during schema development — when you're running the same 20-document test set 5–10 times as you tune field definitions — the cache eliminates 80–90% of Bedrock calls.

Cache storage: DynamoDB with a 24-hour TTL. A configuration hash in the key means any schema change automatically invalidates affected documents.
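A cache key along those lines might be assembled as follows. This is a sketch under assumptions: the field names, the `#`-joined layout, and the helper names (`fnv1a`, `normaliseWhitespace`, `extractionCacheKey`) are hypothetical, and the small FNV-1a hash is used only so the example has no dependencies. A production key would more likely use SHA-256.

```typescript
// Dependency-free 32-bit FNV-1a over UTF-16 code units (illustrative stand-in for a real hash).
function fnv1a(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, keeping the result in 32 bits.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash.toString(16);
}

// Collapses runs of whitespace so layout noise does not change the key.
function normaliseWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

interface CacheKeyInput {
  schemaId: string;
  schemaVersionHash: string;
  textractText: string;
  confidenceThreshold: number;
  reviewThreshold: number;
}

function extractionCacheKey(input: CacheKeyInput): string {
  return [
    input.schemaId,
    input.schemaVersionHash,
    fnv1a(normaliseWhitespace(input.textractText)),
    `${input.confidenceThreshold}:${input.reviewThreshold}`,
  ].join("#");
}
```

Because the schema version and thresholds are part of the key, changing either produces a different key, so stale entries are never read and simply age out via TTL.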

Measured impact: 5–30%, highly workload-dependent. Highest during schema development.

The Architecture Diagram

Four layers: application cache (skip Bedrock entirely on repeat docs) → Bedrock prompt cache (cheap cache reads on static system prompts) → model selection (Haiku vs Sonnet per workflow type and task) → context filtering (gap-fill only for missing fields, structured input for verification).

What Didn't Work

Aggressive prompt compression. I stripped whitespace and punctuation from system prompts to reduce token count. Accuracy degraded measurably. Foundation models are trained on well-formatted text; stripping formatting works against the training distribution.

Shared universal prompt across workflow types. I tried building one system prompt with conditional sections, cacheable as a single prefix. Engineering complexity was high, the cache hit rate was lower than expected (different workflow types rarely executed close enough in time to share a cache window), and accuracy dropped on edge cases. Reverted.

Output post-processing to compress tokens. Asking the model to output abbreviated values and expanding them in Lambda saved some output tokens, but it added execution time and increased application complexity. Net cost difference: negligible.

Wrapping Up

The 40% isn't one clever trick. It's an architectural shift plus four boring, measurable optimisations applied to what was left:

Architecture first: Textract for what it handles well, Bedrock only for gap-fill and verification
Prompt caching: One field change, immediate impact on batch workloads
Context filtering: Send the right OCR to Bedrock, not all of it
Model tiering: Test per task type, not per system
Result caching: Highest impact during schema development

If you're running production Bedrock workloads and haven't measured where your tokens are going — start there. The data will tell you which of these applies.

How I Built the Document Processing Pipeline Behind Autowired.ai: S3, Lambda, Step Functions, and SQS

Yoganand Govind — Sun, 07 Jun 2026 10:54:13 +0000

Early in building Autowired.ai, I wired up a simple flow: user submits a batch → API calls Textract → API calls Bedrock → API returns results.

It worked perfectly. For one document.

The moment I tested with 20 documents, the Lambda timed out. The moment I tested with a corrupted PDF, the entire batch failed. The moment I imagined what happens when Bedrock throttles mid-batch, I realised I had built a pipeline that would fall apart in production in at least five different ways.

So I scrapped it and rebuilt the whole thing async. This post is what I landed on — and more importantly, why.

The Core Insight: The API Should Not Touch the Processing

The first architectural shift was the most important one: the API has no role in document processing. Its only job is to accept the batch submission, write the initial records to DynamoDB, return a presigned S3 URL, and send a 202 Accepted back to the client.

That's it. The API is done.

Everything that follows – OCR, AI extraction, status tracking, and webhook delivery – happens completely independently, triggered by the document landing in S3.
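As a minimal sketch of that submission path: the handler writes the initial records, obtains an upload URL, and returns 202 without calling anything in the processing pipeline. The function name, the response fields, and the two injected helpers are hypothetical stand-ins, not the actual Autowired handler.

```typescript
type SubmissionResponse = { statusCode: number; body: string };

// Hypothetical handler core. The two callbacks stand in for the DynamoDB
// write and the presigned-URL call, so the sketch runs without AWS access.
function acceptBatch(
  batchId: string,
  writeInitialRecords: (batchId: string) => void,
  createUploadUrl: (batchId: string) => string
): SubmissionResponse {
  // 1. Persist the batch in its initial state.
  writeInitialRecords(batchId);
  // 2. Give the client somewhere to upload the documents.
  const uploadUrl = createUploadUrl(batchId);
  // 3. Return 202 Accepted. No OCR, no extraction, no StartExecution here.
  return {
    statusCode: 202,
    body: JSON.stringify({ batchId, uploadUrl, status: "ACCEPTED" }),
  };
}
```

Everything after the 202 is driven by the object landing in S3, so the handler has no reason to know the state machine exists.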

Why S3 as the trigger instead of having the API start the Step Functions execution directly?

Because S3 event notifications are durable. If Step Functions has a transient service hiccup when the document uploads, the S3 event queues and retries. The API already returned 202 — the client doesn't know or care. The processing will start when the event delivers.

If the API triggered Step Functions directly, a transient Step Functions error at submission time would require the client to retry the entire batch upload. That's a much worse failure mode.

The Gotcha I Hit With S3 Event Filters
S3 event notification suffix filters are case-sensitive.

I set up filters for .pdf, .png, .jpg, .jpeg, .tiff and wondered why some customer uploads weren't triggering processing. Turns out files uploaded from Windows frequently come in as .PDF, .JPG, .TIFF.

The fix is to register both variants for every extension:

const supportedExtensions = [
  ".pdf", ".PDF",
  ".png", ".PNG",
  ".jpg", ".JPG",
  ".jpeg", ".JPEG",
  ".tiff", ".TIFF",
  ".tif", ".TIF",
];

for (const ext of supportedExtensions) {
  this.documentsBucket.addEventNotification(
    s3.EventType.OBJECT_CREATED,
    new s3n.LambdaDestination(this.s3IngestionLambda),
    { prefix: "s3-ingestion/", suffix: ext }
  );
}

Not glamorous. But the kind of thing that burns you in production if you don't know it.
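One caveat with the both-variants list: a mixed-case suffix such as .Pdf matches neither entry. The sketch below shows two small helpers, both hypothetical names: one generates the lower/upper pairs from a single lowercase list, the other is a case-insensitive check that could run inside the ingestion Lambda if the notification were ever broadened to a prefix-only filter.

```typescript
// Expand lowercase extensions into the lower/upper pairs that S3 suffix
// filters need, since those filters are case-sensitive.
function withCaseVariants(extensions: string[]): string[] {
  const variants: string[] = [];
  for (const ext of extensions) {
    const lower = ext.toLowerCase();
    const upper = ext.toUpperCase();
    variants.push(lower);
    if (upper !== lower) variants.push(upper);
  }
  return variants;
}

// Case-insensitive check on the object key. Catches mixed-case suffixes
// like ".Pdf" that the two registered variants would miss.
function hasSupportedExtension(key: string, extensions: string[]): boolean {
  const lowerKey = key.toLowerCase();
  return extensions.some((ext) => lowerKey.endsWith(ext.toLowerCase()));
}

const baseExtensions = [".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".tif"];
const suffixFilters = withCaseVariants(baseExtensions);
```

The generated list has the same twelve entries as the hand-written one, so it can feed the same registration loop.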

Also, S3 event notifications are at-least-once, not exactly-once. S3IngestionLambda checks DynamoDB before creating a record — if the document already exists, it's a no-op. Idempotency here is not optional.
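A rough sketch of that no-op-on-duplicate behaviour follows. The in-memory store is an assumption that stands in for the DynamoDB table, and the record shape is hypothetical. With DynamoDB itself, a conditional write using attribute_not_exists on the key makes the check and the create a single atomic step; whether the real Lambda does it that way is not stated here.

```typescript
type DocumentRecord = { documentId: string; status: string };

// In-memory stand-in for the documents table (assumption for this sketch).
class InMemoryDocumentStore {
  private records = new Map<string, DocumentRecord>();

  // Mimics a conditional put: writes only when the key is absent.
  // Returns true if the record was created, false if it already existed.
  putIfAbsent(record: DocumentRecord): boolean {
    if (this.records.has(record.documentId)) return false;
    this.records.set(record.documentId, record);
    return true;
  }

  count(): number {
    return this.records.size;
  }
}

// Hypothetical handler core: returns true only for the first delivery of an
// event, which is the only time processing should be started.
function handleObjectCreated(store: InMemoryDocumentStore, objectKey: string): boolean {
  return store.putIfAbsent({ documentId: objectKey, status: "PENDING" });
}
```

A duplicate delivery of the same event returns false and leaves the table untouched.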

The State Machine: Where the Real Work Happens
Once the ingestion Lambda fires StartExecution, control passes to the Step Functions state machine. This is the heart of the pipeline.

The full flow:

Let me walk through the decisions that aren't obvious from the diagram.

Why maxConcurrency: 10 — Not 50, Not 100
The Map state fans out to one execution per document. maxConcurrency: 10 means at most 10 documents are processed simultaneously.

The first time I saw this, I thought it was a conservative default I should raise. I didn't raise it — and here's why that was the right call.

Textract's AnalyzeDocument API and Bedrock both have per-account concurrency limits. If you submit 50 documents simultaneously, you'll hit those limits, get throttling errors, and your retry logic will pile on. You end up processing 50 documents slower than if you'd used 10, because the retry backoff adds latency on top of the throttled calls.

10 concurrent document processors is a deliberate contract with AWS service quotas. At ~15 seconds per document (Textract + Bedrock combined), a 100-document batch takes around 150 seconds — 10 waves of 10. That's completely acceptable for a background processing job.
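The arithmetic above can be written as a small estimator. This is a back-of-the-envelope model that assumes every document takes the same time, so the Map state behaves like fixed waves; real runs start the next document as soon as a slot frees up. The function name is hypothetical.

```typescript
// Estimate wall-clock seconds for a batch, assuming uniform per-document
// duration and a fixed concurrency cap.
function estimateBatchSeconds(
  documentCount: number,
  maxConcurrency: number,
  secondsPerDocument: number
): number {
  if (maxConcurrency <= 0) throw new Error("maxConcurrency must be positive");
  if (documentCount <= 0) return 0;
  // Number of "waves" needed to get through every document.
  const waves = Math.ceil(documentCount / maxConcurrency);
  return waves * secondsPerDocument;
}

// The case from the text: 100 documents, 10 at a time, ~15 s each.
const hundredDocBatchSeconds = estimateBatchSeconds(100, 10, 15); // 150
```

A partial last wave still costs a full document duration, which is why 101 documents come out at 165 seconds rather than 151.5.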

When I request higher Textract and Bedrock quotas, I'll raise this number. The point is it lives in the CDK definition as a named constant — not buried in application code.

Per-Document Error Handling: The Most Important Decision
This was the thing I got wrong first.

In my initial version, if one document's processor threw an exception, the Map state failed, and the entire batch execution failed. 49 successfully processed documents, one corrupted PDF, entire batch in FAILED state.

The fix is addCatch at the task level:

processDocument.addCatch(markDocumentFailed, {
  errors: ["States.ALL"],
  resultPath: "$.error",
});

When DocumentProcessorLambda throws — for any reason — execution routes to MarkDocumentFailed instead of propagating up. MarkDocumentFailed writes the error, timestamp, and document ID to DynamoDB and returns cleanly. The Map state moves to the next document.

The batch ends with a mix of SUCCEEDED and FAILED documents. UpdateBatchStatus calculates the final batch state from the individual document outcomes. Users can see which documents succeeded, which failed, and why — rather than getting a single opaque batch failure.
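A minimal sketch of that final roll-up, assuming three batch states. SUCCEEDED and FAILED for documents come from the text above; the batch status names COMPLETED, PARTIALLY_FAILED, and FAILED are hypothetical and may not match what UpdateBatchStatus actually writes.

```typescript
type DocumentOutcome = "SUCCEEDED" | "FAILED";
// Hypothetical batch states for illustration.
type BatchStatus = "COMPLETED" | "PARTIALLY_FAILED" | "FAILED";

// Derive the batch status from the per-document outcomes.
function deriveBatchStatus(outcomes: DocumentOutcome[]): BatchStatus {
  const failed = outcomes.filter((outcome) => outcome === "FAILED").length;
  // No failures (including the empty batch) counts as completed.
  if (failed === 0) return "COMPLETED";
  // Every document failed: the batch as a whole failed.
  if (failed === outcomes.length) return "FAILED";
  // Otherwise some documents are usable and some need attention.
  return "PARTIALLY_FAILED";
}
```

With 49 good documents and one corrupted PDF, this yields PARTIALLY_FAILED rather than a blanket failure.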

This is the difference between a pipeline that's usable in production and one that isn't.

SQS Queues: Two of Them, Different Purposes
The pipeline has two SQS queues, and they're not interchangeable.

DocumentProcessingQueue — for bulk S3 ingestion paths where documents need to buffer before hitting the state machine. The critical configuration:

visibilityTimeout: cdk.Duration.minutes(6), // Lambda timeout is 5min — always add 1
deadLetterQueue: {
  queue: documentDlq,
  maxReceiveCount: 3,
},

The visibility timeout rule I follow everywhere: always set it to Lambda execution timeout + at least 60 seconds. If the Lambda is mid-execution and the visibility timeout expires, SQS thinks the message was abandoned and makes it visible again. Now two Lambdas are processing the same message simultaneously. That's bad.
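The rule is simple enough to encode as a helper, sketched below with hypothetical names. The second function models the failure case: if processing outlasts the visibility timeout, the message becomes visible again while the first consumer is still running.

```typescript
// Visibility timeout per the rule above: Lambda timeout plus a buffer of at
// least 60 seconds.
function visibilityTimeoutSeconds(lambdaTimeoutSeconds: number, bufferSeconds = 60): number {
  return lambdaTimeoutSeconds + bufferSeconds;
}

// True when a message would reappear on the queue before the consumer that
// received it has finished.
function reappearsMidFlight(processingSeconds: number, visibilityTimeout: number): boolean {
  return processingSeconds > visibilityTimeout;
}
```

For the queue above, a 5-minute Lambda timeout gives 360 seconds, matching the 6 minutes in the CDK definition.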

WebhookDeliveryQueue — for notifying customer endpoints after batch completion. Five retry attempts instead of three, because external customer endpoints are less reliable than internal Lambda functions:

deadLetterQueue: {
  queue: webhookDlq,
  maxReceiveCount: 5, // external endpoints get more retries
},

And batchSize: 1 on the consumer Lambda — if you process 10 webhook deliveries in one invocation and 3 fail, SQS retries all 10. With batch size 1, each delivery fails independently.

Both DLQs retain messages for 14 days. That's the window to investigate failures. A CloudWatch alarm on DLQ depth > 0 means something broke and gave up — always worth looking at.

The Four Failure Modes and How Each Is Handled
I designed the failure handling before I wrote the happy path. Here's the failure map:

A single document fails → addCatch routes to MarkDocumentFailed. The batch continues. The document is marked FAILED in DynamoDB with the error reason.

The entire batch initialisation fails → State machine transitions to FAILED. This is the right outcome — if the batch can't be initialised, there's nothing to recover.

Webhook delivery fails → SQS retries up to 5 times. After 5 failures, the message is dead-lettered. CloudWatch alarm fires. The customer gets no webhook — but the batch itself already completed successfully.

Scheduled batch trigger fails → EventBridge retries 2 times. ScheduledBatchLambda is idempotent — it checks DynamoDB before starting any execution, so even a double-trigger from EventBridge won't create duplicate processing.

Each failure mode is isolated. A failed webhook doesn't affect a completed batch. A failed scheduled trigger doesn't affect manually submitted batches. The pipeline degrades gracefully rather than having a single failure surface.

One Operational Lesson That Surprised Me
Step Functions execution history is more useful than I expected.

When something goes wrong at 2am — and it will — the first place I look is the Step Functions console. The execution shows every state transition, every input payload, and every error message exactly when it occurred. No digging through CloudWatch logs across multiple Lambda invocations. No reconstructing request IDs.

I added X-Ray tracing to every Lambda in the pipeline before the first batch ever ran. The combination of Step Functions execution history and X-Ray distributed traces means I've never had a failure I couldn't diagnose within a few minutes.

Add these from day one. They cost almost nothing, and they're invaluable when you need them.

Wrapping Up
The pipeline I built is not clever. It's S3, Lambda, Step Functions, and SQS doing exactly what they're designed to do:

S3 as a durable ingestion trigger, not a storage layer
Step Functions for orchestration with state, parallelism, and explicit failure handling
SQS for buffering and decoupled delivery with the right retry budget per queue
DLQs and CloudWatch alarms as the operational safety net

The 202 async pattern means the API is fast and the processing is resilient. The per-document error handling means one bad file can't kill a batch. The visibility timeout discipline means a message isn't handed to a second Lambda while the first is still working on it.

Next I'm writing about the AI cost optimization layer — specifically how I reduced Bedrock costs by ~40% without touching extraction accuracy.

Step Functions vs EventBridge vs SQS — I Use All Three in the Same System. Here's why.

Yoganand Govind — Sun, 31 May 2026 12:03:18 +0000

When I started building Autowired.ai, an AI document extraction SaaS, one of the earliest decisions I had to make was which AWS messaging and orchestration services to use for the processing pipeline.

My first instinct was to reach for SQS everywhere. I knew it well. It's simple, it's cheap, it's reliable. But as the pipeline grew more complex (document uploads triggering extraction workflows, batches processing dozens of files in parallel, webhook notifications delivered to customer endpoints), I kept running into the edges of what a queue alone can do.

I ended up using all three services: SQS, EventBridge, and Step Functions. Not because I wanted complexity, but because each one fits a specific job that the others don't.

This post walks through that decision for each service: what it does, where it falls short, and why I picked it for a specific part of the Autowired pipeline.

The Problem With Reaching for One Tool
Here's the antipattern I see a lot: an engineer learns SQS, it works for the first async use case, and then SQS becomes the default for everything async. Queue → Lambda → done. Repeat.

This works fine until your workflow has multiple steps. Then you start chaining queues: Queue A → Lambda B writes to Queue C → Lambda D writes to Queue E. You've now built a state machine out of SQS queues — without any of the state, visibility, error handling, or branching logic that a state machine provides.

When something breaks in that chain, you're reconstructing what happened from CloudWatch logs across five different Lambda invocations with different request IDs, trying to figure out which step failed and what the data looked like when it did.

I've been there. It's not fun.

The right answer isn't "use Step Functions for everything" either. Step Functions has overhead — cost per state transition, latency per step, and operational complexity that's overkill for simple async tasks.

The answer is using each service for what it's actually designed for.

The Three Services, Simply Put
SQS is a queue. It buffers work between a producer and a consumer, handles retries, and dead-letters failed messages. It knows nothing about workflow state — only whether a message was processed or needs to be retried.

EventBridge is an event router. It receives events and routes them to one or more consumers based on rules. Its superpower is loose coupling — the producer doesn't know who's listening, and you can add new consumers without touching the producer.

Step Functions is a workflow orchestrator. It manages multi-step processes with persistent state, branching, parallelism, and error handling. Every step's input, output, and failure is recorded in the execution history.

None of these is a substitute for the others. They solve different problems.

How Autowired Uses All Three
Here's the actual service topology in the Autowired processing pipeline:

Let me explain why each service is where it is.

EventBridge: For the Scheduled Trigger
Every 5 minutes, Autowired checks whether any batches have a scheduled execution time that's due. This is handled by a Lambda (ScheduledBatchLambda) triggered by an EventBridge rule:

const scheduledBatchRule = new events.Rule(this, "ScheduledBatchRule", {
  ruleName: `autowire-scheduled-batch-${stage}`,
  schedule: events.Schedule.rate(cdk.Duration.minutes(5)),
  description: "Triggers scheduled batch processing check every 5 minutes",
});

scheduledBatchRule.addTarget(
  new targets.LambdaFunction(scheduledBatchLambda, {
    retryAttempts: 2,
  })
);

Why an EventBridge scheduled rule (what used to be called a CloudWatch Events cron) and not a polling Lambda?

Because EventBridge is the native AWS scheduling primitive. The rule is declarative, versioned in CDK, has built-in retry logic (retryAttempts: 2), and integrates cleanly with Lambda. There's no infrastructure to manage, no polling loop to maintain.

The retryAttempts: 2 matters: if the invocation fails on a transient error, EventBridge retries twice before giving up, so one bad invocation doesn't silently skip a scheduled batch check.
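What the scheduled Lambda does on each tick can be sketched as a filter over batches. The record shape, status names, and function name here are hypothetical. Filtering on status is what keeps a retried or doubled trigger harmless: a batch that has already moved past SCHEDULED is not picked up again.

```typescript
// Hypothetical shape of a scheduled batch record.
type ScheduledBatch = {
  batchId: string;
  scheduledAt: string; // ISO 8601 timestamp
  status: "SCHEDULED" | "PROCESSING" | "COMPLETED";
};

// Return the batches whose scheduled time has arrived and that have not
// been started yet.
function findDueBatches(batches: ScheduledBatch[], now: Date): ScheduledBatch[] {
  return batches.filter(
    (batch) =>
      batch.status === "SCHEDULED" &&
      new Date(batch.scheduledAt).getTime() <= now.getTime()
  );
}
```

In a real table the status flip from SCHEDULED to PROCESSING would need to be a conditional update so that two overlapping invocations cannot both claim the same batch.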

Step Functions: For the Document Processing Workflow
When a document batch is submitted — either via S3 upload or scheduled trigger — it starts a Step Functions execution. This is the core of the pipeline, and it's where the complexity lives.

The state machine looks like this:

I want to highlight a few specific decisions here:

The Map state with maxConcurrency: 10. Each batch can have dozens or hundreds of documents. The Map state fans out to one execution per document, running up to 10 in parallel. Why 10 and not 50? Because Textract and Bedrock both have per-account concurrency limits. Unconstrained parallelism would exhaust those limits, trigger throttling, and ironically make the batch slower. 10 is a deliberate contract with AWS service quotas, not a performance guess.

Per-document error handling with addCatch. This was one of the most important design decisions. If one corrupted PDF crashes the document processor, it should not abort the other 49 documents in the batch. The addCatch on the processDocument step routes failures to MarkDocumentFailed, which writes the error to DynamoDB and lets the Map state continue:

processDocument.addCatch(markDocumentFailed, {
  errors: ["States.ALL"],
  resultPath: "$.error",
});

Without this, a single bad document would fail the entire batch execution. That's the wrong failure mode.
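The same idea in plain code, as an illustration of what the catch buys you rather than how Step Functions implements it: each document is processed inside its own try/catch, so a failure becomes a recorded result instead of an exception that ends the loop. The sketch is sequential for simplicity, whereas the Map state runs iterations in parallel; all names are hypothetical.

```typescript
type DocResult =
  | { documentId: string; status: "SUCCEEDED" }
  | { documentId: string; status: "FAILED"; error: string };

// Process every document, converting a thrown error into a FAILED result
// for that document only.
function processBatch(
  documentIds: string[],
  processDocument: (documentId: string) => void
): DocResult[] {
  const results: DocResult[] = [];
  for (const documentId of documentIds) {
    try {
      processDocument(documentId);
      results.push({ documentId, status: "SUCCEEDED" });
    } catch (err) {
      // Equivalent in spirit to routing to MarkDocumentFailed.
      const error = err instanceof Error ? err.message : String(err);
      results.push({ documentId, status: "FAILED", error });
    }
  }
  return results;
}
```

The caller always gets one result per document, which is what makes a per-document status view possible.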

Why not Lambda chains? I tried a simpler version of this early on — chaining Lambda invocations directly. The moment I needed to fan out across multiple documents and then aggregate results (to determine overall batch status), Lambda chains couldn't express it. There's no fan-in primitive. Step Functions' Map state handles this natively.

The other thing Lambda chains can't give you: execution history. When a batch fails, the first thing I do is open the Step Functions console and look at the execution. Every state, every input, every error is right there. That debugging visibility has saved me hours.

SQS: For Webhook Delivery
After a batch completes, if the customer has webhook delivery configured, Autowired sends a notification to their endpoint. This is handled via SQS, not via a direct Lambda call from inside the state machine, and not via EventBridge.

Here's why SQS specifically:

this.webhookDeliveryQueue = new sqs.Queue(this, "WebhookDeliveryQueue", {
  queueName: `autowire-webhook-queue-${stage}`,
  visibilityTimeout: cdk.Duration.seconds(60),
  deadLetterQueue: {
    queue: webhookDlq,
    maxReceiveCount: 5,
  },
});

The decoupling is the point. Customer webhook endpoints are unreliable — they go down, they time out, they return 500s for unrelated reasons. If webhook delivery is inside the Step Functions execution, a failed webhook delivery blocks or fails the batch. Those are completely unrelated concerns.

By routing to SQS, the Step Functions execution completes cleanly. Webhook delivery has its own retry budget (maxReceiveCount: 5, more than the document processor queue's 3, because external endpoints are flakier than internal Lambdas). Failures dead-letter after 5 attempts and trigger a separate alarm.
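To make the retry budget concrete, here is a small simulation of how maxReceiveCount plays out for one message. It is a model of the queue's behaviour, not SQS client code, and the names are hypothetical: the consumer sees the message up to maxReceiveCount times, and if every attempt fails the message ends up in the DLQ.

```typescript
type DeliveryOutcome = { delivered: boolean; receives: number; deadLettered: boolean };

// Simulate one message against a redrive policy. `attempt` returns true when
// the webhook call succeeds on that receive.
function simulateDelivery(
  maxReceiveCount: number,
  attempt: (receiveCount: number) => boolean
): DeliveryOutcome {
  for (let receiveCount = 1; receiveCount <= maxReceiveCount; receiveCount++) {
    if (attempt(receiveCount)) {
      return { delivered: true, receives: receiveCount, deadLettered: false };
    }
  }
  // Budget exhausted: the message moves to the dead-letter queue.
  return { delivered: false, receives: maxReceiveCount, deadLettered: true };
}
```

An endpoint that recovers on the third attempt is delivered with two receives to spare; one that never recovers is dead-lettered after five.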

batchSize: 1 on the consumer. The WebhookDeliveryLambda processes one message at a time:

webhookDeliveryLambda.addEventSource(
  new lambdaEventSources.SqsEventSource(this.webhookDeliveryQueue, {
    batchSize: 1,
  })
);

If you process 10 webhook deliveries in one Lambda invocation and 3 fail, SQS can't partially acknowledge the batch by default, so all 10 retry. With batchSize: 1, each webhook delivery retries independently. The blast radius of a failed delivery is exactly one customer, not ten.
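The blast-radius difference is easy to see in a small model. This sketch assumes the default behaviour where an invocation that errors returns its whole batch to the queue (no partial batch responses configured); the function name is hypothetical.

```typescript
// Given the queued messages, a consumer batch size, and the set of messages
// whose delivery fails, return every message that ends up being retried.
function redeliveredOnFailure(
  messageIds: string[],
  batchSize: number,
  failingIds: Set<string>
): string[] {
  const redelivered: string[] = [];
  for (let i = 0; i < messageIds.length; i += batchSize) {
    const batch = messageIds.slice(i, i + batchSize);
    // One failure in the batch sends the whole batch back to the queue.
    if (batch.some((id) => failingIds.has(id))) {
      redelivered.push(...batch);
    }
  }
  return redelivered;
}

const webhookMessages = Array.from({ length: 10 }, (_, i) => `m${i + 1}`);
const failingWebhooks = new Set(["m2", "m5", "m9"]);
```

With a batch size of 10 all ten messages are retried, including the seven that were already delivered; with a batch size of 1 only the three failing ones are.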

The Decision Framework I Actually Use
After building this, here's how I'd frame the decision for any new async requirement:

Reach for SQS when:

You need to buffer work between a producer and consumer running at different rates
The work is a single step — queue in, Lambda processes, done
You need per-message acknowledgment and dead-lettering
The work items are independent of each other

Reach for EventBridge when:

You need a scheduled trigger (rate or cron)
One event needs to fan out to multiple independent consumers
You want the producer to be decoupled from who consumes its events
You're routing events across services or accounts

Reach for Step Functions when:

Your workflow has multiple steps where each depends on the previous step's output
You need to fan out over a collection and wait for all items to complete (Map state)
You need conditional branching based on data in the execution context
You need execution history for debugging and operations
The workflow can run for minutes to hours

And the pattern to avoid: using SQS to chain multi-step workflows. If you find yourself writing a Lambda that reads from Queue A and writes to Queue B to trigger the next step, stop and reach for Step Functions instead.

Wrapping Up
The combination of EventBridge for scheduling, Step Functions for orchestration, and SQS for decoupled delivery isn't overengineering. It's three services doing the jobs they were designed for.

The Autowired pipeline is more debuggable, more resilient, and operationally cleaner because each service is in its right place — not because I used the most services, but because I used the right ones.

Next I'm writing about the full event-driven document processing pipeline — how S3 uploads trigger the Step Functions execution, how the DLQs are wired, and the failure handling that makes it production grade.

Follow along if that's useful.

This is part of a 10-post series on the architecture behind Autowired.ai — an AI document extraction SaaS I built solo on AWS serverless.


What I've Been Building for the Last Several Months and Why I'm Finally Writing About It

Yoganand Govind — Thu, 28 May 2026 17:45:41 +0000

I've been quietly heads-down building something outside of work for the past several months. No posts, no updates, no "excited to share" announcements. Just building.
Today I'm breaking that silence, and this is the first post in a series where I'll share everything I've learned.

The Thing I Built
Autowired.ai — an AI-powered document extraction SaaS.
The idea is straightforward: businesses deal with mountains of documents — invoices, purchase orders, contracts, insurance forms — and extracting structured data from them is still mostly manual or brittle rule-based OCR that breaks the moment the template changes.

Autowired lets you define a visual extraction template on a canvas (you draw fields over a sample document), submit a batch of documents, and get back structured JSON with the extracted values. No code, no regex, no fragile parsers.

Sounds simple. The engineering is not.

Why I Built It, Solo
I have 11 years of software engineering experience. Enterprise Java, government systems, insurance platforms, cloud architecture. I've worked on large teams, gone through full SDLC processes, dealt with change advisory boards and SLA contracts.

What I hadn't done was build something entirely from scratch, make every architectural decision myself, and take it all the way to production — solo.

So that's what I did. This project is my proving ground for everything I know about cloud-native architecture and AI systems, applied without the safety net of a team.

It's ~90% complete and currently in beta. And the lessons have been hard-earned.

The Stack (and Why)
Before I dive into any specific post, here's the full picture of what's running under the hood:

Infrastructure: AWS CDK (TypeScript) — everything is code, nothing is clicked into existence in the console. Six separate CDK stacks: database, storage, processing, Bedrock, API, and monitoring.
Database: DynamoDB single-table design with three GSIs. Multi-tenant data isolation baked into the partition key structure — not enforced by application code.

Document processing pipeline: S3 event notifications trigger a Lambda, which starts a Step Functions state machine. The state machine runs up to 10 documents in parallel, handles per-document failures independently, updates batch status, and optionally delivers webhook notifications via a separate SQS queue.

AI extraction: Amazon Bedrock Data Automation (BDA) for intelligent field extraction. Amazon Textract for OCR preprocessing. Bedrock Guardrails for safety filtering.

Auth: Clerk.

Frontend: Next.js product app and marketing site.

Runtime: ARM64 Lambdas on Node.js 20, X-Ray tracing across the pipeline, DLQs on every queue.

Every piece of that list came with decisions, tradeoffs, and at least one thing I got wrong the first time.

What's Coming in This Series
Over the next 10 weeks, I'm writing about each layer of this system in depth — not tutorials, not beginner walkthroughs, but the actual engineering reasoning behind the decisions:

Step Functions vs EventBridge vs SQS — when to use each, and how I use all three in the same system for different jobs
Building the event-driven document processing pipeline — S3, SQS, Lambda, Step Functions, and the failure handling that makes it production-grade
How I reduced Bedrock AI costs by ~40% — prompt caching, model tiering, token optimisation, result caching
DynamoDB single-table design for multi-tenant SaaS — real partition key patterns, GSI design decisions, and the tradeoffs nobody mentions
The full multi-tenant SaaS architecture — tenant isolation, async processing, API design, and how all the stacks fit together
Terraform vs AWS CDK — a practical comparison from someone who's used both in production
RAG architecture on Amazon Bedrock — embeddings, chunking strategy, tenant-aware retrieval, hallucination reduction
Designing high-availability systems at enterprise scale - what I carried over from government and insurance engineering
AI architecture patterns from a real product — observability, confidence thresholds, prompt versioning, output validation
From Enterprise Java engineer to AI Platform engineer — what the transition actually looks like.

If you work in cloud, AI infrastructure, or platform engineering, or you're just curious how a solo engineer structures a production-grade SaaS, this series is for you.

A Note on Why I'm Sharing This
I'm not building in public for the sake of building in public. I'm sharing this because the content I wish had existed when I was making these decisions (about DynamoDB key design, about when Step Functions is overkill, about how to actually reduce Bedrock costs in a real workload) mostly doesn't exist at the depth it should.

Most AWS content is either too beginner or too abstract. There's not enough "here's a real system, here's why it's designed this way, here's what broke."

That's what I'm trying to write.

— Yoganand (Yogi)

Follow me here on Dev.to or connect on LinkedIn to get each post as it drops.