Md. Mostafa Al Mahmud for AWS Community Builders

Posted on Jul 3 • Originally published at builder.aws.com

From Manual OAuth Onboarding to Event-Driven Sync: A Privacy-Safe Serverless Case Study

#serverless #security #oauth #eventdriven

TL;DR

This write-up is intentionally anonymized and educational. Names, identifiers, and implementation details are generalized. Code snippets are partial by design.

Why this story matters

Many data integrations start with a "quick workaround":

Share an account
Send credentials in a ticket
Manually export account IDs
Hardcode resource mappings

It works for one client, then breaks at ten.

This case study walks through how we moved from manual onboarding to a secure, low-cost serverless OAuth platform that automatically discovers analytics resources and keeps downstream relational data synchronized through event-driven streaming.

The point is not to copy a vendor-specific integration. The point is to learn the architecture and decision process so you can apply it to your own system.

The initial problem statement

We had a repeatable operational pain:

External users needed to authorize read-only access to analytics and advertising data.
Onboarding required human steps and back-and-forth communication.
Metadata lived in one place, but reporting and internal workflows relied on a relational database.
Security and privacy constraints required strict token handling.

Non-negotiable constraints

No raw OAuth tokens in app logs or primary metadata tables.
CSRF-safe OAuth flow.
Low operational overhead and low monthly cost.
Near real-time sync into Postgres for downstream consumers.
Publicly shareable architecture knowledge without exposing business logic.

Solution v1: serverless OAuth discovery pipeline

We started with a simple but strong baseline:

API Gateway for HTTPS endpoints
Lambda for orchestration and read APIs
DynamoDB for metadata and nonce state
SSM Parameter Store (SecureString) for token material
CloudWatch for observability

High-level architecture (v1)

OAuth flow with single-use state nonce

The first major security decision: treat state as one-time and time-bound.

# pseudo-code
state = uuid4()
put_item("OAuthStates", {
  "state": state,
  "created_at": now_epoch(),
  "ttl": now_epoch() + 600
})
redirect_to_provider(state=state)

On callback:

# pseudo-code
# Delete and read back in one call so two concurrent callbacks cannot both pass.
record = delete_item("OAuthStates", key={"state": state}, return_old=True)  # consume once
if not record:
    return unauthorized("invalid or reused state")

if record.ttl < now_epoch():
    return unauthorized("state expired")

Why both TTL and explicit delete?

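TTL cleanup is asynchronous: DynamoDB removes expired items some time after the timestamp passes, so an expired state can still be readable and the callback must compare ttl itself.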
Explicit delete enforces one-time use immediately.
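
The issue-and-consume steps above can be sketched in Python as follows. This is a minimal sketch, not the original implementation: it assumes `table` behaves like a boto3 DynamoDB `Table` resource (`put_item`, and `delete_item` with `ReturnValues="ALL_OLD"`), and the function names `issue_state` and `consume_state` are hypothetical. It uses `secrets.token_urlsafe` rather than `uuid4` as one reasonable way to generate an unguessable value.

```python
import secrets
import time


def issue_state(table, ttl_seconds=600, now=None):
    """Create a random state value and persist it with an expiry timestamp."""
    now = int(time.time()) if now is None else now
    state = secrets.token_urlsafe(32)
    table.put_item(Item={"state": state, "created_at": now, "ttl": now + ttl_seconds})
    return state


def consume_state(table, state, now=None):
    """Return True only for the first, unexpired use of a state value."""
    now = int(time.time()) if now is None else now
    # ReturnValues="ALL_OLD" makes the delete report what it removed, so the
    # existence check and the consumption happen in a single request.
    resp = table.delete_item(Key={"state": state}, ReturnValues="ALL_OLD")
    old = resp.get("Attributes")
    if old is None:
        return False  # unknown or already consumed
    # TTL deletion is asynchronous, so expiry is still checked here.
    return int(old["ttl"]) >= now
```

Folding the lookup into the delete means a replayed callback finds nothing to delete, even if it arrives at the same moment as the legitimate one.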

Token boundary: secrets vs metadata

A second core decision: split storage responsibilities.

Sensitive tokens -> encrypted secret store (SSM SecureString)
Non-sensitive metadata -> DynamoDB

Security impact:

Querying metadata tables never reveals token material.
Access control can differ per store.
Incident blast radius is reduced.
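
A minimal sketch of that split, under stated assumptions: `ssm` is expected to behave like a boto3 SSM client (`put_parameter`) and `table` like a boto3 DynamoDB `Table` (`put_item`). The `/oauth-demo` prefix, the metadata field names, and the function name are hypothetical and chosen only for illustration.

```python
import re
import time


def store_tokens(ssm, table, identity, access_token, refresh_token, prefix="/oauth-demo"):
    """Write token values to the secret store and only metadata to the table."""
    segment = re.sub(r"[^A-Za-z0-9._-]", "_", identity)
    for kind, value in (("access_token", access_token), ("refresh_token", refresh_token)):
        if not value:
            continue  # some providers omit the refresh token on repeat consent
        ssm.put_parameter(
            Name=f"{prefix}/{segment}/{kind}",
            Value=value,
            Type="SecureString",  # encrypted at rest by Parameter Store
            Overwrite=True,
        )
    # The metadata record carries facts about the tokens, never the tokens.
    metadata = {
        "user_email": identity,
        "has_refresh_token": bool(refresh_token),
        "consented_at": int(time.time()),
    }
    table.put_item(Item=metadata)
    return metadata
```

Because the function returns only the metadata record, callers have nothing token-shaped to log by accident.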

Why we moved away from Cognito for this specific flow

An earlier design considered using Cognito as the center of authentication and token handling.

That approach is strong when your main goal is application sign-in and session management (JWT-based identity, federation, and authorization boundaries). But our integration path needed something slightly different: reliable custody of the external provider's actual access token and refresh token so backend jobs could call provider APIs over time.

In practice, Cognito-issued tokens represent a Cognito session. They do not replace lifecycle management for long-lived third-party API credentials, which is what this architecture needs. So we changed direction:

Use direct OAuth code exchange with the provider.
Persist provider access/refresh tokens in encrypted parameter storage.
Keep DynamoDB focused on non-sensitive token metadata and discovery records.

This decision reduced ambiguity in token ownership, made refresh behavior explicit, and aligned better with least-privilege backend API access patterns.
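
For illustration, the direct code exchange is a form-encoded POST to the provider's token endpoint, as described in RFC 6749 section 4.1.3. The sketch below only builds the request and does not send it. The function name and the example endpoint are hypothetical, and sending `client_secret` in the body is an assumption: some providers expect HTTP Basic authentication instead.

```python
import urllib.parse
import urllib.request


def build_token_request(token_url, code, redirect_uri, client_id, client_secret):
    """Build (but do not send) the authorization-code exchange request."""
    body = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "code": code,
        # Must match the redirect_uri used in the authorization request.
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical values; a real provider publishes its own token endpoint.
req = build_token_request(
    "https://provider.example/oauth/token",
    code="AUTH_CODE",
    redirect_uri="https://api.example/prod/auth/callback",
    client_id="my-client",
    client_secret="my-secret",
)
```

The response to this request carries the provider's access and refresh tokens, so it should go straight to the secret store and never to a log line.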

Resource discovery pipeline

After the callback exchange, the auth Lambda performs discovery against two provider APIs and stores normalized records.

# pseudo-code
tokens = exchange_code_for_tokens(code, redirect_uri)
email = fetch_user_identity(tokens.access_token)
store_tokens(email, tokens.access_token, tokens.refresh_token)

resources_a = list_resource_type_a(tokens.access_token)
resources_b, skipped = list_resource_type_b(tokens.refresh_token)

upsert_resources_a(email, resources_a)
upsert_resources_b(email, resources_b, skipped)

Notice the mixed token strategy:

API A accepts an access token directly.
API B may be better served from refresh-token-derived sessions.

This small detail matters in real-world provider ecosystems.

New problem: operational analytics lived in Postgres, not DynamoDB

V1 solved onboarding and discovery. Then a second problem appeared.

Downstream consumers (dashboards, joins, historical analysis, role-based reports) relied on relational querying in Postgres. But fresh data now landed in DynamoDB first.

We had to answer:

How do we keep relational tables synced with minimal lag?
How do we remain idempotent under retries and duplicate events?
How do we avoid expensive full-table scans every minute?

Solution v2: event-driven sync with DynamoDB Streams

The best fit was an event-driven projection layer:

DynamoDB table updates emit stream records.
Stream processor Lambda transforms records.
Lambda upserts rows into Postgres.

Updated architecture with streaming sync

Why event-driven first, batch second

Primary path (event-driven):

Low latency (seconds)
No frequent scans
Natural fit for change-data-capture style projection

Safety net (nightly reconciliation):

Catches rare drift (missed events, temporary DB outage, mapping regressions)
Supports audit checks and backfills

This is a practical engineering pattern: fast path + correctness path.

Stream processor design details

1. Idempotency via SQL upsert

A Lambda event source mapping on a DynamoDB stream delivers records at least once: a failed batch is retried, so the function can see the same record more than once. Upsert semantics make those retries safe.

-- pseudo-SQL
INSERT INTO ext_resource_a (
  user_email,
  resource_id,
  resource_name,
  status,
  updated_at
)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (user_email, resource_id)
DO UPDATE SET
  resource_name = EXCLUDED.resource_name,
  status = EXCLUDED.status,
  updated_at = EXCLUDED.updated_at;
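
To see the idempotency property in action without a Postgres instance, the sketch below runs the same statement shape against an in-memory SQLite database. SQLite is only a stand-in here (it supports `ON CONFLICT ... DO UPDATE` from version 3.24), and the helper names are hypothetical. The production path would use a Postgres driver with `$1`-style or `%s`-style placeholders instead of `?`.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO ext_resource_a (user_email, resource_id, resource_name, status, updated_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (user_email, resource_id)
DO UPDATE SET
  resource_name = excluded.resource_name,
  status = excluded.status,
  updated_at = excluded.updated_at
"""


def make_db():
    """Create an in-memory table with the composite key the upsert relies on."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE ext_resource_a (
            user_email TEXT NOT NULL,
            resource_id TEXT NOT NULL,
            resource_name TEXT,
            status TEXT,
            updated_at TEXT,
            PRIMARY KEY (user_email, resource_id)
        )"""
    )
    return conn


def apply_record(conn, row):
    """Apply one projected row; safe to call repeatedly with the same row."""
    conn.execute(UPSERT_SQL, row)
    conn.commit()


conn = make_db()
row = ("user@example.com", "res-1", "Demo", "active", "2024-01-01T00:00:00Z")
apply_record(conn, row)
apply_record(conn, row)  # duplicate delivery of the same stream record
rows = conn.execute("SELECT * FROM ext_resource_a").fetchall()
```

After both calls the table still holds exactly one row for that key, which is the behaviour a retried batch needs.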

2. Record-level routing

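# pseudo-code (field names follow the DynamoDB Streams event that Lambda receives)
for rec in event["Records"]:
    table = detect_source_table(rec["eventSourceARN"])
    if rec["eventName"] in ("INSERT", "MODIFY"):
        # NewImage is present only when the stream view type includes new images.
        row = map_new_image_to_row(table, rec["dynamodb"]["NewImage"])
        upsert_postgres(table, row)
    elif rec["eventName"] == "REMOVE":
        soft_delete_or_mark_inactive(table, rec["dynamodb"]["Keys"])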
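
The mapping step in the routing loop above has to unwrap DynamoDB's typed attribute format (`{"S": ...}`, `{"N": ...}`, and so on) before a row can go to SQL. A minimal sketch of that conversion follows. The names `from_attr` and `image_to_row` are hypothetical, and only the common scalar, list, and map types are handled; sets and binary values are left out on purpose.

```python
from decimal import Decimal


def from_attr(attr):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (type_tag, value), = attr.items()  # each attribute has exactly one type tag
    if type_tag == "S":
        return value
    if type_tag == "N":
        return Decimal(value)  # numbers arrive as strings; Decimal keeps precision
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_attr(v) for v in value]
    if type_tag == "M":
        return {k: from_attr(v) for k, v in value.items()}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def image_to_row(new_image, columns):
    """Project a stream record's NewImage onto the target table's columns."""
    return {
        col: from_attr(new_image[col]) if col in new_image else None
        for col in columns
    }
```

Passing an explicit column list keeps the SQL schema in charge: attributes that appear in DynamoDB but have no column are ignored, and missing ones become NULL.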

3. Preserve source truth semantics

Not every delete should be a physical delete in Postgres. It is often better to:

Keep row
Mark status = inactive
Track synced_at and source_updated_at

This improves auditability and historical reporting.

4. Backpressure and failure handling

For production, configure:

Batch size tuned for row payload
Bounded retries plus an on-failure destination (the stream equivalent of a DLQ) for batches that exhaust them
Per-table metrics for lag and failure counts

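# pseudo-SAM fragment
EventSourceMapping:
  Type: DynamoDB
  Properties:
    Stream: !GetAtt MetadataTable.StreamArn  # hypothetical table resource name
    StartingPosition: LATEST
    BatchSize: 100
    MaximumRetryAttempts: 3
    BisectBatchOnFunctionError: true
    DestinationConfig:
      OnFailure:
        Destination: !GetAtt SyncFailureQueue.Arn  # hypothetical SQS queue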
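
For the per-table lag metric mentioned above, one option is to derive it inside the stream processor from each record's `ApproximateCreationDateTime`. The sketch below is an assumption about how that could look, with hypothetical function names. Publishing the result to a metrics backend is left out.

```python
import time


def table_name_from_arn(event_source_arn):
    """Extract the table name from a DynamoDB stream ARN."""
    # Stream ARNs look like:
    # arn:aws:dynamodb:<region>:<account>:table/<name>/stream/<label>
    return event_source_arn.split(":table/", 1)[1].split("/", 1)[0]


def max_lag_by_table(records, now=None):
    """Return the largest observed lag in seconds per source table in a batch."""
    now = time.time() if now is None else now
    lag = {}
    for rec in records:
        table = table_name_from_arn(rec["eventSourceARN"])
        created = float(rec["dynamodb"]["ApproximateCreationDateTime"])
        lag[table] = max(lag.get(table, 0.0), now - created)
    return lag
```

Tracking the maximum rather than the average makes a stuck shard visible even when most records are flowing normally.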

Security-by-design decisions (and why)

CSRF-safe OAuth state

Single-use, TTL-bound nonce in DynamoDB reduced callback forgery risk.

Token isolation

Only the secret store contains token values. The metadata table stores "token exists" and consent timestamps.

Least privilege IAM

Each Lambda role should have only:

read/write specific DynamoDB tables it uses
limited SSM parameter path access
CloudWatch log permissions
network access only when needed (sync Lambda inside VPC for RDS)
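
As a rough illustration of the list above, the sketch below assembles an IAM policy document for the auth Lambda as a Python dict. The function name, the chosen action list, and all argument values are assumptions. A real role should list only the actions its code actually calls, and a customer-managed KMS key for SecureString parameters would need its own statement.

```python
import json


def auth_lambda_policy(region, account_id, table_names, parameter_prefix, log_group):
    """Build a policy scoped to named tables, one parameter path, and one log group."""
    prefix = parameter_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # only the tables this function reads and writes
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"],
                "Resource": [
                    f"arn:aws:dynamodb:{region}:{account_id}:table/{name}"
                    for name in table_names
                ],
            },
            {   # only parameters under one path
                "Effect": "Allow",
                "Action": ["ssm:GetParameter", "ssm:PutParameter"],
                "Resource": f"arn:aws:ssm:{region}:{account_id}:parameter/{prefix}/*",
            },
            {   # log delivery for this function's log group
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*",
            },
        ],
    }


policy = auth_lambda_policy(
    "us-east-1", "123456789012", ["OAuthStates"], "/oauth-demo/", "/aws/lambda/auth"
)
policy_json = json.dumps(policy, indent=2)
```

The stream processor would get a separate role along the same lines, with stream read permissions and no access to the parameter path at all.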

Logging hygiene

Never log:

authorization code
access token
refresh token
raw provider error objects that may contain sensitive context

Log instead:

operation outcome
provider endpoint class
masked subject identifiers
correlation ID
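
One way to make these rules hard to break is to route every log call through a small helper that drops sensitive keys and masks the subject. The sketch below is illustrative only. The key list, the masking format, and the function names are assumptions.

```python
import json

# Keys that must never reach a log line, whatever the caller passes in.
SENSITIVE_KEYS = {"code", "access_token", "refresh_token", "authorization", "client_secret"}


def mask_email(email):
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def safe_log_line(operation, outcome, subject, correlation_id, **fields):
    """Build a structured log line with sensitive keys removed."""
    kept = {k: v for k, v in fields.items() if k.lower() not in SENSITIVE_KEYS}
    record = {
        "operation": operation,
        "outcome": outcome,
        "subject": mask_email(subject),
        "correlation_id": correlation_id,
        **kept,
    }
    return json.dumps(record, sort_keys=True)


line = safe_log_line(
    "token_exchange", "success", "user@example.com", "abc-123",
    access_token="AT-123", endpoint_class="identity",
)
```

Filtering by key name does not catch a token embedded inside a free-text message, which is why raw provider error objects are better summarized than passed through.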

Cost-aware architecture choices

The design intentionally kept fixed costs low:

Lambda for bursty orchestration
API Gateway for managed ingress
DynamoDB on-demand for uncertain traffic
SSM Parameter Store SecureString instead of a heavier secret system for this phase
30-day log retention to control CloudWatch growth

Rough POC economics can stay small (single-digit USD/month) when traffic is modest and retention is disciplined.

Sequence walkthrough (problem to resolution)

Small implementation snippets you can adapt

Sanitize identity for secret path keys

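# pseudo-code
import re

def to_secret_path_segment(identity: str) -> str:
    # Keep only characters that are safe inside a parameter name segment.
    # Distinct identities can collapse to the same segment (for example
    # "a+b@x.io" and "a_b@x.io"), so append a short hash if that matters.
    return re.sub(r"[^A-Za-z0-9._-]", "_", identity)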

Build the callback URI dynamically to avoid template coupling

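# pseudo-code
def callback_uri_from_event(event):
    # API Gateway proxy events are plain dicts, so use key access.
    ctx = event["requestContext"]
    # Assumes the default execute-api hostname, where the stage is part of the path.
    return f"https://{ctx['domainName']}/{ctx['stage']}/auth/callback"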

Separate "active" and "skipped" discovered resources

# pseudo-code
active, skipped = discover_resources()
upsert_active(active)
upsert_skipped(skipped, reason_field="skip_reason")
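
A sketch of what the discovery step could look like with per-resource failure handling. The `fetch_details` callable and the record shape are hypothetical stand-ins for a provider client. The skip reason records only the exception class name so that provider error text does not leak into stored metadata.

```python
def discover_resources(candidates, fetch_details):
    """Split candidates into resources that loaded and ones that were skipped."""
    active, skipped = [], []
    for resource_id in candidates:
        try:
            active.append(fetch_details(resource_id))
        except Exception as exc:  # one bad resource should not fail onboarding
            skipped.append({
                "resource_id": resource_id,
                "skip_reason": type(exc).__name__,
            })
    return active, skipped
```

Storing skipped resources with a reason gives support staff something concrete to act on, instead of a resource that silently never appears.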

Keep a reconciliation watermark

-- pseudo-SQL
CREATE TABLE sync_checkpoint (
  pipeline_name text primary key,
  last_reconciled_at timestamptz not null
);
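
The comparison at the heart of a nightly reconciliation run can be sketched as a keyed diff between source and target rows. This is a simplified in-memory illustration with hypothetical names and field choices. A real job would page through both stores, repair the drift it finds, and then advance `last_reconciled_at`.

```python
def find_drift(source_rows, target_rows, key_fields=("user_email", "resource_id")):
    """Compare source and target rows by key and report what the fast path missed."""

    def index(rows):
        return {tuple(r[k] for k in key_fields): r for r in rows}

    src, tgt = index(source_rows), index(target_rows)
    # In the source but never projected.
    missing = sorted(k for k in src if k not in tgt)
    # Projected, but from an older version of the source item.
    stale = sorted(
        k for k in src
        if k in tgt and src[k]["updated_at"] != tgt[k]["updated_at"]
    )
    # Still active in the target although the source item is gone.
    orphaned = sorted(
        k for k in tgt
        if k not in src and tgt[k].get("status") != "inactive"
    )
    return {"missing": missing, "stale": stale, "orphaned": orphaned}
```

Rows already marked inactive are excluded from the orphan list, which keeps the soft-delete convention from showing up as drift every night.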

Design trade-offs and what changed in architecture

What improved from v1 to v2

Onboarding became self-service instead of support-driven.
Token handling became boundary-safe and auditable.
Metadata became immediately queryable in DynamoDB.
Relational consumers received near real-time updates via streams.
Operational resilience improved with nightly reconciliation.

New complexity introduced (and accepted)

Stream processor deployment and monitoring.
VPC networking for Lambda-to-RDS connectivity.
Schema mapping ownership between NoSQL and SQL models.

These are acceptable because they buy reliability, lower manual effort, and a better consumer experience.

What this case study intentionally does not reveal

To protect privacy and commercial implementation details, this post excludes:

Real account names, tenants, domains, and identifiers
Production table names and environment values
End-to-end source code and full function implementations
Internal business workflows, SLAs, and organization-specific goals

That is not a weakness. It is a publishing discipline.

Final architecture summary

Final design in one sentence:

A serverless OAuth ingestion service writes secure secrets and normalized metadata, then projects metadata changes into Postgres through an idempotent event-driven stream processor, with scheduled reconciliation for correctness.

If you are designing a similar platform, the key patterns to remember are:

Keep secrets and metadata in separate trust boundaries.
Use event streams for freshness.
Add periodic reconciliation for confidence.
Design cost and security as first-class constraints, not post-launch patches.

Practical rollout checklist

Ship OAuth nonce validation and one-time consumption first.
Enforce token/metadata split before production traffic.
Add provider discovery with partial-failure handling (active vs skipped).
Enable streams on metadata tables.
Implement the idempotent Postgres upsert projection.
Add nightly reconciliation and drift metrics.
Lock down IAM and log redaction rules.
Track cost and lag dashboards from day one.

Closing note

Architecture maturity usually arrives in stages, not all at once. First, you remove manual pain. Then you harden security boundaries. Then you solve data movement with event-driven design. If you do those steps intentionally, you can stay both secure and cost-effective while your system grows.

Resources

OAuth 2.0 Authorization Framework (RFC 6749): datatracker
OAuth 2.0 Threat Model and Security Considerations (RFC 6819): datatracker
AWS Lambda Developer Guide: docs.aws.amazon.com/lambda
Amazon API Gateway Developer Guide: docs.aws.amazon.com/apigateway
Amazon DynamoDB Developer Guide: docs.aws.amazon.com/amazondynamodb
DynamoDB Streams and Lambda event source mappings: docs.aws.amazon.com/lambda
AWS Systems Manager Parameter Store (SecureString): docs.aws.amazon.com/systems-manager
AWS Well-Architected Framework (Security and Cost pillars): docs.aws.amazon.com/wellarchitected
Amazon RDS for PostgreSQL User Guide: docs.aws.amazon.com/AmazonRDS
PostgreSQL INSERT ... ON CONFLICT (UPSERT): postgresql/sql-insert
Python requests documentation: requests.readthedocs
