DEV Community: Heartlin Machado

How I Built a RAG System Over more than 100 USCIS Administrative Appeals Office Decisions with Gemini

Heartlin Machado — Sun, 28 Jun 2026 17:28:11 +0000

USCIS denial rates for EB-1A petitions nearly doubled in one year - from 25.6% to 46.6%. NIW denial rates hit 64.3%. Immigration attorneys charge $5,000 to $15,000 for case preparation that most applicants can't afford.

I'm building PetitionIQ, an immigration case preparation platform that analyzes visa petitions the way USCIS actually reviews them. The core of the platform is a RAG pipeline over 107 real USCIS Administrative Appeals Office (AAO) non-precedent decisions - not generic legal knowledge, not LLM training data, but actual adjudication outcomes with full provenance.

This post walks through every design decision in the RAG system: why the corpus is biased and how I handle it, why category isolation matters more than you'd think, and how hybrid retrieval with hard filters prevents the kind of cross-contamination that makes legal AI dangerous.

Why AAO decisions?

The Administrative Appeals Office publishes non-precedent decisions on uscis.gov. These are real adjudication outcomes - cases where someone filed an I-140 petition, got denied, and appealed. The AAO either sustained the appeal (overturned the denial), dismissed it (upheld the denial), or remanded it (sent it back for further review).

This corpus is valuable because it shows exactly how USCIS evaluates evidence for each criterion. Not what the law says in the abstract, but how officers actually apply it to real cases. When the AAO writes "the petitioner's three publications in field-specific journals, while commendable, do not establish that the beneficiary's work constitutes original contributions of major significance," that's a data point no amount of LLM training captures.

But the corpus has a fundamental problem.

The corpus bias problem

AAO decisions are appeals of denials. Clean approvals never appear in this dataset. If someone filed an EB-1A petition and got approved, there's no AAO record of it.

This means the corpus is selection-biased toward rejection. If I built a system that naively learned from this data, it would conclude that almost nothing gets approved - because it only sees the cases that didn't.

Design decision: PetitionIQ never outputs approval probabilities.

No "you have a 73% chance of approval." No "based on similar cases, your likelihood is high." The system uses strength indicators (strong, moderate, weak) and cites specific AAO decisions to explain why evidence does or doesn't meet a particular criterion. Every response includes a corpus bias disclosure explaining that the AAO corpus only contains appeals of denials.

This is not a limitation I'm hiding. It's a design constraint I'm highlighting. The honest thing to do with biased data is to be transparent about the bias, not to paper over it with false confidence.

The crawl pipeline

The AAO publishes decisions as PDFs on uscis.gov, organized by category and year. The crawler is a polite, rate-limited scraper that:

Discovers PDFs via directory listings on the USCIS website
Falls back to candidate URL probing when directory listings aren't available (AAO filenames follow predictable patterns like JAN162026_01B2203.pdf)
Downloads each PDF with a 2-second rate limit between requests
Extracts text using pdfplumber
Maintains an idempotent manifest so re-runs don't re-download

The current corpus: 107 decisions across 4 visa categories (EB-1A: 44, EB-2 NIW: 54, EB-1B: 4, O-1A: 5), totaling 262,778 words.

import time

# Polite rate limiting
class RateLimiter:
    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self._last_request = 0.0

    def wait(self):
        # Sleep only for the remainder of the interval since the last request
        elapsed = time.time() - self._last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_request = time.time()
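
The idempotent manifest from the list above could be as simple as a JSON file keyed by URL. The sketch below is an assumption about its shape, not the project's actual code: the Manifest class, its fields, and the fetch callable are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


class Manifest:
    """Tracks which URLs have already been downloaded (hypothetical layout)."""

    def __init__(self, path):
        self.path = Path(path)
        # Load previous state if a manifest already exists on disk.
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    def seen(self, url):
        return url in self.entries

    def record(self, url, content):
        # Store a content hash so a later run could detect a changed PDF.
        self.entries[url] = {
            "sha256": hashlib.sha256(content).hexdigest(),
            "bytes": len(content),
        }
        self.path.write_text(json.dumps(self.entries, indent=2, sort_keys=True))


def crawl(urls, manifest, fetch):
    """Download only URLs not yet in the manifest; return how many were fetched."""
    fetched = 0
    for url in urls:
        if manifest.seen(url):
            continue  # idempotent: re-runs skip completed downloads
        manifest.record(url, fetch(url))
        fetched += 1
    return fetched
```

Writing the manifest after every download costs a little I/O, but it means an interrupted crawl resumes where it stopped.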

Gemini structured extraction

Raw AAO decision text is messy. Different officers write differently, formatting varies, and the same criterion can be discussed across multiple sections of a decision. I use Gemini 2.5 Flash to extract structured data from each decision:

Category (EB-1A, EB-1B, EB-2 NIW, O-1A)
Outcome (sustained, dismissed, remanded)
Criteria findings - which criteria were claimed, which were met, what the AAO's reasoning was for each
Field of endeavor - what field the petitioner worked in
Confidence score - how confident the extraction is

The extraction uses JSON response mode with a strict Pydantic schema. Decisions that fail validation (usually because Gemini returned null for a required boolean field) get quarantined rather than included with bad data.

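from google.genai import types

# `client` is a google-genai Client and `prompt` holds the extraction
# instructions plus the decision text; both are created elsewhere.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt,
    config=types.GenerateContentConfig(
        temperature=0.1,  # Low temperature for factual extraction
        response_mime_type="application/json",
    ),
)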

Out of 107 decisions, 93 extracted successfully and 14 were quarantined. The quarantined decisions were predominantly NIW cases where the Dhanasar prong analysis didn't map cleanly to the schema. I'd rather lose 13% of the corpus than include bad extractions.
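
The validate-or-quarantine step can be sketched roughly as follows. To keep the example dependency-free it uses plain standard-library checks in place of the Pydantic schema, and the field names (category, outcome, criterion_met) are a hypothetical subset of the real schema.

```python
import json

REQUIRED_FIELDS = {           # hypothetical subset of the extraction schema
    "category": str,
    "outcome": str,
    "criterion_met": bool,
}
ALLOWED_OUTCOMES = {"sustained", "dismissed", "remanded"}


def validate(raw_json):
    """Return (record, None) on success or (None, reason) on failure."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value is not an object"
    for name, expected in REQUIRED_FIELDS.items():
        # A null for a required boolean fails here, as described above.
        if not isinstance(data.get(name), expected):
            return None, f"field {name!r} missing or not {expected.__name__}"
    if data["outcome"] not in ALLOWED_OUTCOMES:
        return None, f"unknown outcome {data['outcome']!r}"
    return data, None


def partition(responses):
    """Split raw model responses into accepted records and quarantined ones."""
    accepted, quarantined = [], []
    for decision_id, raw in responses.items():
        record, reason = validate(raw)
        if record is None:
            quarantined.append((decision_id, reason))
        else:
            accepted.append((decision_id, record))
    return accepted, quarantined
```

Keeping the rejection reason next to each quarantined decision makes it easy to see later whether a schema change would recover them.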

Why category isolation matters

This is the design decision that most legal AI tools get wrong.

O-1A (extraordinary ability in the sciences, education, business, or athletics) and EB-1A (extraordinary ability for a green card) share almost identical criteria text. Both reference "awards," "published material," "original contributions," etc. But they apply different legal standards. O-1A uses a "distinction" standard. EB-1A uses a higher "sustained national or international acclaim" standard. The same evidence that satisfies O-1A may not satisfy EB-1A.

If your retrieval system returns EB-1A reasoning when a user asks about O-1A, the analysis is wrong even though the text looks relevant. The criteria names match, the evidence types match, but the legal standard is different.

Design decision: category is a hard filter, not a soft signal.

When a user requests analysis for O-1A, the retrieval system only returns chunks tagged as O-1A. Zero EB-1A chunks leak through, regardless of semantic similarity.

def retrieve(query, category, top_k=10, store=None):
    if store is None:
        raise ValueError("retrieve() needs a chunk store")
    # Hard filter: only chunks matching the requested category
    category_chunks = [
        c for c in store.chunks
        if c.category == category
    ]
    # Semantic search only within filtered set
    results = semantic_search(query, category_chunks, top_k)
    return results

I wrote an eval test that specifically checks for cross-category leakage. It queries for O-1A criteria and verifies that zero EB-1A chunks appear in the results. This test runs on every build.

[PASS] category_leakage - Zero cross-category contamination
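
A leakage check along these lines can be sketched in a few lines. This is an illustration rather than the project's test: the trimmed-down Chunk, the word-overlap scorer, and check_category_leakage are hypothetical stand-ins for the real store and hybrid search.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    id: str
    text: str
    category: str


def retrieve(query, category, chunks, top_k=10):
    # Hard filter first, then rank only the survivors.
    candidates = [c for c in chunks if c.category == category]
    words = set(query.lower().split())
    # Toy relevance score (shared words); a stand-in for the real hybrid search.
    ranked = sorted(
        candidates,
        key=lambda c: len(words & set(c.text.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def check_category_leakage(chunks, query, category):
    """Return the IDs of any results from a different category (should be empty)."""
    return [c.id for c in retrieve(query, category, chunks) if c.category != category]
```

The useful fixture is one where an off-category chunk would outscore every in-category chunk on similarity alone, so the test fails loudly if the filter is ever moved after ranking.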

Chunking by criterion, not by token window

Most RAG tutorials chunk by fixed token windows: 500 tokens with a 100-token overlap. That is a poor fit for legal documents.

AAO decisions are structured around criteria. An officer evaluates the "Awards" criterion in one section, the "Original Contributions" criterion in another. Cutting a chunk in the middle of a criterion analysis breaks the reasoning unit.

PetitionIQ chunks by criterion section. Each chunk represents one complete piece of legal reasoning about one criterion from one decision. The chunks carry full metadata:

from dataclasses import dataclass

@dataclass
class Chunk:
    id: str              # unique chunk ID
    text: str            # the reasoning text
    category: str        # EB1A, EB1B, EB2_NIW, O1A
    corpus: str          # "case" or "authority"
    decision_id: str     # source AAO decision
    criterion_id: str    # regulatory citation
    criterion_name: str  # controlled vocabulary name
    outcome: str         # sustained/dismissed/remanded
    field_of_endeavor: str
    source_ref: str      # citation reference
The current index has 361 chunks (327 case chunks + 34 authority chunks from regulatory text).

Hybrid retrieval: cosine + TF-IDF + RRF

Pure semantic search misses important legal terminology. When a user asks about "Kazarian two-step analysis," semantic similarity might rank a chunk about "evaluation framework" higher than one that literally mentions Kazarian. Pure keyword search misses semantic meaning. A question about "impact of research on the field" should match chunks about "original contributions of major significance" even though the exact words don't overlap.

PetitionIQ uses hybrid retrieval:

Cosine similarity over gemini-embedding-001 embeddings (3072 dimensions) for semantic matching
TF-IDF for keyword matching with term weighting
Reciprocal Rank Fusion (RRF) to combine the two ranked lists into a single result

def reciprocal_rank_fusion(ranked_lists, k=60):
    scores = {}
    for ranked_list in ranked_lists:
        for rank, (chunk_id, _) in enumerate(ranked_list):
            if chunk_id not in scores:
                scores[chunk_id] = 0.0
            scores[chunk_id] += 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

RRF is simple and it works. It doesn't require tuning weights between semantic and keyword scores, and it's robust to score distribution differences between the two methods.
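
A small worked example shows why the raw scores never need to be comparable. The two ranked lists below are made up for illustration; only the positions matter to the fusion.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    scores = {}
    for ranked_list in ranked_lists:
        for rank, (chunk_id, _) in enumerate(ranked_list):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


# Hypothetical ranked lists: (chunk_id, raw_score), best first.
semantic = [("a", 0.91), ("b", 0.88), ("c", 0.75)]   # cosine similarities
keyword = [("b", 5.2), ("d", 3.1), ("a", 1.4)]       # TF-IDF scores

fused = reciprocal_rank_fusion([semantic, keyword])
fused_ids = [chunk_id for chunk_id, _ in fused]
print(fused_ids)  # ['b', 'a', 'd', 'c']
```

Chunk b wins with 1/62 + 1/61 because it sits near the top of both lists, ahead of a (1/61 + 1/63), even though the cosine and TF-IDF scores live on completely different scales.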

The authority corpus

In addition to case chunks, the retrieval system includes an authority corpus: 34 chunks of regulatory text, USCIS Policy Manual excerpts, and key precedent decision summaries (Kazarian v. USCIS, Dhanasar, Chawathe). These provide the legal framework that case chunks are interpreted against.

Authority chunks are always included in retrieval results alongside case chunks. The generator uses both to produce grounded analysis: "Under the Kazarian two-step framework [authority], the AAO in [decision_id] found that..."

Generation with citations

Every claim in the generated analysis cites a specific source. Not "based on AAO precedent" but "[AAO-JAN162026_01B2203]" with a clickable link to the original PDF on uscis.gov.

The generation prompt is strict about this:

Every factual claim must reference a retrieved chunk
No approval probabilities
Corpus bias disclosure on every response
If the evidence is insufficient for a conclusion, say so

The eval suite

Four tests run on every build:

category_leakage - Query O-1A, verify zero EB-1A chunks in results
probability_leak - Generate a response and verify no approval probability language appears
probability_pattern_validation - Test that the pattern detector catches probability language when it exists
retrieval_recall - Verify that relevant chunks are actually retrieved for known queries

[PASS] category_leakage     - Zero cross-category contamination
[PASS] probability_leak     - No approval odds language detected
[PASS] probability_patterns - Banned patterns correctly caught
[PASS] retrieval_recall     - Relevant chunks retrieved for all queries

All four passing on the current 361-chunk index.
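
The two probability tests rest on a pattern detector. A minimal sketch is below; the regular expressions are my own illustrative guesses at what such a banned list might contain, not the actual patterns in the codebase.

```python
import re

# Hypothetical banned patterns; the real list is not shown in this post.
BANNED_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d+)?\s*%\s*(?:chance|likelihood|probability|odds)\b", re.I),
    re.compile(r"\b(?:chance|likelihood|probability|odds)\s+of\s+(?:approval|success)\b", re.I),
    re.compile(r"\b(?:chance|chances|likelihood|probability|odds)\s+(?:is|are)\s+"
               r"(?:high|low|good|strong|slim)\b", re.I),
]


def find_probability_language(text):
    """Return every banned phrase found in a generated response."""
    hits = []
    for pattern in BANNED_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits
```

probability_leak would assert that this returns an empty list for a generated response, while probability_pattern_validation feeds it known-bad sentences to confirm the detector itself still fires.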

What this enables

The RAG system powers PetitionIQ's deep analysis feature. When a user runs a deep analysis for their visa category, the system:

Embeds the query with gemini-embedding-001
Retrieves the top chunks via hybrid search (hard-filtered to the user's category)
Passes retrieved chunks + authority corpus to Gemini 2.5 Flash
Generates per-criterion analysis with AAO decision citations
Includes corpus bias disclosure

The entire pipeline runs on Google Cloud: Vertex AI for Gemini calls and embeddings, Cloud Run for the FastAPI backend, Firestore for persistence.

What I learned

Bias transparency beats bias mitigation. I spent time trying to "correct" for the selection bias in the AAO corpus before realizing the honest approach is to just tell the user about it. Every response says "this analysis is based on AAO appeal decisions, which only include cases that were denied and appealed. Approval patterns are not represented."

Hard filters beat soft signals for safety-critical retrieval. In legal analysis, returning the wrong category's reasoning isn't a "less relevant" result - it's an actively misleading one. Hard category filtering with eval tests is the only approach I trust.

Chunk by reasoning unit, not by token count. Legal reasoning has natural boundaries. Respect them.

Start with the eval suite. I wrote the four eval tests before building the retrieval system. They defined the contract the system had to satisfy. Every design decision was tested against them.

PetitionIQ is live at petitioniq.io. Free multi-visa analysis across 5 categories. The RAG-powered deep analysis, pre-submit consistency audit, document generation, and RFE response module are available with paid plans.

Built entirely on Gemini 2.5 Flash + gemini-embedding-001 + Google Cloud Run + Firestore for the Build with Gemini XPRIZE.

The full codebase is at github.com/4KInc/petitioniq.

How to Generate Cryptographic Proof of AI Agent Authorization (EU AI Act Article 14)

Heartlin Machado — Sat, 27 Jun 2026 04:16:58 +0000

EU AI Act Article 14 enforcement starts August 2, 2026. If you're building AI agents that access sensitive data, process customer information, or make autonomous decisions, you need to demonstrate human oversight with verifiable artifacts.

Not logs. Not observability traces. Cryptographic proof.

In this post, I'll show you how we built Verigate - a cryptographic trust infrastructure for AI agents - and how you can use it to generate tamper-evident authorization receipts that any auditor can verify offline.

This content was created for the Build with Gemini XPRIZE.

The Problem

Every AI agent platform today - LangChain, CrewAI, Google ADK, Zapier AI - lets agents take actions. But none of them produce independently verifiable proof that the action was authorized according to policy.

When your agent:

Reads customer PII from a database
Sends an email on behalf of a user
Processes a refund
Modifies a CRM record

...what evidence exists that this action was authorized? A database log? That can be modified. An observability trace? That's vendor-dependent. A timestamp? That proves when, not whether.

Article 14 of the EU AI Act requires deployers to demonstrate five capabilities:

Understand the AI system's capabilities and limitations
Monitor the system's operation
Interpret outputs correctly
Override or stop the system
Record what happened and prove it

That fifth requirement is where most teams fail. You need artifacts that are:

Independently verifiable - without trusting the system that produced them
Tamper-evident - modification is detectable
Immutable - can't be retroactively altered

The Architecture: Ed25519 + SHA-256 + Merkle + Base L2

Here's how Verigate solves this:

Step 1: Every Agent Action Gets a Signed Receipt

When an agent requests authorization, the gateway evaluates policy rules (allowlist, resource scope, rate limit) and produces an Ed25519-signed receipt:

{
  "body": {
    "v": "1",
    "seq": "42",
    "ts": "2026-06-26T10:30:00Z",
    "request_digest": "sha256:0e6d5b86f01f...",
    "policy_version": "sha256:d59a1e4171e6...",
    "decision": "approve",
    "reasons": [],
    "prev_receipt": "sha256:b3f51c8824bc..."
  },
  "sig": {
    "alg": "EdDSA",
    "kid": "gateway-prod-a1b2c3d4",
    "value": "7WiFneT3tLRtE2Iztm..."
  },
  "receipt_hash": "sha256:2a3e65a3ade468..."
}

Key properties:

Ed25519 (EdDSA) signature - no HS256, no symmetric keys, no shared secrets
SHA-256 action digest - the intent (agent_id + action + resource) is canonicalized via RFC 8785 JCS before hashing
Policy version hash - proves which policy was active when the decision was made
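
To make the digest step concrete, here is a minimal sketch of canonicalize-then-hash. It only approximates RFC 8785 (sorted keys, no whitespace, UTF-8) and is safe only for objects containing strings; the function names and the exact intent fields are assumptions, not the gateway's wire format.

```python
import hashlib
import json


def canonicalize(obj):
    """Approximate RFC 8785 JCS for objects holding only strings.

    Sorted keys, no insignificant whitespace, UTF-8 output. Full JCS also
    specifies number serialization, which this sketch does not handle.
    """
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def request_digest(agent_id, action, resource):
    # Field names mirror the intent tuple described above; the exact
    # wire format is an assumption.
    intent = {"agent_id": agent_id, "action": action, "resource": resource}
    return "sha256:" + hashlib.sha256(canonicalize(intent)).hexdigest()
```

Canonicalizing first is what lets two parties who serialize the same intent with different key order or spacing still arrive at the same digest.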

Step 2: Receipts Are Hash-Chained

Each receipt's prev_receipt field contains the SHA-256 hash of the previous receipt. This creates a tamper-evident chain:

Receipt #1 (genesis) → prev: sha256:0000...0000
Receipt #2 → prev: sha256(Receipt #1)
Receipt #3 → prev: sha256(Receipt #2)
...

Modify any receipt in the chain, and every subsequent prev_receipt hash becomes invalid. Insert or delete a receipt, and the sequence numbers break.
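
Chain verification itself is short. The sketch below assumes the receipt hash covers the canonical JSON of the receipt body and that the genesis value is 64 zero hex digits; both are assumptions for illustration, and signature checking is left out entirely.

```python
import hashlib
import json

GENESIS = "sha256:" + "0" * 64  # assumed genesis marker


def receipt_hash(body):
    # Assumption: the hash covers the canonical JSON of the receipt body.
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


def append(chain, decision):
    """Append a new receipt body linked to the previous one."""
    prev = receipt_hash(chain[-1]) if chain else GENESIS
    chain.append({"seq": str(len(chain) + 1), "decision": decision, "prev_receipt": prev})


def verify_chain(chain):
    """Return the index of the first broken link, or None if the chain is intact."""
    expected_prev = GENESIS
    for i, body in enumerate(chain):
        if body["prev_receipt"] != expected_prev or body["seq"] != str(i + 1):
            return i
        expected_prev = receipt_hash(body)
    return None
```

Editing receipt #1 is reported at index 1, because receipt #2 is the first place where the stored prev_receipt no longer matches the recomputed hash.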

Step 3: Merkle Tree with Inclusion Proofs

Receipt hashes are organized into a Merkle tree using domain-separated hashing:

Leaf:  SHA256("BI_RECEIPT_LEAF_V1" || 0x00 || receipt_hash)
Node:  SHA256("BI_RECEIPT_NODE_V1" || 0x00 || left || right)

This lets you prove a specific receipt is included in a batch without downloading all receipts. The /v1/engine/merkle/proof endpoint returns the sibling hashes and directions.
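
Here is a rough sketch of how such a tree and its inclusion proofs fit together, using the two domain tags above. Treating receipt_hash as raw bytes and promoting an unpaired node unchanged are my assumptions; the real engine may handle odd levels differently.

```python
import hashlib

LEAF_TAG = b"BI_RECEIPT_LEAF_V1" + b"\x00"
NODE_TAG = b"BI_RECEIPT_NODE_V1" + b"\x00"


def leaf_hash(receipt_hash):
    return hashlib.sha256(LEAF_TAG + receipt_hash).digest()


def node_hash(left, right):
    return hashlib.sha256(NODE_TAG + left + right).digest()


def build_levels(receipt_hashes):
    """Return all tree levels, leaves first. Assumes at least one receipt."""
    level = [leaf_hash(h) for h in receipt_hashes]
    levels = [level]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # assumption: unpaired node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Collect (sibling, side) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], "left" if sibling < index else "right"))
        index //= 2
    return proof


def verify_inclusion(receipt_hash, proof, root):
    current = leaf_hash(receipt_hash)
    for sibling, side in proof:
        current = node_hash(sibling, current) if side == "left" else node_hash(current, sibling)
    return current == root
```

The separate leaf and node tags prevent an interior node from being passed off as a leaf, which is the point of domain separation.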

Step 4: On-Chain Anchoring (Base L2)

For regulated industries, the Merkle root can be anchored on Base mainnet (chain ID 8453) as transaction calldata:

Anchor TX → burn address (0x000...000)
Value: 0
Calldata: 32-byte Merkle root

This creates an immutable timestamp proving the receipt chain existed at a specific block height. Verifiable on BaseScan by anyone, forever.

Zero LLM in the Authorization Path

Here's what makes this architecture unique: the authorization decision is fully deterministic. No AI model can influence whether an action is allowed or denied. The policy engine evaluates three rule types:

Allowlist - is this action in the permitted set?
Resource Scope - is this resource in the permitted scope?
Rate Limit - has this agent exceeded its quota?

All three must pass. Any failure → deny.
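
As a sketch of what "fully deterministic" means here, the three rules can be evaluated with nothing but set membership, prefix matching, and a counter. The Policy class, its field names, and the reason strings are hypothetical; the post does not show the real policy format.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Policy:
    # All field names are hypothetical; the post does not show the policy format.
    allowed_actions: set
    resource_prefixes: tuple
    max_requests: int
    window_seconds: float = 60.0
    _history: dict = field(default_factory=dict)

    def evaluate(self, agent_id, action, resource, now=None):
        """Return (decision, reasons); every rule must pass to approve."""
        now = time.monotonic() if now is None else now
        reasons = []
        if action not in self.allowed_actions:
            reasons.append("action_not_allowlisted")
        if not resource.startswith(self.resource_prefixes):
            reasons.append("resource_out_of_scope")
        # Sliding-window rate limit: keep only timestamps inside the window.
        recent = [t for t in self._history.get(agent_id, []) if now - t < self.window_seconds]
        if len(recent) >= self.max_requests:
            reasons.append("rate_limit_exceeded")
        else:
            # Design choice in this sketch: denied requests still count toward the quota.
            recent.append(now)
        self._history[agent_id] = recent
        return ("deny" if reasons else "approve"), reasons
```

Given the same policy, history, and clock value, the same request always yields the same decision, which is what makes a receipt reproducible by an auditor.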

Gemini (via Vertex AI) powers six AI agents that sit outside the authorization path:

Auditor - analyzes receipt chains against OWASP and NIST frameworks
Recommender - proposes policy changes from CONFLICT patterns
Investigator - synthesizes incident reports
Coordinator - manages A2A agent discovery
Isolator - quarantines agents on HIGH/CRITICAL incidents
Onboarding/Support/Marketing - business operations

The security boundary is explicit: AI advises, the gateway decides.

Try It: 5-Minute Integration

Install and authorize your first action:

from sdk import Verigate

# Provision a tenant (or use an existing API key)
vg = Verigate(api_key="as_...")

# Register your agent
vg.register_agent("my-bot", name="My Bot", capabilities=["read", "query"])

# Authorize an action
result = vg.authorize("my-bot", action="read", resource="/data/users")
print(f"Decision: {result.decision}")
print(f"Receipt: {result.receipt_hash}")

# Verify the chain
chain = vg.verify_chain()
print(f"Chain valid: {chain['valid']}")

Generate a compliance report:

report = vg.generate_compliance_report(
    agent_name="my-bot",
    agent_description="Reads customer profiles from staging database",
    capabilities=["read", "query"],
    data_types=["PII", "customer_records"],
    frameworks=["EU AI Act", "HIPAA", "SOC 2"],
)
print(f"Findings: {len(report.findings)}")
# Download PDF: GET /v1/compliance/report/{report.report_id}/pdf

Or use the MCP server with Claude Desktop:

{
  "mcpServers": {
    "verigate": {
      "command": "python",
      "args": ["/path/to/mcp_server.py"],
      "env": { "VERIGATE_API_KEY": "as_..." }
    }
  }
}

56 tools available - authorize, verify, register agents/resources/actions, generate compliance reports, chat with the multi-agent system.

Free Quick-Scan

Not ready to commit? Try the free compliance quick-scan: describe your agent and get 3 EU AI Act findings in 30 seconds. No signup required.

Full report with all 6 frameworks (EU AI Act, HIPAA, SOC 2, DORA, NIST AI RMF, OWASP LLM Top 10): $299 one-time.

#BuildWithGemini #GeminiXPRIZE

Sharding Hot Partitions in DynamoDB: Why Your Single-Partition Log Table Will Break at Scale

Heartlin Machado — Thu, 25 Jun 2026 04:53:45 +0000

This post was created for the H0: Hack the Zero Stack hackathon. #H0Hackathon

I shipped a DynamoDB table with a hot partition and didn't notice for three weeks. At demo scale (700 items, a few writes per minute) everything worked. It would have been fine right up until it wasn't.

The anti-pattern was obvious in hindsight: every AI operation log entry was written to PK: "OPS_LOG". A single partition key for an append-only, high-throughput write stream. This is the exact workload that hits DynamoDB's per-partition throughput ceiling.

Here's what I found, why it matters, and the three patterns I used to fix it, all without a table migration.

The problem: per-partition throughput limits

DynamoDB scales horizontally by splitting data across partitions. Each partition handles:

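3,000 RCU (read capacity units) per second: 3,000 strongly consistent or 6,000 eventually consistent reads of items up to 4 KB
1,000 WCU (write capacity units) per second: 1,000 writes of items up to 1 KB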
10 GB of data

When you use PAY_PER_REQUEST (on-demand) billing mode, DynamoDB auto-scales table-level capacity. But it doesn't auto-scale within a partition. If all your writes hit the same partition key, you're bottlenecked at 1,000 WCU on that one partition regardless of your table-level throughput.

A note on adaptive capacity: DynamoDB does have an adaptive capacity feature that can temporarily boost a hot partition's throughput by borrowing from underutilized partitions. But adaptive capacity is a safety net, not a design strategy. It activates reactively, has limits, and doesn't eliminate the per-partition ceiling. Designing around the constraint is always better than relying on the database to compensate for a bad access pattern.

For PK: "OPS_LOG", with every single AI operation landing on one partition key, this means:

At 1 write/second: no problem
At 100 writes/second: no problem
At 1,001 writes/second: throttled. ProvisionedThroughputExceededException.

A real anti-counterfeiting platform processing scans across thousands of brands could easily hit this. And the failure mode is silent at first: the AWS SDK retries throttled requests automatically with exponential backoff. You only see it as increased latency, then as dropped writes once the retries are exhausted.

Where I found hot partitions

I audited every PK pattern in my single-table design and found three hot spots. The simplest detection method: count the cardinality of each PK pattern. If a PK has cardinality of 1 (every write goes to the same key), it's a hot partition by definition.

1. OPS_LOG (AI operations telemetry)

// BEFORE: Every AI call writes to the same PK
{
  PK: "OPS_LOG",
  SK: "2026-06-22T01:00:00Z#threat_detector",
  agent: "threat_detector",
  latencyMs: 340,
  aiSeverity: "HIGH",
  ...
}

Problem: Unbounded write concentration. Every AI classification, regardless of brand, product, or time, lands on one partition key. PK cardinality: 1.

2. THREAT#brandId (threat alerts)

// BEFORE: All threats for one brand in one partition
{
  PK: "THREAT#brand-abc-123",
  SK: "ALERT#2026-06-22T01:00:00Z#geographic_anomaly",
  severity: "HIGH",
  ...
}

Problem: A brand under active counterfeiting attack generates hundreds of alerts per day. All writes concentrate on THREAT#brand-abc-123. The brand being attacked the hardest gets the worst write performance. Exactly backwards from what you want.

3. BRAND_INDEX / PRODUCT_INDEX (collection keys)

// Collection key for "list all brands" without Scan
{
  PK: "BRAND_INDEX",
  SK: "BRAND#2026-06-22T01:00:00Z#abc",
  name: "Luxe Watches",
  ...
}

Problem: If brand registrations spike (product launch, marketing campaign), all writes hit BRAND_INDEX. Same issue as OPS_LOG. PK cardinality: 1.

The fix: three sharding patterns

Pattern 1: Time-bucketed sharding (for OPS_LOG)

Instead of a single OPS_LOG key, bucket writes by date:

// AFTER: Daily-bucketed partition keys
const dateBucket = timestamp.slice(0, 10); // "2026-06-22"
{
  PK: `OPS_LOG#${dateBucket}`,       // OPS_LOG#2026-06-22
  SK: `${timestamp}#${agent}`,
  GSI1PK: "OPS_LOG",                  // For cross-day queries
  GSI1SK: timestamp,
  ...
}

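Write path: Each day's ops entries go to a different partition key. Today's 1,000 writes go to OPS_LOG#2026-06-22. Tomorrow's go to OPS_LOG#2026-06-23. No single key accumulates the table's entire history, so the data under each key stays bounded to one day and PK cardinality goes from 1 to 365/year. One caveat: the 1,000 WCU ceiling is a per-second limit, and at any given moment all writes still target the current day's key, so sustained bursts above that rate would additionally need a write-shard suffix like the random sharding shown under Pattern 3.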

Read path: The dashboard needs recent ops entries across days. Two options:

Option A: Scatter-gather

// Query each daily partition in parallel
const days = 7;
const limit = 50; // max entries returned to the dashboard
const buckets = [];
for (let i = 0; i < days; i++) {
  const d = new Date(Date.now() - i * 86400000);
  buckets.push(d.toISOString().slice(0, 10));
}

const results = await Promise.all(
  buckets.map(date =>
    queryItems(`OPS_LOG#${date}`, undefined, { limit, scanForward: false })
  )
);

// Merge and sort
const logs = results.flat()
  .sort((a, b) => b.timestamp.localeCompare(a.timestamp))
  .slice(0, limit);

7 parallel queries, each hitting a different partition. DynamoDB handles them concurrently. Total latency is the slowest single query, typically under 20ms.
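
For readers who think in Python, the same scatter-gather idea can be sketched with a thread pool and a k-way merge. The query_partition callable is a hypothetical stand-in for a real DynamoDB Query on one partition key; nothing here talks to AWS.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from itertools import islice


def daily_buckets(today, days):
    """Partition keys for the last `days` days, newest first."""
    return [f"OPS_LOG#{(today - timedelta(days=i)).isoformat()}" for i in range(days)]


def scatter_gather(query_partition, today, days=7, limit=50):
    """Query each daily partition in parallel and merge newest-first.

    `query_partition(pk, limit)` is a hypothetical callable that returns
    items for one partition key already sorted by timestamp descending.
    """
    buckets = daily_buckets(today, days)
    with ThreadPoolExecutor(max_workers=days) as pool:
        per_bucket = list(pool.map(lambda pk: query_partition(pk, limit), buckets))
    # Each list is already sorted, so a k-way merge avoids a full re-sort.
    merged = heapq.merge(*per_bucket, key=lambda item: item["timestamp"], reverse=True)
    return list(islice(merged, limit))
```

Because every per-bucket result arrives pre-sorted, the merge only has to walk as far as the requested limit instead of sorting everything that came back.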

Option B: GSI1 query

// Single query across all days via GSI
const logs = await queryGSI1("OPS_LOG", undefined, { limit: 50, scanForward: false });

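In GSI1, every ops-log item shares GSI1PK: "OPS_LOG", regardless of its daily base-table partition. This re-concentrates reads on one GSI partition key, but reads have more headroom than writes (3,000 RCU vs 1,000 WCU limit), and the dashboard is low-frequency. The trade-off to watch is on the write side: each base-table write is also propagated to the index, so that single GSI key absorbs the full write stream, and a throttled GSI can push back on base-table writes.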

I use scatter-gather as the primary path and GSI1 as a fallback.

Pattern 2: Month-bucketed sharding (for THREAT)

Threats are read by brand, so the bucket needs to include the brand ID:

// AFTER: Monthly-bucketed by brand
const monthBucket = timestamp.slice(0, 7); // "2026-06"
{
  PK: `THREAT#${brandId}#${monthBucket}`,   // THREAT#abc#2026-06
  SK: `ALERT#${timestamp}#${type}`,
  GSI1PK: `BRAND#${brandId}`,               // Cross-month queries
  GSI1SK: `THREAT#${timestamp}`,
  ...
}

Why monthly, not daily? Threats are lower volume than ops logs. A busy brand might get 10-50 threats per day. Monthly bucketing is sufficient to prevent hot-spotting while keeping the scatter-gather read path manageable (query last 3 months = 3 parallel queries vs 90 for daily).

Read path: GSI1 query on BRAND#brandId with SK prefix THREAT# returns threats across all monthly buckets, sorted by timestamp, no scatter-gather needed:

const threats = await queryGSI1(`BRAND#${brandId}`, "THREAT#", {
  limit: 50,
  scanForward: false,
});

This is the ideal pattern: shard writes on the base table, unify reads on a GSI.

Pattern 3: Accept the trade-off (for collection keys)

BRAND_INDEX and PRODUCT_INDEX are also single-partition keys. But brand and product registration is low-throughput: maybe 50 per day during a hackathon, maybe 500 per day in production. The 1,000 WCU per-partition limit won't be hit.

The decision: Don't shard collection keys. The engineering cost of scatter-gather reads on "list all brands" isn't justified when registration throughput will never approach the partition limit.

If it did (say, an enterprise customer bulk-importing 10,000 products via the batch endpoint), I'd switch to PRODUCT_INDEX#<shard> with N-way random sharding:

const shard = Math.floor(Math.random() * 10);
{
  PK: `PRODUCT_INDEX#${shard}`,  // Random distribution across 10 partitions
  SK: `PRODUCT#${timestamp}#${id}`,
  ...
}

Read path: scatter-gather across shards 0-9, merge, sort. But I don't need this today. YAGNI applies to partition sharding too.

How to detect hot partitions

1. Count your PK cardinality

The simplest check, no monitoring required. Read every PutItem and UpdateItem call in your codebase. For each one, ask:

Is this PK cardinality bounded or unbounded? PRODUCT#uuid = unbounded (good). OPS_LOG = bounded to 1 (bad).
Does this PK receive burst traffic? A PK that gets 1 write/hour is fine even if it's singleton. A PK that gets 1,000 writes/second needs sharding.
What's the growth rate? A PK with 100 items forever is different from a PK that grows by 1,000 items/day.
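
If you can export a sample of the partition keys your code writes (from logs or a test run), the cardinality check can be automated. The sketch below is a heuristic of my own: it collapses UUID-like and date-like segments into placeholders and counts distinct keys per pattern.

```python
import re
from collections import defaultdict

# Replace UUID-like and date-like segments with placeholders so keys
# that differ only by ID collapse into one pattern (a heuristic).
_UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
_DATE = re.compile(r"\d{4}-\d{2}(?:-\d{2})?")


def pk_pattern(pk):
    return _DATE.sub("<date>", _UUID.sub("<uuid>", pk))


def cardinality_report(observed_pks):
    """Map each PK pattern to the number of distinct keys seen for it."""
    distinct = defaultdict(set)
    for pk in observed_pks:
        distinct[pk_pattern(pk)].add(pk)
    return {pattern: len(keys) for pattern, keys in distinct.items()}


def singleton_patterns(observed_pks):
    """Patterns where every write landed on the same key."""
    return sorted(p for p, n in cardinality_report(observed_pks).items() if n == 1)
```

A pattern can also show up as a singleton simply because the sample is small, so treat the output as a list of candidates to review rather than a verdict.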

2. CloudWatch Contributor Insights

Enable Contributor Insights on the table. It shows the top-N partition keys by consumed capacity. If one PK is 80% of your write traffic, you have a hot partition even if you're not throttled yet. ThrottledRequests per table only fires after you're already impacted. Contributor Insights catches the problem before it hurts.

3. Write a risk matrix

Document every write operation with its PK. If you see the same PK in multiple write paths, that's a convergence signal:

Write Operation	PK	Risk
Register brand	`BRAND#uuid`	Low: unique per brand
Register product	`PRODUCT#uuid`	Low: unique per product
Record scan	`PRODUCT#uuid`	Medium: popular products get many scans
Write threat	`THREAT#brand#month`	Low: monthly bucketing distributes
Write ops log	`OPS_LOG#date`	Low: daily bucketing distributes
Brand index	`BRAND_INDEX`	Low: registration is low-throughput

If the Risk column says "High" for anything, shard it.

Summary: the three decisions

Hot Partition	Pattern	Bucket Size	Read Strategy
OPS_LOG	Time-bucketed	Daily	Scatter-gather (7 parallel queries)
THREAT#brand	Time-bucketed	Monthly	GSI1 query (single partition)
BRAND_INDEX	Accepted	N/A	Single partition query

The fix for OPS_LOG was 8 lines in the Lambda writer and 15 lines in the API reader. No table migration. No GSI rebuild. No downtime. The monthly THREAT bucketing was similarly surgical: change the PK format in the Lambda and switch the reader to GSI1.

That's the beauty of DynamoDB's schemaless design: you can change your partition key format mid-stream without touching existing data. New writes go to the new pattern; old data stays readable through legacy fallback queries. You don't need a migration. You need a new PutItem and a Query that checks both patterns.

The code

All changes are in a single commit: refactor: shard hot partitions, eliminate Scans, document access patterns.

Key files:

lambda/threat-detector.mjs: OPS_LOG daily bucketing and THREAT monthly bucketing
src/app/api/ops-log/route.ts: scatter-gather read across daily buckets
src/app/api/threats/route.ts: GSI1 read across monthly buckets
src/lib/dynamodb.ts: updated schema documentation

Built for the H0: Hack the Zero Stack hackathon using DynamoDB and Vercel. #H0Hackathon

DynamoDB Streams and Lambda for Real-Time Threat Detection: The Event Pipeline DynamoDB Was Built For

Heartlin Machado — Thu, 25 Jun 2026 04:47:09 +0000

This post was created for the H0: Hack the Zero Stack hackathon. #H0Hackathon

A consumer scans a product's QR code. Five seconds later, a threat alert appears on the brand's dashboard: no page refresh, no client-side polling. The entire pipeline is DynamoDB Streams firing a Lambda, writing a threat alert back to DynamoDB, and pushing it to the browser via Server-Sent Events.

This post walks through the complete pipeline I built for GenuProof, an anti-counterfeiting platform running on DynamoDB and Vercel. Every component is serverless. Cost at zero traffic: $0.

The architecture

Consumer scans QR > Vercel API Route > DynamoDB PutItem (SCAN# record)
                                              |
                                              v
                                    DynamoDB Stream (NEW_IMAGE)
                                              |
                                              v
                                    Lambda: authentik-threat-detector
                                      |-- Geographic anomaly check
                                      |-- Burst scan detection
                                      |-- Claim violation check
                                      |-- Hash tampering check
                                              |
                                         (if anomaly)
                                              |
                                              v
                                    DynamoDB PutItem (THREAT# alert)
                                              |
                                              v
                                    SSE endpoint polls THREAT#
                                              |
                                              v
                                    Brand dashboard updates (no refresh)

Step 1: The scan write triggers the Stream

When a consumer verifies a product, the Vercel API route writes a scan record:

// One timestamp for both the sort key and the attribute, so they always agree
const now = new Date().toISOString();
const scanRecord = {
  PK: `PRODUCT#${productId}`,
  SK: `SCAN#${now}`,
  productId,
  timestamp: now,
  ip,
  country: geo.country,
  city: geo.city,
  userAgent: req.headers.get("user-agent"),
  result: hashMatch && signatureValid ? "authentic" : "suspicious",
};
await putItem(scanRecord);

This write hits DynamoDB. Because Streams is enabled with the NEW_IMAGE view type, DynamoDB emits a stream record containing the complete new item. The record lands on one of the stream's shards, which Lambda's event source mapping polls on our behalf before invoking the function with a batch of records.

Step 2: Lambda receives the stream event

The Lambda function is configured as a DynamoDB Stream trigger:

Batch size: 10 (process up to 10 records per invocation)
Batching window: 5 seconds (wait up to 5s to fill the batch)
Starting position: LATEST

export async function handler(event) {
  for (const record of event.Records) {
    if (record.eventName !== "INSERT") continue;

    const newImage = record.dynamodb.NewImage;
    const sk = newImage.SK.S;

    // Only process scan records and provenance events
    if (sk.startsWith("SCAN#")) {
      await processScan(newImage);
    } else if (sk.startsWith("EVENT#")) {
      await processEvent(newImage);
    }
  }
}

Key detail: the Lambda filters by SK prefix. Because this is a single-table design, the Stream contains writes for all entity types: brand profiles, product registrations, webhook configs. The Lambda ignores everything except SCAN# and EVENT# records. This filtering happens in application code, not at the Stream level, which means we pay for Lambda invocations on non-scan writes. At our scale, this is negligible. At very high write volume, you'd use Lambda event filtering on the event source mapping so that non-matching records never invoke the function.
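As a rough sketch of what infrastructure-level filtering could look like: Lambda event filtering takes JSON patterns that are matched against each stream record, including prefix matches on string values. The pattern below is an illustration built around the SCAN# and EVENT# prefixes from this post, not GenuProof's deployed configuration, and the variable names are my own. The handler's eventName check would still stay in place.

```javascript
// Sketch: a Lambda event-filter pattern that only admits records whose new
// image has a sort key starting with SCAN# or EVENT#. Stream records use
// DynamoDB JSON, hence the nested "S" under SK.
const pattern = {
  dynamodb: {
    NewImage: {
      // Multiple matchers in one array are OR'd together.
      SK: { S: [{ prefix: "SCAN#" }, { prefix: "EVENT#" }] },
    },
  },
};

// Each filter's Pattern is passed as a JSON string on the event source mapping.
const filterCriteria = { Filters: [{ Pattern: JSON.stringify(pattern) }] };

console.log(JSON.stringify(filterCriteria, null, 2));
```

Records that match no filter are never delivered to the function, so brand profile and webhook writes would stop generating invocations.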

Step 3: Anomaly detection (four checks)

For each scan record, the Lambda runs four sequential anomaly checks:

Geographic anomaly

// Query recent scans for this product (last 24 hours)
const recentScans = await ddb.send(new QueryCommand({
  TableName: TABLE,
  KeyConditionExpression: "PK = :pk AND SK BETWEEN :start AND :end",
  ExpressionAttributeValues: {
    ":pk": `PRODUCT#${productId}`,
    ":start": `SCAN#${twentyFourHoursAgo}`,
    ":end": `SCAN#${now}`,
  },
}));

// ddb is a DocumentClient, so Items are plain objects (no .S wrappers)
const countries = new Set(recentScans.Items.map(s => s.country).filter(Boolean));
if (countries.size >= 3) {
  // Same product scanned from 3 or more countries in 24h
  anomalyType = "geographic_anomaly";
}

This is the pattern DynamoDB was built for: a range query within a partition, sorted by timestamp. Because the sort key embeds an ISO 8601 UTC timestamp (for example SCAN#2026-06-22T01:00:00Z), lexicographic order is chronological order. The BETWEEN query returns only scans in the 24-hour window: no filter expression needed, no wasted read capacity.

An important subtlety: DynamoDB Streams delivers records in order within a shard. This ordering guarantee is what makes burst detection correct. If scans arrived out of order, we couldn't reliably count "10 scans in the last hour" because the window would be inconsistent. Streams' per-shard ordering means the Lambda always sees scans in the sequence they were written.

Burst scan detection

const oneHourAgo = new Date(Date.now() - 3600000).toISOString();
const recentHourScans = recentScans.Items.filter(
  s => s.timestamp > oneHourAgo
);
if (recentHourScans.length >= 10) {
  anomalyType = "burst_scan";
}

Ten or more scans of the same product in one hour suggests someone is testing a cloned QR code, trying different devices and locations to see if the system catches them.
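The two window checks above can also be written as pure functions over plain scan items, which makes the thresholds easy to unit-test without DynamoDB. This is a minimal sketch, not the code from the Lambda: the function names are hypothetical, and it returns every check that fired instead of a single anomalyType.

```javascript
// Sketch: both window checks as pure functions over plain scan items
// ({ country, timestamp }). Thresholds match the ones described in the post.
const HOUR_MS = 60 * 60 * 1000;

function distinctCountries(scans) {
  // Ignore scans where the geo lookup produced no country.
  return new Set(scans.map((s) => s.country).filter(Boolean)).size;
}

function scansSince(scans, nowMs, windowMs) {
  // ISO 8601 UTC strings compare lexicographically in chronological order.
  const cutoff = new Date(nowMs - windowMs).toISOString();
  return scans.filter((s) => s.timestamp > cutoff).length;
}

function detectWindowAnomalies(scans, nowMs) {
  const anomalies = [];
  if (distinctCountries(scans) >= 3) anomalies.push("geographic_anomaly");
  if (scansSince(scans, nowMs, HOUR_MS) >= 10) anomalies.push("burst_scan");
  return anomalies;
}

// Three scans, two hours apart, from three different countries.
const now = Date.parse("2026-06-22T12:00:00.000Z");
const sample = ["US", "DE", "JP"].map((country, i) => ({
  country,
  timestamp: new Date(now - (i + 1) * 2 * HOUR_MS).toISOString(),
}));
console.log(detectWindowAnomalies(sample, now)); // [ 'geographic_anomaly' ]
```

Passing the current time in as a parameter keeps the functions deterministic, so the same inputs always produce the same result on a Stream retry.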

Claim violation

const claim = await ddb.send(new GetCommand({
  TableName: TABLE,
  Key: { PK: `PRODUCT#${productId}`, SK: "CLAIM" },
}));
if (claim.Item) {
  // Product already claimed by a consumer, new scan from different device
  anomalyType = "claimed_product_scan";
}

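This is a GetItem: a single-item read by primary key that costs at most one RCU (0.5 RCU for the default eventually consistent read of an item up to 4 KB). The CLAIM record was written when the original consumer claimed the product. Any subsequent scan from a different device fingerprint is suspicious.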

Hash tampering

if (verificationResult !== "authentic") {
  anomalyType = "hash_tampering";
}

If the scan record itself shows the product failed hash verification, something is fundamentally wrong: either the database was tampered with or the product record was modified. This check carries CRITICAL severity.

Step 4: Write the threat alert

If any check fires, the Lambda writes a threat alert back to DynamoDB:

const monthBucket = timestamp.slice(0, 7); // "2026-06"
const alert = {
  PK: `THREAT#${brandId}#${monthBucket}`,
  SK: `ALERT#${timestamp}#${type}`,
  GSI1PK: `BRAND#${brandId}`,
  GSI1SK: `THREAT#${timestamp}`,
  brandId, type, severity, productId, details, timestamp,
  resolved: false,
  source: "lambda-stream",
};
await ddb.send(new PutCommand({ TableName: TABLE, Item: alert }));

Note the monthly-bucketed PK: THREAT#brandId#2026-06. If we used THREAT#brandId as a flat PK, a brand that generates thousands of threat alerts would pile every alert onto one partition key, creating a write-hot and ever-growing partition. Monthly bucketing spreads a brand's alerts across time-based partition keys and bounds how large any one of them gets; writes within a single month still share a key, so a brand with a very high alert rate would need a finer bucket. (I wrote a separate post on sharding hot partitions if you want the full breakdown.)

Writing GSI1PK: "BRAND#brandId" on every alert means we can query all threats for a brand across monthly buckets with a single GSI1 query, no scatter-gather needed on the read path.
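To make the key layout concrete, here is a small sketch that derives both the bucketed table key and the flat GSI1 key from one timestamp. The buildThreatAlert helper and the "HIGH" severity value are illustrative assumptions, not names taken from the Lambda.

```javascript
// Sketch: derive the month-bucketed table key and the flat GSI1 key from a
// single ISO timestamp. buildThreatAlert is a hypothetical helper.
function buildThreatAlert({ brandId, productId, type, severity, timestamp }) {
  const monthBucket = timestamp.slice(0, 7); // "2026-06-30T..." -> "2026-06"
  return {
    PK: `THREAT#${brandId}#${monthBucket}`,
    SK: `ALERT#${timestamp}#${type}`,
    GSI1PK: `BRAND#${brandId}`,
    GSI1SK: `THREAT#${timestamp}`,
    brandId, productId, type, severity, timestamp,
    resolved: false,
  };
}

const base = { brandId: "abc", productId: "p1", type: "burst_scan", severity: "HIGH" };
const june = buildThreatAlert({ ...base, timestamp: "2026-06-30T23:59:59.000Z" });
const july = buildThreatAlert({ ...base, timestamp: "2026-07-01T00:00:01.000Z" });

console.log(june.PK, july.PK);            // THREAT#abc#2026-06 THREAT#abc#2026-07
console.log(june.GSI1PK === july.GSI1PK); // true
```

Two alerts written seconds apart across a month boundary land in different table partitions but share one GSI1 partition, which is what lets the dashboard read them back with a single query.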

Step 5: SSE pushes to the dashboard

The final piece: getting the alert to the brand's browser without polling from the client side.

The Vercel API has an SSE (Server-Sent Events) endpoint that the dashboard connects to:

// Client (React component)
const source = new EventSource(`/api/stream?brandId=${brandId}`);
source.addEventListener("threat", (e) => {
  const threat = JSON.parse(e.data);
  setThreats(prev => [threat, ...prev]);
});

The server-side SSE endpoint queries DynamoDB every 3 seconds for threats newer than the last timestamp it sent:

const threats = await queryGSI1(`BRAND#${brandId}`, `THREAT#${lastTimestamp}`, {
  limit: 10,
  scanForward: true,
});
for (const threat of threats) {
  controller.enqueue(encoder.encode(`event: threat\ndata: ${JSON.stringify(threat)}\n\n`));
  lastTimestamp = threat.timestamp;
}
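The frame written by controller.enqueue follows the Server-Sent Events wire format: an event line, a data line, and a blank line to end the message. A small helper makes that explicit. This is a sketch with a hypothetical formatSseEvent function; the optional id field is my addition, on the assumption that the alert timestamp could double as a resume cursor, since EventSource sends the last seen id back in the Last-Event-ID header when it reconnects.

```javascript
// Sketch: serialize one Server-Sent Events frame.
function formatSseEvent(eventName, payload, id) {
  // Optional id line: the browser echoes it back as Last-Event-ID on reconnect.
  const idLine = id === undefined ? "" : `id: ${id}\n`;
  // JSON.stringify (without indentation) never emits raw newlines, so the
  // payload always fits on a single data: line.
  return `${idLine}event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const threat = { type: "burst_scan", productId: "p1", timestamp: "2026-06-22T01:00:00.000Z" };
const frame = formatSseEvent("threat", threat, threat.timestamp);

// Show the frame with its newlines escaped so the structure is visible.
console.log(JSON.stringify(frame));
```

On the server, the result would be passed through the same encoder.encode call shown above before being enqueued.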

The total latency from scan-to-dashboard:

PutItem scan record: ~5ms
Stream delivery to Lambda: ~100-500ms
Lambda anomaly checks: ~50-200ms (includes DynamoDB queries)
PutItem threat alert: ~5ms
SSE poll interval: up to 3s

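Total: typically under 5 seconds from QR scan to dashboard alert. The stream delivery figure assumes Lambda dispatches the batch promptly; with the 5-second batching window configured above, an isolated scan can wait up to the full window before the function is invoked.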

Error handling and retry guarantees

What happens when the Lambda fails mid-execution? DynamoDB Streams has built-in retry:

At-least-once delivery: if the Lambda throws, the event source mapping retries the same batch. The function must be idempotent (a PutItem with the same PK/SK simply overwrites the alert with identical content, so replays are naturally idempotent).
Ordering preserved on retry: retries deliver the same records in the same order within the shard. Your anomaly detection logic sees a consistent sequence regardless of how many retries occurred.
Bisect on error: with BisectBatchOnFunctionError enabled on the event source mapping, Lambda splits a failing batch in half and retries each half separately, isolating the poisoned record.

The Lambda doesn't need a dead-letter queue at our scale. If a record genuinely can't be processed, it is dropped once the retry limit is exhausted (or, with unlimited retries, when it ages out of the 24-hour Stream retention window). The scan itself is never lost: the scan record is already in DynamoDB, and the anomaly detection runs again on the next scan for the same product.
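One refinement worth sketching is partial batch responses: instead of throwing and replaying the whole batch, the handler can tell Lambda which record failed so that the retry starts there. This is an illustration and not how the GenuProof Lambda is written; makeHandler and processRecord are hypothetical names, and the approach only takes effect if ReportBatchItemFailures is enabled in the event source mapping's function response types.

```javascript
// Sketch: report the first failing record so Lambda checkpoints just before it
// and retries from that point instead of replaying the whole batch.
function makeHandler(processRecord) {
  return async function handler(event) {
    for (const record of event.Records) {
      try {
        await processRecord(record);
      } catch (err) {
        // Stop at the first failure so later records are not processed out of order.
        return {
          batchItemFailures: [{ itemIdentifier: record.dynamodb.SequenceNumber }],
        };
      }
    }
    return { batchItemFailures: [] }; // whole batch succeeded
  };
}

// Demo with a fake batch in which the second record is poisoned.
const demoHandler = makeHandler(async (record) => {
  if (record.dynamodb.SequenceNumber === "2") throw new Error("poisoned record");
});
const demoEvent = {
  Records: ["1", "2", "3"].map((n) => ({ dynamodb: { SequenceNumber: n } })),
};
demoHandler(demoEvent).then((result) => console.log(JSON.stringify(result)));
// {"batchItemFailures":[{"itemIdentifier":"2"}]}
```

Records before the failure are not replayed, which reduces how often the idempotency of the alert write is exercised, though it does not remove the need for it.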

Why DynamoDB Streams (not SQS, not EventBridge)

The alternative architecture would be: write to DynamoDB, then separately publish to SQS or EventBridge, then subscribe a Lambda, then write the alert back. That's three services instead of one.

DynamoDB Streams collapses the first two into a built-in feature. The advantages over a separate message bus:

Zero infrastructure: no queue to create, no dead-letter queue to configure, no IAM policies for cross-service access
Guaranteed delivery: every successful DynamoDB write generates a stream record. No "forgot to publish" bugs.
Ordered processing: records arrive in write order within a shard. SQS standard queues don't guarantee ordering. SQS FIFO queues do, but require a message group ID plus a deduplication ID (or content-based deduplication).
Same-table writes: the Lambda reads from DynamoDB and writes back to the same table. One set of credentials, one IAM policy, one table.
Cost: $0 at rest. No base cost when nobody is scanning. Lambda charges only for invocations.

The Lambda function

The full Lambda is 412 lines. Here's what each section does:

Lines	Function	Purpose
1-20	Setup	DynamoDB client, env vars, table name
21-40	handler()	Stream record iteration, SK filtering
41-106	anomalyChecks()	Four detection checks
108-250	processScan()	Orchestrates checks and writes
252-355	AI integration	Classification (downstream consumer of the pipeline)
356-397	writeAlert()	Threat alert with monthly bucketing
400-412	writeOpsLog()	Telemetry with daily bucketing

The complete source is in lambda/threat-detector.mjs at github.com/4KInc/genuproof.

Stream configuration

Setting	Value	Why
View type	NEW_IMAGE	Need the full item to run anomaly checks
Batch size	10	Process multiple scans per invocation to reduce the number of Lambda invocations
Batching window	5 seconds	Allows batching during burst periods
Starting position	LATEST	Only process new writes, not historical data
Retry	2	Retry failed batches (Streams guarantees ordering within retry)

What I learned

Single-table design with Streams is the canonical DynamoDB architecture. One table gives you one Stream. One Stream gives you one event pipeline. The simplicity is the point.
Filter in application code, not infrastructure (at small scale). At large scale, use Lambda event source filtering to avoid paying for irrelevant invocations.
Monthly-bucketed threat partitions were a late addition after I realized the flat THREAT#brandId PK would hot-spot. The fix took 30 minutes and required zero table migration: change the PK format in the Lambda writer and switch the reader to GSI1.
SSE is underrated for real-time features. It's simpler than WebSockets, works through CDNs, and the 3-second poll against DynamoDB costs essentially nothing at demo scale.
Streams ordering enables correctness, not just convenience. Burst detection, geographic anomaly windows, and claim violation checks all depend on seeing scans in the order they were written. Without Streams' per-shard ordering guarantee, you'd need application-level sequencing.

Built for the H0: Hack the Zero Stack hackathon using DynamoDB and Vercel. #H0Hackathon

17 Access Patterns, Zero Scans, One DynamoDB Table: Single-Table Design for a 37-Endpoint SaaS

Heartlin Machado — Thu, 25 Jun 2026 04:44:57 +0000

This post was created for the H0: Hack the Zero Stack hackathon. #H0Hackathon

Single-table DynamoDB design sounds great until you have five entity types that all need to be listed, queried by different owners, and processed by a single event stream. That's where the tutorials stop and the real design work starts.

I'm building GenuProof, a B2B anti-counterfeiting platform on DynamoDB and Vercel. One table, 13 PK/SK patterns, 17 access patterns serving 37 API endpoints. Zero joins, zero full-table Scans on data paths, predictable cost at any scale. This post walks through every design decision.

Why single-table?

The alternative is one table per entity: brands, products, events, scans, threats, webhooks. In DynamoDB, that means six tables, six sets of capacity settings, six sets of alarms, and no way to fetch related data in a single query without application-level joins.

Single-table design puts everything in one table with composite primary keys. You get:

One capacity config to manage (PAY_PER_REQUEST in my case)
One DynamoDB Stream that captures every write across all entity types
Transactional writes across entities in a single TransactWriteItems call (possible across tables too, but one table means one key schema and one IAM resource to reason about)
Simpler operations: one table to back up, monitor, and alarm on

The cost is upfront design work. You must know your access patterns before you write a line of code.

The access patterns

I started by listing every operation my 37 API endpoints need. Multiple endpoints share the same underlying access pattern (e.g., three different product-listing endpoints all use the same GSI1 query), which is why 37 endpoints collapse to 17 distinct patterns, plus a health check:

Access Pattern	Operation
Register a brand	PutItem
Get brand profile	GetItem
List all brands	Query
Get brand stats	GetItem
Register a product (with hash and signature)	PutItem (x5 items)
Verify a product by code	GetItem, GetItem, Query
List products by brand	Query (GSI1)
List all products for public gallery	Query
Add provenance event	PutItem
Get provenance chain	Query
Record verification scan	PutItem
Get scan history	Query
Write threat alert	PutItem
Get threats by brand	Query (GSI1)
Write AI operations log	PutItem
Read AI ops log (last 7 days)	Query (scatter-gather)
Consumer claim product	PutItem
Health check	Scan (Limit: 1)

That last one is the only Scan in the entire application, and it reads exactly one item to test DynamoDB connectivity.

The schema

Here are the 13 PK/SK patterns that serve those 37 endpoints:

PK                               SK                          Entity
────────────────────────────────────────────────────────────────────
BRAND#<id>                       PROFILE                     Brand profile
BRAND#<id>                       STATS                       Counters (atomic)
BRAND#<id>                       WEBHOOK#<id>                Webhook config
PRODUCT#<id>                     META                        Product record
PRODUCT#<id>                     EVENT#<ts>#<type>           Provenance event
PRODUCT#<id>                     SCAN#<ts>                   Scan log
PRODUCT#<id>                     CLAIM                       Consumer lock (TTL)
VERIFY#<code>                    META                        Code to product
HASH#<sha256>                    META                        Hash to product
THREAT#<brand>#<YYYY-MM>         ALERT#<ts>#<type>           Threat alert
OPS_LOG#<YYYY-MM-DD>             <ts>#<agent>                AI ops log
BRAND_INDEX                      BRAND#<ts>#<id>             Brand listing
PRODUCT_INDEX                    PRODUCT#<ts>#<id>           Product listing

And one GSI (GSI1):

GSI1PK                           GSI1SK                      Access Pattern
────────────────────────────────────────────────────────────────────
BRAND#<id>                       PRODUCT#<ts>                Products by brand
BRAND#<id>                       THREAT#<ts>                 Threats by brand
VERIFY#<code>                    META                        Code lookup
OPS_LOG                          <ts>                        Ops across days

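CLAIM records carry a TTL attribute so expired consumer locks are automatically cleaned up by DynamoDB. This keeps the per-product item collection lean and avoids stale claim checks on products that were never disputed. TTL deletion runs in the background and can lag expiry (typically by up to a few days), so the claim check should still compare the TTL timestamp against the current time.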
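A minimal sketch of what a TTL-carrying CLAIM item and a read-side expiry check might look like. The attribute names (expiresAt, deviceId), the 30-day lock duration, and both helper functions are assumptions for illustration; the post does not state them. What DynamoDB does require is that the TTL attribute be a Number holding epoch time in seconds.

```javascript
// Sketch: a CLAIM item with a TTL attribute, plus a read-side expiry check.
const CLAIM_TTL_SECONDS = 30 * 24 * 60 * 60; // hypothetical 30-day lock

function buildClaim(productId, deviceId, nowMs) {
  return {
    PK: `PRODUCT#${productId}`,
    SK: "CLAIM",
    deviceId, // hypothetical attribute
    // Hypothetical TTL attribute name; the value must be epoch seconds.
    expiresAt: Math.floor(nowMs / 1000) + CLAIM_TTL_SECONDS,
  };
}

function isClaimActive(claim, nowMs) {
  // Expired items can linger until the background sweep deletes them,
  // so never trust mere existence of the item.
  return Boolean(claim) && claim.expiresAt > Math.floor(nowMs / 1000);
}

const claimedAt = Date.parse("2026-06-22T00:00:00.000Z");
const claim = buildClaim("p1", "device-a", claimedAt);
console.log(isClaimActive(claim, claimedAt));                                   // true
console.log(isClaimActive(claim, claimedAt + (CLAIM_TTL_SECONDS + 1) * 1000)); // false
```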

Key design decisions

1. Collection keys replace Scans

The most common DynamoDB anti-pattern in tutorials: "just Scan the table and filter." At 1,000 items, nobody notices. At 1,000,000, your Lambda times out and your bill spikes.

I needed "list all brands" and "list all products" without Scan. The solution: collection keys. When I register a brand, I write two items:

// The brand itself
{ PK: "BRAND#abc", SK: "PROFILE", name: "Luxe Watches", ... }

// The collection entry
{ PK: "BRAND_INDEX", SK: "BRAND#2026-06-22T01:00:00Z#abc", name: "Luxe Watches", ... }

Now "list all brands" is Query(PK = "BRAND_INDEX", ScanIndexForward = false), returning brands sorted by registration date, no Scan, O(n) on the result set.

Same pattern for products with PRODUCT_INDEX.

Trade-off: every registration writes one extra item. At DynamoDB's $1.25/million writes, this costs $0.00000125 per registration. Acceptable.
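Because the profile and its collection entry are two separate items, it helps to write them atomically so the index cannot drift from the data. The sketch below builds the input for a TransactWriteCommand from @aws-sdk/lib-dynamodb; the brandRegistrationTransaction helper, the table name, and the condition expression are my assumptions, not code from the repository.

```javascript
// Sketch: input for a TransactWriteCommand that writes the brand profile and
// its BRAND_INDEX entry together, or not at all.
function brandRegistrationTransaction(table, { id, name, createdAt }) {
  return {
    TransactItems: [
      {
        Put: {
          TableName: table,
          Item: { PK: `BRAND#${id}`, SK: "PROFILE", name, createdAt },
          // Refuse to overwrite a brand that already exists.
          ConditionExpression: "attribute_not_exists(PK)",
        },
      },
      {
        Put: {
          TableName: table,
          Item: { PK: "BRAND_INDEX", SK: `BRAND#${createdAt}#${id}`, name },
        },
      },
    ],
  };
}

const tx = brandRegistrationTransaction("genuproof", {
  id: "abc",
  name: "Luxe Watches",
  createdAt: "2026-06-22T01:00:00Z",
});
console.log(tx.TransactItems.map((t) => `${t.Put.Item.PK} / ${t.Put.Item.SK}`));
```

In the API route this would be sent with something like ddb.send(new TransactWriteCommand(tx)). Transactional writes consume roughly double the write capacity of plain puts, which is still negligible at registration volume.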

2. Verification in three hops (no joins)

The critical hot path: a consumer scans a QR code. The server must verify the product in under 100ms.

Step 1: GetItem(PK="VERIFY#wfPHybaFV3_a", SK="META")
        returns { productId: "e084..." }

Step 2: GetItem(PK="PRODUCT#e084...", SK="META")
        returns { hash, signature, name, brandId, ... }

Step 3: Query(PK="PRODUCT#e084...", SK begins_with "EVENT#")
        returns provenance chain, sorted by timestamp

Three DynamoDB operations, with steps 2 and 3 reading from the same partition. No joins, no Scans. DynamoDB typically returns each in single-digit milliseconds.
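The three hops can be sketched with injected data-access functions, which lets the flow run against an in-memory stand-in for the table. The getItem and queryPrefix helpers and the verifyByCode name are hypothetical. One design note: hops 2 and 3 depend only on the productId from hop 1, so this version issues them concurrently.

```javascript
// Sketch: the three verification hops with injected data-access helpers.
async function verifyByCode(code, { getItem, queryPrefix }) {
  const lookup = await getItem(`VERIFY#${code}`, "META"); // hop 1
  if (!lookup) return { found: false };

  const pk = `PRODUCT#${lookup.productId}`;
  // Hops 2 and 3 need only productId, so they can run concurrently.
  const [product, chain] = await Promise.all([
    getItem(pk, "META"),
    queryPrefix(pk, "EVENT#"),
  ]);
  return { found: Boolean(product), product, chain };
}

// In-memory stand-in for the table, keyed by "PK|SK".
const rows = new Map([
  ["VERIFY#code1|META", { productId: "p1" }],
  ["PRODUCT#p1|META", { name: "Watch", brandId: "abc" }],
  ["PRODUCT#p1|EVENT#2026-06-02T00:00:00Z#shipped", { type: "shipped" }],
  ["PRODUCT#p1|EVENT#2026-06-01T00:00:00Z#manufactured", { type: "manufactured" }],
  ["PRODUCT#p1|SCAN#2026-06-03T00:00:00Z", { country: "US" }],
]);
const fakeTable = {
  getItem: async (pk, sk) => rows.get(`${pk}|${sk}`) ?? null,
  queryPrefix: async (pk, prefix) =>
    [...rows.entries()]
      .filter(([key]) => key.startsWith(`${pk}|${prefix}`))
      .sort(([a], [b]) => (a < b ? -1 : 1)) // sort keys order chronologically
      .map(([, item]) => item),
};

verifyByCode("code1", fakeTable).then((result) =>
  console.log(result.product.name, result.chain.map((e) => e.type))
);
```

The fake mirrors the property that matters here: the EVENT# prefix query returns only provenance events, in timestamp order, and skips the SCAN# items in the same partition.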

3. Atomic counters avoid read-modify-write

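Brand statistics (product count, scan count, threat count) are incremented in place with an UpdateExpression. The version below uses SET with if_not_exists; ADD scanCount :one is an equivalent shorthand for numeric counters: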

await ddb.send(new UpdateCommand({
  TableName: TABLE,
  Key: { PK: `BRAND#${brandId}`, SK: "STATS" },
  UpdateExpression: "SET scanCount = if_not_exists(scanCount, :zero) + :one",
  ExpressionAttributeValues: { ":zero": 0, ":one": 1 },
}));

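No read-before-write. No race condition. Works correctly under concurrent Lambda invocations. The increment itself is not idempotent, though: if the surrounding operation is retried, the same scan can be counted twice, so treat these counters as approximate.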

4. GSI1 for cross-partition queries

Within a single partition, DynamoDB sorts by SK automatically. But "all products for brand X" and "all threats for brand X" live in different partitions (PRODUCT#id and THREAT#brand#month).

GSI1 solves this. Every product and threat writes GSI1PK: "BRAND#brandId" with a typed sort key. One GSI, two access patterns:

// Products by brand
Query(IndexName="GSI1", GSI1PK="BRAND#abc", GSI1SK begins_with "PRODUCT#")

// Threats by brand (across all monthly buckets)
Query(IndexName="GSI1", GSI1PK="BRAND#abc", GSI1SK begins_with "THREAT#")
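In SDK terms, the two pseudo-queries above share a single input shape. Here is a sketch of a builder for the QueryCommand input from @aws-sdk/lib-dynamodb; brandQueryInput, its options, and the table name are illustrative assumptions.

```javascript
// Sketch: QueryCommand input for "everything of one type for a brand" on GSI1.
function brandQueryInput(table, brandId, typePrefix, { newestFirst = true, limit } = {}) {
  const input = {
    TableName: table,
    IndexName: "GSI1",
    KeyConditionExpression: "GSI1PK = :pk AND begins_with(GSI1SK, :prefix)",
    ExpressionAttributeValues: { ":pk": `BRAND#${brandId}`, ":prefix": typePrefix },
    ScanIndexForward: !newestFirst, // false returns the newest sort keys first
  };
  if (limit !== undefined) input.Limit = limit;
  return input;
}

const productsQuery = brandQueryInput("genuproof", "abc", "PRODUCT#");
const threatsQuery = brandQueryInput("genuproof", "abc", "THREAT#", { limit: 10 });

console.log(productsQuery.ExpressionAttributeValues);
console.log(threatsQuery.ExpressionAttributeValues, threatsQuery.Limit);
```

Only the sort-key prefix changes between the two access patterns, which is why one index can serve both.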

5. One Stream feeds the entire event pipeline

Because everything is in one table, one DynamoDB Stream captures every write. The Lambda function filters by SK prefix:

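for (const record of event.Records) {
  // REMOVE events (including TTL deletions) carry no NewImage under NEW_IMAGE
  if (record.eventName !== "INSERT") continue;
  const sk = record.dynamodb.NewImage.SK.S;
  if (sk.startsWith("SCAN#")) { /* anomaly detection */ }
  if (sk.startsWith("EVENT#")) { /* chain gap analysis */ }
}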

With multiple tables, you'd need multiple Streams and multiple Lambda functions. Single table means single stream means single pipeline. This is what makes the AI threat detection layer possible: the Lambda receives every scan, event, and product registration through a single stream, with no polling and no external message queue.

The numbers

1 table, PAY_PER_REQUEST
13 PK/SK patterns serving 37 API endpoints
1 GSI (GSI1) serving 4 cross-partition query patterns
0 Scans on data paths (only health probe: Limit 1)
696 items, 259 KB at demo scale
Sub-10ms single-item reads, sub-50ms queries

What I'd do differently

If I were starting over, I'd add a GSI2 for entity-type queries (GSI2PK = "PRODUCT", GSI2SK = createdAt) instead of collection keys. DynamoDB would maintain GSI2 automatically, with no extra application writes at registration time (the index write is still billed). I chose collection keys because they work without a table migration, and I was mid-hackathon.

Full access pattern matrix

Here's the complete matrix: 17 access patterns with zero Scans, plus the health check probe:

Access Pattern	PK	SK	Index	Scan?
Register brand	`BRAND#id` / `BRAND_INDEX`	`PROFILE` / `BRAND#ts`	Table	No
Get brand	`BRAND#id`	`PROFILE`	Table	No
List brands	`BRAND_INDEX`	`begins_with(BRAND#)`	Table	No
Brand stats	`BRAND#id`	`STATS`	Table	No
Register product	`PRODUCT#id` / `VERIFY#code` / `HASH#` / `PRODUCT_INDEX`	Multiple	Table	No
Verify product	`VERIFY#code` then `PRODUCT#id`	`META` then `EVENT#`	Table	No
Products by brand	`BRAND#id`	`begins_with(PRODUCT#)`	GSI1	No
Explore products	`PRODUCT_INDEX`	`begins_with(PRODUCT#)`	Table	No
Add event	`PRODUCT#id`	`EVENT#ts#type`	Table	No
Get chain	`PRODUCT#id`	`begins_with(EVENT#)`	Table	No
Record scan	`PRODUCT#id`	`SCAN#ts`	Table	No
Scan history	`PRODUCT#id`	`begins_with(SCAN#)`	Table	No
Write threat	`THREAT#brand#month`	`ALERT#ts#type`	Table	No
Get threats	`BRAND#id`	`begins_with(THREAT#)`	GSI1	No
Write ops log	`OPS_LOG#date`	`ts#agent`	Table	No
Read ops log	`OPS_LOG#date` x N	scatter-gather	Table	No
Consumer claim	`PRODUCT#id`	`CLAIM`	Table	No
Health check	n/a	n/a	Table	Limit:1

The complete source is at github.com/4KInc/genuproof. The schema lives in src/lib/dynamodb.ts and the Lambda in lambda/threat-detector.mjs.

Built for the H0: Hack the Zero Stack hackathon using DynamoDB and Vercel. #H0Hackathon
