Fix N+1 Trigger Patterns Where Lambda Functions Hammer the Same DynamoDB Partition Key

#opensource #typescript #aws #dynamodb

You add a sixth Lambda trigger to your Orders table, deploy it, and within 20 minutes your SLA dashboard goes red. Latency on order writes jumps from 4 ms to 40 ms. The function itself is fine. The table is fine. The problem is that five other Lambdas are already hitting the same partition key on every write, and you just made it six. DynamoDB's internal partition throttling doesn't care that each function looks clean in isolation.

This is an N+1 trigger problem, and your AI coding assistant cannot catch it. Not because it lacks intelligence, but because the fact that five Lambdas already target that table lives in your AWS account and your full codebase — not in the file your assistant has open.


Why the LLM Can't See the Pattern

When you ask Claude to write a new order processing Lambda, it reads the file you have open and generates code that looks correct, because in the context of that one file, it is correct. It doesn't know about ProcessRefundsLambda, NotifyFulfillmentLambda, SyncInventoryLambda, UpdateAnalyticsLambda, and AuditTrailLambda, which you wrote in previous sprints and which all write to the Orders table.

This is a category of failure that model quality doesn't fix. A better model produces a more fluent explanation for why your latency spiked. The fact that five functions converge on the same table is a lookup, not a prediction. The source of truth is a combination of your code (which functions exist) and your infrastructure (what they access).

Infrawise draws the boundary between lookup and prediction explicitly. It extracts the answer from your code using AST parsing and from your infrastructure using API calls, then hands that graph to the model as structured context. It never generates the answer.

How Infrawise Traces Trigger Chains to the Same Table

When Infrawise scans your repository, it uses ts-morph to walk every CallExpression in every source file. It's not searching for the string "DynamoDB" — it matches call structure against a known set of SDK patterns in a DYNAMO_OPERATIONS set: both v2 method names (getItem, query, putItem, updateItem, deleteItem, batchWriteItem) and v3 command classes (QueryCommand, PutItemCommand, UpdateItemCommand, DeleteItemCommand). Each matched call becomes an extracted operation: this function performs this operation against this table.
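The matching step can be sketched without the AST walk itself. The snippet below is a minimal illustration, not Infrawise's implementation: the CallSite shape, the extractOperations helper, and the read/write classification are assumptions made for the example, and the article's DYNAMO_OPERATIONS set is modelled here as a map so each name can carry an operation kind. Only the method and command names come from the list above.

```typescript
type OperationKind = "read" | "write";

// Hypothetical shape of a call site after the AST walk. The real extractor
// works on ts-morph CallExpression nodes, which are omitted from this sketch.
interface CallSite {
  functionName: string; // enclosing function or Lambda
  callee: string;       // v2 method name or v3 command class name
  tableName: string;    // table the call was resolved to
}

interface ExtractedOperation {
  functionName: string;
  operation: string;
  tableName: string;
  kind: OperationKind;
}

// Names taken from the article's list; the read/write labels are an assumption.
const DYNAMO_OPERATIONS = new Map<string, OperationKind>([
  ["getItem", "read"],
  ["query", "read"],
  ["putItem", "write"],
  ["updateItem", "write"],
  ["deleteItem", "write"],
  ["batchWriteItem", "write"],
  ["QueryCommand", "read"],
  ["PutItemCommand", "write"],
  ["UpdateItemCommand", "write"],
  ["DeleteItemCommand", "write"],
]);

// Keep only call sites whose callee is a known DynamoDB operation.
function extractOperations(callSites: CallSite[]): ExtractedOperation[] {
  const operations: ExtractedOperation[] = [];
  for (const site of callSites) {
    const kind = DYNAMO_OPERATIONS.get(site.callee);
    if (kind === undefined) continue; // not a DynamoDB call
    operations.push({
      functionName: site.functionName,
      operation: site.callee,
      tableName: site.tableName,
      kind,
    });
  }
  return operations;
}
```

Matching on the callee name rather than on a substring of the source is what keeps unrelated calls, such as a JSON parse inside the same function, out of the operation list.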

That list feeds into a SystemGraph. Nodes represent tables, functions, indexes, queues, and topics. Edges represent query, scan, and write relationships. The graph is what makes the N+1 pattern visible: not just "six functions exist" and "a table exists," but "six functions all write to Orders with no distribution across paths."
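As a rough picture of the data structure, here is a small graph with the node and edge kinds named above. The class layout, the id scheme, and the accessorsOf method are assumptions for illustration and are not taken from Infrawise's source; NewOrderProcessorLambda is a hypothetical name for the sixth function.

```typescript
type NodeKind = "table" | "function" | "index" | "queue" | "topic";
type EdgeKind = "query" | "scan" | "write";

interface GraphNode { id: string; kind: NodeKind; name: string; }
interface GraphEdge { from: string; to: string; kind: EdgeKind; }

// Illustrative graph; the real SystemGraph may be organised differently.
class SystemGraph {
  private readonly nodes = new Map<string, GraphNode>();
  private readonly edges: GraphEdge[] = [];

  // Adds the node if it is new and returns its id either way.
  addNode(kind: NodeKind, name: string): string {
    const id = `${kind}:${name}`;
    if (!this.nodes.has(id)) this.nodes.set(id, { id, kind, name });
    return id;
  }

  addEdge(from: string, to: string, kind: EdgeKind): void {
    this.edges.push({ from, to, kind });
  }

  // Distinct source nodes that have an edge of the given kind into a table.
  accessorsOf(tableName: string, kind: EdgeKind): string[] {
    const tableId = `table:${tableName}`;
    const sources = new Set<string>();
    this.edges.forEach((edge) => {
      if (edge.to === tableId && edge.kind === kind) sources.add(edge.from);
    });
    return Array.from(sources).sort();
  }
}

// The scenario from the introduction: six functions writing to Orders.
const graph = new SystemGraph();
const ordersTable = graph.addNode("table", "Orders");
const writers = [
  "ProcessRefundsLambda",
  "NotifyFulfillmentLambda",
  "SyncInventoryLambda",
  "UpdateAnalyticsLambda",
  "AuditTrailLambda",
  "NewOrderProcessorLambda", // hypothetical sixth function
];
for (const name of writers) {
  graph.addEdge(graph.addNode("function", name), ordersTable, "write");
}
console.log(graph.accessorsOf("Orders", "write").length);
```

The point of the sketch is that fan-in becomes a single query over edges rather than something inferred from reading six files.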

The HotPartitionAnalyzer walks the graph and fires when a table receives five or more distinct access edges from separate code paths. The threshold is configurable per-table via hotPartitionThresholds in infrawise.yaml — Issue #57 resolved false positives on high fan-in systems by making this a per-table setting rather than a single global value. A finding looks like:

Medium severity
Potential hot partition detected on DynamoDB table "Orders"
  Table "Orders" is accessed by 6 distinct code paths, which may create
  hot partition issues at scale. High access concentration on the same
  partition key can throttle requests.
  Recommendation: Consider adding a random suffix or timestamp to partition
  keys (write sharding). Use DynamoDB DAX for read-heavy workloads.

This runs deterministically. Feed it the same graph, get the same findings. There's no sampling temperature involved.
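The threshold logic is simple enough to sketch. The following is an assumption-laden approximation rather than the actual HotPartitionAnalyzer: the AccessEdge and Finding shapes, the function name, and the way per-table thresholds are passed in are all invented for the example. The default of five and the medium severity follow the description above.

```typescript
interface AccessEdge { source: string; table: string; }

interface Finding {
  severity: "medium";
  table: string;
  pathCount: number;
  message: string;
}

const DEFAULT_THRESHOLD = 5; // "five or more distinct access edges"

// perTableThresholds stands in for hotPartitionThresholds from infrawise.yaml;
// how the real tool loads and represents that setting is not shown here.
function analyzeHotPartitions(
  edges: AccessEdge[],
  perTableThresholds: ReadonlyMap<string, number> = new Map<string, number>(),
): Finding[] {
  // Count distinct code paths per table.
  const pathsByTable = new Map<string, Set<string>>();
  for (const edge of edges) {
    let paths = pathsByTable.get(edge.table);
    if (paths === undefined) {
      paths = new Set<string>();
      pathsByTable.set(edge.table, paths);
    }
    paths.add(edge.source);
  }

  // Visit tables in sorted order so the output does not depend on input order.
  const findings: Finding[] = [];
  for (const table of Array.from(pathsByTable.keys()).sort()) {
    const pathCount = pathsByTable.get(table)?.size ?? 0;
    const threshold = perTableThresholds.get(table) ?? DEFAULT_THRESHOLD;
    if (pathCount >= threshold) {
      findings.push({
        severity: "medium",
        table,
        pathCount,
        message: `Potential hot partition detected on DynamoDB table "${table}"`,
      });
    }
  }
  return findings;
}
```

Because the function only counts and compares, the same edge list always yields the same findings, and raising the threshold for a known high fan-in table suppresses the finding for that table alone.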

Because HotPartitionAnalyzer emits medium severity, gating CI on this finding requires infrawise check --fail-on medium; the default --fail-on high won't catch it. When violations are found, infrawise check exits with code 1, so your build fails before the sixth Lambda merges, and the engineer who wrote it sees the finding in the PR, not on a latency dashboard at 11pm.

Fixing It — Restructuring the Key or Sharding the Access Pattern

Once Infrawise surfaces the pattern, you have two practical options.

Write sharding appends a suffix to the partition key so that writes spread across several logical partitions. With a random suffix, reads must scatter-gather across every shard; with a deterministic suffix derived from the order ID, a reader can recompute the shard for a known order. This is the right choice when all six functions are pure writers and reads are handled by a separate query path.
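A deterministic suffix can be sketched as a hash of the order ID taken modulo a shard count. Everything named here is hypothetical: the shard count of 8, the key format with a # separator, and the choice of an FNV-1a-style hash are illustrative, and a production setup would size the shard count from observed write throughput.

```typescript
const SHARD_COUNT = 8; // hypothetical; size this from real write volume

// FNV-1a-style 32-bit hash over UTF-16 code units. Any stable hash works;
// this one is used only because it needs no imports.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// The same order ID always maps to the same shard, so a reader that knows
// the order ID can rebuild the exact partition key.
function shardedPartitionKey(
  baseKey: string,
  orderId: string,
  shardCount: number = SHARD_COUNT,
): string {
  const shard = fnv1a(orderId) % shardCount;
  return `${baseKey}#${shard}`;
}

// Readers that do not know the order ID query every shard and merge results.
function allShardKeys(baseKey: string, shardCount: number = SHARD_COUNT): string[] {
  return Array.from({ length: shardCount }, (_, i) => `${baseKey}#${i}`);
}
```

A writer would call shardedPartitionKey with whatever base key is currently hot, while a scatter-gather reader issues one query per entry of allShardKeys and merges the results.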

Access pattern separation restructures which functions need direct table access at all. If SyncInventoryLambda and UpdateAnalyticsLambda are consuming state that flows through the Orders table, they shouldn't write to it directly — they should react to a DynamoDB stream and write to their own tables. The fan-in often exists because multiple services treat the same source-of-truth table as a synchronization point when they should be downstream consumers.
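To make the downstream-consumer shape concrete, here is a sketch of what SyncInventoryLambda might look like as a stream consumer. The event types are a hand-written subset of the DynamoDB Streams record format; the attribute names (orderId, sku, quantity), the InventoryAdjustment shape, and the injected writer are assumptions made for the example, and the actual write to an inventory table is left to the caller.

```typescript
// Minimal subset of the DynamoDB Streams event delivered to a Lambda.
interface AttributeValue { S?: string; N?: string; }
interface StreamRecord {
  eventName?: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb?: { NewImage?: Record<string, AttributeValue> };
}
interface StreamEvent { Records: StreamRecord[]; }

// Hypothetical item written to the consumer's own table.
interface InventoryAdjustment { orderId: string; sku: string; quantity: number; }
type InventoryWriter = (adjustment: InventoryAdjustment) => Promise<void>;

// Pure mapping from stream records to inventory adjustments.
function toAdjustments(event: StreamEvent): InventoryAdjustment[] {
  const adjustments: InventoryAdjustment[] = [];
  for (const record of event.Records) {
    if (record.eventName !== "INSERT") continue; // only new orders matter here
    const image = record.dynamodb?.NewImage;
    const orderId = image?.orderId?.S;
    const sku = image?.sku?.S;
    const quantity = image?.quantity?.N; // numbers arrive as strings
    if (orderId === undefined || sku === undefined || quantity === undefined) continue;
    adjustments.push({ orderId, sku, quantity: Number(quantity) });
  }
  return adjustments;
}

// The handler never touches the Orders table; it writes only through the
// injected writer, which would target the inventory service's own table.
function makeHandler(write: InventoryWriter) {
  return async (event: StreamEvent): Promise<number> => {
    const adjustments = toAdjustments(event);
    for (const adjustment of adjustments) {
      await write(adjustment);
    }
    return adjustments.length;
  };
}
```

Keeping the mapping pure and injecting the writer makes the consumer easy to test, and it removes one access edge into Orders entirely.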

The analyze_function tool helps here. Point it at any function and it traces the full access path: which tables the function reads and writes, which indexes it uses, what event shapes trigger it, and what queues or topics it publishes to. That trace makes it clear which functions can be moved to stream consumption and which genuinely need direct write access.

Conclusion

The N+1 trigger problem is invisible to any tool that works only from your open files. It's not a reasoning failure — no amount of context about a single Lambda reveals that five others already saturate the same table. That fact lives in the intersection of your code and your infrastructure.

Infrawise puts that intersection in a graph, runs deterministic analyzers over it, and surfaces the finding before it becomes a production incident. The model's job is to decide what to do — restructure the key, introduce a stream, separate the access pattern. The detection is never generated; it's extracted.

If your AI assistant is writing Lambda functions against DynamoDB, give it the access graph first: GitHub · npm.

Key Takeaways

A hot partition problem requires knowing how many code paths hit the same table — that fact lives in your AWS account and your full codebase, not in the file your AI assistant has open.
Infrawise's HotPartitionAnalyzer counts distinct code paths hitting each DynamoDB table and fires at a configurable threshold, with per-table overrides via hotPartitionThresholds in infrawise.yaml.
Hot partition findings emit medium severity; use infrawise check --fail-on medium to gate CI builds on them (the default --fail-on high won't catch them).
analyze_function traces the full access path for any function — tables, indexes, event shapes, queues — making it easy to separate writers from downstream consumers.
Write sharding and event-stream separation are the two practical fixes; which one to pick depends on whether converging functions genuinely need to write or are just consuming state.