DEV Community

Dev Prompts

10 Claude Prompts for Better Architecture Decisions (With Examples)

Architecture decisions are the most expensive code you write — because you never write them.

They live in Slack threads, whiteboard photos, and the head of whoever was in the room that Thursday. When they go wrong, you don't get a stack trace. You get six months of "why is everything so hard to change?" — a codebase shaped by a conversation no one can fully reconstruct.

Most developers don't skip system design because they don't care. They skip it because the feedback loop is too slow. You design something, build it, then discover it was wrong three sprints later. AI doesn't fix bad judgment, but it can compress that feedback loop from weeks to minutes: surface trade-offs before you commit, stress-test assumptions before they calcify, document decisions before everyone forgets why.

These 10 Claude prompts for system design cover the architecture tasks that actually recur: decomposing vague requirements, analyzing trade-offs, reviewing systems for risk, writing ADRs, designing APIs, finding failure modes. If you've read the code review prompts or the debugging prompts, you know the format: copy-paste ready, paired with real example output, explained so you know when to reach for each one. Those prompts catch bugs in code and bugs in production. These catch bugs in decisions — before you write any code at all.


Before You Start: How to Feed Claude an Architecture Problem

Architecture is the hardest domain to prompt an AI about, because the context is massive. A code review prompt needs a diff. A debugging prompt needs a stack trace. An architecture prompt needs your entire system's shape — its constraints, scale, team, history, and the specific thing that's forcing a decision right now. Dumping all of that into every prompt is impractical. Trimming too much makes the answer useless.

The solution is to treat Claude's context window as a budget and spend it deliberately. Three tiers:

Always include (non-negotiable): A 5–10 sentence plain-text overview of what your system does and how data flows through it. Your tech stack and versions. The specific constraint or question driving the decision.

Include when relevant: Scale numbers — requests per second, data volume, growth rate. Team size and skill profile (three senior Go engineers is a different design space than twenty mixed-skill engineers). Existing pain points and tech debt. Compliance or regulatory constraints that eliminate certain options.

Trim aggressively: Implementation details of components not involved in the decision. Historical context older than six months unless it's directly relevant. Source code — architecture prompts rarely need code, they need descriptions of code.

The format that works: paste your system context as a labeled block at the top of every prompt, before the question. Here is a template you can copy once and reuse:

SYSTEM CONTEXT:
- What it does: [1-2 sentences]
- Architecture: [monolith | microservices | serverless | hybrid]
- Key components: [list services, stores, queues]
- Scale: [requests/day, data size, growth trajectory]
- Tech stack: [languages, frameworks, databases, infra]
- Team: [size, seniority, relevant expertise]
- Constraint: [the specific thing driving this decision]

Every prompt below assumes you paste this block first. The prompts are the question. The context block is what makes the answer worth reading.


The 10 Prompts

1. Break Down a Vague Requirement Into System Components

The Prompt

Given this product requirement, decompose it into concrete system components.
For each component: name it, define its single responsibility in one sentence,
identify its inputs and outputs, and flag any component where the requirement
is ambiguous. Do not recommend specific technologies yet — just map the
components and their boundaries.

Requirement: [paste requirement here]

SYSTEM CONTEXT:
[paste context block]

What it does
Takes a fuzzy product requirement — "we need real-time notifications," "we need an activity feed," "users should be able to collaborate on documents" — and breaks it into concrete system components with defined boundaries. Turns a whiteboard idea into something you can actually design.

Example output

Requirement: "Users need a real-time activity feed showing actions by people they follow."

Components:

1. Event Producer
   Responsibility: Emit a structured ActivityEvent whenever a tracked user action occurs
   Inputs: User action (create, like, follow, comment) from app services
   Outputs: ActivityEvent { actor_id, action_type, object_id, object_type, timestamp }
   Ambiguity: Which actions qualify? "Follow" and "post" seem clear; "view" is not mentioned
   but implies stalking risk — clarify before including.

2. Event Ingestion Pipeline
   Responsibility: Accept ActivityEvents and route them to the appropriate stores
   Inputs: ActivityEvents from producers
   Outputs: Written records to feed store + push notifications to delivery layer
   Ambiguity: None — standard fan-out problem.

3. Feed Storage
   Responsibility: Store pre-computed or on-demand feed entries per user
   Inputs: Routed ActivityEvents
   Outputs: Paginated feed entries on read
   Ambiguity: Fan-out-on-write vs. fan-out-on-read depends on follower counts not specified.
   High follower counts (>10k) break fan-out-on-write — need that number.

4. Real-Time Push Layer
   Responsibility: Deliver new feed events to connected clients without polling
   Inputs: ActivityEvents routed from ingestion
   Outputs: WebSocket/SSE push to active client sessions
   Ambiguity: "Real-time" — sub-second or eventual (a few seconds)? Different architectures.

5. Feed Read API
   Responsibility: Serve paginated feed to clients on request
   Inputs: authenticated user_id, pagination cursor
   Outputs: ordered list of ActivityEvents with actor metadata

Clarifying questions to take back to product:
- Which actions trigger an event?
- Max expected followers per user?
- Definition of "real-time" — latency SLA?

Why it works
Forcing component decomposition before technology selection prevents the most common architecture failure: choosing a tool before understanding the problem. The ambiguity-flagging instruction turns the prompt into a requirements clarification tool, not just a design tool.

Pro tip — After decomposition, run each component through Prompt #5 (API contract design) to define how they talk to each other before you start building any of them.
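To make the Feed Storage ambiguity above concrete, here is a minimal sketch of fan-out-on-write — the approach that breaks at high follower counts. All names are hypothetical, and the in-memory dict stands in for a real feed store (Redis lists, a wide-column table):

```python
from collections import defaultdict

# In-memory stand-in for the feed store; a real system would use Redis lists
# or a wide-column store keyed by user_id.
feeds = defaultdict(list)
followers_of = {"alice": ["bob", "carol"]}

def fan_out_on_write(event):
    """Fan-out-on-write: copy the event into every follower's feed at publish
    time. Reads are cheap (one precomputed list per user); writes cost
    O(followers), which is why accounts with very large follower counts
    (the >10k case flagged above) break this model."""
    for follower in followers_of.get(event["actor_id"], []):
        feeds[follower].append(event)

def read_feed(user_id, limit=50):
    """Reads are a simple slice of precomputed entries."""
    return feeds[user_id][-limit:]

fan_out_on_write({"actor_id": "alice", "action_type": "post", "object_id": "p1"})
```

Fan-out-on-read inverts the cost: cheap writes, expensive reads that merge every followee's timeline at request time. Which one you pick depends on the follower-count number the prompt tells you to go get.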


2. Analyze Trade-offs Between Two Architectural Approaches

The Prompt

Compare these two architectural approaches for my use case. For each approach,
evaluate it across these dimensions: implementation complexity, operational
complexity, scalability ceiling, time-to-ship, team skill fit, data consistency
guarantees, and reversibility (how hard is it to migrate away from this later).

Produce a comparison table. Then give me a direct recommendation — which approach
for my current context, and at what trigger point does the other approach become
the better choice?

Approach A: [describe first approach]
Approach B: [describe second approach]

SYSTEM CONTEXT:
[paste context block]

What it does
Takes two candidate approaches and produces a structured comparison across dimensions that matter — not just theoretical benefits, but fit against your actual constraints. Prevents "gut feel" decisions by making the trade-off surface explicit.

Example output

Context: Order management system, 3-person team, 8k orders/day, considering
event sourcing vs. CRUD + soft deletes.

| Dimension              | CRUD + soft deletes     | Event sourcing              |
|------------------------|-------------------------|-----------------------------|
| Implementation         | Low — patterns are      | High — CQRS, projections,   |
| complexity             | well-understood         | replay logic, event schema  |
| Operational complexity | Low — standard DB ops   | High — event store ops,     |
|                        |                         | projection rebuilds          |
| Scalability ceiling    | ~50k orders/day before  | Effectively unbounded for   |
|                        | write contention matters| write throughput             |
| Time-to-ship           | 2-3 weeks               | 6-10 weeks                  |
| Team skill fit         | Strong — standard ORM   | Low — team has no event     |
|                        | patterns                | sourcing experience          |
| Data consistency       | Strong — ACID           | Eventually consistent for   |
|                        |                         | read projections             |
| Reversibility          | High — standard schema  | Low — migrating away from   |
|                        | migration               | an event store is painful   |

Recommendation: CRUD + soft deletes for current context.

At 8k orders/day with a 3-person team and no event sourcing experience, the
operational overhead of event sourcing exceeds its benefits. You don't need the
audit trail, temporal queries, or write throughput that justify it.

Trigger point for reconsidering: if you need to replay business logic over
historical orders (e.g., recalculating loyalty points retroactively), or if
write throughput exceeds 30k orders/day and you've already optimized the CRUD
path. Either trigger means a migration — build the CRUD path with clean domain
events as a stepping stone.

Why it works
Enumerated comparison dimensions prevent Claude from writing a generic "it depends" essay. The "reversibility" dimension is the one most architects forget — it forces consideration of how painful the wrong choice is to undo, not just how good the right choice is.

Pro tip — Add your team's actual constraints: "We have 3 months, 2 backend engineers, and zero Kafka experience." Constraints collapse the theoretical trade-off space down to what actually matters for you.
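The recommendation above ends with "build the CRUD path with clean domain events as a stepping stone." One minimal sketch of what that can look like (names and shapes are hypothetical, and the list stands in for an events table or outbox):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class OrderPlaced:
    """A domain event emitted alongside the normal CRUD write. If you later
    move to event sourcing, events like this become the source of truth;
    until then they are just an append-only audit trail."""
    order_id: str
    total_cents: int
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

event_log = []  # stand-in for an events table written in the same transaction

def place_order(db, order_id, total_cents):
    # 1. Normal CRUD write: the relational DB stays the system of record.
    db[order_id] = {"status": "placed", "total_cents": total_cents}
    # 2. Emit the domain event alongside it (outbox pattern in production).
    event_log.append(OrderPlaced(order_id, total_cents))

db = {}
place_order(db, "ord_1", 4999)
```

The point is the discipline, not the mechanism: every state change gets a named event, so a future migration replays history instead of reconstructing it.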


3. Audit an Existing System for Scalability Risks

The Prompt

Review this system description and identify the components most likely to
become bottlenecks under 10x current load. For each risk: name the component,
describe the failure mode, estimate how close we are to hitting it (distance-
to-failure: near / medium / far based on current scale), and suggest one
mitigation. Rank by severity × proximity. Stop at the top 5.

SYSTEM CONTEXT:
[paste context block — include current traffic numbers]

What it does
Takes a description of your current system and identifies what breaks first under growth. Ranking by severity × proximity makes the output actionable: fix the near-term risks, plan for the medium-term ones, note the far-term ones and move on.

Example output

Current load: 800 req/s, 2M users, PostgreSQL + Redis, Node monolith, single region.

Risk 1 — PostgreSQL single writer (NEAR)
Failure mode: All writes (user events, orders, session updates) funnel through one
primary. At ~3,000-5,000 writes/sec the write-ahead log becomes a bottleneck.
At 800 req/s now with ~40% write ratio, you're at ~320 writes/sec. 10x puts you
at 3,200 — directly in the failure zone.
Mitigation: Audit write patterns now. Move high-frequency, low-value writes
(event logs, analytics) to a separate store. Read replicas for the read path.

Risk 2 — Redis with no eviction policy (NEAR)
Failure mode: Cache grows unbounded. When Redis runs out of memory it either
crashes (noeviction) or starts evicting arbitrarily (allkeys-lru, if set).
Either causes a cache stampede against Postgres.
Mitigation: Set maxmemory + allkeys-lru today. This is a one-line config change.

Risk 3 — Synchronous image processing on request threads (MEDIUM)
Failure mode: Image uploads block a Node worker thread for 200-800ms. At current
scale this is masked by low concurrency. At 10x, thread exhaustion causes
request queue backup across all endpoints.
Mitigation: Offload to a background job queue (BullMQ, SQS). Return a job ID,
poll or push for completion.

Risk 4 — Single-region deployment (MEDIUM)
Failure mode: Regional outage takes down the entire service. No failover path.
At 2M users, this is a business risk more than a scalability risk.
Mitigation: Multi-region read replicas first (lower blast radius, minimal cost),
active-active later if SLA demands it.

Risk 5 — Monolithic deploy unit (FAR)
Failure mode: As team grows, deployment coupling slows release velocity and a
bad deploy in one area takes down everything. Not a load problem — a team
scaling problem. Expect this to matter at 15+ engineers.
Mitigation: Not urgent. Extract high-churn modules into separate deploy units
when team size crosses the threshold, not before.

Why it works
"What breaks first under 10x" is a sharper question than "is this scalable." Forcing a ranked list with distance-to-failure estimates prevents Claude from producing an exhaustive academic analysis and keeps the output actionable.

Pro tip — Include your current traffic numbers in the context block. Without them, Claude can only say a risk exists. With them, it can estimate when the risk becomes a real problem.
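The Risk 3 mitigation above — move work off the request thread, return a job ID — reduces to a small pattern. A hedged sketch with an in-memory queue standing in for BullMQ or SQS (all names hypothetical):

```python
import queue
import uuid

jobs = queue.Queue()   # stand-in for BullMQ / SQS
job_status = {}

def enqueue_image_resize(image_bytes):
    """Instead of resizing on the request thread, enqueue the work and
    return a job ID immediately. The HTTP handler stays fast; a worker
    pool drains the queue at its own pace."""
    job_id = str(uuid.uuid4())
    job_status[job_id] = "queued"
    jobs.put((job_id, image_bytes))
    return job_id

def worker_step():
    """One iteration of a background worker (real workers loop forever)."""
    job_id, image_bytes = jobs.get()
    # ... actual resize logic elided ...
    job_status[job_id] = "done"

jid = enqueue_image_resize(b"fake-image-bytes")
worker_step()
```

The client then polls `job_status` (or receives a push) for completion, exactly as the example output suggests.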


4. Write an Architecture Decision Record (ADR)

The Prompt

Write an Architecture Decision Record (ADR) for this decision. Structure it as:

Title: [short imperative phrase]
Status: [Proposed | Accepted | Deprecated | Superseded]
Context: What is the situation that forced a decision?
Decision: What did we decide?
Alternatives considered: What else was on the table and why was it rejected?
Consequences: What are the trade-offs accepted? What becomes easier? What becomes harder?
Revisit trigger: What condition would cause us to revisit this decision?

Decision to document: [describe the decision and the reasoning behind it]

SYSTEM CONTEXT:
[paste context block]

What it does
Takes a decision you've already made (or are about to finalize) and produces a structured ADR that future engineers can actually use. The "revisit trigger" section is the most underrated part — it makes the decision reversible by design instead of permanent by default.

Example output

Title: Use DynamoDB for session storage

Status: Accepted

Context:
Our PostgreSQL primary currently stores all user sessions. Session reads account
for ~60% of total DB queries. Sessions have a simple key-value access pattern
(read by session_id, write on auth, delete on logout/expiry), no relational
requirements, and a 24-hour TTL. As we approach 5M users, session read volume
is becoming the largest contributor to Postgres connection pool exhaustion.

Decision:
Migrate session storage to DynamoDB. Use session_id as partition key, store
session payload as a JSON attribute, configure DynamoDB TTL on the expiry field
to handle automatic cleanup.

Alternatives considered:
- Redis: Evaluated first. Strong latency characteristics, but operational overhead
  of managing Redis cluster at scale (failover, eviction policy, memory sizing)
  exceeds DynamoDB's managed cost at our volume. Redis remains for application
  caching where sub-millisecond reads matter.
- Keep in Postgres with read replicas: Deferred the problem without solving it.
  Read replicas add ops cost; session data has zero relational value in Postgres.
- ElastiCache (Memcached): No persistence guarantees — session loss on node failure
  is unacceptable without application-level handling we don't want to own.

Consequences:
+ Removes ~60% of Postgres read load
+ Managed TTL eliminates session cleanup job
+ Scales independently of application DB
- Adds a new dependency (DynamoDB) to the data layer
- Session data is no longer queryable via SQL — admin tooling needs updating
- Local development now requires DynamoDB Local or a mock

Revisit trigger:
If session access patterns become relational (e.g., "list all active sessions for
a user" becomes a product requirement), or if DynamoDB costs exceed $500/month
before we reach 10M users, reconsider Redis with managed ElastiCache.

Why it works
ADRs are universally recognized as high-value and universally skipped because writing them is tedious. Claude handles the structure and prose; you supply the decision and enough context to make it accurate. The output is paste-ready into any ADR file or wiki.

Pro tip — If your team has an existing ADR template, paste it into the prompt. Claude will match your format exactly. If you don't have a template, this output is your template.
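For the ADR above, the one implementation detail worth sketching is the TTL attribute: DynamoDB's TTL feature expects a plain epoch-seconds number, not an ISO timestamp. A minimal, hedged sketch of shaping the session item (function name and field names are hypothetical; the boto3 call is shown in a comment, not executed):

```python
import json
import time

def build_session_item(session_id, payload, ttl_hours=24):
    """Shape a session record for the ADR's design: session_id as the
    partition key, the payload as an opaque JSON attribute, and an
    epoch-seconds 'expires_at' attribute for DynamoDB's automatic TTL
    cleanup."""
    return {
        "session_id": session_id,                       # partition key
        "payload": json.dumps(payload),                 # opaque JSON blob
        "expires_at": int(time.time()) + ttl_hours * 3600,
    }

item = build_session_item("sess_abc", {"user_id": "u_42"})
# With boto3 this would then be written as: table.put_item(Item=item)
```

Getting the TTL type wrong is a classic silent failure: DynamoDB simply never expires items whose TTL attribute is a string.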


5. Design an API Contract Between Two Services

The Prompt

Design the API contract between these two services for the use cases described.
Produce: endpoint or method signatures, request and response schemas, error codes
and their meanings, idempotency strategy, versioning approach, and authentication
method. Design the failure case first — what does the caller do when this service
is unavailable or returns an error?

Service A (caller): [describe the calling service]
Service B (provider): [describe the service being called]
Use cases to support: [list the operations needed]

SYSTEM CONTEXT:
[paste context block]

What it does
Takes two services and the operations they need to exchange, and produces a concrete API contract — not just the happy path. The "design the failure case first" instruction is the key constraint. Most API design produces a sunny-day spec; this prompt produces a contract you can actually build against.

Example output

Services: Order Service (caller) → Inventory Service (provider)
Use case: Reserve stock for a new order before payment is captured.

--- CONTRACT ---

POST /v1/reservations
Authorization: Bearer {service-token}
Idempotency-Key: {order_id}  // Required — enables safe retries

Request:
{
  "order_id": "ord_abc123",       // string, required — used as idempotency key
  "items": [
    {
      "sku": "WIDGET-RED-M",
      "quantity": 2
    }
  ],
  "expires_at": "2026-04-13T12:30:00Z"  // when to auto-release if not confirmed
}

Response 201 — Full reservation successful:
{
  "reservation_id": "res_xyz789",
  "status": "reserved",
  "items": [{ "sku": "WIDGET-RED-M", "quantity": 2, "reserved": 2 }],
  "expires_at": "2026-04-13T12:30:00Z"
}

Response 207 — Partial reservation (some items unavailable):
{
  "reservation_id": "res_xyz789",
  "status": "partial",
  "items": [{ "sku": "WIDGET-RED-M", "quantity": 2, "reserved": 1 }]
}

Response 409 — Duplicate reservation (idempotency key already used):
{
  "error": "DUPLICATE_RESERVATION",
  "reservation_id": "res_xyz789"  // return existing reservation
}

Response 422 — Validation error:
{ "error": "INVALID_REQUEST", "details": "expires_at must be in the future" }

Response 503 — Inventory service unavailable

DELETE /v1/reservations/{reservation_id}
// Called if payment capture fails — releases reserved stock

Failure handling (caller's responsibility):
- 503 / network timeout: Retry with exponential backoff (100ms, 200ms, 400ms).
  Use Idempotency-Key — safe to retry.
- 207 partial: Order Service decides: proceed with partial items, cancel order,
  or return partial-availability response to the customer. Do not assume.
- If no response within 3s: assume failure, retry. Reservation auto-expires
  via expires_at if never confirmed — no manual cleanup needed.

Versioning: /v1/ prefix. Breaking changes increment to /v2/. v1 supported for
minimum 6 months after v2 ships.

Why it works
Listing "error codes," "idempotency strategy," and "failure handling" as explicit output requirements prevents the most common API design failure: specifying the happy path and leaving the rest to "we'll figure it out." The caller's failure handling section is the part that gets skipped most often and causes the most incidents.

Pro tip — After generating the contract, run it through Prompt #10 (failure modes) with "what are the most likely ways this API contract causes a production incident?"
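The caller-side failure handling in the contract above (retry with exponential backoff, safe only because of the Idempotency-Key) can be sketched in a few lines. This is a simplified illustration, not the contract's normative implementation; the flaky provider is simulated:

```python
import time

def call_with_retry(request_fn, idempotency_key, retries=3, base_delay=0.1):
    """Retry on transient failure with exponential backoff (0.1s, 0.2s,
    0.4s). Safe only because the SAME Idempotency-Key is sent on every
    attempt: the provider deduplicates, so a retry after an ambiguous
    timeout cannot double-reserve stock."""
    delay = base_delay
    last_error = None
    for _ in range(retries):
        try:
            return request_fn(idempotency_key)
        except ConnectionError as exc:        # stand-in for 503 / timeout
            last_error = exc
            time.sleep(delay)
            delay *= 2
    raise last_error

# Simulated provider that fails once, then succeeds.
attempts = []
def flaky_reserve(key):
    attempts.append(key)
    if len(attempts) < 2:
        raise ConnectionError("503")
    return {"reservation_id": "res_xyz789", "status": "reserved"}

result = call_with_retry(flaky_reserve, idempotency_key="ord_abc123")
```

Note what is NOT retried: the 207 partial response is a successful reply that needs a business decision, not a retry.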


6. Map the Single Points of Failure in a System

The Prompt

Given this system description, identify every single point of failure (SPOF) —
any component whose failure takes down the entire system or a critical user path.
For each SPOF: identify the component, describe the failure mode, estimate the
blast radius (what percentage of users/functionality is affected), and suggest
the minimal mitigation. Include human SPOFs — people, credentials, or knowledge
held by only one person.

SYSTEM CONTEXT:
[paste context block]

What it does
Maps every component in your system that is "one failure away from an outage." The blast-radius framing makes output actionable rather than a vague concern list. The human SPOF instruction catches the fragility that doesn't show up in architecture diagrams.

Example output

System: E-commerce checkout flow. PostgreSQL primary, Redis, Stripe, single
DNS provider (Cloudflare), 3 backend engineers.

SPOF 1 — PostgreSQL primary
Failure mode: Primary DB failure takes down all write operations — no orders can
be placed, cart cannot be saved, inventory cannot be updated.
Blast radius: 100% of transactional functionality. Read-only browse still works
if reads are on replicas.
Mitigation: Promote a read replica to primary (RDS Multi-AZ does this automatically
in ~60s). Verify your app reconnects on failover — not all ORMs handle this cleanly.

SPOF 2 — Stripe API
Failure mode: Payment gateway unavailability blocks all order completion.
Blast radius: 100% of checkout — users can browse and add to cart but cannot pay.
Mitigation: Circuit breaker in the payment service to fail fast and return a
clear error instead of hanging. Optionally: secondary processor (Braintree) on
the circuit-open path. Minimum: test your graceful-degradation path exists.

SPOF 3 — Single DNS provider (Cloudflare)
Failure mode: Cloudflare outage makes your domain unresolvable.
Blast radius: 100% of traffic — no users can reach the site at all.
Mitigation: Multi-provider DNS (Cloudflare primary + Route53 secondary) via
NS record delegation. Low cost, high payoff.

SPOF 4 — Synchronous email send blocking order confirmation
Failure mode: Email provider timeout (SendGrid, SES) causes order confirmation
HTTP request to hang or fail. If the transaction rolls back on email failure,
orders are lost silently.
Blast radius: Potentially 100% of orders if the code path is synchronous and
non-resilient. Risk is highest during email provider degradation events.
Mitigation: Decouple email send from order confirmation transaction. Place email
job in a queue; confirm the order regardless of email delivery.

SPOF 5 — Human: One engineer holds all production credentials
Failure mode: That engineer is unavailable during an incident. No one else can
access the production database, deploy a hotfix, or rotate a compromised key.
Blast radius: Operational paralysis during the engineer's absence.
Mitigation: Rotate all credentials into a shared secrets manager (1Password
Teams, AWS Secrets Manager). Document runbooks. At least two people should be
able to independently perform every production operation.

Why it works
"Single point of failure" is a precise enough concept that Claude can apply it systematically — tracing each path through the system and asking what happens if this node is gone. Requesting blast-radius estimates forces prioritization rather than an equal-weight concern list.

Pro tip — Run this before every major launch, not just during architecture reviews. Systems accumulate new SPOFs as features are added.
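The SPOF 2 mitigation above names a circuit breaker. A minimal sketch of the pattern, under the assumption that the payment client raises `ConnectionError` on gateway failure (class and thresholds are illustrative, not a production library):

```python
import time

class CircuitBreaker:
    """After `threshold` consecutive failures, the breaker opens and calls
    fail fast for `cooldown` seconds, instead of letting every checkout
    request hang on a dead payment gateway."""

    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None    # cooldown elapsed: half-open, try again
        try:
            result = fn()
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0            # any success resets the count
        return result
```

In production you would reach for a maintained implementation, but the shape is worth internalizing: the breaker's job is to convert a slow failure into a fast one.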


7. Evaluate Whether to Split a Monolith Into Services

The Prompt

Evaluate whether extracting [component/module] from our monolith into a separate
service is justified right now. Analyze: deployment coupling (how often does this
component cause deploys it doesn't own?), team ownership (does a different team
need to own this independently?), data coupling (how many DB tables does it share
with the rest of the monolith?), independent scalability (does this component
have meaningfully different load characteristics?), and operational cost (what
new concerns does a service split introduce?).

Give a go / no-go recommendation with the specific condition that would flip it.

SYSTEM CONTEXT:
[paste context block]

What it does
Applies the actual preconditions for microservices — not the theoretical benefits — to your specific situation. Most premature splits fail on operational cost and data coupling, not on architecture principles.

Example output

Question: Should we extract the notification system into its own service?

Current state: Notification logic lives in the monolith. Shares 3 DB tables
(users, preferences, notification_log) with core app. Called synchronously from
2 endpoints (post creation, follow event). Same team owns everything. 200k
notifications/day, growing at 15%/month.

Analysis:

Deployment coupling: LOW — notification code changes ~2x/month. Core app deploys
8-10x/month. When core deploys, notification code comes along for the ride, but
there's no evidence this is causing incidents or blocking releases.

Team ownership: LOW JUSTIFICATION — same team owns both. No organizational
forcing function for a service boundary.

Data coupling: HIGH — notifications directly read from users and preferences
tables. A service split requires either duplicating that data (sync problem) or
making cross-service DB calls (coupling traded for network latency).

Independent scalability: MODERATE — notification volume will diverge from app
volume as the product grows. At current scale (200k/day), this is not yet a
problem. At ~5M/day it likely is.

Operational cost introduced: Service registry, inter-service authentication,
network failure handling, separate deployment pipeline, distributed tracing,
separate on-call runbook. That's real overhead for a 3-person team.

Recommendation: NO — not yet.

Extract the notification system as a well-bounded *module* first: clear interface,
no direct DB access from outside the module, event-based integration internally.
That gives you the architectural discipline without the operational overhead.

Flip condition: Go when (a) a second team needs to own notifications independently,
or (b) notification throughput needs to scale independently and you've exhausted
in-process optimization, or (c) the notification module's data model has fully
decoupled from the core schema.

Why it works
The five-dimension analysis forces evaluation of preconditions, not benefits. Every "go microservices" decision should pass the ownership, data coupling, and operational cost tests first. The specific flip condition makes the recommendation time-bounded rather than permanent.

Pro tip — Add "assume we split — list every new operational concern we take on" to make the hidden cost visible before the decision is made, not after.
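The "well-bounded module" recommendation above has a concrete shape: a narrow public interface, module-owned storage, and callers that never touch its tables. A hedged sketch (all names hypothetical) of what that boundary can look like, so a later service extraction only swaps the implementation:

```python
from typing import Protocol

class Notifier(Protocol):
    """The module's public interface. The rest of the monolith depends only
    on this, never on the module's tables or internals, so extracting it
    later only means swapping in an HTTP or queue-backed client."""
    def notify(self, user_id: str, kind: str, payload: dict) -> None: ...

class InProcessNotifier:
    """Current implementation: same process, module-owned storage."""
    def __init__(self):
        self._log = []   # stand-in for a module-owned notification_log table

    def notify(self, user_id, kind, payload):
        self._log.append({"user_id": user_id, "kind": kind, "payload": payload})

def on_follow_event(notifier: Notifier, follower_id, followee_id):
    """Callers integrate through the interface, not through table reads."""
    notifier.notify(followee_id, "new_follower", {"follower_id": follower_id})
```

This is the "architectural discipline without the operational overhead" the recommendation describes: the boundary exists in code before it exists on the network.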


8. Design for a Specific Non-Functional Requirement

The Prompt

Design the architecture required to meet this specific non-functional requirement.
Work backward from the number: what components, patterns, and configuration changes
are required to hit it? Where are the trade-offs? What is the cheapest architecture
that meets this requirement — do not over-engineer.

NFR: [latency target | availability SLA | throughput target | consistency guarantee]
Current architecture: [brief description]

SYSTEM CONTEXT:
[paste context block]

What it does
Anchors the design to a concrete, measurable target and works backward from it. Prevents the most common NFR failure: designing for a vague goal ("it needs to be fast") and discovering the real requirement six months later.

Example output

NFR: API response time < 200ms at P99 under 10,000 req/s
Current: Node monolith, PostgreSQL, no cache, single region, average latency ~180ms

Working backward from 200ms P99 at 10k req/s:

At 10k req/s, your current ~180ms average implies a P99 well above 200ms
(P99 typically runs 3-5x the average under load). You're not meeting this
target even at current traffic, let alone at 10k req/s.

Required changes, cheapest to most expensive:

1. Read-through Redis cache for hot data ($50-100/month)
   Target: cut DB reads for top 80% of traffic by caching frequently accessed
   records with 30s TTL. Expected P99 impact: -60 to -80ms on cached paths.
   Trade-off: stale reads for up to 30s. Acceptable if data is not user-specific
   or safety-critical.

2. Database connection pooling via PgBouncer (free, self-hosted)
   Target: eliminate connection establishment overhead (~20-40ms per cold query
   under load). At 10k req/s, Postgres connection limits become a bottleneck
   before cache hits save you. Set pool size = (2 × CPU cores) + effective_spindle_count.

3. Move synchronous DB writes off the hot path (engineering cost)
   Audit writes on the P99 path. Audit log writes, analytics events, non-critical
   updates — queue them. Only block the response on writes the user actually needs
   to see.

4. CDN for static responses (negligible cost)
   Any response that is the same for all users in a 30s window can be cached at
   the CDN edge. Check if your public API endpoints qualify.

What you don't need yet: multi-region, read replicas, sharding, or event streaming.
Those are right answers for the wrong problem. The 200ms P99 target is achievable
with cache + connection pooling + async writes before you need any of that.

Estimated outcome: 200ms P99 achievable at 10k req/s with items 1-3. Validate
with load testing against production data shapes before shipping.

Why it works
A concrete number ("200ms P99") eliminates vague performance advice. Every recommendation must justify itself against that target. The "cheapest architecture" constraint prevents the response from escalating to Kubernetes and distributed caching for a problem that a Redis instance solves.

Pro tip — Pair with Prompt #3 (scalability audit) to make sure the NFR solution doesn't introduce new bottlenecks at 10x.
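Item 1 in the plan above is a read-through cache with a 30s TTL, and its trade-off (reads can be up to 30s stale) is visible in a few lines of code. A hedged sketch using a plain dict where production would use Redis:

```python
import time

def read_through(cache, db_fetch, key, ttl=30.0, now=time.monotonic):
    """Read-through cache: serve from `cache` if the entry is younger than
    `ttl` seconds, otherwise fetch from the DB and repopulate. The staleness
    trade-off is explicit: a hit may be up to `ttl` seconds old."""
    entry = cache.get(key)
    if entry is not None and now() - entry["at"] < ttl:
        return entry["value"]                  # hit: no DB round trip
    value = db_fetch(key)                      # miss or expired: hit the DB
    cache[key] = {"value": value, "at": now()}
    return value

# Simulated DB so the cache behavior is observable.
db_calls = []
def fetch_from_db(key):
    db_calls.append(key)
    return {"id": key, "name": "widget"}

cache = {}
a = read_through(cache, fetch_from_db, "sku_1")   # miss: goes to the DB
b = read_through(cache, fetch_from_db, "sku_1")   # hit: served from cache
```

One design note: passing `now` as a parameter keeps the TTL logic testable without sleeping, which matters once this sits on your P99 path.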


9. Translate a Technical Decision for a Non-Technical Stakeholder

The Prompt

Translate this technical architecture decision into language for a non-technical
stakeholder. Cover: what we're doing (no jargon), why in business terms (not
technical terms), what it costs (time, engineering effort, opportunity cost),
what happens if we don't do it, and what risks remain after we do.

Audience: [CFO | CEO | PM | Board member — choose one]
Decision: [describe the technical decision]

SYSTEM CONTEXT:
[paste context block]

What it does
Reframes a technical decision around what stakeholders actually care about: cost, risk, timeline, and consequences of inaction. The audience specification matters — a CFO cares about cost and downtime risk; a PM cares about feature timeline impact; a CEO cares about competitive and reputational risk.

Example output

Audience: CFO
Decision: Migrate from a single-server database to a distributed database cluster

What we're doing:
Right now, all of our customer data lives on a single database server — one
machine. We're moving it to three servers that work together, so no single
machine failure can take down the business.

Why in business terms:
Last quarter we had two database outages totaling 4 hours of downtime. Each hour
of outage costs approximately $12,000 in lost transactions plus support volume.
We also failed our enterprise SLA with two customers who are contractually owed
uptime credits. A three-node cluster eliminates the single point of failure that
caused both incidents.

What it costs:
6 weeks of engineering time (two engineers, full focus). $2,200/month increase
in infrastructure cost. Two weeks of feature development deprioritized while
migration is in progress.

What happens if we don't:
Based on current growth, we will exceed the capacity of the single server within
5-6 months. When that happens, we face either a planned migration under pressure
or an unplanned outage during a traffic spike. The enterprise contract renewals
in Q3 include uptime requirements we cannot currently guarantee.

Risks that remain after we do this:
The migration itself carries a risk of a 2-4 hour maintenance window, which we
will schedule off-peak. After migration, a regional cloud outage could still
take down the cluster — full geographic redundancy is a separate project.
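The dollar figures in a pitch like this are exactly what a CFO will recompute, so make the arithmetic reproducible. A back-of-the-envelope sketch using the example's hypothetical numbers (SLA credits excluded):

```python
# All inputs are the example's hypothetical figures, not real data.
outage_hours_per_quarter = 4
cost_per_outage_hour = 12_000      # lost transactions + support volume
cluster_cost_per_month = 2_200     # added infrastructure spend

annual_outage_exposure = outage_hours_per_quarter * 4 * cost_per_outage_hour
annual_cluster_cost = cluster_cost_per_month * 12

print(f"Annual outage exposure: ${annual_outage_exposure:,}")  # $192,000
print(f"Annual cluster cost:    ${annual_cluster_cost:,}")     # $26,400
```

If last quarter's outage rate holds, the cluster pays for itself roughly seven times over per year — before counting SLA credits or the engineering time of firefighting.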

Why it works
"No jargon" alone is not enough — Claude will just simplify the technical explanation. Explicitly requiring business-framing dimensions (cost, consequence of inaction, remaining risk) forces Claude to reframe the decision around what the stakeholder needs to make a decision, not what the engineer wants to communicate.

Pro tip — Specify the audience precisely. A PM who needs to adjust a roadmap cares about different dimensions than a CFO who needs to approve a budget line.


10. Run a Pre-Mortem on a Proposed Design

The Prompt

Run a pre-mortem on this proposed architecture. Assume it is 12 months from now
and this system has failed in production in a significant way. What are the most
likely causes? Cover: data loss scenarios, cascading failure paths, performance
cliffs that only appear at scale, operational complexity that overwhelms the team,
and failure modes that only surface under specific conditions (high concurrency,
partial network failures, specific data shapes).

Rank by likelihood × impact. Give me the top 5 only, with a one-sentence
mitigation for each.

Proposed design: [describe the architecture]

SYSTEM CONTEXT:
[paste context block]

What it does
Activates pessimistic reasoning against a design before any code is written. The "assume it has already failed" framing bypasses optimism bias. Ranking by likelihood × impact ensures you focus on the failures that are both plausible and consequential — not just the most dramatic theoretical scenarios.

Example output

Proposed design: Async job queue (Redis-backed BullMQ) with retry logic for
order fulfillment processing. Workers consume from the queue, call third-party
fulfillment API, update order status in Postgres.

Pre-mortem — top 5 failure modes:

1. Poison pill messages cause worker death spiral (HIGH likelihood × HIGH impact)
   A malformed fulfillment response causes a worker to throw on every retry.
   BullMQ retries it indefinitely (or to max attempts), keeping the worker busy
   on an unprocessable job. Other jobs queue up behind it.
   Mitigation: Dead-letter queue with alerting on DLQ depth > 0. Any job that
   fails N times moves to DLQ and pages on-call.

2. Queue depth grows unbounded when fulfillment API is slow (HIGH × HIGH)
   Fulfillment API degrades to 10s response time. Workers are all occupied on
   in-flight requests. New orders queue up faster than they're processed. Redis
   memory grows until OOM or jobs are dropped silently.
   Mitigation: Set max queue length. Alert on queue depth exceeding 15-minute
   processing capacity. Circuit-break when fulfillment API P99 > 5s.

3. Duplicate order fulfillment on worker crash mid-processing (MEDIUM × HIGH)
   Worker calls fulfillment API, job succeeds externally, worker crashes before
   marking job complete. BullMQ retries — fulfillment API is called again with
   same order. Duplicate shipment or charge.
   Mitigation: Idempotency key on all fulfillment API calls using order_id.
   Verify the fulfillment API supports idempotent retries before relying on this.

4. Silent data inconsistency when Postgres write fails after API success (MEDIUM × HIGH)
   Fulfillment API confirms the order; the subsequent Postgres status update fails
   (connection timeout, constraint violation). Order is fulfilled externally but
   shows "pending" internally. Customer support has no visibility.
   Mitigation: Wrap the status update in a retry with idempotency check. Add a
   reconciliation job that cross-checks fulfillment API status against DB state
   every 15 minutes.

5. No alerting on queue depth or worker count (HIGH likelihood × MEDIUM impact)
   This is the meta-failure: you won't know any of the above is happening until
   a customer complains. Queue depth, DLQ count, worker error rate, and
   fulfillment API error rate need to be in your monitoring before go-live.
   Mitigation: Add these four metrics to your dashboard and set alerts before
   shipping. Non-negotiable.
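Failure mode #3's mitigation is worth making concrete. A sketch of the client-side idempotency-key pattern — the "provider" here is a stand-in that models API-side deduplication, not a real fulfillment service:

```python
import uuid

# --- Stand-in for the third-party fulfillment API (models provider-side dedup). ---
_provider_seen: dict[str, str] = {}

def fulfillment_api(order_id: str, idempotency_key: str) -> str:
    # A real provider stores the key and replays the original response,
    # so a retry with the same key does NOT ship a second package.
    if idempotency_key not in _provider_seen:
        _provider_seen[idempotency_key] = f"shipment-{uuid.uuid4().hex[:8]}"
    return _provider_seen[idempotency_key]

# --- Worker side: derive the key from order_id; never generate one per attempt. ---
def fulfill_order(order_id: str) -> str:
    # Deterministic key: a retry after a worker crash sends the SAME key, so
    # the provider recognizes the duplicate. A random key per attempt defeats this.
    return fulfillment_api(order_id, idempotency_key=f"fulfill-{order_id}")

first = fulfill_order("order-42")
retry = fulfill_order("order-42")   # e.g. the queue retries after a worker crash
assert first == retry               # same shipment returned, nothing duplicated
```

Note the direction of the guarantee: deduplication must live on the provider's side, which is why the mitigation says to verify the fulfillment API actually supports idempotent retries before relying on this.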

Why it works
"Pre-mortem" framing explicitly activates pessimistic reasoning — Claude looks for what goes wrong, not what could theoretically be improved. The specific failure categories (data loss, cascading failures, performance cliffs, operational complexity) rule out a generic "consider error handling" response, and the top-5 constraint keeps the answer from sprawling into an exhaustive catalog of everything that could conceivably fail.

Pro tip — The failures you anticipate here become the tests you write before launch. Take the top 3 failure modes and turn them into load tests or chaos engineering scenarios before the system ships.
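Taking that pro tip literally, failure mode #1 can become a test before any queue infrastructure exists. A toy model of the retry-then-dead-letter policy — plain Python, not BullMQ, just the behavior the design promises:

```python
def process_with_retries(job, handler, max_attempts=3):
    """Toy retry policy: run handler up to max_attempts times; on exhaustion,
    park the job in a dead-letter queue instead of retrying forever."""
    dlq = []
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(job), dlq
        except Exception:
            if attempt == max_attempts:
                dlq.append(job)   # poison pill parked here; alert on len(dlq) > 0
    return None, dlq

# A poison-pill handler: fails identically on every attempt.
def always_fails(job):
    raise ValueError(f"malformed fulfillment response for {job}")

result, dlq = process_with_retries("order-99", always_fails)
assert result is None and dlq == ["order-99"]   # job parked, workers stay free
```

The assertion at the bottom is the pre-launch test: a job that fails every attempt must end up parked, not looping. Swap the toy loop for your real queue's retry config and keep the same assertion.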


The Bigger Principle

Architecture prompts are different from code-level prompts in one fundamental way: the quality of the output is bounded by the quality of the context you provide, not the quality of the question. A mediocre debugging prompt with a full stack trace still produces useful output. A brilliant architecture prompt with vague context produces confident-sounding nonsense.

The pattern across all 10 prompts is consistent: context block + specific question + output structure. The context block does the heavy lifting. The question narrows the focus. The output structure prevents the rambling "it depends" answer that architecture questions tend to attract.

These prompts don't replace an experienced architect. They replace the part of architecture work that is mechanical: enumerating trade-offs, checking for SPOFs, formatting ADRs, translating decisions into stakeholder language. The judgment is still yours. The prompts just make sure you're exercising that judgment with complete information — rather than whatever you can hold in your head during a 30-minute design meeting.

Start with the one that covers the decision you're staring at right now. The context block takes five minutes to fill out once. After that, each prompt takes 30 seconds to run.


Want the full pack?
This article covers 10 prompts. The Claude Prompts for Developers pack includes 55 prompts across 6 categories — code review, debugging, architecture, docs, productivity, and 5 multi-step power combos. One-time download, copy-paste ready.

Top comments (2)

Dev Prompts

TL;DR for the prompts:

  1. Break down a vague requirement into concrete system components
  2. Compare two architectural approaches across 7 dimensions (with a direct recommendation)
  3. Audit existing system for scalability risks — ranked by severity × proximity
  4. Write an Architecture Decision Record (ADR) — including a revisit trigger
  5. Design an API contract — failure cases first, not just the happy path
  6. Map every single point of failure — including human SPOFs
  7. Evaluate whether a microservices split is justified right now (go / no-go)
  8. Design for a specific NFR — work backward from the number
  9. Translate a technical decision for a non-technical stakeholder
  10. Pre-mortem — assume the design already failed, find the top 5 causes

The pattern: paste your SYSTEM CONTEXT block once, then drop any prompt.
Context does the heavy lifting. The prompt does the narrowing.

Dev Prompts

The hardest part of architecture prompting isn't the question — it's the context.

Most AI-assisted architecture advice fails because the context is too vague. You get
confident-sounding output that doesn't account for your scale, your team, or your
actual constraints.

The fix I've found: a reusable SYSTEM CONTEXT template at the top of every prompt.
Paste it once, answer 6 fields, and every prompt in this list produces output that's
actually specific to your situation. The template is in the "Before You Start" section.

What's the most painful architecture decision you're dealing with right now?