When I started preparing for the Claude Certified Architect Foundations (CCA-F) exam, the first thing that surprised me was how many people were studying for it the way you'd study for a generic LLM cert: heavy on prompt engineering tricks, light on the actual architecture material.
The published blueprint tells a different story. If you let the weights drive your hours instead of your comfort zone, you end up with a very different study plan than the one most candidates are running.
This is the plan I'd give myself if I were starting over. It's six weeks, roughly 6–8 hours per week, and it's organized around the actual blueprint, not vibes.
The blueprint, before anything else
The CCA-F is a 60-question, 120-minute exam. Scoring is scaled 100–1000, with 720 to pass. The blueprint is split across five domains:
- Agentic Architecture — 27%
- Claude Code — 20%
- Prompt Engineering — 20%
- Tool Design & MCP — 18%
- Context Management — 15%
The single most useful thing I can tell you is this: Agentic Architecture is the largest domain, and it's the one most engineers under-prepare on. Prompt Engineering feels familiar, so it absorbs disproportionate study time. Agentic Architecture feels abstract, so it gets skipped until the last week. Reverse that instinct.
A quick sanity check on hour allocation across six weeks at ~7 hours/week (42 hours total):
- Agentic Architecture: ~11 hours
- Claude Code: ~8 hours
- Prompt Engineering: ~8 hours
- Tool Design & MCP: ~8 hours
- Context Management: ~6 hours
- Mock exams and review: built into weeks 5–6
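The numbers above are just the blueprint weights multiplied by the total hours and rounded. A minimal sketch of that arithmetic, assuming a flat 42-hour budget (the function and variable names are mine, not anything official):

```python
# Blueprint weights from the domain list, as fractions of the exam.
WEIGHTS = {
    "Agentic Architecture": 0.27,
    "Claude Code": 0.20,
    "Prompt Engineering": 0.20,
    "Tool Design & MCP": 0.18,
    "Context Management": 0.15,
}

TOTAL_HOURS = 6 * 7  # six weeks at ~7 hours/week


def allocate_hours(weights, total_hours):
    """Return whole-hour study targets proportional to blueprint weight."""
    return {domain: round(weight * total_hours) for domain, weight in weights.items()}


hours = allocate_hours(WEIGHTS, TOTAL_HOURS)
for domain, h in hours.items():
    print(f"{domain}: ~{h} hours")
print("Total assigned:", sum(hours.values()))
```

Rounding leaves the targets summing to 41 rather than 42, so treat them as rough targets rather than a strict budget.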
Now the week-by-week plan.
Week 1 — Agentic Architecture, part 1
Goal: stop thinking about Claude as a chat completion endpoint.
The core mental model you need is the agent loop: the model emits a response, the runtime inspects stop_reason, dispatches tool calls if needed, appends results back into the conversation, and re-invokes the model. That loop is the unit of work the exam tests, not the single API call.
Things to be solid on by end of week 1:
- The four common values of stop_reason (end_turn, tool_use, max_tokens, stop_sequence) and what the orchestrator should do in response to each one.
- The difference between a single-turn tool-use request and a multi-turn agentic loop.
- Why you append tool_result blocks to the conversation history rather than re-prompting from scratch.
- When to terminate the loop (budget, max iterations, end_turn, explicit user interrupt).
A useful exercise: sketch the pseudocode for an agent loop on paper, without looking. If you can't write the while loop and the stop_reason branches from memory, you're not ready to move on.
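    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # model, tools, history, and run_tools() are assumed to be defined by the caller.
    while True:
        response = client.messages.create(
            model=model,
            messages=history,
            tools=tools,
            max_tokens=4096,
        )
        # Keep the assistant turn in history so tool results can refer back to it.
        history.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "end_turn":
            break
        if response.stop_reason == "tool_use":
            tool_results = run_tools(response.content)
            history.append({"role": "user", "content": tool_results})
            continue
        if response.stop_reason == "max_tokens":
            # decide: continue, summarize, or fail
            break
        # stop_sequence or any other value: stop rather than loop forever
        break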
That snippet is the spine of half the Agentic Architecture domain.
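The loop leans on a run_tools helper that the snippet never defines. One possible shape, as a sketch only: walk the assistant content, run each tool_use block through a registry, and return tool_result blocks keyed by tool_use_id. TOOL_REGISTRY and get_weather are hypothetical, and the content blocks are simulated with SimpleNamespace so the example runs without an API call.

```python
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Hypothetical tool used only for illustration."""
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tools(content_blocks):
    """Execute every tool_use block and return matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # text blocks need no action from the runtime
        tool = TOOL_REGISTRY.get(block.name)
        if tool is None:
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": f"Unknown tool: {block.name}",
                "is_error": True,
            })
            continue
        try:
            output = tool(**block.input)
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": str(output),
            })
        except Exception as exc:  # report the failure instead of crashing the loop
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": f"{type(exc).__name__}: {exc}",
                "is_error": True,
            })
    return results


# Stand-ins for the blocks found in response.content.
fake_content = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_weather",
                    input={"city": "Lisbon"}),
]
print(run_tools(fake_content))
```

Every tool_use block gets exactly one tool_result with the same ID, including failures, so the model always sees what happened to each call it made.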
Week 2 — Agentic Architecture, part 2 + Context Management
Now layer on the failure modes the exam loves to ask about.
- Runaway loops. What stops a model from calling the same tool 40 times? You need a story for max iterations, budget guards, and idempotency.
- Subagents and orchestrators. When does it make sense to spawn a subagent vs. continue in the same context? What does the orchestrator own that the subagent doesn't?
- Parallel vs. sequential tool calls. When the model emits multiple tool_use blocks in one turn, the runtime can execute them in parallel, but it must return all of the results together, in a single user message, before the next model call.
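For the runaway-loop question, here is a rough sketch of the three guards working together: an iteration cap, a token budget, and detection of the same tool call being repeated with identical arguments. The LoopGuard class and its thresholds are hypothetical, chosen only to make the idea concrete.

```python
import json
from collections import Counter


class LoopGuard:
    """Hypothetical guard that tells an agent loop when to stop."""

    def __init__(self, max_iterations=20, max_tokens_total=200_000, max_repeats=3):
        self.max_iterations = max_iterations
        self.max_tokens_total = max_tokens_total
        self.max_repeats = max_repeats
        self.iterations = 0
        self.tokens_used = 0
        self.call_counts = Counter()

    def record_turn(self, tokens, tool_calls):
        """Record one model turn: tokens spent and its (name, input) tool calls."""
        self.iterations += 1
        self.tokens_used += tokens
        for name, tool_input in tool_calls:
            # Same name plus same arguments is the signature of a stuck loop.
            key = (name, json.dumps(tool_input, sort_keys=True))
            self.call_counts[key] += 1

    def why_stop(self):
        """Return the reason the loop should stop, or None to keep going."""
        if self.iterations >= self.max_iterations:
            return "max_iterations"
        if self.tokens_used >= self.max_tokens_total:
            return "budget_exhausted"
        if any(n >= self.max_repeats for n in self.call_counts.values()):
            return "repeated_tool_call"
        return None


guard = LoopGuard(max_iterations=10, max_tokens_total=50_000, max_repeats=3)
for _ in range(3):
    guard.record_turn(tokens=1_200, tool_calls=[("search", {"q": "same query"})])
print(guard.why_stop())
```

In an agent loop you would call record_turn after each model response and break as soon as why_stop returns something other than None.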
Then pivot into Context Management, which is the smallest domain (15%) but heavily intertwined with Agentic Architecture. Focus on:
- The structure of the context window (system, conversation history, tool definitions, tool results).
- Strategies for long-running agents: summarization checkpoints, scratchpads, external memory, retrieval-augmented context.
- The cost of context bloat — both literal dollars and quality degradation.
- Caching: what's eligible, what isn't, and the prefix-stability rule that makes it actually work.
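The prefix-stability rule is easier to see in a request shape. A sketch, assuming the Messages API's cache_control marker on a system block: tools and system prompt stay byte-identical on every turn, and anything volatile goes into the messages that follow. build_request, prefix_fingerprint, the lookup_ticket tool, and the "MODEL_NAME" placeholder are all hypothetical.

```python
import json

SYSTEM_PROMPT = "You are a support agent for an internal ticketing system."

TOOLS = [
    {
        "name": "lookup_ticket",  # hypothetical tool
        "description": "Fetch a ticket by ID. Use when the user cites a ticket number.",
        "input_schema": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
    }
]


def build_request(model, history, max_tokens=1024):
    """Build Messages API kwargs whose prefix is identical on every turn."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "tools": TOOLS,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # Marks the end of the cacheable prefix (tools + system).
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Volatile, per-turn content lives here, after the cached prefix.
        "messages": history,
    }


def prefix_fingerprint(request):
    """Serialize only the parts that must stay stable for a cache hit."""
    return json.dumps({"tools": request["tools"], "system": request["system"]},
                      sort_keys=True)


turn_1 = build_request("MODEL_NAME", [{"role": "user", "content": "Status of T-1?"}])
turn_2 = build_request("MODEL_NAME", [{"role": "user", "content": "And T-2?"}])
print(prefix_fingerprint(turn_1) == prefix_fingerprint(turn_2))
```

Putting a timestamp or a per-user value into the system prompt would change the fingerprint on every call, which is the usual way caching silently stops working. A prompt this short would likely fall under the minimum cacheable length in practice; it is kept small here for readability.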
Week 3 — Tool Design & MCP
This is the domain where engineers with strong backend backgrounds tend to overestimate themselves.
The trap: you already know how to write a good function signature for a human caller. Writing a good function signature for an LLM caller is a different discipline. The model is reading your description as part of its decision about whether and how to invoke the tool.
Key things to internalize:
- Tool descriptions are prompts. They should describe when to use the tool, not just what it does.
- Parameter schemas should be tight. Enums beat free-text strings. Required fields should actually be required. Defaults should be documented.
- Error returns are part of the interface. A tool that returns "error: bad input" as a string teaches the model nothing. A tool that returns a structured error with a code and a hint lets the model self-correct.
- One tool, one job. A do_everything tool with a mode parameter is almost always worse than three narrow tools.
- MCP is the protocol layer: how tools, resources, and prompts are exposed to a host. Know the difference between a tool, a resource, and a prompt in MCP terms. That distinction shows up in scenario questions.
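To make the first three points concrete, here is a sketch of a tool definition in the Messages API tool format (name, description, input_schema) paired with a handler that returns a structured error. The search_orders tool, its fields, and the error codes are invented for illustration.

```python
import json

# Hypothetical tool definition: the description says when to use it,
# the enum constrains input, and the default is documented.
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": (
        "Search customer orders by status. Use this when the user asks about "
        "orders in a given state; do not use it to look up a single order by ID."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "status": {
                "type": "string",
                "enum": ["pending", "shipped", "delivered", "cancelled"],
                "description": "Order state to filter on.",
            },
            "limit": {
                "type": "integer",
                "minimum": 1,
                "maximum": 50,
                "description": "Maximum rows to return. Defaults to 10.",
            },
        },
        "required": ["status"],
    },
}

ALLOWED_STATUSES = SEARCH_ORDERS_TOOL["input_schema"]["properties"]["status"]["enum"]


def search_orders(status=None, limit=10):
    """Return a JSON string: results, or a structured error with code and hint."""
    if status not in ALLOWED_STATUSES:
        return json.dumps({
            "error": {
                "code": "INVALID_STATUS",
                "hint": f"status must be one of {ALLOWED_STATUSES}",
            }
        })
    if not 1 <= limit <= 50:
        return json.dumps({
            "error": {"code": "LIMIT_OUT_OF_RANGE",
                      "hint": "limit must be between 1 and 50"}
        })
    # Placeholder result; a real implementation would query a data store.
    return json.dumps({"status": status, "orders": [], "limit": limit})


print(search_orders(status="in_transit"))
```

The hint names the valid values, so a model that guessed "in_transit" has what it needs to retry with "shipped" on the next turn.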
If you only do one exercise this week: take a tool you'd normally write for a REST client, and rewrite the schema and description as if the caller were an LLM that has never seen your codebase.
Week 4 — Claude Code + Prompt Engineering
Claude Code is 20% of the exam and it's the most concrete domain. Things to be comfortable with:
- The agent loop as it appears inside Claude Code specifically (tool use, file edits, bash, subagents).
- CLAUDE.md files and how project-level instructions compose with user-level instructions.
- Skills, plans, and how Claude Code structures multi-step work.
- Hooks and permissions — what the runtime executes vs. what the model proposes.
- Headless mode and the difference between interactive and scripted use.
Prompt Engineering, also 20%, is the domain where most candidates over-prepare. You don't need clever jailbreaks or 50 prompt patterns. You need:
- System prompts vs. user prompts and what belongs in each.
- Structured output: when to ask for JSON, when to use tool use as a forcing function instead.
- Few-shot examples and when they hurt more than they help.
- Chain-of-thought and extended thinking — what the difference actually is and when each is appropriate.
- XML-style tag conventions Anthropic recommends for delimiting input sections.
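Two of these points fit in one small request: delimiting input with XML-style tags, and using tool use as a forcing function for structured output. A sketch, assuming the Messages API's tool_choice parameter; the record_sentiment tool, build_classification_request, and the "MODEL_NAME" placeholder are hypothetical.

```python
# Hypothetical tool whose only job is to define the output shape.
RECORD_SENTIMENT_TOOL = {
    "name": "record_sentiment",
    "description": "Record the sentiment classification for the supplied review.",
    "input_schema": {
        "type": "object",
        "properties": {
            "sentiment": {"type": "string",
                          "enum": ["positive", "neutral", "negative"]},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["sentiment", "confidence"],
    },
}


def build_classification_request(model, review_text):
    """Build Messages API kwargs that force a schema-shaped tool call."""
    # XML-style tags keep the untrusted input separate from the instruction.
    user_prompt = (
        "Classify the sentiment of the review below.\n"
        f"<review>\n{review_text}\n</review>"
    )
    return {
        "model": model,
        "max_tokens": 256,
        "system": "You are a careful sentiment classifier.",
        "tools": [RECORD_SENTIMENT_TOOL],
        # Forces the model to call this tool instead of replying in prose.
        "tool_choice": {"type": "tool", "name": "record_sentiment"},
        "messages": [{"role": "user", "content": user_prompt}],
    }


request = build_classification_request("MODEL_NAME", "Arrived late, but works well.")
print(request["messages"][0]["content"])
```

Passed to client.messages.create(**request), this should come back with a tool_use block whose input carries the structured fields, so there is no JSON to parse out of free text.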
The exam is not asking you to write clever prompts. It's asking you to choose the right prompting strategy for a given scenario.
Week 5 — First full mock + targeted remediation
Take a full 60-question, 120-minute timed simulation under exam conditions. No notes, no Claude open in another tab. Then score it by domain.
The pattern you're looking for is not your overall score. It's your lowest-scoring domain weighted by blueprint percentage. A 60% in Agentic Architecture costs you more points than a 60% in Context Management.
Spend the rest of the week drilling whichever domain has the worst weighted gap. Don't review what you already know.
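One way to compute the weighted gap, as a sketch: multiply each domain's blueprint weight by the fraction you got wrong, then sort. The mock scores below are made up, and the result is in raw percentage points of the exam, not the scaled 100–1000 score.

```python
WEIGHTS = {
    "Agentic Architecture": 0.27,
    "Claude Code": 0.20,
    "Prompt Engineering": 0.20,
    "Tool Design & MCP": 0.18,
    "Context Management": 0.15,
}


def weighted_gaps(scores, weights):
    """Return (domain, points lost) pairs, largest loss first.

    Points lost = blueprint weight x fraction answered wrong, expressed
    as percentage points of the whole exam.
    """
    gaps = {d: weights[d] * (1.0 - scores[d]) * 100 for d in weights}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Hypothetical mock results, as fraction correct per domain.
mock_scores = {
    "Agentic Architecture": 0.60,
    "Claude Code": 0.80,
    "Prompt Engineering": 0.90,
    "Tool Design & MCP": 0.70,
    "Context Management": 0.60,
}

ranked = weighted_gaps(mock_scores, WEIGHTS)
for domain, lost in ranked:
    print(f"{domain}: {lost:.1f} points lost")
```

With these numbers, the same 60% costs 10.8 points in Agentic Architecture and 6.0 in Context Management, so the first domain is where the remediation hours should go.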
Week 6 — Second mock + light review
Take a second full mock early in the week. Compare to week 5. If your weighted-worst domain hasn't moved, that's where the remaining hours go.
The last two days should be light. Re-read your own notes, not new material. Sleep more than you study.
A few things I wish I'd known earlier
- The exam rewards orchestration thinking, not API recall. You will rarely be asked to recite a parameter name. You will frequently be asked to choose between two valid-looking architectures.
- Read the scenario stem twice before reading the options. The options are designed to be plausible. The stem usually contains the constraint that makes only one of them correct.
- "All of the above" and "none of the above" rarely appear. When two options seem equally right, you've missed a constraint in the stem.
- Time pressure is real. 60 questions in 120 minutes is two minutes per question, including re-reads. Practice with a clock.
If you want to drill against blueprint-weighted scenarios while you work through this plan, the practice platform I maintain at claudecertifiedarchitect.dev has a free 15-question set across the five domains (it emails you a diagnostic at the end). It's independent and not affiliated with Anthropic, but the question distribution mirrors the official blueprint, which is the thing that matters most for calibration. The full bank is 1,000+ questions, $24.99 one-time.
Good luck. Study the weights, not your comfort zone.