naoki_JPN
Anthropic's Prompting 101 — A Practical Guide to Building Production-Quality Claude Prompts

Note: This article is an English translation of a Japanese summary of a ~25-minute video posted by @jota_snchez on X. Original video: https://x.com/jota_snchez/status/2049898145346105395

Introduction

Hannah Moran and Christian Ryan from Anthropic's Applied AI Team walk through prompt engineering best practices with live console demos.

Using a real customer case — having Claude analyze Swedish car accident insurance forms — they show how a prompt evolves across five versions, going from "Claude thinks it's a ski accident" to production-quality structured output. Every iteration is highly instructive.


The Basic Prompt Structure

Prompt structure with 5 elements (5:00)

Anthropic recommends organizing prompts around 5 core elements:

| # | Element | Description |
|---|---------|-------------|
| 1 | Task description | 1–2 sentences defining Claude's role and the task |
| 2 | Dynamic content | Data, images, or retrieved information to process |
| 3 | Detailed instructions | Step-by-step guidance on how to approach the task |
| 4 | Examples (optional) | Few-shot samples |
| 5 | Reminder of critical points | Restate the most important rules at the end |

Note: For long prompts, repeating critical instructions at the end is especially effective.
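As a sketch, the five elements can be assembled in order with a small helper. The function name and all placeholder values here are illustrative, not from the talk:

```python
# Sketch of the 5-element structure as a reusable template.
# Element order matters: task first, critical reminder last.

def build_prompt(task: str, content: str, instructions: str,
                 examples: str, reminder: str) -> str:
    """Assemble a prompt in Anthropic's recommended 5-element order."""
    return "\n\n".join([
        task,                                    # 1. task description
        f"<content>\n{content}\n</content>",     # 2. dynamic content
        instructions,                            # 3. detailed instructions
        f"<examples>\n{examples}\n</examples>",  # 4. few-shot examples (optional)
        reminder,                                # 5. restate critical rules
    ])

prompt = build_prompt(
    task="You assist a claims adjuster reviewing Swedish accident report forms.",
    content="{{FORM_TEXT}}",
    instructions="First list every checked box, then analyze the sketch.",
    examples="{{LABELED_EXAMPLES}}",
    reminder="Do not make a determination if you are not confident.",
)
```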


Organizing Information: XML Tags

How to organize information in prompts with XML tags (10:00)

Claude excels with structured information. Anthropic's top recommendation is using XML tags as delimiters:

```xml
<user_preferences>
  {{USER_PREFERENCES}}
</user_preferences>
```
  • Explicitly declares what's inside the tags
  • Makes it easier for Claude to reference that information later in the prompt
  • Clearer boundaries than Markdown, and more token-efficient

Disorganized prompts are hard for Claude to parse and degrade output quality. XML tags alone can make a significant difference.
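A tiny illustrative helper for this pattern — the tag name and content below are arbitrary examples, not from the video:

```python
# Wrap a prompt section in matching XML delimiters so Claude can
# reference it by name later in the prompt.

def xml_wrap(tag: str, content: str) -> str:
    return f"<{tag}>\n{content}\n</{tag}>"

section = xml_wrap("user_preferences", "language: sv\nformat: concise")
print(section)
```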


Live Demo: Building a Prompt Step by Step

6 steps to build a great prompt from scratch in the console (15:00)

The demo task: "Determine which vehicle is at fault from a Swedish car accident report form." They built the prompt by adding elements one at a time in the console.

V1 → V2: Adding Task Context and Tone

V1's problem: Claude output "a ski accident occurred on Chapman Gotham Street" — a wild miss because there was zero background context.

What V2 added:

  • This is an auto insurance claims processing system
  • Inputs are a Swedish accident report form and a hand-drawn sketch
  • Do not make a determination if not confident (hallucination prevention)

→ Claude now correctly identifies it as a car accident, but the verdict is still vague due to missing information.

V3: Adding Background Information to the System Prompt

Added the form's structure (17 checkboxes, two columns for Vehicle A and B) to the system prompt.

Note: Static information belongs in the system prompt.
The form structure never changes. This type of static background is ideal for the system prompt — and maximizes prompt caching effectiveness.

→ Form reading accuracy improved. Claude issued its first clear verdict: "Vehicle B is at fault."
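A minimal sketch of V3's idea — static form background in the system prompt, marked for prompt caching. The payload follows the shape of Anthropic's Messages API; the model id, form description, and user message are illustrative placeholders:

```python
# Static background lives in the system prompt; the cache_control marker
# tells the API to cache it so every call re-reads the form spec cheaply.

FORM_SPEC = (
    "The Swedish accident report form has 17 rows of checkboxes "
    "in two columns: Vehicle A (left) and Vehicle B (right). ..."
)

request = {
    "model": "claude-sonnet-4-20250514",  # example model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": FORM_SPEC,
            "cache_control": {"type": "ephemeral"},  # enable prompt caching
        }
    ],
    "messages": [
        {"role": "user", "content": "Here is today's accident report: ..."},
    ],
}
```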

V4: Detailed Step-by-Step Instructions (Order Matters)

```text
1. First, carefully examine the form and list every checked box
2. Then analyze the sketch (informed by what you learned from the form)
3. Deliver your final verdict
```

"Read the form before the sketch" is the critical ordering. A hand-drawn sketch alone is meaningless — but once you've read the form and know you're dealing with a car accident, the sketch makes sense. Mirror the order a human would naturally work through this.

V5: Specifying Output Format

Anthropic Console V5 demo (21:00)

```text
Wrap your final verdict in <final_verdict> XML tags.
```

→ The application can now extract just the information it needs (the verdict) from the XML tag. Ski accident misread → ambiguous → confident structured output — the evolution is complete.


Additional Techniques

Few-Shot Examples

Label difficult edge cases with human annotations and add them as examples. Images can be Base64-encoded and included in the samples. Production systems often carry dozens to hundreds of examples.
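A sketch of one few-shot message carrying a Base64-encoded image, using the image content-block shape from Anthropic's Messages API. The image bytes and label text below are fake, for illustration only:

```python
import base64

def image_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Build a base64 image content block for a few-shot example message."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }

# One human-labeled edge case: the image plus its correct conclusion.
example_message = {
    "role": "user",
    "content": [
        image_block(b"\x89PNG...fake bytes for illustration"),
        {"type": "text", "text": "Labeled answer: Vehicle A is at fault."},
    ],
}
```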

Conversation History

For user-facing applications, passing prior conversation history as context improves accuracy.

Pre-fill (Specifying the Start of Output)

Controlling response format with pre-fill (23:00)

Set a starting string in the Assistant role to force Claude's output format:

```python
messages = [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "<final_verdict>"}  # ← pre-fill
]
```

Claude will continue from <final_verdict>. The same works for forcing JSON output.
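For the JSON case, the same pattern looks like this sketch, where `model_continuation` stands in for the text the API would actually return:

```python
import json

# Pre-fill "{" as the start of the assistant turn, then prepend it to
# whatever Claude returns to reconstruct the full JSON object.
prefill = "{"
messages = [
    {"role": "user", "content": "Summarize the verdict as JSON."},
    {"role": "assistant", "content": prefill},  # Claude continues from here
]

model_continuation = '"at_fault": "Vehicle B", "confidence": "high"}'
full_output = prefill + model_continuation
verdict = json.loads(full_output)
print(verdict["at_fault"])  # → Vehicle B
```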

Extended Thinking

Available in Claude 3.7+. Claude's reasoning process appears in <thinking> tags.

⚠️ Warning: Treat Extended Thinking as a diagnostic tool, not a permanent crutch. Use it to identify where Claude struggles, then encode those reasoning steps as explicit instructions in the system prompt. That approach achieves the same quality without Extended Thinking — and uses fewer tokens.


Summary

| Technique | Effect |
|-----------|--------|
| Explicit task context | Prevents off-base interpretations |
| Static info in system prompt | Maximizes prompt caching |
| XML tag structure | Improves information retrieval accuracy |
| Specify processing order | Mirrors human reasoning order |
| Specify output format | Simplifies app integration |
| Few-shot examples | Improves accuracy on hard cases |
| Pre-fill | Forces output format |
| Extended Thinking | Visualizes reasoning for debugging |

Prompt engineering is an iterative empirical science. Build test cases, find failure patterns, encode fixes into the system prompt — keep running this loop to reach production quality.


Full Video Transcript


Opening (0:00)

Hey everyone, thank you for joining us today for Prompting 101. My name is Hannah, I'm part of the Applied AI Team at Anthropic. With me is Christian, also from the Applied AI Team. Today we're going to take you through some prompting best practices using a real-world scenario and build up a prompt together.

Prompt Engineering is the practice of writing clear instructions for the model, giving the model the context it needs to complete a task, and thinking through how to arrange that information for the best result. The best way to learn this is just to practice doing it.

We're using an example inspired by a real customer we worked with — analyzing images and having Claude make a judgment about what it finds there. I don't speak the language this content is in, but luckily Christian and Claude both do.


Scenario Introduction (1:00)

Christian: Imagine you're working for an auto insurance company, dealing with car insurance claims daily. You have two pieces of information: a car accident report form in Swedish (17 checkboxes detailing what happened) and a hand-drawn sketch of how the accident occurred. We want to pass these to Claude and determine who is at fault.

Let's start by just throwing them into the console and seeing what happens.

Console settings: claude-sonnet (latest model), temperature 0, large max token budget.

First prompt: "This is an accident report form. Determine what happened and who is at fault."

Result: Claude thinks it's a ski accident on "Chapman Gotham Street" — a very common street name in Sweden. You can understand this: in the prompt we haven't done anything to set the stage about what's actually taking place. Claude's first guess isn't terrible, but we have a lot of intuition we can bake in.


Best Practices: Prompt Structure (4:00)

Prompt engineering is iterative empirical science. We could have a test case where Claude needs to understand it's in a vehicular environment, not a skiing one, and iteratively build the prompt from there.

Anthropic's recommended structure:

  1. Task description — tell Claude what it's here to do, its role, what task it's trying to accomplish
  2. Dynamic content — in this case, the images; may also be information retrieved from another system
  3. Detailed instructions — almost like a step-by-step list of how we want Claude to tackle the reasoning
  4. Examples — here's an example piece of content; here's how you should respond
  5. Repeat critical instructions — review the information with Claude, emphasize things that are extra critical, then tell Claude to go ahead

Building V2 (6:00)

Christian: Starting with task context. We want to give clearer instructions and make sure Claude understands what we're doing. We also add tone: Claude should be factual and confident. If Claude can understand what it's looking at, we want that assessment to be as clear and confident as possible.

Back in the console, V2 explicitly labels the data — this is a car accident report form with Vehicle A and Vehicle B in left and right columns. The system prompt specifies that this AI system assists a human claims adjuster reviewing Swedish car accident report forms. It should not make an assessment if it's not fully confident.

Running it: Claude now correctly identifies it as car accidents — not skiing. It can pick up that Vehicle A checked box 1 and Vehicle B checked box 12. Scrolling down, Claude still says there's information missing to make a fully confident determination. This is great — it's behaving as instructed. But there's a lot of information still missing regarding what the form actually entails.


V3: Background Information and Structure (9:00)

Hannah: Next we add background data, documents, and images. We actually know a lot about this form — it will be the same every single time. This is a great type of information to put into the system prompt, and a great candidate for prompt caching since it will always be the same. This helps Claude spend less time figuring out what the form is each time.

Claude loves structure and organization. XML tags let you specify what's inside those tags — <user_preferences> tells Claude everything wrapped in those tags is related to user preferences. Claude understands all types of delimiters; we prefer XML because its boundaries are clear and it's token-efficient.

In V3, we tell Claude everything about the form: it's a Swedish car accident form, it'll have this title, two columns representing different vehicles, and what each of the 17 rows means. We also tell it that humans fill this out — so it won't be perfect, people might put a circle, might scribble, might not put an X in the box.

Running it: Claude spends less time narrating the form to us because it already knows what it is. It gives us a list of what's checked and — Claude now confidently says Vehicle B is at fault based on the form and the sketch.


V4: Detailed Instructions (14:00)

Hannah: One thing we really highlight: examples. Few-shot is a mechanism that's really powerful for steering Claude. You can bake in concrete accidents that were tricky for Claude to get right — with human-labeled correct conclusions. You can include visual examples using Base64-encoded images. This is how you push the limits of your LLM application. If you're building this for an insurance company, you might have tens, maybe hundreds of examples of difficult edge cases.

Conversation history: not used here, but for user-facing apps with long history, this is the right place to bring that in.

Next step: a reminder of the immediate task and important guidelines. Preventing hallucinations — we don't want Claude to invent details it's not finding in the data. If the sketch is unintelligible and even a human couldn't figure it out, we want Claude to be able to say that.

In V4, we keep the system prompt the same and add a detailed task list. The order in which Claude analyzes this information is very important. You'd probably not look at the drawing first — it's just boxes and lines without context. But if you read the form first, understand we're talking about a car accident and see checkboxes indicating what vehicles were doing, then you know how to interpret the drawing.

So: first, carefully examine the form, make sure you can tell what boxes are checked, make a list. Then move to the sketch, informed by what you learned.

Running it: Claude now very carefully examines each and every box. It gives structured XML output: form analysis, accident summary, sketch analysis. It continues to say Vehicle B appears to be clearly at fault. With more complicated drawings and less clarity in forms, this step-by-step thinking is really impactful.


V5: Output Format and Pre-fill (19:00)

Christian: Final step: we keep the system prompt the same and add important guidelines. Summary should be clear, concise, and accurate. Nothing should impede Claude's assessment. Then output formatting: wrap the final verdict in <final_verdict> XML tags so the application can extract just the verdict.

Running it: much more succinct. At the end, the output is wrapped in <final_verdict> tags. We've gone from a skiing accident, to uncertain, to a confident but loosely structured output, to now a much more strictly formatted, confident output we can build a real application around.

Christian: Another key way to shape output is pre-filled responses. If you want structured JSON output, you just add that Claude needs to begin its output with a certain format. In the Assistant field, write <final_verdict> or { — Claude will continue from where you left off. This gives you greater control over output formatting without the preamble.

Finally: Extended Thinking in Claude 3.7+. You can use this as a crutch for prompt engineering — enable it to make sure Claude has time to think. The beauty is you can analyze that thinking transcript to understand how Claude goes about the data. Try to help Claude by building this into your system prompt itself. It's more token-efficient, and it's a good way of understanding how these models actually go about the data. That's key to making your system prompt a lot better.

Thank you everyone for coming. We'll be around all day for questions. Don't miss "Prompting for Agents" and the Claude plays Pokémon demo!
