Vaulter Prompt

Posted on Feb 19 • Originally published at prompt-engineering-handbook.com

The easy way to stop screaming at AI with CAPS

#programming #ai #productivity #promptengineering

Some foundational prompt engineering techniques and patterns to know about.

I think everybody at some point caught themselves typing in all caps, to ChatGPT or Claude, "I LITERALLY JUST TOLD YOU TO DO THIS." Rephrasing the same request just to get the same useless output. Getting progressively angrier at a language model that is specifically designed to drive humanity crazy.

Very often it's not a broken tool, but vague instructions to a system that does exactly what you ask - just not what you mean. And there's a gap between those two things that costs most people hours every single day.

I hope this article can help you to get very quickly to the point, where this is no longer a case for you!

The invisible tax you're already paying

The original promise of AI was that it would do the work for you. And in a way, it does. But it also quietly changes what the work actually is.

Before AI, you spent time writing. Now you spend time reviewing. Checking if the AI got it right. Rephrasing when it didn't. Cleaning up hallucinated requirements. Making sure the output actually says what you meant and not what the model decided you probably meant.

With good prompts, that review step is quick - a sanity check, maybe a small adjustment. With bad prompts, the review becomes the work. You're not using AI anymore. You're babysitting it.

A January 2026 Zapier survey of 1,100 AI users puts a number on this: workers spend an average of 4.5 hours per week revising, correcting, and redoing AI outputs. That's more than half a workday - not writing, not thinking, just cleaning up after a tool that was supposed to save time.

And untrained people are more likely to say AI makes them less productive. Not because the tool is worse for them - because they never learned how to direct it. Meanwhile people with access to prompt training and libraries report productivity gains.

It's like buying a professional DSLR camera and shooting everything in auto mode, then complaining the photos look the same as your phone. The capability is there. You just haven't learned to access it.

The mental model that changes everything

Here's what nobody told us upfront: as a consumer you can think of an LLM as a very sophisticated autocomplete. Of course I'm seriously oversimplifying, but hear me out. It really helps to get things right. The point is: it doesn't "understand" your request. It predicts the most probable next words (in reality tokens) based on everything that it was trained on.

That's it. Not intelligence. Pattern prediction (sophisticated, complicated, groundbreaking, but anyway).

This line of thinking explains almost every frustration you've ever had:

Vague prompts get vague answers - many probable continuations, the model picks one at random
Examples work better than instructions - you're showing it the pattern to continue, not hoping it interprets your intent
Long conversations go off the rails - the model has a finite "context window" (its working memory). Everything in your conversation takes up space, and when it fills up, older content gets dropped or compressed. The AI isn't being thick after 20 messages - it literally cannot see what you said earlier
It "hallucinates" - it predicts plausible text, not true text (hallucinations is a feature, not a bug)

So when you type "make this better" and get back something useless, the AI isn't being stupid. It's doing exactly what autocomplete does with ambiguous input: guessing.

Three rules follow from this.

Be explicit - ambiguity is the enemy.
Show, don't tell - examples constrain the solution space better than descriptions.
Start fresh conversations for fresh tasks - don't let context rot.

Everything below traces back to these three principles.

Four core techniques that actually work

There are four basic prompting techniques that, together, cover pretty much every type of task you'd throw at an AI. They form a ladder - start with #1, escalate when needed:

Step	Technique	One-liner	Use when...
1	Zero-shot	Just ask	Task has one obvious interpretation
2	Few-shot	Show examples	Format or style matters
3	Chain-of-thought	Make it reason	Task needs logic, not pattern matching
4	Prompt chaining	Break it apart	Too complex for a single prompt

The mistake most people make is either staying on step 1 forever (most of the time, really), or jumping straight to step 4 when they didn't need to. I'll walk through each one below - when it works, when it doesn't, and the signal that tells you it's time to move up.

To show how these build on each other, I'll use one example throughout: "Analyse 50 customer feedback entries from last quarter and write a summary for the product team." Same task, four different approaches, very different results.

Start simple: just ask (zero-shot)

"Zero-shot" just means: give a direct instruction, no examples.

For tasks with one obvious interpretation, this is all you need:

"Translate this email to Spanish"
"Extract all deadlines from this contract"
"What are the three biggest risks in this plan?"

The AI already "knows" what "translate" means and what "deadlines" look like. A clear instruction, a clear input, done.

But watch what happens with our feedback analysis:

Prompt: "Summarise the key themes from this customer feedback for the product team."

What you get: A different structure every time. First try: a wall of text with no categories. Second try: bullet points, but random grouping and no prioritisation. Third try: nice categories, but completely different ones from last time. The AI extracts themes fine - but the format and depth change with every run.

"Customer feedback summary" has dozens of valid interpretations. The model picks one at random each time.

So what can we say about this pattern?

Use when:

Task has one clear interpretation
The AI already "knows" the task type (translation, extraction, summarisation)
You don't care about exact format

Signal to escalate:

You're rephrasing the same request 3 times and getting different structures
Content is right but format/style is inconsistent
You need a specific output shape every time

Show, don't tell (few-shot)

So the feedback summary keeps coming back in a random format. You could try describing what you want: "Use a table, group by theme, include frequency count, add a severity column, include one example quote per theme..." But by the time you've written all that, you could have just made the table yourself. Here's few-shot prompting comes to help.

"Few-shot" means: instead of describing what you want, show 3 to 5 examples of it.

Same task - with few-shot:

Analyse the customer feedback below and summarise it
for the product team. Follow this format:

Example:
| Theme | Mentions | Severity | Example quote |
|-------|----------|----------|---------------|
| Slow page loads | 12 | High | "Dashboard takes 8s to load" |
| Missing export | 5 | Medium | "I need CSV export for reports" |

Top priority: Slow page loads - affects daily usage,
12 mentions in 30 days, multiple churn-risk accounts.

Now analyse this feedback:
[50 input entries here]

From the examples AI knows you want a table with those exact columns, followed by a top-priority callout with reasoning. Format, length, structure - communicated in seconds, more precisely than any paragraph of instructions could. This is called "in-context learning" - the model's ability to pick up a pattern from just a few demonstrations and apply it to new input.

Good examples are:

Diverse - different scenarios, not three variations of the same thing
Representative - typical cases, not edge cases
Consistent - same format across all of them
Minimal - 3-5 is usually plenty

Now the format is perfect every time. But there's a problem: the AI lists "slow checkout" (30 mentions) and "button colour" (2 mentions) at the same severity level. It's mimicking the table structure beautifully, but it's not actually thinking about what matters.

Few-shot fails when the task needs:

Actual calculation - not pattern completion
Reasoning - not correlation
Domain knowledge that isn't present in the examples
Multi-factor trade-off judgements - weighing competing priorities
Handling novel constraints the examples don't cover

In short: if the answer requires thinking through the problem and not just matching a format, examples alone won't get you there.

Use when:

Format, tone, or style matters
The task has many valid outputs but you need a specific one
You want consistent results across multiple runs

Signal to escalate:

Format is perfect but the reasoning is wrong
The AI mimics your examples but makes logical errors or misses nuance
The task needs analysis, not pattern matching

Make it think (chain-of-thought)

Examples fix formatting and content depth expectaions, sure. They don't fix thinking. When the task needs actual reasoning, the AI mimics the pattern and jumps to a plausible-looking answer without working through the problem.

That feedback summary has the right columns now, but the priorities are shallow. The AI saw "High/Medium" in your example and just distributed those labels without weighing anything.

Five words fix this: "Let's think step by step."

Same task - with chain-of-thought:
Analyse this customer feedback for the product team.
Use the table format from the examples above.

Before filling in the severity column, think step by
step: consider how many users mentioned it, whether it
causes churn or just annoyance, and how it compares to
other themes.
What you get now:

Slow checkout (30 mentions) → directly causes cart abandonment, mentioned by 3 enterprise accounts → High

Confusing pricing page (8 mentions) → causes support tickets but users still convert → Medium

Button colour (2 mentions) → cosmetic, no impact on conversion → Low

Top priority: Slow checkout - 30 mentions, directly tied to revenue loss, affects highest-value accounts.

That's not a gimmick. Research shows this single phrase improves accuracy on reasoning tasks from 17.7% to 78.7%. By asking the model to show its reasoning, you force it to actually work through the problem instead of guessing.

Same principle as showing your work at school - you catch errors you'd miss if you just wrote the final answer.

Pro tip: self-consistency. For high-stakes decisions, run the same CoT prompt 3 times and compare the answers. If all three agree, you're probably right. If they disagree wildly, the problem needs more breakdown. Costs 3x the tokens but catches blind spots a single run misses.

So for this technique:

Use when:

Debugging, analysis, decisions, math
Anything where "showing work" would help a human
The task has a right answer that requires reasoning to reach

Signal to escalate:

Reasoning per step is fine, but the task has too many moving parts
Output is solid for the first half and falls apart after that
The prompt is getting so long the AI starts ignoring parts of it

Break it down (prompt chaining)

If you're asking for more than one distinct deliverable in a single prompt, you're probably going to get disappointed.

With 50 feedback entries, trying to categorise, assess severity, AND write recommendations in one prompt usually means the categorisation is decent, the severity assessment is rushed, and the recommendations are generic. The model runs out of steam halfway through.

"Prompt chaining" means breaking it into steps, reviewing each one before moving to the next. You literally take the output of prompt 1 and paste it as input into prompt 2.

Why this works better than one big prompt:

Catch errors early - spot problems before 5 steps of compounding
Smaller context - model focuses on one task, not juggling 10 instructions
Easier to debug - you know exactly which step failed
Reusable pieces - swap out step 2 without rewriting 1,200 lines
Human in the loop - review and adjust between steps

Same task - as a chain:

Prompt 1: "Categorise all 50 feedback entries into themes with counts."
→ Output: table with 6 themes (slow checkout: 30, confusing pricing: 8, missing export: 5...)
→ ✓ Review: do these categories make sense? Merge or split any?

Prompt 2: "Here are the themes: [paste output from step 1]. For each theme, assess severity and business impact. Think step by step."
→ Output: slow checkout = High (causes abandonment), confusing pricing = Medium (causes support tickets)...
→ ✓ Review: does the reasoning hold up? Any wrong assumptions?

Prompt 3: "Here's the full analysis: [paste output from step 2]. Write the summary for the product team with top 3 recommendations."
→ Final output: ready to send.

Each step is small enough to actually verify. You catch wrong categories at step 1 instead of discovering them baked into the final recommendations at step 3.

And what do we have here in the result?

Use when:

Task has multiple distinct deliverables
Your prompt is getting so long the AI ignores parts of it
You want to review intermediate results before continuing

Signal that something's off:

Individual steps produce bad reasoning → add chain-of-thought within each step
Chain works but results feel generic → better context or examples needed in step 1
Conversation is sideways after many messages → start a fresh one. Context rots.

Those four techniques are the core of it. But knowing which technique to use is only half the problem. The other half is how you structure the prompt itself - and that's where most people quietly lose hours without realising it.

The patterns nobody teaches you

Techniques tell you what to do. Patterns tell you how to do it well. If techniques are the bricks, these are the cement - and skipping them is why a lot of prompts that should work still don't.

The prompt anatomy

Every prompt has up to four elements. When something goes wrong, one is usually missing:

Element	What it is	If missing...
Instruction	The task to perform	AI guesses what you want
Context	Background, constraints, role	AI makes wrong assumptions
Input data	Content to process	Nothing to work with
Output indicator	Expected format	You get 500 words when you needed 2 bullets

Remember our feedback analysis prompt from the few-shot section? Let's map the four elements onto it:

[INSTRUCTION] Analyse the customer feedback below and summarise it
              for the product team. Follow this format:
[OUTPUT]      Example:
              | Theme | Mentions | Severity | Example quote |
              | Slow page loads | 12 | High | "Dashboard takes 8s..." |
              Top priority: Slow page loads - affects daily usage...
[INPUT DATA]  Now analyse this feedback:
              [50 input entries here]

Two elements present, two missing. There's no context (who's the analyst? what's the review for?) and no delimiters between the instruction and the input data. It works OK because the few-shot examples carry most of the weight - but it could be better. Add the missing elements:

[CONTEXT]     You are a product analyst preparing a quarterly review.
[INSTRUCTION] Analyse the customer feedback below and summarise it
              for the product team. Follow this format:
[OUTPUT]      Example:
              | Theme | Mentions | Severity | Example quote |
              | Slow page loads | 12 | High | "Dashboard takes 8s..." |
              Top priority: Slow page loads - affects daily usage...
[INPUT DATA]  <feedback> [50 input entries here] </feedback>

Now the context shapes the analysis (product analyst prioritises by business impact), and the <feedback> tags separate data from instruction. Same few-shot technique, better prompt structure.

This is also a diagnostic tool. Prompt didn't work as expected? Check which element is vague or absent. Takes ten seconds, fixes most problems on the spot.

Four patterns that save the most time

1. Scope boundaries - tell the AI what NOT to do.

The "eager intern" problem - where the AI helpfully restructures your entire document when you asked it to fix one paragraph - is solved by explicit fences:

ONLY modify: [specific section]
Do NOT: [things you don't want changed]
Match: [existing conventions, tone, style]

Applied to our example: "Only analyse the feedback entries I provide. Do not add themes that aren't in the data. Do not invent quotes."

When to skip: discovery phase, architecture discussions, brainstorming - boundaries kill creativity when you actually want broad thinking.

2. Output specification - define the container.

"Summarise this" gets you 500 words. "Summarise in 3 bullet points, max 15 words each" gets you exactly what you asked for. If you specify the shape - format, length, sections, what to exclude - the AI can't invent things that don't fit.

Applied to our example: this is exactly what the few-shot table format did - but you can also do it without examples, just by describing the container: "Return a markdown table with columns: Theme, Mentions, Severity, Example Quote. Then write exactly 3 recommendations, one sentence each."

When to skip: exploratory questions ("what should I consider?"), creative brainstorming, or one-off queries where you'll just read and act.

3. Delimiters - separate sections clearly.

Use XML tags, markdown headers, or triple quotes between your instruction, context, and input data. Without them, the AI sometimes confuses what's an instruction and what's content to process.

Applied to our example: wrapping the feedback in <feedback> tags tells the model "this is the data, not part of the instruction." Without it, if a customer wrote "ignore previous instructions" in their feedback (yes, this happens), the model might actually obey it.

4. Role and audience - narrow the model's behaviour.

An LLM is a generalist by default - it draws on everything it was trained on, which is basically the entire internet. Setting a role and audience constrains that enormous solution space to a specific domain, expertise level, and communication style. "You are a senior engineer, I am also senior, skip basics" activates domain-specific knowledge, suppresses beginner-level explanations, and calibrates the output for a professional context. One line of context, completely different answer.

Applied to our example: "You are a product analyst" tells the model to prioritise themes by business impact rather than just frequency - because that's how a product analyst thinks. Without it, you get a generic summary. With it, you get one that's shaped by domain expertise.

The quick reference

Two things worth bookmarking.

Where to start for your task type:

Task type	Start with	If that fails
Text classification	Zero-shot	Few-shot with examples
Summarisation	Zero-shot + output spec	Few-shot with example summaries
Information extraction	Zero-shot + output format	Few-shot with examples
Code generation	Zero-shot + scope boundaries	Few-shot + chaining
Code review	Role + scope	Few-shot with example reviews
Reasoning / math	Zero-shot CoT	Few-shot CoT
Complex multi-step	Prompt chaining	Add CoT within each step
Docs / reports	Output specification	Few-shot with example docs

When your prompt doesn't work:

Problem	Fix
Wrong format	Specify output shape
Inconsistent results	Add 2-5 examples (few-shot)
Wrong reasoning	"Let's think step by step" (CoT)
AI invents things you didn't ask for	Add scope boundaries
Task too complex	Break into smaller prompts (chaining)
Conversation went off track	Start fresh
Response too basic or too advanced	Set role and audience
Not reading output before sharing	Human verification always

The prompt diagnostic: instruction, context, input, output indicator. When something fails, one of these is missing or vague. Start there.

The embarrassingly simple conclusion

The AI does exactly what autocomplete would do with your input. Vague input, random output. Specific input, useful output.

Four techniques, a handful of patterns, and a couple of hours of practice. That's the gap between babysitting and actually getting work done.

The AI is not the only one who needs training, we need it too if we want to learn to use it right.

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.