DEV Community

Cover image for The easy way to stop screaming at AI with CAPS
Vaulter Prompt
Vaulter Prompt

Posted on • Originally published at prompt-engineering-handbook.com

The easy way to stop screaming at AI with CAPS

Some foundational prompt engineering techniques and patterns to know about.

I think everybody at some point caught themselves typing in all caps, to ChatGPT or Claude, "I LITERALLY JUST TOLD YOU TO DO THIS." Rephrasing the same request just to get the same useless output. Getting progressively angrier at a language model that is specifically designed to drive humanity crazy.

Very often it's not a broken tool, but vague instructions to a system that does exactly what you ask - just not what you mean. And there's a gap between those two things that costs most people hours every single day.

I hope this article can help you to get very quickly to the point, where this is no longer a case for you!

The invisible tax you're already paying

The original promise of AI was that it would do the work for you. And in a way, it does. But it also quietly changes what the work actually is.

Before AI, you spent time writing. Now you spend time reviewing. Checking if the AI got it right. Rephrasing when it didn't. Cleaning up hallucinated requirements. Making sure the output actually says what you meant and not what the model decided you probably meant.

With good prompts, that review step is quick - a sanity check, maybe a small adjustment. With bad prompts, the review becomes the work. You're not using AI anymore. You're babysitting it.

A January 2026 Zapier survey of 1,100 AI users puts a number on this: workers spend an average of 4.5 hours per week revising, correcting, and redoing AI outputs. That's more than half a workday - not writing, not thinking, just cleaning up after a tool that was supposed to save time.

And untrained people are more likely to say AI makes them less productive. Not because the tool is worse for them - because they never learned how to direct it. Meanwhile people with access to prompt training and libraries report productivity gains.

It's like buying a professional DSLR camera and shooting everything in auto mode, then complaining the photos look the same as your phone. The capability is there. You just haven't learned to access it.

The mental model that changes everything

Here's what nobody told us upfront: as a consumer you can think of an LLM as a very sophisticated autocomplete. Of course I'm seriously oversimplifying, but hear me out. It really helps to get things right. The point is: it doesn't "understand" your request. It predicts the most probable next words (in reality tokens) based on everything that it was trained on.

That's it. Not intelligence. Pattern prediction (sophisticated, complicated, groundbreaking, but anyway).

This line of thinking explains almost every frustration you've ever had:

  • Vague prompts get vague answers - many probable continuations, the model picks one at random
  • Examples work better than instructions - you're showing it the pattern to continue, not hoping it interprets your intent
  • Long conversations go off the rails - the model has a finite "context window" (its working memory). Everything in your conversation takes up space, and when it fills up, older content gets dropped or compressed. The AI isn't being thick after 20 messages - it literally cannot see what you said earlier
  • It "hallucinates" - it predicts plausible text, not true text (hallucinations is a feature, not a bug)

So when you type "make this better" and get back something useless, the AI isn't being stupid. It's doing exactly what autocomplete does with ambiguous input: guessing.

Three rules follow from this.

  1. Be explicit - ambiguity is the enemy.
  2. Show, don't tell - examples constrain the solution space better than descriptions.
  3. Start fresh conversations for fresh tasks - don't let context rot.

Everything below traces back to these three principles.

Four core techniques that actually work

There are four basic prompting techniques that, together, cover pretty much every type of task you'd throw at an AI. They form a ladder - start with #1, escalate when needed:

Step Technique One-liner Use when...
1 Zero-shot Just ask Task has one obvious interpretation
2 Few-shot Show examples Format or style matters
3 Chain-of-thought Make it reason Task needs logic, not pattern matching
4 Prompt chaining Break it apart Too complex for a single prompt

The mistake most people make is either staying on step 1 forever (most of the time, really), or jumping straight to step 4 when they didn't need to. I'll walk through each one below - when it works, when it doesn't, and the signal that tells you it's time to move up.

To show how these build on each other, I'll use one example throughout: "Analyse 50 customer feedback entries from last quarter and write a summary for the product team." Same task, four different approaches, very different results.

Start simple: just ask (zero-shot)

"Zero-shot" just means: give a direct instruction, no examples.

For tasks with one obvious interpretation, this is all you need:

  • "Translate this email to Spanish"
  • "Extract all deadlines from this contract"
  • "What are the three biggest risks in this plan?"

The AI already "knows" what "translate" means and what "deadlines" look like. A clear instruction, a clear input, done.

But watch what happens with our feedback analysis:

Prompt: "Summarise the key themes from this customer feedback for the product team."

What you get: A different structure every time. First try: a wall of text with no categories. Second try: bullet points, but random grouping and no prioritisation. Third try: nice categories, but completely different ones from last time. The AI extracts themes fine - but the format and depth change with every run.

"Customer feedback summary" has dozens of valid interpretations. The model picks one at random each time.

So what can we say about this pattern?

Use when:

  • Task has one clear interpretation
  • The AI already "knows" the task type (translation, extraction, summarisation)
  • You don't care about exact format

Signal to escalate:

  • You're rephrasing the same request 3 times and getting different structures
  • Content is right but format/style is inconsistent
  • You need a specific output shape every time

Show, don't tell (few-shot)

So the feedback summary keeps coming back in a random format. You could try describing what you want: "Use a table, group by theme, include frequency count, add a severity column, include one example quote per theme..." But by the time you've written all that, you could have just made the table yourself. Here's few-shot prompting comes to help.

"Few-shot" means: instead of describing what you want, show 3 to 5 examples of it.

Same task - with few-shot:

Analyse the customer feedback below and summarise it
for the product team. Follow this format:

Example:
| Theme | Mentions | Severity | Example quote |
|-------|----------|----------|---------------|
| Slow page loads | 12 | High | "Dashboard takes 8s to load" |
| Missing export | 5 | Medium | "I need CSV export for reports" |

Top priority: Slow page loads - affects daily usage,
12 mentions in 30 days, multiple churn-risk accounts.

Now analyse this feedback:
[50 input entries here]

From the examples AI knows you want a table with those exact columns, followed by a top-priority callout with reasoning. Format, length, structure - communicated in seconds, more precisely than any paragraph of instructions could. This is called "in-context learning" - the model's ability to pick up a pattern from just a few demonstrations and apply it to new input.

Good examples are:

  • Diverse - different scenarios, not three variations of the same thing
  • Representative - typical cases, not edge cases
  • Consistent - same format across all of them
  • Minimal - 3-5 is usually plenty

Now the format is perfect every time. But there's a problem: the AI lists "slow checkout" (30 mentions) and "button colour" (2 mentions) at the same severity level. It's mimicking the table structure beautifully, but it's not actually thinking about what matters.

Few-shot fails when the task needs:

  • Actual calculation - not pattern completion
  • Reasoning - not correlation
  • Domain knowledge that isn't present in the examples
  • Multi-factor trade-off judgements - weighing competing priorities
  • Handling novel constraints the examples don't cover

In short: if the answer requires thinking through the problem and not just matching a format, examples alone won't get you there.

Use when:

  • Format, tone, or style matters
  • The task has many valid outputs but you need a specific one
  • You want consistent results across multiple runs

Signal to escalate:

  • Format is perfect but the reasoning is wrong
  • The AI mimics your examples but makes logical errors or misses nuance
  • The task needs analysis, not pattern matching

Make it think (chain-of-thought)

Examples fix formatting and content depth expectaions, sure. They don't fix thinking. When the task needs actual reasoning, the AI mimics the pattern and jumps to a plausible-looking answer without working through the problem.

That feedback summary has the right columns now, but the priorities are shallow. The AI saw "High/Medium" in your example and just distributed those labels without weighing anything.

Five words fix this: "Let's think step by step."

Same task - with chain-of-thought:

Analyse this customer feedback for the product team.
Use the table format from the examples above.

Before filling in the severity column, think step by
step: consider how many users mentioned it, whether it
causes churn or just annoyance, and how it compares to
other themes.

What you get now:

  • Slow checkout (30 mentions) → directly causes cart abandonment, mentioned by 3 enterprise accounts → High
  • Confusing pricing page (8 mentions) → causes support tickets but users still convert → Medium
  • Button colour (2 mentions) → cosmetic, no impact on conversion → Low
  • Top priority: Slow checkout - 30 mentions, directly tied to revenue loss, affects highest-value accounts.

That's not a gimmick. Research shows this single phrase improves accuracy on reasoning tasks from 17.7% to 78.7%. By asking the model to show its reasoning, you force it to actually work through the problem instead of guessing.

Same principle as showing your work at school - you catch errors you'd miss if you just wrote the final answer.

Pro tip: self-consistency. For high-stakes decisions, run the same CoT prompt 3 times and compare the answers. If all three agree, you're probably right. If they disagree wildly, the problem needs more breakdown. Costs 3x the tokens but catches blind spots a single run misses.

So for this technique:

Use when:

  • Debugging, analysis, decisions, math
  • Anything where "showing work" would help a human
  • The task has a right answer that requires reasoning to reach

Signal to escalate:

  • Reasoning per step is fine, but the task has too many moving parts
  • Output is solid for the first half and falls apart after that
  • The prompt is getting so long the AI starts ignoring parts of it

Break it down (prompt chaining)

If you're asking for more than one distinct deliverable in a single prompt, you're probably going to get disappointed.

With 50 feedback entries, trying to categorise, assess severity, AND write recommendations in one prompt usually means the categorisation is decent, the severity assessment is rushed, and the recommendations are generic. The model runs out of steam halfway through.

"Prompt chaining" means breaking it into steps, reviewing each one before moving to the next. You literally take the output of prompt 1 and paste it as input into prompt 2.

Why this works better than one big prompt:

  • Catch errors early - spot problems before 5 steps of compounding
  • Smaller context - model focuses on one task, not juggling 10 instructions
  • Easier to debug - you know exactly which step failed
  • Reusable pieces - swap out step 2 without rewriting 1,200 lines
  • Human in the loop - review and adjust between steps

Same task - as a chain:

Prompt 1: "Categorise all 50 feedback entries into themes with counts."
Output: table with 6 themes (slow checkout: 30, confusing pricing: 8, missing export: 5...)
→ ✓ Review: do these categories make sense? Merge or split any?

Prompt 2: "Here are the themes: [paste output from step 1]. For each theme, assess severity and business impact. Think step by step."
Output: slow checkout = High (causes abandonment), confusing pricing = Medium (causes support tickets)...
→ ✓ Review: does the reasoning hold up? Any wrong assumptions?

Prompt 3: "Here's the full analysis: [paste output from step 2]. Write the summary for the product team with top 3 recommendations."
Final output: ready to send.

Each step is small enough to actually verify. You catch wrong categories at step 1 instead of discovering them baked into the final recommendations at step 3.

And what do we have here in the result?

Use when:

  • Task has multiple distinct deliverables
  • Your prompt is getting so long the AI ignores parts of it
  • You want to review intermediate results before continuing

Signal that something's off:

  • Individual steps produce bad reasoning → add chain-of-thought within each step
  • Chain works but results feel generic → better context or examples needed in step 1
  • Conversation is sideways after many messages → start a fresh one. Context rots.

Those four techniques are the core of it. But knowing which technique to use is only half the problem. The other half is how you structure the prompt itself - and that's where most people quietly lose hours without realising it.

The patterns nobody teaches you

Techniques tell you what to do. Patterns tell you how to do it well. If techniques are the bricks, these are the cement - and skipping them is why a lot of prompts that should work still don't.

The prompt anatomy

Every prompt has up to four elements. When something goes wrong, one is usually missing:

Element What it is If missing...
Instruction The task to perform AI guesses what you want
Context Background, constraints, role AI makes wrong assumptions
Input data Content to process Nothing to work with
Output indicator Expected format You get 500 words when you needed 2 bullets

Remember our feedback analysis prompt from the few-shot section? Let's map the four elements onto it:

[INSTRUCTION] Analyse the customer feedback below and summarise it
              for the product team. Follow this format:
[OUTPUT]      Example:
              | Theme | Mentions | Severity | Example quote |
              | Slow page loads | 12 | High | "Dashboard takes 8s..." |
              Top priority: Slow page loads - affects daily usage...
[INPUT DATA]  Now analyse this feedback:
              [50 input entries here]

Two elements present, two missing. There's no context (who's the analyst? what's the review for?) and no delimiters between the instruction and the input data. It works OK because the few-shot examples carry most of the weight - but it could be better. Add the missing elements:

[CONTEXT]     You are a product analyst preparing a quarterly review.
[INSTRUCTION] Analyse the customer feedback below and summarise it
              for the product team. Follow this format:
[OUTPUT]      Example:
              | Theme | Mentions | Severity | Example quote |
              | Slow page loads | 12 | High | "Dashboard takes 8s..." |
              Top priority: Slow page loads - affects daily usage...
[INPUT DATA]  <feedback> [50 input entries here] </feedback>

Now the context shapes the analysis (product analyst prioritises by business impact), and the <feedback> tags separate data from instruction. Same few-shot technique, better prompt structure.

This is also a diagnostic tool. Prompt didn't work as expected? Check which element is vague or absent. Takes ten seconds, fixes most problems on the spot.

Four patterns that save the most time

1. Scope boundaries - tell the AI what NOT to do.

The "eager intern" problem - where the AI helpfully restructures your entire document when you asked it to fix one paragraph - is solved by explicit fences:

  • ONLY modify: [specific section]
  • Do NOT: [things you don't want changed]
  • Match: [existing conventions, tone, style]

Applied to our example: "Only analyse the feedback entries I provide. Do not add themes that aren't in the data. Do not invent quotes."

When to skip: discovery phase, architecture discussions, brainstorming - boundaries kill creativity when you actually want broad thinking.

2. Output specification - define the container.

"Summarise this" gets you 500 words. "Summarise in 3 bullet points, max 15 words each" gets you exactly what you asked for. If you specify the shape - format, length, sections, what to exclude - the AI can't invent things that don't fit.

Applied to our example: this is exactly what the few-shot table format did - but you can also do it without examples, just by describing the container: "Return a markdown table with columns: Theme, Mentions, Severity, Example Quote. Then write exactly 3 recommendations, one sentence each."

When to skip: exploratory questions ("what should I consider?"), creative brainstorming, or one-off queries where you'll just read and act.

3. Delimiters - separate sections clearly.

Use XML tags, markdown headers, or triple quotes between your instruction, context, and input data. Without them, the AI sometimes confuses what's an instruction and what's content to process.

Applied to our example: wrapping the feedback in <feedback> tags tells the model "this is the data, not part of the instruction." Without it, if a customer wrote "ignore previous instructions" in their feedback (yes, this happens), the model might actually obey it.

4. Role and audience - narrow the model's behaviour.

An LLM is a generalist by default - it draws on everything it was trained on, which is basically the entire internet. Setting a role and audience constrains that enormous solution space to a specific domain, expertise level, and communication style. "You are a senior engineer, I am also senior, skip basics" activates domain-specific knowledge, suppresses beginner-level explanations, and calibrates the output for a professional context. One line of context, completely different answer.

Applied to our example: "You are a product analyst" tells the model to prioritise themes by business impact rather than just frequency - because that's how a product analyst thinks. Without it, you get a generic summary. With it, you get one that's shaped by domain expertise.

The quick reference

Two things worth bookmarking.

Where to start for your task type:

Task type Start with If that fails
Text classification Zero-shot Few-shot with examples
Summarisation Zero-shot + output spec Few-shot with example summaries
Information extraction Zero-shot + output format Few-shot with examples
Code generation Zero-shot + scope boundaries Few-shot + chaining
Code review Role + scope Few-shot with example reviews
Reasoning / math Zero-shot CoT Few-shot CoT
Complex multi-step Prompt chaining Add CoT within each step
Docs / reports Output specification Few-shot with example docs

When your prompt doesn't work:

Problem Fix
Wrong format Specify output shape
Inconsistent results Add 2-5 examples (few-shot)
Wrong reasoning "Let's think step by step" (CoT)
AI invents things you didn't ask for Add scope boundaries
Task too complex Break into smaller prompts (chaining)
Conversation went off track Start fresh
Response too basic or too advanced Set role and audience
Not reading output before sharing Human verification always

The prompt diagnostic: instruction, context, input, output indicator. When something fails, one of these is missing or vague. Start there.

The embarrassingly simple conclusion

The AI does exactly what autocomplete would do with your input. Vague input, random output. Specific input, useful output.

Four techniques, a handful of patterns, and a couple of hours of practice. That's the gap between babysitting and actually getting work done.

The AI is not the only one who needs training, we need it too if we want to learn to use it right.

Top comments (0)