
Claude Code Prompt Engineering Workflow: How I Stopped Wasting Tokens and Got 10x Better Results

Originally published on Hashnode. Cross-posted for the DEV.to community.

For the first three months I used Claude Code, my prompts were a mess. I'd type a sentence, get back something close to what I wanted, type a correction, get something else, and burn through tokens correcting things I should have specified upfront. My output was inconsistent, my context window filled up with noise, and I blamed the model when the problem was on my side of the keyboard.

Then I started treating prompts like code. I version them, I review them, I iterate on them deliberately, and I keep a library of patterns that work. The difference in output quality and speed is not subtle. This is the workflow I use now and what I'd tell my past self.


The Mistake That Cost Me 6 Weeks

I was approaching prompts like conversation. Throw something at the model, see what comes back, refine. That works for casual queries. It does not work when you're shipping code, generating content at scale, or running production workflows.

The problem with conversational prompting at scale: every prompt is one-off, every result is non-reproducible, and every improvement gets lost the next time you start a session. You're rebuilding context from scratch every time.

When I switched to treating prompts as artifacts that get saved, named, tested, and reused, my output got dramatically more consistent and my time-per-result dropped by an order of magnitude.

A prompt is code. If you're not versioning it, naming it, and reviewing it, you're not engineering anything.


The Four-Layer Prompt Structure

Every prompt I ship to Claude Code follows the same four-layer structure. Each layer answers one question.

Layer 1: Role and Context

Who is Claude in this prompt, and what world is it operating in? This is not flavor text. The role determines what knowledge the model pulls forward and what tone it produces.

```
You are a senior backend engineer reviewing a Node.js API for a
fintech startup that processes 10K transactions per day. Security
and reliability are non-negotiable. You have 15 years of experience
and you push back on lazy patterns.
```

The specifics matter. "Senior backend engineer" produces different output than "code reviewer." The transaction volume sets the stakes. The pushback instruction prevents sycophantic agreement.

Layer 2: Task and Constraints

What exactly do you want, and what are the rules? This is where most prompts collapse into vagueness. Instead of "review this code," I write:

```
Review the attached order-processing module for:
1. Race conditions in the payment confirmation flow
2. Missing input validation on amount and currency fields
3. Error handling gaps that could leave orders in inconsistent states
4. Database query patterns that won't scale past 100 concurrent users

Skip stylistic feedback. Skip suggestions about adding tests unless
a critical path is untested.
```

Specific tasks produce specific output. Negative instructions ("skip stylistic feedback") prevent the model from filling space with low-value commentary.

Layer 3: Format

How should the output be structured? I specify this every time, even for short outputs.

```
Return findings as a markdown table with these columns:
File, Line, Severity (Critical/High/Medium), Issue, Suggested Fix.
Sort by severity descending. Limit to top 10 findings.
```

If I don't specify format, I get prose one run, bullets the next, sometimes a table, sometimes a wall of code with explanation wrapped around it. None of these is easy to process downstream.

Layer 4: Examples

One good example beats three paragraphs of explanation. I include 1 to 3 examples in any prompt where output structure matters.

```
Example finding:
| api/orders.js | 47 | Critical | Payment confirmation runs before DB write completes, allowing double-charge | Use transaction with SELECT FOR UPDATE on order row |
```

The model now has a concrete reference for what good output looks like.


How I Build a Prompt From Scratch

When I need a new prompt for a recurring task, I follow a five-step process. The whole thing takes 15 to 30 minutes and produces a prompt I'll use for months.

Step 1: Write the rough version

I type out what I want as if I'm explaining it to a colleague who's never done the task before. No editing, no structure. Just a brain dump.

Step 2: Run it once

I paste the rough version into Claude Code and see what comes back. The first run almost always reveals what I forgot to specify.

Step 3: Identify the gaps

Common gaps I spot in first runs:

  • Output format isn't what I wanted (too long, wrong structure)
  • Model included things I didn't want (stylistic notes, alternative approaches)
  • Model missed things I assumed were obvious (security, edge cases)
  • Examples would have prevented confusion

Step 4: Restructure into the four layers

I rewrite the prompt with explicit Role, Task, Format, and Examples sections. Each gap from step 3 becomes a constraint or example in the new version.

Step 5: Save it

The prompt goes into a markdown file in my prompts library, named for the task. Next time I need it, I'm not starting from zero.

Want to see my actual prompts library and the workflow templates I use? Check out my prompt engineering toolkit where I share every pattern I've validated in production.


The Prompt Library

I keep all production prompts in a single repo. The structure is simple:

```
prompts/
  code-review/
    backend-security-review.md
    frontend-accessibility-review.md
    database-query-review.md
  content/
    blog-outline-generation.md
    technical-tutorial-draft.md
    headline-variations.md
  research/
    competitor-feature-analysis.md
    api-documentation-summary.md
  ops/
    incident-postmortem-template.md
    daily-briefing-generator.md
```

Each file is a prompt I can copy, paste, and run. Some are static. Most have placeholder sections where I drop in the specific code or content for that run.
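
To make the placeholder step concrete, here's a minimal sketch of a fill script, assuming the prompt files use Python's string.Template syntax (${name}) for their placeholder sections. The file paths and the placeholder name are hypothetical:

```python
# fill_prompt.py - minimal sketch: load a prompt template and fill its
# placeholder sections before pasting the result into Claude Code.
from pathlib import Path
from string import Template

def fill_prompt(path: str, **values: str) -> str:
    """Read a prompt file and substitute its ${name} placeholders."""
    template = Template(Path(path).read_text())
    # safe_substitute leaves unknown placeholders intact instead of raising
    return template.safe_substitute(values)

if __name__ == "__main__":
    # Hypothetical placeholder name and target file for one review run
    code = Path("api/orders.js").read_text()
    print(fill_prompt(
        "prompts/code-review/backend-security-review.md",
        code_under_review=code,
    ))
```

Static prompts pass through unchanged: safe_substitute only touches placeholders it recognizes, so a file with no placeholder sections comes back as-is.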

The library has about 40 prompts after a year of use. About 10 of them I run multiple times per week. Those 10 alone have saved me hundreds of hours.


The Iteration Pattern That Actually Works

When a prompt isn't producing what I want, the temptation is to add more instructions. This usually makes things worse. Long prompts dilute attention. The model starts treating each instruction as equal weight when some are critical and others are minor.

My iteration pattern:

  1. Read the bad output carefully. What specifically is wrong? Is it format, content, tone, or scope?

  2. Identify which layer needs adjustment. Bad scope is usually a Task layer issue. Bad format is a Format layer issue. Bad tone is usually a Role layer issue.

  3. Make one change, not five. I adjust one layer, run again, and see if the issue is resolved before touching anything else.

  4. If multiple changes are needed, version the prompt. Save v1, save v2, compare outputs side by side (see the sketch below). This is how I know I'm actually improving and not just trading one problem for another.

This is slower than rapid-fire revisions but converges on a good prompt much faster.
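
For step 4, a small script makes the side-by-side comparison painless. A minimal sketch, assuming I save each run's output to a file next to the prompt version that produced it (the paths here are hypothetical); Python's difflib renders the two outputs as an HTML diff:

```python
# compare_versions.py - minimal sketch: diff the saved outputs of prompt
# v1 and v2 so improvements and regressions are visible side by side.
import difflib
from pathlib import Path

def side_by_side(old_path: str, new_path: str) -> str:
    """Render an HTML side-by-side diff of two saved outputs."""
    old = Path(old_path).read_text().splitlines()
    new = Path(new_path).read_text().splitlines()
    return difflib.HtmlDiff(wrapcolumn=80).make_file(
        old, new, fromdesc="v1 output", todesc="v2 output"
    )

if __name__ == "__main__":
    # Hypothetical output files from two runs of the same task
    html = side_by_side("outputs/review-v1.md", "outputs/review-v2.md")
    Path("outputs/diff-v1-v2.html").write_text(html)
```

Opening the generated HTML in a browser puts the two outputs in parallel columns, which is far easier to judge than scrolling between sessions.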


Context Window Management

Even with great prompts, you can sabotage yourself by polluting the context window. A few patterns I've adopted:

Use fresh sessions for unrelated tasks

If I just spent an hour debugging a Python issue and now I want to write a blog post, I start a fresh Claude Code session. The Python context is irrelevant noise for the writing task and will leak into the output in subtle ways.

Reference files instead of pasting

When I need Claude to work with a long document, I reference the file path instead of pasting the entire content into the prompt. Claude Code reads it on demand, which keeps the context window cleaner and ensures it's reading the current version, not a stale copy.

Summarize, don't accumulate

For long-running sessions, I periodically ask Claude to summarize what we've established so far, save the summary, and start a new session with that summary as the context. This keeps signal-to-noise high.

The biggest unlock in my workflow: a saved prompts library + fresh sessions per task. This combination eliminated 80% of the inconsistency I was fighting before.


The Patterns Worth Memorizing

A handful of patterns show up in almost every prompt I write. These are worth committing to muscle memory.

"Show your work"

For complex reasoning tasks, I add: "Before giving your final answer, list the key considerations and tradeoffs you're weighing." This forces explicit reasoning and makes errors visible.

"If you're uncertain, say so"

I add this whenever the task involves judgment: "If you're not confident about a specific point, mark it as such rather than presenting it as fact." Reduces hallucination and gives me a clearer signal of where to verify.

"Compare against this baseline"

When I'm refining content or code, I include the previous version and ask: "Compare this to the previous version and explain what's improved and what regressed." Catches drift that isn't obvious from reading the new version alone.

"Stop at the first issue"

For long reviews, I sometimes use: "Identify the first critical issue and stop. Do not continue to lower-priority findings until the critical one is addressed." Useful when I want to fix-as-I-go rather than getting a 50-item report.


What I'd Tell Someone Starting Out

Three things that took me too long to figure out.

First: stop optimizing the prompt before you've validated the task. If you're not sure what good output looks like, no amount of prompt engineering will produce it. Get clarity on the goal first, then engineer the prompt.

Second: keep a "things that didn't work" file. When a prompt produces bad output, save the prompt and the output. These are gold for understanding model behavior. After three months you'll have a personal corpus of failure modes that makes you faster than any guide (a minimal logging sketch follows below).

Third: prompts compound. Every good prompt you save becomes the foundation for the next one. A year in, I'm not writing prompts from scratch anymore. I'm assembling them from pieces I've already validated. That compound effect is the real productivity gain.
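
For the "things that didn't work" file from the second point, a minimal sketch, assuming a single markdown file as the store; the log path and entry format here are just one convention, nothing standard:

```python
# log_failure.py - minimal sketch: append a failed prompt and its output
# to a running markdown log, with a note on what went wrong.
from datetime import date
from pathlib import Path

LOG = Path("prompts/failures.md")  # hypothetical location for the log

def log_failure(prompt: str, output: str, note: str) -> None:
    """Record one failure: what was asked, what came back, why it's bad."""
    entry = (
        f"\n## {date.today().isoformat()} - {note}\n\n"
        f"### Prompt\n\n{prompt}\n\n"
        f"### Output\n\n{output}\n"
    )
    with LOG.open("a", encoding="utf-8") as f:
        f.write(entry)
```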


What's Next

The prompts library is a force multiplier, but the next layer is automation. I'm starting to wire individual prompts into scheduled tasks and CI hooks so the prompt runs without me invoking it. A weekly competitor review prompt that runs every Monday morning. A code review prompt that fires on every PR. A daily briefing prompt that summarizes Shopify, Meta, and Klaviyo data.
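
As a sketch of what the scheduled version could look like: assuming the Claude Code CLI's -p (print) flag for non-interactive runs (check claude --help for your version), a cron-invoked script can be as small as this. The prompt and report paths are hypothetical:

```python
# weekly_review.py - minimal sketch: run a saved prompt on a schedule
# and write the result to a dated report file.
import subprocess
from datetime import date
from pathlib import Path

prompt = Path("prompts/research/competitor-feature-analysis.md").read_text()

# Assumes `claude -p` runs one prompt non-interactively and prints the result
result = subprocess.run(
    ["claude", "-p", prompt],
    capture_output=True, text=True, check=True,
)

report = Path(f"reports/competitor-review-{date.today().isoformat()}.md")
report.parent.mkdir(parents=True, exist_ok=True)
report.write_text(result.stdout)
```

A crontab line like `0 8 * * 1 python weekly_review.py` would fire it every Monday at 8am, and the same script body works unchanged as a CI step.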

The goal is for the prompts library to graduate from "things I run manually" to "the operating system for my work."

If you want my actual prompt templates and the automation patterns I'm using to wire them into scheduled workflows, grab the full toolkit here. 40+ production-ready prompts plus the automation recipes that make them run without you.

The prompts library is one of those investments that compounds slowly then suddenly. Six months in, you barely notice it. A year in, you can't remember how you ever worked without it.
