Welcome, fellow builders! If you're diving into GPT-5, you're stepping into a new era of AI. GPT-5 represents a significant leap forward in areas like agentic task performance, coding prowess, raw intelligence, and its ability to be steered. But what does "steerability"
really mean for us, the developers and problem-solvers on the front lines? It means that how you ask matters more than ever.
What Exactly is Prompt Engineering?
At its core, a large language model (LLM) like GPT-5 is a sophisticated prediction engine. Give it an input – what we call your "prompt" – and it calculates the most probable next word (or "token") based on the colossal datasets it was trained on. So, your prompt isn't just a question; it's the blueprint. It's the DNA of the output you want.
Put simply, prompt engineering is the art and science of teaching AI to think clearly.
Now, with GPT-5, there’s a fascinating wrinkle: adaptive compute. This means your prompt isn't just guiding the content; it's literally influencing how hard the model works to deliver that content.
For complex reasoning tasks, GPT-5 can allocate more computational resources, while for simpler ones, it might use less. This is a profound shift from earlier models and opens up new avenues for efficiency and performance.
Why Does Prompt Engineering Matter So Much Now?
The beauty of prompt engineering is its accessibility. What it does demand is clarity, specificity, and intentionality in your inputs.
Imagine you're briefing a highly capable, exceptionally intelligent junior engineer. If you give them a vague request like "Help me with this draft," you'll get a vague output. But if you tell them: "You are a brand copywriter. Improve the tone of this draft to make it more confident and modern," suddenly, you've provided the context, the role, and the desired outcome, and they can deliver something truly useful.
This is precisely why prompt engineering is a powerful leverage skill. The clearer you are, the more productive and valuable AI becomes in your workflows. With GPT-5's enhanced capabilities – its built-in memory, multimodal understanding (yes, it's not just text anymore!), and significantly increased sensitivity to instructions – mastering this skill is more critical than ever. It's how you go from merely using AI to truly partnering with it.
Because GPT-5 is so surgically precise in following instructions, poorly constructed prompts with contradictory or vague guidance can be more damaging than with older models. The model will expend valuable "reasoning tokens" trying to reconcile those contradictions instead of delivering the desired output.
You'll learn how prompts work behind the scenes, proven techniques to boost accuracy and creativity, ready-to-use templates for various workflows, and crucial mistakes to avoid, especially given GPT-5's instruction sensitivity.
The "Truth" About Generative AI (What You Can't Control... Entirely)
It's important to remember that while we call them "AI," the "artificial" part is as crucial as the "intelligent" part. These LLMs aren't thinking like a human brain. They're intricate prediction engines, generating the most statistically likely sequence of tokens based on your input and their training data.
Even with GPT-5's phenomenal adaptive compute, it's still operating on probability. This means tiny changes in phrasing or structure can sometimes lead to radically different outputs. Our job is to minimize that randomness and maximize the intentionality.
LLM Output Configuration (What You Can Control)
While you can't control the model's fundamental nature as a prediction engine, you have powerful levers to control its behavior and output. Many AI platforms offer settings to adjust how responses are generated.
- Temperature: This controls the randomness of the output. A lower temperature (e.g., 0.2) means more focused and factual responses, while a higher temperature (e.g., 0.8) encourages creativity and variability. For high-stakes tasks where accuracy is paramount, you'll want that temperature closer to freezing.
- Max Tokens: This is your cap on the length of the response. It prevents the model from rambling on endlessly.
- Top-p / Top-k: These are more granular sampling settings that determine the pool of words the model can choose from next, influencing the diversity of the output.
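To make these levers concrete, here is a minimal sketch of how the settings above might map to different task types. The request shape is illustrative, not tied to any specific provider's API, and the exact values are assumptions you should tune for your own workload.

```python
# Illustrative sampling presets: low temperature for focused, factual work;
# higher temperature for creative variability. Values are starting points,
# not recommendations from any official guide.

def sampling_config(task: str) -> dict:
    """Return sampling settings for a 'factual' or 'creative' task."""
    if task == "factual":
        # Low temperature: focused, repeatable answers for high-stakes tasks.
        return {"temperature": 0.2, "top_p": 0.9, "max_tokens": 500}
    if task == "creative":
        # Higher temperature: more diverse word choices, more variability.
        return {"temperature": 0.8, "top_p": 1.0, "max_tokens": 800}
    raise ValueError(f"unknown task type: {task}")
```

In practice you'd merge one of these presets into your API request alongside the model name and prompt.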
But with GPT-5, we get two new, incredibly important API parameters to add to our toolkit:
- reasoning_effort: This directly controls how "hard" the model thinks and how eagerly it calls tools. The default is medium, but you can scale it up for complex, multi-step tasks to ensure the best outputs, or scale it down for latency-sensitive applications. We'll dive into this more when we discuss agentic behaviors.
- verbosity: This parameter influences the length of the model’s final answer, distinct from its internal thinking process. The beauty here is that while you can set a global verbosity parameter, GPT-5 is trained to respond to natural language overrides within your prompt for specific contexts. For example, you could set a global low verbosity but then instruct the model to be highly verbose specifically when generating code.
These controls, especially reasoning_effort and verbosity, give you unprecedented granular control over GPT-5's behavior. Learning to wield them effectively is key to unlocking the model's full potential.
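As a sketch of how the two new parameters fit together, the snippet below builds a request payload combining reasoning_effort and verbosity. The field names follow the shape of OpenAI's Responses API as I understand it (reasoning effort nested under "reasoning", verbosity under "text"); treat them as assumptions and check the current API reference before shipping.

```python
# A sketch of a GPT-5 request that sets both new controls. Field names are
# assumed from the Responses API shape; verify against current docs.

def build_request(prompt: str, effort: str = "medium", verbosity: str = "low") -> dict:
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},   # how "hard" the model thinks
        "text": {"verbosity": verbosity},  # length of the final answer
    }

# Global low verbosity, plus a natural-language override for code output:
req = build_request(
    "Refactor this function. When you output code, be highly verbose: "
    "include comments and full type annotations.",
    effort="high",
)
```

Note the pattern: the global verbosity stays low, while the prompt itself asks for verbose output specifically in code, exactly the override behavior described above.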
The Anatomy of a Perfect Prompt: Your Master Blueprint
When engineering enterprise systems, we'd often talk about "getting it right on the first try." That's the holy grail of prompting: the one-shot. A perfectly crafted prompt that inspires the AI to generate exactly what you need without any follow-up tweaks.
Interestingly, much of the philosophy behind this perfect prompt comes from insights shared by Greg Brockman, the president of OpenAI, regarding their o1 reasoning model. While his guide was for o1, the core structure is remarkably applicable across all modern LLMs, and certainly holds true for GPT-5.
Let's dissect this "perfect prompt" into its four essential components:
1. Goal: Your North Star
This is where you state your ultimate objective as clearly and concisely as possible. No ambiguity, no fluff. Just the pure, unadulterated intent.
Think of it like defining the acceptance criteria for a user story. If you can't articulate the why and what in a single, focused sentence, your prompt is already fighting an uphill battle.
- Example: "I want a list of the best medium-length hikes within two hours of San Francisco. Each hike should provide a cool and unique adventure, and be lesser known".
- Try it out: Before you type anything, ask yourself: "What is the single, most important thing I want this model to achieve?" Write that down first.
2. Return Format: Shaping the Output
Once the model understands what you want, the next crucial step is telling it how you want it. This eliminates the guesswork and ensures consistency. Do you need a JSON object? A bulleted list? A multi-paragraph email? Specify it!
This is where we impose structure on what can otherwise be a free-form text blob. If you've ever dealt with inconsistent API responses from a poorly documented service, you know the pain. Don't let your LLM outputs be that service. With GPT-5, explicitly defining the format helps prevent it from defaulting to a generic, "lowest-common-denominator" response. We’ve even seen how you can prompt GPT-5 to emit clear upfront plans and consistent progress updates via "tool preamble" messages, drastically improving user experience.
- Example from the source: "For each hike, return the name of the hike as I’d find it on AllTrails, then provide the starting address of the hike, the ending address of the hike, distance, drive time, hike duration, and what makes it a cool and unique adventure".
- Try it out: After your goal, add a line like: "Format your response as a JSON object with keys name, address, distance, duration, unique_aspect." Or, "Provide the answer as a bulleted list, each point no longer than 15 words."
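One payoff of a strict return format is that you can parse the reply programmatically. Here's a tiny sketch with a hypothetical model reply to a prompt that demanded strict JSON (the hike data is made up for illustration):

```python
import json

# Hypothetical model reply to a prompt that requested a strict JSON object
# with exactly these keys. The values here are invented for illustration.
raw_reply = '{"name": "Dipsea Trail", "distance": "7.5 mi", "duration": "3.5 h"}'

hike = json.loads(raw_reply)  # fails loudly if the model drifted from the format
assert set(hike) == {"name", "distance", "duration"}
```

If the model ignores your format, json.loads raises immediately, which is far easier to catch than silently consuming a free-form paragraph.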
3. Warnings: Guarding Against Pitfalls
This section is your opportunity to preemptively address potential errors, especially the dreaded "hallucination" – where the model confidently generates realistic-sounding but utterly false information. This is your chance to apply guardrails.
Even the most advanced models can veer off course if you don't set clear boundaries. Especially when dealing with real-world data, the risk of hallucination is ever-present. Explicitly tell the model what not to do, or what areas require extreme caution. The source notes that phrases like "Think hard" and "Be careful" can signal to the model that these instructions are of paramount importance.
- Example: "Be careful to make sure that the name of the trail is correct, that it actually exists, and that the time is correct".
- Try it out: Add phrases like: "Verify all factual claims with external data before responding," or "Do not invent any information; if you're unsure, state that clearly."
4. Context: The Rich Tapestry
This is arguably the most powerful part of your prompt. Context provides the "Who" and "Why" behind your request, along with deeper nuances for the "What," "Where," "How," and "When". Without context, the model can't truly understand what you mean by subjective terms like a "unique" adventure or a "medium-length" hike.
This is where you bring the human element to the cold probabilistic logic of the LLMs. The more authentic and detailed your context, the better the model's "mental model" of your intent becomes.
- Try it out: Always ask yourself: "What background information, no matter how small, could help the model better understand my underlying need or preference?"
By meticulously crafting these four sections, you're not just writing a prompt; you're engineering a precise instruction set for a powerful AI, setting the stage for truly exceptional outputs.
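The four sections above lend themselves to a reusable template. Here's a minimal sketch of a builder that assembles them in order; the section labels and joining style are my own convention, not a prescribed format:

```python
# Assemble the four-part prompt: Goal, Return Format, Warnings, Context.
# The labels and ordering follow the blueprint above; the exact wording
# of each label is a stylistic choice.

def build_prompt(goal: str, return_format: str, warnings: str, context: str) -> str:
    return "\n\n".join([
        f"Goal: {goal}",
        f"Return format: {return_format}",
        f"Warnings: {warnings}",
        f"Context: {context}",
    ])

prompt = build_prompt(
    goal="List the best medium-length hikes within two hours of San Francisco.",
    return_format="For each hike: name, start address, distance, duration.",
    warnings="Verify each trail exists; do not invent names or times.",
    context="I hike most weekends and prefer lesser-known trails with views.",
)
```

Keeping this as a function makes it easy to version-control your prompts and swap sections during iteration.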
The Inner Workings of a Prompt: Factors, Iteration, and GPT-5's Nuances
From my experience with prompt engineering, I can tell you that successful interaction with an LLM is rarely a one-and-done affair. It's an iterative dance of testing, tweaking, and refining.
Think of it like giving a highly capable assistant a task. If you don't explain what you want, how you want it, and why it matters, the results might be vague, verbose, or just plain wrong.
Several Factors That Shape a Prompt
- The Model Itself: Each LLM has its own unique strengths, capabilities, and even quirks. GPT-5, for instance, leads all frontier models in coding capabilities and frontend/backend app development.
- Context Input: The quality of your provided documents, examples, or background information significantly impacts reasoning and accuracy.
- Structure: Clear formatting in your prompt improves output consistency and usefulness.
- Style + Tone: You can directly control the formality, voice, or persona.
- Model Settings: Parameters like temperature, max_tokens, top-p/top-k influence creativity vs. precision.
GPT-5's Nuances: Precision, Persistence, and Power
Precision and Instruction Following
GPT-5 is our most steerable model yet, extraordinarily receptive to prompt instructions regarding verbosity, tone, and tool-calling behavior. It follows instructions with surgical precision.
But beware: vague or contradictory prompts can cause wasted reasoning tokens.
Real-world Example (Healthcare Assistant): Conflicting instructions (auto-assign appointment vs. require patient consent vs. escalate emergency) made GPT-5 burn reasoning effort trying to reconcile them. Fixing instruction hierarchy drastically improved performance.
Actionable Today: Review prompts for ambiguities and contradictions before deploying.
Reasoning Effort and Agentic Behavior
- Prompting for Less Eagerness: lower reasoning_effort, set tool budgets, provide escape hatches.
- Prompting for More Eagerness: increase reasoning_effort, add persistence prompts, define stop conditions.
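The two modes can be sketched as system-prompt fragments. The exact wording below is an assumption of mine, adapted to illustrate tool budgets, escape hatches, persistence, and stop conditions; tune it to your own tool set.

```python
# Illustrative system-prompt fragments for steering agentic eagerness.
# Wording is an assumption, not copied from any official guide.

LESS_EAGER = (
    "Use at most 2 tool calls for this task. "          # tool budget
    "If you cannot resolve the request within that budget, "
    "stop and report what you found so far."            # escape hatch
)

MORE_EAGER = (
    "You are an agent: keep going until the user's request is completely "
    "resolved before ending your turn. "                # persistence prompt
    "Only stop when every item in the task list is done."  # stop condition
)
```

Pair LESS_EAGER with a lower reasoning_effort for latency-sensitive paths, and MORE_EAGER with a higher one for long multi-step tasks.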
Minimal Reasoning: The Need for Speed
Best for latency-sensitive applications.
Actionable Today: Use short explanations, tool-calling preambles, explicit planning snippets.
Reusing Reasoning Context with the Responses API
Use previous_response_id to conserve reasoning tokens, reduce latency, and improve agentic flows.
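A sketch of what chaining looks like in practice: the second request carries the id of the first response so the model can reuse its prior reasoning instead of re-deriving it. The field name follows the Responses API as I understand it, and the id value is hypothetical; verify both against the current reference.

```python
# Chaining turns with previous_response_id (field name assumed from the
# Responses API; the id below is a made-up placeholder).

first_turn = {
    "model": "gpt-5",
    "input": "Plan the refactor of the billing module.",
}

prev_id = "resp_abc123"  # hypothetical id returned by the first call

second_turn = {
    "model": "gpt-5",
    "input": "Now implement step 1 of that plan.",
    "previous_response_id": prev_id,  # reuse reasoning from the prior turn
}
```

In a real agentic loop you'd read the id off each response object and thread it into the next request.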
Markdown Formatting & Metaprompting
- Markdown Formatting: Prompt GPT-5 explicitly for markdown consistency.
- Metaprompting: Ask GPT-5 to optimize prompts for itself, suggesting minimal edits.
Why Do Prompts Go Sideways?
Before we fix a prompt, we need to understand why it broke. Recall the literal, highly capable assistant from earlier: if you don't explain what you want, how you want it, and why it matters, the results come back vague, overly verbose, or just plain wrong.
So, when your prompt goes astray, it's often due to one or more of these factors:
- Model: Each LLM has its own quirks.
- Context: Insufficient or poor-quality input can derail reasoning.
- Structure: Unclear formatting leads to inconsistent outputs.
- Style + Tone: If you don't specify, the AI might default to a generic voice.
- Model Settings: Things like temperature (randomness) or max tokens (length) can be miscalibrated for the task.
Your Diagnostic Toolkit: Spotting the Trouble
When you get an output that just isn't cutting it, pause. Don't just re-roll or try a completely new prompt. Use this quick checklist, straight from the guide, to diagnose the problem:
- Am I being too vague? Be specific about the task and expectations.
- Did I include a role or point of view? Adding "You are a..." sets the tone and mindset.
- Is the input complete and relevant? Include all necessary information for the model to reason effectively.
- Have I requested a clear format? Specify if you want bullets, a paragraph, JSON, etc.
- Am I asking for reasoning? If judgment is involved, ask the model to "think step by step" or explain its logic.
- Have I broken the task into smaller parts if needed? Split complex requests into multiple, focused steps.
- Could I include examples or longer input context? GPT-5 handles massive context windows – entire documents, transcripts, or long examples – which can guide the output effectively.
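The checklist above can be roughed out as code. The keyword heuristics below are my own crude assumptions (a real review still needs human judgment), but they're handy as a first-pass linter over a prompt library:

```python
# A heuristic prompt scorecard based on the diagnostic checklist above.
# Keyword checks are rough assumptions, not a rigorous evaluation.

def prompt_scorecard(prompt: str) -> dict:
    p = prompt.lower()
    return {
        "has_role":       "you are" in p,
        "has_format":     any(w in p for w in ("bullet", "json", "list", "paragraph")),
        "asks_reasoning": any(w in p for w in ("step by step", "explain", "rationale")),
        "is_specific":    len(prompt.split()) >= 15,  # crude proxy for detail
    }

score = prompt_scorecard(
    "You are a reviewer. Evaluate these options step by step and return "
    "a bulleted list of pros and cons for each, then recommend one."
)
```

Run it over your saved prompts and flag any that fail multiple checks for a closer look.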
Now, let's dive into some common prompt "ailments" and their practical cures.
Prescription for Prompts: Common Ailments and Their Cures
The guide provides a fantastic "Problem ❌ Weak Prompt ✅ Improved Prompt" table that's a masterclass in prompt refinement. Let's break down some of these patterns and connect them to foundational prompt engineering principles.
1. The Vague Instruction: "Write a summary."
The Problem: This is the most common culprit. It tells the LLM what to do, but not how or for whom, or what kind of summary. The model has too much freedom and defaults to a lowest-common-denominator output.
The Fix: Be Specific! Define your Goal clearly. Add constraints, target audience, and desired output characteristics.
- Weak Prompt: "Write a summary."
- Improved Prompt: "Summarize the article below in 3 bullet points. Focus on key findings, avoid repeating the introduction."
2. Missing Audience or Role: "Rewrite this for clarity."
The Problem: The LLM doesn't know who it's writing for, or who it should pretend to be to write it.
The Fix: Assign a clear Role and specify the Audience.
- Weak Prompt: "Rewrite this for clarity."
- Improved Prompt: "Rewrite this for a busy executive audience. Use short sentences and strip out nonessential background."
- Another Example: "You are a brand copywriter. Improve the tone of this draft to make it more confident and modern."
3. Insufficient Context: "Help me with this draft."
The Problem: The model lacks necessary background information or scenario to provide a helpful response.
The Fix: Provide complete and relevant input using Contextual Prompting.
- Weak Prompt: "Help me with this draft."
- Improved Prompt: "Using the customer persona and product description below, write a 2-sentence ad hook that appeals to first-time users."
4. Missing Return Format Instruction: "What's a good alternative?"
The Problem: The model might give you a paragraph when you need a list.
The Fix: Specify a clear Return Format.
- Weak Prompt: "What's a good alternative?"
- Improved Prompt: "Suggest 3 alternatives in a numbered list. Include 1–2 sentence explanations for each."
5. No Reasoning Requested: "What's the best option here?"
The Problem: Asking for just an answer leads to shallow responses.
The Fix: Ask for reasoning step-by-step (Chain-of-Thought).
- Weak Prompt: "What’s the best option here?"
- Improved Prompt: "Evaluate these 3 options. List pros and cons for each, then recommend one with a short rationale."
6. Complex Tasks, Undivided: "Help me improve this."
The Problem: Multi-faceted tasks overwhelm the model.
The Fix: Break tasks into smaller parts.
- Weak Prompt: "Help me improve this."
- Improved Prompt: "Rewrite this performance review to follow this structure: achievements, challenges, and next steps."
7. Contradictory Instructions: The Silent Killer (Especially for GPT-5)
The Problem: Conflicting instructions waste reasoning tokens.
The Fix: Review and resolve contradictions.
- Correction Example: For the CareFlow Assistant, clarify that auto-assignment happens only after informing the patient, consistent with consent.
8. Managing Agentic Behavior and Verbosity
The Problem: The model may be too eager, not eager enough, or too verbose/terse.
The Fix:
- For Less Eagerness: Lower reasoning_effort, add early stop criteria.
- For More Eagerness: Increase reasoning_effort, encourage persistence.
- For Verbosity Control: Use the verbosity parameter and natural-language overrides.
- For Tool Use: Provide clear upfront plans and progress updates.
The Iterative Lab: Refining for Consistency
Prompt engineering is iterative. Test, tweak, and refine.
Key Tips for Testing Prompts:
- Change one variable at a time.
- Compare outputs across models.
- Keep a reusable prompt library.
- Diagnose failures (unclear instruction, missing input, poor formatting).
Takeaway:
Pick one weak prompt. Use the 7-point Prompt Quality Scorecard. Tweak just one variable (e.g., role, format, context). Iterate until you achieve a strong, consistent result.
Closing
Prompt engineering with GPT-5 isn't about guesswork; it's about intentional design. By understanding these core concepts – from defining your goal and format to meticulously managing context, reasoning, and even allowing the model to optimize its own instructions – you're ready to build truly robust and intelligent applications.
Now go forth and make LLMs work for you!