👉 “Techniques to boost reasoning, accuracy, and interaction”
Introduction
As introduced in Chapter 1, language models operate on tokens, and every token carries both cost and context-window implications. In practice, this means that every word you add has a price and contributes to the limited memory window of the model.
Longer prompts can certainly provide richer context, but they also increase latency, raise costs, and may hit model length limits.
We can now explore the main groups of prompting techniques, each applying these principles in different ways (balancing clarity, structure, and context), while introducing methods tailored to specific use cases.
Prompt Engineering
What is prompt engineering and what is its objective?
Prompt Engineering is the practice of designing, structuring, and optimizing instructions (prompts) to guide generative AI models, with the goal of ensuring they understand and respond accurately and effectively to a wide range of queries, aligned with the user’s objectives.
This discipline goes beyond simply writing instructions: it involves understanding how models process information and applying specific techniques to maximize the effectiveness of human–machine communication, ensuring that the system correctly interprets the task, the context, and the expected response format.
Classification of Prompt Engineering techniques
For this article, I propose a classification with a didactic and practical purpose, to make it easier to study and understand the main prompt engineering techniques. Although the specialized literature may present other ways of grouping them, I decided to organize them into four groups that allow for a quick understanding of their objectives and most common uses, based on my practical experience with these techniques.
Group 1: According to the number of examples
Before diving into advanced strategies, let’s start with the basics: how much does a model need to see in order to understand what we want? Sometimes a simple instruction is enough (zero-shot). Other times, it helps to show one example (one-shot) or a handful (few-shot). This group is all about “teaching by examples,” the gateway to almost any prompting technique.
Zero-shot prompting: Asking the model to perform a task without showing it any previous examples.
Some recommendations:
- 🎯 Best for: Simple tasks, well-known domains, budget-conscious applications, quick prototyping
- 💡 Trade-offs: Fast and cheap (minimal tokens) but lower accuracy on complex or domain-specific tasks
- ❌ Avoid when: Working with specialized terminology, complex formatting requirements, or novel task types
Example:
Write a formal invitation email for a corporate networking event hosted by a technology company.
One-shot prompting: Including a single example to guide the model.
Some recommendations:
- 🎯 Best for: Testing response formats, when you have limited examples available, establishing basic patterns
- 💡 Trade-offs: Better than zero-shot for formatting, but a single example can create bias toward that specific style
- ❌ Avoid when: The task has multiple valid approaches, or when the single example might mislead the model
Example:
Write a formal invitation email for professional events.
Example:
"Subject: Invitation to the Annual Health Innovation Forum
Dear Dr. Smith,
We are delighted to invite you to the Annual Health Innovation Forum taking place on March 10th in New York City…"
Now write an invitation for a corporate networking event hosted by a technology company.
Few-shot prompting: Including several examples to reinforce the pattern.
Some recommendations:
- 🎯 Best for: Complex tasks, specialized domains, when accuracy justifies increased cost, establishing consistent patterns
- 💡 Trade-offs: Higher accuracy and consistency but 3-5x more input tokens (higher cost and longer prompts)
- ❌ Avoid when: Working with simple tasks, strict budget constraints, or when examples are difficult to create
Example:
Write professional invitation emails following these patterns:
Example 1:
"Subject: Invitation to the Annual Health Innovation Forum
Dear Dr. Smith,
We are delighted to invite you to the Annual Health Innovation Forum taking place on March 10th in New York City…"
Example 2:
"Subject: Join Us at the Global Fintech Summit
Dear Ms. Johnson,
It is our pleasure to invite you to the Global Fintech Summit, scheduled for April 15th in London…"
Example 3:
"Subject: Invitation to the Cultural Leadership Gala
Dear Mr. Brown,
You are cordially invited to attend the Cultural Leadership Gala on May 20th in Paris…"
Now write an invitation for a corporate networking event hosted by a technology company.
Group 2: Personalization techniques
Getting the model to “answer correctly” is one thing, but what if you also need it to sound like a doctor, a teacher, or a lawyer? Or to adapt its tone to be formal, persuasive, or playful? This group gathers techniques that shape the model’s voice, identity, and emotional tone, so the outputs don’t just make sense, they also feel aligned with the situation or audience.
Style Prompting
A technique that modifies the tone, register, or format of the model's response to adapt it to a specific context, allowing the same content to be expressed formally, creatively, technically, or colloquially.
The process to implement it is:
- Indicate the desired tone or style.
- Specify the content to be generated.
Some recommendations:
- 🎯Best for: Brand consistency, specific communication requirements, matching existing content tone
- 💡 Trade-offs: Ensures consistent style but may override natural response patterns for the content type
- ❌Avoid when: Style conflicts with the role, or when content type has established conventions
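Example (illustrative; the details are hypothetical):
Write an invitation email for a corporate networking event hosted by a technology company, using a warm, conversational tone suitable for a startup community newsletter.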
Emotion Prompting
A technique that guides the model's response to convey a specific emotion (joy, urgency, enthusiasm, nostalgia, etc.), generating more persuasive and emotionally connected texts.
The process to implement it is:
- Indicate the emotion the text should convey.
- Request the specific task.
Some recommendations:
- 🎯Best for: Marketing copy, persuasive content, user engagement, creative writing.
- 💡 Trade-offs: Can increase engagement but may compromise objectivity or professionalism.
- ❌Avoid when: Professional analysis or when emotion conflicts with role/context.
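Example (illustrative; the details are hypothetical):
Write an invitation email for a corporate networking event hosted by a technology company, conveying enthusiasm and a sense of excitement about the connections attendees will make.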
Role Prompting
Assigns the model a specific role or identity to influence how it responds, adopting the perspective, knowledge, and style of a particular profile.
The process to implement it is:
- Define the desired role.
- Indicate the task to be performed from that perspective.
Some recommendations:
- 🎯Best for: Professional contexts, when specific expertise perspective is needed, formal communications
- 💡 Trade-offs: Provides expert context but may limit creative responses or alternative viewpoints
- ❌Avoid when: You need multiple perspectives, creative brainstorming, or informal contexts
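Example (illustrative; the details are hypothetical):
You are an experienced corporate communications manager at a technology company. From that perspective, write a formal invitation email for the company's annual networking event.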
System Prompting
A technique that defines global behavioral rules for the model from the beginning of the conversation or task. Unlike a specific prompt, these instructions apply persistently and condition all subsequent responses.
The process to implement it is:
- Establish the rules and role of the model at the beginning.
- Maintain consistency across all interactions.
Some recommendations:
- 🎯Best for: Applications with consistent interaction patterns, chatbots, persistent behavior requirements
- 💡Trade-offs: Ensures consistency across conversations but harder to override for specific exceptions
- ❌Avoid when: One-time tasks, or when flexibility is more important than consistency
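As a minimal sketch, this is how a system prompt might be set through a chat-style API (shown with the OpenAI Python SDK; the model name and the rules themselves are placeholders, and other providers expose an equivalent system or instructions field):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Global rules defined once; they condition every subsequent response.
system_prompt = (
    "You are a formal corporate communications assistant. "
    "Always respond in professional English, keep emails under 200 words, "
    "and never invent event details that were not provided."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Write an invitation email for a corporate networking event."},
    ],
)
print(response.choices[0].message.content)
```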
Group 3: Techniques that improve reasoning and response quality
This is where we move from simply “responding” to actually thinking before responding. These techniques guide the model to structure its reasoning, explore alternatives, and build more reliable answers. They shine when you’re dealing with complex problems, multi-step processes, or high-stakes decisions where accuracy matters most.
🪜 Chain of Thought (CoT) – reason step by step
Chain-of-thought prompting guides a model to produce intermediate reasoning steps before the final answer. Rather than requesting the answer directly, the prompt structures the task into smaller logical substeps and asks the model to address them sequentially. This approach can reduce errors on complex problems and improve clarity.
In practice, the prompt typically:
- Asks the model to break the problem into logical intermediate steps.
- Requires completing each step before proceeding, then providing the final answer.
This approach helps reduce logical errors, increases process clarity, and improves accuracy, as the model follows a structured reasoning path.
Some recommendations:
- 🎯Best for: Math problems, logical reasoning, step-by-step processes, debugging code.
- 💡Trade-offs: 2-3x more output tokens but significantly better accuracy on multi-step problems.
- ❌Avoid when: Simple factual queries, creative tasks, or when intermediate steps aren't valuable.
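Example (illustrative; the figures are hypothetical):
A networking event has a budget of $5,000; the venue costs $1,400 and catering is $18 per guest. Think step by step, showing each calculation, and then state the maximum number of guests we can invite.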
🔎 Step-Back Prompting
Step-back prompting guides the model to reason at a higher level of abstraction before addressing specific details. The prompt first directs the model to identify principles and key concepts relevant to the task and then apply them to the concrete case.
In practice, the prompt typically instructs the model to:
- Identify the principles, concepts, or key elements related to the task.
- Apply those elements to solve the specific case.
This structured approach can reduce errors in intermediate steps and improve coherence, since the model begins with a general perspective before generating the final solution.
Some recommendations:
- 🎯Best for: Complex problems that benefit from understanding broader principles first, educational content, troubleshooting technical issues, strategic analysis that requires foundational context.
- 💡Trade-offs: Helps establish solid foundations before diving into specifics, but adds an extra step that increases token usage and response time.
- ❌Avoid when: Working with simple, direct questions that don't require conceptual framework, time-sensitive queries, or when the foundational concepts are already well-established in the context.
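Example (illustrative; the scenario is hypothetical):
Before proposing a fix, first explain the general principles of email deliverability that determine whether a message is flagged as spam. Then apply those principles to diagnose why our event invitations are not reaching recipients.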
❓Self-Ask Prompting
Self-ask prompting decomposes a complex task into simpler sub-questions that are answered sequentially and then synthesized into a final response.
In practice, the prompt typically guides the model to:
- determine whether sub-questions are needed to solve the task
- generate those sub-questions and provide intermediate answers
- synthesize the intermediate answers into a coherent final solution.
This decomposition can improve coverage of requirements, surface assumptions, and reduce errors in multi-step problems.
Some recommendations:
- 🎯Best for: Research questions, complex information gathering, breaking down requirements
- 💡Trade-offs: Good for decomposition but can over-complicate simple questions
- ❌Avoid when: Straightforward queries, when you need concise responses, or for creative tasks
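Example (illustrative; the details are hypothetical):
Should we host our networking event in person or online? First decide whether you need sub-questions (for example about audience location, budget, or goals), answer each one explicitly, and then combine those answers into a final recommendation.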
🧵Thread-of-Thought (ThoT) Prompting
Thread-of-thought prompting is used to process extensive contexts that contain mixed or noisy information. The technique divides the content into smaller segments, extracts the relevant elements from each, and discards the rest. The model then integrates the key findings to reach a coherent conclusion.
In practice, the prompt typically:
- segments the information into manageable parts
- analyzes each segment to identify and filter out irrelevant details
- combines the remaining elements to produce a synthesized response.
This structured process helps the model maintain focus, reduce the impact of noise, and generate more reliable outputs.
Some recommendations:
- 🎯Best for: Processing long documents with mixed information, filtering relevant from irrelevant data
- 💡Trade-offs: Excellent for noisy data but adds processing overhead for clean documents
- ❌Avoid when: Working with well-structured, clean documents or short content
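Example (illustrative; the details are hypothetical):
Below are the unsorted survey comments from last year's event. Work through them segment by segment, keep only the remarks relevant to venue and catering, and then summarize the key findings in three bullet points. [comments…]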
🌳 Tree-of-Thought (ToT) Prompting
Organizes reasoning as a decision tree, exploring several lines of thought in parallel before reaching the best response. The model generates different options, evaluates and discards non-viable ones, and deepens the most promising ones.
Instead of following a single line of reasoning, the model:
- Generates different routes or options (branches).
- Evaluates and discards the ones that are not viable.
- Expands on the most promising, with the possibility of going back and trying alternatives.
This approach allows planning ahead, correcting errors midway, and finding more robust solutions for complex problems.
Some recommendations:
- 🎯Best for: Open-ended problems with multiple valid solutions, creative problem-solving where you need to explore alternatives, strategic planning with different scenarios, complex decision-making that benefits from evaluating multiple approaches.
- 💡Trade-offs: Explores multiple solution paths which increases accuracy for complex problems, but requires 5-10x more tokens due to generating and evaluating multiple branches, making it the most expensive reasoning technique.
- ❌Avoid when: Simple problems with obvious solutions, time-critical applications, strict budget constraints, or when you need a single definitive answer quickly rather than exploring alternatives.
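Example (illustrative; the details are hypothetical):
Propose three different formats for our networking event as separate branches, briefly evaluate the pros and cons of each, discard the weakest option, and develop the most promising one into a detailed plan.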
✅ Self-Consistency Prompting
Self-consistency prompting enhances reasoning by generating multiple reasoning paths for the same problem and selecting the most consistent outcome. The model explores alternative approaches, compares the resulting answers, and chooses the one supported by the greatest agreement.
In practice, the prompt typically guides the model to:
- generate several distinct reasoning paths.
- compare the different solutions obtained.
- select the final answer based on consistency across them.
Some recommendations:
- 🎯Best for: Important decisions, fact-checking, when single errors are costly
- 💡Trade-offs: 3-5x more tokens for multiple attempts but higher reliability
- ❌Avoid when: Casual applications, creative tasks where consistency isn't needed, or budget constraints
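As a minimal sketch, self-consistency can also be orchestrated in code by sampling several reasoning paths at a higher temperature and keeping the most frequent final answer (the model name, prompt, and answer-extraction rule below are simplifying assumptions):

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "A networking event has a budget of $5,000, the venue costs $1,400, "
    "and catering is $18 per guest. Reason step by step, then give the "
    "maximum number of guests on a final line formatted as 'ANSWER: <number>'."
)

def final_answer(text: str) -> str:
    # Naive extraction: take whatever follows the last 'ANSWER:' marker.
    return text.rsplit("ANSWER:", 1)[-1].strip()

answers = []
for _ in range(5):  # several independent reasoning paths
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        temperature=0.8,       # encourage diversity between paths
        messages=[{"role": "user", "content": PROMPT}],
    )
    answers.append(final_answer(response.choices[0].message.content))

# The answer supported by the greatest agreement wins.
print(Counter(answers).most_common(1)[0])
```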
Group 4: Techniques to improve understanding and interaction
Finally, we arrive at the techniques that make conversations with the model smoother and more precise. The focus here is on reducing ambiguity, checking understanding, and enriching the context. With these strategies, the model becomes a more attentive and reliable collaborator—perfect for advanced assistants or multi-step workflows.
Contextual Prompting
It is a technique that improves the accuracy of the response by providing additional context before asking the question or making the request.
The goal is for the model to better understand the situation and respond with greater relevance.
The basic steps to do this are:
- Add relevant background data to the prompt (place, date, audience, purpose); this can also include limits or scope (e.g., only nearby venues) or aspects to emphasize, such as cost or performance.
- Formulate the request clearly.
This technique can be combined with others, such as including examples (Group 1) or establishing a role, tone, or personality (Group 2), to further enrich the context and improve the result.
Some recommendations:
- 🎯Best for: Domain-specific tasks, when background information affects the answer, personalized responses
- 💡Trade-offs: More relevant responses but increased token usage and complexity
- ❌Avoid when: Context is obvious, working with general knowledge questions, or strict token limits
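Example (illustrative; the context details are hypothetical):
Context: the event is an evening networking session for about 80 fintech professionals, held on June 12th at our Madrid office, with a limited catering budget. Using this context, write the invitation email.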
Rephrase and Respond (RaR)
The model reformulates the request to confirm that it has understood it before generating the final response.
The reformulation can include clarifying the context, reorganizing the information, or specifying details of the task, so that the interpretation is as faithful as possible to the user’s intention.
This helps reduce interpretation errors and improves coherence, ensuring that the final result meets expectations.
The basic steps to do this are:
- Ask the model to reformulate the question or task.
- Produce the response based on that reformulation.
Some recommendations:
- 🎯Best for: Ambiguous requests, critical applications where misunderstanding is costly, complex instructions.
- 💡Trade-offs: Reduces misunderstanding but doubles response time and token usage.
- ❌Avoid when: Simple, clear requests or when speed is more important than precision.
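Example (illustrative; the details are hypothetical):
First rephrase my request in your own words, stating any assumptions you are making about audience and tone. Once you have confirmed the interpretation, write the invitation email for our corporate networking event.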
Re-reading (RE2)
It is a technique in which the model reads and processes the information twice before responding, to improve accuracy and avoid omissions.
The first reading identifies the key points; the second confirms and organizes them to generate the response. This is especially useful when the source material is long, complex, or contains details that could be overlooked.
The basic steps to do this are:
- First reading: The model makes an initial reading to identify and extract the main data or key points.
- Second reading: The model rereads the same text to verify, confirm, and organize that data, paying attention to possible corrections, contradictions, or key secondary details.
Recommendations:
- 🎯Best for: Long documents, complex information with potential contradictions, critical data extraction
- 💡Trade-offs: Higher accuracy with complex content but unnecessary overhead for simple documents
- ❌Avoid when: Working with short, clear content or when processing speed is critical.
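Example (illustrative; the details are hypothetical):
Read the attached event brief twice. On the first pass, list the key facts (date, venue, audience, constraints); on the second pass, check that list against the brief for contradictions or missing details. Then write the invitation email. [brief…]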
ReAct (Reason and Act)
It is an advanced technique that combines step-by-step reasoning (Reason) with concrete actions (Act), such as searching for external data or executing functions.
It is used especially in autonomous agents or tasks that require obtaining and processing information before responding.
A typical ReAct prompt asks the model to follow a reasoning and action process like this:
- Reasoning Step: Analyze the request and determine what information you are missing to complete the task.
- Action Step: Use a search tool to find the missing information. Describe the search you would carry out.
- Observation Step: Simulate the result of the search and present the information you would obtain.
- Reasoning Step: With the new information, plan the structure of the invitation.
- Final Response: Write the complete invitation using all the data.
Some recommendations:
- 🎯Best for: Research tasks, multi-step processes requiring external data, autonomous agents
- 💡Trade-offs: Can handle complex workflows but requires tool integration and multiple API calls
- ❌Avoid when: Simple queries, no external data needed, or when tool complexity outweighs benefits
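Example (illustrative; the tool and task are hypothetical):
You can call a web_search tool. Task: write an invitation for a networking event at a venue close to our office. Reason about what information is missing, state the search you would run, record the observation, and only then write the final invitation.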
What are the limitations and factors that affect the quality of the responses?
Language models, although powerful, have inherent limitations and are subject to factors that can degrade the quality of their responses or generate additional risks. Understanding these limitations is essential for a responsible and effective use of the technology.
Hallucinations
Hallucinations constitute one of the key challenges of generative AI and they occur when a model generates content that seems plausible but is factually incorrect or lacks real support. Essentially, the model “perceives” non-existent patterns or invents information to complete a response.
Main causes
- 📉 Limited training data → model fills gaps with invented content.
- ⚠️ Noisy data → incorrect, biased, or contradictory inputs replicated.
- ❓ Lack of context → ambiguous prompts lead to invented details.
- 🚫 No restrictions → model generates beyond the available sources.
- 🔥 High temperature → more randomness = higher chance of hallucinations.
Model sampling parameters
When a language model predicts the next word, it first generates a probability distribution over all the possible options. Sampling parameters control how the model ultimately decides which tokens to pick and how many to generate; the most commonly used are Temperature, Top-p, Frequency Penalty, and Max Tokens, although they may vary between models.
These parameters, in summary, allow adjusting the balance between creativity and predictability of the generated texts, making it possible to optimize the responses according to the specific context of use.
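As a minimal sketch, this is where these parameters are typically set in an API call (parameter names follow the OpenAI chat completions API and may differ slightly in other providers or newer endpoints; the values and model name are illustrative):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",    # placeholder model name
    messages=[{"role": "user", "content": "Write a short invitation email for a tech networking event."}],
    temperature=0.7,        # randomness: lower = more deterministic, higher = more varied
    top_p=0.9,              # nucleus sampling: restrict choices to the top 90% of probability mass
    frequency_penalty=0.3,  # discourage repeating tokens that already appeared
    max_tokens=300,         # hard cap on the length of the completion
)
print(response.choices[0].message.content)
```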
Temperature: controlling creativity and coherence
Temperature is a parameter that adjusts the degree of randomness in the model's responses: lower values make the output more deterministic and focused, while higher values make it more varied and creative.
💡 Note: Adjusting the temperature depends on the objective. In formal writing or technical analysis, low values are preferable. To generate ideas, narratives, or creative explorations, higher values can be useful.
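For instance (typical ranges, though exact behavior varies between models): values around 0–0.3 suit extraction, technical analysis, or factual Q&A, while values around 0.7–1.0 suit brainstorming and creative writing.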
Top-p: the precision of the vocabulary
This parameter controls the diversity of the vocabulary by dynamically restricting sampling to the smallest set of tokens whose cumulative probability reaches a defined threshold.
How does Top-p work?
- The model calculates the probability of each possible next word.
- It orders them by probability from highest to lowest.
- It calculates the cumulative probability until reaching the specified P value.
- It only considers words within this “nucleus” for the selection.
Practical examples:
- Top-p = 0.1: Only tokens covering the top 10% of the probability mass are considered (very conservative).
- Top-p = 0.9: Tokens covering 90% of the probability mass are considered (more diverse).
- Top-p = 1.0: All possible words are considered.
Example
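Suppose (with made-up probabilities) the candidate next tokens have probabilities 0.40, 0.30, 0.15, 0.10, and 0.05, and Top-p = 0.8. The cumulative sum runs 0.40 → 0.70 → 0.85, so the nucleus is the first three tokens (the smallest set that reaches 0.8) and the last two are never sampled.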
Frequency Penalty
The frequency penalty is a parameter that controls the repetition of tokens in text generation. Its function is to reduce the probability that the model will again choose words or phrases that have already appeared previously in the conversation.
In other words, each time a token is repeated, it receives a penalty in its score, which makes it less likely to be selected again. This helps to avoid responses with excessive repetitions and favors a more varied and natural output.
Max tokens
This parameter specifies the maximum number of tokens that can be generated when completing a response. It is a limitation that can cause responses to be left incomplete if the limit is reached before finishing naturally.
An example of this would be:
"Espresso coffee is prepared with hot water at high pressure. The ground beans must..." [⚠️ ABRUPT CUT]
Key considerations
- 🔽 Low value → truncated or incomplete responses.
- 🔼 High value → unnecessary costs without real benefit.
Practical optimization:
- ✍️ Short responses: 50–150 tokens
- 📑 Detailed explanations: 300–800 tokens
- 📚 Long content: 1000+ tokens
🔮 What’s Next?
Prompt engineering provides a powerful toolkit for guiding language models, helping them reason step by step, follow examples, or adapt their tone and style. These methods significantly improve accuracy and interaction. However, no matter how refined our prompts are, models remain limited by the data they were trained on and may still hallucinate under uncertainty.
This is where Retrieval-Augmented Generation (RAG) becomes essential, by linking prompts to external knowledge sources (databases, documents, or APIs).
In the next chapter, we will explore RAG patterns, from simple to advanced, and see how they expand prompting into enterprise-ready systems.
📖 Series Overview
You can find the entire series on my Profile:
- ✏️ GenAI Foundations – Chapter 1: Prompt Basics – From Theory to Practice
- 🧩 GenAI Foundations – Chapter 2: Prompt Engineering in Action – Unlocking Better AI Responses
- 📚 GenAI Foundations – Chapter 3: RAG Patterns – Building Smarter AI Systems
- ✅ GenAI Foundations – Chapter 4: Model Customization & Evaluation – Can We Trust the Outputs?
- 🗂️ GenAI Foundations – Chapter 5: AI Project Planning – The Generative AI Canvas
📚 References
- OpenAI Academy. (2025, February 13). Advanced prompt engineering. https://academy.openai.com/home/videos/advanced-prompt-engineering-2025-02-13
- Anthropic. (n.d.). Creating message batches. Anthropic Documentation. https://docs.anthropic.com/en/api/creating-message-batches
- AWS. (n.d.). What are foundation models? https://aws.amazon.com/es/what-is/foundation-models/
- AWS. (n.d.). What is Retrieval-Augmented Generation (RAG)? https://aws.amazon.com/es/what-is/retrieval-augmented-generation/
- Google Cloud Skills Boost. (n.d.). Introduction to generative AI. https://www.cloudskillsboost.google/course_templates/536
- Google Developers. (n.d.). Prompt engineering for generative AI. https://developers.google.com/machine-learning/resources/prompt-eng?hl=es-419
- Google Developers. (n.d.). Overview: What is a generative model? https://developers.google.com/machine-learning/gan/generative?hl=es-419
- IBM. (n.d.). What is LLM temperature? https://www.ibm.com/think/topics/llm-temperature
- IBM. (n.d.). What is prompt engineering? https://www.ibm.com/es-es/think/topics/prompt-engineering
- IBM. (n.d.). AI hallucinations. https://www.ibm.com/es-es/think/topics/ai-hallucinations
- Luke Salamone. (n.d.). What is temperature? https://blog.lukesalamone.com/posts/what-is-temperature/
- McKinsey & Company. (2024, April 2). What is generative AI? https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-generative-ai
- The New York Times. (2025, May 8). AI is getting more powerful, but its hallucinations are getting worse. https://www.nytimes.com/es/2025/05/08/espanol/negocios/ia-errores-alucionaciones-chatbot.html
- Prompt Engineering. (2024, April 6). Complete guide to prompt engineering with temperature and Top-p. https://promptengineering.org/prompt-engineering-with-temperature-and-top-p/
- Prompting Guide. (n.d.). ReAct prompting. https://www.promptingguide.ai/techniques/react
- Prompting Guide. (n.d.). Consistency prompting. https://www.promptingguide.ai/techniques/consistency
- Learn Prompting. (2024, September 27). Self-calibration prompting. https://learnprompting.org/docs/advanced/self_criticism/self_calibration
- AI Prompt Theory. (2025, July 8). Temperature and Top-p: Controlling creativity and predictability. https://aiprompttheory.com/temperature-and-top-p-controlling-creativity-and-predictability/?utm_source=chatgpt.com
- Vellum. (n.d.). How to use JSON mode. https://www.vellum.ai/llm-parameters/json-mode?utm_source=www.vellum.ai&utm_medium=referral
- OpenAI. (2025, August). What are tokens and how to count them? https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
- Milvus. (n.d.). What are benchmark datasets in machine learning, and where can I find them? https://milvus.io/ai-quick-reference/what-are-benchmark-datasets-in-machine-learning-and-where-can-i-find-them