DEV Community

Ramya Perumal
Ramya Perumal

Posted on • Edited on

RAG - Prompt Engineering

Prompt engineering is the process of designing and structuring prompts to get better results from an LLM.

In a RAG application, a prompt template typically contains:

  • User query
  • Retrieved documents from the vector database
  • Additional context or instructions

The quality of the prompt plays a major role in determining the quality of the response generated by the LLM.

There are several prompting techniques that can be used depending on the use case.

Zero-Shot Prompting

In zero-shot prompting, only the query or instruction is provided to the LLM without any examples.

The model generates a response based on its pre-trained knowledge and the given prompt.

Example

Prompt:
How do I make tea?

No examples are provided.

The LLM generates the answer directly.

One-Shot and Few-Shot Prompting

Providing examples helps the LLM understand the expected format and style of the response.

One-Shot Prompting

In one-shot prompting, a single example is provided along with the query.

Example

Prompt:

How to make coffee?

Step 1: Boil water
Step 2: Add coffee powder
Step 3: Mix well
Step 4: Serve

How to make tea?

The LLM will likely generate the tea-making instructions using the same format.

Few-Shot Prompting

In few-shot prompting, multiple examples are provided before the actual query.

The model learns the expected structure, style, and pattern from the examples and generates responses accordingly.

Advantage
Better formatting consistency
Improved accuracy
Better task understanding

Disadvantage
Higher token consumption
Increased cost and latency

System Prompting

System prompting is used to define rules, constraints, and behavior for the LLM.

The model is expected to operate within these boundaries.

Examples

You must return the output in JSON format.
Do not include floating-point values in the response.
Answer using only the information provided in the context.

System prompts are commonly used in production RAG applications to control model behavior.

Role Prompting

In role prompting, the LLM is instructed to behave as a specific role, profession, or expert.

Examples

Act as a Python developer.
Act as a cybersecurity expert.
Act as a technical interviewer.

Role prompting helps the model generate responses from a particular perspective and expertise level.

Contextual Prompting

Contextual prompting provides background information to help the LLM better understand the situation and generate a more relevant response.

Example

I have an exam tomorrow, and this is a difficult subject for me.
Please answer the following question in a simple and easy-to-understand manner.

The additional context helps the model tailor its response to the user's situation.

Chain of Thought Prompting

Chain of Thought (CoT) prompting is a technique where the model is instructed to analyze the input step by step before giving the final answer.

This helps the LLM break down complex problems into smaller logical steps, leading to better reasoning and more accurate results.

Self-Consistent Prompting

In this approach, the LLM is asked to:

  • Try solving the same problem using multiple reasoning paths
  • Generate multiple possible answers
  • Select the answer that appears most frequently or is the most consistent

This improves reliability by reducing randomness in reasoning.

Tree of Thoughts

Tree of Thoughts is an advanced version of self-consistent prompting.

Instead of following a single reasoning path, the LLM:

  • Explores multiple possible solution paths
  • Evaluates each path
  • Decides which path is most promising
  • Expands only the best or optimal paths further

This creates a tree-like structure of reasoning, where different branches represent different thought processes.

Tree of Thoughts is useful for complex problem-solving tasks that require exploration and decision-making.

Prompt Chaining

Prompt chaining is a technique where the output of one prompt is used as the input for another prompt.

In this approach:

  • A problem is broken into multiple stages
  • Each stage is handled by a separate prompt
  • The result of one prompt flows into the next

This creates a pipeline of prompts, allowing complex tasks to be solved step by step in a structured manner.

Prompt chaining is commonly used in workflows where tasks need decomposition and sequential processing.

Combining Prompting Techniques

For better performance, multiple prompting techniques can be combined.

Examples

  • System Prompting + User Prompting
  • System Prompting + Few-Shot Prompting
  • Role Prompting + Contextual Prompting
  • Role Prompting + Few-Shot Prompting
  • System Prompting + Role Prompting + Contextual Prompting
  • Chain of Thought + Prompt Chaining
  • Self-Consistent Prompting + Few-Shot Prompting
  • Tree of Thoughts + Role Prompting + System Prompting

Example

You are a senior Python developer.

Answer only using the provided context.

Provide the response in JSON format.

Example:
{
"language": "Python",
"difficulty": "Easy"
}

Question:
How do Python dictionaries work?

This prompt combines:

  • System Prompting
  • Role Prompting
  • One-Shot Prompting

Prompt Template in RAG

A typical RAG prompt template consists of:

  • System Instructions
  • Retrieved Context/Documents
  • User Query

Example

You are a helpful assistant.

Context:

Question:
What is vector chunking?

Answer:

The LLM uses the retrieved documents, instructions, and user query together to generate an accurate and human-readable response.

Key Takeaway

There is no single prompting technique that works best for every scenario.

The choice depends on:

  • Application requirements
  • Cost constraints
  • Token limits
  • Desired output format
  • Accuracy requirements

In real-world applications, combining multiple prompting techniques often produces the best results.

ReAct (Reason + Action)

ReAct (Reasoning + Action) is a proven methodology used to improve the performance of LLMs by combining reasoning with external tool usage.

In this approach, the model not only thinks about the problem but also decides when to take action by calling external tools or functions.

Why ReAct is Needed

If we ask a question like:

“What is the current temperature?”

A standard LLM cannot directly know real-time information such as current weather or live data.

However, it can:

  • Understand the intent of the question
  • Identify that external information is required
  • Decide to use an available tool (e.g., weather API)
  • Use the tool output to generate the final response

How ReAct Works

ReAct follows a loop of:

1. Reasoning

The LLM analyzes the question and determines what is needed.

  • What is the user asking?
  • Do I already know the answer?
  • Do I need external data?

2. Action

If external data is required, the model selects an appropriate tool or function.

Examples of tools:

  • Weather API
  • Calculator
  • Search engine
  • Database query

3. Observation

The tool returns results, and the LLM observes the output.

4. Final Answer Generation

The LLM combines:

  • Reasoning
  • Tool output
  • Context

and generates the final human-readable response.

Example

User Query:

“What is the current temperature in Chennai?”

Step 1: Reasoning

The model understands that this requires real-time data.

Step 2: Action

It calls a weather API tool.

Step 3: Observation

Tool returns:
“32°C, partly cloudy”

Step 4: Final Answer

“The current temperature in Chennai is 32°C with partly cloudy conditions.”

Key Idea of ReAct

ReAct allows LLMs to:

  • Think (Reason)
  • Act (Use tools)
  • Improve accuracy using real-world data

Benefits of ReAct

  • Reduces hallucination
  • Enables real-time information access
  • Improves reasoning accuracy
  • Makes LLMs more agent-like

Where to use
Research Activities
Troubleshooting in Kubernetes
Support in Devops

Top comments (0)