Ramya Perumal

Posted on Jun 8 • Edited on Jun 9

RAG - Prompt Engineering

#ai #beginners #rag #nlp

Prompt engineering is the process of designing and structuring prompts to get better results from an LLM.

In a RAG application, a prompt template typically contains:

User query
Retrieved documents from the vector database
Additional context or instructions

The quality of the prompt plays a major role in determining the quality of the response generated by the LLM.

There are several prompting techniques that can be used depending on the use case.

Zero-Shot Prompting

In zero-shot prompting, only the query or instruction is provided to the LLM without any examples.

The model generates a response based on its pre-trained knowledge and the given prompt.

Example

Prompt:
How do I make tea?

No examples are provided.

The LLM generates the answer directly.

One-Shot and Few-Shot Prompting

Providing examples helps the LLM understand the expected format and style of the response.

One-Shot Prompting

In one-shot prompting, a single example is provided along with the query.

Example

Prompt:

How to make coffee?

Step 1: Boil water
Step 2: Add coffee powder
Step 3: Mix well
Step 4: Serve

How to make tea?

The LLM will likely generate the tea-making instructions using the same format.

Few-Shot Prompting

In few-shot prompting, multiple examples are provided before the actual query.

The model learns the expected structure, style, and pattern from the examples and generates responses accordingly.

Advantage
Better formatting consistency
Improved accuracy
Better task understanding

Disadvantage
Higher token consumption
Increased cost and latency

System Prompting

System prompting is used to define rules, constraints, and behavior for the LLM.

The model is expected to operate within these boundaries.

Examples

You must return the output in JSON format.
Do not include floating-point values in the response.
Answer using only the information provided in the context.

System prompts are commonly used in production RAG applications to control model behavior.

Role Prompting

In role prompting, the LLM is instructed to behave as a specific role, profession, or expert.

Examples

Act as a Python developer.
Act as a cybersecurity expert.
Act as a technical interviewer.

Role prompting helps the model generate responses from a particular perspective and expertise level.

Contextual Prompting

Contextual prompting provides background information to help the LLM better understand the situation and generate a more relevant response.

Example

I have an exam tomorrow, and this is a difficult subject for me.
Please answer the following question in a simple and easy-to-understand manner.

The additional context helps the model tailor its response to the user's situation.

Chain of Thought Prompting

Chain of Thought (CoT) prompting is a technique where the model is instructed to analyze the input step by step before giving the final answer.

This helps the LLM break down complex problems into smaller logical steps, leading to better reasoning and more accurate results.

Self-Consistent Prompting

In this approach, the LLM is asked to:

Try solving the same problem using multiple reasoning paths
Generate multiple possible answers
Select the answer that appears most frequently or is the most consistent

This improves reliability by reducing randomness in reasoning.

Tree of Thoughts

Tree of Thoughts is an advanced version of self-consistent prompting.

Instead of following a single reasoning path, the LLM:

Explores multiple possible solution paths
Evaluates each path
Decides which path is most promising
Expands only the best or optimal paths further

This creates a tree-like structure of reasoning, where different branches represent different thought processes.

Tree of Thoughts is useful for complex problem-solving tasks that require exploration and decision-making.

Prompt Chaining

Prompt chaining is a technique where the output of one prompt is used as the input for another prompt.

In this approach:

A problem is broken into multiple stages
Each stage is handled by a separate prompt
The result of one prompt flows into the next

This creates a pipeline of prompts, allowing complex tasks to be solved step by step in a structured manner.

Prompt chaining is commonly used in workflows where tasks need decomposition and sequential processing.

Combining Prompting Techniques

For better performance, multiple prompting techniques can be combined.

Examples

System Prompting + User Prompting
System Prompting + Few-Shot Prompting
Role Prompting + Contextual Prompting
Role Prompting + Few-Shot Prompting
System Prompting + Role Prompting + Contextual Prompting
Chain of Thought + Prompt Chaining
Self-Consistent Prompting + Few-Shot Prompting
Tree of Thoughts + Role Prompting + System Prompting

Example

You are a senior Python developer.

Answer only using the provided context.

Provide the response in JSON format.

Example:
{
"language": "Python",
"difficulty": "Easy"
}

Question:
How do Python dictionaries work?

This prompt combines:

System Prompting
Role Prompting
One-Shot Prompting

Prompt Template in RAG

A typical RAG prompt template consists of:

System Instructions
Retrieved Context/Documents
User Query

Example

You are a helpful assistant.

Context:

Question:
What is vector chunking?

Answer:

The LLM uses the retrieved documents, instructions, and user query together to generate an accurate and human-readable response.

Key Takeaway

There is no single prompting technique that works best for every scenario.

The choice depends on:

Application requirements
Cost constraints
Token limits
Desired output format
Accuracy requirements

In real-world applications, combining multiple prompting techniques often produces the best results.

ReAct (Reason + Action)

ReAct (Reasoning + Action) is a proven methodology used to improve the performance of LLMs by combining reasoning with external tool usage.

In this approach, the model not only thinks about the problem but also decides when to take action by calling external tools or functions.

Why ReAct is Needed

If we ask a question like:

“What is the current temperature?”

A standard LLM cannot directly know real-time information such as current weather or live data.

However, it can:

Understand the intent of the question
Identify that external information is required
Decide to use an available tool (e.g., weather API)
Use the tool output to generate the final response

How ReAct Works

ReAct follows a loop of:

1. Reasoning

The LLM analyzes the question and determines what is needed.

What is the user asking?
Do I already know the answer?
Do I need external data?

2. Action

If external data is required, the model selects an appropriate tool or function.

Examples of tools:

Weather API
Calculator
Search engine
Database query

3. Observation

The tool returns results, and the LLM observes the output.

4. Final Answer Generation

The LLM combines:

Reasoning
Tool output
Context

and generates the final human-readable response.

Example

User Query:

“What is the current temperature in Chennai?”

Step 1: Reasoning

The model understands that this requires real-time data.

Step 2: Action

It calls a weather API tool.

Step 3: Observation

Tool returns:
“32°C, partly cloudy”

Step 4: Final Answer

“The current temperature in Chennai is 32°C with partly cloudy conditions.”

Key Idea of ReAct

ReAct allows LLMs to:

Think (Reason)
Act (Use tools)
Improve accuracy using real-world data

Benefits of ReAct

Reduces hallucination
Enables real-time information access
Improves reasoning accuracy
Makes LLMs more agent-like

Where to use
Research Activities
Troubleshooting in Kubernetes
Support in Devops

DEV Community

RAG - Prompt Engineering

Zero-Shot Prompting

Example

One-Shot Prompting

Example

Few-Shot Prompting

System Prompting

Examples

Role Prompting

Examples

Contextual Prompting

Example

Chain of Thought Prompting

Self-Consistent Prompting

Tree of Thoughts

Prompt Chaining

Combining Prompting Techniques

Examples

Example

Prompt Template in RAG

Example

Key Takeaway

ReAct (Reason + Action)

Why ReAct is Needed

How ReAct Works

Example

Top comments (0)