📋 Tech Acronyms Reference
Quick reference for acronyms used in this article:
- API - Application Programming Interface
- CoT - Chain-of-Thought
- JSON - JavaScript Object Notation
- LLM - Large Language Model
- NLP - Natural Language Processing
- ReAct - Reasoning and Acting
- ROI - Return on Investment
- XML - Extensible Markup Language
🎯 Introduction: The Art and Science of Prompting
You have the same Large Language Model (LLM). Same parameters. Same context window.
But your results vary wildly:
- One prompt: Perfect, structured output
- Another prompt: Vague, incomplete, wrong format
The difference? Prompt engineering.
Prompt engineering is how you communicate with LLMs to get consistent, high-quality results. It's not magic. It's a set of patterns that work reliably across models and use cases.
Real-Life Analogy: Giving Directions
Bad directions:
"Go that way, turn somewhere, you'll see the place."
Good directions:
"Walk north for 2 blocks. Turn right at the traffic light. The coffee shop is on the left, blue awning, next to a pharmacy."
Same destination. Different clarity. Different success rate.
Prompt engineering is about precision, structure, and clarity.
💡 Data Engineer's ROI Lens
For this article, we're focusing on:
- Which prompting patterns exist? (Zero-shot, few-shot, chain-of-thought)
- When should I use each? (Cost, quality, consistency trade-offs)
- How do I build production-ready prompts? (Templates, system messages, validation)
These patterns directly impact reliability and cost at scale.
🎲 Part 1: Shot-Based Prompting Strategies
Zero-Shot: Just Ask
The Pattern: Give the task with no examples.
prompt = """
Classify the sentiment of this review as positive, negative, or neutral:
Review: "The product arrived quickly but the quality was disappointing."
Sentiment:
"""
Real-Life Analogy: The Expert Hire
You hire an expert programmer and say: "Build me a REST API for user authentication."
They know what REST is, what authentication means, what best practices are. No examples needed.
Zero-shot assumes the LLM already knows the task pattern.
When to Use:
- ✅ Well-known tasks (translation, summarization, sentiment analysis)
- ✅ Cost-sensitive applications (fewer tokens)
- ✅ Simple, unambiguous instructions
When NOT to Use:
- ❌ Custom formats or domain-specific outputs
- ❌ Tasks requiring specific structure
- ❌ When zero-shot gives inconsistent results across queries
Code Example: Zero-Shot Classification
from litellm import completion
def zero_shot_classify(text: str, model: str = "gpt-4") -> str:
"""Zero-shot sentiment classification"""
prompt = f"""
Classify the sentiment of this review as positive, negative, or neutral.
Respond with only one word.
Review: "{text}"
Sentiment:"""
response = completion(
model=model,
messages=[{"role": "user", "content": prompt}],
temperature=0.0 # Deterministic
)
return response.choices[0].message.content.strip()
# Test
reviews = [
"Amazing product! Worth every penny.",
"Terrible experience. Would not recommend.",
"It's okay, nothing special."
]
for review in reviews:
sentiment = zero_shot_classify(review)
print(f"Review: {review}")
print(f"Sentiment: {sentiment}\n")
Output:
Review: Amazing product! Worth every penny.
Sentiment: positive
Review: Terrible experience. Would not recommend.
Sentiment: negative
Review: It's okay, nothing special.
Sentiment: neutral
Cost Analysis:
- Input tokens: ~50 per request
- Zero-shot is the cheapest prompting strategy
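To make these trade-offs concrete, here is a minimal cost-estimation sketch. The per-1K-token prices below are placeholder assumptions for illustration, not any provider's actual pricing; plug in the rates for the model you use.
def estimate_request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float = 0.03,   # assumed illustrative price
    output_price_per_1k: float = 0.06,  # assumed illustrative price
) -> float:
    """Rough cost of a single request, in dollars."""
    return (input_tokens / 1000) * input_price_per_1k + \
           (output_tokens / 1000) * output_price_per_1k

# Zero-shot classification: ~50 input tokens, a one-word label back
print(f"~${estimate_request_cost(50, 1):.5f} per request")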
One-Shot: Show One Example
The Pattern: Give one example before the actual task.
prompt = """
Classify the sentiment of reviews.
Example:
Review: "Great service and fast delivery!"
Sentiment: positive
Now classify this:
Review: "The product arrived quickly but the quality was disappointing."
Sentiment:
"""
Real-Life Analogy: The Demo
Before asking someone to format a document, you show them one completed page:
"See this? Title in bold, 14pt. Body text in 11pt, left-aligned. Do the same for these 10 pages."
One example clarifies expectations.
When to Use:
- ✅ Custom output formats
- ✅ Domain-specific tasks with unusual patterns
- ✅ When zero-shot gives inconsistent results
Code Example: One-Shot Entity Extraction
def one_shot_extract_entities(text: str) -> dict:
"""Extract entities with one example"""
prompt = f"""
Extract person names, companies, and locations from text.
Example:
Text: "John Smith from Acme Corp visited Tokyo last week."
Output: {{"person": ["John Smith"], "company": ["Acme Corp"], "location": ["Tokyo"]}}
Now extract from this:
Text: "{text}"
Output:"""
response = completion(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.0
)
import json
return json.loads(response.choices[0].message.content)
# Test
import json  # needed at module level for json.dumps below

text = "Sarah Johnson and Michael Chen from DataTech Inc. met in San Francisco."
entities = one_shot_extract_entities(text)
print(json.dumps(entities, indent=2))
Output:
{
"person": ["Sarah Johnson", "Michael Chen"],
"company": ["DataTech Inc."],
"location": ["San Francisco"]
}
Cost Analysis:
- Input tokens: ~100-150 per request (includes example)
- ~2-3x more expensive than zero-shot
- Worth it for consistent formatting
Few-Shot: Multiple Examples
The Pattern: Give 3-5 examples to establish a clear pattern.
Real-Life Analogy: The Training Session
You're teaching customer support reps how to respond to complaints:
"Here's how we handled 5 different complaint types. Notice the pattern:
- Acknowledge the issue
- Apologize sincerely
- Offer specific solution
- Provide timeline
Now you try with these new complaints."
Multiple examples teach the pattern clearly.
When to Use:
- ✅ Complex tasks with subtle patterns
- ✅ Custom formats that need consistency
- ✅ Domain-specific classification with edge cases
- ✅ When zero-shot and one-shot give unreliable results
Code Example: Few-Shot Custom Classification
def few_shot_classify_support_ticket(ticket: str) -> dict:
"""Classify support tickets into categories with priority"""
prompt = f"""
Classify support tickets into category and priority.
Example 1:
Ticket: "Cannot log in, tried resetting password 3 times"
Output: {{"category": "authentication", "priority": "high", "reason": "blocks user access"}}
Example 2:
Ticket: "The button color doesn't match our brand guidelines"
Output: {{"category": "ui", "priority": "low", "reason": "cosmetic issue"}}
Example 3:
Ticket: "Data export has been running for 6 hours, still not complete"
Output: {{"category": "performance", "priority": "high", "reason": "impacts operations"}}
Example 4:
Ticket: "Would like to add a new report type to the dashboard"
Output: {{"category": "feature_request", "priority": "medium", "reason": "enhancement, not urgent"}}
Now classify this:
Ticket: "{ticket}"
Output:"""
response = completion(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.0
)
import json
return json.loads(response.choices[0].message.content)
# Test
tickets = [
"Payment processing failed, customers can't complete checkout",
"Would be nice to have dark mode",
"API returning 500 errors for the past hour"
]
for ticket in tickets:
result = few_shot_classify_support_ticket(ticket)
print(f"Ticket: {ticket}")
print(f"Category: {result['category']}")
print(f"Priority: {result['priority']}")
print(f"Reason: {result['reason']}\n")
Output:
Ticket: Payment processing failed, customers can't complete checkout
Category: payment
Priority: high
Reason: prevents revenue generation
Ticket: Would be nice to have dark mode
Category: feature_request
Priority: low
Reason: nice-to-have enhancement
Ticket: API returning 500 errors for the past hour
Category: technical
Priority: high
Reason: service disruption
Cost Analysis:
- Input tokens: ~300-500 per request (includes 4 examples)
- ~6-10x more expensive than zero-shot
- Worth it for critical classification tasks
Pattern Comparison: When to Use Each
| Pattern | Examples | Cost | Accuracy | Use When |
|---|---|---|---|---|
| Zero-Shot | 0 | $ | Good | Standard tasks, model knows pattern |
| One-Shot | 1 | $$ | Better | Custom formats, clarify expectations |
| Few-Shot | 3-5 | $$$$ | Best | Complex patterns, critical accuracy |
Real-World ROI Example:
Customer support ticket classifier (1M tickets/month):
| Approach | Cost/Ticket | Monthly Cost | Accuracy | Wrong Classifications/Month |
|---|---|---|---|---|
| Zero-Shot | $0.002 | $2,000 | 85% | 150,000 |
| Few-Shot | $0.010 | $10,000 | 96% | 40,000 |
Few-shot costs 5x more but reduces errors by 73%.
For a team where each misclassification costs 10 minutes of manual review ($5 in labor):
- Zero-shot: $2,000 (API) + $750,000 (labor fixing errors) = $752,000
- Few-shot: $10,000 (API) + $200,000 (labor fixing errors) = $210,000
Few-shot saves $542,000 per month despite being "more expensive."
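The arithmetic behind that comparison, as a quick sanity check (all figures are the illustrative ones from the tables above, and the totals are per month):
TICKETS_PER_MONTH = 1_000_000
LABOR_COST_PER_ERROR = 5  # ~10 minutes of manual review

def monthly_total_cost(api_cost_per_ticket: float, accuracy: float) -> float:
    """API spend plus labor spent fixing misclassifications."""
    api_cost = api_cost_per_ticket * TICKETS_PER_MONTH
    labor_cost = (1 - accuracy) * TICKETS_PER_MONTH * LABOR_COST_PER_ERROR
    return api_cost + labor_cost

zero_shot = monthly_total_cost(0.002, 0.85)  # $752,000
few_shot = monthly_total_cost(0.010, 0.96)   # $210,000
print(f"Few-shot saves ${zero_shot - few_shot:,.0f} per month")  # $542,000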
🧠 Part 2: Chain-of-Thought (CoT) Reasoning
The Problem: Complex Multi-Step Reasoning
LLMs can fail on problems requiring multiple reasoning steps:
prompt = "If a train leaves Chicago at 10am going 60mph, and another leaves New York at 11am going 80mph, and they're 800 miles apart, when do they meet?"
# Often gets wrong answer without showing work
Real-Life Analogy: Show Your Work
Remember math class? Teachers always said "show your work."
Not because they didn't trust your answer, but because breaking down steps catches errors and makes the reasoning auditable.
Chain-of-Thought does the same for LLMs.
Basic Chain-of-Thought Pattern
The Pattern: Ask the LLM to explain its reasoning step-by-step.
def chain_of_thought_solve(problem: str) -> dict:
"""Solve problems with explicit reasoning steps"""
prompt = f"""
Solve this problem step by step. Show your reasoning.
Problem: {problem}
Let's think through this step by step:
1."""
response = completion(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.0
)
return {"reasoning": response.choices[0].message.content}
# Test
problem = """
A company has 150 employees. 60% work remotely. Of the remote workers,
40% work from home, while the rest work from co-working spaces.
How many employees work from co-working spaces?
"""
result = chain_of_thought_solve(problem)
print(result["reasoning"])
Output:
Let's think through this step by step:
1. Total employees: 150
2. Remote workers: 150 × 0.60 = 90 employees
3. Remote workers from home: 90 × 0.40 = 36 employees
4. Remote workers from co-working spaces: 90 - 36 = 54 employees
Answer: 54 employees work from co-working spaces.
Zero-Shot Chain-of-Thought (Magic Phrase)
Research showed that simply adding "Let's think step by step" dramatically improves reasoning:
def zero_shot_cot(problem: str) -> str:
"""Zero-shot Chain-of-Thought with magic phrase"""
prompt = f"""
{problem}
Let's think step by step.
"""
response = completion(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.0
)
return response.choices[0].message.content
# Test
problem = "If 5 machines make 5 widgets in 5 minutes, how long does it take 100 machines to make 100 widgets?"
answer = zero_shot_cot(problem)
print(answer)
Output:
Let's think step by step:
1. First, let's understand what we know:
- 5 machines make 5 widgets in 5 minutes
2. This means each machine makes 1 widget in 5 minutes
- Rate per machine = 1 widget / 5 minutes
3. With 100 machines:
- Each of the 100 machines makes 1 widget in 5 minutes
- All 100 machines working simultaneously will make 100 widgets
4. Therefore: 100 machines make 100 widgets in 5 minutes.
Answer: 5 minutes
The phrase "Let's think step by step" triggers systematic reasoning.
Few-Shot Chain-of-Thought
Combine few-shot examples with Chain-of-Thought (CoT) reasoning:
def few_shot_cot_math(problem: str) -> str:
"""Few-shot CoT for math word problems"""
prompt = f"""
Solve math word problems by thinking step by step.
Example 1:
Problem: A store had 25 apples. They sold 8 in the morning and 7 in the afternoon. How many apples are left?
Reasoning:
- Started with: 25 apples
- Sold in morning: 8 apples
- Sold in afternoon: 7 apples
- Total sold: 8 + 7 = 15 apples
- Remaining: 25 - 15 = 10 apples
Answer: 10 apples
Example 2:
Problem: A recipe needs 3 eggs per batch. If you want to make 4 batches, how many eggs do you need?
Reasoning:
- Eggs per batch: 3
- Number of batches: 4
- Total eggs needed: 3 × 4 = 12 eggs
Answer: 12 eggs
Now solve this:
Problem: {problem}
Reasoning:"""
response = completion(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.0
)
return response.choices[0].message.content
# Test
problem = "A parking lot has 5 rows with 12 spaces each. If 47 cars are parked, how many empty spaces are there?"
result = few_shot_cot_math(problem)
print(result)
Output:
Reasoning:
- Rows: 5
- Spaces per row: 12
- Total spaces: 5 × 12 = 60 spaces
- Cars parked: 47
- Empty spaces: 60 - 47 = 13 spaces
Answer: 13 empty spaces
🔄 Part 3: ReAct (Reasoning + Acting)
The Pattern: Interleave Thinking and Actions
ReAct combines:
- Thought: What should I do next?
- Action: Execute a tool/function
- Observation: What was the result?
- Repeat until problem is solved
Real-Life Analogy: The Detective
A detective solving a case doesn't just think; they alternate between reasoning and action:
Thought: "The suspect claims he was at the movies. I should verify this."
Action: Check movie theater records
Observation: "Receipt shows ticket purchased at 8:47 PM"
Thought: "But the crime occurred at 8:30 PM, so he has an alibi. Let me check another angle..."
Action: Check traffic cameras
Observation: "Video shows his car near the crime scene at 8:25 PM"
Thought: "He couldn't have been in two places. Let me verify the receipt timestamp..."
This interleaving of reasoning and action is ReAct.
ReAct Implementation
from typing import List, Dict
def react_agent(
query: str,
tools: Dict[str, callable],
max_iterations: int = 5
) -> str:
"""
ReAct agent that alternates between reasoning and action.
Args:
query: User's question
tools: Dictionary of available tools/functions
max_iterations: Maximum reasoning loops
"""
# Build tool descriptions
tool_descriptions = "\n".join([
f"- {name}: {func.__doc__}"
for name, func in tools.items()
])
conversation_history = []
iteration = 0
system_prompt = f"""
You are a helpful assistant that solves problems by alternating between reasoning and taking actions.
Available tools:
{tool_descriptions}
For each step, use this format:
Thought: [Your reasoning about what to do next]
Action: [tool_name: arguments]
Observation: [I will provide the result]
When you have the final answer:
Thought: I now have all the information needed
Answer: [Your final answer]
"""
conversation_history.append({
"role": "system",
"content": system_prompt
})
conversation_history.append({
"role": "user",
"content": f"Question: {query}"
})
while iteration < max_iterations:
iteration += 1
# Get LLM response
response = completion(
model="gpt-4",
messages=conversation_history,
temperature=0.0
)
agent_response = response.choices[0].message.content
print(f"\n{'='*60}")
print(f"Iteration {iteration}")
print(f"{'='*60}")
print(agent_response)
conversation_history.append({
"role": "assistant",
"content": agent_response
})
# Check if agent is done
if "Answer:" in agent_response:
# Extract final answer
answer = agent_response.split("Answer:")[1].strip()
return answer
# Parse action
if "Action:" in agent_response:
action_line = [line for line in agent_response.split("\n") if "Action:" in line][0]
action_part = action_line.split("Action:")[1].strip()
# Parse tool name and arguments
if ":" in action_part:
tool_name, args_str = action_part.split(":", 1)
tool_name = tool_name.strip()
args_str = args_str.strip()
# Execute tool
if tool_name in tools:
try:
result = tools[tool_name](args_str)
observation = f"Observation: {result}"
except Exception as e:
observation = f"Observation: Error - {str(e)}"
else:
observation = f"Observation: Tool '{tool_name}' not found"
print(f"\n{observation}")
conversation_history.append({
"role": "user",
"content": observation
})
return "Maximum iterations reached without finding answer"
# Define tools
def search_database(query: str) -> str:
"""Search the customer database for information"""
# Simulated database
database = {
"order ORD-123": "Status: Shipped, Tracking: TRK-789, ETA: 2024-03-15",
"customer john@email.com": "Account status: Active, Orders: 5, Last order: 2024-03-01",
"product SKU-456": "Price: $29.99, Stock: 150, Rating: 4.5/5"
}
for key, value in database.items():
if query.lower() in key.lower():
return value
return "No results found"
def calculate(expression: str) -> str:
"""Perform mathematical calculations"""
try:
result = eval(expression)
return str(result)
except Exception as e:
return f"Calculation error: {str(e)}"
def get_current_date(_: str = "") -> str:
    """Get the current date"""
    # Accepts and ignores an argument because the ReAct loop calls every tool with one
    from datetime import datetime
    return datetime.now().strftime("%Y-%m-%d")
# Run ReAct agent
tools = {
"search_database": search_database,
"calculate": calculate,
"get_current_date": get_current_date
}
query = "What's the status of order ORD-123, and how many days until it arrives?"
answer = react_agent(query, tools)
print(f"\n{'='*60}")
print(f"FINAL ANSWER: {answer}")
print(f"{'='*60}")
Output:
============================================================
Iteration 1
============================================================
Thought: I need to find the status of order ORD-123 first.
Action: search_database: order ORD-123
Observation: Status: Shipped, Tracking: TRK-789, ETA: 2024-03-15
============================================================
Iteration 2
============================================================
Thought: I have the order status and ETA. Now I need to find out how many days until March 15, 2024. First, I need to get the current date.
Action: get_current_date:
Observation: 2024-03-10
============================================================
Iteration 3
============================================================
Thought: Current date is March 10, 2024, and the ETA is March 15, 2024. I need to calculate the difference.
Action: calculate: 15 - 10
Observation: 5
============================================================
Iteration 4
============================================================
Thought: I now have all the information needed.
Answer: Order ORD-123 has been shipped with tracking number TRK-789. It will arrive in 5 days, with an estimated delivery date of March 15, 2024.
============================================================
FINAL ANSWER: Order ORD-123 has been shipped with tracking number TRK-789. It will arrive in 5 days, with an estimated delivery date of March 15, 2024.
============================================================
ReAct is powerful because it combines:
- ✅ Explicit reasoning (auditable decisions)
- ✅ Tool use (access to external data/functions)
- ✅ Self-correction (can try different approaches)
📝 Part 4: Production Prompt Templates
System Messages: Setting the Context
System messages define the LLM's role, behavior, and constraints:
def create_support_agent(model: str = "gpt-4"):
"""Customer support agent with system message"""
system_message = """
You are a helpful customer support agent for TechShop, an e-commerce company.
Guidelines:
- Always be polite and professional
- Use the customer's name when available
- For order issues, ask for order ID
- For technical problems, gather: device, OS, browser
- Escalate to human agent if: payment issues, angry customer, legal concerns
- Never make promises about refunds without checking eligibility
- Keep responses concise (under 100 words unless details needed)
Tone: Friendly but professional
"""
def respond(user_message: str) -> str:
response = completion(
model=model,
messages=[
{"role": "system", "content": system_message},
{"role": "user", "content": user_message}
],
temperature=0.7
)
return response.choices[0].message.content
return respond
# Test
agent = create_support_agent()
messages = [
"I'm having trouble logging in",
"My order still hasn't arrived and it's been 3 weeks!",
"How do I reset my password?"
]
for msg in messages:
print(f"User: {msg}")
print(f"Agent: {agent(msg)}\n")
Structured Output Templates
Use delimiters and formatting for consistent parsing:
def extract_structured_data(text: str) -> dict:
"""Extract data with XML-style delimiters"""
prompt = f"""
Extract information from this text and format as XML:
Text: {text}
Output format:
<person>name</person>
<company>company name</company>
<role>job title</role>
<location>city, country</location>
<email>email address</email>
If any field is not found, use <field>UNKNOWN</field>
Output:"""
response = completion(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.0
)
import re
output = response.choices[0].message.content
# Parse XML-like output
def extract_field(field_name: str) -> str:
pattern = f"<{field_name}>(.*?)</{field_name}>"
match = re.search(pattern, output)
return match.group(1) if match else "UNKNOWN"
return {
"person": extract_field("person"),
"company": extract_field("company"),
"role": extract_field("role"),
"location": extract_field("location"),
"email": extract_field("email")
}
# Test
import json  # needed here for json.dumps below

text = "Sarah Johnson is the VP of Engineering at DataCorp in Austin, Texas. Contact her at sjohnson@datacorp.com"
result = extract_structured_data(text)
print(json.dumps(result, indent=2))
Output:
{
"person": "Sarah Johnson",
"company": "DataCorp",
"role": "VP of Engineering",
"location": "Austin, Texas",
"email": "sjohnson@datacorp.com"
}
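The introduction promised validation alongside templates and system messages. A minimal sketch of one common approach, re-calling the extractor when required fields come back missing; the required fields and retry count here are illustrative choices, not a fixed recipe:
def extract_with_validation(text: str, max_retries: int = 1) -> dict:
    """Run the extractor and retry once if mandatory fields are UNKNOWN or parsing fails."""
    required_fields = ["person", "email"]  # illustrative: pick the fields your pipeline depends on
    last_result: dict = {}
    for _ in range(max_retries + 1):
        try:
            last_result = extract_structured_data(text)
        except Exception:
            continue  # malformed output; try again
        if all(last_result.get(field, "UNKNOWN") != "UNKNOWN" for field in required_fields):
            return last_result
    return last_result  # caller can route low-confidence results to manual review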
Template Variables and Reusability
class PromptTemplate:
"""Reusable prompt template with variables"""
def __init__(self, template: str):
self.template = template
def format(self, **kwargs) -> str:
"""Replace {variables} with actual values"""
return self.template.format(**kwargs)
# Define templates
SUMMARIZATION_TEMPLATE = PromptTemplate("""
Summarize the following {document_type} in {num_sentences} sentences.
Focus on {focus_area}.
{document_type}:
{content}
Summary:
""")
TRANSLATION_TEMPLATE = PromptTemplate("""
Translate the following text from {source_lang} to {target_lang}.
Maintain {tone} tone.
Text: {text}
Translation:
""")
CODE_REVIEW_TEMPLATE = PromptTemplate("""
Review this {language} code for:
- Bugs and errors
- Performance issues
- Security vulnerabilities
- Code style and best practices
Code:
{language}
{code}
Review:
""")
# Use templates
prompt1 = SUMMARIZATION_TEMPLATE.format(
document_type="research paper",
num_sentences=3,
focus_area="key findings and methodology",
content="[long paper content here]"
)
prompt2 = CODE_REVIEW_TEMPLATE.format(
language="python",
code="def process_data(data):\n return [x*2 for x in data]"
)
print("Prompt 1:", prompt1[:100], "...")
print("\nPrompt 2:", prompt2[:100], "...")
🎯 Conclusion: Prompt Engineering as a Production Skill
Prompt engineering isn't about "prompt hacking" or finding magic words. It's about systematic patterns that produce consistent, high-quality results at scale.
The Business Impact:
💰 Cost:
- Few-shot costs 5-10x more than zero-shot
- But reduces downstream errors by 70%+
- ROI is in labor saved, not tokens saved
📈 Quality:
- Chain-of-Thought improves reasoning accuracy by 30-50%
- ReAct enables complex multi-step workflows
- Structured templates ensure parseability
⚡ Performance:
- System messages set behavior once (not per-request)
- Templates enable reusability and consistency
- Few-shot enables reliable automation of complex tasks
Key Takeaways for Data Engineers
On Shot-Based Prompting:
- Zero-shot: Cheapest, use for standard tasks
- Few-shot: More expensive, but worth it for critical accuracy
- Always test: what works for GPT-4 may not work for Claude
- Action: Start zero-shot, upgrade to few-shot only when accuracy demands it
- ROI Impact: ~$542K/month savings in the example above (few-shot vs fixing zero-shot errors)
On Chain-of-Thought:
- "Let's think step by step" dramatically improves reasoning
- Essential for math, logic, multi-step problems
- Makes reasoning auditable (see the work, not just the answer)
- Action: Use CoT for any task requiring >2 reasoning steps
- ROI Impact: Prevents costly logic errors in production
On ReAct:
- Interleaves reasoning and tool use
- Enables complex agentic workflows
- Self-corrects when actions fail
- Action: Use for tasks requiring multiple tool calls and decision-making
- ROI Impact: Automates workflows previously requiring human orchestration
On Production Patterns:
- System messages define consistent behavior
- Templates enable reusability
- Structured outputs (XML, JSON) simplify parsing
- Action: Build a prompt template library for your use cases
- ROI Impact: 10x faster prompt iteration, consistent results
The Prompt Engineering ROI Pattern
Every decision follows this pattern:
- Start simple β Zero-shot first
- Add examples when needed β Few-shot for consistency
- Add reasoning when complex β CoT for multi-step tasks
- Add tools when needed β ReAct for actions
- Templatize for production β System messages + templates
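As a rough sketch of what that escalation looks like in code, assuming you have a small labeled evaluation set from your own data (the accuracy target and the strategy list are assumptions to tune):
def choose_prompt_strategy(eval_set, strategies, accuracy_target: float = 0.95) -> str:
    """Try the cheapest strategy first; escalate only when measured accuracy falls short.

    eval_set: list of (input_text, expected_label) pairs from YOUR data
    strategies: ordered list of (name, classify_fn) pairs, cheapest first
    """
    for name, classify_fn in strategies:
        correct = sum(1 for text, expected in eval_set if classify_fn(text) == expected)
        accuracy = correct / len(eval_set)
        print(f"{name}: {accuracy:.0%} accuracy")
        if accuracy >= accuracy_target:
            return name
    return strategies[-1][0]  # nothing hit the target; use the most capable option

# Example usage with classifiers defined earlier in this article:
# best = choose_prompt_strategy(labeled_reviews, [("zero-shot", zero_shot_classify)])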
Real-World Example:
Legal document analysis company:
Before prompt engineering:
- Generic prompts: "Summarize this contract"
- 60% accuracy on key clause extraction
- Manual review of all outputs (10 min/document)
- 1000 docs/month = 167 hours/month labor
After prompt engineering:
- Few-shot with examples of good extractions
- CoT reasoning for clause interpretation
- Structured XML output templates
- 94% accuracy on key clause extraction
- Manual review only for low-confidence (15% of docs)
- 1000 docs/month = 25 hours/month labor
Result: 85% reduction in manual review time.
This is why prompt engineering matters. Not for clever tricks, but for systematic, reliable, production-quality results.
Start with zero-shot. Upgrade strategically. Template everything. Test on YOUR data.