Introduction
As AI continues to evolve, mastering how we prompt and guide language models has become just as important as the models themselves. This chapter explores cutting-edge prompting strategies and vector-based text representations that significantly enhance the capabilities of modern AI systems.
From adjusting randomness with temperature and top-P sampling to guiding AI thought processes with techniques like Chain-of-Thought (CoT) and ReAct prompting, we unlock ways to improve both the creativity and reliability of AI-generated responses. You'll also discover how embeddings and cosine similarity allow us to quantify meaning and relevance between pieces of text, laying the groundwork for powerful applications in search, recommendation, and question-answering systems.
Finally, we explore Retrieval-Augmented Generation (RAG) and ChromaDB for combining traditional knowledge retrieval with generative AI, offering a practical approach to building systems that are both smart and informed.
Whether you're building chatbots, search engines, or decision-making tools, the techniques in this chapter will help you get the most out of your AI models, making them not just responsive, but truly intelligent.
Prompting Strategies
1. Understanding AI Model Configuration
When working with AI models like Gemini 2.0, configuring the model parameters is crucial for controlling the diversity, randomness, and output length. Some of the key parameters include:
Temperature
- Controls randomness in token selection.
- Higher values (e.g., 0.8–1.0) produce more diverse and creative responses.
- Lower values (e.g., 0.1–0.3) make the model more deterministic and focused.
- Setting temperature to 0 forces greedy decoding (selecting the most probable token at each step).
Top-P (Nucleus Sampling)
- Defines a probability threshold for selecting tokens.
- Top-P = 1 considers all tokens.
- Top-P < 1 restricts token selection to the most probable ones.
Example Configuration in Python
from google.genai import types  # google-genai SDK

model_config = types.GenerateContentConfig(
    temperature=0.1,      # near-deterministic, focused output
    top_p=1,              # consider the full token distribution
    max_output_tokens=5,  # keep responses very short
)
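To use this configuration, pass it to a generation call. Below is a minimal sketch assuming the google-genai SDK and the gemini-2.0-flash model ID; substitute whichever Gemini model you have access to.
from google import genai

client = genai.Client()  # reads the API key from the environment (e.g. GOOGLE_API_KEY)

response = client.models.generate_content(
    model="gemini-2.0-flash",   # assumed model ID
    contents="Name one planet in our solar system.",
    config=model_config,
)
print(response.text)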
2. Zero-Shot Prompting
Zero-shot prompting gives an AI model a task without any prior examples; the model must generate a response based solely on its training knowledge.
Example:
Prompt:
Classify movie reviews as POSITIVE, NEUTRAL, or NEGATIVE.
Review: "The movie had stunning visuals but a weak storyline."
Sentiment:
AI Response:
NEUTRAL
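As a rough sketch, the same zero-shot prompt can be sent with the client and configuration from the previous section (the model ID is again an assumption):
zero_shot_prompt = (
    "Classify movie reviews as POSITIVE, NEUTRAL, or NEGATIVE.\n"
    'Review: "The movie had stunning visuals but a weak storyline."\n'
    "Sentiment:"
)
response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model ID
    contents=zero_shot_prompt,
    config=model_config,
)
print(response.text)  # e.g. NEUTRAL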
Why Use Zero-Shot Learning?
- Requires no training data.
- Works well for basic classification and fact-based queries.
- May struggle with nuanced or domain-specific tasks.
3. Chain of Thought (CoT) Prompting
Chain of Thought (CoT) prompting enhances reasoning by making the AI model explicitly break down its thought process step by step.
Example Without CoT:
Prompt: What is 23 × 47?
AI Response: 1081
Example With CoT:
Prompt: Solve step by step: What is 23 × 47?
AI Response: First, break it down:
23 × 47 = (23 × 40) + (23 × 7)
= 920 + 161
= 1081
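In code, only the prompt needs to change. A sketch reusing the client from earlier (model ID assumed):
cot_prompt = "Solve step by step: What is 23 × 47? Show each intermediate step before the final answer."
response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model ID
    contents=cot_prompt,
)
print(response.text)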
Why CoT Prompting?
- Enhances logical reasoning
- Reduces AI hallucinations
- Makes AI outputs transparent and verifiable
4. ReAct Prompting (Reasoning + Acting)
ReAct (Reasoning + Acting) prompting is a method where the AI model:
- Thinks through the problem (reasoning).
- Performs an action (e.g., searching an external source).
Example Using Wikipedia Search in LangChain
# Package names vary by LangChain version; these imports assume the
# langchain-google-genai and langchain-community packages are installed.
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.agents import initialize_agent, AgentType

llm = ChatGoogleGenerativeAI(model="gemini-pro")
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

agent = initialize_agent(
    tools=[wikipedia],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
agent.run("Who discovered penicillin?")
Why ReAct Prompting?
- Automates reasoning and fact-finding
- Improves factual accuracy by verifying sources
- Reduces incorrect assumptions by AI
5. Thinking Mode in Gemini Flash 2.0
The experimental Thinking Mode in Gemini Flash 2.0 is designed to simulate a model's internal reasoning process before generating a final response.
How It Works:
- The AI internally brainstorms ideas before finalizing an answer.
- The API only returns the final response, but you can view the thought process in AI Studio.
Example:
Prompt: Who discovered penicillin?
Thinking Mode's Internal Thought Process:
Penicillin is an antibiotic. The discovery happened in the early 20th century. The scientist who discovered it was Alexander Fleming in 1928.
Final AI Response:
Alexander Fleming discovered penicillin in 1928.
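A minimal sketch of calling Thinking Mode with the google-genai client from earlier, assuming the experimental model ID gemini-2.0-flash-thinking-exp (check AI Studio for the current identifier):
response = client.models.generate_content(
    model="gemini-2.0-flash-thinking-exp",  # assumed experimental model ID
    contents="Who discovered penicillin?",
)
# The API returns only the final answer; the intermediate thoughts are visible in AI Studio.
print(response.text)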
Why Thinking Mode?
- Stronger reasoning capabilities without extra prompting
- Improves response accuracy
- Best for knowledge-based and analytical queries
Embeddings
1. Understanding Embeddings and Cosine Similarity
Embeddings convert text into numerical vectors, making it easy for AI models to compare texts by similarity. Each sentence is mapped to a point in a high-dimensional space.
Example Embedding Output:
| ID | Document | Embedding |
|----|----------|-----------|
| 0 | "AI is the future" | [0.12, 0.34, ...] |
| 1 | "Farming is important" | [0.56, 0.78, ...] |
Cosine Similarity measures how similar two embeddings are:
Cosine Similarity = (A ⋅ B) / (||A|| * ||B||)
- 1.0 → Identical text
- 0.0 → Completely different text
- -1.0 → Opposite meaning

Example in Python:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
a = np.array([0.12, 0.34, 0.56])
b = np.array([0.12, 0.33, 0.57])
similarity_score = cosine_similarity([a], [b])
print(similarity_score)
2. Retrieval-Augmented Generation (RAG)
RAG enhances AI models by fetching external documents to generate better responses. Steps include:
- Retrieve relevant documents from a knowledge base.
- Insert the retrieved text into a structured prompt.
- Generate a final answer by combining AI generation with retrieved content.
query = "Impact of climate change on agriculture"
# Base instruction; the retrieved passages are appended before generation (see below)
prompt = f"You are an AI assistant. Answer the question using the retrieved text.\nQUESTION: {query}\n"
3. Implementing Embeddings in ChromaDB
ChromaDB allows storing and retrieving embeddings efficiently. Example code:
import chromadb

DB_NAME = "agriculture_db"
# GeminiEmbeddingFunction is a custom adapter around the Gemini embeddings API (sketch below)
embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True  # embed texts as documents (for indexing)

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)
# `documents` is the list of text passages to index
db.add(documents=documents, ids=[str(i) for i in range(len(documents))])
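Note that GeminiEmbeddingFunction is a user-defined adapter rather than a built-in class. A minimal sketch of what it could look like, assuming the google-genai SDK, the text-embedding-004 model, and ChromaDB's EmbeddingFunction interface:
from chromadb import Documents, EmbeddingFunction, Embeddings
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

class GeminiEmbeddingFunction(EmbeddingFunction):
    # Toggle between embedding documents (indexing) and queries (search)
    document_mode = True

    def __call__(self, input: Documents) -> Embeddings:
        task = "retrieval_document" if self.document_mode else "retrieval_query"
        response = client.models.embed_content(
            model="text-embedding-004",  # assumed embedding model ID
            contents=input,
            config=types.EmbedContentConfig(task_type=task),
        )
        return [e.values for e in response.embeddings]
Setting document_mode = False before querying switches the task type so that search queries are embedded appropriately.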
4. Visualizing Similarity with Heatmaps
A heatmap represents how similar different texts are. Example:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Build a similarity matrix from the embeddings generated earlier;
# `truncated_texts` holds shortened document strings used as axis labels
df = pd.DataFrame([e.values for e in response.embeddings], index=truncated_texts)
similarity_matrix = df @ df.T  # pairwise dot products (cosine similarity for unit-length vectors)

# Plot heatmap
sns.heatmap(similarity_matrix, vmin=0, vmax=1, cmap="Greens")
plt.show()
Conclusion
This chapter provided a deep dive into advanced AI prompting techniques, embeddings, and similarity scoring. By leveraging these tools, you can build more intelligent and reliable AI applications, ensuring responses are accurate, structured, and contextually relevant.
References
https://aistudio.google.com/app/prompts
https://docs.langchain.com/docs/components/agents
https://ai.google.dev/gemini-api/docs/embed
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html
https://haystack.deepset.ai/overview/intro
https://docs.trychroma.com