This is a submission for the Agentic Postgres Challenge with Tiger Data
What I Built
I recently had a chat with a friend who works in an e-commerce company, and he was complaining that his company spent a fortune on an AI Customer Service application, only to see an increase in user complaints.
I asked him what was going on, and he showed me a real conversation:
User: "I want to return this laptop. Also, I wanted to ask, how does the warranty work if I exchange it for a new one?"
AI Customer Service: "Okay, I can process the return for you. Please provide your order number."
User: "??? What about my warranty question?"
AI Customer Service: "Sorry, how can I help you?"
This is the core problem with router-based conversational AI, which routes each message to an agent that owns a specific topic or task.
Such systems inherently fail at natural, free-form conversation -- not because of implementation details you can fix, but because of fundamental architectural constraints: a router must pick one destination per message, so a message carrying two intents (like the return plus the warranty question above) loses one of them.
In this post, we'll build a conversational AI on Parlant that can handle free-form dialogue.
Understanding Parlant's Approach
Parlant is an AI alignment engine designed specifically for natural, customer-facing conversations.
To handle free-form dialogue, Parlant introduces a Dynamic Guideline Matching mechanism: instead of routing, the engine evaluates all of your guidelines on every conversation turn and loads only the contextually relevant ones.
# Parlant approach - no routing, just conditional guidelines
await agent.create_guideline(
    condition="User asks for a refund and order amount > $500",
    action="First use the order status tool, confirm refund eligibility, then provide the detailed refund process",
    tools=[check_order_status, calculate_refund_amount],
)
# More guidelines are created the same way
# ...
Notice what's different here: there's no routing decision.
Parlant's guideline matcher evaluates all guidelines and determines which ones are contextually relevant. This is where Parlant's architecture does most of its work.
Timescale Postgres - Production-Grade Vector Database
When we delve deeper into Parlant's Dynamic Guideline Matching Mechanism, we uncover a core technical challenge:
How do we efficiently retrieve the 3-5 most relevant guidelines from potentially hundreds or even thousands of available entries, all while maintaining millisecond-level latency?
This is precisely the scenario where a Vector Database comes into play.
The Guideline Matching process in Parlant is essentially a Semantic Retrieval problem. The system converts the condition field of every guideline into a Vector Embedding. When a user input arrives, the process unfolds as follows:
- Step 1: Encode the user query and the conversation history into a Query Vector.
- Step 2: Execute the Similarity Search on the Vector Database.
- Step 3: Return the Top-K (e.g., 3-5) most relevant guidelines.
- Step 4: Inject these retrieved guidelines into the LLM's context window.
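To make the flow concrete, here is a minimal sketch of steps 1-3 against a pgvector-backed table. The guidelines table, the embed() helper, and the psycopg2-style connection are illustrative assumptions for this sketch, not Parlant's actual internals:

# Illustrative sketch of guideline retrieval (not Parlant's internals).
# Assumes a `guidelines` table with a pgvector `embedding` column and an
# embed() helper that returns an embedding for the input text.
def match_guidelines(conn, embed, user_message: str, history: str, top_k: int = 5):
    # Step 1: encode the query plus recent history into a single query vector
    query_vector = embed(f"{history}\n{user_message}")

    # Steps 2-3: cosine similarity search, return the top-k guidelines
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT condition, action,
                   1 - (embedding <=> %s::vector) AS score
            FROM guidelines
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (str(query_vector), str(query_vector), top_k),
        )
        return cur.fetchall()

# Step 4 would inject the returned (condition, action) pairs
# into the LLM's context window.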
This requires the Vector Database to possess three core capabilities:
- High-performance Approximate Nearest Neighbor (ANN) search.
- Flexible metadata filtering.
- Real-time vector updates.
Timescale Postgres provides production-grade support across all three of these dimensions.
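Because pgvector lives inside Postgres, these capabilities compose in plain SQL: an ANN scan, a metadata predicate, and ordinary UPDATE statements can share one transaction. A hypothetical sketch (the table layout and category column are assumptions for illustration):

# Illustrative: metadata filtering and real-time updates compose with ANN
# search in plain SQL. The table layout and `category` column are assumptions.
def update_and_search(conn, doc_id, new_vector, category, top_k=5):
    with conn.cursor() as cur:
        # Real-time vector update: the new embedding is searchable immediately
        cur.execute(
            "UPDATE documents SET embedding = %s::vector WHERE id = %s",
            (str(new_vector), doc_id),
        )
        # ANN search constrained by a metadata predicate
        cur.execute(
            """
            SELECT id, title, 1 - (embedding <=> %s::vector) AS score
            FROM documents
            WHERE category = %s                -- metadata filter
            ORDER BY embedding <=> %s::vector  -- ANN ordering
            LIMIT %s
            """,
            (str(new_vector), category, str(new_vector), top_k),
        )
        return cur.fetchall()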
Demo
This demo showcases the core capability: intelligent guidelines (or directives) let the LLM answer general questions directly, while automatically triggering knowledge-base retrieval for specialized or complex questions and returning a structured, traceable answer.
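To give a flavor of how this is wired up, a guideline along these lines drives the knowledge-base path. The exact condition/action wording and the hybrid_search tool binding here are illustrative, not the repo's verbatim code:

# Illustrative guideline (not verbatim from the repo): general questions are
# answered directly, while specialized ones trigger knowledge-base retrieval.
await agent.create_guideline(
    condition="User asks a specialized or product-specific question",
    action="Search the knowledge base first, then answer citing the retrieved documents",
    tools=[hybrid_search],
)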
If you'd like to run the demo locally, clone the repo and follow the instructions in README.md to set up your local development environment.
How I Used Agentic Postgres
To start, follow the provided instructions to create a Tiger Cloud account. For more information, refer to this guide.
Database setup
Then follow the instructions to create a new database service. For more details on how to create a new Tiger Cloud database, refer to this guide.
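The service's connection string, shown in the console, is what the later Python code will use. A minimal connection sketch with psycopg2, where the environment variable name is an assumption of this sketch:

import os

import psycopg2

# Tiger Cloud exposes a standard Postgres connection string;
# storing it in an env var is an assumption of this sketch.
conn = psycopg2.connect(os.environ["TIGER_DATABASE_URL"])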
Let's create a table named documents to store all documents with both text content and vector embeddings, and the necessary indexes for fast searching across different methods.
-- Enable extensions
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a table for our news articles
CREATE TABLE documents (
id TEXT PRIMARY KEY,
title TEXT,
content TEXT,
embedding VECTOR(1024) -- jina-embeddings-v3 dimensions
);
-- Create indexes for performance
CREATE INDEX ON documents USING GIN (to_tsvector('english', content));
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);
You can execute the table creation SQL in the Tiger Cloud console.
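The table starts out empty, of course. Here is a minimal ingestion sketch, assuming a Jina client with an embed() method shaped like the one used later in MultiSearchTool (the method name, task value, and signature are assumptions):

def ingest_documents(conn, jina_client, docs):
    """Embed documents with Jina and store text plus vectors in Postgres.

    `docs` is an iterable of (id, title, content) tuples; the Jina client's
    embed() signature mirrors the one assumed later in MultiSearchTool.
    """
    contents = [content for _, _, content in docs]
    embeddings = jina_client.embed(contents, task="retrieval.passage")

    with conn.cursor() as cur:
        for (doc_id, title, content), embedding in zip(docs, embeddings):
            cur.execute(
                """
                INSERT INTO documents (id, title, content, embedding)
                VALUES (%s, %s, %s, %s::vector)
                ON CONFLICT (id) DO UPDATE
                SET title = EXCLUDED.title,
                    content = EXCLUDED.content,
                    embedding = EXCLUDED.embedding
                """,
                (doc_id, title, content, str(embedding)),
            )
    conn.commit()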
Implementing Hybrid Search with Jina and pgvector
In this section, we will build a hybrid search engine leveraging Jina and pgvector on Tiger Cloud. Before that, let's build an overall understanding of the architecture and concepts required.
The architecture comprises five steps:
- Documents: The process starts with a collection of documents.
- Embedding generation:
  a. Semantic search using Jina: documents are embedded with Jina to create dense embeddings, which capture the semantic meaning of the text.
  b. Keyword search using PostgreSQL: documents are processed with PostgreSQL full-text search to create sparse representations focusing on the text's specific keywords.
- Storage: Both the dense embeddings and the sparse full-text indexes are stored in Timescale's PostgreSQL.
- Retrieval and reranking: The system retrieves results from the Timescale database in response to a query. The results are then reranked using Jina to prioritize the most relevant documents.
- Results generation: The reranked results are compiled into a final list, producing a ranking of the most relevant documents for the query.
Here's the implementation in Python:
from typing import Any

from env import SEMANTIC_SCORE_THRESHOLD


class MultiSearchTool:
    def __init__(self, connection, jina_client):
        self.conn = connection
        self.jina = jina_client

    def extract_search(self, query: str, limit: int = 5):
        """Exact keyword matching for specific terms, names, dates"""
        with self.conn.cursor() as cur:
            # Search for the exact phrase anywhere in the content
            cur.execute("""
                SELECT id, title, content, 1.0 AS score, 'exact' AS search_type
                FROM documents
                WHERE content LIKE %s
                ORDER BY char_length(content)  -- Prefer shorter, more focused articles
                LIMIT %s
            """, (f"%{query}%", limit))
            return cur.fetchall()

    def fulltext_search(self, query: str, limit: int = 5):
        """PostgreSQL full-text search with ranking"""
        with self.conn.cursor() as cur:
            cur.execute("""
                SELECT id, title, content,
                       ts_rank_cd(
                           to_tsvector('english', content),
                           query
                       ) AS score,
                       'fulltext' AS search_type
                FROM documents,
                     plainto_tsquery('english', %s) AS query
                WHERE to_tsvector('english', content) @@ query
                ORDER BY score DESC
                LIMIT %s
            """, (query, limit))
            return cur.fetchall()

    def semantic_search(self, query: str, limit: int = 5):
        """Vector-based semantic search using Jina embeddings"""
        try:
            # Generate query embedding
            query_embeddings = self.jina.embed([query], task="retrieval.query")
            with self.conn.cursor() as cur:
                cur.execute("""
                    SELECT * FROM (
                        SELECT id, title, content,
                               1 - (embedding <=> %s::vector) AS score,
                               'semantic' AS search_type
                        FROM documents
                    ) a
                    WHERE score >= %s
                    ORDER BY score DESC
                    LIMIT %s
                """, (
                    query_embeddings[0],
                    SEMANTIC_SCORE_THRESHOLD,
                    limit,
                ))
                return cur.fetchall()
        except Exception as e:
            print(f"Semantic search failed: {e}")
            return []

    def combine_and_deduplicate(self, *result_sets):
        """Combine results from multiple search methods, removing duplicates"""
        seen_ids = set()
        combined = []
        # Process results in order of priority
        for results in result_sets:
            for result in results:
                doc_id = result[0]
                if doc_id not in seen_ids:
                    seen_ids.add(doc_id)
                    combined.append({
                        "id": doc_id,
                        "title": result[1],
                        "content": result[2],
                        "score": result[3],
                        "search_type": result[4],
                    })
        return combined

    def rerank_results(
        self, query: str, results: list[Any], top_k: int = 5
    ):
        """Use Jina's rerank API for final relevance scoring"""
        if not results:
            return []
        # Prepare documents for reranking (truncate long articles)
        # Take the first 2000 chars to stay within rerank limits
        documents = [result["content"][:2000] for result in results]
        try:
            rerank_response = self.jina.rerank(
                query, documents=documents, top_k=top_k
            )
            # Map reranked results back to the original data
            reranked = []
            for rerank_result in rerank_response:
                original_idx = rerank_result["index"]
                result = results[original_idx].copy()
                result["rerank_score"] = rerank_result["relevance_score"]
                reranked.append(result)
            return reranked
        except Exception as e:
            print(f"Reranking error: {e}")
            return results[:top_k]

    def hybrid_search(self, query: str, limit: int = 5):
        """Main hybrid search function combining all methods"""
        # Cast a wide net with all search methods
        extract_results = self.extract_search(query, limit=limit)
        fulltext_results = self.fulltext_search(query, limit=limit)
        semantic_results = self.semantic_search(query, limit=limit)
        # Combine and deduplicate (exact matches prioritized first)
        combined = self.combine_and_deduplicate(
            extract_results,
            fulltext_results,
            semantic_results,
        )
        # Rerank for final relevance
        final_results = self.rerank_results(query, combined, limit)
        return final_results
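Wiring it all together, a usage sketch might look like the following. The JinaClient wrapper and the environment variable names are assumptions of this sketch, not fixed APIs:

import os

import psycopg2

from jina_client import JinaClient  # hypothetical wrapper around Jina's APIs

# Connect to the Tiger Cloud service (env var names are assumptions)
conn = psycopg2.connect(os.environ["TIGER_DATABASE_URL"])
jina = JinaClient(api_key=os.environ["JINA_API_KEY"])

search = MultiSearchTool(conn, jina)
for doc in search.hybrid_search("laptop warranty after an exchange", limit=5):
    print(doc.get("rerank_score"), doc["title"], doc["search_type"])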
Overall Experience
Tiger Cloud is a cloud PostgreSQL platform tailored for AI applications. This post demonstrated how to leverage its built-in vector search capability to construct a conversational AI agent.
However, when deploying the application to a production environment, performance is a critical factor that must be addressed. We can utilize pgvectorscale, another extension provided by Tiger Cloud, to significantly improve application performance. Due to constraints of time and scope, a subsequent post will be dedicated to discussing the production-level usage of pgvectorscale and other performance optimization techniques.
Top comments
Great article — really enjoyed reading about your build on the Parlant-powered conversational agent at Tiger Cloud. You've done a solid job explaining how the dynamic guideline-matching plus vector DB setup enables free-form dialogue in a way that traditional router-based architectures struggle with.
A few thoughts that stood out:
The schema with documents.embedding VECTOR(1024) and the GIN/ivfflat indexes shows you've done the heavy lifting, and it's good that you flagged the performance work ahead (pgvectorscale). One question for you: as you move toward production, how are you thinking about real-world latency and guideline management (i.e., when you have 1,000s of guidelines, will retrieval time become the bottleneck)?
Thanks for sharing such a clear write-up! I’m looking forward to your next post on performance optimisation.
Thank you so much for your thoughtful and detailed feedback! That’s an excellent question about scaling, particularly regarding latency and managing a large volume of guidelines. The strategy focuses on a couple of key areas:
Advanced Vector Search Optimization: I plan to leverage pgvectorscale to improve vector search performance within PostgreSQL significantly. This will be the main focus of my upcoming post, where I'll dive into the technical details of how it helps maintain low latency even with a growing dataset.
Strategic Data Structuring: I'll be implementing table partitioning. This approach involves dividing our main table into smaller, more manageable "ranges" based on a key column. By doing this, the query only needs to scan a fraction of the data instead of the entire table, which dramatically reduces retrieval time and prevents it from becoming a bottleneck.
Thanks again for your comments and for taking the time to engage with the post. I appreciate your interest, and I’m looking forward to sharing the deep dive on performance optimization with you soon.