This is a submission for the Agentic Postgres Challenge with Tiger Data
What I Built
I recently had a chat with a friend who works in an e-commerce company, and he was complaining that his company spent a fortune on an AI Customer Service application, only to see an increase in user complaints.
I asked him what was going on, and he showed me a real conversation:
User: "I want to return this laptop. Also, I wanted to ask, how does the warranty work if I exchange it for a new one?"
AI Customer Service: "Okay, I can process the return for you. Please provide your order number."
User: "??? What about my warranty question?"
AI Customer Service: "Sorry, how can I help you?"
This is the core problem with router-based conversational AI, which routes each message to an agent that owns a specific topic or task.
Such systems inherently fail at natural, free-form conversation -- not because of implementation details you can fix, but because of fundamental architectural constraints: a router must pick one destination per message, so a message carrying two intents (like the return plus the warranty question above) loses one of them.
In this post, we'll build a conversational AI on Parlant that can handle free-form dialogue.
Understanding Parlant's Approach
Parlant is an AI alignment engine designed specifically for natural, customer-facing conversations.
To handle free-form dialogue, Parlant introduces a Dynamic Guideline Matching mechanism: instead of routing, the engine evaluates all of your guidelines on every conversation turn and loads only the contextually relevant ones.
# Parlant approach - no routing, just conditional guidelines
await agent.create_guideline(
    condition="User asks for a refund and order amount > $500",
    action="First use the order status tool, confirm refund eligibility, then provide the detailed refund process",
    tools=[check_order_status, calculate_refund_amount],
)
# More guidelines are created the same way
# ...
Notice what's different here: there's no routing decision.
Parlant's guideline matcher evaluates all guidelines and determines which ones are contextually relevant. This is where Parlant's architecture does most of its work.
Timescale Postgres - Production-Grade Vector Database
When we delve deeper into Parlant's Dynamic Guideline Matching Mechanism, we uncover a core technical challenge:
How do we efficiently retrieve the 3-5 most relevant guidelines from potentially hundreds or even thousands of available entries, all while maintaining millisecond-level latency?
This is precisely the scenario where a Vector Database comes into play.
The Guideline Matching process in Parlant is essentially a Semantic Retrieval problem. The system converts the condition field of every guideline into a Vector Embedding. When a user input arrives, the process unfolds as follows:
- Step 1: Encode the user query and the conversation history into a Query Vector.
- Step 2: Execute the Similarity Search on the Vector Database.
- Step 3: Return the Top-K (e.g., 3-5) most relevant guidelines.
- Step 4: Inject these retrieved guidelines into the LLM's context window.
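To make the flow concrete, here is a minimal sketch of steps 1-3 against a pgvector-backed table. The guidelines table, the embed() helper, and the psycopg2-style connection are illustrative assumptions for this sketch, not Parlant's actual internals:

# Illustrative sketch of guideline retrieval (not Parlant's internals).
# Assumes a `guidelines` table with a pgvector `embedding` column and an
# embed() helper that returns an embedding for the input text.
def match_guidelines(conn, embed, user_message: str, history: str, top_k: int = 5):
    # Step 1: encode the query plus recent history into a single query vector
    query_vector = embed(f"{history}\n{user_message}")

    # Steps 2-3: cosine similarity search, return the top-k guidelines
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT condition, action,
                   1 - (embedding <=> %s::vector) AS score
            FROM guidelines
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (str(query_vector), str(query_vector), top_k),
        )
        return cur.fetchall()

# Step 4 would inject the returned (condition, action) pairs
# into the LLM's context window.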
This requires the Vector Database to possess three core capabilities:
- High-performance Approximate Nearest Neighbor (ANN) search.
- Flexible metadata filtering.
- Real-time vector updates.
Timescale Postgres provides production-grade support across all three of these dimensions.
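Because pgvector lives inside Postgres, these capabilities compose in plain SQL: an ANN scan, a metadata predicate, and ordinary UPDATE statements can share one transaction. A hypothetical sketch (the table layout and category column are assumptions for illustration):

# Illustrative: metadata filtering and real-time updates compose with ANN
# search in plain SQL. The table layout and `category` column are assumptions.
def update_and_search(conn, doc_id, new_vector, category, top_k=5):
    with conn.cursor() as cur:
        # Real-time vector update: the new embedding is searchable immediately
        cur.execute(
            "UPDATE documents SET embedding = %s::vector WHERE id = %s",
            (str(new_vector), doc_id),
        )
        # ANN search constrained by a metadata predicate
        cur.execute(
            """
            SELECT id, title, 1 - (embedding <=> %s::vector) AS score
            FROM documents
            WHERE category = %s                -- metadata filter
            ORDER BY embedding <=> %s::vector  -- ANN ordering
            LIMIT %s
            """,
            (str(new_vector), category, str(new_vector), top_k),
        )
        return cur.fetchall()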
Demo
This demo showcases the core capability: intelligent guidelines (or directives) let the LLM answer general questions directly, while automatically triggering knowledge-base retrieval for specialized or complex questions and returning a structured, traceable answer.
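To give a flavor of how this is wired up, a guideline along these lines drives the knowledge-base path. The exact condition/action wording and the hybrid_search tool binding here are illustrative, not the repo's verbatim code:

# Illustrative guideline (not verbatim from the repo): general questions are
# answered directly, while specialized ones trigger knowledge-base retrieval.
await agent.create_guideline(
    condition="User asks a specialized or product-specific question",
    action="Search the knowledge base first, then answer citing the retrieved documents",
    tools=[hybrid_search],
)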
If you'd like to run the demo locally, clone the repo and follow the instructions in README.md to set up your local development environment.
How I Used Agentic Postgres
To start, follow the provided instructions to create a Tiger Cloud account. For more information, refer to this guide.
Database setup
Then follow the instructions to create a new database service. For more details on how to create a new Tiger Cloud database, refer to this guide.
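The service's connection string, shown in the console, is what the later Python code will use. A minimal connection sketch with psycopg2, where the environment variable name is an assumption of this sketch:

import os

import psycopg2

# Tiger Cloud exposes a standard Postgres connection string;
# storing it in an env var is an assumption of this sketch.
conn = psycopg2.connect(os.environ["TIGER_DATABASE_URL"])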
Let's create a table named documents to store all documents with both text content and vector embeddings, and the necessary indexes for fast searching across different methods.
-- Enable extensions
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a table for our news articles
CREATE TABLE documents (
id TEXT PRIMARY KEY,
title TEXT,
content TEXT,
embedding VECTOR(1024) -- jina-embeddings-v3 dimensions
);
-- Create indexes for performance
CREATE INDEX ON documents USING GIN (to_tsvector('english', content));
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);
You can execute the table creation SQL in the Tiger Cloud console.
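The table starts out empty, of course. Here is a minimal ingestion sketch, assuming a Jina client with an embed() method shaped like the one used later in MultiSearchTool (the method name, task value, and signature are assumptions):

def ingest_documents(conn, jina_client, docs):
    """Embed documents with Jina and store text plus vectors in Postgres.

    `docs` is an iterable of (id, title, content) tuples; the Jina client's
    embed() signature mirrors the one assumed later in MultiSearchTool.
    """
    contents = [content for _, _, content in docs]
    embeddings = jina_client.embed(contents, task="retrieval.passage")

    with conn.cursor() as cur:
        for (doc_id, title, content), embedding in zip(docs, embeddings):
            cur.execute(
                """
                INSERT INTO documents (id, title, content, embedding)
                VALUES (%s, %s, %s, %s::vector)
                ON CONFLICT (id) DO UPDATE
                SET title = EXCLUDED.title,
                    content = EXCLUDED.content,
                    embedding = EXCLUDED.embedding
                """,
                (doc_id, title, content, str(embedding)),
            )
    conn.commit()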
Implementing Hybrid Search with Jina and pgvector
In this section, we will build a hybrid search engine leveraging Jina and pgvector on Tiger Cloud. Before that, let's build an overall understanding of the architecture and concepts required.
The architecture comprises five steps:
- Documents: The process starts with a collection of documents.
- Embedding generation:
  a. Semantic search using Jina: documents are embedded with Jina to create dense embeddings, which capture the semantic meaning of the text.
  b. Keyword search using PostgreSQL: documents are processed with PostgreSQL full-text search to create sparse representations focusing on the text's specific keywords.
- Storage: Both the dense embeddings and the sparse full-text indexes are stored in Timescale's PostgreSQL.
- Retrieval and reranking: The system retrieves results from the Timescale database in response to a query. The results are then reranked using Jina to prioritize the most relevant documents.
- Results generation: The reranked results are compiled into a final list, producing a ranking of the most relevant documents for the query.
Here's the implementation in Python:
from typing import Any

from env import SEMANTIC_SCORE_THRESHOLD


class MultiSearchTool:
    def __init__(self, connection, jina_client):
        self.conn = connection
        self.jina = jina_client

    def extract_search(self, query: str, limit: int = 5):
        """Exact keyword matching for specific terms, names, dates"""
        with self.conn.cursor() as cur:
            # Search for the exact phrase anywhere in the content
            cur.execute("""
                SELECT id, title, content, 1.0 AS score, 'exact' AS search_type
                FROM documents
                WHERE content LIKE %s
                ORDER BY char_length(content)  -- Prefer shorter, more focused articles
                LIMIT %s
            """, (f"%{query}%", limit))
            return cur.fetchall()

    def fulltext_search(self, query: str, limit: int = 5):
        """PostgreSQL full-text search with ranking"""
        with self.conn.cursor() as cur:
            cur.execute("""
                SELECT id, title, content,
                       ts_rank_cd(
                           to_tsvector('english', content),
                           query
                       ) AS score,
                       'fulltext' AS search_type
                FROM documents,
                     plainto_tsquery('english', %s) AS query
                WHERE to_tsvector('english', content) @@ query
                ORDER BY score DESC
                LIMIT %s
            """, (query, limit))
            return cur.fetchall()

    def semantic_search(self, query: str, limit: int = 5):
        """Vector-based semantic search using Jina embeddings"""
        try:
            # Generate query embedding
            query_embeddings = self.jina.embed([query], task="retrieval.query")
            with self.conn.cursor() as cur:
                cur.execute("""
                    SELECT * FROM (
                        SELECT id, title, content,
                               1 - (embedding <=> %s::vector) AS score,
                               'semantic' AS search_type
                        FROM documents
                    ) a
                    WHERE score >= %s
                    ORDER BY score DESC
                    LIMIT %s
                """, (
                    query_embeddings[0],
                    SEMANTIC_SCORE_THRESHOLD,
                    limit,
                ))
                return cur.fetchall()
        except Exception as e:
            print(f"Semantic search failed: {e}")
            return []

    def combine_and_deduplicate(self, *result_sets):
        """Combine results from multiple search methods, removing duplicates"""
        seen_ids = set()
        combined = []
        # Process results in order of priority
        for results in result_sets:
            for result in results:
                doc_id = result[0]
                if doc_id not in seen_ids:
                    seen_ids.add(doc_id)
                    combined.append({
                        "id": doc_id,
                        "title": result[1],
                        "content": result[2],
                        "score": result[3],
                        "search_type": result[4],
                    })
        return combined

    def rerank_results(
        self, query: str, results: list[Any], top_k: int = 5
    ):
        """Use Jina's rerank API for final relevance scoring"""
        if not results:
            return []
        # Prepare documents for reranking (truncate long articles)
        # Take the first 2000 chars to stay within rerank limits
        documents = [result["content"][:2000] for result in results]
        try:
            rerank_response = self.jina.rerank(
                query, documents=documents, top_k=top_k
            )
            # Map reranked results back to the original data
            reranked = []
            for rerank_result in rerank_response:
                original_idx = rerank_result["index"]
                result = results[original_idx].copy()
                result["rerank_score"] = rerank_result["relevance_score"]
                reranked.append(result)
            return reranked
        except Exception as e:
            print(f"Reranking error: {e}")
            return results[:top_k]

    def hybrid_search(self, query: str, limit: int = 5):
        """Main hybrid search function combining all methods"""
        # Cast a wide net with all search methods
        extract_results = self.extract_search(query, limit=limit)
        fulltext_results = self.fulltext_search(query, limit=limit)
        semantic_results = self.semantic_search(query, limit=limit)
        # Combine and deduplicate (exact matches prioritized first)
        combined = self.combine_and_deduplicate(
            extract_results,
            fulltext_results,
            semantic_results,
        )
        # Rerank for final relevance
        final_results = self.rerank_results(query, combined, limit)
        return final_results
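Wiring it all together, a usage sketch might look like the following. The JinaClient wrapper and the environment variable names are assumptions of this sketch, not fixed APIs:

import os

import psycopg2

from jina_client import JinaClient  # hypothetical wrapper around Jina's APIs

# Connect to the Tiger Cloud service (env var names are assumptions)
conn = psycopg2.connect(os.environ["TIGER_DATABASE_URL"])
jina = JinaClient(api_key=os.environ["JINA_API_KEY"])

search = MultiSearchTool(conn, jina)
for doc in search.hybrid_search("laptop warranty after an exchange", limit=5):
    print(doc.get("rerank_score"), doc["title"], doc["search_type"])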
Overall Experience
Tiger Cloud is a cloud PostgreSQL platform tailored for AI applications. This post demonstrated how to leverage its built-in vector search capability to construct a conversational AI agent.
However, when deploying the application to a production environment, performance is a critical factor that must be addressed. We can utilize pgvectorscale, another extension provided by Tiger Cloud, to significantly improve application performance. Due to constraints of time and scope, a subsequent post will be dedicated to discussing the production-level usage of pgvectorscale and other performance optimization techniques.
Top comments
Great article — really enjoyed reading about your build on the Parlant-powered conversational agent at Tiger Cloud. You've done a solid job explaining how the dynamic guideline-matching plus vector DB setup enables free-form dialogue in a way that traditional router-based architectures struggle with.
A few thoughts that stood out:
The schema with documents.embedding VECTOR(1024) and the GIN/ivfflat indexes shows you've done the heavy lifting, and it's good that you flagged the performance work ahead (pgvectorscale). One question for you: as you move toward production, how are you thinking about real-world latency and guideline management (i.e., when you have 1,000s of guidelines, will retrieval time become the bottleneck)?
Thanks for sharing such a clear write-up! I’m looking forward to your next post on performance optimisation.
Thank you so much for your thoughtful and detailed feedback! That’s an excellent question about scaling, particularly regarding latency and managing a large volume of guidelines. The strategy focuses on a couple of key areas:
Advanced Vector Search Optimization: I plan to leverage pgvectorscale to improve vector search performance within PostgreSQL significantly. This will be the main focus of my upcoming post, where I'll dive into the technical details of how it helps maintain low latency even with a growing dataset.
Strategic Data Structuring: I'll be implementing table partitioning. This approach involves dividing our main table into smaller, more manageable "ranges" based on a key column. By doing this, the query only needs to scan a fraction of the data instead of the entire table, which dramatically reduces retrieval time and prevents it from becoming a bottleneck.
Thanks again for your comments and for taking the time to engage with the post. I appreciate your interest, and I’m looking forward to sharing the deep dive on performance optimization with you soon.