Think about those times on team calls when a simple question like “Can someone check the latest sales numbers?” brings the meeting to a halt while someone digs through dashboards or runs a query. It gets awkward and slows everyone down. Now imagine your teammate is an AI that can listen in, fetch answers from your live database, and explain them right there as part of the conversation: no context-switching, no waiting, just natural explanations as you talk.
In fast-paced teams (especially those where not everyone is a database whiz), having an AI teammate that can jump into your calls and patiently answer questions about your Postgres data can be a great help. It acts as a built-in tutor, clarifying, demonstrating, and even breaking down tricky SQL results in plain language for any team member. This isn’t about fancy interfaces or replacing people; it’s about making it easy for anyone, technical or not, to just talk and understand what’s happening in the data, together.
In this tutorial, we’ll build exactly that kind of AI teammate. Instead of piecing together complex infrastructure ourselves (audio pipelines, transcription, NLP, TTS, avatars), we'll use Vision Agents to tie everything together over Stream's WebRTC APIs, and give the AI both a voice and a human-like video presence with ElevenLabs voice and Anam avatars. The system we’ll create can listen, query Postgres, and respond during your calls, making your organization’s knowledge more available and accessible, all through a single extra participant in the meeting.
Demo
Prerequisites
You will need the following to get going with the implementation:
- Python 3.10 or later
- uv Python package manager
- A Stream account
- An ElevenLabs account
- An Anam account
- A Deepgram account
- A Neon account
- An OpenAI account
Create a Stream Application
- Navigate to the Stream dashboard.
- Select + Create an App.
- In the dialog, enter a name for your application and choose the appropriate region for your edge-server location(s).
- After creation, locate the API Key and Secret under Your Credentials.
- Add these credentials to your .env file as follows:
STREAM_API_KEY="your-api-key"
STREAM_API_SECRET="your-api-secret"
Configure ElevenLabs
- Visit the ElevenLabs API Key dashboard.
- Create a new API key.
- Add the key to your .env file:
ELEVENLABS_API_KEY="your-elevenlabs-api-key"
Configure Deepgram
- Go to the Deepgram dashboard.
- In the left sidebar, click API Keys.
- Select Create a New API Key.
- Add the generated key to your .env file:
DEEPGRAM_API_KEY="your-deepgram-api-key"
Configure OpenAI
- Access the OpenAI API Key dashboard.
- Click + Create new secret key.
- Add the generated key to your .env file:
OPENAI_API_KEY="your-openai-api-key"
Configure Anam
- Access the Anam API Key dashboard.
- Click + to create a new key.
- Add the generated key to your .env file:
ANAM_API_KEY="your-anam-api-key"
- Access the Anam Build view.
- Click Avatar, hover your desired avatar, and click Copy ID.
- Add the ID to your .env file:
ANAM_AVATAR_ID="your-anam-avatar-id"
That's it for configuring your environment variables. Next, let's move on to scripting the helpful AI teammate for Postgres.
Set up a new Python application
In this section, you will create a new Python application, set up Vision Agents in it, and install the relevant libraries for a quick implementation.
Let’s get started by creating a new Python project. Open your terminal and run the following commands:
mkdir my-db-agent && cd my-db-agent
uv init && uv add "vision-agents[anam,deepgram,elevenlabs,getstream,openai,redis]" python-dotenv asyncpg httpx
Now, create a .env file at the root of your project and add the values you saved in the sections above.
It should look something like this:
# .env
## Stream environment variables
STREAM_API_KEY="..."
STREAM_API_SECRET="..."
EXAMPLE_BASE_URL="https://demo.visionagents.ai"
## OpenAI environment variable
OPENAI_API_KEY="sk-proj-..."
## Deepgram environment variable
DEEPGRAM_API_KEY="..."
## ElevenLabs environment variable
ELEVENLABS_API_KEY="sk_..."
## Anam environment variables
ANAM_API_KEY="..."
ANAM_AVATAR_ID="...-...-...-...-..."
## Postgres environment variable
DATABASE_URL="postgresql://neondb_owner:...@ep-...aws.neon.tech/neondb?sslmode=require&channel_binding=require"
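Optionally, before wiring up the agent, you can confirm that the .env file is being picked up. The small script below is just a sketch (the check_env.py file name and the variable list mirroring the .env above are my own choices, not part of Vision Agents):
# check_env.py - optional sanity check that the .env values are loaded
import os

from dotenv import load_dotenv

load_dotenv()

REQUIRED_VARS = [
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "ANAM_API_KEY",
    "ANAM_AVATAR_ID",
    "DATABASE_URL",
]

# Report any variable that is unset or empty
missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
print("Missing:", ", ".join(missing) if missing else "none, all required variables are set")
Run it with uv run check_env.py; if anything shows up as missing, double-check the corresponding key in your .env.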
Then, create a main.py file with the following code:
from dotenv import load_dotenv
from vision_agents.core import Agent, AgentLauncher, Runner, User
from vision_agents.plugins import deepgram, elevenlabs, getstream, openai

# Load environment variables from a .env file for secrets and configuration
load_dotenv()


async def create_agent(**kwargs) -> Agent:
    """
    Factory function to create and configure an Agent for a group call assistant.

    Returns:
        Agent: An instance of a ready-to-use Agent.
    """
    agent = Agent(
        stt=deepgram.STT(eager_turn_detection=True),  # Deepgram for speech-to-text
        tts=elevenlabs.TTS(),  # ElevenLabs for text-to-speech
        edge=getstream.Edge(),  # GetStream edge network for AV transport
        llm=openai.LLM(model="gpt-5-nano"),  # OpenAI LLM (model specified)
        agent_user=User(name="Assistant", id="agent"),  # Agent's identity
        instructions=(
            """You are a helpful voice assistant and database assistant."""
        ),
    )
    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    # Context manager handles join and clean-up automatically
    async with agent.join(call):
        await agent.finish()


if __name__ == "__main__":
    # Entrypoint for CLI: launches the agent and runs join_call automatically when invoked from the command line
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
The code block above sets up a voice agent capable of joining group calls with real-time speech-to-speech AI features. It loads credentials from the .env file, then creates an agent that uses Deepgram for speech-to-text (transcribing what people say), with eager_turn_detection=True for faster responses and more natural turn-taking; ElevenLabs for text-to-speech (so the agent can talk back); OpenAI for language understanding and reasoning; and Stream as the underlying audio/video transport. The create_agent function initializes and configures this agent with a helpful voice-assistant persona, while join_call defines how the agent joins a call, waits for it to end, and then exits gracefully.
Bring Your Agent to Life with an Anam Avatar
Anam adds a real-time, interactive avatar to the agent. The avatar speaks with natural movements and automatic lip sync, which makes a voice agent feel like a present participant on a video call instead of an invisible bot.
Vision Agents ships its Anam integration as a processor. A processor in this context is a flexible building block: it can be an avatar like Anam, a computer vision model (like YOLO), or any custom Python code that intercepts raw WebRTC audio and video streams, transforms them, and republishes the processed result back into the Stream call. For Anam, the processor takes the agent's audio output, streams it to the Anam service, and publishes the resulting animated avatar video and audio into the group call.
Make the following changes to add Anam avatars to your agent:
# existing imports
+ from vision_agents.plugins.anam import AnamAvatarPublisher

async def create_agent(**kwargs) -> Agent:
    ...
    agent = Agent(
        ...
        instructions=(
            """You are a helpful voice assistant and database assistant."""
        ),
+       processors=[AnamAvatarPublisher()],
    )
    return agent

# rest of the file
The code block above integrates Anam avatars into your agent by importing AnamAvatarPublisher and adding it to the agent's processors list, enabling the agent to produce a real-time animated avatar in calls.
Enable Postgres Access via Function Calling
Vision Agents supports function calling, which lets the LLM call real Python functions during a conversation. In the code block below, the agent uses one tool called postgres_query to run SQL against Neon and return rows back to the model.
This approach avoids creating a separate tool for each database operation. Instead of writing tools like list_tables, get_user_by_id, and create_invoice, the agent exposes one query tool and the LLM generates the SQL string it needs for the question, calls the tool, then uses the returned rows to respond.
When the agent hears a question like "How many failed payments happened last week?", the LLM decides that it needs database context. It composes a SQL query, calls postgres_query with the SQL string, and then uses the returned rows to produce a final spoken answer on the call.
# existing imports
+ import os
+ import asyncpg

async def create_agent(**kwargs) -> Agent:
    ...
    agent = Agent(
        ...
        instructions=(
            """You are a helpful voice assistant and database assistant.
+           You are an expert in Postgres and can perform read or write operations against the database as the user requires.
+           When asked anything about the database, use the postgres_query function.
            """
        ),
        ...
    )

+   @agent.llm.register_function(
+       description="Run a read or write SQL query against Postgres and return rows."
+   )
+   async def postgres_query(sql: str) -> list:
+       """
+       Execute a read or write query against the Postgres database.
+       """
+       database_url = os.getenv("DATABASE_URL")
+       if not database_url:
+           return [
+               "DATABASE_URL is not set. Set DATABASE_URL (e.g. postgres://user:pass@host:5432/db)."
+           ]
+
+       # Log the SQL generated by the model (handy when debugging tool calls)
+       sql_stripped = sql.strip().lower()
+       print(sql_stripped)
+
+       conn = await asyncpg.connect(database_url)
+       try:
+           rows = await conn.fetch(sql)
+           return [dict(row) for row in rows]
+       finally:
+           await conn.close()

    return agent

# rest of the file
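Before putting the agent on a call, it can be worth checking that DATABASE_URL actually reaches your Neon instance. The standalone script below is an optional sketch (the test_query.py file name is arbitrary); it uses the same asyncpg calls as postgres_query to list the tables the agent will be able to query:
# test_query.py - optional, standalone check that DATABASE_URL works
import asyncio
import os

import asyncpg
from dotenv import load_dotenv

load_dotenv()


async def main() -> None:
    conn = await asyncpg.connect(os.getenv("DATABASE_URL"))
    try:
        # List the tables in the public schema, the same way the agent would
        rows = await conn.fetch(
            "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'"
        )
        print([dict(row) for row in rows])
    finally:
        await conn.close()


asyncio.run(main())
Run it with uv run test_query.py. If it prints an empty list, the database is reachable but has no tables yet (see the seeding sketch under How to Run below).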
You need to tell the LLM to use the postgres_query function in its system prompt/instructions because function calling in Vision Agents (with OpenAI-compatible models) relies on explicit guidance about when a function should be invoked. The model has no internal knowledge of your codebase, so it will only reach for a registered tool like postgres_query if you clearly instruct it that database-related questions should go through that function.
Without this guidance, the LLM might answer database questions generically or guess at the code's capabilities, which leads to hallucinations or missed opportunities to use your live data. By stating in the system prompt that it should call postgres_query for all database operations, you make the tool discoverable and ensure the agent composes and runs SQL whenever it is needed.
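If you would rather the agent never modify data, one option is to tighten both the instructions ("only run read-only SELECT queries") and the tool itself. The variant below is a sketch of that hardening, not part of the original tutorial; it replaces the body of postgres_query inside create_agent and rejects anything that is not a SELECT or WITH statement:
    # A possible read-only variant of the tool (illustrative hardening)
    @agent.llm.register_function(
        description="Run a read-only SQL query against Postgres and return rows."
    )
    async def postgres_query(sql: str) -> list:
        """Execute a read-only query against the Postgres database."""
        database_url = os.getenv("DATABASE_URL")
        if not database_url:
            return ["DATABASE_URL is not set."]

        sql_stripped = sql.strip().lower()
        # Only allow plain SELECT statements and CTEs; refuse everything else
        if not sql_stripped.startswith(("select", "with")):
            return ["Only read-only SELECT queries are allowed."]

        conn = await asyncpg.connect(database_url)
        try:
            rows = await conn.fetch(sql)
            return [dict(row) for row in rows]
        finally:
            await conn.close()
Keep in mind this is a coarse filter; for production use you would typically also connect with a read-only database role rather than rely on string checks alone.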
How to Run
To launch the service locally, use the following command in your terminal:
uv run main.py run
It will open the demo UI at https://demo.visionagents.ai and automatically join the session for you. You can then interact with your agent in real time.
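If your Neon database is still empty, the agent will not have much to talk about. A quick way to try things out is to seed a small sample table first. The sketch below is purely illustrative (the payments table and its columns are made up for this demo); it uses asyncpg directly, outside the agent:
# seed.py - optional sample data so the agent has something to query
import asyncio
import os

import asyncpg
from dotenv import load_dotenv

load_dotenv()


async def main() -> None:
    conn = await asyncpg.connect(os.getenv("DATABASE_URL"))
    try:
        # Create a small demo table and a few rows to ask the agent about
        await conn.execute(
            """
            CREATE TABLE IF NOT EXISTS payments (
                id SERIAL PRIMARY KEY,
                amount NUMERIC NOT NULL,
                status TEXT NOT NULL,
                created_at TIMESTAMPTZ DEFAULT now()
            )
            """
        )
        await conn.execute(
            """
            INSERT INTO payments (amount, status)
            VALUES (49.99, 'succeeded'), (19.00, 'failed'), (99.50, 'succeeded')
            """
        )
    finally:
        await conn.close()


asyncio.run(main())
Run it once with uv run seed.py, then join the call and ask something like "How many failed payments are there?" to watch the agent compose and run the SQL for you.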
Ending Thoughts
Building this really got me reflecting on where tech is going. With tools like Vision Agents, we're quickly entering a world where complex tasks (voice, video, even database operations) can be offloaded to capable agents just by describing what you need in plain words on a video call. The infrastructure is shaping up to let you plug in speech pipelines, switch to real-time conversation, keep improving the agent with context, and get better observability along the way. It's an exciting time: more and more of the heavy lifting is shifting from brittle glue code to smart agents you direct in natural language.

