Think about those times on team calls when a simple question like “Can someone check the latest sales numbers?” brings the meeting to a halt while someone digs through dashboards or runs a query. It gets awkward and slows everyone down. Now imagine your teammate is an AI that can listen in, fetch answers from your live database, and explain them right there as part of the conversation: no context-switching, no waiting, just natural explanations as you talk.
In fast-paced teams (especially those where not everyone is a database whiz), having an AI teammate that can jump into your calls and patiently answer questions about your Postgres data can be a great help. It acts as a built-in tutor, clarifying, demonstrating, and even breaking down tricky SQL results in plain language for any team member. This isn’t about fancy interfaces or replacing people; it’s about making it easy for anyone, technical or not, to just talk and understand what’s happening in the data, together.
In this tutorial, we’ll build exactly that kind of AI teammate. Instead of piecing together complex infrastructure ourselves (audio pipelines, transcription, NLP, TTS, avatars), we'll use Vision Agents to tie everything together over Stream's WebRTC APIs, and give the AI both a voice and a human-like video presence with ElevenLabs voice and Anam avatars. The system we’ll create can listen, query Postgres, and respond during your calls, making your organization’s knowledge more available and accessible, all through a single extra participant in the meeting.
Demo
Prerequisites
You will need the following to get going with the implementation:
- Python 3.10 or later
- uv Python package manager
- A Stream account
- An ElevenLabs account
- An Anam account
- A Deepgram account
- A Neon account
- An OpenAI account
Create a Stream Application
- Navigate to the Stream dashboard.
- Select + Create an App.
- In the dialog, enter a name for your application and choose the appropriate region for your edge-server location(s).
- After creation, locate the API Key and Secret under Your Credentials.
- Add these credentials to your .env file as follows:
STREAM_API_KEY="your-api-key"
STREAM_API_SECRET="your-api-secret"
Configure ElevenLabs
- Visit the ElevenLabs API Key dashboard.
- Create a new API key.
- Add the key to your .env file:
ELEVENLABS_API_KEY="your-elevenlabs-api-key"
Configure Deepgram
- Go to the Deepgram dashboard.
- In the left sidebar, click API Keys.
- Select Create a New API Key.
- Add the generated key to your .env file:
DEEPGRAM_API_KEY="your-deepgram-api-key"
Configure OpenAI
- Access the OpenAI API Key dashboard.
- Click + Create new secret key.
- Add the generated key to your .env file:
OPENAI_API_KEY="your-openai-api-key"
Configure Anam
- Access the Anam API Key dashboard.
- Click + to create a new key.
- Add the generated key to your .env file:
ANAM_API_KEY="your-anam-api-key"
- Access the Anam Build view.
- Click Avatar, hover your desired avatar, and click Copy ID.
- Add the ID to your .env file:
ANAM_AVATAR_ID="your-anam-avatar-id"
That's it for configuring your environment variables. Next, let's move on to scripting the helpful AI teammate for Postgres.
Set up a new Python application
In this section, you will create a new Python application, set up Vision Agents in it, and install the relevant libraries for a quick implementation.
Let’s get started by creating a new Python project. Open your terminal and run the following commands:
mkdir my-db-agent && cd my-db-agent
uv init && uv add "vision-agents[anam,deepgram,elevenlabs,getstream,openai,redis]" python-dotenv asyncpg httpx
Now, create a .env file at the root of your project and add the values you saved in the sections above.
It should look something like this:
# .env
## Stream environment variables
STREAM_API_KEY="..."
STREAM_API_SECRET="..."
EXAMPLE_BASE_URL="https://demo.visionagents.ai"
## OpenAI environment variable
OPENAI_API_KEY="sk-proj-..."
## Deepgram environment variable
DEEPGRAM_API_KEY="..."
## ElevenLabs environment variable
ELEVENLABS_API_KEY="sk_..."
## Anam environment variables
ANAM_API_KEY="..."
ANAM_AVATAR_ID="...-...-...-...-..."
## Postgres environment variable
DATABASE_URL="postgresql://neondb_owner:...@ep-...aws.neon.tech/neondb?sslmode=require&channel_binding=require"
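Optionally, before wiring up the agent, you can confirm that the .env file is being picked up. The small script below is just a sketch (the check_env.py file name and the variable list mirroring the .env above are my own choices, not part of Vision Agents):
# check_env.py - optional sanity check that the .env values are loaded
import os

from dotenv import load_dotenv

load_dotenv()

REQUIRED_VARS = [
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "ANAM_API_KEY",
    "ANAM_AVATAR_ID",
    "DATABASE_URL",
]

# Report any variable that is unset or empty
missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
print("Missing:", ", ".join(missing) if missing else "none, all required variables are set")
Run it with uv run check_env.py; if anything shows up as missing, double-check the corresponding key in your .env.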
Then, create a main.py file with the following code:
from dotenv import load_dotenv
from vision_agents.core import Agent, AgentLauncher, Runner, User
from vision_agents.plugins import deepgram, elevenlabs, getstream, openai

# Load environment variables from a .env file for secrets and configuration
load_dotenv()


async def create_agent(**kwargs) -> Agent:
    """
    Factory function to create and configure an Agent for a group call assistant.

    Returns:
        Agent: An instance of a ready-to-use Agent.
    """
    agent = Agent(
        stt=deepgram.STT(eager_turn_detection=True),  # Deepgram for speech-to-text
        tts=elevenlabs.TTS(),  # ElevenLabs for text-to-speech
        edge=getstream.Edge(),  # GetStream edge network for AV transport
        llm=openai.LLM(model="gpt-5-nano"),  # OpenAI LLM (model specified)
        agent_user=User(name="Assistant", id="agent"),  # Agent's identity
        instructions=(
            """You are a helpful voice assistant and database assistant."""
        ),
    )
    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    # Context manager handles join and clean-up automatically
    async with agent.join(call):
        await agent.finish()


if __name__ == "__main__":
    # Entrypoint for CLI: launches the agent and runs join_call automatically when invoked from the command line
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
The code block above sets up a voice agent capable of joining group calls with real-time speech-to-speech AI features. It loads credentials from the .env file, then creates an agent that uses Deepgram for speech-to-text (transcribing what people say), with eager_turn_detection=True for faster responses and more natural turn-taking; ElevenLabs for text-to-speech (so the agent can talk back); OpenAI for language understanding and reasoning; and Stream as the underlying audio/video transport. The create_agent function initializes and configures this agent with a helpful voice-assistant persona, while join_call defines how the agent joins a call, waits for it to end, and then exits gracefully.
Bring Your Agent to Life with an Anam Avatar
Anam adds a real-time, interactive avatar to the agent. The avatar speaks with natural movements and automatic lip sync, which makes a voice agent feel like a present participant on a video call instead of an invisible bot.
Vision Agents ships its Anam integration as a processor. A processor in this context is a flexible building block: it can be an avatar like Anam, a computer vision model (like YOLO), or any custom Python code that intercepts raw WebRTC audio and video streams, transforms them, and republishes the processed result back into the Stream call. For Anam, the processor takes the agent's audio output, streams it to the Anam service, and publishes the resulting animated avatar video and audio into the group call.
Make the following changes to add Anam avatars to your agent:
# existing imports
+ from vision_agents.plugins.anam import AnamAvatarPublisher

async def create_agent(**kwargs) -> Agent:
    ...
    agent = Agent(
        ...
        instructions=(
            """You are a helpful voice assistant and database assistant."""
        ),
+       processors=[AnamAvatarPublisher()],
    )
    return agent

# rest of the file
The code block above integrates Anam avatars into your agent by importing AnamAvatarPublisher and adding it to the agent's processors list, enabling the agent to produce a real-time animated avatar in calls.
Enable Postgres Access via Function Calling
Vision Agents supports function calling, which lets the LLM call real Python functions during a conversation. In the code block below, the agent uses one tool called postgres_query to run SQL against Neon and return rows back to the model.
This approach avoids creating a separate tool for each database operation. Instead of writing tools like list_tables, get_user_by_id, and create_invoice, the agent exposes one query tool and the LLM generates the SQL string it needs for the question, calls the tool, then uses the returned rows to respond.
When the agent hears a question like "How many failed payments happened last week?", the LLM decides that it needs database context. It composes a SQL query, calls postgres_query with the SQL string, and then uses the returned rows to produce a final spoken answer on the call.
# existing imports
+ import os
+ import asyncpg

async def create_agent(**kwargs) -> Agent:
    ...
    agent = Agent(
        ...
        instructions=(
            """You are a helpful voice assistant and database assistant.
+           You are an expert in Postgres and can perform read or write operations against the database as the user requires.
+           When asked anything about the database, use the postgres_query function.
            """
        ),
        ...
    )

+   @agent.llm.register_function(
+       description="Run a read or write SQL query against Postgres and return rows."
+   )
+   async def postgres_query(sql: str) -> list:
+       """
+       Execute a read or write query against the Postgres database.
+       """
+       database_url = os.getenv("DATABASE_URL")
+       if not database_url:
+           return [
+               "DATABASE_URL is not set. Set DATABASE_URL (e.g. postgres://user:pass@host:5432/db)."
+           ]
+
+       # Log the SQL generated by the model (handy when debugging tool calls)
+       sql_stripped = sql.strip().lower()
+       print(sql_stripped)
+
+       conn = await asyncpg.connect(database_url)
+       try:
+           rows = await conn.fetch(sql)
+           return [dict(row) for row in rows]
+       finally:
+           await conn.close()

    return agent

# rest of the file
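Before putting the agent on a call, it can be worth checking that DATABASE_URL actually reaches your Neon instance. The standalone script below is an optional sketch (the test_query.py file name is arbitrary); it uses the same asyncpg calls as postgres_query to list the tables the agent will be able to query:
# test_query.py - optional, standalone check that DATABASE_URL works
import asyncio
import os

import asyncpg
from dotenv import load_dotenv

load_dotenv()


async def main() -> None:
    conn = await asyncpg.connect(os.getenv("DATABASE_URL"))
    try:
        # List the tables in the public schema, the same way the agent would
        rows = await conn.fetch(
            "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'"
        )
        print([dict(row) for row in rows])
    finally:
        await conn.close()


asyncio.run(main())
Run it with uv run test_query.py. If it prints an empty list, the database is reachable but has no tables yet (see the seeding sketch under How to Run below).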
You need to tell the LLM to use the postgres_query function in its system prompt/instructions because function calling in Vision Agents (with OpenAI-compatible models) relies on explicit guidance about when a function should be invoked. The model has no internal knowledge of your codebase, so it will only reach for a registered tool like postgres_query if you clearly instruct it that database-related questions should go through that function.
Without this guidance, the LLM might answer database questions generically or guess at the code's capabilities, which leads to hallucinations or missed opportunities to use your live data. By stating in the system prompt that it should call postgres_query for all database operations, you make the tool discoverable and ensure the agent composes and runs SQL whenever it is needed.
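If you would rather the agent never modify data, one option is to tighten both the instructions ("only run read-only SELECT queries") and the tool itself. The variant below is a sketch of that hardening, not part of the original tutorial; it replaces the body of postgres_query inside create_agent and rejects anything that is not a SELECT or WITH statement:
    # A possible read-only variant of the tool (illustrative hardening)
    @agent.llm.register_function(
        description="Run a read-only SQL query against Postgres and return rows."
    )
    async def postgres_query(sql: str) -> list:
        """Execute a read-only query against the Postgres database."""
        database_url = os.getenv("DATABASE_URL")
        if not database_url:
            return ["DATABASE_URL is not set."]

        sql_stripped = sql.strip().lower()
        # Only allow plain SELECT statements and CTEs; refuse everything else
        if not sql_stripped.startswith(("select", "with")):
            return ["Only read-only SELECT queries are allowed."]

        conn = await asyncpg.connect(database_url)
        try:
            rows = await conn.fetch(sql)
            return [dict(row) for row in rows]
        finally:
            await conn.close()
Keep in mind this is a coarse filter; for production use you would typically also connect with a read-only database role rather than rely on string checks alone.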
How to Run
To launch the service locally, use the following command in your terminal:
uv run main.py run
It will open the demo UI at https://demo.visionagents.ai and automatically join the session for you. You can then interact with your agent in real time.
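If your Neon database is still empty, the agent will not have much to talk about. A quick way to try things out is to seed a small sample table first. The sketch below is purely illustrative (the payments table and its columns are made up for this demo); it uses asyncpg directly, outside the agent:
# seed.py - optional sample data so the agent has something to query
import asyncio
import os

import asyncpg
from dotenv import load_dotenv

load_dotenv()


async def main() -> None:
    conn = await asyncpg.connect(os.getenv("DATABASE_URL"))
    try:
        # Create a small demo table and a few rows to ask the agent about
        await conn.execute(
            """
            CREATE TABLE IF NOT EXISTS payments (
                id SERIAL PRIMARY KEY,
                amount NUMERIC NOT NULL,
                status TEXT NOT NULL,
                created_at TIMESTAMPTZ DEFAULT now()
            )
            """
        )
        await conn.execute(
            """
            INSERT INTO payments (amount, status)
            VALUES (49.99, 'succeeded'), (19.00, 'failed'), (99.50, 'succeeded')
            """
        )
    finally:
        await conn.close()


asyncio.run(main())
Run it once with uv run seed.py, then join the call and ask something like "How many failed payments are there?" to watch the agent compose and run the SQL for you.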
Ending Thoughts
Building this really got me reflecting on where tech is going. With tools like Vision Agents, we're quickly entering a world where complex tasks (voice, video, even database operations) can be offloaded to capable agents just by describing what you need in plain words on a video call. The infrastructure is shaping up to let you plug in speech pipelines, switch to real-time conversation, keep improving the agent with context, and get better observability along the way. It's an exciting time: more and more of the heavy lifting is shifting from brittle glue code to smart agents you direct in natural language.

