Building AI calling agents shouldn't require a commercial license or massive per-minute markups.
If you are a Python developer, you should be able to spin up a sub-500ms latency voice agent on your own machine.
Today, I'm going to introduce you to Siphon, an open-source (Apache 2.0) Python framework built by BLACKDWARF that natively bridges SIP trunks to LLMs.
Prerequisites
- Python 3.10+
- A Twilio or Telnyx SIP Trunk
- LiveKit Credentials
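- An OpenAI API Key
- A Deepgram API Key (speech-to-text, used in Step 2)
- A Cartesia API Key (text-to-speech, used in Step 2)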
Step 1: Installation & Setup
First, install the Siphon package with pip.
pip install siphon-ai
Next, create a .env file in your project root to hold your raw provider keys.
Because Siphon is self-hosted, you pay providers like OpenAI and LiveKit directly, with no middleman fees.
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_key
LIVEKIT_API_SECRET=your_livekit_secret
OPENAI_API_KEY=sk-yourkey
DEEPGRAM_API_KEY=yourkey
FROM_NUMBER=+15551234567
SIP_USERNAME=your_sip_user
SIP_PASSWORD=your_sip_pass
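Before starting anything, it can help to fail fast when one of these keys is missing. The following is a minimal sketch using only the standard library; `REQUIRED_KEYS` and `missing_env_vars` are hypothetical names for illustration and are not part of Siphon.

```python
import os

# Variable names copied from the .env file above; adjust to the providers you use.
REQUIRED_KEYS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "FROM_NUMBER",
    "SIP_USERNAME",
    "SIP_PASSWORD",
]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the process environment (os.environ).
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Call `load_dotenv()` first, then `missing_env_vars(REQUIRED_KEYS)`. An empty list means every key is present; otherwise the list names what still needs to go into `.env`.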
Step 2: Defining the Agent
Siphon abstracts away the complex WebRTC media pipelines and Voice Activity Detection (VAD).
You just need to define how your agent behaves using Siphon's plugin architecture.
from siphon.agent import Agent
from siphon.plugins import openai, cartesia, deepgram

# Define the Agent
agent = Agent(
    agent_name="Receptionist",
    llm=openai.LLM(),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    system_instructions="You are a helpful dental receptionist. Help the user book an appointment.",
)
Step 3: Triggering an Outbound Call
Siphon keeps outbound SIP signaling straightforward. If you don't have a trunk ID set up, you can trigger a call programmatically with SIP credentials, and Siphon will reuse an existing outbound trunk or create a new one. Note that agent_name here is the same "Receptionist" name given to the agent in Step 2.
import os

from dotenv import load_dotenv
from siphon.telephony.outbound import Call

load_dotenv()

# Instantiate the outbound dialing sequence with SIP Credentials
call = Call(
    agent_name="Receptionist",
    sip_trunk_setup={
        "name": "telnyx-primary",
        "sip_address": "sip.telnyx.com",
        "sip_number": os.getenv("FROM_NUMBER"),
        "sip_username": os.getenv("SIP_USERNAME"),
        "sip_password": os.getenv("SIP_PASSWORD"),
    },
    number_to_call="+15550200",
)

# Execute the asynchronous dial and bridge to the LiveKit WebRTC room
call.start()
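The numbers in this tutorial are written with a leading + and a country code (E.164 style). Assuming your trunk provider expects that format, a small check before dialing can catch typos early. This is a sketch with a hypothetical helper, `is_e164`, which is not part of Siphon. It only checks the shape of the string, not whether the number is actually reachable.

```python
import re

# E.164 shape: a leading "+", then 2 to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` looks like an E.164 phone number."""
    return E164_PATTERN.fullmatch(number) is not None
```

You could run `is_e164(os.getenv("FROM_NUMBER", ""))` and the same check on the destination number before constructing the call, and raise an error if either fails.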
Step 4: Handling State and Interruptions
One of the hardest things to build in Voice AI is handling interruptions (barge-ins).
Because Siphon uses LiveKit's WebRTC engine natively, it halts TTS output as soon as it detects human speech. Run your script, and you will have a natural, low-latency conversation with your AI, hosted entirely on your own infrastructure.
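To make the barge-in idea concrete, here is a toy simulation in plain asyncio. It is not Siphon's or LiveKit's implementation: `speak`, `run_turn`, and `demo` are hypothetical names, the "audio" is just a list of strings, and the speech detector is a simple event. It only illustrates the pattern of racing playback against a speech signal and cancelling playback when speech wins.

```python
import asyncio


async def speak(chunks, played):
    """Pretend to play TTS audio; each chunk takes about 10 ms."""
    for chunk in chunks:
        await asyncio.sleep(0.01)
        played.append(chunk)


async def run_turn(chunks, speech_detected):
    """Play `chunks` until done or until `speech_detected` is set.

    Returns (played_chunks, interrupted).
    """
    played = []
    playback = asyncio.create_task(speak(chunks, played))
    barge_in = asyncio.create_task(speech_detected.wait())

    # Race the two tasks: whichever finishes first ends the turn.
    done, pending = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )

    # Playback was cut off if it is still pending when the race ends.
    interrupted = playback in pending

    # Cancel the loser and wait for the cancellation to settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    return played, interrupted


async def demo():
    """Simulate a caller who starts talking about 35 ms into a 200 ms reply."""
    speech = asyncio.Event()
    chunks = [f"chunk-{i}" for i in range(20)]
    asyncio.get_running_loop().call_later(0.035, speech.set)
    return await run_turn(chunks, speech)
```

Running `asyncio.run(demo())` returns only the chunks played before the simulated caller spoke, along with `interrupted=True`. In a real pipeline the event would be driven by VAD on the inbound audio rather than a timer.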
Check out the full documentation and repository, and drop us a star if this saves you money! 👾
GitHub: https://github.com/blackdwarftech/siphon
Siphon Website: https://siphon.blackdwarf.in/docs