How I built a voice-controlled on-call engineer that deploys code and calls me back
It's 3am. PagerDuty is screaming. Your deployment is down.
You could open your laptop, ssh into servers, grep through logs, find the issue, push a fix, wait for CI, deploy, and hope it works. Or you could pick up your phone, call a number, and say: "Check the logs, fix the auth bug, and call me back when it's deployed."
I built the second option. It's an AI agent you can call on your phone that has full access to your infrastructure—reading logs, deploying code, running tests—and can call you back when work is done.
This post walks through how I built it on Render.
Architecture Overview
The system connects phone calls to Claude with full tool access:
Phone <-> Twilio <-> Pipecat [Deepgram Flux STT -> Claude SDK -> Cartesia TTS]
                                      |
                               Claude Agent SDK
                                      |
                    +-----------------+-----------------+
                    |                 |                 |
               Render MCP          Bash/gh          Callbacks
             (deploy, logs,       (git ops)        (phone/SMS)
                metrics)
Four Render services make this work:
| Service | Purpose |
|---|---|
| Web | Handles incoming calls, runs voice pipeline |
| Worker | Executes background tasks autonomously |
| Redis | Task queue for callbacks and reminders |
| Postgres | User data, task history, conversation memory |
When someone calls, Twilio opens a WebSocket to the web service. Pipecat manages real-time audio—Deepgram Flux converts speech to text, Claude processes it, and Cartesia speaks the response. Claude has full tool access: file operations, bash, git, and the Render MCP for infrastructure management.
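On the Twilio side, opening that WebSocket just means answering the incoming-call webhook with TwiML that starts a bidirectional media stream. A minimal sketch with FastAPI (the route path and hostname here are illustrative, not the repo's exact values):

from fastapi import FastAPI, Request, Response

app = FastAPI()

@app.post("/voice/incoming")  # hypothetical route; point Twilio's voice webhook here
async def incoming_call(request: Request) -> Response:
    # Answer with TwiML that opens a bidirectional media stream to our
    # WebSocket endpoint, where the Pipecat pipeline takes over the audio.
    twiml = """<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://ringfra.onrender.com/ws" />
  </Connect>
</Response>"""
    return Response(content=twiml, media_type="application/xml")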
The magic is in the callbacks. When you say "deploy and call me back," Claude hands the task to a background worker. The worker runs autonomously—no user interaction—and when it's done, initiates an outbound call to deliver the summary.
The Key Innovation: Callbacks
Most AI is request-response. You ask, it answers. But voice agents need to work asynchronously. You can't stay on hold for 10 minutes while a deployment runs.
The solution is a handoff pattern:
- User requests background work ("fix the bug and call me back")
- Claude calls handoff_task with a structured plan
- Task goes to the Redis queue
- Worker picks it up and executes autonomously
- When complete, worker triggers a Twilio outbound call
- New voice session delivers the summary
Here's the handoff tool:
@tool("handoff_task", """Hand off a task to run AFTER the call ends.
The background agent will execute autonomously with full tool access.
Use when user says things like:
- "Deploy to staging and call me back"
- "Fix the bug and let me know when it's done"
""", {
"task_type": str,
"plan": dict, # {objective, steps, success_criteria}
"notify_on": str,
})
async def handoff_task_tool(args: dict[str, Any]) -> dict[str, Any]:
    ctx = _get_session_context()
    phone = ctx.get("caller_phone")
    user_id = ctx.get("user_id")

    # Save task to Postgres
    task_id = await create_background_task(
        user_id=user_id,
        phone=phone,
        task_type=args.get("task_type"),
        plan=args.get("plan"),
    )

    # Queue for background execution
    await enqueue_background_task(task_id)

    return {
        "content": [{
            "type": "text",
            "text": "Task handed off. I'll call you back when done."
        }]
    }
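enqueue_background_task is a thin wrapper over ARQ, the queue library the worker runs on (more on that in the render.yaml section below). A sketch, assuming a lazily created ARQ pool:

from arq import create_pool
from arq.connections import RedisSettings

_arq_pool = None

async def enqueue_background_task(task_id: str) -> None:
    # Create the ARQ Redis pool on first use, then enqueue the job
    # by the name the worker registers it under.
    global _arq_pool
    if _arq_pool is None:
        _arq_pool = await create_pool(RedisSettings.from_dsn(settings.REDIS_URL))
    await _arq_pool.enqueue_job("execute_background_task", task_id)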
The worker then spawns a headless Claude session with the same tools:
async def execute_background_task(ctx: dict, task_id: str):
    task = await get_background_task(task_id)
    plan = task["plan"]

    system_prompt = f"""You are executing a background task AUTONOMOUSLY.
The user is NOT on the call.

## CRITICAL RULES
- Do NOT ask questions or wait for input
- Make decisions and proceed
- If something fails, try to fix it yourself

## Your Task
**Objective**: {plan.get('objective')}
**Steps**: {plan.get('steps')}
"""

    query_options = ClaudeAgentOptions(
        system_prompt=system_prompt,
        mcp_servers=mcp_servers,
        permission_mode="bypassPermissions",
        allowed_tools=[...],
    )

    summary = ""
    async for msg in query(prompt=f"Execute: {plan['objective']}",
                           options=query_options):
        # Keep the agent's latest text output as the callback summary
        if isinstance(msg, AssistantMessage):
            summary = "".join(
                block.text for block in msg.content
                if isinstance(block, TextBlock)
            )

    # Call the user back with the result
    await initiate_callback(
        phone=task["phone"],
        context={"summary": summary, "success": True},
        callback_type="task_complete",
    )
When the callback connects, Claude has full context of what just happened and delivers a detailed summary—including specific details like hex codes, file names, and PR URLs.
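initiate_callback itself is a plain Twilio outbound call; the interesting part is handing the task context to the new voice session. One way to sketch it (save_callback_context, the settings names, and the URL shape are assumptions, not the repo's exact code):

from twilio.rest import Client

async def initiate_callback(phone: str, context: dict, callback_type: str) -> None:
    # Persist the summary so the answering webhook can seed the new
    # Claude session with it, then place the outbound call.
    callback_id = await save_callback_context(context, callback_type)  # hypothetical helper
    client = Client(settings.TWILIO_ACCOUNT_SID, settings.TWILIO_AUTH_TOKEN)
    client.calls.create(
        to=phone,
        from_=settings.TWILIO_PHONE_NUMBER,
        url=f"https://ringfra.onrender.com/voice/callback?cb={callback_id}",
    )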
Setting Up on Render
The entire system is defined in render.yaml:
services:
  - type: web
    name: ringfra
    runtime: docker
    healthCheckPath: /health
    envVars:
      - key: TWILIO_ACCOUNT_SID
        sync: false
      - key: ANTHROPIC_API_KEY
        sync: false
      - key: DEEPGRAM_API_KEY
        sync: false
      - key: CARTESIA_API_KEY
        sync: false
      - key: RENDER_API_KEY
        sync: false
      - key: REDIS_URL
        fromService:
          type: redis
          name: ringfra-redis
          property: connectionString
      - key: DATABASE_URL
        fromDatabase:
          name: ringfra-db
          property: connectionString

  - type: worker
    name: ringfra-worker
    runtime: docker
    dockerCommand: python -m arq src.tasks.worker.WorkerSettings
    envVars:
      # Same env vars as the web service
      - key: ANTHROPIC_API_KEY
        sync: false
      # ...

  - type: redis
    name: ringfra-redis
    plan: free
    ipAllowList: []  # internal connections only

databases:
  - name: ringfra-db
    databaseName: ringfra
    plan: free
The web service handles incoming calls. The worker processes background tasks. Redis powers the task queue (ARQ). Postgres stores user data and task history.
Render's service linking (fromService, fromDatabase) handles the wiring automatically—no manual connection strings.
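One piece worth spelling out: the worker's dockerCommand points at src.tasks.worker.WorkerSettings, which in ARQ is just a settings class that registers the job functions. A minimal sketch (the import path and max_jobs value are assumptions):

import os

from arq.connections import RedisSettings

from src.tasks.background import execute_background_task  # hypothetical module path

class WorkerSettings:
    # ARQ looks up queued jobs by function name, so this list is what
    # makes "execute_background_task" resolvable on the worker.
    functions = [execute_background_task]
    redis_settings = RedisSettings.from_dsn(os.environ["REDIS_URL"])
    max_jobs = 5  # cap concurrent autonomous Claude sessions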
The Voice Pipeline
Real-time voice AI is tricky. Audio streams in, needs to be transcribed, processed, and spoken back—all while handling interruptions and maintaining natural conversation flow.
Pipecat handles this with a frame-based pipeline:
async def run_sdk_pipeline(websocket, stream_sid, call_sid):
# Speech-to-Text with Deepgram Flux
stt = DeepgramFluxSTTService(
api_key=settings.DEEPGRAM_API_KEY,
model="flux-general-en",
params=DeepgramFluxSTTService.InputParams(
eot_threshold=0.65, # End-of-turn sensitivity
eot_timeout_ms=3000, # Max silence before turn ends
),
)
# Text-to-Speech with Cartesia
tts = CartesiaTTSService(
api_key=settings.CARTESIA_API_KEY,
voice_id=settings.TTS_VOICE,
)
# Claude SDK bridge
sdk_bridge = SDKBridgeProcessor(session=session)
# Build pipeline
pipeline = Pipeline([
transport.input(), # Audio from Twilio
stt, # Speech -> Text
sdk_bridge, # Text -> Claude -> Text
tts, # Text -> Speech
transport.output(), # Audio to Twilio
])
The key insight is Deepgram Flux. Traditional voice activity detection (VAD) listens for silence to determine when you're done speaking. This cuts people off during natural pauses: "I want to... deploy the API."
Flux uses semantic understanding—it knows when a thought is complete, not just when you stopped making noise.
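The other custom piece, SDKBridgeProcessor, is the glue between Pipecat frames and the Claude session. Conceptually it's a small FrameProcessor: finished transcriptions go to Claude, and the reply streams back downstream as text frames for Cartesia. A rough sketch (session.send stands in for the repo's actual session interface):

from pipecat.frames.frames import Frame, TextFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class SDKBridgeProcessor(FrameProcessor):
    def __init__(self, session):
        super().__init__()
        self._session = session  # wraps the Claude Agent SDK client + tools

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            # A finished user turn: ask Claude, then stream the answer
            # out as text frames for the TTS stage to speak.
            async for chunk in self._session.send(frame.text):  # hypothetical interface
                await self.push_frame(TextFrame(chunk))
        else:
            await self.push_frame(frame, direction)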
The Render MCP Integration
MCP (Model Context Protocol) lets Claude talk to external APIs through a standardized interface. Render's MCP server exposes your entire infrastructure:
mcp_servers = {
"render": {
"type": "http",
"url": "https://mcp.render.com/mcp",
"headers": {
"Authorization": f"Bearer {render_api_key}",
},
},
}
With this configured, Claude can:
- List services: "What services do I have running?"
- Read logs: "Any errors in the last hour?"
- Check metrics: "What's using the most CPU?"
- Deploy: "Deploy the API to production"
- Manage databases: "Show me the Postgres instances"
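On the Claude side, these surface as individually named tools: the Agent SDK namespaces MCP tools as mcp__<server>__<tool>. That's what fills in the allowed_tools list elided earlier; for example (only the two tool names from the transcript below are confirmed, the rest is illustrative):

query_options = ClaudeAgentOptions(
    system_prompt=system_prompt,
    mcp_servers=mcp_servers,
    allowed_tools=[
        "Bash",   # shell access for git/gh operations
        "Read",
        "Write",
        # Render MCP tools, namespaced mcp__<server>__<tool>:
        "mcp__render__list_services",
        "mcp__render__list_logs",
    ],
)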
Here's what a log query looks like in practice:
User: "Any errors in the API logs?"
Claude: [calls mcp__render__list_services]
Claude: [calls mcp__render__list_logs with filters]
Claude: "I found 3 errors in the last hour. Two are null pointer
exceptions in the auth module, one is a timeout connecting
to Redis. Want me to look into the auth issue?"
The MCP integration means Claude isn't just answering questions—it's actually querying your infrastructure and taking action.
Demo
Here's what it looks like in action:
Demo video: https://youtu.be/tUcLhMSpCJ0
The demo shows:
- Calling the agent with a complex request ("fix the branding and call me back")
- Agent reads back a 7-step plan
- Hanging up while it works
- Getting called back 10 minutes later with a detailed summary
- Memory persistence across calls ("Do you remember my name?")
- Live log reading via Render MCP
Try It Yourself
The code is open source:
git clone https://github.com/Designedforusers/RingFra.git
cd RingFra
python -m venv venv && source venv/bin/activate
pip install -e .
You'll need accounts with:
- Twilio - Phone/SMS ($15 trial credit)
- Deepgram - Speech-to-text (free tier)
- Cartesia - Text-to-speech (free tier)
- Anthropic - Claude API
- Render - Hosting + MCP
Full setup instructions are in the repo's TUTORIAL.md:
https://github.com/Designedforusers/RingFra/blob/main/docs/TUTORIAL.md
Production stats:
- P50 response latency: 3.3s
- Cost: ~$0.08-0.15 per minute of call time
What's Next
This is just the beginning of what's possible with proactive AI agents:
- Monitoring alerts: Agent notices high CPU, investigates, fixes, calls you with summary
- PR review calls: "Call me when the PR has comments"
- Incident response: Automatic escalation through phone tree
- Scheduled check-ins: "Call me every morning with deployment status"
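That last one needs almost no new machinery: ARQ has cron support built in, so a scheduled check-in is just another entry in WorkerSettings (morning_checkin and its helpers are hypothetical):

import os

from arq import cron
from arq.connections import RedisSettings

async def morning_checkin(ctx: dict) -> None:
    # Reuse the existing callback path: build a status summary,
    # then ring the user with it.
    summary = await build_deploy_status_summary()  # hypothetical helper
    await initiate_callback(
        phone=settings.OWNER_PHONE,  # assumed setting
        context={"summary": summary, "success": True},
        callback_type="scheduled_checkin",
    )

class WorkerSettings:
    functions = [execute_background_task]
    cron_jobs = [cron(morning_checkin, hour=8, minute=0)]
    redis_settings = RedisSettings.from_dsn(os.environ["REDIS_URL"])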
The core insight is that AI shouldn't just wait for you. It should work in the background and reach out when something matters.
Built on Render. Deploy your own: https://github.com/Designedforusers/RingFra