Nick lamela

Build an AI Agent with Phone Callbacks on Render

How I built a voice-controlled on-call engineer that deploys code and calls me back


It's 3am. PagerDuty is screaming. Your deployment is down.

You could open your laptop, ssh into servers, grep through logs, find the issue, push a fix, wait for CI, deploy, and hope it works. Or you could pick up your phone, call a number, and say: "Check the logs, fix the auth bug, and call me back when it's deployed."

I built the second option. It's an AI agent you can call on your phone that has full access to your infrastructure—reading logs, deploying code, running tests—and can call you back when work is done.

This post walks through how I built it on Render.


Architecture Overview

The system connects phone calls to Claude with full tool access:

Phone <-> Twilio <-> Pipecat [Deepgram Flux STT -> Claude SDK -> Cartesia TTS]
                                              |
                                       Claude Agent SDK
                                              |
                                 +------------+------------+
                                 |            |            |
                            Render MCP    Bash/gh     Callbacks
                            (deploy,      (git ops)   (phone/SMS)
                             logs,
                             metrics)

Four Render services make this work:

| Service  | Purpose                                       |
|----------|-----------------------------------------------|
| Web      | Handles incoming calls, runs voice pipeline   |
| Worker   | Executes background tasks autonomously        |
| Redis    | Task queue for callbacks and reminders        |
| Postgres | User data, task history, conversation memory  |

When someone calls, Twilio opens a WebSocket to the web service. Pipecat manages real-time audio—Deepgram Flux converts speech to text, Claude processes it, and Cartesia speaks the response. Claude has full tool access: file operations, bash, git, and the Render MCP for infrastructure management.
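
To make this concrete, here's a minimal sketch of the inbound-call webhook that tells Twilio to open that WebSocket. The route path and stream URL are my assumptions, not necessarily the repo's:

from fastapi import FastAPI
from fastapi.responses import Response

app = FastAPI()

@app.post("/voice/incoming")
async def incoming_call():
    # TwiML response: tell Twilio to open a bidirectional media stream
    # to our WebSocket endpoint, where the Pipecat pipeline takes over
    twiml = """<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://ringfra.onrender.com/ws" />
  </Connect>
</Response>"""
    return Response(content=twiml, media_type="application/xml")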

The magic is in the callbacks. When you say "deploy and call me back," Claude hands the task to a background worker. The worker runs autonomously—no user interaction—and when it's done, initiates an outbound call to deliver the summary.


The Key Innovation: Callbacks

Most AI is request-response. You ask, it answers. But voice agents need to work asynchronously. You can't stay on hold for 10 minutes while a deployment runs.

The solution is a handoff pattern:

  1. User requests background work ("fix the bug and call me back")
  2. Claude calls handoff_task with a structured plan
  3. Task goes to Redis queue
  4. Worker picks it up and executes autonomously
  5. When complete, worker triggers a Twilio outbound call
  6. New voice session delivers the summary

Here's the handoff tool:

@tool("handoff_task", """Hand off a task to run AFTER the call ends.
The background agent will execute autonomously with full tool access.

Use when user says things like:
- "Deploy to staging and call me back"
- "Fix the bug and let me know when it's done"
""", {
    "task_type": str,
    "plan": dict,  # {objective, steps, success_criteria}
    "notify_on": str,
})
async def handoff_task_tool(args: dict[str, Any]) -> dict[str, Any]:
    ctx = _get_session_context()
    phone = ctx.get("caller_phone")

    # Save task to Postgres
    task_id = await create_background_task(
        user_id=user_id,
        phone=phone,
        task_type=args.get("task_type"),
        plan=args.get("plan"),
    )

    # Queue for background execution
    await enqueue_background_task(task_id)

    return {
        "content": [{
            "type": "text",
            "text": f"Task handed off. I'll call you back when done."
        }]
    }
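
enqueue_background_task is a thin wrapper around the ARQ queue. A minimal sketch, assuming the worker registers the job under the name execute_background_task:

from arq import create_pool
from arq.connections import RedisSettings

async def enqueue_background_task(task_id: str) -> None:
    # Push the job onto the Redis-backed queue; the worker service picks it up
    redis = await create_pool(RedisSettings.from_dsn(settings.REDIS_URL))
    await redis.enqueue_job("execute_background_task", task_id)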

The worker then spawns a headless Claude session with the same tools:

from claude_agent_sdk import ClaudeAgentOptions, query

async def execute_background_task(ctx: dict, task_id: str):
    task = await get_background_task(task_id)
    plan = task["plan"]
    phone = task["phone"]

    system_prompt = f"""You are executing a background task AUTONOMOUSLY.
The user is NOT on the call.

## CRITICAL RULES
- Do NOT ask questions or wait for input
- Make decisions and proceed
- If something fails, try to fix it yourself

## Your Task
**Objective**: {plan.get('objective')}
**Steps**: {plan.get('steps')}
"""

    query_options = ClaudeAgentOptions(
        system_prompt=system_prompt,
        mcp_servers=mcp_servers,
        permission_mode="bypassPermissions",
        allowed_tools=[...],
    )

    summary_parts: list[str] = []
    async for msg in query(prompt=f"Execute: {plan['objective']}",
                           options=query_options):
        # Collect the agent's output to build the callback summary
        summary_parts.append(str(msg))
    summary = "\n".join(summary_parts)

    # Call user back with result
    await initiate_callback(
        phone=phone,
        context={"summary": summary, "success": True},
        callback_type="task_complete",
    )

When the callback connects, Claude has full context of what just happened and delivers a detailed summary—including specific details like hex codes, file names, and PR URLs.
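
initiate_callback is just an outbound call through Twilio's REST API. A rough sketch, assuming TWILIO_PHONE_NUMBER and APP_HOST settings and a save_callback_context helper for stashing the summary (those names are mine):

from twilio.rest import Client

async def initiate_callback(phone: str, context: dict, callback_type: str):
    # Stash the context so the new voice session can load it when Twilio
    # fetches TwiML from our callback route
    await save_callback_context(phone, context)

    client = Client(settings.TWILIO_ACCOUNT_SID, settings.TWILIO_AUTH_TOKEN)
    client.calls.create(  # Twilio's client is synchronous; fine for a one-off call
        to=phone,
        from_=settings.TWILIO_PHONE_NUMBER,
        url=f"https://{settings.APP_HOST}/voice/callback?type={callback_type}",
    )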


Setting Up on Render

The entire system is defined in render.yaml:

services:
  - type: web
    name: ringfra
    runtime: docker
    healthCheckPath: /health
    envVars:
      - key: TWILIO_ACCOUNT_SID
        sync: false
      - key: ANTHROPIC_API_KEY
        sync: false
      - key: DEEPGRAM_API_KEY
        sync: false
      - key: CARTESIA_API_KEY
        sync: false
      - key: RENDER_API_KEY
        sync: false
      - key: REDIS_URL
        fromService:
          type: redis
          name: ringfra-redis
          property: connectionString
      - key: DATABASE_URL
        fromDatabase:
          name: ringfra-db
          property: connectionString

  - type: worker
    name: ringfra-worker
    runtime: docker
    dockerCommand: python -m arq src.tasks.worker.WorkerSettings
    envVars:
      # Same env vars as web service
      - key: ANTHROPIC_API_KEY
        sync: false
      # ...

  - type: redis
    name: ringfra-redis
    plan: free
    ipAllowList: []  # internal connections only

databases:
  - name: ringfra-db
    databaseName: ringfra
    plan: free

The web service handles incoming calls. The worker processes background tasks. Redis powers the task queue (ARQ). Postgres stores user data and task history.

Render's service linking (fromService, fromDatabase) handles the wiring automatically—no manual connection strings.
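
The worker's dockerCommand points ARQ at a WorkerSettings class. A minimal sketch of that entrypoint (the repo's version may differ; the import path for the task function is assumed):

import os

from arq.connections import RedisSettings

from src.tasks.background import execute_background_task  # assumed module path

class WorkerSettings:
    # ARQ invokes each registered function with a ctx dict as its first argument
    functions = [execute_background_task]
    redis_settings = RedisSettings.from_dsn(os.environ["REDIS_URL"])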


The Voice Pipeline

Real-time voice AI is tricky. Audio streams in, needs to be transcribed, processed, and spoken back—all while handling interruptions and maintaining natural conversation flow.

Pipecat handles this with a frame-based pipeline:

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

async def run_sdk_pipeline(websocket, stream_sid, call_sid):
    # transport: a Pipecat websocket transport bound to the Twilio stream
    # (construction omitted here)

    # Speech-to-Text with Deepgram Flux
    stt = DeepgramFluxSTTService(
        api_key=settings.DEEPGRAM_API_KEY,
        model="flux-general-en",
        params=DeepgramFluxSTTService.InputParams(
            eot_threshold=0.65,      # End-of-turn sensitivity
            eot_timeout_ms=3000,     # Max silence before turn ends
        ),
    )

    # Text-to-Speech with Cartesia
    tts = CartesiaTTSService(
        api_key=settings.CARTESIA_API_KEY,
        voice_id=settings.TTS_VOICE,
    )

    # Claude SDK bridge
    sdk_bridge = SDKBridgeProcessor(session=session)

    # Build the frame pipeline
    pipeline = Pipeline([
        transport.input(),   # Audio from Twilio
        stt,                 # Speech -> Text
        sdk_bridge,          # Text -> Claude -> Text
        tts,                 # Text -> Speech
        transport.output(),  # Audio to Twilio
    ])

    # Run until the call ends
    await PipelineRunner().run(PipelineTask(pipeline))

The key insight is Deepgram Flux. Traditional voice activity detection (VAD) listens for silence to determine when you're done speaking. This cuts people off during natural pauses: "I want to... deploy the API."

Flux uses semantic understanding—it knows when a thought is complete, not just when you stopped making noise.


The Render MCP Integration

MCP (Model Context Protocol) lets Claude talk to external APIs through a standardized interface. Render's MCP server exposes your entire infrastructure:

mcp_servers = {
    "render": {
        "type": "http",
        "url": "https://mcp.render.com/mcp",
        "headers": {
            "Authorization": f"Bearer {render_api_key}",
        },
    },
}
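
To actually expose these tools to the agent, the config plugs into ClaudeAgentOptions. Tool names follow the SDK's mcp__<server>__<tool> convention; the list below is illustrative, not exhaustive:

options = ClaudeAgentOptions(
    mcp_servers=mcp_servers,
    allowed_tools=[
        "mcp__render__list_services",
        "mcp__render__list_logs",
        # ...plus file, bash, and git tools
    ],
)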

With this configured, Claude can:

  • List services: "What services do I have running?"
  • Read logs: "Any errors in the last hour?"
  • Check metrics: "What's using the most CPU?"
  • Deploy: "Deploy the API to production"
  • Manage databases: "Show me the Postgres instances"

Here's what a log query looks like in practice:

User: "Any errors in the API logs?"

Claude: [calls mcp__render__list_services]
Claude: [calls mcp__render__list_logs with filters]
Claude: "I found 3 errors in the last hour. Two are null pointer
         exceptions in the auth module, one is a timeout connecting
         to Redis. Want me to look into the auth issue?"

The MCP integration means Claude isn't just answering questions—it's actually querying your infrastructure and taking action.


Demo

Here's what it looks like in action:

[VIDEO LINK: https://youtu.be/tUcLhMSpCJ0]

The demo shows:

  1. Calling the agent with a complex request ("fix the branding and call me back")
  2. Agent reads back a 7-step plan
  3. Hanging up while it works
  4. Getting called back 10 minutes later with a detailed summary
  5. Memory persistence across calls ("Do you remember my name?")
  6. Live log reading via Render MCP

Try It Yourself

The code is open source:

git clone https://github.com/Designedforusers/RingFra.git
cd RingFra
python -m venv venv && source venv/bin/activate
pip install -e .

You'll need accounts with the following (a sample .env sketch follows the list):

  • Twilio - Phone/SMS ($15 trial credit)
  • Deepgram - Speech-to-text (free tier)
  • Cartesia - Text-to-speech (free tier)
  • Anthropic - Claude API
  • Render - Hosting + MCP
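
Before deploying, put the keys in a local .env. The names mirror the env vars in render.yaml above; the Twilio auth token and phone number are extra values you'll also need (names assumed):

TWILIO_ACCOUNT_SID=...
TWILIO_AUTH_TOKEN=...
TWILIO_PHONE_NUMBER=...
ANTHROPIC_API_KEY=...
DEEPGRAM_API_KEY=...
CARTESIA_API_KEY=...
RENDER_API_KEY=...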

Full setup instructions are in the repo's TUTORIAL.md:
https://github.com/Designedforusers/RingFra/blob/main/docs/TUTORIAL.md

Production stats:

  • P50 response latency: 3.3s
  • Cost: ~$0.08-0.15 per minute of call time

What's Next

This is just the beginning of what's possible with proactive AI agents:

  • Monitoring alerts: Agent notices high CPU, investigates, fixes, calls you with summary
  • PR review calls: "Call me when the PR has comments"
  • Incident response: Automatic escalation through phone tree
  • Scheduled check-ins: "Call me every morning with deployment status"

The core insight is that AI shouldn't just wait for you. It should work in the background and reach out when something matters.


Built on Render. Deploy your own: https://github.com/Designedforusers/RingFra
