
Sarah Lindauer for Stream

Originally published at getstream.io

Build a Voice-Controlled GitHub Agent in Python (MCP + Vision Agents)

Turn any GitHub repo into a voice assistant: ask about branches, open issues, create pull requests, list contributors—all via natural conversation. 

Powered by OpenAI's Realtime API for low-latency voice, GitHub's Model Context Protocol (MCP) for secure repo actions, and Vision Agents for seamless orchestration.

In the demo, the agent understands spoken repo names (even when spelled out), checks branch counts, and answers follow-up questions about open PRs.

Here's how to build it yourself in under three minutes.

What You'll Build

[Diagram: voice-controlled GitHub agent in Python]

  • A voice-controlled GitHub assistant that can read and act on any public (or private) repository

  • Supports queries like: "How many branches in getstream/vision-agents?", "List open issues", "Create a PR", "Who contributed most?"

  • Real-time voice interaction with natural turn-taking

  • Secure GitHub access via personal access token and MCP

The Stack

  • OpenAI Realtime API: low-latency, speech-to-speech voice model with function calling

  • GitHub MCP server: secure, structured access to repo actions (issues, PRs, search)

  • Vision Agents: agent orchestration and MCP integration

  • Stream: low-latency WebRTC edge transport for real-time audio

Requirements (API Keys & Tokens)

You'll need:

  • OpenAI API key (for Realtime API voice model)

  • Stream API key & secret (low-latency WebRTC)

  • GitHub Personal Access Token (with repo scope for private repos)

Step 1: Set Up the Project

uv init github-voice-agent
cd github-voice-agent
# Install Vision Agents with the Stream and OpenAI plugins
uv add "vision-agents[getstream, openai]"
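The code in Step 2 loads credentials with `python-dotenv`, so you can keep them in a `.env` file next to `main.py` instead of exporting them each time. The values below are placeholders; note the variable name `GITHUB_PAT` matches what the code reads:

```shell
# .env — read by load_dotenv() in main.py; never commit this file
OPENAI_API_KEY=sk-...
STREAM_API_KEY=...
STREAM_API_SECRET=...
GITHUB_PAT=ghp_...
```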

Step 2: Full Working Code (main.py)

import logging
import os

from dotenv import load_dotenv

from vision_agents.core.agents import Agent, AgentLauncher
from vision_agents.core import cli
from vision_agents.core.mcp import MCPServerRemote
from vision_agents.plugins.openai.openai_realtime import Realtime
from vision_agents.plugins import getstream
from vision_agents.core.events import CallSessionParticipantJoinedEvent
from vision_agents.core.edge.types import User

# Load environment variables from .env file
load_dotenv()

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


async def create_agent(**kwargs) -> Agent:
    """Demonstrate OpenAI Realtime with GitHub MCP server integration."""

    # Get GitHub PAT from environment
    github_pat = os.getenv("GITHUB_PAT")
    if not github_pat:
        logger.error("GITHUB_PAT environment variable not found!")
        logger.error("Please set GITHUB_PAT in your .env file or environment")
        raise ValueError("GITHUB_PAT environment variable not found")

    # Check OpenAI API key from environment
    openai_api_key = os.getenv("OPENAI_API_KEY")
    if not openai_api_key:
        logger.error("OPENAI_API_KEY environment variable not found!")
        logger.error("Please set OPENAI_API_KEY in your .env file or environment")
        raise ValueError("OPENAI_API_KEY environment variable not found")

    # Create GitHub MCP server
    github_server = MCPServerRemote(
        url="https://api.githubcopilot.com/mcp/",
        headers={"Authorization": f"Bearer {github_pat}"},
        timeout=10.0,  # Shorter connection timeout
        session_timeout=300.0,
    )

    # Create OpenAI Realtime LLM (uses OPENAI_API_KEY from environment)
    llm = Realtime(model="gpt-4o-realtime-preview-2024-12-17")

    # Create real edge transport and agent user
    edge = getstream.Edge()
    agent_user = User(name="GitHub AI Assistant", id="github-agent")

    # Create agent with GitHub MCP server and OpenAI Realtime LLM
    agent = Agent(
        edge=edge,
        llm=llm,
        agent_user=agent_user,
        instructions=(
            "You are a helpful AI assistant with access to GitHub via an MCP "
            "server. You can help with GitHub operations like creating issues, "
            "managing pull requests, searching repositories, and more. Keep "
            "responses conversational and helpful. When you need to perform "
            "GitHub operations, use the available MCP tools."
        ),
        processors=[],
        mcp_servers=[github_server],
    )
    logger.info("Agent created with OpenAI Realtime and GitHub MCP server")
    logger.info(f"GitHub server: {github_server}")

    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    try:
        # Set up event handler for when participants join
        @agent.subscribe
        async def on_participant_joined(event: CallSessionParticipantJoinedEvent):
            # Check MCP tools after connection
            available_functions = agent.llm.get_available_functions()
            mcp_functions = [
                f for f in available_functions if f["name"].startswith("mcp_")
            ]
            logger.info(
                f"✅ Found {len(mcp_functions)} MCP tools available for function calling"
            )
            await agent.say(
                f"Hello {event.participant.user.name}! I'm your GitHub AI assistant "
                f"powered by OpenAI Realtime. I have access to {len(mcp_functions)} "
                "GitHub tools and can help you with repositories, issues, pull "
                "requests, and more through voice commands!"
            )

        # Ensure the agent user is created
        await agent.create_user()

        # Create a call
        call = await agent.create_call(call_type, call_id)

        # Have the agent join the call/room
        logger.info("🎤 Agent joining call...")
        with await agent.join(call):
            logger.info(
                "✅ Agent is now live with OpenAI Realtime! You can talk to it in the browser."
            )
            logger.info("Try asking:")
            logger.info("  - 'What repositories do I have?'")
            logger.info("  - 'Create a new issue in my repository'")
            logger.info("  - 'Search for issues with the label bug'")
            logger.info("  - 'Show me recent pull requests'")
            logger.info("")
            logger.info(
                "The agent will use OpenAI Realtime's real-time function calling to interact with GitHub!"
            )

            # Run until the call ends
            await agent.finish()
    except Exception as e:
        logger.error(f"Error with OpenAI Realtime GitHub MCP demo: {e}")
        logger.error("Make sure your GITHUB_PAT and OPENAI_API_KEY are valid")
        import traceback
        traceback.print_exc()

    # Clean up
    await agent.close()
    logger.info("Demo completed!")


if __name__ == "__main__":
    cli(AgentLauncher(create_agent=create_agent, join_call=join_call))
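Since a bad token only surfaces as an error once the MCP server rejects it, it can help to verify the PAT up front. A minimal sanity check against GitHub's REST API; this helper is illustrative and not part of Vision Agents:

```python
import json
import urllib.request


def auth_headers(token: str) -> dict:
    """Headers GitHub's REST API expects for a personal access token."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }


def check_token(token: str) -> str:
    """Return the authenticated user's login, or raise if the token is invalid."""
    req = urllib.request.Request(
        "https://api.github.com/user", headers=auth_headers(token)
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())["login"]
```

Calling `check_token(os.environ["GITHUB_PAT"])` once before launching the agent is a quick smoke test: a 401 here means the MCP server will reject the same token.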

Step 3: Set Credentials & Run

export OPENAI_API_KEY=sk-...
export STREAM_API_KEY=...
export STREAM_API_SECRET=...
export GITHUB_PAT=ghp_...

uv run main.py

A browser window opens with a video call UI. Join the call, allow mic access, and start talking:

Agent: "Could you spell out the repository name for me?"
You: "V-I-S-I-O-N-A-G-E-N-T-S"
Agent: "The getstream/vision-agents repo has nine branches."
You: "How many open pull requests?"
Agent: "There are X open pull requests..."
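The Realtime model handles the spelled-out name disambiguation itself, but the idea can be sketched as a tiny pure function (hypothetical, not part of the stack):

```python
def normalize_spelled_name(utterance: str) -> str:
    """Collapse a spelled-out name like 'V-I-S-I-O-N-A-G-E-N-T-S' into
    'visionagents'; leave ordinary phrases untouched."""
    parts = utterance.replace("-", " ").split()
    # If every token is a single letter, the user was spelling the name out
    if parts and all(len(p) == 1 for p in parts):
        return "".join(parts).lower()
    return utterance.strip().lower()
```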

Why This Stack is Powerful  

Vision Agents delivers sub-second voice latency with built-in reasoning plus MCP integration in under 100 lines of code. OpenAI's Realtime API handles voice streaming (speech-to-text and text-to-speech) and turn detection. GitHub MCP provides secure, structured repo access without brittle API wrappers. 

Links & Resources

Try it out yourself! 

What did you make your GitHub agent do: triage issues, onboard new contributors, or something more creative? 🤖
