voipbin

Build an AI Voice Agent in Go: Complete Tutorial with VoIPBin SDK

Every AI voice tutorial on the internet starts with Python.

Python is great. But Go powers a huge chunk of production infrastructure — microservices, APIs, cloud-native backends. If your team writes Go, you should not have to context-switch into Python just to add voice calling to your AI agent.

This tutorial shows you how to build a working AI voice bot in Go, end to end, using the VoIPBin Go SDK.


The Problem with AI Voice Today (for Go Developers)

Here is what most Go developers face when they want to add voice to their AI agent:

  1. Find a VoIP provider → SIP credentials, codec negotiation, RTP handling
  2. Set up an audio pipeline → STT, buffering, VAD (voice activity detection)
  3. Wire in TTS → audio encoding, streaming to the caller
  4. Keep latency low enough to feel like a real conversation

None of this is your application logic. It is infrastructure. And it is the reason most Go developers give up and reach for a managed service that only has a Python SDK.

VoIPBin handles all of the above. Your Go code just handles webhooks and returns JSON.


What You Will Build

A simple AI voice receptionist that:

  • Receives an inbound call
  • Greets the caller with TTS
  • Listens for their question
  • Responds with a smart answer
  • Ends the call cleanly

All in Go. Under 100 lines of application logic.


Prerequisites

  • Go 1.21+
  • A public webhook URL (use ngrok for local dev)
  • A VoIPBin account (sign up via API — no OTP, instant token)

Step 1: Sign Up and Get Your Token

curl -s -X POST "https://api.voipbin.net/v1.0/auth/signup" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "your-name",
    "email": "you@example.com",
    "password": "yourpassword"
  }'

You get back:

{
  "accesskey": {
    "token": "YOUR_API_TOKEN"
  }
}

Save that token. That is your auth header going forward.
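
In Go, pulling the token out of that response takes only a few lines. The struct below simply mirrors the JSON shape shown above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// signupResponse mirrors the JSON returned by the signup endpoint above.
type signupResponse struct {
	AccessKey struct {
		Token string `json:"token"`
	} `json:"accesskey"`
}

// extractToken pulls the token out of a raw signup response body.
func extractToken(body []byte) (string, error) {
	var resp signupResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	return resp.AccessKey.Token, nil
}

func main() {
	token, err := extractToken([]byte(`{"accesskey":{"token":"YOUR_API_TOKEN"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(token) // prints YOUR_API_TOKEN
}
```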


Step 2: Install the SDK

go get github.com/voipbin/voipbin-go

This gives you a fully typed client for the VoIPBin API — calls, flows, agents, numbers.


Step 3: Write Your Webhook Handler

When VoIPBin receives a call, it sends a POST request to your webhook. Your webhook returns a list of actions — speak, listen, transfer, hang up. VoIPBin executes them.

Create main.go:

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

// VoIPBin sends this when a call event occurs
type CallEvent struct {
    CallID     string `json:"call_id"`
    Type       string `json:"type"`
    Transcript string `json:"transcript,omitempty"`
}

// We return a list of actions
type Action struct {
    Type string `json:"type"`
    Text string `json:"text,omitempty"`
}

func webhookHandler(w http.ResponseWriter, r *http.Request) {
    var event CallEvent
    if err := json.NewDecoder(r.Body).Decode(&event); err != nil {
        http.Error(w, "bad request", http.StatusBadRequest)
        return
    }

    log.Printf("event: type=%s call_id=%s transcript=%q",
        event.Type, event.CallID, event.Transcript)

    var actions []Action

    switch event.Type {
    case "call.started":
        // Greet the caller
        actions = []Action{
            {Type: "talk", Text: "Hello! You have reached our AI assistant. How can I help you today?"},
            {Type: "listen"},
        }

    case "call.input":
        // Caller said something — respond based on transcript
        reply := getAIReply(event.Transcript)
        actions = []Action{
            {Type: "talk", Text: reply},
            {Type: "listen"},
        }

    default:
        // End anything we do not handle
        actions = []Action{
            {Type: "hangup"},
        }
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(actions)
}

func getAIReply(input string) string {
    // Plug in your LLM here — OpenAI, Anthropic, Gemini, local model
    // For this tutorial, we use a simple static response
    if input == "" {
        return "Sorry, I did not catch that. Could you repeat?"
    }
    return fmt.Sprintf("You said: %s. Let me look that up for you — our team will follow up shortly!", input)
}

func main() {
    http.HandleFunc("/webhook", webhookHandler)
    log.Println("Webhook listening on :8080")
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Run it:

go run main.go

In another terminal, expose it:

ngrok http 8080

Copy the ngrok URL — you will use it in the next step.


Step 4: Register Your Webhook and Make a Call

With the VoIPBin Go SDK, create a helper script to place a test call:

package main

import (
    "context"
    "fmt"
    "log"

    voipbin "github.com/voipbin/voipbin-go"
)

func main() {
    client := voipbin.NewClient("YOUR_API_TOKEN")

    // Place an outbound call to your own number to test
    call, err := client.Calls.Create(context.Background(), voipbin.CreateCallRequest{
        Source: "voipbin",
        Destinations: []voipbin.Destination{
            {Type: "tel", Target: "+1234567890"}, // your number
        },
        WebhookURL: "https://your-ngrok-id.ngrok.io/webhook",
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Call started: %s\n", call.ID)
}

Your phone rings. You answer. Your Go webhook handles the conversation.


Step 5: Wire In a Real LLM

Replace the getAIReply stub with an actual LLM call. Here is the OpenAI pattern:

import (
    "context"
    "os"

    "github.com/sashabaranov/go-openai"
)

func getAIReply(input string) string {
    // In production, construct the client once at startup rather than per call
    client := openai.NewClient(os.Getenv("OPENAI_API_KEY"))

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: openai.GPT4oMini,
            Messages: []openai.ChatCompletionMessage{
                {
                    Role:    openai.ChatMessageRoleSystem,
                    Content: "You are a friendly customer support agent. Keep responses under 30 words for voice.",
                },
                {
                    Role:    openai.ChatMessageRoleUser,
                    Content: input,
                },
            },
        },
    )
    if err != nil {
        return "I am having trouble right now. Please try again."
    }

    return resp.Choices[0].Message.Content
}

Two things to keep in mind for voice LLM responses:

  • Keep it short. Nobody wants to listen to a paragraph read aloud. Aim for 1-2 sentences.
  • Avoid markdown. TTS engines will literally say "asterisk asterisk bold asterisk asterisk". Write plain prose.
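
Both rules are easy to enforce in code. Here is a tiny, illustrative sanitizer (not part of any SDK) that strips the usual markdown noise before the reply goes to TTS; extend the replacer list for whatever your model tends to emit:

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeForTTS strips the markdown characters TTS engines read aloud.
// Longer patterns like "**" are listed before "*" so they match first.
func sanitizeForTTS(s string) string {
	replacer := strings.NewReplacer("**", "", "*", "", "`", "", "#", "", "_", " ")
	return strings.TrimSpace(replacer.Replace(s))
}

func main() {
	fmt.Println(sanitizeForTTS("**Sure!** Here is the `answer`."))
	// prints: Sure! Here is the answer.
}
```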

What VoIPBin Is Handling For You

While your Go code is happily processing HTTP requests, VoIPBin is doing the heavy lifting:

VoIPBin handles every one of these:

  • SIP signaling
  • RTP media streams
  • Speech-to-Text (STT)
  • Text-to-Speech (TTS)
  • DTMF detection
  • Call recording
  • Number provisioning

Your code is just an HTTP server. Which means everything you already know about Go servers applies: middleware, structured logging, graceful shutdown, horizontal scaling. No new mental model required.


Going to Production

A few things to harden before you ship:

Verify webhook authenticity:
VoIPBin signs webhook payloads. Validate the signature so random clients cannot trigger your bot.

Keep conversation state:
The webhook is stateless by default. For multi-turn conversations, store state in Redis or a DB keyed by call_id.

// Message is whatever turn shape your LLM client expects (role + content)
type Message struct {
    Role    string
    Content string
}

var sessions sync.Map // call_id -> []Message (add "sync" to your imports)

func getHistory(callID string) []Message {
    if v, ok := sessions.Load(callID); ok {
        return v.([]Message)
    }
    return nil
}

Handle timeouts:
LLM calls can be slow. If your webhook takes too long, VoIPBin plays a hold message. Add a context deadline:

// needs "context" and "time" in your imports
ctx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
defer cancel()

What You Get

By the end of this tutorial you have:

  • A production-ready webhook pattern for Go AI voice bots
  • STT, TTS, and RTP off your plate entirely
  • A codebase that is just HTTP handlers — testable, deployable, scalable
  • A starting point you can extend with any LLM, any business logic

Go is an excellent language for this. The concurrency model handles parallel calls cleanly. The binary deployment is simple. The type system keeps webhook parsing honest.

The only missing piece was a VoIP layer that speaks Go. Now you have one.


If you build something with this, drop a comment below. Always curious what people are actually shipping.
