Almost every AI voice tutorial on the internet starts with Python.
Python is great. But Go powers a huge chunk of production infrastructure — microservices, APIs, cloud-native backends. If your team writes Go, you should not have to context-switch into Python just to add voice calling to your AI agent.
This tutorial shows you how to build a working AI voice bot in Go, end to end, using the VoIPBin Go SDK.
The Problem with AI Voice Today (for Go Developers)
Here is what most Go developers face when they want to add voice to their AI agent:
- Find a VoIP provider → SIP credentials, codec negotiation, RTP handling
- Set up an audio pipeline → STT, buffering, VAD (voice activity detection)
- Wire in TTS → audio encoding, streaming to the caller
- Keep latency low enough to feel like a real conversation
None of this is your application logic. It is infrastructure. And it is the reason many Go developers give up and reach for a managed service that only has a Python SDK.
VoIPBin handles all of the above. Your Go code just handles webhooks and returns JSON.
What You Will Build
A simple AI voice receptionist that:
- Receives an inbound call
- Greets the caller with TTS
- Listens for their question
- Responds with a smart answer
- Ends the call cleanly
All in Go. Under 100 lines of application logic.
Prerequisites
- Go 1.21+
- A public webhook URL (use ngrok for local dev)
- A VoIPBin account (sign up via API — no OTP, instant token)
Step 1: Sign Up and Get Your Token
curl -s -X POST "https://api.voipbin.net/v1.0/auth/signup" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "your-name",
    "email": "you@example.com",
    "password": "yourpassword"
  }'
You get back:
{
  "accesskey": {
    "token": "YOUR_API_TOKEN"
  }
}
Save that token. That is your auth header going forward.
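If you would rather script the signup than copy the token by hand, the response can be decoded in Go. This is a minimal sketch that assumes the response has exactly the shape shown above; the `signupResponse` type and `parseToken` helper are illustrative names, not part of the SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// signupResponse mirrors only the fields shown in the sample response.
type signupResponse struct {
	AccessKey struct {
		Token string `json:"token"`
	} `json:"accesskey"`
}

// parseToken extracts the API token from a signup response body.
func parseToken(body string) (string, error) {
	var resp signupResponse
	if err := json.Unmarshal([]byte(body), &resp); err != nil {
		return "", fmt.Errorf("decode signup response: %w", err)
	}
	if resp.AccessKey.Token == "" {
		return "", fmt.Errorf("signup response has no accesskey.token")
	}
	return resp.AccessKey.Token, nil
}

func main() {
	token, err := parseToken(`{"accesskey": {"token": "YOUR_API_TOKEN"}}`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(token)
}
```

In a real project you would keep the token out of source control, for example in an environment variable.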
Step 2: Install the SDK
go get github.com/voipbin/voipbin-go
This gives you a fully typed client for the VoIPBin API — calls, flows, agents, numbers.
Step 3: Write Your Webhook Handler
When VoIPBin receives a call, it sends a POST request to your webhook. Your webhook responds with a JSON list of actions (talk, listen, transfer, hang up), and VoIPBin executes them.
Create main.go:
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// CallEvent is the payload VoIPBin sends when a call event occurs.
type CallEvent struct {
	CallID     string `json:"call_id"`
	Type       string `json:"type"`
	Transcript string `json:"transcript,omitempty"`
}

// Action is one step for VoIPBin to execute; the webhook returns a list of them.
type Action struct {
	Type string `json:"type"`
	Text string `json:"text,omitempty"`
}

func webhookHandler(w http.ResponseWriter, r *http.Request) {
	var event CallEvent
	if err := json.NewDecoder(r.Body).Decode(&event); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	log.Printf("event: type=%s call_id=%s transcript=%q",
		event.Type, event.CallID, event.Transcript)

	var actions []Action
	switch event.Type {
	case "call.started":
		// Greet the caller, then wait for them to speak.
		actions = []Action{
			{Type: "talk", Text: "Hello! You have reached our AI assistant. How can I help you today?"},
			{Type: "listen"},
		}
	case "call.input":
		// The caller said something: respond based on the transcript.
		reply := getAIReply(event.Transcript)
		actions = []Action{
			{Type: "talk", Text: reply},
			{Type: "listen"},
		}
	default:
		// Hang up on any event we do not handle.
		actions = []Action{
			{Type: "hangup"},
		}
	}

	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(actions); err != nil {
		log.Printf("write response: %v", err)
	}
}

func getAIReply(input string) string {
	// Plug in your LLM here: OpenAI, Anthropic, Gemini, or a local model.
	// For this tutorial, we use a simple static response.
	if input == "" {
		return "Sorry, I did not catch that. Could you repeat?"
	}
	return fmt.Sprintf("You said: %s. Let me look that up for you. Our team will follow up shortly!", input)
}

func main() {
	http.HandleFunc("/webhook", webhookHandler)
	log.Println("Webhook listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}
Run it:
go run main.go
In another terminal, expose it:
ngrok http 8080
Copy the ngrok URL — you will use it in the next step.
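Before pointing a real call at the server, you can exercise the handler in-process with `net/http/httptest`. The sketch below uses a trimmed stand-in for the handler so it runs on its own; the event payloads reuse the field names from main.go, and `postEvent` is a hypothetical helper introduced only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type CallEvent struct {
	CallID     string `json:"call_id"`
	Type       string `json:"type"`
	Transcript string `json:"transcript,omitempty"`
}

type Action struct {
	Type string `json:"type"`
	Text string `json:"text,omitempty"`
}

// webhookHandler is a trimmed stand-in for the handler in main.go.
func webhookHandler(w http.ResponseWriter, r *http.Request) {
	var event CallEvent
	if err := json.NewDecoder(r.Body).Decode(&event); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	actions := []Action{{Type: "hangup"}}
	if event.Type == "call.started" {
		actions = []Action{{Type: "talk", Text: "Hello!"}, {Type: "listen"}}
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(actions)
}

// postEvent sends a JSON body to the handler without opening a socket
// and returns the status code plus the decoded actions (nil on failure).
func postEvent(body string) (int, []Action) {
	req := httptest.NewRequest(http.MethodPost, "/webhook", strings.NewReader(body))
	rec := httptest.NewRecorder()
	webhookHandler(rec, req)

	if rec.Code != http.StatusOK {
		return rec.Code, nil
	}
	var actions []Action
	if err := json.NewDecoder(rec.Body).Decode(&actions); err != nil {
		return rec.Code, nil
	}
	return rec.Code, actions
}

func main() {
	code, actions := postEvent(`{"call_id":"test-1","type":"call.started"}`)
	fmt.Println(code, actions)
}
```

The same `postEvent` pattern drops straight into a `_test.go` file, which lets you cover each event type without a phone or a tunnel.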
Step 4: Register Your Webhook and Make a Call
With the VoIPBin Go SDK, create a helper script to place a test call:
package main

import (
	"context"
	"fmt"
	"log"

	voipbin "github.com/voipbin/voipbin-go"
)

func main() {
	client := voipbin.NewClient("YOUR_API_TOKEN")

	// Place an outbound call to your own number to test
	call, err := client.Calls.Create(context.Background(), voipbin.CreateCallRequest{
		Source: "voipbin",
		Destinations: []voipbin.Destination{
			{Type: "tel", Target: "+1234567890"}, // your number
		},
		WebhookURL: "https://your-ngrok-id.ngrok.io/webhook",
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Call started: %s\n", call.ID)
}
Your phone rings. You answer. Your Go webhook handles the conversation.
Step 5: Wire In a Real LLM
Replace the getAIReply stub with an actual LLM call. Here is the OpenAI pattern:
import (
	"context"
	"os"

	openai "github.com/sashabaranov/go-openai"
)

func getAIReply(input string) string {
	// Read the key from the environment instead of hard-coding it.
	client := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
	resp, err := client.CreateChatCompletion(
		context.Background(),
		openai.ChatCompletionRequest{
			Model: openai.GPT4oMini,
			Messages: []openai.ChatCompletionMessage{
				{
					Role:    openai.ChatMessageRoleSystem,
					Content: "You are a friendly customer support agent. Keep responses under 30 words for voice.",
				},
				{
					Role:    openai.ChatMessageRoleUser,
					Content: input,
				},
			},
		},
	)
	// Guard against an empty Choices slice before indexing into it.
	if err != nil || len(resp.Choices) == 0 {
		return "I am having trouble right now. Please try again."
	}
	return resp.Choices[0].Message.Content
}
Two things to keep in mind for voice LLM responses:
- Keep it short. Nobody wants to listen to a paragraph read aloud. Aim for 1-2 sentences.
- Avoid markdown. TTS engines will literally say "asterisk asterisk bold asterisk asterisk". Write plain prose.
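Prompting for short, plain answers helps, but models do not always comply, so a small post-processing step is a reasonable safety net. The sketch below strips the most common markdown markers and caps the word count. `sanitizeForVoice` is a hypothetical helper, and the patterns are deliberately simple rather than a full markdown parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Markdown links: keep the link text, drop the URL.
	mdLink = regexp.MustCompile(`\[([^\]]*)\]\([^)]*\)`)
	// Heading, bullet, or blockquote markers at the start of a line.
	mdLinePrefix = regexp.MustCompile(`(?m)^\s*(#{1,6}|[-*+]|>)\s+`)
	// Asterisks, backticks, and tildes used for emphasis or code spans.
	mdInline = regexp.MustCompile("[*`~]+")
)

// sanitizeForVoice removes common markdown markers, collapses whitespace,
// and truncates to maxWords words (maxWords <= 0 means no limit).
func sanitizeForVoice(s string, maxWords int) string {
	s = mdLink.ReplaceAllString(s, "$1")
	s = mdLinePrefix.ReplaceAllString(s, "")
	s = mdInline.ReplaceAllString(s, "")

	words := strings.Fields(s)
	if maxWords > 0 && len(words) > maxWords {
		words = words[:maxWords]
	}
	return strings.Join(words, " ")
}

func main() {
	raw := "**Sure!** Our hours are:\n- Mon to Fri, `9-5`"
	fmt.Println(sanitizeForVoice(raw, 30))
}
```

The word cap cuts mid-sentence when it triggers, so treat it as a last resort behind the system prompt, not a replacement for it.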
What VoIPBin Is Handling For You
While your Go code is happily processing HTTP requests, VoIPBin is doing the heavy lifting:
| Task | VoIPBin handles it |
|---|---|
| SIP signaling | ✅ |
| RTP media stream | ✅ |
| Speech-to-Text (STT) | ✅ |
| Text-to-Speech (TTS) | ✅ |
| DTMF detection | ✅ |
| Call recording | ✅ |
| Number provisioning | ✅ |
Your code is just an HTTP server. Which means everything you already know about Go servers applies: middleware, structured logging, graceful shutdown, horizontal scaling. No new mental model required.
Going to Production
A few things to harden before you ship:
Verify webhook authenticity:
VoIPBin signs webhook payloads. Validate the signature so random clients cannot trigger your bot.
Keep conversation state:
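The webhook is stateless by default. For multi-turn conversations, store state keyed by call_id: in Redis or a database once you run more than one instance, or in an in-process map while you run a single one:

// Message is one conversation turn (illustrative; use the fields your LLM client needs).
type Message struct {
	Role, Content string
}

var sessions sync.Map // call_id -> []Message

func getHistory(callID string) []Message {
	if v, ok := sessions.Load(callID); ok {
		return v.([]Message)
	}
	return nil
}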
Handle timeouts:
LLM calls can be slow. If your webhook takes too long, VoIPBin plays a hold message. Add a context deadline:
ctx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
defer cancel()
What You Get
By the end of this tutorial you have:
- A webhook pattern for Go AI voice bots that is ready to harden for production
- STT, TTS, and RTP off your plate entirely
- A codebase that is just HTTP handlers — testable, deployable, scalable
- A starting point you can extend with any LLM, any business logic
Go is an excellent language for this. The concurrency model handles parallel calls cleanly. The binary deployment is simple. The type system keeps webhook parsing honest.
The only missing piece was a VoIP layer that speaks Go. Now you have one.
Resources
- VoIPBin Go SDK: go get github.com/voipbin/voipbin-go
- VoIPBin API Docs
- Sign up — instant API token, no credit card required to test
If you build something with this, drop a comment below. Always curious what people are actually shipping.