DTMF is one of the most overlooked configuration details in Voice AI deployments. It only becomes visible when a caller presses a key - to enter a PIN, navigate a menu, or confirm a choice - and nothing happens. No error. No feedback. The call continues as if the keypress never occurred.
I have debugged DTMF failures on three separate enterprise deployments. In every case the root cause was the same: the SIP trunk and the Voice AI platform were configured for different DTMF transmission modes, and neither side flagged the mismatch.
This is the complete guide I wish I had before the first of those incidents.
What DTMF is and why Voice AI needs to handle it
DTMF stands for Dual-Tone Multi-Frequency. When you press a key on a telephone keypad, your phone generates a unique combination of two audio tones. Pressing 1 generates 697 Hz + 1209 Hz. Pressing # generates 941 Hz + 1477 Hz.
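The keypad is a 4x4 grid of row and column frequencies, so the full mapping is easy to write down. A minimal sketch in Python follows; the names `DTMF_FREQS` and `dtmf_samples` are illustrative, and the fourth column (A-D) is part of the DTMF standard even though consumer keypads omit it.

```python
import math

# Standard DTMF grid: each key is one row (low-group) frequency
# plus one column (high-group) frequency, both in Hz.
ROWS = (697, 770, 852, 941)
COLS = (1209, 1336, 1477, 1633)  # the 1633 Hz column carries A-D
KEYPAD = ("123A", "456B", "789C", "*0#D")

DTMF_FREQS = {
    key: (ROWS[r], COLS[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}

def dtmf_samples(key, duration_ms=100, sample_rate=8000):
    """Return float samples in [-1, 1]: the two sine waves for `key`, summed."""
    low, high = DTMF_FREQS[key]
    n = sample_rate * duration_ms // 1000
    return [
        0.5 * math.sin(2 * math.pi * low * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * i / sample_rate)
        for i in range(n)
    ]
```

`DTMF_FREQS["1"]` gives `(697, 1209)` and `DTMF_FREQS["#"]` gives `(941, 1477)`, matching the examples above.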
Voice AI systems need DTMF for: PIN entry and account verification, legacy IVR menu navigation, payment card collection via keypad (to avoid STT capturing card numbers), and any structured numerical input.
The 3 DTMF transmission modes
Mode 1 - In-band DTMF ❌ Avoid with modern codecs
Tones are sent inside the RTP audio stream. The problem: G.729 and Opus compress audio in ways that can distort or destroy the precise frequency pairs DTMF relies on. The result is silent failure. Use in-band only with G.711 (PCMU, or its A-law counterpart PCMA).
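To see why codec distortion matters, it helps to look at how an in-band detector decides which key was pressed. The sketch below uses the Goertzel algorithm, a common choice for this job, to measure energy at each of the eight DTMF frequencies in a block of decoded audio. It is a simplified illustration, not any platform's actual detector: the function names and the `min_ratio` threshold are assumptions, and a production detector would add absolute level, twist, and duration checks.

```python
import math

LOW = (697, 770, 852, 941)
HIGH = (1209, 1336, 1477, 1633)
KEYS = ("123A", "456B", "789C", "*0#D")

def goertzel_power(samples, freq, sample_rate=8000):
    """Relative power of `freq` in `samples`, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_key(samples, sample_rate=8000, min_ratio=4.0):
    """Return the detected key, or None when no clear tone pair stands out."""
    def strongest(group):
        powers = sorted(
            ((goertzel_power(samples, f, sample_rate), f) for f in group),
            reverse=True,
        )
        (p1, f1), (p2, _) = powers[0], powers[1]
        # The winner must clearly dominate the runner-up in its group.
        if p1 > 0 and p1 >= min_ratio * max(p2, 1e-12):
            return f1
        return None

    low, high = strongest(LOW), strongest(HIGH)
    if low is None or high is None:
        return None
    return KEYS[LOW.index(low)][HIGH.index(high)]
```

The detector only works if energy arrives at the exact expected frequencies. A codec that smears or shifts that energy makes `detect_key` return `None`, which is the silent failure described above.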
Mode 2 - RFC 2833 / RFC 4733 ✅ Use this in 2026
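DTMF events are sent as dedicated RTP packets with their own payload type, carrying the key as data rather than as audio, so they are not subject to codec compression. RFC 4733 obsoletes RFC 2833, but most settings pages still use the older number. This is the industry standard, supported by Twilio, Plivo, and most modern SIP carriers.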
Declared in SDP as:
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
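Each telephone-event packet carries a 4-byte payload defined in RFC 4733: an event code, an end-of-event bit, a reserved bit, a 6-bit volume, and a 16-bit duration in timestamp units. A minimal decoding sketch, where `parse_telephone_event` and `TelephoneEvent` are illustrative names rather than part of any library:

```python
import struct
from typing import NamedTuple

# Event codes 0-15 map to DTMF keys; higher codes are not DTMF digits.
EVENT_KEYS = "0123456789*#ABCD"

class TelephoneEvent(NamedTuple):
    key: str       # the key pressed, or "event-N" for non-DTMF codes
    end: bool      # True on the packet(s) that mark the end of the keypress
    volume: int    # 6-bit volume field (power level in -dBm0)
    duration: int  # duration so far, in RTP timestamp units

def parse_telephone_event(payload: bytes) -> TelephoneEvent:
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Network byte order: 1 byte event, 1 byte E/R/volume, 2 bytes duration.
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    key = EVENT_KEYS[event] if event < len(EVENT_KEYS) else f"event-{event}"
    return TelephoneEvent(
        key=key,
        end=bool(flags & 0x80),  # E bit is the top bit
        volume=flags & 0x3F,     # low 6 bits
        duration=duration,
    )
```

For example, the payload `05 8A 03 20` decodes to key `5`, end bit set, volume 10, duration 800 (100 ms at the 8000 Hz clock declared in the `rtpmap` line).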
Mode 3 - SIP INFO ⚠️ Legacy PBX only
DTMF sent via the SIP signalling channel. Used by older Cisco and Avaya PBX systems. If your enterprise client has a legacy PBX, confirm which mode it sends before configuring your Voice AI platform.
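For orientation, a SIP INFO keypress usually arrives with a `Content-Type` of `application/dtmf-relay` and a body such as `Signal=5` followed by `Duration=160`, or with `application/dtmf` and just the digit. These are vendor conventions rather than a strict standard, so treat the sketch below as an assumption to check against a real trace from the PBX in question. `parse_dtmf_info` is a hypothetical helper name.

```python
def parse_dtmf_info(content_type: str, body: str):
    """Return (signal, duration_ms or None), or None for unrelated bodies."""
    ctype = content_type.split(";")[0].strip().lower()

    if ctype == "application/dtmf":
        # Body is just the key, e.g. "7".
        signal = body.strip()
        return (signal, None) if signal else None

    if ctype != "application/dtmf-relay":
        return None

    # Body is "Name=Value" lines, e.g. "Signal=5" and "Duration=160".
    fields = {}
    for line in body.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            fields[name.strip().lower()] = value.strip()

    if "signal" not in fields:
        return None
    duration = fields.get("duration")
    duration_ms = int(duration) if duration and duration.isdigit() else None
    return (fields["signal"], duration_ms)
```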
The most common misconfiguration
Platform configured for RFC 2833. Legacy carrier defaults to in-band. The SDP exchanges cleanly. Call connects. Keypad presses produce no response. No error in the logs.
On one of my financial services deployments, this failure pattern caused 34% call abandonment on PIN entry steps.
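Because neither side raises an error, one option is to flag the mismatch yourself by inspecting the SDP offer and answer from a SIP trace. The sketch below checks whether each side lists `telephone-event` in its first audio section. The function names are illustrative, and the check has a known limit: it cannot catch a far end that accepts `telephone-event` in SDP and still sends in-band tones.

```python
import re
from typing import List, Optional

def telephone_event_pts(sdp: str) -> List[int]:
    """Payload types mapped to telephone-event in the first audio section."""
    offered: List[str] = []
    found: List[int] = []
    in_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if in_audio:
                break  # only inspect the first audio section
            in_audio = line.startswith("m=audio ")
            # Format list follows "m=audio <port> <proto>".
            offered = line.split()[3:] if in_audio else []
        elif in_audio:
            match = re.match(r"a=rtpmap:(\d+)\s+telephone-event/\d+", line, re.IGNORECASE)
            if match and match.group(1) in offered:
                found.append(int(match.group(1)))
    return found

def dtmf_mismatch_warning(offer_sdp: str, answer_sdp: str) -> Optional[str]:
    """Return a warning string when only one side negotiated telephone-event."""
    offered = telephone_event_pts(offer_sdp)
    answered = telephone_event_pts(answer_sdp)
    if offered and not answered:
        return "offer has telephone-event, answer does not: expect in-band or SIP INFO"
    if answered and not offered:
        return "answer has telephone-event, offer does not: check the offering side"
    return None
```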
How to configure correctly
Step 1 - Confirm your carrier's DTMF mode
- Twilio: RFC 2833 default (verify in SIP trunk → Voice → DTMF Type)
- Plivo: RFC 2833 default
- Vonage: both supported — confirm with account team
- Cisco/Avaya legacy PBX: likely SIP INFO
Step 2 - Match the mode in your Voice AI platform
In Vapi: Phone number settings → SIP Configuration → DTMF Mode → RFC 2833. Both sides must match. Consistency matters more than which mode you choose.
Step 3 - Use G.711 PCMU on DTMF-critical routes
Even with RFC 2833 configured, a call leg that fails to negotiate telephone-event can fall back to in-band tones, and G.729 or Opus on that route will then distort them. Set PCMU as the preferred codec for any call flow that requires keypad input.
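At the SDP level, codec preference is expressed by order: the formats on the `m=audio` line are listed most-preferred first, and PCMU is static payload type 0. Most platforms expose this as a setting, so the sketch below is only an illustration of what that setting does on the wire. `prefer_pcmu` is a hypothetical helper.

```python
def prefer_pcmu(m_line: str) -> str:
    """Move payload type 0 (PCMU) to the front of an m=audio format list."""
    parts = m_line.split()
    if len(parts) < 4 or parts[0] != "m=audio":
        return m_line  # not an audio media line; leave untouched
    header, formats = parts[:3], parts[3:]
    if "0" in formats:
        # Keep every other format, including telephone-event, in its
        # original relative order behind PCMU.
        formats = ["0"] + [f for f in formats if f != "0"]
    return " ".join(header + formats)
```

Given `m=audio 49170 RTP/AVP 18 0 101`, where 18 is G.729, this returns `m=audio 49170 RTP/AVP 0 18 101`.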
Step 4 - Handle DTMF in your system prompt
DTMF events are passed to the AI layer as structured input. Define explicitly: how many digits to collect, timeout behaviour, and what to do on invalid input.
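Those three rules (digit count, timeout, invalid input) can also be enforced in code around the AI layer. The sketch below is one way to model them; it assumes nothing about any platform's event format. `DigitCollector`, its method names, and the returned status strings are all illustrative, and time is passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass

DIGITS = frozenset("0123456789")

@dataclass
class DigitCollector:
    expected_digits: int
    timeout_s: float = 10.0
    max_attempts: int = 3
    digits: str = ""
    attempts: int = 0
    last_activity: float = 0.0

    def start(self, now: float) -> None:
        """Begin (or restart) collection at time `now`, in seconds."""
        self.digits = ""
        self.last_activity = now

    def on_key(self, key: str, now: float) -> str:
        """Feed one keypress. Returns 'collecting', 'complete', 'reprompt' or 'give_up'."""
        self.last_activity = now
        if key not in DIGITS:
            return self._fail()  # e.g. '*' or '#' during PIN entry
        self.digits += key
        if len(self.digits) == self.expected_digits:
            return "complete"
        return "collecting"

    def on_tick(self, now: float) -> str:
        """Call periodically; triggers a reprompt after `timeout_s` of silence."""
        if now - self.last_activity >= self.timeout_s:
            self.last_activity = now
            return self._fail()
        return "collecting"

    def _fail(self) -> str:
        # Discard partial input and count the failed attempt.
        self.attempts += 1
        self.digits = ""
        return "give_up" if self.attempts >= self.max_attempts else "reprompt"
```

A caller entering `1`, `2`, `3`, `4` into `DigitCollector(expected_digits=4)` gets `collecting` three times and then `complete`. An invalid key or ten seconds of silence returns `reprompt`, and the third failure returns `give_up` so the call can be handed off.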
The 2-minute test before go-live
Call your test number from a real mobile phone on the real production carrier. Press 1 when the AI asks for keypad input.
✅ AI responds = DTMF working
❌ AI repeats the prompt = mode mismatch to fix
Softphones on your office network default to RFC 2833 and will pass this test even when the production carrier sends in-band. Always test from the real device.
Carrier reference table
| Provider | Default mode | Notes |
|---|---|---|
| Twilio | RFC 2833 | Configurable in SIP trunk settings |
| Plivo | RFC 2833 | Reliable on US/UK routes |
| Vonage | RFC 2833 / SIP INFO | Confirm with account team |
| Cisco PBX (legacy) | SIP INFO | Configure RFC 2833 in dial-peer (`dtmf-relay rtp-nte`) |
| Avaya PBX (legacy) | In-band or SIP INFO | Test on real hardware |
| Vapi | RFC 2833 | Configurable per phone number |
Pre-go-live DTMF checklist
- [ ] Test from real mobile on production carrier
- [ ] Test all 12 keys: 0–9, *, #
- [ ] Test rapid key entry (6 digits quickly)
- [ ] Test DTMF while AI is speaking (barge-in)
- [ ] Test invalid input — graceful reprompt?
- [ ] Test timeout — AI reprompts after 10 seconds?
- [ ] Verify telephone-event in SDP via SIP trace
- [ ] Test on two different mobile carriers
Full guide with architecture diagrams and more context:
https://www.voiceaipm.com/2026/04/how-voice-ai-handles-dtmf-complete-guide.html
I write weekly about Voice AI and SIP telephony at voiceaipm.com