\`html
DTMF Hand-Raise System
Twilio Conference Call
Participants
5–7 callers + 1 host
System Feasibility
✓ Still Fully Feasible
01 The Core TwiML Constraint (corrected)
02 Verified Working Patterns (POC)
03 Alternative 1: Media Streams + Goertzel
04 Alternative 2: Twilio Flex / TaskRouter
05 Alternative 3: Twilio Sync State
06 Alternative 4: Conference Hold + Gather
07 Alternative 5: Dual-Channel (Phone + Web)
08 Decision Matrix & Recommendation
01
The Core TwiML Constraint — What V1 Got Wrong
❌ Root Problem Twilio’s TwiML specification does NOT allow DTMF detection while a caller is inside a <Dial><Conference>. Any keypad digits pressed by a participant are transmitted as audio tones to the conference room — no server-side webhook fires. There is no native event for this.
Why the V1 Pattern Fails
The original document proposed nesting <Conference> inside <Gather>. This is structurally invalid. The Twilio TwiML schema enforces strict parent-child rules:
| TwiML Verb | Valid Children | Notes |
|---|---|---|
| <Gather> | <Say>, <Play>, <Pause> | Collects DTMF or speech input |
| <Dial> | <Conference>, <Number>, <Client>, <Queue>, <Sip> | <Conference> is a noun inside Dial |
| <Conference> | None | No children. Cannot be inside Gather. |
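```xml
<!-- INVALID (V1 pattern) -->
<Response>
  <Gather action="/dtmf-handler" timeout="0" numDigits="2">
    <!-- Conference CANNOT be a child of Gather -->
    <Conference>MainRoom</Conference>
  </Gather>
</Response>

<!-- VALID -->
<Response>
  <Say>Press *1 to raise hand.</Say>
  <Gather action="/dtmf-handler" timeout="3" numDigits="2">
    <!-- Only Say/Play/Pause allowed -->
    <Say>Press now or hold.</Say>
  </Gather>
  <Dial>
    <Conference>MainRoom</Conference>
  </Dial>
</Response>
```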
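The parent-child rules in the table can also be checked mechanically before any TwiML is sent. The sketch below is a hypothetical helper, not part of the Twilio SDK: `VALID_CHILDREN` and `findNestingErrors` are illustrative names, the `Response` row is an assumption added only so the example has a root element, and the tree shape (`{ name, children }`) is invented for this example.

```javascript
// Nesting rules from the table above, encoded as data.
// The Response entry is an assumed subset, just enough for this example.
const VALID_CHILDREN = {
  Response: ['Say', 'Play', 'Pause', 'Gather', 'Dial'],
  Gather: ['Say', 'Play', 'Pause'],
  Dial: ['Conference', 'Number', 'Client', 'Queue', 'Sip'],
  Conference: [],
};

// Walks a { name, children } tree and collects every invalid parent/child pair.
function findNestingErrors(node, errors = []) {
  const allowed = VALID_CHILDREN[node.name] || [];
  for (const child of node.children || []) {
    if (!allowed.includes(child.name)) {
      errors.push(`<${child.name}> is not allowed inside <${node.name}>`);
    }
    findNestingErrors(child, errors);
  }
  return errors;
}

// The V1 pattern: Conference nested directly inside Gather.
const v1Tree = {
  name: 'Response',
  children: [{ name: 'Gather', children: [{ name: 'Conference' }] }],
};

// The corrected pattern: Gather with a Say, followed by Dial > Conference.
const fixedTree = {
  name: 'Response',
  children: [
    { name: 'Gather', children: [{ name: 'Say' }] },
    { name: 'Dial', children: [{ name: 'Conference' }] },
  ],
};

console.log(findNestingErrors(v1Tree));
console.log(findNestingErrors(fixedTree));
```

A check like this is only a development aid; the SDK's builder methods already refuse the invalid nesting, as the next section shows.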
SDK Error Encountered in Practice
The Twilio Node.js SDK correctly enforces this schema. Calling .conference() on a Gather or VoiceResponse object throws immediately:
```javascript
const { VoiceResponse } = require('twilio').twiml;
const twiml = new VoiceResponse();
const gather = twiml.gather();

// Each of these throws a TypeError at runtime (shown together for comparison)
twiml.conference('MainRoom', { muted: true });  // TypeError: twiml.conference is not a function
gather.conference('MainRoom', { muted: true }); // TypeError: gather.conference is not a function

// Correct: conference() only exists on Dial
const dial = twiml.dial();
dial.conference({ muted: true, statusCallback: '...' /* ... */ }, 'MainRoom');
```
Additional SDK Issue: “unmute” is Not a Valid Event
⚠️ statusCallbackEvent Correction The statusCallbackEvent attribute does NOT accept an “unmute” value. Valid values are: start, end, join, leave, mute, hold, modify, speaker, announcement. Subscribing to mute covers both directions: Twilio posts a webhook whenever a participant is muted or unmuted, reported in StatusCallbackEvent as participant-mute or participant-unmute. The Muted field in the webhook body (‘true’ or ‘false’) tells you the resulting state.
```javascript
// Correct status callback handler
// Assumes express.urlencoded() is enabled so req.body holds Twilio's form fields
app.post('/webhooks/conference', (req, res) => {
  const { StatusCallbackEvent, CallSid, Muted } = req.body;
  // The single "mute" subscription delivers both mute and unmute webhooks
  if (StatusCallbackEvent === 'participant-mute' || StatusCallbackEvent === 'participant-unmute') {
    const isMuted = Muted === 'true'; // "true" or "false" as strings
    // isMuted = true  → participant was muted
    // isMuted = false → participant was unmuted
    updateParticipantState(CallSid, { muted: isMuted });
    broadcastToAdmins({ type: isMuted ? 'participant_muted' : 'participant_unmuted', callSid: CallSid });
  }
  res.sendStatus(200);
});
```
02
Verified Working Patterns — The POC Implementation
The POC combines two approaches that work within Twilio’s actual TwiML constraints. Together they cover the hand-raise use case without any additional Twilio services.
Pattern A — Used in POC
Gather-Before-Conference
Participant gets a short <Gather> window before joining the conference. If they press *1 during this window, hand-raise is registered before entry.
✓ Zero audio interruption ⚡ One-time only
Pattern B — Used in POC
REST API Call Redirect
Host triggers mid-conference DTMF prompts via the dashboard. The backend calls twilioClient.calls(callSid).update({ url: gatherUrl }), temporarily pulling the participant out to collect a keypress, then returning them.
✓ Mid-call capable ⚡ Host-initiated
Pattern A — Full Code: Gather-Before-Conference
```javascript
// Assumes an Express app with express.urlencoded() enabled, plus BASE_URL and
// markHandRaised() defined elsewhere in the POC.
const { VoiceResponse } = require('twilio').twiml;

// Inbound call → welcome + DTMF window → then conference
app.post('/voice/incoming', (req, res) => {
  const twiml = new VoiceResponse();
  twiml.say({ voice: 'Polly.Joanna' },
    'Welcome. You are joining the conference muted. ' +
    'Press star 1 to raise your hand before entering.');

  // Gather window: participant can press *1 NOW
  const gather = twiml.gather({
    input: 'dtmf',
    action: '/voice/pre-join-dtmf',
    timeout: 4, // 4 seconds to press a key
    numDigits: 2,
    finishOnKey: '',
  });
  gather.say({ voice: 'Polly.Joanna' }, 'Press star 1 now, or hold to join.');

  // Fall-through: no key pressed → enter conference muted
  const dial = twiml.dial();
  dial.conference({
    muted: true,
    startConferenceOnEnter: false,
    endConferenceOnExit: false,
    statusCallback: `${BASE_URL}/webhooks/conference`,
    statusCallbackEvent: ['start', 'end', 'join', 'leave', 'mute', 'hold'],
    waitUrl: `${BASE_URL}/hold-music`,
  }, 'MainRoom');

  res.type('text/xml').send(twiml.toString());
});

// Handle pre-join keypress
app.post('/voice/pre-join-dtmf', async (req, res) => {
  const { Digits, CallSid } = req.body;
  const twiml = new VoiceResponse();

  if (Digits === '*1') {
    await markHandRaised(CallSid); // store in state, push to dashboard WS
    twiml.say({ voice: 'Polly.Joanna' }, 'Your hand has been raised. Joining now.');
  }

  // Either way, enter the conference
  const dial = twiml.dial();
  dial.conference({ muted: true /* ... */ }, 'MainRoom');
  res.type('text/xml').send(twiml.toString());
});
```
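The handlers above call `markHandRaised()` without showing it. As a rough idea of what "store in state, push to dashboard" could look like, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: the `raisedHands` map, the `handRaiseEvents` emitter, `lowerHand()` and `raisedQueue()` do not come from the POC, and a real deployment would likely need shared storage rather than process memory.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical in-memory store; a dashboard WebSocket layer could subscribe
// to handRaiseEvents and forward each event to connected hosts.
const handRaiseEvents = new EventEmitter();
const raisedHands = new Map(); // callSid -> timestamp (ms) when the hand was raised

// Returns true if the hand was newly raised, false if it was already up.
function markHandRaised(callSid, now = Date.now()) {
  if (raisedHands.has(callSid)) return false; // ignore repeat presses
  raisedHands.set(callSid, now);
  handRaiseEvents.emit('hand_raised', { callSid, raisedAt: now });
  return true;
}

// Returns true if a raised hand was actually lowered.
function lowerHand(callSid) {
  const removed = raisedHands.delete(callSid);
  if (removed) handRaiseEvents.emit('hand_lowered', { callSid });
  return removed;
}

// Oldest raise first, so the host can call on participants in order.
function raisedQueue() {
  return [...raisedHands.entries()]
    .sort((a, b) => a[1] - b[1])
    .map(([callSid]) => callSid);
}
```

This version is synchronous; `await markHandRaised(CallSid)` in the handlers still works because awaiting a plain value resolves immediately.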
Pattern B — Full Code: REST Redirect for Mid-Call DTMF
The dashboard shows a “Prompt Hand Raise” button per participant. When clicked, the backend redirects that caller’s active call to a gather TwiML page, collects their response, then returns them to the conference.
```javascript
// Dashboard API: host triggers DTMF prompt for a specific participant
app.post('/api/prompt-hand-raise/:callSid', authenticateHost, async (req, res) => {
  const { callSid } = req.params;
  try {
    // Redirect their active call to a gather prompt
    await twilioClient.calls(callSid).update({
      url: `${BASE_URL}/voice/gather-hand-raise`,
      method: 'POST',
    });
    res.json({ success: true });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

// The gather prompt TwiML page
app.post('/voice/gather-hand-raise', (req, res) => {
  const twiml = new VoiceResponse();
  const gather = twiml.gather({
    input: 'dtmf',
    action: '/voice/mid-call-dtmf',
    timeout: 6,
    numDigits: 2,
  });
  gather.say({ voice: 'Polly.Joanna' },
    'You have been prompted by the host. Press star 1 to raise your hand, ' +
    'or star 2 to decline. Or stay silent to return.');

  // Fall-through: no press → rejoin conference
  const dial = twiml.dial();
  dial.conference({ muted: true }, 'MainRoom');
  res.type('text/xml').send(twiml.toString());
});

// Process their response
app.post('/voice/mid-call-dtmf', async (req, res) => {
  const { Digits, CallSid } = req.body;
  const twiml = new VoiceResponse();

  if (Digits === '*1') {
    await markHandRaised(CallSid);
    twiml.say({ voice: 'Polly.Joanna' }, 'Hand raised. Returning you to the conference.');
  } else {
    twiml.say({ voice: 'Polly.Joanna' }, 'Returning you to the conference.');
  }

  const dial = twiml.dial();
  dial.conference({ muted: true }, 'MainRoom');
  res.type('text/xml').send(twiml.toString());
});
```
ℹ️ Trade-off of Pattern B The participant is briefly disconnected from conference audio (~3–5 seconds) while the gather prompt plays. This is host-initiated — the participant cannot trigger it themselves from inside the conference. Pattern B is a usable POC solution, but the production path should upgrade to Media Streams (Section 03) for true participant-initiated hand-raises.
03
Alternative 1 — Media Streams + Server-Side DTMF Detection
Twilio Media Streams + Goertzel Algorithm
Raw audio piped to your WebSocket server — detect DTMF tones in real-time, participant never leaves the conference
Production Grade No Audio Interruption Participant-Initiated
Twilio Media Streams sends a raw audio stream (8kHz mulaw) from each participant’s call to a WebSocket endpoint on your server. You run a DTMF tone detector (Goertzel algorithm) on this audio stream. When a key is pressed, the tone is detected server-side without ever pulling the participant out of the conference.
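Before a tone detector can run, the mulaw bytes have to be expanded to linear PCM samples. The sketch below shows the standard G.711 mu-law expansion applied to a base64 payload of the kind carried in a Media Streams `media` message. The function names `muLawToPcm16` and `decodeMuLawPayload` are illustrative, not part of any Twilio library.

```javascript
// Expand one 8-bit G.711 mu-law byte to a signed 16-bit linear PCM sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;             // top bit: sign
  const exponent = (u >> 4) & 0x07;  // next three bits: segment
  const mantissa = u & 0x0f;         // low four bits: step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a base64 mu-law payload into an Int16Array of PCM samples.
function decodeMuLawPayload(base64Payload) {
  const bytes = Buffer.from(base64Payload, 'base64');
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = muLawToPcm16(bytes[i]);
  }
  return pcm;
}
```

In the message handler this would sit between `Buffer.from(msg.media.payload, 'base64')` and the detector call, assuming the detector expects linear samples rather than raw mulaw bytes.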
How it Works
Participant presses *1 (phone keypad) → Twilio streams audio (8kHz mulaw via WS) → Your WS server (Goertzel detector) → Hand-raise event (dashboard notified)
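The detector step in this flow can be sketched as follows. Each DTMF key is the sum of one row tone and one column tone, and the Goertzel algorithm measures the power at each of those eight frequencies in a block of samples. This is a minimal sketch, not a production detector: the function names are illustrative, and the `minPower` and `dominance` thresholds are assumptions that would need tuning against real call audio.

```javascript
const DTMF_ROWS = [697, 770, 852, 941];      // Hz
const DTMF_COLS = [1209, 1336, 1477, 1633];  // Hz
const DTMF_KEYS = [
  ['1', '2', '3', 'A'],
  ['4', '5', '6', 'B'],
  ['7', '8', '9', 'C'],
  ['*', '0', '#', 'D'],
];

// Goertzel: signal power at one target frequency over a block of samples.
function goertzelPower(samples, targetHz, sampleRate) {
  const coeff = 2 * Math.cos((2 * Math.PI * targetHz) / sampleRate);
  let s1 = 0;
  let s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Finds the strongest frequency in a group and the power of the runner-up.
function strongest(samples, freqs, sampleRate) {
  const powers = freqs.map((f) => goertzelPower(samples, f, sampleRate));
  const index = powers.indexOf(Math.max(...powers));
  const runnerUp = Math.max(...powers.filter((_, i) => i !== index));
  return { index, power: powers[index], runnerUp };
}

// Returns the detected key, or null if no clear row+column pair is present.
// minPower and dominance are assumed tuning values for 16-bit PCM input.
function detectDtmf(samples, sampleRate = 8000, minPower = 1e9, dominance = 10) {
  const row = strongest(samples, DTMF_ROWS, sampleRate);
  const col = strongest(samples, DTMF_COLS, sampleRate);
  for (const group of [row, col]) {
    if (group.power < minPower) return null;                  // too quiet
    if (group.power < dominance * group.runnerUp) return null; // not a clean tone
  }
  return DTMF_KEYS[row.index][col.index];
}
```

Blocks of around 200 samples (roughly 25 ms at 8 kHz) are a common choice because they separate the eight tones well while still reacting quickly to a key press. Speech can occasionally trigger false positives, which is one reason real detectors add further checks such as minimum tone duration.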
TwiML — Enable Media Stream
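```xml
<Response>
  <Start>
    <!-- Stream audio from this call to your WebSocket server -->
    <!-- YOUR_SERVER is a placeholder host; /media-stream matches the server code below -->
    <Stream url="wss://YOUR_SERVER/media-stream" track="inbound_track" />
  </Start>
  <Say>Joining the conference. Press star 1 any time to raise your hand.</Say>
  <Dial>
    <Conference
      statusCallback="..."
      statusCallbackEvent="start end join leave mute hold">
      MainRoom
    </Conference>
  </Dial>
</Response>
```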
Server-Side DTMF Detector (Node.js)
```typescript
import { WebSocketServer } from 'ws';
import { GoertzelDTMFDetector } from './dtmf-detector'; // or npm: node-dtmf

// `server` is the existing HTTP server instance (for example, the one Express listens on)
const mediaWss = new WebSocketServer({ path: '/media-stream', server });

mediaWss.on('connection', (ws) => {
  let callSid = '';
  const detector = new GoertzelDTMFDetector({ sampleRate: 8000 });

  ws.on('message', (raw) => {
    // ws delivers a Buffer, so convert to string before parsing
    const msg = JSON.parse(raw.toString());

    if (msg.event === 'start') {
      callSid = msg.start.callSid; // map stream → call SID
    }

    if (msg.event === 'media') {
      // Decode base64 mulaw audio payload
      const audio = Buffer.from(msg.media.payload, 'base64');
      // Run Goertzel DTMF detector on this audio chunk
      const digit = detector.detect(audio);
      if (digit) {
        handleDTMFDigit(callSid, digit);
      }
    }

    if (msg.event === 'stop') {
      detector.reset();
    }
  });
});

function handleDTMFDigit(callSid: string, digit: string) {
  // Debounce: the same digit repeated quickly counts as one press
  if (digit === '*' || digit === '1') {
    // You'll receive '*' then '1' as separate detections
    // Buffer them with a short debounce window
    bufferDigit(callSid, digit, async (fullDigit: string) => {
      if (fullDigit === '*1') {
        await markHandRaised(callSid); // update state
        broadcastToAdmins({            // push to dashboard
          type: 'hand_raised',
          callSid,
          callerId: getCallerInfo(callSid).callerId,
        });
      }
    });
  }
}
```
DTMF Detection Library Options
| Library | Language | Notes |
|---|---|---|
| node-dtmf | Node.js | Simple Goertzel implementation for mulaw audio |
| goertzel-js | Node.js | Low-level Goertzel filter, needs DTMF freq mapping |
| dtmf-decoder | Python | Good if your backend is Python/FastAPI |
| librosa + custom | Python | Overkill but very accurate |
Advantages
- Participant stays in conference — zero audio interruption
- True participant-initiated hand-raises at any time
- No additional Twilio cost — Media Streams included in Voice
- Works with any keypad digit combination you define
Challenges
- Requires a WebSocket server to receive audio
- Need DTMF detection library + Goertzel implementation
- Higher server resource usage (audio processing per participant)
- Multi-digit debouncing logic needed (*1 = two signals)
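
The multi-digit debouncing challenge can be sketched as a small state machine. A held key is detected in many consecutive audio frames, so the first job is to collapse repeats into one press; the second is to join two presses into a sequence such as *1 if they arrive close together. `DigitSequencer` is a hypothetical class written for this example, and the two-second `sequenceGapMs` default is an assumption, not a measured value.

```javascript
// Turns per-frame detector results into two-key sequences such as "*1".
class DigitSequencer {
  constructor({ sequenceGapMs = 2000, onSequence }) {
    this.sequenceGapMs = sequenceGapMs; // max pause allowed between the two keys
    this.onSequence = onSequence;       // called with the two-character sequence
    this.lastFrameDigit = null;         // detector result for the previous frame
    this.buffer = '';                   // keys pressed so far in this sequence
    this.lastPressAt = 0;               // time of the most recent new press (ms)
  }

  // Call once per audio frame with the detector result (a digit or null).
  push(digit, nowMs) {
    // A press is "new" only when the digit differs from the previous frame,
    // so a key held across many frames counts once.
    const isNewPress = digit !== null && digit !== this.lastFrameDigit;
    this.lastFrameDigit = digit;
    if (!isNewPress) return;

    // Drop a stale first key if the caller waited too long.
    if (nowMs - this.lastPressAt > this.sequenceGapMs) this.buffer = '';
    this.lastPressAt = nowMs;

    this.buffer += digit;
    if (this.buffer.length === 2) {
      this.onSequence(this.buffer);
      this.buffer = '';
    }
  }
}
```

One sequencer instance per call keeps the state isolated; the caller passes in the frame time so the logic can be tested without real timers.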
`\