Albin Manoj

HTML Generation

\`html

DTMF Hand-Raise System

Twilio Conference Call

Participants

5–7 callers + 1 host

System Feasibility

✓ Still Fully Feasible

01  The Core TwiML Constraint (corrected)

02  Verified Working Patterns (POC)

03  Alternative 1: Media Streams + Goertzel

04  Alternative 2: Twilio Flex / TaskRouter

05  Alternative 3: Twilio Sync State

06  Alternative 4: Conference Hold + Gather

07  Alternative 5: Dual-Channel (Phone + Web)

08  Decision Matrix & Recommendation

01

The Core TwiML Constraint — What V1 Got Wrong

❌ Root Problem: Twilio’s TwiML specification does NOT allow DTMF detection while a caller is inside a <Dial><Conference>. Any keypad digits pressed by a participant are transmitted as audio tones to the conference room, and no server-side webhook fires. There is no native event for this.

Why the V1 Pattern Fails

The original document proposed nesting <Conference> inside <Gather>. This is structurally invalid. The Twilio TwiML schema enforces strict parent-child rules:

| TwiML Verb | Valid Children | Notes |
| --- | --- | --- |
| `<Gather>` | `<Say>`, `<Play>`, `<Pause>` | Collects DTMF or speech input |
| `<Dial>` | `<Conference>`, `<Number>`, `<Client>`, `<Queue>`, `<Sip>` | `<Conference>` is a noun inside Dial |
| `<Conference>` | No children. | Cannot be inside Gather. |

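```xml
<!-- ❌ Invalid (V1) -->
<Response>
  <Gather
    action="/dtmf-handler"
    timeout="0"
    numDigits="2">
    <!-- Conference CANNOT be a
         child of Gather -->
    <Conference>MainRoom</Conference>
  </Gather>
</Response>

<!-- ✓ Valid -->
<Response>
  <Say>Press *1 to raise hand.</Say>
  <Gather
    action="/dtmf-handler"
    timeout="3"
    numDigits="2">
    <!-- Only Say/Play/Pause allowed -->
    <Say>Press now or hold.</Say>
  </Gather>
  <Dial>
    <Conference>MainRoom</Conference>
  </Dial>
</Response>
```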
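The parent-child rules in the table can also be checked mechanically before any TwiML is emitted. The sketch below is a hypothetical helper, not part of the Twilio SDK: `VALID_CHILDREN` and `canNest` are names invented here, and the map encodes only the three rows listed above rather than the full TwiML schema.

```javascript
// Assumption: this map mirrors only the table above, not the complete TwiML schema.
const VALID_CHILDREN = {
  Gather: ['Say', 'Play', 'Pause'],
  Dial: ['Conference', 'Number', 'Client', 'Queue', 'Sip'],
  Conference: [], // a noun: it takes no children
};

// Returns true when `child` may appear directly inside `parent`.
// Unknown parents are treated as allowing nothing.
function canNest(parent, child) {
  const allowed = VALID_CHILDREN[parent] || [];
  return allowed.includes(child);
}

// The V1 mistake and its fix, expressed as checks:
const v1IsValid = canNest('Gather', 'Conference'); // Conference directly inside Gather
const fixIsValid = canNest('Dial', 'Conference');  // Conference inside Dial
```

A check like this is cheap enough to run in unit tests for any code that assembles TwiML by hand instead of through the SDK.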

SDK Error Encountered in Practice

The Twilio Node.js SDK correctly enforces this schema. Calling .conference() on a Gather or VoiceResponse object throws immediately:

```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const twiml = new VoiceResponse();
const gather = twiml.gather();

// Both of these throw a TypeError at runtime (left commented out so the rest runs):
// twiml.conference('MainRoom', { muted: true });  // conference() doesn't exist on VoiceResponse
// gather.conference('MainRoom', { muted: true }); // conference() doesn't exist on Gather
// TypeError: twiml.conference is not a function

// Correct: conference() only exists on Dial
const dial = twiml.dial();
dial.conference({ muted: true, statusCallback: '...' /* ... */ }, 'MainRoom');
```

Additional SDK Issue: “unmute” is Not a Valid Event

⚠️ statusCallbackEvent Correction: The statusCallbackEvent parameter does NOT include an “unmute” value. Valid values are: start, end, join, leave, mute, hold, modify, speaker, announcement. Subscribing to mute covers both directions: Twilio reports a mute as participant-mute and an unmute as participant-unmute. The Muted field in the webhook body (‘true’ or ‘false’) tells you the resulting state.

```js
// Correct status callback handler
app.post('/webhooks/conference', (req, res) => {
  const { StatusCallbackEvent, CallSid, Muted } = req.body;

  // Both events are delivered by the single "mute" subscription
  if (StatusCallbackEvent === 'participant-mute' ||
      StatusCallbackEvent === 'participant-unmute') {
    const isMuted = Muted === 'true'; // "true" or "false" as strings
    // isMuted = true  → participant was muted
    // isMuted = false → participant was unmuted
    updateParticipantState(CallSid, { muted: isMuted });
    broadcastToAdmins({ type: isMuted ? 'participant_muted' : 'participant_unmuted', callSid: CallSid });
  }

  res.sendStatus(200);
});
```

02

Verified Working Patterns — The POC Implementation

The POC combines two approaches that work within Twilio’s actual TwiML constraints. Together they cover the hand-raise use case without any additional Twilio services.

Pattern A — Used in POC

Gather-Before-Conference

The participant gets a short <Gather> window before joining the conference. If they press *1 during this window, the hand-raise is registered before entry.

✓ Zero audio interruption ⚡ One-time only

Pattern B — Used in POC

REST API Call Redirect

Host triggers mid-conference DTMF prompts via the dashboard. The backend calls twilioClient.calls(callSid).update({ url: gatherUrl }), temporarily pulling the participant out to collect a keypress, then returning them.

✓ Mid-call capable ⚡ Host-initiated

Pattern A — Full Code: Gather-Before-Conference

```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

// Inbound call → welcome + DTMF window → then conference
app.post('/voice/incoming', (req, res) => {
  const twiml = new VoiceResponse();

  twiml.say({ voice: 'Polly.Joanna' },
    'Welcome. You are joining the conference muted. ' +
    'Press star 1 to raise your hand before entering.');

  // Gather window: participant can press *1 NOW
  const gather = twiml.gather({
    input: 'dtmf',
    action: '/voice/pre-join-dtmf',
    timeout: 4,        // 4 seconds to press a key
    numDigits: 2,
    finishOnKey: '',
  });
  gather.say({ voice: 'Polly.Joanna' }, 'Press star 1 now, or hold to join.');

  // Fall-through: no key pressed → enter conference muted
  const dial = twiml.dial();
  dial.conference({
    muted: true,
    startConferenceOnEnter: false,
    endConferenceOnExit: false,
    statusCallback: `${BASE_URL}/webhooks/conference`,
    statusCallbackEvent: ['start', 'end', 'join', 'leave', 'mute', 'hold'],
    waitUrl: `${BASE_URL}/hold-music`,
  }, 'MainRoom');

  res.type('text/xml').send(twiml.toString());
});

// Handle pre-join keypress
app.post('/voice/pre-join-dtmf', async (req, res) => {
  const { Digits, CallSid } = req.body;
  const twiml = new VoiceResponse();

  if (Digits === '*1') {
    await markHandRaised(CallSid); // store in state, push to dashboard WS
    twiml.say({ voice: 'Polly.Joanna' }, 'Your hand has been raised. Joining now.');
  }

  // Either way, enter the conference
  const dial = twiml.dial();
  dial.conference({ muted: true /* ... */ }, 'MainRoom');

  res.type('text/xml').send(twiml.toString());
});
```
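The handlers above call `markHandRaised(CallSid)` without showing it. Below is a minimal in-memory sketch of what such a helper could look like; the `HandRaiseStore` class, its method names other than `markHandRaised`, and the event shape are assumptions made for illustration, not the POC's actual implementation. A real deployment would likely need shared storage if more than one server process handles webhooks.

```javascript
// Hypothetical in-memory hand-raise state, keyed by Twilio CallSid.
class HandRaiseStore {
  constructor() {
    this.raised = new Map();    // callSid -> timestamp (ms) when the hand was raised
    this.listeners = new Set(); // callbacks notified on every state change
  }

  // Register a listener (e.g. one that forwards events to dashboard WebSockets).
  // Returns a function that removes the listener again.
  onChange(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }

  // Record a raised hand. Returns false if the hand was already up.
  markHandRaised(callSid, now = Date.now()) {
    if (this.raised.has(callSid)) return false;
    this.raised.set(callSid, now);
    this.emit({ type: 'hand_raised', callSid, raisedAt: now });
    return true;
  }

  // Clear a raised hand. Returns false if no hand was up.
  lowerHand(callSid) {
    if (!this.raised.delete(callSid)) return false;
    this.emit({ type: 'hand_lowered', callSid });
    return true;
  }

  // CallSids with raised hands, earliest first, for the host's queue view.
  queue() {
    return [...this.raised.entries()]
      .sort((a, b) => a[1] - b[1])
      .map(([callSid]) => callSid);
  }

  emit(event) {
    for (const listener of this.listeners) listener(event);
  }
}
```

Returning a boolean from `markHandRaised` keeps repeated `*1` presses from producing duplicate dashboard notifications.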

Pattern B — Full Code: REST Redirect for Mid-Call DTMF

The dashboard shows a “Prompt Hand Raise” button per participant. When clicked, the backend redirects that caller’s active call to a gather TwiML page, collects their response, then returns them to the conference.


```js
// Dashboard API: host triggers DTMF prompt for a specific participant
app.post('/api/prompt-hand-raise/:callSid', authenticateHost, async (req, res) => {
  const { callSid } = req.params;
  try {
    // Redirect their active call to a gather prompt
    await twilioClient.calls(callSid).update({
      url: `${BASE_URL}/voice/gather-hand-raise`,
      method: 'POST',
    });
    res.json({ success: true });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

// The gather prompt TwiML page
app.post('/voice/gather-hand-raise', (req, res) => {
  const twiml = new VoiceResponse();

  const gather = twiml.gather({
    input: 'dtmf',
    action: '/voice/mid-call-dtmf',
    timeout: 6,
    numDigits: 2,
  });
  gather.say({ voice: 'Polly.Joanna' },
    'You have been prompted by the host. Press star 1 to raise your hand, ' +
    'or star 2 to decline. Or stay silent to return.');

  // Fall-through: no press → rejoin conference
  const dial = twiml.dial();
  dial.conference({ muted: true }, 'MainRoom');

  res.type('text/xml').send(twiml.toString());
});

// Process their response
app.post('/voice/mid-call-dtmf', async (req, res) => {
  const { Digits, CallSid } = req.body;
  const twiml = new VoiceResponse();

  if (Digits === '*1') {
    await markHandRaised(CallSid);
    twiml.say({ voice: 'Polly.Joanna' }, 'Hand raised. Returning you to the conference.');
  } else {
    twiml.say({ voice: 'Polly.Joanna' }, 'Returning you to the conference.');
  }

  // Either way, rejoin the conference muted
  const dial = twiml.dial();
  dial.conference({ muted: true }, 'MainRoom');
  res.type('text/xml').send(twiml.toString());
});
```

ℹ️ Trade-off of Pattern B: The participant is briefly disconnected from conference audio (~3–5 seconds) while the gather prompt plays. The prompt is also host-initiated: the participant cannot trigger it themselves from inside the conference. Pattern B is a usable POC solution, but the production path should upgrade to Media Streams (Section 03) for true participant-initiated hand-raises.

03

Alternative 1 — Media Streams + Server-Side DTMF Detection

Twilio Media Streams + Goertzel Algorithm

Raw audio piped to your WebSocket server — detect DTMF tones in real-time, participant never leaves the conference

Production Grade No Audio Interruption Participant-Initiated

Twilio Media Streams sends a raw audio stream (8kHz mulaw) from each participant’s call to a WebSocket endpoint on your server. You run a DTMF tone detector (Goertzel algorithm) on this audio stream. When a key is pressed, the tone is detected server-side without ever pulling the participant out of the conference.

How it Works

Participant presses *1 (phone keypad) → Twilio streams audio (8kHz mulaw via WS) → Your WS server (Goertzel detector) → Hand-raise event (dashboard notified)

TwiML — Enable Media Stream

```xml
<Response>
  <Start>
    <!-- Stream audio from this call to your WebSocket server -->
    <Stream url="..."
      track="inbound_track" />
  </Start>
  <Say>Joining the conference. Press star 1 any time to raise your hand.</Say>
  <Dial>
    <Conference
      statusCallback="..."
      statusCallbackEvent="start end join leave mute hold">MainRoom</Conference>
  </Dial>
</Response>
```

Server-Side DTMF Detector (Node.js)

```ts
import { WebSocketServer } from 'ws';
import { GoertzelDTMFDetector } from './dtmf-detector'; // or npm: node-dtmf

// `server` is the existing HTTP server, created elsewhere
const mediaWss = new WebSocketServer({ path: '/media-stream', server });

mediaWss.on('connection', (ws) => {
  let callSid: string;
  const detector = new GoertzelDTMFDetector({ sampleRate: 8000 });

  ws.on('message', (raw) => {
    // ws delivers messages as Buffers, so convert before parsing
    const msg = JSON.parse(raw.toString());

    if (msg.event === 'start') {
      callSid = msg.start.callSid;   // map stream → call SID
    }

    if (msg.event === 'media') {
      // Decode base64 mulaw audio payload
      const audio = Buffer.from(msg.media.payload, 'base64');

      // Run Goertzel DTMF detector on this audio chunk
      const digit = detector.detect(audio);

      if (digit) {
        handleDTMFDigit(callSid, digit);
      }
    }

    if (msg.event === 'stop') {
      detector.reset();
    }
  });
});

async function handleDTMFDigit(callSid: string, digit: string) {
  // Debounce: same digit repeated quickly = one press
  if (digit === '*' || digit === '1') {
    // You'll receive '*' then '1' as separate detections
    // Buffer them with a short debounce window
    bufferDigit(callSid, digit, async (fullDigit: string) => {
      if (fullDigit === '*1') {
        await markHandRaised(callSid);   // update state
        broadcastToAdmins({              // push to dashboard
          type: 'hand_raised',
          callSid,
          callerId: getCallerInfo(callSid).callerId,
        });
      }
    });
  }
}
```
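The server code imports a detector from `./dtmf-detector` but does not show it. As a rough illustration of the technique, here is a minimal Goertzel-based sketch. It is not the contents of that module: `muLawToPcm`, `decodePayload`, `goertzelPower`, `detectDigit`, and the 0.1 energy threshold are all assumptions chosen for this example. It also assumes the caller has already buffered enough samples into one block (around 205 samples at 8 kHz is a common choice), since individual media frames may be shorter than that.

```javascript
// DTMF keypad: each key is one low-group tone plus one high-group tone (Hz).
const LOW_FREQS = [697, 770, 852, 941];
const HIGH_FREQS = [1209, 1336, 1477, 1633];
const KEYPAD = ['123A', '456B', '789C', '*0#D'];

// Decode one G.711 mu-law byte into a 16-bit linear PCM sample.
function muLawToPcm(byte) {
  const u = ~byte & 0xff;                 // mu-law bytes are stored inverted
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return (u & 0x80) ? -magnitude : magnitude;
}

// Decode a base64 mu-law payload into an array of PCM samples.
function decodePayload(base64Payload) {
  return Array.from(Buffer.from(base64Payload, 'base64'), (b) => muLawToPcm(b));
}

// Goertzel algorithm: squared magnitude of a single frequency within one block.
function goertzelPower(samples, freq, sampleRate) {
  const coeff = 2 * Math.cos((2 * Math.PI * freq) / sampleRate);
  let s1 = 0;
  let s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Return the DTMF key present in a block of PCM samples, or null if none.
function detectDigit(samples, sampleRate = 8000) {
  const energy = samples.reduce((sum, x) => sum + x * x, 0);
  if (energy === 0) return null; // pure silence

  const lowPowers = LOW_FREQS.map((f) => goertzelPower(samples, f, sampleRate));
  const highPowers = HIGH_FREQS.map((f) => goertzelPower(samples, f, sampleRate));
  const row = lowPowers.indexOf(Math.max(...lowPowers));
  const col = highPowers.indexOf(Math.max(...highPowers));

  // Assumed threshold: both tones must hold a meaningful share of the block's
  // energy, which rejects speech and single-tone noise. Tune on real audio.
  const threshold = 0.1 * energy * samples.length;
  if (lowPowers[row] < threshold || highPowers[col] < threshold) return null;

  return KEYPAD[row][col];
}
```

Because the detector reports a key for every block in which the tone pair is present, a held key produces repeated detections; the debouncing noted in `handleDTMFDigit` is still needed on top of this.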

DTMF Detection Library Options

| Library | Language | Notes |
| --- | --- | --- |
| node-dtmf | Node.js | Simple Goertzel implementation for mulaw audio |
| goertzel-js | Node.js | Low-level Goertzel filter, needs DTMF freq mapping |
| dtmf-decoder | Python | Good if your backend is Python/FastAPI |
| librosa + custom | Python | Overkill but very accurate |

Advantages

  • Participant stays in conference — zero audio interruption
  • True participant-initiated hand-raises at any time
  • No additional Twilio cost — Media Streams included in Voice
  • Works with any keypad digit combination you define

Challenges

  • Requires a WebSocket server to receive audio
  • Need DTMF detection library + Goertzel implementation
  • Higher server resource usage (audio processing per participant)
  • Multi-digit debouncing logic needed (*1 = two signals)
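The multi-digit debouncing mentioned in the last challenge can be sketched as a small per-call buffer. Everything here is hypothetical: the `DigitBuffer` class, its `push` signature, and the default window sizes are assumptions for illustration, and unlike the callback-style `bufferDigit` used earlier it takes explicit timestamps so the logic is easy to test.

```javascript
// Hypothetical per-call digit buffer: collapses repeated detections of a held
// key into one press and assembles presses into fixed-length sequences.
class DigitBuffer {
  constructor({ sequenceLength = 2, repeatWindowMs = 150, sequenceWindowMs = 2000 } = {}) {
    this.sequenceLength = sequenceLength;     // e.g. 2 for "*1"
    this.repeatWindowMs = repeatWindowMs;     // same digit within this gap = same press
    this.sequenceWindowMs = sequenceWindowMs; // max gap between presses of one sequence
    this.calls = new Map();                   // callSid -> { digits, lastDigit, lastAt }
  }

  // Feed one detection. Returns the completed sequence (e.g. "*1") or null.
  push(callSid, digit, nowMs) {
    const state = this.calls.get(callSid) ||
      { digits: '', lastDigit: null, lastAt: -Infinity };
    this.calls.set(callSid, state);
    const gap = nowMs - state.lastAt;

    // Same digit seen again almost immediately: the key is still held down.
    if (digit === state.lastDigit && gap < this.repeatWindowMs) {
      state.lastAt = nowMs;
      return null;
    }

    // Too long since the previous press: discard the stale partial sequence.
    if (gap > this.sequenceWindowMs) state.digits = '';

    state.digits += digit;
    state.lastDigit = digit;
    state.lastAt = nowMs;

    if (state.digits.length >= this.sequenceLength) {
      const sequence = state.digits;
      state.digits = '';
      return sequence;
    }
    return null;
  }

  // Forget a call, e.g. when its media stream stops.
  clear(callSid) {
    this.calls.delete(callSid);
  }
}
```

With this shape, the media handler would call `push(callSid, digit, Date.now())` for each detection and treat a returned `'*1'` as a hand-raise.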

`\
