DEV Community

Anmol Baranwal
I built and deployed a Voice AI Agent in 30 minutes! πŸŽ‰

I have been experimenting with AI agents for a while now, but this time I wanted to build a Voice AI Agent. I won't lie, it does feel intimidating (if you have never built one before).

Voice AI agents are becoming more common these days, so I took the chance to learn the core components and principles, and to understand how everything fits together.

This post covers everything I picked up: the building blocks of Voice AI Agents and a step-by-step guide to building, testing and deploying one. I have also listed popular platforms along with some real-world use cases to learn from.

If you have been curious about how voice agents work (or want to build your own), this might help you get started. It took me just 30 minutes to deploy the agent to my portfolio.

portfolio live deployed ai agent


What is covered?

In summary, we are going to cover these topics in detail.

  1. What is a Voice AI Agent?
  2. Popular tools and platforms for building one.
  3. A step-by-step guide to build your first voice Agent from scratch.
  4. Practical use cases with examples.

1. What is a Voice AI Agent?

You might already be familiar with AI Agents, which are computer programs that can understand tasks, think for themselves and take actions on their own.

Voice AI Agents take this a step further by combining speech and reasoning capabilities.

They are autonomous systems that listen to your voice, understand what you are saying (using speech-to-text), respond using Large Language Models (LLMs) like GPT-4 and speak the answer back to you using a synthetic voice (text-to-speech).
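
As a rough sketch, one conversational turn of that loop can be expressed like this in TypeScript. The `Providers` interface is a hypothetical placeholder (standing in for, say, Deepgram for STT, GPT-4 for the LLM and ElevenLabs for TTS), not any real SDK:

```typescript
// One conversational turn: speech -> text -> reply -> speech.
type Message = { role: "user" | "assistant"; content: string };

interface Providers {
  transcribe: (audio: ArrayBuffer) => Promise<string>;  // STT
  complete: (history: Message[]) => Promise<string>;    // LLM
  synthesize: (text: string) => Promise<ArrayBuffer>;   // TTS
}

// Keeps a running transcript so the LLM has conversational context.
async function handleTurn(
  audioIn: ArrayBuffer,
  history: Message[],
  p: Providers
): Promise<{ audioOut: ArrayBuffer; history: Message[] }> {
  const userText = await p.transcribe(audioIn);         // 1. speech-to-text
  const withUser: Message[] = [...history, { role: "user", content: userText }];
  const reply = await p.complete(withUser);             // 2. LLM response
  const audioOut = await p.synthesize(reply);           // 3. text-to-speech
  return {
    audioOut,
    history: [...withUser, { role: "assistant", content: reply }],
  };
}
```

Real platforms stream audio in both directions to cut latency, but the shape of the loop is the same.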

There are mainly two types:

  • Inbound agents : answer calls and respond when someone reaches out.

  • Outbound agents : proactively make calls to deliver messages, reminders or carry out tasks.

tech behind ai voice agents

Unlike traditional virtual assistants (such as Siri), Voice AI Agents can perform multi-step, complex tasks, such as:

  • answering customer phone calls

  • initiating outbound campaigns

  • providing support via a voice widget on your site

  • speaking English, Arabic or any other supported language

If you are curious about the tech behind these agents, there are some great deep dives out there on how the STT β†’ LLM β†’ TTS pipeline works.

The best part is that you don’t need to be an expert to build one. There are many tools like VoiceHub by DataQueue and Retell AI that make it super easy to create and test a working voice agent in just a few minutes.

Let’s explore some of those platforms next.


2. Popular tools and platforms for building one.

If you want to build your own Voice AI Agent, there are several frameworks and tools available to help you get started. Picking the right one depends on:

  • Language and regional support. For example, many platforms don’t handle Arabic (MENA) or Indian English well.

  • Whether you are comfortable writing code or prefer no-code platforms.

  • Whether you want custom setups (more control, takes longer) or something faster.

  • Whether your focus is on mobile-based agents or web experiences.

I believe choosing the right platform is less about what's "best" and more about what fits your use case.

Here are the most popular platforms:

  • VoiceHub by DataQueue - Easiest way to build voice agents without writing code. It connects LLMs to phone calls, lets you define workflows and deploy quickly. Bonus: MENA region support is solid (unlike many others). I will be using this one in the next section.

voicehub by dataqueue

  • Rime - lets you build conversational AI apps with both voice and text. Good for more advanced voice flows, supports integrations and has a polished UI.

rime

  • Vapi – Build phone-based voice agents using LLMs and connect them to real numbers. Offers a simple API and UI for call flows, used for scheduling, Q&A bots, and hotlines.

vapi

  • Retell AI - Specializes in phone call automation, letting you create voice agents that can have real-time conversations over phone lines.

retell ai

  • LiveKit - Open source platform for real-time audio/video development. While it doesn't include AI by itself, it gives you the live voice infrastructure to build on.

livekit

  • Twilio Voice + OpenAI + ElevenLabs - A more flexible setup using Twilio for phone/audio input, GPT-4 for responses, and ElevenLabs for speech. Requires coding but gives you full control.

There are also more specialized platforms: Deepgram is recommended for highly accurate speech-to-text (STT), while ElevenLabs is popular for realistic text-to-speech (TTS) that generates natural-sounding voices. You can then enable your agent to make and receive phone calls through services like Twilio.

It totally depends on your use case, but I'm choosing VoiceHub to create the voice agent. It uses ElevenLabs voices and OpenAI's GPT-4o under the hood.


3. A step-by-step guide to build your first voice Agent from scratch.

It's finally time to build a real agent. I will be using VoiceHub because of its fast setup, easy third-party integrations and solid support for the MENA region.

I went through the official docs, tested everything hands-on and documented the key steps, so you don’t have to get stuck in jargon.

Step 1: Sign up for the dashboard

You can sign up and access the dashboard at voicehub.dataqueue.ai.

sign up

Here’s what the dashboard looks like. We will walk through each part as we go.

dashboard

Click on the + New Agent button to start. You can either create a blank agent or use an existing template. I'm using an existing template to make things easier to follow.

Once created, you will land inside the agent’s workspace.

agent

There are several useful tabs here including Call logs, Phonebook, Analytics, Provider keys and more.

provider keys

The most useful is the knowledge base (RAG), which helps in the intelligent retrieval of information during conversations to provide accurate responses.

knowledge bases

If you switch to the configuration option in the sidebar, you will see everything you can tweak. You should change the language to English (the default is Arabic). Here is a brief overview of the tabs:

  • Models : choose your STT and LLM providers

  • Voices : choose how your agent sounds. You can test voices by typing something and hearing it out loud

  • Pathway : Build your agent's logic visually or with a global prompt

  • VoIP : Assign phone numbers for your agent to receive or make calls

  • Analysis : Decide how to tag calls, track how they went and check sentiment.

  • Widget : Add a voice chat interface to your website and customize how it looks.

  • White Labeling : Set up your own branding, logo and custom domain for your team

configuration options

As you can see on the top right, you can toggle between two modes:

a) DataQueue Mode (Optimized stack for MENA)

  • default mode when creating a new agent
  • uses DataQueue’s fully optimized models for speech recognition (STT), conversation logic (LLM) and speech synthesis (TTS).
  • designed for MENA use cases where accuracy, latency and sentiment detection matter most.
  • voice setup and tuning are handled via the DQ Configs tab.

Note: You cannot manually override model providers in this mode; that's only possible in Custom Mode.

b) Custom Mode (Provider Flexibility)

Custom Mode gives you full flexibility in model selection and configuration. Here are the supported providers:

  • STT Providers: Google, Deepgram, Gladia, Speechmatics, Azure
  • TTS Providers: ElevenLabs, Deepgram, LMNT, Cartesia, Rime AI, Azure, OpenAI, Google
  • LLM Providers: OpenAI, Groq, Claude (Anthropic), Cohere, DeepSeek, Ollama, Grok

Make sure to switch the language to en-US in the STT settings under the Models tab.

change language

You can also perform side-by-side benchmarking to compare different setups and identify the optimal configuration for your specific use case.

custom models

There are thousands of voices available across different accents and tones. Third-party integrations (TTS providers) are also easy to set up.

voices

The customization options are huge, perfect for developers/teams who want full control over voice, prompts and model selection.

elevenlabs


Step 2: Building the logic

They provide two approaches for defining how your agent behaves:

a) Global Prompt

A single prompt that guides the agent’s entire behavior (similar to system prompts in traditional LLM apps). Use this when your agent only needs to answer general questions or operate reactively.

b) Conversational Pathway

It's a visual drag-and-drop builder for defining complex flows, variables and decision logic using connected nodes. This is what I will be using (it seems easier).

I recommend using this when:

  • You need branching logic (such as verification β†’ escalate β†’ book β†’ end)
  • You want to extract variables (date, location, ...)
  • You want fine control over what the agent says and when

Yes, it's possible to combine them. You can start with a global prompt and add the conversation flow later or build the flow first and use the global prompt as a backup.

logic

You can add different nodes to the pathway, each serving a specific purpose. For instance, the Default Node speaks a message and waits for the reply, while the End Call Node simply ends the conversation.

add nodes

Click a node to open customization options. You can define specific behaviors, conditions or plug in knowledge base lookups.

knowledge based node


Step 3: Testing it out

You can test your agent with Start Test Call or Start Test Chat. You just need to provide the necessary microphone permissions on the website and the assistant will respond based on your flow.

This sample had only two nodes so the agent replies once and ends the call.

testing voice ai agent

You can also perform QA Testing to simulate conversational scenarios and evaluate how well an agent actually behaves. It produces pass/fail results per test scenario and lets you identify weaknesses before deploying to production.

Example test case: Hi, I want to schedule a new appointment next Monday at 3 pm.

Result:

βœ… PASS: Agent confirms the correct date/time with the proper tone
❌ FAIL: Agent ignores time or responds vaguely
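
To make this concrete, a QA scenario like the one above boils down to data plus a check. Here is a minimal TypeScript sketch; the `Scenario` shape and `evaluate` helper are made up for illustration, not VoiceHub's actual QA schema:

```typescript
// A hypothetical QA scenario: what the caller says, plus keywords the
// agent's reply must mention for the test to pass.
type Scenario = {
  userSays: string;
  mustMention: string[];
};

// Case-insensitive keyword check over the agent's reply.
function evaluate(agentReply: string, s: Scenario): "PASS" | "FAIL" {
  const reply = agentReply.toLowerCase();
  return s.mustMention.every((kw) => reply.includes(kw.toLowerCase()))
    ? "PASS"
    : "FAIL";
}

const scenario: Scenario = {
  userSays: "Hi, I want to schedule a new appointment next Monday at 3 pm.",
  mustMention: ["monday", "3 pm"],
};
```

With this check, a reply like "You're booked for next Monday at 3 pm." passes, while a vague "Sure, noted." fails the same scenario.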

You will also get full call logs to analyze past conversations.

call logs


Step 4: Deployment options

I think we all can agree that local testing is easier. Just build the workflow, test it out and voila. But what's the purpose if we cannot take it to real users?

VoiceHub makes this super easy. Navigate to Configuration > Widget and you will get a unique embed code for your website. You will get the option to customize the look, position and welcome message.

It will look something like this.

<dq-voice agent-id="your-agent-id" env="https://voicehub.dataqueue.ai/"> </dq-voice>
<script src="https://voicehub.dataqueue.ai/DqVoiceWidget.js"></script>

deployment with the widget

I tried it on my Next.js portfolio website and it worked properly. If you place it directly just before the closing </body> tag, you will probably get a TypeScript error: Property 'dq-voice' does not exist on type 'JSX.IntrinsicElements'.

To fix that, follow these steps:

a) The widget uses a custom <dq-voice> tag, so for TypeScript/React to treat it as a valid JSX tag, we need to add this to a new declaration file (src/types/custom-elements.d.ts).

declare namespace JSX {
  interface IntrinsicElements {
    'dq-voice': React.DetailedHTMLProps<
      React.HTMLAttributes<HTMLElement>,
      HTMLElement
    >
  }
}

b) In your project’s root tsconfig.json, add the declaration file to the include array so TypeScript loads it and no longer complains about the unknown tag.

"include": [
  "src/types/custom-elements.d.ts",
  "next-env.d.ts",
  "**/*.ts",
  "**/*.tsx",
  ".next/types/**/*.ts"
],

c) Now just insert the widget into your Next.js layout.tsx.

  • dq-voice mounts the custom widget element.

  • Script (from next/script) ensures the widget script loads after the page becomes interactive (safe and efficient).

<dq-voice agent-id="id"></dq-voice>
<Script
  src="https://voicehub.dataqueue.ai/DqVoiceWidget.js"
  strategy="afterInteractive"
/>

insert

As soon as you run the server, the widget will be live and a pop-up will appear asking for microphone permission.

working

allow the request

You can have a usual conversation based on your workflow and it will listen/respond accordingly.

listening conversation

And just like that, I built and deployed my first Voice AI Agent! πŸŽ‰

You can also deploy on their cloud, on your own private infrastructure, or use a hybrid deployment (which they claim can reduce server and GPU costs by up to 90%).

I tried some more advanced flows too, but covering them would have made this post a little confusing, so I decided to leave them out.

I cannot cover everything here, so I recommend checking the official docs and trying things out yourself. If you are still wondering about real-world use cases, check out the next section.


4. Practical use cases with examples.

Once you are familiar with Voice AI Agents, it's easy to see how powerful they can be (especially when used to automate workflows).

Here are a few real-world use cases that show what’s possible:

βœ… AI agent deployed in an international airport to support passengers with disabilities

This is the use case that will blow your mind. The DataQueue team officially deployed the VoiceHub AI agent at Queen Alia International Airport in Amman, Jordan.

The agent is designed to support passengers with disabilities, ensuring they receive the assistance they need within 5 minutes.

Here is the demo video!

They are rolling out similar projects across airports in MENA and Europe, creating a positive impact at airports by handling customer support, accessibility and real-time responsiveness.


βœ… Auto call Agent for Internal Status Checks (Engineering & Ops Teams)

Startups are usually fast-moving environments; teams (infra, DevOps, logistics ops) often need to stay updated on ongoing issues, service status or even deployment logs.

Instead of Slack reminders or waiting for someone to check dashboards, a voice agent can actively call team members, summarize the current situation and log any updates or confirmations.

The flow can look something like this:

  • Cron job triggers every 2 hours.

  • Agent calls the on-call engineer with a status update: "Hi, just checking in. The latest deployment finished with 2 minor warnings. Do you want me to notify QA or hold off?"

  • Engineer responds with "Hold off until we patch" β†’ Agent logs the response to Jira or an internal dashboard via API.

  • If there is no answer β†’ it falls back to SMS or call escalation.
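
The branching in this flow is essentially one small decision function. Here is a TypeScript sketch; the action names and the Jira/SMS hooks are illustrative, not a real VoiceHub API:

```typescript
// Map the engineer's (already transcribed) reply to the next action.
type Action =
  | { kind: "notify_qa" }            // proceed: ping the QA team
  | { kind: "hold"; note: string }   // log the hold reason (e.g. to Jira)
  | { kind: "escalate_sms" };        // no answer: fall back to SMS

function decide(reply: string | null): Action {
  if (reply === null) return { kind: "escalate_sms" }; // call went unanswered
  const r = reply.toLowerCase();
  if (r.includes("hold off")) return { kind: "hold", note: reply };
  return { kind: "notify_qa" };
}
```

In practice the variable extraction (did the engineer say "hold off"?) would come from the agent's pathway nodes rather than a substring match, but the escalation logic stays this simple.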


βœ… Voice Agent to make cold Email more personal (via Call)

This is a very exciting workflow for sales teams that want to warm up cold emails before sending a pitch.

Instead of sending a generic email blast, the agent calls the lead, confirms if they are open to receiving information and gets some light qualifying data without a human SDR involved.

The flow can look something like this:

  • Lead data is pulled from CRM.

  • Voice agent calls: β€œHey, I’m helping xyz company learn more about founders in the fintech space. Just a quick 1-minute call, are you still working on xyz?”

  • Captures 2–3 data points (interest level, industry fit, team size) using variable extraction.

  • If the response is positive β†’ marks lead as warm β†’ generates a tailored intro email and sends via marketing tool.

The result is a more personal, context-aware email that’s far more likely to convert.
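
The qualification step amounts to turning the extracted variables into a warm/cold decision. A tiny TypeScript sketch (the field names are made up for illustration):

```typescript
// Variables the agent extracted during the call (illustrative shape).
type LeadVars = {
  interest: "high" | "medium" | "low";
  industryFit: boolean;
  teamSize: number;
};

// A lead is "warm" if the industry fits and interest isn't low;
// warm leads get the tailored intro email, cold ones stay in the CRM.
function isWarm(v: LeadVars): boolean {
  return v.industryFit && v.interest !== "low";
}
```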


I used to think building voice AI agents required a lot of custom engineering, but it's accessible with the right set of tools.

This was just a basic version so there's still a lot more to explore.

If you have any questions, feedback or end up building something cool, please share in the comments.

Have a great day! Until next timeΒ :)

You can check my work at anmolbaranwal.com.
Thank you for reading! πŸ₯°


Top comments (32)

Divya

Incredible as usual.
I gotta try this out soon.

Thank you

Anmol Baranwal

Appreciate you reading this Divya. You can make one for free with the credits you get. I'm checking out a few other platforms as well, may write about it soon :)

Divya

I am planning on it.
Thank you for this article.

Looking forward to those articles, if you write them.

Ndeye Fatou Diop

This is amazing Anmol! Thanks for sharing πŸ™

Anmol Baranwal

thanks for reading Ndeye! If you end up creating an agent, you should definitely write about it.

ANIRUDDHA ADAK

Great job! This is super helpful! πŸ‘

Anmol Baranwal

means a lot. thanks for reading!

Ali Farhat

This is amazing brother!! πŸ™Œ

Anmol Baranwal

Appreciate you saying that Ali. I'm also looking into 11ai (recently launched by ElevenLabs).

Blessed Thompson

Can it be integrated with n8n?

Anmol Baranwal

I don't think there is a direct integration with n8n (as per the docs) but we might be able to do it indirectly using a webhook.

Blessed Thompson

Please can you try and let me know if it's possible?

Parag Nandy Roy

Love how clearly you broke everything down too...inspiring stuff

Anmol Baranwal

Feels great to hear that Parag! I spent a lot of time learning everything (since this was my first time too) and tried my best to explain the stuff.

Vlad I

Amazing, can it take info from call and push into my db ?

Anmol Baranwal

Yeah, you can use the Webhook node to collect the data and send it to your endpoint. Then parse the JSON payload & use any ORM to write data in your DB.

Reena Ram

I am a beginner, without any prior knowledge can I complete this???

Anmol Baranwal

I didn't have much knowledge in this space which is why I wrote this.. so others can understand the fundamental concepts. After reading this, I'm sure you can do it easily. And if you want to build something more advanced, just refer to the docs.

Reena Ram

πŸ™ thanksss

ridhamu

good articles, love airport video!

Anmol Baranwal

thanks. I'm so happy you noticed that :D

It was a shorts video so it wasn't embedding properly which is probably why most people missed it.

Michael Nielsen

Love it! Have to try this at some point. Tried something similar 3 years ago, but the technology just wasn't there to make it useful.

Anmol Baranwal

Yeah, I was researching and found some crazy useful platforms out there. Some are a bit technical, others are easier so the barrier to entry in this space is dropping really fast.
