langford

A platform for creating sales voice agents for businesses

AssemblyAI Voice Agents Challenge: Business Automation

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

I built a platform that lets businesses create AI sales reps (voice agents) that can hold near-real-time conversations, pitch products and services, collect customer information, and answer relevant questions.

The voice agent can be embedded into a website as a widget, and visitors can trigger a conversation via a floating action button.
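As an illustration, embedding such a widget usually comes down to dropping a script tag into the host page. The URL and attribute names below are hypothetical, not the platform's actual embed code:

```html
<!-- Hypothetical embed snippet: the src and data attributes are placeholders -->
<script
  src="https://example.com/widget.js"
  data-agent-id="YOUR_AGENT_ID"
  async
></script>
```

The script would then render the floating action button and open the voice session when clicked.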

Each agent is provided with domain-specific knowledge tailored to the business, keeping its responses accurate. These agents aren't just chatbots; they act as an extension of the business, using preconfigured behavior and product/service expertise defined by the business.
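One way to picture the "preconfigured behavior" is a per-business system prompt assembled from the business's own data. This is a minimal sketch; the field names (`name`, `offerings`, `tone`) and wording are assumptions for illustration, not the app's actual schema:

```javascript
// Sketch: fold per-business context into a system prompt for the LLM.
// Field names here are illustrative assumptions, not the app's real schema.
function buildSystemPrompt(business) {
  return [
    `You are a sales representative for ${business.name}.`,
    `Products/services: ${business.offerings.join(", ")}.`,
    `Tone: ${business.tone}.`,
    "Only answer questions related to the business; politely decline anything else.",
  ].join("\n");
}

const prompt = buildSystemPrompt({
  name: "Acme Plumbing",
  offerings: ["drain cleaning", "pipe repair"],
  tone: "friendly and concise",
});
```

Keeping the instructions and product knowledge in the system prompt is what stops the agent from drifting into generic-chatbot territory.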

The app shows usage analytics as well as summarized conversations with extracted information: lead quality, customer details and intent, and next steps the business can take based on the conversation between the user and the AI sales agent.
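To make that concrete, the extracted information can be represented as a small structured object returned by the LLM. The shape below is an assumption for illustration (the real schema isn't shown in this post), along with a guard that validates it before storage:

```javascript
// Illustrative shape of the structured summary extracted from a transcript.
// Field names and values are assumptions, not the app's real schema.
const exampleSummary = {
  leadQuality: "warm", // e.g. "cold" | "warm" | "hot"
  customer: { name: "Jane Doe", email: "jane@example.com" },
  intent: "requested pricing for the premium plan",
  nextSteps: ["send pricing sheet", "schedule a follow-up call"],
};

// A guard like this can validate what the LLM returns before storing it
function isValidSummary(s) {
  return (
    typeof s.leadQuality === "string" &&
    typeof s.intent === "string" &&
    Array.isArray(s.nextSteps)
  );
}
```

Validating LLM output against an expected shape is worthwhile since models occasionally return malformed or incomplete JSON.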

This project falls primarily under the Business Automation and Domain Expert prompt categories.

Demo

App link: click here

Embed script generation

Random website with embedded script tag

Agent running on random website

Conversation analytics & summary

Full conversation transcript on dashboard

GitHub Repository

Click here

Technical Implementation & AssemblyAI Integration

The frontend is built with Next.js and Tailwind CSS; the backend uses Node.js, Express, PostgreSQL, and the AssemblyAI SDK.

AssemblyAI's Universal-Streaming API powers real-time voice transcription over WebSockets. Its low-latency transcription is essential for keeping the conversation feeling instantaneous.

After a successful WebSocket connection is made, the app connects to AssemblyAI's streaming transcriber and pipes audio from the frontend into the transcriber stream.

Here is sample Node.js code for connecting to AssemblyAI over a WebSocket:

import { AssemblyAI } from "assemblyai";
import ffmpeg from "fluent-ffmpeg";
import { PassThrough, Readable } from "stream";

async function handleWebSocketConnection(ws) {
  const client = new AssemblyAI({ apiKey: process.env.ASSEMBLY_AI_API_KEY });

  // The streaming transcriber expects 16 kHz, 16-bit mono PCM audio
  const transcriber = client.streaming.transcriber({
    sampleRate: 16_000,
  });

  // Forward each turn's transcript back to the browser as it arrives
  transcriber.on("turn", (turn) => {
    if (!turn.transcript) return;
    ws.send(JSON.stringify({ transcript: turn.transcript }));
  });

  await transcriber.connect();

  // The browser sends WebM audio chunks; ffmpeg converts them to raw PCM
  const inputStream = new PassThrough();
  const ffmpegProcess = ffmpeg()
    .input(inputStream)
    .inputFormat("webm")
    .audioFrequency(16000)
    .audioChannels(1)
    .audioCodec("pcm_s16le")
    .format("s16le")
    .on("error", (err) => console.error("ffmpeg error:", err))
    .pipe();

  // Pipe the converted PCM stream into AssemblyAI's transcriber
  Readable.toWeb(ffmpegProcess).pipeTo(transcriber.stream());

  // Incoming WebSocket messages are raw audio chunks from the client
  ws.on("message", (data) => {
    inputStream.write(data);
  });

  ws.on("close", async () => {
    inputStream.end();
    await transcriber.close();
  });
}

Once a transcript is received, it's sent to an LLM that generates an appropriate response based on the predefined context. The LLM's response is then read aloud in a human voice using one of Amazon Polly's predefined voices.
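For the text-to-speech step, a minimal sketch of preparing an Amazon Polly `SynthesizeSpeech` request might look like the following. The voice, engine, and output format are assumptions, not the app's exact settings:

```javascript
// Sketch: build the parameters for Amazon Polly's SynthesizeSpeech call.
// Voice, engine, and format here are illustrative assumptions.
function buildPollyParams(text, voiceId = "Joanna") {
  return {
    Engine: "neural",
    OutputFormat: "mp3",
    Text: text,
    VoiceId: voiceId,
  };
}

// With AWS SDK v3 this would be sent roughly as:
//   import { PollyClient, SynthesizeSpeechCommand } from "@aws-sdk/client-polly";
//   const polly = new PollyClient({});
//   const { AudioStream } = await polly.send(
//     new SynthesizeSpeechCommand(buildPollyParams(reply))
//   );
```

The resulting audio stream can then be forwarded to the browser over the same WebSocket used for transcription.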

That's all for now. Cheers!

