This is a submission for the AssemblyAI Challenge: No More Monkey Business.
What I Built
I built Hanashi (話) AI, a real-time customer support analytics app powered by AssemblyAI, to provide meaningful insights into customer service calls.
The app combines live call transcription with post-call analysis, offering businesses a clear view of customer interactions. With these insights, teams can identify patterns, address challenges, and continuously improve their approach to delivering great customer experiences.
Demo
You will need to enter your own AssemblyAI API key from a fully enabled account.
Journey
I went through AssemblyAI's API docs, and this use case seemed like a great fit to showcase the API's full potential.
I believe this submission covers all three challenge prompts, but it delivers its core value through LeMUR, which is why my primary submission is for the prompt "No More Monkey Business."
The app enables users to record audio through their device's microphone and provides real-time transcription using AssemblyAI's real-time API. Implementing the real-time API was straightforward; the primary challenge was converting the browser's audio output to PCM16. AudioWorklets and this sample app provided enough context to implement it!
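For anyone curious, the core of that conversion is an AudioWorklet that turns the browser's 32-bit float samples into 16-bit PCM before they are forwarded to the real-time WebSocket. A minimal sketch (the processor name and message shape are my own, not taken from the app's source):

```typescript
// pcm16-worklet.ts, loaded via audioContext.audioWorklet.addModule(...).
// Assumes the AudioContext was created with { sampleRate: 16000 }, the rate
// the real-time endpoint expects by default.
class Pcm16Processor extends AudioWorkletProcessor {
  process(inputs: Float32Array[][]): boolean {
    const channel = inputs[0]?.[0];
    if (channel) {
      // Map 32-bit float samples in [-1, 1] to signed 16-bit integers.
      const pcm16 = new Int16Array(channel.length);
      for (let i = 0; i < channel.length; i++) {
        const s = Math.max(-1, Math.min(1, channel[i]));
        pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
      }
      // Hand the raw bytes to the main thread, which forwards them to
      // AssemblyAI's real-time WebSocket.
      this.port.postMessage(pcm16.buffer, [pcm16.buffer]);
    }
    return true; // keep the processor alive
  }
}
registerProcessor("pcm16-processor", Pcm16Processor);
```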
I have also added a few sample customer support conversations (from AWS's sample dataset) to help demo the transcription and the analytics dashboard in one click.
The analytics dashboard is powered entirely by AssemblyAI's transcription API, Audio Intelligence features, and LeMUR's LLM capabilities.
Call analytics dashboard
We can calculate some simple metrics directly from the transcription response (see the sketch after this list), such as:
- Duration.
- # of speakers (using the speaker diarization feature).
- Words/min (indicates efficiency).
- Silence time as a % of the total call duration.
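As a rough illustration of how these fall out of the transcript JSON (field names follow AssemblyAI's transcript response with speaker_labels enabled; the silence heuristic is my own approximation, not necessarily what the app does):

```typescript
// Minimal sketch of the per-call metrics computed from a transcript response.
interface Word { text: string; start: number; end: number; speaker?: string } // timestamps in ms
interface Transcript {
  audio_duration: number;              // seconds
  words: Word[];
  utterances?: { speaker: string }[];  // present when speaker_labels is enabled
}

function callMetrics(t: Transcript) {
  const durationSec = t.audio_duration;
  const speakers = new Set((t.utterances ?? []).map(u => u.speaker)).size;
  const wordsPerMin = t.words.length / (durationSec / 60);

  // Silence = total duration minus the time covered by spoken words.
  const spokenMs = t.words.reduce((sum, w) => sum + (w.end - w.start), 0);
  const silencePct = Math.max(0, 100 * (1 - spokenMs / (durationSec * 1000)));

  return { durationSec, speakers, wordsPerMin, silencePct };
}
```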
The sentiment analysis feature lets us visualize each speaker's sentiment along the call timeline, which makes it easy to quickly gauge how sentiment progressed over the course of the call.
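Under the hood this boils down to mapping the sentiment_analysis_results array onto chart points. A sketch, assuming the field names from AssemblyAI's sentiment analysis output:

```typescript
// One chart point per analysed sentence: x = sentence midpoint (seconds),
// y = a numeric sentiment score, grouped by speaker for separate series.
type Sentiment = "POSITIVE" | "NEUTRAL" | "NEGATIVE";
interface SentimentResult { start: number; end: number; sentiment: Sentiment; speaker?: string }

const SCORE: Record<Sentiment, number> = { POSITIVE: 1, NEUTRAL: 0, NEGATIVE: -1 };

function sentimentTimeline(results: SentimentResult[]) {
  return results.map(r => ({
    speaker: r.speaker ?? "unknown",
    timeSec: (r.start + r.end) / 2 / 1000,
    score: SCORE[r.sentiment],
  }));
}
```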
I was then able to use LeMUR's Q&A feature to evaluate sentiment at a deeper level on the basis of six tone-of-voice indicators: confidence, clarity, frustration, engagement, satisfaction, and empathy.
The Q&A prompt works great here since I can ask the LLM to generate a numerical score for each of these attributes for both the agent and the customer, and get back a structured response to render as a graph.
This feature helped me get a more structured and predictable response from the LLM. One piece of feedback for the AssemblyAI team would be to add support for Claude's structured outputs to their API.
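To make that concrete, here is roughly what the LeMUR Q&A request looks like. The endpoint and request shape follow LeMUR v3's question-answer API; the exact question and answer_format strings are illustrative, not copied from the app:

```typescript
const indicators = ["confidence", "clarity", "frustration", "engagement", "satisfaction", "empathy"];

async function toneScores(transcriptId: string, apiKey: string) {
  const res = await fetch("https://api.assemblyai.com/lemur/v3/generate/question-answer", {
    method: "POST",
    headers: { authorization: apiKey, "content-type": "application/json" },
    body: JSON.stringify({
      transcript_ids: [transcriptId],
      questions: indicators.map(name => ({
        question: `Rate the ${name} of the agent and the customer on this call.`,
        // Nudges LeMUR toward a parseable, numeric answer.
        answer_format: "JSON object with integer fields 'agent' and 'customer', each 0-10",
      })),
    }),
  });
  const data = await res.json();
  // data.response is an array of { question, answer } pairs.
  return data.response.map((r: { question: string; answer: string }) => ({
    question: r.question,
    scores: JSON.parse(r.answer), // relies on the model honouring the requested format
  }));
}
```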
The dashboard also uses LeMUR for several other analytics aspects, such as objective feedback on agent performance, topic-based call summaries, and more.
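A sketch of those calls, assuming LeMUR v3's summary and task endpoints (the prompt and context strings are illustrative):

```typescript
async function callInsights(transcriptId: string, apiKey: string) {
  const headers = { authorization: apiKey, "content-type": "application/json" };

  // Topic-based summary: answer_format steers the shape of the output text.
  const summary = await fetch("https://api.assemblyai.com/lemur/v3/generate/summary", {
    method: "POST",
    headers,
    body: JSON.stringify({
      transcript_ids: [transcriptId],
      context: "A customer support call between an agent and a customer.",
      answer_format: "Bullet points grouped by topic",
    }),
  }).then(r => r.json());

  // Free-form task prompt for objective feedback on the agent's performance.
  const feedback = await fetch("https://api.assemblyai.com/lemur/v3/generate/task", {
    method: "POST",
    headers,
    body: JSON.stringify({
      transcript_ids: [transcriptId],
      prompt: "Give objective, actionable feedback on the support agent's performance on this call.",
    }),
  }).then(r => r.json());

  return { summary: summary.response, feedback: feedback.response };
}
```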
It also uses topic detection to reference the relevant utterances in the transcript, and does some simple math to calculate speaker talk-time analytics.
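The speaker-time part is plain arithmetic over the diarized utterances. A sketch (timestamps in milliseconds, as in the transcript response):

```typescript
interface Utterance { speaker: string; start: number; end: number }

// Percentage of talk time per speaker label, e.g. { A: 62.4, B: 37.6 }.
function speakerTalkTime(utterances: Utterance[]) {
  const totals = new Map<string, number>();
  for (const u of utterances) {
    totals.set(u.speaker, (totals.get(u.speaker) ?? 0) + (u.end - u.start));
  }
  const grandTotal = [...totals.values()].reduce((a, b) => a + b, 0) || 1;
  return Object.fromEntries(
    [...totals].map(([speaker, ms]) => [speaker, (100 * ms) / grandTotal])
  );
}
```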
Future Scope:
- The PII redaction feature could prove helpful here if data privacy is important to the business.
- Aggregated metrics over multiple transcripts.
- Ability to record WebRTC calls.
Feedback for AssemblyAI
Overall, it’s been a great experience, and I appreciate the quality of your API, SDK, and documentation—they are well-written, concise, and made it easy to get started quickly. Great work on that front!
I do, however, have a few suggestions that could enhance the offering even further:
- Expanding the real-time API to include features like diarization and broader audio codec support would be incredibly valuable.
- Using multi-modal models to enable transcription and LLM-based generations in a single step could unlock powerful new features, such as tone detection.
- Adding functionality to predict and link person names to speaker labels would further elevate the usability and accuracy of speaker-related outputs.
- Adding structured outputs support to LeMUR APIs.
Thank you!!