3kb-dev

Hanashi (話) AI | Turn Every Customer Support Call into Actionable Intelligence

This is a submission for the AssemblyAI Challenge: No More Monkey Business.

What I Built

I built Hanashi (話) AI, a real-time customer support analytics app powered by AssemblyAI, to provide meaningful insights into customer service calls.

The app combines live call transcription with post-call analysis, offering businesses a clear view of customer interactions. With these insights, teams can identify patterns, address challenges, and continuously improve their approach to delivering great customer experiences.

Demo

Try it out here! 🚀

You will need to enter your own AssemblyAI API key from a fully enabled account.

Journey

I evaluated AssemblyAI's API docs, and this use case seemed like a great fit to showcase its full potential.

I believe this submission covers all three challenge prompts, but it delivers its core value through LeMUR, which is why my primary submission is for the prompt "No More Monkey Business."

The app enables users to record audio through their device's microphone and provides real-time transcription using AssemblyAI's real-time API. Implementing the real-time API was straightforward; the primary challenge was converting the browser's audio into PCM16. AudioWorklets and this sample app provided enough context to implement it.
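The heart of that conversion is re-encoding the Float32 samples the Web Audio API produces into 16-bit signed integers. A minimal sketch (my own naming, not the app's actual code):

```typescript
// Minimal sketch: convert Web Audio Float32 samples (range -1..1) into
// 16-bit signed PCM, which the real-time API expects. Inside an
// AudioWorkletProcessor you would run this over inputs[0][0] in process()
// and post the resulting buffer back to the main thread for streaming.
const floatTo16BitPCM = (input: Float32Array): Int16Array => {
  const output = new Int16Array(input.length)
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])) // clamp to [-1, 1]
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff   // scale to the int16 range
  }
  return output
}
```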


I have also added a few sample customer support conversations (from AWS's sample dataset) to demo the transcription and the analytics dashboard in one click.

The analytics dashboard is powered entirely by AssemblyAI's transcription API, Audio Intelligence features, and LeMUR LLM features.

Call analytics dashboard

We can calculate some simple metrics directly from the transcription response (see the sketch after this list), such as:

  1. Duration.
  2. Number of speakers (using the diarization feature).
  3. Words per minute (an indicator of efficiency).
  4. Silence time as a percentage of the call's total duration.
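For illustration, here's roughly how those numbers fall out of the transcript JSON. The field names (audio_duration in seconds, word timestamps in milliseconds, a speaker label on each word) are assumptions based on AssemblyAI's documented response shape, so treat this as a sketch rather than the app's exact code:

```typescript
// Sketch: basic call metrics derived from a transcript-like object.
interface Word { start: number; end: number; speaker?: string | null }
interface TranscriptLike {
  audio_duration: number // seconds
  words: Word[]
}

const basicCallMetrics = (t: TranscriptLike) => {
  const durationSec = t.audio_duration
  const speakers = new Set(t.words.map(w => w.speaker).filter(Boolean)).size
  const wordsPerMin = t.words.length / (durationSec / 60)

  // Silence = total duration minus time covered by spoken words
  // (ignores overlapping speech, which is fine for a rough metric)
  const spokenMs = t.words.reduce((acc, w) => acc + (w.end - w.start), 0)
  const silencePct = Math.max(0, 100 * (1 - spokenMs / (durationSec * 1000)))

  return { durationSec, speakers, wordsPerMin, silencePct }
}
```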


The sentiment analysis feature lets us visualize each speaker's sentiment along the call's timeline, which makes it easy to quickly evaluate how sentiment progressed over the course of the call.
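Under the hood this is just a mapping from the sentiment results (returned when sentiment analysis is enabled) to chart points. A rough sketch, with field names assumed from the documented response:

```typescript
// Sketch: turn sentiment results into timeline points for charting.
interface SentimentResult {
  start: number // ms from the start of the call
  sentiment: 'POSITIVE' | 'NEUTRAL' | 'NEGATIVE'
  speaker?: string | null
}

const sentimentTimeline = (results: SentimentResult[]) =>
  results.map(r => ({
    timeSec: r.start / 1000,
    speaker: r.speaker ?? 'Unknown',
    // Map categorical sentiment to a numeric score for plotting
    score: r.sentiment === 'POSITIVE' ? 1 : r.sentiment === 'NEGATIVE' ? -1 : 0,
  }))
```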


I was then able to use LeMUR's Q&A feature to evaluate sentiment at a deeper level on the basis of six tone-of-voice indicators: confidence, clarity, frustration, engagement, satisfaction, and empathy.

The Q&A prompt works great here since I can ask the LLM to generate a numerical score for each of these attributes, for both the agent and the customer, and get a structured response to render a graph.

This approach helped elicit a more structured and predictable response from the LLM. One piece of feedback for the AssemblyAI team would be to add Claude's structured-outputs support to their API.
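Here is roughly what that Q&A call looks like with the Node SDK. The method and parameter names reflect my reading of the SDK, and the prompt text is illustrative, so double-check against the current docs:

```typescript
import { AssemblyAI } from 'assemblyai'

// Sketch: ask LeMUR for numeric tone-of-voice scores via the Q&A endpoint.
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY! })

const toneIndicators = ['confidence', 'clarity', 'frustration', 'engagement', 'satisfaction', 'empathy']

const scoreAgentTone = async (transcriptId: string) => {
  const { response } = await client.lemur.questionAnswer({
    transcript_ids: [transcriptId],
    questions: toneIndicators.map(indicator => ({
      question: `On a scale of 1 to 10, rate the support agent's ${indicator} on this call.`,
      answer_format: 'a single integer between 1 and 10',
    })),
  })
  // Each answer should come back as a bare number we can plot directly
  return response.map((r, i) => ({ indicator: toneIndicators[i], score: Number(r.answer) }))
}
```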


The dashboard also uses LeMUR for several other analytics, such as objective feedback on agent performance, summarization of calls by topic, and more.
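The open-ended parts (agent feedback, topic summaries) map naturally onto LeMUR's task endpoint. A minimal sketch, with an illustrative prompt and parameter names assumed from the Node SDK:

```typescript
import { AssemblyAI } from 'assemblyai'

// Sketch: open-ended call analytics via LeMUR's task endpoint.
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY! })

const agentFeedback = async (transcriptId: string): Promise<string> => {
  const { response } = await client.lemur.task({
    transcript_ids: [transcriptId],
    prompt:
      'You are a QA reviewer for a support team. Give objective feedback on the agent: ' +
      'what went well, what to improve, and a short summary of the call organised by topic.',
  })
  return response
}
```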


The dashboard also uses topic detection to link detected topics back to the relevant utterances in the transcript, and does some simple math to calculate per-speaker talk-time analytics.
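The speaker-time math is just a sum over the diarized utterances; a sketch assuming start/end timestamps in milliseconds:

```typescript
// Sketch: per-speaker talk time and share of the conversation.
interface Utterance { speaker: string; start: number; end: number }

const speakerTalkTime = (utterances: Utterance[]) => {
  const totals = new Map<string, number>()
  for (const u of utterances) {
    totals.set(u.speaker, (totals.get(u.speaker) ?? 0) + (u.end - u.start))
  }
  const totalMs = [...totals.values()].reduce((a, b) => a + b, 0)
  return [...totals.entries()].map(([speaker, ms]) => ({
    speaker,
    minutes: ms / 60000,
    sharePct: totalMs ? (100 * ms) / totalMs : 0,
  }))
}
```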


Future Scope

  1. The PII redaction feature could prove helpful if data privacy is important to the business.
  2. Aggregated metrics across multiple transcripts.
  3. The ability to record WebRTC calls.

Feedback for AssemblyAI

Overall, it’s been a great experience, and I appreciate the quality of your API, SDK, and documentation—they are well-written, concise, and made it easy to get started quickly. Great work on that front!

I do, however, have a few suggestions that could enhance the offering even further:

  1. Expanding the real-time API to include features like diarization and broader audio codec support would be incredibly valuable.
  2. Using multi-modal models to enable transcription and LLM-based generations in a single step could unlock powerful new features, such as tone detection.
  3. Adding functionality to predict and link person names to speaker labels would further elevate the usability and accuracy of speaker-related outputs.
  4. Adding structured outputs support to LeMUR APIs.

Thank you!!

Top comments (4)

U G Murthy

@3kb-dev, looks great and works very well. I am working on a different use case for the streaming API and, like you, found the real-time example provided by assembly.ai useful, but I was unable to combine the audio chunks into a full audio file. Can you suggest how to get the combined audio file once streaming stops? Thanks in advance.

3kb-dev • Edited

Sure, here's the function to create a WAV file from the chunks. It essentially prepends a WAV header to the combined chunks.

  const createWavFile = (audioData: ArrayBuffer[]): Blob => {
    // Combine all chunks
    const totalLength = audioData.reduce((acc, chunk) => acc + chunk.byteLength, 0)
    const combinedData = new Int16Array(totalLength / 2)

    let offset = 0
    audioData.forEach(chunk => {
      const view = new Int16Array(chunk)
      combinedData.set(view, offset)
      offset += view.length
    })

    // Create WAV header
    const wavHeader = new ArrayBuffer(44)
    const view = new DataView(wavHeader)

    // "RIFF" identifier
    view.setUint32(0, 0x52494646, false)
    // File length
    view.setUint32(4, 36 + combinedData.length * 2, true)
    // "WAVE" identifier
    view.setUint32(8, 0x57415645, false)
    // "fmt " chunk header
    view.setUint32(12, 0x666D7420, false)
    // Chunk length
    view.setUint32(16, 16, true)
    // Sample format (1 is PCM)
    view.setUint16(20, 1, true)
    // Mono channel
    view.setUint16(22, 1, true)
    // Sample rate (16000 Hz)
    view.setUint32(24, 16000, true)
    // Byte rate
    view.setUint32(28, 16000 * 2, true)
    // Block align
    view.setUint16(32, 2, true)
    // Bits per sample
    view.setUint16(34, 16, true)
    // "data" chunk header
    view.setUint32(36, 0x64617461, false)
    // Data length
    view.setUint32(40, combinedData.length * 2, true)

    return new Blob([wavHeader, combinedData.buffer], { type: 'audio/wav' })
  }
U G Murthy

@3kb-dev Thanks very much. Will give it a go once I complete my other bits. Thanks again.

U G Murthy

@3kb-dev, correct line 7 to make it work: chunk -> chunk.buffer