DEV Community

Karneeshkar V


The Polymath Tool for All Your Audio and Document Needs

AssemblyAI Voice Agents Challenge: Domain Expert

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

I built a Command-Line Interface (CLI) tool designed to help users manage their medical and legal conversations more effectively.

This tool can transcribe audio files or calls with your doctor or financial advisor, then organize and retrieve relevant insights to assist in decision-making.

The idea stems from a personal pain point. During important medical or legal discussions, I often found it difficult to:

  • Ask detailed follow-up questions
  • Recall key points accurately
  • Understand complex terminology on the spot

By using AssemblyAI’s accurate transcription, especially for domain-specific (medical/legal) vocabulary, the project came to life.

All the CLI commands and flags can be found in the README.md.
To set it up, you will need the following in your .env file:

ASSEMBLY_AI_API_KEY=""
OPENAI_API_KEY=""
QDRANT_URL=""

Make sure Qdrant is running on your local system.
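As a quick sanity check before running the CLI, something like the sketch below could confirm that the three settings are present. It is not taken from the repository: the helper name `missing_settings` is hypothetical, and it assumes the values from the .env file have already been loaded into the process environment.

```python
import os

# The three keys listed in the .env example above.
REQUIRED_KEYS = ("ASSEMBLY_AI_API_KEY", "OPENAI_API_KEY", "QDRANT_URL")


def missing_settings(env=None):
    """Return the required keys that are unset or blank.

    `env` is any mapping of names to values; it defaults to os.environ.
    This helper is hypothetical and is not part of the project's code.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_settings()` at startup and stopping when the returned list is non-empty gives a clearer error than a failed API call later on.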

Demo

Using AssemblyAI for transcription and ingesting the transcript into RAG

Using memory from a past call with a doctor

GitHub Repository

https://github.com/KarneeshkarV/-AssemblyAI-Domain-Expert-Voice-Agent

Technical Implementation & AssemblyAI Integration

  • Built using the Agno agent framework
  • Each domain-specific agent (medical or legal) is powered by a team of sub-agents
    • One for RAG (retrieval)
    • One for memory/context management
    • One for web search and knowledge lookups
    • And so on
  • I used OpenAI models in the primary implementation because they were more cost-effective, although Claude models performed better at tool use in my testing
  • Made some audio optimizations to make efficient use of TTS credits
  • Core transcription powered by AssemblyAI, enabling robust handling of domain-specific vocabulary
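To make the "team of sub-agents" idea concrete, here is a minimal plain-Python sketch. It does not use the Agno API and is not the project's code: the `SubAgent`, `DomainTeam`, and `build_team` names are hypothetical, the handlers are stubs, and keyword matching stands in for the routing that a language model would normally do.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SubAgent:
    """One specialist in a team (hypothetical type, not an Agno class)."""
    name: str
    keywords: Tuple[str, ...]
    handle: Callable[[str], str]


@dataclass
class DomainTeam:
    """A domain agent (e.g. medical or legal) that delegates to sub-agents."""
    domain: str
    members: List[SubAgent]

    def route(self, question: str) -> SubAgent:
        # Keyword matching is a stand-in for LLM-driven routing.
        lowered = question.lower()
        for agent in self.members:
            if any(word in lowered for word in agent.keywords):
                return agent
        # No keyword matched: fall back to the first member.
        return self.members[0]

    def ask(self, question: str) -> str:
        agent = self.route(question)
        return f"[{self.domain}/{agent.name}] {agent.handle(question)}"


def build_team(domain: str) -> DomainTeam:
    """Assemble the three roles listed above, with stub handlers."""
    return DomainTeam(domain, [
        SubAgent("rag", ("transcript", "said", "mentioned"),
                 lambda q: "search stored transcripts"),
        SubAgent("memory", ("last time", "previous", "remember"),
                 lambda q: "recall earlier sessions"),
        SubAgent("web", ("latest", "look up", "define"),
                 lambda q: "search the web"),
    ])
```

For example, `build_team("medical").ask("Define hypertension")` is routed to the web-search stub. Keeping each role behind the same small interface means another specialist can be added without touching the routing code.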

Future Work

I had plans to:

  • Make the whole data ingestion process easier and more user-friendly
  • Integrate SIP Sorcery for capturing and analyzing VoIP call streams
  • Add another specialized agent focused on legal document processing

However, due to time constraints, they remain on my to-do list!

I am all ears to hear how I can improve this project.

Top comments (3)

Hariharan Ganesh S

Awesome work!!

Ashish Ram J A

Great work mate! never stop building and shipping!!

Advaith R

Great Job!!!
