The Polymath Tool for All Your Audio and Document Needs

Karneeshkar V — Sat, 26 Jul 2025 08:03:34 +0000

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

I built a Command-Line Interface (CLI) tool designed to help users manage their medical and legal conversations more effectively.

The tool transcribes audio files or calls with your doctor or financial advisor, then organizes and retrieves relevant insights to support decision-making.

The idea stems from a personal pain point. During important medical or legal discussions, I often found it difficult to:

- Ask detailed follow-up questions
- Recall key points accurately
- Understand complex terminology on the spot

AssemblyAI’s accurate transcription, especially of domain-specific (medical/legal) vocabulary, is what brought the project to life.
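
To make the transcription step concrete, here is a minimal sketch using the AssemblyAI Python SDK. It assumes the `assemblyai` package is installed and that the key is available in the `ASSEMBLY_AI_API_KEY` environment variable. The term lists, the `boost_terms` and `transcribe` helper names, and the use of the SDK's word boost option are assumptions for illustration only; the repository may configure transcription differently.

```python
import os

# Hypothetical example terms; the real project may use none or different ones.
DOMAIN_TERMS = {
    "medical": ["metformin", "hypertension", "echocardiogram"],
    "legal": ["indemnification", "subpoena", "force majeure"],
}


def boost_terms(domain):
    """Return the illustrative vocabulary list for a domain (empty if unknown)."""
    return list(DOMAIN_TERMS.get(domain, []))


def transcribe(audio_path, domain):
    """Transcribe one audio file with AssemblyAI and return its text."""
    # Imported lazily so boost_terms() works without the SDK installed.
    import assemblyai as aai

    aai.settings.api_key = os.environ["ASSEMBLY_AI_API_KEY"]

    # Pass the domain terms as a word boost list when there are any.
    terms = boost_terms(domain)
    config = (
        aai.TranscriptionConfig(word_boost=terms)
        if terms
        else aai.TranscriptionConfig()
    )

    transcript = aai.Transcriber(config=config).transcribe(audio_path)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)
    return transcript.text
```

Calling `transcribe("doctor_call.mp3", "medical")` (a hypothetical file name) would upload the file, wait for the result, and return the transcript text.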

All CLI commands and flags are documented in README.md.
To set it up, add the following to your .env file:

ASSEMBLY_AI_API_KEY=""
OPENAI_API_KEY=""
QDRANT_URL=""

Make sure Qdrant is running on your local machine.
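
A quick way to catch a missing setting before running the CLI is a small check like the one below. This is a sketch rather than part of the repository: it assumes the values from the .env file have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`), and `missing_settings` is a hypothetical helper name.

```python
import os

# Variable names are taken from the .env listing above.
REQUIRED_VARS = ("ASSEMBLY_AI_API_KEY", "OPENAI_API_KEY", "QDRANT_URL")


def missing_settings(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_settings()` at startup returns an empty list when everything is configured. For a local Qdrant instance, `QDRANT_URL` is typically `http://localhost:6333`, assuming the default port has not been changed.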

Demo

Using AssemblyAI for transcription and ingesting the transcript into RAG

Using memory from a past call with a doctor

GitHub Repository

https://github.com/KarneeshkarV/-AssemblyAI-Domain-Expert-Voice-Agent

Technical Implementation & AssemblyAI Integration

- Built using the Agno agent framework
- Each domain-specific agent (medical or legal) is powered by a team of sub-agents:
  - One for RAG (retrieval)
  - One for memory/context management
  - One for web search and knowledge lookups
  - And so on
- I used OpenAI models in the primary implementation for cost-effectiveness, though in testing I found Claude models performed better at tool use
- Made some audio optimizations to use TTS credits efficiently
- Core transcription is powered by AssemblyAI, enabling robust handling of domain-specific vocabulary
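
The "team of sub-agents" idea can be sketched in plain Python as a coordinator that hands each request to the sub-agent registered for a role. This is only an illustration of the delegation pattern and deliberately does not use the Agno API. `DomainTeam`, `build_medical_team`, and the three stub handlers are hypothetical names, and the keyword-overlap lookup is a stand-in for real vector retrieval against Qdrant.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DomainTeam:
    """A domain agent that delegates each request to a named sub-agent."""

    domain: str
    sub_agents: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, role: str, handler: Callable[[str], str]) -> None:
        self.sub_agents[role] = handler

    def run(self, role: str, query: str) -> str:
        if role not in self.sub_agents:
            raise KeyError(f"No sub-agent registered for role {role!r}")
        return self.sub_agents[role](query)


def build_medical_team(notes: List[str]) -> DomainTeam:
    """Wire up stub sub-agents for retrieval, memory, and web search."""
    memory: List[str] = []

    def rag(query: str) -> str:
        # Stand-in for vector search: return notes sharing a word with the query.
        words = set(query.lower().split())
        hits = [n for n in notes if words & set(n.lower().split())]
        return "; ".join(hits) or "no matching notes"

    def remember(query: str) -> str:
        memory.append(query)
        return f"{len(memory)} item(s) remembered"

    def web_search(query: str) -> str:
        # A real sub-agent would call a search tool here.
        return f"[stub] would search the web for: {query}"

    team = DomainTeam("medical")
    team.register("rag", rag)
    team.register("memory", remember)
    team.register("web_search", web_search)
    return team
```

In a real agent framework the coordinator would normally be a model that picks the sub-agent itself. Here the role is passed explicitly to keep the sketch deterministic.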

Future Work

I had plans to:
- Make the whole data ingestion process easier and more user-friendly
- Integrate SIP Sorcery for capturing and analyzing VoIP call streams
- Add another specialized agent focused on legal document processing

However, due to time constraints, they remain on my to-do list!

I am all ears for suggestions on how to improve this project.

DEV Community: Karneeshkar V
