Building a TTS Tool for My Friend in One Hour

#tts #azure #ai #webdev

The Ask

My friend asked if there was a service that could read academic papers aloud. She didn't mean something like NotebookLM, which creates podcast-style summaries, but a tool that would read the original text as written. She wanted to listen to papers like audiobooks when her eyes got tired.

I didn't know of such a service, but since I'm familiar with Microsoft Azure Language Services, I offered to help: "Send me the paper and I'll make an mp3 for you."

The Reality Check

I thought this would be simple:

1. Extract text from PDF using Claude/ChatGPT/Grok
2. Run it through Azure TTS (Text to Speech)
3. Done!
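
For reference, here is a minimal sketch of what step 2 of that naive plan could look like with the Azure Speech SDK for Python. It is not the code I ended up with. The voice name, the 3000-character chunk size, and the helper names `split_into_chunks` and `synthesize_to_mp3` are all assumptions made for illustration, and the splitting is only there because a whole paper may be too long to send in one request.

```python
def split_into_chunks(text: str, max_chars: int = 3000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own chunk.
    The 3000-character default is an arbitrary assumption, not an Azure limit.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def synthesize_to_mp3(text: str, out_path: str, key: str, region: str) -> None:
    """Send one chunk of text to Azure TTS and write the audio to out_path."""
    # Imported lazily so the splitter above works without the SDK installed.
    # Requires: pip install azure-cognitiveservices-speech
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"  # assumed voice
    speech_config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3
    )
    audio_config = speechsdk.audio.AudioOutputConfig(filename=out_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis failed: {result.reason}")
```

Each chunk would go through `synthesize_to_mp3` with its own output file, for example `part_001.mp3`, `part_002.mp3`, and so on. The key and region come from your own Azure Speech resource.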

Wrong. Academic PDFs are messy. Extract text and you get dozens of co-author names, chart numbers, table data, footnotes - everything my friend didn't want to hear.
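
To give a feel for the kind of noise involved, here is a rough rule-based filter. This is not what I used; it is a hypothetical sketch, and the patterns and the names `looks_like_noise` and `drop_noise` are my own assumptions. It only catches the most obvious junk: figure and table captions, bare page numbers, and rows made up entirely of numbers.

```python
import re

# Illustrative patterns only; real papers need far more than this.
NOISE_PATTERNS = [
    re.compile(r"^(figure|fig\.|table)\s+\d+", re.IGNORECASE),  # captions
    re.compile(r"^\d+$"),                                       # bare page numbers
    re.compile(r"^[\d\s.,%±()-]+$"),                            # rows of table numbers
]


def looks_like_noise(line: str) -> bool:
    """Return True if the line matches any of the noise patterns."""
    stripped = line.strip()
    return any(pattern.search(stripped) for pattern in NOISE_PATTERNS)


def drop_noise(text: str) -> str:
    """Remove lines that look like captions, page numbers, or table rows."""
    kept = [line for line in text.splitlines() if not looks_like_noise(line)]
    return "\n".join(kept)
```

Rules like these break down quickly. A legitimate sentence that starts with "Table 2 shows that..." is dropped, while author lists and footnotes slip through untouched.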

I tried asking different AIs to extract only title, abstract, and main content: