Building a TTS Tool for My Friend in One Hour
The Ask
My friend asked if there was a service that could read academic papers aloud - not like NotebookLM which creates podcast-style summaries, but something that would actually read the original text. She wanted to listen to papers like audiobooks when her eyes got tired.
I didn't know of such a service, but since I'm familiar with Microsoft Azure Language Services, I offered to help: "Send me the paper and I'll make mp3 for you."
The Reality Check
I thought this would be simple:
- Extract text from PDF using Claude/ChatGPT/Grok
- Run it through Azure TTS (Text to Speech)
- Done!
Wrong. Academic PDFs are messy. Extract text and you get dozens of co-author names, chart numbers, table data, footnotes - everything my friend didn't want to hear.
I tried asking different AIs to extract only title, abstract, and main content:
- Claude Sonnet: Failed
- Grok: Failed with errors
- Claude Opus: Success
Turns out extracting clean content from academic PDFs is harder than expected.
From Manual to Automated
I generated the MP3 using Azure TTS and made a tutorial video. But a week later, my friend was still hunting for paid PDF-to-speech services.
That's when I realized: I might not know the perfect PDF-to-text AI, but I can definitely build a basic TTS web app.
So I coded one with Claude Sonnet, built it in Next.js, deployed on Vercel, and shared it.
This is why I love being able to code!!!
Visit my github blog to read more
https://battlelolo.github.io/
Top comments (1)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.