Making of Aantraa
aantraa.site — AI audio & video translation, caption generator, and viral shorts cutter.
Under the Hood
I run a small YouTube channel. I'm not a full-time content creator, but YouTube is a solid way to drive traffic to your online work, business, project, or idea.
Aantraa is what I built in a week. The main concept is simple:
- Video translation into multiple languages
- Audio translation — including text-to-audio, with MP3 output for Premiere Pro
- Long-form to shorts — convert YouTube long-form video into short clips
With only three features in scope at the time, the website itself wasn't the heavy lift. The real work was building the APIs, the backend infrastructure that feeds video to AI models, and handling large media files in storage.
Breaking the execution into steps: How I made Aantraa
LLM layer and provider
Aantraa is heavily dependent on AI APIs — we need reliable infrastructure for LLM providers.
OpenRouter, Portkey, the Vercel AI SDK, and the individual APIs from Anthropic, DeepSeek, and OpenAI are all solid options.
I chose OpenRouter for Aantraa mainly for its multi-model support: it's easy to pick the cheapest capable model for each job. It is also easy to integrate, has strong community support, and offers access to free models.
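To make "cheapest capable model for each job" concrete, here is a minimal sketch. The catalogue, the model ids, and the prices are placeholders I made up for illustration, not real OpenRouter listings, and `pickModel` and `buildChatBody` are hypothetical helper names. The only outside fact it relies on is that OpenRouter accepts an OpenAI-style chat body at `https://openrouter.ai/api/v1/chat/completions`.

```typescript
// Jobs the backend needs an LLM for (taken from the stages listed in this post).
type Task = "script" | "translate" | "summarise" | "captions";

interface ModelOption {
  id: string;                  // provider/model slug (placeholder values below)
  usdPerMillionTokens: number; // assumed price, for illustration only
  tasks: Task[];               // jobs this model is considered good enough for
}

// Hypothetical catalogue: ids and prices are NOT real OpenRouter listings.
const CATALOGUE: ModelOption[] = [
  { id: "example/small-model", usdPerMillionTokens: 0.2, tasks: ["summarise"] },
  { id: "example/mid-model", usdPerMillionTokens: 1.0, tasks: ["summarise", "translate", "captions"] },
  { id: "example/large-model", usdPerMillionTokens: 5.0, tasks: ["summarise", "translate", "captions", "script"] },
];

// Return the cheapest model that lists the task, or undefined if none does.
function pickModel(task: Task, catalogue: ModelOption[] = CATALOGUE): ModelOption | undefined {
  const capable = catalogue.filter((m) => m.tasks.indexOf(task) !== -1);
  capable.sort((a, b) => a.usdPerMillionTokens - b.usdPerMillionTokens);
  return capable[0];
}

// Build an OpenAI-style chat body for the chosen model.
// The caller would POST this as JSON with an Authorization header.
function buildChatBody(task: Task, prompt: string) {
  const model = pickModel(task);
  if (!model) throw new Error(`no model configured for task "${task}"`);
  return { model: model.id, messages: [{ role: "user", content: prompt }] };
}
```

Keeping the task-to-model mapping in one table means a price change or a new model only touches the catalogue, not every call site.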
LLM APIs are needed at almost every stage in the backend:
- Understanding video context and creating a script
- Translating the script into target languages
- Voicing the script as MP3 or WAV audio
- Summarising the video
- Generating captions
- Cutting videos into shorts
Building APIs and servers
Each layer needs heavy context and prompt engineering. Loop engineering is the trend here, and Aantraa requires it.
For example, video translation works in multiple connected steps:
Video translation API breakdown
- The video is fed to the LLM via FFmpeg so the AI can understand it
- AI generates a script/caption from the video
- AI translates the script into the desired language
- AI generates audio (MP3 or WAV) of the new translation
- FFmpeg merges the new audio with the original video
Each step depends on the previous one, which makes production debugging hard when something breaks.
Solution: track every step and record its token usage, estimated time, errors, and response metadata.
The same pattern applies to the audio translation, viral clip cutter, and caption generator features.
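A minimal sketch of that tracking idea, assuming each step is an async function that takes the previous step's output and reports its own token count. `StepRecord`, `runPipeline`, and the stub steps are hypothetical names for illustration, not Aantraa's actual code.

```typescript
// Metadata recorded for every pipeline step (field names are assumptions).
interface StepRecord {
  step: string;
  ok: boolean;
  elapsedMs: number;
  tokens: number;
  error?: string;
}

// What a step returns: its output plus the token count it reports.
interface StepResult {
  output: string;
  tokens: number;
}

interface Step {
  name: string;
  run: (input: string) => Promise<StepResult>;
}

// Run steps in order, feeding each output into the next step.
// Stops at the first failure so the log shows exactly where the chain broke.
async function runPipeline(steps: Step[], input: string): Promise<{ output?: string; log: StepRecord[] }> {
  const log: StepRecord[] = [];
  let current = input;
  for (const step of steps) {
    const started = Date.now();
    try {
      const result = await step.run(current);
      log.push({ step: step.name, ok: true, elapsedMs: Date.now() - started, tokens: result.tokens });
      current = result.output;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      log.push({ step: step.name, ok: false, elapsedMs: Date.now() - started, tokens: 0, error: message });
      return { log };
    }
  }
  return { output: current, log };
}

// Stub steps standing in for real AI and FFmpeg calls.
const demoSteps: Step[] = [
  { name: "transcribe", run: async (v) => ({ output: `script(${v})`, tokens: 120 }) },
  { name: "translate", run: async () => { throw new Error("rate limited"); } },
  { name: "synthesise", run: async (s) => ({ output: `audio(${s})`, tokens: 0 }) },
];
```

With the stubs above, `runPipeline(demoSteps, "video.mp4")` resolves with a two-entry log: a successful `transcribe` record and a failed `translate` record carrying the error message. The `synthesise` step never runs.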
Infrastructure and servers
Local API development is manageable until you ship to production.
| Layer | Choice |
|---|---|
| Framework | Hono.js |
| Backend hosting | Vercel Edge or Fly.io |
| DevOps | Docker, simple Git CI/CD |
| Database | Firebase / Supabase |
| Storage | UploadThing |
I found UploadThing to be a practical alternative to AWS S3 and Firebase/Supabase storage for file uploads. It provides client and server SDKs to upload files quickly (5 MB per chunk on the free plan).
Storage matters because no AI layer keeps its own memory: every generated audio or video file must land in storage.
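To put the 5 MB chunk figure in perspective, the sketch below works out how a file of a given size splits into chunks. This is plain arithmetic, not UploadThing's SDK or its internal behaviour, and `chunkRanges` is a hypothetical helper name.

```typescript
const MB = 1024 * 1024;

interface ChunkRange {
  index: number;
  start: number; // inclusive byte offset
  end: number;   // exclusive byte offset
}

// Split a file of `totalBytes` into consecutive byte ranges of at most `chunkBytes`.
function chunkRanges(totalBytes: number, chunkBytes: number = 5 * MB): ChunkRange[] {
  if (totalBytes < 0 || chunkBytes <= 0) {
    throw new RangeError("totalBytes must be >= 0 and chunkBytes must be > 0");
  }
  const ranges: ChunkRange[] = [];
  for (let start = 0, index = 0; start < totalBytes; start += chunkBytes, index++) {
    ranges.push({ index, start, end: Math.min(start + chunkBytes, totalBytes) });
  }
  return ranges;
}
```

For example, `chunkRanges(12 * MB)` yields three ranges: two full 5 MB chunks and a final 2 MB chunk.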
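FFmpeg is essential for video and audio work, but it is a poor fit for serverless functions and Vercel Edge: edge runtimes cannot spawn native binaries, and serverless functions cap execution time and bundle size. That pushed us toward Fly.io, Railway, or Render for heavier media workloads.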
Video translation into 90+ languages
Aantraa supports 90+ languages for video and audio translation.
AI translates scripts, plain text, on-screen text, and video context well, but it first needs to understand the video through its script and its frames. FFmpeg helps with that part of the pipeline.
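One way FFmpeg can prepare a video for a model is to pull out the audio track for transcription and sample still frames for visual context. The sketch below only builds the argument lists. The 16 kHz mono WAV format and the sampling interval are my assumptions, the function names are hypothetical, and the arrays would be passed to whatever process runner spawns `ffmpeg`.

```typescript
// Arguments for extracting a mono 16 kHz WAV track for transcription.
// -vn drops the video stream, -ac sets the channel count, -ar sets the sample rate.
function audioExtractArgs(input: string, output: string): string[] {
  return ["-y", "-i", input, "-vn", "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", output];
}

// Arguments for saving one JPEG frame every `everySeconds` seconds.
// The fps filter accepts a fraction such as 1/5 (one frame per five seconds).
// `outputPattern` is an image sequence pattern such as "frames/frame_%04d.jpg".
function frameSampleArgs(input: string, outputPattern: string, everySeconds: number): string[] {
  if (everySeconds <= 0 || everySeconds % 1 !== 0) {
    throw new RangeError("everySeconds must be a positive whole number");
  }
  return ["-y", "-i", input, "-vf", `fps=1/${everySeconds}`, "-q:v", "2", outputPattern];
}
```

Sampling a frame every few seconds instead of sending every frame keeps the model input small, which matters given the context limits mentioned elsewhere in this post.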
The flow:
- AI generates a script from the source
- AI translates into the target language
- AI creates dubbed audio in that language
- FFmpeg merges audio and video into a new translated file
Each step needs debugging, prompt engineering, and FFmpeg integration.
Finally, the output is uploaded to storage and the API returns a URL that the client can use for download and playback.
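The merge step in the flow above can be sketched as an FFmpeg argument list that keeps the original video stream and swaps in the dubbed audio. `dubMergeArgs` is a hypothetical helper and the AAC audio codec is my assumption. It does not address timing drift between the dubbed and original audio.

```typescript
// Replace the original audio with the dubbed track.
// -map 0:v:0 takes the video from the first input, -map 1:a:0 the audio from the second.
// -c:v copy avoids re-encoding the video; -shortest ends the output with the shorter stream.
function dubMergeArgs(video: string, dubbedAudio: string, output: string): string[] {
  return [
    "-y",
    "-i", video,
    "-i", dubbedAudio,
    "-map", "0:v:0",
    "-map", "1:a:0",
    "-c:v", "copy",
    "-c:a", "aac",
    "-shortest",
    output,
  ];
}
```

Copying the video stream instead of re-encoding it keeps this step fast, since only the audio is encoded.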
Audio to MP3 in multiple languages
Aantraa isn't only video translation. As a creator, I also wanted to turn blog posts or text into audio — podcast-style listening.
The audio translation tool covers:
- Text to audio — download MP3 or WAV
- 90+ languages for text and video sources
- Video to audio extraction and translation
- Multiple target languages in parallel
That makes Aantraa a useful companion for creators: one recording becomes MP3s in 90+ languages, ready for one-click sharing.
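Translating into many target languages in parallel usually needs a cap on how many jobs run at once, so a 90-language request does not fire 90 API calls together. The sketch below shows one way to do that. `translateAll`, the `worker` callback, and the default limit of 4 are assumptions for illustration.

```typescript
interface LangResult {
  lang: string;
  ok: boolean;
  url?: string;
  error?: string;
}

// Run `worker` for every language with at most `limit` jobs in flight.
// A failure in one language is recorded and does not cancel the others.
// Results come back in the same order as `langs`.
async function translateAll(
  langs: string[],
  worker: (lang: string) => Promise<string>,
  limit: number = 4,
): Promise<LangResult[]> {
  const results: LangResult[] = new Array(langs.length);
  let next = 0;

  async function lane(): Promise<void> {
    while (next < langs.length) {
      const i = next++; // claim an index before awaiting, so lanes never collide
      const lang = langs[i];
      try {
        results[i] = { lang, ok: true, url: await worker(lang) };
      } catch (err) {
        results[i] = { lang, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }
  }

  const laneCount = Math.min(Math.max(1, limit), langs.length);
  const lanes: Promise<void>[] = [];
  for (let n = 0; n < laneCount; n++) lanes.push(lane());
  await Promise.all(lanes);
  return results;
}
```

Here `worker` stands in for the whole per-language job (translate, synthesise, upload) and is expected to resolve with the stored file's URL.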
YouTube videos to shorts
The viral shorts feature converts long-form YouTube video into short clips you can upload directly to your channel.
API breakdown
- AI understands the full video context — summary and script
- AI breaks the script by timestamp into the desired number of shorts
- FFmpeg cuts each clip; APIs upload to storage
It sounds simple, but production requires FFmpeg tuning, handling of AI context limits, and file-size guardrails, since videos over ~10 MB cost more time and money to process.
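The cutting step can be sketched as follows, assuming the AI returns segments as start and end times in seconds. The `Segment` shape, the 60-second cap, and the helper names are my assumptions. The functions only validate segments and build FFmpeg argument lists.

```typescript
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  title: string;
}

// Format seconds as HH:MM:SS.mmm, a time form ffmpeg accepts for -ss and -t.
function toTimestamp(seconds: number): string {
  const pad = (n: number, width: number): string => {
    let s = String(n);
    while (s.length < width) s = "0" + s;
    return s;
  };
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1000, 3)}`;
}

// Clamp segments to the video, drop empty ones, and cap each at maxSeconds.
function cleanSegments(segments: Segment[], videoSeconds: number, maxSeconds: number = 60): Segment[] {
  return segments
    .map((seg) => ({ ...seg, start: Math.max(0, seg.start), end: Math.min(seg.end, videoSeconds) }))
    .filter((seg) => seg.end > seg.start)
    .map((seg) => ({ ...seg, end: Math.min(seg.end, seg.start + maxSeconds) }));
}

// ffmpeg arguments for one clip. -ss before -i seeks on the input and -t sets the duration.
// With -c copy the cut snaps to keyframes, so clip boundaries can be slightly off.
function clipArgs(input: string, seg: Segment, output: string): string[] {
  return ["-y", "-ss", toTimestamp(seg.start), "-i", input, "-t", toTimestamp(seg.end - seg.start), "-c", "copy", output];
}
```

Validating the model's timestamps before calling FFmpeg matters because a model can return segments that overlap the end of the video or have zero length. Stream copy is the fast option, while re-encoding gives frame-accurate cuts at a higher processing cost.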
Video translation examples
See 10+ translated videos in Spanish, Hindi, Bengali, Gujarati, Marathi, Tamil, French, English, Japanese, Chinese, and more on our examples page.
Support for 90+ languages makes Aantraa a global platform.
Conclusion
The first version is live. We also offer APIs for agencies and startup teams; get in touch if you're interested.
Links
- Website: aantraa.site
- Blog: aantraa.site/blog
- Examples: aantraa.site/examples
- Pricing: aantraa.site/#pricing
Try the product and share your feedback — early signups get 1 free minute of AI translation credit.
Cheers,
Shrey
Aantraa · aantraa.site
Top comments (1)
How are you handling the length drift when the dubbed audio doesn't line up with the original's timing? Translated speech almost always runs longer or shorter than the source, so once ffmpeg merges it the voice and the on-screen action slowly fall out of sync. Curious whether you stretch the audio, trim silences, or just let it ride for now.