Building My First AI-Powered App: From Whisper to Vercel Limits

I thought building AI apps would be about training models and complex machine learning work. But as I dug deeper, I realized: why reinvent the wheel when existing models can already handle 70-80% of the task?

Turns out, the real challenge wasn't the AI part. It was everything else.

The Idea: What I Wanted to Build

I'd been consuming tons of great video content on YouTube (both long and short form) and Instagram, and kept thinking: "This would make such a good blog post for better readability and SEO reach."

So I came up with what seemed like a simple solution - a Next.js 15 web app where users could:

  1. Upload a video

  2. Get AI-generated transcript using OpenAI's Whisper-1

  3. Transform it into SEO-optimized content using ChatGPT/Gemini

  4. Download a ready-to-publish blog post

How hard could it be, right? OpenAI already has Whisper for transcription, and other LLMs are readily available through APIs. Just connect the dots and boom - instant article creation.

Plus, the blogging and content creation industry is booming. I thought if I could build this, maybe I could create some recurring revenue like the folks on Twitter always talk about.

🛠️ The Tech Stack (What I Thought I Needed)

Frontend & Backend: Next.js seemed like the obvious choice - I could handle both frontend and backend with API routes and server actions.

AI Processing: OpenAI's Whisper-1 for transcription, GPT-4/Gemini for content optimization.

Hosting: Vercel, because... well, it's the easiest deployment ever.

File handling: This became a whole saga (more on this below).

Here's what I thought the flow would look like:

```javascript
// What I thought the flow would look like
import OpenAI from "openai";

const openai = new OpenAI();

const processVideo = async (file) => {
  const transcription = await openai.audio.transcriptions.create({
    file: file,
    model: "whisper-1",
  });

  const blogPost = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "user",
        content: `Turn this into an SEO blog: ${transcription.text}`,
      },
    ],
  });

  return blogPost; // Easy, right? 😅
};
```

Spoiler alert: It wasn't that easy.

⚡ First Reality Check: The 25MB File Upload Nightmare

My initial plan was to use UploadThing (Theo's creation) since it had a generous free tier. But then I discovered Whisper-1 has a 25MB file limit per upload.

Problem: Most decent-quality videos are way larger than 25MB.

In development, I could easily test with small videos under 25MB, but for a production app that people would actually pay for? This was a major roadblock.

Solution attempt #1: Extract audio from video (audio files are much smaller than video files).

I found ImageKit.io, which could handle uploads, compression, and audio extraction. Perfect! I ditched UploadThing and dove into ImageKit's documentation.

Spent hours implementing the audio extraction feature, writing code, testing locally. Everything looked good. Then came the moment of truth - testing the actual audio file extraction.

Nothing worked.

I added console.logs everywhere, thinking it was a code issue. After hours of debugging, I realized the problem: I'd exhausted ImageKit's free plan just from testing.

The kicker: ImageKit's paid plan costs $80/month and provides way less processing power than I'd need for a robust app.

Solution attempt #2: Use FFmpeg on my own server.

New problem: I'm a student without money for servers.

So I decided to postpone the audio extraction feature and just put a strict 25MB limit for now. "I'll figure this out later," I told myself.
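The interim guard itself is trivial. Here's a minimal sketch (the helper name and error message are mine, and the FFmpeg one-liner in the comment is what I plan to run once I have somewhere to run it):

```javascript
// Whisper rejects files over 25MB, so fail fast before spending an upload.
// Eventual plan: shrink uploads first with FFmpeg, e.g.
//   ffmpeg -i input.mp4 -vn -b:a 64k audio.mp3
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

export function assertUploadable(file) {
  if (file.size > MAX_UPLOAD_BYTES) {
    const sizeMb = (file.size / 1024 / 1024).toFixed(1);
    throw new Error(
      `File is ${sizeMb}MB - Whisper accepts at most 25MB. Try a shorter clip.`
    );
  }
  return file;
}
```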

💻 The API Integration That Worked... Until It Didn't

I implemented all the API integrations and functions. Everything worked beautifully on localhost. The transcription was accurate, the content optimization was decent, and I was feeling pretty good about myself.

Time to deploy and show my friends!

I deployed to Vercel, shared the link with excitement, and... nothing worked.

The app would start processing, show a loading state, and then just timeout with a 504 error.

After a lot of head-scratching and debugging, I found the culprit: Vercel's function timeout limits.

My processing pipeline was taking 60+ seconds:

  • File upload: ~10 seconds

  • Audio processing: ~20 seconds

  • Whisper transcription: ~30 seconds

  • GPT optimization: ~15 seconds

Google search time: "vercel function timeout limit"

The harsh reality:

  • Hobby plan: 10 seconds

  • Pro plan: 60 seconds

  • Enterprise: 15 minutes

Quick math: My processing takes 60+ seconds. Even the Pro plan wouldn't save me.

Bigger realization: This isn't just a Vercel problem. AWS Lambda has 15-minute limits, Netlify has similar constraints. Serverless functions aren't meant for long-running tasks.

```
❌ What I Built:
User Upload → Vercel Function → Process (60s) → Timeout

✅ What I Actually Needed:
User Upload → Queue Job → Background Worker → Notify User
```

🤔 Understanding the Real Problem

Here's what I learned about serverless architecture the hard way:

Serverless functions are great for:

  • Fast responses (under 30 seconds)

  • Auto-scaling

  • Quick API endpoints

  • Simple data processing

Serverless functions are terrible for:

  • Long-running AI processing

  • File manipulation tasks

  • Complex workflows

  • Anything that takes time

The fundamental issue: I was trying to fit a long-running AI workflow into a request-response architecture. That's like trying to fit a truck through a car door.

🔧 Solution Discovery: Enter Background Jobs

Back to Google: "background jobs nodejs", "async processing for AI apps"

I discovered several options:

  • Inngest - Developer-friendly, good free tier

  • BullMQ - Redis-based, more complex setup

  • AWS SQS - Powerful but overkill for my needs

  • Redis Queue - DIY approach

I chose Inngest because:

  1. It handles the infrastructure for me

  2. Great developer experience

  3. Built-in retries and error handling

  4. Free tier was sufficient for testing

Here's the new architecture:

```javascript
import { inngest } from "./inngest/client";

// API endpoint just queues the job (fast!)
app.post("/api/process-video", async (req, res) => {
  const jobId = await inngest.send({
    name: "video/process",
    data: {
      videoUrl: req.body.videoUrl,
      userId: req.body.userId,
    },
  });

  res.json({
    message: "Processing started! We'll notify you when it's ready.",
    jobId,
    checkStatusAt: `/api/status/${jobId}`,
  });
});

// Actual processing happens in background (no time limits!)
export const processVideo = inngest.createFunction(
  { id: "process-video" },
  { event: "video/process" },
  async ({ event }) => {
    // Now I can take all the time I need
    const transcription = await processWithWhisper(event.data.videoUrl);
    const blog = await optimizeWithGPT(transcription);

    // Notify user when done
    await notifyUser(event.data.userId, blog);
  }
);
```

The difference: Instead of making users wait for 60+ seconds (and timing out), I immediately return a "we're processing it" response and handle the heavy lifting in the background.
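The `checkStatusAt` URL in the snippet above implies a status endpoint. I haven't settled on the storage yet, so treat this as a sketch - `getJob` is a stand-in for whatever store (Redis, Postgres, Supabase) ends up recording job progress:

```javascript
// app/api/status/[jobId]/route.js - hypothetical status endpoint
import { getJob } from "@/lib/jobs"; // placeholder for your job store

export async function GET(request, { params }) {
  const { jobId } = await params;
  const job = await getJob(jobId);

  if (!job) {
    return Response.json({ error: "Unknown job" }, { status: 404 });
  }

  // job.status: "queued" | "transcribing" | "optimizing" | "done" | "failed"
  return Response.json({
    status: job.status,
    blogPost: job.status === "done" ? job.result : null,
  });
}
```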

📚 What I Actually Learned

Architecture Matters More Than I Thought

You can't just throw AI processing into a standard web app architecture and expect it to work. AI apps have fundamentally different requirements:

  • Long processing times

  • Unpredictable resource usage

  • Need for progress tracking

  • Error handling for expensive operations

Research Infrastructure Constraints First

I should have researched deployment limitations before writing a single line of code. Now I know to ask:

  • What are the timeout limits?

  • How much memory/CPU can I use?

  • What happens if processing fails halfway through?

  • How do I handle user notifications?
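For the timeout question specifically, Next.js lets you declare the limit right in the route via segment config. It can't exceed what your Vercel plan allows, but at least the constraint is visible in code instead of discovered in production (a sketch, not my final setup):

```javascript
// app/api/process-video/route.js
export const maxDuration = 60; // seconds - still capped by your Vercel plan

export async function POST(request) {
  // Keep this handler fast: validate, queue the job, return immediately
  return Response.json({ ok: true });
}
```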

The Hardest Part Isn't the AI

I thought the challenging parts would be:

  • Getting good transcriptions

  • Optimizing content for SEO

  • Fine-tuning prompts

Actually challenging parts:

  • File upload and processing

  • Background job orchestration

  • User experience for async operations

  • Error handling and retries

  • Infrastructure costs and scaling

Local Development Can Be Misleading

Everything worked perfectly on my MacBook Pro. But production environments have:

  • Stricter resource limits

  • Network latency

  • Timeout constraints

  • Different error conditions

Lesson: Test in production-like environments early and often.

🔮 What I'm Building Next

I'm currently rebuilding the entire app with background jobs as a first-class citizen:

New tech stack:

  • Frontend: Still Next.js, but with real-time progress indicators

  • Background jobs: Inngest for orchestration

  • Database: Adding Supabase for job status tracking

  • File storage: Moving to Cloudinary for better video handling

  • Notifications: WebSocket connections for real-time updates
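For the Supabase piece, the job-status tracking I have in mind is a small helper like this (the `jobs` table and its columns are my own assumed schema, not something Supabase gives you out of the box):

```javascript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY
);

// Called from the background function as each stage finishes
export async function setJobStatus(jobId, status, result = null) {
  const { error } = await supabase
    .from("jobs")
    .update({ status, result })
    .eq("id", jobId);

  if (error) throw error;
}
```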

Timeline reality check: What I thought would be a 2-week project is now a 3-month learning journey. And honestly? I'm more excited about it now.

Other technologies I discovered:

  • LangChain & LangGraph: For more complex AI workflows

  • Redis: For caching and session management

  • WebSocket/Server-sent events: For real-time progress updates

  • Queue monitoring tools: For debugging background jobs

💭 Advice for Other Developers

If you're building AI-powered apps, here's what I wish someone had told me:

Plan for Async from Day One

Don't build a synchronous AI app and try to make it async later. Design your user experience around the fact that AI processing takes time:

  • Show progress indicators

  • Send email/push notifications when jobs complete

  • Let users check status later

  • Handle failures gracefully
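On the frontend, "let users check status later" can start as dead-simple polling against whatever status endpoint you expose (this matches the hypothetical route sketched earlier; swap in WebSockets or SSE once polling starts to hurt):

```javascript
// Poll the status endpoint until the job settles
export async function waitForJob(jobId, { intervalMs = 3000 } = {}) {
  while (true) {
    const res = await fetch(`/api/status/${jobId}`);
    const job = await res.json();

    if (job.status === "done") return job.blogPost;
    if (job.status === "failed") throw new Error("Processing failed");

    // Intermediate states ("transcribing", "optimizing") drive the progress UI
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```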

Research Your Platform's Limits

Before you write any code, understand:

  • Function timeout limits

  • Memory constraints

  • File size restrictions

  • Pricing for overages

Start with Background Jobs

Even if your AI processing is currently fast, it will get slower as you add features. Background jobs give you:

  • Better user experience

  • Easier scaling

  • Retry mechanisms

  • Progress tracking
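With Inngest, retries and progress largely come down to wrapping each stage in a step - roughly like this (the helper functions are the same placeholders as in the earlier snippet, and the retry count is just an example):

```javascript
import { inngest } from "./inngest/client";

export const processVideo = inngest.createFunction(
  { id: "process-video", retries: 3 },
  { event: "video/process" },
  async ({ event, step }) => {
    // Each step is retried and memoized on its own, so a GPT hiccup
    // doesn't re-run the (expensive) Whisper transcription
    const transcription = await step.run("transcribe", () =>
      processWithWhisper(event.data.videoUrl)
    );

    const blog = await step.run("optimize", () =>
      optimizeWithGPT(transcription)
    );

    await step.run("notify", () => notifyUser(event.data.userId, blog));
  }
);
```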

Infrastructure Is Harder Than AI

Getting good results from OpenAI's APIs is pretty straightforward. Getting those results reliably delivered to users in production? That's the real challenge.

It's Totally Normal

If you're struggling with infrastructure for AI apps, you're not alone. Every AI developer goes through this learning curve. The AI part is often the easy part.

What about you?

Have you hit similar serverless limitations while building AI apps? What solutions did you find?

Are you currently building something with AI? What infrastructure challenges are you facing?

Drop a comment below - I'd love to hear about your experiences and maybe we can help each other avoid these pitfalls!

Currently rebuilding this app the right way. Follow my journey as I document everything I learn about building production-ready AI applications.
