I thought building AI apps would be about training models and complex machine learning stuff. But as I dug deeper, I realized: why reinvent the wheel when existing models can already handle 70-80% of the task?
Turns out, the real challenge wasn't the AI part. It was everything else.
The Idea: What I Wanted to Build
I'd been consuming tons of great video content on YouTube (both long and short form) and Instagram, and kept thinking: "This would make such a good blog post - easier to read, and better for SEO reach."
So I came up with what seemed like a simple solution - a Next.js 15 web app where users could:
Upload a video
Get AI-generated transcript using OpenAI's Whisper-1
Transform it into SEO-optimized content using ChatGPT/Gemini
Download a ready-to-publish blog post
How hard could it be, right? OpenAI already has Whisper for transcription, and other LLMs are readily available through APIs. Just connect the dots and boom - instant article creation.
Plus, the blogging and content creation industry is booming. I thought if I could build this, maybe I could create some recurring revenue like the folks on Twitter always talk about.
🛠️ The Tech Stack (What I Thought I Needed)
Frontend & Backend: Next.js seemed like the obvious choice - I could handle both frontend and backend with API routes and server actions.
AI Processing: OpenAI's Whisper-1 for transcription, GPT-4/Gemini for content optimization.
Hosting: Vercel, because... well, it's the easiest deployment ever.
File handling: This became a whole saga (more on this below).
Here's what I thought the flow would look like:
```javascript
// What I thought the flow would look like
const processVideo = async (file) => {
  const transcription = await openai.audio.transcriptions.create({
    file: file,
    model: "whisper-1"
  });

  const blogPost = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{
      role: "user",
      content: `Turn this into an SEO blog: ${transcription.text}`
    }]
  });

  return blogPost; // Easy, right? 😅
}
```
Spoiler alert: It wasn't that easy.
⚡ First Reality Check: The 25MB File Upload Nightmare
My initial plan was to use UploadThing (Theo's creation) since it had a generous free tier. But then I discovered Whisper-1 has a 25MB file limit per upload.
Problem: Most decent-quality videos are way larger than 25MB.
In development, I could easily test with small videos under 25MB, but for a production app that people would actually pay for? This was a major roadblock.
Solution attempt #1: Extract audio from video (audio files are much smaller than video files).
I found ImageKit.io, which could handle uploads, compression, and audio extraction. Perfect! I ditched UploadThing and dove into ImageKit's documentation.
Spent hours implementing the audio extraction feature, writing code, testing locally. Everything looked good. Then came the moment of truth - testing the actual audio file extraction.
Nothing worked.
I added console.logs everywhere, thinking it was a code issue. After hours of debugging, I realized the problem: I'd exhausted ImageKit's free plan just from testing.
The kicker: ImageKit's paid plan costs $80/month and provides way less processing power than I'd need for a robust app.
Solution attempt #2: Use FFmpeg on my own server.
New problem: I'm a student without money for servers.
So I decided to postpone the audio extraction feature and just put a strict 25MB limit for now. "I'll figure this out later," I told myself.
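In practice, that stopgap is just a guard that rejects anything Whisper can't take. A minimal sketch of the check (the 25MB number comes from Whisper's limit; everything else here is illustrative, not my final code):

```javascript
// Whisper-1 rejects files over 25MB, so fail fast before uploading anything
const MAX_WHISPER_FILE_SIZE = 25 * 1024 * 1024;

function validateVideoFile(file) {
  if (file.size > MAX_WHISPER_FILE_SIZE) {
    throw new Error(
      `This file is ${(file.size / 1024 / 1024).toFixed(1)}MB - the current limit is 25MB.`
    );
  }
  return file;
}
```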
💻 The API Integration That Worked... Until It Didn't
I implemented all the API integrations and functions. Everything worked beautifully on localhost. The transcription was accurate, the content optimization was decent, and I was feeling pretty good about myself.
Time to deploy and show my friends!
I deployed to Vercel, shared the link with excitement, and... nothing worked.
The app would start processing, show a loading state, and then just timeout with a 504 error.
After a lot of head-scratching and debugging, I found the culprit: Vercel's function timeout limits.
My processing pipeline was taking 60+ seconds:
File upload: ~10 seconds
Audio processing: ~20 seconds
Whisper transcription: ~30 seconds
GPT optimization: ~15 seconds
Google search time: "vercel function timeout limit"
The harsh reality:
Hobby plan: 10 seconds
Pro plan: 60 seconds
Enterprise: 15 minutes
Quick math: My processing takes 60+ seconds. Even the Pro plan wouldn't save me.
Bigger realization: This isn't just a Vercel problem. AWS Lambda caps out at 15 minutes, and Netlify has similar constraints. Serverless functions simply aren't meant for long-running tasks.
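For what it's worth, Vercel does let you raise a route's timeout up to your plan's ceiling. A quick sketch, assuming the Next.js App Router - but when the pipeline itself takes 60+ seconds, bumping the cap just moves the wall:

```javascript
// app/api/process-video/route.js
// Vercel honors this per-route setting, but only up to your plan's ceiling
export const maxDuration = 60; // seconds (the Pro plan cap mentioned above)

export async function POST(request) {
  // ...even with the cap raised, a 60+ second pipeline still doesn't fit
  return Response.json({ error: "Still timing out" }, { status: 504 });
}
```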
```
❌ What I Built:
User Upload → Vercel Function → Process (60s) → Timeout

✅ What I Actually Needed:
User Upload → Queue Job → Background Worker → Notify User
```
🤔 Understanding the Real Problem
Here's what I learned about serverless architecture the hard way:
Serverless functions are great for:
Fast responses (under 30 seconds)
Auto-scaling
Quick API endpoints
Simple data processing
Serverless functions are terrible for:
Long-running AI processing
File manipulation tasks
Complex workflows
Anything that takes time
The fundamental issue: I was trying to fit a long-running AI workflow into a request-response architecture. That's like trying to fit a truck through a car door.
🔧 Solution Discovery: Enter Background Jobs
Back to Google: "background jobs nodejs", "async processing for AI apps"
I discovered several options:
Inngest - Developer-friendly, good free tier
BullMQ - Redis-based, more complex setup
AWS SQS - Powerful but overkill for my needs
Redis Queue - DIY approach
I chose Inngest because:
It handles the infrastructure for me
Great developer experience
Built-in retries and error handling
Free tier was sufficient for testing
Here's the new architecture:
```javascript
import { inngest } from "./inngest/client";

// API endpoint just queues the job (fast!)
app.post('/api/process-video', async (req, res) => {
  const jobId = await inngest.send({
    name: "video/process",
    data: {
      videoUrl: req.body.videoUrl,
      userId: req.body.userId
    }
  });

  res.json({
    message: "Processing started! We'll notify you when it's ready.",
    jobId,
    checkStatusAt: `/api/status/${jobId}`
  });
});

// Actual processing happens in background (no time limits!)
export const processVideo = inngest.createFunction(
  { id: "process-video" },
  { event: "video/process" },
  async ({ event }) => {
    // Now I can take all the time I need
    const transcription = await processWithWhisper(event.data.videoUrl);
    const blog = await optimizeWithGPT(transcription);

    // Notify user when done
    await notifyUser(event.data.userId, blog);
  }
);
```
The difference: Instead of making users wait for 60+ seconds (and timing out), I immediately return a "we're processing it" response and handle the heavy lifting in the background.
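The `checkStatusAt` URL in that response implies a status endpoint, which isn't shown above. A minimal sketch of what it could look like, assuming the background job writes its progress to some job store (the `db.jobs` helper here is hypothetical):

```javascript
// Hypothetical status endpoint - assumes the background job updates a
// "jobs" record as it moves through the pipeline
app.get('/api/status/:jobId', async (req, res) => {
  const job = await db.jobs.findById(req.params.jobId);

  if (!job) {
    return res.status(404).json({ error: "Job not found" });
  }

  res.json({
    status: job.status, // "queued" | "processing" | "done" | "failed"
    result: job.status === "done" ? job.blogPost : null
  });
});
```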
📚 What I Actually Learned
Architecture Matters More Than I Thought
You can't just throw AI processing into a standard web app architecture and expect it to work. AI apps have fundamentally different requirements:
Long processing times
Unpredictable resource usage
Need for progress tracking
Error handling for expensive operations
Research Infrastructure Constraints First
I should have researched deployment limitations before writing a single line of code. Now I know to ask:
What are the timeout limits?
How much memory/CPU can I use?
What happens if processing fails halfway through?
How do I handle user notifications?
The Hardest Part Isn't the AI
I thought the challenging parts would be:
Getting good transcriptions
Optimizing content for SEO
Fine-tuning prompts
Actually challenging parts:
File upload and processing
Background job orchestration
User experience for async operations
Error handling and retries (see the sketch after this list)
Infrastructure costs and scaling
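On the error-handling point: one pattern that helps is splitting the pipeline into independent steps, so a failed GPT call doesn't force a re-run of the expensive Whisper call. A rough sketch of the earlier Inngest function rewritten with its step API (same placeholder helpers as before, not my final implementation):

```javascript
export const processVideo = inngest.createFunction(
  { id: "process-video", retries: 3 },
  { event: "video/process" },
  async ({ event, step }) => {
    // Each step is retried independently and its result is memoized,
    // so a flaky GPT call doesn't re-run (and re-bill) the Whisper step
    const transcription = await step.run("transcribe", () =>
      processWithWhisper(event.data.videoUrl)
    );

    const blog = await step.run("optimize", () =>
      optimizeWithGPT(transcription)
    );

    await step.run("notify", () =>
      notifyUser(event.data.userId, blog)
    );
  }
);
```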
Local Development Can Be Misleading
Everything worked perfectly on my MacBook Pro. But production environments have:
Stricter resource limits
Network latency
Timeout constraints
Different error conditions
Lesson: Test in production-like environments early and often.
🔮 What I'm Building Next
I'm currently rebuilding the entire app with background jobs as a first-class citizen:
New tech stack:
Frontend: Still Next.js, but with real-time progress indicators
Background jobs: Inngest for orchestration
Database: Adding Supabase for job status tracking
File storage: Moving to Cloudinary for better video handling
Notifications: WebSocket connections for real-time updates
Timeline reality check: What I thought would be a 2-week project is now a 3-month learning journey. And honestly? I'm more excited about it now.
Other technologies I discovered:
LangChain & LangGraph: For more complex AI workflows
Redis: For caching and session management
WebSocket/Server-Sent Events: For real-time progress updates (rough sketch after this list)
Queue monitoring tools: For debugging background jobs
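Of those two, Server-Sent Events are the lighter option since they're just a long-lived HTTP response. A rough sketch of streaming job progress to the browser, reusing the hypothetical job store from earlier (a sketch, not production code):

```javascript
// Hypothetical SSE endpoint that pushes job progress to the browser
app.get('/api/progress/:jobId', async (req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive"
  });

  // Poll the job store and push updates until the job finishes
  const interval = setInterval(async () => {
    const job = await db.jobs.findById(req.params.jobId);
    res.write(`data: ${JSON.stringify({ status: job.status })}\n\n`);

    if (job.status === "done" || job.status === "failed") {
      clearInterval(interval);
      res.end();
    }
  }, 2000);

  req.on("close", () => clearInterval(interval));
});
```

On the client, a plain `new EventSource('/api/progress/' + jobId)` is enough to consume this stream.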
💭 Advice for Other Developers
If you're building AI-powered apps, here's what I wish someone had told me:
Plan for Async from Day One
Don't build a synchronous AI app and try to make it async later. Design your user experience around the fact that AI processing takes time:
Show progress indicators
Send email/push notifications when jobs complete
Let users check status later
Handle failures gracefully
Research Your Platform's Limits
Before you write any code, understand:
Function timeout limits
Memory constraints
File size restrictions
Pricing for overages
Start with Background Jobs
Even if your AI processing is currently fast, it will get slower as you add features. Background jobs give you:
Better user experience
Easier scaling
Retry mechanisms
Progress tracking
Infrastructure Is Harder Than AI
Getting good results from OpenAI's APIs is pretty straightforward. Getting those results reliably delivered to users in production? That's the real challenge.
It's Totally Normal
If you're struggling with infrastructure for AI apps, you're not alone. Every AI developer goes through this learning curve. The AI part is often the easy part.
What about you?
Have you hit similar serverless limitations while building AI apps? What solutions did you find?
Are you currently building something with AI? What infrastructure challenges are you facing?
Drop a comment below - I'd love to hear about your experiences and maybe we can help each other avoid these pitfalls!
Currently rebuilding this app the right way. Follow my journey as I document everything I learn about building production-ready AI applications.