been playing with lovable.dev for the past 8 months, building everything from simple dashboards to complex saas platforms. figured it's time to break down what actually works, what doesn't, and how these ai models really stack up when you're trying to ship real products
with the ai showdown giving everyone unlimited free access to test openai, anthropic, and google's models head-to-head in lovable this weekend, there's never been a better time to dive deep into the reality of ai coding
the reality check nobody talks about
first off - yeah, ai can write code. but "writing code" and "building software" are completely different things. i've seen people get hyped about ai generating a perfect react component, then spend 3 days debugging why it doesn't work with their existing codebase
the models have gotten scary good at certain tasks, but they're still fundamentally pattern matching machines. they don't understand your business logic, your technical debt, or why you made that weird architectural decision 6 months ago
what's interesting is that lovable has cracked something most ai coding tools haven't - the full-stack problem. most ai assistants help you write individual functions or components. lovable generates entire applications with frontend, backend, database, and deployment all wired together. that's why they've managed to hit such insane growth metrics - we're talking about a platform that went from zero to $17 million arr in just 90 days with a team of 15 people.
openai models - the swiss army knife that sometimes cuts you
what they're actually good at:
- rapid prototyping: need a working mvp in 2 hours? gpt can get you 80% there
- boilerplate elimination: crud operations, api endpoints, basic forms - it crushes this stuff
- code explanation: paste a gnarly function and it'll break down what it does better than most documentation
- cross-language translation: converting python to javascript, sql to mongodb queries, etc
- test generation: surprisingly decent at writing unit tests if you give it the function signature
- natural conversation flow: feels the most human-like when you're iterating on requirements
where they fall apart:
- context switching: loses track of what you're building if the conversation gets long
- edge case handling: writes the happy path beautifully, forgets about error states
- performance optimization: generates functional code that's slow as hell
- integration complexity: individual components work fine, connecting them breaks everything
- consistency over long projects: tends to contradict earlier architectural decisions
real example: asked gpt to build a user authentication system. got beautiful login/register forms, jwt handling, password hashing - the works. spent 2 days fixing session management bugs because it didn't handle edge cases like concurrent logins or token refresh properly
prompting strategies that actually work:
instead of: "build user authentication"
try: "create a nextjs api route for user login that:
- accepts email/password via POST
- validates against supabase user table
- returns jwt token with 24hr expiry
- handles incorrect credentials with 401 status
- includes rate limiting for failed attempts"
be stupidly specific. treat it like you're writing requirements for a junior developer who's really smart but has zero context about your project.
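for what it's worth, a prompt like that should push the model toward something in this shape. this is just a minimal sketch of the expected output, assuming supabase auth (rather than a raw user table) and the jose library for signing the jwt - the in-memory rate limiter is a stand-in for whatever you'd actually use in production:

```typescript
// app/api/login/route.ts - sketch only, not production-ready
import { NextResponse } from "next/server";
import { createClient } from "@supabase/supabase-js";
import { SignJWT } from "jose";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

// naive in-memory counter as a stand-in for real rate limiting (redis, upstash, etc.)
const failedAttempts = new Map<string, number>();

export async function POST(req: Request) {
  const { email, password } = await req.json();

  if ((failedAttempts.get(email) ?? 0) >= 5) {
    return NextResponse.json({ error: "too many failed attempts" }, { status: 429 });
  }

  // validate credentials against supabase (supabase auth already issues its own
  // session token - the custom jwt below just follows the prompt's requirement)
  const { data, error } = await supabase.auth.signInWithPassword({ email, password });

  if (error || !data.user) {
    failedAttempts.set(email, (failedAttempts.get(email) ?? 0) + 1);
    return NextResponse.json({ error: "invalid credentials" }, { status: 401 });
  }

  failedAttempts.delete(email);

  // jwt with a 24hr expiry, per the prompt
  const token = await new SignJWT({ sub: data.user.id, email })
    .setProtectedHeader({ alg: "HS256" })
    .setIssuedAt()
    .setExpirationTime("24h")
    .sign(new TextEncoder().encode(process.env.JWT_SECRET!));

  return NextResponse.json({ token });
}
```

the point isn't that the model will produce exactly this - it's that every bullet in the prompt maps to a concrete branch in the code, which makes gaps obvious during review.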
the openai models excel at understanding natural language and can pivot quickly when you change requirements mid-conversation. but they can be unpredictable - sometimes generating brilliant solutions, sometimes completely missing obvious issues.
anthropic (claude) - the careful craftsman
claude feels different. it's slower to respond but the code quality is consistently higher. less flashy, more reliable. after using it extensively, i understand why it's become the go-to for serious development work.
what makes claude special:
- complex reasoning: handles multi-step logic way better than other models
- long context retention: can work with your entire codebase without forgetting what you discussed 50 messages ago
- security awareness: naturally includes input validation, sql injection prevention, proper error handling
- code review quality: excellent at spotting potential issues in existing code
- architecture suggestions: actually understands system design concepts and trade-offs
- consistency: maintains architectural decisions throughout long conversations
limitations:
- speed: noticeably slower than gpt for simple tasks
- creativity: less likely to suggest novel approaches or creative solutions
- overly cautious: sometimes refuses to generate code that's perfectly fine but could theoretically be misused
- verbose explanations: can over-explain simple concepts
real example: building a payment processing system with stripe. claude not only generated the payment flow but proactively added webhook verification, idempotency keys, proper error logging, and even suggested implementing retry logic for failed payments. gpt would've given me the basic payment intent creation and called it done
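to make that concrete, the two safeguards that mattered most - webhook signature verification and idempotency keys - look roughly like this with the official stripe node sdk (a hedged sketch; the env var and function names are mine):

```typescript
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// next.js-style webhook handler: verify the signature before trusting the payload
export async function POST(req: Request) {
  const signature = req.headers.get("stripe-signature")!;
  const rawBody = await req.text();

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(
      rawBody,
      signature,
      process.env.STRIPE_WEBHOOK_SECRET!
    );
  } catch (err) {
    console.error("webhook signature verification failed", err);
    return new Response("invalid signature", { status: 400 });
  }

  if (event.type === "payment_intent.payment_failed") {
    const intent = event.data.object as Stripe.PaymentIntent;
    // log the failure so a retry job can pick it up later
    console.error("payment failed for intent", intent.id);
  }

  return new Response("ok", { status: 200 });
}

// elsewhere: pass an idempotency key when creating the payment intent so a
// retried request can't double-charge the customer
export async function createOrderIntent(orderId: string, amountInCents: number) {
  return stripe.paymentIntents.create(
    { amount: amountInCents, currency: "usd" },
    { idempotencyKey: `order-${orderId}` }
  );
}
```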
getting the most out of claude:
- leverage the long context window - paste your entire project structure
- ask for security review of generated code
- request architectural feedback before implementing major features
- use it for code refactoring - it's excellent at improving existing code while maintaining functionality
claude shines when you need reliable, production-ready code. it's less likely to generate something that works in demo but fails in production.
google (gemini) - the speed demon with precision issues
gemini is fast. like, scary fast. but it requires a completely different approach and understanding of its strengths.
where gemini excels:
- raw speed: generates code almost instantly
- mathematical accuracy: complex algorithms, data structures, mathematical computations
- optimization focus: naturally writes more efficient code
- google cloud integration: seamless with gcp services
- multimodal capabilities: can generate code from ui mockups or diagrams
- technical precision: excellent at implementing specific algorithms or data structures
the gotchas:
- prompt sensitivity: small changes in wording dramatically affect output quality
- less conversational: doesn't handle back-and-forth refinement as well
- documentation gaps: generates working code but minimal explanations
- context limitations: struggles with large, complex projects
- inconsistent quality: can produce brilliant code or completely miss the mark
real example: needed to implement a complex sorting algorithm for a data visualization. gemini delivered a perfect implementation in 30 seconds that was more efficient than what i would've written. but when i asked it to modify the algorithm slightly, it basically rewrote everything instead of making the small change
gemini optimization techniques:
- structure prompts with clear sections (requirements, constraints, expected output)
- include performance requirements upfront
- provide concrete examples of input/output
- ask for specific optimizations (time complexity, memory usage, etc)
- be very precise about what you want changed when iterating
gemini works best when you know exactly what you want and can communicate it clearly. it's less forgiving of vague requirements but can deliver exceptional results when properly prompted.
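for example, instead of "make this chart faster", something like this gets far better results (computeDailyTotals is a made-up function name - substitute your own):
"optimize computeDailyTotals:
- requirements: group 100k+ rows of {timestamp, value} by calendar day and return totals sorted by day
- constraints: single pass over the data, no external libraries, keep the existing function signature
- expected output: only the revised function, with a one-line note on time complexity
- do not modify any other code"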
the lovable factor: how the platform changes everything
what makes testing these models in lovable unique is the full-stack context. traditional ai coding assistants work in isolation - you're asking them to write a function or component without understanding how it fits into the bigger picture.
lovable gives these models something they've never had before: complete application context. when you ask claude to "add user authentication," it understands that means:
- generating the login/register ui components
- creating the backend api routes
- setting up the database schema
- configuring supabase auth
- implementing proper error handling across all layers
- ensuring the auth state management works with the existing app structure
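the auth state piece of that is mostly supabase-js boilerplate. a minimal client-side sketch, assuming supabase auth and the usual public env vars:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);

// restore the session on load so the ui doesn't flash a logged-out state
export async function getCurrentUser() {
  const { data } = await supabase.auth.getSession();
  return data.session?.user ?? null;
}

// keep the rest of the app in sync when the user signs in or out
supabase.auth.onAuthStateChange((event, session) => {
  if (event === "SIGNED_OUT") {
    // clear cached user data, redirect to login, etc.
    console.log("signed out");
  } else if (session) {
    console.log("signed in as", session.user.email);
  }
});
```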
this is why lovable's growth story is so compelling - they hit $17 million arr in 90 days because they solved the integration problem that makes ai coding tools frustrating for real projects.
the supabase integration advantage
one thing that becomes clear when using these models in lovable is how the supabase integration changes their behavior. instead of generating generic database code, they're working with a specific, well-documented backend-as-a-service platform.
claude particularly excels here - it understands supabase's row level security, realtime subscriptions, and edge functions. gpt is decent but sometimes generates supabase code that doesn't follow best practices. gemini is technically accurate but doesn't leverage supabase's unique features as effectively.
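as a concrete example, the realtime piece is one channel subscription in supabase-js (the `orders` table and channel name here are made up):

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

// subscribe to inserts on a table and react to them in the ui
const channel = supabase
  .channel("orders-feed")
  .on(
    "postgres_changes",
    { event: "INSERT", schema: "public", table: "orders" },
    (payload) => {
      console.log("new order:", payload.new);
    }
  )
  .subscribe();

// later, when the page or component goes away:
// supabase.removeChannel(channel);
```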
the stuff they all struggle with
state management
none of them really understand complex state flows. they'll generate redux actions that work in isolation but create race conditions in real apps. react context gets mangled when there are multiple providers. zustand stores work fine until you need complex selectors.
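to show the kind of subtlety they miss: a zustand selector that derives new data needs a shallow equality check, otherwise every store update re-renders every subscriber. a minimal sketch (the todo store is a made-up example):

```typescript
import { create } from "zustand";
import { useShallow } from "zustand/react/shallow";

type Todo = { id: string; title: string; done: boolean };

type TodoState = {
  todos: Todo[];
  toggle: (id: string) => void;
};

const useTodoStore = create<TodoState>((set) => ({
  todos: [],
  toggle: (id) =>
    set((state) => ({
      todos: state.todos.map((t) => (t.id === id ? { ...t, done: !t.done } : t)),
    })),
}));

// derived selector: the filter/map returns a fresh array every time, so without
// useShallow this hook would trigger a re-render on every unrelated store change
export function useOpenTodoTitles() {
  return useTodoStore(
    useShallow((state) => state.todos.filter((t) => !t.done).map((t) => t.title))
  );
}
```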
async operations
promise chains, concurrent api calls, proper error handling in async contexts - this is where bugs live. they all generate async code that works in the happy path but fails when networks are slow or apis return unexpected responses.
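the classic failure is a search box where a slow earlier request resolves after a faster later one and overwrites it. a hedged sketch of the fix (the /api/search endpoint is made up):

```typescript
import { useEffect, useState } from "react";

export function useSearch(query: string) {
  const [results, setResults] = useState<string[]>([]);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    const controller = new AbortController();

    fetch(`/api/search?q=${encodeURIComponent(query)}`, { signal: controller.signal })
      .then((res) => {
        if (!res.ok) throw new Error(`search failed: ${res.status}`);
        return res.json();
      })
      .then((data) => setResults(data.results ?? []))
      .catch((err) => {
        // an aborted request is expected, not an error worth surfacing
        if (err.name !== "AbortError") setError(String(err));
      });

    // cancel the stale request when the query changes or the component unmounts
    return () => controller.abort();
  }, [query]);

  return { results, error };
}
```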
real-world data
they assume clean, well-structured data. real apis return inconsistent formats, missing fields, weird edge cases. none of the models handle this gracefully without explicit instruction.
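one habit that helps: validate at the boundary instead of trusting the shape the model assumed. a small sketch with zod (the user payload fields are made up):

```typescript
import { z } from "zod";

const UserSchema = z.object({
  id: z.string(),
  email: z.string().email(),
  // tolerate a missing or null display name instead of crashing downstream
  displayName: z.string().nullish(),
  // coerce "42" -> 42 because some backends stringify numbers
  loginCount: z.coerce.number().default(0),
});

export function parseUser(raw: unknown) {
  const result = UserSchema.safeParse(raw);
  if (!result.success) {
    console.warn("unexpected user payload", result.error.flatten());
    return null;
  }
  return result.data;
}
```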
performance at scale
generated code works fine with 10 records, falls apart with 10,000. pagination, virtualization, efficient queries - these require human oversight.
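for supabase projects the fix is often as simple as paging at the query level instead of fetching everything. a minimal sketch (the `events` table is a placeholder):

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

// fetch one page at a time instead of pulling the whole table into the client
export async function fetchEventsPage(page: number, pageSize = 50) {
  const from = page * pageSize;
  const to = from + pageSize - 1;

  const { data, count, error } = await supabase
    .from("events")
    .select("id, name, created_at", { count: "exact" })
    .order("created_at", { ascending: false })
    .range(from, to);

  if (error) throw error;
  return { rows: data ?? [], total: count ?? 0 };
}
```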
testing integration
unit tests are fine, integration tests are hit or miss, e2e tests are basically impossible without significant human intervention.
practical workflow that actually works
after months of trial and error, here's my current process that leverages each model's strengths:
phase 1: architecture planning (claude)
start with claude for overall system design. describe the feature, get architectural feedback, understand the trade-offs. claude's long context and reasoning abilities make it ideal for this phase.
phase 2: rapid prototyping (openai)
once architecture is solid, use gpt for fast iteration on ui components and basic functionality. its conversational nature makes it perfect for rapid back-and-forth during the creative phase.
phase 3: optimization and algorithms (gemini)
when you need specific performance optimizations or complex algorithms, switch to gemini. its speed and mathematical precision shine here.
phase 4: code review and security (claude)
bring everything back to claude for final review, security audit, and refactoring. its cautious nature and security awareness catch issues the other models miss.
the lovable multiplier effect
what's fascinating is how these models perform differently in lovable's full-stack environment versus traditional coding assistants:
- better integration understanding: they see how frontend changes affect backend requirements
- smarter defaults: knowing the supabase stack means better architectural decisions
- fewer integration bugs: understanding the deployment target reduces compatibility issues
- faster iteration: the visual editor combined with ai generation creates a feedback loop that accelerates development
real-world project breakdown: building a saas analytics dashboard
recently built a saas analytics dashboard using all three models in lovable - here's the detailed breakdown:
project scope:
- user authentication and role-based access
- real-time analytics dashboard
- data export functionality
- payment integration with stripe
- responsive design across devices
model usage strategy:
claude (40% of interaction time):
- overall system architecture and database schema
- user authentication and role-based access implementation
- stripe integration with proper webhook handling
- security review and error handling improvements
- code refactoring and optimization suggestions
gpt (50% of interaction time):
- react dashboard components and ui elements
- api endpoint generation and basic crud operations
- responsive design implementation
- user onboarding flow and ui/ux improvements
- rapid iteration on feature requirements
gemini (10% of interaction time):
- complex analytics calculations and data aggregation
- chart performance optimization for large datasets
- efficient database query optimization
- mathematical functions for statistical analysis
results:
- total development time: ~40 hours (would've been 80+ without ai)
- ai-generated code percentage: ~60%
- time spent debugging ai code: ~25% of total
- production-ready features: 95% (only minor tweaks needed)
key insights:
the lovable platform's context awareness meant fewer integration issues. models understood how changes in one part of the app affected others. this reduced debugging time significantly compared to using ai assistants in traditional development environments.
credit-saving strategies (because they add up fast)
lovable's pricing is based on message limits, so efficiency matters:
- draft prompts externally: write prompts in chatgpt/claude web first, refine them, then use in lovable
- batch related requests: ask for multiple related functions in one prompt rather than separate messages
- use code review mode: paste working code and ask for improvements rather than generating from scratch
- leverage chat history: reference previous code instead of re-explaining context
- be specific upfront: detailed requirements in the first prompt save multiple clarification rounds
- use the visual editor: make simple ui changes visually instead of prompting for them
the uncomfortable truth about ai coding
it's not replacing developers, but it's definitely changing what we do. i spend way less time writing boilerplate and way more time on:
- system design and architecture decisions
- code review and quality assurance
- user experience and product strategy
- debugging complex integration issues
- performance optimization and scaling considerations
- understanding business requirements and translating them to technical specs
the junior developer who only knows how to implement features from detailed specs is in trouble. the senior developer who understands systems, trade-offs, user needs, and can effectively collaborate with ai is more valuable than ever.
looking ahead: the ai coding landscape in 2025
the models are improving fast. the gap between gpt-4 and the current generation is massive. anthropic's claude opus 4 and sonnet 4 represent a new generation of frontier models, and the rumors around openai's next flagship (the gpt-5 "arrakis" chatter) point to even richer multimodal interaction and another jump in scale.
but the fundamental limitations remain:
- they don't understand your business context
- they can't make product decisions
- they struggle with complex system integration across multiple services
- they need constant human oversight for production code
- they don't understand user behavior and real-world usage patterns
the platform advantage
what lovable has done is solve the context problem. instead of asking models to generate code in a vacuum, they're working within a complete application framework. this changes everything:
- models make better architectural decisions
- integration bugs decrease significantly
- deployment and hosting are handled automatically
- team collaboration becomes seamless
this is why lovable's growth trajectory is so remarkable - from gpt engineer's open source viral moment to $17 million arr in 90 days. they've created an environment where ai coding actually works for real projects.
the weekend showdown: what to test
with unlimited free access during the ai showdown weekend, you can finally test all three models side-by-side in the same environment. here's what i'd recommend testing:
for each model, try building:
- a simple crud app (todos, inventory, etc.) - tests basic competency
- a dashboard with real-time updates - tests complex state management
- user authentication with role-based access - tests security understanding
- a form with complex validation - tests error handling
- integration with external apis - tests real-world connectivity
pay attention to:
- first attempt quality: how much works without iteration?
- error handling: do they anticipate edge cases?
- code organization: is the generated code maintainable?
- security considerations: are best practices followed?
- performance: does the code scale beyond toy examples?
bottom line
ai coding tools are legit game-changers if you know how to use them. they're terrible if you expect them to build your app for you without guidance or context.
treat them like really smart junior developers - excellent at implementation when given clear requirements, need guidance on architecture, require code review for anything important.
each model has its strengths:
- openai: best for rapid prototyping and natural conversation
- claude: best for production code and security-conscious development
- gemini: best for performance-critical algorithms and technical precision
but the real magic happens when you use them in the right environment. lovable has created that environment - one where ai models have the context they need to generate production-ready applications instead of isolated code snippets.
the future isn't about replacing developers with ai. it's about creating environments where humans and ai can collaborate effectively to build better software faster.
what's your experience been? are you seeing similar patterns or completely different results?
also curious - which model do you reach for first when starting a new feature? and if you're jumping into the showdown this weekend, what are you planning to build?
this analysis is based on 8 months of real-world usage in lovable, building everything from simple prototypes to production saas applications. your mileage may vary, but these patterns have been consistent across dozens of projects.
Top comments (3)
cool write-up! honestly this captures the reality of using these models on lovable pretty well
been bouncing between all three and your breakdown is spot on. couple things from actually using them:
openai (gpt-4) - yeah it's the swiss army knife but sometimes too eager to "help." like you ask it to fix one component and it rewrites half your app. the boilerplate generation is unreal though
claude (anthropic) - the long context thing is huge when you're working on bigger projects. it actually remembers what you built 50 prompts ago. way less likely to break existing code when making changes
google (gemini) - blazing fast but can be weirdly picky about how you phrase things. when it works though, the code quality is usually solid
biggest pain point i've found is none of them are great at understanding lovable's specific quirks.
Impressive breakdown!
Cool stuff dude!