AI can churn out code faster than you can say “Stack Overflow”, but can it build software that is actually reliable?
There’s no denying that AI-assisted coding with tools like ChatGPT, Claude, and others has changed the game. They can autocomplete functions, generate boilerplate code, and even refactor entire chunks of a project. But here’s the catch—AI doesn’t understand code the way an experienced developer does. AI doesn’t have the scars of battle-tested experience—those hard-earned lessons from debugging nightmares, handling bizarre edge cases, and wrestling with unexpected challenges in production.
AI confidently generates what looks right, yet subtle errors can slip in—the kind that might go unnoticed in marketing copy but, in software, can snowball into sneaky, hard-to-trace bugs that show up at the worst possible moment. And if you don’t fully grasp the code AI is producing, you can end up in serious trouble faster than you expect.
The illusion of perfect code: When AI gets it wrong
AI-generated code might look polished and ready to go, but appearances can be deceiving. AI can introduce subtle, dangerous mistakes. These aren’t your average syntax errors that a compiler will catch. These are logic flaws—silent troublemakers that can lurk in your code for weeks or months before causing havoc.
To understand how AI can introduce subtle but serious bugs, let’s start with a simple example and then examine a real-world scenario where the stakes are much higher.
The “Infinite loop” that can bring down production
AI might suggest an automatic retry mechanism (a `while` loop) without properly handling error or exit conditions, which can flood a server with requests in a failure scenario.
Consider this example of an API call:
```python
# AI-generated code for an API call
import requests

def fetch_data():
    while True:
        try:
            response = requests.get("https://api.example.com/data")
            if response.status_code == 200:
                return response.json()
        except Exception:
            pass  # ignore
```
What's wrong?
The `pass` statement in the exception handler introduces a subtle bug by completely suppressing errors, making debugging difficult and leaving the caller unaware of failures. The `while True` loop creates an infinite retry mechanism with no exit condition. If the API goes down, this code will hit the server relentlessly, effectively mounting a denial-of-service (DoS) attack that could bring it down.
How to fix these problems in the AI-generated code
Here is a better way to write this code, with a retry limit and exception handling that uses exponential backoff: a retry strategy where the system waits increasingly longer between attempts (e.g., 2s, 4s, 8s, …) to avoid overwhelming the service and improve stability.
```python
import time
import requests

def fetch_data(max_retries=5):
    retries = 0
    while retries < max_retries:  # Limit retries
        try:
            response = requests.get("https://api.example.com/data", timeout=10)
            if response.status_code == 200:
                return response.json()
        except requests.RequestException:
            pass  # Fall through to the retry logic below
        # Count non-200 responses as failures too, so the loop always terminates
        retries += 1
        time.sleep(2 ** retries)  # Exponential backoff: 2s, 4s, 8s, ...
    raise RuntimeError("fetch_data: giving up after max retries")
```
While this simple example shows a rather trivial and clear mistake, most modern AI tools (like GitHub Copilot or ChatGPT) would likely avoid such an obvious error. However, this doesn’t mean AI-generated code is perfect. Instead, the real danger lies in subtle, hard-to-detect bugs that can creep into more complex scenarios—bugs that even experienced developers might miss without careful review. Consider the next example.
User Registration with Flask and Celery
Let’s examine a more realistic scenario: a user registration feature where account setup is handled as a background task using Celery (skipping the result/status endpoint for simplicity). Here’s the AI-generated code:
```python
from flask import Flask, request, jsonify
from celery import Celery
import time

app = Flask(__name__)
app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'
celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])

# In-memory "database" for simplicity
users = {}

@celery.task
def setup_user_account(user_id):
    # Fetch user
    user = users.get(user_id)
    if not user:
        print(f"User {user_id} not found!")
        return

    # Simulate a time-consuming setup process
    time.sleep(5)

    # Update user status to "active"
    user['status'] = 'active'
    print(f"User {user_id} setup complete. Status: {user['status']}")
    # Send welcome mail to the user

@app.route('/register', methods=['POST'])
def register_user():
    data = request.json
    user_id = data.get('user_id')
    email = data.get('email')
    if not user_id or not email:
        return jsonify({"error": "user_id and email are required"}), 400

    # Create user with "pending" status
    users[user_id] = {"email": email, "status": "pending"}

    # Trigger background task
    setup_user_account.apply_async(args=[user_id])
    return jsonify({"message": "User registered successfully!", "user_id": user_id})

if __name__ == '__main__':
    app.run(debug=True)
```
What’s wrong?
Note that the code is simplified for this article, so there are no DB operations or duplicate checks. At first glance, the code seems fine (especially to inexperienced developers): it creates a user with a `"pending"` status, triggers a background task to complete the setup, and updates the status to `"active"` once done. But there are quite a few problems here:
- No error handling: If the task fails (e.g., due to an exception), the user’s status remains `"pending"` indefinitely, leaving the system in an inconsistent state.
- No retry mechanism: If the task fails due to transient issues (network problems, database timeouts), there’s no mechanism to retry the operation, so the user’s account setup might never complete.
- Race condition: The `setup_user_account(...)` task does some setup work (simulated with `sleep(5)`), then updates the status without checking whether it changed in the meantime. If two tasks process the same user concurrently, they could overwrite each other’s changes, potentially corrupting data.
- Non-idempotent operations: If the task runs multiple times for the same user (due to duplicated messages or manual retries), it will repeat the same operations without checking whether they have already been done.
How to fix these problems in the AI-generated code
Here’s how to improve the code to handle these issues:
```python
from flask import Flask, request, jsonify
from celery import Celery
import time

app = Flask(__name__)
app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'
celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])

# In-memory "database" for simplicity
users = {}

@celery.task(bind=True, max_retries=3)
def setup_user_account(self, user_id: str):
    try:
        # Fetch user
        user = users.get(user_id)
        if not user:
            print(f"User {user_id} not found!")
            return

        # Check if the operation has already been performed (idempotency)
        if user['status'] == 'active':
            print(f"User {user_id} already active, skipping setup.")
            return

        # Simulate a time-consuming setup process
        time.sleep(5)

        # Update with a state check to reduce the race-condition window
        if user['status'] == 'pending':  # Check current state
            user['status'] = 'active'  # Update only if still in expected state
            print(f"User {user_id} setup complete. Status: {user['status']}")
            # Send welcome mail to the user only on first successful activation
        else:
            print(f"User {user_id} status changed unexpectedly to {user['status']}")
    except Exception as e:
        # Log the error and retry with exponential backoff
        print(f"Error setting up account for user {user_id}: {e}")
        raise self.retry(exc=e, countdown=2 ** self.request.retries)

@app.route('/register', methods=['POST'])
def register_user():
    data = request.json
    user_id = data.get('user_id')
    email = data.get('email')
    if not user_id or not email:
        return jsonify({"error": "user_id and email are required"}), 400

    # Create user with "pending" status
    users[user_id] = {"email": email, "status": "pending"}

    # Trigger background task
    setup_user_account.apply_async(args=[user_id])
    return jsonify({"message": "User registered successfully!", "user_id": user_id})

if __name__ == '__main__':
    app.run(debug=True)
```
This improved code offers several key advantages over the AI-generated code:
- Error handling and retry mechanism: The improved code uses a proper `try/except` block with a retry mechanism that attempts the operation up to 3 times, with increasing delays between attempts (exponential backoff). This ensures that transient issues (e.g., network failures) are handled gracefully.
- Race condition protection: The improved code checks the current state before making changes, ensuring that updates happen only if the user is still in the expected state (`pending`). This keeps concurrent tasks from corrupting each other’s work.
- Idempotency: The code now checks whether the user is already active before attempting to activate them again, making the operation idempotent. If the task runs multiple times, subsequent runs will detect that the work is already done.
- Minor improvement: Python added type hints relatively late (PEP 484). Since general-purpose GenAI text models (such as older ChatGPT, Gemini, and Claude versions) are trained largely on older code samples, they often skip type hints in their generated code. When manually revising AI code, experienced developers would typically add these hints (like `def setup_user_account(self, user_id: str)`) to help IDEs and linters catch potential errors before runtime.
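As a tiny illustration of that last point (the function body here is just a stand-in), annotations give IDEs and static checkers like mypy something concrete to verify:

```python
def setup_user_account(user_id: str) -> str:
    """The `str` annotations let a static checker flag a wrong-type call,
    e.g. setup_user_account(42), before the code ever runs."""
    return f"setup complete for {user_id}"
```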
These improvements transform the fragile AI-generated code into a robust one that can handle real-world conditions like network failures, concurrent operations, and duplicate task executions.
It is important to note that this is a simplified example: real-life code will have far more complexity, such as proper database transactions or provisioning various systems for the newly registered user.
AI coding problems: Lessons from my own experience
While the examples above suggest how AI-generated code can overlook critical programming considerations, I’ve also encountered real-world issues when using GenAI models in my own coding work:
- Struggling with unique data generation: While writing an article on database indexing, I needed thousands of records with unique names and email IDs to demonstrate indexing impact. ChatGPT managed about a hundred unique entries before it started generating duplicates. Rather than weeding out the duplicates, I knew I was better off writing a short Python script to generate the necessary DB INSERT statements myself.
- Hallucinated methods in AI responses: One of the more surprising moments came when Gemini confidently suggested non-existent methods for Python’s `aiohttp` library. I double-checked across multiple versions; those methods simply didn’t exist. Gemini had just made them up.
- Struggles with building a complete app: I recently tried using ChatGPT and Claude to generate a small but complete application. Initially, I provided an image of the UI I had in mind and asked them to build the app. Both failed miserably, with Claude producing more convoluted but still incorrect code. Then I changed my approach, describing the app’s functionality and asking them to generate the UI (React) and backend (Flask). Even then, fixing one issue often broke something else, leading to a frustrating loop. Neither could deal with the entire context of the application, despite it being small. In the end, I had to build it piece by piece using my own architecture and design, which finally worked. These days, I continue with this piece-by-piece approach, using AI for small code snippets while keeping a watchful eye on its output.
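The kind of short script mentioned in the first point above can be trivially simple. This is a sketch (the table and column names are made up for illustration); the key idea is that a running counter guarantees uniqueness by construction, which is exactly what the LLM failed to do:

```python
import itertools

def generate_insert_statements(count):
    """Generate INSERT statements with guaranteed-unique names and emails.

    Uniqueness comes from the counter `i` baked into every name and email,
    so duplicates are impossible no matter how many rows we generate.
    """
    first_names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
    last_names = ["Smith", "Jones", "Lee", "Patel", "Garcia"]
    # Cycle through all name combinations forever; the counter de-duplicates
    pairs = itertools.cycle([(f, l) for f in first_names for l in last_names])
    statements = []
    for i, (first, last) in zip(range(count), pairs):
        name = f"{first} {last} {i}"
        email = f"{first.lower()}.{last.lower()}.{i}@example.com"
        statements.append(
            f"INSERT INTO users (name, email) VALUES ('{name}', '{email}');"
        )
    return statements
```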
Why AI misses coding nuances
Surprising as it may sound, AI doesn’t really understand code—it predicts patterns based on vast datasets of existing examples. If those datasets contain outdated or poorly written code, AI can unknowingly replicate bad practices, even if the output looks clean. It may generate code based on older libraries that are no longer supported (deprecated), hallucinate non-existent methods, or overlook critical security considerations—especially when using general-purpose GenAI text models like ChatGPT, Gemini, or Claude for code generation.
And here’s the kicker: AI doesn’t inherently think about security risks. Unless explicitly prompted, it won’t remind you to follow best practices, like reading credentials from a `.env` file instead of hardcoding them, or protecting yourself against SQL injection. That’s where developer knowledge and experience come into play.
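To make those two practices concrete, here is a minimal sketch (using `sqlite3` for brevity; `API_KEY` is a placeholder name, and a loader such as python-dotenv would typically populate the environment from a `.env` file):

```python
import os
import sqlite3

# Read secrets from the environment instead of hardcoding them in source.
API_KEY = os.environ.get("API_KEY", "")  # "API_KEY" is a placeholder name

def find_user(conn, email):
    # Parameterized query: the driver escapes `email`, blocking SQL injection.
    cur = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cur.fetchone()

# Anti-pattern (don't do this): string formatting lets attacker-controlled
# input become part of the SQL statement itself:
#   conn.execute(f"SELECT id FROM users WHERE email = '{email}'")
```

With the parameterized version, a classic payload like `' OR '1'='1` is treated as a literal string and simply matches nothing.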
To be fair, the examples used here are relatively simple—most specialized coding AI assistants, like GitHub Copilot or Cursor AI, would likely get them right on the first try. Moreover, dedicated AI coding platforms are evolving rapidly, and from what I’ve heard, they’re generating increasingly impressive results. However, while AI can generate syntactically correct code, it may still introduce security vulnerabilities, architectural inefficiencies, or unexpected behavior due to its lack of true comprehension; syntactic correctness doesn’t mean the code is reliable, secure, or maintainable.
Programming is not just writing code
There’s a big difference between writing code that works somehow and writing code that keeps working flawlessly. AI can handle the former, but the latter takes some experience.
Real-world programming isn’t just about codifying a few logical steps to make something run—it’s about designing readable, maintainable, adaptable systems that don’t collapse under their own weight the moment requirements change (because they *will* change). AI doesn’t have that kind of foresight—it lacks the battle-tested wisdom that comes from debugging disasters, refactoring legacy code, and making trade-offs that only years of hands-on problem-solving can teach.
The higher-order skills that AI can’t replace
Software development isn’t just about writing code—it’s about making the right decisions before a single line is even written. Skills like prudent design, weighing trade-offs, and building for the unknown are more like creativity than mere logic. They have a high ceiling that AI can’t easily reach.
AI can churn out code quickly, but it struggles with the nuanced art of crafting robust, adaptable solutions that stand the test of time. Humans, on the other hand, develop nuanced professional insights through years of solving real-world problems—something AI just can’t replicate. At least, not yet!
Why human judgment still matters
Generative AI models can produce text in seconds based on your prompt—whether it’s a newsletter draft or a working Python MVP. But in software development, the stakes are much higher. A small error in a newsletter won’t cause much trouble—you can always send a follow-up or a quick ‘PS’ to fix it. A hidden bug in production, on the other hand, can cost millions, damage reputations, or even put lives at risk in critical systems like healthcare and aviation.
That’s why context, accuracy, and human judgment aren’t just useful in software development—they’re absolutely essential.
AI is undoubtedly a powerful tool, but it is only as good as the hands that guide it. I cannot stress this enough—having an expert developer or architect review and refine AI-generated code isn’t just important; it’s non-negotiable. AI can assist, but only human oversight ensures correctness, reliability, and sound architectural decisions.
How to use AI wisely in your work
Using AI wisely in your work deserves a separate blog-post of its own, but here are some key principles to keep in mind while using AI for software development:
- Review everything: Treat AI-generated code as a first draft. Always review it line by line, looking for subtle errors or inefficiencies.
- Understand the code: Don’t just copy and paste. Make sure you understand what the code does and how it works.
- Test rigorously: Use automated tests, manual testing, and code reviews to catch hidden bugs before they turn into costly problems.
- Leverage AI for the right tasks: Let AI handle boilerplate code, unit tests, documentation, or automation so you can focus on real problem-solving.
- Learn with AI: AI can be an excellent tutor for beginners, offering insights on almost any topic. It’s great for exploring new programming languages or unfamiliar fields—I’ve certainly enjoyed diving into eclectic topics with its help.
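As one concrete illustration of the "test rigorously" point, here is a hypothetical, dependency-injected variant of the earlier fetch example, with a mock-based check that the retry limit actually holds (the function name and URL are made up for the sketch):

```python
from unittest import mock

def fetch_with_limit(get, url, max_retries=3):
    """Bounded-retry fetch; `get` is injected so a test can fake failures."""
    for _ in range(max_retries):
        try:
            response = get(url)
            if response.status_code == 200:
                return response.json()
        except Exception:
            pass  # Retry on the next loop iteration
    raise RuntimeError(f"giving up on {url} after {max_retries} attempts")

# The test: verify the retry limit holds. A check like this would have
# caught the infinite-loop bug shown at the start of this article.
failing_get = mock.Mock(side_effect=ConnectionError("service down"))
try:
    fetch_with_limit(failing_get, "https://api.example.com/data")
except RuntimeError:
    pass
assert failing_get.call_count == 3  # bounded, not infinite
```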
The future belongs to developers who adapt
AI isn’t here to replace developers—it’s here to push us to evolve. The ones who thrive won’t be those who blindly follow AI’s suggestions, but those who wield it as a powerful tool to amplify their creativity, efficiency, and problem-solving skills.
While asking your AI assistant to “Pls fix” might seem like the quickest route, understanding why something works (or doesn’t) is often not just the better path—but sometimes even the shorter one.
AI isn’t just knocking on the door of the tech industry—it has already stormed in, taken a seat at the table, and started rewriting the rules across every field. From coding to design, from legal briefs to medical breakthroughs, it’s weaving itself into industries at an unstoppable pace. AI is becoming as ubiquitous as Wi-Fi. And just like Wi-Fi, you don’t need to know the intricacies of how it works—but if you don’t know how to connect, you’ll be stuck buffering while the world streams ahead.
In the end, the developers who embrace AI as an ally—not a crutch—will be the ones shaping the future. Those who resist adapting may find the industry moving forward without them.
So, the real question is: which side of the future do you want to be on?
Updates: 19 March 2025
Update 1 — A real-world example of AI-generated code gone wrong
Here is what I wrote in this article:
If you don’t fully understand the code AI is producing, you might end up in serious trouble faster than you expect.
AI doesn’t inherently think about security risks. Unless explicitly prompted, it won’t remind you to follow best practices, like....
And here is a real-life post that exemplifies what I was saying. 👇
Update 2 — Anthropic CEO on AI Coding
"In 12 months, we may be in a world where AI [artificial intelligence] is essentially writing all of the code," said Anthropic CEO and Cofounder Dario Amodei. Watch it here -
Amodei says a programmer will still need to specify certain conditions of what the AI model is attempting to execute (such as, whether it is a secure design or insecure design and other considerations), and argues that "human productivity will actually be enhanced" by AI. (You can watch more of his insights here).
2025 is shaping up to be an interesting year for AI. Maybe I’ll write another blog post here on how to use it to generate correct code effectively—once I’ve wrestled with it enough to come away with some actually useful insights! ¯\_(ツ)_/¯
What’s your experience with AI-assisted coding? Share your thoughts in the comments below.