zhongqiyue

Posted on Jun 3

I Struggled to Get AI to Write Useful Code — Here's What Finally Worked

#ai #webdev #productivity #python

Last month, I had to build a dozen API endpoints for a new microservice. I knew the patterns – CRUD operations, SQLAlchemy models, Pydantic schemas – but typing out all that boilerplate felt soul-crushing.

I turned to AI, hoping it would save me hours. What followed was a rollercoaster of bad outputs, hallucinations, and frustration. But after a week of failed attempts, I found a prompting approach that actually produced reliable, copy-pasteable code.

This isn't a “just use this tool” story. It's about the technique that finally worked for me, with real examples you can adapt.

The Problem: AI Kept Writing Garbage

I started with the obvious: “Write a Flask route for creating a user.” The AI spat back something like:

@app.route('/users', methods=['POST'])
def create_user():
    data = request.json
    # Create user logic
    return jsonify({'id': 1, 'username': 'john'})

Hardcoded response! Not even using the input. I tried again: “Write a proper endpoint with validation.” It gave me a mix of Flask and FastAPI syntax. Maddening.

I realized the problem was me. I was asking AI to read my mind. It didn't know my database schema, error handling patterns, or even which ORM I used.

What Didn't Work

Several of my early attempts only made things worse:

Vague prompts: “Generate a user CRUD” → got a generic mess with no imports
One-shot without context: No examples → model guessed wildly
Assuming it remembered previous messages: In stateless API calls, every prompt is a fresh start
Asking for too much at once: A single prompt to write the entire module → output often cut off or inconsistent
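The stateless point is easy to miss: with a chat-style HTTP API, the model only sees what is in the current request, so earlier turns have to be resent every time. Here is a minimal sketch of that bookkeeping, assuming the common system/user/assistant message format. The Conversation class and its method names are hypothetical, and no network call is made:

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Keeps the running message list that must be resent on every call."""
    system_msg: str
    turns: list[dict] = field(default_factory=list)

    def add_user(self, text: str) -> None:
        self.turns.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.turns.append({"role": "assistant", "content": text})

    def messages(self) -> list[dict]:
        # The full history goes into each request; a stateless API keeps none of it.
        return [{"role": "system", "content": self.system_msg}, *self.turns]


convo = Conversation("You write Flask routes with SQLAlchemy and Marshmallow.")
convo.add_user("Write a POST /users endpoint.")
convo.add_assistant("# ...generated code would go here...")
convo.add_user("Now add validation in the same style.")
payload_messages = convo.messages()  # would be sent as the request's "messages" field
```

Without the first two turns in the payload, "the same style" in the last message would refer to nothing the model can see.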

The Breakthrough: Structure Before Generation

I stopped treating the AI like a senior developer and started treating it like a diligent intern who needs extremely detailed instructions. The key was to provide:

Explicit input/output format – “Given a model definition in JSON, produce a Flask blueprint file.”
Few-shot examples – Show two or three complete input/output pairs before asking it to generate.
A system message setting the role – “You are a Python backend developer. You write clean, type-annotated code following Flask best practices.”

Here's the actual approach that clicked for me.

The Few-Shot Prompt Template

I wrapped the prompt in a small Python function. It takes a schema description, builds a prompt around a worked example (a single one in this version), and sends it to the API:

import json
import os

import requests

def generate_endpoint_code(schema_json: dict) -> str:
    system_msg = "You are a Python backend developer. You write Flask routes with SQLAlchemy and Marshmallow. Always include error handling and docstrings."

    examples = [
        {
            "input": '{"model": "Project", "fields": ["id", "name", "created_at"]}',
            "output": '''
# routes/projects.py
from flask import Blueprint, request, jsonify
from marshmallow import ValidationError
from models import db, Project
from schemas import ProjectSchema

projects_bp = Blueprint('projects', __name__)

@projects_bp.route('/projects', methods=['POST'])
def create_project():
    """Create a project from the JSON request body."""
    schema = ProjectSchema()
    try:
        data = schema.load(request.json)
    except ValidationError as err:
        return jsonify(err.messages), 400
    project = Project(**data)
    db.session.add(project)
    db.session.commit()
    return jsonify(schema.dump(project)), 201
'''
        }
    ]

    # Serialize the schema as real JSON so it matches the example input format
    # (interpolating the dict directly would produce a Python repr instead)
    schema_text = json.dumps(schema_json)

    # Build the prompt: instructions, one worked example, then the new request
    user_prompt = f"""
Generate a Flask endpoint for creating a resource based on a JSON schema.
Use the same style as the example below.

Example input:
{examples[0]['input']}

Example output:
{examples[0]['output']}

Now generate for this input:
{schema_text}
"""

    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        # Assumes the key is stored in the OPENAI_API_KEY environment variable
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4",
            "messages": [
                {"role": "system", "content": system_msg},
                {"role": "user", "content": user_prompt}
            ],
            "temperature": 0.2  # low randomness for more consistent output
        },
        timeout=60,
    )
    response.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below
    return response.json()['choices'][0]['message']['content']
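The function above packs a single example into one user message. A variant that scales to the two or three examples recommended earlier is to send each example as its own user/assistant pair. The sketch below is only an illustration under that assumption: build_few_shot_messages is a hypothetical helper, and the example outputs are shortened placeholders rather than real generated files.

```python
import json


def build_few_shot_messages(system_msg: str, examples: list[dict], schema: dict) -> list[dict]:
    """Turn input/output example pairs plus a new schema into a chat message list."""
    messages = [{"role": "system", "content": system_msg}]
    for example in examples:
        # Each example becomes a user turn (input) and an assistant turn (output).
        messages.append({"role": "user", "content": example["input"]})
        messages.append({"role": "assistant", "content": example["output"]})
    # The real request comes last, serialized as JSON like the example inputs.
    messages.append({"role": "user", "content": json.dumps(schema)})
    return messages


examples = [
    {"input": '{"model": "Project", "fields": ["id", "name"]}',
     "output": "# routes/projects.py ..."},
    {"input": '{"model": "Task", "fields": ["id", "title"]}',
     "output": "# routes/tasks.py ..."},
]
messages = build_few_shot_messages(
    "You are a Python backend developer.",
    examples,
    {"model": "User", "fields": ["id", "username"]},
)
```

The resulting list could be passed as the "messages" field of the request. Whether pairs work better than one long prompt likely depends on the model, so it is worth trying both.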

The Result

After this change, the AI started generating code that:

Actually used request.json
Imported the correct modules
Had consistent error handling
Used my naming conventions (because the example showed them)

I could then copy the output directly into my project, only tweaking a few details.

Lessons Learned & Trade-offs

This approach isn't magic. Here's what I wish someone had told me:

You still need to review everything. AI sometimes hallucinates imports or calls methods that don't exist. A from sqlalchemy import Column, Integer, String it generated was fine, but it once invented from flask_sqlalchemy import db, which fails: flask_sqlalchemy exports the SQLAlchemy class, and db is an instance you create yourself in your own module.
Few-shot works best for repetitive patterns. For complex business logic with multiple conditions, AI often makes mistakes. I still write those manually.
Temperature matters. Setting temperature to 0.0 gave extremely predictable but sometimes stilted code. I settled on 0.2.
Context length is a limit. My examples took up tokens, leaving less room for the actual generation. For large schemas, I had to shorten the examples.
It's not free. API calls cost money. For a quick script, it's fine; for generating hundreds of endpoints, you might prefer a template engine.
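Part of the review step can be automated before a human reads the code. As a rough sketch (check_generated_code is a hypothetical helper, not part of my original workflow), the standard library can parse the output without executing it and flag imports whose top-level package is not installed:

```python
import ast
import importlib.util


def check_generated_code(source: str) -> list[str]:
    """Return a list of problems found in generated code without executing it."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            try:
                # find_spec returns None when the top-level package cannot be found.
                found = importlib.util.find_spec(top_level) is not None
            except (ImportError, ValueError):
                found = False
            if not found:
                problems.append(f"line {node.lineno}: cannot find module '{top_level}'")
    return problems


problems = check_generated_code("import json\nimport surely_missing_pkg_xyz\n")
```

This only checks that the module exists, not that the imported name does, so it would not have caught the flask_sqlalchemy case above. It narrows the review; it does not replace it.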

When NOT to Use This

When the pattern is trivial and you can type it in 10 seconds.
When security is critical – AI might skip authentication checks.
When you need to adhere to a very specific code style that the model hasn't seen.

What I'd Do Differently Next Time

Invest in a local model. For privacy and cost, I'm now trying a quantized Llama 3 model on my machine. So far the quality is close to what I got from the hosted model.
Use a schema compiler. Instead of generating code with AI, I'd use a tool like datamodel-code-generator for Pydantic models and only use AI for the wiring (routes, error handling).
Build a prompt library. I keep reusable prompt fragments for different tasks (models, routes, tests) so I don't start from scratch.
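A prompt library does not need to be elaborate. Here is a minimal sketch, assuming fragments are stored as named strings and joined per task. The fragment texts are placeholders, and compose_prompt is a hypothetical helper:

```python
PROMPT_FRAGMENTS = {
    "role": "You are a Python backend developer.",
    "style": "Write clean, type-annotated code following Flask best practices.",
    "routes": "Generate a Flask blueprint for the given model.",
    "tests": "Generate pytest tests for the given blueprint.",
}


def compose_prompt(*fragment_names: str) -> str:
    """Join named fragments into one prompt, failing fast on unknown names."""
    missing = [name for name in fragment_names if name not in PROMPT_FRAGMENTS]
    if missing:
        raise KeyError(f"unknown prompt fragments: {missing}")
    return "\n".join(PROMPT_FRAGMENTS[name] for name in fragment_names)


routes_prompt = compose_prompt("role", "style", "routes")
```

Keeping the role and style fragments shared means a wording fix in one place carries over to every task that reuses them.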

Final Thoughts

AI code generation can save you from burnout on boilerplate, but it's not a silver bullet. The trick is to give the model the right amount of structure – not too vague, not too rigid. Embrace few-shot prompting. And always, always review the output.

I'm still early in this journey. How do you handle AI-generated code in your workflow? Do you rely on prompts, or have you built a custom tool around it? I'd love to hear what works for you.

DEV Community