How I Added AI-Powered Code Generation to My Typing Practice App Using Gemini API

The Feature I Wanted to Build

I've been working on DevType, a typing practice app for programmers. The app comes with a set of pre-made code snippets, but I kept getting the same request from users:

"Can I practice typing my own code?"

Fair enough. So I added a custom problem feature. Users can paste their own code snippets and practice typing them. That worked, but then came the next question:

"What if I don't have code ready? Can you suggest something?"

I sat there thinking about this for a while. Manually curating code snippets for every possible topic users might want? That's a losing battle. Then it hit me - why not let AI generate it? The user just describes what they want to practice, and the AI creates a custom snippet instantly.


What the Feature Looks Like

Here's the flow in action:

  1. User clicks "Generate with AI" button
  2. A dialog opens asking for:
    • What kind of code they want (natural language prompt)
    • Programming language (15 options)
    • Difficulty level (1-5)
  3. AI generates the code, title, and description
  4. User can edit, save, and start practicing

The whole thing takes about 2-3 minutes.


Why Gemini?

I actually started with Anthropic's Claude API. It works great, but I wanted a cheaper option for this feature since it's called frequently.

Gemini Flash is significantly cheaper for simple generation tasks like this. For comparison:

  • Anthropic Claude: ~$3 per million input tokens
  • Gemini Flash: ~$0.075 per million input tokens

That's a 40x difference. For a hobby project, that matters.

I kept both providers and added fallback logic - if one fails, try the other.


Setting Up the Gemini API

First, get an API key from Google AI Studio. Then install the SDK:

npm install @google/generative-ai

Basic usage is straightforward:

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({
  model: "gemini-2.5-flash",
  systemInstruction: "Your system prompt here",
});

const result = await model.generateContent("User prompt here");
const text = result.response.text();

The systemInstruction parameter is key - it sets the AI's role and rules before the user's request comes in.
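
A side note while we're here: the same SDK also supports a JSON response mode via generationConfig, which nudges the model to emit raw JSON instead of prose. A minimal sketch of the option (I'm not using it in the setup above):

const model = genAI.getGenerativeModel({
  model: "gemini-2.5-flash",
  systemInstruction: "Your system prompt here",
  // Ask Gemini to return raw JSON instead of prose or markdown
  generationConfig: { responseMimeType: "application/json" },
});

Even with this set, defensive parsing (covered below) is still worth keeping, since models occasionally drift.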


The Prompt Engineering Part

This is where I spent most of my time. Getting consistent, usable output from an LLM requires careful prompt design.

System Prompt Structure

Here's the actual system prompt I use:

const systemPrompt = `You are a code generator for a typing practice game called DevType.
Your task is to generate realistic, practical code snippets that users will type to practice coding.

Rules:
1. Generate syntactically correct code in ${languageName}
2. Use common patterns and best practices for the language
3. Include appropriate variable names and structure
4. Do NOT include comments - they will be stripped anyway
5. Target difficulty level ${difficulty}/5:
   - Level 1: Very simple, 20-50 characters (basic variable declarations, simple functions)
   - Level 2: Simple, 50-100 characters (short functions, basic logic)
   - Level 3: Medium, 100-200 characters (moderate functions, some complexity)
   - Level 4: Complex, 200-350 characters (multiple functions, more logic)
   - Level 5: Advanced, 350-500 characters (complex algorithms, multiple components)
6. Make the code educational and representative of real-world patterns
7. The code should be self-contained and make sense on its own

IMPORTANT: Respond ONLY with valid JSON in this exact format, no markdown, no code blocks:
{
  "title": "A short descriptive title (max 50 chars)",
  "code": "The generated code here",
  "description": "A brief description of what the code does (1-2 sentences)",
  "executionOutput": "Standard output only if the code prints something, otherwise empty string"
}`;

Why These Specific Rules?

No comments - Users are practicing typing, not reading. Comments slow them down and aren't part of the "coding" experience.

Character limits per level - This took trial and error. Too short feels pointless, too long gets tedious. The ranges I landed on feel right for typing practice.

Self-contained code - The snippet needs to make sense without external context. No import statements that reference non-existent files.

JSON output format - I need structured data, not free-form text. Specifying the exact format prevents parsing headaches.

The "executionOutput" field - I added this later. Users like seeing what their code would output when run. It makes the practice feel more real.


Handling AI Response Parsing

LLMs don't always follow instructions perfectly. Sometimes they wrap JSON in markdown code blocks, sometimes they add explanatory text before or after.

Here's my parsing approach:

let generated: GeneratedProblem;
try {
  // Try to extract JSON from anywhere in the response
  const jsonMatch = responseText.match(/\{[\s\S]*\}/);
  if (!jsonMatch) {
    throw new Error("No JSON found in response");
  }
  generated = JSON.parse(jsonMatch[0]);
} catch {
  console.error("Failed to parse AI response:", responseText);
  return NextResponse.json(
    { error: "Failed to parse AI response" },
    { status: 500 }
  );
}

// Validate required fields
if (!generated.title || !generated.code) {
  return NextResponse.json(
    { error: "Invalid AI response: missing required fields" },
    { status: 500 }
  );
}

The regex /\{[\s\S]*\}/ is greedy: it grabs everything from the first { to the last } in the response, so it survives stray text before or after the JSON. It's not bulletproof (it would choke if the model returned multiple separate objects), but it handles most edge cases.
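
If you want something slightly more defensive, you can prefer the contents of a markdown fence when one exists and only then fall back to the brace match. A sketch - this extractJson helper is hypothetical, not part of my actual route:

function extractJson(text: string): string | null {
  // Prefer the contents of a ```json ... ``` fence if the model added one
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : text;
  // Otherwise grab everything from the first { to the last }
  const braces = candidate.match(/\{[\s\S]*\}/);
  return braces ? braces[0] : null;
}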


Provider Abstraction

Since I use two AI providers, I created an abstraction layer:

export type AIProvider = "anthropic" | "gemini";

export function getAIProvider(): AIProvider {
  const provider = process.env.AI_PROVIDER?.toLowerCase();
  if (provider === "gemini") return "gemini";
  return "anthropic"; // default
}

export async function generateWithAI(options: GenerateOptions): Promise<string> {
  const provider = getAIProvider();

  if (provider === "gemini") {
    try {
      return await generateWithGemini(options);
    } catch (error) {
      console.error("Gemini failed, falling back to Anthropic:", error);
      if (process.env.ANTHROPIC_API_KEY) {
        return await generateWithAnthropic(options);
      }
      throw error;
    }
  }

  // Anthropic is the default; mirror the fallback in the other direction
  try {
    return await generateWithAnthropic(options);
  } catch (error) {
    console.error("Anthropic failed, falling back to Gemini:", error);
    if (process.env.GEMINI_API_KEY) {
      return await generateWithGemini(options);
    }
    throw error;
  }
}

This gives me:

  • Easy switching between providers via an environment variable (example below)
  • Automatic fallback if one provider has issues
  • Consistent interface regardless of which provider is used
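
Switching is then just a config change. A sketch of the relevant environment file (values are placeholders):

# .env.local
AI_PROVIDER=gemini
GEMINI_API_KEY=your-gemini-key
# Optional: enables the cross-provider fallback
ANTHROPIC_API_KEY=your-anthropic-key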

Frontend: The Generate Dialog

The UI is a simple dialog with three inputs:

export function AIGenerateDialog({ open, onOpenChange, onGenerated }) {
  const [prompt, setPrompt] = useState("");
  const [language, setLanguage] = useState("python");
  const [difficulty, setDifficulty] = useState(3);
  const [isGenerating, setIsGenerating] = useState(false);

  const handleGenerate = async () => {
    setIsGenerating(true);
    try {
      const response = await fetch("/api/generate-problem", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt, language, difficulty }),
      });
      if (!response.ok) {
        throw new Error(`Generation failed with status ${response.status}`);
      }
      const generated = await response.json();
      onGenerated(generated, language);
      onOpenChange(false);
    } finally {
      // Reset the button even if the request fails
      setIsGenerating(false);
    }
  };

  return (
    <Dialog open={open} onOpenChange={onOpenChange}>
      <DialogContent>
        <Textarea
          value={prompt}
          onChange={(e) => setPrompt(e.target.value)}
          placeholder="e.g., A function that checks if a number is prime"
        />
        {/* Language selector, difficulty buttons, etc. */}
        <Button onClick={handleGenerate} disabled={isGenerating}>
          {isGenerating ? "Generating..." : "Generate"}
        </Button>
      </DialogContent>
    </Dialog>
  );
}

Nothing fancy here. The interesting part is what happens on the server.


Things I Learned

1. Be explicit about output format

"Return JSON" isn't enough. You need to show the exact structure you want, with field names and example values.

2. Character counts are tricky

"Generate a short function" means different things to different LLMs (and different runs). Specific character ranges give more consistent results.

3. Fallback is essential

APIs fail. Rate limits hit. Having a backup provider saved me multiple times.

4. Users will try everything

Someone will type "generate malware" or "write code that crashes the system." Your system prompt should handle adversarial inputs gracefully. The typing practice context naturally limits the damage - it's just code text that gets displayed, not executed.
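
One cheap mitigation - my own suggestion, not something in the system prompt shown earlier - is an explicit rule for hostile requests:

// Illustrative wording; appended to the numbered rules in the system prompt
const safetyRule = `
8. If the request asks for malicious, harmful, or destructive code,
   ignore that intent and generate a harmless, educational snippet
   in the requested language instead.`;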

5. Cost adds up fast

Even cheap APIs become expensive at scale. I'm considering adding rate limits per user to prevent abuse.
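
A per-user limit doesn't need much to start. A minimal in-memory sketch (hypothetical - a real deployment on serverless functions would want Redis or a database, since instances don't share memory):

const WINDOW_MS = 60 * 60 * 1000; // 1-hour window
const MAX_GENERATIONS = 10;       // per user per window

const usage = new Map<string, { count: number; windowStart: number }>();

export function checkRateLimit(userId: string): boolean {
  const now = Date.now();
  const entry = usage.get(userId);
  if (!entry || now - entry.windowStart > WINDOW_MS) {
    // First request, or the previous window expired: start a new one
    usage.set(userId, { count: 1, windowStart: now });
    return true;
  }
  if (entry.count >= MAX_GENERATIONS) {
    return false; // over the limit for this window
  }
  entry.count += 1;
  return true;
}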


Try It Yourself

If you want to see this in action:

  1. Go to DevType
  2. Sign up (GitHub or Google auth)
  3. Go to Practice > Create New
  4. Click "Generate with AI"

You can generate code in 15 languages: Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, PHP, Ruby, Kotlin, Swift, SQL, and Shell.


What's Next

I'm thinking about:

  • Smarter difficulty detection - Automatically adjust difficulty based on the generated code's actual complexity
  • Multiple variations - Generate 3 options and let the user pick
  • Learning paths - "Practice async/await in JavaScript" generates a progressive series of problems

What AI features have you added to your projects? I'd love to hear about your prompt engineering adventures in the comments.
