LazyDev_OH

Posted on • Originally published at gocodelab.com
Teaching Claude to Play Tetris with 100 App Store Characters

The App Store keyword field is exactly 100 characters. Commas only, no spaces, no duplicates. You need to pack 15–20 keywords inside.

I tried writing those by hand for a dozen apps. Every time I'd leave characters on the table — a rogue space after a comma, a singular/plural duplicate Apple would auto-match anyway. Manual packing is tedious enough that most indie developers just don't iterate on ASO.

So I built an AI that does it. This post is the actual implementation — prompts, JSON schemas, validation, and the gotchas that killed my first three attempts. I ship this in my ASO tool for iOS developers (Apsity), but the approach works for any tight-constraint text-generation problem.

The Constraints That Break Generic LLMs

When you ask any LLM "generate App Store keywords for my budget app," you get something like:

```
budget tracker, expense manager, spending analysis,
money manager, personal finance, bill tracker
```

Readable. Useless. One character wasted on every separator (the space after the comma). More wasted on personal finance, because Apple auto-matches personal + finance as separate tokens and combines compound queries for you. Total wasted: roughly 30% of your 100.
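You can see the separator waste with a quick count. The strings below are the naive output from above, joined into one line:

```typescript
// Naive LLM output as a single line
const naive =
  "budget tracker, expense manager, spending analysis, " +
  "money manager, personal finance, bill tracker";

// Same keywords, packed the way the App Store field expects
const packed = naive.split(", ").join(",");

console.log(naive.length);  // 97 — nearly the whole budget for 6 compound keywords
console.log(packed.length); // 92 — 5 characters back just from dropping spaces
```

Five characters recovered before touching a single keyword, and the compound-word waste comes on top of that.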

The rules that matter:

  1. At most 100 characters total (including commas)
  2. Single comma separators, no spaces
  3. No duplicate tokens (Apple ignores them anyway)
  4. No singular+plural pairs (Apple auto-matches)
  5. Shorter tokens > compound words (Apple combines them for you)
  6. No competitor brand names (trademark rejection)
  7. No category names, and no app, free, new, best, iPhone, iPad (Apple auto-indexes all of these)
  8. Mix function + situation + alternative keywords

An LLM without these constraints spelled out won't enforce them. Generic "write keywords" prompts fail rules 1–4 consistently.

Why Claude Sonnet

I tested GPT-5, Gemini 2.0 Pro, and Claude Sonnet 4.6 on the same task. Three metrics:

  • Character compliance — stays under 100 chars without excess whitespace
  • JSON schema adherence — returns exactly the structured output I asked for
  • Edge case handling — catches duplicates, plural forms, category name leaks

Claude Sonnet won on all three, but the meaningful gap was edge case handling. When I explicitly said "no duplicates including singular/plural pairs," Claude filtered them out. The others listed budget and budgets and called it done — which is wrong, because Apple's algorithm auto-indexes plurals from the singular form anyway. A keyword duplicated across singular/plural just wastes characters.
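If you want a deterministic safety net regardless of which model you use, a basic singular/plural pre-filter is a few lines. This is my own helper sketch, not part of the article's pipeline, and the strip-trailing-s rule only handles regular English plurals:

```typescript
// Drop exact duplicates and naive singular/plural pairs before counting chars.
// Only catches regular "-s" plurals — "category/categories" slips through.
function dropPluralDupes(keywords: string[]): string[] {
  const kept = new Set<string>();
  const out: string[] = [];
  for (const raw of keywords) {
    const k = raw.toLowerCase();
    const singular = k.endsWith("s") ? k.slice(0, -1) : k;
    if (kept.has(k) || kept.has(singular) || kept.has(k + "s")) continue;
    kept.add(k);
    out.push(raw);
  }
  return out;
}

console.log(dropPluralDupes(["budget", "budgets", "bill", "Bills"]));
// ["budget", "bill"]
```

Running this before the character count means a plural slip costs you a retry, not a wasted keyword slot.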

I'm also passing a lot of context — competitor review snippets, current rankings, market-specific search trends. Sonnet 4.6's 1M-token context window handles it without trimming.

The Prompt Structure

The prompt is in three layers: system prompt (the rules), user prompt (the app context), and a JSON schema Claude must match.

```typescript
// lib/keyword-generator.ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const SYSTEM_PROMPT = `
You are an ASO (App Store Optimization) keyword specialist.
Generate keywords for the app store "Keywords" field, which has
a STRICT 100-character limit. Characters include commas.

Rules (apply in order):
1. Total output length MUST be ≤100 characters
2. Use ONLY commas as separators, no spaces after commas
3. No duplicate tokens
4. No singular+plural pairs (Apple auto-matches both)
5. Prefer short atomic tokens over compound words
   (Apple combines A + B into "A B" automatically)
6. No competitor brand names (trademark violation)
7. No category names and no words Apple already indexes automatically:
   app, free, new, best, iPhone, iPad, or any category label
8. Blend three keyword types:
   - Function (what the app does)
   - Situation (when users need it)
   - Alternative (different names for the same thing)

Return JSON with this schema:
{
  "keywords": string[],         // individual tokens, no commas inside
  "joined": string,             // comma-joined, must be ≤100 chars
  "char_count": number,         // .length of "joined"
  "coverage_notes": string[]    // which search queries this covers
}
`;

type KeywordOutput = {
  keywords: string[];
  joined: string;
  char_count: number;
  coverage_notes: string[];
};
```

The JSON schema isn't just for structure. char_count forces Claude to count the output itself — models aren't great at counting, but self-reporting adds a pass where the model checks its own work.

Generating Keywords

```typescript
async function generateKeywords(context: {
  app_name: string;
  description: string;
  competitors: string[];
  existing_keywords?: string[];
  target_market: string;
}): Promise<KeywordOutput> {
  const userPrompt = `
App: ${context.app_name}
Description: ${context.description}
Target market: ${context.target_market}
Competitor apps (do NOT use these names): ${context.competitors.join(", ")}
${context.existing_keywords ? `Currently underperforming keywords to replace: ${context.existing_keywords.join(", ")}` : ""}

Generate an optimal 100-character keyword field.
Before finalizing, count your characters and confirm it fits.
`;

  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    messages: [{ role: "user", content: userPrompt }],
  });

  const text = response.content[0].type === "text"
    ? response.content[0].text
    : "";

  const match = text.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON in response");

  return JSON.parse(match[0]) as KeywordOutput;
}
```

Straightforward Anthropic SDK call. Two things worth noting:

  1. max_tokens: 1024 — keywords are short, so we don't need more. Capping reduces cost.
  2. JSON extraction via regex — Claude sometimes wraps JSON in explanation text. Grabbing the first {...} block is more reliable than asking for raw JSON.
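A slightly hardened version of that extraction — my own helper, not the article's exact code — also strips markdown fences, which Claude sometimes adds around JSON:

```typescript
// Strip markdown fences first, then grab the outermost {...} span.
function extractJson<T>(text: string): T {
  const cleaned = text.replace(/`{3}(?:json)?/g, "");
  const start = cleaned.indexOf("{");
  const end = cleaned.lastIndexOf("}");
  if (start === -1 || end === -1 || end < start) {
    throw new Error("No JSON object found in model response");
  }
  return JSON.parse(cleaned.slice(start, end + 1)) as T;
}

// Simulated model reply wrapped in prose and a ```json fence
const fence = "`".repeat(3);
const reply = `Here you go:\n${fence}json\n{"joined":"budget,bills","char_count":12}\n${fence}`;
console.log(extractJson<{ joined: string }>(reply).joined); // "budget,bills"
```

Failing loudly here beats passing a half-parsed object into validation.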

Validation Is Where Production Code Lives

Claude gets the constraints right ~85% of the time. Production code has to handle the other 15%.

```typescript
// lib/validate-keywords.ts
import { z } from "zod";

const KeywordSchema = z.object({
  keywords: z.array(z.string()),
  joined: z.string(),
  char_count: z.number(),
  coverage_notes: z.array(z.string()),
});

type KeywordOutput = z.infer<typeof KeywordSchema>;

export function validateKeywords(output: unknown): {
  ok: boolean;
  issues: string[];
  data?: KeywordOutput;
} {
  const parsed = KeywordSchema.safeParse(output);
  if (!parsed.success) {
    return { ok: false, issues: ["invalid JSON shape"] };
  }

  const issues: string[] = [];
  const { keywords, joined, char_count } = parsed.data;

  // 1. Length check
  if (joined.length > 100) {
    issues.push(`joined is ${joined.length} chars, exceeds 100`);
  }

  // 2. Trust but verify char_count
  if (joined.length !== char_count) {
    issues.push(`char_count mismatch: claimed ${char_count}, actual ${joined.length}`);
  }

  // 3. Commas only, no spaces
  if (joined.includes(", ")) {
    issues.push("contains ', ' — spaces after commas waste characters");
  }

  // 4. Reconstruct and compare
  const reconstructed = keywords.join(",");
  if (reconstructed !== joined) {
    issues.push("keywords array doesn't match joined string");
  }

  // 5. Duplicate detection (case-insensitive)
  const seen = new Set<string>();
  for (const k of keywords) {
    const lower = k.toLowerCase();
    if (seen.has(lower)) {
      issues.push(`duplicate token: ${k}`);
    }
    seen.add(lower);
  }

  // 6. Singular/plural detection (basic: only catches "x"/"xs" pairs)
  for (const k of keywords) {
    const plural = k.toLowerCase() + "s";
    if (seen.has(plural)) {
      issues.push(`singular/plural pair: ${k} / ${k}s`);
    }
  }

  return { ok: issues.length === 0, issues, data: parsed.data };
}
```

When validation fails, I retry with the specific issue appended to the prompt:

```typescript
type KeywordContext = Parameters<typeof generateKeywords>[0];

async function generateWithRetry(
  context: KeywordContext,
  attempt = 1,
): Promise<KeywordOutput> {
  if (attempt > 3) throw new Error("Failed after 3 attempts");

  const result = await generateKeywords(context);
  const check = validateKeywords(result);

  if (check.ok) return check.data!;

  // Feed the failed output and the specific issues back for a targeted retry
  return generateWithRetry(
    {
      ...context,
      existing_keywords: result.keywords,
      description: `${context.description}\n\nPrevious attempt failed validation: ${check.issues.join("; ")}`,
    },
    attempt + 1,
  );
}
```

In practice, 94% succeed on the first attempt, 5% on the second, 1% fall through (usually when the concept genuinely can't fit in 100 chars — time to simplify the app description, not the prompt).
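For that last 1%, a deterministic trim is a reasonable fallback before giving up entirely. This is a sketch of my own, not part of the pipeline above, and it assumes earlier keywords matter more — which the prompt doesn't actually guarantee:

```typescript
// Last-resort fallback: keep keywords in order, stop at the first one
// that would push the comma-joined string past the limit.
function trimToLimit(keywords: string[], limit = 100): string {
  let joined = "";
  for (const k of keywords) {
    const next = joined ? `${joined},${k}` : k;
    if (next.length > limit) break; // break, not continue: preserves order
    joined = next;
  }
  return joined;
}

console.log(trimToLimit(["budget", "expense", "payday", "wallet"], 20));
// "budget,expense" — adding ",payday" would hit 21 chars
```

Using `continue` instead of `break` would pack in shorter later tokens, at the cost of reordering by length rather than importance.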

The Output Nobody Asks For But Everyone Needs

The coverage_notes field in the schema looks optional. It's the most useful part.

```json
{
  "keywords": ["budget","expense","payday","wallet","debt","bills","money","savings"],
  "joined": "budget,expense,payday,wallet,debt,bills,money,savings",
  "char_count": 53,
  "coverage_notes": [
    "Matches: 'budget', 'expense tracker', 'payday planner', 'wallet app'",
    "Covers 'money management' via money + bills combo",
    "Skipped 'finance' because it's the category — App Store auto-indexes that",
    "Skipped 'mint' (Mint.com trademark)"
  ]
}
```

Now the app developer can audit why each keyword was picked. When someone asks "why isn't my app showing up for X?" you have a record. Without coverage_notes, the output is a black box.

Prompt Failures I Hit Along the Way

Attempt 1: "Generate 15-20 keywords under 100 characters." Result: the model wrote a nice list, counted wrong, and delivered 112 characters. No self-verification step.

Attempt 2: Added "Do not exceed 100 characters" — model now refused to output more than 10 keywords to stay safe. Under-coverage.

Attempt 3: JSON schema with char_count field. Model started counting. Characters dropped into range but duplicates appeared.

Attempt 4 (shipped): Enumerated every rule with "apply in order," asked for coverage_notes to force reasoning, and added validation with retry.

Each failure mode came from underspecifying the rules. The LLM isn't "wrong" — it's doing exactly what the prompt asked. Getting production-grade output means writing the prompt like a spec, not a request.

Where This Lives Now

I packaged this into Apsity's AI Growth Agent — it runs on every keyword field update across the apps it tracks, compares against real-time search rankings, and flags underperforming tokens for replacement. Free tier covers 1 app and 5 keywords if you want to poke at it.

More importantly, the pattern generalizes. Any time you have "generate text inside tight constraints" — tweet drafts with character limits, SMS messages, ad headlines, product names — the structure is the same:

  1. Enumerate every constraint as a numbered rule
  2. Force a JSON schema with self-reported metrics
  3. Ask for a reasoning field so you can audit
  4. Validate in code, feed failures back for retry
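In TypeScript the whole pattern collapses to one generic loop. The names here are hypothetical — adapt the types to your own domain:

```typescript
// A validator returns a list of issues; empty means the candidate passed.
type Validator<T> = (candidate: T) => string[];

// Generic generate → validate → retry loop behind the keyword pipeline pattern.
async function generateConstrained<T>(
  generate: (feedback: string[]) => Promise<T>,
  validate: Validator<T>,
  maxAttempts = 3,
): Promise<T> {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = await generate(feedback);
    const issues = validate(candidate);
    if (issues.length === 0) return candidate;
    feedback = issues; // feed the specific failures into the next prompt
  }
  throw new Error(`No valid output after ${maxAttempts} attempts`);
}
```

Swap in a tweet generator and a 280-character validator and nothing else changes.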

Writing the spec as a prompt beats writing it as docs — because you can actually run it.


Originally written for GoCodeLab. Deeper writeups on building indie SaaS with Claude are in the Lazy Developer series.
