The App Store keyword field is exactly 100 characters. Commas only, no spaces, no duplicates. You need to pack 15–20 keywords inside.
I tried writing those by hand for a dozen apps. Every time I'd leave characters on the table — a rogue space after a comma, a singular/plural duplicate Apple would auto-match anyway. Manual packing is tedious enough that most indie developers just don't iterate on ASO.
So I built an AI that does it. This post is the actual implementation — prompts, JSON schemas, validation, and the gotchas that killed my first three attempts. I ship this in my ASO tool for iOS developers (Apsity), but the approach works for any tight-constraint text-generation problem.
The Constraints That Break Generic LLMs
When you ask any LLM "generate App Store keywords for my budget app," you get something like:
```
budget tracker, expense manager, spending analysis,
money manager, personal finance, bill tracker
```
Readable. Useless. One character wasted after every comma (the space). More wasted on personal finance, since Apple auto-matches personal + finance as separate tokens anyway. Total wasted: roughly 30% of your 100.
The rules that matter:
- At most 100 characters total (including commas)
- Single comma separators, no spaces
- No duplicate tokens (Apple ignores them anyway)
- No singular+plural pairs (Apple auto-matches)
- Shorter tokens > compound words (Apple combines them for you)
- No competitor brand names (trademark rejection)
- No category names, and no app,free,new,best,iPhone,iPad (Apple auto-indexes all of these)
- Mix function + situation + alternative keywords
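Rules 1–4 are mechanical enough to enforce in code before any model gets involved. A minimal sketch of what that looks like (`packKeywords` is a hypothetical helper, not part of the tool):

```typescript
// Hypothetical helper (not from the post's codebase): enforce rules 1-4
// deterministically on a naive keyword string.
function packKeywords(raw: string, limit = 100): string {
  const seen = new Set<string>();
  const tokens: string[] = [];
  // Split on commas AND whitespace: compounds become atomic tokens (rule 5)
  for (const t of raw.split(/[\s,]+/).map((s) => s.toLowerCase())) {
    if (!t || seen.has(t)) continue; // rule 3: no duplicate tokens
    // rule 4: skip naive singular/plural pairs
    if (seen.has(t + "s") || (t.endsWith("s") && seen.has(t.slice(0, -1)))) continue;
    seen.add(t);
    tokens.push(t);
  }
  // rules 1-2: greedy packing with bare commas, never exceeding the limit
  const packed: string[] = [];
  let length = 0;
  for (const t of tokens) {
    const cost = t.length + (packed.length > 0 ? 1 : 0);
    if (length + cost > limit) continue;
    packed.push(t);
    length += cost;
  }
  return packed.join(",");
}
```

Deterministic code can enforce the hard constraints; what it can't do is pick good keywords, which is the part the model is for.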
An LLM without these constraints spelled out won't enforce them. Generic "write keywords" prompts fail rules 1–4 consistently.
Why Claude Sonnet
I tested GPT-5, Gemini 2.0 Pro, and Claude Sonnet 4.6 on the same task. Three metrics:
- Character compliance — stays under 100 chars without excess whitespace
- JSON schema adherence — returns exactly the structured output I asked for
- Edge case handling — catches duplicates, plural forms, category name leaks
Claude Sonnet won on all three, but the meaningful gap was edge case handling. When I explicitly said "no duplicates including singular/plural pairs," Claude filtered them out. The others listed budget and budgets and called it done — which is wrong, because Apple's algorithm auto-indexes plurals from the singular form anyway. A keyword duplicated across singular/plural just wastes characters.
I'm also passing a lot of context — competitor review snippets, current rankings, market-specific search trends. Sonnet 4.6's 1M-token context window handles it without trimming.
The Prompt Structure
The prompt is in three layers: system prompt (the rules), user prompt (the app context), and a JSON schema Claude must match.
```ts
// lib/keyword-generator.ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const SYSTEM_PROMPT = `
You are an ASO (App Store Optimization) keyword specialist.
Generate keywords for the app store "Keywords" field, which has
a STRICT 100-character limit. Characters include commas.

Rules (apply in order):
1. Total output length MUST be ≤100 characters
2. Use ONLY commas as separators, no spaces after commas
3. No duplicate tokens
4. No singular+plural pairs (Apple auto-matches both)
5. Prefer short atomic tokens over compound words
   (Apple combines A + B into "A B" automatically)
6. No competitor brand names (trademark violation)
7. No category names and no words Apple already indexes automatically:
   app, free, new, best, iPhone, iPad, or any category label
8. Blend three keyword types:
   - Function (what the app does)
   - Situation (when users need it)
   - Alternative (different names for the same thing)

Return JSON with this schema:
{
  "keywords": string[],      // individual tokens, no commas inside
  "joined": string,          // comma-joined, must be ≤100 chars
  "char_count": number,      // .length of "joined"
  "coverage_notes": string[] // which search queries this covers
}
`;

type KeywordOutput = {
  keywords: string[];
  joined: string;
  char_count: number;
  coverage_notes: string[];
};
```
The JSON schema isn't just for structure. char_count forces Claude to count the output itself — models aren't great at counting, but self-reporting forces a pass where the model checks its own work.
Generating Keywords
```ts
async function generateKeywords(context: {
  app_name: string;
  description: string;
  competitors: string[];
  existing_keywords?: string[];
  target_market: string;
}): Promise<KeywordOutput> {
  const userPrompt = `
App: ${context.app_name}
Description: ${context.description}
Target market: ${context.target_market}
Competitor apps (do NOT use these names): ${context.competitors.join(", ")}
${context.existing_keywords ? `Currently underperforming keywords to replace: ${context.existing_keywords.join(", ")}` : ""}

Generate an optimal 100-character keyword field.
Before finalizing, count your characters and confirm it fits.
`;

  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    messages: [{ role: "user", content: userPrompt }],
  });

  const text = response.content[0].type === "text"
    ? response.content[0].text
    : "";

  const match = text.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON in response");

  return JSON.parse(match[0]) as KeywordOutput;
}
```
Straightforward Anthropic SDK call. Two things worth noting:
- max_tokens: 1024 — keywords are short, so we don't need more. Capping reduces cost.
- JSON extraction via regex — Claude sometimes wraps JSON in explanation text. Grabbing the first {...} block is more reliable than asking for raw JSON.
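A slightly more defensive version of that extraction also strips the markdown fences models sometimes add around JSON. A sketch, with `extractJson` as a hypothetical helper name:

```typescript
// Hypothetical helper: pull the first JSON object out of a model reply,
// tolerating markdown code fences and surrounding prose. Assumes at most
// one JSON object in the text (the regex is greedy).
function extractJson<T>(text: string): T {
  const unfenced = text.replace(/`{3}(?:json)?/g, ""); // drop fence markers
  const match = unfenced.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON object found in response");
  return JSON.parse(match[0]) as T;
}
```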
Validation Is Where Production Code Lives
Claude gets the constraints right ~85% of the time. Production code has to handle the other 15%.
```ts
// lib/validate-keywords.ts
import { z } from "zod";

const KeywordSchema = z.object({
  keywords: z.array(z.string()),
  joined: z.string(),
  char_count: z.number(),
  coverage_notes: z.array(z.string()),
});

type KeywordOutput = z.infer<typeof KeywordSchema>;

export function validateKeywords(output: unknown): {
  ok: boolean;
  issues: string[];
  data?: KeywordOutput;
} {
  const parsed = KeywordSchema.safeParse(output);
  if (!parsed.success) {
    return { ok: false, issues: ["invalid JSON shape"] };
  }

  const issues: string[] = [];
  const { keywords, joined, char_count } = parsed.data;

  // 1. Length check
  if (joined.length > 100) {
    issues.push(`joined is ${joined.length} chars, exceeds 100`);
  }

  // 2. Trust but verify char_count
  if (joined.length !== char_count) {
    issues.push(`char_count mismatch: claimed ${char_count}, actual ${joined.length}`);
  }

  // 3. Commas only, no spaces
  if (joined.includes(", ")) {
    issues.push("contains ', ' — spaces after commas waste characters");
  }

  // 4. Reconstruct and compare
  const reconstructed = keywords.join(",");
  if (reconstructed !== joined) {
    issues.push("keywords array doesn't match joined string");
  }

  // 5. Duplicate detection (case-insensitive)
  const seen = new Set<string>();
  for (const k of keywords) {
    const lower = k.toLowerCase();
    if (seen.has(lower)) {
      issues.push(`duplicate token: ${k}`);
    }
    seen.add(lower);
  }

  // 6. Singular/plural detection (basic: naive "s" pluralization only)
  for (const k of keywords) {
    const lower = k.toLowerCase();
    if (!lower.endsWith("s") && seen.has(lower + "s")) {
      issues.push(`singular/plural pair: ${k} / ${k}s`);
    }
  }

  return { ok: issues.length === 0, issues, data: parsed.data };
}
```
When validation fails, I retry with the specific issue appended to the prompt:
```ts
// Derive the context type from generateKeywords so the two stay in sync
type KeywordContext = Parameters<typeof generateKeywords>[0];

async function generateWithRetry(
  context: KeywordContext,
  attempt = 1,
): Promise<KeywordOutput> {
  if (attempt > 3) throw new Error("Failed after 3 attempts");

  const result = await generateKeywords(context);
  const check = validateKeywords(result);
  if (check.ok) return check.data!;

  // Feed issues back to Claude for a targeted retry
  return generateWithRetry(
    {
      ...context,
      existing_keywords: result.keywords,
      // Add validation issues into a correction prompt here
    },
    attempt + 1,
  );
}
```
In practice, 94% succeed on the first attempt, 5% on the second, 1% fall through (usually when the concept genuinely can't fit in 100 chars — time to simplify the app description, not the prompt).
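For that last 1%, a deterministic fallback can still ship something valid: treat the model's keyword ordering as a priority ranking and drop tokens from the tail until the field fits. A sketch (`trimToFit` is hypothetical, not part of the shipped flow):

```typescript
// Hypothetical fallback: treat the model's keyword order as priority and
// drop trailing tokens until the joined string fits the character budget.
function trimToFit(keywords: string[], limit = 100): string[] {
  const kept = [...keywords];
  while (kept.length > 0 && kept.join(",").length > limit) {
    kept.pop(); // lowest-priority token goes first
  }
  return kept;
}
```

You lose coverage, but you never ship an over-limit field.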
The Output Nobody Asks For But Everyone Needs
The coverage_notes field in the schema looks optional. It's the most useful part.
```json
{
  "keywords": ["budget","expense","payday","wallet","debt","bills","money","savings"],
  "joined": "budget,expense,payday,wallet,debt,bills,money,savings",
  "char_count": 53,
  "coverage_notes": [
    "Matches: 'budget', 'expense tracker', 'payday planner', 'wallet app'",
    "Covers 'money management' via money + bills combo",
    "Skipped 'finance' because it's the category — App Store auto-indexes that",
    "Skipped 'mint' (Mint.com trademark)"
  ]
}
```
Now the app developer can audit why each keyword was picked. When someone asks "why isn't my app showing up for X?" you have a record. Without coverage_notes, the output is a black box.
Prompt Failures I Hit Along the Way
Attempt 1: "Generate 15-20 keywords under 100 characters." Result: the model wrote a nice list, counted wrong, and delivered 112 characters. No self-verification step.
Attempt 2: Added "Do not exceed 100 characters" — model now refused to output more than 10 keywords to stay safe. Under-coverage.
Attempt 3: JSON schema with char_count field. Model started counting. Characters dropped into range but duplicates appeared.
Attempt 4 (shipped): Enumerated every rule with "apply in order," asked for coverage_notes to force reasoning, and added validation with retry.
Each failure mode came from underspecifying the rules. The LLM isn't "wrong" — it's doing exactly what the prompt asked. Getting production-grade output means writing the prompt like a spec, not a request.
Where This Lives Now
I packaged this into Apsity's AI Growth Agent — it runs on every keyword field update across the apps it tracks, compares against real-time search rankings, and flags underperforming tokens for replacement. Free tier covers 1 app and 5 keywords if you want to poke at it.
More importantly, the pattern generalizes. Any time you have "generate text inside tight constraints" — tweet drafts with character limits, SMS messages, ad headlines, product names — the structure is the same:
- Enumerate every constraint as a numbered rule
- Force a JSON schema with self-reported metrics
- Ask for a reasoning field so you can audit
- Validate in code, feed failures back for retry
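To make step 4 concrete in another domain: the same self-report-then-verify shape for, say, a 160-character SMS draft looks nearly identical (hypothetical sketch, names are illustrative):

```typescript
// Hypothetical: the same self-report + verify pattern for an SMS draft.
type SmsOutput = { text: string; char_count: number };

function validateSms(out: SmsOutput, limit = 160): string[] {
  const issues: string[] = [];
  if (out.text.length > limit) {
    issues.push(`text is ${out.text.length} chars, exceeds ${limit}`);
  }
  if (out.text.length !== out.char_count) {
    issues.push(`char_count mismatch: claimed ${out.char_count}, actual ${out.text.length}`);
  }
  return issues; // feed non-empty issues back into the retry prompt
}
```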
Writing the spec as a prompt beats writing it as docs — because you can actually run it.
Originally written for GoCodeLab. Deeper writeups on building indie SaaS with Claude are in the Lazy Developer series.