Truong Phung

🤖 Building Social Games with AI — The Practitioner's Guide 📖

A comprehensive, opinionated, actionable guide for using AI to build, ship, and operate social games in the lineage covered by 🌾 The Social Games Playbook 🎮 — Stardew Valley, Township, Pixels.xyz, FarmVille 3, Dragon City, Core Keeper, etc.

Read this after the main playbook. The playbook tells you what to build (the 14 pillars, the daily loop, the economy). This document tells you how to use AI to build it 5–10× faster, ship more content, and operate it intelligently — without burning yourself on legal landmines, hallucinated systems, or "AI slop" that players sniff out in 30 seconds.

Distilled from current (2025–2026) tooling: Claude Code, Cursor, Unity/Godot MCP, PixelLab, Cascadeur, Inworld, Convai, Suno/Udio/ElevenLabs, ToxMod, Kumo, EA's RL playtesting, GDC 2026 sessions, Steam's January 2026 AI policy rewrite, and shipped-game case studies.

If you only read three sections: §3 The Three AI Layers, §5 The 14 Use Cases (Ranked by ROI), and §23 The 90-Day AI Adoption Plan.


📋 Table of Contents

  1. 🎯 Who This Guide Is For
  2. ⚡ The 30-Second Mental Model
  3. 🧱 The Three AI Layers — Dev-Time, Ship-Time, Ops-Time
  4. 🧠 First Principles — When AI Actually Wins
  5. 🏆 The 14 Use Cases, Ranked by ROI
  6. 💻 AI for Code — The Coding Loop
  7. 🎨 AI for Visual Assets — Pixel, Sprites, UI, Concept
  8. 🕺 AI for Animation
  9. 🎵 AI for Music, SFX, and Voice
  10. 📜 AI for Narrative, Quests, Items, Lore
  11. 🗣️ Live LLM NPCs — The Danger Zone
  12. 🧬 AI Procedural Content Generation
  13. 🌍 AI for Localization
  14. 🤖 AI Playtest Bots & Economy Simulation
  15. 📊 AI for Live Ops — Churn, Segments, Personalization
  16. 🛡️ AI for Moderation — Text, Voice, Image, UGC
  17. 📣 AI for UA Creative & Marketing
  18. 💬 AI for Community & Player Support
  19. 💸 The AI Cost Stack — What an Indie Studio Actually Spends
  20. 🤝 The Hybrid Pipeline — Where Humans Stay in the Loop
  21. ⚖️ Legal, Policy, and Platform Compliance
  22. ⚠️ The Anti-Patterns — How AI Sinks Social Games
  23. 🗺️ The 90-Day AI Adoption Plan
  24. 🌱 The Greenfield AI-Native Build Plan
  25. 📋 Cheat Sheet & Tool Stack

1. 🎯 Who This Guide Is For

You are one of:

  • Solo or small-team indie dev (1–5 people) building a cozy/farm/sim/sandbox game and competing with studios that have 30× your headcount.
  • Live-ops studio operator running a Township/FarmVille-class game who needs to ship a seasonal event every 2–4 weeks without burning out the team.
  • Web3 / crypto-native team (Pixels, Sunflower Land class) where economy balance, anti-bot, and content velocity are existential.
  • CTO / lead at a 10–50-person studio deciding which AI bets to make in the next 6 months without committing to dead-end tooling.

If you're an AAA studio with a 200-person content pipeline, this guide is still useful, but the cost calculations are not your bottleneck — your bottleneck is org change.

This guide assumes you have read the main 🌾 The Social Games Playbook 🎮. All references to "the daily loop," "the 14 pillars," "faucets and sinks," etc. point back there.


2. ⚡ The 30-Second Mental Model

                        ┌──────────────────────────────────────┐
                        │  AI is a force-multiplier on a       │
                        │  CORRECT design. It does not invent  │
                        │  the design for you.                 │
                        └──────────────────────────────────────┘
                                          │
        ┌─────────────────────────────────┼─────────────────────────────────┐
        ▼                                 ▼                                 ▼
┌──────────────────┐           ┌──────────────────────┐         ┌─────────────────────┐
│  DEV-TIME AI     │           │   SHIP-TIME AI       │         │   OPS-TIME AI       │
│  (build faster)  │           │   (in the binary)    │         │   (run smarter)     │
│                  │           │                      │         │                     │
│ • Code gen       │           │ • Generated assets   │         │ • Churn prediction  │
│ • Asset gen      │           │ • Live LLM NPCs      │         │ • Personalization   │
│ • Playtest bots  │           │ • PCG quests/loot    │         │ • Moderation        │
│ • Localization   │           │ • Adaptive difficulty│         │ • UA creative       │
│ • QA / linting   │           │                      │         │ • Player support    │
└──────────────────┘           └──────────────────────┘         └─────────────────────┘
   HIGH ROI, LOW RISK             MEDIUM ROI, HIGH RISK            HIGH ROI, MEDIUM RISK
   Use it everywhere              Use it carefully                 Use it as you scale

The single most important insight: dev-time AI compounds without risk. Ship-time AI compounds with risk (legal, quality, immersion-breaking). Ops-time AI compounds with operational complexity. Adopt in that order. Most failures come from teams doing the reverse.


3. 🧱 The Three AI Layers

3.1 Dev-Time AI — the binary doesn't know AI was used

| Tool category | Examples | What it replaces | Risk |
|---|---|---|---|
| Coding agents | Claude Code, Cursor, Copilot, Windsurf | Engineer hours | Low |
| Engine MCP bridges | Unity-MCP, Godot AI, Unreal MCP | Manual scene/asset wiring | Low |
| Asset generators | PixelLab, Sprite-AI, Cascadeur, Suno, ElevenLabs | Outsourcing, asset packs, junior artist | Med |
| Playtest bots | RL agents, generative ABM, Chaos Dynamics | Internal QA passes | Low |
| Linters / reviewers | Claude review skill, security-review skill | Senior eng review time | Low |

Steam's January 2026 policy rewrite explicitly exempts dev tools (e.g., Copilot, Claude Code). They don't need disclosure. Embrace this layer fully.

3.2 Ship-Time AI — the binary contains AI artifacts or invokes AI at runtime

| Sub-layer | Examples | Risk |
|---|---|---|
| Pre-generated assets | AI sprite art, AI music shipped in build | IP / copyright / disclosure |
| Server-side PCG | LLM-generated quest text, item names, dialogue | Hallucination, drift, exploit |
| Live LLM NPCs | Inworld, Convai, on-device ACE | Latency, jailbreak, cost, immersion |
| Adaptive difficulty | RL-driven enemy or pricing tuning | Manipulation perception |

This is the layer where Steam, Apple, Google, and EU AI Act compliance live. Treat every shipped artifact as a future legal exhibit.

3.3 Ops-Time AI — the binary is unaware; AI runs alongside

| Function | Examples | What it replaces |
|---|---|---|
| Churn prediction | GNN models (Kumo), in-house XGBoost | Guesswork on retention spend |
| Segmentation | LLM clustering of player behavior | Country/level static segments |
| Live ops orchestration | AI agents scheduling events / battle pass tiers | Producer hours |
| Moderation | ToxMod (voice), Hive (image), Perspective (text) | Outsourced mod farms |
| Support | RAG bots over patch notes / FAQ | T1 customer support tickets |
| UA creative | Sora 2, Veo 3, Higgsfield, AdCreative | Video editor / motion designer hours |

Industry signal (2026 Unity Game Development Report): 95% of studios use AI in core workflows; 62% specifically use AI agents for backend and coding. If you don't, you're already behind on cost-per-feature.


4. 🧠 First Principles

Before any tool, internalize these.

4.1 The four properties of social games that AI is exceptionally good at

  1. High-volume, low-stakes content. Crop names, item descriptions, NPC small-talk, quest variants, festival flavor text. Social games eat content like termites.
  2. Repeated structural variations. A barn, a coop, a stable, a pen — same shape, different theme. Sprite generators love this.
  3. Long-tail economy decisions. 400 items × 6 currencies × 30 levels = a balance problem humans cannot brute-force. Simulation + RL can.
  4. Behavioral pattern detection at scale. Churn signatures, bot detection, exploiters, whales-about-to-leave — classic ML wins.

4.2 The four properties social games have that AI is bad at

  1. Tone consistency across thousands of strings. AI drifts. Without a style bible and review pass, your wholesome cozy game starts sounding like a Marvel quip.
  2. Mechanical correctness. AI happily writes "you gain 5 turnips per harvest" when the spec says 3. Numbers must be schema-validated, not prose-validated.
  3. Long-arc narrative payoff. Foreshadowing across 40 hours of play. AI cannot hold this without a human story bible and tight retrieval.
  4. The "warm" feeling. Stardew Valley sold 41M copies because Eric Barone wrote every line. Players read sincerity. AI-written cozy dialogue often reads as polite-but-empty.

The synthesis: use AI for volume and variation, use humans for voice, payoff, and the 100 hero strings the player remembers.

4.3 The "hero string" rule

Every cozy/social game has roughly 50–200 hero strings — first NPC line, marriage proposals, festival speeches, achievement unlocks, the loading-screen tip that becomes a meme. A human writes all of these. AI writes the surrounding 5,000 strings of barn-flavor and crop-tooltips.

If the player would screenshot the line: human-written.
If the player would skim past it: AI-acceptable.


5. ๐Ÿ† The 14 Use Cases, Ranked by ROI

Ranked for a small social-games studio (5–20 people). ROI = time saved per dollar spent, weighted for risk.

| # | Use case | ROI | Risk | Adopt by | Notes |
|---|---|---|---|---|---|
| 1 | Code generation (Claude Code/Cursor) | ⭐⭐⭐⭐⭐ | Low | Day 1 | 30–60% throughput gain on backend/tools. No-brainer. |
| 2 | Localization (hybrid AI+linguist) | ⭐⭐⭐⭐⭐ | Low | Pre-launch | 70–90% cost cut vs traditional LSP for first pass. |
| 3 | UA creative iteration (post-launch) | ⭐⭐⭐⭐⭐ | Low | Soft launch | TikTok needs 20–40 creatives/month; AI is the only way. |
| 4 | Pixel art / sprite generation | ⭐⭐⭐⭐ | Med | Pre-prod | Concepting: fantastic. Final assets: human polish required. |
| 5 | Churn prediction & personalization | ⭐⭐⭐⭐ | Med | 100k MAU+ | Below scale, your gut is fine. Above, GNN models pay back. |
| 6 | Voice moderation (ToxMod-class) | ⭐⭐⭐⭐ | Low | Voice chat | If you ship voice chat and skip this, you're negligent. |
| 7 | Music generation (Suno/Udio/ElevenLabs) | ⭐⭐⭐⭐ | Med | Pre-prod | Background loops great; hero theme = human composer. |
| 8 | Procedural quests / item names | ⭐⭐⭐ | Med | Mid-prod | Server-side, schema-constrained, human-reviewed. |
| 9 | Playtest bots / economy simulation | ⭐⭐⭐ | Low | Beta | Catches dead content & exploits before humans do. |
| 10 | Animation (Cascadeur, sprite-sheet AI) | ⭐⭐⭐ | Med | Mid-prod | Inbetweening + retargeting wins big; full mocap still better. |
| 11 | Player support RAG bot | ⭐⭐⭐ | Low | Live | Cuts T1 ticket volume 40–70% with patch notes + FAQ corpus. |
| 12 | Concept art & marketing key art | ⭐⭐ | Med | Anytime | Internal mood-boards: ✅. Final marketing: human-touched. |
| 13 | Live LLM NPCs (in-game runtime) | ⭐⭐ | High | Late or never | Cool demo, hard product. Read §11 before believing a vendor. |
| 14 | Voice acting (synthesis / cloning) | ⭐ | High | Carefully | Union/legal/contract minefield. Do not clone real actors. |

Order of adoption: start at row 1 and work down. Don't skip ahead to row 13 because it's exciting on Twitter.


6. 💻 AI for Code

The single biggest lever. A solo dev with Claude Code can ship the backend a 4-person team shipped two years ago.

6.1 The stack

| Tool | Best for | Cost (May 2026) |
|---|---|---|
| Claude Code | Long-running agentic refactors, codebase-aware multi-file edits | ~$20/mo Pro, $200/mo Max |
| Cursor | IDE-native pair programming, fast in-line edits | $20/mo |
| Copilot | Inline completion in any IDE | $10/mo |
| Windsurf | Cursor competitor, strong agent mode | $15/mo |
| Claude Code Game Studios skill pack | Pre-built workflows: sprint plans, code review, asset audits, release checklists across Unity/Unreal/Godot | Free, OSS |

Most pros run both: Claude Code (or Cursor) as the agent, plus Copilot for inline completions. The latency profiles are different — agents for big work, completion for typing.

6.2 MCP — the unlock for engine work

Model Context Protocol bridges let your AI assistant operate the engine itself: create scenes, edit prefabs, run play tests, inspect logs.

  • Unity MCP (CoplayDev/unity-mcp) — Unity Editor exposed to Claude/Cursor.
  • Godot AI — same idea for Godot.
  • Unreal MCP — exists but rougher; Unreal's Blueprint serialization is a pain point.

With MCP, "add a new crop type and wire it through" becomes a single conversation, not a 40-tab refactor. Set this up week 1.

6.3 Folder-level AI hygiene

Add a CLAUDE.md (or .cursorrules, or AGENTS.md) at repo root. The example in this very repo at CLAUDE.md is a template. It must contain:

  1. Architecture diagram (services + data flow).
  2. Folder map (what lives where).
  3. Conventions per language (error wrapping, test style, lint config).
  4. The "common pitfalls" list specific to your repo (e.g., "never call Python service from frontend").
  5. Build/test/lint commands the agent should run after edits.

Without this, the agent invents conventions. With it, the agent is a 3-day-onboarded mid-level engineer on day 1.

6.4 Claude Code conventions for game dev

  • Use skills for repeatable workflows: /migrate, /lint, /build, /test, /review, /security-review (this repo already has them — see the available skills list).
  • Use subagents to parallelize independent searches (e.g., "find all spawner code" + "find all loot drop code" in parallel).
  • For balance work, never let the agent freehand numbers. Have it read a balance.yaml schema, propose changes, then run the simulation harness.
  • Keep golden replays: deterministic save files the agent runs after every refactor to catch behavioral drift.
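The "never freehand numbers" rule can be sketched as a tiny clamp harness that every agent proposal must pass through. The curve shape (50 + 25 × level) and the ±20% band here are invented placeholders, not values from any real balance.yaml:

```python
# Illustrative balance gate: the agent proposes numbers, the harness
# clamps them to the curve. All numbers are made-up placeholders.

def reward_bounds(level: int) -> tuple[int, int]:
    """Allowed gold-reward band for a quest at the given player level."""
    base = 50 + 25 * level
    return base * 4 // 5, base * 6 // 5  # integer +/-20% band

def clamp_proposal(level: int, proposed_gold: int) -> int:
    """Clamp an agent-proposed reward into the allowed band."""
    lo, hi = reward_bounds(level)
    return max(lo, min(hi, proposed_gold))
```

In practice the bounds are read from balance.yaml, and any proposal that needed clamping is flagged for human review rather than silently accepted.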

6.5 What AI coding cannot do (yet)

  • Multi-day game-feel tuning. The AI doesn't play the game.
  • Networking / netcode under load. It writes plausible code that breaks at p99.
  • Shader / GPU perf optimization beyond template patterns.
  • Anti-cheat. Adversarial reasoning needs a human security mindset.

For these, AI is your typist, not your architect.


7. 🎨 AI for Visual Assets

7.1 The pixel-art pipeline (cozy / farm / sim genre)

| Stage | Tool | Output |
|---|---|---|
| Mood board | Midjourney, Flux, Ideogram | Style references |
| Concept art | Midjourney + ControlNet, NanoBanana | Character / building concepts |
| Pixel sprites | PixelLab | Game-ready sprites with 4/8 directions |
| Sprite sheets | Sprite-AI, God Mode | Idle / walk / attack / hit-flash batches |
| UI icons | Recraft, Sprite-AI, custom Flux LoRA | Crop icons, currency, buttons |
| Tilesets | PixelLab tileset mode, hand-tiled in Aseprite | 16/32px tiles |
| Final polish | Aseprite (human) | Production assets |

The non-negotiable: every sprite that ships gets a human pass in Aseprite. AI sprite tools in 2026 are good enough to generate, not good enough to finalize. Anti-aliasing, palette discipline, and the 1-pixel decisions that separate "indie polish" from "asset flip" still need human eyes.

7.2 The "asset-flip detector" players run on you

Players in cozy/farming Discords have an instinct for AI slop. Common giveaways:

  • Inconsistent palette across sprites (each generation drifted).
  • 6-fingered crop holders in NPC portraits.
  • Tile seams that don't tile (the AI didn't understand wrap-around).
  • Outline weight inconsistency (1px on some sprites, 2px on others).
  • Character portrait "AI gloss" — the soft, slightly-airbrushed look from Flux/SDXL.

Fix all of these in the human-polish pass. If you can't, ship fewer assets โ€” quality > quantity in this genre, always.

7.3 LoRA / fine-tune your own style

Once you have ~50 hand-drawn assets in the game's style, train a LoRA (on Flux or SDXL) and use it as the default generator for everything else. This is how you keep palette discipline at scale. Cost: ~$5–20 to train on Replicate/Civitai.

7.4 Concept-to-sprite prompt template

A 32x32 pixel-art [SUBJECT], [POSE], facing [DIRECTION],
[N]-color limited palette: [HEX1, HEX2, ...],
1px black outline, no anti-aliasing, transparent background,
matches reference style of [GAME or LoRA name].
4 directional variants: down, up, left, right.

Iterate on the palette and pose; freeze the rest of the prompt as your house style.

7.5 What you should NOT use AI for, in this genre

  • The main character's portrait. Players look at this 1,000 times. Pay a human.
  • Marriage candidates' art (in dating-sim adjacent games). Same reason.
  • Logo / wordmark. Trademark lawyers will not accept "the AI made it."
  • Marketing key art for store listing. Steam, App Store, and Google Play all increasingly scrutinize AI key art and several have rejected listings in 2025–2026.

8. 🕺 AI for Animation

8.1 2D / pixel animation

  • God Mode and Sprite-AI generate idle/walk/attack/hit sprite sheets from a single base sprite. Quality: usable for prototyping; needs human cleanup for shipping.
  • Ludo.ai sprite generator includes animation modes for indie/commercial games.
  • Cascadeur 2026 added an AI Root Motion tool for motion style transfer — useful even for 2D devs who animate skeletal rigs.

For shipping pixel animations, the realistic 2026 workflow is:

  1. AI generates the sprite-sheet skeleton (poses).
  2. Human does the inbetween cleanup and timing in Aseprite.
  3. AI is not trusted for the 8-frame walk cycle on the main character.

8.2 3D / skeletal

  • Cascadeur — keyframe + AI physics-aware autoposing. $8/mo indie tier (commercial up to $100K revenue). Best in class for indie 3D character animation in 2026.
  • Move.ai / DeepMotion — video-to-mocap. Replaces a mocap suit for prototyping.
  • Rokoko + AI cleanup — same idea, more pro.
  • AnimateDiff / runway video2anim — for cinematic and trailer work, not gameplay.

8.3 What still requires a human animator

  • Combat feel. The 4-frame hit-pause + screen-shake combo that makes Moonlighter feel good.
  • NPC personality animations (Stardew's Pierre's hand-rub).
  • Anything the camera lingers on.

9. 🎵 AI for Music, SFX, and Voice

9.1 Music — the licensing minefield

| Service | Quality (2026) | Commercial license | Best use |
|---|---|---|---|
| Suno v5 | Excellent | Unsettled. Settled with WMG; Sony lawsuit pending summer 2026 | Demo / prototype / temp tracks |
| Udio | Excellent | Settled with UMG; UMG-Udio joint platform launching 2026 | Track generation; pivot when joint platform launches |
| ElevenLabs Music | Good | Clean. License-clean enterprise terms | Shippable background tracks |
| Stable Audio | Good (loops) | Clean (Stability commercial) | Loopable ambient / sting beds |
| Riffusion | OK (loops) | Clean | Ambient / variation |
| AIVA | Good | Clean (Pro tier) | Orchestral / cinematic |

Practical rule for shipped music in 2026: use ElevenLabs Music, Stable Audio, or AIVA Pro. Use Suno/Udio for prototype and trailer scratch only until their licensing fully settles. If your game ships a Suno track and Sony wins its case, you have a takedown problem.

The Business Tycoon case study is the proof point: 4× 2-minute instrumental tracks, ~2 minutes total generation time, $3.20. That's the new floor for background-music cost.

9.2 The hero theme rule

The main menu theme and the song that plays when the player gets married / completes the museum / wins the festival is human-composed. Always. This is your "Stardew Valley Overture." Players associate it with the brand for a decade.

Outsource it: $500–3,000 from a Fiverr Pro / SoundCloud composer, or $5–20K from a name composer at the ConcernedApe tier. Don't generate it.

9.3 SFX

  • ElevenLabs Sound Effects — text-to-SFX, license-clean. Ship-ready.
  • Adobe Audition + AI denoise / cleanup — for human-recorded foley.
  • Soundly / Splice — non-AI but deserves a slot in the stack.

For a farming/cozy game you need ~200 SFX (tool swings, UI clicks, ambient layers, footsteps × surface, animal sounds). Generating with ElevenLabs: ~$30 in credits, ~1 day of curation.

9.4 Voice

This is the highest-risk AI sub-domain.

| Use case | Recommendation |
|---|---|
| Full VO for cozy NPCs | Skip — most cozy games have no VO; preserve the player's inner reading voice. |
| Short barks / greetings | ElevenLabs voices, original / synthetic, never cloned. |
| Narrator | Hire a human (it's 50–200 lines, the most player-facing audio in your game). |
| Cloning a real actor | Don't. Even with consent, US/EU contract law, SAG-AFTRA agreements, and likeness rights make this a multi-year liability. |
| Live LLM NPC voice (§11) | If you ship this, pre-license cloned voices via Inworld/ElevenLabs Enterprise with full contract chain. |

10. 📜 AI for Narrative, Quests, Items, Lore

This is where AI most reliably 10×s your throughput in social games — if you constrain it properly.

10.1 The schema-first rule

Never let an LLM emit free-form game content. Always emit structured JSON validated against a schema. Example:

{
  "id": "quest_spring_radish_001",
  "giver_npc": "pierre",
  "season": "spring",
  "tier": 1,
  "title": "<= 40 chars, no emoji, sentence case",
  "description": "<= 220 chars, second person, cozy tone",
  "objective": { "kind": "deliver", "item": "radish", "qty": 5 },
  "reward": { "gold": 120, "xp": 30, "friendship": { "pierre": 1 } },
  "tone_tags": ["wholesome", "low_stakes"]
}

The LLM fills the fields. A schema validator (Zod, Pydantic, JSON Schema) rejects malformed output. A balance validator rejects rewards outside the curve in your balance.yaml. A tone-checker LLM does a second pass to flag off-voice strings.

This pattern alone is the difference between "AI quest generator that ships" and "AI quest generator that floods QA with garbage."

10.2 The content corpus you generate

For a Township-class game, AI should generate:

  • 200–500 collection quests (deliver X to Y).
  • 100–300 item descriptions.
  • 50–200 NPC small-talk lines per character (5 characters = 250–1000 lines).
  • 30–60 festival flavor strings per festival.
  • 50–100 loading-screen tips.
  • Crop / animal / building names and 1-line descriptions.

Hero strings (still human): NPC introductions, romance arcs, festival speeches, achievement unlocks, the endgame letter, the player's wedding.

10.3 The style bible — non-optional

A 2–4-page document the LLM reads on every generation request:

  • Tone words (e.g., "warm, gently witty, never sarcastic, never edgy").
  • Tone anti-words ("avoid: cynical, ironic, modern slang, references to social media, profanity").
  • Voice samples per NPC (3โ€“5 lines of hand-written dialogue each).
  • Forbidden topics (politics, real-world religion, modern tech).
  • Punctuation and capitalization rules.
  • Example accept / reject pairs.

Without this, every generation drifts toward GPT-default voice (which is the voice of a polite-but-bland LinkedIn post).

10.4 Models for content generation

| Model | Best for | Notes |
|---|---|---|
| Claude Opus 4.7 / Sonnet 4.6 | Long-form narrative, tone-sensitive prose | Best tone fidelity; the default |
| GPT-5 / GPT-5-Pro | Structured JSON-mode generation, fast bulk | Fastest with json_schema |
| Gemini 2.x Pro | Long-context lore consistency (1M+ ctx) | Good when feeding the whole story bible |
| Open-source (Llama, Qwen) | Offline / cost-floor / uncensored variants | Self-host; useful at very high volume |

Always cache. Your style bible is reused on every call. Anthropic / OpenAI / Gemini all support prompt caching — it cuts cost 50–90% for static system prompts. A typical content-gen pipeline pays $0.0001–0.001 per generated quest after caching.
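A back-of-envelope version of that cost math; the token counts and prices (dollars per million tokens, roughly small-model territory) are hypothetical, as is the 90% cache discount on the static style-bible tokens:

```python
def cost_per_quest(style_bible_tokens: int = 3000, output_tokens: int = 250,
                   in_price: float = 0.3, out_price: float = 1.5,
                   cache_discount: float = 0.9) -> float:
    """Dollar cost of one generated quest; all prices are illustrative."""
    cached_input = style_bible_tokens * in_price * (1 - cache_discount) / 1e6
    output = output_tokens * out_price / 1e6
    return cached_input + output
```

With these placeholder numbers one quest costs roughly $0.0005, i.e. inside the range quoted above; swap in your actual model's prices to budget a content drop.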


11. 🗣️ Live LLM NPCs

The shiny demo. The hardest production system. Read this whole section before deciding.

11.1 What's actually shipped

  • Inworld AI — Character Engine; powered the GDC 2024 Covert Protocol demo (NVIDIA + Inworld), now used in a handful of indie titles and VR games (Office Whispers, etc.).
  • Convai — LLM NPCs with the Actions feature (LLMs trigger in-game actions, not just dialogue).
  • NVIDIA ACE — runs on-device on RTX hardware as of 2026; removes the cloud roundtrip.
  • Open-source (AkshitIreddy/Interactive-LLM-Powered-NPCs et al) — works for solo devs, not production-hardened.

11.2 Why it's hard for social games specifically

Social games are about persistence, predictability, and the warmth of recognition. "Pierre says the same thing on Wednesday" is a feature. Players come back because their world is comfortingly stable.

An LLM NPC is the opposite: stochastic, novel, sometimes inconsistent. This is great for an immersive sim or detective game (Covert Protocol), and culturally wrong for a Stardew-class cozy game. Players will ask Pierre about Bitcoin, Pierre will answer, the immersion breaks.

11.3 If you do ship it โ€” the production checklist

  • [ ] Personality + memory persisted server-side, never trusted from client.
  • [ ] Hard knowledge boundary: NPC knows their lore, refuses out-of-world topics in-character ("I don't know what 'Bitcoin' is, friend").
  • [ ] Topic blocklist for politics, real-world tragedies, sexual content, self-harm.
  • [ ] Latency budget under 1.5s for first audio token (otherwise dialogue feels broken). On-device ACE or streaming TTS required.
  • [ ] Cost budget: $0.001–0.01 per turn × millions of turns. Model this before committing.
  • [ ] Jailbreak red-team before launch; reproduce attempts post-launch via telemetry.
  • [ ] Disclosure on Steam/App Store per January 2026 policies.
  • [ ] Fallback to scripted dialogue if the LLM service is down.
  • [ ] Per-player rate limits to prevent abuse / cost runaway.
  • [ ] Voice cloning contract chain if the NPC has a voice (do not skip — see §9.4).

11.4 The cozy-game compromise

Instead of full LLM NPCs, use LLMs at design time to write 10× more scripted dialogue, then ship that scripted dialogue. Players get the feel of a fuller world without runtime risk. This is what most successful cozy games will do for the next 3–5 years.

If you must ship runtime LLM behavior, scope it tight:

  • LLM controls only side characters (a wandering bard, a stranger at the inn).
  • Core characters (marriage candidates, family, vendors) stay scripted.
  • LLM output is constrained to a topic whitelist ("the inn, the weather, local rumors").

11.5 The Steam January 2026 policy notes

  • Live AI-generated content must be disclosed on the store page.
  • Live AI-generated adult sexual content is an absolute prohibition with no exception — relevant if your social game has romance and you let a runtime LLM handle it. Don't.
  • Apple and Google have parallel policies; expect tightening through 2026.

12. 🧬 AI Procedural Content Generation

12.1 Where PCG works in social games

| System | PCG fit | Notes |
|---|---|---|
| Daily orders / quests | Excellent | Bounded, schema-driven, low narrative weight |
| Item / crop / animal names | Excellent | Pure flavor; cap collisions with a uniqueness check |
| Dungeon / mine layouts | Good | Wave Function Collapse + LLM hints for set dressing |
| World / island generation | Good | Minecraft-class; deterministic seed + LLM biome flavor |
| Loot drops | Good | Constrained generation against an item DB |
| NPC names + 1-line bios | Good | For populating festivals, leaderboards |
| Main story arc | Bad | Players need authored emotional payoff |
| Romance dialogue | Bad | Same |
| Tutorial | Bad | Must be deterministically correct |

12.2 The PCG architecture

[Player request / time tick]
        │
        ▼
[Server PCG service]
        │
        ├─► Fetch context (player level, inventory, season, last 7 days of quests)
        │
        ├─► Build prompt with style bible + schema
        │
        ├─► LLM generate (with prompt cache)
        │
        ├─► Schema validate ──► reject + retry on fail
        │
        ├─► Balance validate ──► clamp values to curve
        │
        ├─► Tone validate (cheap second LLM pass) ──► flag for human
        │
        ├─► Persist to DB
        │
        └─► Return to client

Never call the LLM from the client. Every generation runs on your server, with rate limits, caching, and validation. This also gives you the audit log you'll need under EU AI Act requirements.

12.3 Determinism vs novelty

Set temperature low (0.2–0.5) for items / quests where players will compare in Discord ("did you get the carrot quest? me too"). Set higher (0.7–0.9) for personal flavor strings (loading-screen tips, idle barks).

Use a seed derived from player ID + day so the same player gets the same daily content even on retry. This prevents save-scumming and fairness complaints.
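That seeding scheme can be sketched as a hash of player ID plus day; the names and the quest list are illustrative:

```python
import hashlib
import random

def daily_rng(player_id: str, day: str) -> random.Random:
    """Same player + same day -> same generator, even across client retries."""
    digest = hashlib.sha256(f"{player_id}:{day}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

# Same inputs always pick the same daily quest, so a retry or a
# reconnect never rerolls the player's content.
quests = ["carrot", "radish", "egg", "milk"]
pick = daily_rng("player_42", "2026-05-01").choice(quests)
```

Keep the seed derivation server-side; if the client can influence the seed, players will reroll until they get the content they want.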


13. 🌍 AI for Localization

Maybe the highest-ROI use case after coding. Traditional LSPs charge $0.10–0.20 per word. AI-first hybrid pipelines charge $0.01–0.03 per word at equivalent quality for a cozy/casual game.

13.1 The hybrid pipeline (state of the art, 2026)

Source strings (en)
    │
    ├─► Translation Memory match (free)              [exact / fuzzy reuse]
    │
    ├─► AI MT first pass (Claude / GPT / DeepL Pro)  [bulk volume, $]
    │       └─ with: glossary, style guide, character voice notes, screenshots
    │
    ├─► AI tone/cultural review (second LLM pass)    [flags for human]
    │
    ├─► Human linguist review                        [transcreation, hero strings]
    │
    └─► QA pass in-game (LLM screenshot review)      [overflow, truncation, missing vars]

13.2 Tools

  • Alocai — game-specific MT + GenAI (ModelWiz).
  • Gridly — string management with AI translation built-in.
  • Lokalise + AI — established LSP platform, now AI-augmented.
  • Custom Claude/GPT pipeline — for studios with engineering capacity; offers most control.

13.3 Languages where AI works well out of the box

  • Spanish, Portuguese (BR), French, German, Italian, Polish, Russian, Korean, Japanese, Simplified Chinese.

13.4 Languages where you need a human linguist no matter what

  • Japanese — honorifics plus character voice mean automated MT will break tone in cozy games. The MT first pass is fine; the linguist pass is mandatory.
  • Korean — same.
  • Arabic — RTL layout, dialect variation, cultural sensitivities (alcohol, religion).
  • Traditional Chinese — different from Simplified in tone and idiom; treat as separate.
  • Thai / Vietnamese — tonal nuances and segmentation issues.

13.5 The dubbing question

AI lip-sync + voice cloning makes 10+ language full VO feasible for indie budgets in 2026. For a cozy game with no VO, don't add VO just because you can. For a game that has VO, AI dubbing of side characters is acceptable; main cast = human VO per language as far as budget allows.

13.6 Glossary discipline

Build a glossary table on day 1:

| EN term | Tone | ja-JP | ko-KR | de-DE | Notes |
|---|---|---|---|---|---|
| Energy | warm | げんき | 활력 | Energie | Not "stamina" |
| Coin (currency) | neutral | コイン | 코인 | Münze | Singular always |
| Mayor | warm | 村長 | 촌장 | Bürgermeister | Honorific in jp/kr |

This glossary feeds into every AI translation call. Without it, "Energy" becomes 5 different words across your game in the same language.
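One way to sketch feeding the glossary into every call is to render it into the MT system prompt as fixed-terminology rules; the prompt wording below is illustrative, and the target terms are the ones from the glossary table:

```python
# Glossary rendered into every translation request so fixed terms
# (e.g. "Energy") never drift. Prompt wording is illustrative.
GLOSSARY = {
    "Energy": {"ja-JP": "げんき", "ko-KR": "활력", "de-DE": "Energie"},
    "Coin":   {"ja-JP": "コイン", "ko-KR": "코인", "de-DE": "Münze"},
    "Mayor":  {"ja-JP": "村長",   "ko-KR": "촌장", "de-DE": "Bürgermeister"},
}

def translation_prompt(strings: list[str], lang: str) -> str:
    """Build the MT prompt with the fixed-terminology block prepended."""
    rules = "\n".join(f'- translate "{en}" as "{targets[lang]}"'
                      for en, targets in GLOSSARY.items() if lang in targets)
    return (f"Translate the following game strings to {lang}.\n"
            f"Fixed terminology (never deviate):\n{rules}\n\n"
            + "\n".join(strings))
```

The same dict also drives a post-pass check: if a source string contained a glossary term and the translation lacks the mandated target term, the string is flagged for the linguist.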


14. 🤖 AI Playtest Bots & Economy Simulation

14.1 What playtest bots actually catch

EA's RL-driven playtest framework (publicly described in 2024–2025) caught:

  • Inconsistent AI behavior at edge cases.
  • Balance asymmetries between teams.
  • Physics / animation glitches.
  • Unreachable content.
  • Stuck states that human QA never reproduced.

For a social game, the equivalent is:

  • Economy traps — quests that lock the player out of progression.
  • Dead content — items no rational agent ever buys.
  • Exploit routes — recipes / arbitrage loops that print money.
  • Difficulty walls — levels where the optimal strategy still fails 80% of the time.
  • Energy starvation — sequences where the player runs out of energy before the next milestone.

14.2 The economy simulator

Build (or buy) an agent-based simulator that replays your economy with thousands of synthetic players, each with a different strategy:

  • "Greedy gold-maximizer"
  • "Completionist"
  • "Casual 2-sessions-a-day"
  • "Whale spender"
  • "F2P optimizer"
  • "Bot operator"

Run it before every economy patch. Outputs:

  • Currency inflation curves.
  • Gini coefficient on wealth across cohorts.
  • Time-to-paywall by archetype.
  • "Dead recipe" report.
  • Exploit yield (gold-per-hour for the optimal exploit found).

For LLM-based realism, recent research (arXiv 2506.04699 / 2512.02358) demonstrates Generative Agent-Based Modeling — LLMs fine-tuned on real player logs play your game and surface emergent behaviors traditional ABM misses. Worth the investment at MMO scale; overkill for prototypes.
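A toy version of that harness, just to show the shape — the archetypes, recipes, and numbers below are invented; a real harness replays your actual balance data:

```python
# Toy agent-based economy sim: archetypes play N days, and the report surfaces
# "dead recipes" that no gold-optimizer ever touches. All numbers are invented.
import random
from dataclasses import dataclass, field

RECIPES = {  # name -> (energy_cost, gold_yield)
    "wheat": (2, 5),
    "carrot": (3, 6),
    "truffle": (10, 8),  # bad gold-per-energy: dead-recipe candidate
}

@dataclass
class Player:
    strategy: str
    gold: float = 0.0
    used: set = field(default_factory=set)

    def pick(self, rng: random.Random) -> str:
        if self.strategy == "greedy":  # best gold per energy point
            return max(RECIPES, key=lambda r: RECIPES[r][1] / RECIPES[r][0])
        return rng.choice(list(RECIPES))  # casual plays whatever

def simulate(days: int = 30, n_players: int = 1000, seed: int = 1) -> dict:
    rng = random.Random(seed)
    players = [Player("greedy" if i % 2 else "casual") for i in range(n_players)]
    for _ in range(days):
        for p in players:
            energy = 20  # daily energy budget
            while True:
                recipe = p.pick(rng)
                cost, gold = RECIPES[recipe]
                if cost > energy:  # toy stop rule: first unaffordable pick ends the day
                    break
                energy -= cost
                p.gold += gold
                p.used.add(recipe)
    greedy_used = set().union(*(p.used for p in players if p.strategy == "greedy"))
    return {
        "mean_gold": sum(p.gold for p in players) / n_players,
        "dead_for_optimizers": sorted(set(RECIPES) - greedy_used),
    }

report = simulate()
```

Even this toy catches the pattern: the gold-maximizer archetype never touches the recipe with bad gold-per-energy, which is exactly the "dead recipe" report the full simulator outputs.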

14.3 Tools

  • Roll your own. A 500-line Python harness running 10K simulated players overnight catches 80% of economy bugs. Highest ROI per engineer-week.
  • Chaos Dynamics — commercial high-fidelity simulation.
  • Unity ML-Agents — for engine-integrated RL playtesting.
  • OpenAI / Anthropic LLM agents orchestrated via tool-use to play the game over a real network.

14.4 The "20 KPIs to simulate" list

Pull from the main playbook §20 (KPIs). The simulator should output all of them for every release candidate. If you can't simulate them, you can't iterate fast enough to compete.


15. 📊 AI for Live Ops

Live ops is the multi-year game in social-games. AI here pays back over years.

15.1 Churn prediction — when is it worth it?

| Stage | Approach |
|---|---|
| < 10K MAU | Don't bother. Your gut + cohort tables are enough. |
| 10K–100K MAU | XGBoost / LightGBM on session + monetization features. An internal data scientist can build this in 2–4 weeks. |
| 100K–1M MAU | XGBoost still wins; add survival models for time-to-churn. |
| 1M+ MAU | Graph Neural Networks (Kumo, in-house PyG). The friend-graph signal is the differentiator. |

The Kumo case study figure: 5M MAU × 20% monthly churn among monetizers can yield ~$18M/year savings from a 10% retention lift on at-risk spenders. The math at smaller scales is proportional.

15.2 Personalization that respects the player

| Personalization layer | What's safe | What crosses the line |
|---|---|---|
| Difficulty (PvE only) | Slight enemy HP / spawn-rate tuning to keep flow | Hidden difficulty adjustment that punishes wins |
| Daily quest selection | Bias toward content the player engages with | Hiding content the player would enjoy |
| Push notification timing | Send when the player historically opens | Manipulative urgency / fake-scarcity FOMO |
| Offer composition | Bundle items the player has searched for | Hidden price discrimination (illegal in the EU) |
| Friend / guild suggestions | Match by play-time overlap and level | Sorting by predicted spend |

EU Digital Services Act + AI Act + consumer protection law actively police this. Personalize for engagement and joy, not exploitation. The wave of 2025–2026 lawsuits against gacha / loot-box mechanics is a preview.

15.3 The live-ops AI agent

A single Claude/GPT agent, run on a daily cron, with read-only access to your analytics warehouse, can:

  • Diagnose why DAU dropped 4% yesterday.
  • Suggest which event slot to fill next based on cohort fatigue.
  • Draft a battle-pass tier list and write the patch notes.
  • Flag anomalies: "Crop X consumption is 20σ above baseline — check for exploit."
  • Generate an exec summary email by 9am.

Build this. It replaces 10 hours of producer work per week.
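A minimal sketch of the anomaly-flagging half of that agent, with the LLM call stubbed out — the metric names, history, and z-score threshold below are invented for illustration:

```python
# Hypothetical daily live-ops brief: pull KPI history, flag statistical
# anomalies, then hand the digest to an LLM for the exec summary (stubbed here).
import statistics

def flag_anomalies(history: dict[str, list[float]], z_threshold: float = 3.0) -> list[str]:
    """Compare yesterday (last value) against the trailing baseline per metric."""
    flags = []
    for metric, values in history.items():
        baseline, yesterday = values[:-1], values[-1]
        mu = statistics.mean(baseline)
        sigma = statistics.stdev(baseline) or 1.0  # avoid divide-by-zero on flat series
        z = (yesterday - mu) / sigma
        if abs(z) >= z_threshold:
            flags.append(f"{metric}: {yesterday:.0f} is {z:+.1f} sigma vs baseline; investigate")
    return flags

def daily_brief(history: dict, llm=lambda prompt: "(summary)") -> dict:
    """`llm` stands in for a real read-only Claude/GPT call on a daily cron."""
    flags = flag_anomalies(history)
    prompt = "Write a 1-page exec summary.\nAnomalies:\n" + "\n".join(flags or ["none"])
    return {"anomalies": flags, "summary": llm(prompt)}

brief = daily_brief({
    "dau": [10_000, 10_200, 9_900, 10_100, 9_900],     # normal noise: no flag
    "crop_x_consumed": [500, 520, 480, 510, 5_000],    # exploit-like spike: flagged
})
```

The design choice worth keeping: cheap statistics do the detection, and the LLM only explains and prioritizes what the statistics found — never the other way around.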

15.4 Bot / fraud detection

Web3 and F2P social games attract botters. ML signals:

  • Inhuman session regularity (variance below human noise floor).
  • Click pattern uniformity.
  • Wallet clustering (Web3).
  • Cohort sharing (multi-account farm).
  • Graph centrality in the trade network.

GNNs win again here. Off-the-shelf: Sift, Kasada, DataDome. In-house if Web3.
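The first signal on that list fits in a few lines. A sketch using the coefficient of variation, with an assumed 5% floor — tune the threshold against your own session data before trusting it:

```python
# Sketch of one bot signal: session-length variance below the human noise
# floor. Real pipelines add click-pattern and graph features on top.
import statistics

def regularity_score(session_lengths_sec: list[float]) -> float:
    """Coefficient of variation: humans are noisy, bots are metronomic."""
    mu = statistics.mean(session_lengths_sec)
    return statistics.stdev(session_lengths_sec) / mu if mu else 0.0

def looks_botlike(session_lengths_sec: list[float], cv_floor: float = 0.05) -> bool:
    # Below ~5% variation across many sessions is inhumanly regular
    # (assumed threshold; calibrate on real player data).
    return len(session_lengths_sec) >= 10 and regularity_score(session_lengths_sec) < cv_floor

human = [612, 1840, 240, 955, 1310, 410, 2200, 780, 1505, 660]
bot = [900, 901, 899, 900, 900, 901, 899, 900, 900, 901]
```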


16. ๐Ÿ›ก๏ธ AI for Moderation

If your social game has chat, voice, UGC, or trade — you need moderation infrastructure on day 1. Skipping this is the #1 mistake of Web3 games and live-ops games alike.

16.1 The moderation stack

| Surface | Tool | Coverage |
|---|---|---|
| Text chat | Perspective API, OpenAI / Anthropic moderation, custom LLM filter | Slurs, harassment, grooming, spam |
| Voice chat | ToxMod (Modulate) | Real-time toxic-voice detection; integrates with the Discord SDK as of Jan 2026 |
| Image / UGC | Hive Moderation, Sightengine | NSFW, violence, hate symbols |
| Player names | Custom blocklist + LLM check | Slur variants, trademark abuse |
| Trade / market | Pattern detection + LLM intent check | Scam detection, real-money trade |
| Forums / Discord | AutoMod + custom LLM workflows | Brigading, off-topic, doxxing |

16.2 ToxMod in particular

The Call of Duty case study is the public proof:

  • 50% reduction in toxicity exposure (CoD MWII multiplayer + Warzone NA).
  • 25% reduction in toxicity exposure (CoD MWIII global ex-Asia).
  • 8% month-over-month reduction in repeat offenders.

For a social game with voice (rare in cozy, common in MMO/sandbox), this is the only currently mature voice moderation product. As of January 2026 it integrates with Discord's Social SDK, which is how a lot of indie games already handle voice.

16.3 The escalation pipeline

Signal → Auto-action (mute, shadow-ban, throttle) → Human moderator queue → Player appeal → Audit log

Never auto-ban without an appeal path. Never train your model on appeals you didn't review. Keep the audit log for 90+ days for both legal and false-positive review.
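The auto-action step of that pipeline might be shaped like this sketch — the severity thresholds and action names are hypothetical, but note that every verdict carries an appeal path and high-severity cases always enter the human queue:

```python
# Hypothetical signal -> auto-action step, with severity tiers, a mandatory
# appeal path, and human review for serious cases. Thresholds are invented.
from dataclasses import dataclass

@dataclass
class Verdict:
    action: str           # "none" | "throttle" | "mute" | "shadow_ban"
    to_human_queue: bool  # serious cases always get human review
    appealable: bool      # never auto-act without an appeal path

def auto_action(toxicity: float, repeat_offenses: int) -> Verdict:
    """`toxicity` is a 0..1 score from the detection model upstream."""
    if toxicity < 0.5:
        return Verdict("none", False, True)
    if toxicity < 0.8:
        # Mid-severity: escalate the action for repeat offenders.
        return Verdict("throttle" if repeat_offenses == 0 else "mute", False, True)
    # High-severity: act immediately, but a human reviews and the player can appeal.
    return Verdict("shadow_ban", True, True)
```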


17. 📣 AI for UA Creative

Post-launch, your survival depends on creative velocity. This is the lever AI was built for.

17.1 The TikTok / Meta reality check

  • TikTok generated $28B in 2025 ad revenue; for mobile games it now often delivers cheaper CPIs than Meta, but it is creative-heavy.
  • The TikTok algorithm rewards creative velocity: a 7–10 day fatigue window vs Meta's 2–3 weeks.
  • Minimum viable cadence for a serious mobile UA program: 20–40 creatives/month per major channel.
  • A 4-person UA team cannot manually edit that. AI is the only way.

17.2 The AI UA stack

| Tool / Model | Output | Use for |
|---|---|---|
| Sora 2 | Photoreal video, 10–30s | UGC-style testimonials, gameplay cuts |
| Veo 3 | Video, strong physics | Same |
| Runway / Kling | Video generation, image-to-video | Stylized cuts |
| Higgsfield Ads | Game screenshot → ad video in 3 clicks | Programmatic creative variations |
| AdCreative.ai | Static + variants | Static placements, banner sets |
| ElevenLabs | Voice-over for ads | Multi-language ad VO |
| Claude / GPT | Hooks, taglines, ad scripts | Pre-production ideation |
| Segwise / your MMP | Performance feedback loop | What's winning, what's fatigued |

17.3 The creative testing loop

Brief → AI variant gen (50–200 variants) → Cheap broad test ($300–1000) →
Top 5% scaled → Performance feedback → New brief based on winning hooks

The studios winning UA in 2026 are running this loop weekly per channel. If you're shipping 4 creatives a month, you're getting outbid.

17.4 What still needs humans

  • The launch trailer. Your one piece of art that lives forever on YouTube and your store page. Hire a game-trailer studio.
  • Festival / Steam Next Fest creative. Higher-stakes attention; humans matter.
  • Community-fan content. The single most credible creative is a streamer playing your game.
  • The hook concept itself. AI can produce 200 variants of a hook; it rarely invents the new hook. Humans set direction; AI executes the variations.

18. 💬 AI for Community & Player Support

18.1 The RAG support bot

Build it on day 1 of soft launch. Inputs:

  • Patch notes (ingested daily).
  • FAQ (curated weekly).
  • Game wiki / lore (slow-changing).
  • Common ticket categories with canned answers.

Output: a Discord bot + in-game help widget that handles 40–70% of T1 tickets. Common stack: Claude/GPT + a vector store (Pinecone, Weaviate, Postgres pgvector) + a thin web service.
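A minimal sketch of the bot's shape — word-overlap retrieval stands in for real embeddings, the LLM call is stubbed, and the documents and overlap threshold are invented. The important part is the refusal path when retrieval comes back weak:

```python
# Minimal RAG shape: retrieve grounding text, answer only from it, and
# refuse + escalate when nothing relevant is found. All data is illustrative.

DOCS = {
    "patch_1_4": "Patch 1.4: sprinklers now water a 5x5 area and cost 200 coins.",
    "faq_energy": "Energy refills at a rate of 1 point every 5 minutes, capped at 50.",
}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank docs by shared words with the question (embedding stand-in)."""
    q = set(question.lower().split())
    ranked = sorted(DOCS.values(), key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def answer(question: str, llm=lambda prompt: "(grounded answer)", min_overlap: int = 2) -> str:
    best = retrieve(question)[0]
    overlap = len(set(question.lower().split()) & set(best.lower().split()))
    if overlap < min_overlap:
        # Weak grounding: admit ignorance and route to the human queue.
        return "I don't know, connecting you to a human."
    return llm(f"Answer ONLY from this context:\n{best}\n\nQ: {question}")
```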

18.2 The escalation pipeline

Player message → RAG bot answer → "Did this help?" → If no, route to human queue
                                                   → Human answer → fed back into FAQ

Two non-negotiable rules:

  1. The bot must be allowed to say "I don't know — connecting you to a human." Hallucinated answers about refunds and account issues are how you end up in a regulator's inbox.
  2. Human responses become future training data. Build the loop.

18.3 Community sentiment tracking

Run an LLM agent daily across:

  • Steam reviews (delta vs last week).
  • Discord top channels (digest).
  • Reddit subreddit (top posts + sentiment).
  • App Store / Google Play reviews.
  • Twitter/X mentions.

Output a 1-page exec summary: top 3 complaints, top 3 praises, notable streamer/influencer activity, sentiment delta. This replaces the producer's manual community scan. Cost: $5–20/day in API spend.


19. 💸 The AI Cost Stack

Realistic monthly spend for a 5-person social-games studio in 2026 (USD):

| Layer | Service | Monthly cost |
|---|---|---|
| Coding agents (per dev) | Claude Code Max + Cursor + Copilot | $100–250 |
| Asset generation | PixelLab + Cascadeur Indie + Flux | $30–80 |
| Music + SFX | ElevenLabs + AIVA Pro | $30–80 |
| Localization (per release) | AI MT + linguist (10 langs, ~5K words) | $200–600 |
| LLM content generation | Anthropic / OpenAI API + caching | $50–500 |
| Playtest simulation compute | AWS / GCP spot (overnight runs) | $50–200 |
| Live LLM NPCs (if applicable) | Inworld / Convai Pro | $200–2,000+ |
| Voice moderation | ToxMod (per concurrent voice user) | scaled |
| Text moderation | Perspective / OpenAI moderation (free–$) | $0–100 |
| UA creative generation | Sora 2 + Higgsfield + Runway | $200–1,000 |
| Analytics LLM agent | Claude / GPT API | $50–200 |

Total for a pre-launch indie team: ~$700–1,500/month.
For a live-ops studio doing serious UA: $3,000–10,000/month.

Compare to:

  • One outsourced pixel artist: $2–5K/month.
  • One translator across 10 languages, traditional LSP: $5–15K/release.
  • One UA creative agency: $5–20K/month + media.
  • One T1 support agent: $3–6K/month.

The math has been favorable since mid-2024 and the gap has widened every quarter since.

19.1 Where the money actually goes

Track per-feature cost. After 3 months you'll find:

  • 60–70% of LLM spend is on a single workflow (usually content gen or the live-ops agent).
  • Caching cuts that by 50–80%.
  • Open-source models (Llama, Qwen, DeepSeek) handle 30–60% of low-stakes calls at 10× lower cost.

Tier your model usage: cheap model for first pass, expensive model for hero strings, frontier model only for narrative-critical generations.
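That tiering rule can live in one small routing function. A sketch with placeholder model names and prices (none of these figures are real quotes):

```python
# Sketch of tiered model routing: send each generation call to the cheapest
# model the stakes allow. Model names and per-token prices are placeholders.

TIERS = {
    "cheap":    {"model": "open-weights-8b", "usd_per_mtok": 0.10},
    "mid":      {"model": "haiku-class",     "usd_per_mtok": 1.00},
    "frontier": {"model": "sonnet-class",    "usd_per_mtok": 5.00},
}

def pick_tier(task: str, is_hero: bool, player_facing: bool) -> str:
    if task == "narrative" and is_hero:
        return "frontier"  # narrative-critical generations only
    if is_hero or (task == "localization" and player_facing):
        return "mid"       # hero strings and shipped translations
    return "cheap"         # bulk first passes, tooltips, internal drafts

tier = pick_tier("narrative", is_hero=True, player_facing=True)
```

Routing in code (rather than per-call judgment) is what makes the cost dashboard honest: every call is tagged with its tier, so the per-feature spend report falls out for free.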


20. ๐Ÿค The Hybrid Pipeline

The summary table for "what does AI do, what does a human do" across the pipeline:

| Function | AI does | Human does |
|---|---|---|
| Code | Bulk, refactors, tests, boilerplate | Architecture, netcode, anti-cheat, perf |
| Concept art | Mood boards, 100 variations | Final direction, hero key art |
| Pixel sprites | Generation, sprite-sheet expansion | Final polish in Aseprite, hero portraits |
| Animation | Inbetweening, retargeting, sheet expansion | Combat feel, NPC personality, camera frames |
| Music | Background loops, ambient beds | Hero theme, festival music, brand jingles |
| SFX | 90% of library | Signature sounds (level up, harvest) |
| VO | Side characters (if any) | Main cast, narrator |
| Quest text | Bulk variants, tooltips, item descriptions | Hero strings, romance arcs, story beats |
| Localization | First-pass MT, glossary, cultural flags | Hero-string transcreation, JP/KR/AR review |
| QA | Smoke tests, regression, exploit hunting | Game-feel QA, "vibes" QA |
| Live ops | Anomaly detection, churn prediction, draft patch notes | Final calls on events, balance, comms |
| UA creative | Variant generation, copy variants | Brief, brand voice, launch trailer |
| Support | T1 RAG, sentiment digest | T2/T3, refunds, escalations, comms |
| Moderation | Detection, triage, auto-action | Appeals, novel cases, policy updates |
| Playtest | RL bot exploration, balance simulation | Game-feel playtests, "is this fun" calls |

Read across: AI handles 60–80% of the volume in every row. Humans own the 20–40% that defines whether the game has a soul.


21. โš–๏ธ Legal, Policy, and Platform Compliance

21.1 Steam (Valve), per January 2026 policy rewrite

  • Dev tools (Copilot, Claude Code, Cursor) — exempt; no disclosure required.
  • Pre-generated assets shipping in the build — disclosure required on the store page (kind of AI generation, content types).
  • Live AI generation at runtime — disclosure required, plus you certify guardrails.
  • Live AI-generated adult / sexual content — prohibited, no exceptions.
  • Failure to disclose → store removal risk.

21.2 Apple App Store

  • Increasing scrutiny on AI-generated key art and screenshots.
  • Apps with live LLM features must have content moderation pipelines disclosed.
  • App Review will reject games that allow uncontrolled LLM output, especially for under-13 ratings.
  • Several documented rejections in 2025 of games that didn't disclose AI-generated marketing assets.

21.3 Google Play

  • Similar disclosure expectations as Apple.
  • Active enforcement on deepfake / impersonation / explicit AI content.
  • Targeted ad / personalization disclosures aligning with EU norms.

21.4 EU AI Act (in force, 2025–2026 phased)

Most social games will fall under "limited risk" (transparency obligations):

  • Inform players when interacting with an AI system (live LLM NPCs, AI moderation).
  • Label AI-generated content where reasonable.
  • Higher-risk if you do AI-driven personalization that materially affects player welfare or finances.

21.5 Copyright

  • US Copyright Office: works without meaningful human creative input are not protected. Translation: "I prompted Midjourney for the box art" likely cannot be copyrighted. "I prompted, then a human extensively edited, layered, composited, and directed" likely can.
  • Model and training-data warranties: get indemnification from your AI provider against third-party IP claims — Anthropic, OpenAI, Google, ElevenLabs, and Adobe Firefly all offer some form of this on enterprise tiers. Free / consumer tiers usually do not.

21.6 Voice / actor rights

  • Cloning a real person's voice without consent is actionable in most jurisdictions and explicitly prohibited by SAG-AFTRA agreements.
  • Even with consent, get a written, signed, scope-limited license. "Use my voice for game X for 5 years in markets Y, in genre Z, with the option to extend at price W."
  • Synthetic voices with no human clone source are lower-risk but still need provider warranty.

21.7 Player data + AI training

  • Don't train your customer-service models on player chat without a consent path.
  • Don't feed player payment / PII data into 3rd-party LLM APIs without DPA in place.
  • Anthropic / OpenAI / Google enterprise tiers all have zero-retention modes โ€” use them for any pipeline touching player data.

22. โš ๏ธ The Anti-Patterns

These are the failures we see repeatedly. Avoid each.

22.1 "AI will design my game"

It won't. AI does not know whether your daily loop is satisfying. AI does not playtest your economy on a real Wednesday with a real distracted player. Use AI to implement your design, not invent it.

22.2 Shipping AI slop because it's cheap

Players in cozy/farming Discords will identify AI sprites in 30 seconds and broadcast it. The marginal cost saved on assets is dwarfed by the wishlist hit you take in week 1. Either polish AI assets to invisibility or commission human work.

22.3 Live LLM NPCs as a feature, not a system

A demo of a chatty NPC is not a feature. It's the easy part of a system that must include persona persistence, jailbreak defense, cost control, latency budgets, content moderation, fallback paths, and disclosure. Most teams underestimate the engineering by 5–10×. See §11.

22.4 No style bible → tonal drift

Without a 2–4 page style bible, every LLM call drifts toward the same flat "GPT-cozy" voice. By string #500 your game sounds like a content farm. Write the style bible first.

22.5 Letting the LLM emit free-form game data

Numbers go in balance.yaml. Strings go in strings.json, validated by schema. The LLM never invents quantities. Every shipped data point passes a validator. Skip this and you'll ship "Deliver -1 carrots for ∞ gold" within 2 weeks.
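A sketch of that validator gate — the field names and limits are illustrative, but the shape is the point: nothing the LLM emits ships until this returns zero errors:

```python
# Hypothetical schema gate for LLM-generated quests: the model may only fill
# fixed fields, and every value is range-checked before it can ship.

QUEST_SCHEMA = {
    "title":        {"type": str, "max_len": 60},
    "deliver_item": {"type": str},
    "quantity":     {"type": int, "min": 1, "max": 99},
    "reward_gold":  {"type": int, "min": 1, "max": 10_000},
}

def validate_quest(data: dict) -> list[str]:
    """Return a list of human-readable errors; empty list means shippable."""
    errors = []
    for name, rules in QUEST_SCHEMA.items():
        if name not in data:
            errors.append(f"missing field: {name}")
            continue
        value = data[name]
        if not isinstance(value, rules["type"]):
            errors.append(f"{name}: expected {rules['type'].__name__}")
            continue
        if "min" in rules and value < rules["min"]:
            errors.append(f"{name}: {value} below min {rules['min']}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{name}: {value} above max {rules['max']}")
        if "max_len" in rules and len(value) > rules["max_len"]:
            errors.append(f"{name}: too long")
    return errors

bad = {"title": "Carrot Crisis", "deliver_item": "carrot",
       "quantity": -1, "reward_gold": 10**9}  # the "-1 carrots for ∞ gold" bug
```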

22.6 Coupling tightly to one provider

Anthropic, OpenAI, Google all have outages and price changes. Build a model-abstraction layer (or use one — LiteLLM, OpenRouter, your own thin wrapper) so you can swap. Especially important for live-runtime systems.
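The wrapper can be genuinely thin. A sketch with stubbed provider calls (the exception type and provider functions here are hypothetical stand-ins, not a real SDK):

```python
# Thin provider-abstraction sketch: one interface, ordered fallbacks, so a
# provider outage degrades gracefully instead of breaking runtime features.

class ProviderDown(Exception):
    """Raised by a provider adapter on outage / rate-limit / 5xx."""

def call_with_fallback(prompt: str, providers: list) -> str:
    """Try each (name, callable) in order; raise only if every provider fails."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except ProviderDown as e:
            errors.append(f"{name}: {e}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def flaky(prompt):    # stands in for the primary provider having an outage
    raise ProviderDown("503")

def healthy(prompt):  # stands in for the backup provider
    return f"ok: {prompt}"

result = call_with_fallback("hello", [("primary", flaky), ("backup", healthy)])
```

Game code only ever imports `call_with_fallback`; swapping or reordering providers is then a config change, not a refactor.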

22.7 Using Suno/Udio for shipped music while lawsuits are pending

Risk profile: a Sony win in summer 2026 could force takedowns of trained content. Use license-clean alternatives (ElevenLabs Music, Stable Audio, Adobe Firefly Audio, AIVA Pro) for anything in the build. Use Suno/Udio for trailers, scratch, and prototypes only.

22.8 Personalization that crosses into manipulation

Dynamic difficulty that makes the player lose more right before an offer. Hidden price discrimination. Fake-scarcity push notifications. These are illegal in EU consumer law and shameful regardless. Personalize for delight, never for extraction.

22.9 Skipping disclosure

It is January 2026. Steam, Apple, Google, and EU all have disclosure regimes. The cost of disclosure is a paragraph on a store page. The cost of non-disclosure is store removal. Disclose.

22.10 No human in the moderation loop

Auto-ban systems with no appeal path will produce a 1–5% false-positive rate, which at 100K MAU = 1,000–5,000 wrongly banned players per month. Each one is a refund, a chargeback, a Reddit thread, a review-bomb. Always have a human appeal path.

22.11 Treating AI as a hiring substitute on day 1

The team sizes work because the senior person knows what AI is doing wrong. Replacing your only senior with juniors-plus-Claude is how you ship a game that's half-built and unfixable. Start with senior + AI; add juniors later.

22.12 Forgetting that players hate being lied to

Don't claim "hand-crafted by humans" on Steam if your sprites are AI. Don't pretend your live NPCs are pre-scripted. Players will find out. Communities are forensic. The trust damage outweighs anything you saved.


23. ๐Ÿ—บ๏ธ The 90-Day AI Adoption Plan

For an existing 5โ€“20 person social-games studio not yet AI-native.

Days 1–14 — Foundations

  • [ ] Every developer on Claude Code (or Cursor) + Copilot. Standardize.
  • [ ] Repo-root CLAUDE.md / .cursorrules written. (Use this repo's CLAUDE.md as a template.)
  • [ ] Unity-MCP / Godot AI installed; one engineer demos a scene-edit conversation in standup.
  • [ ] Style bible drafted (2–4 pages).
  • [ ] Glossary spreadsheet started.
  • [ ] One "champion" appointed per discipline (code, art, audio, narrative, ops).

Days 15–30 — Pipelines

  • [ ] Schema-validated content generation pipeline live for items + quests.
  • [ ] AI translation pipeline for one new language end-to-end (pick the cheapest: Spanish or Portuguese).
  • [ ] Pixel-art LoRA trained on existing house style.
  • [ ] AI playtest harness scaffolded; runs nightly.
  • [ ] RAG support bot built on patch notes + FAQ (internal-only first).

Days 31–60 — Production runs

  • [ ] First content pack shipped with AI-generated bulk content + human hero strings.
  • [ ] Localization to 3 languages shipped via hybrid pipeline.
  • [ ] UA creative iteration loop running on TikTok/Meta — 20+ creatives/month minimum.
  • [ ] Live-ops agent producing daily exec summaries.
  • [ ] Moderation stack (text minimum; voice if applicable).
  • [ ] Disclosure language updated on store pages.

Days 61–90 — Compounding

  • [ ] Churn prediction model live (if MAU justifies).
  • [ ] AI-generated asset pipeline integrated into sprint cadence.
  • [ ] Cost dashboard per-feature; tier models (cheap for bulk, frontier for hero).
  • [ ] Postmortem: which AI bets paid, which didn't. Cut what's underperforming.
  • [ ] Hiring plan adjusted: which roles do you still need, which can you drop, and which new ones (data scientist? RL engineer?) do you need?

Day 91 onward — The new normal

You are now operating at ~2× the throughput of a non-AI peer studio at ~70% of the cost. You will still be outpaced by competitors who started 6 months earlier. Keep iterating; don't celebrate.


24. 🌱 The Greenfield AI-Native Build Plan

For a brand-new social game starting fresh in 2026.

Phase 0 — Concept (week 0–2)

  • AI for mood boards, references, prototype mock-ups. Cheap, fast, throwaway.
  • AI for competitor analysis — feed AppMagic / SensorTower exports + Steam reviews into Claude/GPT, ask for tonal differentiators.
  • A human writes the design pillars. AI does not.

Phase 1 — Vertical slice (week 2–8)

  • One engineer + Claude Code + Unity-MCP / Godot AI builds the daily-loop prototype.
  • AI generates the placeholder art at full volume; the artist polishes the 50 hero assets.
  • Human composer writes the hero theme; AI fills the 8–12 background loops.
  • All numbers in balance.yaml. All strings in strings.json. Schema-validated. From day 1.

Phase 2 — Content scale-up (week 8–20)

  • Schema-driven LLM content gen for 200+ quests, 300+ items, 500+ NPC barks.
  • Style bible enforced on every gen call.
  • LoRA trained; sprite pipeline runs at 10× original throughput.
  • AI playtest bots running nightly; balance issues caught before human QA sees them.

Phase 3 — Soft launch (week 20–28)

  • 3 launch languages via AI hybrid pipeline.
  • UA creative iteration loop spinning at 30+ creatives/month per channel.
  • Moderation stack live before any voice/chat opens.
  • RAG support bot live; CS agent supervising it.
  • Live-ops agent running daily exec brief.
  • Disclosure language reviewed by counsel and live on the store page.

Phase 4 — Global launch & live ops (week 28+)

  • Full localization (10+ languages).
  • Churn prediction online.
  • Personalization layer running — engagement-positive only, regulator-compliant.
  • Full live-ops cadence: a 2–4 week event drumbeat, AI doing 60–80% of content, humans owning the 20% players remember.

The thesis: a 4–6 person team can ship and operate, end-to-end, what a 25-person team shipped in 2022.


25. 📋 Cheat Sheet & Tool Stack

25.1 The minimum viable AI-native social-games stack (May 2026)

| Layer | Pick | Backup option |
|---|---|---|
| Coding agent | Claude Code (Max tier) | Cursor |
| Inline coding | GitHub Copilot | Codeium |
| Engine bridge | Unity-MCP / Godot AI | Custom MCP server |
| Concept art | Midjourney v7 / Flux Pro | Ideogram |
| Pixel sprites | PixelLab | Sprite-AI |
| Sprite animation | Sprite-AI / God Mode | Manual Aseprite |
| 3D animation | Cascadeur Indie | Move.ai |
| Music (shippable) | ElevenLabs Music + AIVA Pro | Stable Audio |
| SFX | ElevenLabs Sound Effects | Splice / Soundly |
| Voice synthesis | ElevenLabs (synthetic only) | OpenAI TTS |
| LLM content gen | Claude Sonnet 4.6 + Haiku 4.5 (tiered) | GPT-5-Pro / GPT-5 |
| Live LLM NPCs (if shipping) | Inworld AI | Convai |
| Localization | Custom Claude pipeline + linguist | Alocai / Gridly |
| Playtest bots | Custom Python + Unity ML-Agents | Chaos Dynamics |
| Churn ML | XGBoost (in-house) / Kumo | LightGBM |
| Voice moderation | ToxMod | (no real competitor in 2026) |
| Text moderation | OpenAI moderation + Perspective | Custom LLM filter |
| Image moderation | Hive Moderation | Sightengine |
| UA creative video | Sora 2 / Veo 3 + Higgsfield Ads | Runway |
| Player support | Custom RAG (Claude + Postgres pgvector) | Intercom Fin |
| Analytics agent | Claude / GPT scheduled cron | Hex / Mode + LLM extension |

25.2 The 7-line decision framework

When deciding whether to add AI to a workflow, ask in order:

  1. Is the input bounded by a schema? If yes → AI is safe. If no → wrap it.
  2. Is the output reviewable in <30 seconds by a human? If yes → ship it. If no → automate the review.
  3. Is the failure mode embarrassing or expensive? If yes → human in the loop. If no → trust automation.
  4. Is the task high-volume, low-stakes? Perfect AI fit.
  5. Is the task low-volume, high-stakes? Keep it human.
  6. Does a regulator care about this output? Disclose, log, audit.
  7. Would the player screenshot this? Human owns it.

25.3 The 7 things to do before next Monday

  1. Install Claude Code / Cursor + Copilot for every dev.
  2. Install Unity-MCP or Godot AI in your engine.
  3. Write a 2-page style bible.
  4. Move all numbers to balance.yaml, all strings to strings.json.
  5. Set up a schema-validated content-gen prototype on one quest type.
  6. Pick one language (Spanish) and run the AI hybrid localization end-to-end on 200 strings.
  7. Build the daily live-ops AI agent and pipe its output to your team Slack at 9am.

You will measurably ship faster within 2 weeks. Compounding starts immediately.

25.4 The one-line philosophy

AI scales the parts of social games that don't have a soul, so humans can spend their time on the parts that do.

If you keep that line in mind on every adoption decision, you'll get most of these calls right.


📚 Further Reading

  • The companion to this document: 🌾 The Social Games Playbook 🎮 — the design playbook this AI guide is built to accelerate.
  • Steam AI policy (Jan 2026): https://store.steampowered.com (Valve disclosure requirements)
  • 2026 Unity Game Development Report — AI adoption stats.
  • GDC 2026 AI in Game Development track — recordings via the GDC Vault.
  • arXiv 2410.15644 — PCG in Games: Survey with Insights on LLM Integration.
  • arXiv 2506.04699 — Generative Agent-Based Modeling for MMO Economies.
  • arXiv 2512.02358 — Beyond Playtesting: Multi-Agent Simulation for MMOs.
  • Modulate / ToxMod case studies (Activision, Schell Games).
  • Anthropic / OpenAI / Google enterprise data-use and indemnification terms.

This document is a living guide. AI tooling moves quickly — re-evaluate every 90 days. The principles in §3, §4, and §22 should outlast the specific tools.


If you found this helpful, let me know by leaving a 👍 or a comment! And if you think this post could help someone, feel free to share it. Thank you very much! 😃
