I launched an iOS app about a month ago. It's a niche tool for resellers — people who buy stuff at car boot sales, charity shops, and flea markets, then flip it on eBay, Vinted, Depop.
56 first-time downloads in 25 days. Not a lot, but I'm a solo developer and this is a side project built with Claude Code.
I was doing something a bit weird. Every week or two I'd open ChatGPT, Perplexity, and Claude and ask each to recommend an app for tracking reselling profits, just to see whether mine showed up. Sometimes it did. Sometimes it didn't. It depended entirely on how I phrased the question.
Then I realized I could automate this.
## The script
I wrote a Python script that takes 11 questions a real user might ask — things like "best app for tracking reselling profit in 2026" or "free iOS app for resellers to track inventory across eBay, Vinted, and Depop." For each question it runs two passes:
- gpt-4o (no web access) — baseline, shows what's in training data
- gpt-4o-search-preview (live web search) — simulates what a ChatGPT user actually sees
If my app appears but isn't ranked first, the script asks a follow-up: "what advantages does the #1 pick have over FlipperHelper?" If it doesn't appear at all, it asks: "why don't you recommend FlipperHelper?"
## First audit results
| Metric | Value |
|---|---|
| Queries tested | 11 |
| Mentioned (training data) | 4/11 |
| Mentioned (live web) | 5/11 |
| Total web citations | 40 |
| Top cited domain | apps.apple.com (13/40) |
| My website citations | 0 |
The App Store page is the single most important source of truth for how LLMs understand your app. My blog posts, Medium articles, dev.to articles — none of them were cited in app recommendation queries. This was a wake-up call.
## What ChatGPT got wrong about my app
The most actionable finding was a straight-up misunderstanding. My app has optional Google Drive photo sync and CSV export to Sheets. ChatGPT interpreted this as my only backup mechanism and started warning users about "risk of data loss" if they don't connect Google Drive.
The app is actually offline-first. Everything is stored locally on-device. No account required. Google Drive is purely optional. But my App Store description didn't make this distinction clearly enough.
Since ChatGPT treats the App Store as its primary source (32.5% of all citations), this vague wording was actively hurting my recommendations. I rewrote the description to explicitly separate local storage, optional cloud backup, and optional export.
## Competitive gaps the LLM found
When my app appeared but ranked below competitors, the script asked why. Here's what it found:
**Custom notes field** — Other apps let users add free-text notes to items. Mine didn't. ChatGPT called it "more limited functionality." The fix took about 3 minutes (one optional text field), but the LLM considered it a meaningful competitive disadvantage.
**Review count** — Only one App Store review so far. The model explicitly said the "community hasn't validated that everything works well." Fair point; nothing I can rush.
**Freshness signals** — The model favors apps with recent, frequent updates. I see competitors whose changelogs say "bug fixes" over and over; maybe they're gaming this signal. I took a different approach and published a /changelog.html page with structured JSON-LD (FAQPage schema) that highlights my testing coverage: 10 test files, 3,424 lines of test code, 63 end-to-end QA flows, and a 100k-item stress test.
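As a sketch of that changelog markup, here's a small Python helper that emits FAQPage JSON-LD. The helper and the question wording are mine; only the test-coverage numbers come from the actual page:

```python
import json

def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Render question/answer pairs as FAQPage JSON-LD for a <script> tag."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question",
             "name": q,
             "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in pairs
        ],
    }, indent=2)

snippet = faq_jsonld([
    ("How well tested is FlipperHelper?",
     "10 test files, 3,424 lines of test code, 63 end-to-end QA flows, "
     "and a 100k-item stress test."),
])
```

The output drops straight into a `<script type="application/ld+json">` tag on the changelog page.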
## Indexing tricks I found along the way
**Bing Webmaster Tools** — ChatGPT's web search mode uses Bing under the hood. You can register your site, request indexing, and there's even a beta "AI Performance" tab that shows how often Copilot cites your pages. I also set up IndexNow so every GitHub Pages deploy auto-requests re-indexing.
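The IndexNow submission itself is a single POST to the shared endpoint. A minimal sketch (the helper names are mine; the endpoint and payload shape follow the public IndexNow protocol):

```python
import json
from urllib.request import Request

def indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    """JSON body for a batch IndexNow submission."""
    return {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",  # key file served from your own site
        "urlList": urls,
    }

def indexnow_request(payload: dict) -> Request:
    """Build the POST; api.indexnow.org fans the ping out to participating engines."""
    return Request(
        "https://api.indexnow.org/indexnow",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(...)` from the deploy workflow; the only setup is making the key file reachable at `keyLocation`.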
**Wayback Machine** — I archived my key pages there for free. The logic: search-mode LLMs hit the live web (covered by Bing indexing), while statically trained models only know what landed in their training crawl, and public archives are one more place those crawls can pick your pages up. Two types of models, two indexing strategies.
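Archiving is just as scriptable. A hedged sketch using the Wayback Machine's public "Save Page Now" and availability endpoints (the helper names are mine):

```python
from urllib.parse import quote

def save_page_now_url(page: str) -> str:
    """GET this URL and the Wayback Machine captures a fresh snapshot of `page`."""
    return "https://web.archive.org/save/" + quote(page, safe=":/")

def availability_url(page: str) -> str:
    """Availability API: check whether a snapshot of `page` already exists."""
    return "https://archive.org/wayback/available?url=" + quote(page, safe="")
```

Loop these over your key pages after each deploy and every release leaves an archived trail.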
**Structured data** — I added an ai-sitemap.xml with intent annotations and FAQ JSON-LD across key pages. Not sure how much this helps yet, but it can't hurt.
## Is it working?
My download data tells an interesting story. First 4 days after launch (before any marketing): 1 download. A Reddit post spiked things on March 25-26, and the tail carried 50 downloads over 18 days. Then April 12-14: 5 downloads across 3 days with zero new content pushed, at least one every day.
That last bit is the first real signal of organic or AI-driven discovery. People finding the app without me posting anything. Could be ASO kicking in, could be LLMs recommending it, could be blog content getting indexed. Hard to attribute exactly but the timing correlates with the AEO work I shipped on April 15.
## The approach
I think of this as AEO — Answer Engine Optimization. The same way we used to optimize for Google search rankings, we now need to optimize for the answers LLMs give when someone asks "what's the best app for X."
The feedback loop is faster than traditional SEO because you can literally ask the model what's wrong and it'll tell you. Not all of it is reliable — there are system prompts preventing full transparency — but enough is actionable to make it worth running weekly.
My recommendation for anyone building a product:
- Write a script that queries major LLMs with questions your users would ask
- Track whether your product appears and where it ranks
- When it doesn't appear, ask why — the model will give specific reasons
- Fix the actual issues (description clarity, missing features, content gaps)
- Re-run weekly and track progress
The bar is low right now. Most indie developers aren't doing this at all, which means fixing a few things the LLM complains about can move you ahead of competitors who haven't even thought about it.
FlipperHelper is a free iOS app for tracking reselling purchases, expenses, and profit. App Store link