Elena Revicheva

Posted on Jun 30 • Originally published at aideazz.xyz

What I Changed to Get AIdeazz Cited in Perplexity Answers

#ai #machinelearning #programming

Originally published on AIdeazz — cross-posted here with canonical link.

For four months our docs page ranked nowhere in Perplexity, even for queries we literally answered better than the sources it cited. Then I stopped treating it like Google. Within six weeks a page about multi-agent routing started showing up as a numbered citation in answers to "how to route between Groq and Claude" — not because I gamed anything, but because I restructured the page so a language model could lift a fact out of it without guessing.

That distinction is the whole job. Search engines rank documents. Generative engines extract claims and attribute them. If your page doesn't contain extractable, attributable claims, you don't get cited; you get paraphrased into the void with no link back. Here's what actually moved the needle, along with the things that didn't.

The mental model: you're writing for an extraction step, not a ranking step

When Perplexity or ChatGPT-with-search answers a question, there's a retrieval pass and then a synthesis pass. The synthesis model reads a handful of retrieved chunks and decides which sentences are worth quoting and which source deserves the footnote. Generative engine optimization is the practice of making your sentences the easy choice in that synthesis step.

The failure mode I had: I wrote like a human persuading a human. Long preambles, "in this post we'll explore", context before claims. A synthesizer chunking my page got 400 tokens of throat-clearing before any factual payload. The competing source — some thin SEO farm — opened with "Groq runs Llama 3.3 70B at roughly 280 tokens/second." Guess which sentence got quoted.

So the first thing I changed was claim density. Every section now leads with a falsifiable, self-contained statement that survives being ripped out of context. Not "performance can vary depending on your setup" — that's unciteable. Instead: "On Oracle Ampere A1 instances, our Telegram agent's median round-trip with Groq routing was 1.9s; with Claude Sonnet it was 4.3s." That sentence works as a citation. It has a subject, a number, and a condition.
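A rough way to check claim density before publishing is to test whether each section's first sentence carries a number with a unit and avoids vague qualifiers. The sketch below is a heuristic, not a description of how any engine scores text; the unit list, the vague-phrase list, and the function names are all assumptions made for illustration.

```python
import re

# Assumption: a "citable" lead sentence contains a dollar amount or a number
# with a unit. The unit list is illustrative; extend it for your own domain.
NUMBER_WITH_UNIT = re.compile(
    r"\$\d+(?:\.\d+)?"  # dollar amounts such as $3 or $0.59
    r"|\d+(?:\.\d+)?\s?(?:%|(?:ms|s|GB|MB|tokens?|hours?|weeks?|cores?)\b)",
    re.IGNORECASE,
)

# Assumption: these qualifiers mark a sentence as too vague to quote.
VAGUE_PHRASES = ("can vary", "depending on", "it depends", "recently")


def first_sentence(section: str) -> str:
    """Return text up to the first ., ! or ? that is followed by whitespace or the end."""
    text = section.strip()
    match = re.search(r"[.!?](?=\s|$)", text)
    return text[: match.end()] if match else text


def leads_with_claim(section: str) -> bool:
    """True when the lead sentence has a number with a unit and no vague qualifier."""
    lead = first_sentence(section)
    has_number = NUMBER_WITH_UNIT.search(lead) is not None
    is_vague = any(phrase in lead.lower() for phrase in VAGUE_PHRASES)
    return has_number and not is_vague
```

Abbreviations such as "e.g." will confuse the sentence splitter, so treat a failing result as a prompt to reread the opening, not as a verdict.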

Structured data is the part developers skip and shouldn't

The phrase that gets thrown around is "generative engine optimization structured data citations," and most people stop at the buzzword. The concrete version: machine-readable markup that tells a crawler what kind of thing your page is, who wrote it, and what factual claims it asserts.

I added three schema.org types via JSON-LD to our key pages:

- TechArticle with author, datePublished, and dateModified, so the engine knows it's technical content with a maintained timestamp.
- Person for authorship, with sameAs pointing to my actual profiles, so the model can resolve "Elena Revicheva" to a consistent identity across the web.
- FAQPage with explicit Question/acceptedAnswer pairs, because that maps almost 1:1 to how a generative engine wants to extract a Q&A.

Here's the FAQ block that ended up on our routing page:

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "When should you route to Groq instead of Claude?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Route to Groq for latency-sensitive, structured tasks under ~2k tokens where Llama 3.3 quality is sufficient. Route to Claude for reasoning, long context, and tool use where a wrong answer costs more than the extra 2-3 seconds."
    }
  }]
}
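The TechArticle and Person types can be generated from page metadata instead of being hand-written. This is a minimal sketch, assuming the author is nested inside the article object; the function names, the headline, the dates, and the example.com URLs are placeholders, not values from the AIdeazz pages.

```python
import json


def tech_article_jsonld(headline, author_name, same_as, published, modified):
    """Build a schema.org TechArticle dict with a nested Person author."""
    return {
        "@context": "https://schema.org",
        "@type": "TechArticle",
        "headline": headline,
        "datePublished": published,   # ISO 8601 date string
        "dateModified": modified,     # ISO 8601 date string
        "author": {
            "@type": "Person",
            "name": author_name,
            "sameAs": list(same_as),  # profile URLs for the same person
        },
    }


def to_script_tag(data):
    """Serialize JSON-LD into a script tag suitable for the page head."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so a stray "</script>" inside a string cannot close the tag.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    article = tech_article_jsonld(
        headline="Placeholder headline",
        author_name="Elena Revicheva",
        same_as=["https://example.com/profile-a", "https://example.com/profile-b"],
        published="2025-01-01",  # placeholder date
        modified="2025-01-01",   # placeholder date
    )
    print(to_script_tag(article))
```

Generating the block from one metadata source keeps the visible byline and the markup from drifting apart.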

Did the markup alone cause citations? No. But pages with it got picked up faster and more often than identical content without it. My read: structured data doesn't rank you, it reduces the engine's uncertainty about what you're claiming and who's claiming it. Uncertainty is what kills a citation. The model would rather quote a source it can attribute cleanly.

One caveat: schema helps, but it isn't sufficient. We had two pages with identical JSON-LD. The one with dense numeric claims got cited; the one with vague prose didn't. Markup amplifies citeable content. It can't create it.

Authorship signals: the model needs to trust a person, not a domain

Google's E-E-A-T is fuzzy. The generative version is more mechanical: can the engine connect this claim to a consistent, real author who has said consistent things elsewhere? If yes, the claim carries more weight in synthesis.

What I did, concretely:

- Put a real bio with a real identity on every technical page: not "the AIdeazz team," but a named author with a linked portfolio. Models resolve named entities better than collective nouns.
- Made my sameAs links point everywhere I actually publish, so the entity graph connects. When my name shows up in three places saying compatible things about multi-agent routing, the engine treats the fourth mention as more credible.
- Stopped publishing anonymous "ultimate guide" pages. They got zero citations. A model has no reason to attribute a claim to a nameless wall of text when a named practitioner said the same thing.

The uncomfortable part for technical founders: this means putting your name on it and being wrong in public sometimes. The page that got cited most was one where I wrote "I was wrong about WhatsApp's session window — it's 24 hours, not 1 hour, and that changed our entire re-engagement architecture." That sentence got pulled into answers because it was specific, dated, and owned. Hedged content is invisible to synthesis.

Citation-ready format: write in chunks a retriever can grab

Retrieval works on chunks, usually a few hundred tokens. If your key fact is split across three paragraphs with the number in one and the condition in another, the chunk that gets retrieved is incomplete and won't get quoted. Self-contained chunks win.

My checklist for every technical page now:

- One claim per paragraph, fully specified. The number and its condition live in the same sentence. "Oracle's Always Free tier gives you 4 Ampere cores and 24GB RAM, enough to run our entire Telegram agent stack including the Postgres instance, at $0/month" is one chunk doing one job.
- Tables for comparisons. When I compared Groq vs Claude vs a local model, I put it in a markdown table. Generative engines extract tabular data cleanly and often reproduce it. Prose comparisons get mangled.
- Explicit units and dates. "Recently" is unciteable. "As of October 2025" is. A model deciding whether to trust a latency number cares whether it's stale.
- Question-shaped headings. Headings that match real query phrasing get matched in retrieval. "How do you keep an Oracle Always Free instance from getting reclaimed?" beats "Infrastructure considerations."
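To see whether a fact would survive chunking, you can simulate a retriever locally. The sketch below packs paragraphs greedily into word-limited chunks and checks that every term of a fact lands in the same chunk. Real retrievers count tokens and split differently, so the 150-word limit, the paragraph-boundary rule, and both function names are assumptions for illustration only.

```python
def chunk_paragraphs(text: str, max_words: int = 150) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_words.

    A single paragraph longer than max_words becomes its own oversized chunk.
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        words = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def fact_survives(chunks: list[str], terms: list[str]) -> bool:
    """True when at least one chunk contains every term (case-insensitive)."""
    return any(
        all(term.lower() in chunk.lower() for term in terms) for chunk in chunks
    )


if __name__ == "__main__":
    page = (
        "On Oracle Ampere A1 instances, median round-trip "
        "with Groq routing was 1.9s."
    )
    print(fact_survives(chunk_paragraphs(page), ["Ampere", "Groq", "1.9s"]))
```

If the check fails for a fact you care about, the number and its condition are probably living in different paragraphs.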

The table thing is underrated. Here's roughly what we ran on our routing page:

| Task type | Model | Median latency | Cost per 1M tokens |
| --- | --- | --- | --- |
| Short structured | Groq (Llama 3.3 70B) | 1.9s | ~$0.59 in / $0.79 out |
| Reasoning / tools | Claude Sonnet | 4.3s | $3 in / $15 out |
| Classification | Local on Ampere | 0.4s | $0 marginal |

That table showed up almost verbatim in a Perplexity answer about cost-optimizing agent stacks, with us as the source. A wall of prose saying the same thing would not have.
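If the numbers already live in a config file or benchmark script, the table can be rendered from data so the page never disagrees with its source. This is a small sketch using the rows from the routing comparison; the function name and the pipe-escaping rule are my own choices, not part of any library.

```python
def markdown_table(headers, rows):
    """Render headers and rows as a pipe-delimited markdown table."""

    def cell(value):
        # Escape literal pipes so a cell cannot break the column layout.
        return str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"expected {len(headers)} cells, got {len(row)}")
        lines.append("| " + " | ".join(cell(c) for c in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    headers = ["Task type", "Model", "Median latency", "Cost per 1M tokens"]
    rows = [
        ["Short structured", "Groq (Llama 3.3 70B)", "1.9s", "~$0.59 in / $0.79 out"],
        ["Reasoning / tools", "Claude Sonnet", "4.3s", "$3 in / $15 out"],
        ["Classification", "Local on Ampere", "0.4s", "$0 marginal"],
    ]
    print(markdown_table(headers, rows))
```

The length check is deliberate: a row with a missing cell is exactly the kind of silent error that makes an extracted table misquote you.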

Durable pages on domains you control

This is where I disagree with most GEO advice, which tells you to chase mentions on high-authority third-party sites. That helps, but it's rented ground. The citations that compound are on aideazz.xyz, because I control whether they stay accurate, get updated, and keep their timestamps fresh.

Generative engines penalize staleness more harshly than classic SEO does. A page that still says "Claude 3 Opus" when everyone's on newer models reads as abandoned, and the synthesizer prefers a current source even if yours is more detailed. I now update dateModified and the actual numbers on our top pages on a real cadence: when a model changes, when a price changes, when an architecture decision changes. A durable page isn't one you write once; it's one you keep correct.
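One way to keep dateModified honest is to tie it to a hash of the page body, so the timestamp only moves when the content does. The sketch below assumes you store the previous hash somewhere between builds; the function names and that storage step are hypothetical, not a description of the AIdeazz build.

```python
import hashlib
from datetime import date


def content_hash(body: str) -> str:
    """Stable fingerprint of the page body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def refresh_date_modified(jsonld: dict, body: str, stored_hash: str, today=None):
    """Return (jsonld, hash), bumping dateModified only if the body changed.

    stored_hash is the hash saved at the previous build (assumed to be
    persisted by the caller). The input dict is never mutated.
    """
    new_hash = content_hash(body)
    if new_hash == stored_hash:
        return jsonld, stored_hash  # nothing changed, keep the old date
    updated = dict(jsonld)
    updated["dateModified"] = (today or date.today()).isoformat()
    return updated, new_hash
```

Hashing the rendered body rather than the source file is a judgment call: it also catches changes that arrive through templates or included data, at the price of bumping the date for purely cosmetic edits.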

Concrete cost reality: hosting these pages costs me effectively nothing on Oracle's Always Free tier. The expensive part is the discipline to keep them accurate. I'd rather have eight pages I maintain than eighty I abandon. Abandoned pages don't just stop getting cited — they actively make your domain look unmaintained, which depresses the live pages too.

What didn't work

To be useful I should tell you the things I tried that wasted time.

Keyword density. Stuffing "generative engine optimization" variations did nothing. Synthesis models don't count keywords; they extract meaning. The phrase matters once, in context, so the page is topically obvious. After that it's noise.

Begging for backlinks. Classic link-building moved my Google rank a little and my Perplexity presence not at all. Compared with classic search, generative engines weight content extractability more and link graphs less.

Long comprehensive guides. The 4,000-word everything-page got fewer citations than three focused 1,200-word pages. Synthesis prefers a page that's obviously about one answerable thing. Sprawl dilutes claim density per chunk.

Trying to game the model with FAQ schema on thin content. I added FAQ markup to a page with vague answers, hoping the structure would carry it. It didn't get cited. The structure has to wrap real, specific answers.

The actual workflow I run now

For each new technical page on AIdeazz:

1. Decide the single question the page answers, and phrase it the way a user would type it into Perplexity.
2. Write the answer as one tight, fully specified claim, with a number in it, in the first three sentences.
3. Back it with a table or a code block carrying the hard numbers: latency, cost, error messages, version strings.
4. Add TechArticle + Person + FAQPage JSON-LD with real authorship and sameAs.
5. Keep paragraphs as self-contained chunks; one claim each.
6. Put a date on every number.
7. Revisit and update when reality changes, and bump dateModified honestly.

None of this is a trick. It's writing accurately and structuring it so a machine can quote you without misrepresenting you. The reason most pages don't get cited isn't that they're badly optimized — it's that they don't contain anything specific enough to be worth attributing.

Frequently Asked Questions

Q: Does JSON-LD structured data actually cause citations, or is it correlation?
A: On its own, no — I A/B'd two pages with identical schema and only the one with dense numeric claims got cited. My working conclusion: structured data reduces the engine's uncertainty about what you're claiming and who's claiming it, which makes you the easier source to attribute when the content is already extractable. It's an amplifier, not a cause.

Q: How long until changes show up in Perplexity answers?
A: For us, six weeks from restructuring to first reproducible citation on a target query. That's slower than re-indexing on Google because the content has to be crawled, chunked, and then actually selected during synthesis for queries that happen to retrieve it. Don't expect day-three feedback; track it weekly over two months.

Q: Is GEO worth it if you're not ranking on Google yet?
A: Yes, and possibly more so. Compared with classic search, generative engines weight content extractability more and link authority less, so a thin domain with sharp, specific, well-structured answers can get cited above a high-authority site with vague prose. We were cited on routing queries while ranking on page 3 of Google for the same terms.

Q: How do you handle factual claims that go stale, like model prices and latencies?
A: Treat the page as code you maintain, not content you ship. I update the numbers and dateModified whenever a model, price, or architecture decision changes — generative engines visibly prefer current sources, so a detailed-but-stale page loses to a thinner current one. The cost is discipline, not infrastructure; the pages themselves run free on Oracle's Always Free tier.

Q: Should I invest in third-party mentions or my own pages first?
A: Own pages first. Third-party mentions on high-authority sites help, but they're rented ground — you can't update them, refresh timestamps, or fix a number that went wrong. The citations that compounded for us were on a domain I control, where I keep them accurate over time.

— Elena Revicheva · AIdeazz · Portfolio
