
Olga Braginskaya

Originally published at datobra.com

How I'm Optimizing My Small Blog to Get Cited by AI (Based on Actual Research)

AI ate the traffic and nobody sent a thank you note.

LinkedIn's own marketing team admitted that AI search wiped out 60% of their traffic. Not because rankings dropped, but because people stopped clicking: AI answered the question before anyone reached the link. Google is doing the same thing with AI Overviews. It puts the answer right there on the search page, and that reduced clicks to top-ranking content by 58%.

This isn’t limited to big publishers; smaller blogs feel it too. I’ve been running a data engineering blog for three years, and the audience hasn’t disappeared, but the interaction has changed: people get what they need directly from AI, and your page is no longer part of that first step.

But the click didn’t disappear, it moved. Google’s AI Overviews show source links next to their responses, ChatGPT adds citations, and Perplexity links to every source it uses. Now you’re not just competing for a spot on a search results page; you’re competing for a place inside the answer itself. And if you’re cited, you’re one of the three or four links people actually see, not one of ten they scroll past.

So the question became: how do you get cited? I started digging into it and found that the SEO world already has a name for this: AEO.

Most AEO advice is just SEO with a new label

I spent a while on Reddit reading through AEO threads and honestly most of it was discouraging. The tools that used to do keyword research now have AI optimization features bolted on, but they want $100-200 a month for it, packed with features built for brands managing campaigns at scale. That's a lot when you're running one blog and your marketing budget is zero. And a lot of the advice still felt like repackaged SEO wisdom: add schema, optimize your headings, follow this checklist, just with "for AI" added to the title. Some of these tools claim they send requests to AI providers and check if you're being cited, though I haven't seen any real proof of that (and certainly not for $100 a month).

One useful thing I picked up from Reddit is that AI changed search from keywords to questions. People don't type "parquet file compaction python" into ChatGPT. They ask "how do I compact small Parquet files without Spark." So your headings should work as questions and your sections should start with the answer directly, because LLMs are like people - they don't want to read three paragraphs before getting to the point.

Someone actually measured what gets cited

Then I saw a Reddit post that was actually different from everything else. A team called Indexably took 18,000 real pages across ChatGPT, Claude, Gemini, Perplexity and Google AI Overviews and measured what's statistically different between pages that get cited and pages that don't: five full research posts with real numbers and methodology, published openly.

Here's what I took away as a blogger.

Domain reputation is 77% of the equation

Domain-level signals like backlinks, brand recognition and how well-known your site is account for 77% of what predicts citation. Only 23% is page-level. So realistically if your blog is small, this is the main constraint and you can't "optimize" your way around it.

But since AI is reading your content, I think it's worth making your existing expertise visible in the text itself. Not a formal bio, just naturally mentioning that you've been doing this for years, that this solution ran in production, that you've seen this problem across multiple projects. The kind of details that signal to both readers and AI that you actually know what you're talking about. For example, in this post I mentioned in the intro that I've been running a data engineering blog for three years; that's already enough context about who I am without turning it into a resume.

Backlink diversity beats volume

Ten links to your blog from ten different places are worth more than a hundred from one. They measured this specifically and backlink diversity is more than twice as predictive as raw backlink count. This is one of the few things that actually feels actionable at a small scale.

I found that some targets are easier than others. Platforms like Dev.to or Hashnode let you republish without moderation, so you can cross-post there with minimal effort. Hacker News works too; you can submit your own posts. I submitted my updated PyArrow compactor post there and got 60 visitors in 24 hours. That's not a viral moment, but for a blog my size and a pretty niche topic like Parquet file compaction, it's not bad either.

Plausible stats after submitting to Hacker News - 60 visitors from HN

Getting into big newsletters is nearly impossible, but smaller ones, especially from people just starting out, are much more approachable. Slack communities are underrated, especially smaller learning-focused ones; DataTalks.Club, for example, has a dedicated self-promotion channel. Reddit is basically useless for this: they'll ban you for dropping links, and I haven't figured out a way around it. LinkedIn works relatively fine if you have at least a hundred or so followers. Sometimes posts get a few thousand views, but it has to be a real post with a catchy opening paragraph, not just a link drop, because nobody reads past the first few lines.

What actually matters on the page

Most page-level optimization only matters if your domain is already strong. But it’s still worth doing because it makes your content easier for AI systems to read and extract.

So what actually matters on the page:

  • The strongest page-level signal is basic HTML. Canonical, lang, meta description, viewport, doctype.
  • Cited pages aren't longer, they're better structured. Shorter paragraphs, more varied vocabulary, shorter sections. Word count is slightly negatively correlated with citation. Writing more for the sake of it can actually hurt.
  • Include real numbers and statistics. A Princeton study showed that adding stats improved AI visibility by 22% to 41%. So LLMs are like people: they like staring at numbers and stats too. For a small blog this feels like one of the most realistic things to experiment with.
  • Each AI platform wants different things. ChatGPT leans toward freshness and metadata. Gemini cares about crawlability and is the only model where page optimization gives a large measurable lift. Claude spreads weight more evenly and values statistics and author info the most.
  • If your schema is injected by JavaScript most AI crawlers won't see it. Only Gemini renders JS. So a lot of the "advanced setups" people pay for don't even apply unless they're in static HTML. This was an interesting discovery for me personally because I used to embed code samples via GitHub Gists, which load through JavaScript. That means most AI crawlers saw empty space where my code should have been. So if you're writing technical posts, use native code blocks instead of embedded scripts.
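To make the page-level points concrete: the basic HTML signals and the static-schema advice above together fit in a few lines of plain markup. Here's a minimal sketch of a post's head that covers them, with the schema embedded as static JSON-LD instead of injected by JavaScript. The URL, description and headline are placeholders, not values from the research:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <meta name="description" content="How to compact small Parquet files with PyArrow, no Spark required.">
  <link rel="canonical" href="https://example.com/parquet-compaction">
  <!-- Static JSON-LD: it sits in the HTML itself, so crawlers that
       don't render JavaScript (i.e. everyone except Gemini) still see it -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "How do I compact small Parquet files without Spark?",
    "author": { "@type": "Person", "name": "Olga Braginskaya" }
  }
  </script>
</head>
```

Most static site generators emit the first five tags by default; the JSON-LD block is usually the part you have to add yourself.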

The 90% that no one can measure

The most honest finding is that all of this explains only about 10% of what drives citation. The remaining 90% is probably just whether your content actually answers the question someone asked.

What am I actually going to change on my blog?

I took the research findings, put them into Claude and ChatGPT, and built a prompt that works as an AEO reviewer (take the prompt and run it on your own posts). I paste the prompt, paste my article, and it checks the text against the research data and gives me a prioritized list of what to fix.
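If you'd rather script the mechanical part instead of pasting everything into an LLM, the basic-HTML checks are easy to automate. This is a hypothetical sketch, not my actual review prompt, using only Python's standard library to flag which of the five signals from the research a page is missing:

```python
# Sketch of a DIY checker for the basic HTML signals the research
# calls the strongest page-level predictor: doctype, lang, canonical,
# meta description, viewport. Signal names are my own labels.
from html.parser import HTMLParser

class SignalScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_decl(self, decl):
        # <!DOCTYPE html> arrives here as "DOCTYPE html"
        if decl.lower().startswith("doctype"):
            self.found.add("doctype")

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and "lang" in attrs:
            self.found.add("lang")
        if tag == "link" and attrs.get("rel") == "canonical":
            self.found.add("canonical")
        if tag == "meta":
            if attrs.get("name") == "description":
                self.found.add("meta description")
            if attrs.get("name") == "viewport":
                self.found.add("viewport")

def missing_signals(page_html):
    """Return a sorted list of the signals the page lacks."""
    scanner = SignalScanner()
    scanner.feed(page_html)
    wanted = {"doctype", "lang", "canonical", "meta description", "viewport"}
    return sorted(wanted - scanner.found)
```

Point it at your rendered HTML (what crawlers actually fetch), not your markdown source, since the generator is what emits or omits these tags.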

Here's what the review showed me about my own posts:

  • My code samples were in GitHub Gists which load via JavaScript, so most AI crawlers couldn't see them. I should move almost everything to native code blocks, except for really large snippets.
  • I tend to get carried away when I write and end up with massive paragraphs and sections that go on forever. That said, you shouldn't blindly follow everything the LLM says. ChatGPT, for example, complained that the "What actually matters on the page" section in this post is a list that doesn't match the rest of the style, and wanted me to rewrite it as flowing prose. But I like how the key points read as a clear list, and that kind of clarity gets lost when you turn it into paragraphs.
  • The review prompt also helped me see actual content gaps. For example in my PyArrow compactor post I never mentioned what size the files were before and after compaction or how many output files I ended up with. There were also zero screenshots in the entire post.
  • I have a habit of building up to the answer instead of starting with it. I write and write and the actual point shows up somewhere at the end. I should be making films, not blog posts.
  • I'm also lazy about adding a featured image for each post and for the X card. It's hard to come up with an image for a technical article but then your links look ugly when you share them on LinkedIn or anywhere else.

And honestly the whole thing looks pretty cool, like you just had a conversation with a PhD in content analytics or something.

The 90% I can't control

Domain authority takes years and I'm realistic about where my blog sits. And the 90% that depends on whether your content actually answers someone's question is not something you can fix with meta tags.

I don't expect any of this to suddenly bring me a hundred thousand readers. But it's an interesting thing to work on and the research is detailed enough to put into an LLM and build your own review prompt, whether it's for a brand or a blog. I'm still going through my posts one by one, fixing things when I have time. Some changes take five minutes, some make me rewrite half the post. At least now I have some idea of what actually matters instead of guessing.

Subscribe on datobra.com to not miss new posts. Updates: @ohthatdatagirl.
