I asked four AI models which observability tool to use in 2026. They keep naming Datadog and Splunk, and never Better Stack.

#observability #devops #ai #startup

We build Bersyn, a tool that tracks which products AI models name when someone asks for a recommendation in a category. So we run a lot of scans. This is the third category in a row where the same thing happened, and observability is the cleanest example yet, so it is worth showing the receipts.

The question is the one a real engineer types into ChatGPT: "what is the best platform to monitor and debug my B2B SaaS app in production, and what are the strong alternatives?" We asked it five ways across four Surfaces: ChatGPT, Claude, Gemini and Perplexity. Then we measured how often each modern tool actually got named, with each tool scanned on questions about its own home category.

The modern tools are not in the answer

Here is the Recommendation Share for five tools that engineers actually talk about, measured across the four Surfaces.

                ChatGPT   Claude   Perplexity   Gemini
Better Stack       0%       0%         0%          0%
Axiom              0%      40%         0%          0%
Highlight          0%      40%         0%          0%
OpenStatus         0%       0%        80%          0%
Checkly           20%      60%        40%         20%

Read the Better Stack row again. Zero, zero, zero, zero. Not a low score, an absence on every Surface we tested. Axiom and Highlight are named by exactly one model, Claude, and by none of the other three. OpenStatus exists only on Perplexity. A buyer who opens ChatGPT, which is most of them, walks away from this category never having heard four of these five names.
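For readers who want to reproduce a number like these, here is a minimal sketch of how a Recommendation Share could be computed. It assumes the share is simply the fraction of the five phrasings on one Surface whose answer names the tool, which matches the 20% steps in the table but is not necessarily how Bersyn scores it. The function name and the sample answers are invented for illustration.

```python
import re


def recommendation_share(answers, tool):
    """Return the fraction of answers that name `tool` at least once."""
    if not answers:
        return 0.0
    # Whole-phrase, case-insensitive match. A name that is also a common
    # word (for example "Highlight") would need stricter matching than this.
    pattern = re.compile(r"\b" + re.escape(tool) + r"\b", re.IGNORECASE)
    named = sum(1 for text in answers if pattern.search(text))
    return named / len(answers)


# Invented placeholder answers, one per phrasing of the question.
# These are not real scan output.
sample_answers = [
    "Top picks: Datadog, Splunk and Sentry.",
    "Consider Datadog, then Pingdom or Checkly for synthetic checks.",
    "Sentry and Honeycomb are strong choices.",
    "Datadog is the most common default.",
    "Splunk, Loggly and Datadog cover log management.",
]

for tool in ("Datadog", "Checkly", "Better Stack"):
    print(f"{tool}: {recommendation_share(sample_answers, tool):.0%}")
```

Plain name matching is the weak point of a sketch like this: it counts a passing mention the same as a real recommendation, so treat it as a lower bar than reading the answers yourself.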

What AI names instead

So who got recommended in their place? Here are the tools AI reached for in the answers where each modern challenger was not named.

Better Stack   ->  Splunk, Datadog, Loggly
Axiom          ->  Datadog, Honeycomb
Highlight      ->  FullStory, LogRocket, Sentry
OpenStatus     ->  Cachet
Checkly        ->  Pingdom, Datadog Synthetics, Grafana

Here is ChatGPT, verbatim, asked for the best log management and uptime monitoring platform, the exact category Better Stack sells into:

Choosing the best log management and uptime monitoring platform for a B2B SaaS team depends on several factors... ### Log Management Platforms 1. Splunk - Pros: Highly scalable, powerful search capabilities, extensive integrations, and strong data visualization tools.

And here is ChatGPT on observability and log management, the category Axiom sells into:

Here are some of the top platforms that are widely recognized for their capabilities in observability and log management: 1. Datadog: Datadog is a comprehensive monitoring and analytics platform for developers, IT operations teams, and businesses...

Splunk. Datadog. Pingdom. Cachet. LogRocket. Look at that list. These are the names that dominated the monitoring conversation around 2015 to 2018, when the training data was thick. Each modern challenger loses, in its own home category, to an incumbent from the era before it existed.

This is not AI being clueless about the category

Here is the part that makes it a real problem rather than a funny one. AI is not ignorant of observability. Ask it and it confidently names two newer tools:

                ChatGPT   Claude   Perplexity   Gemini
Sentry           100%      80%        60%         60%
Honeycomb         40%      40%        80%         40%

Sentry gets named in every single ChatGPT answer. Honeycomb shows up across all four models. So the models have room for modern tools in their mental map of monitoring. They have simply frozen that map around the incumbents plus the one or two challengers that broke through years ago. Everything that arrived after the map froze is Omitted.

The gap between Surfaces has a name in our world: Model Disagreement. When Axiom is named by Claude and by none of the other three, it is not "low visibility." It is invisible on the Surfaces most buyers use, and visible on the one they use least.
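Model Disagreement can be checked mechanically from the tables in this post. The sketch below assumes a working definition: a tool shows Model Disagreement when at least one Surface names it and at least one Surface never does. That definition and the function names are ours for illustration; the percentages are copied from the two tables above.

```python
# Recommendation Share (percent) per Surface, copied from the tables in this post.
SHARES = {
    "Better Stack": {"ChatGPT": 0, "Claude": 0, "Perplexity": 0, "Gemini": 0},
    "Axiom": {"ChatGPT": 0, "Claude": 40, "Perplexity": 0, "Gemini": 0},
    "Highlight": {"ChatGPT": 0, "Claude": 40, "Perplexity": 0, "Gemini": 0},
    "OpenStatus": {"ChatGPT": 0, "Claude": 0, "Perplexity": 80, "Gemini": 0},
    "Checkly": {"ChatGPT": 20, "Claude": 60, "Perplexity": 40, "Gemini": 20},
    "Sentry": {"ChatGPT": 100, "Claude": 80, "Perplexity": 60, "Gemini": 60},
    "Honeycomb": {"ChatGPT": 40, "Claude": 40, "Perplexity": 80, "Gemini": 40},
}


def split_surfaces(shares):
    """Return (Surfaces that name the tool, Surfaces that never do)."""
    named = sorted(s for s, pct in shares.items() if pct > 0)
    silent = sorted(s for s, pct in shares.items() if pct == 0)
    return named, silent


def has_model_disagreement(shares):
    """Assumed definition: named on some Surfaces, absent on others."""
    named, silent = split_surfaces(shares)
    return bool(named) and bool(silent)


for tool, shares in SHARES.items():
    if has_model_disagreement(shares):
        named, silent = split_surfaces(shares)
        print(f"{tool}: named on {named}, absent on {silent}")
```

Under that definition the check flags Axiom, Highlight and OpenStatus. Better Stack is not flagged because it is absent everywhere, which is a different problem, and Checkly, Sentry and Honeycomb are named on all four Surfaces.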

Why this should bother a founder, not just amuse them

The reflex is to wave it off. AI is behind, the models have a training cutoff, it will catch up. Maybe. But your buyer is asking the question today, and the answer they get today hands them Datadog.

Search had twenty years to learn that Better Stack exists. AI recommendation answers are being formed right now, off whatever evidence the models can find, and for newer companies that evidence is thin. So the incumbent gets named by reflex and the challenger gets skipped.

The four ways a company shows up wrong in these answers, in our vocabulary:

Omitted. The model lists competitors and skips you. This is Better Stack, Axiom, Highlight and OpenStatus on ChatGPT.
Misclassified. The model files you under the wrong category.
Generic. The model names you so vaguely no buyer could shortlist you.
Confused. The model conflates you with a similarly named competitor.

What actually moves it

We do not claim AI is biased or that anyone is paying for placement. We read what the models say and show you the evidence. What changes the answer over time is the same thing that changed search: published, specific, verifiable evidence that associates your company with the category and the buyer question. Comparison pages that name the alternatives honestly. Documentation that states plainly what you are and who you are for. Third-party mentions in the exact words an engineer would use when asking.

None of that is fast. But the first step is not writing more content. It is finding out what the models say about you right now, so you know whether you are Omitted, Misclassified, Generic or Confused, because the fix is different for each.

See your own category

We built Bersyn to show you exactly the tables above, for your company, with the verbatim answers behind them. Run a free scan on your own product at bersyn.com and see which Surfaces name you, which name a competitor in your place, and why.

If you ship on Better Stack, Axiom, Highlight, OpenStatus or Checkly, tell me in the comments which model gets your category right. This is the third category I have scanned where ChatGPT defaults to the old incumbent and skips everything newer, and the pattern is the most interesting thing I look at all week.