It's 2 AM. PagerDuty fires. Something's wrong with the payment service.
You open Log Explorer and stare at the query bar. Is it service:payment or @service:payment? Does negation use NOT or -? What's the facet for authentication failures again?
The logs have the answer. The syntax is the bottleneck.
If you have been here, this post is for you. I will walk through how to build a tool that translates plain English into valid Log Search queries, and more importantly, I will break down the Datadog-specific gotchas that make this problem interesting.
The Syntax Gotchas Worth Understanding
Before building anything, it helps to know exactly where Log Search syntax trips people up. These are the patterns I see most often.
The @ Prefix Rule
This causes more confusion than anything else. The rule is simple, but easy to forget:
Reserved attributes don't use @. These are the core fields Datadog provides:
service:payment-service
status:error
host:web-server-01
Custom facets and log attributes require @. Anything you've indexed yourself:
@http.status_code:500
@duration:>2000000000
@error.message:*timeout*
Get it backwards and the query silently returns nothing. No error, just empty results. This is especially frustrating when you are debugging under pressure.
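To give a sense of how a tool can guard against this, here's a minimal sketch of a post-processing check. The reserved-attribute list is abbreviated and the `fix_prefixes` helper is hypothetical, my own illustration rather than anything Datadog provides:

```python
# Hypothetical guard: strip a wrongly-added @ prefix from reserved attributes.
# (The reverse case, a facet missing its @, would need a facet list from your org.)
RESERVED_ATTRIBUTES = {"service", "status", "host", "source"}

def fix_prefixes(query: str) -> str:
    """Remove @ prefixes that were mistakenly added to reserved attributes."""
    fixed = []
    for term in query.split():
        if ":" in term:
            key, value = term.split(":", 1)
            if key.startswith("@") and key[1:] in RESERVED_ATTRIBUTES:
                key = key[1:]  # @service:payment -> service:payment
            term = f"{key}:{value}"
        fixed.append(term)
    return " ".join(fixed)

print(fix_prefixes("@service:payment status:error"))  # service:payment status:error
```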
Duration Is in Nanoseconds
This one bites everyone at least once. Datadog stores duration in nanoseconds, not seconds or milliseconds.
Filtering for requests over 2 seconds:
@duration:>2000000000
Miss a zero and you're filtering for 200ms. Add an extra zero and you're looking for 20-second requests. During an incident, this kind of mistake costs time.
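If you generate these filters in code, doing the unit conversion explicitly removes the zero-counting. A tiny helper of my own, nothing Datadog-specific:

```python
NANOS_PER_SECOND = 1_000_000_000

def slower_than(seconds: float) -> str:
    """Build an @duration filter for requests slower than the given number of seconds."""
    return f"@duration:>{int(seconds * NANOS_PER_SECOND)}"

print(slower_than(2))    # @duration:>2000000000  (2 s)
print(slower_than(0.2))  # @duration:>200000000   (200 ms)
```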
Security Facets Are Unintuitive
If you work with Cloud SIEM or audit logs, you'll encounter facets like:
@evt.name:authentication
@evt.outcome:failure
@network.client.geoip.country_name:*
These aren't facets most engineers touch daily, so they never stick in memory. But they're exactly what you need when investigating suspicious activity, which is when speed matters most.
Teaching an LLM These Rules
The core challenge: LLMs have seen Datadog queries in their training data, but not enough of them to be reliable. They will generate plausible-looking syntax that's subtly wrong.
The fix is a system prompt that's explicit about the rules. Not "here's how Datadog works" but "here are the exact patterns, and here's what NOT to do":
Reserved Attributes (no @ prefix):
service:payment-service
status:error
host:web-server-01
Facets and Custom Attributes (@ prefix required):
@http.status_code:500
@duration:>1000000000 # nanoseconds
@error.message:*timeout*
Common mistakes to avoid:
- @service:payment (wrong — reserved attribute)
- @duration:>2 (wrong — not in nanoseconds)
The nanoseconds callout is critical. Without it, the model generates @duration:>2 for "requests over 2 seconds", which is perfectly logical if you don't know Datadog's storage format, and completely wrong.
Security patterns need explicit examples too. Facets like @evt.outcome:failure aren't guessable:
Authentication failures:
@evt.name:authentication @evt.outcome:failure
CloudTrail console logins:
source:cloudtrail @evt.name:ConsoleLogin
External IPs only:
NOT @network.client.ip:10.* NOT @network.client.ip:192.168.*
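Wiring this up is a single chat completion with that rule-heavy system prompt. Here's a minimal sketch assuming the OpenAI Python SDK; the model name and the condensed prompt are illustrative, not exactly what the tool uses:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = """You translate plain English into Datadog Log Search queries.
Rules:
- Reserved attributes (service, status, host, source) take NO @ prefix.
- Facets and custom attributes (@http.status_code, @duration, @evt.outcome) require @.
- @duration is in nanoseconds: "over 2 seconds" -> @duration:>2000000000.
Return only the query, nothing else."""

def generate_query(request: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any capable chat model works
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": request},
        ],
    )
    return response.choices[0].message.content.strip()

print(generate_query("failed logins from outside the internal network"))
```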
This gets you to roughly 80% accuracy. The remaining 20% are edge cases: obscure facet names, integration-specific attributes, syntax variations. A static prompt can't cover all of Datadog's documentation.
Closing the Gap with RAG
The solution is retrieval-augmented generation: index Datadog's docs, retrieve relevant sections at query time, and inject them into the prompt.
Ask about authentication failures → retrieve Cloud SIEM documentation. Ask about CloudTrail logs → retrieve AWS integration pages.
But there's a subtlety. Pure semantic search (dense embeddings) finds conceptually similar content. "Failed login" matches documentation about "authentication failure." That's useful.
However, it misses exact syntax. When someone asks about @evt.outcome, semantic search finds pages about "authentication results" but might miss the page containing the literal string @evt.outcome.
Hybrid Search Fixes This
Combining two retrieval methods solves the problem:
Dense embeddings (I used OpenAI's text-embedding-3-large) capture semantic similarity. They understand that "failed login" and "authentication failure" mean the same thing.
Sparse embeddings (SPLADE) capture keyword overlap. They ensure that @evt.outcome matches documents containing that exact string.
Reciprocal Rank Fusion merges both result sets. Documents appearing in both searches rise to the top:
from qdrant_client.models import Prefetch, FusionQuery, Fusion

# qdrant_client is a QdrantClient; dense_vector and sparse_vector come from the two embedding models
results = qdrant_client.query_points(
    collection_name=collection,
    prefetch=[
        # Over-fetch candidates from each search, then let fusion re-rank them
        Prefetch(query=dense_vector, using="dense", limit=limit * 2),
        Prefetch(query=sparse_vector, using="sparse", limit=limit * 2),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=limit,
)
The improvement is significant. Conceptual matches and exact syntax matches both surface, covering the edge cases that broke the static prompt.
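For context, the other half is indexing the docs with both vector types. This is a sketch under my own assumptions: OpenAI for the dense embeddings (as above), fastembed's SPLADE model for the sparse ones, and an illustrative collection name and layout.

```python
from fastembed import SparseTextEmbedding
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, PointStruct, SparseVector, SparseVectorParams, VectorParams,
)

openai_client = OpenAI()
splade = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
qdrant = QdrantClient(url="http://localhost:6333")

# One-time setup: each point carries a named dense vector and a named sparse vector.
qdrant.create_collection(
    collection_name="datadog-docs",
    vectors_config={"dense": VectorParams(size=3072, distance=Distance.COSINE)},
    sparse_vectors_config={"sparse": SparseVectorParams()},
)

def index_chunk(point_id: int, text: str) -> None:
    dense = openai_client.embeddings.create(
        model="text-embedding-3-large", input=text
    ).data[0].embedding
    sparse = next(splade.embed([text]))  # keeps literal tokens like @evt.outcome searchable
    qdrant.upsert(
        collection_name="datadog-docs",
        points=[PointStruct(
            id=point_id,
            vector={
                "dense": dense,
                "sparse": SparseVector(
                    indices=sparse.indices.tolist(),
                    values=sparse.values.tolist(),
                ),
            },
            payload={"text": text},
        )],
    )
```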
Explaining Queries
Generation gets the attention, but explanation might be more useful day-to-day.
When you inherit a dashboard with a query like this:
@evt.name:authentication @evt.outcome:failure NOT @network.client.ip:10.* NOT @network.client.ip:192.168.* NOT @network.client.ip:172.16.*
Being able to ask "what does this do?" and get "Failed authentication attempts from IPs outside your internal network ranges" accelerates understanding significantly.
You can also use explanation to validate generated queries: generate one, then explain it back to verify it does what you intended.
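A sketch of that round trip, reusing the hypothetical client and generate_query from the earlier snippet:

```python
def explain_query(query: str) -> str:
    """Ask the model to describe a Log Search query in plain English."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Explain in one sentence what this Datadog Log Search query matches."},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content.strip()

query = generate_query("failed logins from external IPs")
print(query)                 # the generated Log Search query
print(explain_query(query))  # check the explanation against what you asked for
```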
I would suggest starting with explanation if you try this yourself. Once you see it working accurately, you will trust generation more.
What This Looks Like in Practice
A few examples of natural language → Log Search:
| Input | Output |
|---|---|
| "Errors from the payment service" | service:payment-service status:error |
| "Slow requests over 2 seconds" | @duration:>2000000000 |
| "Failed logins from external IPs" | @evt.name:authentication @evt.outcome:failure NOT @network.client.ip:10.* NOT @network.client.ip:192.168.* |
The security queries show the highest value. Engineers write observability queries often enough to memorize service:api status:error. But SIEM queries are rare enough that the syntax never sticks, and those are exactly the queries where response time matters.
Wrapping Up
The interesting part of this project wasn't the LLM integration; that part is straightforward. It was learning the Datadog-specific details deeply enough to teach them to a model.
The @ prefix rule, the nanoseconds gotcha, the security facet patterns: these are the things that separate a query that works from one that silently fails. Encoding that knowledge explicitly, and augmenting it with retrieved documentation, is what makes the tool reliable.
If you want to explore the implementation, the code is on GitHub.
Views expressed are my own and do not represent my employer.
