
Make 'em behave! Don't let your AI agents hallucinate

I built a multi-agent project that lets users ask questions about their AWS infrastructure (3 AWS accounts managed by AWS Organizations) and get answers in a human-readable way.

The system connects to the users' AWS infrastructure and provides the answer by reading various log types and making API calls to multiple AWS resources.

Project repo
Part 1: I built a multi-agent project on AWS, with Strands AI and AgentCore
Part 2: Give 'em something to read! Building a data pipeline for your agentic AI project
Part 3: Make 'em safe! Security for your agentic AI project
Part 4: Make 'em remember! Memory in the agentic AI project
Part 5: Make 'em visible! See what is happening inside your agentic workflow
Part 6: When shebangs party hard with your MAC path on OpenTelemetry
Part 7: Make 'em behave! Don't let your AI agents hallucinate

 

No matter what, they will try!

This article is about hallucinations, or to be more precise: how I tried to make hallucinations harder to happen, easier to detect, and less dangerous when they happen anyway.

Because let's face the truth:

  1. You cannot just tell an AI agent "Do not hallucinate" and expect it won't.

  2. An LLM's only purpose is to generate text. If there is nothing to generate, or not enough data to generate from, guess what it does.


The problem

At the beginning I thought the main challenge would be something like: can the agent answer questions about my AWS accounts?
It turned out my main challenge actually was: Can I trust the answer?

If a user asks...

./alexandra.sh --new "Give me last CloudTrail row from today"

...and if the agent invents one row, drops one important finding, accesses the wrong account, or queries the wrong date, the final answer still looks nice and professional but it's worth nothing.


Multi-agent makes it worse

With the multi-agent pattern known as agents as tools, this can get even worse.

SCENARIO 1:

  1. The supervisor agent receives the question "Give me last CloudTrail row from today".

  2. The supervisor agent correctly understands it should invoke the CloudTrail subagent, so it does.

  3. Despite its instructions, the CloudTrail subagent incorrectly creates an SQL query with yesterday's date. This is not the truth, this is pure hallucination.

  4. The SQL query is syntactically valid, so Athena retrieves the rows from the data lake (for the wrong date) and sends the data back to the CloudTrail subagent.

  5. The response is sent back to the supervisor agent, which doesn't care if it is right. It got its rows, so it summarizes.
    The hallucination of one agent became the hard truth of the other (read here).

  6. The response seems legit, so the user has no doubt.

    hallucination 1

SCENARIO 2:

  1. The supervisor agent receives the question "Give me last CloudTrail row from today".

  2. The supervisor agent correctly understands it should invoke the CloudTrail subagent, so it does.

  3. The CloudTrail subagent correctly creates an SQL query with today's date.

  4. The SQL query is syntactically valid, so Athena retrieves the rows from the data lake and sends the data back to the CloudTrail subagent.

  5. The CloudTrail subagent, despite its instructions not to summarize, actually summarizes the output and sends it to the supervisor agent.

  6. The summarized response is received by the supervisor agent, which doesn't care if it is right. It got its data, so it summarizes. It is actually summarizing a summary.
    When two agents are summarizing, the danger of hallucination doubles. Even if the sub-agent's summary is correct, it should not have summarized at all - that is the supervisor's job.
    And if the sub-agent fabricated just a single fact, the supervisor's summary becomes invalid. Same pattern as before about hallucination and ground truth.

  7. The response seems legit, so the user has no doubt.

    hallucination 2


Hallucination patterns

During testing I observed nine hallucination patterns and sorted them into categories (H1 - H9) for easier mitigation:

  • H1: Supervisor says "no results" even though a tool returned data.
  • H2: Supervisor agent drops rows from the tool result.
  • H3: Supervisor agent fabricates rows or fields that were not returned.
  • H4: Supervisor agent picks the wrong subagent.
  • H5: Supervisor agent passes the wrong account or time range.
  • H6: Subagent creates an incorrect or overly broad SQL query.
  • H7: Subagent returns a summary instead of raw evidence.
  • H8: Supervisor asks a follow-up question instead of answering with the data it already has.
  • H9: The supervisor agent's summary does not line up with the user's question.

Layers of mitigation

There are several layers I use to deal with these hallucination patterns, from prompts to hooks.
 

It all starts with the prompt

A bulletproof prompt is an absolute must.
Every agent in the project uses a structured prompt (RISEN - Role, Instructions, Steps, Expectation, Narrowing).

For example, the CloudTrail subagent's prompt does not say:

You are a helpful assistant, answer questions about AWS.

Instead, it says exactly what that particular agent is:

You are a CloudTrail log analyst.
You translate natural language questions about AWS API activity into Athena SQL.
Use lttm_logs.cloudtrail_logs.
Always include partition keys.
Return raw result rows.
Do not summarize or paraphrase the data.
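
For completeness, this system prompt is simply passed when the agent is constructed. A minimal sketch using the Strands SDK; the tool body is elided and run_athena_query stands in for my real Athena tool:

from strands import Agent, tool

@tool
def run_athena_query(sql: str) -> str:
    """Run the SQL against Athena and return the raw rows as JSON."""
    ...  # the real tool wraps boto3 Athena calls

cloudtrail_agent = Agent(
    system_prompt=(
        "You are a CloudTrail log analyst.\n"
        "You translate natural language questions about AWS API activity into Athena SQL.\n"
        "Use lttm_logs.cloudtrail_logs.\n"
        "Always include partition keys.\n"
        "Return raw result rows.\n"
        "Do not summarize or paraphrase the data."
    ),
    tools=[run_athena_query],
)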

A narrow prompt reduces the chance that the agent starts doing creative writing instead of serious log analysis.

However, prompt instructions are not enforced: the model may still ignore them, misunderstand them, or do something almost right but still wrong.

The prompt is just the first layer, but not the only one.


Layer 2: One summarizer only

This was already mentioned before - I want my subagents not to summarize at all.
But this is a problem - generating text is what an LLM was created for, so no matter how many times I tell it in the prompt not to summarize, it will.

So I let it summarize and gratefully ignore it.

hallucination 2

Whatever the subagent creates, the raw tool result (the Athena response) is the only part of the data I want the supervisor to receive, so that is exactly what gets extracted.

  • sub-agent returns result (sub-agent summary and raw rows)

  • raw rows are extracted as raw_json

# Inside the supervisor's query_cloudtrail tool (json is imported at module level).
# Ask the subagent; its reply contains a summary we are going to ignore.
result = cloudtrail_agent(question)

# Pull the raw Athena rows out of the subagent's tool result.
raw_json = _extract_raw_result(cloudtrail_agent)

if not raw_json:
    # Nothing to extract: fall back to the subagent's textual answer.
    return str(result)

rows = json.loads(raw_json)
if isinstance(rows, list):
    return format_athena_rows(rows)

return str(result)
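
The _extract_raw_result helper is nothing fancy. Roughly, it walks the subagent's message history backwards and pulls out the last raw tool result. A simplified sketch, under the assumption that Strands keeps Converse-style messages on agent.messages; the real helper handles more edge cases:

def _extract_raw_result(agent) -> str | None:
    """Return the text of the last tool result in the agent's message history."""
    for message in reversed(agent.messages):
        for block in message.get("content", []):
            tool_result = block.get("toolResult")
            if not tool_result:
                continue
            for part in tool_result.get("content", []):
                if "text" in part:
                    return part["text"]
    return None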

The raw rows look something like this:

[
{"eventtime": "2026-04-25T10:30:00Z", "eventname": "CreateBucket", "eventsource": "s3.amazonaws.com", "useridentity": "arn:aws:iam::123:user/admin"},
{"eventtime": "2026-04-25T09:15:00Z", "eventname": "TerminateInstances", "eventsource": "ec2.amazonaws.com", "useridentity": "arn:aws:iam::123:role/deploy"}
]

The rows are then deterministically formatted by another function, so the supervisor receives data in the shape it expects:

Results: 2 rows returned.

Row 1:
  eventtime: 2026-04-25T10:30:00Z
  eventname: CreateBucket
  eventsource: s3.amazonaws.com
  useridentity: arn:aws:iam::<account-id>:user/admin

Row 2:
  eventtime: 2026-04-25T09:15:00Z
  eventname: TerminateInstances
  eventsource: ec2.amazonaws.com
  useridentity: arn:aws:iam::<account-id>:role/deploy

This is the data the supervisor agent works with and summarizes. It receives deterministically formatted data, and the subagent's summary is no longer a source of truth.
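
The formatter itself is deliberately boring. A simplified sketch of what format_athena_rows does:

def format_athena_rows(rows: list[dict]) -> str:
    """Render raw Athena rows in the fixed layout the supervisor expects."""
    lines = [f"Results: {len(rows)} rows returned.", ""]
    for index, row in enumerate(rows, start=1):
        lines.append(f"Row {index}:")
        for key, value in row.items():
            lines.append(f"  {key}: {value}")
        lines.append("")
    return "\n".join(lines).rstrip()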


Layer 3: The hooks

Deterministic validations are an essential part of my anti-hallucination layers.
Here I am using three hooks:

  • SQLValidatorHook - is the SQL query correct?
  • SQLRewriteHook - might the SQL result be too big?
  • OutputIntegrityHook - did the supervisor agent actually answer from the data it received?

Those hooks run on different Strands events.
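
Attaching them is just a matter of passing the hook providers when each agent is constructed. A sketch continuing the one from the prompt section; I'm assuming hooks are registered via the hooks argument of the Strands Agent, and CLOUDTRAIL_PROMPT / SUPERVISOR_PROMPT are placeholders for my real prompts:

# Subagents that generate SQL get the SQL hooks...
cloudtrail_agent = Agent(
    system_prompt=CLOUDTRAIL_PROMPT,
    tools=[run_athena_query],
    hooks=[SQLValidatorHook(), SQLRewriteHook()],
)

# ...the supervisor gets the output integrity check.
supervisor_agent = Agent(
    system_prompt=SUPERVISOR_PROMPT,
    tools=[query_cloudtrail, query_cloudwatch, query_config],
    hooks=[OutputIntegrityHook()],
)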
 

SQLValidatorHook

Because the subagent generates SQL, there is always a chance the SQL goes bad.
This hook runs on every subagent that creates SQL queries and is invoked before the query is sent to Athena...

class SQLValidatorHook(HookProvider):
    def register_hooks(self, registry: HookRegistry, **kwargs: Any) -> None:
        registry.add_callback(BeforeToolCallEvent, self.on_before_tool_call)

    def on_before_tool_call(self, event: BeforeToolCallEvent) -> None:
        if event.tool_use.get("name") != "run_athena_query":
            return

        sql = event.tool_use.get("input", {}).get("sql", "")
        if not sql:
            return

        errors = validate_sql(sql)
        if errors:
            msg = f"SQL validation failed: {'; '.join(errors)}. Fix and retry."
            event.cancel_tool = msg

...and calls the validate_sql function, which checks for patterns like:

  • awsdatacatalog. prefix in SQL
  • Blocked keywords: DROP, DELETE, UPDATE, INSERT, ALTER, TRUNCATE
  • wrong table
  • wrong partition keys (must match the glue table)
  • SELECT * is used
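
A trimmed-down sketch of what validate_sql checks (the real function has more rules and better parsing):

import re

REQUIRED_TABLE = "lttm_logs.cloudtrail_logs"
REQUIRED_PARTITIONS = ("account_id", "region", "year", "month", "day")
BLOCKED_KEYWORDS = ("DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE")

def validate_sql(sql: str) -> list[str]:
    """Return a list of problems; an empty list means the query may run."""
    errors = []
    lowered = sql.lower()

    if "awsdatacatalog." in lowered:
        errors.append("Do not prefix tables with 'awsdatacatalog.'")
    for keyword in BLOCKED_KEYWORDS:
        if re.search(rf"\b{keyword}\b", sql, re.IGNORECASE):
            errors.append(f"Blocked keyword: {keyword}")
    if REQUIRED_TABLE not in lowered:
        errors.append(f"Use fully qualified table name: '{REQUIRED_TABLE}'")
    missing = [key for key in REQUIRED_PARTITIONS if key not in lowered]
    if missing:
        errors.append(f"Missing required partition keys in WHERE: {', '.join(missing)}")
    if re.search(r"select\s+\*", lowered):
        errors.append("Use explicit column names instead of SELECT *")

    return errors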

This hook is a mix of anti-hallucination and security measures and is also described here.

Example problem:
Sub-agent creates SQL like this:

SELECT *
FROM cloudtrail_logs
WHERE eventname = 'CreateBucket'

That looks innocent, but it's actually wrong. It should use the real Glue table name, explicit columns, and the required partition keys.

The hook rejects it and sends feedback back into the agent loop, so the model can retry and fix it:

SQL validation failed: Use fully qualified table name: 'lttm_logs.cloudtrail_logs'; Missing required partition keys in WHERE: account_id, region, year, month, day; Use explicit column names instead of SELECT *.
Fix and retry.

 

SQLRewriteHook

This hook also runs on every subagent that creates SQL queries and caps the number of rows if the user asked for too many.

Why is this a problem?
If a user asks:

./alexandra.sh --new "show me last 1000 CloudTrail events"

The agent actually gets too much data back and the model may:

  • truncate the answer
  • summarize too aggressively
  • drop rows
  • retry again and again
  • confidently produce a partial answer
  • or the context window simply hits its token limit

None of that is good, which is why SQLRewriteHook adds LIMIT 20 to the SQL query.

current_limit = self._get_current_limit(sql)
target_limit = self._default_limit

if current_limit is None:
    sql = self._set_limit(sql, target_limit)
    emit_status(f"Added LIMIT {target_limit} to prevent oversized results")

elif current_limit > target_limit:
    sql = self._set_limit(sql, target_limit)
    emit_status(
        f"Requested {current_limit} lines, but due to context limitations stripping to {target_limit}"
    )
    self._limit_was_capped = True

if sql != original_sql:
    event.tool_use["input"]["sql"] = sql
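
The two helpers behind it are plain string surgery. Standalone sketches of what _get_current_limit and _set_limit do (the real methods live on the hook and are more defensive):

import re

_LIMIT_RE = re.compile(r"\blimit\s+(\d+)\s*$", re.IGNORECASE)

def get_current_limit(sql: str) -> int | None:
    """Return the trailing LIMIT value, or None when the query has no LIMIT."""
    match = _LIMIT_RE.search(sql.strip().rstrip(";").strip())
    return int(match.group(1)) if match else None

def set_limit(sql: str, limit: int) -> str:
    """Cap an existing trailing LIMIT or append a new one."""
    stripped = sql.strip().rstrip(";").strip()
    if _LIMIT_RE.search(stripped):
        return _LIMIT_RE.sub(f"LIMIT {limit}", stripped)
    return f"{stripped} LIMIT {limit}"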

The user sees this behavior in the streaming output:

⏳ CloudTrail agent processing...
⏳ Added LIMIT 20 to prevent oversized results
⏳ Athena query executing (QueryExecutionId: 43a72cbd-39a7-4c5f-8dba-8be31aa2e45c)

But models are smart! During testing I realized that if I cap it like that, the model retries the query for 100 rows (or whatever the initial request was) instead of accepting the 20.
That actually makes sense: the model sees it was asked for 100 but created an SQL query for 20, so it tries to correct itself.

Therefore the hook also blocks the retry from happening and actually explains who is the boss here.

if self._limit_was_capped and self._last_query_returned_rows:
    event.cancel_tool = (
        "Your previous query already returned data with the maximum allowed rows. "
        "Do NOT retry for more rows. Return the results you already have to the user."
    )


king

The same hook is invoked one more time, when the results from Athena are returned, to check that Athena did not return an empty response.
 

OutputIntegrityHook

From time to time even the supervisor agent joined the dope party and started to hallucinate in its own way: actually receiving the data but outputting No results found instead and going for a retry. Well, at least it tried, until I played the better cards.

OutputIntegrityHook runs on the supervisor agent and checks which sub-agent (which query_* tool) returned data,

QUERY_TOOLS = {
    "query_cloudtrail", "query_cloudwatch", "query_config", "..."
}

remembers that, and after the response is generated it checks for "contradiction" and "follow-up question" patterns.

CONTRADICTION_PATTERNS = [
    "no results found", "no results were found", "didn't return any", "..."
]

FOLLOWUP_PATTERNS = [
    "would you like me to", "shall i", "should i check", "..."
]

This catches two stupid but dangerous behaviors:

  • Tool returned data, but model says no data.
  • Tool returned data, but model asks whether it should check something.
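
Boiled down, the check is just this kind of comparison. A simplified sketch of the idea, reusing the pattern lists above; the real hook wires this into the Strands event and tracks which query_* tools actually returned rows:

def check_final_response(response_text: str, tools_returned_data: bool) -> str | None:
    """Return corrective feedback for the supervisor, or None if the answer is fine."""
    if not tools_returned_data:
        return None

    lowered = response_text.lower()
    if any(pattern in lowered for pattern in CONTRADICTION_PATTERNS):
        return (
            "A tool already returned data. Do not claim there are no results - "
            "answer using the rows you received."
        )
    if any(pattern in lowered for pattern in FOLLOWUP_PATTERNS):
        return (
            "You already have the data needed to answer. "
            "Answer directly instead of asking a follow-up question."
        )
    return None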

Nice try buddy. Now do your job!

agentcore deploy


LLM-as-judge

Some problems are easy to catch with deterministic or regex-ish checks like the ones above, but others need a more sophisticated touch.
Especially when the problem needs some kind of judgement to be solved.

Example:

./alexandra.sh --new "Give me last CloudTrail row from today"

If the supervisor agent invokes the GuardDuty agent, this is wrong.

Therefore I added the SupervisorSteeringHandler plugin, an LLM-as-judge layer.

This is both the first and the last check running on the supervisor agent, because it runs on two different Strands events:

On BeforeToolCallEvent - the routing check

  • The plugin checks whether the supervisor agent called the right sub-agent, using the right AWS account and the right time range.

On AfterModelResponse - the response validation

  • It checks if the final response faithfully represents the tool result.

Neither of these is a deterministic check; it actually calls another LLM, in my case Claude Haiku 4.5.
 

The routing check

Before the supervisor agent calls a subagent as its tool, the judge receives:

  • The user's original question
  • Which subagent is about to be called
  • The prompt that is about to be passed to the tool

The judge validates it and returns either VALID or GUIDE with some guidance on what to do, such as:

GUIDE: use the cloudtrail instead, because the user asked about cloudtrail rows
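
Under the hood this is just one more model call. A rough sketch of how the routing judge can be invoked via the Bedrock Converse API; the prompt wording is illustrative and the model ID is left as a placeholder, this is not my exact plugin code:

import boto3

bedrock = boto3.client("bedrock-runtime")

def judge_routing(question: str, tool_name: str, tool_prompt: str) -> str:
    """Ask the judge model whether the supervisor picked the right subagent."""
    judge_prompt = (
        "You validate the routing decisions of a supervisor agent.\n"
        f"User question: {question}\n"
        f"Subagent about to be called: {tool_name}\n"
        f"Prompt passed to the subagent: {tool_prompt}\n"
        "Reply with VALID if the subagent, AWS account and time range are correct, "
        "otherwise reply with GUIDE: <what to do instead>."
    )
    response = bedrock.converse(
        modelId="...",  # placeholder for the Claude Haiku model id
        messages=[{"role": "user", "content": [{"text": judge_prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]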

The plugin then returns corrective feedback to the supervisor, which knows what to do with it - either proceed with the call to the subagent or correct course:

if verdict.upper().startswith("GUIDE"):
    reason = verdict.split(":", 1)[1].strip()
    return Guide(reason=reason)
else:
    return Proceed(reason=f"Routing validated for {tool_name}")

 

The response validation

The second time the judge runs is after the supervisor generates the final response. It compares the subagent result with the supervisor agent's response.

It checks whether the supervisor is:

  • Skipping rows or summarizing too much - the subagent returned 17 rows, the supervisor showed 9.
  • Fabricating results - the supervisor mentions parameters that are not present in any subagent result.
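
The prompt for this second check follows the same shape as the routing check (again, illustrative wording rather than my exact prompt):

def build_faithfulness_prompt(tool_output: str, final_response: str) -> str:
    """Prompt asking the judge to compare the supervisor's answer with the tool output."""
    return (
        "You compare an agent's final answer against the raw tool output it received.\n"
        f"Tool output:\n{tool_output}\n\n"
        f"Final answer:\n{final_response}\n\n"
        "Reply VALID if the answer faithfully represents the tool output "
        "(no dropped rows, no invented fields or parameters). "
        "Otherwise reply GUIDE: <what is wrong and how to fix it>."
    )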

Yes, that's AI checking AI

agentcore deploy


Conclusion

Here are some things I learned while building and testing this project:

  1. Do not rely only on the prompt - just because the LLM has one doesn't mean it will follow it 100% of the time.

  2. Use deterministic hooks where possible - even if the code looks big and ugly with huge lists of values, code is code, and once it's written, it's followed.

  3. If the check needs judgement, use it - LLM-as-judge is your friend.

    agentcore deploy


What's next

This article covered the anti-hallucination patterns of this project.

The rest of the articles in this series are listed at the top of this post.


Additional reading

Multi-Agent AI Production Requirements Beyond the Demo

Writing System Prompts That Actually Work: The RISEN Framework for AI Agents

Agents as Tools with Strands Agents SDK

The Agent Buddy System: When Prompt Engineering Isn't Enough

5 Techniques to Stop AI Agent Hallucinations in Production

AI Agent Guardrails: Rules That LLMs Cannot Bypass

Runtime Guardrails for AI Agents — Steer, Don't Block

How Steering Hooks Achieved 100% Agent Accuracy Where Prompts and Workflows Failed
