Shridhar Shah

Posted on Jun 27

I Built an AI Agent That Gets Curious On Its Own

#ai #llm #machinelearning #python

Active inference: curiosity emerges for free from minimizing surprise — 48% vs 100% on a foraging task.

TL;DR: Most AI agents chase rewards — they pick whatever action scores the most points. I wanted to see what happens if you build one that just tries not to be surprised. Something neat happened — the agent became curious without being told to. It goes looking for information before acting, and that takes it from 48% to 100% on a simple task.

Two different ways to make decisions

Most AI agents are "reward chasers." Give them points for doing well, and they'll pick whatever action they expect to score highest. Simple and effective.

There's another idea from brain science: instead of chasing points, try to avoid being surprised — act so the world matches what you expected. It sounds almost too simple, but it leads to a surprising bonus: when you're trying not to be surprised, going and finding out what you don't know becomes valuable all by itself. In other words, curiosity isn't something you have to bolt on. It comes for free.

This is called active inference, and in 2026 it jumped from neuroscience into AI as a serious approach (see the 2026 arXiv paper listed under Background at the end). Here's the smallest demo that makes it click.

The 10-second version

The task: a reward is hidden behind either the LEFT door or the RIGHT door (50/50). There's also a hint you can check that tells you which door — if you bother to look.

	❌ Reward-chaser	✅ Curious agent
What it cares about	getting the reward, right now	getting the reward + not being unsure
What it does	guesses a door	checks the hint first, then opens the right door
Success (400 tries)	48%	100%
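Here is a minimal simulation of the setup in the table. It assumes the hint is free and always correct. Both policies are hard-coded: the reward-chaser picks a door at random, and the curious agent reads the hint and follows it. So this only shows why the numbers land near 50% and at exactly 100%. It is not the repo's `demo.py`, and the function names are hypothetical.

```python
import random

DOORS = ("LEFT", "RIGHT")

def run_trial(agent: str, rng: random.Random) -> bool:
    """One episode: the reward is behind LEFT or RIGHT with 50/50 odds."""
    reward_door = rng.choice(DOORS)
    hint = reward_door  # assumption: the hint is free and never lies
    if agent == "reward_chaser":
        chosen = rng.choice(DOORS)  # sees no value in the hint, so it guesses
    elif agent == "curious":
        chosen = hint  # checks the hint first, then opens that door
    else:
        raise ValueError(f"unknown agent: {agent}")
    return chosen == reward_door

def success_rate(agent: str, trials: int = 400, seed: int = 0) -> float:
    """Fraction of episodes in which the agent opened the rewarded door."""
    rng = random.Random(seed)
    wins = sum(run_trial(agent, rng) for _ in range(trials))
    return wins / trials

rates = {agent: success_rate(agent) for agent in ("reward_chaser", "curious")}
print(rates)
```

The reward-chaser's exact figure depends on the random seed. With 400 trials it hovers around 50%, which is consistent with the 48% in the table.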

Nobody told the second agent "go check the hint." It did it on its own, because being unsure bothered it.

How it works

Before acting, the agent scores each option on two things:

Does this get me closer to the reward?
Does this make me less unsure about what's going on?

value_of_checking_the_hint = how_unsure_am_i    # high when it's a total coin-flip
value_of_just_guessing     = chance_of_being_right  # only ~50% on a blind guess

if value_of_checking_the_hint > value_of_just_guessing:
    check_the_hint()     # this is where curiosity shows up
open_door(best_door)     # now actually go get the reward

When it's a total coin-flip, checking the hint is worth a lot (it removes all the doubt), way more than a 50/50 guess. So it looks first. Once it knows, there's nothing left to be unsure about, so it just grabs the reward. The reward-chaser never sees any value in the hint, so it flips a coin forever.
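You can make the pseudocode concrete by spelling out one assumption: "how unsure am I" is measured as Shannon entropy in bits, which is 1.0 for a coin-flip and 0.0 when certain. Because the hint is assumed perfectly reliable, reading it removes all of that uncertainty. The names below are hypothetical. Comparing bits against a probability is a literal reading of the rule above, not a calibrated trade-off.

```python
import math

def uncertainty_bits(p_left: float) -> float:
    """Shannon entropy of the belief 'reward is behind LEFT' (1.0 at 50/50, 0.0 when sure)."""
    return -sum(q * math.log2(q) for q in (p_left, 1.0 - p_left) if q > 0)

def decide(p_left: float) -> str:
    # A perfectly reliable hint removes all doubt, so it is worth the current uncertainty.
    value_of_checking_the_hint = uncertainty_bits(p_left)
    # A blind guess picks the more likely door and is right with this probability.
    value_of_just_guessing = max(p_left, 1.0 - p_left)
    if value_of_checking_the_hint > value_of_just_guessing:
        return "check_hint"
    return "open_LEFT" if p_left >= 0.5 else "open_RIGHT"

print(decide(0.5))  # check_hint
print(decide(1.0))  # open_LEFT
```

With these definitions, the agent checks the hint whenever its belief is close to 50/50 and skips it once it is confident. At P(LEFT) = 0.8, for example, the entropy is about 0.72 bits, which is below the 0.8 chance of a correct guess.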

Why this matters

Two reasons engineers should care:

Curiosity for free. A long-standing headache in AI is agents getting stuck doing the same thing, never trying anything new. People hand-tune "exploration bonuses" to force them to explore. This approach gives you curiosity automatically — the agent looks for info exactly when it's unsure, and stops once it isn't.
It handles surprises. An agent built to avoid surprises is built to deal with situations it wasn't trained for. When reality stops matching its expectations, closing that gap becomes its goal — so it keeps adapting instead of breaking.
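To make "surprise" concrete, a common definition is surprisal: -log2 P(observation), measured in bits. The sketch below covers only the belief-updating half of the idea, with no action selection. It assumes a world where the reward sits behind RIGHT 80% of the time. An agent that updates simple door counts ends up less surprised on average than one that keeps a fixed 50/50 belief. All names and numbers are illustrative.

```python
import math
import random

def surprisal_bits(p_observed: float) -> float:
    """How surprising an outcome was, given the probability the agent assigned to it."""
    return -math.log2(p_observed)

def average_surprise(adaptive: bool, episodes: int = 200, seed: int = 1) -> float:
    rng = random.Random(seed)
    counts = {"LEFT": 1, "RIGHT": 1}  # Laplace prior: start at 50/50
    total = 0.0
    for _ in range(episodes):
        # Assumption: the world is biased, with the reward behind RIGHT 80% of the time.
        door = "RIGHT" if rng.random() < 0.8 else "LEFT"
        p = counts[door] / (counts["LEFT"] + counts["RIGHT"])
        total += surprisal_bits(p)
        if adaptive:
            counts[door] += 1  # update the belief so this outcome is less surprising next time
    return total / episodes

print("fixed belief:   ", average_surprise(adaptive=False))
print("updating belief:", average_surprise(adaptive=True))
```

The fixed agent pays exactly 1 bit of surprise every episode. The updating agent's average drops toward the world's own entropy, which is about 0.72 bits for an 80/20 split.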

A reward-chaser asks "what gets me the most points?" A surprise-avoider asks "what don't I understand yet?" — and that second question is what makes it adapt.

Try it

git clone https://github.com/Shridhar-2205/living-software
cd living-software/04-active-inference
python demo.py

Honest note: the full version of this idea has a fair bit of math behind it. I've boiled it down to the one decision that makes it obvious — being unsure has a cost — so you can watch curiosity appear in just a little code.

The rest of the series — Toward Living Software

I built an AI agent that rewrites its own code
Do AI agents need to sleep?
Can an AI agent pass the Sally-Anne test?
An AI agent that gets curious on its own (you're reading it)
How do you trust an AI agent with your money?
Agents that write their own SKILL.md files

Shridhar Shah — Senior Software Engineer on the AI team at Cisco. Part 4 of Toward Living Software.


Background: Karl Friston's "Free Energy Principle" (the brain-science origin); "Active Inference as the Test-Time Scaling Law for Physical AI Agents" (arXiv:2606.22813).

Top comments (6)

mote • Jul 1

The active inference angle is underrated in the agent community. Most people building agent systems are still in the reward-chaser paradigm: define a goal, score actions against it, pick the highest. What you're showing here is that uncertainty itself can be a signal, and acting to reduce it produces exploration behavior without hand-tuned bonuses.

The thing I keep coming back to: this approach needs persistent memory to work at all. The agent has to remember what it was uncertain about across sessions, across reboots, across days. If the uncertainty estimate resets every time the agent starts, you lose the accumulated value of all that exploration. A reward-chaser can get away with a stateless architecture: just maximize the current reward function. A surprise-avoider fundamentally needs to track "what have I already checked?" and "what am I still unsure about?"

Have you thought about what the memory model looks like for this beyond a single session? The foraging task is a great demo, but the interesting question is what happens when the agent accumulates uncertainty across weeks of operation. Does the framework handle that, or does it need an external store?

Shridhar Shah • Jul 3

Great point — the demo only remembers within one session, so a real one would need to store that uncertainty across restarts. Thanks!
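The reply suggests storing the agent's uncertainty across restarts. One minimal way to do that is to write the belief state to disk: load it on startup, save it after each update. This sketch assumes the belief is just the door counts from a simple count-based model, and the JSON file location is hypothetical.

```python
import json
import tempfile
from pathlib import Path

PRIOR = {"LEFT": 1, "RIGHT": 1}  # 50/50 starting belief (Laplace counts)

def load_belief(path: Path) -> dict:
    """Restore door counts saved by a previous session, or start from the prior."""
    if path.exists():
        return json.loads(path.read_text())
    return dict(PRIOR)

def save_belief(path: Path, counts: dict) -> None:
    """Write the current counts so the next session resumes from them."""
    path.write_text(json.dumps(counts))

def p_left(counts: dict) -> float:
    return counts["LEFT"] / (counts["LEFT"] + counts["RIGHT"])

with tempfile.TemporaryDirectory() as tmp:
    belief_file = Path(tmp) / "belief.json"  # hypothetical location
    counts = load_belief(belief_file)        # first run: falls back to the prior
    counts["RIGHT"] += 1                     # saw the reward behind RIGHT once
    save_belief(belief_file, counts)
    restored = load_belief(belief_file)      # simulates a restart
    print(p_left(restored))
```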

Nazar Boyko • Jun 27

The reframing from "what scores the most points" to "what am I still unsure about" is a genuinely nice way to make active inference click without the free energy math. The thing I kept poking at: in the demo the hint is free and never lies, which is the friendliest possible case for curiosity. What happens when checking the hint costs a move, or it's only right 70% of the time? Does the expected free energy still trade off info gain against that cost on its own, or is that where you start hand-tuning weights and the "curiosity for free" part gets a little less free?

Shridhar Shah • Jun 27 • Edited

Added a variant for this. One rule — check only if the hint's odds-boost beats its cost — so a free hint always gets checked, but make checking pricey enough and it just guesses instead. You're right it's not totally free though: you still set how reliable the hint is and how much cost matters.
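The rule in this reply can be sketched as a value-of-information check. Compare the chance of being right after reading the hint with the chance of a blind guess, and check only if the gain beats the cost. This is a hypothetical sketch, not the variant added to the repo. It assumes cost is on the same scale as success probability and that the hint is correct with probability `accuracy`. It scores the hint by its effect on success rather than by information in bits, so it stands in for, rather than implements, full expected free energy.

```python
def success_after_hint(p_left: float, accuracy: float) -> float:
    """Chance of opening the right door after reading a hint that is correct with prob `accuracy`."""
    total = 0.0
    for says in ("LEFT", "RIGHT"):
        # Joint probability of each true door together with this hint reading.
        joint_left = p_left * (accuracy if says == "LEFT" else 1 - accuracy)
        joint_right = (1 - p_left) * (accuracy if says == "RIGHT" else 1 - accuracy)
        total += max(joint_left, joint_right)  # open whichever door the posterior favors
    return total

def should_check(p_left: float, accuracy: float, cost: float) -> bool:
    """Check the hint only if its boost to the odds of success beats its cost."""
    blind_guess = max(p_left, 1 - p_left)
    boost = success_after_hint(p_left, accuracy) - blind_guess
    return boost > cost

print(should_check(0.5, accuracy=1.0, cost=0.0))  # free, perfect hint: True
print(should_check(0.5, accuracy=0.7, cost=0.3))  # noisy, pricey hint: False
```

A 70%-reliable hint lifts a 50/50 guess to 70%, a boost of 0.2. So the hint is worth checking at a cost of 0.1 but not at 0.3. As the reply says, the reliability and the cost scale are still inputs you have to choose.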

VoltageGPU • Jun 29

Interesting take on active inference and curiosity-driven learning! I've seen similar behaviors in reinforcement learning setups where entropy regularization encourages exploration, but framing it as "minimizing surprise" gives it a nice Bayesian edge. If you ever test this on GPU-accelerated environments, VoltageGPU could help with the inference speed without bloating your setup.
