Vamsi Krishna

Seeing is Believing? Why Your AI Has Trust Issues with Reality?

In this blog we'll get started with some basic Reinforcement Learning terminology: specifically, how a state differs from an observation, one of the key distinctions in the RL world. (I'm gonna use "RL" as shorthand for Reinforcement Learning here and in future posts.) Let's get started.

State vs. Observation in Reinforcement Learning

Ever felt like your senses were playing tricks on you? You mishear a song lyric or you see a shape in the shadows that isn’t really there. We humans navigate the world through an imperfect filter of perception. It turns out, most advanced AIs and robots share the exact same problem.

When we train an AI to act in the world, whether it's a robot arm, a self-driving car, or a character in a video game, we run into a deep philosophical question: what is real, and what is just perceived?

Welcome to one of the most fundamental concepts in modern AI: the critical difference between State and Observation.

Here's the roadmap:

Meet the State: the unseen ground truth.
Meet the Observation: the agent's window onto that truth.
The Flow of Reality: a three-act play.

Why This Distinction is Everything?

Imagine you’re playing a video game. The game engine knows everything with perfect, god-like clarity. It knows your character’s health is exactly 84.72, and an enemy is located at the precise coordinates (x: 1024.5, y: 512.8). This perfect, objective, all-knowing view of the environment is the State.
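In code, you can picture the engine's state as a plain record of ground-truth values. A hypothetical snippet, echoing the numbers above:

```python
# The engine's god's-eye view: exact, objective, complete.
game_state = {
    "player_health": 84.72,
    "enemy_position": (1024.5, 512.8),
}
```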

The State is the true, hidden condition of the system.

Let’s use a robot arm as an example. When the arm moves, the laws of physics dictate its final position. The environment “knows” that the arm ended up at exactly 10.2 cm forward. This isn’t a guess. It’s the ground truth.

But here’s the catch: the agent—our poor robot arm—is never allowed to see this perfect information. The true state is a secret kept by the environment. So, how does it know what to do next? It has to rely on its senses.
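To make that concrete, here's a minimal sketch in Python. The class name and the noise scale are made up for this post; the point is that the true state lives inside the environment, and the agent only ever gets a noisy reading:

```python
import random

class RobotArmEnv:
    """Toy environment: the true state lives here, and it stays private."""

    def __init__(self):
        # The ground truth after the last move. The agent never reads this.
        self._true_position_cm = 10.2

    def observe(self):
        # All the agent ever receives: the truth plus a little sensor noise.
        return self._true_position_cm + random.gauss(0.0, 0.1)
```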

If the State is the objective truth, the Observation is the agent’s subjective, noisy perception of that truth. It’s what the agent gets from its sensors.

Our robot arm might have multiple sensors trying to figure out its position:

A motor encoder reports it moved 10.1 cm.

An overhead camera measures its position as 10.3 cm.

Neither is the perfect truth of 10.2 cm. Why? Welcome to the real world! Sensors suffer from friction, calibration errors, resolution limits, and electrical noise.

The agent never sees the true state. It only gets the observation the environment emits and must act based solely on this messy and imperfect signal.

This is a challenge. The agent is forced to operate from a place of uncertainty, piecing together clues to guess what the true state of the world might be.
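Here's a quick sketch of what those two disagreeing sensors might look like in code. The noise levels are illustrative guesses, not real sensor specs:

```python
import random

TRUE_POSITION_CM = 10.2  # the hidden ground truth

def read_encoder(true_pos):
    # Encoders drift with friction and calibration error.
    return true_pos + random.gauss(0.0, 0.1)

def read_camera(true_pos):
    # Cameras are limited by resolution and lens calibration.
    return true_pos + random.gauss(0.0, 0.2)

# The agent's observation: two slightly different stories about one truth.
observation = {
    "encoder_cm": read_encoder(TRUE_POSITION_CM),  # e.g. ~10.1
    "camera_cm": read_camera(TRUE_POSITION_CM),    # e.g. ~10.3
}
print(observation)
```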

So how do these concepts connect? It all happens in a clear, sequential order every time an agent acts.

Action (Intended) ⟶ State (True but Hidden) ⟶ Observation (Noisy Perception)

Let’s break it down:

The Action: The agent decides to do something. Its brain says, “Move the arm forward with 5 units of power.” This is the agent’s one and only moment of direct control.

The State Transition (The Real World Intrudes): The motor command is sent, but the physical world is messy. A bit of friction in a gear, a slight voltage drop, or a slip on the surface means the arm doesn’t move exactly as intended. This is Action Noise. The intended action was clean, but the resulting state is a little unpredictable. The arm lands at its new true state of 10.2 cm.

The Observation (The Sensory Report): Now that the arm is in its new, true position, the environment generates an observation for the agent. The imperfect sensors kick in, introducing Observation Noise. The camera and encoder deliver their slightly-off readings of 10.3 cm and 10.1 cm.

The environment manages both the true state and the observation it sends back. The agent only controls its action and receives the final, noisy observation.
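Putting the three acts together, here's a sketch of a single environment step. Everything here, from the class name to the noise magnitudes, is invented to mirror the story above:

```python
import random

class NoisyArmEnv:
    def __init__(self):
        self._true_position_cm = 0.0  # hidden state

    def step(self, intended_move_cm):
        # Act 2 -- action noise: friction, voltage drops, slippage.
        actual_move = intended_move_cm + random.gauss(0.0, 0.2)
        self._true_position_cm += actual_move  # the new, true, hidden state

        # Act 3 -- observation noise: the sensors report, imperfectly.
        return self._true_position_cm + random.gauss(0.0, 0.1)

env = NoisyArmEnv()
obs = env.step(10.0)  # Act 1 -- the intended action: "move 10 cm forward"
print(obs)            # e.g. 10.27: close to the truth, never exactly it
```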

This is the central challenge for building intelligent machines that can function in the real world. Problems where the agent cannot see the true state are called Partially Observable Markov Decision Processes (POMDPs). We'll save the details for a later post; for now, just know that POMDPs are the standard setting for real-world robotics.

Because an agent only has a foggy window into reality, it must learn to be smarter. It can’t just react to its latest observation. It needs to remember a history of its past observations and actions to build an internal belief or an educated guess about the true, hidden state of the world.
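One crude way to picture that belief-building is to average recent observations. Real systems use proper state estimators such as Kalman filters; this running average is just a sketch of the idea:

```python
from collections import deque

class BeliefAgent:
    def __init__(self, history_len=5):
        # Remember recent observations instead of trusting only the latest one.
        self.history = deque(maxlen=history_len)

    def update(self, observation):
        self.history.append(observation)

    def believed_state(self):
        # Averaging the history smooths out sensor noise: an educated guess
        # at the hidden true state.
        return sum(self.history) / len(self.history)

agent = BeliefAgent()
for reading in (10.1, 10.3, 10.15, 10.25, 10.2):
    agent.update(reading)
print(agent.believed_state())  # 10.2: closer to the truth than any single guess
```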

So, the next time you see a Boston Dynamics robot navigating a complex environment, remember what it’s really doing. It’s not just moving its legs. It’s constantly taking in a stream of noisy, partial observations and running a brilliant internal simulation to ask itself, “Given everything I’ve seen, what do I believe is the true state of the world right now?”

And that, in a nutshell, is how you teach a machine to find its way in a world it can never truly see!
