The Detection vs Perception Problem
A motion sensor tells you: "someone is there."
A perception system tells you: "this person is standing still, looking at the piece, and leaning closer."
The first tells you to turn the lights on. The second tells you to deepen the color, slow the animation, and bring the nearest element toward them.
Most maker installations use the first type of sensor (PIR, ultrasonic, simple light sensor) and expect it to create the second type of experience.
It cannot.
PIR sensors cannot tell you direction of movement, speed, or whether someone is actually engaging with the piece or just walking past. Ultrasonic range finders give you distance but not intent. Touch sensors give you contact but not pressure or behavior.
The result: every "interactive" maker installation behaves the same way. Something approaches, something lights up. The end.

Three Layers of Interactive Perception
Layer 1: Presence Detection (What Most Makers Do)
Binary: is someone there or not. Uses PIR, ultrasonic, photocell.
Behavior: turn on when detected, turn off after timeout.
Why it fails: no context, no behavior reading, no learning.
Layer 2: Spatial Awareness (Getting Better)
Tracks position or movement direction. Uses multiple sensors, camera + CV, or time-of-flight sensors like VL53L0X.
Behavior: light follows a person's position, speed of movement affects animation speed.
This is where most professional interactive installations live. It feels responsive but not intelligent.
Layer 3: Behavioral Perception (The Missing Layer)
Reads body language: is the person engaged (leaning in, standing still) or just passing through (walking briskly, looking at phone)?
Uses: thermal imaging (AMG8833), millimeter-wave radar (60GHz), or computer vision on low-power hardware (Raspberry Pi CM + Coral Edge TPU).
Behavior: a person leaning toward the installation gets a deeper response than one walking past. The piece "notices" attention.
This is what makes an installation feel alive instead of just active.
The Practical Maker Approach
You do not need professional-grade sensors to add Layer 3 perception. The gap between boring and compelling is smaller than you think.
Start with multiple ultrasonic sensors. One at face level, one at chest level, one lower for children or people in wheelchairs. The combination of distance readings at multiple heights tells you whether someone is leaning in or standing flat.
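As a minimal sketch of that idea, the two upper readings can be compared directly: if the face-level sensor reads meaningfully closer than the chest-level one, the head is out in front of the torso. The function name and the 10 cm margin below are illustrative assumptions, not calibrated values.

```cpp
// Classify posture from two ultrasonic distance readings (in cm),
// one at face height and one at chest height. A head closer than
// the torso suggests leaning in. The 10 cm margin is an assumption
// you would tune against your own sensor placement.
enum Posture { STANDING_FLAT, LEANING_IN, LEANING_AWAY };

Posture classifyPosture(float faceDistanceCm, float chestDistanceCm) {
    float delta = chestDistanceCm - faceDistanceCm;  // positive: head closer than torso
    if (delta > 10.0f)  return LEANING_IN;
    if (delta < -10.0f) return LEANING_AWAY;
    return STANDING_FLAT;
}
```

In practice you would feed this smoothed readings rather than raw ones, since a single ultrasonic echo off clothing can easily swing more than 10 cm.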
Add a thermal sensor. The AMG8833 IR thermal array (8×8 grid) costs $20 and tells you not just where a person is, but whether they are facing the installation. A person facing you has a different thermal signature than one facing away.
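One hedged way to turn the 8×8 grid into a single "facing" number, like the thermalFace value used in the state machine below, is to count warm pixels in the central columns of the frame. The 26 °C threshold and the four-column window are assumptions to tune for your mounting height and room temperature.

```cpp
// Estimate how squarely a person faces the sensor from one AMG8833
// 8x8 thermal frame (values in degrees C). A face-on person fills
// the central columns with warm pixels; someone sideways or facing
// away reads narrower and cooler. Threshold and window are assumed.
float thermalFacingScore(const float frame[8][8]) {
    int warm = 0;
    int total = 0;
    for (int row = 0; row < 8; ++row) {
        for (int col = 2; col < 6; ++col) {  // central 4 of 8 columns
            ++total;
            if (frame[row][col] > 26.0f) ++warm;
        }
    }
    return (float)warm / (float)total;  // 0.0 (facing away) .. 1.0 (face-on)
}
```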
Use time thresholds correctly. Someone standing in one spot for more than 3 seconds is not passing through. Someone moving continuously for more than 10 seconds is not engaging. A simple state machine using time as a dimension changes everything.
// Simple engagement state machine. timeInZone is milliseconds since
// the person entered the interaction zone; the caller resets it to 0
// whenever the zone is empty.
enum EngagementState { IDLE, APPROACHING, ENGAGED, LEAVING };
EngagementState currentState = IDLE;

void updateState(float distance, float thermalFace, unsigned long timeInZone) {
  if (distance > 200) currentState = IDLE;                                    // out of range
  else if (timeInZone < 3000) currentState = APPROACHING;                     // just arrived
  else if (thermalFace > 0.7) currentState = ENGAGED;                         // dwelling and facing the piece
  else if (currentState == ENGAGED && distance > 50) currentState = LEAVING;  // attention dropped, stepping back
}
The code is not complex. The insight is: behavior is a time-series problem, not a single-readout problem.
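The time dimension has to come from somewhere: a sketch of how timeInZone itself can be derived from timestamped distance readings. The 200 cm zone boundary and millis-style timestamps are assumptions matching the thresholds above.

```cpp
// Track how long someone has stayed inside the interaction zone.
// Feed it timestamped distance readings (millis-style ms) and it
// returns the dwell time the thresholds above operate on.
// The 200 cm zone boundary is an assumption.
struct ZoneTimer {
    bool inZone = false;
    unsigned long enteredAt = 0;

    // Returns ms spent in the zone as of nowMs (0 when outside).
    unsigned long update(float distanceCm, unsigned long nowMs) {
        if (distanceCm <= 200.0f) {
            if (!inZone) { inZone = true; enteredAt = nowMs; }
            return nowMs - enteredAt;
        }
        inZone = false;
        return 0;
    }
};
```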
FAQ
Q: I want to add computer vision to my installation but Raspberry Pi is too slow/expensive for my use case. What are the alternatives?
A: The Coral USB Edge TPU Accelerator ($60) runs MobileNet SSD at roughly 30 fps at full camera resolution, turning a Raspberry Pi 4 into a capable vision processor. Alternatively, the OpenMV H7 camera module ($65) has built-in machine vision primitives optimized for maker projects: face detection, blob tracking, and AprilTag pose estimation run onboard at up to 80 fps with no external processor. For pure gesture recognition without identification, the PAJ7620 gesture sensor recognizes 9 gestures (wave, rotate, approach) for $3 over I2C.
Q: Multiple sensors mean more wiring and more complex code. How do I manage this on a single ESP32?
A: Use an I2C sensor bus. The VL53L0X, AMG8833, and ADS1115 (for analog sensors) all share the same two-wire I2C bus. One ESP32 can manage 8+ sensors on a single bus with different addresses. Structure your code as a sensor manager class that reads each sensor in a round-robin loop, maintains a rolling average, and exposes a unified state object to your animation logic. Keep the animation loop clean — sensor processing happens in a separate loop at 10Hz, animation runs at 60fps.
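The sensor-manager pattern described in that answer can be sketched roughly as below. The read functions are stubs standing in for real I2C driver calls, and the smoothing weight is an assumed value.

```cpp
#include <array>

// Read one sensor per tick in round-robin order at a slow rate,
// keep an exponential rolling average per channel, and expose one
// unified state struct to the animation loop. readFn stands in for
// real I2C driver calls; alpha = 0.3 is an assumed smoothing weight.
struct PerceptionState {
    float distanceCm  = 0;  // smoothed VL53L0X reading
    float thermalFace = 0;  // smoothed AMG8833 facing score
};

class SensorManager {
    static const int kChannels = 2;
    std::array<float, kChannels> smoothed{};
    int next = 0;             // round-robin index
    const float alpha = 0.3f; // rolling-average weight

public:
    // Call at ~10 Hz from the sensor loop, never the animation loop.
    template <typename ReadFn>
    void tick(ReadFn readFn) {
        float raw = readFn(next);
        smoothed[next] = alpha * raw + (1 - alpha) * smoothed[next];
        next = (next + 1) % kChannels;
    }

    PerceptionState state() const {
        return { smoothed[0], smoothed[1] };
    }
};
```

The animation code only ever touches the PerceptionState snapshot, so the 10 Hz sensor loop and the 60 fps render loop never block each other.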
The Next Step
Before buying new sensors, watch one person interact with your current installation for sixty seconds.
Note every moment they change behavior (speed up, slow down, lean, look away, look back). Map those moments to what your sensors were reading.
You will find that engagement has a pattern. And that pattern is recoverable with the sensors you already have — if you change how you use them.
Product recommendations for perception-enhanced interactive installations:
AMG8833 IR Thermal Array Sensor — 8×8 IR grid that detects body heat and direction. Adds Layer 3 perception without camera privacy concerns. (Amazon)
VL53L0X Time-of-Flight Sensor (3-Pack) — Precise distance sensing at multiple heights. Mount one at face level, one at chest, one lower. (Amazon)
Coral USB Edge TPU Accelerator — Adds real-time CV inference to any Linux board. MobileNet SSD at 30fps. (Amazon)
I earn from qualifying purchases.
Article #004, 2026-04-18. Content Farm pipeline, Run #004.