Every Series 5 BrightSign player (except the XC line) ships with an onboard Neural Processing Unit. Most people don't know this. Even fewer have tried to use it. We think that should change.
The NPU inside these players can run inference at up to 6 TOPS (trillion operations per second), easily enough to handle real-time computer vision workloads. To demonstrate what's possible, we built a showcase application: a smart retail signage system that detects when people are looking at the display, adapts content in real time, and unlocks engagement metrics that most content managers and marketing departments would love to have.
This post walks through what we built, how it works, and what it means for developers building on BrightSign hardware.
(Screenshots: Person Detection in Action; Demo Analytics Dashboard.)
The Hardware Story
BrightSign's Series 5 and later players integrate Rockchip NPUs directly into the SoC. The hardware supports multiple precision formats (INT4, INT8, INT16, FP16), giving developers flexibility to balance speed and accuracy depending on the use case.
For our gaze detection demo, we're using the RetinaFace model. RetinaFace was trained on the WIDER FACE dataset and excels at detecting faces across varying scales, lighting conditions, and angles—exactly what you need for a retail environment where people approach displays from all directions. The model runs through our open-source gaze detection extension, compiled specifically for the BrightSign NPU using Rockchip's RKNN toolchain.
The result: 30+ FPS inference from a USB camera feed, running entirely on the player.
What the Demo Does
Find and try the demo here!
The application implements a state machine that responds to audience attention:
- IDLE: The display shows a welcome screen. No one's watching, so we keep it simple.
- DETECTED: Faces appear in frame, but no one's looking directly at the camera. This triggers attention-grabbing content: movement, color, visual hooks designed to earn a glance.
- SPEAKING: Someone's actually looking at the display. This is the money state. The system switches to primary content and starts tracking engagement duration.
Transitions between states use persistence logic to prevent flickering. A single frame of "no gaze detected" doesn't drop you out of SPEAKING mode; we require several consecutive frames without engagement before transitioning. This smooths out the noise inherent in any real-world camera setup.
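To make that concrete, here's a minimal TypeScript sketch of a debounced state machine along these lines. The state names match the demo, but the frame thresholds are placeholder values rather than the ones the demo actually uses.

```typescript
type State = "IDLE" | "DETECTED" | "SPEAKING";

// Placeholder thresholds; the demo's real persistence windows may differ.
const FRAMES_TO_ENGAGE = 3;     // consecutive frames with gaze before entering SPEAKING
const FRAMES_TO_DISENGAGE = 15; // consecutive frames without gaze before leaving SPEAKING

class AttentionStateMachine {
  private state: State = "IDLE";
  private gazeStreak = 0;
  private noGazeStreak = 0;

  update(facesInFrame: number, facesAttending: number): State {
    const gaze = facesAttending > 0;
    this.gazeStreak = gaze ? this.gazeStreak + 1 : 0;
    this.noGazeStreak = gaze ? 0 : this.noGazeStreak + 1;

    switch (this.state) {
      case "IDLE":
        if (facesInFrame > 0) this.state = "DETECTED";
        break;
      case "DETECTED":
        if (this.gazeStreak >= FRAMES_TO_ENGAGE) this.state = "SPEAKING";
        else if (facesInFrame === 0) this.state = "IDLE";
        break;
      case "SPEAKING":
        // A single missed frame is ignored; only a sustained gap drops us back.
        if (this.noGazeStreak >= FRAMES_TO_DISENGAGE) {
          this.state = facesInFrame > 0 ? "DETECTED" : "IDLE";
        }
        break;
    }
    return this.state;
  }
}
```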
The Extension Architecture
Our gaze detection extension runs as a BrightSign extension package. It captures frames from the video source (USB camera, RTSP stream, or even video files from the SD card) and pushes them through the NPU for inference. Results come out as UDP packets on localhost:5002:
```json
{
  "faces_attending": 1,
  "faces_in_frame_total": 3,
  "timestamp": 1756854339
}
```
Your application subscribes to this UDP stream and does whatever it wants with the data. The extension handles the ML heavy lifting; you handle the business logic. This separation means you can build sophisticated signage applications in TypeScript or whatever you're comfortable with, without touching any neural network code.
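As a sketch of what that subscription can look like, assuming your app runs somewhere with Node's dgram module available (for example, a Node.js-enabled BrightSign HTML application), and using the payload format shown above:

```typescript
import { createSocket } from "node:dgram";

interface GazePacket {
  faces_attending: number;
  faces_in_frame_total: number;
  timestamp: number;
}

const socket = createSocket("udp4");

socket.on("message", (msg) => {
  // Each datagram is a small JSON document like the one shown above.
  const packet: GazePacket = JSON.parse(msg.toString("utf8"));
  // Hand the numbers off to your own business logic here,
  // e.g. the state machine sketched earlier.
  console.log(`${packet.faces_attending}/${packet.faces_in_frame_total} faces attending`);
});

// The extension publishes on localhost:5002.
socket.bind(5002, "127.0.0.1");
```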
The extension is configurable through BrightSign registry keys. You can override the video input source, adjust UDP publish rates, and toggle auto-start behavior. RTSP support means you're not limited to directly connected cameras; you can pull feeds from network cameras or even process pre-recorded video for testing.
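As a rough illustration of the kinds of settings involved, here's a hypothetical config shape. The actual registry section and key names are documented in the extension repository, so treat everything below as placeholders.

```typescript
// Hypothetical names for illustration only; check the extension's README
// for the real registry section and keys.
interface GazeExtensionConfig {
  videoSource: string;   // USB device, an RTSP URL, or a file path on the SD card
  udpPublishHz: number;  // how often results are published to localhost:5002
  autoStart: boolean;    // whether inference starts automatically at boot
}

const exampleConfig: GazeExtensionConfig = {
  videoSource: "rtsp://192.168.1.50/stream1",
  udpPublishHz: 10,
  autoStart: true,
};
```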
Metrics That Matter
Here's where this gets interesting for retail deployments. The demo application aggregates inference results into Prometheus-compatible metrics:
- Traffic metrics: Total people detected, current visitor count, detection events over time.
- Engagement metrics: Gaze events, attention duration histograms, session tracking with dwell time.
- Content attribution: Which content was playing when someone started paying attention? Which idle screen variant converts better to engagement? The system tracks this automatically.
We ship a Grafana dashboard with the demo that visualizes these metrics in real-time. You can see engagement funnels, compare content performance across A/B variants, and monitor system health—all scraped from the player's /metrics endpoint.
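To give a flavor of what this looks like in application code, here's a small sketch of how metrics like these could be defined and exposed with the prom-client and express npm packages. The metric names are illustrative, not necessarily the ones the demo exports.

```typescript
import express from "express";
import { Counter, Gauge, Histogram, register } from "prom-client";

// Illustrative metric names; the demo's actual names may differ.
const peopleDetected = new Counter({
  name: "signage_people_detected_total",
  help: "Total number of people detected in frame",
});

const currentVisitors = new Gauge({
  name: "signage_current_visitors",
  help: "Number of faces currently in frame",
});

const attentionDuration = new Histogram({
  name: "signage_attention_duration_seconds",
  help: "How long viewers looked at the display",
  labelNames: ["content_id"], // attribute engagement to whatever was on screen
  buckets: [1, 2, 5, 10, 30, 60],
});

// Example updates, driven by the UDP gaze packets:
peopleDetected.inc();
currentVisitors.set(3);
attentionDuration.labels("idle_screen_purple").observe(12.4);

// Prometheus scrapes this endpoint on the player.
const app = express();
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.send(await register.metrics());
});
app.listen(9100); // example port
```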
For content managers, this is actionable data. You can finally answer questions like "does the purple idle screen attract more attention than the green one?" or "how long do people actually watch our product demo before walking away?"
Why Edge Inference Matters
Running gaze detection on-device instead of in the cloud provides some meaningful benefits.
Latency: Sub-second response times are table stakes for interactive signage. Waiting for a round-trip to a cloud server kills the illusion that the display is responding to you.
Bandwidth: Streaming video to the cloud for analysis requires significant upstream bandwidth. Processing locally means you only send the metrics you care about: tiny JSON payloads instead of continuous video streams, which keeps bandwidth consumption remarkably low.
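For a rough sense of scale: assuming the ~100-byte payload shown earlier were published ten times per second, that's on the order of 1 KB/s upstream, versus the multiple megabits per second even a modestly compressed camera stream would need.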
Privacy: The extension doesn't store facial images or biometric data. It maintains a vectorized representation of detected faces for a configurable time window to track unique visitors, but nothing that could directly identify individuals. All processing happens on the device; nothing leaves the local network unless you explicitly send it somewhere.
Reliability: Edge inference works without internet connectivity. Your smart signage keeps working even when the network goes down.
For deployments in privacy-sensitive environments or regions with strict data handling regulations, keeping inference on-device simplifies compliance significantly.
Getting Started
The gaze detection extension is available at github.com/brightdevelopers/brightsign-npu-gaze-extension. The demo application, including the metrics system and Grafana dashboard, is packaged with the repository.
You'll need:
- A Series 5 BrightSign player (XT5, XD5, HD5, LS5, or HS5; not the XC5 line)
- A USB camera (or RTSP-capable network camera)
- Node.js < 18.18.2 for building the demo app
- A companion computer to run Prometheus/Grafana
Clone the repo, follow the build instructions, copy the output to an SD card, and boot the player. The extension auto-starts and begins publishing gaze data immediately.
The whole point of this demo is to show what's possible right now, today, with hardware you might already have. It's not a product, it's a starting point. Take it, break it apart, build something better.
Building Your Own Extensions
Both the gaze detection extension and a general-purpose YOLOX object detection extension are open source and available on GitHub. You can use them as-is or as starting points for custom applications.
Fair Warning: Building custom NPU extensions isn't trivial. You're essentially porting or writing AI applications for an embedded edge device. Python support is limited, and much or all of your application may need to be ported to C++.
To get started, the build documentation in the repositories above is the best first stop. We also encourage you to reach out to BrightSign for support if needed; see the Contact Us section below.
What's Next
The NPU on BrightSign players opens up categories of applications that weren't practical before on dedicated signage hardware. We're exploring use cases beyond computer vision, things like intelligent log analysis, predictive error handling, and automated content optimization.
We're still in the ideation phase for many of these. If you're building on BrightSign hardware and have ideas for NPU-accelerated features that would make your life easier, we want to hear about it.
We're curious what you'll come up with.
Contact Us
Find the proper channels for contacting us here.
If you have a pre-existing contact at BrightSign, feel free to reach out to them with questions about the NPU.

