<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shulamit H.</title>
    <description>The latest articles on DEV Community by Shulamit H. (@shulamit_halberstadt).</description>
    <link>https://dev.to/shulamit_halberstadt</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3662815%2Fea36b15b-824b-4a50-a50b-e687b85d9531.png</url>
      <title>DEV Community: Shulamit H.</title>
      <link>https://dev.to/shulamit_halberstadt</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shulamit_halberstadt"/>
    <language>en</language>
    <item>
      <title>Is the User Actually Looking at the Screen? Building Real-Time On/Off Screen Detection</title>
      <dc:creator>Shulamit H.</dc:creator>
      <pubDate>Sun, 18 Jan 2026 11:08:32 +0000</pubDate>
      <link>https://dev.to/shulamit_halberstadt/is-the-user-actually-looking-at-the-screen-building-real-time-onoff-screen-detection-4ki1</link>
      <guid>https://dev.to/shulamit_halberstadt/is-the-user-actually-looking-at-the-screen-building-real-time-onoff-screen-detection-4ki1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgauwvdhraxg4srh7nn3.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgauwvdhraxg4srh7nn3.gif" alt=" " width="400" height="228"&gt;&lt;/a&gt;Imagine you're building a system that tracks whether a user is looking&lt;br&gt;
at a screen. Not whether they are focused in a psychological sense, but&lt;br&gt;
something much simpler --- and surprisingly tricky:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are they even looking at the screen right now?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At first glance, this sounds trivial. But real users don't behave&lt;br&gt;
cleanly. They glance sideways, blink, tilt their heads, or stare forward&lt;br&gt;
while their eyes drift elsewhere. Very quickly, "On-Screen" turns into a&lt;br&gt;
collection of edge cases.&lt;/p&gt;

&lt;p&gt;In this post, I describe the practical process I went through while&lt;br&gt;
building a real-time computer vision system that classifies each video&lt;br&gt;
frame as &lt;strong&gt;On-Screen&lt;/strong&gt; or &lt;strong&gt;Off-Screen&lt;/strong&gt;. No mind reading --- just&lt;br&gt;
concrete, explainable decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why On / Off Screen Detection Is Useful
&lt;/h2&gt;

&lt;p&gt;On/Off Screen detection is rarely the final goal. It is usually a&lt;br&gt;
&lt;strong&gt;supporting signal&lt;/strong&gt; for other systems, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Estimating concentration or engagement&lt;/li&gt;
&lt;li&gt;  Monitoring behavior during online exams&lt;/li&gt;
&lt;li&gt;  Measuring effective screen time&lt;/li&gt;
&lt;li&gt;  Filtering out irrelevant frames (user leaves the chair)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a stable On/Off Screen layer, any downstream metric quickly&lt;br&gt;
becomes noisy and unreliable.&lt;/p&gt;

&lt;p&gt;The goal was deliberately narrow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classify each frame as On-Screen or Off-Screen in a stable,&lt;br&gt;
explainable, and tunable way.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Early Attempts: Starting with the Eyes
&lt;/h2&gt;

&lt;p&gt;The first approach focused &lt;strong&gt;only on the eyes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I tracked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Iris position&lt;/li&gt;
&lt;li&gt;Normalized iris offset within the eye&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This worked reasonably well in calm, frontal cases. However, it failed&lt;br&gt;
in many realistic situations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Partial face visibility&lt;/li&gt;
&lt;li&gt;  Slight head rotation while the eyes appeared centered&lt;/li&gt;
&lt;li&gt;  Increased noise when facial landmarks jittered&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Eye-only tracking turned out to be &lt;strong&gt;too sensitive&lt;/strong&gt; to small errors and&lt;br&gt;
visual noise.&lt;/p&gt;
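&lt;p&gt;For reference, the normalized iris offset is simply the iris center expressed relative to the eye corners. A minimal sketch of the horizontal case (the coordinates here are illustrative, not tied to a specific landmark library):&lt;/p&gt;

```python
def normalized_iris_offset(iris_x, eye_left_x, eye_right_x):
    """Horizontal iris position normalized to [-1, 1] within the eye:
    -1 at the left corner, 0 centered, +1 at the right corner.
    Scale-invariant by construction, but sensitive to landmark jitter."""
    center = (eye_left_x + eye_right_x) / 2.0
    half_width = (eye_right_x - eye_left_x) / 2.0
    return (iris_x - center) / half_width

# Iris halfway between the eye center and the right corner:
print(normalized_iris_offset(55.0, 40.0, 60.0))  # 0.5
```

&lt;p&gt;Because the offset is a ratio of pixel distances, any jitter in the corner landmarks is amplified, which is exactly the sensitivity described above.&lt;/p&gt;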




&lt;h2&gt;
  
  
  Second Attempt: Adding Head Pose
&lt;/h2&gt;

&lt;p&gt;To stabilize the system, I introduced &lt;strong&gt;head pose estimation&lt;/strong&gt; based on&lt;br&gt;
facial landmarks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Yaw&lt;/strong&gt; (left / right rotation)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Pitch&lt;/strong&gt; (up / down rotation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Head pose provides a coarse estimate of where the face is oriented,&lt;br&gt;
which helps correct many of the failures seen with eye-only tracking.&lt;/p&gt;

&lt;p&gt;However, it was still not enough: the head may face the screen while&lt;br&gt;
the eyes clearly look away.&lt;/p&gt;

&lt;p&gt;At this stage, it became clear that head pose is necessary but not&lt;br&gt;
sufficient.&lt;br&gt;
&lt;em&gt;(A short mathematical intuition behind Pitch and Yaw appears later in&lt;br&gt;
this post.)&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Yaw, Pitch, and Roll --- From Aviation
&lt;/h2&gt;

&lt;p&gt;The terms &lt;strong&gt;Yaw&lt;/strong&gt;, &lt;strong&gt;Pitch&lt;/strong&gt;, and &lt;strong&gt;Roll&lt;/strong&gt; originate from aviation and&lt;br&gt;
describe orientation in 3D space:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Yaw&lt;/strong&gt; --- rotation left or right around the vertical axis&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Pitch&lt;/strong&gt; --- rotation up or down around the horizontal axis&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Roll&lt;/strong&gt; --- tilting sideways around the forward axis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly how aircraft orientation is described --- relative to a&lt;br&gt;
fixed coordinate system.&lt;/p&gt;

&lt;p&gt;👇 The animation below illustrates these three rotations on a rigid&lt;br&gt;
body:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FShu6136713%2FAttention_Monitor%2Fblob%2Fmain%2Fyaw%2520pitch%2520roll.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2FShu6136713%2FAttention_Monitor%2Fblob%2Fmain%2Fyaw%2520pitch%2520roll.gif" alt="Yaw Pitch Roll animation" width="" height=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Yaw, Pitch, and Roll visualized on a rigid body — the same geometry applies to head pose estimation.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For head pose estimation, &lt;strong&gt;Yaw&lt;/strong&gt; and &lt;strong&gt;Pitch&lt;/strong&gt; are the most&lt;br&gt;
informative. Roll mainly reflects head tilt and is less useful for&lt;br&gt;
determining whether the screen is within view.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mathematical Intuition: Head Pose as Angles Between Vectors
&lt;/h2&gt;

&lt;p&gt;Head pose estimation is fundamentally about &lt;strong&gt;3D orientation&lt;/strong&gt;, not&lt;br&gt;
position.&lt;/p&gt;

&lt;p&gt;I define a 3D coordinate system centered on the head:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;X-axis&lt;/strong&gt;: left ↔ right&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Y-axis&lt;/strong&gt;: up ↔ down&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Z-axis&lt;/strong&gt;: forward ↔ backward (toward the screen)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using a small set of stable facial landmarks (eyes, nose tip, chin), I&lt;br&gt;
estimate a &lt;strong&gt;face-direction vector&lt;/strong&gt; that represents where the face is&lt;br&gt;
pointing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitch (Up / Down)
&lt;/h3&gt;

&lt;p&gt;To compute &lt;strong&gt;Pitch&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project the face-direction vector onto the &lt;strong&gt;Y-Z plane&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Measure the angle between this projection and the Z-axis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Intuitively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Looking up → positive pitch&lt;/li&gt;
&lt;li&gt;Looking down → negative pitch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Large pitch values indicate the screen is unlikely to be in the user's&lt;br&gt;
field of view.&lt;/p&gt;

&lt;h3&gt;
  
  
  Yaw (Left / Right)
&lt;/h3&gt;

&lt;p&gt;To compute &lt;strong&gt;Yaw&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project the face-direction vector onto the &lt;strong&gt;X-Z plane&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Measure the angle between this projection and the Z-axis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Turning the head left or right increases yaw and gradually moves the&lt;br&gt;
screen out of view.&lt;/p&gt;
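&lt;p&gt;Under these definitions, Pitch and Yaw reduce to two &lt;code&gt;atan2&lt;/code&gt; calls on the face-direction vector. A minimal sketch, assuming that vector is already available from a landmark-based solver and follows the axis conventions defined above:&lt;/p&gt;

```python
import math

def pitch_yaw_degrees(v):
    """Pitch and yaw (in degrees) of a face-direction vector (x, y, z).

    Pitch: angle of the Y-Z projection relative to the Z-axis.
    Yaw:   angle of the X-Z projection relative to the Z-axis.
    The vector's length cancels out, so the result is scale-invariant.
    """
    x, y, z = v
    pitch = math.degrees(math.atan2(y, z))  # up positive, down negative
    yaw = math.degrees(math.atan2(x, z))    # left/right rotation
    return pitch, yaw

# Looking straight ahead (along +Z): both angles are zero.
print(pitch_yaw_degrees((0.0, 0.0, 1.0)))  # (0.0, 0.0)
# Head turned 45 degrees sideways: yaw grows, pitch stays zero.
print([round(a, 1) for a in pitch_yaw_degrees((1.0, 0.0, 1.0))])  # [0.0, 45.0]
```

&lt;p&gt;Note that &lt;code&gt;atan2&lt;/code&gt; of a projection's components is exactly the "angle between the projection and the Z-axis" described above, with the sign carried for free.&lt;/p&gt;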

&lt;h3&gt;
  
  
  Why Angles Matter
&lt;/h3&gt;

&lt;p&gt;This formulation focuses on &lt;strong&gt;angles between vectors&lt;/strong&gt;, not absolute&lt;br&gt;
positions.&lt;/p&gt;

&lt;p&gt;As a result, it is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scale-invariant&lt;/li&gt;
&lt;li&gt;Robust to camera distance&lt;/li&gt;
&lt;li&gt;Independent of face size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Large angles&lt;/strong&gt; → strong evidence of Off-Screen&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Small angles&lt;/strong&gt; → ambiguous and must be combined with gaze&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Eye Closure: Prolonged Blinks
&lt;/h2&gt;

&lt;p&gt;Short blinks are natural. Prolonged eye closure, measured using the&lt;br&gt;
&lt;strong&gt;Eye Aspect Ratio (EAR)&lt;/strong&gt;, is treated as Off-Screen even if head&lt;br&gt;
orientation appears valid.&lt;/p&gt;
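&lt;p&gt;A sketch of this check, assuming six eye landmarks in the usual p1..p6 ordering from the EAR formulation; the 0.2 threshold and 15-frame duration (~0.5 s at 30 fps) are illustrative values, not the tuned ones:&lt;/p&gt;

```python
import math

def eye_aspect_ratio(pts):
    """EAR from six eye landmarks ordered p1..p6.
    Near zero when the eye is closed, roughly 0.25-0.35 when open."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    p1, p2, p3, p4, p5, p6 = pts
    # Two vertical distances over twice the horizontal distance.
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

def is_prolonged_closure(ear_history, threshold=0.2, min_frames=15):
    """True if the last min_frames EAR values are all under the threshold,
    i.e. the eyes stayed shut longer than a normal blink."""
    recent = ear_history[-min_frames:]
    return len(recent) == min_frames and all(threshold > e for e in recent)

# A schematic open eye: corners at x=0 and x=4, lids 1 unit apart.
print(eye_aspect_ratio([(0, 0), (1, 0.5), (3, 0.5), (4, 0), (3, -0.5), (1, -0.5)]))  # 0.25
```

&lt;p&gt;Requiring a sustained run of low EAR values is what separates a natural blink (a few frames) from true closure.&lt;/p&gt;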




&lt;h2&gt;
  
  
  Handling Instability
&lt;/h2&gt;

&lt;p&gt;Frame-by-frame classification causes rapid flickering: On → Off → On&lt;br&gt;
within fractions of a second.&lt;/p&gt;

&lt;p&gt;To address this, I applied &lt;strong&gt;temporal smoothing&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  A sliding window of 5 frames&lt;/li&gt;
&lt;li&gt;  Final label determined by majority vote&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This introduces a small delay but dramatically improves stability.&lt;/p&gt;
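&lt;p&gt;The majority vote is only a few lines. A minimal sketch using a fixed-size deque (label strings are illustrative):&lt;/p&gt;

```python
from collections import deque

class TemporalSmoother:
    """Majority vote over the last `window` raw per-frame labels.
    Trades a small delay for stability."""
    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, raw_label):
        self.history.append(raw_label)
        on_votes = sum(1 for lab in self.history if lab == "on")
        # Strict majority of the frames seen so far.
        return "on" if on_votes * 2 > len(self.history) else "off"

smoother = TemporalSmoother(window=5)
# A single off-frame glitch inside a run of on-frames is voted away.
print([smoother.update(lab) for lab in ["on", "on", "off", "on", "on"]])
# ['on', 'on', 'on', 'on', 'on']
```

&lt;p&gt;The &lt;code&gt;maxlen&lt;/code&gt; deque silently drops the oldest frame on each append, so the window slides with no extra bookkeeping.&lt;/p&gt;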




&lt;h2&gt;
  
  
  Decision Hierarchy
&lt;/h2&gt;

&lt;p&gt;Not all signals are equally important. Through experimentation, I&lt;br&gt;
arrived at the following hierarchy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Strong Off conditions&lt;/strong&gt; (prolonged eye closure, extreme head
pose)&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Moderate Pitch / Yaw deviations&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Subtle gaze deviations&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If a strong Off condition is met, the frame is classified as Off-Screen&lt;br&gt;
even if other signals appear valid.&lt;/p&gt;
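&lt;p&gt;The hierarchy maps naturally onto an ordered chain of checks, evaluated strongest-first. A sketch with illustrative thresholds (the real values were tuned on data, not chosen like this):&lt;/p&gt;

```python
def classify_frame(eyes_closed_long, yaw, pitch, gaze_offset):
    """Ordered decision hierarchy: strong Off conditions first,
    then moderate head-pose deviations, then subtle gaze deviations.
    Angle thresholds (degrees) and the gaze threshold are illustrative."""
    # 1. Strong Off conditions override everything else.
    if eyes_closed_long or abs(yaw) > 60 or abs(pitch) > 45:
        return "off"
    # 2. Moderate pitch/yaw deviations.
    if abs(yaw) > 30 or abs(pitch) > 25:
        return "off"
    # 3. Subtle gaze deviations (normalized iris offset within the eye).
    if abs(gaze_offset) > 0.35:
        return "off"
    return "on"

print(classify_frame(False, yaw=5, pitch=3, gaze_offset=0.1))  # on
print(classify_frame(True, yaw=0, pitch=0, gaze_offset=0.0))   # off
```

&lt;p&gt;Because the checks short-circuit in priority order, a strong Off condition wins even when the weaker signals look perfectly On-Screen, which is exactly the behavior described above.&lt;/p&gt;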




&lt;h2&gt;
  
  
  From Frames to Metrics
&lt;/h2&gt;

&lt;p&gt;Once each frame is classified:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  On-Screen and Off-Screen frames are counted&lt;/li&gt;
&lt;li&gt;  Rates such as On-Screen Percentage or Gaze Aversion Rate are
computed&lt;/li&gt;
&lt;li&gt;  Session-level summaries can be derived&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All higher-level metrics rely on this foundational layer.&lt;/p&gt;
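&lt;p&gt;Aggregating per-frame labels into session metrics is then straightforward. A sketch, where Gaze Aversion Rate is taken as On→Off transitions per frame (one reasonable definition among several):&lt;/p&gt;

```python
def session_metrics(labels):
    """Session-level summary from a list of per-frame 'on'/'off' labels."""
    total = len(labels)
    on = labels.count("on")
    # Count On -> Off transitions between consecutive frames.
    transitions = sum(
        1 for prev, cur in zip(labels, labels[1:])
        if prev == "on" and cur == "off"
    )
    return {
        "on_screen_pct": 100.0 * on / total if total else 0.0,
        "gaze_aversion_rate": transitions / total if total else 0.0,
    }

print(session_metrics(["on", "on", "off", "on", "off", "off"]))
```

&lt;p&gt;Any instability in the frame-level labels inflates the transition count directly, which is why the smoothing layer matters so much here.&lt;/p&gt;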




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Start with the simplest signal&lt;/li&gt;
&lt;li&gt;  Expect it to fail&lt;/li&gt;
&lt;li&gt;  Add complementary signals incrementally&lt;/li&gt;
&lt;li&gt;  Stabilize before optimizing accuracy&lt;/li&gt;
&lt;li&gt;  Avoid guessing thresholds without real data&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Determining whether someone is looking at a screen sounds simple ---&lt;br&gt;
until you actually build it.&lt;/p&gt;

&lt;p&gt;By: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Defining a narrow, operational question&lt;/li&gt;
&lt;li&gt;Combining multiple weak signals&lt;/li&gt;
&lt;li&gt;Adding deliberate stability measures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is possible to build a reliable foundation for attention-related&lt;br&gt;
systems.&lt;/p&gt;

&lt;p&gt;Most importantly, this approach reflects the real process: &lt;strong&gt;observe,&lt;br&gt;
fail, refine&lt;/strong&gt; --- rather than assuming a perfect solution from the&lt;br&gt;
start.&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>python</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
