DEV Community

EmoPulse


Micro-expressions happen in 1/25th of a second — here's how we catch them

Micro-expressions are involuntary facial movements that last between 1/25th and 1/5th of a second. Traditional computer vision models, even state-of-the-art CNNs, blur them out. Why? Because they're trained on static images or slow video streams where these flickers get averaged into noise.

At EmoPulse, we treat temporal resolution as a first-class citizen. Our pipeline starts with a 200 FPS edge capture stack, not for storage, but for real-time optical flow decomposition. We use a lightweight 3D-CNN (based on Tiny-I3D) that operates on micro-video clips of 16 frames at 200 FPS, giving us ~80 ms temporal windows. This isn't about more data; it's about meaningful data.

The model doesn't classify emotions directly. It detects muscle activation patterns (action units such as AU04 and AU25) at 5 ms resolution, then applies a temporal attention mask to isolate transient peaks.

# Pseudo: temporal attention over optical flow stacks
def forward(self, flow_stack):  # flow_stack: (B, C, T=16, H, W)
    # Spatio-temporal features from the Tiny-I3D backbone
    features = self.i3d_backbone(flow_stack)
    # Learned per-timestep weights, sensitive to transient peaks
    attention_weights = self.temporal_attention(features)
    attended = features * attention_weights  # emphasize peak frames
    # Pool out the spatial dims (H, W); the AU head sees (B, C, T)
    au_logits = self.au_head(attended.mean(dim=[3, 4]))
    return au_logits
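To make the numbers concrete, here's a toy NumPy sketch (not our production code) of the window math and of how a softmax-style temporal attention isolates a single-frame motion burst in a 16-frame flow stack:

```python
import numpy as np

FPS = 200
CLIP_LEN = 16  # frames per micro-video clip
WINDOW_MS = CLIP_LEN / FPS * 1000  # 16 frames / 200 FPS = 80 ms

def temporal_attention(motion_energy):
    """Softmax over time: weights peak wherever per-frame motion spikes."""
    e = np.exp(motion_energy - motion_energy.max())
    return e / e.sum()

# Toy flow stack: (T, H, W, 2) displacement field, with a burst at frame 7
flow = np.zeros((CLIP_LEN, 8, 8, 2))
flow[7] = 0.5
energy = np.linalg.norm(flow, axis=-1).mean(axis=(1, 2))  # per-frame motion energy
weights = temporal_attention(energy)
assert weights.argmax() == 7  # attention locks onto the transient frame
```

In the real model the attention weights are learned rather than computed from raw motion energy, but the mechanism is the same: re-weight the temporal axis so the ~40 ms transient dominates the pooled features.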

We've found that even with 99% dropout on spatial features, the model learns to ignore identity and focus on dynamics. The key insight? Micro-expressions aren't rare; they're overlooked. Standard datasets like CASME II and SAMM are gold, but they're tiny, so we synthetically augment them with GAN-generated micro-sequence perturbations (think: simulating an orbicularis oculi twitch on a neutral-to-suppressed smile). This pushes F1 on AU detection from 0.68 to 0.81 in real-world conditions. We run all of this on-device using TensorRT-optimized engines, with no cloud roundtrip. Because if it takes 200 ms to react, you've already missed the 40 ms truth.
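The GAN itself is out of scope for this post, but the perturbation idea is easy to illustrate without it. A minimal sketch (NumPy only; `inject_micro_twitch` is a hypothetical helper, not our actual augmenter) that adds a brief Gaussian-shaped activation bump to a neutral per-frame AU intensity track, mimicking a suppressed twitch:

```python
import numpy as np

def inject_micro_twitch(au_track, onset, duration=8, peak=0.6, rng=None):
    """Add a short Gaussian activation bump plus capture noise to a
    neutral AU intensity track. au_track: (T,) values in [0, 1];
    duration is the twitch width in frames (8 frames at 200 FPS = 40 ms)."""
    if rng is None:
        rng = np.random.default_rng(0)
    t = np.arange(len(au_track))
    bump = peak * np.exp(-0.5 * ((t - onset) / (duration / 4)) ** 2)
    jitter = rng.normal(0.0, 0.02, size=len(au_track))  # sensor/capture noise
    return np.clip(au_track + bump + jitter, 0.0, 1.0)

neutral = np.zeros(64)                         # 64 frames at 200 FPS = 320 ms
augmented = inject_micro_twitch(neutral, onset=20)
assert 18 <= augmented.argmax() <= 22          # twitch lands near the onset
```

A real augmenter perturbs the video (or flow) domain rather than AU tracks directly, but the principle carries over: generate plausible sub-100 ms events at known positions so the detector sees far more positives than the tiny labeled corpora provide.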

If you're working with high-speed behavioral signals — what’s your threshold for "real-time," and are you still using 30 FPS as input?

Learn more about our approach at emo.city.
