DEV Community

Ankit Khandelwal

Kriya-Egocentric-100K: Action100M-style Annotations for Real-World Labor Videos

Just pushed a new preview dataset to Hugging Face: Kriya-Egocentric-100K.

It contains Action100M-compatible hierarchical action annotations for a small 5-video subset of Build AI’s Egocentric-100K — real first-person footage captured with a monocular head-mounted fisheye camera during manual labor tasks.

[Screenshot: Kriya Visualizer]

What’s inside?

  • One JSON file per video (e.g. `f001-w001-0001.json`)
  • Full Action100M-style tree: root → sub-segments with precise start/end timestamps
  • LLM-generated natural language captions + structured GPT outputs (brief/detailed summaries, action labels, actors)
  • Everything generated 100% automatically via the Kriya Full Automated Action Annotation API (early preview)
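To give a feel for working with the hierarchical tree, here is a minimal sketch that walks one annotation depth-first and prints each segment with its timestamps. The field names (`start`, `end`, `caption`, `children`) and the example values are assumptions for illustration, not the dataset's actual schema; check the JSON files on Hugging Face for the real field names.

```python
import json

# Hypothetical Action100M-style annotation: a root segment spanning the clip,
# with child sub-segments carrying precise start/end timestamps.
# Field names and values here are assumed for illustration only.
annotation = {
    "video_id": "f001-w001-0001",
    "start": 0.0,
    "end": 12.5,
    "caption": "worker assembles a bracket at the bench",
    "children": [
        {"start": 0.0, "end": 4.2, "caption": "pick up bracket", "children": []},
        {"start": 4.2, "end": 12.5, "caption": "drive screws", "children": []},
    ],
}

def flatten(node, depth=0):
    """Depth-first walk yielding (depth, start, end, caption) for every segment."""
    yield depth, node["start"], node["end"], node["caption"]
    for child in node.get("children", []):
        yield from flatten(child, depth + 1)

for depth, start, end, caption in flatten(annotation):
    print(f"{'  ' * depth}[{start:5.1f} - {end:5.1f}] {caption}")
```

The same walk works on a real file by replacing the inline dict with `json.load(open("f001-w001-0001.json"))`, adjusting field names to the actual schema.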

The videos themselves are not hosted here (you'll need to pull them directly from Build AI under their license), but the annotations are MIT-licensed and drop-in compatible with the Kriya Visualizer: just load a video with its matching JSON and explore the timeline instantly.
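Since the videos and annotations live in separate places, one way to pair them locally is to match files by stem. This is a sketch under assumed directory names (`videos/`, `annotations/`); the dataset does not prescribe this layout.

```python
from pathlib import Path

def pair_files(video_dir: str, anno_dir: str) -> list[tuple[Path, Path]]:
    """Match each video to its annotation JSON by filename stem,
    e.g. videos/f001-w001-0001.mp4 <-> annotations/f001-w001-0001.json.
    Videos without a matching annotation are skipped."""
    annos = {p.stem: p for p in Path(anno_dir).glob("*.json")}
    return [
        (video, annos[video.stem])
        for video in sorted(Path(video_dir).glob("*.mp4"))
        if video.stem in annos
    ]
```

Each resulting pair can then be loaded side by side in the visualizer.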

Why this matters

After the EPIC-KITCHENS preview, this is the next step toward scaling automatic annotation to more diverse egocentric domains. Manual labor footage brings new challenges (occlusions, tool use, unstructured environments), and the results already look strong for downstream uses such as video world models, VLMs, VLA policies, and embodied robotics.

Visualizer demo, full pipeline details, and the previous Kriya-EPIC-KITCHENS release are all in the original Kriya tools blog post.

This is still an early preview — feedback and collaboration super welcome! Drop a comment or DM if you want to try the API on your own footage or discuss scaling plans.

Excited to keep pushing the boundary of automatic video understanding.
