DEV Community

Miko

I built an open-source tool that turns workcell incidents into evidence bundles and regression tests

I built MetriPlane v0.2.0, an open-source physical-observability tool for bounded workcells.

The short version:

A missing tool in a workcell becomes a physical event log, a Cell Truth Report, a verified evidence bundle, and a generated regression test.

3-minute demo:

Repository:

https://github.com/Miko997/metriplane

Zenodo DOI:

https://doi.org/10.5281/zenodo.20736619

GitHub release:

https://github.com/Miko997/metriplane/releases/tag/v0.2.0

The problem

Robotics and manufacturing systems often have cameras, logs, dashboards, simulators, and internal event streams.

But when something physically goes wrong, a few hard questions remain:

What actually happened?
What proves it?
Can we replay it?
Can we turn the incident into a repeatable software check?

MetriPlane focuses on that narrow layer: replayable physical evidence for bounded workcells.

It is not trying to be a robot controller, safety system, MES, or full digital-twin platform. It is an observe-only evidence layer around replayed or calibrated physical state.

What the v0.2.0 demo shows

The demo uses a camera-free assembly-cell replay where a required torque driver is missing long enough to delay a process step.

MetriPlane turns that replayed state into:

missing torque driver
→ physical event log
→ Cell Truth Report
→ incident evidence bundle
→ local bundle verification
→ generated regression test
→ regression PASS

The important part is not just detecting an event.

The important part is that the event becomes a verifiable artifact that can be reviewed, replayed, and reused as a software test.
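
As an illustration of the first step in the chain above, here is a minimal Python sketch of how a missing-tool interval could be derived from replayed state. The sample fields (`t`, `asset`, `present`), the 30-second threshold, and the function names are assumptions made for this example, not MetriPlane's actual schema or configuration. The timestamps are invented so that the gap comes out at 35 seconds, the same figure as the demo; they are not the demo data.

```python
from dataclasses import dataclass

# Hypothetical replay samples: time in seconds, asset id, and whether the
# asset was observed in its expected zone. Not MetriPlane's real schema.
SAMPLES = [
    {"t": 0.0, "asset": "torque_driver", "present": True},
    {"t": 10.0, "asset": "torque_driver", "present": False},
    {"t": 30.0, "asset": "torque_driver", "present": False},
    {"t": 45.0, "asset": "torque_driver", "present": True},
]


@dataclass
class MissingInterval:
    asset: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def missing_intervals(samples, asset):
    """Collapse per-sample presence flags into closed missing intervals.

    An interval that is still open when the samples end is dropped here.
    """
    intervals = []
    start = None
    relevant = sorted((s for s in samples if s["asset"] == asset), key=lambda s: s["t"])
    for s in relevant:
        if not s["present"] and start is None:
            start = s["t"]  # asset just went missing
        elif s["present"] and start is not None:
            intervals.append(MissingInterval(asset, start, s["t"]))
            start = None  # asset is back
    return intervals


def incidents(samples, asset, min_duration_s=30.0):
    """Keep only intervals long enough to count as an incident (assumed threshold)."""
    return [i for i in missing_intervals(samples, asset) if i.duration >= min_duration_s]


found = incidents(SAMPLES, "torque_driver")
for i in found:
    print(f"{i.asset} missing for {i.duration:.1f} s ({i.start:.1f} -> {i.end:.1f})")
```

With these invented samples the script reports one 35.0-second interval. Raising `min_duration_s` above 35 would produce no incident, which is the kind of boundary a domain pack would presumably have to define.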

What MetriPlane generates

MetriPlane v0.2.0 produces:

  • physical event logs
  • Cell Truth Reports
  • incident evidence bundles
  • manifest and checksums
  • local bundle verification
  • generated regression tests
  • static review/dashboard artifacts
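
To show what "manifest and checksums" plus "local bundle verification" can mean in practice, here is a small Python sketch. It assumes a zip bundle containing a `manifest.json` that maps file names to SHA-256 digests. That layout, the file names, and the functions `build_bundle` and `verify_bundle` are hypothetical; MetriPlane's real bundle format may differ, and real bundles should be checked with `metriplane atlas bundle verify`.

```python
import hashlib
import io
import json
import zipfile

MANIFEST_NAME = "manifest.json"  # hypothetical name, not MetriPlane's layout


def build_bundle(files: dict[str, bytes]) -> bytes:
    """Pack files plus a SHA-256 manifest into an in-memory zip."""
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
        zf.writestr(MANIFEST_NAME, json.dumps(manifest, sort_keys=True))
    return buf.getvalue()


def verify_bundle(bundle: bytes) -> list[str]:
    """Return a list of problems; an empty list means the bundle verifies."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
        manifest = json.loads(zf.read(MANIFEST_NAME))
        names = set(zf.namelist()) - {MANIFEST_NAME}
        for name in sorted(set(manifest) - names):
            problems.append(f"missing file: {name}")
        for name in sorted(names - set(manifest)):
            problems.append(f"unlisted file: {name}")
        for name in sorted(names & set(manifest)):
            # Recompute the digest from the stored bytes and compare.
            if hashlib.sha256(zf.read(name)).hexdigest() != manifest[name]:
                problems.append(f"checksum mismatch: {name}")
    return problems


good = build_bundle({"events.jsonl": b'{"t": 10.0}\n', "report.md": b"# Cell Truth Report\n"})
print("pass =", verify_bundle(good) == [])
```

Returning a list of problems rather than a bare boolean keeps the check reviewable: a failed verification says which file is missing, unlisted, or altered.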

The demo evidence chain is:

6 physical events
1 incident
35.0 second missing-tool delay
INC-0001 evidence bundle
bundle verify: pass=true
generated regression test: PASS

This is the core idea I am trying to validate:

replayed workcell state
→ physical event
→ evidence bundle
→ verification
→ generated regression test
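
The last arrow in that chain, turning an incident into a regression test, can be sketched in Python as follows. The incident record, the expectation format, the tolerance, and both function names are assumptions for illustration only; they are not the YAML format that `metriplane atlas test` consumes.

```python
import json

# Hypothetical incident record; field names are illustrative only.
INCIDENT = {"id": "INC-0001", "kind": "missing_tool", "asset": "torque_driver", "delay_s": 35.0}


def make_regression_case(incident, tolerance_s=0.5):
    """Freeze an observed incident into an expectation that a later replay must meet."""
    return {
        "name": f"{incident['kind']}_{incident['id']}",
        "expect": {
            "kind": incident["kind"],
            "asset": incident["asset"],
            "delay_s": incident["delay_s"],
            "tolerance_s": tolerance_s,
        },
    }


def run_regression_case(case, replayed_incidents):
    """Pass if some replayed incident matches kind, asset, and delay within tolerance."""
    exp = case["expect"]
    for inc in replayed_incidents:
        if (
            inc["kind"] == exp["kind"]
            and inc["asset"] == exp["asset"]
            and abs(inc["delay_s"] - exp["delay_s"]) <= exp["tolerance_s"]
        ):
            return True
    return False


case = make_regression_case(INCIDENT)
print(json.dumps(case, indent=2))
print("PASS" if run_regression_case(case, [INCIDENT]) else "FAIL")
```

The point of the sketch is the direction of the data flow: the expectation is generated from the recorded incident rather than written by hand, so a later change that stops detecting the same incident on the same replay shows up as a failing test.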

Camera-free reproduction

I am preparing a SoftwareX research-software paper while finishing my MSc thesis, and I am looking for external technical reproduction feedback.

Public reproduction issue:

https://github.com/Miko997/metriplane/issues/6

A useful reproduction comment would include:

OS:
Python version:

doctor: pass/fail
deterministic replay: pass/fail
Atlas run: pass/fail
bundle verify: pass/fail
regression test: pass/fail

Technical relevance:
2–5 sentences

Main limitation:
1–2 sentences
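
If it helps, the first two lines of that template can be filled in from Python's standard library. This is only a convenience sketch; `environment_lines` is a name made up for this example and is not part of MetriPlane.

```python
import platform


def environment_lines():
    """Return the OS and Python lines for a reproduction comment."""
    return [
        f"OS: {platform.platform()}",
        f"Python version: {platform.python_version()}",
    ]


print("\n".join(environment_lines()))
```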

Short feedback form

A full reproduction is ideal, but I realized it may be too time-consuming for many people.

If you only have 2–4 minutes, this short technical feedback form is also useful:

https://docs.google.com/forms/d/e/1FAIpQLSfnMZ4b3fSVVtwA89hZt3A09gf85eLfhW00FDD76TGRLNpirQ/viewform

The main questions are:

  1. Does the evidence-bundle / regression-test loop make sense?
  2. Is it relevant to robotics, simulation, digital twins, manufacturing, physical AI, or research software?
  3. What is the main limitation or next validation step?

Critical feedback is more useful than praise.

Main reproduction commands

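# Get the tagged release
git clone https://github.com/Miko997/metriplane.git
cd metriplane
git checkout v0.2.0

# Create an isolated environment and install in editable mode
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

# Environment self-check
python -m metriplane.cli doctor

# Unit tests, with third-party pytest plugin autoloading disabled
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python -m pytest -q

# Deterministic replay check
./tools/mp.sh deterministic-replay

# Validate the assembly-cell domain pack
metriplane atlas validate-pack configs/domain_packs/assembly_cell

# Run Atlas on the missing-tool session
metriplane atlas run \
  --session-jsonl datasets/demo/atlas/assembly_cell_missing_tool.jsonl \
  --pack configs/domain_packs/assembly_cell \
  --out runs/atlas/assembly_cell_missing_tool

# Verify the generated evidence bundle
metriplane atlas bundle verify \
  runs/atlas/assembly_cell_missing_tool/evidence_bundles/INC-0001.zip

# Run the generated regression test
metriplane atlas test \
  runs/atlas/assembly_cell_missing_tool/regression_tests/INC-0001.yaml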

Expected high-level result:

doctor: pass
pytest: 580 passed
deterministic replay: pass=true, 0.0 cm position difference, 0 event mismatches
Atlas run: events=6, incidents=1
bundle verify: pass=true
regression test: PASS missing_tool_caused_delay_INC-0001
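
The deterministic-replay line reports a position difference and an event-mismatch count. As a rough Python sketch of what comparing two replays of the same session could look like, assume each run yields planar positions in centimetres and an ordered list of event names. The data structures, event names, and functions below are hypothetical and do not describe what `./tools/mp.sh deterministic-replay` does internally.

```python
import math

# Hypothetical outputs of two replays of the same session: per-frame planar
# positions in centimetres and the ordered list of emitted event names.
RUN_A = {
    "positions": [("torque_driver", 12.0, 30.0), ("torque_driver", 12.0, 30.5)],
    "events": ["tool_present", "tool_missing", "tool_returned"],
}
RUN_B = {
    "positions": [("torque_driver", 12.0, 30.0), ("torque_driver", 12.0, 30.5)],
    "events": ["tool_present", "tool_missing", "tool_returned"],
}


def max_position_diff_cm(a, b):
    """Largest Euclidean distance between corresponding position samples."""
    if len(a) != len(b):
        raise ValueError("runs have different numbers of position samples")
    return max((math.dist(p[1:], q[1:]) for p, q in zip(a, b)), default=0.0)


def event_mismatches(a, b):
    """Count positions where the event sequences differ, including length gaps."""
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing + abs(len(a) - len(b))


def compare_runs(a, b):
    """Summarise whether two replays agree exactly."""
    diff = max_position_diff_cm(a["positions"], b["positions"])
    mismatches = event_mismatches(a["events"], b["events"])
    return {
        "pass": diff == 0.0 and mismatches == 0,
        "position_diff_cm": diff,
        "event_mismatches": mismatches,
    }


print(compare_runs(RUN_A, RUN_B))
```

Requiring an exact 0.0 difference only makes sense for a fully deterministic replay, as in the expected result above; a comparison against live camera data would need a tolerance instead.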

Scope

MetriPlane v0.2.0 is intentionally bounded.

It is:

  • observe-only
  • local-first
  • camera/replay-oriented
  • planar/tagged-asset scoped
  • research-software oriented

It does not claim:

  • robot or machine control
  • safety certification
  • quality-release approval
  • people recognition
  • marker-free tracking
  • full 3D reconstruction
  • production-factory validation
  • factory-wide deployment readiness

Feedback wanted

I am especially interested in:

  1. Does the camera-free reproduction path work on another machine?
  2. Is the incident → evidence bundle → regression test loop useful?
  3. Are the observe-only boundaries clear enough?
  4. What should be validated next before this is useful beyond a deterministic demo fixture?
  5. Would this be useful around robotics, simulation, digital-twin, or manufacturing review workflows?

A full reproduction comment on GitHub Issue #6 is the strongest feedback.

The short form is also useful if you only have a few minutes:

https://docs.google.com/forms/d/e/1FAIpQLSfnMZ4b3fSVVtwA89hZt3A09gf85eLfhW00FDD76TGRLNpirQ/viewform

Critical feedback is preferred.

Top comments (2)

Armorer Labs

The evidence-bundle framing is very strong. A lot of systems stop at "we detected an event," but the useful artifact is the package that lets someone understand, replay, and prevent the class of failure.

I am seeing the same pattern with AI-agent operations: logs are useful, but the durable thing is a receipt/evidence bundle with the action, context, decision, result, and follow-up test. Different domain, same need for inspectable ground truth.

Miko

Thanks! This is exactly the distinction I’m trying to validate.

Detection alone is usually too ephemeral: “something happened” is useful, but it does not give another person enough context to inspect, reproduce, or convert the event into a prevention/checking mechanism. The evidence bundle is meant to be the durable unit: incident + timeline + context + report + checksums + replay command + generated regression material.

I like your AI-agent operations analogy. Different physical/digital boundary, but the same basic problem appears: logs are raw material, while the useful artifact is closer to an inspectable receipt that captures action, context, result, and follow-up test.

For MetriPlane v0.2.0 I’m keeping the claim intentionally narrow: observe-only, bounded workcell replay, and evidence suitable for review/regression rather than control or safety certification. But the broader pattern seems general: important events should become reviewable artifacts, not just entries in a log stream.

I’d be interested in your view on one thing: what fields would you consider essential in a minimal evidence bundle for AI-agent or robotics operations?