Miko

Posted on Jun 21 • Edited on Jun 24

I built an open-source tool that turns workcell incidents into evidence bundles and regression tests

#opensource #python #robotics #testing

I built MetriPlane v0.2.0, an open-source physical-observability tool for bounded workcells.

The short version:

A missing tool in a workcell becomes a physical event log, a Cell Truth Report, a verified evidence bundle, and a generated regression test.

3-minute demo:

Official MetriPlane site:

https://www.metriplane.com/

Repository:

https://github.com/Miko997/metriplane

Zenodo DOI:

https://doi.org/10.5281/zenodo.20736619

GitHub release:

https://github.com/Miko997/metriplane/releases/tag/v0.2.0

The problem

Robotics and manufacturing systems often have cameras, logs, dashboards, simulators, and internal event streams.

But when something physically goes wrong, there is still a hard question:

What actually happened?
What proves it?
Can we replay it?
Can we turn the incident into a repeatable software check?

MetriPlane focuses on that narrow layer: replayable physical evidence for bounded workcells.

It is not trying to be a robot controller, safety system, MES, or full digital-twin platform. It is an observe-only evidence layer around replayed or calibrated physical state.

What the v0.2.0 demo shows

The demo uses a camera-free assembly-cell replay where a required torque driver is missing long enough to delay a process step.

MetriPlane turns that replayed state into:

missing torque driver
→ physical event log
→ Cell Truth Report
→ incident evidence bundle
→ local bundle verification
→ generated regression test
→ regression PASS
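Here is a minimal Python sketch of the first hop in that chain, turning replayed state into physical events and an incident. It is not MetriPlane's implementation. The `Sample`/`Event` types, the `tool_removed`/`tool_returned` labels, and the 30-second threshold are hypothetical. The sketch also measures only how long the tool is absent; it does not model the delayed process step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    t: float            # seconds since session start
    tool_present: bool  # hypothetical field: is the tool inside its home zone?


@dataclass(frozen=True)
class Event:
    t: float
    kind: str           # hypothetical labels: "tool_removed" / "tool_returned"


def detect_events(samples: list[Sample]) -> list[Event]:
    """Emit an event whenever the tool's presence flag changes.

    The first sample only sets the baseline and never produces an event.
    """
    events: list[Event] = []
    prev = None
    for s in samples:
        if prev is not None and s.tool_present != prev:
            kind = "tool_returned" if s.tool_present else "tool_removed"
            events.append(Event(s.t, kind))
        prev = s.tool_present
    return events


def missing_intervals(events: list[Event], end_t: float) -> list[tuple[float, float]]:
    """Pair removal/return events into (start, duration) intervals."""
    intervals: list[tuple[float, float]] = []
    start = None
    for e in events:
        if e.kind == "tool_removed":
            start = e.t
        elif e.kind == "tool_returned" and start is not None:
            intervals.append((start, e.t - start))
            start = None
    if start is not None:  # tool still missing when the replay ends
        intervals.append((start, end_t - start))
    return intervals


def incidents(intervals: list[tuple[float, float]], min_missing_s: float = 30.0) -> list[dict]:
    """Promote intervals that last at least min_missing_s to numbered incidents."""
    long_enough = [iv for iv in intervals if iv[1] >= min_missing_s]
    return [
        {"id": f"INC-{i:04d}", "start_s": start, "missing_s": dur}
        for i, (start, dur) in enumerate(long_enough, start=1)
    ]
```

Consider a replay where the driver leaves its zone at t = 10 s and returns at t = 45 s. For that replay, `incidents(missing_intervals(detect_events(samples), end_t=60.0))` yields a single `INC-0001` with `missing_s == 35.0`. It matches the demo's 35.0-second figure only because the sample times were chosen that way.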

The important part is not just detecting an event.

The important part is that the event becomes a verifiable artifact that can be reviewed, replayed, and reused as a software test.

What MetriPlane generates

MetriPlane v0.2.0 produces:

physical event logs
Cell Truth Reports
incident evidence bundles
manifest and checksums
local bundle verification
generated regression tests
static review/dashboard artifacts
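The post does not document MetriPlane's bundle layout or manifest format. The following is only a standard-library sketch of the general manifest-and-checksum idea: the zip stores a `manifest.json` holding each file's SHA-256 digest, and verification recomputes those digests. The file names, the manifest shape, and the `build_bundle`/`verify_bundle` helpers are assumptions for illustration.

```python
import hashlib
import io
import json
import zipfile


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_bundle(files: dict[str, bytes]) -> bytes:
    """Write files plus a manifest.json of SHA-256 checksums into an in-memory zip."""
    manifest = {name: sha256(data) for name, data in sorted(files.items())}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2, sort_keys=True))
    return buf.getvalue()


def verify_bundle(bundle: bytes) -> tuple[bool, list[str]]:
    """Recompute checksums and report missing, altered, or unlisted files."""
    problems: list[str] = []
    with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        names = set(zf.namelist()) - {"manifest.json"}
        for name, expected in manifest.items():
            if name not in names:
                problems.append(f"missing: {name}")
            elif sha256(zf.read(name)) != expected:
                problems.append(f"checksum mismatch: {name}")
        for name in sorted(names - set(manifest)):
            problems.append(f"not in manifest: {name}")
    return (not problems, problems)
```

A checksum manifest stored inside the bundle detects corruption and casual edits. However, anyone who can rewrite the files can also rewrite the manifest. Protecting against a deliberate editor would require keeping the manifest digest somewhere else or signing it.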

The demo evidence chain is:

6 physical events
1 incident
35.0 second missing-tool delay
INC-0001 evidence bundle
bundle verify: pass=true
generated regression test: PASS

This is the core idea I am trying to validate:

replayed workcell state
→ physical event
→ evidence bundle
→ verification
→ generated regression test
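The final hop, from evidence to generated regression test, can be sketched as two steps. First, freeze the incident's key facts into a spec. Then re-check a fresh replay against that spec. The real artifact is a YAML file (`INC-0001.yaml`) whose schema the post does not show. The dict fields, the 1-second tolerance, and the two helper functions here are hypothetical. Only the test-name format comes from the post, where it appears in the expected output of the reproduction commands.

```python
def make_regression_spec(incident: dict, tolerance_s: float = 1.0) -> dict:
    """Freeze an incident's key facts into a hypothetical regression spec."""
    return {
        "name": f"missing_tool_caused_delay_{incident['id']}",
        "expect_incident_count": 1,  # the spec is generated from exactly one incident
        "expect_missing_s": incident["missing_s"],
        "tolerance_s": tolerance_s,
    }


def run_regression(spec: dict, observed_incidents: list[dict]) -> tuple[bool, str]:
    """Compare the incidents from a fresh replay against the frozen spec."""
    name = spec["name"]
    expected = spec["expect_incident_count"]
    if len(observed_incidents) != expected:
        return False, f"FAIL {name}: expected {expected} incident(s), got {len(observed_incidents)}"
    delta = abs(observed_incidents[0]["missing_s"] - spec["expect_missing_s"])
    if delta > spec["tolerance_s"]:
        return False, f"FAIL {name}: missing time off by {delta:.1f} s"
    return True, f"PASS {name}"
```

Using a tolerance instead of exact equality accommodates replays that might jitter. For a fully deterministic fixture, `tolerance_s=0.0` makes the check strict.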

Camera-free reproduction

I am preparing a SoftwareX research-software paper while finishing my MSc thesis, and I am looking for external technical reproduction feedback.

Public reproduction issue:

https://github.com/Miko997/metriplane/issues/6

A useful reproduction comment would include:

OS:
Python version:

doctor: pass/fail
deterministic replay: pass/fail
Atlas run: pass/fail
bundle verify: pass/fail
regression test: pass/fail

Technical relevance:
2–5 sentences

Main limitation:
1–2 sentences
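A small helper can fill in the OS and Python version fields automatically and print the template. The step names come from the template above. The `reproduction_comment` function and the extra `not run` status for skipped steps are assumptions for illustration and are not part of MetriPlane.

```python
import platform

# Step names copied from the reproduction template.
TEMPLATE_STEPS = ["doctor", "deterministic replay", "Atlas run", "bundle verify", "regression test"]


def reproduction_comment(results: dict[str, bool], relevance: str, limitation: str) -> str:
    """Fill the reproduction template; steps without a result are marked 'not run'."""
    lines = [
        f"OS: {platform.system()} {platform.release()}",
        f"Python version: {platform.python_version()}",
        "",
    ]
    for step in TEMPLATE_STEPS:
        status = results.get(step)
        label = "not run" if status is None else ("pass" if status else "fail")
        lines.append(f"{step}: {label}")
    lines += ["", "Technical relevance:", relevance, "", "Main limitation:", limitation]
    return "\n".join(lines)
```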

Short feedback form

A full reproduction is ideal, but I realized it may be too time-consuming for many people.

If you only have 2–4 minutes, this short technical feedback form is also useful:

https://docs.google.com/forms/d/e/1FAIpQLSfnMZ4b3fSVVtwA89hZt3A09gf85eLfhW00FDD76TGRLNpirQ/viewform

The main questions are:

Does the evidence-bundle / regression-test loop make sense?
Is it relevant to robotics, simulation, digital twins, manufacturing, physical AI, or research software?
What is the main limitation or next validation step?

Critical feedback is more useful than praise.

Main reproduction commands

git clone https://github.com/Miko997/metriplane.git
cd metriplane
git checkout v0.2.0

python3 -m venv .venv
source .venv/bin/activate
pip install -e .

python -m metriplane.cli doctor

PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python -m pytest -q

./tools/mp.sh deterministic-replay

metriplane atlas validate-pack configs/domain_packs/assembly_cell

metriplane atlas run \
  --session-jsonl datasets/demo/atlas/assembly_cell_missing_tool.jsonl \
  --pack configs/domain_packs/assembly_cell \
  --out runs/atlas/assembly_cell_missing_tool

metriplane atlas bundle verify \
  runs/atlas/assembly_cell_missing_tool/evidence_bundles/INC-0001.zip

metriplane atlas test \
  runs/atlas/assembly_cell_missing_tool/regression_tests/INC-0001.yaml
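The commands above could also be wrapped in a small runner to collect the pass/fail lines for a reproduction comment. This sketch assumes each command exits with a non-zero status on failure. The post does not state this, so the printed output of each command remains the authoritative check. The sketch also assumes it runs from the repository root inside the activated `.venv`. The `run_steps` helper and the `REPRO_STEPS` mapping are hypothetical.

```python
import subprocess
import sys


def run_steps(steps: list[tuple[str, list[str]]], timeout_s: float = 600.0) -> dict[str, bool]:
    """Run each (label, argv) in order; a step passes if its exit code is 0."""
    results: dict[str, bool] = {}
    for label, argv in steps:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
            results[label] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            results[label] = False  # a missing executable or a hung step counts as a failure
    return results


# Hypothetical mapping of the reproduction commands; run from the repository root.
RUN_DIR = "runs/atlas/assembly_cell_missing_tool"
REPRO_STEPS = [
    ("doctor", [sys.executable, "-m", "metriplane.cli", "doctor"]),
    ("deterministic replay", ["./tools/mp.sh", "deterministic-replay"]),
    ("Atlas run", [
        "metriplane", "atlas", "run",
        "--session-jsonl", "datasets/demo/atlas/assembly_cell_missing_tool.jsonl",
        "--pack", "configs/domain_packs/assembly_cell",
        "--out", RUN_DIR,
    ]),
    ("bundle verify", ["metriplane", "atlas", "bundle", "verify",
                       f"{RUN_DIR}/evidence_bundles/INC-0001.zip"]),
    ("regression test", ["metriplane", "atlas", "test",
                         f"{RUN_DIR}/regression_tests/INC-0001.yaml"]),
]
```

`run_steps(REPRO_STEPS)` returns a dict keyed by the template's step names. A missing executable or a step that exceeds `timeout_s` is recorded as a failure rather than raising an exception.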

Expected high-level result:

doctor: pass
pytest: 580 passed
deterministic replay: pass=true, 0.0 cm position difference, 0 event mismatches
Atlas run: events=6, incidents=1
bundle verify: pass=true
regression test: PASS missing_tool_caused_delay_INC-0001
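The deterministic-replay line reports a position difference in centimetres and a count of event mismatches. One way such a check could work is to replay the same session twice and compare the two traces. The sketch below assumes a hypothetical trace format: planar positions keyed by timestamp and asset, plus an ordered event list. It is not necessarily how `./tools/mp.sh deterministic-replay` is implemented.

```python
import math
from itertools import zip_longest


def compare_replays(run_a: dict, run_b: dict, tol_cm: float = 0.0) -> dict:
    """Compare two replays of the same session: max planar drift and event mismatches.

    Hypothetical format: run["positions"] is a list of {"t", "asset", "x_cm", "y_cm"}
    dicts and run["events"] is an ordered list of comparable event records.
    """
    pos_a = {(p["t"], p["asset"]): (p["x_cm"], p["y_cm"]) for p in run_a["positions"]}
    pos_b = {(p["t"], p["asset"]): (p["x_cm"], p["y_cm"]) for p in run_b["positions"]}
    shared = pos_a.keys() & pos_b.keys()
    # Largest Euclidean distance between matching samples in the two runs.
    max_diff_cm = max((math.dist(pos_a[k], pos_b[k]) for k in shared), default=0.0)
    # Samples present in only one run also count against determinism.
    unmatched = len(pos_a.keys() ^ pos_b.keys())
    # Position-by-position event comparison; a missing event pairs with None.
    mismatches = sum(1 for ea, eb in zip_longest(run_a["events"], run_b["events"]) if ea != eb)
    return {
        "pass": max_diff_cm <= tol_cm and unmatched == 0 and mismatches == 0,
        "max_position_diff_cm": max_diff_cm,
        "event_mismatches": mismatches,
        "unmatched_samples": unmatched,
    }
```

Matching on exact timestamps assumes both runs sample at identical times, which a deterministic replay should guarantee. A live or jittery source would need a time-alignment step first.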

Scope

MetriPlane v0.2.0 is intentionally bounded.

It is:

observe-only
local-first
camera/replay-oriented
planar/tagged-asset scoped
research-software oriented

It does not claim:

robot or machine control
safety certification
quality-release approval
people recognition
marker-free tracking
full 3D reconstruction
production-factory validation
factory-wide deployment readiness

Feedback wanted

I am especially interested in:

Does the camera-free reproduction path work on another machine?
Is the incident → evidence bundle → regression test loop useful?
Are the observe-only boundaries clear enough?
What should be validated next before this is useful beyond a deterministic demo fixture?
Would this be useful around robotics, simulation, digital-twin, or manufacturing review workflows?

A full reproduction comment on GitHub Issue #6 is the strongest feedback.

The short form is also useful if you only have a few minutes:

https://docs.google.com/forms/d/e/1FAIpQLSfnMZ4b3fSVVtwA89hZt3A09gf85eLfhW00FDD76TGRLNpirQ/viewform

Critical feedback is preferred.

Top comments (4)

Armorer Labs • Jun 21

The evidence-bundle framing is very strong. A lot of systems stop at "we detected an event," but the useful artifact is the package that lets someone understand, replay, and prevent the class of failure.

I am seeing the same pattern with AI-agent operations: logs are useful, but the durable thing is a receipt/evidence bundle with the action, context, decision, result, and follow-up test. Different domain, same need for inspectable ground truth.

Miko • Jun 21

Thanks! This is exactly the distinction I’m trying to validate.

Detection alone is usually too ephemeral: “something happened” is useful, but it does not give another person enough context to inspect, reproduce, or convert the event into a prevention/checking mechanism. The evidence bundle is meant to be the durable unit: incident + timeline + context + report + checksums + replay command + generated regression material.

I like your AI-agent operations analogy. Different physical/digital boundary, but the same basic problem appears: logs are raw material, while the useful artifact is closer to an inspectable receipt that captures action, context, result, and follow-up test.

For MetriPlane v0.2.0 I’m keeping the claim intentionally narrow: observe-only, bounded workcell replay, and evidence suitable for review/regression rather than control or safety certification. But the broader pattern seems general: important events should become reviewable artifacts, not just entries in a log stream.

I’d be interested in your view on one thing: what fields would you consider essential in a minimal evidence bundle for AI-agent or robotics operations?

Armorer Labs • Jun 22

That validation target feels right. The useful test for an evidence bundle is whether someone who was not in the incident can replay the chain: trigger, affected artifact, decision boundary, attempted fix, and regression case. If any one of those is missing, it becomes a note instead of an operational asset.

Miko • Jun 22

Thanks! This is really useful and exactly the kind of distinction I’m trying to capture.

I’m collecting short external technical feedback in one place for the SoftwareX paper / MSc thesis notes. If you have 2–4 minutes, would you be open to putting the core of your point into this form as well?

docs.google.com/forms/d/e/1FAIpQLS...

Your public comment already helps, so no pressure. The form just makes it easier for me to organize feedback around relevance, limitations, and what a minimal evidence bundle should contain.