I built MetriPlane v0.2.0, an open-source physical-observability tool for bounded workcells.
The short version:
A missing tool in a workcell becomes a physical event log, a Cell Truth Report, a verified evidence bundle, and a generated regression test.
3-minute demo:
Repository:
https://github.com/Miko997/metriplane
Zenodo DOI:
https://doi.org/10.5281/zenodo.20736619
GitHub release:
https://github.com/Miko997/metriplane/releases/tag/v0.2.0
The problem
Robotics and manufacturing systems often have cameras, logs, dashboards, simulators, and internal event streams.
But when something physically goes wrong, hard questions remain:
What actually happened?
What proves it?
Can we replay it?
Can we turn the incident into a repeatable software check?
MetriPlane focuses on that narrow layer: replayable physical evidence for bounded workcells.
It is not trying to be a robot controller, safety system, MES, or full digital-twin platform. It is an observe-only evidence layer around replayed or calibrated physical state.
What the v0.2.0 demo shows
The demo uses a camera-free assembly-cell replay where a required torque driver is missing long enough to delay a process step.
MetriPlane turns that replayed state into:
missing torque driver
→ physical event log
→ Cell Truth Report
→ incident evidence bundle
→ local bundle verification
→ generated regression test
→ regression PASS
Detecting the event is only the first step.
What matters is that the event becomes a verifiable artifact that can be reviewed, replayed, and reused as a software test.
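To make the first link in that chain concrete, here is a minimal sketch of how a "tool missing long enough to matter" condition could be derived from replayed state. The sample schema (timestamp plus a presence flag), the `MissingInterval` and `find_missing_intervals` names, and the 30-second threshold are assumptions for illustration only; they are not MetriPlane's actual data model or API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MissingInterval:
    """A span of time during which the required tool was not observed."""
    start_s: float
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def find_missing_intervals(
    samples: Iterable[Tuple[float, bool]],
    min_duration_s: float,
) -> List[MissingInterval]:
    """Return intervals where the tool was missing for at least min_duration_s.

    `samples` is a time-ordered sequence of (timestamp_s, tool_present) pairs.
    An interval that is still open when the samples end is not reported.
    """
    intervals: List[MissingInterval] = []
    missing_since = None
    for t, present in samples:
        if not present and missing_since is None:
            missing_since = t  # tool just disappeared
        elif present and missing_since is not None:
            if t - missing_since >= min_duration_s:
                intervals.append(MissingInterval(missing_since, t))
            missing_since = None  # tool is back
    return intervals


# Hypothetical replay: the tool disappears at t=10 s and returns at t=45 s.
replay = [(0.0, True), (10.0, False), (20.0, False),
          (45.0, True), (50.0, False), (52.0, True)]
incidents = find_missing_intervals(replay, min_duration_s=30.0)
print([i.duration_s for i in incidents])  # [35.0]
```

The short 2-second gap at the end is ignored because it falls below the threshold, which is the kind of filtering that separates a momentary dropout from a process-relevant delay.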
What MetriPlane generates
MetriPlane v0.2.0 produces:
- physical event logs
- Cell Truth Reports
- incident evidence bundles
- manifest and checksums
- local bundle verification
- generated regression tests
- static review/dashboard artifacts
The demo evidence chain is:
6 physical events
1 incident
35.0-second missing-tool delay
INC-0001 evidence bundle
bundle verify: pass=true
generated regression test: PASS
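For readers unfamiliar with manifest-and-checksum verification, the sketch below shows the general technique: recompute a SHA-256 digest for every file listed in a manifest and compare it with the recorded value. The manifest name `manifest.json`, its `files` / `path` / `sha256` layout, and the `verify_bundle` function are hypothetical; the real bundle format is defined in the repository and may differ.

```python
import hashlib
import json
import zipfile


def verify_bundle(bundle_path: str, manifest_name: str = "manifest.json") -> bool:
    """Return True if every file listed in the manifest matches its SHA-256.

    Assumed (hypothetical) manifest layout:
        {"files": [{"path": "events.jsonl", "sha256": "<hex digest>"}, ...]}
    """
    with zipfile.ZipFile(bundle_path) as bundle:
        manifest = json.loads(bundle.read(manifest_name))
        names = set(bundle.namelist())
        for entry in manifest["files"]:
            if entry["path"] not in names:
                return False  # a listed file is missing from the bundle
            digest = hashlib.sha256(bundle.read(entry["path"])).hexdigest()
            if digest != entry["sha256"]:
                return False  # content changed after the manifest was written
    return True
```

A check like this only shows that the bundle is internally consistent with its manifest. It does not by itself show who produced the bundle, which would need a signature or an externally recorded digest.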
This is the core idea I am trying to validate:
replayed workcell state
→ physical event
→ evidence bundle
→ verification
→ generated regression test
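The last step of that loop, turning an incident into a repeatable check, can be sketched as follows. This is an illustrative assumption about how such a loop could work, not the format of MetriPlane's generated YAML tests: the incident fields, the spec keys, the 0.5-second tolerance, and both function names are hypothetical.

```python
from typing import Dict, List


def make_regression_spec(incident: Dict, tolerance_s: float = 0.5) -> Dict:
    """Derive a small, declarative expectation from a recorded incident."""
    return {
        "name": f"{incident['kind']}_{incident['id']}",
        "expect_event_type": incident["kind"],
        "expect_delay_s": incident["delay_s"],
        "tolerance_s": tolerance_s,
    }


def run_regression(spec: Dict, replayed_events: List[Dict]) -> bool:
    """Pass if a replay reproduces the expected event within tolerance."""
    for event in replayed_events:
        if event["type"] != spec["expect_event_type"]:
            continue
        if abs(event["delay_s"] - spec["expect_delay_s"]) <= spec["tolerance_s"]:
            return True
    return False


# Hypothetical incident record mirroring the demo numbers.
incident = {"id": "INC-0001", "kind": "missing_tool_caused_delay", "delay_s": 35.0}
spec = make_regression_spec(incident)

replay = [
    {"type": "tool_returned", "delay_s": 0.0},
    {"type": "missing_tool_caused_delay", "delay_s": 35.0},
]
print(spec["name"], "PASS" if run_regression(spec, replay) else "FAIL")
```

The design point is that the spec is data, not code: it can be stored next to the evidence bundle and re-run whenever the detection logic or the replay changes.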
Camera-free reproduction
I am preparing a SoftwareX research-software paper while finishing my MSc thesis, and I am looking for external technical reproduction feedback.
Public reproduction issue:
https://github.com/Miko997/metriplane/issues/6
A useful reproduction comment would include:
OS:
Python version:
doctor: pass/fail
deterministic replay: pass/fail
Atlas run: pass/fail
bundle verify: pass/fail
regression test: pass/fail
Technical relevance:
2–5 sentences
Main limitation:
1–2 sentences
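If it helps, the environment lines of that template can be filled in automatically. The helper below is a convenience sketch and not part of MetriPlane; the `reproduction_comment` name and the "not run" placeholder are assumptions. It relies only on Python's standard `platform` module.

```python
import platform
from typing import Dict

# Check names copied from the comment template above.
FIELDS = ["doctor", "deterministic replay", "Atlas run",
          "bundle verify", "regression test"]


def reproduction_comment(results: Dict[str, bool]) -> str:
    """Render the reproduction template with OS and Python version filled in."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python version: {platform.python_version()}",
    ]
    for field in FIELDS:
        if field in results:
            status = "pass" if results[field] else "fail"
        else:
            status = "not run"  # leave skipped steps visible
        lines.append(f"{field}: {status}")
    # The free-text sections are left empty for the reviewer to write.
    lines += ["Technical relevance:", "", "Main limitation:", ""]
    return "\n".join(lines)


print(reproduction_comment({"doctor": True, "bundle verify": True}))
```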
Short feedback form
A full reproduction is ideal, but I realized it may be too time-consuming for many people.
If you only have 2–4 minutes, this short technical feedback form is also useful:
https://docs.google.com/forms/d/e/1FAIpQLSfnMZ4b3fSVVtwA89hZt3A09gf85eLfhW00FDD76TGRLNpirQ/viewform
The main questions are:
- Does the evidence-bundle / regression-test loop make sense?
- Is it relevant to robotics, simulation, digital twins, manufacturing, physical AI, or research software?
- What is the main limitation or next validation step?
Critical feedback is more useful than praise.
Main reproduction commands
git clone https://github.com/Miko997/metriplane.git
cd metriplane
git checkout v0.2.0
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
python -m metriplane.cli doctor
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python -m pytest -q
./tools/mp.sh deterministic-replay
metriplane atlas validate-pack configs/domain_packs/assembly_cell
metriplane atlas run \
  --session-jsonl datasets/demo/atlas/assembly_cell_missing_tool.jsonl \
  --pack configs/domain_packs/assembly_cell \
  --out runs/atlas/assembly_cell_missing_tool
metriplane atlas bundle verify \
  runs/atlas/assembly_cell_missing_tool/evidence_bundles/INC-0001.zip
metriplane atlas test \
  runs/atlas/assembly_cell_missing_tool/regression_tests/INC-0001.yaml
Expected high-level result:
doctor: pass
pytest: 580 passed
deterministic replay: pass=true, 0.0 cm position difference, 0 event mismatches
Atlas run: events=6, incidents=1
bundle verify: pass=true
regression test: PASS missing_tool_caused_delay_INC-0001
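The deterministic-replay line reports a position difference and an event-mismatch count. A comparison of that kind could look like the sketch below, which checks two replay runs event by event. The event schema (`type`, `x_cm`, `y_cm`), the result keys, and the `compare_replays` function are assumptions for illustration; the actual check lives in `./tools/mp.sh deterministic-replay` and may work differently.

```python
import math
from typing import Dict, List


def compare_replays(run_a: List[Dict], run_b: List[Dict]) -> Dict:
    """Compare two replay runs by event order, type, and planar position."""
    # Events present in only one run count as mismatches.
    mismatches = abs(len(run_a) - len(run_b))
    max_diff_cm = 0.0
    for a, b in zip(run_a, run_b):
        if a["type"] != b["type"]:
            mismatches += 1
            continue
        diff = math.hypot(a["x_cm"] - b["x_cm"], a["y_cm"] - b["y_cm"])
        max_diff_cm = max(max_diff_cm, diff)
    return {
        "pass": mismatches == 0 and max_diff_cm == 0.0,
        "max_position_diff_cm": max_diff_cm,
        "event_mismatches": mismatches,
    }


# Two hypothetical runs of the same session should be identical.
run_1 = [{"type": "tool_missing", "x_cm": 12.0, "y_cm": 30.5}]
run_2 = [{"type": "tool_missing", "x_cm": 12.0, "y_cm": 30.5}]
print(compare_replays(run_1, run_2))
```

Requiring an exact 0.0 difference is only reasonable for a fully deterministic replay. A check against live sensor data would need an explicit tolerance instead.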
Scope
MetriPlane v0.2.0 is intentionally bounded.
It is:
- observe-only
- local-first
- camera/replay-oriented
- planar/tagged-asset scoped
- research-software oriented
It does not claim:
- robot or machine control
- safety certification
- quality-release approval
- people recognition
- marker-free tracking
- full 3D reconstruction
- production-factory validation
- factory-wide deployment readiness
Feedback wanted
I am especially interested in:
- Does the camera-free reproduction path work on another machine?
- Is the incident → evidence bundle → regression test loop useful?
- Are the observe-only boundaries clear enough?
- What should be validated next before this is useful beyond a deterministic demo fixture?
- Would this be useful around robotics, simulation, digital-twin, or manufacturing review workflows?
A full reproduction comment on GitHub Issue #6 is the strongest feedback.
The short form is also useful if you only have a few minutes:
https://docs.google.com/forms/d/e/1FAIpQLSfnMZ4b3fSVVtwA89hZt3A09gf85eLfhW00FDD76TGRLNpirQ/viewform
Critical feedback is preferred.
Top comments (2)
The evidence-bundle framing is very strong. A lot of systems stop at "we detected an event," but the useful artifact is the package that lets someone understand, replay, and prevent the class of failure.
I am seeing the same pattern with AI-agent operations: logs are useful, but the durable thing is a receipt/evidence bundle with the action, context, decision, result, and follow-up test. Different domain, same need for inspectable ground truth.
Thanks! This is exactly the distinction I’m trying to validate.
Detection alone is usually too ephemeral: “something happened” is useful, but it does not give another person enough context to inspect, reproduce, or convert the event into a prevention/checking mechanism. The evidence bundle is meant to be the durable unit: incident + timeline + context + report + checksums + replay command + generated regression material.
I like your AI-agent operations analogy. Different physical/digital boundary, but the same basic problem appears: logs are raw material, while the useful artifact is closer to an inspectable receipt that captures action, context, result, and follow-up test.
For MetriPlane v0.2.0 I’m keeping the claim intentionally narrow: observe-only, bounded workcell replay, and evidence suitable for review/regression rather than control or safety certification. But the broader pattern seems general: important events should become reviewable artifacts, not just entries in a log stream.
I’d be interested in your view on one thing: what fields would you consider essential in a minimal evidence bundle for AI-agent or robotics operations?