Kwansub Yun

Posted on • Originally published at flamehaven.space

My AI Maintainer Kept Making Wrong Calls. So I Made It Report Its State Before Touching Anything.

🔎 Glossary: terms used in this article

🔸 MICA (Memory Invocation & Context Archive): A governance schema for AI context management. Defines how context should be structured, trusted, scored, and handed off across sessions.

🔸 memory_injection: A MICA operational mode. The archive is updated after each maintenance session and read by the next AI session to compensate for session amnesia.

🔸 Session Report Format: The structured opening output the model must produce at session start — declaring archive version, self-test result, drift status, and active invariants — before any repository-level work begins.

🔸 Self-Test Policy: Machine-evaluable checks that validate the archive against the real project state: file existence, hash integrity, and invocation protocol presence.

🔸 Drift Response Policy: The schema-level declaration of how hash mismatches and missing files are handled. Different failure classes carry different response actions.

🔸 Design Invariant: A structured governance rule with identity, severity, and statement. Not a style guideline. A constraint the model cannot rationalize past.

🔸 Provenance Registry: The record of tracked files with SHA256 hashes. The basis for drift detection.

🔸 Deviation Log: The audit trail of formal exceptions to design invariants. Empty means no exceptions have been logged — not that no judgment calls were made.



1. What Part 5 Left Open

Part 5 placed MICA inside the context engineering landscape and drew one boundary: MICA is not a collection system. It begins after collection ends. Its job is to govern what enters the session, what remains authoritative, and how the model proves it actually loaded the governed archive at all.

That answer was correct. It was also still abstract.

This post steps down from that abstraction. It shows what MICA looks like when it is actually running — not as a concept, but as a protocol inside a real project.

The project is flamehaven.space, a Next.js B2B site maintained by a solo operator using a MICA package in memory_injection mode. Everything shown here is drawn from the live archive. Values that would expose internal configuration are anonymized; structure and behavior are real.


2. The Session Opening Report


Every MICA session in memory_injection mode begins with a declared output before any work starts. The archive specifies the required format:

```
[SESSION READY]
Archive: flamehaven-space-maintainer v0.2.0
Self-test: PASS (3 checks -- ST-001, ST-002, ST-003)
Drift: no hash mismatch detected
Active invariants: DI-001 (critical), DI-002 (critical) + 24 others loaded
Gate: PASS -- proceeding with maintenance
```

This is not a courtesy summary. It is a gate. The archive field session_report_format.gate_block_on is set to critical_self_test_failure — meaning the model must declare its load state before it is permitted to make any repository-level changes.

The format is specified in the archive itself:

```json
"session_report_format": {
  "trigger": "session_start",
  "required_fields": ["archive_version", "self_test", "drift_status", "active_invariants", "gate"],
  "format_template": "[SESSION READY]\nArchive: {archive_version}\nSelf-test: {self_test}\nDrift: {drift_status}\nActive invariants: {active_invariants}\nGate: {gate}"
}
```

The model does not decide what to declare. The archive tells it what a valid session opening looks like.
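A minimal sketch of how a session opening could be rendered from that field. The archive dict below mirrors the excerpt above, but the loader, the `render_session_report` helper, and the state values are illustrative assumptions, not the real MICA tooling:

```python
# Sketch only: archive shape mirrors the session_report_format excerpt;
# the render helper and state values are hypothetical.
archive = {
    "session_report_format": {
        "trigger": "session_start",
        "required_fields": ["archive_version", "self_test", "drift_status",
                            "active_invariants", "gate"],
        "format_template": (
            "[SESSION READY]\nArchive: {archive_version}\n"
            "Self-test: {self_test}\nDrift: {drift_status}\n"
            "Active invariants: {active_invariants}\nGate: {gate}"
        ),
    }
}

def render_session_report(fmt: dict, state: dict) -> str:
    # The model does not improvise the report shape: a missing required
    # field is a hard error, not a silently shorter report.
    missing = [f for f in fmt["required_fields"] if f not in state]
    if missing:
        raise ValueError(f"incomplete session state: {missing}")
    return fmt["format_template"].format(**state)

report = render_session_report(archive["session_report_format"], {
    "archive_version": "flamehaven-space-maintainer v0.2.0",
    "self_test": "PASS (3 checks -- ST-001, ST-002, ST-003)",
    "drift_status": "no hash mismatch detected",
    "active_invariants": "DI-001 (critical), DI-002 (critical) + 24 others loaded",
    "gate": "PASS -- proceeding with maintenance",
})
print(report)
```

Failing closed on missing fields is the point: the template is data the model fills in, not a style it approximates.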


3. What the Self-Test Actually Checks


The self_test_policy runs on session_start and pre_handoff. Three checks matter here:

  • ST-001 (provenance_sha256_format, severity: error) — verifies that provenance hashes in the registry match the expected format. A malformed hash means the file fingerprint is untrustworthy.
  • ST-002 (provenance_file_exists, severity: warning) — verifies that files listed in the provenance registry actually exist on disk. A missing file is not a formatting error; it is a ghost reference.
  • ST-003 (invocation_pattern_present, severity: error) — verifies that the invocation protocol is declared and readable. If the model cannot confirm how it was loaded, it cannot confirm the session is in a governed state.

Failure behavior is set per-check. The overall on_failure policy for this archive is warn_continue — the session proceeds, but the failure is surfaced explicitly in the opening report.

This is a deliberate calibration. A site maintenance session that blocks hard on every provenance warning would be too brittle for solo operation. The severity ladder reflects the actual cost of each failure mode, not a theoretical maximum.
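The three checks are small enough to sketch. Everything here is an assumption for illustration: the registry shape (`{"files": {path: sha256}}`) and the `invocation_protocol` key are stand-ins, not the actual archive layout:

```python
import re
from pathlib import Path

# Assumed registry shape: {"files": {"<path>": "<sha256 hex>"}}
SHA256_RE = re.compile(r"^[0-9a-f]{64}$")

def st_001(registry: dict) -> list:
    """provenance_sha256_format: every recorded hash must be 64 hex chars."""
    return [p for p, h in registry["files"].items() if not SHA256_RE.match(h)]

def st_002(registry: dict) -> list:
    """provenance_file_exists: every tracked file must exist on disk."""
    return [p for p in registry["files"] if not Path(p).exists()]

def st_003(archive: dict) -> bool:
    """invocation_pattern_present: the invocation protocol must be declared."""
    return bool(archive.get("invocation_protocol"))

def run_self_test(archive: dict, registry: dict) -> list:
    """Collect (check_id, severity) failures; on_failure is warn_continue,
    so failures are surfaced in the opening report, not fatal."""
    failures = []
    if st_001(registry):
        failures.append(("ST-001", "error"))
    if st_002(registry):
        failures.append(("ST-002", "warning"))
    if not st_003(archive):
        failures.append(("ST-003", "error"))
    return failures
```

Note that the runner returns failures rather than raising: warn_continue means the session opens with the failure visible, matching the calibration described above.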


4. What Drift Detection Looks Like in Practice


The archive's drift_response_policy is minimal by design:

"drift_response_policy": {
  "on_hash_mismatch": "warn_continue",
  "on_file_missing": "warn_block",
  "reminder_after_change": true,
  "inline_sync_required": false
}
Enter fullscreen mode Exit fullscreen mode

The distinction between warn_continue and warn_block is operationally significant.

A hash mismatch means a tracked file changed — which happens legitimately during ordinary maintenance. The model surfaces the mismatch and continues.

A file that has gone missing entirely is a different failure class. The model blocks and requires operator acknowledgment before proceeding.
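The two failure classes can be sketched as a single pass over the registry. As before, the registry shape and gate strings are illustrative assumptions, not the real implementation:

```python
import hashlib
from pathlib import Path

def sha256_of(path: str):
    """Current fingerprint of a file, or None if it no longer exists."""
    p = Path(path)
    return hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None

def check_drift(registry: dict) -> dict:
    """Classify tracked files per drift_response_policy: a changed hash
    only warns, a vanished file blocks the session."""
    mismatched, missing = [], []
    for path, recorded in registry["files"].items():
        actual = sha256_of(path)
        if actual is None:
            missing.append(path)       # on_file_missing: warn_block
        elif actual != recorded:
            mismatched.append(path)    # on_hash_mismatch: warn_continue
    if missing:
        gate = "BLOCK -- operator acknowledgment required"
    elif mismatched:
        gate = "PASS WITH WARNINGS"
    else:
        gate = "PASS"
    return {"mismatched": mismatched, "missing": missing, "gate": gate}
```

The asymmetry in the gate is the policy: a mismatch is information, a missing file is a stop condition.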

reminder_after_change: true means the archive instructs the model to remind the operator to refresh the provenance registry and artifact manifest before minting the next archive version. This is not automated enforcement. It is memory injection: the archive tells the next session what the previous session should have reminded the operator to do.

The deviation_log in v0.2.0 is empty. That is not a sign the system has never been used. It means no deviations have been formally logged yet — which is itself a state the archive captures.


5. What Happens When a Deployment Changes Something


A concrete scenario: the operator ships a writing refresh mode change — switching from automatic ISR to manual operator-triggered revalidation. Three files change: next.config.ts, a helper .bat script, and the playbook.

On the next session open:

  1. Self-test runs. ST-002 may flag if the helper script path is not in the provenance registry yet.
  2. Drift check runs. Hash mismatches fire for the changed files. on_hash_mismatch: warn_continue — the session proceeds.
  3. The opening report surfaces the mismatch:
```
[SESSION READY]
Archive: flamehaven-space-maintainer v0.2.0
Self-test: PASS with warnings (ST-002: update-writing-now.bat not in provenance registry)
Drift: hash mismatch on next.config.ts, flamehaven-space-maintainer-playbook.v0.2.0.md
Active invariants: DI-001 (critical), DI-002 (critical) + 24 others
Gate: PASS WITH WARNINGS -- operator review required
```

The operator now has a concrete decision surface before touching anything: what changed, what the model knows about, and what it does not.

The model then follows the change process defined in the playbook: identify the canonical subsystem touched, patch the smallest coherent surface, run build and audit, verify route-level behavior, then update README, MICA, or playbook if the change alters maintainer truth.

At the end of the session, if a new archive version is minted, the synchronization rule is explicit: file name, project.version, and the archive handoff marker must be updated in the same change. Not sequentially. The same change.
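The same-change rule is mechanically checkable. A sketch under stated assumptions: the archive is a JSON file whose name carries a vX.Y.Z suffix and whose handoff marker lives at handoff.archive_version; both conventions are hypothetical stand-ins for the real layout:

```python
import json
import re
from pathlib import Path

def check_version_sync(archive_path: str) -> bool:
    """True only when the filename version, project.version, and the
    handoff marker all agree -- i.e. they moved in the same change.
    The vX.Y.Z filename convention and the handoff.archive_version key
    are assumptions for illustration."""
    m = re.search(r"v(\d+\.\d+\.\d+)", Path(archive_path).name)
    doc = json.loads(Path(archive_path).read_text())
    versions = {
        m.group(1) if m else None,
        doc.get("project", {}).get("version"),
        doc.get("handoff", {}).get("archive_version"),
    }
    # A single non-None value means all three sources agree.
    return len(versions) == 1 and None not in versions
```

Collapsing the three sources into a set makes the rule visible: any divergence, including a missing marker, leaves more than one element and fails the check.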

This is the operational point of drift reporting: not merely to announce that something changed, but to force the model and the operator to see the same changed surface before any new work proceeds.


6. What the Design Invariants Actually Govern

design invariants

The archive carries 26 design invariants. The first six establish the perimeter everything else operates within:

  • DI-001 (critical): Flamehaven is positioned as a governance-first, founder-led B2B AI systems practice, not a generic agency or AI wrapper shop.
  • DI-002 (critical): Primary conversion surface is the main domain flamehaven.space, not Medium, Substack, DEV.to, or LinkedIn.
  • DI-003 (high): Writing detail pages are authoritative artifacts linked to projects, selected work, contact, and SEO canonical ownership.
  • DI-004 (high): Selected Work must distinguish public, private, and internal systems without broken public links or placeholder copy.
  • DI-005 (high): Legacy WordPress-era routes must redirect away from obsolete agency messaging.
  • DI-006 (high): Operational choices favor deterministic behavior, inspectability, and maintenance continuity over decorative complexity.

These are not style guidelines. They are session-blocking constraints. An AI that proposes converting Selected Work to a live-fetch real-time surface is violating DI-006. An AI that treats cross-posting as the canonical publishing path is violating DI-002. An AI that leaves a legacy route live because a redirect "seems unnecessary" is violating DI-005.

The invariants exist so the model cannot rationalize its way past the operator's architectural intent — even across sessions, even with a new model instance that has never seen the project before.
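As data, an invariant of this kind is small: identity, severity, statement, per the glossary. A hypothetical sketch of how a gate could treat critical violations differently from high ones; the records shown paraphrase DI-002 and DI-006 from above, and the gating rule is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DesignInvariant:
    id: str
    severity: str   # "critical" or "high" in this archive
    statement: str

# Paraphrased from the list above; not the verbatim archive entries.
INVARIANTS = [
    DesignInvariant("DI-002", "critical",
                    "Primary conversion surface is flamehaven.space, "
                    "not cross-posted copies."),
    DesignInvariant("DI-006", "high",
                    "Favor deterministic behavior and inspectability "
                    "over decorative complexity."),
]

def session_blocking(invariants, flagged_ids):
    """A flagged critical invariant blocks the session outright; a flagged
    high invariant is surfaced for operator review instead."""
    return [i for i in invariants
            if i.id in flagged_ids and i.severity == "critical"]
```

The point of the structure is that the rule carries its own enforcement weight: the model reads severity from the record, it does not decide it.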


7. What MICA Cannot Do Here


The deviation_log is empty because no formal deviation has been logged. But there have been judgment calls.

One example is the writing hero image fallback logic. The site had to support both modern Notion block structures and legacy imported posts with a different nesting format. That decision did not begin as an invariant. It began as a session-level judgment call, became a playbook rule, and only then became stable maintainer truth.

That path matters.

MICA does not automate the step from "we discussed this and made a call" to "this is now a governed constraint." It provides the place to put the result. The operator decides what rises to the level of an invariant, what remains a lesson in the playbook, and what disappears when the session ends.

That boundary — what gets governed, what gets remembered, what gets lost — is not a gap in MICA. It is a design decision made with every archive update.


8. What Part 7 Will Address


Part 6 showed what MICA looks like in operation inside a single maintenance agent. The structure holds. The protocol runs. The session report is predictable.

But that is still the easier case.

The project being governed was a site: a relatively stable artifact, maintained by one operator, where the main problem was making sure the model did not forget what already mattered.

Part 7 moves into a harder setting.

The governed project is now a tool that runs inside AI workflows itself. That changes the governance problem. The issue is no longer only session amnesia. It is iterative accumulation: what the system learns across cycles, what becomes authoritative, what remains provisional, and what must be carried forward without allowing drift to harden into false memory.

Part 7 is that case.


Named decision from this post: A session report is not a polite summary. It is a hard gate. The model must declare — in the exact format dictated by the archive — what it loaded, what tests passed, and what drift it detected, before it is allowed to touch the repository.


MICA is part of the Flamehaven governance-first AI systems practice. Schema, technical report, and production instance: flamehaven.space. Open-source tooling: AI-SLOP-Detector. All schema references follow the v0.2.0 standard unless a specific earlier version is named.
