kanaria007

Posted on
Implementing Organizational Operations with Deterministic Architecture

Hiring, evaluation, 1:1s, retrospectives, roadmap decisions, team design, and AI usage often look like different problems.

They are not.

In practice, they are all judgment problems:

  • what you observe,
  • what you treat as evidence,
  • what level of autonomy is allowed,
  • and what you record so decisions can be explained and improved later.

That is why I think a large part of organizational operations can be implemented with the same architectural spine I have been describing in the determinism series.

The claim here is not:

“Automate all human judgment.”

The claim is narrower and more useful:

Move the observation, verification, typed execution, and audit parts of organizational operations into reproducible boundaries.

In other words, let proposals remain flexible, but make commitment paths legible.

One clarification is important here: although I call this a deterministic architecture, it is not a replacement for your existing application stack, org chart, data platform, or model stack.

It is more precise to think of it as a protocol layer for judgment that cuts across existing systems: a way to make observation, verification, typed execution, and audit more legible without requiring you to rebuild everything from scratch.

The core idea

In the determinism series, I have been arguing for a simple but important split:

  • proposals may vary,
  • but verification and execution should be stable.

That usually means:

  • fix the input schema,
  • let humans or LLMs generate proposals,
  • run those proposals through a verifier,
  • return ACCEPT, REJECT, or DEGRADE,
  • execute only typed actions,
  • and pin the grounds in logs.
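
The split can be sketched in a few lines of Python. The field names below are illustrative, not the article's real schemas; the point is only that the verification rule is fixed while proposals vary:

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    ACCEPT = "ACCEPT"
    REJECT = "REJECT"
    DEGRADE = "DEGRADE"


@dataclass(frozen=True)
class Result:
    verdict: Verdict
    missing_fields: tuple[str, ...] = ()


# The input schema is fixed: a proposal must carry these fields
# before it is allowed to commit. (Illustrative field names.)
REQUIRED_FIELDS = ("observation", "proposed_action", "success_condition")


def verify(proposal: dict) -> Result:
    """Check structure, not style: proposals may vary freely,
    but the verification rule stays stable."""
    missing = tuple(f for f in REQUIRED_FIELDS if not proposal.get(f))
    if missing:
        # Incomplete input degrades instead of silently committing.
        return Result(Verdict.DEGRADE, missing)
    return Result(Verdict.ACCEPT)
```

The verifier never rewrites the proposal; it only decides whether the structure required for commitment is present.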

This applies surprisingly well to organizational work.

Take hiring.

The risky version is obvious: an interviewer writes a freeform impression, and that impression quietly turns into a hiring decision.

A better version is:

  • define the capability dimensions the role actually needs,
  • observe signals against those dimensions,
  • distinguish strong signals from weak signals,
  • route missing evidence into a re-entry path,
  • and only commit when the required observations are present.

That maps almost directly to:

  • an input schema,
  • a scorecard policy,
  • a verifier,
  • DEGRADE when evidence is insufficient,
  • typed next actions,
  • and a fixed decision log.

The same is true for 1:1s.

A 1:1 should not just be a vague conversation about how things feel.
It can be structured around:

  • current observation,
  • blockers,
  • gap against role expectations,
  • next practice task,
  • success condition,
  • support needed.

Once those fields exist, a verifier can check whether the session produced something actionable or whether the record is still too vague to move forward.

A simple way to think about the stack

One useful compression is this:

  • growth and mentoring practices define the observation templates,
  • organizational design practices define the judgment policies,
  • deterministic architecture defines the execution and audit runtime.

That gives you a practical implementation stack:

1. Observation layer

This is where you collect structured input:

  • daily reflection entries,
  • weekly growth snapshots,
  • 1:1 records,
  • hiring notes,
  • review notes,
  • roadmap decision memos,
  • AI usage events.

The point is not to collect more text.
The point is to collect observation in a form that can later be checked.

2. Policy layer

This is where you define what counts as a good decision:

  • role expectation policies,
  • decision lane policies,
  • hiring scorecards,
  • evaluation policies,
  • AI usage policies.

This layer decides what is acceptable, what needs review, and what must not commit.

3. Proposal layer

Humans or LLMs can generate:

  • candidate questions for the next 1:1,
  • a proposed next practice task,
  • follow-up interview questions,
  • a draft evaluation summary,
  • a proposed AI usage change,
  • a possible retro action item.

This layer is allowed to be flexible.

4. Verifier layer

This is the important part.

The verifier does not ask whether the proposal sounds nice.
It asks whether the required structure is present.

That means checking things like:

  • is the observation sufficient,
  • is the success condition concrete,
  • is the decision within the allowed lane,
  • is the role expectation gap grounded in evidence,
  • is human review mandatory for this action,
  • is the proposal forbidden under policy.

The output is not prose.
The output is a machine-readable result:

  • ACCEPT
  • DEGRADE
  • REJECT

5. Typed execution layer

Freeform text should not be executed directly.

Only typed actions should be routed forward:

  • request_more_observation
  • set_next_practice_task
  • schedule_followup_review
  • request_additional_interview
  • escalate_for_approval
  • block_high_risk_ai_usage

This is the same principle as “don’t execute the LLM.”
Do not execute freeform organizational language either.
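
One way to enforce that rule is a plain allowlist. The channel names here are illustrative; anything outside the allowlist is diverted to human review rather than executed:

```python
# Allowlist of typed actions; anything else never executes directly.
ALLOWED_ACTIONS = frozenset({
    "request_more_observation",
    "set_next_practice_task",
    "schedule_followup_review",
    "request_additional_interview",
    "escalate_for_approval",
    "block_high_risk_ai_usage",
})


def route(action: dict) -> dict:
    """Route a typed action; unknown action types fall back to
    manual review instead of being executed."""
    if action.get("action_type") in ALLOWED_ACTIONS:
        return {"channel": "execute", "payload": action}
    # Freeform or unrecognized proposals are never executed directly.
    return {"channel": "manual_review", "payload": action}
```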

6. Audit and learning loop

Finally, pin the decision to logs:

  • input digest,
  • policy version,
  • verifier version,
  • verdict,
  • reason codes,
  • missing fields,
  • normalized action plan digest.

Then use those results to improve the system over time:

  • golden cases,
  • gap registers,
  • policy updates,
  • template updates,
  • verifier updates.
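
The "pin" in "pin the decision to logs" amounts to hashing a canonical form of the input, so the same observation always produces the same digest. A minimal sketch:

```python
import hashlib
import json


def digest(obj: dict) -> str:
    """Pin an input by hashing its canonical JSON form. Sorting keys
    and fixing separators makes the digest order-independent, so the
    same observation always yields the same digest in the audit log."""
    canonical = json.dumps(obj, ensure_ascii=False, sort_keys=True,
                           separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```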

Why DEGRADE matters so much

A lot of organizational failure is not caused by explicit bad decisions.

It is caused by vague continuation.

  • “Let’s revisit this later.”
  • “We had a good talk.”
  • “Needs more ownership.”
  • “Please think more strategically.”
  • “We should improve communication.”

These are not decisions.
They are unresolved placeholders.

That is why DEGRADE matters.

In this architecture, DEGRADE is not a soft shrug.
It is a first-class state for re-entry.

It means:

  • the observation is incomplete,
  • the role boundary is unclear,
  • the success condition is missing,
  • required reviewers are absent,
  • the evidence is too weak,
  • the proposal is too abstract to commit.

And it should always point to what is missing.

That is the difference between “not deciding yet” and “stopping in a reusable way.”

A minimal object set

You do not need a huge platform to start.

A very small set of objects is enough:

  • daily_reflection_entry
  • weekly_growth_snapshot
  • one_on_one_record
  • role_expectation_policy
  • decision_lane_policy
  • hiring_scorecard_policy
  • evaluation_policy
  • ai_usage_policy
  • proposal_packet
  • verification_result
  • typed_action_plan
  • decision_audit_log

That is enough to complete one loop:

  1. capture observation,
  2. generate proposals,
  3. verify against policy,
  4. route typed actions,
  5. pin the result to logs.

Example: a 1:1 verifier

The runtime does not need to start large.

Even a small 1:1 verifier is enough to show the pattern:

  • define the required fields,
  • check whether they are present and concrete enough,
  • return ACCEPT, DEGRADE, or REJECT,
  • and emit only typed next actions.

For example, a 1:1 record might require:

  • current observation,
  • current blockers,
  • gap against role expectations,
  • next practice task,
  • success condition,
  • support needed.

If the next practice task exists but the success condition is missing, the verifier should not produce a vague “looks promising” result.
It should return something like:

{
  "verdict": "DEGRADE",
  "reason_codes": ["missing_success_condition"],
  "missing_fields": ["success_condition"],
  "normalized_plan": [
    {
      "action_type": "request_more_observation",
      "params": {
        "questions": [
          "What would count as progress by the next session?"
        ]
      }
    }
  ]
}

That is the whole point.

The verifier is not trying to be creative.
It is only checking whether the structure required for commitment actually exists.

The same is true for execution.

Do not let freeform text silently become action.
Route only typed actions such as:

  • request_more_observation
  • set_next_practice_task
  • schedule_followup_review
  • request_additional_interview
  • escalate_for_approval

And once the decision is made, do not preserve only a nice-sounding explanation.
Preserve the actual grounds:

  • input digest,
  • policy version,
  • verifier version,
  • verdict,
  • reason codes,
  • missing fields,
  • normalized plan digest.

That is what makes the judgment replayable.

Readers who want the concrete implementation sketch can continue to the appendix below.
It includes a minimal architecture, example schemas, a small verifier, typed routing, pinned audit logs, golden cases, and an MVP build order.

The same runtime works for hiring, evaluation, and AI usage

Once the architecture exists, the pattern repeats.

Hiring

  • input: interview notes, scorecards, observed signals
  • policy: hiring scorecard policy
  • proposal: follow-up questions, risk notes, hire/no-hire draft
  • verify: are the must-have signals present
  • output: ACCEPT, DEGRADE, REJECT
  • action: proceed, request more observation, stop

Evaluation

  • input: weekly snapshots, 1:1 records, work signals
  • policy: role expectation policy + evaluation policy
  • proposal: draft summary
  • verify: is the rating grounded in observable evidence
  • output: ACCEPT, DEGRADE, REJECT
  • action: finalize, gather more evidence, escalate

AI usage

  • input: AI usage event
  • policy: AI usage policy
  • proposal: summary, draft, recommendation, generated change
  • verify: proposal-only, human-review-required, or forbidden
  • output: ACCEPT, DEGRADE, REJECT
  • action: allow, hold for review, block

The same runtime can support all of them.
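
The "same runtime" claim can be sketched as a small verifier registry: each record kind gets its own policy check, but every check returns the same verdicts. The domain rules below are illustrative stand-ins, not the article's real policies:

```python
from typing import Callable

# One runtime, many domains: each record kind maps to its own
# verifier, but every verifier speaks the same verdict vocabulary.
Verifier = Callable[[dict], str]

VERIFIERS: dict[str, Verifier] = {}


def register(kind: str, fn: Verifier) -> None:
    VERIFIERS[kind] = fn


def verify(record: dict) -> str:
    fn = VERIFIERS.get(record.get("kind", ""))
    if fn is None:
        # Unknown record kinds never commit silently.
        return "REJECT"
    return fn(record)


# Illustrative stand-in rules for two domains:
register("hiring_note", lambda r: "ACCEPT" if r.get("signals") else "DEGRADE")
register("ai_usage_event", lambda r: "REJECT" if r.get("forbidden") else "ACCEPT")
```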

What to build first

Do not start with a giant platform.

Start small.

Phase 1

Build:

  • observation schemas,
  • role expectation policy,
  • a verifier,
  • typed actions,
  • audit logs.

That is already enough for growth loops and 1:1s.

Phase 2

Add:

  • decision lane policy,
  • retro records,
  • gap registers,
  • roadmap decision memos.

That gives you better team-level governance.

Phase 3

Add:

  • AI usage events,
  • AI usage policy,
  • proposal packets,
  • golden cases.

That gives you proposal/verification separation for AI operations too.

What makes this an actual MVP

To make this usable in practice, I would add:

  • stronger DEGRADE / REJECT detection for vague language,
  • input forms,
  • a lightweight policy editor,
  • history and trend analysis,
  • multi-user roles and permissions,
  • integration with existing tools,
  • and then, on top of that, LLM proposal features.

That order matters.

LLM proposal generation is a good product feature.
It is not the foundation.

The foundation is:

  • structured input,
  • editable policy,
  • stable verifier behavior,
  • and auditable history.

Without that, the LLM layer is just a nice demo on top of unclear operations.

What should not be automated

This architecture is not meant to turn organizations into auto-approval machines.

Quite the opposite.

The value of the organization should remain on the verifier side:

  • what counts as commit,
  • what requires human review,
  • what is high risk,
  • what is forbidden,
  • who has authority,
  • and which boundaries must not be crossed automatically.

That means things like these should usually remain proposal-only or human-gated:

  • final hiring decisions,
  • final evaluation outcomes,
  • high-risk delegation,
  • major organizational restructuring.

The point is not to automate sovereignty.
The point is to make observation, verification, routing, and audit much stronger.

So what is this, really?

In one line:

a decision operating system that makes organizational judgment, development, and AI usage replayable.

That is the practical bridge.

Not “AI replacing management.”
Not “yet another workflow tool.”
Not “prompting your org chart.”

A runtime where:

  • observations are structured,
  • policies are explicit,
  • proposals are separated from commitments,
  • verifiers return stable outputs,
  • actions are typed,
  • logs are pinned,
  • and failures become material for the next verifier improvement.

That is what makes organizational operations more reproducible.

And in the AI era, that matters much more than simply making things faster.

Because the real question is not whether somebody can generate text quickly.

It is whether that speed can be turned into decisions that remain explainable, reviewable, and correctable after the fact.


If you only wanted the architectural argument, you can stop here.

The appendix below is for readers who want the implementation sketch: minimal objects, example schemas, a small verifier, typed routing, pinned audit logs, golden cases, and a practical MVP sequence.

Appendix — A Minimal Runtime for Organizational Operations

The main article focused on the architectural idea.

This appendix makes that idea more concrete.

The point is not to define a giant enterprise platform from day one.
The point is to show that a surprisingly small runtime is enough to start:

  • structured observation inputs,
  • explicit policies,
  • a verifier that returns ACCEPT / REJECT / DEGRADE,
  • typed actions,
  • pinned audit logs,
  • and golden cases to keep verifier behavior stable.

That is already enough to turn a large part of organizational operations into a replayable system.

A minimal architecture

flowchart TD
    A[Human Inputs / Work Signals
    daily reflection
    weekly reflection
    1:1 notes
    PR / review notes
    roadmap memos
    retrospectives
    AI usage logs] --> B[Normalization Layer
    schema validation
    ID assignment
    context binding]

    B --> C[Proposal Layer
    human proposer
    LLM proposer
    question drafts
    next-task drafts
    interview follow-up drafts
    retro issue drafts]

    B --> D[Deterministic Verifier Layer
    role expectations
    decision lanes
    responsibility boundaries
    hiring scorecards
    evaluation rules
    AI usage boundaries]

    C --> D

    D -->|ACCEPT| E[Typed Actions
    set next practice task
    request review
    update lane
    register gap
    request interview
    change AI usage restriction]

    D -->|DEGRADE| F[Re-entry Queue
    ask for more observation
    ask follow-up questions
    request more evidence
    request more approvals
    define re-entry conditions]

    D -->|REJECT| G[Stop / Escalate
    stop execution
    escalate to reviewer
    return for human judgment]

    E --> H[Execution Layer
    human execution
    meeting workflow
    HR workflow
    development workflow
    AI operations control]

    F --> H
    G --> H

    H --> I[Pinned Logs / Replay Store
    input snapshot
    policy version
    reason codes
    missing fields
    normalized plan
    decision log]

    I --> J[Learning Loop
    golden cases
    gap register
    policy updates
    template updates
    verifier updates]

    J --> D
    J --> C

The most important thing here is that hiring, evaluation, 1:1s, retrospectives, roadmap decisions, and AI usage can all ride on the same spine.

The other important point is DEGRADE.

In many organizations, “we’ll think about it later” is an untyped fog state.
In this runtime, DEGRADE is a first-class re-entry state:

  • not enough observation,
  • no concrete success condition,
  • missing reviewer,
  • missing approval,
  • insufficient strong signals,
  • or an action proposal that is still too abstract to commit.

That makes pause states reusable instead of vague.

A minimal object set

You do not need a huge schema family to get started.

A small initial object set is enough:

  1. daily_reflection_entry
  2. weekly_growth_snapshot
  3. one_on_one_record
  4. role_expectation_policy
  5. decision_lane_policy
  6. hiring_scorecard_policy
  7. evaluation_policy
  8. ai_usage_policy
  9. proposal_packet
  10. verification_result
  11. typed_action_plan
  12. decision_audit_log

That may look like a lot, but notice the pattern:

  • observation objects,
  • policy objects,
  • runtime objects,
  • audit objects.

That is all.

Sketching the minimum schemas

You do not need fully formal JSON Schema files on day one.
A design-note-level schema is enough as long as the fields are stable.

Example: daily reflection entry

kind: daily_reflection_entry
version: v1
id: dre_2026_04_09_user_001
person_id: user_001
created_at: 2026-04-09T20:15:00+09:00

what_done:
  - "Reviewed three pull requests"
  - "Investigated the user-list API"
  - "Used an AI agent to draft test code"

why_chosen:
  - "Review priority was high"
  - "I wanted to check dependencies before starting the next task"

insights:
  - "Giving the AI an explicit direction worked better than delegating everything"
  - "A vague answer exposed a shallow part of my own understanding"

judgment_reflection:
  good:
    - "Using waiting time for review work was a good decision"
  improve:
    - "I answered before checking the design document"
  reusable_thoughts:
    - "Break requests into smaller pieces before handing them to AI"
    - "Check grounds before answering"

next_day_plan:
  - "Prioritize review work"
  - "Create a dependency-mapping sheet first"

share_or_consult:
  - "Share the AI usage insight with the team"

Example: weekly growth snapshot

kind: weekly_growth_snapshot
version: v1
id: wgs_2026_w15_user_001
person_id: user_001
week_range:
  from: 2026-04-06
  to: 2026-04-12
created_at: 2026-04-12T18:00:00+09:00

summary:
  what_why:
    - "I balanced implementation, review work, and AI usage experiments"
  learning_and_gaps:
    - "Review quality improved, but dependency mapping is still slow"

self_eval:
  autonomy:
    task_ownership: partial
    blocker_handling: partial
  org_adaptation:
    implicit_norms: good
    communication: partial
  strategic_thinking:
    technical_depth: partial
    product_view: partial

next_week_focus:
  - "Make dependencies explicit earlier"
  - "Write down my own prioritization logic"

Example: role expectation policy

kind: role_expectation_policy
version: v1
id: rep_senior_ic_v1
role_id: senior_ic
role_name: "Senior IC"
created_at: 2026-04-01T00:00:00+09:00

expected_outcomes:
  - "Clarify ambiguous issues and make forward progress possible"
  - "Move design and implementation decisions forward in the owned domain"

decision_scope:
  can_decide:
    - "choice of implementation approach"
    - "technical trade-off clarification"
  must_escalate:
    - "cross-unit platform changes"
    - "high-risk customer-impacting changes"

influence_patterns:
  - "make decision criteria explicit in review"
  - "separate mixed issues when discussion is confused"

reproducibility_expectation:
  - "leave behind notes or templates that preserve the judgment logic"
  - "make decisions reusable by others later"

evidence_examples:
  - "review comments"
  - "design notes"
  - "dependency mapping sheet"

Example: decision lane policy

kind: decision_lane_policy
version: v1
policy_id: dlp_product_unit_v1
created_at: 2026-04-01T00:00:00+09:00

lanes:
  - lane_id: lane_1
    name: "local decision"
    can_decide_by_self: true
    requires_review: false
    requires_escalation: false
    required_inputs:
      - "current_scope"
      - "affected_component"
    forbidden_without_approval: []
    examples:
      - "small implementation change"
      - "improvement within an existing policy"

  - lane_id: lane_2
    name: "review required"
    can_decide_by_self: false
    requires_review: true
    requires_escalation: false
    required_inputs:
      - "design_note"
      - "reviewer"
    forbidden_without_approval:
      - "cross_team_policy_change"
    examples:
      - "design change across dependencies"
      - "a change with multiple valid interpretations"

  - lane_id: lane_3
    name: "escalation required"
    can_decide_by_self: false
    requires_review: true
    requires_escalation: true
    required_inputs:
      - "risk_summary"
      - "approver"
      - "rollback_plan"
    forbidden_without_approval:
      - "important_customer_impact"
      - "evaluation_commit"
      - "hiring_commit"
      - "high_risk_ai_usage"
    examples:
      - "important customer impact"
      - "evaluation, hiring, or authority transfer"
      - "high-risk AI usage"

Example: verification result

{
  "kind": "verification_result",
  "version": "v1",
  "verification_id": "vr_001",
  "target_object_id": "one_on_one_record_2026_04_09_001",
  "policy_refs": [
    "rep_senior_ic_v1",
    "dlp_product_unit_v1"
  ],
  "verdict": "DEGRADE",
  "reason_codes": [
    "practice_task_too_abstract",
    "role_gap_not_grounded"
  ],
  "missing_fields": [
    "next_practice_task.success_condition",
    "current_blockers.dependency_scope"
  ],
  "normalized_plan": [
    {
      "action_type": "request_more_observation",
      "params": {
        "questions": [
          "What would count as progress by the next session?",
          "Which dependency is actually causing the blocker?"
        ]
      }
    }
  ],
  "reviewer_refs": [
    "manager_001"
  ],
  "created_at": "2026-04-09T21:00:00+09:00"
}

Example: typed action plan

{
  "kind": "typed_action_plan",
  "version": "v1",
  "plan_id": "tap_001",
  "status": "READY",
  "actions": [
    {
      "action_type": "set_next_practice_task",
      "params": {
        "person_id": "user_001",
        "task": "Create a one-page dependency mapping sheet before the next session",
        "success_condition": "Dependencies, consultation targets, and unresolved issues are explicitly listed"
      },
      "authority_required": "manager",
      "execution_channel": "growth_plan",
      "rollback_hint": "replace_next_practice_task"
    },
    {
      "action_type": "schedule_followup_review",
      "params": {
        "person_id": "user_001",
        "date": "2026-04-16"
      },
      "authority_required": "manager",
      "execution_channel": "calendar",
      "rollback_hint": "cancel_followup_review"
    }
  ]
}

Example: decision audit log

{
  "kind": "decision_audit_log",
  "version": "v1",
  "log_id": "dal_001",
  "ordering_key": "seq_00000125",
  "input_digest": "sha256:abc123...",
  "snapshot_refs": [
    "wgs_2026_w15_user_001",
    "dre_2026_04_09_user_001"
  ],
  "policy_versions": {
    "role_expectation_policy": "rep_senior_ic_v1",
    "decision_lane_policy": "dlp_product_unit_v1"
  },
  "verifier_version": "orgos-verifier-0.1.0",
  "verdict": "DEGRADE",
  "reason_codes": [
    "practice_task_too_abstract"
  ],
  "missing": [
    "next_practice_task.success_condition"
  ],
  "normalized_plan_digest": "sha256:def456...",
  "actor_refs": [
    "manager_001",
    "user_001"
  ],
  "created_at": "2026-04-09T21:00:01+09:00"
}

The key idea is simple:

do not preserve a nice-sounding explanation as the source of truth.

Preserve:

  • input digest,
  • policy version,
  • verifier version,
  • verdict,
  • reason codes,
  • missing fields,
  • normalized plan digest.

That is what makes the judgment replayable.

A minimal repository layout

You do not need microservices first.
One repository is enough.

orgos/
  schemas/
    daily_reflection_entry.yaml
    weekly_growth_snapshot.yaml
    role_expectation_policy.yaml
    decision_lane_policy.yaml

  policies/
    role_expectation/
      senior_ic_v1.yaml
    decision_lanes/
      product_unit_v1.yaml
    ai_usage/
      default_v1.yaml

  runtime/
    models.py
    verifier.py
    action_router.py
    audit_log.py
    pipeline.py

  golden/
    cases/
      01_1on1_accept.json
      02_1on1_degrade_missing_success_condition.json
      03_ai_usage_reject_forbidden_commit.json
    run_golden.py

That is enough to start.

The operating idea stays the same:

  • humans or LLMs produce proposals,
  • the verifier returns ACCEPT / REJECT / DEGRADE,
  • only typed actions are executable,
  • and golden cases freeze verifier behavior.

Minimal Python models

from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Verdict(str, Enum):
    ACCEPT = "ACCEPT"
    REJECT = "REJECT"
    DEGRADE = "DEGRADE"


@dataclass(frozen=True)
class ProposalPacket:
    proposal_id: str
    proposal_type: str
    target_object_id: str
    candidates: list[dict[str, Any]]
    producer_type: str  # "human" | "llm"
    producer_id: str


@dataclass(frozen=True)
class VerificationResult:
    verification_id: str
    target_object_id: str
    verdict: Verdict
    reason_codes: tuple[str, ...] = ()
    missing_fields: tuple[str, ...] = ()
    normalized_plan: tuple[dict[str, Any], ...] = ()


@dataclass(frozen=True)
class OneOnOneRecord:
    session_id: str
    person_id: str
    manager_id: str
    current_observation: str
    current_blockers: list[str]
    role_expectation_gap: list[str]
    next_practice_task: str | None
    success_condition: str | None
    support_needed: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class RoleExpectationPolicy:
    role_id: str
    expected_outcomes: tuple[str, ...]
    must_show_evidence: tuple[str, ...]


@dataclass(frozen=True)
class DecisionLane:
    lane_id: str
    can_decide_by_self: bool
    requires_review: bool
    requires_escalation: bool


@dataclass(frozen=True)
class DecisionLanePolicy:
    policy_id: str
    lanes: tuple[DecisionLane, ...]

Minimal verifier

Here is a small verifier for a 1:1 record.

from __future__ import annotations

# These types are defined in runtime/models.py (shown above).
from models import (
    OneOnOneRecord,
    RoleExpectationPolicy,
    VerificationResult,
    Verdict,
)


def verify_one_on_one_record(
    record: OneOnOneRecord,
    role_policy: RoleExpectationPolicy,
) -> VerificationResult:
    reason_codes: list[str] = []
    missing_fields: list[str] = []
    normalized_plan: list[dict] = []

    if not record.next_practice_task:
        reason_codes.append("missing_next_practice_task")
        missing_fields.append("next_practice_task")

    if not record.success_condition:
        reason_codes.append("missing_success_condition")
        missing_fields.append("success_condition")

    if len(record.current_blockers) == 0:
        reason_codes.append("missing_blocker_context")
        missing_fields.append("current_blockers")

    if len(record.support_needed) == 0:
        reason_codes.append("missing_support_needed")
        missing_fields.append("support_needed")

    if role_policy.expected_outcomes and len(record.role_expectation_gap) == 0:
        reason_codes.append("missing_role_expectation_gap")
        missing_fields.append("role_expectation_gap")

    if missing_fields:
        questions: list[str] = []

        if "next_practice_task" in missing_fields:
            questions.append("What should be tried before the next session?")
        if "success_condition" in missing_fields:
            questions.append("What would count as progress?")
        if "current_blockers" in missing_fields:
            questions.append("What is the actual blocker right now?")
        if "support_needed" in missing_fields:
            questions.append("What support is needed to move forward?")
        if "role_expectation_gap" in missing_fields:
            questions.append("Against the current role expectation, what is actually weak?")

        normalized_plan.append(
            {
                "action_type": "request_more_observation",
                "params": {
                    "questions": questions
                }
            }
        )
        return VerificationResult(
            verification_id=f"vr_{record.session_id}",
            target_object_id=record.session_id,
            verdict=Verdict.DEGRADE,
            reason_codes=tuple(reason_codes),
            missing_fields=tuple(missing_fields),
            normalized_plan=tuple(normalized_plan),
        )

    normalized_plan.append(
        {
            "action_type": "set_next_practice_task",
            "params": {
                "person_id": record.person_id,
                "task": record.next_practice_task,
                "success_condition": record.success_condition,
            },
        }
    )

    return VerificationResult(
        verification_id=f"vr_{record.session_id}",
        target_object_id=record.session_id,
        verdict=Verdict.ACCEPT,
        reason_codes=(),
        missing_fields=(),
        normalized_plan=tuple(normalized_plan),
    )

The point here is not that the verifier is “intelligent.”

The point is that it is checking a known structure:

  • observation,
  • blocker,
  • gap against expectation,
  • next practice task,
  • success condition,
  • support needed.

That is exactly what makes the runtime reproducible.
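
Seen from outside, the whole check reduces to one invariant: the same record and policy always produce the same verdict. A standalone sketch of that property, using a simplified record (not the full dataclasses above):

```python
def check_structure(record: dict, required: tuple[str, ...]) -> str:
    """Deterministic structural check: verdict depends only on
    which required fields are present, never on phrasing."""
    missing = [f for f in required if not record.get(f)]
    return "DEGRADE" if missing else "ACCEPT"


REQUIRED = ("current_observation", "next_practice_task", "success_condition")

# Same input, run twice: the verdict is identical by construction.
record = {"current_observation": "slow dependency mapping",
          "next_practice_task": "create a mapping sheet",
          "success_condition": None}
```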

Typed action routing

Do not execute freeform text.

Route only typed actions.

from __future__ import annotations

from typing import Any

# VerificationResult is defined in runtime/models.py (shown above).
from models import VerificationResult


def route_actions(result: VerificationResult) -> list[dict[str, Any]]:
    routed: list[dict[str, Any]] = []

    for action in result.normalized_plan:
        action_type = action["action_type"]

        if action_type == "request_more_observation":
            routed.append(
                {
                    "channel": "one_on_one_followup",
                    "authority_required": "manager",
                    "payload": action["params"],
                }
            )
        elif action_type == "set_next_practice_task":
            routed.append(
                {
                    "channel": "growth_plan",
                    "authority_required": "manager",
                    "payload": action["params"],
                }
            )
        else:
            routed.append(
                {
                    "channel": "manual_review",
                    "authority_required": "manager",
                    "payload": action,
                }
            )

    return routed

This is the organizational version of:

Don’t execute the LLM.

Pinned audit logs

from __future__ import annotations

import hashlib
import json
from datetime import datetime, timedelta, timezone

# VerificationResult is defined in runtime/models.py (shown above).
from models import VerificationResult


JST = timezone(timedelta(hours=9))


def canonical_json(obj: object) -> str:
    return json.dumps(obj, ensure_ascii=False, sort_keys=True, separators=(",", ":"))


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_audit_log(
    input_object: dict,
    policy_refs: dict,
    result: VerificationResult,
    verifier_version: str,
    ordering_key: str,
) -> dict:
    input_digest = "sha256:" + sha256_hex(canonical_json(input_object))
    normalized_plan_digest = "sha256:" + sha256_hex(canonical_json(list(result.normalized_plan)))

    return {
        "kind": "decision_audit_log",
        "version": "v1",
        "log_id": f"dal_{ordering_key}",
        "ordering_key": ordering_key,
        "input_digest": input_digest,
        "policy_versions": policy_refs,
        "verifier_version": verifier_version,
        "verdict": result.verdict.value,
        "reason_codes": list(result.reason_codes),
        "missing": list(result.missing_fields),
        "normalized_plan_digest": normalized_plan_digest,
        "created_at": datetime.now(JST).isoformat(),
    }

The pipeline itself

Once those parts exist, the pipeline becomes simple.

from __future__ import annotations


def run_one_on_one_pipeline(
    record: OneOnOneRecord,
    role_policy: RoleExpectationPolicy,
    ordering_key: str,
) -> tuple[VerificationResult, list[dict], dict]:
    result = verify_one_on_one_record(record, role_policy)
    actions = route_actions(result)

    audit_log = make_audit_log(
        input_object={
            "session_id": record.session_id,
            "person_id": record.person_id,
            "current_observation": record.current_observation,
            "current_blockers": record.current_blockers,
            "role_expectation_gap": record.role_expectation_gap,
            "next_practice_task": record.next_practice_task,
            "success_condition": record.success_condition,
            "support_needed": record.support_needed,
        },
        policy_refs={
            "role_expectation_policy": role_policy.role_id,
        },
        result=result,
        verifier_version="orgos-verifier-0.1.0",
        ordering_key=ordering_key,
    )

    return result, actions, audit_log

That already completes one full cycle:

  • observation input,
  • verification,
  • typed action routing,
  • and audit logging.

Golden cases: grow the verifier, not the prompt

A key point from the determinism framing is that you should stabilize verifier outputs, not LLM phrasing.

That applies here too.

A minimal golden case for a 1:1 verifier could look like this:

{
  "name": "one_on_one_degrade_missing_success_condition",
  "input": {
    "session_id": "sess_001",
    "person_id": "user_001",
    "manager_id": "mgr_001",
    "current_observation": "Dependency mapping is slow",
    "current_blockers": ["Dependencies are not explicit"],
    "role_expectation_gap": ["Weak issue clarification under ambiguity"],
    "next_practice_task": "Create a dependency mapping sheet",
    "success_condition": null
  },
  "expect": {
    "verdict": "DEGRADE",
    "reason_codes": ["missing_success_condition"],
    "missing_fields": ["success_condition"]
  }
}

And a minimal harness can be this small:

from __future__ import annotations

import json
from pathlib import Path


def run_golden_case(path: Path) -> None:
    with path.open("r", encoding="utf-8") as f:
        case = json.load(f)

    input_data = case["input"]
    expected = case["expect"]

    record = OneOnOneRecord(
        session_id=input_data["session_id"],
        person_id=input_data["person_id"],
        manager_id=input_data["manager_id"],
        current_observation=input_data["current_observation"],
        current_blockers=input_data["current_blockers"],
        role_expectation_gap=input_data["role_expectation_gap"],
        next_practice_task=input_data["next_practice_task"],
        success_condition=input_data["success_condition"],
    )

    policy = RoleExpectationPolicy(
        role_id="senior_ic",
        expected_outcomes=("clarify ambiguous issues",),
        must_show_evidence=("review_comment", "design_note"),
    )

    result = verify_one_on_one_record(record, policy)

    assert result.verdict.value == expected["verdict"]
    assert list(result.reason_codes) == expected["reason_codes"]
    assert list(result.missing_fields) == expected["missing_fields"]

This pattern generalizes directly to hiring, evaluation, and AI usage policies.
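One way to make that generalization concrete is to parameterize the harness over the verifier, so hiring, evaluation, and AI-usage golden cases all run through one runner. This is a hedged sketch, not the article's harness: `build_record` and `verify` are assumed callables you would supply per domain, and the result is duck-typed (a `verdict` string and a `reason_codes` list) rather than the `VerificationResult` class used above.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable


def run_golden_cases(
    case_dir: Path,
    build_record: Callable[[dict], object],
    verify: Callable[[object], object],
) -> list[str]:
    """Run every *.json golden case in a directory; return failure messages.

    Each case file is assumed to have "name", "input", and "expect" keys,
    mirroring the golden case shown above.
    """
    failures: list[str] = []
    for path in sorted(case_dir.glob("*.json")):
        case = json.loads(path.read_text(encoding="utf-8"))
        result = verify(build_record(case["input"]))
        expected = case["expect"]
        # Duck-typed comparison: the result only needs verdict and reason_codes.
        if result.verdict != expected["verdict"]:
            failures.append(f"{case['name']}: got verdict {result.verdict}")
        if list(result.reason_codes) != expected["reason_codes"]:
            failures.append(f"{case['name']}: reason codes diverged")
    return failures
```

The point is that the case files, not the runner, carry the domain knowledge: adding a hiring verifier means adding a directory of hiring cases, not a new harness.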

What you gain from this

This is not just about efficiency.

It changes what kind of organizational knowledge you can actually preserve.

It helps with things like:

  • daily reflection and 1:1s not ending as vague conversation,
  • role expectations not remaining fuzzy language,
  • hiring and evaluation becoming easier to explain afterward,
  • AI usage operating with a real boundary between proposal and commit,
  • retrospectives feeding into structural updates,
  • and missing information turning into the next policy update or the next golden case.

In other words:

it becomes easier to convert personal tricks and implicit judgment into replayable organizational knowledge.

Where to start

Do not build everything at once.

A practical sequence is:

Phase 1

Start with:

  • daily_reflection_entry
  • weekly_growth_snapshot
  • one_on_one_record
  • role_expectation_policy
  • verification_result
  • decision_audit_log

That is enough for a growth and 1:1 loop.
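To make Phase 1 concrete, here is one possible shape for the `daily_reflection_entry` record. The field names are illustrative assumptions, chosen to mirror the 1:1 record used earlier; your own schema will differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DailyReflectionEntry:
    """One possible Phase 1 daily_reflection_entry shape (illustrative)."""

    entry_id: str
    person_id: str
    date: str  # ISO date, e.g. "2025-01-15"
    what_happened: str
    blockers: tuple[str, ...] = ()
    next_action: str | None = None  # None is a signal for the verifier, not an error
```

Keeping `next_action` optional but visible is deliberate: a missing next action is exactly the kind of gap a verifier can turn into a DEGRADE rather than letting it vanish into freeform notes.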

Phase 2

Add:

  • decision_lane_policy
  • retro_record
  • gap_register_entry
  • roadmap_decision_memo

That gives you team improvement and decision tracking.

Phase 3

Add:

  • ai_usage_event
  • ai_usage_policy
  • proposal_packet
  • typed_action_plan
  • golden_case

That gives you proposal/verification separation for AI operations too.
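A minimal sketch of what Phase 3's `proposal_packet` might look like, keeping the LLM's proposed actions clearly separate from anything committed. The field names here are assumptions for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProposalPacket:
    """Illustrative proposal_packet: a proposal is data, never an execution."""

    packet_id: str
    source: str  # e.g. "llm" or "human"
    proposed_actions: tuple[dict, ...]  # typed_action_plan entries, not yet committed
    rationale: str
```

Because the packet only carries typed action descriptions, it can flow through the same verifier and routing code as human-authored plans; the proposal/commit boundary lives in the verifier, not in the model.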

What makes it a real MVP

A proof of concept is easy.
A usable MVP needs a few more things.

The order matters.

1. Stronger DEGRADE / REJECT detection

Especially for 1:1s and evaluation text, you want better detection of vague language such as:

  • “show more ownership,”
  • “move things forward properly,”
  • “do it well,”
  • “consult when necessary.”

These phrases often hide missing structure.

A useful verifier should be able to stop them and translate them into concrete failure modes such as:

  • success condition missing,
  • action granularity too coarse,
  • weak connection to the role expectation,
  • impression-based evaluation without evidence.
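A phrase-level check is one simple way to start. This is an illustrative sketch only: the phrase list and the mapping to reason codes are assumptions for demonstration, not a fixed taxonomy, and a production verifier would need more than substring matching.

```python
from __future__ import annotations

# Assumed mapping from vague wording to failure-mode reason codes.
VAGUE_PHRASES: dict[str, str] = {
    "show more ownership": "action_granularity_too_coarse",
    "move things forward properly": "action_granularity_too_coarse",
    "do it well": "missing_success_condition",
    "consult when necessary": "missing_success_condition",
}


def detect_vague_language(text: str) -> list[str]:
    """Return sorted, de-duplicated reason codes for vague phrases in free text."""
    lowered = text.lower()
    return sorted({code for phrase, code in VAGUE_PHRASES.items() if phrase in lowered})
```

Even this crude check gives the verifier something to attach a DEGRADE to, which is the point: vague language becomes a reason code instead of an unexamined impression.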

2. Input UI

Even good schemas fail if input is painful.

You will usually want forms for:

  • daily reflection,
  • weekly reflection,
  • 1:1 records,
  • interview notes,
  • retro notes.

3. Policy editor

If role expectations, decision lanes, hiring scorecards, and AI usage rules only live in code, operating them gets slow.

A lightweight policy editor matters sooner than most teams expect.

4. History and trend analysis

Audit logs are not enough by themselves.

Soon you will want to see:

  • where DEGRADE is frequent,
  • which reason codes are increasing,
  • which role expectation produces repeated friction,
  • which team or phase has the most missing fields.
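Because the audit logs already pin verdicts and reason codes, a first pass at trend analysis can be a small aggregation over the `decision_audit_log` dicts produced earlier. Grouping keys beyond verdict and reason code are assumptions about how you store logs:

```python
from __future__ import annotations

from collections import Counter


def summarize_audit_logs(logs: list[dict]) -> dict:
    """Aggregate decision_audit_log dicts into verdict and reason-code counts."""
    verdicts = Counter(log["verdict"] for log in logs)
    reasons = Counter(code for log in logs for code in log.get("reason_codes", []))
    return {
        "verdicts": dict(verdicts),
        "top_reason_codes": reasons.most_common(5),
    }
```

Running this per team or per role policy is where the interesting signal appears: a reason code that climbs in one team but not another usually points at a policy or expectation problem, not a people problem.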

5. Multi-user roles and permissions

To move from a personal tool to an organizational runtime, you need at least role separation across:

  • individual contributor,
  • manager or mentor,
  • interviewer,
  • evaluator,
  • administrator.

6. Existing workflow integration

Real adoption gets easier once the system connects to existing surfaces like:

  • chat,
  • calendar,
  • docs,
  • GitHub / pull requests,
  • task management,
  • HR tooling.

7. LLM proposal features

This is the eye-catching feature layer:

  • draft 1:1 questions,
  • draft next practice tasks,
  • draft deeper interview questions,
  • draft evaluation text,
  • extract structural retro issues,
  • propose policy changes.

But this should come last.

LLM proposal features are not the foundation of the MVP.
They are the visible layer on top of the foundation.

That foundation is:

  • structured input,
  • editable policy,
  • stable verifier behavior,
  • history and audit.

Without that, the LLM layer is mostly a nice demo.

What should not be automated

This architecture is not for replacing organizational judgment with AI.

It is for strengthening the observation, verification, recording, and re-entry parts of organizational judgment.

That means many high-stakes commitments should remain proposal-only or human-gated:

  • final hiring decisions,
  • final evaluation outcomes,
  • high-risk authority transfer,
  • major organizational restructuring.
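In code, that human gate can be as blunt as a frozen set checked before routing, so that high-stakes action types are never auto-executed regardless of verdict. The action-type names below are illustrative assumptions:

```python
from __future__ import annotations

# Assumed names for action types that must never be auto-committed.
HUMAN_GATED_ACTIONS = frozenset(
    {
        "final_hiring_decision",
        "final_evaluation_outcome",
        "authority_transfer",
        "org_restructuring",
    }
)


def requires_human_commit(action_type: str) -> bool:
    """True if this action type must be routed to a human, never executed directly."""
    return action_type in HUMAN_GATED_ACTIONS
```

A router like `route_actions` above would consult this check first and send gated actions to a manual-review channel, mirroring the `manual_review` fallback in the earlier snippet.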

The value of the organization should remain on the verifier side:

  • what counts as commit,
  • what requires review,
  • what is high risk,
  • what is forbidden,
  • and who has authority.

Final compression

If you want the shortest version of the appendix, it is this:

  • make observation structured,
  • make policy explicit,
  • separate proposal from commit,
  • freeze verifier outputs with golden cases,
  • execute only typed actions,
  • keep the grounds in pinned logs.

That is enough to start building a decision operating system for organizational work.
