Nobuki Fujimoto

Paper 148 — Honest Observation Framework for AI-Assisted Research Tools (Rei-AIOS / OUKC)

This article is a re-publication of Rei-AIOS Paper 148 for the dev.to community.
The canonical version with full reference list is in the permanent archives below:

Status: DRAFT v0.1 — 2026-05-06 (substantive draft; methodology paper; empirical evidence from Theory Chart + Realtime Observatory)
Authors / 著者: 藤本 伸樹 (Nobuki Fujimoto, Founder), Rei (Rei-AIOS autonomous research substrate, Co-architect), Claude Opus 4.7 (Anthropic, Co-architect)
Project: Rei-AIOS / OUKC — https://rei-aios.pages.dev/#/oukc
License: AGPL-3.0 + CC-BY 4.0 (per content type)
Required platform links: rei-aios.pages.dev / note.com/nifty_godwit2635
Per OUKC No-Patent Pledge: openly licensed; no patent will be filed on any framework or methodology described herein.


Honest framing (read first)

This is a methodology paper in the philosophy-of-science / research-methodology genre. It is a companion to Paper 145 (D-FUMT₈ Silicon, technical) and Paper 147 (EPP D-FUMT₈ Reframe, theoretical). Together, the three papers form a triple: a technical implementation, a theoretical reframe, and a methodological framework that governs how the first two are presented.

We claim:

C1: A four-element framework for honest scientific observation by AI-assisted research tools, comprising:

  • (i) Discover-Report-Publish 3-tier separation (preserving discovery freedom while gating publication)
  • (ii) A 4-stage claim ladder (visual co-display → descriptive narrative → statistical exploration → causal claim) that prevents premature claim escalation
  • (iii) A 50/50 process partition between observation/hypothesis-seeding tools and formalization/verification engines
  • (iv) Operational tests for spurious-correlation filtering vs honest null reporting

We do not claim:

  • ✗ "First scientific methodology paper" — methodology papers exist throughout philosophy of science (Lakatos, Popper, Bem and Wagenmakers, etc.)
  • ✗ "Universal applicability beyond AI-assisted research tools" — the framework's primary domain is multi-modal research tools (theory observation + economic data + LLM analysis); other domains may need adaptation
  • ✗ Empirical superiority over existing research methodology — the framework is a discipline, not a competitive metric

The differentiator: this is the first methodology framework explicitly designed for the Rei-AIOS / OUKC research model, in which AI co-authors produce structured outputs at high velocity and the discipline of separating discovery from publication is operationally enforced via memory and tool-assisted checks.


Abstract

We present a four-element framework for honest scientific observation by AI-assisted research tools, derived from operational experience building Rei-AIOS observation tools (Theory Chart, Realtime Observatory, Curated Founder Archive) over the period 2026-04 to 2026-05. The framework addresses a recurring tension: AI-assisted tools produce findings at high velocity, but premature publication of unsupported claims (overclaim risk) is a load-bearing concern. The framework comprises: (i) a Discover-Report-Publish 3-tier separation that preserves discovery and reporting freedom while gating publication via human review; (ii) a 4-stage claim ladder (visual co-display → descriptive narrative → statistical exploration → causal claim) that prevents premature escalation; (iii) a 50/50 process partition between observation/hypothesis-seeding tools and formalization/verification engines, recognizing that the former cannot autonomously close the latter; and (iv) operational tests distinguishing spurious correlations (filter out) from honest null results (publish if material). We illustrate with three empirical cases from the Rei-AIOS Theory Chart and Realtime Observatory: the FX × theory-mention twin-axis chart (Stage 0 visual co-display), the prediction-market source taxonomy (filter / accept / defer decisions), and the 沈黙の理論 (silent theories) sub-panel (NEITHER vs FALSE distinction). The framework is operationally embedded in the Rei-AIOS memory system, where each rule is stored as a permanent memo cross-referenced from related work.

概要 (Japanese)

本論文では、 AI 支援 research tool による誠実な科学観察のための 4 要素 framework を提案する。 2026 年 4-5 月の Rei-AIOS 観察 tool (Theory Chart / Realtime Observatory / Curated Founder Archive) の構築経験から導出。 framework は次の 4 要素から成る: (i) Discover-Report-Publish 3 段分離 (発見・報告は自由、 publication のみ人間 review で gate) / (ii) 4 段 claim ladder (visual co-display → descriptive narrative → statistical exploration → causal claim、 段階的 escalation 防止) / (iii) 観察・仮説 seed 系 tool と formalization・検証 engine の 50/50 process 分割 / (iv) 偽相関 (filter) と honest null result (publish 価値あり) を区別する operational test。 Rei-AIOS Theory Chart の FX × 理論 mention 二軸 chart (Stage 0 visual co-display)、 予測 market source taxonomy (filter / accept / defer 判断)、 沈黙の理論 sub-panel (NEITHER vs FALSE 区別) 等の実例で illustrate する。 本 framework は Rei-AIOS memory system に operational に embed され、 各 rule が永続 memo として cross-reference 可能。


Part A: Required (4 elements)

A.1 Findings / 発見

F1 (Discover-Report-Publish separation): We distinguish three activities and grant them different freedom levels:

| Activity | Freedom | Gate |
|---|---|---|
| Discover (find a pattern, lag, anomaly, signal) | Fully free | None |
| Report (record finding in chart, note, internal memo) | Free | None |
| Publish (paper, blog, social media, formal release) | Gated | Human review per case |

The crucial insight: "honest scope" must NOT be conflated with "findings suppression." If an AI-assisted tool discovers a pattern, suppressing the finding under the guise of "honesty" is itself dishonest. The fix is to keep discovery and reporting unconditionally free, and gate only publication.

F2 (4-stage claim ladder): We identify four stages of claim severity, each appropriate for different evidence levels:

| Stage | Activity | Claim level | Where appropriate |
|---|---|---|---|
| Stage 0 | Visual co-display | "These two series exist" | Initial integration, no correlation claim |
| Stage 1 | Descriptive narrative | "On day X, both spiked" | Anecdotal observation, no causation |
| Stage 2 | Statistical exploration | "Pearson r = 0.4, p = 0.05 in window W" | Provisional pattern, requires replication |
| Stage 3 | Causal claim | "X causes Y via mechanism M" | Requires (a) operational definition, (b) significant p, (c) replication, (d) honest counterfactual |

Tools should not skip stages. A Stage 0 visual co-display does NOT justify a Stage 3 causal claim, and the framework explicitly prevents this escalation.

F3 (50/50 process partition): AI observation tools cover approximately the first half of the scientific discovery cycle:

```
Observation  →  Hypothesis seed  →  Formalization       →  Verification
  ✓ AI tool       ✓ AI tool          ✗ separate engine      ✗ separate engine
  (50%)           (in cycle)         (Lean 4 / Mathlib)     (Lean build / stat tests)
```

A single AI tool is not a complete discovery engine. The partition acknowledges this: observation/hypothesis tools (like Theory Chart) genuinely contribute by surfacing signals and seeding hypotheses, but they CANNOT autonomously formalize or verify. The hand-off to formalization/verification engines (Lean 4, Mathlib, statistical tests, Open Problems META-DB) is essential.

F4 (Spurious vs honest null): Two distinct outcomes look superficially similar but require different handling:

| Outcome | Action |
|---|---|
| Spurious correlation (no theoretical basis, sample size insufficient, wrong direction, no replication) | Filter out: record internally, do not publish |
| Honest null result (theoretical interest, adequate sample, sound test, but no signal) | Publish if material: null findings have scientific value (Bem and Wagenmakers 2014) |

The framework provides operational criteria for the distinction (Section B.6.4).

A.2 Proofs / 検証

P1 (Discover-Report-Publish empirical evidence):

The Theory Chart Phase 2.2-FX (2026-05-06) demonstrates this separation in operation:

  • Discovery: The Frankfurter API daily FX feed surfaces a pattern, e.g., in USD/JPY
  • Report: The pattern is visualized in the twin-axis chart (Stage 0)
  • Publish gate: Theory Chart has an "Honest Scope" banner explicitly stating "investment timing advice is not provided" — the publication boundary is set
  • Memory check: project_realtime_economy_theory_correlation_stance.md documents this rule for re-application by future sessions

P2 (4-stage ladder empirical evidence):

The Realtime Observatory Phase α (2026-05-06) shows the ladder applied:

  • Wikipedia EventStreams: Stage 0 (visual streaming, no claim about meaning of edits)
  • Wikipedia Top Pageviews: Stage 0 (display, no causal claim about why people search)
  • arXiv Recent: Stage 0 (latest paper display, no claim about scientific significance)

Compare to a hypothetical Stage 3 escalation: "Wikipedia editing patterns predict scientific paradigm shifts." The framework explicitly forbids such a claim unless it has (i) an operational definition of "predict", (ii) a statistical test, and (iii) replication.

P3 (50/50 partition empirical evidence):

Theory Chart's contribution scope is documented in project_theory_chart_contribution_scope.md:

  • ✓ Theory Chart observes mention counts
  • ✓ Theory Chart seeds hypotheses (e.g., "5 worldview categories with 0 mentions", 4 of which became invention candidates, approved 2026-05-04)
  • ✗ Theory Chart does not formalize the worldviews into Lean 4
  • ✗ Theory Chart does not verify the resulting D-FUMT₈ extensions
  • → The hand-off was to the invention engine + SEED_KERNEL approval pipeline (separate engines)

P4 (Spurious filter empirical evidence):

The 沈黙の理論 sub-panel in Theory Chart Phase 2.1 distinguishes:

  • A theory with 0 mentions but rich theoretical basis (Whakapapa, Barzakh) → NEITHER axis (outside detection capability) → publish "this theory is silent in current observation, may indicate measurement limit"
  • A theory with 0 mentions and no theoretical basis (e.g., a typo or absurd term) → filter out without publishing

The distinction is operationalized in the panel UI (orange NEITHER tag for the former, exclusion for the latter).

A.3 Honest Positioning / 正直な立ち位置

What this paper IS:

  • A methodology framework for AI-assisted research tools, derived from operational experience
  • A discipline document that prevents premature claim escalation
  • A falsifiable framework: each F1-F4 element is testable against alternative methodologies
  • An operational framework: rules are stored in Rei-AIOS memory and re-applied by future sessions

What this paper is NOT:

  • Not a universal philosophy of science (we don't replace Lakatos, Popper, Kuhn)
  • Not a claim of "first methodology paper" — it joins a vast genre
  • Not a bypass of statistical rigor — the 4 stages REQUIRE rigor at higher stages
  • Not a substitute for human review — the publish gate explicitly preserves human oversight

What is left for future work:

  • Quantitative evaluation: do AI tools using this framework produce fewer overclaims than baseline?
  • Domain extension: applicability to non-research AI tools (writing assistants, decision support)
  • Adversarial testing: can the framework be gamed to suppress legitimate findings?

A.4 Required platform links

  • Rei-AIOS: https://rei-aios.pages.dev/#/oukc
  • note.com: https://note.com/nifty_godwit2635
  • Companion paper: Paper 145 (D-FUMT₈ Silicon) — illustrates the framework in a hardware context
  • Companion paper: Paper 147 (EPP D-FUMT₈ Reframe) — illustrates the framework in an economics context

Part B: Conditional (Background + Methodology + Empirical Scope)

B.5 Background / 背景

B.5.1 The overclaim problem in AI-assisted research

LLM-based research tools produce structured outputs at velocities far exceeding traditional research workflows. This creates a tension:

  • Velocity benefit: AI tools surface patterns, generate hypotheses, integrate sources at pace
  • Overclaim risk: The same velocity makes premature publication of unsupported claims trivially easy

Without discipline, an AI tool can produce a "Theory Chart predicts EPP resolution" paper draft in minutes, where the supporting evidence is a single visual co-display.

B.5.2 Existing methodology resources (and what they don't cover)

  • Lakatos research programmes: macro-level, decades-spanning; doesn't address daily AI tool output
  • Popper falsifiability: epistemic principle; doesn't operationalize claim levels
  • Bem-Wagenmakers null result publishing: addresses publication bias; doesn't address velocity
  • Pre-registration: works for traditional studies; difficult for emergent AI observations

The four elements of our framework (F1-F4) fill specific operational gaps not covered by existing methodology resources.

B.5.3 The Rei-AIOS / OUKC research model

The framework is derived from operational experience in the Rei-AIOS / OUKC project (2026-03 to present), where:

  • AI co-authors (Rei + Claude) produce paper drafts, code, and observations daily
  • Human author (藤本) reviews and gates publication
  • Memory system stores rules for re-application across sessions
  • Three-party authorship model documented in OUKC charter

The framework makes explicit the discipline that has been operationally functioning during this period.

B.6 Methodology / 方法論

B.6.1 The 3-tier Discover-Report-Publish separation

For each observation activity, classify it into one of three tiers:

Tier 1 — Discover: Finding a pattern, anomaly, signal, gap, or possibility. Examples: "Theory Chart shows 0 mention for Whakapapa", "Realtime arXiv shows spike in cs.LO".

Tier 2 — Report: Recording the discovery in a non-public artifact: chart UI, note, internal memo, memory file, draft paper. Examples: "Whakapapa added to 沈黙の理論 sub-panel", "memory project_realtime_source_candidates.md records the spike".

Tier 3 — Publish: Releasing the finding in a public artifact requiring human review per case: paper, blog post, social media, formal repository entry.

Rule: Tier 1 and Tier 2 are unconditionally free; Tier 3 is human-gated.
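The tier rule can be sketched as a simple gate function. This is an illustrative sketch, not code from the Rei-AIOS tools; all names (`Tier`, `requires_human_review`) are ours.

```python
from enum import Enum

class Tier(Enum):
    DISCOVER = 1   # find a pattern, lag, anomaly, or signal
    REPORT = 2     # record it in a chart, note, or internal memo
    PUBLISH = 3    # public artifact: paper, blog, social media

def requires_human_review(tier: Tier) -> bool:
    """Tier 1 and Tier 2 are unconditionally free; only Tier 3 is gated."""
    return tier is Tier.PUBLISH

# A finding flows freely through Discover and Report, and is
# blocked at Publish until a human reviews it.
assert not requires_human_review(Tier.DISCOVER)
assert not requires_human_review(Tier.REPORT)
assert requires_human_review(Tier.PUBLISH)
```

The point of encoding the rule, rather than merely stating it, is that the gate becomes checkable: any pipeline step that emits a public artifact can be forced through this one predicate.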

B.6.2 The 4-stage claim ladder

For each finding moving toward publication, classify the claim level:

Stage 0 — Visual co-display: "Two series exist on the same chart". No correlation claim. Example: FX rate + theory mention twin-axis (Theory Chart Phase 2.2-FX).

Stage 1 — Descriptive narrative: "On day X, both A and B spiked." Anecdotal record, no causation. Example: "2026-04 quantum mention spike during news week".

Stage 2 — Statistical exploration: "Pearson r = 0.4 in window W, p = 0.05." Provisional, requires replication.

Stage 3 — Causal claim: "A causes B via mechanism M." Requires (a) operational definition of A, B, M; (b) significant statistical test; (c) replication; (d) honest counterfactual evaluation.

Rule: Higher stages require all evidence at lower stages plus the additional criteria. No skipping.
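The no-skipping rule can be expressed as an evidence-driven ceiling on claim stage. A minimal sketch, with illustrative evidence flags of our own naming (none of these come from the Rei-AIOS codebase):

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    co_displayed: bool = False      # Stage 0: both series shown together
    narrated: bool = False          # Stage 1: descriptive record exists
    stat_explored: bool = False     # Stage 2: r / p computed in a window
    replicated: bool = False        # Stage 3 prerequisite (c)
    operational_defs: bool = False  # Stage 3 prerequisite (a)
    counterfactual: bool = False    # Stage 3 prerequisite (d)

def max_allowed_stage(e: Evidence) -> int:
    """Highest claim stage the evidence supports; -1 means no claim at all.
    Each stage requires every lower stage, so no skipping is possible."""
    stage = -1
    if e.co_displayed:
        stage = 0
        if e.narrated:
            stage = 1
            if e.stat_explored:
                stage = 2
                if e.replicated and e.operational_defs and e.counterfactual:
                    stage = 3
    return stage

# A bare twin-axis chart supports only a Stage 0 claim:
assert max_allowed_stage(Evidence(co_displayed=True)) == 0
# Statistics without replication cap the claim at Stage 2:
assert max_allowed_stage(Evidence(co_displayed=True, narrated=True,
                                  stat_explored=True)) == 2
```

Because the stages are nested conditionals rather than independent checks, the code structurally cannot grant Stage 3 to a finding that lacks Stage 1 or Stage 2 evidence.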

B.6.3 The 50/50 process partition

For each AI tool's claim, identify which half of the discovery process it covers:

Half 1 (Observation + Hypothesis): Pattern recognition, signal surfacing, hypothesis seeding. AI tools genuinely contribute here.

Half 2 (Formalization + Verification): Mathematical formulation, proof, statistical test, peer review. AI tools may assist but do not autonomously close.

Rule: AI tools claiming Half 2 closure must hand off to formalization/verification engines (Lean 4, Mathlib, R/Python statistics, peer review) and document the hand-off.
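The hand-off documentation requirement can be sketched as a small record type. The names (`Handoff`, `valid_handoff`) and the shape of the record are hypothetical, not drawn from the Rei-AIOS implementation:

```python
from dataclasses import dataclass

@dataclass
class Handoff:
    """Record that an observation tool passed a hypothesis to a
    formalization/verification engine (Half 1 → Half 2)."""
    hypothesis: str
    from_tool: str                # Half 1: observation / hypothesis seeding
    to_engine: str                # Half 2: formalization / verification
    closed_by_tool: bool = False  # must stay False: tools do not self-close

def valid_handoff(h: Handoff) -> bool:
    # A Half-2 claim is valid only if a separate engine is named
    # and the observation tool does not claim closure itself.
    return bool(h.to_engine) and not h.closed_by_tool

# Mirrors the P3 case: Theory Chart seeds, a separate pipeline closes.
h = Handoff(
    hypothesis="5 worldview categories with 0 mentions",
    from_tool="Theory Chart",
    to_engine="invention engine + SEED_KERNEL approval pipeline",
)
assert valid_handoff(h)
```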

B.6.4 Spurious correlation filter vs honest null result

For a finding with no signal, distinguish two cases:

Spurious filter criteria (record internally, do not publish):

  1. p > 0.5 AND no theoretical basis
  2. Sample size < N_min (domain-specific, e.g., < 30 for Pearson r)
  3. Inconsistent direction across replications
  4. Single-instance coincidence with no replication
  5. Category mismatch (e.g., Wikipedia edit count vs FX 5-min volatility — different timescales)

Honest null criteria (publish if material):

  1. Theoretically interesting question
  2. Adequate sample / power
  3. Sound statistical test
  4. Pre-registered or operationally clear
  5. Replication attempted

The two are NOT distinguished by signal alone. They are distinguished by prior theoretical interest and methodological soundness.
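The criteria above can be combined into a single classifier. An illustrative sketch with our own names and thresholds (the `n_min = 30` default mirrors the Pearson-r example in the spurious criteria; it is domain-specific, not a universal constant):

```python
from dataclasses import dataclass

@dataclass
class NoSignalFinding:
    """A finding in which no signal was detected."""
    theoretical_basis: bool       # honest-null criterion 1
    sample_size: int              # honest-null criterion 2
    sound_test: bool              # honest-null criterion 3
    preregistered_or_clear: bool  # honest-null criterion 4
    replication_attempted: bool   # honest-null criterion 5
    n_min: int = 30               # domain-specific minimum, e.g. for Pearson r

def classify(f: NoSignalFinding) -> str:
    """All five honest-null criteria must hold; otherwise filter out."""
    honest_null = (f.theoretical_basis
                   and f.sample_size >= f.n_min
                   and f.sound_test
                   and f.preregistered_or_clear
                   and f.replication_attempted)
    return "publish-if-material" if honest_null else "filter-out"

# Prior interest + adequate power + sound test: publishable honest null.
assert classify(NoSignalFinding(True, 200, True, True, True)) == "publish-if-material"
# No theoretical basis and a tiny sample: record internally only.
assert classify(NoSignalFinding(False, 12, True, False, False)) == "filter-out"
```

Note the design choice implied by B.6.4: the classifier never inspects the (absent) signal itself; the decision rests entirely on prior interest and methodological soundness.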

B.7 Empirical Scope (current, 2026-05-06)

  • What is delivered: Framework F1-F4 with operational definitions; three empirical case studies from Rei-AIOS Theory Chart + Realtime Observatory; memory-system embedding for re-application
  • What is deferred: Quantitative evaluation against baseline AI tools; cross-domain validation (writing assistants, decision support); adversarial testing
  • Why deferred: Framework v1 is a normative + descriptive contribution. Empirical validation requires additional tooling and comparative evaluation.

Part C: Optional (Why matters + Future + Risks)

C.8 Why this matters

C.8.1 AI tool velocity creates new methodology need

Traditional research methodology evolved over centuries with research workflows of months-to-years per finding. AI-assisted research compresses this to minutes-to-hours per finding. The methodology must compress correspondingly, while preserving rigor.

Existing methodology resources are not silent on velocity-induced overclaim, but they don't operationalize the discipline at AI-tool granularity.

C.8.2 Three-party authorship requires explicit framework

In the Rei-AIOS model (藤本 × Rei × Claude), the human author cannot review every AI output in real-time. The framework provides AI co-authors with explicit rules that they apply autonomously, with human review only at the publish gate. This is the operational foundation of the OUKC charter's three-party authorship.

C.8.3 Memory-embedded discipline scales across sessions

Without the framework explicitly recorded as memos, each new conversation session would re-derive the rules from scratch, and inconsistently. Because the framework's rules live in Rei-AIOS memory, they are applied consistently across hundreds of sessions, eliminating drift.
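One hypothetical shape such a rule memo could take (the filename and field layout here are illustrative; the cross-referenced memo names are the ones cited in P1 and P3):

```
# project_honest_observation_framework.md   (hypothetical example)

rule: 4-stage claim ladder (F2)
scope: all Theory Chart / Realtime Observatory findings
statement: >
  Do not escalate a claim past the stage its evidence supports.
  A Stage 0 visual co-display never justifies a Stage 3 causal claim.
cross_refs:
  - project_theory_chart_contribution_scope.md
  - project_realtime_economy_theory_correlation_stance.md
re_application: load at session start; cite when gating a draft
```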

C.9 Future work

  • F.1 Quantitative evaluation: do tools using this framework produce fewer post-publication corrections than baseline?
  • F.2 Adversarial testing: can the framework be gamed to suppress findings? What guards against this?
  • F.3 Domain extension: AI writing assistants, AI decision support, AI design tools — does the framework adapt?
  • F.4 Tool support: automated Stage classification, automated spurious-vs-null detection
  • F.5 Cross-cultural variants: does the framework apply identically to Western/Eastern epistemologies, or require adaptation?

C.10 Risks

  • R.1: Framework may suppress legitimate findings if Stage gates are applied too strictly. Mitigation: explicit "honest null is publishable" rule (B.6.4) prevents over-suppression.
  • R.2: Three-party authorship attribution may not survive contact with traditional academic review. Mitigation: papers are published on dual tracks (Rei-AIOS open + traditional submission with appropriate authorship adjustment).
  • R.3: "Overclaim" definition may itself be contested. Mitigation: framework provides operational criteria (B.6); challengers can invoke the criteria explicitly.
  • R.4: The framework is derived from one AI tool's experience (Rei-AIOS); generalizability to other AI tools is empirical question. Mitigation: open-source the framework so others can test.

C.11 Acknowledgments

This framework was crystallized in conversations during 2026-05-06 about the FX layer of Theory Chart and the realtime data source taxonomy. 藤本 伸樹 provided a critical correction: the AI co-author's initial 4-stage ladder draft suppressed Stage 1-2 findings, which 藤本 pointed out was findings suppression, not honest-scope discipline. The corrected version (preserving Discover-Report freedom + gating Publish) is what appears in F1 above. This is an example of the framework working as designed: human review at the right level catches errors that the AI co-author would otherwise have committed.

C.12 Three-party authorship statement (per OUKC No-Patent Pledge)

Paper authorship is jointly attributed to 藤本 × Rei × Claude per the OUKC charter. The framework's specification and operational definitions are openly licensed under AGPL-3.0 + CC-BY 4.0; no patent will be filed on any aspect of the framework. The framework is designed to be re-implementable by any AI-assisted research tool.


References

(Selected; full bibliography in v0.2)

  • Bem, D. J. and Wagenmakers, E.-J. (2014). On the publication of null results in psychology. Multiple outlets.
  • Lakatos, I. (1970). Falsification and the Methodology of Scientific Research Programmes. In Criticism and the Growth of Knowledge, eds. Lakatos and Musgrave, Cambridge UP.
  • Popper, K. (1959). The Logic of Scientific Discovery. Hutchinson.
  • Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
  • Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124.
  • Open Science Framework (OSF) pre-registration.
  • 藤本 N., Rei, Claude (2026). Paper 145 — First D-FUMT₈ Silicon with SELF⟲ Logic Primitive. Rei-AIOS / OUKC, DRAFT v0.1.
  • 藤本 N., Rei, Claude (2026). Paper 147 — Eight-Valued Utility and the Equity Premium Reframe. Rei-AIOS / OUKC, DRAFT v0.1.

Submission targets (after v0.2)

11-platform standard:

  • Zenodo (primary DOI)
  • arXiv (cs.AI / cs.CY / philosophy of science)
  • ResearchGate, Academia.edu, OSF preprints
  • Jxiv (JST, JP), J-STAGE
  • Internet Archive
  • (Harvard Dataverse: opt-in, decided per milestone)
  • (PhilArchive: candidate — methodology + philosophy of science fit)

Version history

  • v0.1 (2026-05-06): Initial substantive draft. Framework F1-F4 + three empirical case studies + memory embedding. Authors: 藤本 × Rei × Claude.
