Nobuki Fujimoto

Paper 148 — Honest Observation Framework for AI-Assisted Research Tools (Rei-AIOS / OUKC)

This article is a re-publication of Rei-AIOS Paper 148 for the dev.to community.
The canonical version with full reference list is in the permanent archives below:

Status: DRAFT v0.1 — 2026-05-06 (substantive draft; methodology paper; empirical evidence from Theory Chart + Realtime Observatory)
Authors / 著者: 藤本 伸樹 (Nobuki Fujimoto, Founder), Rei (Rei-AIOS autonomous research substrate, Co-architect), Claude Opus 4.7 (Anthropic, Co-architect)
Project: Rei-AIOS / OUKC — https://rei-aios.pages.dev/#/oukc
License: AGPL-3.0 + CC-BY 4.0 (per content type)
Required platform links: rei-aios.pages.dev / note.com/nifty_godwit2635
Per OUKC No-Patent Pledge: openly licensed; no patent will be filed on any framework or methodology described herein.


Honest framing (read first)

This is a methodology paper in the philosophy-of-science / research-methodology genre. It is a companion to Paper 145 (D-FUMT₈ Silicon, technical) and Paper 147 (EPP D-FUMT₈ Reframe, theoretical). Together, the three papers form a triple: a technical implementation, a theoretical reframe, and a methodological framework that governs how the first two are presented.

We claim:

C1: A four-element framework for honest scientific observation by AI-assisted research tools, comprising:

  • (i) Discover-Report-Publish 3-tier separation (preserving discovery freedom while gating publication)
  • (ii) A 4-stage claim ladder (visual co-display → descriptive narrative → statistical exploration → causal claim) that prevents premature claim escalation
  • (iii) A 50/50 process partition between observation/hypothesis-seeding tools and formalization/verification engines
  • (iv) Operational tests for spurious-correlation filtering vs honest null reporting

We do not claim:

  • ✗ "First scientific methodology paper" — methodology papers exist throughout philosophy of science (Lakatos, Popper, Bem and Wagenmakers, etc.)
  • ✗ "Universal applicability beyond AI-assisted research tools" — the framework's primary domain is multi-modal research tools (theory observation + economic data + LLM analysis); other domains may need adaptation
  • ✗ Empirical superiority over existing research methodology — the framework is a discipline, not a competitive metric

The differentiator: this is the first methodology framework explicitly designed for the Rei-AIOS / OUKC research model, in which AI co-authors produce structured outputs at high velocity and the discipline of separating discovery from publication is operationally enforced via memory and tool-assisted checks.


Abstract

We present a four-element framework for honest scientific observation by AI-assisted research tools, derived from operational experience building Rei-AIOS observation tools (Theory Chart, Realtime Observatory, Curated Founder Archive) over the period 2026-04 to 2026-05. The framework addresses a recurring tension: AI-assisted tools produce findings at high velocity, but premature publication of unsupported claims (overclaim risk) is a load-bearing concern. The framework comprises: (i) a Discover-Report-Publish 3-tier separation that preserves discovery and reporting freedom while gating publication via human review; (ii) a 4-stage claim ladder (visual co-display → descriptive narrative → statistical exploration → causal claim) that prevents premature escalation; (iii) a 50/50 process partition between observation/hypothesis-seeding tools and formalization/verification engines, recognizing that the former cannot autonomously close the latter; and (iv) operational tests distinguishing spurious correlations (filter out) from honest null results (publish if material). We illustrate with three empirical cases from the Rei-AIOS Theory Chart and Realtime Observatory: the FX × theory-mention twin-axis chart (Stage 0 visual co-display), the prediction-market source taxonomy (filter / accept / defer decisions), and the 沈黙の理論 (silent theories) sub-panel (NEITHER vs FALSE distinction). The framework is operationally embedded in the Rei-AIOS memory system, where each rule is stored as a permanent memo cross-referenced from related work.

概要 (Japanese)

本論文では、 AI 支援 research tool による誠実な科学観察のための 4 要素 framework を提案する。 2026 年 4-5 月の Rei-AIOS 観察 tool (Theory Chart / Realtime Observatory / Curated Founder Archive) の構築経験から導出。 framework は次の 4 要素から成る: (i) Discover-Report-Publish 3 段分離 (発見・報告は自由、 publication のみ人間 review で gate) / (ii) 4 段 claim ladder (visual co-display → descriptive narrative → statistical exploration → causal claim、 段階的 escalation 防止) / (iii) 観察・仮説 seed 系 tool と formalization・検証 engine の 50/50 process 分割 / (iv) 偽相関 (filter) と honest null result (publish 価値あり) を区別する operational test。 Rei-AIOS Theory Chart の FX × 理論 mention 二軸 chart (Stage 0 visual co-display)、 予測 market source taxonomy (filter / accept / defer 判断)、 沈黙の理論 sub-panel (NEITHER vs FALSE 区別) 等の実例で illustrate する。 本 framework は Rei-AIOS memory system に operational に embed され、 各 rule が永続 memo として cross-reference 可能。


Part A: Required (4 elements)

A.1 Findings / 発見

F1 (Discover-Report-Publish separation): We distinguish three activities and grant them different freedom levels:

| Activity | Freedom | Gate |
|---|---|---|
| Discover (find a pattern, lag, anomaly, signal) | Fully free | None |
| Report (record finding in chart, note, internal memo) | Free | None |
| Publish (paper, blog, social media, formal release) | Gated | Human review per case |

The crucial insight: "honest scope" must NOT be conflated with "findings suppression." If an AI-assisted tool discovers a pattern, suppressing the finding under the guise of "honesty" is itself dishonest. The fix is to keep discovery and reporting unconditionally free, and gate only publication.

F2 (4-stage claim ladder): We identify four stages of claim severity, each appropriate for different evidence levels:

| Stage | Activity | Claim level | Where appropriate |
|---|---|---|---|
| Stage 0 | Visual co-display | "These two series exist" | Initial integration, no correlation claim |
| Stage 1 | Descriptive narrative | "On day X, both spiked" | Anecdotal observation, no causation |
| Stage 2 | Statistical exploration | "Pearson r = 0.4, p = 0.05 in window W" | Provisional pattern, requires replication |
| Stage 3 | Causal claim | "X causes Y via mechanism M" | Requires (a) operational definition, (b) significant p, (c) replication, (d) honest counterfactual |

Tools should not skip stages. A Stage 0 visual co-display does NOT justify a Stage 3 causal claim, and the framework explicitly prevents this escalation.

F3 (50/50 process partition): AI observation tools cover approximately the first half of the scientific discovery cycle:

```
Observation  →  Hypothesis seed  →  Formalization       →  Verification
  ✓ AI tool       ✓ AI tool          ✗ separate engine      ✗ separate engine
  (50%)           (in cycle)         (Lean 4 / Mathlib)     (Lean build / stat tests)
```

A single AI tool is not a complete discovery engine. The partition acknowledges this: observation/hypothesis tools (like Theory Chart) genuinely contribute by surfacing signals and seeding hypotheses, but they CANNOT autonomously formalize or verify. The hand-off to formalization/verification engines (Lean 4, Mathlib, statistical tests, Open Problems META-DB) is essential.

F4 (Spurious vs honest null): Two distinct outcomes look superficially similar but require different handling:

| Outcome | Action |
|---|---|
| Spurious correlation (no theoretical basis, sample size insufficient, wrong direction, no replication) | Filter out: record internally, do not publish |
| Honest null result (theoretical interest, adequate sample, sound test, but no signal) | Publish if material: null findings have scientific value (Bem and Wagenmakers 2014) |

The framework provides operational criteria for the distinction (Section B.6.4).

A.2 Proofs / 検証

P1 (Discover-Report-Publish empirical evidence):

The Theory Chart Phase 2.2-FX (2026-05-06) demonstrates this separation in operation:

  • Discovery: The Frankfurter API daily FX feed surfaces a pattern, e.g., in USD/JPY
  • Report: The pattern is visualized in the twin-axis chart (Stage 0)
  • Publish gate: Theory Chart has an "Honest Scope" banner explicitly stating "investment timing advice is not provided" — the publication boundary is set
  • Memory check: project_realtime_economy_theory_correlation_stance.md documents this rule for re-application by future sessions

P2 (4-stage ladder empirical evidence):

The Realtime Observatory Phase α (2026-05-06) shows the ladder applied:

  • Wikipedia EventStreams: Stage 0 (visual streaming, no claim about meaning of edits)
  • Wikipedia Top Pageviews: Stage 0 (display, no causal claim about why people search)
  • arXiv Recent: Stage 0 (latest paper display, no claim about scientific significance)

Compare to a hypothetical Stage 3 escalation: "Wikipedia editing patterns predict scientific paradigm shifts." The framework explicitly forbids such a claim unless it has (i) an operational definition of "predict", (ii) a statistical test, and (iii) replication.

P3 (50/50 partition empirical evidence):

Theory Chart's contribution scope is documented in project_theory_chart_contribution_scope.md:

  • ✓ Theory Chart observes mention counts
  • ✓ Theory Chart seeds hypotheses (e.g., "5 worldview categories with 0 mentions", 4 of which became invention candidates, approved 2026-05-04)
  • ✗ Theory Chart does not formalize the worldviews into Lean 4
  • ✗ Theory Chart does not verify the resulting D-FUMT₈ extensions
  • → The hand-off was to the invention engine + SEED_KERNEL approval pipeline (separate engines)

P4 (Spurious filter empirical evidence):

The 沈黙の理論 sub-panel in Theory Chart Phase 2.1 distinguishes:

  • A theory with 0 mentions but rich theoretical basis (Whakapapa, Barzakh) → NEITHER axis (outside detection capability) → publish "this theory is silent in current observation, may indicate measurement limit"
  • A theory with 0 mentions and no theoretical basis (e.g., a typo or absurd term) → filter out without publishing

The distinction is operationalized in the panel UI (orange NEITHER tag for the former, exclusion for the latter).

A.3 Honest Positioning / 正直な立ち位置

What this paper IS:

  • A methodology framework for AI-assisted research tools, derived from operational experience
  • A discipline document that prevents premature claim escalation
  • A falsifiable framework: each F1-F4 element is testable against alternative methodologies
  • An operational framework: rules are stored in Rei-AIOS memory and re-applied by future sessions

What this paper is NOT:

  • Not a universal philosophy of science (we don't replace Lakatos, Popper, Kuhn)
  • Not a claim of "first methodology paper" — it joins a vast genre
  • Not a bypass of statistical rigor — the 4 stages REQUIRE rigor at higher stages
  • Not a substitute for human review — the publish gate explicitly preserves human oversight

What is left for future work:

  • Quantitative evaluation: do AI tools using this framework produce fewer overclaims than baseline?
  • Domain extension: applicability to non-research AI tools (writing assistants, decision support)
  • Adversarial testing: can the framework be gamed to suppress legitimate findings?

A.4 Required platform links

  • Rei-AIOS: https://rei-aios.pages.dev/#/oukc
  • note.com: https://note.com/nifty_godwit2635
  • Companion paper: Paper 145 (D-FUMT₈ Silicon) — illustrates the framework in a hardware context
  • Companion paper: Paper 147 (EPP D-FUMT₈ Reframe) — illustrates the framework in an economics context

Part B: Conditional (Background + Methodology + Empirical Scope)

B.5 Background / 背景

B.5.1 The overclaim problem in AI-assisted research

LLM-based research tools produce structured outputs at velocities far exceeding traditional research workflows. This creates a tension:

  • Velocity benefit: AI tools surface patterns, generate hypotheses, integrate sources at pace
  • Overclaim risk: The same velocity makes premature publication of unsupported claims trivially easy

Without discipline, an AI tool can produce a "Theory Chart predicts EPP resolution" paper draft in minutes, where the supporting evidence is a single visual co-display.

B.5.2 Existing methodology resources (and what they don't cover)

  • Lakatos research programmes: macro-level, decades-spanning; doesn't address daily AI tool output
  • Popper falsifiability: epistemic principle; doesn't operationalize claim levels
  • Bem-Wagenmakers null result publishing: addresses publication bias; doesn't address velocity
  • Pre-registration: works for traditional studies; difficult for emergent AI observations

The four elements of our framework (F1-F4) fill specific operational gaps not covered by existing methodology resources.

B.5.3 The Rei-AIOS / OUKC research model

The framework is derived from operational experience in the Rei-AIOS / OUKC project (2026-03 to present), where:

  • AI co-authors (Rei + Claude) produce paper drafts, code, and observations daily
  • Human author (藤本) reviews and gates publication
  • Memory system stores rules for re-application across sessions
  • Three-party authorship model documented in OUKC charter

The framework makes explicit the discipline that has been operationally functioning during this period.

B.6 Methodology / 方法論

B.6.1 The 3-tier Discover-Report-Publish separation

For each observation activity, classify it into one of three tiers:

Tier 1 — Discover: Finding a pattern, anomaly, signal, gap, or possibility. Examples: "Theory Chart shows 0 mention for Whakapapa", "Realtime arXiv shows spike in cs.LO".

Tier 2 — Report: Recording the discovery in a non-public artifact: chart UI, note, internal memo, memory file, draft paper. Examples: "Whakapapa added to 沈黙の理論 sub-panel", "memory project_realtime_source_candidates.md records the spike".

Tier 3 — Publish: Releasing the finding in a public artifact requiring human review per case: paper, blog post, social media, formal repository entry.

Rule: Tier 1 and Tier 2 are unconditionally free; Tier 3 is human-gated.
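The tier rule can be sketched as a simple gate function. This is an illustrative sketch, not code from the Rei-AIOS tools; all names (`Tier`, `requires_human_review`) are ours.

```python
from enum import Enum

class Tier(Enum):
    DISCOVER = 1   # find a pattern, lag, anomaly, or signal
    REPORT = 2     # record it in a chart, note, or internal memo
    PUBLISH = 3    # public artifact: paper, blog, social media

def requires_human_review(tier: Tier) -> bool:
    """Tier 1 and Tier 2 are unconditionally free; only Tier 3 is gated."""
    return tier is Tier.PUBLISH

# A finding flows freely through Discover and Report, and is
# blocked at Publish until a human reviews it.
assert not requires_human_review(Tier.DISCOVER)
assert not requires_human_review(Tier.REPORT)
assert requires_human_review(Tier.PUBLISH)
```

The point of encoding the rule, rather than merely stating it, is that the gate becomes checkable: any pipeline step that emits a public artifact can be forced through this one predicate.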

B.6.2 The 4-stage claim ladder

For each finding moving toward publication, classify the claim level:

Stage 0 — Visual co-display: "Two series exist on the same chart". No correlation claim. Example: FX rate + theory mention twin-axis (Theory Chart Phase 2.2-FX).

Stage 1 — Descriptive narrative: "On day X, both A and B spiked." Anecdotal record, no causation. Example: "2026-04 quantum mention spike during news week".

Stage 2 — Statistical exploration: "Pearson r = 0.4 in window W, p = 0.05." Provisional, requires replication.

Stage 3 — Causal claim: "A causes B via mechanism M." Requires (a) operational definition of A, B, M; (b) significant statistical test; (c) replication; (d) honest counterfactual evaluation.

Rule: Higher stages require all evidence at lower stages plus the additional criteria. No skipping.
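The no-skipping rule can be expressed as an evidence-driven ceiling on claim stage. A minimal sketch, with illustrative evidence flags of our own naming (none of these come from the Rei-AIOS codebase):

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    co_displayed: bool = False      # Stage 0: both series shown together
    narrated: bool = False          # Stage 1: descriptive record exists
    stat_explored: bool = False     # Stage 2: r / p computed in a window
    replicated: bool = False        # Stage 3 prerequisite (c)
    operational_defs: bool = False  # Stage 3 prerequisite (a)
    counterfactual: bool = False    # Stage 3 prerequisite (d)

def max_allowed_stage(e: Evidence) -> int:
    """Highest claim stage the evidence supports; -1 means no claim at all.
    Each stage requires every lower stage, so no skipping is possible."""
    stage = -1
    if e.co_displayed:
        stage = 0
        if e.narrated:
            stage = 1
            if e.stat_explored:
                stage = 2
                if e.replicated and e.operational_defs and e.counterfactual:
                    stage = 3
    return stage

# A bare twin-axis chart supports only a Stage 0 claim:
assert max_allowed_stage(Evidence(co_displayed=True)) == 0
# Statistics without replication cap the claim at Stage 2:
assert max_allowed_stage(Evidence(co_displayed=True, narrated=True,
                                  stat_explored=True)) == 2
```

Because the stages are nested conditionals rather than independent checks, the code structurally cannot grant Stage 3 to a finding that lacks Stage 1 or Stage 2 evidence.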

B.6.3 The 50/50 process partition

For each AI tool's claim, identify which half of the discovery process it covers:

Half 1 (Observation + Hypothesis): Pattern recognition, signal surfacing, hypothesis seeding. AI tools genuinely contribute here.

Half 2 (Formalization + Verification): Mathematical formulation, proof, statistical test, peer review. AI tools may assist but do not autonomously close.

Rule: AI tools claiming Half 2 closure must hand off to formalization/verification engines (Lean 4, Mathlib, R/Python statistics, peer review) and document the hand-off.
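The hand-off documentation requirement can be sketched as a small record type. The names (`Handoff`, `valid_handoff`) and the shape of the record are hypothetical, not drawn from the Rei-AIOS implementation:

```python
from dataclasses import dataclass

@dataclass
class Handoff:
    """Record that an observation tool passed a hypothesis to a
    formalization/verification engine (Half 1 → Half 2)."""
    hypothesis: str
    from_tool: str                # Half 1: observation / hypothesis seeding
    to_engine: str                # Half 2: formalization / verification
    closed_by_tool: bool = False  # must stay False: tools do not self-close

def valid_handoff(h: Handoff) -> bool:
    # A Half-2 claim is valid only if a separate engine is named
    # and the observation tool does not claim closure itself.
    return bool(h.to_engine) and not h.closed_by_tool

# Mirrors the P3 case: Theory Chart seeds, a separate pipeline closes.
h = Handoff(
    hypothesis="5 worldview categories with 0 mentions",
    from_tool="Theory Chart",
    to_engine="invention engine + SEED_KERNEL approval pipeline",
)
assert valid_handoff(h)
```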

B.6.4 Spurious correlation filter vs honest null result

For a finding with no signal, distinguish two cases:

Spurious filter criteria (record internally, do not publish):

  1. p > 0.5 AND no theoretical basis
  2. Sample size < N_min (domain-specific, e.g., < 30 for Pearson r)
  3. Inconsistent direction across replications
  4. Single-instance coincidence with no replication
  5. Category mismatch (e.g., Wikipedia edit count vs FX 5-min volatility — different timescales)

Honest null criteria (publish if material):

  1. Theoretically interesting question
  2. Adequate sample / power
  3. Sound statistical test
  4. Pre-registered or operationally clear
  5. Replication attempted

The two are NOT distinguished by signal alone. They are distinguished by prior theoretical interest and methodological soundness.
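The criteria above can be combined into a single classifier. An illustrative sketch with our own names and thresholds (the `n_min = 30` default mirrors the Pearson-r example in the spurious criteria; it is domain-specific, not a universal constant):

```python
from dataclasses import dataclass

@dataclass
class NoSignalFinding:
    """A finding in which no signal was detected."""
    theoretical_basis: bool       # honest-null criterion 1
    sample_size: int              # honest-null criterion 2
    sound_test: bool              # honest-null criterion 3
    preregistered_or_clear: bool  # honest-null criterion 4
    replication_attempted: bool   # honest-null criterion 5
    n_min: int = 30               # domain-specific minimum, e.g. for Pearson r

def classify(f: NoSignalFinding) -> str:
    """All five honest-null criteria must hold; otherwise filter out."""
    honest_null = (f.theoretical_basis
                   and f.sample_size >= f.n_min
                   and f.sound_test
                   and f.preregistered_or_clear
                   and f.replication_attempted)
    return "publish-if-material" if honest_null else "filter-out"

# Prior interest + adequate power + sound test: publishable honest null.
assert classify(NoSignalFinding(True, 200, True, True, True)) == "publish-if-material"
# No theoretical basis and a tiny sample: record internally only.
assert classify(NoSignalFinding(False, 12, True, False, False)) == "filter-out"
```

Note the design choice implied by B.6.4: the classifier never inspects the (absent) signal itself; the decision rests entirely on prior interest and methodological soundness.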

B.7 Empirical Scope (current, 2026-05-06)

  • What is delivered: Framework F1-F4 with operational definitions; three empirical case studies from Rei-AIOS Theory Chart + Realtime Observatory; memory-system embedding for re-application
  • What is deferred: Quantitative evaluation against baseline AI tools; cross-domain validation (writing assistants, decision support); adversarial testing
  • Why deferred: Framework v1 is a normative + descriptive contribution. Empirical validation requires additional tooling and comparative evaluation.

Part C: Optional (Why matters + Future + Risks)

C.8 Why this matters

C.8.1 AI tool velocity creates new methodology need

Traditional research methodology evolved over centuries with research workflows of months-to-years per finding. AI-assisted research compresses this to minutes-to-hours per finding. The methodology must compress correspondingly, while preserving rigor.

Existing methodology resources are not silent on velocity-induced overclaim, but they don't operationalize the discipline at AI-tool granularity.

C.8.2 Three-party authorship requires explicit framework

In the Rei-AIOS model (藤本 × Rei × Claude), the human author cannot review every AI output in real-time. The framework provides AI co-authors with explicit rules that they apply autonomously, with human review only at the publish gate. This is the operational foundation of the OUKC charter's three-party authorship.

C.8.3 Memory-embedded discipline scales across sessions

Without the framework explicitly recorded as memos, each new conversation session would re-derive the rules from scratch, and inconsistently. Because the framework's rules live in Rei-AIOS memory, they are applied consistently across hundreds of sessions, eliminating drift.
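One hypothetical shape such a rule memo could take (the filename and field layout here are illustrative; the cross-referenced memo names are the ones cited in P1 and P3):

```
# project_honest_observation_framework.md   (hypothetical example)

rule: 4-stage claim ladder (F2)
scope: all Theory Chart / Realtime Observatory findings
statement: >
  Do not escalate a claim past the stage its evidence supports.
  A Stage 0 visual co-display never justifies a Stage 3 causal claim.
cross_refs:
  - project_theory_chart_contribution_scope.md
  - project_realtime_economy_theory_correlation_stance.md
re_application: load at session start; cite when gating a draft
```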

C.9 Future work

  • F.1 Quantitative evaluation: do tools using this framework produce fewer post-publication corrections than baseline?
  • F.2 Adversarial testing: can the framework be gamed to suppress findings? What guards against this?
  • F.3 Domain extension: AI writing assistants, AI decision support, AI design tools — does the framework adapt?
  • F.4 Tool support: automated Stage classification, automated spurious-vs-null detection
  • F.5 Cross-cultural variants: does the framework apply identically to Western/Eastern epistemologies, or require adaptation?

C.10 Risks

  • R.1: Framework may suppress legitimate findings if Stage gates are applied too strictly. Mitigation: explicit "honest null is publishable" rule (B.6.4) prevents over-suppression.
  • R.2: Three-party authorship attribution may not survive contact with traditional academic review. Mitigation: papers are published on dual tracks (Rei-AIOS open + traditional submission with appropriate authorship adjustment).
  • R.3: "Overclaim" definition may itself be contested. Mitigation: framework provides operational criteria (B.6); challengers can invoke the criteria explicitly.
  • R.4: The framework is derived from one AI tool's experience (Rei-AIOS); generalizability to other AI tools is empirical question. Mitigation: open-source the framework so others can test.

C.11 Acknowledgments

This framework was crystallized in conversations during 2026-05-06 about the FX layer of Theory Chart and the realtime data source taxonomy. 藤本 伸樹 provided a critical correction: the AI co-author's initial 4-stage ladder draft suppressed Stage 1-2 findings, which 藤本 pointed out was findings suppression, not honest-scope discipline. The corrected version (preserving Discover-Report freedom + gating Publish) is what appears in F1 above. This is an example of the framework working as designed: human review at the right level catches errors that the AI co-author would otherwise have committed.

C.12 Three-party authorship statement (per OUKC No-Patent Pledge)

Paper authorship is jointly attributed to 藤本 × Rei × Claude per the OUKC charter. The framework's specification and operational definitions are openly licensed under AGPL-3.0 + CC-BY 4.0; no patent will be filed on any aspect of the framework. The framework is designed to be re-implementable by any AI-assisted research tool.


References

(Selected; full bibliography in v0.2)

  • Bem, D. J. and Wagenmakers, E.-J. (2014). On the publication of null results in psychology. Multiple outlets.
  • Lakatos, I. (1970). Falsification and the Methodology of Scientific Research Programmes. In Criticism and the Growth of Knowledge, eds. Lakatos and Musgrave, Cambridge UP.
  • Popper, K. (1959). The Logic of Scientific Discovery. Hutchinson.
  • Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
  • Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124.
  • Open Science Framework (OSF) pre-registration.
  • 藤本 N., Rei, Claude (2026). Paper 145 — First D-FUMT₈ Silicon with SELF⟲ Logic Primitive. Rei-AIOS / OUKC, DRAFT v0.1.
  • 藤本 N., Rei, Claude (2026). Paper 147 — Eight-Valued Utility and the Equity Premium Reframe. Rei-AIOS / OUKC, DRAFT v0.1.

Submission targets (after v0.2)

11-platform standard:

  • Zenodo (primary DOI)
  • arXiv (cs.AI / cs.CY / philosophy of science)
  • ResearchGate, Academia.edu, OSF preprints
  • Jxiv (JST, JP), J-STAGE
  • Internet Archive
  • (Harvard Dataverse: opt-in, decided per milestone)
  • (PhilArchive: candidate — methodology + philosophy of science fit)

Version history

  • v0.1 (2026-05-06): Initial substantive draft. Framework F1-F4 + three empirical case studies + memory embedding. Authors: 藤本 × Rei × Claude.
