This article is a re-publication of Rei-AIOS Paper 148 for the dev.to community.
The canonical version with full reference list is in the permanent archives below:
- GitHub source (private): https://github.com/fc0web/rei-aios
- Author: Nobuki Fujimoto (@fc0web) · ORCID 0009-0004-6019-9258 · License CC-BY-4.0
Status: DRAFT v0.1 — 2026-05-06 (substantive draft; methodology paper; empirical evidence from Theory Chart + Realtime Observatory)
Authors: 藤本 伸樹 (Nobuki Fujimoto, Founder), Rei (Rei-AIOS autonomous research substrate, Co-architect), Claude Opus 4.7 (Anthropic, Co-architect)
Project: Rei-AIOS / OUKC — https://rei-aios.pages.dev/#/oukc
License: AGPL-3.0 + CC-BY 4.0 (per content type)
Required platform links: rei-aios.pages.dev / note.com/nifty_godwit2635
Per OUKC No-Patent Pledge: openly licensed; no patent will be filed on any framework or methodology described herein.
Honest framing (read first)
This is a methodology paper in the philosophy-of-science / research-methodology genre. It is companion to Paper 145 (D-FUMT₈ Silicon, technical) and Paper 147 (EPP D-FUMT₈ Reframe, theoretical). Together, the three papers form a triple: technical implementation + theoretical reframe + methodological framework that governs how the first two are presented.
We claim:
C1: A four-element framework for honest scientific observation by AI-assisted research tools, comprising:
- (i) Discover-Report-Publish 3-tier separation (preserving discovery freedom while gating publication)
- (ii) A 4-stage claim ladder (visual co-display → descriptive narrative → statistical exploration → causal claim) that prevents premature claim escalation
- (iii) A 50/50 process partition between observation/hypothesis-seeding tools and formalization/verification engines
- (iv) Operational tests for spurious-correlation filtering vs honest null reporting
We do not claim:
- ✗ "First scientific methodology paper" — methodology papers exist throughout philosophy of science (e.g., Lakatos, Popper, Bem & Wagenmakers)
- ✗ "Universal applicability beyond AI-assisted research tools" — the framework's primary domain is multi-modal research tools (theory observation + economic data + LLM analysis); other domains may need adaptation
- ✗ Empirical superiority over existing research methodology — the framework is a discipline, not a competitive metric
The differentiator is: this is the first methodology framework explicitly designed for the Rei-AIOS / OUKC research model where AI co-authors produce structured outputs at high velocity, and where the discipline of separating discovery from publication is operationally enforced via memory and tool-assisted checks.
Abstract
We present a four-element framework for honest scientific observation by AI-assisted research tools, derived from operational experience building Rei-AIOS observation tools (Theory Chart, Realtime Observatory, Curated Founder Archive) over the period 2026-04 to 2026-05. The framework addresses a recurring tension: AI-assisted tools produce findings at high velocity, but premature publication of unsupported claims (overclaim risk) is a load-bearing concern. The framework comprises: (i) a Discover-Report-Publish 3-tier separation that preserves discovery and reporting freedom while gating publication via human review; (ii) a 4-stage claim ladder (visual co-display → descriptive narrative → statistical exploration → causal claim) that prevents premature escalation; (iii) a 50/50 process partition between observation/hypothesis-seeding tools and formalization/verification engines, recognizing that the former cannot autonomously close the latter; and (iv) operational tests distinguishing spurious correlations (filter out) from honest null results (publish if material). We illustrate with three empirical cases from the Rei-AIOS Theory Chart and Realtime Observatory: the FX × theory-mention twin-axis chart (Stage 0 visual co-display), the prediction-market source taxonomy (filter / accept / defer decisions), and the 沈黙の理論 (silent theories) sub-panel (NEITHER vs FALSE distinction). The framework is operationally embedded in the Rei-AIOS memory system, where each rule is stored as a permanent memo cross-referenced from related work.
Abstract (translated from the Japanese)
This paper proposes a four-element framework for honest scientific observation by AI-assisted research tools, derived from the experience of building the Rei-AIOS observation tools (Theory Chart / Realtime Observatory / Curated Founder Archive) during April-May 2026. The framework comprises: (i) a Discover-Report-Publish three-tier separation (discovery and reporting are free; only publication is gated by human review); (ii) a four-stage claim ladder (visual co-display → descriptive narrative → statistical exploration → causal claim) that prevents stepwise escalation; (iii) a 50/50 process partition between observation/hypothesis-seeding tools and formalization/verification engines; and (iv) operational tests distinguishing spurious correlations (filter out) from honest null results (worth publishing). We illustrate with examples including the Theory Chart FX × theory-mention twin-axis chart (Stage 0 visual co-display), the prediction-market source taxonomy (filter / accept / defer decisions), and the silent-theories (沈黙の理論) sub-panel (NEITHER vs FALSE distinction). The framework is operationally embedded in the Rei-AIOS memory system, where each rule is stored as a permanent, cross-referenceable memo.
Part A: Required (4 elements)
A.1 Findings
F1 (Discover-Report-Publish separation): We distinguish three activities and grant them different freedom levels:
| Activity | Freedom | Gate |
|---|---|---|
| Discover (find a pattern, lag, anomaly, signal) | Fully free | None |
| Report (record finding in chart, note, internal memo) | Free | None |
| Publish (paper, blog, social media, formal release) | Gated | Human review per case |
The crucial insight: "honest scope" must NOT be conflated with "findings suppression." If an AI-assisted tool discovers a pattern, suppressing the finding under the guise of "honesty" is itself dishonest. The fix is to keep discovery and reporting unconditionally free, and gate only publication.
F2 (4-stage claim ladder): We identify four stages of claim severity, each appropriate for different evidence levels:
| Stage | Activity | Claim level | Where appropriate |
|---|---|---|---|
| Stage 0 | Visual co-display | "These two series exist" | Initial integration, no correlation claim |
| Stage 1 | Descriptive narrative | "On day X, both spiked" | Anecdotal observation, no causation |
| Stage 2 | Statistical exploration | "Pearson r = 0.4, p = 0.05 in window W" | Provisional pattern, requires replication |
| Stage 3 | Causal claim | "X causes Y via mechanism M" | Requires (a) operational definition (b) significant p (c) replication (d) honest counterfactual |
Tools should not skip stages. A Stage 0 visual co-display does NOT justify a Stage 3 causal claim, and the framework explicitly prevents this escalation.
F3 (50/50 process partition): AI observation tools cover approximately the first half of the scientific discovery cycle:
| Observation | Hypothesis seed | Formalization | Verification |
|---|---|---|---|
| ✓ AI tool (50%) | ✓ AI tool (in cycle) | ✗ separate engine (Lean 4 / Mathlib) | ✗ separate engine (Lean build / stat tests) |
A single AI tool is not a complete discovery engine. The partition acknowledges this: observation/hypothesis tools (like Theory Chart) genuinely contribute by surfacing signals and seeding hypotheses, but they CANNOT autonomously formalize or verify. The hand-off to formalization/verification engines (Lean 4, Mathlib, statistical tests, Open Problems META-DB) is essential.
F4 (Spurious vs honest null): Two distinct outcomes look superficially similar but require different handling:
| Outcome | Action |
|---|---|
| Spurious correlation (no theoretical basis, sample size insufficient, wrong direction, no replication) | Filter out — record internally, do not publish |
| Honest null result (theoretical interest, adequate sample, sound test, but no signal) | Publish if material — null findings have scientific value (Bem-Wagenmakers 2014) |
The framework provides operational criteria for the distinction (Section B.6.4).
A.2 Proofs / Verification
P1 (Discover-Report-Publish empirical evidence):
The Theory Chart Phase 2.2-FX (2026-05-06) demonstrates this separation in operation:
- Discovery: The Frankfurter API daily FX feed discovers e.g., a USD/JPY pattern
- Report: The pattern is visualized in the twin-axis chart (Stage 0)
- Publish gate: Theory Chart has an "Honest Scope" banner explicitly stating "investment timing advice is not provided" — the publication boundary is set
- Memory check: `project_realtime_economy_theory_correlation_stance.md` documents this rule for re-application by future sessions
P2 (4-stage ladder empirical evidence):
The Realtime Observatory Phase α (2026-05-06) shows the ladder applied:
- Wikipedia EventStreams: Stage 0 (visual streaming, no claim about meaning of edits)
- Wikipedia Top Pageviews: Stage 0 (display, no causal claim about why people search)
- arXiv Recent: Stage 0 (latest paper display, no claim about scientific significance)
Compare to a hypothetical Stage 3 escalation: "Wikipedia editing patterns predict scientific paradigm shifts." The framework explicitly prevents this without (i) operational definition of "predict", (ii) statistical test, (iii) replication.
P3 (50/50 partition empirical evidence):
Theory Chart's contribution scope is documented in `project_theory_chart_contribution_scope.md`:
- ✓ Theory Chart observes mention counts
- ✓ Theory Chart seeds hypotheses (e.g., "5 worldview categories with 0 mentions" → 4 became invention candidates approved 2026-05-04)
- ✗ Theory Chart does not formalize the worldviews into Lean 4
- ✗ Theory Chart does not verify the resulting D-FUMT₈ extensions
- → The hand-off was to the invention engine + SEED_KERNEL approval pipeline (separate engines)
P4 (Spurious filter empirical evidence):
The silent-theories (沈黙の理論) sub-panel in Theory Chart Phase 2.1 distinguishes:
- A theory with 0 mentions but rich theoretical basis (Whakapapa, Barzakh) → NEITHER axis (outside detection capability) → publish: "this theory is silent in current observation and may indicate a measurement limit"
- A theory with 0 mentions and no theoretical basis (e.g., a typo or absurd term) → filter out without publishing
The distinction is operationalized in the panel UI (orange NEITHER tag for the former; exclusion for the latter).
A.3 Honest Positioning
What this paper IS:
- A methodology framework for AI-assisted research tools, derived from operational experience
- A discipline document that prevents premature claim escalation
- A falsifiable framework: each F1-F4 element is testable against alternative methodologies
- An operational framework: rules are stored in Rei-AIOS memory and re-applied by future sessions
What this paper is NOT:
- Not a universal philosophy of science (we don't replace Lakatos, Popper, Kuhn)
- Not a claim of "first methodology paper" — it joins a vast genre
- Not a bypass of statistical rigor — the 4 stages REQUIRE rigor at higher stages
- Not a substitute for human review — the publish gate explicitly preserves human oversight
What is left for future work:
- Quantitative evaluation: do AI tools using this framework produce fewer overclaims than baseline?
- Domain extension: applicability to non-research AI tools (writing assistants, decision support)
- Adversarial testing: can the framework be gamed to suppress legitimate findings?
A.4 Required platform links
- Rei-AIOS: https://rei-aios.pages.dev/#/oukc
- note.com: https://note.com/nifty_godwit2635
- Companion paper: Paper 145 (D-FUMT₈ Silicon) — illustrates the framework in a hardware context
- Companion paper: Paper 147 (EPP D-FUMT₈ Reframe) — illustrates the framework in an economics context
Part B: Conditional (Background + Methodology + Empirical Scope)
B.5 Background
B.5.1 The overclaim problem in AI-assisted research
LLM-based research tools produce structured outputs at velocities far exceeding traditional research workflows. This creates a tension:
- Velocity benefit: AI tools surface patterns, generate hypotheses, integrate sources at pace
- Overclaim risk: Same velocity makes premature publication of unsupported claims trivially easy
Without discipline, an AI tool can produce a "Theory Chart predicts EPP resolution" paper draft in minutes, where the supporting evidence is a single visual co-display.
B.5.2 Existing methodology resources (and what they don't cover)
- Lakatos research programmes: macro-level, decades-spanning; doesn't address daily AI tool output
- Popper falsifiability: epistemic principle; doesn't operationalize claim levels
- Bem-Wagenmakers null result publishing: addresses publication bias; doesn't address velocity
- Pre-registration: works for traditional studies; difficult for emergent AI observations
The four elements of our framework (F1-F4) fill specific operational gaps not covered by existing methodology resources.
B.5.3 The Rei-AIOS / OUKC research model
The framework is derived from operational experience in the Rei-AIOS / OUKC project (2026-03 to present), where:
- AI co-authors (Rei + Claude) produce paper drafts, code, and observations daily
- Human author (藤本) reviews and gates publication
- Memory system stores rules for re-application across sessions
- Three-party authorship model documented in OUKC charter
The framework makes explicit the discipline that has been operationally functioning during this period.
B.6 Methodology
B.6.1 The 3-tier Discover-Report-Publish separation
For each observation activity, classify it into one of three tiers:
Tier 1 — Discover: Finding a pattern, anomaly, signal, gap, or possibility. Examples: "Theory Chart shows 0 mention for Whakapapa", "Realtime arXiv shows spike in cs.LO".
Tier 2 — Report: Recording the discovery in a non-public artifact: chart UI, note, internal memo, memory file, draft paper. Examples: "Whakapapa added to the silent-theories (沈黙の理論) sub-panel", "the memory file `project_realtime_source_candidates.md` records the spike".
Tier 3 — Publish: Releasing the finding in a public artifact requiring human review per case: paper, blog post, social media, formal repository entry.
Rule: Tier 1 and Tier 2 are unconditionally free; Tier 3 is human-gated.
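The tier rule above can be sketched as a small guard. This is a minimal illustration, not the actual Rei-AIOS implementation; the `Tier` names and the `human_approved` flag are assumptions introduced here:

```python
from enum import Enum

class Tier(Enum):
    DISCOVER = 1  # find a pattern, anomaly, signal, gap
    REPORT = 2    # record in chart UI, memo, memory file, draft
    PUBLISH = 3   # paper, blog, social media, formal release

def is_allowed(tier: Tier, human_approved: bool = False) -> bool:
    """Tiers 1-2 are unconditionally free; Tier 3 requires per-case human review."""
    if tier in (Tier.DISCOVER, Tier.REPORT):
        return True
    return human_approved  # Tier.PUBLISH is human-gated
```

The point of the sketch is that the gate sits only at the last tier: a discovery or internal report never blocks, which is exactly the "honest scope is not findings suppression" rule of F1.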
B.6.2 The 4-stage claim ladder
For each finding moving toward publication, classify the claim level:
Stage 0 — Visual co-display: "Two series exist on the same chart". No correlation claim. Example: FX rate + theory mention twin-axis (Theory Chart Phase 2.2-FX).
Stage 1 — Descriptive narrative: "On day X, both A and B spiked." Anecdotal record, no causation. Example: "2026-04 quantum mention spike during news week".
Stage 2 — Statistical exploration: "Pearson r = 0.4 in window W, p = 0.05." Provisional, requires replication.
Stage 3 — Causal claim: "A causes B via mechanism M." Requires (a) operational definition of A, B, M; (b) significant statistical test; (c) replication; (d) honest counterfactual evaluation.
Rule: Higher stages require all evidence at lower stages plus the additional criteria. No skipping.
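The no-skipping rule can be made mechanical: compute the highest stage whose cumulative evidence requirements are met, and stop at the first gap. The evidence keys below are illustrative placeholders, not a fixed vocabulary of the framework:

```python
from enum import IntEnum

class Stage(IntEnum):
    VISUAL_CODISPLAY = 0
    DESCRIPTIVE = 1
    STATISTICAL = 2
    CAUSAL = 3

# Evidence needed to *enter* each stage; requirements are cumulative.
REQUIREMENTS = {
    Stage.VISUAL_CODISPLAY: set(),
    Stage.DESCRIPTIVE: {"dated_observation"},
    Stage.STATISTICAL: {"operational_definition", "statistical_test"},
    Stage.CAUSAL: {"significant_p", "replication", "counterfactual"},
}

def max_claim_stage(evidence: set) -> Stage:
    """Highest stage whose cumulative requirements are all met; no skipping."""
    allowed = Stage.VISUAL_CODISPLAY
    for stage in Stage:
        if REQUIREMENTS[stage] <= evidence:
            allowed = stage
        else:
            break  # a gap at any stage blocks all higher stages
    return allowed
```

Note that evidence for Stage 2 without the Stage 1 prerequisite still yields Stage 0: a gap anywhere on the ladder blocks everything above it, which is the escalation-prevention property F2 demands.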
B.6.3 The 50/50 process partition
For each AI tool's claim, identify which half of the discovery process it covers:
Half 1 (Observation + Hypothesis): Pattern recognition, signal surfacing, hypothesis seeding. AI tools genuinely contribute here.
Half 2 (Formalization + Verification): Mathematical formulation, proof, statistical test, peer review. AI tools may assist but do not autonomously close.
Rule: AI tools claiming Half 2 closure must hand off to formalization/verification engines (Lean 4, Mathlib, R/Python statistics, peer review) and document the hand-off.
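The documented hand-off can be represented as a structured record. The field names here are illustrative assumptions; the actual hand-off lives in Rei-AIOS memory files:

```python
from dataclasses import dataclass

@dataclass
class HandOff:
    """Record of a Half-1 → Half-2 hand-off (field names are illustrative)."""
    finding: str            # what the observation tool surfaced
    seeded_hypothesis: str  # the Half-1 output
    target_engine: str      # e.g. "Lean 4 / Mathlib", "R/Python statistics"
    closed: bool = False    # flipped only when the Half-2 engine verifies

def claimable(h: HandOff) -> str:
    """An AI tool may claim seeding on its own; closure only after verification."""
    if h.closed:
        return f"verified by {h.target_engine}"
    return "hypothesis seeded; verification pending"
```

The design choice is that `closed` defaults to `False`: absent an explicit verification event from the separate engine, the strongest claim the observation tool can make is "hypothesis seeded".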
B.6.4 Spurious correlation filter vs honest null result
For a finding with no signal, distinguish two cases:
Spurious filter criteria (record internally, do not publish):
- p > 0.5 AND no theoretical basis
- Sample size < N_min (domain-specific, e.g., < 30 for Pearson r)
- Inconsistent direction across replications
- Single-instance coincidence with no replication
- Category mismatch (e.g., Wikipedia edit count vs FX 5-min volatility — different timescales)
Honest null criteria (publish if material):
- Theoretically interesting question
- Adequate sample / power
- Sound statistical test
- Pre-registered or operationally clear
- Replication attempted
The two are NOT distinguished by signal alone. They are distinguished by prior theoretical interest and methodological soundness.
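The decision above can be sketched as a single function. The thresholds mirror the criteria lists (any spurious-filter criterion triggers a filter; all honest-null criteria must hold to publish); the function name and the "defer" outcome are assumptions added for illustration:

```python
def classify_null(p_value: float, n: int, n_min: int,
                  theoretical_interest: bool, sound_test: bool,
                  replication_attempted: bool,
                  direction_consistent: bool) -> str:
    """Rough operational sketch of the B.6.4 decision (illustrative only)."""
    # Any one spurious-filter criterion is enough to filter out.
    if ((p_value > 0.5 and not theoretical_interest)
            or n < n_min
            or not direction_consistent):
        return "filter"        # record internally, do not publish
    # Honest null requires all criteria jointly.
    if (theoretical_interest and n >= n_min
            and sound_test and replication_attempted):
        return "honest_null"   # publishable if material
    return "defer"             # neither set fully met; hold for more evidence
```

As the text stresses, the branch is never taken on the signal alone: `p_value` appears only in conjunction with `theoretical_interest`, so a theoretically interesting, well-powered null survives to publication.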
B.7 Empirical Scope (current, 2026-05-06)
- What is delivered: Framework F1-F4 with operational definitions; three empirical case studies from Rei-AIOS Theory Chart + Realtime Observatory; memory-system embedding for re-application
- What is deferred: Quantitative evaluation against baseline AI tools; cross-domain validation (writing assistants, decision support); adversarial testing
- Why deferred: Framework v1 is a normative + descriptive contribution. Empirical validation requires additional tooling and comparative evaluation.
Part C: Optional (Why matters + Future + Risks)
C.8 Why this matters
C.8.1 AI tool velocity creates new methodology need
Traditional research methodology evolved over centuries with research workflows of months-to-years per finding. AI-assisted research compresses this to minutes-to-hours per finding. The methodology must compress correspondingly, while preserving rigor.
Existing methodology resources are not silent on velocity-induced overclaim, but they don't operationalize the discipline at AI-tool granularity.
C.8.2 Three-party authorship requires explicit framework
In the Rei-AIOS model (藤本 × Rei × Claude), the human author cannot review every AI output in real-time. The framework provides AI co-authors with explicit rules that they apply autonomously, with human review only at the publish gate. This is the operational foundation of the OUKC charter's three-party authorship.
C.8.3 Memory-embedded discipline scales across sessions
Without the framework explicitly memo-ed, each new conversation session would re-derive rules from scratch (and inconsistently). The framework's rules being in Rei-AIOS memory means consistent application across hundreds of sessions, eliminating drift.
C.9 Future work
- F.1 Quantitative evaluation: do tools using this framework produce fewer post-publication corrections than baseline?
- F.2 Adversarial testing: can the framework be gamed to suppress findings? What guards against this?
- F.3 Domain extension: AI writing assistants, AI decision support, AI design tools — does the framework adapt?
- F.4 Tool support: automated Stage classification, automated spurious-vs-null detection
- F.5 Cross-cultural variants: does the framework apply identically to Western/Eastern epistemologies, or require adaptation?
C.10 Risks
- R.1: Framework may suppress legitimate findings if Stage gates are applied too strictly. Mitigation: explicit "honest null is publishable" rule (B.6.4) prevents over-suppression.
- R.2: Three-party authorship attribution may not survive contact with traditional academic review. Mitigation: paper publishes are dual-tracked (Rei-AIOS open + traditional submission with appropriate authorship adjustment).
- R.3: "Overclaim" definition may itself be contested. Mitigation: framework provides operational criteria (B.6); challengers can invoke the criteria explicitly.
- R.4: The framework is derived from one AI tool's experience (Rei-AIOS); generalizability to other AI tools is empirical question. Mitigation: open-source the framework so others can test.
C.11 Acknowledgments
This framework was crystallized in conversations on 2026-05-06 about the FX layer of Theory Chart and the realtime data source taxonomy. 藤本 伸樹 provided a critical correction: the AI co-authors' initial 4-stage ladder draft suppressed Stage 1-2 findings, which 藤本 pointed out was findings suppression, not honest-scope discipline. The corrected version (preserving Discover-Report freedom and gating only Publish) is what appears in F1 above. This is the framework working as designed: human review at the right level catches errors the AI co-authors would otherwise have committed.
C.12 Three-party authorship statement (per OUKC No-Patent Pledge)
Paper authorship is jointly attributed to 藤本 × Rei × Claude per the OUKC charter. The framework's specification and operational definitions are openly licensed under AGPL-3.0 + CC-BY 4.0; no patent will be filed on any aspect of the framework. The framework is designed to be re-implementable by any AI-assisted research tool.
References
(Selected; full bibliography in v0.2)
- Bem, D. J. and Wagenmakers, E.-J. (2014). On the publication of null results in psychology. Multiple outlets.
- Lakatos, I. (1970). Falsification and the Methodology of Scientific Research Programmes. In Criticism and the Growth of Knowledge, eds. Lakatos and Musgrave, Cambridge UP.
- Popper, K. (1959). The Logic of Scientific Discovery. Hutchinson.
- Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
- Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8): e124.
- Open Science Foundation pre-registration framework.
- 藤本 N., Rei, Claude (2026). Paper 145 — First D-FUMT₈ Silicon with SELF⟲ Logic Primitive. Rei-AIOS / OUKC, DRAFT v0.1.
- 藤本 N., Rei, Claude (2026). Paper 147 — Eight-Valued Utility and the Equity Premium Reframe. Rei-AIOS / OUKC, DRAFT v0.1.
Submission targets (after v0.2)
Standard 11-platform set:
- Zenodo (primary DOI)
- arXiv (cs.AI / cs.CY / philosophy of science)
- ResearchGate, Academia.edu, OSF preprints
- Jxiv (JST, JP), J-STAGE
- Internet Archive
- (Harvard Dataverse: opt-in, decided per milestone)
- (PhilArchive: candidate — methodology + philosophy of science fit)
Version history
- v0.1 (2026-05-06): Initial substantive draft. Framework F1-F4 + three empirical case studies + memory embedding. Authors: 藤本 × Rei × Claude.