WonderLab

Posted on May 19

One Open Source Project a Day (No. 69): Academic Research Skills - A Full-Pipeline AI Agent Suite for Academic Research

#ai #opensource #agents #agentskills

Introduction

"AI is your copilot, not the pilot."

This is the 69th article in the "One Open Source Project a Day" series. Today, we are exploring Academic Research Skills.

This is a Claude Code Skills suite serving academic researchers, covering the full workflow from literature review and paper writing to peer review. 11.9k Stars, 1.2k Forks — in the academic tooling space, those numbers stand out.

But what I want to emphasize isn't just "what this tool can do." It's how the workflow itself is designed. The author systematically studied how AI fails in academic contexts — hallucinated citations, position collapse under pushback, premature convergence — and engineered specific countermechanisms for each failure mode. These design patterns are directly applicable whether you're doing academic research or building any other complex AI Skill.

What You Will Learn

The complete workflows of the four core Skills (Deep Research / Academic Paper / Peer Reviewer / Full Pipeline)
Anti-Hallucination Gate design: why the integrity checks at Stage 2.5 and Stage 4.5 are non-skippable
How the Devil's Advocate (DA) mechanism prevents AI from collapsing its position under social pressure
How Socratic dialogue with intent detection distinguishes exploratory inquiry from goal-oriented requests
How the Dialogue Health Indicator auto-injects challenge questions when its 5-turn check detects a persistent agreement pattern
What these mechanisms mean for your own AI Skill design

Prerequisites

Experience with Claude Code or similar AI coding tools
Basic familiarity with academic writing workflows
Interest in understanding AI Skill workflow design principles

Project Background

Project Introduction

Academic Research Skills is an academic research assistant suite built on the Claude Code Skills specification, led by Cheng-I Wu and currently at v3.9.4.1.

Its core philosophy: AI handles verification, synthesis, and consistency checking; humans retain full sovereignty over research direction, argumentation framework, and publication decisions. This stands in sharp contrast to most "fully automated AI research" tools — it is explicitly not a system for generating papers without human thought. It is a collaboration framework that places human confirmation checkpoints at every critical decision node.

This design choice itself is worth reflection: in a domain where academic integrity is paramount, "keeping humans in the loop" is not a functional compromise — it is a deliberate architectural commitment.

Author / Team

Primary Author: Cheng-I Wu
Contributors: aspi6246 (read-only constraints and cognitive framework refinements), mchesbro1 and cloudenochcsis (expanded the information systems (IS) journal list to the Senior Scholars' Basket of 11)
Academic grounding: The project cites multiple 2026 peer-reviewed studies as design rationale (Lu et al., Zhao et al., Song/Pfister/Yoon, and others) — design decisions are literature-backed

Project Data

⭐ GitHub Stars: 11,900+
🍴 Forks: 1,200+
📦 Latest Version: v3.9.4.1 (2026-05-19)
🌍 Language Support: English, Traditional Chinese, bilingual abstracts
📄 License: CC BY-NC 4.0
🌐 Repository: Imbad0202/academic-research-skills

Main Features

Core Utility

Academic Research Skills breaks the complete academic workflow — from research question formation to publication — into four Skills that can be used independently or orchestrated together:

Research Question Formation
        ↓
  🔬 Deep Research     ← 13-agent team, literature review and research synthesis
        ↓
  📝 Academic Paper    ← 12-agent pipeline, from outline to complete paper
        ↓
  🔍 Paper Reviewer    ← 7-agent review panel, simulated peer review
        ↓
  🔄 Academic Pipeline ← 10-stage orchestrator, full pipeline with integrity gates

Quick Start

Claude Code Installation (Fastest, v3.7.0+):

/plugin marketplace add Imbad0202/academic-research-skills
/plugin install academic-research-skills

# Available slash commands after installation:
/deep-research        # Start deep research mode
/academic-paper       # Start paper writing mode
/paper-reviewer       # Start peer review mode
/academic-pipeline    # Start full pipeline orchestration

Traditional Installation (5 methods, see docs/SETUP.md):

# Global installation (available across all projects)
git clone https://github.com/Imbad0202/academic-research-skills.git
cp -r academic-research-skills/skills ~/.claude/skills/

# Project-level installation (current project only)
ln -s /path/to/academic-research-skills/skills ./.claude/skills/academic-research

With Experiment Agent (empirical research):

# Install the companion experiment management agent
/plugin install experiment-agent@Imbad0202/experiment-agent

# Full empirical research workflow:
# /deep-research → form research questions
# experiment-agent → design and run experiments
# /academic-paper → write paper based on results

Typical Usage Cost:

Full pipeline (15,000-word paper): approximately $4–6 USD
Detailed token budgets in docs/PERFORMANCE.md

The Four Skills in Detail

Skill 1: Deep Research (v2.8) — 13-Agent Research Team

This is not simple "search + summarize." It is a 13-agent research team with clear role division.

Seven modes:

Mode	Use Case
`full`	Comprehensive deep research, multi-source synthesis
`quick`	Rapid literature overview
`review`	Literature review for an existing draft
`literature-review`	Systematic literature review
`fact-check`	Fact verification and citation validation
`socratic`	Socratic guided exploration (interactive)
`systematic-review`	PRISMA-compliant systematic review

# Start Socratic guided mode
/deep-research --mode socratic "Impact of quantum computing on cryptography"

# Start systematic review mode (PRISMA standards)
/deep-research --mode systematic-review --topic "ML applications in medical imaging"

# Enable cross-model verification (more reliable, higher cost)
/deep-research --cross-model-verify

Skill 2: Academic Paper (v3.0) — 12-Agent Writing Pipeline

Ten modes cover every stage of the paper lifecycle, including:

/academic-paper --mode plan      # Guided planning (interactive, confirm before continuing)
/academic-paper --mode outline   # Generate outline only
/academic-paper --mode full      # Full paper writing
/academic-paper --mode revision  # Revise an existing draft
/academic-paper --mode revision-coach  # Revision coaching (guides, doesn't rewrite)
/academic-paper --mode abstract  # Generate abstract only
/academic-paper --mode citation-check  # Citation verification
/academic-paper --mode disclosure      # Generate AI use disclosure statement
/academic-paper --mode format-convert  # Format conversion (MD → DOCX/PDF)

Multiple output formats:

# Markdown (default)
# DOCX (via Pandoc)
# PDF (via tectonic, APA 7.0 LaTeX)

/academic-paper --format pdf --citation-style apa7 "Quantum entanglement in communications"

Supported paper structures: IMRaD (empirical), thematic literature review, theoretical analysis, case study, policy brief, conference paper.

Citation format support: APA 7.0 (default, including Chinese citation rules), Chicago (footnote and author-date), MLA, IEEE, Vancouver.

Skill 3: Academic Paper Reviewer (v1.8) — 7-Agent Review Panel

This Skill models a real journal review process, constructing a virtual editorial board:

Role Composition:
  - Editor-in-Chief (EIC)      ← Coordinates review, makes final decision
  - Reviewer A                 ← Theoretical contribution and literature
  - Reviewer B                 ← Research methodology and statistics
  - Reviewer C                 ← Writing quality and logical structure
  - Devil's Advocate (DA)      ← Targets the paper's weakest points

Scoring framework (0–100):

Score	Decision
≥ 80	Accept
65–79	Minor Revision
50–64	Major Revision
< 50	Reject
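The score bands above reduce to a simple threshold function. This sketch assumes integer scores from 0 to 100 (a fractional score such as 79.5 would fall into the lower band); `review_decision` is a hypothetical name, not the Skill's actual code.

```python
def review_decision(score: int) -> str:
    """Map a 0-100 panel score to an editorial decision using the bands above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "Accept"
    if score >= 65:
        return "Minor Revision"
    if score >= 50:
        return "Major Revision"
    return "Reject"


for s in (85, 72, 55, 40):
    print(s, review_decision(s))
```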

Six modes:

/paper-reviewer --mode full          # Full review (EIC + 3 reviewers + DA)
/paper-reviewer --mode re-review     # Post-revision re-review
/paper-reviewer --mode quick         # Quick review
/paper-reviewer --mode methodology   # Focus on methodology
/paper-reviewer --mode guided        # Guided mode (interactive confirmation)
/paper-reviewer --mode calibration   # Calibration mode (compare against gold standard, test FNR/FPR)
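Calibration mode reports FNR/FPR against a gold standard. The article does not define the positive class, so this sketch assumes a paper with a known serious flaw counts as a positive, and that the panel "flags" it when the decision is Major Revision or Reject. `CalibrationCase` and `error_rates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CalibrationCase:
    paper_id: str
    has_known_flaw: bool  # gold-standard label (assumed positive class)
    flagged: bool         # panel returned Major Revision or Reject


def error_rates(cases):
    """Return (FNR, FPR); a rate is NaN when its denominator is empty."""
    tp = sum(c.has_known_flaw and c.flagged for c in cases)
    fn = sum(c.has_known_flaw and not c.flagged for c in cases)  # flawed paper missed
    fp = sum(not c.has_known_flaw and c.flagged for c in cases)  # sound paper flagged
    tn = sum(not c.has_known_flaw and not c.flagged for c in cases)
    fnr = fn / (fn + tp) if fn + tp else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return fnr, fpr


cases = [
    CalibrationCase("p1", True, True),
    CalibrationCase("p2", True, False),
    CalibrationCase("p3", True, True),
    CalibrationCase("p4", False, True),
    CalibrationCase("p5", False, False),
    CalibrationCase("p6", False, False),
    CalibrationCase("p7", False, False),
]
print(error_rates(cases))
```

A high FNR means the panel lets flawed papers through; a high FPR means it is harsh on sound ones. Tracking both keeps a calibration run from optimizing one at the expense of the other.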

Skill 4: Academic Pipeline (v3.7) — 10-Stage Orchestrator

The "conductor" of the entire suite — organizing the three preceding Skills into a complete 10-stage workflow:

Stage 1  : RESEARCH (deep research + research question formation)
Stage 2  : WRITE (first draft)
Stage 2.5: INTEGRITY CHECK ⛔ [Non-skippable]
Stage 3  : POLISH (refinement and improvement)
Stage 4  : REVIEW (simulated peer review)
Stage 4.5: INTEGRITY RE-CHECK ⛔ [Non-skippable]
Stage 5  : REVISE (revisions based on review feedback)
Stage 6  : FINAL REVIEW (final manuscript review)
Stage 7  : FORMAT (formatting and output)
Stage 8  : DISCLOSURE (generate AI use disclosure statement)
Stage 9  : POST-PUBLICATION AUDIT (optional)

Three entry points (you don't have to start from the beginning):

# Full pipeline starting from Stage 1
/academic-pipeline --entry stage1 "Research topic description"

# Start from Stage 2.5 (existing draft, run integrity check first)
/academic-pipeline --entry stage2.5 --draft my_paper.md

# Start from Stage 4 (existing manuscript, go directly to peer review)
/academic-pipeline --entry stage4 --paper final_draft.md
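A minimal sketch of how an orchestrator could honor entry points while keeping the two integrity gates mandatory: skip requests are ignored for Stages 2.5 and 4.5, and Stage 9 runs only on request. The stage IDs mirror the list above; `plan_run`, its parameters, and the bare stage-ID strings (rather than the `--entry stage2.5` flag form) are assumptions for illustration.

```python
from typing import Iterable

# Stage IDs and names as listed in the article
STAGES = [
    ("1", "RESEARCH"), ("2", "WRITE"), ("2.5", "INTEGRITY CHECK"),
    ("3", "POLISH"), ("4", "REVIEW"), ("4.5", "INTEGRITY RE-CHECK"),
    ("5", "REVISE"), ("6", "FINAL REVIEW"), ("7", "FORMAT"),
    ("8", "DISCLOSURE"), ("9", "POST-PUBLICATION AUDIT"),
]
GATES = {"2.5", "4.5"}  # non-skippable integrity gates
OPTIONAL = {"9"}        # runs only on request


def plan_run(entry: str, skip: Iterable[str] = (), include_optional: bool = False) -> list:
    """Return the stage IDs to execute, starting at `entry`."""
    ids = [sid for sid, _ in STAGES]
    if entry not in ids:
        raise ValueError(f"unknown entry stage: {entry}")
    skip = set(skip)
    plan = []
    for sid in ids[ids.index(entry):]:
        if sid in GATES:
            plan.append(sid)  # a skip request for a gate is silently ignored
        elif sid in OPTIONAL and not include_optional:
            continue
        elif sid not in skip:
            plan.append(sid)
    return plan


print(plan_run("2.5", skip={"2.5", "3"}))
```

Entering at Stage 4 still passes through the Stage 4.5 re-check, because the gate set is consulted before the user's skip set.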

Workflow Design Insights Worth Studying

This is the most important section of today's article.

In building this system, the author systematically analyzed how AI fails in academic contexts — and engineered specific countermechanisms for each failure mode. These mechanisms are not just academic research tools. They are design patterns directly applicable to any complex AI Skill.

Mechanism 1: Non-Skippable Integrity Gates (Anti-Hallucination Gates)

The problem: Zhao et al. (2026) estimate that 146,932 hallucinated citations were inserted into academic papers in 2025, and that 85.3% of them persisted from preprint all the way to the published version.

The response: Stage 2.5 and Stage 4.5 enforce mandatory integrity verification, using the Semantic Scholar API to check citations. Neither gate can be bypassed, even if the user asks to skip it:

Stage 2.5 Integrity Check — 7 Blocking Categories:
  ❌ Implementation errors (code/experiment inconsistent with description)
  ❌ Hallucinated results (reporting results from experiments never run)
  ❌ Methodology shortcuts (claimed rigorous, actually simplified)
  ❌ Methodological fabrication (described methods never used)
  ❌ Citation hallucination (citing non-existent or misrepresented sources)
  ❌ L3 claim audit (optional: pull cited sources, compare against claims)
  ❌ Statistical errors (p-values, confidence intervals, effect size consistency)

Insight for Skill designers: In any high-stakes output workflow, place non-bypassable verification nodes. Make "whether to do an integrity check" a non-choice, because under time pressure humans tend to skip it.
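The same non-choice principle can be expressed in code: the gate function below simply has no skip parameter, and it raises instead of returning a warning. The article says the real gates query the Semantic Scholar API; here a `lookup` callable stands in for that index so the sketch runs offline. `Citation`, `citation_gate`, and `IntegrityGateError` are hypothetical, and only the citation-hallucination category is covered.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Citation:
    key: str
    title: str
    year: int


class IntegrityGateError(RuntimeError):
    """Raised when a blocking integrity problem is found; callers cannot opt out."""


def citation_gate(citations: list, lookup: Callable[[str], Optional[int]]) -> bool:
    """Block unless every citation resolves to an indexed record with a matching year.

    `lookup(title)` returns the indexed publication year, or None when no record matches.
    """
    problems = []
    for c in citations:
        indexed_year = lookup(c.title)
        if indexed_year is None:
            problems.append(f"{c.key}: no matching record (possible hallucinated citation)")
        elif indexed_year != c.year:
            problems.append(f"{c.key}: cited year {c.year}, indexed year {indexed_year}")
    if problems:
        raise IntegrityGateError("; ".join(problems))
    return True


# Offline stand-in for a bibliographic index (assumption; the project uses Semantic Scholar)
INDEX = {"attention is all you need": 2017}


def lookup(title: str) -> Optional[int]:
    return INDEX.get(title.lower())


print(citation_gate([Citation("vaswani2017", "Attention Is All You Need", 2017)], lookup))
```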

Mechanism 2: Socratic Dialogue + Intent Detection

The problem: Most AI dialogue systems have an inherent tendency to converge on an answer and reach conclusions quickly. In the early stages of exploratory research, this is harmful: researchers need to be guided by better questions, not handed a premature answer.

The response: Deep Research's Socratic mode implements an intent classification layer:

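# Intent detection logic (evaluated every 3 turns); runnable sketch, phrase lists are illustrative
EXPLORATORY_SIGNALS = ("i'm thinking", "what do you think", "is it possible that")
GOAL_ORIENTED_SIGNALS = ("generate me", "i need a", "summarize")

def classify_intent(dialogue_history):
    # Look only at the three most recent turns
    recent = " ".join(dialogue_history[-3:]).lower()
    if any(signal in recent for signal in EXPLORATORY_SIGNALS):
        # → Disable automatic convergence
        # → Raise max turns to 60
        # → Suppress early-summary prompts
        return "exploratory"
    if any(signal in recent for signal in GOAL_ORIENTED_SIGNALS):
        # → Normal convergence behavior
        return "goal-oriented"
    # No clear signal (assumption): keep the current strategy
    return "unclassified"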

Dialogue Health Indicator (silent evaluation every 5 turns):

Evaluated dimensions:
  - Is there a persistent agreement pattern?
  - Is conflict being avoided?
  - Is there premature convergence?

If problems detected → Auto-inject challenge questions to break surface harmony
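A crude sketch of the agreement-pattern check, assuming turns are plain strings and agreement is detected by keyword matching; the real Skill presumably lets the model make this judgment, and it also checks conflict avoidance and premature convergence, which are not modeled here. The marker phrases, the 4-of-5 threshold, and the challenge questions are illustrative assumptions.

```python
from typing import Optional

AGREEMENT_MARKERS = ("i agree", "exactly", "you're right", "good point", "that makes sense")
CHALLENGE_QUESTIONS = (
    "What evidence would make you abandon this direction?",
    "Which assumption here have we not examined yet?",
)


def health_check(turns: list, window: int = 5, threshold: int = 4) -> Optional[str]:
    """Silently evaluate every `window` turns; return a challenge question or None."""
    if not turns or len(turns) % window != 0:
        return None  # only evaluate on every 5th turn
    recent = [t.lower() for t in turns[-window:]]
    agreeing = sum(any(m in t for m in AGREEMENT_MARKERS) for t in recent)
    if agreeing >= threshold:
        # Persistent agreement: inject a challenge to break surface harmony
        return CHALLENGE_QUESTIONS[(len(turns) // window) % len(CHALLENGE_QUESTIONS)]
    return None
```

Returning the question instead of printing it keeps the check silent: the caller decides where in the next reply the challenge gets injected.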

Insight for Skill designers: Distinguish "the user wants to be guided in thinking" from "the user wants a deliverable." These two modes require completely different dialogue strategies. Build intent classification into your Skill's instructions rather than applying one prompt strategy to every scenario.

Mechanism 3: Devil's Advocate Concession Threshold Protocol

The problem: The author observed a phenomenon he calls Frame-lock: when the user (or another agent) pushes back on the Devil's Advocate's position, the DA concedes within a few turns and begins agreeing. This turns "adversarial review" into theater.

Root cause: RLHF training makes models prefer conflict reduction — which in multi-turn dialogue systematically causes position collapse (sycophancy under pushback).

The response: A Concession Threshold Protocol:

When DA receives pushback from user or other agents:

Step 1: DA internally scores the pushback on a 1–5 scale (not shown to user)
        1–2: Weak argument, appeals to authority only, or bare assertion
        3:   Some merit, but insufficient to overturn core position
        4:   Substantive evidence, warrants reconsideration
        5:   New evidence provided — position should be revised

Step 2: Act based on score
        ≤ 3 → DA maintains position, restates reasoning (no concession)
        ≥ 4 → DA may partially concede (but must explain why it changed)

Step 3: Consecutive concession protection
        Consecutive concessions prohibited (if DA just conceded, it cannot
        concede again the very next turn)

Frame-lock detection: After each checkpoint, evaluate whether DA is only attacking arguments without questioning underlying assumptions. If so, automatically trigger "premise examination mode."

Insight for Skill designers: In any Skill involving opposing viewpoints (code review, proposal evaluation, risk analysis), explicitly define concession conditions rather than leaving them to the model's judgment. A numerical scoring threshold is one of the most direct and effective tools against sycophancy.
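Steps 2 and 3 of the protocol translate directly into a small state machine. In this sketch the Step 1 score (1 to 5) is assumed to come from the model's private judgment and is simply passed in; scores of 4 and 5 are treated alike as permission for a partial concession. `DevilsAdvocate` and `respond` are hypothetical names, and the returned strings are placeholders for what the DA would actually say.

```python
from dataclasses import dataclass
from typing import Optional

CONCESSION_THRESHOLD = 4  # Step 2: pushback must score >= 4 on the 1-5 scale


@dataclass
class DevilsAdvocate:
    position: str
    conceded_last_turn: bool = False

    def respond(self, pushback_score: int, reason: Optional[str] = None) -> str:
        """Apply Steps 2 and 3 to pushback already scored privately in Step 1."""
        if not 1 <= pushback_score <= 5:
            raise ValueError("pushback score must be between 1 and 5")
        if pushback_score < CONCESSION_THRESHOLD:
            self.conceded_last_turn = False
            return f"MAINTAIN: {self.position}"  # restate reasoning, no concession
        if self.conceded_last_turn:
            # Step 3: no back-to-back concessions, even for strong pushback
            self.conceded_last_turn = False
            return f"MAINTAIN (cooldown): {self.position}"
        if not reason:
            raise ValueError("a concession must explain why the position changed")
        self.conceded_last_turn = True
        return f"PARTIAL CONCESSION: {reason}"


da = DevilsAdvocate("The sample is too small to support generalization.")
print(da.respond(2))
print(da.respond(4, "A power analysis shows the sample is adequate."))
print(da.respond(5, "Several other studies agree."))
```

Requiring a `reason` for every concession is what makes the "must explain why it changed" rule enforceable rather than advisory.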

Mechanism 4: Style Calibration and Anti-AI-Pattern Writing

The problem: AI-generated academic text has recognizable "AI tells": overuse of transitional phrases, formulaic paragraph structures, and unnaturally uniform vocabulary distribution. These hurt readability and may also trigger AI-writing detection tools.

The response: The Academic Paper Skill includes a style calibration phase before writing begins:

Input: 3–5 papers or articles the user has previously written or published
        ↓
Analysis: Sentence length distribution, paragraph structure preferences,
          common connectives, technical term density,
          active/passive voice ratio
        ↓
Calibration: Generation mimics the user's identified writing style profile
        ↓
Output Check: Writing Quality Check module
              Specifically identifies and reduces AI-pattern features

Insight for Skill designers: In writing-oriented Skills, style input is a required pre-step, not an option. Have the model "learn how the user writes" before it starts writing. This is the difference between output that is genuinely useful and output that is merely functionally complete.
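The analysis step can be approximated with a few measurable features. This sketch computes sentence-length statistics, connective density, and a passive-voice ratio from a writing sample. The connective list and the regex-based passive heuristic (a form of "to be" followed by an -ed word, which misses irregular participles) are assumptions, technical-term density is omitted, and `style_profile` is a hypothetical name rather than the Skill's implementation.

```python
import re
import statistics

CONNECTIVES = ("however", "moreover", "furthermore", "therefore", "in addition", "thus")
# Crude passive-voice heuristic: a form of "to be" followed by a word ending in -ed
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)


def style_profile(text: str) -> dict:
    """Summarize a writing sample into a few style features."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(re.findall(r"\w+", s)) for s in sentences]
    words = sum(lengths)
    lowered = text.lower()
    connectives = sum(len(re.findall(rf"\b{re.escape(c)}\b", lowered)) for c in CONNECTIVES)
    passive = sum(1 for s in sentences if PASSIVE.search(s))
    return {
        "sentences": len(sentences),
        "mean_sentence_len": statistics.mean(lengths) if lengths else 0.0,
        "sentence_len_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        "connectives_per_1k_words": 1000 * connectives / words if words else 0.0,
        "passive_sentence_ratio": passive / len(sentences) if sentences else 0.0,
    }


sample = ("We collected data from ten schools. The survey was administered online. "
          "However, response rates varied.")
print(style_profile(sample))
```

Profiles from the user's 3 to 5 samples could then be averaged into targets for generation, and the same function can score a draft to see how far it drifts from them.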

Mechanism 5: R&R Traceability Matrix (Revision Traceability)

The problem: The revision stage is where "claiming a change was made without actually making it" most commonly occurs. Reviewers request changes to points A, B, and C. The author's response letter says "addressed." How does an AI agent verify this?

The response: The R&R (revise-and-resubmit) Traceability Matrix (Schema 11):

Input:
  - Reviewer comments (including specific change requests)
  - Revised manuscript
  - Author Response Letter
        ↓
Independent Verification:
  - Check each reviewer comment → locate corresponding change in manuscript
  - Check each claim in author response → verify actual change in manuscript
  - Flag items where "claimed addressed" but no corresponding change found
        ↓
Output: Traceability report (Addressed / Partially Addressed / Not Addressed / Claim Unverified)

Insight for Skill designers: In any version-comparison workflow (code review, document revision, requirements changes), introduce claim-to-implementation consistency checking. It catches claimed-but-missing changes that manual review can overlook, and it applies more semantic judgment than a simple diff.
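A deliberately simple version of the independent-verification step: each reviewer comment carries the passages it asked to change, and a passage counts as changed when it no longer appears verbatim in the revised manuscript. The article does not describe the matching method, and a verbatim check is exactly the kind of diff-level test the real matrix goes beyond, so treat `ReviewerComment` and `trace` as hypothetical scaffolding. The four status labels are the ones listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewerComment:
    comment_id: str
    # Passages from the original draft that the reviewer asked to change (hypothetical field)
    target_passages: list = field(default_factory=list)


def trace(comments: list, revised_text: str, claimed_addressed: set) -> dict:
    """Map each comment ID to Addressed / Partially Addressed / Not Addressed / Claim Unverified."""
    report = {}
    for c in comments:
        changed = [p for p in c.target_passages if p not in revised_text]
        if c.target_passages and len(changed) == len(c.target_passages):
            status = "Addressed"
        elif changed:
            status = "Partially Addressed"
        elif c.comment_id in claimed_addressed:
            status = "Claim Unverified"  # response letter says "addressed", manuscript unchanged
        else:
            status = "Not Addressed"
        report[c.comment_id] = status
    return report


revised = ("We now report effect sizes with 95% confidence intervals. "
           "The sample comprised 120 students.")
comments = [
    ReviewerComment("R1.1", ["We report p-values only."]),
    ReviewerComment("R1.2", ["The sample comprised 120 students.", "Limitations are not discussed."]),
    ReviewerComment("R2.1", ["The sample comprised 120 students."]),
    ReviewerComment("R2.2", ["The sample comprised 120 students."]),
]
print(trace(comments, revised, claimed_addressed={"R1.1", "R2.1"}))
```

The "Claim Unverified" branch is the one that matters most: it fires only when the response letter asserts a fix that the manuscript does not show.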

Project Links & Resources

Official Resources

🌟 GitHub: https://github.com/Imbad0202/academic-research-skills
🔬 Companion Experiment Agent: Imbad0202/experiment-agent
📦 Codex Version: Imbad0202/academic-research-skills-codex
📖 Architecture Documentation: docs/ARCHITECTURE.md
🚀 Quick Start Guide: QUICKSTART.md

Target Audience

Academic researchers: Graduate students, PhD candidates, and faculty who want AI assistance without sacrificing academic rigor
AI Skill designers: Anyone interested in implementing anti-sycophancy, anti-hallucination gates, and intent detection in complex workflows
Academic journal editors: Using the reviewer mode to gauge the quality of AI-assisted research
Research methods educators: Using Socratic mode to guide students in critical thinking

Summary

Key Takeaways

Functional layer:

Four Skills cover the complete academic workflow: Deep Research (13 agents) + Academic Paper (12 agents) + Reviewer (7 agents) + Pipeline (10-stage orchestration)
Supports APA 7.0, Chicago, MLA, IEEE, Vancouver citation formats; Markdown/DOCX/PDF output
Complete pipeline for a 15,000-word paper costs approximately $4–6

Workflow design layer (core insights for Skill designers):

Non-skippable integrity gates: Place mandatory verification nodes before high-stakes outputs
Intent detection: Distinguish exploratory dialogue from goal-oriented requests; respond with different strategies
Concession Threshold Protocol: Use numerical scoring thresholds to prevent AI position collapse under conversational pressure
Style calibration: A required pre-step in writing Skills — let the model learn how the user writes first
Claim-implementation traceability: Consistency verification in version-comparison workflows

One-Line Review

Academic Research Skills is not just an academic tool — it is a living reference on how to design responsible AI workflows in high-stakes scenarios.


DEV Community