DEV Community

The_resa


😎AIPOCH – 450+ Modular Agent Skills for Medical Research

Hi! I’m part of the team building AIPOCH, an open-source library of 450+ executable Agent Skills designed specifically for medical research workflows.

AIPOCH GitHub Repository
See our website here

Why did we build AIPOCH?

Most medical research AI tools today are essentially a bundle of prompt engineering, fixed toolchains, and a UI. They handle "published knowledge" well (like summarizing a paper), but they fall apart the moment you say: "Now validate this hypothesis using my own cohort data." Existing tools often lack a persistent research context: there is no version-controlled hypothesis tracking and no seamless link between literature evidence and actual data execution. We wanted to move beyond point solutions to a modular, extensible protocol.

What is AIPOCH?

AIPOCH is a curated library of 450+ Medical Research Agent Skills, built to work with OpenClaw and other AI agent platforms, including OpenCode and Claude. To support real research workflows, we have encoded specialized medical research logic directly into our Skills:

  1. Scientific integrity constraints
  2. Study-type identification
  3. Medically specialized prompt logic

AIPOCH Skills Example

A Skill is a structured capability package consisting of:

  • skill.md: a "contract" containing YAML metadata (trigger logic) and the specific operational steps.
  • Python scripts: executable engines invoked directly via bash under the guidance of skill.md.

In AIPOCH, each Skill is a structured capability package for professional medical research tasks: skill.md serves as the trigger contract, and the Python scripts serve as the execution engine. Medical research constraints are embedded directly in skill.md, the references, and the Python scripts themselves.
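To make the contract-plus-engine structure concrete, here is a minimal, hypothetical skill.md sketch. The field names, skill name, and script path are illustrative only, not AIPOCH's actual schema:

```markdown
---
name: kaplan-meier-survival
description: Fit a Kaplan-Meier survival curve from cohort CSV data.
triggers: ["survival analysis", "kaplan-meier", "time-to-event"]
---

## Steps
1. Validate that the input CSV contains `time` and `event` columns.
2. Run `python scripts/km_fit.py --input <cohort.csv>` via bash.
3. Report median survival with confidence intervals; refuse to
   extrapolate beyond the observed follow-up window.
```

The YAML front matter is what the agent matches against when deciding whether to trigger the Skill; the steps below it constrain how the Python engine is actually invoked.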

AIPOCH Medical Skill Auditor (in development)

What is Medical Skill Auditor?

Skill Auditor is AIPOCH’s evaluation framework, under active development, for scoring Medical Research Agent Skills against rigorous, multi-dimensional quality metrics. It is intended to go beyond static descriptions by measuring both core capability and real execution performance, giving users and developers a clearer, data-driven picture of skill quality.

How does it work?

🧰 Core Capability
Evaluates a skill’s design and contract against key dimensions such as functional suitability, reliability, performance and context, agent usability, human usability, security, agent-specific behavior, and maintainability.

📊 Medical Task
Assesses actual outputs of a skill with layered criteria, weighting general competence and category‑specific behaviors to reflect real‑world execution quality.
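The layered, weighted scoring described above could be sketched roughly as follows. The dimension names and the 60/40 weighting are purely illustrative assumptions, not Skill Auditor's actual rubric:

```python
# Hypothetical sketch of layered, weighted skill scoring.
# Dimension names and weights are illustrative, not AIPOCH's real rubric.

def score_skill(general: dict, category: dict,
                general_weight: float = 0.6) -> float:
    """Blend general competence with category-specific behavior (0-100)."""
    g = sum(general.values()) / len(general)        # general competence mean
    c = sum(category.values()) / len(category)      # category-specific mean
    return general_weight * g + (1 - general_weight) * c

overall = score_skill(
    general={"instruction_following": 90, "output_format": 80},
    category={"statistical_rigor": 70, "cohort_handling": 75},
)
print(round(overall, 1))  # 0.6 * 85 + 0.4 * 72.5 = 80.0
```

Weighting category-specific behaviors separately is what lets, say, a survival-analysis skill be judged on different criteria than a literature-screening skill.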

🚫 Veto Gates
To enforce strict quality control, Skill Auditor is designed with two layers of veto mechanisms. A failure in any of these checks can lead to immediate rejection of a skill.

  • Skill Veto
    Operational Stability
    Structural Consistency
    Result Determinism
    System Security

  • Research Veto
    Scientific Integrity
    Practice Boundaries
    Methodological Ground
    Code Usability
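To illustrate the hard-gate idea, here is a minimal sketch of what a Skill Veto check for Operational Stability and Result Determinism could look like. This is our assumption of one plausible shape, not the Skill Auditor implementation (which is still in development); `passes_skill_veto` and its behavior are hypothetical:

```python
# Hypothetical Skill Veto sketch: reject a skill outright if its script
# crashes (Operational Stability) or produces different output across
# identical runs (Result Determinism). Not the real Skill Auditor code.
import subprocess
import sys

def passes_skill_veto(script: str, args: list[str]) -> bool:
    """Run the skill's script twice; any crash or output drift is a veto."""
    outputs = []
    for _ in range(2):
        result = subprocess.run(
            [sys.executable, script, *args],
            capture_output=True, text=True, timeout=60,
        )
        if result.returncode != 0:      # Operational Stability gate
            return False
        outputs.append(result.stdout)
    return outputs[0] == outputs[1]     # Result Determinism gate
```

The point of a veto layer is that these checks run before any weighted scoring: a skill that fails them never receives a quality score at all.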

The Most Frustrating Moment

One of our biggest early mistakes was using a cheaper LLM to "vibe code" the initial batch of scripts.
On the surface, it worked. The scripts ran, and the logic seemed okay. The nightmare only surfaced during our audit: we realized the executing agent was silently correcting the script's logic on the fly. Because the agent read the intent in skill.md, it would "patch" the sloppy edge cases and vague error branches in the Python code during execution.
The result? We were burning massive amounts of extra tokens just to fix errors that shouldn't have existed. Nothing ever threw an error; the waste only showed up on the API bill.
We eventually scrapped the lot and learned the hard way: quantity isn't a moat; high-quality scripts are.

All questions/feedback welcome!😎😎😎
