kako-jun

My OSS Stalled for 3 Months Because of Misguided Vibe Coding—This Is the Full Reboot Story

I'm building an open-source crate that has been downloaded about 8,000 times and is used both in Japan and the United States.

Then I promoted it, realized I'd messed up, and every new star on the repo made my stomach drop.

My conclusion: so-called "vibe coding" by self-proclaimed power users is a spaghetti-code factory.

Instead of throwing the mess away, I decided to rebuild it into something edible. That's what this reboot is about.

Introduction

Have you ever had a solo project that stagnated so badly you couldn't touch it for months?

diffx, my structured-data diff tool, was essentially frozen from August to November 2025—about three months. It should have been "feature complete," yet I couldn't move forward. I analyzed what went wrong and brought it back using a process I now call the "reboot."

This post documents everything: the root-cause analysis, the failed collaboration with AI, escaping a monorepo, and the concrete reboot steps.

Who should read this

  • Solo developers stuck in a stalled project
  • People struggling with code quality while using AI pair programmers (Claude Code, etc.)
  • Engineers burned out by monorepos and shared CI/CD frameworks

What Is diffx?

diffx


Semantic diff tool for structured data (JSON/YAML/TOML/XML/INI/CSV). Ignores key ordering and whitespace, shows only meaningful changes.

Why diffx?

Traditional diff doesn't understand structure:

$ diff config_v1.json config_v2.json
< {
<   "name": "myapp",
<   "version": "1.0"
< }
> {
>   "version": "1.1",
>   "name": "myapp"
> }

A simple key reordering shows every line as changed.

diffx shows only semantic changes:

$ diffx config_v1.json config_v2.json
~ version: "1.0" -> "1.1"

Installation

# As CLI tool
cargo install diffx

# As library (Cargo.toml)
[dependencies]
diffx-core = "0.6"

Usage

# Basic
diffx file1.json file2.json

# Output example
~ version: "1.0" -> "1.1"
+ features[0]: "new-feature"
- deprecated: "old-value"

Supported Formats

JSON, YAML…

diffx is a Rust tool that extracts semantic diffs from structured data such as JSON, YAML, TOML, XML, INI, and CSV. Unlike classic text-based diffs, it ignores formatting and key-order changes, surfacing only meaningful changes.

$ diffx config-old.yaml config-new.yaml
~ server.port: 8080 -> 9000
+ server.timeout: 30
- server.deprecated_option

It's built for DevOps/SRE workflows, especially tracking Kubernetes YAML and Terraform config changes. There are also npm (diffx-js) and Python (diffx-python) editions.
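
For example, comparing two revisions of a Kubernetes Deployment manifest might look like this (file names and output values are illustrative, not taken from a real run):

$ diffx deployment-v1.yaml deployment-v2.yaml
~ spec.replicas: 2 -> 3
~ spec.template.spec.containers[0].image: "myapp:1.0" -> "myapp:1.1"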


Chapter 1: What Happened During the Freeze

The Illusion of "Done"

Here's the post from that era:

https://dev.to/kako-jun/i-built-diffx-a-structure-aware-ai-first-diff-tool-in-rust-lets-end-the-comma-induced-suffering-491h

The repo picked up a ton of stars. Some people even became sponsors.

On the surface the project looked polished:

  • README in Japanese, English, and Chinese
  • Support for six data formats
  • npm and Python ports
  • A complex CI/CD pipeline
  • A 740-line migration plan

But in reality:

  • CI/CD was broken beyond repair
  • Tests didn't necessarily validate the real specs
  • Docs and implementation had likely drifted apart

The CLI technically worked, but I couldn't honestly call it sustainable.


Chapter 2: Root Causes of the Stagnation

I spent those three months "studying how to make it sustainable."

After analyzing diffx, I found three major failure categories.

Category A: Project Design Failures

1. Running Three Projects at Once and Over-Sharing Everything

Besides diffx, I had two sister projects in mind: diffai (diffs for AI model configs) and lawkit (statistics like Benford's law).

What I did

  • Built a shared CI/CD system for all three
  • Tried to orchestrate everything via workflow_call
  • Added symbolic links to a shared repo

Outcome

I split the shared parts (GitHub workflows, scripts, etc.) into another repo, then symlinked it from each project.

/home/kako-jun/repos/.github/          ← shared repo
└── rust-cli-kiln/
    └── scripts/
        └── testing/
            └── quick-check.sh  ← never stabilized

Inside each project:
github-shared -> ../.github  ← symlink

quick-check.sh existed, but tuning it for diffx broke lawkit, and fixing lawkit broke diffai. Nothing stayed green.
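
For reference, this is roughly what the workflow_call orchestration looked like from each project's side; a minimal sketch with illustrative repository and file names, not the actual shared setup:

# .github/workflows/ci.yml in each project (names illustrative)
name: CI
on: [push, pull_request]

jobs:
  quick-check:
    # Delegate to a reusable workflow living in the shared repo.
    # Tuning that one shared file for diffx, lawkit, and diffai at once
    # is exactly what never stabilized.
    uses: OWNER/shared-workflows/.github/workflows/quick-check.yml@main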

Lesson: Don't think about the next project before the first one is stable.

2. Monorepo Mismanagement

I kept the Rust core plus the npm and Python bindings in one repo.

diffx/                    # monorepo
├── diffx-core/           # Rust
├── diffx-cli/            # Rust
├── diffx-js/             # Node.js (napi-rs)
└── diffx-python/         # Python (PyO3)

Problems

  • Three languages/toolchains sharing one repo
  • GitHub Actions exploding into six workflows plus the shared repo
  • Any change risked breaking everything
  • Releases required a multi-step ritual
  • Dependency graphs grew exponentially complicated

Release cadence mismatch

Minor bug fix in Rust
  → diffx-core v0.6.1
  → diffx-cli v0.6.1
  → diffx-js needs binding updates (half-day)
  → diffx-python needs similar updates (half-day)
  → Rust-only release goes out
  → Versions drift apart

3. Writing Multilingual Docs Too Early

I tried to ship three READMEs from day one.

Problems

  • Keeping three files in sync is exhausting
  • Every change becomes triple work
  • You forget which file is "the truth"
  • Everything becomes stale simultaneously

Better approach

Phase 1: README_ja.md only (use the language you can edit fastest)
Phase 2: Add English once the project matures
Phase 3: Add other languages based on demand

Category B: Failing to Work with AI

I used Claude Code extensively and ran into multiple failure modes.

4. Ignoring Quality Loss from Context Compression

In sessions where context was compressed three times, the answers started perfect and degraded over time.

Start: accurate, detailed implementation
  ↓ 1 hour later: subtle spec violations
  ↓ 2 hours later: obvious bugs
  ↓ 3 hours later: forgets prior instructions entirely

Behavior when only 20% context remains

Once context gets thin, the AI openly changes personality:

  • Lies without hesitation
  • Randomly omits code
  • Pushes to commit even when not asked

Even with a spec doc, it forgets the spec and freestyles. Beginners who trust the AI blindly fall into a slow-motion desync between spec and code.

Why does it happen?

My guess: coding AIs are trained to "wrap up cleanly before context collapses." That's reasonable by itself.

But combine that instinct with "first-year vibe coding" and you get disaster. On Qiita or Zenn you'll see excited posts: "I built this with vibe coding! Everyone should try it!" The energy is great, but the AI's "must wrap up" instinct plus the beginner's "AI's got my back" trust is a direct line to failure.

That's what makes vibe coding dangerous. My working definition:

  • Hack on main in a single session
  • Give casual instructions without watching context size
  • Believe it "kind of works"
  • Wake up with 2,000+ lines of spaghetti

For toy projects that's fine. But keep vibe coding without ever hitting the "disillusionment phase," and eventually it blows up.

5. Believing "Implemented!" Without Verification

This is separate from the context problem: even Opus 4.5, with plenty of context left, often claims something is "implemented" when it isn't.

AI knows the right facts yet happily outputs code that contradicts them. Knowing something and implementing it correctly are different skills.

Countermeasure: Have the same AI review its own work.

Me: "Implement this feature."
AI: "Implemented."
Me: "Review it yourself. Are you sure it's spec-compliant?"
AI: "I found the following issues..." (and fixes them immediately)

This is why the industry keeps talking about "auto-review by AI." It's an antidote to this exact failure mode. I just didn't know it at the time.

6. Giving Vague Specs (I Lacked the Vocabulary)

Me: "Add an option to ignore case."
AI: "Added --ignore-case."
Result: Values compare case-insensitively, but keys do not.

In my head "ignore case" included keys, but the AI didn't infer that.

Root issue: I lacked the vocabulary to describe development principles.

When vibe coding breaks down, you need to tell the AI "apply single responsibility" or "separate concerns." If you don't know those phrases, you're stuck.

Better instructions

❌ "Add an option to ignore case."
✅ "Add --ignore-case with the following effects:
    1. Compare values case-insensitively.
    2. Compare keys case-insensitively.
    Example: {"Name": "foo"} and {"name": "foo"} should be equal."

Today I'd write at least that much—or have the AI draft such instructions for me. Back then I didn't have the habit.
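
To pin down what "ignore case for both keys and values" actually means, here is a minimal Rust sketch of the intended semantics, using serde_json; this is an illustration only, not diffx's real implementation:

// Cargo.toml: serde_json = "1"
use serde_json::Value;

/// Structural equality that ignores case in both object keys and string values.
/// Illustrative only; diffx's real comparison logic lives in diffx-core.
fn eq_ignore_case(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::String(x), Value::String(y)) => x.to_lowercase() == y.to_lowercase(),
        (Value::Array(x), Value::Array(y)) => {
            x.len() == y.len() && x.iter().zip(y).all(|(v, w)| eq_ignore_case(v, w))
        }
        (Value::Object(x), Value::Object(y)) => {
            x.len() == y.len()
                && x.iter().all(|(k, v)| {
                    y.iter()
                        .find(|(k2, _)| k.to_lowercase() == k2.to_lowercase())
                        .map_or(false, |(_, v2)| eq_ignore_case(v, v2))
                })
        }
        _ => a == b,
    }
}

fn main() {
    let a: Value = serde_json::from_str(r#"{"Name": "foo"}"#).unwrap();
    let b: Value = serde_json::from_str(r#"{"name": "FOO"}"#).unwrap();
    assert!(eq_ignore_case(&a, &b)); // equal once case is ignored for keys AND values
}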

7. Feeding It Outdated Docs

Me: "Read README.md and implement accordingly."
AI: "Implementation complete based on README."
Result: README was full of lies, so it implemented lies.

Earlier AI sessions had already drifted the docs away from reality.

Category C: Quality Assurance Failures

8. Unreliable Existing Tests

$ cargo test --workspace
29 passed; 0 failed

Seeing green tests felt reassuring. That was a mistake.

Why it happens: Once AI sees the code, it writes tests that pass the current implementation. Of course tests pass on the first run—that's not a win.

Tools like Serena read the code automatically. You can't hide it. So you must explicitly instruct:

❌ "Write tests for this code."
✅ "Write tests based on docs/specs/cli.md.
    Do not read the implementation. Tests must derive from the spec."

Why spec-driven development matters

In hindsight, that's why "spec-driven development" is trending.

  1. Specs are split into granular issues.
  2. Each issue includes precise instructions for AI.
  3. AI picks up the issue and implements it.
  4. GitHub MCP spins per-feature branches.
  5. PRs get reviewed.

Following that flow automatically prevents vibe-coding disasters. It's just good process.

AI development maturity pyramid

There's a three-tier pyramid describing AI-assisted development. Vibe coding lives on the bottom tier. Until you reach at least tier two, you can't reliably run public projects. Users will suffer.


Chapter 3: The Reboot Process

Guiding Principle: "Doubt, Verify, Document"

I also used Claude Code for the reboot.

I confessed everything, asked it for a revival plan, and it delivered something both kind and practical—unlike Kishibe Rohan's confession booth. It worked.

  • Create a reboot subdirectory under .claude as the command center.
  • Run /init again to have Serena scan the entire repo from scratch.

The process took about three days, so a human had to keep state between sessions. Stretch it longer, and the human becomes the failure point again.

1. Assume everything outside reboot is untrustworthy.
2. Assume files are half-baked.
3. Assume docs are lying.
4. Figure out which parts are true.

That's the human's job. Claude Code is the strategist; I'm the baby Liu Bei just trying to keep up.

Phase 1: Quarantine

First I moved the existing files into _old/ to create a blank canvas.

# Removed or quarantined
- docs/ (including examples) → breeding ground for unchecked lies
- English and Chinese READMEs
- scripts/ (complex CI/CD)
- benchmarks (non-essential)
- CHANGELOG.md, CONTRIBUTING.md
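
The mechanics were simple; a sketch of the kind of commands involved (the actual file names differed):

mkdir _old
git mv docs scripts CHANGELOG.md CONTRIBUTING.md _old/   # quarantine, don't delete outright
git commit -m "chore: quarantine unverified files under _old/"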

Result: 109 files changed, 32,145 lines deleted.

Phase 2: Finding the Truth

I took every "it works" statement in README_ja.md and verified it manually.

# Test each format
echo '{"a":1}' > test1.json
echo '{"a":2}' > test2.json
./target/release/diffx test1.json test2.json

Findings

✅ All six formats (JSON/YAML/TOML/XML/INI/CSV) work.
✅ Output formats (CLI/JSON/YAML) work.
✅ --quiet, --ignore-keys-regex, --epsilon, --array-id-key work.
⚠️  --ignore-case may only affect values, not keys.
❓ --ignore-whitespace and directory diffs unverified.

Conclusion: The core features worked. The issue was unverified functionality.

Phase 3: Writing Specs

I documented only the behavior I personally confirmed.

docs/specs/
├── cli.md   # CLI specs (exit codes, output, options)
└── core.md  # Core API specs

Spec principles

  1. Describe the ideal behavior before looking at code.
  2. Check against the implementation.
  3. Record only what actually works.
  4. Mark broken parts as TODO.
  5. Never lie.

Phase 4: Rebuilding Tests

I deleted the 436 existing tests (8,022 lines) and rewrote them based on the specs.

New test layout

tests/
├── spec/       # Spec-driven unit tests (69 cases)
└── cmd/        # trycmd-based doc-as-test (19 cases)

Why trycmd

  • Markdown doubles as documentation and tests.
  • Docs can't drift from tests.
  • You physically can't write lying docs.
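
Roughly how that is wired up in a Rust project; a sketch with illustrative file and test names:

// tests/cli.rs: trycmd harness sketch
// Each Markdown file under tests/cmd/ embeds console blocks such as:
//
//   ```console
//   $ diffx test1.json test2.json
//   ~ a: 1 -> 2
//   ```
//
// trycmd runs every command and fails the test when the documented output
// drifts from the real CLI behavior, so the docs cannot quietly become lies.
#[test]
fn cli_docs_stay_true() {
    trycmd::TestCases::new().case("tests/cmd/*.md");
}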

Chapter 4: Leaving the Monorepo

When to Split

I split when all of these were true:

  1. Different build systems: Cargo vs. npm vs. pip.
  2. Different release cadences: Need independent releases.
  3. Different users: Rust vs. Node.js vs. Python developers.
  4. CI/CD complexity: Cross-language interdependence was unmanageable.

diffx hit all four.

How I Split It

Step 1: Create New Repos

cd /home/kako-jun/repos/
mkdir diffx-js && cd diffx-js && git init
mkdir diffx-python && cd diffx-python && git init

Step 2: Move Code

cp -r ../diffx/diffx-js/* ./diffx-js/
cp -r ../diffx/diffx-python/* ./diffx-python/

Important: I discarded Git history. Keeping it would complicate everything.

Step 3: Independent Cargo.toml

diffx-js and diffx-python now pull diffx-core from crates.io.

# diffx-js/Cargo.toml
[dependencies]
diffx-core = "0.6"  # versioned dependency instead of path

Step 4: Remove Them from the Monorepo

# diffx/Cargo.toml
[workspace]
members = ["diffx-core", "diffx-cli"]
# diffx-js and diffx-python removed

Structure After the Split

/home/kako-jun/repos/
├── diffx/           # Rust only (simple)
│   ├── diffx-core/
│   ├── diffx-cli/
│   └── .github/workflows/
│       ├── ci.yml
│       └── release.yml
│
├── diffx-js/        # npm only
│   └── .github/workflows/
│       ├── ci.yml
│       └── release.yml
│
└── diffx-python/    # pip only
    └── .github/workflows/
        ├── ci.yml
        └── release.yml

Benefits

  1. Understandable CI/CD: From six workflows + shared repo to two per repo.
  2. Independent releases: No rush; each language can ship on its own cadence.
  3. Clear ownership: Each repo has a single responsibility.
  4. Easier contributions: Node devs only need to read diffx-js.

Chapter 5: Release Workflow Tips

Splitting the repos taught me one critical principle.

Never Combine Build and Publish

Anti-pattern

# ❌ Dangerous workflow
name: Release
on:
  push:
    tags: ["v*"]

jobs:
  build-and-publish:
    steps:
      - run: cargo build --release
      - run: cargo publish # publishes before confirming all builds succeed

Why it's bad:

  1. What if the Windows, macOS, and Linux builds all succeed except one?
  2. cargo publish cannot be undone.
  3. Even if you notice, the only fix is to bump the version.
  4. You burn versions for no reason (v0.6.1 → v0.6.2 → v0.6.3...).

Correct Approach: Two-Stage Release

# ✅ Safe workflow (release.yml)
name: Release
on:
  push:
    tags: ["v*"]

jobs:
  build-linux:
    runs-on: ubuntu-latest
    steps:
      - run: cargo build --release
      - uses: actions/upload-artifact@v4

  build-macos:
    runs-on: macos-latest
    # ...

  build-windows:
    runs-on: windows-latest
    # ...

  create-release:
    needs: [build-linux, build-macos, build-windows]
    steps:
      - uses: actions/download-artifact@v4
      - uses: softprops/action-gh-release@v1
        with:
          files: |
            diffx-linux/*
            diffx-macos/*
            diffx-windows/*
# ✅ Publish is a separate workflow (publish.yml)
name: Publish
on:
  workflow_dispatch:
    inputs:
      version:
        description: "Tag to publish"
        required: true

jobs:
  publish:
    steps:
      - run: gh release view v${{ inputs.version }} # confirm release exists
      - run: cargo publish

Why Two Stages?

Step 1: Push tag → build for all platforms → create GitHub release
        ↓
        Stop here. A human verifies everything.
        ↓
Step 2: Manually trigger the publish workflow
        → Push to crates.io / npm / PyPI
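
Step 2 can be triggered from the Actions tab or, for example, with the gh CLI (the version value here is illustrative):

gh workflow run publish.yml -f version=0.6.1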

Failure scenarios

| Scenario | One-stage workflow | Two-stage workflow |
| --- | --- | --- |
| macOS build fails | Broken release already public | Nothing published |
| Recovery | Must release v0.6.2, v0.6.3… | Delete tag, re-push |
| Version consumption | Wasteful | Efficient |

Chapter 6: Reboot Playbook

Correct Reboot Order

1. Run the code yourself and find the truth.
2. Write specs in docs/specs/.
3. Build tests from those specs.
4. Fix the implementation until tests pass.
5. Update docs (README, etc.) to match the specs.

Anti-patterns

  • Writing docs first → they turn into lies.
  • Trusting existing tests → they only confirm the facade.

What to Delete

  • Old tests written without specs.
  • examples/ directories (they rot into lies).
  • Old roadmaps (already executed or obsolete).
  • Promo materials (stale after six months).

What to Keep

  • docs/specs/ – the single source of truth.
  • tests/cmd/ – docs enforced by tests.
  • .claude/tasks.md – current task list.
  • CLAUDE.md – minimal development rules.

Time Budget

Specs: 1 session
Tests: 1 session
Docs: 1 session
Cleanup: 1 session

Total: about four sessions (one session = one context window).

Recap

Before (August 2025)

❌ Frozen for 3 months
❌ Monorepo complexity
❌ Broken CI/CD
❌ Maintaining three languages
❌ Happy as long as "it runs"
❌ Blind faith in existing code

After (December 2025)

✅ Back in motion
✅ Lean Rust-only repo
✅ Separate language-specific repos
✅ Focused on Japanese docs
✅ Obsessed with correctness
✅ Doubt everything

Quantitative Changes

| Metric | Before | After |
| --- | --- | --- |
| Files | Many | -109 |
| Lines | Many | -27,770 |
| Tests | 436 (of questionable value) | 88 (spec-driven) |
| README | 3 languages | 3 languages |
| Repositories | 1 (monorepo) | 3 (per language) |

Key Lesson

Reviving a project after three months of stagnation didn't require new features or fancy tech.

It required the courage to doubt and the courage to delete.

  • Doubt existing tests.
  • Doubt the docs.
  • Doubt AI status reports.
  • Doubt your own sense of "done."
  • Delete anything suspicious.

I used to wonder why smart people fall for the sunk-cost fallacy. Turns out I'm one of them.

You cling to tests, docs, CI/CD—because "throwing them away" feels wasteful. But keeping broken artifacts helps no one. I finally practiced what I preached and deleted them.

That's the heart of the reboot and the key to sustainability.


Bonus: pre-commit Is Mandatory for AI-Assisted Code

AI introduces a subtle issue.

VS Code's format-on-save never runs.

When humans code in VS Code, formatters and .editorconfig apply automatically. When AI writes files directly, none of that triggers.

The result:

  • Misaligned indentation
  • Missing trailing newlines
  • Random import ordering
  • Trivial lint warnings everywhere

CI fails over nonsense and wastes time.

Solution: pre-commit hooks

Set up pre-commit so formatting and linting run before every commit. Here's the actual config from diffx-python:

# .pre-commit-config.yaml (diffx-python)
repos:
  - repo: local
    hooks:
      - id: cargo-fmt
        name: cargo fmt
        entry: cargo fmt --
        language: system
        types: [rust]
        pass_filenames: false

      - id: cargo-clippy
        name: cargo clippy
        entry: cargo clippy -- -D warnings
        language: system
        types: [rust]
        pass_filenames: false

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.6
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
# Setup
pip install pre-commit
pre-commit install

In diffx-js (Node.js) I use Husky instead:

// package.json (diffx-js)
{
  "scripts": {
    "prepare": "husky"
  },
  "devDependencies": {
    "husky": "^9.1.7"
  }
}
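
The hook itself is just a shell file under .husky/. A sketch of what it might contain, assuming a lint-staged setup (the actual hooks in diffx-js may differ):

# .husky/pre-commit (illustrative)
# lint-staged runs the formatter/linter only on staged files and re-stages the result
npx lint-staged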

This way, AI-written code gets auto-formatted at commit time.


Repositories

diffx


Semantic diff tool for structured data (JSON/YAML/TOML/XML/INI/CSV). Ignores key ordering and whitespace, shows only meaningful changes.



diffx-js


Node.js bindings for diffx - semantic diff for structured data (JSON, YAML, TOML, XML, INI, CSV). Powered by Rust via napi-rs for blazing fast performance.

Installation

npm install diffx-js

Supported Platforms

| Platform | Architecture |
| --- | --- |
| Linux | x64 (glibc) |
| Linux | x64 (musl/Alpine) |
| Linux | ARM64 |
| macOS | x64 (Intel) |
| macOS | ARM64 (Apple Silicon) |
| Windows | x64 |

Usage

Basic Diff

const { diff } = require('diffx-js');

const old = { name: "Alice", age: 30 };
const newObj = { name: "Alice", age: 31, city: "Tokyo" };

const results = diff(old, newObj);

for (const change of results) {
  console.log(`${change.diffType}: ${change.path}`);
  // Modified: age
  // Added: city
}

With Options

const results = diff(data1, data2, {

diffx-python


Python bindings for diffx - semantic diff for structured data (JSON, YAML, TOML, XML, INI, CSV). Powered by Rust via PyO3 for blazing fast performance.

Installation

pip install diffx-python

Supported Platforms

| Platform | Architecture |
| --- | --- |
| Linux | x64 (glibc) |
| Linux | x64 (musl/Alpine) |
| Linux | ARM64 |
| macOS | x64 (Intel) |
| macOS | ARM64 (Apple Silicon) |
| Windows | x64 |

Usage

Basic Diff

import diffx_python as diffx

old = {"name": "Alice", "age": 30}
new = {"name": "Alice", "age": 31, "city": "Tokyo"}

results = diffx.diff(old, new)

for change in results:
    print(f"{change['type']}: {change['path']}")
    # Modified: age
    # Added: city

With Options

results = diffx.diff(data1, data2,
    epsilon=0.001,                      # Tolerance for float comparison
    array_id_key='id',                  # Match

I haven't written formal release posts yet, but the initial versions of these projects work, and that's enough to prove the reboot process is repeatable—at least for my ecosystem.

diffai


Semantic diff tool for AI/ML models (PyTorch, Safetensors, NumPy, MATLAB). Provides tensor statistics, parameter comparisons, and automatic ML analysis.

Why diffai?

Traditional diff doesn't understand binary ML files:

$ diff model_v1.pt model_v2.pt
Binary files model_v1.pt and model_v2.pt differ

diffai shows meaningful analysis:

$ diffai model_v1.safetensors model_v2.safetensors
learning_rate_analysis: old=0.001, new=0.0015, change=+50.0%
gradient_analysis: flow_health=healthy, norm=0.021
~ fc1.weight: mean=-0.0002->-0.0001, std=0.0514->0.0716
~ fc2.weight: mean=-0.0008->-0.0018, std=0.0719->0.0883

Installation

# As CLI tool
cargo install diffai

# As library (Cargo.toml)
[dependencies]
diffai-core = "0.4"

Usage

# Basic
diffai model1.pt model2.pt

# JSON output for automation
diffai model1.safetensors model2.safetensors --output json

# With numerical tolerance
diffai weights1.npy weights2.npy --epsilon 0.001

Supported Formats

  • PyTorch (.pt, .pth) - Full ML analysis + tensor statistics
  • Safetensors (.safetensors) - Full ML analysis + tensor statistics
  • NumPy (.npy, .npz) - Tensor statistics
  • MATLAB (.mat) - Tensor statistics

Main Options

--format <

lawkit


Statistical law analysis toolkit. Analyze data for Benford's law, Pareto principle, Zipf's law, Normal and Poisson distributions. Detect anomalies and assess data quality.

Installation

cargo install lawkit

Supported Laws

Benford's Law (Fraud Detection)

$ lawkit benf financial_data.csv
Benford Law Analysis Results

Dataset: financial_data.csv
Numbers analyzed: 1000
[LOW] Dataset analysis

First Digit Distribution:
1: ███████████████┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  30.1% (expected:  30.1%)
2: █████████┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  17.6% (expected:  17.6%)
3: ██████┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  12.5% (expected:  12.5%)
...

Pareto Principle (80/20 Rule)

$ lawkit pareto sales.csv
Pareto Principle (80/20 Rule) Analysis Results

Dataset: sales.csv
Numbers analyzed: 500
[LOW] Dataset analysis

Lorenz Curve (Cumulative Distribution):
 20%: ███████████████████████████████████████┃░░░░░░░░░░  79.2% cumulative (80/20 point)
 40%: █████████████████████████████████████████████░░░░░  91.5% cumulative
...

80/20 Rule: Top 20% owns 79.2% of total wealth (Ideal: 80.0%, Ratio: 0.99)

Zipf's Law (Frequency Distribution)

$ lawkit zipf word_frequencies.csv
Zipf Law Analysis Results
Dataset: word_frequencies.csv
Numbers analyzed: 1000
[LOW] Dataset analysis

Rank-Frequency Distribution:
# 1: █████████████████████████████████████████████████┃  11.50% (expected: 11.50%)
# 2:
