kako-jun

My OSS Stalled for 3 Months Because of Misguided Vibe Coding—This Is the Full Reboot Story

I'm building an open-source crate that has been downloaded about 8,000 times and is used both in Japan and the United States.

Then I promoted it, realized I'd messed up, and every new star on the repo made my stomach drop.

My conclusion: so-called "vibe coding" by self-proclaimed power users is a spaghetti-code factory.

Instead of throwing the mess away, I decided to rebuild it into something edible. That's what this reboot is about.

Introduction

Have you ever had a solo project that stagnated so badly you couldn't touch it for months?

diffx, my structured-data diff tool, was essentially frozen from August to November 2025—about three months. It should have been "feature complete," yet I couldn't move forward. I analyzed what went wrong and brought it back using a process I now call the "reboot."

This post documents everything: the root-cause analysis, the failed collaboration with AI, escaping a monorepo, and the concrete reboot steps.

Who should read this

  • Solo developers stuck in a stalled project
  • People struggling with code quality while using AI pair programmers (Claude Code, etc.)
  • Engineers burned out by monorepos and shared CI/CD frameworks

What Is diffx?

diffx


Semantic diff tool for structured data (JSON/YAML/TOML/XML/INI/CSV). Ignores key ordering and whitespace, shows only meaningful changes.

Why diffx?

Traditional diff doesn't understand structure:

$ diff config_v1.json config_v2.json
< {
<   "name": "myapp",
<   "version": "1.0"
< }
> {
>   "version": "1.1",
>   "name": "myapp"
> }

A simple key reordering shows every line as changed.

diffx shows only semantic changes:

$ diffx config_v1.json config_v2.json
~ version: "1.0" -> "1.1"

Installation

# As CLI tool
cargo install diffx

# As library (Cargo.toml)
[dependencies]
diffx-core = "0.6"

Usage

# Basic
diffx file1.json file2.json

# Output example
~ version: "1.0" -> "1.1"
+ features[0]: "new-feature"
- deprecated: "old-value"

Supported Formats

JSON, YAML…

diffx is a Rust tool that extracts semantic diffs from structured data such as JSON, YAML, TOML, XML, INI, and CSV. Unlike classic text-based diffs, it ignores formatting and key-order changes, surfacing only meaningful changes.

$ diffx config-old.yaml config-new.yaml
~ server.port: 8080 -> 9000
+ server.timeout: 30
- server.deprecated_option

It's built for DevOps/SRE workflows, especially tracking Kubernetes YAML and Terraform config changes. There are also npm (diffx-js) and Python (diffx-python) editions.
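The underlying principle is simple: parse both documents into structured values and compare those, not the raw text. Here is a minimal Rust sketch of that idea using serde_json; it illustrates the concept only, not diffx's actual internals:

// Key order and whitespace vanish once the text is parsed:
// serde_json::Value compares objects as maps, not as lines.
use serde_json::Value;

fn main() {
    let v1: Value = serde_json::from_str(r#"{ "name": "myapp", "version": "1.0" }"#).unwrap();
    let v2: Value = serde_json::from_str(r#"{"version":"1.0","name":"myapp"}"#).unwrap();

    // Equal despite different key order and formatting.
    assert_eq!(v1, v2);
    println!("no meaningful changes");
}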


Chapter 1: What Happened During the Freeze

The Illusion of "Done"

Here's the post from that era:

https://dev.to/kako-jun/i-built-diffx-a-structure-aware-ai-first-diff-tool-in-rust-lets-end-the-comma-induced-suffering-491h

The repo picked up a ton of stars. Some people even became sponsors.

On the surface the project looked polished:

  • README in Japanese, English, and Chinese
  • Support for six data formats
  • npm and Python ports
  • A complex CI/CD pipeline
  • A 740-line migration plan

But in reality:

  • CI/CD was broken beyond repair
  • Tests didn't necessarily validate the real specs
  • Docs and implementation had likely drifted apart

The CLI technically worked, but I couldn't honestly call it sustainable.


Chapter 2: Root Causes of the Stagnation

I spent those three months "studying how to make it sustainable."

After analyzing diffx, I found three major failure categories.

Category A: Project Design Failures

1. Running Three Projects at Once and Over-Sharing Everything

Besides diffx, I had two sister projects in mind: diffai (diffs for AI/ML models) and lawkit (statistical-law analysis such as Benford's law).

What I did

  • Built a shared CI/CD system for all three
  • Tried to orchestrate everything via workflow_call
  • Added symbolic links to a shared repo

Outcome

The shared parts (GitHub workflows, scripts, etc.) ended up in a separate repo, symlinked from each project:

/home/kako-jun/repos/.github/          ← shared repo
└── rust-cli-kiln/
    └── scripts/
        └── testing/
            └── quick-check.sh  ← never stabilized

Inside each project:
github-shared -> ../.github  ← symlink

quick-check.sh existed, but tuning it for diffx broke lawkit, and fixing lawkit broke diffai. Nothing stayed green.

Lesson: Don't think about the next project before the first one is stable.

2. Monorepo Mismanagement

I kept the Rust core plus the npm and Python bindings in one repo.

diffx/                    # monorepo
├── diffx-core/           # Rust
├── diffx-cli/            # Rust
├── diffx-js/             # Node.js (napi-rs)
└── diffx-python/         # Python (PyO3)

Problems

  • Three languages/toolchains sharing one repo
  • GitHub Actions exploding into six workflows plus the shared repo
  • Any change risked breaking everything
  • Releases required a multi-step ritual
  • The dependency graph grew hopelessly complicated

Release cadence mismatch

Minor bug fix in Rust
  → diffx-core v0.6.1
  → diffx-cli v0.6.1
  → diffx-js needs binding updates (half-day)
  → diffx-python needs similar updates (half-day)
  → Rust-only release goes out
  → Versions drift apart

3. Writing Multilingual Docs Too Early

I tried to ship three READMEs from day one.

Problems

  • Keeping three files in sync is exhausting
  • Every change becomes triple work
  • You forget which file is "the truth"
  • Everything becomes stale simultaneously

Better approach

Phase 1: README_ja.md only (use the language you can edit fastest)
Phase 2: Add English once the project matures
Phase 3: Add other languages based on demand

Category B: Failing to Work with AI

I used Claude Code extensively and ran into multiple failure modes.

4. Ignoring Quality Loss from Context Compression

In sessions where the context had been compressed three times, answers started out accurate and degraded over time.

Start: accurate, detailed implementation
  ↓ 1 hour later: subtle spec violations
  ↓ 2 hours later: obvious bugs
  ↓ 3 hours later: forgets prior instructions entirely

Behavior when only 20% context remains

Once context gets thin, the AI openly changes personality:

  • Lies without hesitation
  • Randomly omits code
  • Pushes to commit even when not asked

Even with a spec doc, it forgets the spec and freestyles. Beginners who trust the AI blindly fall into a slow-motion desync between spec and code.

Why does it happen?

My guess: coding AIs are trained to "wrap up cleanly before context collapses." That's reasonable by itself.

But combine that instinct with "first-year vibe coding" and you get disaster. On Qiita or Zenn you'll see excited posts: "I built this with vibe coding! Everyone should try it!" The energy is great, but the AI's "must wrap up" instinct plus the beginner's "AI's got my back" trust is a direct line to failure.

That's what makes vibe coding dangerous. My working definition:

  • Hack on main in a single session
  • Give casual instructions without watching context size
  • Believe it "kind of works"
  • Wake up with 2,000+ lines of spaghetti

For toy projects that's fine. But if you keep vibe coding before you hit the "disillusionment phase," it eventually blows up.

5. Believing "Implemented!" Without Verification

Separate from context issues: even Opus 4.5 with plenty of remaining context often lies that a feature is "implemented."

AI knows the right facts yet happily outputs code that contradicts them. Knowing something and implementing it correctly are different skills.

Countermeasure: Have the same AI review its own work.

Me: "Implement this feature."
AI: "Implemented."
Me: "Review it yourself. Are you sure it's spec-compliant?"
AI: "I found the following issues..." (and fixes them immediately)

This is why the industry keeps talking about "auto-review by AI." It's an antidote to this exact failure mode. I just didn't know it at the time.

6. Giving Vague Specs (I Lacked the Vocabulary)

Me: "Add an option to ignore case."
AI: "Added --ignore-case."
Result: Values compare case-insensitively, but keys do not.

In my head "ignore case" included keys, but the AI didn't infer that.

Root issue: I lacked the vocabulary to describe development principles.

When vibe coding breaks down, you need to tell the AI "apply single responsibility" or "separate concerns." If you don't know those phrases, you're stuck.

Better instructions

❌ "Add an option to ignore case."
✅ "Add --ignore-case with the following effects:
    1. Compare values case-insensitively.
    2. Compare keys case-insensitively.
    Example: {"Name": "foo"} and {"name": "foo"} should be equal."

Today I'd write at least that much—or have the AI draft such instructions for me. Back then I didn't have the habit.
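Once the instruction is that precise, the spec translates directly into a test. Here's a minimal sketch using the assert_cmd and tempfile crates; note that the key-and-value semantics and the diff-style exit code (0 = no differences) are my spec in this example, not verified diffx behavior:

// tests/spec/ignore_case.rs: derived from the written spec, not from src/.
// Assumption: diff-style exit codes, where 0 means "no differences".
use assert_cmd::Command;
use std::fs;

#[test]
fn ignore_case_covers_keys_and_values() {
    let dir = tempfile::tempdir().unwrap();
    let a = dir.path().join("a.json");
    let b = dir.path().join("b.json");
    fs::write(&a, r#"{"Name": "foo"}"#).unwrap();
    fs::write(&b, r#"{"name": "FOO"}"#).unwrap();

    // Spec: with --ignore-case, these two files compare as equal.
    Command::cargo_bin("diffx")
        .unwrap()
        .arg(&a)
        .arg(&b)
        .arg("--ignore-case")
        .assert()
        .success();
}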

7. Feeding It Outdated Docs

Me: "Read README.md and implement accordingly."
AI: "Implementation complete based on README."
Result: README was full of lies, so it implemented lies.

Earlier AI sessions had already drifted the docs away from reality.

Category C: Quality Assurance Failures

8. Unreliable Existing Tests

$ cargo test --workspace
29 passed; 0 failed

Seeing green tests felt reassuring. That was a mistake.

Why it happens: once the AI sees the code, it writes tests tailored to the current implementation. Of course they pass on the first run—that's not a win.

Tools like Serena read the code automatically. You can't hide it. So you must explicitly instruct:

❌ "Write tests for this code."
✅ "Write tests based on docs/specs/cli.md.
    Do not read the implementation. Tests must derive from the spec."

Why spec-driven development matters

In hindsight, that's why "spec-driven development" is trending.

  1. Specs are split into granular issues.
  2. Each issue includes precise instructions for AI.
  3. AI picks up the issue and implements it.
  4. GitHub MCP spins per-feature branches.
  5. PRs get reviewed.

Following that flow automatically prevents vibe-coding disasters. It's just good process.

AI development maturity pyramid

There's a three-tier pyramid describing AI-assisted development. Vibe coding lives on the bottom tier. Until you reach at least tier two, you can't reliably run public projects. Users will suffer.


Chapter 3: The Reboot Process

Guiding Principle: "Doubt, Verify, Document"

I also used Claude Code for the reboot.

I confessed everything, asked it for a revival plan, and it delivered something both kind and practical—unlike Kishibe Rohan's confession booth. It worked.

  • Create a reboot subdirectory under .claude as the command center.
  • Run /init again to have Serena scan the entire repo from scratch.

The process took about three days, so a human had to keep state between sessions. Stretch it longer, and the human becomes the failure point again.

1. Assume everything outside reboot is untrustworthy.
2. Assume files are half-baked.
3. Assume docs are lying.
4. Figure out which parts are true.

That's the human's job. Claude Code is the strategist; I'm the baby Liu Bei just trying to keep up.

Phase 1: Quarantine

First I moved the existing files into _old/ to create a blank canvas.

# Removed or quarantined
- docs/ (including examples) → breeding ground for unchecked lies
- English and Chinese READMEs
- scripts/ (complex CI/CD)
- benchmarks (non-essential)
- CHANGELOG.md, CONTRIBUTING.md

Result: 109 files changed, 32,145 lines deleted.

Phase 2: Finding the Truth

I took every "it works" statement in README_ja.md and verified it manually.

# Test each format
echo '{"a":1}' > test1.json
echo '{"a":2}' > test2.json
./target/release/diffx test1.json test2.json

Findings

✅ All six formats (JSON/YAML/TOML/XML/INI/CSV) work.
✅ Output formats (CLI/JSON/YAML) work.
✅ --quiet, --ignore-keys-regex, --epsilon, --array-id-key work.
⚠️  --ignore-case may only affect values, not keys.
❓ --ignore-whitespace and directory diffs unverified.

Conclusion: The core features worked. The issue was unverified functionality.

Phase 3: Writing Specs

I documented only the behavior I personally confirmed.

docs/specs/
├── cli.md   # CLI specs (exit codes, output, options)
└── core.md  # Core API specs

Spec principles

  1. Describe the ideal behavior before looking at code.
  2. Check against the implementation.
  3. Record only what actually works.
  4. Mark broken parts as TODO.
  5. Never lie.

Phase 4: Rebuilding Tests

I deleted the 436 existing tests (8,022 lines) and rewrote them based on the specs.

New test layout

tests/
├── spec/       # Spec-driven unit tests (69 cases)
└── cmd/        # trycmd-based doc-as-test (19 cases)

Why trycmd

  • Markdown doubles as documentation and tests.
  • Docs can't drift from tests.
  • You physically can't write lying docs.
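Here's what that looks like in practice, assuming trycmd as a dev-dependency. One Rust test drives every Markdown file under tests/cmd/ (this is trycmd's standard harness pattern; the glob matches the layout above):

// tests/cli.rs: one harness runs every Markdown file under tests/cmd/.
// Each file contains ```console blocks with `$ diffx ...` invocations
// and their expected output; if the real output drifts, the test fails.
#[test]
fn cli_docs_are_tests() {
    trycmd::TestCases::new().case("tests/cmd/*.md");
}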

Chapter 4: Leaving the Monorepo

When to Split

I split when all of these were true:

  1. Different build systems: Cargo vs. npm vs. pip.
  2. Different release cadences: Need independent releases.
  3. Different users: Rust vs. Node.js vs. Python developers.
  4. CI/CD complexity: Cross-language interdependence was unmanageable.

diffx hit all four.

How I Split It

Step 1: Create New Repos

cd /home/kako-jun/repos/
mkdir diffx-js && cd diffx-js && git init
mkdir diffx-python && cd diffx-python && git init

Step 2: Move Code

cp -r ../diffx/diffx-js/* ./diffx-js/
cp -r ../diffx/diffx-python/* ./diffx-python/

Important: I discarded the Git history. Keeping it would have complicated the split.

Step 3: Independent Cargo.toml

diffx-js and diffx-python now pull diffx-core from crates.io.

# diffx-js/Cargo.toml
[dependencies]
diffx-core = "0.6"  # versioned dependency instead of path

Step 4: Remove Them from the Monorepo

# diffx/Cargo.toml
[workspace]
members = ["diffx-core", "diffx-cli"]
# diffx-js and diffx-python removed

Structure After the Split

/home/kako-jun/repos/
├── diffx/           # Rust only (simple)
│   ├── diffx-core/
│   ├── diffx-cli/
│   └── .github/workflows/
│       ├── ci.yml
│       └── release.yml
│
├── diffx-js/        # npm only
│   └── .github/workflows/
│       ├── ci.yml
│       └── release.yml
│
└── diffx-python/    # pip only
    └── .github/workflows/
        ├── ci.yml
        └── release.yml

Benefits

  1. Understandable CI/CD: From six workflows + shared repo to two per repo.
  2. Independent releases: No rush; each language can ship on its own cadence.
  3. Clear ownership: Each repo has a single responsibility.
  4. Easier contributions: Node devs only need to read diffx-js.

Chapter 5: Release Workflow Tips

Splitting the repos taught me one critical principle.

Never Combine Build and Publish

Anti-pattern

# ❌ Dangerous workflow
name: Release
on:
  push:
    tags: ["v*"]

jobs:
  build-and-publish:
    steps:
      - run: cargo build --release
      - run: cargo publish # publishes before confirming all builds succeed

Why it's bad:

  1. What if the Windows, macOS, and Linux builds all succeed except one?
  2. cargo publish cannot be undone.
  3. Even if you notice, the only fix is to bump the version.
  4. You burn versions for no reason (v0.6.1 → v0.6.2 → v0.6.3...).

Correct Approach: Two-Stage Release

# ✅ Safe workflow (release.yml)
name: Release
on:
  push:
    tags: ["v*"]

jobs:
  build-linux:
    runs-on: ubuntu-latest
    steps:
      - run: cargo build --release
      - uses: actions/upload-artifact@v4

  build-macos:
    runs-on: macos-latest
    # ...

  build-windows:
    runs-on: windows-latest
    # ...

  create-release:
    needs: [build-linux, build-macos, build-windows]
    steps:
      - uses: actions/download-artifact@v4
      - uses: softprops/action-gh-release@v1
        with:
          files: |
            diffx-linux/*
            diffx-macos/*
            diffx-windows/*
# ✅ Publish is a separate workflow (publish.yml)
name: Publish
on:
  workflow_dispatch:
    inputs:
      version:
        description: "Tag to publish"
        required: true

jobs:
  publish:
    steps:
      - run: gh release view v${{ inputs.version }} # confirm release exists
      - run: cargo publish

Why Two Stages?

Step 1: Push tag → build for all platforms → create GitHub release
        ↓
        Stop here. A human verifies everything.
        ↓
Step 2: Manually trigger the publish workflow
        → Push to crates.io / npm / PyPI

Failure scenarios

| Scenario | One-stage workflow | Two-stage workflow |
|---|---|---|
| macOS build fails | Broken release already public | Nothing published |
| Recovery | Must release v0.6.2, v0.6.3… | Delete tag, re-push |
| Version consumption | Wasteful | Efficient |

Chapter 6: Reboot Playbook

Correct Reboot Order

1. Run the code yourself and find the truth.
2. Write specs in docs/specs/.
3. Build tests from those specs.
4. Fix the implementation until tests pass.
5. Update docs (README, etc.) to match the specs.

Anti-patterns

  • Writing docs first → they turn into lies.
  • Trusting existing tests → they only confirm the facade.

What to Delete

  • Old tests written without specs.
  • examples/ directories (they rot into lies).
  • Old roadmaps (already executed or obsolete).
  • Promo materials (stale after six months).

What to Keep

  • docs/specs/ – the single source of truth.
  • tests/cmd/ – docs enforced by tests.
  • .claude/tasks.md – current task list.
  • CLAUDE.md – minimal development rules.

Time Budget

Specs: 1 session
Tests: 1 session
Docs: 1 session
Cleanup: 1 session

Total: about four sessions (one session = one context window).

Recap

Before (August 2025)

❌ Frozen for 3 months
❌ Monorepo complexity
❌ Broken CI/CD
❌ Maintaining three languages
❌ Happy as long as "it runs"
❌ Blind faith in existing code

After (December 2025)

✅ Back in motion
✅ Lean Rust-only repo
✅ Separate language-specific repos
✅ Focused on Japanese docs
✅ Obsessed with correctness
✅ Doubt everything

Quantitative Changes

| Metric | Before | After |
|---|---|---|
| Files | Many | -109 |
| Lines | Many | -27,770 |
| Tests | 436 (meaning unclear) | 88 (spec-driven) |
| README | 3 languages | 3 languages |
| Repositories | 1 (monorepo) | 3 (per language) |

Key Lesson

Reviving a project after three months of stagnation didn't require new features or fancy tech.

It required the courage to doubt and the courage to delete.

  • Doubt existing tests.
  • Doubt the docs.
  • Doubt AI status reports.
  • Doubt your own sense of "done."
  • Delete anything suspicious.

I used to wonder why smart people fall for the sunk-cost fallacy. Turns out I'm one of them.

You cling to tests, docs, CI/CD—because "throwing them away" feels wasteful. But keeping broken artifacts helps no one. I finally practiced what I preached and deleted them.

That's the heart of the reboot and the key to sustainability.


Bonus: pre-commit Is Mandatory for AI-Assisted Code

AI introduces a subtle issue.

VS Code's format-on-save never runs.

When humans code in VS Code, formatters and .editorconfig apply automatically. When AI writes files directly, none of that triggers.

The result:

  • Misaligned indentation
  • Missing trailing newlines
  • Random import ordering
  • Trivial lint warnings everywhere

CI fails over nonsense and wastes time.

Solution: pre-commit hooks

Set up pre-commit so formatting and linting run before every commit. Here's the actual config from diffx-python:

# .pre-commit-config.yaml (diffx-python)
repos:
  - repo: local
    hooks:
      - id: cargo-fmt
        name: cargo fmt
        entry: cargo fmt --
        language: system
        types: [rust]
        pass_filenames: false

      - id: cargo-clippy
        name: cargo clippy
        entry: cargo clippy -- -D warnings
        language: system
        types: [rust]
        pass_filenames: false

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.6
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml
      - id: check-added-large-files
# Setup
pip install pre-commit
pre-commit install

In diffx-js (Node.js) I use Husky instead:

// package.json (diffx-js)
{
  "scripts": {
    "prepare": "husky"
  },
  "devDependencies": {
    "husky": "^9.1.7"
  }
}

This way, AI-written code gets auto-formatted at commit time.


Repositories

diffx

Semantic diff tool for structured data (JSON/YAML/TOML/XML/INI/CSV). Ignores key ordering and whitespace, shows only meaningful changes. (Full overview at the top of this post.)


diffx-js


Node.js bindings for diffx - semantic diff for structured data (JSON, YAML, TOML, XML, INI, CSV). Powered by Rust via napi-rs for blazing fast performance.

Installation

npm install diffx-js

Supported Platforms

| Platform | Architecture |
|---|---|
| Linux | x64 (glibc) |
| Linux | x64 (musl/Alpine) |
| Linux | ARM64 |
| macOS | x64 (Intel) |
| macOS | ARM64 (Apple Silicon) |
| Windows | x64 |

Usage

Basic Diff

const { diff } = require('diffx-js');

const old = { name: "Alice", age: 30 };
const newObj = { name: "Alice", age: 31, city: "Tokyo" };

const results = diff(old, newObj);

for (const change of results) {
  console.log(`${change.diffType}: ${change.path}`);
  // Modified: age
  // Added: city
}

With Options

const results = diff(data1, data2, {

diffx-python


Python bindings for diffx - semantic diff for structured data (JSON, YAML, TOML, XML, INI, CSV). Powered by Rust via PyO3 for blazing fast performance.

Installation

pip install diffx-python

Supported Platforms

| Platform | Architecture |
|---|---|
| Linux | x64 (glibc) |
| Linux | x64 (musl/Alpine) |
| Linux | ARM64 |
| macOS | x64 (Intel) |
| macOS | ARM64 (Apple Silicon) |
| Windows | x64 |

Usage

Basic Diff

import diffx_python as diffx

old = {"name": "Alice", "age": 30}
new = {"name": "Alice", "age": 31, "city": "Tokyo"}

results = diffx.diff(old, new)

for change in results:
    print(f"{change['type']}: {change['path']}")
    # Modified: age
    # Added: city

With Options

results = diffx.diff(data1, data2,
    epsilon=0.001,                      # Tolerance for float comparison
    array_id_key='id',                  # Match

I haven't written formal release posts yet, but the initial versions of these projects work, and that's enough to prove the reboot process is repeatable—at least for my ecosystem.

diffai


Semantic diff tool for AI/ML models (PyTorch, Safetensors, NumPy, MATLAB). Provides tensor statistics, parameter comparisons, and automatic ML analysis.

Why diffai?

Traditional diff doesn't understand binary ML files:

$ diff model_v1.pt model_v2.pt
Binary files model_v1.pt and model_v2.pt differ

diffai shows meaningful analysis:

$ diffai model_v1.safetensors model_v2.safetensors
learning_rate_analysis: old=0.001, new=0.0015, change=+50.0%
gradient_analysis: flow_health=healthy, norm=0.021
~ fc1.weight: mean=-0.0002->-0.0001, std=0.0514->0.0716
~ fc2.weight: mean=-0.0008->-0.0018, std=0.0719->0.0883

Installation

# As CLI tool
cargo install diffai

# As library (Cargo.toml)
[dependencies]
diffai-core = "0.4"

Usage

# Basic
diffai model1.pt model2.pt

# JSON output for automation
diffai model1.safetensors model2.safetensors --output json

# With numerical tolerance
diffai weights1.npy weights2.npy --epsilon 0.001

Supported Formats

  • PyTorch (.pt, .pth) - Full ML analysis + tensor statistics
  • Safetensors (.safetensors) - Full ML analysis + tensor statistics
  • NumPy (.npy, .npz) - Tensor statistics
  • MATLAB (.mat) - Tensor statistics

Main Options

--format <

lawkit


Statistical law analysis toolkit. Analyze data for Benford's law, Pareto principle, Zipf's law, Normal and Poisson distributions. Detect anomalies and assess data quality.

Installation

cargo install lawkit

Supported Laws

Benford's Law (Fraud Detection)

$ lawkit benf financial_data.csv
Benford Law Analysis Results

Dataset: financial_data.csv
Numbers analyzed: 1000
[LOW] Dataset analysis

First Digit Distribution:
1: ███████████████┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  30.1% (expected:  30.1%)
2: █████████┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  17.6% (expected:  17.6%)
3: ██████┃░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  12.5% (expected:  12.5%)
...

Pareto Principle (80/20 Rule)

$ lawkit pareto sales.csv
Pareto Principle (80/20 Rule) Analysis Results

Dataset: sales.csv
Numbers analyzed: 500
[LOW] Dataset analysis

Lorenz Curve (Cumulative Distribution):
 20%: ███████████████████████████████████████┃░░░░░░░░░░  79.2% cumulative (80/20 point)
 40%: █████████████████████████████████████████████░░░░░  91.5% cumulative
...

80/20 Rule: Top 20% owns 79.2% of total wealth (Ideal: 80.0%, Ratio: 0.99)

Zipf's Law (Frequency Distribution)

$ lawkit zipf word_frequencies.csv
Zipf Law Analysis Results
Dataset: word_frequencies.csv
Numbers analyzed: 1000
[LOW] Dataset analysis

Rank-Frequency Distribution:
# 1: █████████████████████████████████████████████████┃  11.50% (expected: 11.50%)
# 2:

Top comments (22)

kako-jun

Thanks so much for reading and for the kind words! I'm glad the monorepo lessons and the Build→Verify→Publish workflow resonated—those were costly mistakes I don't want anyone else (especially folks new to programming like me) to repeat. I'll keep sharing what I learn so it stays useful for beginners too. Appreciate the encouragement, and I'll keep pushing forward!

Maame Afua A. P. Fordjour

Great write-up. The root cause analysis on the monorepo struggle was really insightful. I think a lot of us fall into the trap of trying to automate everything (shared CI, synchronized releases) before the core product is even stable. The advice on the 2-stage release workflow (Build, Verify, Publish) is gold; it saves so many headaches with accidental bad releases.

Christian Ledermann

pre-commit Is Mandatory for AI-Assisted Code

Absolutely. For ruff I normally have ["ALL"] rules and fixes (and some of the "experimental" ones) enabled, which is fine for new code bases where you can enforce it right from the start; for brownfield projects it is worth the effort to bring them up to a strict standard.

Then I add exemptions for specific directories (assert is not a concern in tests :-D), and strict type checking with mypy, even for the tests.

My rust linting looks pretty much like yours, with an additional framework specific tool.

For Markdown I use rumdl with the one line per sentence configuration.

BTW, try prek, a pre-commit-compatible alternative: no configuration changes needed, faster, and written in Rust.

kako-jun

prek! I didn’t know about it at all.
I like the culture where Rust-based CLIs take over existing tools purely through speed, so I’d love to try it out right away.
And rumdl is another tool no one in the Japanese community has mentioned yet.
I don’t get many chances to write Markdown except for documentation, but I’m definitely interested.

Christian Ledermann

I let the LLM write the documentation, continuously, in each iteration. It is not only aimed at humans, but also serves as a kind of long-term memory for the AI. rumdl keeps it neat and tidy.

Christian Ledermann

Apropos tests, I use property-based testing quite a lot; it can be a lifesaver. I have not tried it for Rust yet, but my experiences in Python are very good.

I wonder if mutation testing run by an agent in a loop until the suite is watertight would be an option. This could be an interesting experiment for python and rust, with strict guidance for the agent to test the interface, not the implementation.

Related: JustHTML is a fascinating example of vibe engineering in action

kako-jun

I had absolutely no idea JustHTML even existed.
Despite having over 700 stars, no one in the Japanese community has ever mentioned it.
This is one of those moments when I’m reminded how information in Japan tends to lag a bit behind the English-speaking world.
Thank you for pointing it out to me.

Christian Ledermann

I only discovered it a few days ago myself ;-)

Sergey Inozemtsev

Great write-up. The reboot story really resonated.

One extra safety net that helped me: a dev pipeline that publishes to TestPyPI and runs E2E tests against the installed package before anything reaches main.

kako-jun

Thanks for the kind words, and sorry for the late reply!

The TestPyPI approach is a great safety net. I've already published Python bindings for diffx, but I didn't use TestPyPI in the pipeline — I just relied on manual verification before publishing to PyPI. Your suggestion makes me want to add that extra layer of E2E testing on TestPyPI for future releases. It fits perfectly with the "verify before publish" philosophy I learned the hard way.

Appreciate you sharing what worked for you!

Sergey Inozemtsev

Glad it was useful. TestPyPI E2E caught issues I’d have missed otherwise.

Christian Ledermann

I am currently implementing an Arkanoid-style game in Rust and the Bevy framework using GitHub SpecKit. SpecKit enforces the use of tight specs and thorough planning in its workflow out of the box.

My experiences with it so far are very close to what you are describing here ;-)

I love your projects, semantic diffing is incredibly helpful ❤️ 💖💗🥰💞

kako-jun

Since I live on Japan time, I’m sorry I noticed your comment a bit late.
Yours was actually the very first piece of feedback saying that diffx is helpful.
It really made creating it worthwhile — thank you.

Christian Ledermann

Undoubtedly helpful, I used similar tools in the past. Right now I don't have a use case, but when you need a tool like this, you definitely NEED it.

Christian Ledermann (edited)

hachyderm.io/@cleder/1157403571590...

Sorry, the account does not have wide reach, but hopefully it gets noticed and boosted by someone who has ;-)

Christian Ledermann

Even with a spec doc, it forgets the spec and freestyles.

Some models more than others. If it asks you "would you also like me to implement ..." my answer is usually "No, stay on mission, no feature creep, create a GitHub issue / follow-up for that" (if it IS a good idea).

kako-jun

Exactly! I’ve lost count of how many times I’ve had to write “don’t do anything I didn’t ask for.”
I remember OpenAI once saying we shouldn’t send ChatGPT unnecessary ‘thank you’ messages because even that wastes electricity —
but honestly, with users around the world constantly having to remind it “stay on mission,” that must be burning far more power than the thank-yous ever did.


Christian Ledermann

Something related, I thought you might be interested in head/tail for structured data - summarize/preview JSON/YAML and source code

kako-jun

Thanks for sharing this! Head/tail for structured data sounds really interesting — it's a natural complement to what diffx does. Being able to quickly preview/summarize JSON/YAML before diffing would be super useful in many workflows.

I'll definitely check it out. Always great to discover tools in the same problem space!
