DEV Community

keeper

How Codex CLI helped me ship 3 releases in 48 hours — and what it got wrong

I maintain three open-source 3D printing tools: SupportSage (AI-optimized support structures), Printsight (photo-based quality inspection), and FilamentDB (filament parameter database).

Over two days, Codex CLI (gpt-5.5) helped me fix 15 bugs, add 86 new tests, and ship a unified CLI — across all three projects. Here's what worked, what broke, and what I'd do differently.

The three projects

The three tools form a closed loop:

```
FilamentDB ──→ SupportSage ──→ 3D Print ──→ Printsight
     ↑                                          │
     └─────────── Feedback loop ←───────────────┘
```
  • FilamentDB supplies SupportSage with optimal print settings for the chosen filament
  • SupportSage generates optimized support structures
  • Printsight inspects the result and feeds quality data back into the learning engine

Each was independently installable via pip. The goal was to unify them under one CLI and fix accumulated bugs.

Codex CLI: the workflow

I used Codex CLI v0.133.0 with gpt-5.5 via ChatGPT Plus. The pattern was:

  1. Write a detailed prompt describing all bugs to fix
  2. Run codex exec "$(cat prompt.txt)" from the project root
  3. Review the diff
  4. Add missing tests
  5. Commit and push

This pattern worked well for the bug-fix tasks in all three projects. Each prompt included exact file paths, line numbers, expected behavior, and verification commands.
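To make the prompt structure concrete, here is a minimal sketch that assembles such a prompt from a list of bug reports. The `BugReport` fields and the rendered layout are hypothetical. They mirror the four ingredients listed above and are not a format Codex requires.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BugReport:
    """One bug to fix. Hypothetical structure, not a format Codex requires."""

    path: str      # file path relative to the project root
    line: int      # line number where the bug is
    problem: str   # what currently goes wrong
    expected: str  # the behavior wanted after the fix
    verify: str    # command that should pass once it is fixed


def render_prompt(project: str, bugs: list[BugReport]) -> str:
    """Render bug reports into one plain-text prompt, one numbered entry per bug."""
    lines = [f"Fix the bugs listed below in {project}.", ""]
    for i, bug in enumerate(bugs, start=1):
        lines += [
            f"{i}. {bug.path}:{bug.line}",
            f"   Problem: {bug.problem}",
            f"   Expected: {bug.expected}",
            f"   Verify with: {bug.verify}",
            "",
        ]
    return "\n".join(lines)
```

Writing the result to disk, for example with `Path("prompt.txt").write_text(render_prompt(...))`, produces the file that step 2 consumes.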

What Codex got right

Printsight: 5 bugs, 16 tests

Before: tests ran subprocess calls on image files that didn't exist in the repo. All tests silently hung.

Codex fixed:

  • Cross-version OpenCV crash: cv2.fitLine() returns different types across OpenCV versions. The old code indexed into a float, which crashes on newer OpenCV.
  • Division by zero: Warping detection could divide by zero when the direction vector was zero
  • Unchecked return value: The success flag returned by cv2.imencode() was ignored, so a failed encode could pass garbage data downstream
  • Edge cases: All-black and all-white images crashed detectors
  • Better tests: Rewrote the entire test suite as 16 proper unit tests using synthetic numpy arrays (no external images needed)
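The two OpenCV fixes follow a common defensive pattern. The sketch below is an assumption about what such code could look like, not Printsight's actual implementation. It flattens whatever cv2.fitLine() returns (a (4, 1) array or a flat sequence of four values) before unpacking it, and it refuses to normalize a zero-length direction vector. `flatten_line` and `tilt_degrees` are hypothetical names, and the code is plain Python so it accepts NumPy arrays and nested lists alike.

```python
from __future__ import annotations

import math


def flatten_line(line) -> list[float]:
    """Flatten cv2.fitLine() output into plain floats [vx, vy, x0, y0].

    Accepts a (4, 1) array, a flat (4,) array, or nested lists, so the caller
    never tries to index into a scalar.
    """
    try:
        items = list(line)
    except TypeError:  # scalars (Python or NumPy) are not iterable
        return [float(line)]
    flat: list[float] = []
    for item in items:
        flat.extend(flatten_line(item))
    return flat


def tilt_degrees(line, eps: float = 1e-9) -> float | None:
    """Angle between the fitted line and the horizontal axis, in [0, 90] degrees.

    Returns None when the direction vector is (near) zero, instead of dividing by it.
    """
    values = flatten_line(line)
    if len(values) != 4:
        raise ValueError(f"expected 4 values from fitLine, got {len(values)}")
    vx, vy, _x0, _y0 = values
    norm = math.hypot(vx, vy)
    if norm < eps:
        return None  # degenerate fit: no direction, so no tilt to report
    ux, uy = vx / norm, vy / norm  # unit direction vector; safe because norm > 0
    return math.degrees(math.atan2(abs(uy), abs(ux)))
```

In real code the input would come from a call such as `cv2.fitLine(points, cv2.DIST_L2, 0, 0.01, 0.01)`. Because the unit tests only need small lists or synthetic arrays, this style also fits the image-free test suite described above.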

FilamentDB: 4 bugs, 42 tests

Before: zero tests.

Codex fixed:

  • Corrupted JSON crash: json.load() with no try/except — a corrupted database file would crash the CLI
  • Recursion risk: recommend() called itself recursively when falling back to brand defaults, risking stack overflow
  • Empty query matched everything: search("") returned every entry because "" in string is always True
  • Missing tests: 42 tests covering search, recommend, compare, alternatives, list, empty database, and corrupted JSON
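The three FilamentDB fixes can be sketched as follows. This sketch assumes the database is a JSON list of entry dicts with `brand`, `name`, and `material` keys plus an optional `is_default` flag. Those keys, the function names, and the choice to warn and fall back to an empty database on corrupted JSON are all hypothetical. They are not FilamentDB's actual schema or behavior.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path

SEARCH_FIELDS = ("brand", "name", "material")  # hypothetical entry keys


def parse_entries(text: str, source: str = "<string>") -> list[dict]:
    """Parse the database JSON, warning and returning [] if it is corrupted."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        print(f"warning: ignoring corrupted database {source}: {exc}", file=sys.stderr)
        return []
    return data if isinstance(data, list) else []


def load_entries(path) -> list[dict]:
    """Read the database file; a missing file counts as an empty database."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return []
    return parse_entries(text, source=str(path))


def search(entries: list[dict], query: str) -> list[dict]:
    """Case-insensitive substring search over brand, name, and material."""
    q = query.strip().lower()
    if not q:
        # "" is a substring of every string, so an empty query would match everything.
        return []
    return [
        e for e in entries
        if any(q in str(e.get(field, "")).lower() for field in SEARCH_FIELDS)
    ]


def recommend(entries: list[dict], brand: str, material: str) -> dict | None:
    """Exact brand+material match first, then the brand's default entry.

    A fixed, ordered list of lookups replaces the recursive fallback, so the
    function cannot recurse at all.
    """
    lookups = (
        lambda e: e.get("brand") == brand and e.get("material") == material,
        lambda e: e.get("brand") == brand and e.get("is_default", False),
    )
    for matches in lookups:
        for entry in entries:
            if matches(entry):
                return entry
    return None
```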

SupportSage: 6 bugs, 84 tests

Before: 77 tests.

Codex fixed:

  • Stale parameter: borderline_angle was never updated in the learning loop, so it always returned the default
  • Too-slow learning: Angle updates were multiplied by a 0.1 damping factor, so even 20+ records barely moved the needle
  • Missing validation: quality_score accepted negative values and values over 100
  • Partial file hash: The hash fallback read only the first 64 KB of each STL file, so different files sharing the same first 64 KB collided
  • Fragile JSON loading: Printer/material profile loading crashed on corrupted JSON
  • Ignored quality score: Calibration updates ignored print quality entirely
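Here is a sketch of the SupportSage fixes under stated assumptions. The helper names are hypothetical, and the quality-weighted averaging in `update_angle` is one reasonable replacement for a fixed 0.1 damping factor. It is not necessarily the rule SupportSage now uses. The sketch validates quality_score, hashes the entire STL file in chunks, and lets print quality decide how much each record counts.

```python
from __future__ import annotations

import hashlib
from collections.abc import Iterable


def validate_quality(score: float) -> float:
    """Reject quality scores outside the 0-100 range (NaN is rejected too)."""
    if not 0 <= score <= 100:
        raise ValueError(f"quality_score must be between 0 and 100, got {score!r}")
    return float(score)


def file_sha256(path, chunk_size: int = 1 << 20) -> str:
    """Hash the entire file in fixed-size chunks instead of only its first 64 KB."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def update_angle(
    current: float,
    records: Iterable[tuple[float, float]],
    prior_weight: float = 5.0,
) -> float:
    """Quality-weighted update of a learned angle such as borderline_angle.

    Each (angle_deg, quality_score) record pulls the estimate toward its angle
    with weight quality_score / 100, so a score of 0 has no effect. prior_weight
    (hypothetical) is how many perfect-quality records the current value counts as.
    """
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    total = prior_weight * current
    weight = prior_weight
    for angle, quality in records:
        w = validate_quality(quality) / 100.0
        total += w * angle
        weight += w
    return total / weight
```

With the default prior_weight of 5, twenty perfect-quality records at 55° move a 45° estimate to 53°, while records with a quality score of 0 leave it unchanged.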

Everything above worked as expected: Codex found the bugs, wrote the fixes, and I approved them.

What Codex got wrong

The approval trap

The first codex exec attempt ran for 21 minutes with zero output. I assumed it was just slow; it turned out to be hung waiting for approval.

Codex's config had approval = OnRequest. When exec mode needed to write files, it blocked forever because there was no TTY to approve.

Fix: codex exec --sandbox workspace-write bypasses the approval prompt. I added a cxe alias so I never forget.

The hardcoded --json bug

The CLI unification code I wrote had a subtle bug: the inspect subcommand always passed --json to Printsight, even when the user didn't ask for it. This meant:

  • The wrapper's --json flag was meaningless — JSON output was always on
  • Default output was JSON, not human-readable

Codex caught this in its review and flagged it as CRITICAL. Two-line fix.
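Here is a small sketch of the corrected delegation. It assumes the wrapper shells out to a `printsight` executable invoked as `printsight <photo>` and that Printsight accepts the `--json` flag from the bug above. Both the command shape and `build_inspect_argv` are hypothetical. Putting argv construction in a pure function also makes it easy to write tests that assert on the delegated arguments, not just on the fact that delegation happened.

```python
from __future__ import annotations

import subprocess


def build_inspect_argv(photo: str, json_output: bool = False) -> list[str]:
    """Build the (assumed) Printsight command line for `supportsage inspect`.

    --json is forwarded only when the user asked for it; the buggy version
    always appended it, which made JSON the default output.
    """
    argv = ["printsight", photo]
    if json_output:
        argv.append("--json")
    return argv


def run_inspect(photo: str, json_output: bool = False) -> int:
    """Delegate to Printsight and return its exit code."""
    return subprocess.run(build_inspect_argv(photo, json_output), check=False).returncode


def test_inspect_forwards_json_only_when_requested() -> None:
    # Assert on the exact delegated argv, not merely that delegation happened.
    assert build_inspect_argv("photo.jpg") == ["printsight", "photo.jpg"]
    assert build_inspect_argv("photo.jpg", json_output=True) == [
        "printsight", "photo.jpg", "--json",
    ]
```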

Python 3.10 → 3.11 compatibility

I used Python 3.11's tomllib for TOML config parsing. Codex correctly noted that pyproject.toml still claimed >=3.10 support. Fix: add a tomli fallback import for older Python versions.
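The fallback import is a common idiom. This minimal sketch assumes the tomli backport is installed on Python 3.10; it exposes the same `load()`/`loads()` API as tomllib. To guarantee the backport is installed, pyproject.toml also needs a conditional dependency such as `"tomli; python_version < '3.11'"`. The `read_toml` helper is illustrative.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # Python 3.10: fall back to the tomli backport
    import tomli as tomllib


def read_toml(path) -> dict:
    """Parse a TOML file. tomllib.load() requires a binary file handle."""
    with open(path, "rb") as f:
        return tomllib.load(f)
```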

The stdin piped-prompt confusion

Passing a multi-line prompt as codex exec "$(cat prompt.txt)" was unreliable for me. The prompt is easy to mangle through shell quoting (drop the double quotes and the shell word-splits and globs it), and when that happens the run can fail without a clear error.

The robust pattern: cat prompt.txt | codex exec --sandbox workspace-write "" — pass the prompt via stdin instead of as a command argument.
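Here is a sketch of the same stdin pattern driven from Python. It also adds a timeout, so a run blocked on an approval prompt fails after a bounded wait instead of sitting silent for 21 minutes. The argv, including the trailing empty-string argument, is copied from the command above. `run_codex`, the 30-minute default, and the 124 exit code (borrowed from coreutils `timeout`) are assumptions for illustration.

```python
from __future__ import annotations

import subprocess
import sys

# Mirrors `cat prompt.txt | codex exec --sandbox workspace-write ""` from the article.
CODEX_ARGV = ["codex", "exec", "--sandbox", "workspace-write", ""]
TIMEOUT_EXIT_CODE = 124  # assumption: same code coreutils `timeout` uses


def run_codex(prompt: str, timeout_s: float = 30 * 60, argv: list[str] | None = None) -> int:
    """Feed `prompt` to codex exec on stdin and return its exit code.

    No shell is involved, so nothing in the prompt is expanded or split.
    If the process has not finished after timeout_s seconds, it is killed
    and a hint is printed instead of waiting indefinitely.
    """
    cmd = CODEX_ARGV if argv is None else argv
    try:
        result = subprocess.run(cmd, input=prompt, text=True, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        print(
            f"{cmd[0]} produced no result within {timeout_s:g}s; "
            "it may be blocked on an approval prompt it cannot show",
            file=sys.stderr,
        )
        return TIMEOUT_EXIT_CODE
    return result.returncode
```

Because stdout is not captured, Codex's output still streams to the terminal. A call like `run_codex("Reply with OK", timeout_s=60)` makes a cheap smoke test before sending a long prompt.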

Results

After the Phase 1 unified CLI work and three rounds of Codex review:

| Project | Tests before | Tests after | Bugs fixed | New features |
|---|---|---|---|---|
| SupportSage | 77 | 131 | 6 | CLI unification, shared config |
| Printsight | 0 (broken) | 16 | 5 | |
| FilamentDB | 0 | 42 | 4 | |
| Total | 77 | 189 | 15 | CLI, config, roadmap |

New CLI commands:

```
supportsage inspect photo.jpg      # → delegates to Printsight
supportsage filament search PLA    # → delegates to FilamentDB
supportsage filament recommend -b "Bambu Lab" -m "PLA Basic"
supportsage filament compare "eSun PLA+" "Overture PETG"
```

Shared config at ~/.supportsage/config.toml:

```toml
[printsight]
annotate_default = true  # auto-annotate inspection photos

[filamentdb]
data_dir = "~/.supportsage/data"
```
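Here is a sketch of how the wrapper might merge this file with built-in defaults. It assumes the parsed TOML arrives as a dict, for example from `tomllib.load()`. The default values and the `resolve_config` name are hypothetical. The detail worth copying is expanding the `~` in `data_dir`: TOML strings are taken literally, and Python's file APIs do not expand `~` on their own.

```python
from __future__ import annotations

import copy
from pathlib import Path

# Hypothetical defaults used when a key is missing from ~/.supportsage/config.toml.
DEFAULTS: dict[str, dict] = {
    "printsight": {"annotate_default": False},
    "filamentdb": {"data_dir": "~/.supportsage/data"},
}


def resolve_config(raw: dict) -> dict:
    """Overlay the user's tables on the defaults, one table at a time."""
    config = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    for section, values in raw.items():
        if isinstance(values, dict):  # top-level non-table keys are ignored here
            config.setdefault(section, {}).update(values)
    # TOML strings are literal, so "~" must be expanded before the path is used.
    config["filamentdb"]["data_dir"] = Path(config["filamentdb"]["data_dir"]).expanduser()
    return config
```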

What I'd do differently

  1. Test Codex exec with a simple prompt first: do this before throwing a 3,700-character prompt at it. A two-word test would have revealed the approval hang in seconds.

  2. Pipe prompts via stdin: cat prompt.txt | codex exec ... "" avoids shell escaping issues entirely.

  3. Write CLI delegation tests that verify the delegated argv: my first tests only checked that delegation happened, not which arguments were passed. Codex's review caught this.

  4. Don't assume Codex can write everything: for Phase 1, I ended up writing the code myself after Codex hung. The bug-fix tasks worked; the feature-implementation task didn't.

Releases

All three projects got patch releases with the fixes, plus SupportSage v0.7.0 with the unified CLI.

Full roadmap: docs/ROADMAP.md
