keeper

Posted on May 23

How Codex CLI helped me ship 3 releases in 48 hours — and what it got wrong

#python #3dprinting #ai #opensource

I maintain three open-source 3D printing tools: SupportSage (AI-optimized support structures), Printsight (photo-based quality inspection), and FilamentDB (filament parameter database).

Over two days, Codex CLI (gpt-5.5) helped me fix 15 bugs, add 86 new tests, and ship a unified CLI — across all three projects. Here's what worked, what broke, and what I'd do differently.

The three projects

The three tools form a closed loop:

FilamentDB ──→ SupportSage ──→ 3D Print ──→ Printsight
     ↑                                          │
     └─────────── Feedback loop ←───────────────┘

FilamentDB tells SupportSage optimal print settings for the chosen filament
SupportSage generates optimized support structures
Printsight inspects the result and feeds quality data back into the learning engine

Each was independently installable via pip. The goal was to unify them under one CLI and fix accumulated bugs.

Codex CLI: the workflow

I used Codex CLI v0.133.0 with gpt-5.5 via ChatGPT Plus. The pattern was:

Write a detailed prompt describing all bugs to fix
Run codex exec "$(cat prompt.txt)" from the project root
Review the diff
Add missing tests
Commit and push

This pattern worked well for the bug-fix tasks in all three projects. The prompts included exact file paths, line numbers, expected behavior, and verification commands.

What Codex got right

Printsight: 5 bugs, 16 tests

Before: tests ran subprocess calls on image files that didn't exist in the repo. All tests silently hung.

Codex fixed:

Cross-version OpenCV crash: cv2.fitLine() returns different types across OpenCV versions. The old code indexed into a float, which crashes on newer OpenCV
Division by zero: Warping detection could divide by zero when the direction vector was zero
Unchecked return value: cv2.imencode() return value was ignored, risking garbage data
Edge cases: All-black and all-white images crashed detectors
Better tests: Rewrote the entire test suite as 16 proper unit tests using synthetic numpy arrays (no external images needed)
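
The first two fixes can be sketched without OpenCV installed. This is a minimal illustration, not Printsight's actual code: `flatten_line_params` and `line_angle_degrees` are hypothetical names, and the sketch assumes the fit result holds four values (vx, vy, x0, y0) that may arrive either as plain scalars or as length-1 sequences, depending on the OpenCV version.

```python
import math
from typing import Optional, Sequence, Tuple


def flatten_line_params(raw: Sequence) -> Tuple[float, float, float, float]:
    """Coerce a fitLine-style result into four plain floats (hypothetical helper)."""
    flat = []
    for item in raw:
        try:
            # Length-1 sequence, e.g. one row of a 4x1 array: unpack it.
            flat.extend(float(v) for v in item)
        except TypeError:
            # Not iterable, so it is already a scalar.
            flat.append(float(item))
    if len(flat) != 4:
        raise ValueError(f"expected 4 line parameters, got {len(flat)}")
    vx, vy, x0, y0 = flat
    return vx, vy, x0, y0


def line_angle_degrees(raw: Sequence) -> Optional[float]:
    """Return the line direction in degrees, or None for a zero direction vector."""
    vx, vy, _, _ = flatten_line_params(raw)
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        # Guard against dividing by zero on degenerate input.
        return None
    return math.degrees(math.atan2(vy / norm, vx / norm))
```

Returning `None` for a degenerate direction is one possible choice; a detector could equally report "no warping detected" at that point.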

FilamentDB: 4 bugs, 42 tests

Before: zero tests.

Codex fixed:

Corrupted JSON crash: json.load() with no try/except — a corrupted database file would crash the CLI
Recursion risk: recommend() called itself recursively when falling back to brand defaults, risking unbounded recursion (a RecursionError in Python)
Empty query matched everything: search("") returned every entry because "" in string is always True
Missing tests: 42 tests covering search, recommend, compare, alternatives, list, empty database, and corrupted JSON
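
Two of these fixes are easy to show in isolation. The sketch below is an assumption-laden illustration rather than FilamentDB's real code: `load_entries` and `search` are hypothetical names, and it assumes the database is a JSON list of dicts with a `name` field.

```python
import json
from typing import Dict, List


def load_entries(path: str) -> List[Dict]:
    """Load the database, treating a missing or corrupted file as empty."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return []
    # Anything other than a list is treated as corrupted too.
    return data if isinstance(data, list) else []


def search(entries: List[Dict], query: str) -> List[Dict]:
    """Case-insensitive substring search that rejects empty queries."""
    needle = query.strip().lower()
    if not needle:
        # "" is a substring of every string, so bail out explicitly.
        return []
    return [e for e in entries if needle in str(e.get("name", "")).lower()]
```

Falling back to an empty list is only one option. Raising a dedicated error with a readable message would also keep the CLI from crashing with a raw traceback.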

SupportSage: 6 bugs, 84 tests

Before: 77 tests.

Codex fixed:

Unupdated parameter: borderline_angle was never updated in the learning loop, always returning the default
Too-slow learning: Angle updates were multiplied by a 0.1 damping factor, so even 20+ records barely changed the learned value
Missing validation: quality_score accepted negative values and values over 100
Partial file hash: Hash fallback only read the first 64KB of STL files, causing collisions
Fragile JSON loading: Printer/material profile loading crashed on corrupted JSON
Ignored quality score: Calibration updates ignored print quality entirely
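
The hashing and validation fixes can be sketched as follows. The names `file_sha256` and `validate_quality_score` are hypothetical, SHA-256 is an assumed choice of hash, and the 0 to 100 range is inferred from the bug description above.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash the whole file in chunks instead of only its first 64KB."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_quality_score(score: float) -> float:
    """Reject scores outside the assumed 0..100 range (NaN is rejected too)."""
    if not 0 <= score <= 100:
        raise ValueError(f"quality_score must be between 0 and 100, got {score!r}")
    return float(score)
```

Reading in chunks keeps memory use flat for large STL files while still covering every byte, so two models that share a header but differ later no longer collide.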

Everything above worked as expected: Codex found the bugs, wrote the fixes, and I approved them.

What Codex got wrong

The approval trap

The first codex exec attempt ran for 21 minutes with zero output. I assumed it was slow — turns out it was hung waiting for approval.

Codex's config had approval = OnRequest. When exec mode needed to write files, it blocked forever because there was no TTY to approve.

Fix: codex exec --sandbox workspace-write bypasses the approval prompt. I added a cxe alias so I never forget.
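
A hang like this is cheap to detect when the command runs under a timeout. Here is a minimal sketch of that idea in Python. `run_with_timeout` is a hypothetical helper, not part of Codex, and the commented invocation at the bottom only reuses the flags mentioned above.

```python
import subprocess
from typing import List, Optional


def run_with_timeout(argv: List[str], timeout_s: float) -> Optional[subprocess.CompletedProcess]:
    """Run a command, returning None if it does not finish within timeout_s seconds."""
    try:
        return subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None


# Hypothetical smoke test before sending a long prompt:
# result = run_with_timeout(
#     ["codex", "exec", "--sandbox", "workspace-write", "say hello"], 60
# )
# if result is None:
#     print("timed out: possibly blocked on an approval prompt")
```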

The hardcoded `--json` bug

The CLI unification code I wrote had a subtle bug: the inspect subcommand always passed --json to Printsight, even when the user didn't ask for it. This meant:

The wrapper's --json flag was meaningless — JSON output was always on
Default output was JSON, not human-readable

Codex caught this in its review and flagged it as CRITICAL. Two-line fix.
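
The shape of the fix, and of a test that would have caught it, looks roughly like this. It is a sketch under assumptions: the parser layout, the `printsight_argv` helper, and the `printsight` executable name are illustrative, not copied from the real wrapper.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical slice of the unified CLI: only the inspect subcommand."""
    parser = argparse.ArgumentParser(prog="supportsage")
    sub = parser.add_subparsers(dest="command", required=True)
    inspect_cmd = sub.add_parser("inspect")
    inspect_cmd.add_argument("image")
    inspect_cmd.add_argument("--json", action="store_true")
    return parser


def printsight_argv(args: argparse.Namespace) -> List[str]:
    """Build the delegated command line, forwarding --json only on request."""
    argv = ["printsight", args.image]
    if args.json:  # the buggy version appended "--json" unconditionally
        argv.append("--json")
    return argv
```

A delegation test can then assert on the exact argv instead of only checking that delegation happened:

```python
parser = build_parser()
assert printsight_argv(parser.parse_args(["inspect", "photo.jpg"])) == ["printsight", "photo.jpg"]
assert printsight_argv(parser.parse_args(["inspect", "photo.jpg", "--json"])) == ["printsight", "photo.jpg", "--json"]
```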

Python 3.10 → 3.11 compatibility

I used Python 3.11's tomllib for TOML config parsing. Codex correctly noted that pyproject.toml still claimed >=3.10 support. Fix: add a tomli fallback import for older Python versions.
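
The fallback import is a common pattern and usually looks like the sketch below. `parse_config` is a hypothetical wrapper added for illustration; `tomli` is the third-party backport that exposes the same `loads` API as `tomllib`.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport for Python 3.10, same API


def parse_config(text: str) -> dict:
    """Parse TOML text into a plain dict, whichever module was imported."""
    return tomllib.loads(text)
```

For this to work on Python 3.10, the backport also has to be declared as a conditional dependency, for example with an environment marker such as `tomli; python_version < "3.11"` in pyproject.toml.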

The stdin piped-prompt confusion

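Passing a multi-line prompt as codex exec "$(cat prompt.txt)" worked unreliably for me. The expansion is only safe while the double quotes stay intact: once the command is wrapped in an alias, a script, or another layer of quoting, special characters in the prompt can be re-parsed by the shell, and the prompt arrives mangled without any error.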

The robust pattern: cat prompt.txt | codex exec --sandbox workspace-write "" — pass the prompt via stdin instead of as a command argument.
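
When the call is scripted, the same idea can be expressed in Python, which sidesteps shell quoting altogether because no shell is involved. This is a sketch: `run_with_prompt_on_stdin` is a hypothetical helper, and the commented argv simply mirrors the command above.

```python
import subprocess
from pathlib import Path
from typing import List


def run_with_prompt_on_stdin(argv: List[str], prompt_path: str) -> subprocess.CompletedProcess:
    """Feed the prompt file to the child's stdin, byte for byte, without a shell."""
    prompt = Path(prompt_path).read_text(encoding="utf-8")
    return subprocess.run(
        argv,          # a list, so nothing is re-parsed by a shell
        input=prompt,  # delivered on stdin
        text=True,
        capture_output=True,
        check=False,
    )


# Hypothetical usage, mirroring the shell pipeline:
# run_with_prompt_on_stdin(
#     ["codex", "exec", "--sandbox", "workspace-write", ""], "prompt.txt"
# )
```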

Results

After the Phase 1 unified CLI work and three rounds of Codex review:

| Project | Tests before | Tests after | Bugs fixed | New features |
| --- | --- | --- | --- | --- |
| SupportSage | 77 | 131 | 6 | CLI unification, shared config |
| Printsight | 0 (broken) | 16 | 5 | none |
| FilamentDB | 0 | 42 | 4 | none |
| Total | 77 | 189 | 15 | CLI, config, roadmap |

New CLI commands:

supportsage inspect photo.jpg      # → delegates to Printsight
supportsage filament search PLA    # → delegates to FilamentDB
supportsage filament recommend -b "Bambu Lab" -m "PLA Basic"
supportsage filament compare "eSun PLA+" "Overture PETG"

Shared config at ~/.supportsage/config.toml:

[printsight]
annotate_default = true  # auto-annotate inspection photos

[filamentdb]
data_dir = "~/.supportsage/data"

What I'd do differently

Test Codex exec with a simple prompt first — before throwing a 3,700-character prompt at it. A two-word test would have revealed the approval hang in seconds.
Pipe prompts via stdin — cat prompt.txt | codex exec ... "" avoids shell escaping issues entirely.
Write CLI delegation tests that verify the delegated argv — my first tests only checked that delegation happened, not what arguments were passed. Codex's review caught this.
Don't assume Codex can write everything: for Phase 1, I ended up writing the code myself after Codex hung. The bug-fix tasks worked; the feature-implementation task didn't.

Releases

All three projects got patch releases with the fixes, plus SupportSage v0.7.0 with the unified CLI.

Full roadmap: docs/ROADMAP.md
