Putting Codex To Task

#programming #ai #vibecoding

I’ve been putting OpenAI’s Codex coding assistant to task. Literally.

Codex and I put together a task management system for the open source application DepanFx. The launch customer for these features was asynchronous file loading, a point of notable lag. The new capabilities avoid locking the UX loop, and provide an acceptable UX for notification and tracking of task execution. Before we were done, asynchronous file loading was also added to the session startup execution path.

With Codex’s help, I was able to add asynchronous file loading, with robust task management support, in roughly 4 days. All in, including a few more days of expanded functionality and other cleanup, Codex and I were able to bring up almost four thousand lines of new task management software.

This development felt roughly twice as quick as a fully manual implementation. Codex’s production of routine boilerplate software was a boon, with over 3,000 lines of working code in one day. It correctly handled many of the known nuances in multi-threaded code (e.g. InterruptedException).
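As an illustration of the InterruptedException nuance mentioned above, here is a minimal sketch of the usual pattern: catch the exception, restore the thread's interrupt flag, and stop cleanly. The `InterruptDemo` class and its `drain` method are hypothetical names for this example, not code from DepanFx or from Codex.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical worker loop for illustration; not DepanFx code. */
final class InterruptDemo {

  /** Takes items from the queue until interrupted; returns how many were taken. */
  static int drain(BlockingQueue<String> queue) {
    int handled = 0;
    try {
      while (true) {
        queue.take(); // blocks when empty; throws InterruptedException on interrupt
        handled++;
      }
    } catch (InterruptedException e) {
      // Restore the flag so code further up the stack can still see the interrupt.
      Thread.currentThread().interrupt();
    }
    return handled;
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    queue.add("a");
    queue.add("b");

    AtomicInteger handled = new AtomicInteger();
    Thread worker = new Thread(() -> handled.set(drain(queue)));
    worker.start();

    // Wait until the worker has taken both items, then ask it to stop.
    while (!queue.isEmpty()) {
      Thread.sleep(10);
    }
    worker.interrupt();
    worker.join();
    System.out.println("items handled before interrupt: " + handled.get());
  }
}
```

The commonly forgotten step is the call to `Thread.currentThread().interrupt()` inside the catch block. Without it, the interrupt is silently swallowed and callers cannot tell that a shutdown was requested.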

Codex also required close supervision. Like a freshman intern, Codex is eager to make things work, with a tendency toward over-engineering and poor encapsulation. Wisdom provides guidance on whether these faults can be reworked or whether the goals should be revised. Wisdom is not Codex’s strength.

Working With Codex

Sitting down at the start of a major project to write down the goals and a plan for that effort is standard practice for good software engineering. Working on my own, I can shrug off the spec and let the implementation and the context drive the result. This is a poor strategy for AI.

Even more than humans, AI systems need a specification in order to be successful. As others advise, getting any AI system to generate the desired results requires clarity and detail in the prompt. Despite that benefit, writing the specification prompt does change the pace of discovery during software development.

After submitting the prompt and waiting out the response, I often find it hard to accept Codex’s proposals directly. I tend to integrate the Codex changes into a development branch that I control. If a set of changes is mostly sound, my process is roughly these steps.

Have Codex push the changes to git as a PR.
Fetch the PR’s branch via git.
Merge the Codex branch into the development branch.
Edit and refine until everything is fine.

If the proposals are more like “interesting” ideas, it is often easiest to cut and paste the good parts into the evolving branch. It tends to be the fastest way to get the good parts into production.

A Journey of a Thousand Steps

Getting Codex to produce acceptable code was quite a journey.

I took about an hour, including some ChatGPT questions, to assemble the design considerations into a two-page document. One of the key elements in the prompt was the specification that implementation should proceed incrementally.

The initial Codex PR provided a reasonably sound Java module for handling tasks. The implementation was well done, if overly complex. Tasks would get assigned a UUID, but there was no need. The task executor was implemented as a large file with multiple member classes. Pulling this apart made the structure much more evident. Despite the rework, it was still just a few hours for over 800 lines of solid task management capabilities.

Next up was a module for user interactions with these tasks. The incremental process led to one set of changes for the new module, another for a generic data model, then one for a popup window, and a final one for a more detailed panel. Codex handled the tedious details of binding UI views and data sources, providing a rich user interface for control of asynchronous tasks.

Overall, Codex helped me knock out over 3,000 lines of code in one day. This was green-field code. Codex was a great boon for generating the hundreds of lines of routine definitions that are common in user interface components.

The challenge was that the new code had zero interactions with the existing system.

Adding Asynchronous Tasks

Moving any application from a single threaded model to a multi-threaded model is a challenging endeavor.

Codex failed on this with its initial try. There was duplicated code splattered everywhere. It was confused by separate open and load paths. Some experimentation clarified that resource loading and resource display were too closely entwined. This coupling left no room to make the resource loading part of an asynchronous task.

The lack of separation between these operations had been a known but tolerable code smell. With the advent of asynchronous loading, restructuring to clear that smell became essential.
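To make the separation concrete, here is a minimal sketch of loading and display as two distinct steps, so that only the loading step runs off the UI thread. Everything in it is an assumption for illustration: the `LoadThenDisplay` class, the `openAsync` method, and its parameters are hypothetical names, not the DepanFx restructuring itself.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch: loading and display as separate, composable steps. */
final class LoadThenDisplay {

  /**
   * Runs the load step on the worker executor, then passes its result to the
   * display step on the UI executor. Neither step knows about the other.
   */
  static <T> CompletableFuture<Void> openAsync(
      Supplier<T> load, Consumer<? super T> display, Executor worker, Executor ui) {
    return CompletableFuture.supplyAsync(load, worker).thenAcceptAsync(display, ui);
  }

  public static void main(String[] args) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    // Assumption: in a JavaFX application the UI executor would typically be
    // Platform::runLater. Here a direct executor stands in for it.
    Executor ui = Runnable::run;

    openAsync(
            () -> "loaded resource",                      // slow work, off the UI thread
            text -> System.out.println("display: " + text), // quick work, on the UI executor
            worker,
            ui)
        .join();
    worker.shutdown();
  }
}
```

When loading and display live in one method, there is no seam at which to hand the slow part to another thread. Once they are separate, the asynchronous version is mostly a matter of choosing which executor runs each step.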

However, this was not a task for AI. Finding a solution required experimentation with the existing data structures. At its core, the restructuring involved a subtle rearrangement of responsibilities, with some widespread changes to the declaration of built-in resources.

I suspect that Codex would have done fine if I’d told it which restructuring to perform, but identifying that restructuring took experimentation and discovery. I have little confidence that it would have found a clean refactoring on its own. In the end, smart code completion from my IDE was the biggest aid to completing this restructuring work.

Away We Go

With the code restructured to enable asynchronous loading (and a more specific prompt), Codex proposed a clean software structure for loading files asynchronously.

Each module that interacts with the task monitor module introduces a task package that encapsulates asynchronous task management.
The task package contains a ModuleServer class as a component. This class handles all interactions between the module's components and the task monitor module.
The task package contains process related Task classes. These implement the task control API defined by the task monitor module.
Each unique interaction with asynchronous tasks is a call to a service method in the ModuleServer class. This method provides the shared logic for creating and managing an asynchronous task.
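
The structure described above can be sketched roughly as follows. Only the ModuleServer name comes from the description. The `MonitoredTask` interface, the `TaskMonitor` class, the `LoadFileTask` class, and every method name are hypothetical stand-ins, since the actual task control API is not shown here.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of a module's task package; not the DepanFx source. */
final class TaskPackageSketch {

  /** Stand-in for the task control API defined by the task monitor module. */
  interface MonitoredTask<T> {
    String label();

    T run();
  }

  /** Stand-in for the task monitor module: records each task and runs it. */
  static final class TaskMonitor {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final List<String> labels = new CopyOnWriteArrayList<>();

    <T> CompletableFuture<T> submit(MonitoredTask<T> task) {
      labels.add(task.label());
      return CompletableFuture.supplyAsync(task::run, pool);
    }

    List<String> labels() {
      return List.copyOf(labels);
    }

    void shutdown() {
      pool.shutdown();
    }
  }

  /** A process-related task class that implements the task control API. */
  static final class LoadFileTask implements MonitoredTask<String> {
    private final Path path;

    LoadFileTask(Path path) {
      this.path = path;
    }

    @Override
    public String label() {
      return "Load " + path.getFileName();
    }

    @Override
    public String run() {
      return "contents of " + path; // placeholder for the real file loading
    }
  }

  /** The module's single point of contact with the task monitor. */
  static final class ModuleServer {
    private final TaskMonitor monitor;

    ModuleServer(TaskMonitor monitor) {
      this.monitor = monitor;
    }

    /** Service method: one call per kind of asynchronous interaction. */
    CompletableFuture<String> loadFile(Path path) {
      return monitor.submit(new LoadFileTask(path));
    }
  }

  public static void main(String[] args) {
    TaskMonitor monitor = new TaskMonitor();
    ModuleServer server = new ModuleServer(monitor);
    System.out.println(server.loadFile(Path.of("graph.txt")).join());
    System.out.println(monitor.labels());
    monitor.shutdown();
  }
}
```

In this sketch the rest of the module only ever calls `ModuleServer`. Adding another asynchronous operation would mean one new task class and one new service method, with no other component touching the monitor directly.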

This architecture worked well, and has good scalability aspects. It was straightforward to extend this model to session startup.

Coasting to Completion

With asynchronous file loading a success, the original goal had been achieved. DepanFx was able to open large files without stalling the UX, and it automatically displayed a task monitor window when an asynchronous task was active.

The final changes before declaring success were quite limited. There were a few tweaks and adjustments in the layout of items for task rendering. It was a good chance to put on some final finishes.

The asynchronous loading of content files was extended to the session files. This is a boon to the application’s apparent startup time. The pattern of a ModuleService class was easy to extend into the new use case.

The popup for a new active asynchronous task was distracting, especially for very short tasks. The UX was much improved with a short delay before rendering the monitor dialog.
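
One way to get that behavior is to schedule the popup rather than show it immediately, and to cancel the pending popup if the task finishes first. The sketch below assumes this approach. The `DelayedMonitorPopup` class and its methods are hypothetical, and the real implementation may differ.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: show a monitor popup only for tasks that run long. */
final class DelayedMonitorPopup {

  /**
   * Runs the task on the calling thread. The popup action fires on a timer
   * thread only if the task is still running after delayMillis.
   */
  static void runWithDelayedPopup(Runnable task, Runnable showPopup, long delayMillis) {
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    try {
      ScheduledFuture<?> pending =
          timer.schedule(showPopup, delayMillis, TimeUnit.MILLISECONDS);
      task.run();
      pending.cancel(false); // no effect if the popup has already been shown
    } finally {
      timer.shutdown();
    }
  }

  /** Sleeps without leaking the checked exception; restores the interrupt flag. */
  static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    // A short task finishes well before the delay, so no popup is expected.
    runWithDelayedPopup(() -> {}, () -> System.out.println("monitor shown"), 5_000);
    // A long task outlasts the delay, so the popup is expected.
    runWithDelayedPopup(
        () -> sleepQuietly(500), () -> System.out.println("monitor shown"), 50);
  }
}
```

In a real UI the popup action would also need to hop onto the UI thread, and the task would run on a worker thread rather than on the caller. Both are left out here to keep the timing logic visible.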

Most of this was cut and paste or small manual tweaks without Codex assistance.

Conclusion

TLDR: Codex was a real benefit to implementing a task manager for DepanFx.

Strengths: On green-field development, Codex does well. It often needed minor rework to better align with project naming conventions (e.g. DefaultExecutor versus SimpleExecutor). It correctly handled some parts of tricky multi-tasking code, such as including the oft-forgotten handlers for InterruptedException.

Challenges: Codex is happy to blunder forward when a step backward to refactor is more appropriate. The addition of multi-tasking into a single threaded application emphasized the need for the cleanup of existing code.

So in the end, it was what we all know.

AI code generation can be a productivity aid, especially with routine or well understood behaviors.
It has tendencies toward overly complex solutions.
It has tendencies toward poor encapsulation.
With poorly organized code, you’re likely to get unsavory suggestions.

Regardless, journey forth, and trust but verify.