Xu Bian

Posted on May 6 • Originally published at marlinbian-site.pages.dev

How Codex Can Drive Verifiable SketchUp Modeling


This is not a step-by-step tutorial.

Until the demo path is fully verified, I do not want to package it as something readers can simply follow and reproduce. A more accurate label is an architecture walkthrough: this piece explains why Codex or another agent CLI should not "magically control SketchUp" directly, and why it should instead use verifiable intermediate layers to turn natural-language intent into checkable, executable, repairable project state.

Take SketchUp Agent Harness as the example. In it, Claude and Codex are only entry points. The important structure sits in the middle:

agent CLI
-> runtime skills
-> MCP server
-> structured design model
-> SketchUp Ruby bridge
-> SketchUp scene

This chain determines whether the agent is generating a one-off result or maintaining a design project.

Why Codex Should Not Directly Operate SketchUp

The most obvious idea is to let Codex understand the user's request and directly call SketchUp.

That idea is tempting, but it creates several problems.

First, software operations can become a black box. What Codex did, why it did it, which objects were created, and where dimensions came from become hard to trace without structured records.

Second, design state gets scattered. Some of it sits in chat, some in the SketchUp scene, some in temporary scripts, and some in the user's head. When the project continues later, the agent may not know the real current state.

Third, mistakes are hard to repair. If the result looks wrong, the agent may patch the current scene instead of going back to source evidence and the structured model to fix the root cause.

Codex should not simply become SketchUp's hand. It should be an agent entry point that understands intent, calls tools, checks results, and drives repair.

The stable execution capability should live in the tool layer and the project state layer.

Layer 1: Agent CLI Understands Intent

The user begins with natural-language intent.

For example:

"Create a 2 m by 1.8 m bathroom with a toilet, vanity, door, mirror, basic lighting, and a clearance check."

Codex's value is not to turn that sentence directly into a one-off script. Its value is to understand the design actions inside the request:

create a space
set dimensions and units
place components
apply basic rules
check clearances
generate an execution plan
push the result into SketchUp
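To make that decomposition concrete, here is a minimal sketch that represents the example request as an ordered list of inspectable actions. The `DesignAction` record, the action names, and the `ceiling_light` stand-in for "basic lighting" are hypothetical and do not reflect the harness's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DesignAction:
    kind: str                                   # hypothetical action name
    params: dict = field(default_factory=dict)  # action-specific arguments


def decompose_bathroom_request(width_m: float, depth_m: float) -> List[DesignAction]:
    """Turn the example bathroom request into an ordered list of design actions."""
    actions = [
        DesignAction("set_units", {"length": "m"}),
        DesignAction("create_space", {"id": "bathroom", "width_m": width_m, "depth_m": depth_m}),
    ]
    for component in ("toilet", "vanity", "door", "mirror", "ceiling_light"):
        actions.append(DesignAction("place_component", {"space": "bathroom", "type": component}))
    actions += [
        DesignAction("apply_rules", {"space": "bathroom", "ruleset": "basic"}),
        DesignAction("check_clearances", {"space": "bathroom"}),
        DesignAction("build_execution_plan"),
        DesignAction("push_to_sketchup"),
    ]
    return actions


plan = decompose_bathroom_request(2.0, 1.8)
print([a.kind for a in plan])
```

Every step exists as data before anything touches SketchUp, so each one can be checked, logged, or rejected.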

Codex is the entry point and coordinator. It is not the source of truth.

Layer 2: Runtime Skills Provide Design Context

Natural-language understanding is not enough.

A general agent may know what a bathroom is, but it may not know how this product represents spaces, how components are registered, how design rules work, when layout validation is required, or when a project-specific repair must not become a global rule.

That is the role of runtime skills.

In a system like SketchUp Agent Harness, runtime skills should serve design tasks, not product maintainers. They can tell the agent:

how to create or open a design project
how to handle space planning
how to import a floor plan
how to search and place components
how to apply design rules
how to record visual feedback
how to turn feedback into structured repair
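As a rough illustration, skills can be thought of as a registry the agent consults before acting. In this sketch the skill names and trigger phrases are hypothetical, and plain keyword matching stands in for the agent's own reading of skill descriptions. Note what is absent: no release, test, or maintainer workflow.

```python
from typing import Dict, List, Tuple

# Hypothetical runtime-skill registry: design tasks only, no maintainer workflows.
RUNTIME_SKILLS: Dict[str, Tuple[str, ...]] = {
    "project-setup": ("new project", "open project"),
    "space-planning": ("bathroom", "bedroom", "kitchen", "space"),
    "floor-plan-import": ("floor plan", "import"),
    "component-placement": ("toilet", "vanity", "door", "mirror", "component"),
    "design-rules": ("clearance", "rule"),
    "visual-feedback": ("screenshot", "top view", "looks wrong"),
}


def select_skills(request: str) -> List[str]:
    """Return skills whose trigger phrases appear in the request (naive keyword match)."""
    text = request.lower()
    return [name for name, triggers in RUNTIME_SKILLS.items()
            if any(trigger in text for trigger in triggers)]


print(select_skills("Create a 2 m by 1.8 m bathroom with a toilet, vanity, "
                    "door, mirror, basic lighting, and a clearance check."))
```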

These skills should not mix in release workflow, test workflow, or maintainer-only development rules. At runtime, designers need design task capability, not product engineering context.

Layer 3: The MCP Server Defines the Tool Boundary

If Codex is the entry point and runtime skills provide task context, the MCP server is the stable tool boundary.

It should handle things like:

reading project state
updating the structured model
running validation
generating execution plans
exposing clear tool interfaces
turning agent requests into controlled product operations
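The boundary idea can be sketched without any particular SDK. This is not the MCP protocol or an MCP library. It only shows the property that matters here: only registered tools run, required arguments are checked, and every call, including rejected ones, leaves an audit record. The tool names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Set, Tuple


class ToolBoundary:
    """A fixed set of named tools; every call is checked and recorded."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[Callable[..., Any], Set[str]]] = {}
        self.audit_log: List[Dict[str, Any]] = []

    def register(self, name: str, func: Callable[..., Any], required: Set[str]) -> None:
        self._tools[name] = (func, set(required))

    def call(self, name: str, **kwargs: Any) -> Any:
        entry: Dict[str, Any] = {
            "tool": name,
            "args": kwargs,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        self.audit_log.append(entry)  # record first so rejected calls are visible too
        if name not in self._tools:
            entry["status"] = "rejected: unknown tool"
            raise KeyError(name)
        func, required = self._tools[name]
        missing = sorted(required - set(kwargs))
        if missing:
            entry["status"] = f"rejected: missing {missing}"
            raise ValueError(f"{name}: missing arguments {missing}")
        try:
            result = func(**kwargs)
        except Exception as exc:
            entry["status"] = f"error: {exc}"
            raise
        entry["status"] = "ok"
        return result


tools = ToolBoundary()
tools.register("read_project_state",
               lambda project: {"project": project, "spaces": []}, {"project"})
state = tools.call("read_project_state", project="bathroom-demo")
```

An agent asking for something outside the boundary, such as running an arbitrary script, gets a clear rejection instead of silent execution.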

This layer matters because it turns model-session intent into auditable tool calls.

Without it, an agent may directly generate scripts, edit files, or operate software. That may feel flexible in the short term, but it is hard to maintain.

With it, Codex can change, Claude can change, and future agent CLIs can change, while the project protocol and execution model remain stable.

Layer 4: The Structured Model Holds Project Truth

The core of SketchUp Agent Harness is not a chat entry point. It is the structured design model.

In this project, design_model.json holds the working truth of the design project:

spaces
dimensions
components
rules
assumptions
source evidence
execution plans
repair clues
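A minimal sketch of such a file's shape, assuming simple dataclasses serialized to JSON. The field names mirror the list above but are illustrative; the real design_model.json schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Space:
    id: str
    width_m: float
    depth_m: float


@dataclass
class DesignModel:
    # Illustrative fields only; not the harness's actual schema.
    units: str = "m"
    spaces: List[Space] = field(default_factory=list)
    components: List[Dict[str, Any]] = field(default_factory=list)
    rules: List[Dict[str, Any]] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    source_evidence: List[Dict[str, Any]] = field(default_factory=list)
    execution_plans: List[Dict[str, Any]] = field(default_factory=list)
    repair_clues: List[Dict[str, Any]] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses (Space) recursively.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


model = DesignModel(
    spaces=[Space("bathroom", 2.0, 1.8)],
    components=[{"id": "toilet-1", "type": "toilet", "space": "bathroom"}],
    assumptions=["Ceiling height not given in the request; left unset."],
    source_evidence=[{"field": "spaces/bathroom", "source": "user request"}],
)
print(model.to_json())
```

Writing assumptions and evidence next to the geometry is what lets a later session tell a user-given dimension from a guessed one.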

This layer determines whether the system is reliable.

Without a structured model, Codex may generate a SketchUp scene that looks correct, but it will be difficult to continue reliably next time.

With a structured model, the system can support:

diff
validation
replay
repair
version comparison
source-backed correction
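Diff is the simplest of these to show, because it operates on data rather than on a scene. Below is a generic recursive diff over two JSON-like model versions. The paths and sample data are hypothetical.

```python
from typing import Any, List, Tuple

Change = Tuple[str, Any, Any]


def diff_models(old: Any, new: Any, path: str = "") -> List[Change]:
    """List (path, old, new) for every leaf value that differs between two versions.

    A key missing on one side appears as None on that side.
    """
    changes: List[Change] = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            changes += diff_models(old.get(key), new.get(key), f"{path}/{key}")
        return changes
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for i, (a, b) in enumerate(zip(old, new)):
            changes += diff_models(a, b, f"{path}/{i}")
        return changes
    # Leaves, type changes, and lists of different length are reported whole.
    return changes if old == new else [(path or "/", old, new)]


v1 = {"spaces": [{"id": "bathroom", "width_m": 2.0, "depth_m": 1.8}]}
v2 = {"spaces": [{"id": "bathroom", "width_m": 2.2, "depth_m": 1.8}],
      "assumptions": ["width taken from revised plan"]}
print(diff_models(v1, v2))
```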

The SketchUp scene is the execution result. design_model.json is the project state that the agent can keep understanding.

Layer 5: The Ruby Bridge Executes Into SketchUp

Design still needs to enter SketchUp.

The Ruby bridge's role is to send structured operations into SketchUp, rather than forcing the agent to carry all software details inside the conversation.
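On the sending side, "structured operations" could be as simple as one JSON message per step, sketched below under that assumption. The operation names, fields, and plan id are hypothetical. The Ruby side that would read these messages and call the SketchUp API is not shown; it would also need to convert metres into SketchUp's internal length unit (inches).

```python
import json
from typing import Any, Dict, List


def plan_to_bridge_ops(plan_id: str, space: Dict[str, Any]) -> List[str]:
    """Serialize one space into JSON operations for a hypothetical Ruby bridge."""
    w, d = space["width_m"], space["depth_m"]
    ops: List[Dict[str, Any]] = [
        {"op": "create_floor", "space": space["id"],
         "outline_m": [[0, 0], [w, 0], [w, d], [0, d]]},
    ]
    for comp in space.get("components", []):
        ops.append({"op": "place_component", "space": space["id"],
                    "component": comp["type"], "at_m": comp["at_m"]})
    # The plan id and sequence number let a bridge error point back to an exact step.
    return [json.dumps({"plan": plan_id, "seq": i, **op}, sort_keys=True)
            for i, op in enumerate(ops)]


ops = plan_to_bridge_ops("plan-001", {
    "id": "bathroom", "width_m": 2.0, "depth_m": 1.8,
    "components": [{"type": "toilet", "at_m": [0.3, 0.0]}],
})
for line in ops:
    print(line)
```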

This layer should be stable, diagnosable, and allowed to fail clearly.

A good bridge does more than create objects. It should report when SketchUp is not in the right state, when the bridge is missing, when execution is blocked by software state, or when the returned structure is not what the upper layers expected.

Structured errors matter. They keep the agent from guessing and prevent it from mistaking a software-state problem for a design problem.
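A minimal sketch of what "failing clearly" might look like from the caller's side. The response shape and error codes are assumptions chosen to match the failure cases named above, not the bridge's real contract.

```python
from enum import Enum
from typing import Any, List


class BridgeErrorCode(Enum):
    SKETCHUP_NOT_READY = "sketchup_not_ready"    # e.g. no model open
    BRIDGE_MISSING = "bridge_missing"            # bridge not installed or not responding
    BLOCKED_BY_STATE = "blocked_by_state"        # SketchUp busy or in the wrong mode
    UNEXPECTED_RESPONSE = "unexpected_response"  # reply does not match the contract


class BridgeError(Exception):
    """A software-state failure; it says nothing about whether the design is wrong."""

    def __init__(self, code: BridgeErrorCode, detail: str = "") -> None:
        super().__init__(f"{code.value}: {detail}")
        self.code = code
        self.detail = detail


def parse_bridge_response(response: Any) -> List[str]:
    """Return created object ids, or raise a BridgeError with a specific code."""
    if response is None:
        raise BridgeError(BridgeErrorCode.BRIDGE_MISSING, "no reply")
    if not isinstance(response, dict) or not {"ok", "created_ids"}.issubset(response):
        raise BridgeError(BridgeErrorCode.UNEXPECTED_RESPONSE, repr(response))
    if not response["ok"]:
        try:
            code = BridgeErrorCode(response.get("error"))
        except ValueError:  # unknown or missing error code
            code = BridgeErrorCode.UNEXPECTED_RESPONSE
        raise BridgeError(code, str(response.get("detail", "")))
    return list(response["created_ids"])
```

Because each failure carries a code, the agent can branch on it: ask the user to open SketchUp, reinstall the bridge, or retry, all without touching the design model.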

Layer 6: Visual Review Returns to Structured Repair

After execution in SketchUp, review is still necessary.

Screenshots, top views, and renders can reveal many problems: wall misalignment, reversed orientation, missing openings, incorrect component scale, or material and lighting mismatches.

But visual review should not stop at "this image looks wrong."

If a designer accepts a visual correction, the system should turn it into structured repair:

update the model diff
correct source evidence
add validation rules
record component or material changes
save project-local memory
execute or replay again
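A minimal sketch of that conversion, assuming a dict-based model and a hypothetical `Feedback` record. The key point it shows is that an accepted correction changes the model, leaves a repair clue with before and after values, optionally adds a rule, and marks the scene for replay instead of patching it.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Feedback:
    observation: str                # what the reviewer saw in a screenshot or top view
    target: str                     # hypothetical "section/name" path into the model
    correction: Dict[str, Any]      # field changes the designer accepted
    accepted: bool = False
    new_rule: Optional[str] = None  # optional check to prevent a repeat


def apply_feedback(model: Dict[str, Any], fb: Feedback) -> Dict[str, Any]:
    """Return a new model with accepted feedback recorded as structured repair."""
    if not fb.accepted:
        return model  # unaccepted feedback is not written into the model
    repaired = copy.deepcopy(model)
    section, name = fb.target.split("/", 1)
    item = repaired[section][name]
    before = {key: item.get(key) for key in fb.correction}
    item.update(fb.correction)
    repaired.setdefault("repair_clues", []).append(
        {"observation": fb.observation, "target": fb.target,
         "before": before, "after": dict(fb.correction)}
    )
    if fb.new_rule:
        repaired.setdefault("rules", []).append(fb.new_rule)
    repaired["needs_replay"] = True  # regenerate the scene from the model, not patch it
    return repaired


model = {"components": {"door": {"swing": "in", "width_m": 0.7}}}
fb = Feedback("Door swing hits the toilet in the top view", "components/door",
              {"swing": "out"}, accepted=True,
              new_rule="door swing must not intersect fixtures")
repaired = apply_feedback(model, fb)
```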

That is the loop.

Codex should not only generate the first version. It should help find problems, explain them, call tools, and repair them.

What the Full Chain Looks Like

At an abstract level, one natural-language modeling request can be decomposed like this:

1. The user expresses design intent.
2. Codex uses runtime skills to classify the task.
3. The MCP server reads or creates project state.
4. The structured model records spaces, rules, components, and assumptions.
5. Validators check dimensions, clearances, or structural issues.
6. The execution plan becomes SketchUp bridge operations.
7. The Ruby bridge executes inside SketchUp.
8. A screenshot or view becomes a review artifact.
9. Designer feedback is written back into the structured model.
10. The system validates, repairs, or replays.
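Step 5 is the easiest to make concrete. The following is a simplified plan-view clearance check with axis-aligned rectangles. Fixture sizes and clearance depths are placeholder numbers for illustration, not building-code values, and the rules the harness actually applies may be richer.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, depth) in metres, plan view


@dataclass
class Fixture:
    name: str
    x: float
    y: float                # back edge; the back wall is at y = 0
    w: float
    d: float
    front_clearance: float  # free depth required in front (+y); placeholder values

    def body(self) -> Rect:
        return (self.x, self.y, self.w, self.d)

    def clearance_zone(self) -> Rect:
        return (self.x, self.y + self.d, self.w, self.front_clearance)


def overlaps(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad


def check_clearances(room_w: float, room_d: float, fixtures: List[Fixture]) -> List[str]:
    """Report fixtures that leave the room or whose front clearance is blocked."""
    issues: List[str] = []
    for f in fixtures:
        _, zy, _, zd = f.clearance_zone()
        if f.x < 0 or f.y < 0 or f.x + f.w > room_w or zy + zd > room_d:
            issues.append(f"{f.name}: body or clearance zone is outside the room")
        if f.front_clearance <= 0:
            continue
        for other in fixtures:
            if other is not f and overlaps(f.clearance_zone(), other.body()):
                issues.append(f"{f.name}: clearance blocked by {other.name}")
    return issues


toilet = Fixture("toilet", 0.1, 0.0, 0.4, 0.7, 0.6)
vanity = Fixture("vanity", 1.2, 0.0, 0.6, 0.5, 0.6)
bin_ = Fixture("bin", 0.2, 0.9, 0.3, 0.3, 0.0)
print(check_clearances(2.0, 1.8, [toilet, vanity, bin_]))
```

Because the check reads the structured model, a failure points at named fixtures and measurable zones rather than at "the render looks cramped."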

This is more complex than "type one sentence and generate a model."

But it solves the reliability problem: the project can continue, mistakes can be located, designer feedback can be traced, and the model can be replayed.

Why This Is Not Codex-Specific

Codex is not the core of this architecture. It is one of several possible entry points.

Claude, Codex, and future agent CLIs can all be entry points. The parts that should remain stable are:

runtime skills
MCP tool layer
design model schema
SketchUp bridge
protocol documents
validation loop

This means the agent entry point can change without rewriting project state and execution boundaries.

It also avoids hard-coding the habits of one model product into the core system. Claude-specific and Codex-specific logic should stay in adapters as much as possible, not inside the MCP server or the structured model.
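A minimal sketch of that separation: each adapter translates its entry point's message shape into one neutral (tool name, arguments) pair, and the core never sees anything vendor-specific. Both message shapes below are assumptions for illustration, not the exact formats either product emits, and `create_space` is a hypothetical core tool.

```python
import json
from typing import Any, Dict, Protocol, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


class EntryAdapter(Protocol):
    def to_tool_call(self, message: Dict[str, Any]) -> ToolCall: ...


class CodexAdapter:
    """Assumed message shape: {"tool": ..., "arguments": "<JSON string>"}."""

    def to_tool_call(self, message: Dict[str, Any]) -> ToolCall:
        return message["tool"], json.loads(message.get("arguments") or "{}")


class ClaudeAdapter:
    """Assumed message shape: {"name": ..., "input": {...}}."""

    def to_tool_call(self, message: Dict[str, Any]) -> ToolCall:
        return message["name"], dict(message.get("input", {}))


def create_space(space_id: str, width_m: float, depth_m: float) -> Dict[str, Any]:
    return {"id": space_id, "width_m": width_m, "depth_m": depth_m}


CORE_TOOLS = {"create_space": create_space}  # shared by every entry point


def handle(adapter: EntryAdapter, message: Dict[str, Any]) -> Dict[str, Any]:
    name, args = adapter.to_tool_call(message)  # vendor-specific logic ends here
    return CORE_TOOLS[name](**args)


args = {"space_id": "bathroom", "width_m": 2.0, "depth_m": 1.8}
print(handle(CodexAdapter(), {"tool": "create_space", "arguments": json.dumps(args)}))
print(handle(ClaudeAdapter(), {"name": "create_space", "input": args}))
```

Adding a new agent CLI then means writing one more adapter, not touching the tools or the model.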

What Should Not Be Overclaimed Yet

The boundary matters.

SketchUp Agent Harness is still an early-stage open-source project. It should not be packaged as a mature commercial design platform.

What is ready to discuss is the architecture direction and the product boundary that already exists: natural-language entry, MCP server, Ruby bridge, structured model, runtime skills, protocol, and validation loop.

To turn this into a true tutorial, we still need a verified demo path that is safe to publish: public-safe input, plus a full run from a natural-language goal through SketchUp execution, review, and repair.

Until then, this article explains the architecture. It does not promise that readers can reproduce a full demo by following steps.

Conclusion

Codex-driven SketchUp modeling is not about letting Codex directly control software.

The key is the middle layer.

Natural language is only the entry point. The SketchUp scene is only the execution result. What makes the system sustainable is the combination of runtime skills, MCP server, structured design model, Ruby bridge, and visual review loop.

Without these layers, AI modeling easily becomes one-off output.

With these layers, an agent can move from "help me draw something" toward "help me maintain a design project."

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/codex-driven-sketchup-modeling/
