DEV Community: Xu Bian

From One Failure to Project Memory: Make the Pipeline Stronger Over Time

Xu Bian — Tue, 12 May 2026 11:16:08 +0000

If AI makes the same mistake repeatedly in the same project, the problem is usually not just the model.

The task contract may be unclear. The context package may miss a source file. Stop conditions may be absent. Tests may not cover the behavior. The evidence gate may be too weak. Project rules may exist only in chat.

The final layer of a project-specific AI delivery pipeline is feedback and knowledge capture.

Do not only correct AI in chat

Many people tell the agent, "do not do that next time."

That barely counts as process improvement.

The next session, tool, model, or project directory may not carry that lesson. Even if the model remembers, it may not know which project, scenario, or file scope the lesson belongs to.

Real project memory should enter project mechanisms.

Classify the failure first

After a failure, do not immediately add another rule. Classify the failure.

Common classes include:

unclear requirement;
wrong context;
wrong source of truth;
excessive permission;
unsafe tool call;
missing test;
missing evidence;
unclear release boundary;
source-specific lesson generalized incorrectly;
project rule not loaded by the AI.

Once the failure is classified, the project can decide what to fix.
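As an illustration, the classes above can be recorded as an enumeration so that repeated failures become countable. This is a minimal sketch: the names `FailureClass` and `repeated_classes` and the threshold of two are assumptions made for the example, not part of any existing tool.

```python
from collections import Counter
from enum import Enum


class FailureClass(Enum):
    """Failure classes from the list above (identifier names are illustrative)."""
    UNCLEAR_REQUIREMENT = "unclear requirement"
    WRONG_CONTEXT = "wrong context"
    WRONG_SOURCE_OF_TRUTH = "wrong source of truth"
    EXCESSIVE_PERMISSION = "excessive permission"
    UNSAFE_TOOL_CALL = "unsafe tool call"
    MISSING_TEST = "missing test"
    MISSING_EVIDENCE = "missing evidence"
    UNCLEAR_RELEASE_BOUNDARY = "unclear release boundary"
    OVERGENERALIZED_LESSON = "source-specific lesson generalized incorrectly"
    RULE_NOT_LOADED = "project rule not loaded by the AI"


def repeated_classes(log: list[FailureClass], threshold: int = 2) -> list[FailureClass]:
    """Return failure classes seen at least `threshold` times, most frequent first."""
    counts = Counter(log)
    return [cls for cls, n in counts.most_common() if n >= threshold]


# Hypothetical failure log for one project.
log = [FailureClass.MISSING_TEST, FailureClass.WRONG_CONTEXT, FailureClass.MISSING_TEST]
print(repeated_classes(log))
```

A class that keeps reappearing in such a log is a candidate for a structural fix rather than another chat correction.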

Lessons belong in different layers

Not every lesson should become a skill.

Some lessons belong in the issue template because they affect task intake.

Some belong in AGENTS.md because they are project-wide rules.

Some belong in a skill because they are workflow-specific.

Some belong in tests or fixtures because machines should verify them.

Some belong in an evidence gate because they define completion.

Some belong in scripts because text reminders are not strong enough.

Some should remain project-local and should not be promoted to central knowledge.

Only stable cross-project lessons should become central knowledge.

A skill is not a trash bin

Skill files can become another messy documentation folder.

If every AI mistake adds one more sentence to a skill, the skill becomes longer, more contradictory, and harder to trigger.

Good rules should live in the narrowest verifiable layer.

If it is code behavior, use a test.

If it is execution process, use a script or state machine.

If it is task entry, use an issue template.

If it is a project-wide boundary, use AGENTS.md.

If it is a workflow manual, use a skill.

If it is a cross-project method, use the central knowledge base.
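The layer choices above read naturally as a lookup table. A minimal sketch, assuming hypothetical lesson-kind labels (the keys and the function name `layer_for` are invented for illustration):

```python
# Hypothetical lesson kinds mapped to the layers named in the text above.
LAYER_FOR_LESSON = {
    "code_behavior": "test",
    "execution_process": "script or state machine",
    "task_entry": "issue template",
    "project_wide_boundary": "AGENTS.md",
    "workflow_manual": "skill",
    "cross_project_method": "central knowledge base",
}


def layer_for(lesson_kind: str) -> str:
    """Return the layer a lesson belongs in, or flag it for classification."""
    # Unknown kinds are not filed anywhere by default: classify first.
    return LAYER_FOR_LESSON.get(lesson_kind, "unclassified: classify the failure first")


print(layer_for("code_behavior"))  # test
print(layer_for("something else"))  # unclassified: classify the failure first
```

The fallback matters more than the table: a lesson that fits no layer should go back to classification instead of being appended to a skill.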

The role of the central knowledge base

The central knowledge base should not take over every project detail.

It should capture stable cross-project patterns: task contracts, evidence contracts, context packages, dynamic skill lifecycle, publication boundaries, release boundaries, and similar methods.

Project repositories should keep project facts, private materials, source-specific interpretation, and concrete implementation details.

That keeps the central knowledge base as the method layer while project pipelines retain execution authority.

Conclusion

An AI delivery pipeline should not only execute tasks. It should learn how the project can use AI better next time.

A repeated failure should not remain a complaint. It should become a clearer contract, better context, stronger evidence, narrower permissions, a better gate, or a more reliable project rule.

That is why a pipeline is stronger than chat: it turns one experience into structure for the next run.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/feedback-to-project-memory/

More links: GitHub · YouTube · LinkedIn · Bluesky · Mastodon · Discord

Put Humans at Risk Boundaries, Not Only at the Final Approval

Xu Bian — Tue, 12 May 2026 11:15:30 +0000

Many AI workflows treat human-in-the-loop as a final approval step.

That is too late.

If AI has already modified critical code, triggered external actions, polluted project state, or opened a PR, a final approve button can only catch part of the problem. The dangerous side effects may already have happened.

Humans should stand at risk boundaries, not only at the end of the workflow.

What is a risk boundary?

Risk boundaries are concrete.

In software projects, they often include:

money, payment, trading, or payout;
permissions, security, privacy, or compliance;
database schema or persisted formats;
production data or production configuration;
release, deployment, or migration;
irreversible external actions;
product claims that cannot be verified.

In professional tools, risk boundaries may also include:

writing uncertain visual recognition into project truth;
turning a source-specific repair into a general rule;
applying irreversible changes to a user project;
presenting an unverified demo as a reproducible capability.

These boundaries should not rely on the model's good judgment alone.

Human gates should be early enough

A good human gate appears before the risky action.

Examples:

if triage finds high risk, stop at analysis;
if implementation would change core business semantics, ask for confirmation first;
if a tool call would touch production or an external system, require approval;
if the evidence gate fails, block PR readiness;
after merge, release still requires a separate boundary;
after deployment, done still requires production verification.

This is stronger than a final review.
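One way to make "before the risky action" concrete is to check a planned action against the risk boundaries before anything runs. The sketch below is an assumption-laden illustration: the tag spellings, `PlannedAction`, and `gate_before` are hypothetical names, and a real project would derive tags from paths, tools, or labels.

```python
from dataclasses import dataclass

# Risk tags loosely based on the boundary list earlier in this article;
# the exact spellings are illustrative.
RISK_BOUNDARIES = {
    "money", "permissions", "security", "privacy", "compliance",
    "schema", "production", "release", "irreversible_external_action",
}


@dataclass(frozen=True)
class PlannedAction:
    description: str
    tags: frozenset[str]


def gate_before(action: PlannedAction) -> str:
    """Decide, before anything runs, whether a human must approve the action."""
    crossed = sorted(action.tags & RISK_BOUNDARIES)
    if crossed:
        return "require-approval: " + ", ".join(crossed)
    return "proceed"


print(gate_before(PlannedAction("fix typo in README", frozenset({"docs"}))))
# proceed
print(gate_before(PlannedAction("alter orders table", frozenset({"schema", "production"}))))
# require-approval: production, schema
```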

Release boundaries must be separate

AI often compresses "code changed" into "work done."

Real projects cannot do that.

These states should remain separate:

local patch;
tests passed;
PR opened;
PR reviewed;
PR merged;
release ready;
deployed;
production verified;
done.

Each state has a different owner. The AI worker may take the work as far as an opened PR. A release workflow may handle release readiness. Production verification may require humans or monitoring.

If these states collapse into one "done," the project loses risk control.
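The separate states can be sketched as a small state machine that refuses to skip steps. This is only an illustration: the rule that the AI worker stops at "PR opened" is an assumption drawn from the ownership paragraph above, and `advance` is a hypothetical helper.

```python
STATES = [
    "local patch", "tests passed", "PR opened", "PR reviewed", "PR merged",
    "release ready", "deployed", "production verified", "done",
]

# Assumed ownership split: the AI worker may go no further than this state.
LAST_AI_STATE = "PR opened"


def advance(current: str, target: str, actor: str) -> str:
    """Move one step forward; reject skipped states and AI moves past the PR."""
    i, j = STATES.index(current), STATES.index(target)
    if j != i + 1:
        raise ValueError(f"cannot jump from {current!r} to {target!r}")
    if actor == "ai" and j > STATES.index(LAST_AI_STATE):
        raise PermissionError(f"{target!r} is not owned by the AI worker")
    return target


print(advance("local patch", "tests passed", "ai"))  # tests passed
```

With this shape, "code changed" can never be reported as "done" in a single move, and each later transition has to name who performed it.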

Humans are routers, not bottlenecks

People often worry that human gates reduce automation efficiency.

They can, if every step requires approval. But the right answer is not to remove humans. It is to place humans at a few important boundaries.

Low-risk tasks can run automatically. Medium-risk tasks can ask for help when evidence is missing. High-risk tasks can stop at analysis by default. Release and production verification can remain separate.

Then humans are not manual buttons for every step. They are risk routers.

Conclusion

Human-in-the-loop does not mean "someone looked at it." It means humans intervene at the right places.

A project-specific AI delivery pipeline should encode human gates and release boundaries into the workflow instead of leaving them to the model's moment-by-moment judgment.

AI can increase speed. Humans own risk. The pipeline keeps those responsibilities from replacing each other.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/human-gates-and-release-boundaries/


Evidence Contract: AI Delivery Must Come With Proof

Xu Bian — Tue, 12 May 2026 11:14:53 +0000

The most dangerous sentence in AI delivery is: "It is done."

That sentence is not evidence. AI can write confidently. A summary can look complete. A PR description can be polished. None of that proves the work is actually complete.

A project-specific AI delivery pipeline should redefine "done" as an evidence question: what reviewable proof supports each acceptance criterion?

That is the evidence contract.

Tests matter, but they are not everything

Tests are one of the most important forms of evidence. They are not the only form.

A backend function fix may be covered by unit and integration tests. A frontend interaction change may also need screenshots or a recording. A data-link fix may need API output, logs, read-only SQL, or queue observation. A SketchUp modeling tool may need a design model diff, a bridge trace, a top-view screenshot, and a live bridge smoke test.

The question is not only "did tests run?" The question is "what evidence does this delivery require?"

Evidence must map to acceptance criteria

Many projects enforce evidence by changed file type. If frontend files changed, screenshots are required. If service or database code changed, data proof is required.

That is already much better than no evidence. But the stronger version maps evidence to acceptance criteria.

If the task has three acceptance criteria, the manifest should answer:

which test or screenshot proves the first one;
which API output or log proves the second one;
whether the third is uncovered, and why.

That lets reviewers decide whether the AI solved the user problem, not merely whether it ran some commands.
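Mapping evidence to criteria can be checked mechanically. A minimal sketch, assuming evidence is stored as a mapping from criterion text to a list of proof references; the criteria and file paths below are made up for the example.

```python
def uncovered_criteria(criteria: list[str], evidence: dict[str, list[str]]) -> list[str]:
    """Return the acceptance criteria that have no evidence attached."""
    return [c for c in criteria if not evidence.get(c)]


# Hypothetical task with three acceptance criteria.
criteria = [
    "login succeeds",
    "error banner shown on bad password",
    "audit log written",
]
evidence = {
    "login succeeds": ["tests/test_login.py::test_ok"],
    "error banner shown on bad password": ["evidence/bad-password.png"],
}

print(uncovered_criteria(criteria, evidence))  # ['audit log written']
```

An uncovered criterion is not automatically a failure, but the manifest should then say why it is uncovered.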

The evidence manifest should be a file

Evidence should not live only in chat.

An evidence manifest can include:

task ID or PR;
change summary;
acceptance criteria;
evidence for each criterion;
test commands and results;
screenshot or data proof paths;
checks that were not run and why;
residual risks;
generation time;
worker or tool version.

The manifest does not guarantee correctness. It gives reviewers something durable to inspect.
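To show what "a file" could look like, here is a sketch of a manifest serialized to JSON. The field names follow the list above, but the class `EvidenceManifest`, its layout, and the sample values are assumptions, not a defined format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EvidenceManifest:
    task_id: str
    change_summary: str
    evidence_by_criterion: dict[str, list[str]]
    test_commands: list[str] = field(default_factory=list)
    checks_not_run: dict[str, str] = field(default_factory=dict)  # check -> reason
    residual_risks: list[str] = field(default_factory=list)
    generated_at: str = ""    # timestamp, filled in by the caller
    worker_version: str = ""  # worker or tool version, filled in by the caller

    def to_json(self) -> str:
        """Serialize with stable key order so diffs between runs stay readable."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


manifest = EvidenceManifest(
    task_id="TASK-1",  # hypothetical ID
    change_summary="Show an error banner on bad password",
    evidence_by_criterion={"error banner shown": ["evidence/bad-password.png"]},
    test_commands=["pytest tests/test_login.py"],
    checks_not_run={"e2e suite": "no staging environment available"},
)
print(manifest.to_json())
```

Writing the returned string into the repository or the PR keeps the evidence reviewable after the chat session is gone.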

Different projects need different evidence

Evidence contracts must be project-specific.

In systems like TidalFi, changes that touch APIs, services, databases, queues, Redis, or event flows cannot rely only on unit tests. They need data proof. Frontend flow changes need screenshots. Release-related changes need a release boundary and production verification.

In SketchUp Agent Harness, "there is a visible model" is not enough. The project needs to know where the model came from, whether the structured design model is consistent, whether the bridge trace is explainable, whether the SketchUp scene came from a clean replay, and whether visual review is backed by source evidence.

In knowledge publication, "the article was generated" is not enough. The system needs source trace, bilingual siblings, series metadata, language switching, site build validation, and clear ownership between knowledge and site.

Without evidence, it is not done

This rule changes AI behavior.

Without an evidence gate, AI tends to declare completion in natural language. With an evidence contract, AI must collect test results, screenshots, logs, traces, and risk notes during execution.

It behaves more like an engineering worker and less like a chat assistant.

Conclusion

The completion standard for AI delivery should not be "AI believes it is done."

Done should mean that the acceptance criteria in the task contract have matching evidence, missing coverage is explicitly stated, and high-risk boundaries were not crossed silently.

That is the value of the evidence contract.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/evidence-contract-for-ai-delivery/


Put AI in an Isolated Workspace: Real Projects Need Stage-Gated Workers

Xu Bian — Tue, 12 May 2026 11:14:16 +0000

If AI is going to work inside a real project, it should not freely modify your main working directory.

Your normal checkout may contain uncommitted changes, temporary debugging, unsynced documents, production configuration, or personal scripts. If an AI agent deletes, rewrites, or runs the wrong thing there, accountability becomes messy.

That is why a project-specific AI delivery pipeline needs work isolation and stage-gated workers.

Isolation is not distrust

Isolation does not mean AI will always fail. It is normal engineering practice.

Human engineers use branches, PRs, CI, review, and release gates. AI should be placed in a similarly traceable workspace.

Isolation can take many forms:

a Git branch;
a worktree;
a slot workspace;
a sandbox;
a temporary project directory;
a dedicated evidence directory;
constrained tool permissions.

The important point is that AI execution is separated from the user's normal workspace.

A branch can also be a claim lock

In multi-worker settings, isolation also solves concurrency.

If two AI workers handle the same issue, they may overwrite each other, open duplicate PRs, or corrupt task state. A clear branch claim or workspace slot tells the system that the task is already being handled.

That is more reliable than a chat message saying "I am working on it."

Project state should live in GitHub, issue comments, labels, branches, manifests, or another external system, not only in the model context.
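The claim-lock idea can be illustrated without Git at all. The sketch below uses an exclusive file create on the local filesystem as a stand-in for a branch claim; the directory layout and the function `try_claim` are hypothetical, and a real pipeline would put the claim in a shared external system such as the ones listed above.

```python
import os
import tempfile


def try_claim(claims_dir: str, task_id: str, worker: str) -> bool:
    """Atomically claim a task; return False if another worker already holds it."""
    path = os.path.join(claims_dir, f"{task_id}.claim")
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one worker can win.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(worker)  # record who holds the claim
    return True


with tempfile.TemporaryDirectory() as claims:
    print(try_claim(claims, "issue-42", "worker-a"))  # True
    print(try_claim(claims, "issue-42", "worker-b"))  # False
```

The point is that the claim is a durable, checkable fact outside the model context, and the second worker learns about it from the system rather than from a chat message.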

Workers should not jump to the end

AI tends to treat a task as one continuous action: read, edit, test, summarize.

Real projects need stages.

A more stable worker flow is:

triage
-> analysis
-> implementation
-> validation
-> evidence packaging
-> PR or handoff

High-risk systems should separate even more:

release readiness
-> release approval
-> deployment
-> production verification
-> done

Each stage has different permissions. Triage can read code but should not freely edit business logic. Analysis can propose a plan but should stop on high risk. Implementation can modify files but must produce evidence. Release should usually be a separate boundary.
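Per-stage permissions can be expressed as an explicit allow-list. This is a sketch under assumptions: the action names and the exact grants per stage are illustrative, chosen to match the paragraph above rather than taken from a real worker.

```python
# Assumed permission table; action names are hypothetical.
STAGE_PERMISSIONS = {
    "triage": {"read"},
    "analysis": {"read", "propose_plan"},
    "implementation": {"read", "edit_files", "run_tests"},
    "validation": {"read", "run_tests"},
    "evidence packaging": {"read", "write_evidence"},
    "PR or handoff": {"read", "open_pr"},
}


def is_allowed(stage: str, action: str) -> bool:
    """Return True only if the action is explicitly granted to the stage."""
    return action in STAGE_PERMISSIONS.get(stage, set())


print(is_allowed("triage", "edit_files"))          # False
print(is_allowed("implementation", "edit_files"))  # True
print(is_allowed("implementation", "deploy"))      # False
```

Release actions are deliberately absent from the table, which keeps release a separate boundary by construction.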

What stage gates do

Stage gates are not there to slow everything down. They make failures earlier and cheaper.

Finding unclear requirements during triage is cheaper than finding them in a PR. Finding high risk during analysis is cheaper than after implementation. Finding missing evidence before review is cheaper than finding it in production.

This follows the same logic as a deployment pipeline: each stage increases confidence and usually costs more. The later the stage, the more careful the process should be.

Guarded full-speed is not blind execution

I like full-speed execution, but only as guarded full-speed.

Guarded means:

the task contract is clear;
the context package is prepared;
the workspace is isolated;
high-risk boundaries are written down;
validation commands and evidence requirements are defined;
stop conditions are enforced.

Under those conditions, AI can ask fewer unnecessary questions, keep moving, debug failing tests, and package evidence.

Without those conditions, "full speed" is just faster loss of control.

Conclusion

An AI worker that enters a real project should work in an isolated space and move through stages.

This does not reduce efficiency. It lets the project absorb AI efficiency safely.

The useful goal is not "AI finishes everything in one shot." The useful goal is "AI works in the right space, with the right permissions, through the right gates, up to the right delivery boundary."

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/isolated-stage-gated-ai-worker/


Give AI the Context It Should See, Not the Whole Repository

Xu Bian — Tue, 12 May 2026 11:13:01 +0000

Many AI task failures do not happen because the model cannot modify code. They happen because the model reads the wrong context.

It may trust outdated docs, treat a roadmap as fact, read an AI-assistance note as a product rule, generalize from one failed sample, or find a similarly named implementation that is already obsolete.

That is why a project-specific AI delivery pipeline should not simply tell AI to "read the repository." It should prepare a context package for the task.

More context is not always better

It is tempting to give AI everything. In real projects, that often makes the result worse.

Too much context distracts the model. Messy documentation makes temporary decisions look permanent. Long chat history can revive decisions that were already rejected.

The project needs narrow context, not maximum context.

Narrow context means the task receives the facts it needs, and misleading material is not exposed by default.

Source of truth must be explicit

The most important field in a context package is the source of truth.

In a codebase, the current code and tests often outrank old docs. In a product project, public capability claims should be grounded in the README, user docs, release artifacts, and runnable demos. In a knowledge publication system, public content must trace back to sources, not only to summaries.

Without an explicit source of truth, AI tends to treat available materials as if they are equally reliable.

That is dangerous because real project materials have hierarchy:

code and tests may outrank old documents;
current product docs may outrank early roadmaps;
project-local facts may outrank central methodology;
raw evidence may outrank AI summaries;
public claims must be more conservative than internal plans.

The context package should state these priorities.
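Stated priorities can then be applied when two materials disagree. The sketch below picks one possible total order that is consistent with the hierarchy above; the source-kind labels, the order itself, and the sample claims are assumptions for illustration.

```python
# One possible ranking, highest priority first (an assumption, not a standard).
PRECEDENCE = [
    "raw_evidence",
    "code_and_tests",
    "current_product_docs",
    "early_roadmap",
    "old_docs",
    "ai_summary",
]


def resolve(claims: list[tuple[str, str]]) -> tuple[str, str]:
    """Pick the claim whose source kind ranks highest; unknown kinds rank last."""
    def rank(claim: tuple[str, str]) -> int:
        kind = claim[0]
        return PRECEDENCE.index(kind) if kind in PRECEDENCE else len(PRECEDENCE)
    return min(claims, key=rank)


# Hypothetical conflict between an old document and the current code.
claims = [("old_docs", "timeout is 30s"), ("code_and_tests", "timeout is 10s")]
print(resolve(claims))  # ('code_and_tests', 'timeout is 10s')
```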

What a context package contains

A useful package can be simple:

project rules relevant to the task;
source-of-truth files;
relevant specs, ADRs, issues, PRs, or runbooks;
relevant tests, fixtures, screenshots, or traces;
known failures;
validation commands;
files that may be changed;
files that must not be touched;
precedence rules when context conflicts.

This is more controlled than letting the agent search the whole repository on its own.
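A context package can also be enforced rather than only read. A minimal sketch, assuming file scope is expressed as glob patterns; `ContextPackage`, its fields, and the paths are hypothetical, and only a subset of the listed contents is modeled.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class ContextPackage:
    source_of_truth: list[str]
    may_change: list[str] = field(default_factory=list)      # glob patterns
    must_not_touch: list[str] = field(default_factory=list)  # glob patterns
    validation_commands: list[str] = field(default_factory=list)

    def can_edit(self, path: str) -> bool:
        """Deny wins over allow; anything not explicitly allowed is denied."""
        if any(fnmatch(path, p) for p in self.must_not_touch):
            return False
        return any(fnmatch(path, p) for p in self.may_change)


# Hypothetical package for a small UI task.
pkg = ContextPackage(
    source_of_truth=["src/ui/", "tests/ui/"],
    may_change=["src/ui/*"],
    must_not_touch=["src/ui/payments/*"],
)

print(pkg.can_edit("src/ui/button.py"))         # True
print(pkg.can_edit("src/ui/payments/form.py"))  # False
print(pkg.can_edit("README.md"))                # False
```

Note that `fnmatch` lets `*` match path separators, so `src/ui/*` covers nested files; the deny list is checked first for that reason.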

This is where project-specific value appears

General agent tools can provide strong models, shell access, file editing, MCP, subagents, and hooks. They cannot know your project's truth hierarchy by default.

For example, a TidalFi worker must know which paths involve money, trading, KYC, or production release. A SketchUp Agent Harness worker must know how the design model, source evidence, and SketchUp scene relate. A knowledge publication worker must know that the knowledge base owns bilingual candidates while the site owns rendering and deployment.

These are not generic tool facts. They are project context.

Context packages also prevent overgeneralization

AI easily turns one example into a general rule.

In a design tool, one floor plan repair should not become a universal product rule. In a trading system, one issue-specific fix should not redefine global business semantics. In a knowledge system, one project's directory habit should not be copied into every project.

A context package can label the material:

fact;
evidence;
inference;
project-local rule;
reusable method;
source-specific interpretation that must not be generalized.

That reduces the chance of local experience leaking into global rules.

Conclusion

Giving AI context does not mean dumping the whole repository into the conversation.

What works is a task-level context package: small, accurate, prioritized, source-grounded, and bounded.

Much of the value of a project-specific AI delivery pipeline comes from this. It does not make the model magically smarter. It makes the model work on the right layer of truth.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/context-package-and-source-truth/


From Chat Request to Task Contract: Route the Work Before AI Executes

Xu Bian — Tue, 12 May 2026 11:12:55 +0000

The most common risk in AI-assisted development is not that the model cannot write code. It is that the model starts writing code too early.

A person says, "please fix this," and the agent quickly reads files, guesses the cause, edits code, runs tests, and summarizes the result. That can look productive. But the important project questions have not been answered yet: is the request mature enough, and how far is the AI allowed to go?

That is why a project-specific AI delivery pipeline starts with task intake and execution mode routing, not implementation.

Chat is not a task contract

Chat is good for discussion. It is not a reliable execution contract.

A conversation can mix complaints, guesses, goals, background, temporary ideas, and real acceptance criteria. A human can often separate those layers. AI may not do it consistently.

A task contract compresses the discussion into an executable boundary.

A minimal contract should include:

the user problem;
the expected behavior;
current evidence;
acceptance criteria;
explicit non-goals;
known risks;
the source of truth;
stop conditions.

Without these fields, AI is improvising inside an ambiguous task.
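The minimal contract can be sketched as a record that reports its own gaps. The class and field names below are illustrative assumptions that mirror the list above; a project might keep the same fields in an issue template instead.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TaskContract:
    user_problem: str = ""
    expected_behavior: str = ""
    current_evidence: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)
    known_risks: list[str] = field(default_factory=list)
    source_of_truth: list[str] = field(default_factory=list)
    stop_conditions: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of fields still empty, in declaration order.

        An explicit entry such as "none identified" counts as filled.
        """
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical request that has only reached the "please fix this" stage.
draft = TaskContract(user_problem="Login fails on Safari")
print(draft.missing_fields())
```

A non-empty result is a reason to keep the task in intake rather than hand it to a worker.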

Route before execution

After the task contract, the pipeline should choose an execution mode.

I use four modes.

auto-run is for low-risk, bounded, testable, reversible work: small UI copy, obvious bugs, documentation fixes, or local test additions.

manual-triage is for unclear work or anything that may touch money, permissions, security, schema, production state, or core business semantics. AI can analyze, but it should not implement by default.

guarded-full-speed is for complex but bounded work. AI can keep moving, but only inside project-specific stop conditions, evidence requirements, and human gates.

spike is for exploration. A spike produces decision material, not production-ready delivery.

These modes are not about how smart the model is. They are about how much execution authority the project is willing to grant.

Auto-run must be opt-in

Automation should not be on by default.

Issues, product discussions, and project notes contain many things that should not be executed automatically: user feedback, product ideas, incidents, compliance questions, and temporary notes.

Only explicitly opted-in tasks should let AI claim, analyze, implement, and open PRs automatically.

The opt-in is not just a label. It means the project has confirmed that scope, risk, and acceptance are clear enough for the pipeline to handle.
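The four modes and the opt-in rule can be combined into a small router. This is a sketch, not a prescribed algorithm: the signal names, the order of the checks, and the choice to send non-opted-in work to manual-triage are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    touches_risk_boundary: bool = False  # money, permissions, security, schema, production, core semantics
    exploratory: bool = False
    unclear: bool = False
    complex_work: bool = False
    opted_in: bool = False


def route(t: TaskSignals) -> str:
    """Choose an execution mode; the most conservative applicable rule wins."""
    if t.touches_risk_boundary:
        return "manual-triage"
    if t.exploratory:
        return "spike"
    if t.unclear or not t.opted_in:
        return "manual-triage"  # automation is opt-in, never the default
    if t.complex_work:
        return "guarded-full-speed"
    return "auto-run"


print(route(TaskSignals(opted_in=True)))                              # auto-run
print(route(TaskSignals()))                                           # manual-triage
print(route(TaskSignals(opted_in=True, touches_risk_boundary=True)))  # manual-triage
```

The router grants authority; it says nothing about model capability, which is the distinction the modes are meant to preserve.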

Stop conditions matter more than prompt advice

Many prompts say, "ask me if you are unsure." That is too weak.

The project should encode stop conditions in the task contract or execution rules. Examples:

acceptance criteria are missing;
requirements conflict;
core business semantics would change;
production data would be modified;
money, trading, permissions, security, or compliance is involved;
the evidence gate fails;
AI cannot find a reliable source of truth.

When these conditions appear, AI should not keep trying to finish. It should move the task to a state such as needs-input, review-required, or failed-needs-human.
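Encoded stop conditions might look like the following. The three target states come from the text above, but which condition maps to which state is an assumption made for this sketch, as are the flag names.

```python
# (flag on the task, state to move to). Order matters: first match wins.
STOP_RULES = [
    ("missing_acceptance_criteria", "needs-input"),
    ("conflicting_requirements", "needs-input"),
    ("no_reliable_source_of_truth", "needs-input"),
    ("changes_core_semantics", "review-required"),
    ("modifies_production_data", "review-required"),
    ("touches_money_or_security", "review-required"),
    ("evidence_gate_failed", "failed-needs-human"),
]


def next_state(task: dict) -> str:
    """Return the first stop state that applies, else 'continue'."""
    for flag, state in STOP_RULES:
        if task.get(flag):
            return state
    return "continue"


print(next_state({}))                                  # continue
print(next_state({"modifies_production_data": True}))  # review-required
print(next_state({"evidence_gate_failed": True}))      # failed-needs-human
```

Because the rules live in data rather than in a prompt, the pipeline can enforce them even when the model would prefer to keep going.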

This is not bureaucracy

Task contracts may look like process overhead. In practice, they reduce rework.

Without a contract, AI may produce a patch quickly, but you spend more time deciding whether it solved the real problem, expanded scope, or crossed a boundary.

With a contract, the work is narrower, clearer, and easier to validate. Even failure becomes useful because you can locate whether the failure came from requirements, context, implementation, tests, or evidence.

That is the first rule of the delivery pipeline: turn the request into an executable contract, then decide whether AI is allowed to act.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/task-contract-and-execution-mode-router/


AI Is Not the Process: What a Project-Specific AI Delivery Pipeline Means

Xu Bian — Tue, 12 May 2026 11:12:13 +0000

Many people use AI for coding by placing the whole workflow inside one chat: describe the task, ask the agent to read the repository, edit files, run tests, and summarize the result.

That works for small experiments. It becomes fragile in long-running projects, shared repositories, production systems, or professional software automation. The problem is not only whether the model is smart enough. The problem is that the model is being asked to own too much of the delivery process.

The better pattern is to place AI capability inside a project-specific delivery pipeline.

AI is the worker. The project pipeline constrains, validates, records, and escalates.

Why project-specific matters

A general AI tool cannot know a project's real risk boundaries by default.

In a trading system, payout, KYC, funded accounts, order states, and production release are hard boundaries. In a SketchUp modeling tool, the real boundaries are the structured design model, source evidence, bridge trace, SketchUp execution, and visual review. In a personal knowledge publishing system, the boundaries become source traceability, bilingual publication candidates, site rendering, and deployment ownership.

These constraints do not come from a generic model. They come from project truth.

So the goal is not to build a more general replacement for Codex or Claude Code. The goal is to build a stable AI delivery pipeline inside a real project.

What the pipeline owns

A useful project-specific AI delivery pipeline must answer questions like these:

Is this request mature enough to execute?
Should AI run automatically, analyze only, move fast under guardrails, or only run a spike?
What context should AI receive before execution?
What can AI change, and what is out of bounds?
When must the AI stop and ask for a human?
What evidence proves the work is complete?
Should the result become a PR, a release candidate, a knowledge note, or only an experiment record?

If the project does not answer these questions through its own mechanisms, the AI is still improvising inside a chat.

A minimal structure

I break the pipeline into a few parts.

Task Intake turns discussion into an executable task contract.

Execution Mode Router decides how much autonomy AI gets.

Context Package gives the AI the narrow context it should see.

Work Isolation keeps AI execution inside a branch, worktree, slot workspace, or sandbox.

Stage-Gated Worker separates triage, analysis, implementation, validation, evidence packaging, and handoff.

Evidence Contract requires tests, screenshots, API output, logs, traces, or other reviewable proof.

Human Gate puts humans at real risk boundaries.

Feedback Capture turns repeated failures into rules, tests, skills, templates, or knowledge base entries.

Together, these parts are what I mean by a harness. It is not a prompt. It is not a single tool. It is the project control layer that lets AI participate in delivery.

Where TDD fits

TDD is useful, but it is not the whole answer.

When a behavior is clear and testable, writing tests before implementation is a strong pattern. But many real tasks are not function-level exercises. Frontend changes need screenshots. Data-link changes need API or log proof. SketchUp modeling needs structured model diffs and visual review. Knowledge publication needs source trace and bilingual route validation.

So the better rule is not "everything must be TDD." The better rule is "every delivery must have an evidence contract."

Tests are one kind of evidence. They are not the only kind.

Why this is more stable than vibe coding

Vibe coding is fast. Its weakness is that boundaries and evidence are often too weak.

A project-specific AI delivery pipeline does not reject speed. It puts speed on rails.

Low-risk tasks can auto-run. Complex but bounded tasks can run in guarded full-speed mode. High-risk tasks should stop at analysis or human confirmation. Exploratory work can be a spike, but it should not be treated as production-ready delivery.

AI can still move quickly. It just does not get to mix immature requirements, high-risk actions, and unverified completion into one vague "done."

The core idea

The future value is not just making AI more impressive inside a chat window. The value is making projects better at using AI inside their delivery systems.

Models will change. CLI tools will change. MCP, hooks, skills, and subagents will change.

The durable asset is the project mechanism: how tasks are defined, how context is provided, how execution is constrained, how evidence is collected, how humans intervene, and how failures improve the next run.

That is the value of a project-specific AI delivery pipeline.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/project-specific-ai-delivery-pipeline/


From Chat Advice to a Durable Design Project

Xu Bian — Thu, 07 May 2026 03:57:02 +0000

Many AI design experiences stop inside a chat box.

You describe a spatial problem, and the AI gives advice. You add a few images, and it analyzes them. You ask for another style, and it writes a new direction. In the short term, this is useful. But once a project runs for days or weeks, the chat box starts to show its limits.

Advice becomes scattered. Versions blur. Source material loses context. A decision that made sense yesterday may be impossible to explain tomorrow.

A real design project needs more than advice. It needs a durable project surface.

Why chat advice is not enough

Chat is a good thinking interface, but it is not a stable project container.

It has three natural weaknesses.

First, chat history is a weak source of truth. One decision may hide inside ten messages, one dimension may come from an image, and one rule may be a temporary instruction you typed in a hurry. The next time AI makes a change, it can easily miss one of those facts.

Second, chat history is weak at versioning. You can tell AI to "change it according to what we just discussed," but the project needs to know what changed, why it changed, whether it affects other spaces, and whether it can be rolled back.

Third, chat history is weak at holding material. Drawings, references, components, materials, screenshots, feedback, and rules all need organization. If they are only dropped into a chat window, it becomes difficult to review provenance and validity later.

If an AI design tool only optimizes chat, it becomes a smart but forgetful consultant.

The project workspace is the ground of the workbench

A more reliable approach is to put each design task inside its own project workspace.

That workspace is not just a folder. It is the shared operating surface between the designer and AI.

It can hold:

the current design model;
project rules;
source drawings and material;
components and material lists;
screenshots, renderings, and review records;
version differences;
temporary memory valid only for this project;
next actions.

With this structure, AI does not have to infer project state from chat history every time. It can read explicit files, modify explicit files, produce explicit artifacts, and write important changes back to the project.

For designers, this is closer to real work. A design project is not completed by one sentence. It is made of material, rules, models, drawings, communication, and versions.

A durable project needs structured truth

A project workspace is still not enough. It needs a clear structured truth layer.

In a direction such as SketchUp Agent Harness, that layer can be design_model.json. It records spaces, dimensions, components, rules, assumptions, and execution state.

The point is not that designers should hand-write JSON. The point is that AI, the designer, and the software need a shared fact layer.

If the design only lives inside a SketchUp scene, AI has a hard time knowing which parts came from source evidence and which parts are temporary operations. If the design only lives in screenshots, it is hard to repair. If the design only lives in chat, it is even harder to compare versions.

A structured truth layer lets the system answer:

what spaces and components exist now;
where the dimensions came from;
which assumptions are still unconfirmed;
which feedback has been accepted;
which rules affected this change.

That is the difference between a durable project and one-off advice.

AI should turn conversation into project state

Chat still matters. Designers should be able to say naturally: "This entrance is too narrow. Move the cabinet beside it a little."

The key is that AI should not only reply, "Sure, I will adjust it." It should turn that sentence into project state:

identify the affected space and component;
check whether project rules allow the change;
create a modification plan;
update the structured model;
execute into SketchUp or prepare execution;
leave a change record;
generate a screenshot for confirmation when needed.

Then conversation does not evaporate.

Each useful exchange should move the project state forward.
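
The steps above can be sketched in a few lines, under heavy simplification. The model shape, the single "stay within the wall" rule, and all names below are assumptions for illustration. Executing into SketchUp and generating a screenshot are left out.

```python
import copy
from dataclasses import asdict, dataclass


@dataclass
class ChangeRecord:
    request: str      # the designer's sentence, kept verbatim
    component: str    # affected component
    before_mm: int
    after_mm: int
    applied: bool
    reason: str


def move_component(model: dict, request: str, component: str, shift_mm: int):
    """Plan a move, check it against one rule, update a copy of the model, and log it."""
    item = model["components"][component]
    before = item["x_mm"]
    after = before + shift_mm
    # Hypothetical project rule: the component must stay within its wall.
    allowed = after >= 0 and after + item["width_mm"] <= model["wall_length_mm"]
    updated = copy.deepcopy(model)
    if allowed:
        updated["components"][component]["x_mm"] = after
    record = ChangeRecord(
        request=request,
        component=component,
        before_mm=before,
        after_mm=after if allowed else before,
        applied=allowed,
        reason="stays within the wall" if allowed else "would extend past the wall",
    )
    # Rejected requests are recorded too, so the conversation leaves a trace either way.
    updated.setdefault("history", []).append(asdict(record))
    return updated, record


model = {
    "wall_length_mm": 3000,
    "components": {"entrance cabinet": {"x_mm": 200, "width_mm": 900}},
}
model, record = move_component(
    model, "Move the cabinet beside the entrance a little.", "entrance cabinet", 300
)
```

The design choice worth noting is that the function returns a new model plus a record instead of editing in place, which keeps the previous state available for comparison.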

A good workbench lets projects pause and resume

Real projects are often interrupted. You work on one version today, switch to another client tomorrow, and return three days later.

If an AI design tool depends only on chat memory, resuming the project becomes painful. You have to explain the background again, upload material again, and remind the tool of the rules again.

A workbench-style tool should let the project pause and resume. When the designer opens the project again, AI can see the current model, rules, material, snapshots, accepted feedback, and unfinished tasks.

This is not about giving AI infinite memory. It is about keeping memory in the place where it belongs: the project.

Over time, the real value of an AI design tool is not making every chat answer prettier. It is turning chat, material, model state, and feedback into a design project that can continue.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/from-chat-advice-to-durable-design-project/

More links: GitHub · YouTube · LinkedIn · Bluesky · Mastodon · Discord

Which Design Rule Wins: Defaults, Personal Preferences, Project Rules, or Session Instructions?

Xu Bian — Thu, 07 May 2026 03:49:12 +0000

The deeper an AI design tool enters real projects, the more rule conflicts it will face.

A designer has personal preferences. A project has explicit requirements. A client gives temporary feedback. The product itself has defaults. Without priority, AI can easily mix these layers together.

My recommended starting order, from lowest to highest priority, is:

product defaults
-> designer profile
-> current project rules
-> explicit instruction in the current session
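
Read as a lookup order, this can be sketched as follows. The layer keys and the `min_passage_mm` rule are hypothetical names chosen for the example.

```python
# Layers listed from lowest to highest priority, following the order above.
PRECEDENCE = [
    "product_defaults",
    "designer_profile",
    "project_rules",
    "session_instruction",
]


def resolve(rule_name: str, layers: dict[str, dict[str, object]]) -> tuple[object, str]:
    """Return the winning value and the layer it came from."""
    for layer in reversed(PRECEDENCE):
        values = layers.get(layer, {})
        if rule_name in values:
            return values[rule_name], layer
    raise KeyError(rule_name)


layers = {
    "product_defaults": {"min_passage_mm": 900},
    "designer_profile": {"min_passage_mm": 1000},
    "project_rules": {"min_passage_mm": 1100},
}
value, source = resolve("min_passage_mm", layers)
```

Here the project rule wins over the designer profile. Adding a `session_instruction` entry would override it for that lookup without changing any of the stored layers.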

Why project rules should beat personal preferences

Personal preferences matter. A designer may have long-term preferences for clearance, material expression, storage strategy, or lighting.

But once the work enters a specific project, project rules should take priority.

The reason is simple: project rules usually come from the client brief, site conditions, budget, construction limits, and specific users. They are not abstract taste. They are the boundaries of the current project.

If a personal profile overrides project rules, AI may force an old habit into a project where it does not fit.

Why current instructions should be highest

An explicit instruction in the current session usually represents a local design decision.

For example, the project may prefer wider circulation, but the designer says: "For this temporary option, allow the passage to become narrower so I can see how much storage improves."

AI can execute that local instruction, but it must know that this is a session instruction. It is not a permanent project rule, and it is not an update to the designer's profile.

Giving the current instruction the highest priority does not mean it can silently rewrite long-term rules.

Every rule should know its source

Priority is important, but provenance is just as important.

AI should be able to explain:

did this rule come from product defaults or the designer profile?
is it a current project rule or a temporary session instruction?
did it override a lower-priority rule?
should it be written back to the project?
does it need confirmation before being saved long term?

Without source explanation, "following rules" becomes a black box.
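
One way to keep that explanation available is to store provenance next to the resolved value. The record below is a sketch under assumed field names, not a description of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvedRule:
    name: str
    value: object
    source: str                                         # layer the winning value came from
    overrides: list[str] = field(default_factory=list)  # lower-priority layers it displaced
    persist: bool = False                               # whether it is written back to the project

    def explain(self) -> str:
        """Answer the provenance questions in one readable line."""
        parts = [f"{self.name}={self.value} from {self.source}"]
        if self.overrides:
            parts.append("overrides " + ", ".join(self.overrides))
        parts.append("saved to project" if self.persist else "not saved (needs confirmation)")
        return "; ".join(parts)


rule = ResolvedRule(
    name="min_passage_mm",
    value=800,
    source="session instruction",
    overrides=["project rules"],
)
```

`rule.explain()` then states where the value came from, what it displaced, and that it has not been saved long term.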

Do not turn project exceptions into global habits

Design projects often contain one-off exceptions.

One client prefers a special material. One floor plan has unusual circulation. One source drawing needs a special interpretation. These can be recorded, but they should not automatically become global rules.

A good AI design workbench should separate:

product defaults: suitable for all users;
designer profile: long-term habits for one designer;
project rules: valid only for the current project;
session instructions: valid only for the current operation.

Only with this separation can AI avoid polluting future projects with accidental decisions from one project.
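
A small guard illustrates the separation: a rule may be written within its own scope, but widening it to a broader scope requires explicit confirmation. The scope names and the confirmation flag are assumptions for this sketch.

```python
# Scopes ordered from narrowest to broadest.
SCOPES = ["session", "project", "designer_profile", "product_defaults"]


def can_write(rule_scope: str, target_scope: str, confirmed: bool = False) -> bool:
    """Allow writes that do not widen the rule's scope; widening needs confirmation."""
    widening = SCOPES.index(target_scope) > SCOPES.index(rule_scope)
    return confirmed or not widening
```

For example, `can_write("project", "designer_profile")` is false until someone confirms the promotion, so a one-off client preference cannot drift into a long-term habit unnoticed.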

Originally published on my personal site:
https://marlinbian-site.pages.dev/notes/design-rule-precedence-for-ai-designers/


How Visual Feedback Becomes Structured Design Changes

Xu Bian — Thu, 07 May 2026 03:48:34 +0000

Designers read images quickly.

A top view, screenshot, or rendering can reveal problems faster than text: the proportions are wrong, the entrance is too narrow, the furniture is crowded, the wall direction is strange, or the lighting emphasis is off.

That makes visual feedback an important part of an AI design workbench.

But visual feedback can also become a new source of disorder. If it stops at "this looks wrong," or if AI only patches the current picture, the project quickly fills with screenshots, opinions, and one-off fixes.

The key question is: how does visual feedback return to the structured design project?

The image is not the final truth

Screenshots and renderings are useful, but they are not the only source of truth for a design project.

An image can show what appears to be happening, but it usually cannot answer:

does this problem come from spatial dimensions or camera perspective?
is the model wrong, or is the rendering expression wrong?
should the material change, or should the lighting change?
was the source drawing interpreted incorrectly, or was placement wrong later?
should this feedback affect project-wide rules or only this version?

If AI directly edits the image, the short-term output may look better, but the reason is lost.

A reliable workbench should treat images as review artifacts, not final truth.

Feedback needs classification

When a designer says "this is wrong," AI should not immediately make a blind edit. It should first classify the feedback.

Common categories include:

model structure issue: a space, wall, opening, furniture item, or dimension needs a change;
source evidence issue: an original drawing, scan, photo, or annotation was interpreted incorrectly;
design rule issue: clearance, proportion, circulation, lighting, style, or preference rules need adjustment;
presentation issue: camera angle, lighting, material, or rendering settings created the confusion;
project-local memory: this client or site has a specific preference that should not become a global rule.

After classification, AI can decide where the change belongs.

For example, "the dining table is too close to the circulation path" is not just an aesthetic comment. It may mean the table component should move, the clearance rule should be checked, or the current camera angle exaggerates crowding.

Those are three different actions.
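
The categories above can be written down as data, so that each accepted comment is routed to a specific place. The write targets below are hypothetical names. The classification itself still comes from the designer or the AI, not from this code.

```python
from enum import Enum


class FeedbackKind(Enum):
    MODEL_STRUCTURE = "model structure issue"
    SOURCE_EVIDENCE = "source evidence issue"
    DESIGN_RULE = "design rule issue"
    PRESENTATION = "presentation issue"
    PROJECT_LOCAL_MEMORY = "project-local memory"


# Where each kind of accepted feedback is written; the target names are assumptions.
WRITE_TARGET = {
    FeedbackKind.MODEL_STRUCTURE: "structured model",
    FeedbackKind.SOURCE_EVIDENCE: "source interpretation",
    FeedbackKind.DESIGN_RULE: "design rules",
    FeedbackKind.PRESENTATION: "render settings",
    FeedbackKind.PROJECT_LOCAL_MEMORY: "project memory",
}


def candidate_actions(kinds: list[FeedbackKind]) -> list[str]:
    """List one candidate action per possible classification of a comment."""
    return [f"{kind.value} -> update {WRITE_TARGET[kind]}" for kind in kinds]


# The dining table comment could belong to any of these three kinds.
actions = candidate_actions(
    [FeedbackKind.MODEL_STRUCTURE, FeedbackKind.DESIGN_RULE, FeedbackKind.PRESENTATION]
)
```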

Accepted feedback should become structured action

Visual feedback should enter the project through a conversion process.

First, record the feedback: which image, which version, who said it, and what the issue is.

Second, locate the object: which space, wall, opening, component, material, light, or rule is affected.

Third, decide the action: modify the model, update source interpretation, adjust a rule, rerender, or record a project preference.

Fourth, execute and leave a diff. AI changes the structured model or related files rather than only producing a new image.

Fifth, generate a new review artifact so the designer can confirm whether the problem was solved.

This loop may look slower, but it makes the project repairable, reviewable, and continuous.
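
A compact sketch of the loop, assuming a flat dictionary as the structured model and hypothetical field names, could look like this:

```python
from dataclasses import dataclass


@dataclass
class FeedbackItem:
    image: str    # step 1: which image
    version: str  # step 1: which version
    author: str   # step 1: who said it
    issue: str    # step 1: what the issue is
    target: str   # step 2: affected object, as a key in the model
    action: str   # step 3: the chosen action


def apply_feedback(model: dict, item: FeedbackItem, new_value) -> dict:
    """Step 4: change the structured model in place and return a diff entry."""
    old_value = model.get(item.target)
    model[item.target] = new_value
    return {
        "feedback": item.issue,
        "target": item.target,
        "old": old_value,
        "new": new_value,
        "needs_review_artifact": True,  # step 5: a new view is still to be generated
    }


model = {"dining_table.offset_from_path_mm": 600}
item = FeedbackItem(
    image="top_view_v3.png",
    version="v3",
    author="designer",
    issue="dining table is too close to the circulation path",
    target="dining_table.offset_from_path_mm",
    action="modify the model",
)
diff = apply_feedback(model, item, 900)
```

The returned entry links the comment, the object, and the before and after values, which is what makes the change reviewable later.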

Top views are good at exposing structural errors

In spatial design and modeling, top views have a special value: they reveal structural errors more easily than perspective views.

Perspective views can hide problems. Top views expose broken walls, wrong opening directions, unclosed room boundaries, reversed balcony orientation, or furniture intruding into circulation.

But a top view is still only a review artifact. It finds problems; it does not own the truth.

The repair should still return to the structured model, source evidence, or design rules.

If a screenshot reveals that "the door is in the wrong position," AI should ask or infer: was the doorway evidence interpreted incorrectly, was the door component placed incorrectly, or did the rule change?

That is how visual feedback enters the workbench.

Do not let screenshots become another junk drawer

Many AI workflows produce lots of images. Each image can generate opinions, and each opinion can generate more images.

Without structured write-back, the project moves from a chat junk drawer to a screenshot junk drawer.

A good AI design workbench should control this risk:

each important screenshot has provenance, version, and purpose;
each important feedback item can be traced to an object and an action;
accepted feedback is written back to project state;
rejected feedback can still explain why it was not executed;
a preference scoped to one project does not become a global product rule.
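
The first item in this list is easy to check mechanically. As a sketch, assuming screenshots are tracked as simple records with hypothetical field names:

```python
REQUIRED_FIELDS = ("provenance", "version", "purpose")


def untraceable(screenshots: list[dict]) -> list[str]:
    """Return file names of screenshots missing any required field."""
    return [
        shot.get("file", "<unnamed>")
        for shot in screenshots
        if any(not shot.get(name) for name in REQUIRED_FIELDS)
    ]


shots = [
    {"file": "top_view_v3.png", "provenance": "model export", "version": "v3",
     "purpose": "check entrance width"},
    {"file": "render_07.png", "version": "v3"},
]
```

Here `untraceable(shots)` flags only `render_07.png`, which has a version but no recorded provenance or purpose.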

The value of visual feedback is not that AI produces more images. It is that designers can find problems faster and turn them into executable, verifiable design changes.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/visual-feedback-to-structured-changes/


A Designer's Core Value Is Judgment, Not Faster Drafting

Xu Bian — Thu, 07 May 2026 03:46:57 +0000

When AI enters design work, the conversation often narrows to one question: who can draw faster?

If AI can produce images faster, build models faster, and revise options faster, does that reduce the value of a designer?

The question sounds sharp, but it defines design work too narrowly. Drawing and modeling matter, but they are only part of design expression and execution. In real projects, the scarce ability is not only drawing the line. It is knowing what is worth drawing, why it should be drawn that way, and when the project should stop drawing in the wrong direction.

Speed is not the only variable

Faster execution changes the industry, but speed does not automatically create good design.

AI can produce ten options for a space quickly, but without judgment those options are just ten forms of visual noise. A model can be filled with furniture, lighting, and materials quickly, but without tradeoffs it may only express the wrong direction more completely.

AI is good at expanding possibilities. It can turn one idea into several versions, batch repetitive actions, and organize feedback into tasks.

But design projects also need convergence. A designer decides which possibilities should remain, which should be deleted, which are beautiful but wrong for the project, and which look conservative but fit the goal better.

That is the core division of labor: AI increases options and executes actions; the designer judges direction and accepts results.

Judgment happens at many levels

Design judgment is not only "does this look good?"

It happens across multiple layers:

Goal judgment: what problem is this space trying to solve?
Constraint judgment: which budget, size, circulation, lighting, construction, or user habit cannot be broken?
Priority judgment: what comes first among beauty, storage, comfort, cost, buildability, and maintainability?
Feedback judgment: when a client says "I don't like it," is the problem style, proportion, material, or presentation?
Version judgment: what actually improved compared with the previous version?

These judgments are not replaced by one generated image.

AI can participate in the judgment process. It can list differences, expose conflicts, remind the designer of missing constraints, and simulate alternatives. But accepting a direction and taking responsibility for a tradeoff still belongs to the designer.

AI should reduce low-level execution, not absorb judgment

If an AI tool packages all value as "automatic generation," the designer is left standing beside the machine, judging whatever it gives back. That is shallow collaboration.

A better collaboration model lets AI handle low-level execution while the designer keeps high-level judgment.

For example:

the designer states the goal, and AI turns it into a clearer design brief;
the designer provides a plan or source material, and AI creates an editable working model;
the designer says "this circulation feels wrong," and AI maps the issue to a space, opening, furniture item, or rule;
the designer accepts feedback, and AI writes it as a structured change rather than only editing an image;
the designer asks to compare versions, and AI summarizes the differences and risks.

In that mode, AI does not replace the designer. It turns design judgment into actions that can be executed, recorded, and reviewed.

Designers should not become prompt operators

Another common mistake is imagining future designers mainly as people who know how to write prompts.

Prompts are useful, but prompt writing should not become the new core job. Otherwise designers only move from CAD operators to AI instruction operators.

A valuable AI design workbench should reduce the translation cost between the designer and the tool.

Designers should be able to speak in normal design language: this space feels compressed, this circulation is unclear, this cabinet should not dominate the visual center, this corridor needs safer width for an elderly user.

The workbench should connect that language to project facts: which space, which component, which rule, which dimension, which version, which source drawing.

When natural language can reliably land on project facts, designers do not need to keep learning machine-friendly spells.

The designer's value moves earlier

AI will make some execution skills cheaper. That is not a problem by itself.

If repetitive drafting, modeling, organizing, and small revisions are absorbed by tools, the designer's value moves earlier in the workflow: defining the problem, organizing information, creating rules, filtering options, explaining tradeoffs, and turning feedback into next actions.

This also means designers cannot rely only on "I know how to operate the software." Software operation still matters, but it increasingly becomes one expression of design judgment rather than the whole value of the designer.

In the AI era, the strongest designers will be able to say clearly:

why this direction is worth pursuing;
which constraints have been accepted;
which possibilities were rejected;
why this version is better than the previous one;
what AI can execute, and what must return to the designer for confirmation.

That is the ability an AI design workbench should amplify.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/designer-judgment-not-drafting/


AI Workbench for Designers: How Designers Should Work With AI

Xu Bian — Thu, 07 May 2026 03:46:54 +0000

When people talk about AI design tools, the default question is often: can AI draw for me? Can it make renderings? Can it turn one sentence into a model?

Those questions matter, but they are not the whole workflow a designer needs over time.

Designers do not only need a faster drawing button. They need a workbench that can hold design judgment across a project: where the intent lives, where source material lives, who maintains the rules, how the model state changes, how visual feedback returns to the design, and why each revision happened.

That is why I prefer to think in terms of an AI workbench for designers, not a chatbot, image generator, or automatic drafting plugin.

A workbench is different from a chat tool

Chat tools are good at answering questions. You can ask for color ideas, style references, spatial suggestions, circulation advice, or material options. They can provide inspiration and help organize language.

But a real design project does not stop at one answer.

A project changes over time. A client revises the brief. Site conditions shift. The budget tightens. A designer rejects yesterday's decision. The model becomes inconsistent. A screenshot reveals a new problem.

If AI is only a chat interface, it struggles to answer questions like these:

What design rules are currently active?
Which dimensions came from source material, and which ones are inferred?
Which space did the last client comment actually change?
Should this rendering issue change the model, material, lighting, or just presentation?
What changed between the current version and the previous one?

The value of a workbench is that it does not treat conversation as the only container. It turns conversation into project state that can be saved, checked, and repaired.
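
Once state is saved as data, the last question in that list, what changed between versions, becomes a mechanical comparison. A minimal sketch, assuming each version is a flat snapshot with hypothetical keys:

```python
def diff_versions(previous: dict, current: dict) -> dict:
    """Compare two flat model snapshots and report added, removed, and changed keys."""
    added = {k: current[k] for k in current.keys() - previous.keys()}
    removed = {k: previous[k] for k in previous.keys() - current.keys()}
    changed = {
        k: (previous[k], current[k])
        for k in previous.keys() & current.keys()
        if previous[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


v1 = {"entrance.width_mm": 900, "cabinet.x_mm": 200}
v2 = {"entrance.width_mm": 1000, "cabinet.x_mm": 200, "bench.x_mm": 1500}
changes = diff_versions(v1, v2)
```

In this example the diff reports a new bench and a wider entrance, and leaves the unchanged cabinet out. A chat transcript alone cannot produce that answer reliably.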

The designer remains the judge

The most dangerous misunderstanding is that AI turns designers into people who merely give commands to a machine.

The better division of labor is the opposite.

AI can help with lower-level execution: organizing material, creating a first working model, trying variations, checking rules, preparing screenshots, recording versions, and translating feedback into concrete actions.

But the designer still owns the important judgment:

what the goal is;
which constraints cannot be broken;
what counts as a good proposal, not just a busy image;
which feedback should be accepted and which feedback is temporary noise;
which rules should be updated and which ones are only local exceptions.

A good AI design workbench should not turn the designer into a prompt operator. It should reduce repetitive drafting, searching, organizing, and repair work so the designer can focus on intent, tradeoffs, and acceptance.

A workbench needs at least six layers

From the outside, an AI design workbench may look like a natural-language request: "Make this living room better for reading."

Behind that request, at least six layers should be active.

The first layer is design intent. The AI needs to understand the problem you are trying to solve, not only the words you typed.

The second layer is project material. Floor plans, references, components, materials, constraints, client feedback, and site information need provenance.

The third layer is a structured model. A design cannot live only in screenshots and chat history. Spaces, dimensions, components, rules, assumptions, and versions need an editable fact layer.

The fourth layer is design rules. Personal preferences, project requirements, session instructions, and product defaults may all exist at the same time. Their priority has to be explicit.

The fifth layer is professional software execution. AI should not only give advice. It should be able to execute structured design state into tools such as SketchUp.

The sixth layer is visual review and repair. Screenshots, renderings, and top views can reveal problems, but they should not become the final source of truth. Accepted feedback should return to the structured model, rules, or source evidence.

When these layers work together, a design project is no longer a loose pile of chat suggestions and images.

SketchUp Agent Harness is one example

The core idea behind SketchUp Agent Harness is that natural-language control of SketchUp is only the entry point. The more important question is how to turn a design project into a durable workflow.

In that design, Codex and Claude are not the entire product. They are entry points into the workbench. The important pieces are the project workspace, structured design model, runtime skills, SketchUp bridge, component information, import evidence, snapshot records, and repair loop.

This does not mean the product is already mature enough for every design scenario. Many parts still need iteration: better floor-plan import, more reliable component libraries, richer design rules, and a stronger visual feedback loop.

But the direction is clear: an AI design tool should not only chase "type one sentence and get a result." It should help designers maintain a design project that remains editable, traceable, and repairable over time.

What this series covers

This series is not a low-level protocol guide, and it is not a developer architecture document. It is about how designers can collaborate with AI.

The next pieces will discuss:

why a designer's core value is judgment and tradeoff, not faster drafting;
why chat advice should become a durable design project;
how visual feedback can move from "this looks wrong" into a structured change;
which rule should win when personal preferences, project constraints, and current instructions conflict.

My position is simple: AI should not take over the designer's judgment. It should become the designer's workbench, organizing intent, material, rules, model state, feedback, and repair so real projects can move forward more reliably.

Originally published on my personal site:
https://marlinbian-site.pages.dev/writing/ai-designers-workbench/
