Paulo Victor Leite Lima Gomes

Posted on Jun 4

agent canvases are the end of chat-only coding tools

#ai #agents #githubcopilot #developertools

GitHub added something to the Copilot app this week that sounds like a UI feature and feels like a category shift.

Canvases.

The obvious read is that GitHub is making the Copilot app nicer. More visual. More desktop-app-like. Less like typing into a narrow chat box and hoping the agent understood the assignment.

I think that undersells it.

The interesting part is not that the app has a prettier place to show work. The interesting part is that agent work is outgrowing the transcript. Once an agent can plan, edit, browse, run a terminal, open a pull request, respond to review comments, and continue across sessions, chat becomes a very weak control surface.

Chat is good for intent.

It is bad for state.

chat was a good starting point

Chat was the right first interface for AI coding tools.

It is familiar. It is forgiving. You can be vague, then get more specific. You can ask "what do you think?" before asking for a diff. You can paste an error and let the model do the first pass at reading the stack trace.

For simple work, that is enough.

"Explain this function."

"Write a test for this edge case."

"Change this component to use the new prop."

The chat transcript is not elegant, but it works because the task is small. The human can hold the goal, the context, and the result in their head.

Long-running agent work is different.

When an agent spends thirty minutes across a branch, a browser, a terminal, a checklist, and a pull request, the transcript becomes a pile of archaeology. Somewhere in there is the plan. Somewhere else is the reason it changed direction. Somewhere else is the test failure. Somewhere else is the final diff. You can scroll. You can search. You can ask the agent to summarize itself, which is both useful and slightly absurd.

That is not a serious review surface.

work needs an object

The GitHub changelog describes canvases as bidirectional work surfaces where people and agents inspect, edit, approve, and redirect work. The examples are telling: plans, pull requests, browser sessions, terminals, release checklists, migration boards, incidents, dashboards, workflow state.

That list matters because it points at the real problem.

Agentic software work does not produce only text. It produces objects:

a plan with steps and dependencies
a diff with intent and risk
a browser session with evidence
a terminal run with commands and failures
a checklist with ownership
a pull request with review state
a deployment or migration with progress

If those objects only exist as messages in a chat, the human has to reconstruct the work every time they want to make a decision.

That is backwards.

The interface should expose the object directly. The conversation should steer the work, but the work should live somewhere inspectable.
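As a rough sketch of what "the work lives somewhere inspectable" could mean, here is a plan modeled as data instead of as messages. Every name here (`Plan`, `Step`, `ready_steps`) is hypothetical and illustrative; it is not GitHub's canvas API, only one way to show that a structured plan can answer "what can happen next?" without anyone rereading a transcript.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DONE = "done"


@dataclass
class Step:
    id: str
    title: str
    depends_on: list[str] = field(default_factory=list)
    status: Status = Status.PENDING


@dataclass
class Plan:
    steps: list[Step]

    def ready_steps(self) -> list[Step]:
        """Return pending steps whose dependencies are all done."""
        done = {s.id for s in self.steps if s.status is Status.DONE}
        return [
            s for s in self.steps
            if s.status is Status.PENDING
            and all(dep in done for dep in s.depends_on)
        ]


# Hypothetical migration plan, used only for illustration.
plan = Plan(steps=[
    Step("schema", "Add the new column", status=Status.DONE),
    Step("backfill", "Backfill existing rows", depends_on=["schema"]),
    Step("cleanup", "Drop the old column", depends_on=["backfill"]),
])

print([s.id for s in plan.ready_steps()])  # ['backfill']
```

The human and the agent read the same object. Nobody has to ask for a summary to learn that the backfill is next and the cleanup is still waiting on it.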

This is the same lesson every engineering system eventually learns. Issue trackers beat hallway conversations because the issue becomes a durable object. Pull requests beat emailing patches because the change gets a review surface. CI dashboards beat "it passed on my machine" because the result is attached to a run.

Agents need the same move.

agent experience is not just developer experience with a new label

GitHub is using the phrase "agent experience" for this, and I like it more than I expected.

Developer experience is usually about helping humans move through tools with less friction: editors, CLIs, docs, APIs, tests, deployment workflows, observability. Good DX makes the human less likely to lose context or do the wrong thing accidentally.

Agent experience adds another participant.

Now the workflow has to be usable by a human and legible to an agent. The agent needs structured state it can read. The human needs visible state they can trust. The app needs to enforce which actions are allowed. The system needs to remember enough history that a future reviewer can understand what happened without treating the chat log as the source of truth.
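One minimal way to picture "enforce which actions are allowed" and "remember enough history" together is a policy check that records every attempt. The action names, the `ALLOWED` table, and the `AuditLog` class are assumptions made up for this sketch, not anything the Copilot app is documented to do.

```python
from dataclasses import dataclass, field

# Hypothetical policy: which actions each kind of actor may take.
ALLOWED = {
    "agent": {"edit_file", "run_tests", "open_pull_request"},
    "human": {"edit_file", "run_tests", "open_pull_request", "merge", "deploy"},
}


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def attempt(self, actor: str, action: str) -> bool:
        """Record the attempt, then return whether the policy allows it."""
        allowed = action in ALLOWED.get(actor, set())
        self.entries.append(
            {"actor": actor, "action": action, "allowed": allowed}
        )
        return allowed


log = AuditLog()
log.attempt("agent", "run_tests")  # allowed by the policy above
log.attempt("agent", "merge")      # denied: only humans may merge here

# A later reviewer can see what was refused without reading any chat.
denied = [e for e in log.entries if not e["allowed"]]
print(denied)
```

The point of the design is that the boundary and the history live in the system, so neither depends on what the agent says it did.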

That is not cosmetic.

It changes how we should design internal engineering workflows.

If a migration board is agent-operable, the board cannot be a vague spreadsheet with tribal rules in comments. If a release checklist is agent-operable, the checklist needs clear states, owners, evidence, and stop conditions. If a pull request is agent-operable, the agent needs to know which review comments are instructions, which are discussions, and which require human judgment before it continues.
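To make "clear states, owners, evidence, and stop conditions" concrete, here is a small sketch of a release checklist an agent could walk. The field names and the rules in `next_for_agent` are assumptions chosen for illustration; a real team would pick its own states and stop conditions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChecklistItem:
    name: str
    owner: str                          # "agent" or a person/role name
    state: str = "todo"                 # "todo", "done", or "blocked"
    evidence: Optional[str] = None      # e.g. a reference to a CI run
    needs_human_approval: bool = False  # explicit stop condition


def next_for_agent(items: list[ChecklistItem]) -> Optional[ChecklistItem]:
    """Return the first item the agent may act on, or None if it must stop."""
    for item in items:
        if item.state == "done":
            if item.evidence is None:
                return None  # "done" without evidence is not trusted
            continue
        if (
            item.state == "blocked"
            or item.needs_human_approval
            or item.owner != "agent"
        ):
            return None  # stop and hand control back to a human
        return item
    return None  # nothing left to do


# Hypothetical checklist, used only for illustration.
items = [
    ChecklistItem("run test suite", "agent", "done", evidence="ci-run-placeholder"),
    ChecklistItem("bump version", "agent"),
    ChecklistItem("approve production deploy", "release-manager",
                  needs_human_approval=True),
]

print(next_for_agent(items).name)  # bump version
```

Once the version bump is marked done with evidence, the same function returns `None`, because the next item belongs to a human and carries a stop condition. The tribal rule becomes a field the agent can read.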

The UI is only the visible part.

The deeper work is making engineering process structured enough that both humans and machines can participate without guessing.

transcripts hide too much

The transcript has one big virtue: it preserves the conversation.

That is also its problem.

Conversations are messy. They include false starts, corrections, half-formed ideas, jokes, clarifications, and decisions that only make sense because both participants remember what happened five minutes ago. Humans are decent at that in the moment. We are worse at it later. Agents are riskier still, because they can turn a messy history into a confident summary that sounds cleaner than the actual work was.

For agentic coding, I do not want the final state to depend on vibes from a transcript.

I want the plan to show current status.

I want the diff to show what changed and why.

I want the terminal surface to show which commands ran and which failed.

I want the browser surface to show what was verified.

I want the PR surface to show what remains unresolved.

The chat can remain useful for steering. But review needs evidence attached to the work object, not buried in the conversation that produced it.
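A sketch of what "evidence attached to the work object" might look like for the terminal case: each command run is a record, and the review question is answered by querying records, not by scrolling. `CommandRun`, its fields, and the sample commands are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandRun:
    command: str
    exit_code: int
    environment: str  # hypothetical label such as "local" or "ci"


def failed_runs(runs: list[CommandRun]) -> list[CommandRun]:
    """Return the runs that exited with a non-zero status."""
    return [r for r in runs if r.exit_code != 0]


# Illustrative records; a real surface would capture these as commands run.
runs = [
    CommandRun("run unit tests", 0, "local"),
    CommandRun("run integration tests", 1, "ci"),
    CommandRun("run linter", 0, "ci"),
]

for run in failed_runs(runs):
    print(f"{run.command} failed in {run.environment} (exit {run.exit_code})")
# prints: run integration tests failed in ci (exit 1)
```

A confident summary cannot hide the failing run here, because the failure is a fact on the record and not a sentence in the prose.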

This matters more as agents get better. A bad agent is easy to distrust. A good agent produces polished, plausible work. The interface has to make the uncertainty visible even when the prose sounds confident.

what i would look for

If I were evaluating agent coding tools for a team, I would spend less time asking whether the chat feels magical and more time looking at the work surfaces.

Can I inspect the current plan without asking the agent to summarize itself?

Can I edit the plan directly?

Can I see which files are in scope and which files are not?

Can I tell which commands ran, and in which environment?

Can I tell why the agent changed direction?

Can the final pull request preserve enough evidence for review?

Those questions are boring in the best possible way. They are the questions that show whether an AI coding tool is becoming a production tool or staying a demo interface.

The magic text box is not enough.

It never was.

It was just enough to get us started.

the punchline

Agent canvases are interesting because they admit something the industry has been circling for a while: chat is not the final interface for software agents.

Chat is a good place to express intent. It is a poor place to manage durable, inspectable, reviewable work.

Software engineering already has objects for serious collaboration: issues, branches, pull requests, CI runs, dashboards, incidents, release plans, and deployment records. Agents do not remove the need for those objects. They make them more important, because more work can now happen between human decisions.

So yes, the Copilot app getting canvases is a product update.

But the deeper signal is architectural.

Agent work needs state outside the transcript. It needs surfaces where humans can inspect and redirect, where agents can read and update structured intent, and where the system can enforce boundaries.

The future of coding tools is not everyone chatting harder.

It is workbenches where conversation, state, evidence, and review finally live in the same place.
