Steve Gonzalez

I Replaced My AI Chat Interface With a Terminal Shell

Most AI tools give you a chat window. You type, the model responds, you copy what you need and paste it somewhere else. The conversation and the artifact live in different places.

I wanted the artifact to appear next to the conversation, stream in as it was generated, and stay there — editable, persistent, tab-switchable — without ever leaving the terminal.

So I built CAS: Conversational Agent Shell.

What it looks like

┌─ chat ──────────────────────┐ ┌─ [l] Todo List For Easter ──────────────┐
│                             │ │                                          │
│ you › make a todo list for  │ │  Easter Todo List                        │
│       easter                │ │                                          │
│                             │ │  ## 🗓 Planning & Budget                 │
│ cas › Created list          │ │                                          │
│       workspace "Todo List  │ │  [ ] Set date and time for Easter Sunday │
│       For Easter".          │ │  [ ] Confirm guest list and RSVPs        │
│       Edit directly or ask  │ │  [ ] Create budget for food              │
│       me to make changes.   │ │  [ ] Check family availability           │
│                             │ │  [ ] Book reservations                   │
│ > █                         │ │                                          │
└─────────────────────────────┘ └──────────────────────────────────────────┘
  ↑↓ scroll  │  enter send  │  tab workspace  │  ctrl+n new session

Left panel: conversation. Right panel: the workspace, streaming in as the model generates it. You stay in the terminal the whole time.

The idea

There's a debate in HCI that goes back to 1997.

Ben Shneiderman argued that direct manipulation gives users control that delegation never can. Pattie Maes argued that agents offload cognitive work that direct manipulation can't scale to. Both were right. They were arguing about the wrong dichotomy.

CAS resolves it architecturally: agents generate, users manipulate.

You describe what you want. The agent produces it. Once it exists, you own it — you edit it directly, you scroll it, you tab between workspaces, you undo changes. The agent is a producer. You are the controller.

How messages flow

Every message passes through a lightweight routing layer before any model is called.

Intent detection is pure regex — sub-millisecond, deterministic. The routing decision fires before the LLM even knows a message arrived.

"write a project proposal"           → create workspace (document)
"make a todo list"                   → create workspace (list)
"create a python script"             → create workspace (code)
"add a conclusion section"           → edit active workspace
"run it"                             → execute code workspace
"combine the proposal and checklist" → merge workspaces
"standup"                            → run Lua plugin
"how long should this be?"           → chat reply

Plugins are checked first. Then close, run, combine, edit, create — in that priority order. Self-edit phrases like "I'll fix it myself" are caught before the edit patterns fire. The ordering matters.
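
A minimal sketch of that priority-ordered dispatch — the pattern names and regexes here are illustrative, not CAS's actual tables:

```go
package main

import (
	"fmt"
	"regexp"
)

// route pairs an intent name with its detection pattern.
type route struct {
	name string
	re   *regexp.Regexp
}

// Checked top to bottom; the first match wins. The self-edit
// pattern sits above the generic edit verbs on purpose.
var routes = []route{
	{"self_edit", regexp.MustCompile(`(?i)\bI'?ll (fix|edit|change) it myself\b`)},
	{"close", regexp.MustCompile(`(?i)^close\b`)},
	{"run", regexp.MustCompile(`(?i)^run (it|the)\b`)},
	{"combine", regexp.MustCompile(`(?i)\b(combine|merge)\b`)},
	{"edit", regexp.MustCompile(`(?i)\b(add|update|fix|change)\b`)},
	{"create", regexp.MustCompile(`(?i)\b(write|make|create)\b`)},
}

// detect returns the first matching intent, falling back to chat.
func detect(msg string) string {
	for _, r := range routes {
		if r.re.MatchString(msg) {
			return r.name
		}
	}
	return "chat"
}

func main() {
	fmt.Println(detect("I'll fix it myself"))                 // self_edit, not edit
	fmt.Println(detect("combine the proposal and checklist")) // combine
	fmt.Println(detect("how long should this be?"))           // chat
}
```

Because the slice is scanned top to bottom, placing the self-edit pattern above the edit verbs is exactly what keeps "I'll fix it myself" out of the edit path.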

Deterministic contracts

Every workspace operation passes through a contract layer:

contract.CheckPreconditions()   // is this operation permitted?
contract.CheckInvariants()      // are all invariants satisfied?
contract.CheckPostconditions()  // did the output meet requirements?

These run in Go, not in the model. The model cannot modify, bypass, or reason about them. Any violation fails the operation closed. Based on Bertrand Meyer's Design by Contract (1986) — a 40-year-old idea that turns out to be exactly right for agentic systems.

Code execution

Say run it with an active code workspace. CAS detects the language from content (bash, Python, Go, JavaScript, Ruby), writes to a temp file, and executes in a sandboxed subprocess.

you › create a python script to compute fibonacci
     → [c] tab opens, tokens stream in

you › run it
     → ran python (23ms, exit 0)
       1, 1, 2, 3, 5, 8, 13, 21, 34, 55

Process group isolation, restricted environment (only PATH inherited), 30-second timeout that kills the entire tree. No LLM call — intent detection routes directly to the runner.
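
The same isolation recipe can be sketched with the standard library, assuming a POSIX system; CAS's actual runner surely differs in detail:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// runSandboxed writes the script to a temp file and executes it in its
// own process group with a restricted environment and a hard timeout.
func runSandboxed(script string) (string, error) {
	tmp, err := os.CreateTemp("", "cas-*.sh")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString(script); err != nil {
		return "", err
	}
	tmp.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", tmp.Name())
	cmd.Env = []string{"PATH=" + os.Getenv("PATH")}       // only PATH inherited
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true} // own process group
	cmd.Cancel = func() error {
		// On timeout, signal the whole group so grandchildren die too.
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}
	out, err := cmd.CombinedOutput()
	return string(out), err
}

func main() {
	out, err := runSandboxed("echo hello from the sandbox")
	fmt.Println(out, err)
}
```

`Setpgid` puts the child in its own process group, and killing the negative PID signals that entire group — which is what makes the "kills the entire tree" guarantee possible.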

Cross-workspace operations

With multiple tabs open, CAS resolves which workspace you're addressing by fuzzy-matching title fragments:

"update the proposal"                → targets "Project Proposal"
"add the script code to the report"  → edits Report with Script as LLM context
"combine the proposal and checklist" → new workspace from both sources
"merge all workspaces"               → synthesizes everything into one

Edits that reference another workspace by name include that workspace's content in the LLM prompt automatically.
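
A naive version of that title-fragment resolution, scored by simple word overlap — CAS's matcher may weigh fragments differently:

```go
package main

import (
	"fmt"
	"strings"
)

// resolve picks the open workspace whose title shares the most
// words with the message; illustrative, not CAS's exact scoring.
func resolve(msg string, titles []string) string {
	msgWords := strings.Fields(strings.ToLower(msg))
	best, bestScore := "", 0
	for _, title := range titles {
		score := 0
		for _, tw := range strings.Fields(strings.ToLower(title)) {
			for _, mw := range msgWords {
				if tw == mw {
					score++
				}
			}
		}
		if score > bestScore {
			best, bestScore = title, score
		}
	}
	return best // empty string means no workspace matched
}

func main() {
	titles := []string{"Project Proposal", "Easter Checklist", "Fibonacci Script"}
	fmt.Println(resolve("update the proposal", titles)) // Project Proposal
}
```

Returning the empty string on zero overlap gives the caller a clean signal to fall back to the active workspace instead of guessing.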

Lua plugins

Drop .lua files in ~/.cas/plugins/ to add custom commands without recompiling:

-- ~/.cas/plugins/standup.lua
cas.command("standup", "Daily standup", function()
    local ws = cas.workspaces()
    local lines = {}
    for i, w in ipairs(ws) do
        lines[i] = "- " .. w.title .. " (" .. w.type .. ")"
    end
    cas.reply(table.concat(lines, "\n"))
end)

Type standup and the plugin runs — no LLM call, sub-millisecond. The Lua VM is sandboxed: no file I/O, no os.execute, no network. API: cas.command(), cas.reply(), cas.workspaces(), cas.active().

Multi-provider

# Ollama — local, private, no API key
./cas

# Anthropic — cloud, no GPU required
export CAS_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
./cas

Documents and lists route to qwen3.5:9b locally or Sonnet on Anthropic. Code routes to qwen2.5-coder:7b locally or Haiku. All overridable via env vars.

The stack

Single static Go binary. No runtime, no server, no browser. SSH to a remote machine and run it.

internal/
├── intent/      Regex intent detection — 7 intent kinds
├── contract/    Design by Contract enforcement
├── workspace/   Lifecycle: create, update, undo, close
├── shell/       Session manager + workspace resolver
├── llm/         Ollama + Anthropic streaming
├── runner/      Code execution — sandboxed subprocess
├── plugin/      Lua plugin runtime (gopher-lua)
├── store/       SQLite (WAL) + in-memory store
└── conductor/   Behavioral learning
ui/              Bubble Tea TUI: split panel, tabs, streaming

245 tests across all packages. 8 TUI integration tests that spawn the real binary in tmux and interact with it as a user would.

Quick start

# Requires Go 1.25+
git clone https://github.com/goweft/cas.git
cd cas
go build -o cas ./cmd/cas

# Local inference
ollama pull qwen3.5:9b && ollama pull qwen2.5-coder:7b
./cas

# Or cloud
export CAS_PROVIDER=anthropic
export ANTHROPIC_API_KEY=your-key
./cas

Why a terminal

It's already where the work happens. It composes with existing tools — export to markdown, pipe to pandoc, commit to git. And it works over SSH: run CAS on a machine with a GPU, access it from a laptop without one.


Source: goweft/cas — Apache 2.0
