<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vyacheslav Mayorskiy</title>
    <description>The latest articles on DEV Community by Vyacheslav Mayorskiy (@vmayorskiyac).</description>
    <link>https://dev.to/vmayorskiyac</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3619098%2F1cdca48f-cdfe-4f75-9c2c-a0ecfb244204.png</url>
      <title>DEV Community: Vyacheslav Mayorskiy</title>
      <link>https://dev.to/vmayorskiyac</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vmayorskiyac"/>
    <language>en</language>
    <item>
      <title>Databricks Integration Deep Dive: Teaching Aye Chat to Talk to Your Endpoint</title>
      <dc:creator>Vyacheslav Mayorskiy</dc:creator>
      <pubDate>Tue, 03 Mar 2026 16:49:00 +0000</pubDate>
      <link>https://dev.to/vmayorskiyac/databricks-integration-deep-dive-teaching-aye-chat-to-talk-to-your-endpoint-4ke8</link>
      <guid>https://dev.to/vmayorskiyac/databricks-integration-deep-dive-teaching-aye-chat-to-talk-to-your-endpoint-4ke8</guid>
      <description>&lt;p&gt;Databricks has a nice party trick: it can host an LLM endpoint that &lt;em&gt;looks&lt;/em&gt; like an OpenAI chat-completions API.&lt;/p&gt;

&lt;p&gt;Aye Chat has a different party trick: it can treat LLMs like a backend detail, as long as it gets a structured JSON response it can render and (optionally) apply to files.&lt;/p&gt;

&lt;p&gt;This post is the handshake between the two.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Scope note: this is a &lt;strong&gt;deep dive of the current implementation&lt;/strong&gt; of Aye Chat’s Databricks plugin (&lt;code&gt;DatabricksModelPlugin&lt;/code&gt;). It’s not a general “Databricks + LLMs” tutorial.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Where the integration lives
&lt;/h2&gt;

&lt;p&gt;The Databricks integration is implemented as a model plugin:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;File:&lt;/strong&gt; &lt;code&gt;plugins/databricks_model.py&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Class:&lt;/strong&gt; &lt;code&gt;DatabricksModelPlugin&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Main hook:&lt;/strong&gt; &lt;code&gt;on_command()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Command it intercepts:&lt;/strong&gt; &lt;code&gt;local_model_invoke&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Aye Chat’s plugin architecture, model plugins can intercept &lt;code&gt;local_model_invoke&lt;/code&gt; and return an LLM response object that the rest of the app can render and/or apply.&lt;/p&gt;




&lt;h2&gt;
  
  
  Activation: the plugin is silent unless you invite it
&lt;/h2&gt;

&lt;p&gt;This plugin has strong “don’t bother me unless configured” energy.&lt;/p&gt;

&lt;p&gt;It only activates when &lt;strong&gt;both&lt;/strong&gt; env vars are present:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AYE_DBX_API_URL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AYE_DBX_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The check is centralized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_is_databricks_configured&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AYE_DBX_API_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AYE_DBX_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If not configured, &lt;code&gt;on_command()&lt;/code&gt; returns &lt;code&gt;None&lt;/code&gt; and Aye Chat falls back to other model backends.&lt;/li&gt;
&lt;li&gt;Users who don’t care about Databricks get true “zero config.”&lt;/li&gt;
&lt;li&gt;Users who &lt;em&gt;do&lt;/em&gt; care can opt in with two env vars and zero ceremony.&lt;/li&gt;
&lt;/ul&gt;
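&lt;p&gt;As a sketch, the gate plus fallback looks like this. The &lt;code&gt;on_command&lt;/code&gt; skeleton below is a simplified stand-in for the real hook, not the plugin’s actual code:&lt;/p&gt;

```python
import os

def _is_databricks_configured() -> bool:
    # Both variables must be set to non-empty strings for the plugin to activate.
    return bool(os.environ.get("AYE_DBX_API_URL") and os.environ.get("AYE_DBX_API_KEY"))

def on_command(command: str, payload: dict):
    # Simplified skeleton: returning None tells Aye Chat to fall back
    # to the next model backend in line.
    if command != "local_model_invoke" or not _is_databricks_configured():
        return None
    return {"handled_by": "DatabricksModelPlugin"}  # the real hook returns an LLM response
```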




&lt;h2&gt;
  
  
  Configuration and model selection
&lt;/h2&gt;

&lt;p&gt;The plugin expects the Databricks endpoint to behave like an &lt;strong&gt;OpenAI-compatible chat completions API&lt;/strong&gt; (either natively or behind a proxy).&lt;/p&gt;

&lt;h3&gt;
  
  
  Required
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;AYE_DBX_API_URL&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full URL to &lt;code&gt;POST&lt;/code&gt; chat completion requests.&lt;/li&gt;
&lt;li&gt;Example (illustrative):&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://&amp;lt;workspace-host&amp;gt;/serving-endpoints/&amp;lt;endpoint&amp;gt;/invocations&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;code&gt;AYE_DBX_API_KEY&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bearer token used for &lt;code&gt;Authorization&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Optional
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AYE_DBX_MODEL&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;Defaults to: &lt;code&gt;gpt-3.5-turbo&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Yes, the default says &lt;code&gt;gpt-3.5-turbo&lt;/code&gt;. No, this plugin isn’t emotionally attached to that string. It just needs a &lt;code&gt;model&lt;/code&gt; field to put in the payload.&lt;/p&gt;
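&lt;p&gt;Model resolution reduces to a one-liner. &lt;code&gt;resolve_model_name&lt;/code&gt; is a hypothetical helper name; the behavior matches what the plugin does with &lt;code&gt;AYE_DBX_MODEL&lt;/code&gt;:&lt;/p&gt;

```python
import os

DEFAULT_MODEL = "gpt-3.5-turbo"  # just a default payload value, not a capability claim

def resolve_model_name() -> str:
    # AYE_DBX_MODEL wins when set; otherwise the placeholder default is used.
    return os.environ.get("AYE_DBX_MODEL") or DEFAULT_MODEL
```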




&lt;h2&gt;
  
  
  Lifecycle: &lt;code&gt;new_chat&lt;/code&gt; vs &lt;code&gt;local_model_invoke&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The plugin handles two command names.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) &lt;code&gt;new_chat&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;If configured, &lt;code&gt;new_chat&lt;/code&gt; resets conversation state by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deleting &lt;code&gt;.aye/chat_history.json&lt;/code&gt; (Databricks plugin history file)&lt;/li&gt;
&lt;li&gt;resetting in-memory &lt;code&gt;self.chat_history&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Translation: when you start a fresh session in Aye Chat, the Databricks plugin doesn’t cling to the past.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) &lt;code&gt;local_model_invoke&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the main inference path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load existing history from disk&lt;/li&gt;
&lt;li&gt;Build the message list (system + history + new user message)&lt;/li&gt;
&lt;li&gt;POST to the Databricks endpoint&lt;/li&gt;
&lt;li&gt;Extract JSON from the model output (even if it tries to write a novel first)&lt;/li&gt;
&lt;li&gt;Store lightweight history (no repeated file contents)&lt;/li&gt;
&lt;li&gt;Return a parsed LLM response (with optional token usage)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Prompt construction: one message for the model, another for history
&lt;/h2&gt;

&lt;p&gt;Aye Chat can include repo context (files / RAG snippets) in the prompt. The Databricks plugin uses &lt;strong&gt;two representations&lt;/strong&gt; of the same idea.&lt;/p&gt;

&lt;h3&gt;
  
  
  A) The full user message (sent to the API)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_user_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the user’s prompt&lt;/li&gt;
&lt;li&gt;the full contents of &lt;code&gt;source_files&lt;/code&gt; (as gathered by Aye Chat)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s what the model needs to actually do the work.&lt;/p&gt;

&lt;h3&gt;
  
  
  B) The lightweight history message (saved to disk)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;history_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_history_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This stores a compact representation (typically prompt + filenames), not full file contents.&lt;/p&gt;

&lt;p&gt;Why: if you store full file contents in history on every request, &lt;code&gt;.aye/chat_history.json&lt;/code&gt; turns into a data hoarder. Performance degrades, diffs get silly, and you start paying a storage bill for your own laziness.&lt;/p&gt;

&lt;p&gt;This design is unglamorous, and therefore correct.&lt;/p&gt;
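&lt;p&gt;To make the contrast concrete, here is a minimal sketch of the two builders. The function names come from the plugin; the bodies are illustrative assumptions, not the actual implementations:&lt;/p&gt;

```python
def build_user_message(prompt: str, source_files: dict) -> str:
    # Full message sent to the API: prompt plus complete file contents.
    parts = [prompt]
    for name in sorted(source_files):
        parts.append(f"--- {name} ---\n{source_files[name]}")
    return "\n\n".join(parts)

def build_history_message(prompt: str, source_files: dict) -> str:
    # Compact message saved to disk: prompt plus filenames only.
    if not source_files:
        return prompt
    return f"{prompt}\n[files: {', '.join(sorted(source_files))}]"
```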




&lt;h2&gt;
  
  
  System prompt behavior
&lt;/h2&gt;

&lt;p&gt;The plugin uses the shared &lt;code&gt;SYSTEM_PROMPT&lt;/code&gt; by default (imported from &lt;code&gt;aye.model.config&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;But the invoker can override it per request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;effective_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the plugin builds OpenAI-style messages with the system prompt first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;effective_system_prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;conv_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The request payload (OpenAI-style, by design)
&lt;/h2&gt;

&lt;p&gt;Headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;max_output_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notable behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The plugin assumes an OpenAI-like schema: &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;messages&lt;/code&gt;, &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;max_tokens&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Timeout is generous:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LLM_TIMEOUT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;600.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because sometimes the model needs a minute. And sometimes it needs a minute to think, plus another minute to dramatically clear its throat.&lt;/p&gt;
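&lt;p&gt;Putting the pieces together, the request construction can be sketched as a single builder. &lt;code&gt;build_request&lt;/code&gt; is a hypothetical helper; the header and payload shapes match what the plugin sends, and the actual HTTP call is shown only as a comment since it needs a live endpoint:&lt;/p&gt;

```python
LLM_TIMEOUT = 600.0  # seconds -- generous on purpose

def build_request(messages: list, model_name: str, max_output_tokens: int, api_key: str):
    # OpenAI-style chat-completions request, as described above.
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "model": model_name,
        "messages": messages,
        "temperature": 0.7,
        "max_tokens": max_output_tokens,
    }
    return headers, payload

# The actual call (requires httpx and a reachable endpoint):
#   resp = httpx.post(url, headers=headers, json=payload, timeout=LLM_TIMEOUT)
#   result = resp.json()
```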




&lt;h2&gt;
  
  
  Response handling: extracting JSON from “helpful” output
&lt;/h2&gt;

&lt;p&gt;Aye Chat expects the assistant to return a JSON object (usually with &lt;code&gt;summary&lt;/code&gt; and optional &lt;code&gt;updated_files&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;In practice, models often return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a paragraph of explanation&lt;/li&gt;
&lt;li&gt;a code fence&lt;/li&gt;
&lt;li&gt;three different “final” answers&lt;/li&gt;
&lt;li&gt;then a JSON object&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the plugin uses &lt;code&gt;_extract_json_object()&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What &lt;code&gt;_extract_json_object()&lt;/code&gt; does
&lt;/h3&gt;

&lt;p&gt;It tries, in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;json.loads(raw_response)&lt;/code&gt; directly&lt;/li&gt;
&lt;li&gt;If that fails, it scans the text for balanced &lt;code&gt;{ ... }&lt;/code&gt; candidates while being string/escape-aware&lt;/li&gt;
&lt;li&gt;It parses candidates and typically picks the last valid object&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It’s not elegant. It’s resilient. (Those are often the same thing.)&lt;/p&gt;

&lt;h3&gt;
  
  
  A small caveat worth knowing
&lt;/h3&gt;

&lt;p&gt;After extraction, the plugin does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;generated_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If extraction fails, &lt;code&gt;generated_json&lt;/code&gt; may be &lt;code&gt;None&lt;/code&gt;, which turns into the literal JSON string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Depending on how &lt;code&gt;parse_llm_response()&lt;/code&gt; handles that, you may get confusing parse failures.&lt;/p&gt;

&lt;p&gt;If you ever find yourself staring into the abyss wondering “why is the model output &lt;code&gt;null&lt;/code&gt;?”, this is the first flashlight to grab.&lt;/p&gt;
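&lt;p&gt;One way to make the failure loud instead of silent is a guard before serialization. &lt;code&gt;serialize_generated&lt;/code&gt; is a hypothetical wrapper, not something the plugin currently has:&lt;/p&gt;

```python
import json

# json.dumps(None) quietly produces the four-character string "null":
assert json.dumps(None) == "null"

def serialize_generated(generated_json) -> str:
    # Hypothetical guard: fail fast on a missing extraction instead of
    # handing the string "null" to the downstream parser.
    if generated_json is None:
        raise ValueError("model response contained no parseable JSON object")
    return json.dumps(generated_json)
```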




&lt;h2&gt;
  
  
  Chat history: stored locally, lightweight on purpose
&lt;/h2&gt;

&lt;p&gt;The plugin persists history in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.aye/chat_history.json&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s keyed by a conversation id derived from &lt;code&gt;chat_id&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;conv_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_conversation_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On each successful request it appends:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the user’s &lt;strong&gt;lightweight&lt;/strong&gt; history message&lt;/li&gt;
&lt;li&gt;the assistant’s response as a &lt;strong&gt;JSON string&lt;/strong&gt; (not raw prose)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;conv_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;history_message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;conv_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;generated_text&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_save_history&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps future requests grounded without turning your history file into a landfill.&lt;/p&gt;
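&lt;p&gt;The persistence itself is ordinary JSON-on-disk. A sketch of the load/save pair, assuming the file layout described above (function names are illustrative):&lt;/p&gt;

```python
import json
from pathlib import Path

HISTORY_PATH = Path(".aye/chat_history.json")  # file the plugin uses

def save_history(history: dict, path: Path = HISTORY_PATH) -> None:
    # Persist the conversation map; the .aye directory may not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(history, indent=2))

def load_history(path: Path = HISTORY_PATH) -> dict:
    # A missing or corrupt file degrades to empty history, not a crash.
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
```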




&lt;h2&gt;
  
  
  Parsing into Aye Chat’s internal response shape
&lt;/h2&gt;

&lt;p&gt;Once the plugin has a JSON string in &lt;code&gt;generated_text&lt;/code&gt;, it calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;parsed_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_llm_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;parse_llm_response()&lt;/code&gt; converts the JSON into Aye Chat’s internal response schema.&lt;/p&gt;

&lt;p&gt;Typically you’ll see fields like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;summary&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;updated_files: [{ file_name, file_content }, ...]&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those &lt;code&gt;updated_files&lt;/code&gt; are what Aye Chat can apply optimistically to disk (with automatic snapshots so you can &lt;code&gt;restore&lt;/code&gt; instantly).&lt;/p&gt;




&lt;h2&gt;
  
  
  Token usage passthrough
&lt;/h2&gt;

&lt;p&gt;If the Databricks endpoint includes an OpenAI-like &lt;code&gt;usage&lt;/code&gt; block, the plugin passes it through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;parsed_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token_usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completion_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completion_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why you care:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;debugging prompt growth (especially with repo context)&lt;/li&gt;
&lt;li&gt;monitoring costs (when applicable)&lt;/li&gt;
&lt;li&gt;comparing RAG/context strategies across runs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Error handling (aka “tell me what broke, not poetry about failure”)
&lt;/h2&gt;

&lt;p&gt;The plugin distinguishes between:&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP status errors
&lt;/h3&gt;

&lt;p&gt;It catches &lt;code&gt;httpx.HTTPStatusError&lt;/code&gt; and builds messages like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;DBX API error: &amp;lt;status_code&amp;gt; - &amp;lt;detail&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It tries to parse a JSON error body and extract &lt;code&gt;error.message&lt;/code&gt; when possible.&lt;/p&gt;
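&lt;p&gt;That error-shaping logic can be sketched as follows; &lt;code&gt;format_dbx_error&lt;/code&gt; is an illustrative helper matching the message format above:&lt;/p&gt;

```python
import json

def format_dbx_error(status_code: int, body_text: str) -> str:
    # Prefer a structured error.message from the body; fall back to raw text.
    detail = body_text
    try:
        parsed = json.loads(body_text)
        if isinstance(parsed, dict):
            err = parsed.get("error")
            if isinstance(err, dict) and err.get("message"):
                detail = err["message"]
    except json.JSONDecodeError:
        pass
    return f"DBX API error: {status_code} - {detail}"
```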

&lt;h3&gt;
  
  
  Generic exceptions
&lt;/h3&gt;

&lt;p&gt;Anything else becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Error calling Databricks API: &amp;lt;exception&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Verbose / debug output
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;verbose&lt;/code&gt;: prints status code and raw response text (for non-200)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;debug&lt;/code&gt;: prints internal message history and response blocks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially useful when wiring new endpoints, where the biggest issues are usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;schema mismatch&lt;/li&gt;
&lt;li&gt;response shape differences&lt;/li&gt;
&lt;li&gt;“the model didn’t output JSON like you asked” (shocking)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Minimal setup example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AYE_DBX_API_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://.../invocations"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AYE_DBX_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"dapi..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AYE_DBX_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-model-name"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run Aye Chat normally. If configured, this plugin will intercept &lt;code&gt;local_model_invoke&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Troubleshooting checklist
&lt;/h2&gt;

&lt;p&gt;If nothing happens (or worse, something happens but it’s wrong):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Env vars&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AYE_DBX_API_URL&lt;/code&gt; set?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AYE_DBX_API_KEY&lt;/code&gt; set?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Endpoint compatibility&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accepts &lt;code&gt;messages&lt;/code&gt; chat format?&lt;/li&gt;
&lt;li&gt;Accepts &lt;code&gt;max_tokens&lt;/code&gt;?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Response shape&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does &lt;code&gt;result["choices"][0]["message"]["content"]&lt;/code&gt; exist?&lt;/li&gt;
&lt;li&gt;Does &lt;code&gt;content&lt;/code&gt; contain a JSON object Aye Chat can parse?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Model output discipline&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extra prose is usually fine; &lt;code&gt;_extract_json_object()&lt;/code&gt; can recover.&lt;/li&gt;
&lt;li&gt;If extraction becomes &lt;code&gt;null&lt;/code&gt;, your endpoint likely isn’t returning JSON-like content in the expected place.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Turn on the lights&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;verbose on&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;debug on&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
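&lt;p&gt;For item 3, a quick check against a captured response body can be written as a few lines. &lt;code&gt;extract_content&lt;/code&gt; is a throwaway diagnostic helper, not part of the plugin:&lt;/p&gt;

```python
def extract_content(result: dict):
    # Walks the OpenAI-style path the plugin relies on; returns None
    # when any link in the chain is missing.
    try:
        return result["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return None
```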




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The Databricks integration is a clean, opt-in model plugin that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;activates only when configured via environment variables&lt;/li&gt;
&lt;li&gt;sends OpenAI-style chat completion payloads to your Databricks endpoint&lt;/li&gt;
&lt;li&gt;builds rich prompts with file context, but stores lightweight history&lt;/li&gt;
&lt;li&gt;extracts JSON robustly from messy model output&lt;/li&gt;
&lt;li&gt;returns a structured response Aye Chat can apply to files&lt;/li&gt;
&lt;li&gt;surfaces token usage when the endpoint provides it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re extending or deploying this integration, the two most important things to validate are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;endpoint schema compatibility (messages in, choices out)&lt;/li&gt;
&lt;li&gt;response format consistency (JSON object inside &lt;code&gt;choices[0].message.content&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else is just plumbing. Occasionally wet plumbing, but still.&lt;/p&gt;




&lt;h2&gt;
  
  
  About Aye Chat
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aye Chat&lt;/strong&gt; is an open-source, AI-powered terminal workspace that brings AI directly into command-line workflows. Edit files, run commands, and chat with your codebase without leaving the terminal - with an optimistic workflow backed by instant local snapshots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Support Us
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Star our &lt;a href="https://github.com/acrotron/aye-chat" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/strong&gt; - it helps new users discover Aye Chat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spread the word.&lt;/strong&gt; Share Aye Chat with your team and friends who live in the terminal.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>databricks</category>
    </item>
    <item>
      <title>Stop Re-Explaining Yourself to the AI: The Case for Behavior Presets</title>
      <dc:creator>Vyacheslav Mayorskiy</dc:creator>
      <pubDate>Tue, 24 Feb 2026 16:42:00 +0000</pubDate>
      <link>https://dev.to/vmayorskiyac/stop-re-explaining-yourself-to-the-ai-the-case-for-behavior-presets-25fj</link>
      <guid>https://dev.to/vmayorskiyac/stop-re-explaining-yourself-to-the-ai-the-case-for-behavior-presets-25fj</guid>
      <description>&lt;p&gt;Every developer who uses AI coding assistants long enough develops the same tic.&lt;/p&gt;

&lt;p&gt;You start every prompt with a little speech.&lt;/p&gt;

&lt;p&gt;"Be concise. Include examples. Explain intent, not just code. Don't refactor things I didn't ask you to touch. Don't add dependencies without asking. Don't..."&lt;/p&gt;

&lt;p&gt;You're not prompting anymore. You're &lt;em&gt;parenting&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;And the worst part? It works - until the one time you forget a sentence. Then the assistant reads that omission as creative license and rearranges your living room when you asked it to change a lightbulb.&lt;/p&gt;

&lt;p&gt;This is the prompt tax. You pay it every single time. And it scales terribly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The prompt tax (and why it compounds)
&lt;/h2&gt;

&lt;p&gt;Most AI interactions are stateless in practice. Sure, there's a context window, but instructions from three prompts ago are already being diluted by new information. The model doesn't &lt;em&gt;remember&lt;/em&gt; that you prefer explicit error handling, or that you hate magic strings, or that "refactor" means "move this function, not rewrite the module."&lt;/p&gt;

&lt;p&gt;So you repeat yourself. And the repetition creates three problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fatigue.&lt;/strong&gt; You get tired of typing the same preamble. You start abbreviating. The output quality drops. You blame the model.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Drift.&lt;/strong&gt; Tuesday-you writes a thorough instruction set. Friday-you writes "just fix it." The results diverge. The codebase diverges.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tribal knowledge.&lt;/strong&gt; Your carefully crafted prompt lives in your head (or your clipboard history, which is basically the same thing). Your teammate doesn't have it. Neither does your future self after a long weekend.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The prompt tax isn't just annoying. It's a consistency problem wearing an inconvenience costume.&lt;/p&gt;




&lt;h2&gt;
  
  
  What if you could save your instructions?
&lt;/h2&gt;

&lt;p&gt;Not as a text file you copy-paste from. Not as a system prompt you configure once and forget. Something in between.&lt;/p&gt;

&lt;p&gt;Picture this: a small collection of Markdown files in your repo, each one defining a &lt;em&gt;behavior mode&lt;/em&gt; for your AI assistant.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;documentation.md&lt;/code&gt; - "When writing docs, explain intent first. Include examples. Don't just describe function signatures."&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;security.md&lt;/code&gt; - "Think like an attacker. Focus on practical mitigations, not theoretical CVE lists."&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;testing.md&lt;/code&gt; - "Write behavior-driven tests. Test boundaries and edge cases. Don't mock everything into oblivion."&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;refactoring.md&lt;/code&gt; - "Reduce coupling. Respect existing boundaries. Don't rename things for fun."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, when you need one, you invoke it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Using security skill, review the authentication flow for injection risks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool reads the Markdown file and injects it into the system prompt - the part the model treats as instructions from the operator, not casual conversation.&lt;/p&gt;

&lt;p&gt;Your prompt stays short. The behavior stays consistent. And the instructions live &lt;em&gt;in the repo&lt;/em&gt;, not in your head.&lt;/p&gt;

&lt;p&gt;I call these &lt;strong&gt;skills&lt;/strong&gt;. But you could call them presets, personas, modes, lenses - whatever. The name matters less than the pattern.&lt;/p&gt;
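&lt;p&gt;In practice, the mechanics are small. A hypothetical sketch of the injection step (the function name, delimiters, and default directory are illustrative, not Aye Chat's actual source):&lt;/p&gt;

```python
from pathlib import Path

def build_system_prompt(base_prompt: str, skill_names: list,
                        skills_dir: Path = Path("skills")) -> str:
    """Append each requested skill file to the system prompt, clearly delimited."""
    parts = [base_prompt]
    for name in skill_names:
        skill_file = skills_dir / f"{name}.md"
        if not skill_file.is_file():
            continue  # unknown skill: apply nothing rather than guess
        parts.append(
            f"--- BEGIN SKILL: {name} ---\n"
            f"{skill_file.read_text(encoding='utf-8').strip()}\n"
            f"--- END SKILL: {name} ---"
        )
    return "\n\n".join(parts)

# The result belongs in the system message, not a user message:
# messages = [
#     {"role": "system", "content": build_system_prompt(base, ["security"])},
#     {"role": "user", "content": "review the authentication flow"},
# ]
```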




&lt;h2&gt;
  
  
  Why Markdown files in a directory?
&lt;/h2&gt;

&lt;p&gt;I know what you're thinking. "This is just a prompt template library." And you're not wrong. But the &lt;em&gt;where&lt;/em&gt; and &lt;em&gt;how&lt;/em&gt; matter more than you'd expect.&lt;/p&gt;

&lt;h3&gt;
  
  
  They live in the repo
&lt;/h3&gt;

&lt;p&gt;This means they're version-controlled. They go through code review. When someone on your team writes a &lt;code&gt;documentation.md&lt;/code&gt; skill that says "always include usage examples," that decision is visible, reviewable, and shared.&lt;/p&gt;

&lt;p&gt;You're not just saving prompts. You're &lt;strong&gt;version-controlling your team's taste.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Code style guides exist because "write clean code" isn't specific enough. Skills exist because "be a good assistant" or "you are a super-expert senior software engineer" isn't specific enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  They're scoped per project
&lt;/h3&gt;

&lt;p&gt;Your open-source library needs different AI behavior than your internal microservice. The library wants careful public API documentation, backwards compatibility awareness, and changelog entries. The microservice wants fast iteration, infra-aware suggestions, and monitoring hooks.&lt;/p&gt;

&lt;p&gt;Same developer, different context. Repo-local skills handle this automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  They're composable
&lt;/h3&gt;

&lt;p&gt;Need docs &lt;em&gt;and&lt;/em&gt; security awareness for a single task?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skills:security,documentation review and document the auth module
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two Markdown files get injected. The model gets both sets of instructions. You typed one line.&lt;/p&gt;

&lt;h3&gt;
  
  
  They're transparent
&lt;/h3&gt;

&lt;p&gt;No hidden configuration. No settings buried in a YAML file three directories up. If you want to know what &lt;code&gt;skill:testing&lt;/code&gt; does, you open &lt;code&gt;skills/testing.md&lt;/code&gt; and read it. If you disagree, you edit it and commit.&lt;/p&gt;

&lt;p&gt;The system is legible. That matters more than most people think.&lt;/p&gt;




&lt;h2&gt;
  
  
  What makes a good skill?
&lt;/h2&gt;

&lt;p&gt;Not everything belongs in a skill file. The best ones share a few properties:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They describe a mindset, not a checklist.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bad: "Always add type hints. Always add docstrings. Always use f-strings."&lt;/p&gt;

&lt;p&gt;Good: "You're writing code that a junior developer will maintain alone. Prioritize clarity over cleverness. Prefer explicit patterns over implicit conventions."&lt;/p&gt;

&lt;p&gt;The first is a linter. The second changes how the model &lt;em&gt;thinks&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They're focused.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One skill per concern. Don't make a &lt;code&gt;be_good_at_everything.md&lt;/code&gt; kitchen sink file. A security skill shouldn't also contain formatting preferences, and a documentation skill shouldn't also contain performance tips.&lt;/p&gt;

&lt;p&gt;Keep them small. Compose them when you need more than one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They express opinions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The whole point is consistency. "Consider testing" is useless. "Write tests that describe behavior, not implementation. Prefer integration tests for I/O boundaries. Mock only what you must" - that's a skill with a point of view.&lt;/p&gt;

&lt;p&gt;Your skills should sound like the best version of your team's code review comments. Specific, opinionated, and useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They assume the model is capable but amnesiac.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You're not teaching the model how to code. You're reminding it what &lt;em&gt;you&lt;/em&gt; care about. The difference between a good AI response and a great one usually isn't capability - it's context about your preferences.&lt;/p&gt;

&lt;p&gt;Skills provide that context. Repeatedly. Reliably. Without you typing it again.&lt;/p&gt;




&lt;h2&gt;
  
  
  The consistency argument (or: why drift will ruin you slowly)
&lt;/h2&gt;

&lt;p&gt;Here's the thing about prompt drift that nobody warns you about: it's invisible.&lt;/p&gt;

&lt;p&gt;You don't notice it happening. One day you write a thorough prompt and get beautiful, well-structured code. The next day you're tired and write something shorter. The output is slightly different. Not worse, exactly - just different. Different naming. Different patterns. Different assumptions.&lt;/p&gt;

&lt;p&gt;Multiply that across a team of four over two months and your codebase starts to look like it was written by eight people with conflicting philosophies. Because it was.&lt;/p&gt;

&lt;p&gt;Skills don't eliminate drift. But they create a gravity well - a default behavior that holds unless you actively override it. The model still has its moods (they all do), but instead of freestyling from a blank slate every time, it's freestyling within guardrails that your team defined.&lt;/p&gt;

&lt;p&gt;That's a massive difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  The tools-that-write-files problem
&lt;/h2&gt;

&lt;p&gt;This pattern becomes especially important with tools that let AI write directly to your filesystem - what some call an "optimistic" or "agentic" workflow.&lt;/p&gt;

&lt;p&gt;When the AI just &lt;em&gt;suggests&lt;/em&gt; code in a chat window, inconsistency is annoying but harmless. You copy-paste what you like, edit the rest, move on.&lt;/p&gt;

&lt;p&gt;But when the AI writes directly to disk? Inconsistency &lt;em&gt;lands&lt;/em&gt;. It becomes committed code, merged PRs, production behavior. The gap between "the model was in a weird mood" and "we shipped a bug" shrinks to nothing.&lt;/p&gt;

&lt;p&gt;In that world, skills aren't a convenience. They're a safety mechanism. They're the difference between "the AI followed our conventions" and "the AI decided this module should use a completely different error handling pattern than everything else."&lt;/p&gt;

&lt;p&gt;The more authority you give the model, the more important it is to tell it &lt;em&gt;how to behave&lt;/em&gt; - and to tell it the same way every time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design principles that matter
&lt;/h2&gt;

&lt;p&gt;If you're building this pattern into a tool (or adopting it for your own workflow), a few design decisions are worth getting right:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explicit invocation beats auto-detection.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's tempting to have the tool guess which skill to apply based on the prompt content. Resist this. Auto-applied behavior that the user didn't ask for is spooky. It makes debugging harder and trust lower.&lt;/p&gt;

&lt;p&gt;Let the user say &lt;code&gt;skill:security&lt;/code&gt;. Don't try to infer it from the word "vulnerability" appearing in their prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System-level injection beats user-level injection.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Skills should be injected as system prompt content, not as a user message the model might deprioritize. System-level instructions get treated as operator guidance. User-level instructions compete with everything else in the conversation.&lt;/p&gt;

&lt;p&gt;This distinction matters more than it sounds. It's the difference between a suggestion and a directive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ambiguity should resolve to "do nothing."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the tool can't figure out which skill the user meant - apply none. Don't guess. The user can rephrase. Silently applying the wrong behavior preset is worse than applying no preset at all.&lt;/p&gt;
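&lt;p&gt;Concretely, "one clear winner or nothing" fits in a few lines. A hypothetical sketch (the scoring function and threshold are illustrative):&lt;/p&gt;

```python
import difflib

def resolve_skill(query: str, available: list, threshold: float = 0.8):
    """Return the single best fuzzy match, or None when it is ambiguous."""
    scored = sorted(
        ((difflib.SequenceMatcher(None, query, name).ratio(), name)
         for name in available),
        reverse=True,
    )
    if not scored or scored[0][0] < threshold:
        return None  # nothing close enough: apply no skill
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # tie between two skills: ambiguous, apply no skill
    return scored[0][1]
```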

&lt;p&gt;&lt;strong&gt;Keep them readable. Keep them auditable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Skills should be injected with clear delimiters so that logs and debug output make it obvious what instructions were active. "Why did the model do that weird thing?" should always have an answerable investigation path.&lt;/p&gt;




&lt;h2&gt;
  
  
  How we built it in Aye Chat
&lt;/h2&gt;

&lt;p&gt;We shipped skills as a first-class feature in &lt;a href="https://github.com/acrotron/aye-chat" rel="noopener noreferrer"&gt;Aye Chat&lt;/a&gt;. Here's the short version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skills live in a &lt;code&gt;skills/&lt;/code&gt; directory at the repo root (auto-discovered by walking upward from the working directory, monorepo-friendly).&lt;/li&gt;
&lt;li&gt;Only &lt;code&gt;*.md&lt;/code&gt; files, flat directory, no recursion. Boring on purpose.&lt;/li&gt;
&lt;li&gt;You invoke them with &lt;code&gt;skill:name&lt;/code&gt; or &lt;code&gt;skills:name1,name2&lt;/code&gt; or &lt;code&gt;using name skill&lt;/code&gt; in your prompt.&lt;/li&gt;
&lt;li&gt;They're injected into the system prompt with clear delimiters.&lt;/li&gt;
&lt;li&gt;Fuzzy matching exists but is intentionally conservative: high threshold, at most one match, and if two skills tie, neither is applied.&lt;/li&gt;
&lt;li&gt;Results are cached in memory, invalidated by directory &lt;code&gt;mtime&lt;/code&gt;. One cheap &lt;code&gt;stat()&lt;/code&gt; call per request.&lt;/li&gt;
&lt;/ul&gt;
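&lt;p&gt;The caching piece is the sort of thing that fits in a dozen lines. A hypothetical sketch of the pattern described above (not the actual implementation):&lt;/p&gt;

```python
import os
from pathlib import Path

_cache = {}  # skills_dir -> (mtime, {name: content})

def load_skills(skills_dir: str) -> dict:
    """Return {name: markdown} for *.md files in a flat directory.

    Results are reused until the directory mtime changes, so the
    cache-hit path costs a single stat() call.
    """
    mtime = os.stat(skills_dir).st_mtime
    cached = _cache.get(skills_dir)
    if cached and cached[0] == mtime:
        return cached[1]
    skills = {
        p.stem: p.read_text(encoding="utf-8")
        for p in Path(skills_dir).glob("*.md")  # flat, no recursion
    }
    _cache[skills_dir] = (mtime, skills)
    return skills
```

&lt;p&gt;(One caveat worth knowing: a directory's &lt;code&gt;mtime&lt;/code&gt; changes when files are added, removed, or renamed; editing a file in place typically bumps the file's own &lt;code&gt;mtime&lt;/code&gt;, not the directory's.)&lt;/p&gt;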

&lt;p&gt;You can see our own &lt;a href="https://github.com/acrotron/aye-chat/tree/main/skills" rel="noopener noreferrer"&gt;skill files on GitHub&lt;/a&gt; - &lt;code&gt;documentation.md&lt;/code&gt;, &lt;code&gt;security.md&lt;/code&gt;, &lt;code&gt;testing.md&lt;/code&gt;, &lt;code&gt;modularization.md&lt;/code&gt;, &lt;code&gt;performance.md&lt;/code&gt;, and &lt;code&gt;repo-exploration.md&lt;/code&gt;. Feel free to steal them as starting points for your own.&lt;/p&gt;

&lt;p&gt;The implementation is small (~200 lines of Python), but the leverage is disproportionate. Once your team has a shared &lt;code&gt;skills/&lt;/code&gt; directory, every prompt becomes shorter and every output becomes more predictable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real takeaway: codify your taste
&lt;/h2&gt;

&lt;p&gt;Every team has implicit preferences. How errors should be handled. How tests should be structured. How documentation should read. What "clean code" means in &lt;em&gt;this&lt;/em&gt; repo, not in a textbook.&lt;/p&gt;

&lt;p&gt;Those preferences usually live in people's heads, sometimes on a wiki page nobody reads, occasionally in a linter config that covers 10% of what you actually care about.&lt;/p&gt;

&lt;p&gt;Skills are a way to make those preferences explicit, portable, and reusable - in a format that both humans and AI assistants can read.&lt;/p&gt;

&lt;p&gt;You're not writing prompt templates. You're codifying your team's engineering taste into something version-controlled and composable.&lt;/p&gt;

&lt;p&gt;And the next time you're about to type "please be concise, include examples, explain intent, don't refactor things I didn't ask-" you can just write &lt;code&gt;skill:documentation&lt;/code&gt; and get on with your day.&lt;/p&gt;

&lt;p&gt;Your future self will thank you. Probably with a short, well-documented message.&lt;/p&gt;




&lt;h2&gt;
  
  
  About Aye Chat
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aye Chat&lt;/strong&gt; is an open-source, AI-powered terminal workspace that brings AI directly into command-line workflows. Edit files, run commands, and chat with your codebase without leaving the terminal - with an optimistic workflow backed by instant local snapshots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Support Us
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Star our &lt;a href="https://github.com/acrotron/aye-chat" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/strong&gt; - it helps new users discover Aye Chat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spread the word.&lt;/strong&gt; Share Aye Chat with your team and friends who live in the terminal.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>agents</category>
    </item>
    <item>
      <title>Why I Use Different AI Models for Planning, Reviewing, and Coding</title>
      <dc:creator>Vyacheslav Mayorskiy</dc:creator>
      <pubDate>Tue, 17 Feb 2026 16:19:00 +0000</pubDate>
      <link>https://dev.to/vmayorskiyac/why-i-use-different-ai-models-for-planning-reviewing-and-coding-4p7l</link>
      <guid>https://dev.to/vmayorskiyac/why-i-use-different-ai-models-for-planning-reviewing-and-coding-4p7l</guid>
      <description>&lt;p&gt;I've been experimenting with something that feels slightly unhinged: using &lt;em&gt;different&lt;/em&gt; AI models at different stages of building a feature.&lt;/p&gt;

&lt;p&gt;Not because I'm indecisive. Because each model has a different superpower.&lt;/p&gt;

&lt;p&gt;GPT-5.2 is great at structured documentation and architectural thinking. Claude Opus 4.6 is terrifyingly good at catching edge cases and writing precise code. So why would I force one model to do everything when I could use them like specialized tools?&lt;/p&gt;

&lt;p&gt;This is the story of building a tiny feature called &lt;code&gt;printraw&lt;/code&gt; - and how a five-stage, multi-model workflow caught bugs that a single-model approach would have missed entirely.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Feature: Stop Making Me Fight the Terminal
&lt;/h2&gt;

&lt;p&gt;Here's the problem: Aye Chat renders AI responses in pretty Rich panels with Markdown formatting and box-drawing characters. Looks great. Feels polished.&lt;/p&gt;

&lt;p&gt;But try to &lt;em&gt;copy&lt;/em&gt; that text and paste it somewhere else.&lt;/p&gt;

&lt;p&gt;You get a mess of line breaks, box characters, and formatting artifacts that make you want to throw your laptop into the sea.&lt;/p&gt;

&lt;p&gt;The fix seemed simple: add a &lt;code&gt;printraw&lt;/code&gt; command that reprints the last response as plain, copy-friendly text. No panels. No Rich formatting. Just raw text wrapped in delimiters you can select and copy.&lt;/p&gt;

&lt;p&gt;The feature itself? Trivial. The &lt;em&gt;workflow&lt;/em&gt; I used to build it? That's what got interesting.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Five-Stage Pipeline
&lt;/h2&gt;

&lt;p&gt;Here's what I ended up doing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Write the plan&lt;/td&gt;
&lt;td&gt;GPT-5.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Validate the plan&lt;/td&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Implement&lt;/td&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Write tests&lt;/td&gt;
&lt;td&gt;GPT-5.2 + Claude Opus 4.6 (alternating)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Fix until green&lt;/td&gt;
&lt;td&gt;GPT-5.2 + Claude Opus 4.6 (alternating)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This isn't "use one model for everything." It's a staged pipeline where model selection is intentional - like choosing a screwdriver vs a hammer based on what you're actually fastening.&lt;/p&gt;


&lt;h2&gt;
  
  
  Stage 1: Planning with GPT-5.2
&lt;/h2&gt;

&lt;p&gt;I started by describing the UX problem to GPT-5.2 and asking for a complete implementation plan.&lt;/p&gt;

&lt;p&gt;GPT-5.2 produced a thorough document covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Command syntax and output format&lt;/li&gt;
&lt;li&gt;Where to capture the last response text&lt;/li&gt;
&lt;li&gt;Two architecture options (store in REPL vs. store in presenter)&lt;/li&gt;
&lt;li&gt;Which files to modify&lt;/li&gt;
&lt;li&gt;Testing approach&lt;/li&gt;
&lt;li&gt;Edge cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why GPT-5.2 for planning?&lt;/strong&gt; It's genuinely good at organized technical writing. It thinks through tradeoffs without needing to see every line of code. The output was clean, structured, and gave me something concrete to react to.&lt;/p&gt;


&lt;h2&gt;
  
  
  Stage 2: Validation with Claude Opus 4.6
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting.&lt;/p&gt;

&lt;p&gt;I handed the plan to Claude 4.6 with a simple prompt: "Review and validate this plan. Let me know if you'd recommend any adjustments."&lt;/p&gt;

&lt;p&gt;Claude came back with seven specific recommendations, prioritized by impact:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Add &lt;code&gt;raw&lt;/code&gt; as a short alias&lt;/td&gt;
&lt;td&gt;High - usability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Use plain &lt;code&gt;print()&lt;/code&gt;, not Rich &lt;code&gt;console.print()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;High - correctness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Shorten delimiter lines&lt;/td&gt;
&lt;td&gt;Low - taste&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Clarify: summary-only output, not file changes&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Treat whitespace-only summary as empty&lt;/td&gt;
&lt;td&gt;Medium - edge case&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Note that mid-stream &lt;code&gt;printraw&lt;/code&gt; is N/A&lt;/td&gt;
&lt;td&gt;Low - docs only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Add Rich-markup-leak test case&lt;/td&gt;
&lt;td&gt;Medium - correctness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Recommendation #2 was the one that made me sit up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Rich markup leak problem:&lt;/strong&gt; If you use Rich's &lt;code&gt;console.print()&lt;/code&gt; to output "raw" text, and the AI's response happens to contain bracketed tokens like &lt;code&gt;[bold]&lt;/code&gt; or &lt;code&gt;[/]&lt;/code&gt;, Rich interprets them as markup instead of printing them literally. Your "raw" output comes out formatted. The whole point of the feature is defeated.&lt;/p&gt;

&lt;p&gt;The fix - using Python's built-in &lt;code&gt;print()&lt;/code&gt; - is trivial. But I would have missed it without a dedicated review pass.&lt;/p&gt;
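&lt;p&gt;The difference is easy to demonstrate. A small sketch (the function name is illustrative):&lt;/p&gt;

```python
import io
from contextlib import redirect_stdout

def print_raw(text: str) -> None:
    """Emit text with the built-in print(), bypassing Rich entirely.

    Rich's console.print() would treat bracketed tokens such as [bold]
    as markup and swallow them; plain print() emits every character
    literally, which is exactly what a copy-paste feature needs.
    """
    print(text)

# Plain print() leaves markup-looking tokens untouched:
buf = io.StringIO()
with redirect_stdout(buf):
    print_raw("a [bold]literal[/bold] string")
# buf.getvalue() == "a [bold]literal[/bold] string\n"
```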

&lt;p&gt;&lt;strong&gt;Why Claude 4.6 for validation?&lt;/strong&gt; It's like hiring a polite pedant to review your work. The structured table with priority ratings made it easy to cherry-pick which adjustments to accept. I took recommendations #1 through #5 and skipped the documentation-only items.&lt;/p&gt;


&lt;h2&gt;
  
  
  Stage 3: Implementation with Claude Opus 4.6
&lt;/h2&gt;

&lt;p&gt;With a validated plan in hand, I asked Claude to implement it.&lt;/p&gt;

&lt;p&gt;The first implementation changed the return types of &lt;code&gt;handle_with_command()&lt;/code&gt; and &lt;code&gt;handle_blog_command()&lt;/code&gt; from &lt;code&gt;Optional&lt;/code&gt; to &lt;code&gt;Tuple[Optional, Optional]&lt;/code&gt; - threading the response text back to the REPL.&lt;/p&gt;

&lt;p&gt;I flagged this immediately: "Won't that introduce regressions and break existing functionality?"&lt;/p&gt;

&lt;p&gt;Claude acknowledged the risk and proposed something cleaner: &lt;strong&gt;capture the text at the source of truth&lt;/strong&gt; - inside &lt;code&gt;print_assistant_response()&lt;/code&gt; itself, using a module-level variable.&lt;/p&gt;

&lt;p&gt;This approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Required zero signature changes&lt;/li&gt;
&lt;li&gt;Had zero regression risk&lt;/li&gt;
&lt;li&gt;Was guaranteed to capture the correct text (whatever was actually printed)&lt;/li&gt;
&lt;li&gt;Worked automatically for all code paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Much better.&lt;/p&gt;
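&lt;p&gt;The pattern looks roughly like this (a hypothetical sketch; the names mirror the article, not the actual Aye Chat source):&lt;/p&gt;

```python
# Module-level capture: the text is stored at the one place where the
# correct string is guaranteed to exist - the rendering function itself.
_last_response_text = None

def print_assistant_response(summary: str) -> None:
    """Render the response AND remember exactly what was shown."""
    global _last_response_text
    _last_response_text = summary  # capture at the source of truth
    print(summary)  # the real tool renders a Rich panel here

def get_last_response_text():
    return _last_response_text
```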
&lt;h3&gt;
  
  
  The Bug That Almost Shipped
&lt;/h3&gt;

&lt;p&gt;Even after the refactor, the first test showed the command printing "No assistant response available yet" after a valid response.&lt;/p&gt;

&lt;p&gt;Root cause: the initial code tried to capture text using &lt;code&gt;getattr(llm_response, 'answer_summary', None)&lt;/code&gt;, but the response object's attribute was actually &lt;code&gt;.summary&lt;/code&gt;, not &lt;code&gt;.answer_summary&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The fix was exactly the module-level capture approach - store the text inside &lt;code&gt;print_assistant_response()&lt;/code&gt; where the correct string is guaranteed to exist, regardless of what the response object's attributes are named.&lt;/p&gt;

&lt;p&gt;Final implementation touched four files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;presenter/repl_ui.py&lt;/code&gt; - Module variable + getter + capture logic&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;presenter/raw_output.py&lt;/code&gt; - New file: plain &lt;code&gt;print()&lt;/code&gt; with delimiters&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;controller/command_handlers.py&lt;/code&gt; - New &lt;code&gt;handle_printraw_command()&lt;/code&gt; handler&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;controller/repl.py&lt;/code&gt; - Added &lt;code&gt;printraw&lt;/code&gt; and &lt;code&gt;raw&lt;/code&gt; to built-in commands&lt;/li&gt;
&lt;/ul&gt;
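&lt;p&gt;The handler side is equally small. A hypothetical sketch of the &lt;code&gt;raw_output&lt;/code&gt; logic, folding in the validated-plan decisions - plain &lt;code&gt;print()&lt;/code&gt;, short delimiters, whitespace-only treated as empty (names and delimiter length are illustrative):&lt;/p&gt;

```python
DELIM = "-" * 8  # short delimiter lines (review recommendation #3)

def print_raw_response(last_text) -> None:
    """Reprint the last response as copy-friendly plain text."""
    # Whitespace-only text counts as "no response" (recommendation #5).
    if last_text is None or not last_text.strip():
        print("No assistant response available yet")
        return
    print(DELIM)
    print(last_text)  # built-in print(): no Rich markup interpretation
    print(DELIM)
```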


&lt;h2&gt;
  
  
  Stages 4 &amp;amp; 5: The Adversarial Testing Loop
&lt;/h2&gt;

&lt;p&gt;Here's where the multi-model approach got &lt;em&gt;really&lt;/em&gt; interesting.&lt;/p&gt;

&lt;p&gt;With implementation done, I didn't just ask one model to write tests and fix them. I ping-ponged between models: one writes, the other critiques and fixes, repeat.&lt;/p&gt;

&lt;p&gt;The test coverage needed to include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normal output with delimiters&lt;/li&gt;
&lt;li&gt;Rich markup leak prevention (the &lt;code&gt;[something]&lt;/code&gt; case)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;None&lt;/code&gt; input → warning message&lt;/li&gt;
&lt;li&gt;Whitespace-only input → warning message&lt;/li&gt;
&lt;li&gt;Empty string → warning message&lt;/li&gt;
&lt;li&gt;The capture mechanism&lt;/li&gt;
&lt;li&gt;The handler integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then came the iteration loop - but with a twist.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» model gpt-5.2
&lt;span class="o"&gt;(&lt;/span&gt;ツ» write tests &lt;span class="k"&gt;for &lt;/span&gt;the printraw feature
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GPT writes tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» pytest tests/test_raw_output.py &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oh God. OH GOD. Red everywhere.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» model claude-opus-4.6
&lt;span class="o"&gt;(&lt;/span&gt;ツ» fix the failing tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude fixes things - and often rewrites chunks of GPT's approach entirely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» pytest tests/test_raw_output.py &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Still red, but fewer failures.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» model gpt-5.2
&lt;span class="o"&gt;(&lt;/span&gt;ツ» these tests are still failing, fix them
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GPT takes a different angle. Catches something Claude missed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» pytest tests/test_raw_output.py &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Green. Finally green.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why alternate models?&lt;/strong&gt; Because each model has different blind spots. GPT might write a test that's technically correct but uses mocking patterns Claude handles better. Claude might fix the mock but miss an assertion edge case that GPT catches on the next pass.&lt;/p&gt;

&lt;p&gt;It's adversarial collaboration. Each model is essentially reviewing the other's work, and bugs that survive one model's scrutiny often get caught by the other.&lt;/p&gt;

&lt;p&gt;No context-switching. No copying error messages between terminals. Everything in one session - just swapping which brain is on the case.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Workflow Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Different models for different cognitive tasks
&lt;/h3&gt;

&lt;p&gt;Planning is a different skill from code review, which is a different skill from implementation. Using one model for everything is like using a hammer on screws - it technically works, but you're fighting the tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  The staged approach catches errors early
&lt;/h3&gt;

&lt;p&gt;The validation stage caught the Rich markup leak &lt;em&gt;before any code was written&lt;/em&gt;. Without it, that bug would have surfaced (maybe) when users reported garbled output weeks later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Regression risk is managed explicitly
&lt;/h3&gt;

&lt;p&gt;By questioning the return-type changes, I avoided an entire class of integration issues. The "capture at the source of truth" pattern emerged from that pushback.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternating models surfaces hidden bugs
&lt;/h3&gt;

&lt;p&gt;The ping-pong pattern during testing caught issues that a single model iterating with itself would have missed. Each model brings a different failure mode - and different solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  The conversation &lt;em&gt;is&lt;/em&gt; the development environment
&lt;/h3&gt;

&lt;p&gt;Every stage happened in the same Aye Chat session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model switching via &lt;code&gt;model&lt;/code&gt; command&lt;/li&gt;
&lt;li&gt;File generation via prompts&lt;/li&gt;
&lt;li&gt;Test execution via &lt;code&gt;pytest&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Undo via &lt;code&gt;restore&lt;/code&gt; when something went wrong&lt;/li&gt;
&lt;li&gt;Diff inspection via &lt;code&gt;diff&lt;/code&gt; to verify changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No IDE. No separate terminal. No copy-pasting between tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plan first, validate second, implement third.&lt;/strong&gt; Writing a plan document forces clarity before you touch code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Switch models for validation.&lt;/strong&gt; The model that wrote the plan won't catch its own blind spots. A fresh perspective - even from a different AI - brings a different analytical lens.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Capture at the source of truth.&lt;/strong&gt; When multiple code paths need the same data, find the single point where it's guaranteed to be correct. Don't thread it through function signatures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Question regression risk explicitly.&lt;/strong&gt; When implementation requires changing existing contracts, ask: "Is there a way to do this without breaking things?" Usually there is.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alternate models during test/fix loops.&lt;/strong&gt; One model writes, the other critiques. Bugs that slip past one often get caught by the other. It's like having two reviewers who never get tired.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep tests in the same session.&lt;/strong&gt; Running pytest, reading failures, and fixing them without leaving the terminal keeps iteration tight and fast.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This whole feature - planned, validated, implemented, tested, debugged, and shipped - happened in a single Aye Chat session across a few hours.&lt;/p&gt;

&lt;p&gt;Not because the feature was hard. Because the workflow made it frictionless.&lt;/p&gt;




&lt;h2&gt;
  
  
  About Aye Chat
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aye Chat&lt;/strong&gt; is an open-source, AI-powered terminal workspace that brings AI directly into command-line workflows. Edit files, run commands, and chat with your codebase without leaving the terminal - with an optimistic workflow backed by instant local snapshots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Support Us
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Star our &lt;a href="https://github.com/acrotron/aye-chat" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/strong&gt; - it helps new users discover Aye Chat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spread the word.&lt;/strong&gt; Share Aye Chat with your team and friends who live in the terminal.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>python</category>
    </item>
    <item>
      <title>Claude Code Asks Nicely. Aye Chat Defaults to Action.</title>
      <dc:creator>Vyacheslav Mayorskiy</dc:creator>
      <pubDate>Tue, 10 Feb 2026 16:17:00 +0000</pubDate>
      <link>https://dev.to/vmayorskiyac/claude-code-asks-nicely-aye-chat-defaults-to-action-18h2</link>
      <guid>https://dev.to/vmayorskiyac/claude-code-asks-nicely-aye-chat-defaults-to-action-18h2</guid>
      <description>&lt;p&gt;I saw a Claude Code ad and thought: &lt;em&gt;ah yes, the well-mannered butler of developer tools.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"An assistant that explains its thinking before acting."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13nqwyohy68m53v64kfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13nqwyohy68m53v64kfn.png" alt="The Claude Code ad that kicked this thought off." width="800" height="707"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's elegant. It's cautious. It's the kind of assistant that puts a napkin on its arm before refactoring.&lt;/p&gt;

&lt;p&gt;Aye Chat is not that.&lt;/p&gt;

&lt;p&gt;Aye Chat is the chaotic-good intern with a power drill, except we installed a big red &lt;strong&gt;UNDO&lt;/strong&gt; button and wrote the insurance policy ourselves.&lt;/p&gt;

&lt;p&gt;Because here's our belief:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Permission is an expensive user interface.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The tiny tax that eats your lunch
&lt;/h2&gt;

&lt;p&gt;Approval-first tools tend to feel like ordering at a restaurant where the waiter reads you the entire supply chain.&lt;/p&gt;

&lt;p&gt;You:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Could you rename this parameter?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Tool:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Certainly. First, I will analyze the repository. Then I will propose a plan. Then I will show you the plan. Then I will explain the plan. Then I will ask if you approve&lt;br&gt;
the plan. Then, pending approval of the plan explaining the plan... would you like me to proceed?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Meanwhile your coffee goes cold, your focus wanders off, and your brain starts loading another task like a browser tab you didn't mean to open.&lt;/p&gt;

&lt;p&gt;This isn't a moral failing. It's just physics.&lt;/p&gt;

&lt;p&gt;Every "Are you sure?" prompt is a speed bump on the highway of flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Aye Chat's bet: optimism with a parachute (and a spare)
&lt;/h2&gt;

&lt;p&gt;Aye Chat is built around the &lt;strong&gt;optimistic workflow&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The AI &lt;strong&gt;writes directly to files&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Every write is &lt;strong&gt;snapshotted locally&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If you don't like it, you &lt;strong&gt;undo it instantly&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
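&lt;p&gt;A minimal sketch of that snapshot-before-write idea (hypothetical helper names and a hypothetical &lt;code&gt;.aye/snapshots&lt;/code&gt; layout - not Aye Chat's actual implementation):&lt;/p&gt;

```python
import shutil
import time
from pathlib import Path

# Hypothetical snapshot location; the real layout inside .aye/ may differ.
SNAPSHOT_DIR = Path(".aye/snapshots")

def write_with_snapshot(path: Path, new_content: str) -> None:
    """Copy the current file aside, then write the new content."""
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
    if path.exists():
        stamp = time.strftime("%Y%m%d-%H%M%S")
        shutil.copy2(path, SNAPSHOT_DIR / f"{path.name}.{stamp}")
    path.write_text(new_content)

def restore_latest(path: Path) -> None:
    """Undo: put back the most recent snapshot of this file."""
    candidates = sorted(SNAPSHOT_DIR.glob(f"{path.name}.*"))
    if candidates:
        shutil.copy2(candidates[-1], path)
```

&lt;p&gt;The point is the ordering: the snapshot is taken &lt;em&gt;before&lt;/em&gt; the write, so undo never depends on the AI having been right.&lt;/p&gt;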

&lt;p&gt;No approval checkbox.&lt;br&gt;
No pre-flight committee meeting.&lt;br&gt;
No 12-slide deck titled &lt;em&gt;"Proposed Rename Initiative (Q1)"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;restore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the trick.&lt;/p&gt;

&lt;p&gt;I am not claiming models are perfect. That would be like claiming your cat always lands on its feet &lt;em&gt;and&lt;/em&gt; files your taxes.&lt;/p&gt;

&lt;p&gt;I am claiming something more boring - and more useful:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model is right often enough that &lt;strong&gt;defaulting to action&lt;/strong&gt; is faster... &lt;strong&gt;if reversal is instant and reliable&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I built the parachute first.&lt;/p&gt;

&lt;p&gt;Then I started skydiving.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why undo beats approval (most of the time)
&lt;/h2&gt;

&lt;p&gt;Approval feels safe because it tries to prevent mistakes.&lt;/p&gt;

&lt;p&gt;Undo feels safe because it makes mistakes &lt;strong&gt;cheap&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And cheap mistakes are how software gets written.&lt;/p&gt;

&lt;p&gt;Approval-first pushes you into reviewer brain &lt;em&gt;before you even know if the change matters&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Undo-first keeps you in builder brain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ask.&lt;/li&gt;
&lt;li&gt;Get the change.&lt;/li&gt;
&lt;li&gt;Run tests / run the app / run the command.&lt;/li&gt;
&lt;li&gt;Keep it or revert it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's the same reason whiteboards work: you write first, erase later.&lt;/p&gt;

&lt;p&gt;Git is basically civilization's agreement that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We will do slightly reckless things, but we will keep receipts."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Aye Chat is that - except the receipt is printed automatically, stapled to the change, and placed directly into your hand.&lt;/p&gt;

&lt;h2&gt;
  
  
  "But writing straight to files is scary."
&lt;/h2&gt;

&lt;p&gt;Yes.&lt;/p&gt;

&lt;p&gt;So is using a table saw.&lt;/p&gt;

&lt;p&gt;That's why table saws have guards, and why we treat snapshots like a seatbelt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic&lt;/strong&gt; (no remembering to commit/stash)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local&lt;/strong&gt; (stored in &lt;code&gt;.aye/&lt;/code&gt;, not beamed into the void)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immediate&lt;/strong&gt; (restore is one command)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the model goes off-road, you don't open a philosophical debate.&lt;/p&gt;

&lt;p&gt;You don't negotiate with the GPS.&lt;/p&gt;

&lt;p&gt;You just take the last exit and try again.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real difference in plain English
&lt;/h2&gt;

&lt;p&gt;Claude Code's UX says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I'll explain, you approve, then I'll act."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Aye Chat's UX says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I'll act, you react - and if it's cursed, we roll time back."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Different defaults.&lt;br&gt;
Different vibes.&lt;/p&gt;

&lt;p&gt;Claude is the assistant that asks permission to move a chair.&lt;/p&gt;

&lt;p&gt;Aye Chat is the assistant that rearranges the room so it finally makes sense, and if you hate it, it puts everything back exactly where it was.&lt;/p&gt;

&lt;p&gt;If you live in the terminal, that default isn't cosmetic.&lt;/p&gt;

&lt;p&gt;It's the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;babysitting an assistant who wants a signature for every screw, and&lt;/li&gt;
&lt;li&gt;using a power tool with an emergency stop and a clean rollback.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stop approving.&lt;br&gt;
Start shipping.&lt;br&gt;
Keep the parachute.&lt;/p&gt;




&lt;h2&gt;
  
  
  About Aye Chat
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aye Chat&lt;/strong&gt; is an open-source, AI-powered terminal workspace that brings AI directly into command-line workflows. Edit files, run commands, and chat with your codebase without leaving the terminal - with an optimistic workflow backed by instant local snapshots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Support Us
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Star our GitHub repository: &lt;a href="https://github.com/acrotron/aye-chat" rel="noopener noreferrer"&gt;https://github.com/acrotron/aye-chat&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Spread the word. Share Aye Chat with your team and friends who live in the terminal.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>productivity</category>
      <category>python</category>
    </item>
    <item>
      <title>When a Model Gets Stuck: How GPT‑5.2 Finished a 'Simple' Spinner That Opus 4.5 Couldn't</title>
      <dc:creator>Vyacheslav Mayorskiy</dc:creator>
      <pubDate>Tue, 06 Jan 2026 16:54:00 +0000</pubDate>
      <link>https://dev.to/vmayorskiyac/when-a-model-gets-stuck-how-gpt-52-finished-a-simple-spinner-that-opus-45-couldnt-1o6p</link>
      <guid>https://dev.to/vmayorskiyac/when-a-model-gets-stuck-how-gpt-52-finished-a-simple-spinner-that-opus-45-couldnt-1o6p</guid>
      <description>&lt;p&gt;I have a weakness for feature requests that start with:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"This should be simple."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because they &lt;em&gt;are&lt;/em&gt; simple.&lt;/p&gt;

&lt;p&gt;Right up until you try to make them correct.&lt;/p&gt;

&lt;p&gt;This one came from a very real pain point in Aye Chat:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Sometimes the LLM response stalls mid‑sentence. Show a basic spinner when that happens, and remove it when more text arrives."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In a web UI, a stall is annoying.&lt;br&gt;
In a terminal, a stall looks exactly like a crash.&lt;/p&gt;

&lt;p&gt;So yes - I needed a subtle &lt;code&gt;⋯ waiting for more&lt;/code&gt; indicator.&lt;br&gt;
Not a blinking disco. Not a permanent footer. Just an honest signal: &lt;em&gt;we're alive, we're waiting&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;And this is where the story gets interesting, because it wasn't just a concurrency story.&lt;/p&gt;

&lt;p&gt;It was also a &lt;strong&gt;model story&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Claude Opus 4.5 got us most of the way there, then got stuck in a loop of "reasonable fixes" that didn't quite land.&lt;br&gt;
GPT‑5.2 came in and finished the job - not by being more verbose or more confident, but by being more precise about what the system actually was: a little real‑time UI state machine with multiple writers.&lt;/p&gt;

&lt;p&gt;This is the write-up I wish I had before I lost an afternoon to a spinner.&lt;/p&gt;


&lt;h2&gt;
  
  
  The scene: simulated streaming meets real-world stalls
&lt;/h2&gt;

&lt;p&gt;Aye Chat streams responses into a terminal UI using Rich (&lt;code&gt;Live&lt;/code&gt;).&lt;br&gt;
We also have simulated streaming: even if a provider returns larger chunks, we animate output word-by-word so it feels like "typing."&lt;/p&gt;

&lt;p&gt;Real providers behave like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you receive &lt;strong&gt;some&lt;/strong&gt; tokens,&lt;/li&gt;
&lt;li&gt;then there's a gap (the LLM gathering its thoughts, a server-side pause, backpressure),&lt;/li&gt;
&lt;li&gt;then streaming resumes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that gap happens mid-sentence, users freeze with it.&lt;/p&gt;

&lt;p&gt;So the request had four deceptively clean requirements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detect "stall" while streaming.&lt;/li&gt;
&lt;li&gt;Show &lt;code&gt;⋯ waiting for more&lt;/code&gt; only during stalls.&lt;/li&gt;
&lt;li&gt;Remove it immediately when new text arrives.&lt;/li&gt;
&lt;li&gt;Don't break Markdown or final formatting.&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  The architecture (and why it can lie to you)
&lt;/h2&gt;

&lt;p&gt;At a high level the streaming UI has three moving parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;update(content: str)&lt;/code&gt; - called when &lt;em&gt;new streamed content arrives&lt;/em&gt; (full accumulated content, not a delta).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;_animate_words(new_text: str)&lt;/code&gt; - prints newly received text word-by-word with a small delay.&lt;/li&gt;
&lt;li&gt;a background monitor thread - periodically decides whether we are "stalled."&lt;/li&gt;
&lt;/ul&gt;
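&lt;p&gt;As a bare-bones sketch (illustrative bodies only - the real methods render through Rich), the relationship between the first two parts looks like this:&lt;/p&gt;

```python
class StreamingDisplay:
    """Skeleton of the streaming UI's first two moving parts (illustrative)."""

    def __init__(self) -> None:
        self._current_content = ""   # full accumulated text received so far
        self._animated_content = ""  # text actually rendered on screen

    def update(self, content: str) -> None:
        # Called with the FULL accumulated content each time, not a delta.
        new_text = content[len(self._current_content):]
        self._current_content = content
        self._animate_words(new_text)

    def _animate_words(self, new_text: str) -> None:
        # The real version prints word-by-word with a small delay;
        # here it just appends so the sketch stays deterministic.
        self._animated_content += new_text
```

&lt;p&gt;The gap between &lt;code&gt;_current_content&lt;/code&gt; and &lt;code&gt;_animated_content&lt;/code&gt; is exactly the ambiguity the stall detector has to respect.&lt;/p&gt;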

&lt;p&gt;Rendering is via a helper like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;_create_response_panel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_markdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;show_stall_indicator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Panel&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;show_stall_indicator=True&lt;/code&gt;, it appends:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;⋯ waiting for more
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So far, boring.&lt;/p&gt;

&lt;p&gt;But then you hit the question that decides everything:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What does "stalled" &lt;em&gt;mean&lt;/em&gt;?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And it turns out there are two kinds of "stalled":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Network stall:&lt;/strong&gt; no new content is arriving.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User-visible stall:&lt;/strong&gt; nothing is changing on screen.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not the same in a system that intentionally delays rendering.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Opus 4.5 got stuck: fixing symptoms instead of the machine
&lt;/h2&gt;

&lt;p&gt;Claude Opus 4.5 did the first part quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add a timestamp,&lt;/li&gt;
&lt;li&gt;monitor elapsed time,&lt;/li&gt;
&lt;li&gt;show the indicator if we exceed a threshold.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It "worked"… until it didn't.&lt;/p&gt;

&lt;p&gt;The bug report looked very specific:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The indicator blinks briefly even when words are still printing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That symptom is a big clue. It means the stall detector is looking at &lt;strong&gt;time since last network update&lt;/strong&gt;, while the UI is still busy animating buffered words.&lt;/p&gt;

&lt;p&gt;Opus tried the next obvious move: add an &lt;code&gt;_is_animating&lt;/code&gt; flag and suppress the indicator while animating.&lt;/p&gt;

&lt;p&gt;Still not fixed.&lt;/p&gt;

&lt;p&gt;At that point, you can feel a model fall into a common trap: it keeps proposing plausible tweaks (threshold changes, different checks, "maybe we should…"), but it doesn't fully re-frame the problem.&lt;/p&gt;

&lt;p&gt;Because the real problem wasn't just "the flag is wrong."&lt;/p&gt;

&lt;p&gt;The real problem was that we had &lt;strong&gt;two concurrent writers&lt;/strong&gt; to the same UI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the animation path calls &lt;code&gt;Live.update()&lt;/code&gt; as it prints words&lt;/li&gt;
&lt;li&gt;the monitor thread calls &lt;code&gt;Live.update()&lt;/code&gt; as it toggles the indicator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without serialization, you &lt;em&gt;will&lt;/em&gt; eventually render an inconsistent intermediate frame - which, to the user, looks like blinking.&lt;/p&gt;

&lt;p&gt;So the "didn't fix it" moment wasn't Opus being incompetent.&lt;/p&gt;

&lt;p&gt;It was Opus being stuck in a local optimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;treat it as timing&lt;/li&gt;
&lt;li&gt;treat it as one boolean&lt;/li&gt;
&lt;li&gt;treat it as "add one more guard"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When what we needed was: &lt;strong&gt;state + synchronization&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's the moment I switched models.&lt;/p&gt;


&lt;h2&gt;
  
  
  What GPT‑5.2 did differently: treat it like a state machine with a single renderer
&lt;/h2&gt;

&lt;p&gt;GPT‑5.2 didn't win by being clever.&lt;br&gt;
It won by being strict.&lt;/p&gt;

&lt;p&gt;It made three changes that turned the spinner from "mostly works" into "boring and correct."&lt;/p&gt;
&lt;h3&gt;
  
  
  1) Serialize shared state and all UI updates
&lt;/h3&gt;

&lt;p&gt;First: a lock.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RLock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then a rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If it touches shared state or calls &lt;code&gt;Live.update()&lt;/code&gt;, it must hold the lock.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We centralized rendering into a single helper so we stopped sprinkling &lt;code&gt;Live.update()&lt;/code&gt; in random code paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_refresh_display&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_markdown&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;show_stall&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_live&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_live&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;_create_response_panel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_animated_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;use_markdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;use_markdown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;show_stall_indicator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;show_stall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_showing_stall_indicator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;show_stall&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one change kills a whole class of "blinking because two threads fought for the frame buffer."&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Redefine "stall" as "caught up and no new input"
&lt;/h3&gt;

&lt;p&gt;This was the conceptual pivot.&lt;/p&gt;

&lt;p&gt;A stall should only be possible when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we are &lt;strong&gt;not currently animating&lt;/strong&gt;, and&lt;/li&gt;
&lt;li&gt;the animated output has &lt;strong&gt;caught up&lt;/strong&gt; to what we have received.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;caught_up&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_is_animating&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_animated_content&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_current_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single definition fixes the original "indicator shows while words are still printing" bug.&lt;/p&gt;

&lt;p&gt;If the UI is still draining buffered words, you're not stalled.&lt;br&gt;
You're busy.&lt;/p&gt;
&lt;h3&gt;
  
  
  3) Use "last receive time," not "last render time"
&lt;/h3&gt;

&lt;p&gt;After the above, we hit a second, subtler bug:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When streaming is actually paused, the indicator blinks instead of staying lit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a classic mistake in real-time UI code: if you update the "progress timestamp" when you redraw the indicator, the indicator becomes self-canceling.&lt;/p&gt;

&lt;p&gt;So GPT‑5.2 separated the concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;_last_receive_time&lt;/code&gt; changes &lt;strong&gt;only when new stream content arrives&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;redraws do not touch it
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_last_receive_time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Updated only in &lt;code&gt;update()&lt;/code&gt; when content truly changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_current_content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_last_receive_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the monitor checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;time_since_receive&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_last_receive_time&lt;/span&gt;
&lt;span class="n"&gt;should_show_stall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time_since_receive&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_stall_threshold&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That makes the indicator "sticky" in the correct way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it turns on after the threshold,&lt;/li&gt;
&lt;li&gt;it stays on continuously,&lt;/li&gt;
&lt;li&gt;and it turns off immediately when new text arrives.&lt;/li&gt;
&lt;/ul&gt;
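&lt;p&gt;That stickiness falls out of the decision being a pure function of two inputs, which makes it easy to check at the boundaries (a hypothetical distillation for illustration, not the actual monitor code):&lt;/p&gt;

```python
def stall_state(caught_up: bool, last_receive_time: float,
                stall_threshold: float, now: float) -> bool:
    """Should the indicator be shown at time `now`?"""
    if not caught_up:
        return False  # still draining buffered words: busy, not stalled
    return (now - last_receive_time) >= stall_threshold

# Threshold not reached yet: stays off.
assert stall_state(True, last_receive_time=100.0, stall_threshold=2.0, now=101.0) is False
# Past the threshold with nothing new received: turns on and stays on.
assert stall_state(True, last_receive_time=100.0, stall_threshold=2.0, now=103.0) is True
# New content resets last_receive_time: turns off immediately.
assert stall_state(True, last_receive_time=103.5, stall_threshold=2.0, now=104.0) is False
```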




&lt;h2&gt;
  
  
  The final monitor loop (the boring version that works)
&lt;/h2&gt;

&lt;p&gt;Here's what the working logic boils down to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_monitor_stall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_stop_monitoring&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_set&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_stop_monitoring&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_started&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_animated_content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;

            &lt;span class="n"&gt;caught_up&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_is_animating&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_animated_content&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_current_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;caught_up&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;

            &lt;span class="n"&gt;time_since_receive&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_last_receive_time&lt;/span&gt;
            &lt;span class="n"&gt;should_show_stall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time_since_receive&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_stall_threshold&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;should_show_stall&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_showing_stall_indicator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_live&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="nf"&gt;_create_response_panel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_animated_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;use_markdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;show_stall_indicator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;should_show_stall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_showing_stall_indicator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;should_show_stall&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no indicator while buffered words are still animating&lt;/li&gt;
&lt;li&gt;indicator appears only after &lt;strong&gt;no new content&lt;/strong&gt; arrives for &lt;code&gt;stall_threshold&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;indicator stays on continuously once shown&lt;/li&gt;
&lt;li&gt;indicator disappears immediately when new text arrives&lt;/li&gt;
&lt;/ul&gt;
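&lt;p&gt;Those four properties collapse into a small state machine. Here is a minimal sketch (hypothetical names, not the actual Aye Chat classes), assuming "stall" means "animation caught up &lt;em&gt;and&lt;/em&gt; no new content for the threshold":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time

class StallTracker:
    """Sketch: 'stall' is a state, not a timeout."""

    def __init__(self, threshold=2.0):
        self._threshold = threshold
        self._last_receive = time.monotonic()
        self._caught_up = True
        self.showing = False

    def on_receive(self):
        # New content: reset the clock, hide the indicator immediately.
        self._last_receive = time.monotonic()
        self._caught_up = False
        self.showing = False

    def on_caught_up(self):
        # The animation has drained the buffered words.
        self._caught_up = True

    def tick(self):
        # Rendering reads the clock but never writes it.
        stalled = (self._caught_up and
                   time.monotonic() - self._last_receive &amp;gt;= self._threshold)
        changed = stalled != self.showing
        self.showing = stalled
        return changed  # re-render only on a state transition
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The design point: only &lt;code&gt;on_receive()&lt;/code&gt; updates the receive clock, which is what keeps the indicator from blinking.&lt;/p&gt;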

&lt;p&gt;The spinner stops being a feature.&lt;br&gt;
It becomes infrastructure.&lt;/p&gt;

&lt;p&gt;Which is exactly what terminal UX should be.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real theme: why swapping models is a debugging tool
&lt;/h2&gt;

&lt;p&gt;I'm not interested in "model wars."&lt;/p&gt;

&lt;p&gt;But I &lt;em&gt;am&lt;/em&gt; interested in the practical reality of building with them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some models are great at first-pass implementation.&lt;/li&gt;
&lt;li&gt;Some models are great at refactoring.&lt;/li&gt;
&lt;li&gt;Some models are great at pushing through annoying edge cases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Opus 4.5&lt;/strong&gt; got to a plausible implementation quickly, and even cleaned up structure when asked.
But it kept circling around incremental fixes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT‑5.2&lt;/strong&gt; zoomed out, saw "two writers + ambiguous definition of stall," and forced the solution into a small state machine with serialized rendering.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That doesn't mean GPT is "better" in the abstract.&lt;/p&gt;

&lt;p&gt;It means something more useful:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When you feel a model looping, change the shape of the conversation - or change the model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In Aye Chat, switching models is cheap.&lt;br&gt;
And when you're stuck on a UI race condition that only reproduces one out of ten times, "cheap" matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Takeaways (and the part that matches Aye Chat's philosophy)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A spinner has a bigger correctness surface area than it deserves.&lt;/strong&gt;&lt;br&gt;
Animation + monitoring + concurrent rendering is a real system.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;"Stall" is a state, not a timeout.&lt;/strong&gt;&lt;br&gt;
It must mean "caught up and no new input," not "some time passed."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't let rendering update the clock that decides whether to render.&lt;/strong&gt;&lt;br&gt;
That's how you invent blinking.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Locking isn't optional when multiple threads can render.&lt;/strong&gt;&lt;br&gt;
Even if nothing crashes, the UX will.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model choice is part of the toolchain.&lt;/strong&gt;&lt;br&gt;
When one model gets stuck in local fixes, another might see the global shape.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In a weird way, this tiny &lt;code&gt;⋯ waiting for more&lt;/code&gt; indicator teaches the same lesson as the optimistic workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;let the system move fast,&lt;/li&gt;
&lt;li&gt;but build it so you can recover instantly,&lt;/li&gt;
&lt;li&gt;and be pragmatic about the tools (including the model) that get you unstuck.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About Aye Chat
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aye Chat&lt;/strong&gt; is an open-source, AI-powered terminal workspace that brings AI directly into command-line workflows. Edit files, run commands, and chat with your codebase without leaving the terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Support Us
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Star our GitHub repository: &lt;a href="https://github.com/acrotron/aye-chat" rel="noopener noreferrer"&gt;https://github.com/acrotron/aye-chat&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Spread the word. Share Aye Chat with your team and friends who live in the terminal.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
    </item>
    <item>
      <title>Designing Terminal UX for AI is really about DX (and a few classic UX principles)</title>
      <dc:creator>Vyacheslav Mayorskiy</dc:creator>
      <pubDate>Tue, 16 Dec 2025 14:27:07 +0000</pubDate>
      <link>https://dev.to/vmayorskiyac/designing-terminal-ux-for-ai-is-really-about-dx-and-a-few-classic-ux-principles-3dd6</link>
      <guid>https://dev.to/vmayorskiyac/designing-terminal-ux-for-ai-is-really-about-dx-and-a-few-classic-ux-principles-3dd6</guid>
      <description>&lt;p&gt;A lot of AI devtools talk about models. But an equally hard problem is DX ("Developer Experience"): how do you make an AI assistant feel &lt;em&gt;native&lt;/em&gt; inside the terminal, without breaking the muscle memory developers already have?&lt;/p&gt;

&lt;p&gt;When we built Aye Chat, the UX playbook looked surprisingly “traditional”:&lt;/p&gt;

&lt;h2&gt;
  
  
  1) Hierarchy first: route user intent the way a shell does
&lt;/h2&gt;

&lt;p&gt;In a terminal, ambiguity is the enemy. So Aye Chat treats input with a clear priority:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in commands (&lt;code&gt;help&lt;/code&gt; / &lt;code&gt;restore&lt;/code&gt; / &lt;code&gt;model&lt;/code&gt; / etc.)&lt;/li&gt;
&lt;li&gt;Shell commands (run the real command directly)&lt;/li&gt;
&lt;li&gt;Everything else becomes an AI prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a classic hierarchical decision tree: predictable, fast, and easy to learn. The key DX benefit is you don’t “leave the terminal mindset” to use AI.&lt;/p&gt;
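&lt;p&gt;As a sketch, the routing can be as small as this (illustrative names; the real dispatcher handles more cases):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import shutil

BUILTINS = {"help", "restore", "model", "diff", "undo"}

def route(line):
    """Builtin command, real shell command, or AI prompt - in that order."""
    first = line.strip().split(maxsplit=1)[0] if line.strip() else ""
    if first in BUILTINS:
        return "builtin"
    if shutil.which(first):   # resolves against PATH, like a shell would
        return "shell"
    return "ai_prompt"        # everything else goes to the model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

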

&lt;h2&gt;
  
  
  2) Progressive disclosure: power features only appear when you need them
&lt;/h2&gt;

&lt;p&gt;The terminal is already dense; AI shouldn’t add UI clutter.&lt;/p&gt;

&lt;p&gt;Aye Chat keeps the default experience minimal (type a prompt, get an answer), then progressively reveals capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;help&lt;/code&gt; exists, but it’s not forced into every interaction.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;verbose on|off&lt;/code&gt; toggles how much operational detail you see.&lt;/li&gt;
&lt;li&gt;RAG context selection is usually invisible; in verbose mode you can see which files were included.&lt;/li&gt;
&lt;li&gt;For large projects, indexing runs in the background; you only see progress if you opt into verbosity (and it shows as a small inline hint in the prompt).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the “happy path” clean while still supporting the power-user path.&lt;/p&gt;

&lt;h2&gt;
  
  
  3) “Optimistic UX” with an explicit escape hatch
&lt;/h2&gt;

&lt;p&gt;AI edits are only useful if they’re fast, but speed without safety kills trust.&lt;/p&gt;

&lt;p&gt;Aye Chat leans into an optimistic workflow: apply changes directly, but make rollback instantaneous.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every update creates snapshots automatically.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;diff&lt;/code&gt; is the review step.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;restore&lt;/code&gt; / &lt;code&gt;undo&lt;/code&gt; is the safety net.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DX-wise, this shifts the mental model from “AI is risky” to “AI is reversible”: you can experiment without fear. Developers move faster because the cost of being wrong is low.&lt;/p&gt;

&lt;h2&gt;
  
  
  4) Reduce cognitive load with “focus tools” instead of more UI
&lt;/h2&gt;

&lt;p&gt;Two small primitives do a lot of work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@file&lt;/code&gt; references: include a file inline, on demand.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;with &amp;lt;files&amp;gt;: &amp;lt;prompt&amp;gt;&lt;/code&gt;: constrain the problem to specific files (supports wildcards).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is progressive disclosure again: you don’t need to learn these on day one, but when prompts get vague, you have a precise way to tell the system what matters.&lt;/p&gt;
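&lt;p&gt;A sketch of what the &lt;code&gt;with &amp;lt;files&amp;gt;: &amp;lt;prompt&amp;gt;&lt;/code&gt; constraint could look like under the hood (the syntax is the one described above; the parsing names are assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import fnmatch

def parse_focus(line, project_files):
    """Split 'with &amp;lt;files&amp;gt;: &amp;lt;prompt&amp;gt;' into (matched files, prompt).
    Without the prefix, the whole project stays in scope."""
    if not line.startswith("with "):
        return list(project_files), line
    spec, _, prompt = line[5:].partition(":")
    patterns = spec.split()
    matched = [f for f in project_files
               if any(fnmatch.fnmatch(f, p) for p in patterns)]
    return matched, prompt.strip()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

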

&lt;h2&gt;
  
  
  5) Interaction design for the terminal: make latency feel manageable
&lt;/h2&gt;

&lt;p&gt;AI latency is real; good terminal UX acknowledges it instead of hiding it.&lt;/p&gt;

&lt;p&gt;Aye Chat uses “progressive waiting” messages (spinner text that changes over time):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Building prompt…”&lt;/li&gt;
&lt;li&gt;“Sending to LLM…”&lt;/li&gt;
&lt;li&gt;“Still waiting…”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a small thing, but it makes the experience feel responsive and honest.&lt;/p&gt;
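&lt;p&gt;Progressive waiting is essentially a threshold table. A sketch (the timings here are made up, not Aye Chat's actual values):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def waiting_message(elapsed):
    """Return the spinner text for the time already spent waiting."""
    stages = [
        (0.0, "Building prompt..."),
        (1.0, "Sending to LLM..."),
        (8.0, "Still waiting..."),
    ]
    # Pick the last stage whose threshold has been reached.
    return [msg for t, msg in stages if elapsed &amp;gt;= t][-1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

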

&lt;h2&gt;
  
  
  6) Autocomplete as a DX feature (not a gimmick)
&lt;/h2&gt;

&lt;p&gt;Terminal tools win when they help you stay in flow. Aye Chat’s completion behavior is designed for that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-column completions for readability.&lt;/li&gt;
&lt;li&gt;Special-cased auto-complete for &lt;code&gt;@file&lt;/code&gt; contexts.&lt;/li&gt;
&lt;li&gt;A user setting to choose “readline-like” vs “complete while typing”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The principle here is consistency: the UI adapts to the terminal, not the other way around.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;If you’re building AI for developers, UX principles like hierarchy, progressive disclosure, and reversible actions aren’t “nice to have.” They’re the difference between a demo and a daily driver.&lt;/p&gt;




&lt;h2&gt;
  
  
  About Aye Chat
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aye Chat&lt;/strong&gt; is an open-source, AI-powered terminal workspace that brings the power of AI directly into your command-line workflow. Edit files, run commands, and chat with your codebase without ever leaving the terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Support Us
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Star our GitHub repository: &lt;a href="https://github.com/acrotron/aye-chat#aye-chat-ai-powered-terminal-workspace" rel="noopener noreferrer"&gt;https://github.com/acrotron/aye-chat#aye-chat-ai-powered-terminal-workspace&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Spread the word. Share Aye Chat with your team and friends who live in the terminal.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>cli</category>
      <category>programming</category>
      <category>ux</category>
    </item>
    <item>
      <title>The Day I Stopped Babysitting an AI: Building the Optimistic Workflow</title>
      <dc:creator>Vyacheslav Mayorskiy</dc:creator>
      <pubDate>Thu, 04 Dec 2025 12:57:00 +0000</pubDate>
      <link>https://dev.to/vmayorskiyac/i-got-so-annoyed-with-ai-coding-assistants-that-i-built-my-own-pa7</link>
      <guid>https://dev.to/vmayorskiyac/i-got-so-annoyed-with-ai-coding-assistants-that-i-built-my-own-pa7</guid>
      <description>&lt;p&gt;You know that feeling when you're trying to get work done and someone keeps tapping your shoulder for approval? "Is this okay?" Tap. "How about this?" Tap. "Should I do this next?" Tap tap tap.&lt;/p&gt;

&lt;p&gt;That's what using most AI coding assistants felt like to me. I'd ask it to add a feature, and then - instead of just doing it - it would present me with a diff and wait. Like a puppy showing me its toy, waiting for validation. Every. Single. Time.&lt;/p&gt;

&lt;p&gt;I was babysitting an AI like it was a toddler with scissors.&lt;/p&gt;

&lt;p&gt;And I kept thinking: "These models are getting scary good. Claude 4.5 Sonnet gets things right maybe 70-80% of the time on the first try. Why am I still approving every comma it wants to add?"&lt;/p&gt;

&lt;p&gt;So I built something different. Something that would just &lt;em&gt;do the thing&lt;/em&gt; and let me fix it if it screwed up. I called it the &lt;strong&gt;optimistic workflow&lt;/strong&gt;, and it became the heart of Aye Chat.&lt;/p&gt;

&lt;p&gt;This is the story of how that came together - and why it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Undo Button on Steroids
&lt;/h2&gt;

&lt;p&gt;Letting an AI write directly to your files sounds insane at first. What if it misunderstands and deletes your entire database layer? What if it gets creative and rewrites your authentication in a way that locks everyone out?&lt;/p&gt;

&lt;p&gt;The only way the optimistic approach works - the &lt;em&gt;only&lt;/em&gt; way - is if you have a perfect undo button. Not Git (too slow, too manual). Not Ctrl+Z (only works in editors). Something instant. Something automatic. Something that happens &lt;em&gt;before&lt;/em&gt; the AI even thinks about touching your files.&lt;/p&gt;

&lt;p&gt;That's what I built first: a snapshot engine that acts like a paranoid librarian.&lt;/p&gt;

&lt;p&gt;You know those rare book libraries where they photograph every page before letting you read it? That's the idea. Before the AI writes &lt;em&gt;anything&lt;/em&gt;, we take a perfect snapshot of what's there. The whole file, exactly as it was, tucked away in &lt;code&gt;.aye/snapshots/&lt;/code&gt; with a timestamp and the prompt that triggered the change.&lt;/p&gt;

&lt;p&gt;Only &lt;em&gt;then&lt;/em&gt; does the AI get to write.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# aye/model/snapshot.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply_updates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updated_files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    1. Take a snapshot of the *current* files.
    2. Write the new contents supplied by the LLM.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;file_paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;updated_files&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;batch_ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_snapshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_paths&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Safety net FIRST
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;updated_files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;fp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Then write
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;batch_ts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This backup-first approach is non-negotiable. It's the foundation that makes everything else possible. No snapshot, no write. Period.&lt;/p&gt;

&lt;p&gt;And here's the beautiful part: the snapshot isn't just a file copy. It's a time capsule. It remembers &lt;em&gt;why&lt;/em&gt; the change happened (your prompt is in the metadata), &lt;em&gt;when&lt;/em&gt; it happened (timestamp), and &lt;em&gt;exactly&lt;/em&gt; what was there before. It's version control purpose-built for the "try -&amp;gt; fail -&amp;gt; undo" loop of AI coding.&lt;/p&gt;
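&lt;p&gt;A sketch of what such a time capsule can look like on disk (the directory layout and field names are assumptions, not the actual &lt;code&gt;.aye/snapshots/&lt;/code&gt; format):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import shutil
import time
from pathlib import Path

def create_snapshot(file_paths, prompt, root=Path(".aye/snapshots")):
    """Copy each file into a timestamped batch dir, plus metadata
    recording when the snapshot happened and which prompt caused it."""
    batch_ts = time.strftime("%Y%m%d-%H%M%S")
    batch_dir = root / batch_ts
    batch_dir.mkdir(parents=True, exist_ok=True)
    for fp in file_paths:
        if fp.exists():
            shutil.copy2(fp, batch_dir / fp.name)
    meta = {"timestamp": batch_ts, "prompt": prompt}
    (batch_dir / "meta.json").write_text(json.dumps(meta), encoding="utf-8")
    return batch_ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

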

&lt;h2&gt;
  
  
  Two Commands That Changed Everything
&lt;/h2&gt;

&lt;p&gt;A safety net is useless if it's tangled up in a closet somewhere. The whole point of moving fast is... well, moving fast. So reviewing and reverting changes had to be just as instant as making them.&lt;/p&gt;

&lt;p&gt;Enter two stupidly simple commands: &lt;code&gt;diff&lt;/code&gt; and &lt;code&gt;restore&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Seeing What Changed: &lt;code&gt;diff&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;After the AI makes a change, you naturally wonder: "Okay, what exactly did you do?" Instead of opening another tool or switching windows, you just type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» diff calculator.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Boom. Instant, colorized diff right in your terminal. It's just comparing the live file against the snapshot we took two seconds ago. You see the change, you understand it, you move on. No context switch. No friction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# aye/presenter/diff_presenter.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;show_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Show diff between two files using system diff command.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--color=always&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-u&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file1&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
            &lt;span class="c1"&gt;# ...
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;FileNotFoundError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Fallback to Python's difflib if system diff is not available
&lt;/span&gt;        &lt;span class="nf"&gt;_python_diff_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's so simple it's almost boring. But that's the point. Boring infrastructure that just works is what lets you focus on the interesting stuff.&lt;/p&gt;
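&lt;p&gt;The &lt;code&gt;difflib&lt;/code&gt; fallback in the &lt;code&gt;except&lt;/code&gt; branch can be sketched in a few lines (assumed shape; the real &lt;code&gt;_python_diff_files&lt;/code&gt; may format output differently). The argument order mirrors the subprocess call: snapshot first, live file second, so additions show the AI's changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import difflib

def python_diff_files(file1, file2):
    """Pure-Python unified diff for systems without a diff binary."""
    old = file2.read_text(encoding="utf-8").splitlines(keepends=True)
    new = file1.read_text(encoding="utf-8").splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, str(file2), str(file1)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

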

&lt;h3&gt;
  
  
  The Magic Undo: &lt;code&gt;restore&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;And if you don't like what you see? If the AI got creative in the wrong way?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» restore calculator.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The file is back to exactly how it was. The AI's change is gone, like it never happened. No Git commands to remember, no stash juggling, no "wait which commit was that again?"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# aye/model/snapshot.py
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;restore_snapshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ordinal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ordinal&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;file_name&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;snapshots&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list_snapshots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;snapshots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No snapshots found for file &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snapshot_path_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;snapshots&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# ...
&lt;/span&gt;        &lt;span class="n"&gt;shutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;snapshot_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;original_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one-command rollback is what makes the optimistic workflow &lt;em&gt;work&lt;/em&gt;. It removes the fear. The cost of a bad AI suggestion drops to near zero - just the three seconds it takes to type &lt;code&gt;restore&lt;/code&gt;. That's when you realize: you're not babysitting anymore. You're collaborating.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Feels in Practice
&lt;/h2&gt;

&lt;p&gt;Let me show you where this gets real. You're working on a simple calculator module with an &lt;code&gt;add()&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# calculator.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You've got tests. They're green. Life is good.&lt;/p&gt;

&lt;p&gt;Then you think: "You know what? This should handle any number of arguments, not just two." So you ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» modify the add &lt;span class="k"&gt;function &lt;/span&gt;to take a list of numbers instead of two arguments
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI doesn't ask for permission. It just does it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-{•!•}- » I have modified the `add` function to accept a list of numbers.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Behind the scenes, the snapshot happened first (safety net deployed), then the file got rewritten. You're curious what changed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» diff calculator.py
&lt;span class="nt"&gt;---&lt;/span&gt; .aye/snapshots/001_.../calculator.py
+++ calculator.py
@@ &lt;span class="nt"&gt;-1&lt;/span&gt;,2 +1,2 @@
&lt;span class="nt"&gt;-def&lt;/span&gt; add&lt;span class="o"&gt;(&lt;/span&gt;a, b&lt;span class="o"&gt;)&lt;/span&gt;:
-    &lt;span class="k"&gt;return &lt;/span&gt;a + b
+def add&lt;span class="o"&gt;(&lt;/span&gt;numbers&lt;span class="o"&gt;)&lt;/span&gt;:
+    &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;numbers&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean. Elegant. The AI understood the assignment. You feel that little dopamine hit - this is going to work.&lt;/p&gt;

&lt;p&gt;So you run the tests (without exiting the session, mind you):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» pytest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;================================ FAILURES =================================
_______________________________ test_add __________________________________

    def test_add():
&amp;gt;       assert add(2, 3) == 5
E       TypeError: add() takes 1 positional argument but 2 were given

test_calculator.py:4: TypeError
========================= 1 failed in 0.12s ==========================
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oh God. OH GOD.&lt;/p&gt;

&lt;p&gt;The tests are broken. The function signature changed but the tests still call it the old way. Your brain immediately starts racing: "Okay I need to update the tests, or wait maybe I should revert this, or maybe I should just fix the function to handle both cases, or—"&lt;/p&gt;

&lt;p&gt;And then you remember: you have an undo button.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» restore calculator.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One second later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;(&lt;/span&gt;ツ» pytest
&lt;span class="o"&gt;=========================&lt;/span&gt; 1 passed &lt;span class="k"&gt;in &lt;/span&gt;0.08s &lt;span class="o"&gt;===========================&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Green again. Crisis averted. Your heart rate returns to normal.&lt;/p&gt;

&lt;p&gt;Total time from "oh no" to "all good": &lt;strong&gt;4 seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's the moment it clicks. You're not afraid anymore. You can try things. Wild things. Aggressive refactors. Experimental rewrites. Because the cost of failure isn't hours of Git archaeology or careful manual rollbacks - it's typing seven letters: &lt;code&gt;restore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can iterate fearlessly. And when you're not afraid to fail, you move &lt;em&gt;fast&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That's the workflow. Fast, confident, reversible. No approval dialogs. No copy/paste. No babysitting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Git Upgrade: From Sedan to Sports Car
&lt;/h2&gt;

&lt;p&gt;Our file-copy snapshot system works great. It's simple, it's reliable, it works on any project with zero setup. It's like a dependable sedan - gets you where you need to go without fuss.&lt;/p&gt;

&lt;p&gt;But I knew developers had a sports car sitting in the garage: &lt;strong&gt;Git&lt;/strong&gt;. And I kept thinking: what if we could give them the option to use it?&lt;/p&gt;

&lt;p&gt;So that's what we're building next: a Git-powered snapshot engine that's faster, more powerful, and integrates seamlessly with tools developers already use.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Plan: Two Engines, One Interface
&lt;/h3&gt;

&lt;p&gt;We're using the &lt;strong&gt;Strategy pattern&lt;/strong&gt; - fancy name for a simple idea. Think of it as building a car with interchangeable engines. Same car, same controls, but you can swap in a different engine depending on what you need.&lt;/p&gt;

&lt;p&gt;Two "engines":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;FileCopyStrategy&lt;/code&gt;&lt;/strong&gt;: The sedan. Our current file-copy logic. Default for non-Git projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;GitStrategy&lt;/code&gt;&lt;/strong&gt;: The sports car. Uses native Git commands. Automatically kicks in for Git repos.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This way, simple projects stay simple, but Git users get superpowers.&lt;/p&gt;
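&lt;p&gt;As a rough sketch of what that Strategy pattern could look like (hypothetical names and interface - the real Aye Chat code may differ):&lt;/p&gt;

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path

class SnapshotStrategy(ABC):
    """The 'car controls': the rest of the tool only talks to this interface."""
    @abstractmethod
    def snapshot(self, path: Path, label: str) -> None: ...
    @abstractmethod
    def restore(self, path: Path) -> None: ...

class FileCopyStrategy(SnapshotStrategy):
    """The sedan: copy the file aside before every write. Zero setup."""
    def __init__(self, store: Path):
        self.store = store
        self.store.mkdir(parents=True, exist_ok=True)

    def snapshot(self, path: Path, label: str) -> None:
        shutil.copy2(path, self.store / path.name)

    def restore(self, path: Path) -> None:
        shutil.copy2(self.store / path.name, path)

class GitStrategy(SnapshotStrategy):
    """The sports car: would shell out to native git (stash push / pop)."""
    def __init__(self, root: Path):
        self.root = root
    def snapshot(self, path: Path, label: str) -> None: ...  # git stash push
    def restore(self, path: Path) -> None: ...               # git stash pop

def pick_strategy(project_root: Path) -> SnapshotStrategy:
    # Auto-detect: Git repos get the sports car, everyone else the sedan.
    if (project_root / ".git").exists():
        return GitStrategy(project_root)
    return FileCopyStrategy(project_root / ".aye" / "snapshots")
```

&lt;p&gt;Same controls either way: the caller never needs to know which engine is underneath.&lt;/p&gt;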

&lt;h3&gt;
  
  
  How the Git Version Will Work
&lt;/h3&gt;

&lt;p&gt;Instead of copying files, we just use Git operations developers already know:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Snapshot (&lt;code&gt;apply_updates&lt;/code&gt;)&lt;/strong&gt;: &lt;code&gt;git stash push -m "aye-chat: &amp;lt;your_prompt&amp;gt;"&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instant, space-efficient, automatically documented&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;View changes (&lt;code&gt;diff&lt;/code&gt;)&lt;/strong&gt;: &lt;code&gt;git diff stash@{0} -- &amp;lt;file_name&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compare against the stashed version&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Undo (&lt;code&gt;restore&lt;/code&gt;)&lt;/strong&gt;: &lt;code&gt;git stash pop&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One command, back to where you were&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
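&lt;p&gt;Sketched as code, the Git engine is mostly a thin wrapper over those three commands. A hypothetical sketch (class and method names are illustrative, not the shipped implementation), assuming &lt;code&gt;git&lt;/code&gt; is on the PATH:&lt;/p&gt;

```python
import subprocess
from pathlib import Path

class GitSnapshotEngine:
    """Hypothetical sketch: the three git operations above, wrapped."""
    def __init__(self, root: Path):
        self.root = root

    def _git(self, *args: str) -> str:
        result = subprocess.run(
            ["git", *args], cwd=self.root,
            check=True, capture_output=True, text=True,
        )
        return result.stdout

    def snapshot(self, prompt: str) -> None:
        # Stash the pre-edit state; instant, space-efficient, self-documenting.
        self._git("stash", "push", "-m", f"aye-chat: {prompt}")

    def diff(self, file_name: str) -> str:
        # Compare the working tree against the stashed version.
        return self._git("diff", "stash@{0}", "--", file_name)

    def restore(self) -> None:
        # One command, back to where you were.
        self._git("stash", "pop")
```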

&lt;p&gt;But here's where it gets interesting.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Power: Partial Acceptance
&lt;/h3&gt;

&lt;p&gt;Ever wanted to accept &lt;em&gt;part&lt;/em&gt; of an AI's suggestion but not all of it? With Git, we can build a &lt;code&gt;review&lt;/code&gt; command that uses &lt;code&gt;git checkout -p stash@{0}&lt;/code&gt;. It lets you approve changes hunk-by-hunk, like a code review.&lt;/p&gt;

&lt;p&gt;You get to say: "Yes to this function, no to that refactor, yes to the docstring, no to the renaming."&lt;/p&gt;

&lt;p&gt;That's not just faster. That's a completely different level of control.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Your Existing Tools
&lt;/h3&gt;

&lt;p&gt;And because the AI's changes are just Git stashes, you can use &lt;em&gt;any&lt;/em&gt; Git tool to inspect them. VS Code's source control panel. &lt;code&gt;lazygit&lt;/code&gt;. Whatever you already use. Aye Chat becomes part of your existing workflow instead of replacing it.&lt;/p&gt;

&lt;p&gt;This isn't about using a fancier tool for the same job. It's about unlocking a workflow that wasn't possible before - one where you can collaborate with an AI at the speed of thought, with surgical precision when you need it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This All Means
&lt;/h2&gt;

&lt;p&gt;The optimistic workflow isn't just a feature. It's a different way of thinking about AI collaboration.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt -&amp;gt; Review -&amp;gt; Approve -&amp;gt; Apply -&amp;gt; Test&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt -&amp;gt; Apply -&amp;gt; (Test/Review if needed) -&amp;gt; (Undo if wrong)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The approval step vanishes. The AI becomes a collaborator you can trust to act, knowing you have a perfect undo button if it goes sideways.&lt;/p&gt;

&lt;p&gt;It took me about a week to build the first version - nights and a weekend, obsessively iterating. The snapshot engine took a day. The &lt;code&gt;diff&lt;/code&gt; and &lt;code&gt;restore&lt;/code&gt; commands took a few hours. The Git integration is still in progress.&lt;/p&gt;

&lt;p&gt;But the feeling of it? That happened immediately. The first time I prompted the AI, watched it write code directly to my file, checked the diff, and moved on - all in under 10 seconds - I knew this was different.&lt;/p&gt;

&lt;p&gt;No more babysitting. No more approval fatigue. Just flow.&lt;/p&gt;

&lt;p&gt;That's what we're building. And honestly? It's exhilarating.&lt;/p&gt;




&lt;h2&gt;
  
  
  About Aye Chat
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aye Chat&lt;/strong&gt; is an open-source, AI-powered terminal workspace that brings the power of AI directly into your command-line workflow. Edit files, run commands, and chat with your codebase without ever leaving the terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Support Us 🫶
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Star 🌟 our &lt;a href="https://github.com/acrotron/aye-chat#aye-chat-ai-powered-terminal-workspace" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/strong&gt; It helps new users discover Aye Chat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spread the word&lt;/strong&gt; 🗣️. Share Aye Chat on social media and recommend it to your friends.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>cli</category>
    </item>
    <item>
      <title>I got so annoyed with AI coding assistants that I built my own.</title>
      <dc:creator>Vyacheslav Mayorskiy</dc:creator>
      <pubDate>Thu, 20 Nov 2025 14:11:00 +0000</pubDate>
      <link>https://dev.to/vmayorskiyac/i-got-so-annoyed-with-ai-coding-assistants-that-i-built-my-own-5dfa</link>
      <guid>https://dev.to/vmayorskiyac/i-got-so-annoyed-with-ai-coding-assistants-that-i-built-my-own-5dfa</guid>
      <description>&lt;p&gt;I was browsing and thinking back and reminiscing on how it all started (not that long ago: back in September) – and decided to put it in writing. This is a walk down memory lane to the very beginning.&lt;/p&gt;

&lt;p&gt;With AI assistants becoming more and more powerful, I, as a Python coder, found myself using them more and more. But I operate mostly in terminal SSH sessions, so in practice the loop looked like: "do something in a terminal -&amp;gt; ok, need to solve this problem -&amp;gt; copy/paste into ChatGPT -&amp;gt; receive response -&amp;gt; copy/paste into the terminal. Repeat".&lt;/p&gt;

&lt;p&gt;At some point this process of switching back and forth and back and forth became so frustrating that I decided to do something about it.&lt;/p&gt;

&lt;p&gt;The very first thing I did was install a CLI AI assistant. I thought, "Aha! I am not the first: someone already solved this pain." Alas, the experience was terrifying: after install, it sent me back to the web to log in, and then it started working - and Oh My God was it full of itself! Every little sneeze, it had to tell me about it, and every little thing it generated needed approval from me. It acted like a little puppy looking for validation: "Did I do it right? Do you like me? Wait, I have some more, do you like it as well?"&lt;/p&gt;

&lt;p&gt;I was even more annoyed, to say the least - to the degree that I decided to build something from scratch: something that would address all the pain points that had accumulated from web copy/pasting and from the CLI assistant asking for approval.&lt;/p&gt;

&lt;p&gt;So Aye Chat was born (it had a different name at the time, of course). The main problem I wanted to solve was the constant nagging by the famous CLI assistant. It seemed unnatural that in our days, when LLMs are reasonably solid and require correction maybe 20-30% of the time if that, I would need to approve every little thing. I decided that my tool would make updates automatically.&lt;/p&gt;

&lt;p&gt;The main problem with that, of course, is the catastrophic disaster when the LLM does screw up and you lose everything you worked so hard on.&lt;/p&gt;

&lt;p&gt;Digression: in the 1990s, the PalmPilot became one of the most successful handheld devices not because of the hardware itself (many companies were making handhelds) but because it introduced a safety net: syncing your content to a PC. With that, even if you dropped it in water and it became turtle food (or nest, or mirror - whatever), your data was safe.&lt;/p&gt;

&lt;p&gt;With the LLM making updates automatically, it was very obvious that there needed to be some kind of rock-solid safety net. Implementing it was a technicality: before every update, just save the file version, and if the update was the result of the LLM being drunk or whatever, just restore that version.&lt;/p&gt;

&lt;p&gt;That was the very first feature that went into this tool: get a response from the LLM - if files need to change, save them first, then apply the update. Well, "very first feature" after the trivial integration with an LLM endpoint, of course: nowadays everybody and their guppy does that, it seems.&lt;/p&gt;
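&lt;p&gt;The whole safety net fits in a few lines. A minimal sketch of the idea (function names and the snapshot location are hypothetical, not the actual Aye Chat code):&lt;/p&gt;

```python
import shutil
from pathlib import Path

def apply_llm_update(path: Path, new_content: str, snap_dir: Path) -> None:
    """Save the current version first, then let the LLM's update land."""
    snap_dir.mkdir(parents=True, exist_ok=True)
    if path.exists():
        shutil.copy2(path, snap_dir / path.name)  # the safety net
    path.write_text(new_content)

def restore_last(path: Path, snap_dir: Path) -> None:
    """The LLM was drunk? Put the saved version back."""
    shutil.copy2(snap_dir / path.name, path)
```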

&lt;p&gt;With automatic updates we resolved the first pain point: having to approve every suggestion from the LLM.&lt;/p&gt;

&lt;p&gt;I was not looking for complex implementations: again, this was just another custom tool for myself, so I did not care what others would think, because there were no others. There was no fancy extraction of file fragments before sending to the LLM, no cumbersome reassembly of fragments when receiving them back: simplicity was the name of the game. I sent full file content and received updates as full file content. Moreover, because my projects are fairly small in nature (a bunch of AWS Lambda functions, a bunch of Terraform, a bunch of shell scripts), I would send the entire codebase at once and it would still fit into the LLM context window. On the upside: there was no missing content and no ambiguity on the LLM side: it had everything it needed to make edits.&lt;/p&gt;

&lt;p&gt;With those two things - saving files automatically with rudimentary version control, and sending all files - all of a sudden I had a miniature powerhouse, which eliminated the need to go to the web for copy/pasting, and that alone reduced the wasted time by at least 30%. Not bad for a 2-day implementation. And since I was sending all files, it did not even occur to me to add a flag for naming files in the prompt one by one: of course a wildcard mask does the job, what else.&lt;/p&gt;

&lt;p&gt;After that I became greedy. And not “good greedy”, where you want something more but everybody wins: I wanted it all for myself. The next thing I noticed was that I kept switching back and forth between the tool session, where I was talking to the AI, and another terminal, where I was doing edits and running commands. (Yes, you guessed it: I am not a tmux user. Sue me.) “We eliminated going to the web already: what’s stopping us from eliminating switching between terminals?”&lt;/p&gt;

&lt;p&gt;And so the next major thing went in: shell integration. With the tool (let’s start calling it “Aye Chat”: it was that already by then) and the lightning-fast speed of iterating from version to version, shell integration took less than an hour. And not just “ls -la” type commands: I was able to open vim right from that very session. Later came “cd” as well, when I started missing it and confinement to a single directory started being painful.&lt;/p&gt;

&lt;p&gt;All of the above happened in the span of one week, mind you: nights and a weekend. I spent another week on I can’t even remember what - but it became an obsession: there was no downtime, there was only my main work, and then all free time went into Aye Chat, and it kept getting better and better.&lt;/p&gt;

&lt;p&gt;By the end of the third week it had so many features that it became rather obvious this was no longer just a side project: it was a product in the making, one that could tremendously improve productivity for others in the same position. And not just because of AI-assisted code generation, but because of the user experience built on top of it. I tried to eliminate most friction points I came across - and the experience is now exhilarating: you think of something, you ask the tool to do it for you, and it does. You run the tests, you see failures, you ask the tool to fix them, and it does.&lt;/p&gt;

&lt;p&gt;Of course it’s a fresh product and many things are still being improved - but the pleasure of having a tool work for you, instead of fighting a tool to make it useful - that pleasure is real already.&lt;/p&gt;

&lt;p&gt;Who knows what's to come: we now share the development load with our small team (we are a consulting company that my friend and I started). We keep building on it and keep using it ourselves in our projects, and we now have a roadmap, a sprint board, and scrums three times a week. Aye Chat can now handle larger projects: with built-in RAG (Retrieval-Augmented Generation) capabilities, it has privacy-oriented, enterprise-grade features such as an offline operation mode, and more. And it keeps growing. And what's more important: however few users we have, they seem to like it.&lt;/p&gt;

&lt;p&gt;Or using the words of one of our users: “It looks very promising!” Let’s leave it at that.&lt;/p&gt;




&lt;p&gt;If you liked what you read - star 🌟 our GitHub repository (&lt;a href="https://github.com/acrotron/aye-chat" rel="noopener noreferrer"&gt;https://github.com/acrotron/aye-chat&lt;/a&gt;). It helps new users discover Aye Chat.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cli</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
