DEV Community: vinz

I built a global session browser for Codex CLI because I got tired of losing the thread

vinz — Fri, 20 Mar 2026 18:13:12 +0000

When you use Codex CLI across multiple projects, sessions start to pile up.

At first, that is fine. Then a few days later you remember that one useful conversation exists somewhere, but you do not remember which repo you were in, what you named the session, or whether you even renamed it at all.

That friction is small, but it compounds.

I built Codex Session Hub to remove that problem.

It is an open source CLI tool that gives Codex a global session browser. Instead of jumping between folders and trying to manually recover context, I can run one command, search every session on my machine, preview the context, and resume the right one immediately. The project is a PowerShell 7 tool built around fzf, and the current release already supports browsing, resuming, renaming, resetting titles, deleting sessions, previewing context, and a one-line installer.

The problem was not Codex itself

Codex already lets you resume sessions.

The annoying part was everything around that.

The real issue was session sprawl:

sessions spread across different projects
weak memory of where a conversation happened
no fast global view across all work
too much folder switching for something that should be instant

That is the gap Codex Session Hub tries to close. The tool exists specifically to browse Codex sessions across projects from one command, resume directly into the correct project directory, rename sessions with persistent aliases, bulk delete sessions, and preview context before resuming. It is intended to work on Windows, macOS, and Linux as long as PowerShell 7 is available.

I am open to suggestions and contributions that expand compatibility.

What the tool does

The core idea is simple: treat all Codex sessions on your machine like a searchable workspace instead of isolated per-project artifacts.

With csx, I wanted the flow to feel closer to a fuzzy finder for thought history:

open one browser
search by folder, project, session number, or title
inspect enough context to avoid guessing
jump back into the right place

The current command set includes:

csx
csx browse
csx browse desktop
csx rename <session-id> --name "My friendly alias"
csx reset <session-id>
csx delete <session-id>
csx doctor

The interactive browser also supports keyboard actions like resume on Enter, multi-select with Tab, delete on Ctrl-D, rename on Ctrl-E, and reset on Ctrl-R. Search supports text queries, numeric session prefixes, and title filters such as title:<term> or t:<term>.
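
To make the search syntax concrete, here is a minimal JavaScript sketch of how a query could be sorted into the three forms described above. It is an illustration only, not the tool's implementation (Codex Session Hub itself is PowerShell), and `parseQuery` and its return shape are hypothetical names I made up for this example.

```javascript
// Hypothetical sketch: classify a search query as a title filter,
// a numeric session prefix, or plain text. Not the tool's actual code.
function parseQuery(raw) {
  const query = raw.trim();

  // title:<term> or t:<term> restricts matching to the session title.
  const titleMatch = query.match(/^(?:title|t):(.*)$/i);
  if (titleMatch) {
    return { kind: "title", term: titleMatch[1].trim() };
  }

  // A query made only of digits is treated as a session number prefix.
  if (/^\d+$/.test(query)) {
    return { kind: "number-prefix", term: query };
  }

  // Everything else is a free-text query.
  return { kind: "text", term: query };
}

console.log(parseQuery("t:auth refactor"));
console.log(parseQuery("42"));
console.log(parseQuery("checkout bug"));
```

In the real tool the matching itself is handled by fzf; the sketch only shows how the three query shapes differ.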

Why I made it this way

I did not want a heavy wrapper around Codex.

I wanted a tool that stays close to the terminal, does one job clearly, and fits into an existing CLI workflow.

That is why the stack is straightforward:

PowerShell 7 for portability and shell integration
fzf for fast selection and interaction
a user-local install flow so setup is minimal

The repo documents PowerShell 7, Codex CLI in PATH, and fzf in PATH as the main requirements. It also includes a self-bootstrapping install.ps1, a self-contained uninstall.ps1, and a doctor command to verify setup.

Installation

If you already have PowerShell 7, Codex CLI, and fzf, the recommended install is:

irm https://raw.githubusercontent.com/vinzify/Codex-Session-Hub/master/install.ps1 | iex
. $PROFILE
csx doctor

The README lists default install locations as %LOCALAPPDATA%\CodexSessionHub on Windows and ~/.local/share/codex-session-hub on macOS and Linux. It also documents uninstall via a matching one-line script.

What shipped in v0.1.0

The first tagged release is v0.1.0, published on March 20, 2026 on GitHub. The changelog describes it as the initial modular fzf-based release and lists global session browsing grouped by project, browser actions for resume/rename/reset/delete, direct CLI commands, preview panes, install and uninstall scripts, and CI updates.

For a first public version, that is enough surface area to be useful without pretending the project is more mature than it is.

What I care about next

The project is still early.

What matters now is whether it actually reduces recovery time when switching between projects and whether the browser model is the right abstraction for how people use Codex day to day.

That means the next useful feedback is not vague praise. It is concrete friction:

where session discovery still feels slow
what metadata is missing in preview
whether aliases are enough or tagging is needed
where the install flow breaks on real machines
whether PowerShell is the right default layer long term

Why open source this

Because this is the kind of tool that gets better from real usage patterns.

Everyone accumulates terminal scars a little differently. One person wants cleaner recovery. Another wants better naming. Someone else wants bulk actions because they treat sessions as disposable.

Open source is the easiest way to let the tool meet actual behavior instead of my assumptions about behavior.

Try it

If you use Codex CLI heavily and your session history is turning into junk drawer state, try it and tell me where it breaks:

GitHub repo: https://github.com/vinzify/Codex-Session-Hub

If the tool saves you time, good.

If it reveals that the current model is wrong, that is useful too.

That is the point.

The state of AI agents in March 2026, and how to build a topic-specific one

vinz — Thu, 12 Mar 2026 17:24:53 +0000

A year ago, a lot of "agent" talk was just prompt theater wearing a trench coat.

A loop called a model, maybe hit one tool, maybe dumped some text into memory, and people called it autonomous. The demos were shiny. The reliability was not.

By March 2026, the interesting change is not that models suddenly became magical. The interesting change is that the surrounding infrastructure matured enough that agents are now useful in narrow, well-instrumented slices of real work.

That distinction matters.

An agent is not just "an LLM with a task." In practice, an agent is a system that can:

decide when to use tools
operate in a loop
retrieve context from external systems
keep state across steps
hand work to specialized components
expose traces so humans can inspect what happened
stay inside safety and policy boundaries

That is a very different beast from a chatbot with a longer prompt.

In this article, I want to do two things:

Give a clean snapshot of how the agent landscape actually changed by March 2026.
Show a practical tutorial for building a topic-specific agent instead of a vague "general AI employee" fantasy machine.

The big shift: from prompt wrappers to systems

The early agent wave mostly failed in predictable ways:

too much autonomy, not enough verification
too many tools, poorly described
brittle long-context behavior
no observability
no clear domain boundaries
no evals, only vibes

That produced agents that looked clever in demos and fell apart under repetition, ambiguity, or adversarial input.

The current generation is more grounded. The best teams now treat agents as software systems with probabilistic components, not as mystical employees in the cloud.

That shift shows up in five concrete changes.

1. Tools became first-class, not bolted on

A major shift in 2025 and early 2026 was the standardization of tool use.

Instead of building every agent around custom glue code, platforms started exposing built-in and structured tool interfaces for things like:

web search
file retrieval
code execution
browser or computer interaction
external APIs
remote tool servers

This matters because raw model intelligence is rarely enough. Useful work usually depends on external state.

Without tools, the model hallucinates.
With tools, it can at least fail against reality.

That does not make it automatically correct. It just means the system now has a way to check reality instead of freehanding nonsense like a sleep-deprived intern.

2. Agent frameworks got more opinionated

By March 2026, the ecosystem is much less "just write a while loop and pray."

The winning direction is not maximum flexibility. It is constrained orchestration:

explicit handoffs between specialized agents
typed tool interfaces
tracing and replay
guardrails and policy checks
state management
evaluation hooks

This is healthy.

The field had to learn the same lesson distributed systems learned long ago: once a workflow spans multiple steps, hidden state and silent failure become the real monster under the bed.

3. Protocols matter now

One of the most important structural changes is the rise of shared protocols for tool and context access, especially MCP, the Model Context Protocol.

That sounds boring. It is not. Boring infrastructure is where ecosystems become real.

A standard protocol means agents do not need bespoke integration logic for every tool source. It also means tool ecosystems can compound instead of fragmenting into provider-specific fiefdoms.

In plain English: the future is less "one giant assistant that owns everything" and more "many tools and data sources connected through common interfaces."

4. The best agents are vertical, not universal

This is the most useful practical lesson.

General-purpose agents remain fragile. Topic-specific agents are where the real value is.

Why?

Because narrow scope lets you control:

the tool set
the retrieval corpus
the failure modes
the success criteria
the review process
the output schema

That drastically improves reliability.

A research agent for accessibility guidance, a support triage agent for a known product surface, or a CI assistant for one codebase can be genuinely useful.

A fully autonomous do-anything agent is still mostly a very expensive way to generate surprise.

5. Observability and evals are finally part of the conversation

This is the least glamorous change and probably the most important.

In 2024, people asked, "Can the agent do the task?"

In 2026, the sharper question is, "Under which conditions does it fail, how often, and can we detect the failure before it hurts something?"

That is a better question because it treats the agent as an engineering system.

Serious teams now care about:

traces
tool call logs
refusal behavior
hallucination rates
routing accuracy
retry policy
cost per successful task
human escalation thresholds

That is how the field grows up.
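
A few of these metrics are simple arithmetic once you log each run. The sketch below shows one way to compute success rate, escalation rate, and cost per successful task. The record fields (`succeeded`, `escalatedToHuman`, `costUsd`) are assumptions for illustration, not part of any specific framework.

```javascript
// Hypothetical run records; the field names are assumptions for illustration.
function summarizeRuns(runs) {
  const total = runs.length;
  const successes = runs.filter((r) => r.succeeded).length;
  const escalations = runs.filter((r) => r.escalatedToHuman).length;
  const totalCost = runs.reduce((sum, r) => sum + r.costUsd, 0);

  return {
    successRate: total === 0 ? 0 : successes / total,
    escalationRate: total === 0 ? 0 : escalations / total,
    // Failed runs still cost money, so total spend is divided by successes only.
    costPerSuccessfulTask: successes === 0 ? null : totalCost / successes,
  };
}

console.log(
  summarizeRuns([
    { succeeded: true, escalatedToHuman: false, costUsd: 1 },
    { succeeded: false, escalatedToHuman: true, costUsd: 1 },
    { succeeded: true, escalatedToHuman: false, costUsd: 2 },
  ])
);
```

Dividing by successes rather than by all runs is the point: an agent that fails half the time roughly doubles its real cost per useful result.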

What changed across the major ecosystems

Here is the short version, stripped of marketing perfume.

OpenAI

OpenAI pushed the ecosystem toward a more unified agent stack around the Responses API, built-in tools, the Agents SDK, and support for remote MCP-style tool access. The main pattern is clear: one API surface for multi-step, tool-using applications, plus orchestration primitives for handoffs, tracing, and stateful workflows.

Anthropic

Anthropic stayed very influential in the practical design philosophy around agents. Their materials strongly emphasize the distinction between workflows and agents, and they have continued investing in computer use, context engineering, long-running agent harnesses, and MCP-related tooling. That has shaped how many teams think about reliability.

Google

Google pushed heavily on research-style and multimodal agent workflows, including Deep Research and agent-oriented interfaces in the Gemini ecosystem. Their direction has been especially strong in search-heavy, synthesis-heavy, multi-step work.

Microsoft

Microsoft consolidated its story by positioning Microsoft Agent Framework as the successor direction that combines ideas from AutoGen and Semantic Kernel. That is a sign of ecosystem convergence: experiments are giving way to more production-oriented frameworks.

What agents are still bad at

March 2026 is not the dawn of artificial coworkers replacing half your org chart before lunch.

Agents are still weak or unreliable at:

open-ended tasks with fuzzy success criteria
long chains of action without verification
high-risk workflows involving money, privacy, or irreversible actions
ambiguous environments with poor tool descriptions
tasks that require hidden business context not present in retrieval or tools

The deepest recurring problem is simple:

agents amplify ambiguity.

If your task definition is sloppy, your tool design is vague, your retrieval corpus is noisy, or your success criteria are mush, the agent does not rescue the system. It magnifies the mess.

So the modern design rule is not "make the model smarter."
It is "make the problem legible."

Tutorial: build a topic-specific frontend accessibility research agent

Let us build something real and bounded.

Not a fake AGI office worker.
Not a twenty-agent cathedral of confusion.

We will build a frontend accessibility research agent that can:

answer questions about a specific accessibility topic
search the web for current guidance
retrieve from your internal notes or docs
return structured output with sources, recommendations, and caveats

This is useful because accessibility guidance changes, browser support changes, framework behavior changes, and internal design system constraints matter.

A generic assistant will often blur those layers together. A topic-specific agent gives you tighter control.

What we are building

Our agent will focus on one domain:

Accessible form validation for web apps

That means it should reason within a constrained surface:

labels and descriptions
error messaging
ARIA usage
keyboard flow
focus management
screen reader announcements
browser and framework caveats

The agent should not pretend to know everything about all accessibility topics. That restraint is a feature, not a bug.

Architecture

We will use a simple architecture:

A single specialist agent with a narrow system prompt.
Web search for current public guidance.
File search for your internal standards or design system docs.
A strict output schema.
Human review before any change is shipped.

That is already enough to be useful.

Why this works better than a general agent

Because we are constraining all the important dimensions:

domain: accessibility for forms
sources: current web references plus your internal docs
format: structured answer
tooling: only the tools needed for research
action space: analysis and recommendation, not autonomous deployment

That dramatically reduces chaos.

Step 1: install dependencies

npm install openai zod dotenv

We are using JavaScript here because dev.to and frontend people tend to enjoy staying in one runtime instead of spawning seven languages for sport.

Step 2: set up environment variables

Create a .env file:

OPENAI_API_KEY=your_api_key_here

Step 3: define the agent contract

Create accessibility-agent.js:

import OpenAI from "openai";
import { z } from "zod";
import "dotenv/config";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const AccessibilityResponseSchema = z.object({
  topic: z.string(),
  summary: z.string(),
  recommendations: z.array(
    z.object({
      title: z.string(),
      rationale: z.string(),
      priority: z.enum(["high", "medium", "low"]),
    })
  ),
  risks: z.array(z.string()),
  open_questions: z.array(z.string()),
  sources: z.array(
    z.object({
      title: z.string(),
      url: z.string(),
      source_type: z.enum(["web", "internal"]),
    })
  ),
});

const SYSTEM_PROMPT = `
You are a topic-specific frontend accessibility research agent.

Scope:
- Only answer questions about accessible form validation in web applications.
- Prefer current standards and implementation guidance.
- Use tools when needed instead of guessing.
- Separate standards, implementation advice, and assumptions.
- If evidence is weak or conflicting, say so explicitly.

Output rules:
- Return concise, structured analysis.
- Include actionable recommendations.
- Include risks and unresolved questions.
- Cite the sources you relied on.
- Do not invent standards, browser support, or assistive technology behavior.
`;

async function run(question) {
  const response = await client.responses.create({
    model: "gpt-5.4",
    input: [
      {
        role: "system",
        content: SYSTEM_PROMPT,
      },
      {
        role: "user",
        content: `Question: ${question}`,
      },
    ],
    tools: [
      { type: "web_search" },
      {
        type: "file_search",
        vector_store_ids: ["YOUR_VECTOR_STORE_ID"],
      },
    ],
    text: {
      format: {
        type: "json_schema",
        name: "accessibility_research_result",
        // strict mode asks the API to follow the schema exactly
        strict: true,
        schema: {
          type: "object",
          properties: {
            topic: { type: "string" },
            summary: { type: "string" },
            recommendations: {
              type: "array",
              items: {
                type: "object",
                properties: {
                  title: { type: "string" },
                  rationale: { type: "string" },
                  priority: {
                    type: "string",
                    enum: ["high", "medium", "low"],
                  },
                },
                required: ["title", "rationale", "priority"],
                additionalProperties: false,
              },
            },
            risks: {
              type: "array",
              items: { type: "string" },
            },
            open_questions: {
              type: "array",
              items: { type: "string" },
            },
            sources: {
              type: "array",
              items: {
                type: "object",
                properties: {
                  title: { type: "string" },
                  url: { type: "string" },
                  source_type: {
                    type: "string",
                    enum: ["web", "internal"],
                  },
                },
                required: ["title", "url", "source_type"],
                additionalProperties: false,
              },
            },
          },
          required: [
            "topic",
            "summary",
            "recommendations",
            "risks",
            "open_questions",
            "sources",
          ],
          additionalProperties: false,
        },
      },
    },
  });

  const parsed = JSON.parse(response.output_text);
  const validated = AccessibilityResponseSchema.parse(parsed);

  console.dir(validated, { depth: null });
}

run(
  "What is the correct pattern for showing inline form errors accessibly in a React checkout flow, including aria-invalid, aria-describedby, focus handling, and live region usage?"
).catch((error) => {
  console.error(error);
  process.exit(1);
});

Step 4: add your internal docs

If you have internal accessibility notes, design system guidelines, QA checklists, or previous audit findings, put them in a vector store and connect that store to file_search.

The goal is not just to know public best practice.
The goal is to know your constraints.

For example, your internal docs might say:

your design system always renders helper text below fields
your error summary component already exists
your mobile checkout flow cannot steal focus aggressively
a specific screen reader bug has already been documented internally

That kind of context is where topic-specific agents become actually useful.

Step 5: keep the output narrow and inspectable

Do not let the agent free-write essays forever.

Force an answer structure like this:

summary
recommendations
risks
open questions
sources

That gives you three benefits:

Easier downstream rendering.
Easier human review.
Easier evals.

Free-form text feels smart. Structured text is easier to trust.

Step 6: test with adversarial prompts

Now test questions like:

"Should I use aria-live on every field error?"
"Can placeholder text replace labels if the form is simple?"
"Should focus always jump to the first invalid field?"
"Is aria-invalid enough on its own?"

These are useful because they expose overgeneralization.

A bad agent will answer with fake certainty.
A better one will distinguish:

what is required by standard guidance
what is implementation-dependent
what depends on the UX flow
what still needs manual validation with assistive tech
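
One cheap way to spot fake certainty is to flag any adversarial prompt whose answer comes back with no risks and no open questions. The sketch below assumes you have already collected answers in the Step 3 schema. `flagOverconfident` is a hypothetical helper, and the heuristic is a rough signal, not proof that an answer is wrong.

```javascript
// Hypothetical helper: results is an array of { prompt, answer },
// where answer follows the schema defined in Step 3.
function flagOverconfident(results) {
  return results
    .filter(
      ({ answer }) =>
        // Tricky questions should rarely come back with zero caveats.
        answer.risks.length === 0 && answer.open_questions.length === 0
    )
    .map(({ prompt }) => prompt);
}

const sample = [
  {
    prompt: "Should I use aria-live on every field error?",
    answer: { risks: [], open_questions: [] },
  },
  {
    prompt: "Is aria-invalid enough on its own?",
    answer: { risks: ["Needs screen reader testing"], open_questions: [] },
  },
];

console.log(flagOverconfident(sample));
```

Anything this flags goes to a human for a closer look.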

Step 7: add a lightweight evaluator

Even a tiny evaluator helps.

For example, create a checklist that scores whether the answer:

cited at least two sources
included at least one risk
separated evidence from assumption
stayed inside topic scope
avoided recommending placeholder-only labeling

Pseudo-code:

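// Checks a validated answer against part of the checklist above.
// Evidence-vs-assumption separation and topic scope are not covered here;
// they still need human review or a second model pass.
function evaluateAnswer(answer) {
  const failures = [];

  // Require at least two cited sources.
  if (answer.sources.length < 2) {
    failures.push("Too few sources");
  }

  // Require at least one stated risk.
  if (answer.risks.length === 0) {
    failures.push("No risks listed");
  }

  // Crude string match for one known-bad recommendation.
  const textBlob = JSON.stringify(answer).toLowerCase();
  if (textBlob.includes("placeholder can replace label")) {
    failures.push("Unsafe labeling advice");
  }

  return {
    passed: failures.length === 0,
    failures,
  };
}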

This is not glamorous. It is also how you stop your agent from becoming a chaos generator in a nice jacket.

Step 8: know when not to automate

This agent should not automatically:

patch production code
approve accessibility compliance
file legal conformance claims
override manual QA
claim screen reader compatibility without testing

Research support is a good fit.
Compliance authority is not.

That line matters.

How to make this stronger

Once the basic version works, improve it in this order:

1. Shrink the domain further

Instead of "frontend accessibility," pick one narrower area, such as:

form validation
modal dialogs
table navigation
autocomplete widgets
date pickers

Narrower scope usually means better performance.

2. Improve source quality

Weight sources by trust level:

standards and specs
major accessibility references
browser or framework docs
internal audit reports
team conventions

Do not let random SEO soup outrank authoritative references.
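
A minimal sketch of trust weighting, assuming the list above runs from most to least trusted. The tier names and numeric weights are my own placeholders, and `rankSources` is a hypothetical helper rather than part of any SDK.

```javascript
// Trust tiers in the order listed above; names and weights are assumptions.
const TRUST_WEIGHT = {
  standard: 5,
  accessibility_reference: 4,
  vendor_docs: 3,
  internal_audit: 2,
  team_convention: 1,
};

function rankSources(sources) {
  // Unknown tiers get weight 0, so unclassified pages sort last.
  const weightOf = (source) => TRUST_WEIGHT[source.tier] ?? 0;
  // Copy before sorting so the caller's array is not mutated.
  return [...sources].sort((a, b) => weightOf(b) - weightOf(a));
}

console.log(
  rankSources([
    { title: "Some SEO blog", tier: "unknown" },
    { title: "Team wiki", tier: "team_convention" },
    { title: "WCAG", tier: "standard" },
  ]).map((s) => s.title)
);
```

You can then pass only the top few ranked sources to the model, or include the tier in the prompt so the agent can say which claims rest on weaker evidence.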

3. Add source annotations

Ask the agent to label each claim as one of:

standard guidance
implementation recommendation
internal convention
hypothesis needing validation

That is a huge upgrade in clarity.
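
Here is a small sketch of what labeled claims could look like once they come back from the agent. The `claims` array with `text` and `label` fields is an assumed extension of the Step 3 schema, not something the code above already returns.

```javascript
// The four labels from the list above, as machine-friendly identifiers.
const CLAIM_LABELS = [
  "standard_guidance",
  "implementation_recommendation",
  "internal_convention",
  "hypothesis_needing_validation",
];

// Hypothetical helper: groups claims by label and collects anything
// the agent failed to label correctly.
function groupClaimsByLabel(claims) {
  const groups = Object.fromEntries(CLAIM_LABELS.map((label) => [label, []]));
  const unlabeled = [];

  for (const claim of claims) {
    if (CLAIM_LABELS.includes(claim.label)) {
      groups[claim.label].push(claim.text);
    } else {
      unlabeled.push(claim.text);
    }
  }

  return { groups, unlabeled };
}

console.log(
  groupClaimsByLabel([
    { text: "Errors must be identified in text.", label: "standard_guidance" },
    { text: "Our summary component handles focus.", label: "internal_convention" },
    { text: "This probably works in older readers.", label: "guess" },
  ])
);
```

A non-empty `unlabeled` list is itself a useful eval signal: it means the agent drifted from the contract.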

4. Add retrieval filters

Only search files tagged with things like:

accessibility
forms
design-system
validation
checkout

Less retrieval noise, fewer weird answers.
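
As a sketch, the file_search tool entry from Step 3 could be built with an attribute filter like this. It assumes each file in the vector store was uploaded with a `topic` attribute (my own naming), and the `filters` shape follows my reading of the file search documentation, so check it against the current API reference before relying on it.

```javascript
// Hypothetical helper: builds a file_search tool entry restricted to
// files whose assumed `topic` attribute matches one of the given values.
function buildFileSearchTool(vectorStoreId, topics) {
  return {
    type: "file_search",
    vector_store_ids: [vectorStoreId],
    filters: {
      // "or" means a file only needs to match one of the topics.
      type: "or",
      filters: topics.map((topic) => ({
        type: "eq",
        key: "topic",
        value: topic,
      })),
    },
  };
}

console.log(
  JSON.stringify(
    buildFileSearchTool("YOUR_VECTOR_STORE_ID", ["forms", "validation"]),
    null,
    2
  )
);
```

The returned object would replace the plain file_search entry in the `tools` array.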

5. Add a second pass verifier

Use a second model pass to check:

unsupported claims
missing caveats
contradictory recommendations
source-less assertions

Multi-step verification is often more useful than adding more autonomy.

The deeper lesson

The future of agents is probably not one giant omniscient assistant doing everything.

It is more likely a messy ecosystem of:

narrow specialists
shared tool protocols
retrieval layers
policy gates
eval harnesses
human review loops

That sounds less cinematic.
It also sounds a lot more real.

The practical path in 2026 is not:

build an agent that can do anything

It is:

build an agent that can do one thing clearly, with bounded tools, inspectable outputs, and known failure modes

That is how you get something useful before the hype goblin eats your roadmap.

Final thought

Agents did evolve.

But the evolution was not from "dumb" to "intelligent employee."
It was from clever demo objects to tool-using software systems that can be reliable inside narrow boundaries.

That is progress.
It is also a much less magical story.

Which is fine.
Real engineering is usually less magical and more effective.

And honestly, that is the better trade.

What is your experience in launching a Founder Edition “lifetime” program?

vinz — Thu, 12 Mar 2026 13:58:58 +0000

I’m looking for advice from people who have launched early-access founder programs, especially the awkward middle ground between a custom service and a real product.

I’ve built something for a client with a friend that is working well enough to make me think there’s a product here, but it’s still rough around the edges.

The idea

I developed an Instagram-focused AI workflow for an ecommerce/products brand.

The interesting part is not “generate posts with AI.” That part is cheap and mostly commoditized.

What makes this useful is that it is grounded in the company’s actual product catalog and internal documentation.

It ingests things like:

product URLs
catalogs
product sheets
brand guidelines
internal docs
messy PDFs and raw files

From there, it builds structured product and brand context, so when it generates content it is working from actual product knowledge instead of generic prompting.

The result is much better than the usual AI content sludge.

It understands the product better, stays much closer to the brand voice, and is less likely to invent weird positioning or fake claims.

What it currently does

Right now the flow looks roughly like this:

ingest raw product and brand material
structure it into something usable
generate Instagram content from that context
let the founder or marketer review and tweak it
organize it into a content calendar / rotation

So the real value is not just generation. It is more like:

brand understanding
product-grounded content generation
less hallucinated nonsense
more usable output for ecommerce teams

Where I am now

I built this for one customer, and I want to onboard maybe 5 to 10 more ecommerce brands.

Not because I think it is ready for scale, but because I want to:

see whether the problem is consistent across brands
figure out which parts are actually productizable
understand where the current workflow still breaks
stop building in a vacuum

So I’m thinking about a Founder Edition or Founder Pilot style offer.

Something like:

limited spots
direct access to me
rough but high-touch
lower entry price than a polished SaaS later
feedback loop built into the relationship

I’ve also considered some kind of “lifetime” founder deal, but I’m wary of creating a dumb long-term liability when generation costs are variable.

That part smells like an easy way to make future-me miserable.

The question

For people who have done this before:

1. What has your experience been with Founder Edition / lifetime / early adopter offers?

Did it work?

Did it attract the right people, or mostly bargain hunters?

Did “lifetime” create expectation debt later?

2. Where do you find the right early users for something like this?

I’m specifically looking for people in ecommerce who are okay betting on a product that:

is useful already
is not polished yet
still needs iteration
will involve direct feedback and collaboration

Not people who expect a fully self-serve SaaS on day one.

I’m trying to find the kind of founder or product-led marketer who says:

this is rough, but the underlying thing is valuable enough that I want in early.

Where do those people actually hang out?

Twitter / X?
LinkedIn?
niche ecommerce communities?
founder groups?
direct outreach?
warm network only?

3. How would you frame the offer?

I’m still trying to find the cleanest positioning.

The honest version is probably something like:

high-touch founder pilot
Instagram-first for now
trained on your catalog and brand material
better than generic prompting because it understands the product
still early, still being shaped

What I do not want is to oversell this as a fully mature AI SaaS when it clearly isn’t.

My current hypothesis

My current guess is that the right users are not buying “AI content generation.”

They are buying:

less time wasted explaining their product over and over
less off-brand output
less generic content
a system that starts from what the company actually sells

That feels like a stronger wedge than “make Instagram posts with AI.”

But maybe I’m still too close to it.

Would appreciate blunt feedback

A few things I’d love input on:

Would you offer “lifetime” at all in this situation?
How would you structure pricing if generation costs are ongoing?
Where would you look for 5 to 10 early ecommerce users willing to work closely with you?
Does this sound like a real product wedge or just a dressed-up service?

Happy to share more detail if useful. I’m trying to pressure-test whether this should become a real product or stay a bespoke workflow.

Stop Tab-Switching for AI: I Built a Lightweight Rust Popup to Rephrase and Reply Instantly 🦀

vinz — Wed, 04 Mar 2026 19:01:37 +0000

The Problem: The "Copy-Paste" and "AI rephrase" Fatigue

How many times a day do you do this?

Write a rough email or a Slack message.
Realize it sounds too blunt or unprofessional.
Copy the text.
Alt-Tab to a browser.
Paste it into ChatGPT/Claude with a "make this better" prompt.
Copy the result.
Alt-Tab back and paste.

It’s a workflow killer. I wanted a way to "fix" my writing exactly where I was typing, without the overhead of a heavy browser or a 500MB Electron app sitting in my RAM.

That’s why I built PhrasePoP.

What is PhrasePoP?

PhrasePoP is a minimalist, open-source desktop utility that "pops" up over any application via a global shortcut. It’s designed specifically to help you rephrase sentences, polish grammar, or turn quick bullet points into full email responses instantly.

✨ Key Features

Global Overlay: Trigger it anywhere with a hotkey. It stays hidden until you need it.
Rephrase & Reply: Turn "no time for this meeting" into a professional "I'm currently at capacity but would love to sync later."
Privacy & Local AI: It supports Ollama and LocalAI. If you don't want your data leaving your machine, you can run everything locally.
Cloud Support: Prefer speed? It also hooks into OpenAI, Anthropic, and other major providers.
Lightweight AF: Built with Rust and Tauri, so it uses minimal resources.

Why I Chose the Rust + Tauri Stack 🦀

As developers, we care about our system resources. I didn't want another Chrome-instance-masked-as-an-app.

Memory Footprint: By using Tauri, the frontend is rendered using the OS's native webview, and the backend logic is pure Rust. This results in an idle RAM usage of about 50MB—compared to the 400MB+ typical of Electron apps.
Security: Rust's memory safety makes handling clipboard data and API keys much more reliable.
Speed: The "Pop" needs to be instant. The bridge between Rust and the webview ensures that the UI feels snappy and responsive.

Privacy First: Local LLMs

One of the biggest hurdles for AI tools in a professional setting is privacy. Many developers (and companies) aren't comfortable sending every internal email draft to a third-party API.

PhrasePoP allows you to point to a local endpoint. If you have Ollama running a model like llama3 or mistral, PhrasePoP can use it as the engine. Your drafts stay on your hardware.
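
For a sense of what "pointing to a local endpoint" involves, here is a rough JavaScript sketch of a rephrase request against Ollama's `/api/generate` endpoint on its default port. This is not PhrasePoP's code (the app's backend is Rust and may call a different endpoint), and `buildRephraseRequest` and the prompt wording are made up for illustration.

```javascript
// Hypothetical helper: builds a fetch-ready request for a local Ollama server.
function buildRephraseRequest(
  draft,
  { baseUrl = "http://localhost:11434", model = "llama3" } = {}
) {
  return {
    url: `${baseUrl}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        prompt: `Rephrase the following text to sound professional:\n\n${draft}`,
        // Ask for one JSON response instead of a token stream.
        stream: false,
      }),
    },
  };
}

// Usage (needs a running Ollama server, so it is left commented out):
// const { url, init } = buildRephraseRequest("no time for this meeting");
// const res = await fetch(url, init);
// console.log((await res.json()).response);

console.log(buildRephraseRequest("no time for this meeting").url);
```

Because the request only goes to localhost, the draft text stays on your machine.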

Open Source & Future

PhrasePoP is 100% open source. I built it to solve my own frustration, but I’d love to see how the community uses it.

How you can help:

Give it a star: If you find the concept cool, star the repo on GitHub.
Contribute: I'm looking for help with better Linux window management and more "Writing Mode" templates.
Feedback: What’s one writing task that drains your energy every day? Let’s automate it.

Check out the repo here: 👉 https://github.com/vinzify/PhrasePoP

I'd love to hear your thoughts in the comments! Do you prefer local LLMs for your workflow, or are you all-in on cloud APIs?

How I built an AI-powered Git context menu for Windows using Tauri and Rust

vinz — Sun, 01 Mar 2026 00:36:41 +0000

As developers, we commit code constantly. The annoying part is that quick commits tend to force a slow workflow:

Open a heavy IDE (often just to stage files and write a message), or
Run git add . && git commit -m "fix" in a terminal and hope you remember what changed.

I wanted the best parts of both worlds: visual staging like a GUI, but the speed of a terminal.

So I built GitPop: a lightweight Windows File Explorer extension that adds a modern Git commit UI to your right-click menu, with an optional local AI commit generator.

GitPop on GitHub

What GitPop does

From File Explorer, you can right-click inside any repo folder and choose GitPop Here to open a small popup that lets you:

See changed files instantly
Stage and unstage with a clean UI
Review diffs without switching to a separate app (coming soon)
Generate commit messages from staged diffs using local models via Ollama (or your preferred API)

The core goal is simple: make the “small commit” workflow as fast as a shell command, but less blind.

Tech stack and why Tauri

For a context menu popup, the most important metric is startup time. If right-clicking a folder and selecting GitPop Here takes a noticeable moment, it feels broken.

GitPop uses:

Frontend: React + TypeScript + vanilla CSS (glassmorphism-style dark UI)
Backend: Rust
Framework: Tauri v2

I ruled out Electron because it ships a full Chromium runtime and commonly incurs large binary sizes and heavier memory overhead compared to a native-webview approach. Tauri uses the system webview (WebView2 on Windows) with a Rust backend, which fits the “open instantly” requirement much better.

The engineering challenges

Windows integration looks simple from the outside, but there are a few spicy corners. These were the big three.

1. Registering a File Explorer context menu entry (from Rust)

To show GitPop Here in the right-click menu, GitPop needs to register a command in the Windows Registry.

Instead of asking users to run a .reg file (which feels sketchy even when it is not), GitPop can do this via a Rust command in a “Setup Mode”.

This uses the winreg crate and registers under:

HKCU\Software\Classes\Directory\Background\shell\GitPop

That keeps the install per-user (no admin required) and binds the command to the app executable path:

use winreg::enums::*;
use winreg::RegKey;

#[tauri::command]
fn install_context_menu() -> Result<(), String> {
    let hkcu = RegKey::predef(HKEY_CURRENT_USER);

    let exe_path = std::env::current_exe()
        .map_err(|e| e.to_string())?
        .to_string_lossy()
        .into_owned();

    let bg_path = r#"Software\Classes\Directory\Background\shell\GitPop"#;
    let (bg_key, _) = hkcu
        .create_subkey(bg_path)
        .map_err(|e| e.to_string())?;

    bg_key.set_value("", &"GitPop Here").map_err(|e| e.to_string())?;
    bg_key
        .set_value("Icon", &format!("\"{}\"", exe_path))
        .map_err(|e| e.to_string())?;

    let (bg_cmd, _) = bg_key
        .create_subkey("command")
        .map_err(|e| e.to_string())?;

    // %V resolves to the clicked folder path for Directory\Background handlers.
    bg_cmd
        .set_value("", &format!("\"{}\" \"%V\"", exe_path))
        .map_err(|e| e.to_string())?;

    Ok(())
}

Why this approach works well:

It is self-contained and reversible
It avoids “copy this registry text and trust me” instructions
It stays compatible with existing Git setups because GitPop does not try to reconfigure Git
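On the receiving side, the registered command means the app is launched with the clicked folder as its first argument. A minimal sketch of reading it, assuming a hypothetical helper named target_dir_from_args (GitPop's real argument handling may differ):

```rust
use std::path::PathBuf;

/// Picks the target folder from the process arguments.
/// `args` is expected to start with the executable name, like `std::env::args()`.
/// Hypothetical helper for illustration, not GitPop's actual code.
fn target_dir_from_args<I>(args: I) -> Option<PathBuf>
where
    I: IntoIterator<Item = String>,
{
    // Skip argv[0]; the first real argument is what Explorer substituted for %V.
    let raw = args.into_iter().nth(1)?;

    // Defensive cleanup: drop surrounding whitespace and any stray quotes.
    let cleaned = raw.trim().trim_matches('"');
    if cleaned.is_empty() {
        None
    } else {
        Some(PathBuf::from(cleaned))
    }
}

// Typical call site:
// let dir = target_dir_from_args(std::env::args())
//     .or_else(|| std::env::current_dir().ok());
```

Falling back to the current directory keeps the app usable when it is started from a terminal instead of the context menu.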

2. The “flashing terminal” bug when spawning Git on Windows

GitPop does not use libgit2. Instead, the Rust backend spawns native Git CLI commands like:

git status --porcelain
git diff --cached
git commit -m ...

This is intentional: the Git CLI automatically respects the user’s existing SSH keys, credential helpers, hooks, and global configs.
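Going through the CLI means the backend has to parse text output. As a rough sketch of what that can look like for git status --porcelain (v1 format, two status columns followed by the path), here is a small parser. The FileStatus struct and parse_porcelain function are names I made up for illustration. Renames ("old -> new") and quoted paths are not handled.

```rust
/// One entry from `git status --porcelain` (v1 format: "XY path").
/// Illustrative type, not taken from the GitPop source.
#[derive(Debug, PartialEq)]
struct FileStatus {
    path: String,
    staged: bool,    // X column: change recorded in the index
    unstaged: bool,  // Y column: change present only in the working tree
    untracked: bool, // "??" entries
}

fn parse_porcelain(output: &str) -> Vec<FileStatus> {
    output
        .lines()
        .filter_map(|line| {
            // Do not trim the line: a leading space is a meaningful status column.
            let mut chars = line.chars();
            let x = chars.next()?;
            let y = chars.next()?;
            // Columns 0 and 1 are the status, column 2 is a space, the path follows.
            let path = line.get(3..)?;
            if path.is_empty() {
                return None;
            }
            let untracked = x == '?' && y == '?';
            Some(FileStatus {
                path: path.to_string(),
                staged: !untracked && x != ' ',
                unstaged: !untracked && y != ' ',
                untracked,
            })
        })
        .collect()
}
```

A file can be both staged and unstaged at once (status MM), which is why the two flags are tracked separately instead of as a single enum.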

On Windows, though, naïvely calling Command::new("git") can cause a CMD window to flash briefly. It is the kind of micro-annoyance that makes an app feel janky.

The fix is to set a Windows-specific process creation flag so child processes run hidden:

use std::process::Command;

#[cfg(target_os = "windows")]
use std::os::windows::process::CommandExt;

#[cfg(target_os = "windows")]
const CREATE_NO_WINDOW: u32 = 0x08000000;

fn build_hidden_cmd(program: &str) -> Command {
    let mut cmd = Command::new(program);

    #[cfg(target_os = "windows")]
    {
        cmd.creation_flags(CREATE_NO_WINDOW);
    }

    cmd
}

From there, all Git calls use build_hidden_cmd("git") instead of Command::new("git").
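For example, a thin wrapper can run a command in the repo folder and capture its output. This is a sketch under my own assumptions: run_capture is a hypothetical name, and returning stderr as the error string is one reasonable choice, not necessarily what GitPop does.

```rust
use std::path::Path;
use std::process::Command;

#[cfg(target_os = "windows")]
use std::os::windows::process::CommandExt;

#[cfg(target_os = "windows")]
const CREATE_NO_WINDOW: u32 = 0x08000000;

// Same helper as above, repeated so this snippet compiles on its own.
#[allow(unused_mut)]
fn build_hidden_cmd(program: &str) -> Command {
    let mut cmd = Command::new(program);
    #[cfg(target_os = "windows")]
    {
        cmd.creation_flags(CREATE_NO_WINDOW);
    }
    cmd
}

/// Runs `program` with `args` inside `dir`.
/// Returns stdout on success, or stderr (or the spawn error) on failure.
fn run_capture(program: &str, args: &[&str], dir: &Path) -> Result<String, String> {
    let output = build_hidden_cmd(program)
        .args(args)
        .current_dir(dir)
        .output()
        .map_err(|e| format!("failed to start {}: {}", program, e))?;

    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(String::from_utf8_lossy(&output.stderr).into_owned())
    }
}

// Typical call site:
// let staged = run_capture("git", &["diff", "--cached"], Path::new(r"C:\some\repo"))?;
```

Mapping every failure to a String matches the Result<(), String> style of the Tauri command shown earlier, so errors can be passed straight to the frontend.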

3. Tauri v2 capabilities, window transparency, and the invisible app trap

I wanted a transparent, glassy popup. That means:

"transparent": true
Start hidden to avoid a white flash while React loads: "visible": false
Show the window once the UI is ready (window.show())

The catch is that Tauri v2 locks down frontend APIs by default. Without the right capability permissions, the window did not crash. It just stayed invisible while the process happily ran in the background.

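The fix is to explicitly allow the window operations your frontend performs in capabilities/default.json (only the permissions array is shown here; process:allow-exit assumes the process plugin is installed):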

{
  "permissions": [
    "core:window:default",
    "core:window:allow-show",
    "core:window:allow-close",
    "process:allow-exit"
  ]
}

This is one of those “security first” defaults that is correct, but it will absolutely prank you the first time you try to do anything window-related.

The sparkle button: AI commit generation (locally, by default)

Writing commit messages is small, but it adds friction. GitPop’s ✨ Sparkle button reduces that friction:

Stage files
GitPop runs git diff --cached
The staged diff is sent to an LLM to propose a commit message

Privacy matters. Shipping proprietary diffs to a cloud API is a non-starter for a lot of dev work. So GitPop defaults to Ollama, running locally. It detects installed models (for example llama3.2 or qwen2.5-coder) and generates commit messages without API keys, paid tokens, or your diff ever leaving the machine.
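To make the last step concrete, here is a rough sketch of building the request body for a local Ollama server. Ollama's generate endpoint (by default http://localhost:11434/api/generate) accepts a JSON object with model, prompt, and stream fields. The prompt wording, the character cap, and the function names are my own assumptions for illustration, and the escaping is done by hand only to keep the example dependency-free; a real app would more likely use serde_json.

```rust
/// Escapes a string so it can be embedded inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 16);
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the JSON body for a non-streaming request to Ollama's /api/generate.
/// `max_diff_chars` is a hypothetical cap so huge diffs do not flood the model.
fn build_commit_request(model: &str, staged_diff: &str, max_diff_chars: usize) -> String {
    // Truncate on a character boundary, never in the middle of a UTF-8 sequence.
    let diff: String = staged_diff.chars().take(max_diff_chars).collect();
    let prompt = format!(
        "Write a concise, imperative git commit message for this staged diff:\n\n{}",
        diff
    );
    format!(
        "{{\"model\":\"{}\",\"prompt\":\"{}\",\"stream\":false}}",
        json_escape(model),
        json_escape(&prompt)
    )
}
```

Setting stream to false asks Ollama for a single JSON response instead of a stream of chunks, which is simpler to handle when all you need is one short message.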

GitPop also supports OpenAI, Anthropic, Gemini, and custom endpoints for people who prefer hosted models. The model selection is an implementation detail. The UX goal is consistent: stage, sparkle, commit, done.

Try it out

If you are on Windows (macOS support is coming soon) and this fits your workflow, grab the latest installer from the repository:

GitHub: https://github.com/vinzify/gitpop

Feedback, issues, and PRs are welcome. I am also exploring what it would take to bring the same “right-click commit UI” to macOS Finder, where the integration constraints are different but the pain is identical.

Happy committing.