Pramesh

Stop Alt-Tabbing. Meet Code Sensei: The AI Workflow Loop for Your Terminal.

This is a submission for the GitHub Copilot CLI Challenge.


I Built a Terminal That Catches SQL Injections, Reviews Your Code, and Maps Your Architecture — All With a Single Keypress, Powered by GitHub Copilot CLI


What I Built

GitHub Repository: github.com/prameshshah/codesensei


Here is a problem every developer has faced:

You stare at a SQL injection vulnerability for an hour and never see it. You merge a conflict you did not fully understand. You commit code that a senior developer would flag in 30 seconds. You have zero idea what the architecture of a project you just inherited actually looks like.

Security scanners are expensive. Senior reviewers are busy. Architecture diagrams are drawn once and never updated. And your linter has absolutely no idea what your code actually does.

CodeSensei solves all of this — from your terminal — with a single keypress.


The Loophole Most Developers Miss

Here is what I noticed: most developers treat GitHub Copilot CLI as a one-off chat interface — ask a question, get an answer, close it. That leaves most of its value on the table.

The real power of gh copilot -p is that it is a programmable AI engine you can give a persona, a context, and a structured task. So instead of building another chat window, I built a keyboard-driven workflow engine that runs 6 specialist AI personas — each one focused, opinionated, and ready on demand.

The result is CodeSensei — a 3-panel terminal application that gives every developer instant access to:

| Key | Mode | Persona |
|-----|------|---------|
| D | Devil Mode | Penetration tester — finds SQL injection, hardcoded secrets, command injection, weak cryptography |
| L | Learn Mode | Patient teacher — explains any codebase in plain English |
| R | Review Mode | Senior developer — full quality review with a 0–10 score and verdict |
| G | Git Review | Senior developer — reviews only your staged diff before you commit |
| C | Conflicts | Conflict resolver — explains both sides, recommends the correct resolution |
| B | Blueprint | Senior architect — instant full-project structure map for any language |

Select a file. Press a key. Get expert output in seconds. No browser. No context switching. No waiting for a colleague.
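The keypress-to-persona dispatch can be sketched roughly like this. The persona names and prompt text below are illustrative stand-ins, not CodeSensei's actual prompts, and the `gh copilot -p` invocation follows the form described in this post — the exact flag may differ across Copilot CLI versions:

```python
import subprocess

# Hypothetical persona registry — keys mirror the table above, but the
# prompt wording here is illustrative, not CodeSensei's shipped prompts.
PERSONAS = {
    "D": "You are a penetration tester. Find every exploitable vulnerability.",
    "R": "You are a senior developer. Review this code and score it 0-10.",
    "L": "You are a patient teacher. Explain this code in plain English.",
}

def build_copilot_command(key: str, code: str) -> list[str]:
    """Build the CLI invocation for the persona bound to a key.

    Uses the `gh copilot -p` form described in the post; the code is
    embedded inline in the prompt rather than referenced by path.
    """
    prompt = f"{PERSONAS[key]}\n\n---\n{code}"
    return ["gh", "copilot", "-p", prompt]

def run_persona(key: str, code: str) -> str:
    """Run the persona and capture its structured output."""
    result = subprocess.run(build_copilot_command(key, code),
                            capture_output=True, text=True)
    return result.stdout
```

Binding each key in the table to one entry in a registry like this is what turns a chat interface into a workflow engine: the persona, not the user, carries the context.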


What Makes It Different

1. It is a workflow engine, not a chatbot.
Each mode is a different specialist. Press D to find vulnerabilities, press R to score the quality, press G to review what you are about to commit. One tool, multiple expert opinions, zero context switching.

2. The UI never freezes.
Every AI call runs in a background thread via Textual workers. You can navigate the file tree and read previous results while Copilot is still running. Most CLI AI tools block the terminal entirely.
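The non-blocking pattern itself is simple to illustrate with the standard library. This is not CodeSensei's code — Textual's `@work(thread=True)` decorator wraps the same idea inside its event loop — but it shows why the UI stays responsive:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# A worker pool stands in for Textual's thread workers: the slow AI call
# runs off the main thread, so the "UI" keeps handling input meanwhile.
executor = ThreadPoolExecutor(max_workers=2)

def slow_ai_call(prompt: str) -> str:
    time.sleep(0.2)          # stand-in for a multi-second Copilot call
    return f"analysis of: {prompt}"

def on_keypress(prompt: str):
    """Submit the AI call and return immediately — no blocking."""
    return executor.submit(slow_ai_call, prompt)

future = on_keypress("auth.py")
# The main thread is free here: navigate the tree, read old results...
print(future.result())       # collect the answer when it arrives
```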

3. Git Review is scoped to the file you selected.
Not all staged files — only the file you are looking at. When you are reviewing auth.py, you get a review of auth.py, not a 3,000-line diff dump.

4. Blueprint works on every language, instantly.
Python, JavaScript, TypeScript, Java, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, C/C++ — plus CSS selectors, HTML structure, JSON keys, CSV columns, YAML keys. Pure local parsing, no AI call needed, renders in milliseconds.
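For the Python case, the local-parsing idea can be sketched with the stdlib `ast` module. This is a Python-only illustration of the approach — CodeSensei's real Blueprint covers many languages and formats:

```python
import ast

def blueprint(source: str) -> list[str]:
    """List top-level classes, their methods, and module functions.

    Pure local parsing, no AI call — renders instantly regardless of
    file size because nothing leaves the process.
    """
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    lines.append(f"  └─ def {item.name}()")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(f"def {node.name}()")
    return lines

sample = "class Auth:\n    def login(self): ...\n\ndef main(): ..."
print("\n".join(blueprint(sample)))
# class Auth
#   └─ def login()
# def main()
```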

5. It works everywhere a terminal runs.
VS Code terminal, Vim split, SSH sessions, cloud dev boxes, GitHub Codespaces. One command: python app.py /your/project


Demo

GitHub Repository: github.com/prameshshah/codesensei

Quick Start

```bash
git clone https://github.com/prameshshah/codesensei.git
cd codesensei
pip install -r requirements.txt
gh extension install github/gh-copilot
python app.py /path/to/your/project
```

Live Demo Sequence

```text
1. python app.py /your/project
2. Select any file in the file tree
3. Press D  →  security vulnerability scan
4. Press R  →  code quality score (0–10)
5. Press B  →  instant full project blueprint (no file selection needed)
6. git add yourfile.py  (in a second terminal)
7. Press G  →  AI reviews only your staged changes
8. Press C on a file with merge conflicts  →  AI resolves it
9. Press L  →  plain English explanation of any file
```

Screenshot — Application Overview


Screenshot — Devil Mode in Action


Screenshot — Blueprint Mode


Screenshot — Conflict Resolution Mode


My Experience with GitHub Copilot CLI

Building CodeSensei completely changed how I think about what GitHub Copilot CLI actually is.

Most people use gh copilot suggest or gh copilot explain — single-shot commands for one-off questions. But gh copilot -p (the planner mode) is something entirely different. It is a fully programmable AI engine you can give a persona, a codebase, and a specific mission.

The breakthrough insight was this: the quality of the output is entirely determined by the specificity of the persona you give it.

A generic prompt like "review this code" gives a generic answer. But a prompt like:

"You are a penetration tester with 10 years of red team experience. Your job is to find every exploitable vulnerability in this file. Be hostile. Be specific. Include line numbers. Rate each finding CRITICAL / HIGH / MEDIUM / LOW."

...gives you something that genuinely catches bugs you would miss.

I built 6 of these specialist personas — each one tuned for a specific job. The result is a tool that feels less like a chat assistant and more like having a security team, a senior engineer, and an architect available on demand, 24/7, in your terminal.


Merge Conflict Resolution — The Problem Nobody Talks About

Let me paint you a picture that every developer has lived through.

It is Friday afternoon. You have been working on a feature branch for three days. You go to merge it into main. Git returns this:

```text
<<<<<<< HEAD (main branch)
def calculate_discount(user, cart):
    if user.is_premium:
        return cart.total * 0.20
    return 0
=======
def calculate_discount(user, cart, promo_code=None):
    if promo_code and promo_code.is_valid():
        return cart.total * promo_code.discount_rate
    return 0
>>>>>>> feature/promo-codes
```

Two developers touched the same function. One added a premium member discount. One added promo code support. Neither knew the other was working on it. Now you have a conflict — and here is the real problem:

You were not the one who wrote either version.

You are looking at code you did not write, in a function you do not fully understand, under pressure to resolve it correctly without breaking either feature. The wrong merge silently ships broken discount logic to production. Nobody notices until a customer complains.

This happens every day on every team. And the standard tool for resolving it is reading both versions and guessing.

What CodeSensei Does

Press C on any file containing conflict markers. CodeSensei sends the entire file — markers and all — to GitHub Copilot with a specialist conflict resolver prompt:

```text
⚡ CodeSensei — Conflict Resolution
Found 1 conflict(s) in file
─────────────────────────────────

CONFLICT 1 of 1

CURRENT (main):
  Applies a 20% discount for premium users.
  Simple flat-rate logic tied to user membership status.

INCOMING (feature/promo-codes):
  Applies a variable discount using a promo code object.
  Supports dynamic discount rates but ignores premium membership entirely.

THE PROBLEM:
  Both are valid features. Neither version preserves the other.
  A naive merge silently drops either premium discounts or promo code
  support. Your tests may not catch this.

RECOMMENDATION:
  Merge both. The correct resolution:

  def calculate_discount(user, cart, promo_code=None):
      discount = 0
      if user.is_premium:
          discount = cart.total * 0.20
      if promo_code and promo_code.is_valid():
          promo_discount = cart.total * promo_code.discount_rate
          discount = max(discount, promo_discount)
      return discount

  This preserves both features and applies whichever gives the customer
  the better deal — which is almost certainly the intended behaviour.
```

That is not a guess. That is an AI that read both versions, understood the business intent behind each one, identified the silent failure mode, and gave you a merged version that preserves both features.
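Before anything reaches the AI, the tool has to at least detect and count conflicts locally. A minimal sketch of that preprocessing step — simplified, and not necessarily how CodeSensei implements it — looks like this:

```python
def split_conflicts(text: str) -> list[tuple[str, str]]:
    """Extract (ours, theirs) pairs from Git conflict markers.

    Simplified: assumes standard <<<<<<< / ======= / >>>>>>> markers
    and no nested or diff3-style conflicts.
    """
    conflicts, ours, theirs, state = [], [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            ours, theirs, state = [], [], "ours"
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            conflicts.append(("\n".join(ours), "\n".join(theirs)))
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts
```

With both sides isolated, the specialist prompt can ask the model to explain each version's intent before recommending a merge, which is what produces the "Found 1 conflict(s)" header above.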


Git Review — Your Last Line of Defence Before the Commit

Most code reviews happen after the code is merged. A pull request goes up, a colleague reviews it the next morning, and feedback arrives 18 hours after the code was written. By that point the developer has context-switched, forgotten why they made certain decisions, and has to mentally reload the entire change.

CodeSensei's Git Review Mode flips this entirely.

The Real World Scenario

You have been working on auth.py. You added JWT token validation and updated the login endpoint. You are ready to commit. Before you do:

```bash
git add auth.py
```

Select auth.py in CodeSensei. Press G.

```text
⚙ CodeSensei — Git Review
Staged diff of: auth.py
⏱ Response: 8.2s
─────────────────────────────────

REVIEWING: auth.py (staged diff only — 47 lines changed)

CRITICAL — Security Issue (Line 34):
  JWT token is being validated with algorithm="none" allowed in the
  decoder options. This is a known vulnerability — an attacker can
  forge a token by setting the algorithm to "none" and removing the
  signature. Your server will accept it as valid.

  Fix:
  jwt.decode(token, SECRET_KEY, algorithms=["HS256"])

HIGH — Missing Rate Limiting (Line 52):
  The /login endpoint has no rate limiting. An attacker can make
  unlimited login attempts. Add rate limiting before merging.

MEDIUM — Token Expiry Too Long (Line 28):
  JWT expiry set to 30 days. Standard is 15 minutes for access tokens
  with a separate refresh token. A stolen token is valid for a month.

SUGGESTION — Error Message Leaks Information (Line 61):
  "Invalid password for user {username}" reveals the username exists.
  Use a generic "Invalid credentials" message instead.

Quality Score: 5/10 — Needs Work
Verdict: Do not merge to main without addressing the CRITICAL issue.
```

You caught a JWT algorithm vulnerability before it reached the repository — not in a PR review 18 hours later.

Why Scoped to the Selected File?

When you stage multiple files and run git diff --staged, you get a wall of changes. The AI processes all of it, the output is unfocused, and the review becomes generic.

CodeSensei only reviews the diff for the file you are looking at. One file. One focused review. One clear verdict.
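Git makes this per-file scoping cheap: `git diff --staged` accepts a path limiter after `--`. A minimal sketch of how the scoped diff could be fetched (hedged — CodeSensei's actual wiring may differ):

```python
import subprocess

def staged_diff(repo: str, filename: str) -> str:
    """Return the staged diff for one file only.

    `git diff --staged -- <path>` restricts output to that path, so the
    AI sees a focused change set instead of every staged file.
    """
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--staged", "--", filename],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```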


The Engineering Challenge

The biggest technical challenge was making gh copilot -p produce structured, deterministic output instead of interactive responses. Copilot's planner mode is designed to be conversational — it reads files, asks follow-up questions, and confirms before acting. Great for interactive use, completely wrong for a TUI that needs silent structured output.

The solution was inline code embedding — sending actual code content directly in the prompt instead of a file path reference, combined with carefully crafted prompts that leave no room for follow-up questions.

Conflict Mode uses: "Do NOT ask follow-up questions — give me the resolution directly."
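The inline-embedding technique can be sketched as a prompt builder. The wording here is illustrative except for the "Do NOT ask follow-up questions" line quoted above — it is not CodeSensei's exact template:

```python
def build_conflict_prompt(filename: str, file_text: str) -> str:
    """Embed the file content inline so the AI never needs to read the
    disk or ask for context, and close off conversational escape hatches."""
    return (
        "You are a merge-conflict resolver.\n"
        "Explain both sides of every conflict, then recommend a resolution.\n"
        "Do NOT ask follow-up questions — give me the resolution directly.\n"
        "Respond in plain text only.\n\n"
        f"File: {filename}\n"
        "----- BEGIN FILE -----\n"
        f"{file_text}\n"
        "----- END FILE -----\n"
    )
```

Because the code travels inside the prompt, the planner has nothing left to negotiate: no file to open, no confirmation to request, just a structured task to complete.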

Blueprint Mode bypasses Copilot entirely for the structure diagram — pure local AST parsing renders instantly regardless of project size. The AI is only called when the user explicitly needs architectural analysis at the file level.


Future Work — AI Provider Fallback System

Here is the honest limitation of any tool built on a single AI provider: quotas run out.

GitHub Copilot gives you a monthly allocation of premium requests. Power users — the exact developers who would use CodeSensei most heavily — are the ones most likely to hit that limit mid-session. Right now, when that happens, you see a 402 error and the tool stops. That is unacceptable for a professional development tool.

The Vision

The next major version of CodeSensei implements an intelligent AI provider fallback chain — when one provider is unavailable, it automatically routes to the next best option with zero user intervention.

Every AI request flows through a provider router:

```text
┌──────────────────────────────────────────────────────────┐
│                     Provider Router                      │
│                                                          │
│  1. GitHub Copilot CLI    ← always tried first           │
│           │                                              │
│           ▼  402 quota exhausted / unavailable           │
│  2. Ollama (local)        ← free, offline, fully private │
│           │                                              │
│           ▼  not installed                               │
│  3. Groq API              ← free tier, fastest inference │
│           │                                              │
│           ▼  no API key configured                       │
│  4. Google Gemini         ← generous free tier           │
│           │                                              │
│           ▼  no API key configured                       │
│  5. OpenAI API            ← paid, universal fallback     │
└──────────────────────────────────────────────────────────┘
```

The same 6 specialist prompts — Devil Mode, Learn, Review, Git Review, Conflicts, Blueprint — would be sent to whichever provider is available. The results panel would show which provider responded:

```text
🔥 CodeSensei — Devil Mode
Provider: Ollama (llama3.2) — Copilot quota reached, using local fallback
⏱ Response: 2.1s
─────────────────────────────────
[findings here — identical format, different engine]
```

Why This Architecture Matters

Privacy-first by design.
Ollama runs entirely on your local machine. No code ever leaves your network. For developers working on proprietary or sensitive codebases, this is not a nice-to-have — it is a compliance requirement. A tool that routes to a local model when Copilot is unavailable is a tool that enterprises can actually deploy at scale.

Always-on reliability.
A development tool that stops working when an API quota runs out is not a professional tool — it is a prototype. The fallback chain means CodeSensei keeps working through quota exhaustion, network issues, and API outages. The workflow never breaks.

Cost-aware routing.
The router knows which providers are free (Ollama, Groq free tier), which are paid (OpenAI), and routes to the cheapest available option first. Developers should not pay for AI inference when a free local model can do the same job.

Same output format regardless of provider.
Every provider uses identical prompt templates and output parsing. Whether the response comes from Copilot, Ollama, or Gemini — the results panel looks exactly the same. The developer never thinks about which AI is running underneath.
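The router itself reduces to a small, provider-agnostic loop. This is a design sketch of the planned fallback chain, not shipped code — the provider stubs and the `ProviderUnavailable` signal are hypothetical names:

```python
from typing import Callable

class ProviderUnavailable(Exception):
    """Raised by a provider adapter on quota exhaustion, missing install,
    or missing API key — any reason the next provider should be tried."""

def route(providers: list[tuple[str, Callable[[str], str]]],
          prompt: str) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, answer).

    Each adapter takes the same prompt and returns the same plain-text
    format, so the results panel never cares which engine answered.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

A real chain would register adapters for Copilot CLI, a local Ollama server, Groq, Gemini, and OpenAI in that order; because every adapter shares one signature, swapping or reordering engines never touches the six persona prompts.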

The Broader Vision

CodeSensei started as a tool that orchestrates GitHub Copilot CLI into a structured workflow engine. The fallback system extends that vision: the workflow should be resilient, not dependent on any single provider.

The six specialist personas — penetration tester, teacher, senior developer, git reviewer, conflict resolver, architect — are the product. The AI provider is the engine. And like any well-engineered system, the engine should be swappable without the user ever noticing.

Other Planned Features

| Feature | Key | Description |
|---------|-----|-------------|
| Export Mode | E | Save any analysis result as a .md file for documentation or PR descriptions |
| History Mode | H | Browse previous analysis results within the session |
| Trace Mode | T | Toggle full prompt visibility for learning and transparency |
| File Search | / | Filter the file tree as you type for large projects |
| Team Report | — | Share a structured CodeSensei analysis report across a team |

Built with Python · Textual · GitHub Copilot CLI
