Mohist
Sextant: Making Claude Code Read Your Code Before Changing It

![](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mbmjeg2peb497j1m593e.png)

An architecture-aware engineering principles framework for Claude Code, built for real work on existing codebases: bug fixes, feature work, refactoring, code review, migration, and more. Instead of treating every request the same way, Sextant routes work by task type, applies rules that match the size of the change, and makes principle conflicts explicit instead of leaving them to guesswork.

1. Why Sextant Exists

If you have used Claude Code on anything more serious than a toy project, you have probably seen the same patterns.

Sometimes it starts editing too early. It reads the error trace, opens one nearby function, and jumps into a fix before it has actually located the root cause.

Sometimes it ignores the shape of the existing system. You ask for a feature, and instead of extending the pattern that is already there, it invents a new one.

And sometimes it applies the wrong amount of process. A tiny local change gets treated like an architecture exercise, while a cross-module change gets handled as if it were just a one-file patch.

Sextant is built around a simple idea: first establish a safe baseline, then figure out what kind of task this is, and then apply the rules that make sense for that kind of change. The goal is not to add ceremony. The goal is to help Claude Code make better decisions before it starts modifying code.

2. Task-Type Routing Instead of One Generic Workflow

Sextant is not one monolithic prompt. It is a set of specialized sub-skills, and Claude Code automatically matches the user's request to the most relevant one.

That matters because a bug fix should not behave like a migration, and a code review should not behave like a feature implementation.

Here is the structure at a high level:

| Task Type | Sub-Skill | Key Behavior |
| --- | --- | --- |
| Bug Fix | sextant:fix-bug | Disambiguation gate → surgical minimal-change fix → confirmation gate |
| New Feature | sextant:add-feature | Architecture research → integration strategy → optional TDD |
| Modify/Refactor | sextant:modify-feature | Disambiguation gate → impact analysis → confirmation gate |
| Code Review | sextant:review-code | Declare review mode → multi-dimension review → classified output |
| Debug | sextant:debug | Bisection protocol → hypothesis limit gate → handoff to fix-bug |
| Security Audit | sextant:security | Input validation, auth/authZ, sensitive data, and dependency review |
| Migration | sextant:migrate | Leaf-first migration sequence → per-module validation |
| Sprint Planning | sextant:plan | Dependency-ordered task list → execution pipeline entry |
| Ship/PR Prep | sextant:ship | Pre-ship checklist → PR description generation |
| Write Tests | sextant:write-tests | Bug reproduction test entry → boundary matrix design |
| Requirements | sextant:refine-requirements | Eliminate ambiguity → feasibility assessment → task breakdown |
| General Coding | sextant | Lightweight tasks and exempt scenarios |

One detail I like here is the disambiguation step. For example, the bug-fix skill first asks whether the issue is actually broken behavior or whether the requirement changed. If it is the latter, Sextant redirects the work to modify-feature instead of pretending it is a bug. That is a small design choice, but it prevents a lot of wasted motion.

3. Different Task Sizes Activate Different Rules

Another thing Sextant does explicitly is treat small and large changes differently.

The same fix-bug skill does not always run with the same rule depth. Instead, the rules that get activated depend on the task size:

  • Lightweight: single-function tweaks, config changes, and other narrow edits. These use the baseline rules, anti-pattern detection, and direct execution.
  • Medium: module-internal changes or work that introduces new functions or classes. These add checks like SRP, DRY, and interface contract review.
  • Large: cross-module changes or public interface changes. These bring in broader checks such as impact analysis and architecture compliance review.

This is not left to vibes. Sextant defines a five-factor Impact Radius Scorecard covering files changed, public interface changes, dependency direction changes, data structure changes, and downstream blast radius. That score determines whether the work is treated as lightweight, medium, large, or architectural.
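The article does not publish Sextant's actual weights or thresholds, but the mechanism can be pictured with a small sketch. The factor names below follow the article; the 0–2 scoring scale and the tier cutoffs are invented for illustration only:

```python
# Hypothetical sketch of a five-factor impact scorecard.
# Factor names follow the article; the 0-2 scoring and the tier
# thresholds are illustrative, not Sextant's actual values.

FACTORS = [
    "files_changed",
    "public_interface_changes",
    "dependency_direction_changes",
    "data_structure_changes",
    "downstream_blast_radius",
]

def impact_tier(scores: dict) -> str:
    """Map per-factor scores (0-2 each) to a task-size tier."""
    total = sum(scores.get(f, 0) for f in FACTORS)
    if total <= 2:
        return "lightweight"
    if total <= 5:
        return "medium"
    if total <= 8:
        return "large"
    return "architectural"
```

The useful property is that the tier is computed, not asserted: a one-file tweak with no interface impact stays lightweight, while a change that also flips a dependency direction and touches shared data structures climbs into the larger tiers regardless of how small the initial diff looks.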

What matters here is not the labels themselves. It is that the rule activation is explicit. Small tasks stay small. Bigger tasks do not get to pretend they are small just because the initial diff looks simple.

4. Principle Conflicts Are Handled as Decisions, Not Vibes

A lot of engineering advice sounds reasonable in isolation.

Keep things DRY. Avoid overengineering. Respect boundaries. Minimize change. Prefer secure choices.

The hard part is when those principles collide.

Sextant makes that collision explicit. Instead of vaguely saying "balance the tradeoffs," it defines arbitration rules for common conflicts.

A few examples:

  • DRY vs YAGNI: if shared logic has only one caller, YAGNI wins.
  • OCP vs YAGNI: if an interface has only one implementation, ship the concrete class directly.
  • DRY vs Layer Boundaries: if deduplication would cut across boundaries, the boundary wins.
  • Baseline vs Minimal Change: if you are fixing a bug and notice nearby baseline issues, minimal change usually wins.
  • Security vs Convenience: security wins.

That matters more than it sounds. A lot of AI-generated changes go off track not because the code is syntactically wrong, but because the model does not know which principle should take priority in context. Sextant's approach is to reduce that ambiguity.
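Expressed as data, the arbitration rules above amount to a lookup table rather than a judgment call. The (conflict → winner) pairs come from the list above; the table representation itself is my own illustration, not Sextant's internals:

```python
# Illustrative encoding of the arbitration rules listed above.
# The winner for each pair comes from the article; the dict layout
# and the "escalate" fallback are hypothetical.
ARBITRATION = {
    ("DRY", "YAGNI"): "YAGNI",                    # one caller: don't extract yet
    ("OCP", "YAGNI"): "YAGNI",                    # one implementation: ship the concrete class
    ("DRY", "LayerBoundaries"): "LayerBoundaries",
    ("Baseline", "MinimalChange"): "MinimalChange",
    ("Security", "Convenience"): "Security",
}

def arbitrate(a: str, b: str) -> str:
    """Return the principle that wins when a and b collide."""
    return ARBITRATION.get((a, b)) or ARBITRATION.get((b, a)) or "escalate"
```

The point of writing it down this way is that unlisted conflicts become visible as gaps that need a human decision, instead of being silently resolved differently on every run.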


5. Where Sextant Fits Compared with Superpowers and gstack

In the Claude Code skills ecosystem, Superpowers (by Jesse Vincent / obra) and gstack (by Garry Tan) are two well-known frameworks. The comparison is not really about which one is "best." It is about what kind of work each one is optimized for.

Design Philosophy at a Glance

| Dimension | Superpowers | gstack | Sextant |
| --- | --- | --- | --- |
| Core idea | Process-driven: brainstorm → plan → implement, TDD-first | Role-driven: CEO / Designer / Eng Manager / QA role-playing | Architecture-aware: route by task type, scale rules by task size |
| Trigger mechanism | Slash commands (/brainstorm, /execute-plan) | Slash commands (/office-hours, /ship, /qa) | Auto-matching (Claude Code selects sub-skill based on prompt content) |
| TDD approach | Mandatory: code written before tests gets deleted | Part of /ship workflow | Configurable: .sextant.yaml supports tdd: off / default_on / enforce |
| Architecture principles | Mentions YAGNI and DRY but no systematic principle framework | Does not cover code-level architecture principles | Full SOLID + DRY/YAGNI + architecture constraints + principle conflict arbitration |
| Paradigm adaptation | No explicit paradigm detection | No explicit paradigm detection | 8 paradigms auto-detected (backend layered, frontend component tree, CLI, FP, monorepo, event-driven, serverless, AI/ML pipeline) |
| Scale adaptation | Indirect via task decomposition (2–5 min micro-tasks) | Not addressed | Explicit 3-tier scaling + Impact Radius Scorecard |
| Sub-agents | Core feature: subagent execution + two-stage code review | Supports parallel sessions via Conductor | No sub-agent dependency; completes within a single session |
| External dependencies | None (pure Markdown skills) | Requires Bun runtime + compile step | Requires Python 3 (only for strip_frontmatter.py) |
| MCP dependency | No core MCP dependency; optional extensions (e.g., superpowers-chrome) | /browse and /qa require a custom browser automation toolchain; integrates Greptile for PR review | No core MCP dependency; GitNexus MCP tools conditionally injected when .gitnexus/ detected (optional, not a prerequisite) |
| Project config | None | ~/.gstack/config.yaml | .sextant.yaml (TDD mode, principle weight overrides, predefined profiles) |

A table like this can make the differences look cleaner than they feel in real use, but it does help anchor the comparison.

One practical difference is how you get into a workflow. Superpowers and gstack both lean on explicit slash commands, so the user chooses the mode up front. Sextant leans the other way: you describe the task in natural language, and Claude Code matches it to a sub-skill. Neither model is universally better. One gives you more explicit control; the other asks you to remember less while you are already in the middle of debugging or implementation.

The TDD row shows a similar difference in philosophy. Superpowers is prescriptive about test-first development. Sextant treats it as a project-level setting. That is less a question of capability than of taste: should the framework enforce a working style, or adapt to the one the project already uses?
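As a concrete illustration, a minimal project config might look like this. The tdd key and its three modes are the only fields confirmed by the comparison table above; anything else a real config contains is not shown here:

```yaml
# .sextant.yaml (sketch) -- only the tdd key and its documented
# modes are confirmed; real configs may carry more fields, such as
# the principle weight overrides and profiles the table mentions.
tdd: default_on   # off | default_on | enforce
```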

Where Each Framework Shines

This is where the comparison becomes more useful than the feature list. Superpowers makes the most sense when the team wants a structured loop for new work: brainstorm, plan, implement, and keep TDD front and center. gstack is more helpful when the missing piece is not code structure but role coverage — product questions, design feedback, QA flows, and the kind of prompts a solo builder or small team would otherwise have to simulate manually.

Sextant is aimed at a different problem: everyday engineering work inside an existing system. When the codebase already has architecture, module boundaries, compatibility constraints, and a lot of history, what matters most is not brainstorming or role-play. You need the AI to understand the code it is about to change, route the task correctly, and apply checks that match the scope of the change. That is Sextant's center of gravity.

What Stands Out in Practice

What makes Sextant stand out is not that it tries to do everything. It is that a few decisions are made explicit instead of being left implicit: task-type routing, principle conflict arbitration, confirmation before riskier edits, visible progress during multi-step work, and optional GitNexus acceleration. The architecture-paradigm support is part of the same pattern. A layer violation does not look the same in a backend service, a frontend component tree, or a monorepo, so the framework tries to judge the same principle in context rather than as a flat rule.

6. Why GitNexus Changes the Workflow

One of the more practical parts of Sextant is how well it lines up with GitNexus.

Claude Code fundamentally operates at the text level. It reads files, searches symbols, and traces call chains one hop at a time. That is fine in smaller projects, but it becomes expensive and incomplete in larger codebases.

Three bottlenecks show up quickly:

  • grep finds text matches, not relationships
  • call-chain tracing becomes hop-by-hop manual exploration
  • "what breaks if I change this?" becomes difficult without a global view

GitNexus addresses that by indexing the codebase as a knowledge graph. It parses ASTs, extracts relationships such as calls, imports, and inheritance, stores them in a queryable graph, and exposes tools through MCP. That means the agent can ask relationship-oriented questions directly instead of reconstructing everything from text search.

Just as importantly, the integration is optional.

Sextant checks whether a .gitnexus/ directory exists. If it does, the GitNexus tool guidance is injected. If not, the same workflow falls back to grep and file reading. GitNexus is an accelerator, not a requirement.
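The fallback can be pictured as a simple presence check. This is a sketch of the behavior described above, not Sextant's actual code; the tool names returned are hypothetical placeholders:

```python
# Sketch of the optional-accelerator pattern: prefer graph tools when
# a GitNexus index exists, otherwise fall back to text-level search.
# Function and tool names are illustrative, not Sextant internals.
from pathlib import Path

def research_tools(project_root: str) -> list[str]:
    """Pick code-research tools based on whether a GitNexus index exists."""
    if (Path(project_root) / ".gitnexus").is_dir():
        return ["graph_query", "call_chain", "impact_scan"]
    return ["grep", "read_file"]  # same workflow, slower steps
```

Either branch runs the same Sextant workflow; only the cost per step changes.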

7. What GitNexus Looks Like in Practice

The difference becomes clearer in actual workflows.

Bug Fixing

Without GitNexus, locating root cause means grepping for error keywords, opening files one by one, tracing callers manually, and estimating impact as you go.

With GitNexus, the same step can use graph queries to locate relevant symbols, inspect caller/callee context, trace the execution path, and enumerate affected upstream callers.

The workflow is the same. What changes is how expensive each step is.

New Feature Work

When adding a feature, Sextant wants to understand the existing architecture and find the most relevant reference module.

Without GitNexus, that means browsing directories and inferring from filenames and code.

With GitNexus, the agent can search for semantically related modules, inspect a module's dependency relationships, and look for existing extension points like strategies, factories, registries, or plugins. That makes the integration decision less speculative.

Migration

For migrations, the job is not just to find direct references to an old API, but to understand the impact scope and migration order.

Without GitNexus, that is mostly grep.

With GitNexus, the agent can query upstream impact and detect circular dependencies that affect sequencing. That lines up directly with Sextant's leaf-first migration principle.
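Leaf-first sequencing is essentially a topological sort over module dependencies: migrate the modules that depend on nothing else first, and surface cycles because they break any ordering. A minimal sketch using Python's standard library (module names are hypothetical):

```python
# Leaf-first migration order via topological sort, with cycle
# detection. The dependency graph here is a made-up example.
from graphlib import TopologicalSorter, CycleError

deps = {                      # module -> modules it depends on
    "api": {"service"},
    "service": {"storage"},
    "storage": set(),
}

try:
    order = list(TopologicalSorter(deps).static_order())
    # Dependencies come first: "storage" (the leaf) is migrated
    # before "service", which is migrated before "api".
except CycleError:
    order = []  # cycles must be broken before sequencing the migration
```

A graph index makes building `deps` cheap; without one, each edge has to be reconstructed from grep results.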

8. A Real Example: Two Bugs in a C++ Orchestrator

The strongest part of the story is the execution example, because it shows how these ideas play out in an actual bug-fix flow.

The project is a C++ orchestrator for a voice interaction system involving VAD, ASR, TTC, and related components. The user reported multiple issues in orchestrator.cpp, which triggered Sextant's fix-bug skill.

Step 1: Read First, Then Diagnose

![](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4yqgw86i0443t0pep5jh.png)

Sextant starts by loading the skill, showing the progress block, and reading the relevant file.

![](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y6wvojm5e23atmblig0u.png)

After analysis, it produces four findings. Two of them are explicitly judged out of scope because they are intentional stubs marked for a later task. The other two are treated as real gaps:

  • on_asr_result() can transition to PROCESSING even when TTC still indicates CONTINUE_LISTENING
  • graceful_shutdown() persists state to the orchestrator's own database connection, but checkpoints a different SQLite connection instead

That first distinction matters. It shows the system is not simply treating unfinished-looking code as broken code.

Step 2 and Step 3: Ask Before Editing

![](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rubgmgokfual04p6ry1z.png)

For the TTC-related issue, Sextant presents several repair options to the user through an interactive UI, including a recommended option to gate on_asr_result() on the CONTINUE_LISTENING decision.

The important point here is not the exact UI. It is that the fix plan is surfaced before code changes happen. The user sees the root cause, the proposed change shape, and the risk, then confirms the direction. That is the confirmation gate in action.

Step 4: Verify and Report Limits

![](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dpwmvi8aota4mfhvxhav.png)

After the fixes, Sextant runs boundary validation and summarizes what was verified:

  • the TTC gating issue is fixed
  • the normal path is unaffected
  • the gate acts as a safety net for edge cases
  • the correct database is now checkpointed
  • the build is clean and no callers are affected

It also explicitly notes what was not covered by automated timing tests, including the <20ms barge-in path and the 5ms AED window.

That kind of verification summary is one of the clearest parts of the example. It is not just "fixed." It is "fixed, here is what we checked, and here is what still needs real testing."


9. What the Example Actually Shows

The orchestrator example ends up making the point more clearly than any feature list could.

It shows that Sextant is trying to produce a particular kind of behavior:

  1. Read before acting

    Do not jump from symptom to edit.

  2. Keep bug fixes small when they should stay small

    In the example, the final changes are narrow and local.

  3. Ask before applying non-trivial fixes

    The user sees the plan before the patch lands.

  4. Make execution visible

    Progress blocks show what step the workflow is in.

  5. Be honest about verification

    Risks and untested edges are called out explicitly.

That is really the best summary of what Sextant is for. It is less about "AI that writes code faster" and more about "AI that changes code with better judgment."


Installation

```shell
# Option 1: Official Claude marketplace
/plugin install sextant@claude-plugins-official

# Option 2: Via GitHub marketplace
/plugin marketplace add hellotern/sextant
/plugin install sextant@mohist-plugins

# Option 3: Team configuration (commit to repo, team gets it on checkout)
# Configure in .claude/settings.json
```

See the GitHub repository for full installation instructions and documentation.


Closing Thought

Sextant is not presented as a replacement for every other Claude Code framework. Superpowers and gstack solve different problems.

What Sextant is trying to do is narrower and, to me, more grounded: help Claude Code work more reliably inside existing codebases, where architecture, task shape, change scope, and principle conflicts all matter before the first edit is made.

Principles are tools, not chains. The goal is not more process for its own sake. The goal is to make better changes with lower long-term maintenance cost.
