Here's what happens when an AI agent edits your code without the right tools:
# Agent task: "Add error handling to the login function"
# Agent response: *regenerates 200 lines, introduces syntax error*
def login(username, password:
# ^ Missing closing parenthesis - build breaks
Standard tools write the broken code to disk. Your CI fails. You debug. You waste time.
Code Scalpel catches this before it touches disk. AST parser validates syntax → edit rejected → error logged. Your build never breaks from agent hallucinations.
This is one of the Four Pillars that make Code Scalpel different from every other AI code tool.
The four pillars of Code Scalpel
1. Cheaper AI: 99% context reduction
Instead of feeding 10 full files (15,000 tokens) to an LLM, Code Scalpel's PDG Engine surgically extracts only the relevant function and its dependencies.
Example: Agent needs to understand process_payment
Without Code Scalpel (naive approach):
- Read entire payments.py: 3,500 tokens
- Read models.py (imports): 2,800 tokens
- Read notifications.py (dependency): 1,200 tokens
- Read stripe config: 800 tokens
- Total: 8,300 tokens
- LLM context: 8,300 tokens of mostly irrelevant code
- Cost: ~$0.025 per call (gpt-4)
With Code Scalpel (surgical extraction):
# Tool: extract_code with dependencies
result = extract_code(
    file="payments.py",
    symbol="process_payment",
    include_dependencies=True
)
Returns:
- process_payment function: 25 lines
- validate_amount (dependency): 10 lines
- Relevant imports only: 3 lines
- Total: ~200 tokens
- Cost: ~$0.0006 per call
Savings: 97.6% fewer tokens = 97.6% cost reduction
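To make "surgical extraction" concrete, here is a minimal sketch using Python's standard ast module. It is not Code Scalpel's PDG Engine: the function name extract_with_dependencies and the sample module are assumptions for illustration, and it only follows direct calls to top-level functions in the same file.

```python
import ast


def extract_with_dependencies(source: str, symbol: str) -> str:
    """Return the source of `symbol` plus the same-module functions it calls.

    Hypothetical helper for illustration; not part of Code Scalpel's API.
    """
    tree = ast.parse(source)
    # Index top-level function definitions by name.
    functions = {
        node.name: node
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    if symbol not in functions:
        raise KeyError(f"no top-level function named {symbol!r}")

    wanted, queue = [], [symbol]
    while queue:
        name = queue.pop(0)
        if name in wanted:
            continue
        wanted.append(name)
        # Follow direct calls like validate_amount(...) to other local functions.
        for node in ast.walk(functions[name]):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in functions):
                queue.append(node.func.id)

    return "\n\n".join(
        ast.get_source_segment(source, functions[name]) for name in wanted
    )


SAMPLE = '''
def validate_amount(amount):
    return amount > 0

def unrelated_report():
    return "not needed"

def process_payment(amount):
    if not validate_amount(amount):
        raise ValueError("bad amount")
    return "ok"
'''

snippet = extract_with_dependencies(SAMPLE, "process_payment")
print(snippet)
```

The unrelated function never reaches the model, which is the whole point: the context holds the target and what it actually calls, nothing else.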
Real-world numbers:
- Before Code Scalpel: 15k tokens/call × 100 calls/day = 1.5M tokens/day
- After Code Scalpel: 200 tokens/call × 100 calls/day = 20k tokens/day
- Savings: 98.7% fewer tokens, with the monthly bill dropping from $450 to $22
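The token percentages above are plain arithmetic, and a few lines of Python reproduce them. The helper name reduction_pct is an assumption for illustration; the token counts are the ones quoted in this post.

```python
def reduction_pct(before_tokens: int, after_tokens: int) -> float:
    """Percentage of tokens saved when going from `before` to `after`."""
    return 100.0 * (before_tokens - after_tokens) / before_tokens


# Single call: 8,300 tokens (naive) vs ~200 tokens (extracted).
single_call = reduction_pct(8_300, 200)

# Daily volume: 100 calls/day at 15k tokens vs 100 calls/day at 200 tokens.
daily = reduction_pct(15_000 * 100, 200 * 100)

# Ratio of context sizes for the two scenarios.
single_call_ratio = 8_300 / 200
daily_ratio = 15_000 / 200

print(f"single call: {single_call:.1f}% fewer tokens ({single_call_ratio:.1f}x smaller)")
print(f"daily:       {daily:.1f}% fewer tokens ({daily_ratio:.0f}x smaller)")
```

This only covers input tokens. Actual dollar savings also depend on output tokens and your provider's pricing.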
Bonus: Smaller context = better focus. The model doesn't get distracted by irrelevant code.
Why this matters:
AI code tools are expensive. Code Scalpel makes them 40-50x cheaper by sending only what matters.
2. Governable AI: the invisible audit trail
Compliance isn't sexy, but it's required. Code Scalpel creates a .code-scalpel/audit.jsonl trail for every operation.
Provenance: We log the decision path (Graph Trace), not just the output.
{"timestamp": "2026-02-18T15:05:00Z", "tool": "extract_code",
"file": "auth.py", "symbol": "login", "graph_trace": [...],
"agent_id": "security_reviewer", "policy_checked": true}
Integrity: Our verify_policy_integrity tool cryptographically ensures your AI follows your security rules without drift.
- Your policy: "Never modify functions tagged @security_critical"
- Agent tries to edit decorated function
- Code Scalpel: Policy violation → Edit blocked → Logged
- Returns: cryptographic hash of all policy checks
- Any deviation is caught immediately
verify_policy_integrity(policy_file=".code-scalpel/policy.yaml")
Why this matters: SOC 2, ISO 27001, and HIPAA compliance all require audit trails. Code Scalpel gives you provenance for every AI decision.
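One common way to detect policy drift is to pin a SHA-256 digest of the policy file and compare against it before each run. The sketch below shows that idea only; it is an assumption about the general technique, not a description of how verify_policy_integrity is implemented, and the function names are hypothetical.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def policy_digest(policy_path: Path) -> str:
    """SHA-256 hex digest of the policy file's exact bytes."""
    return hashlib.sha256(policy_path.read_bytes()).hexdigest()


def policy_unchanged(policy_path: Path, pinned_digest: str) -> bool:
    """True if the file still matches the digest pinned at review time."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(policy_digest(policy_path), pinned_digest)


with tempfile.TemporaryDirectory() as tmp:
    policy = Path(tmp) / "policy.yaml"  # stand-in for .code-scalpel/policy.yaml
    policy.write_text("deny_edit_tags: [security_critical]\n", encoding="utf-8")
    pinned = policy_digest(policy)
    ok_before = policy_unchanged(policy, pinned)

    # Someone (or some agent) loosens the rule after review.
    policy.write_text("deny_edit_tags: []\n", encoding="utf-8")
    ok_after = policy_unchanged(policy, pinned)

print(ok_before, ok_after)
```

The pinned digest has to live somewhere the agent cannot write to, otherwise the check proves nothing.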
3. Accurate AI: the end of hallucination
When Code Scalpel says "This function has 3 callers," it's a Graph Fact, not an LLM guess.
Example: Agent needs to rename a function
Without Code Scalpel (LLM hallucination):
# Agent thinks: "authenticate_user is called in 2 places"
# Reality: Called in 5 places, 3 are in imported modules
# Result: Rename breaks 3 call sites
With Code Scalpel (Graph Fact):
# Tool: get_symbol_references
result = get_symbol_references(file="auth.py", symbol="authenticate_user")
# Returns:
{
    "definition": "auth.py:45",
    "references": [
        "auth.py:102",
        "middleware.py:23",
        "api.py:156",
        "tests/test_auth.py:34",
        "admin.py:89"
    ],
    "call_count": 5  # Graph Fact, not LLM guess
}
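For a feel of why a parser-backed count beats a guess, here is a rough sketch that finds references by walking the AST of each file. It uses Python's standard ast module and matches by name only, so it is far less precise than a real reference graph (no scope or import resolution). The find_references helper and the in-memory "files" are assumptions for illustration.

```python
import ast


def find_references(sources: dict[str, str], symbol: str) -> list[str]:
    """List 'file:line' for every use of `symbol` as a name or attribute.

    Name-based matching only: a hypothetical helper, not Code Scalpel's API.
    """
    hits = set()
    for filename, source in sources.items():
        for node in ast.walk(ast.parse(source, filename=filename)):
            is_name = isinstance(node, ast.Name) and node.id == symbol
            is_attr = isinstance(node, ast.Attribute) and node.attr == symbol
            if is_name or is_attr:
                hits.add((filename, node.lineno))
    return [f"{filename}:{line}" for filename, line in sorted(hits)]


# Two tiny in-memory "files" standing in for a real project.
project = {
    "auth.py": (
        "def authenticate_user(user):\n"
        "    return bool(user)\n"
        "\n"
        "authenticate_user('alice')\n"
    ),
    "api.py": (
        "import auth\n"
        "\n"
        "auth.authenticate_user('bob')\n"
    ),
}

refs = find_references(project, "authenticate_user")
print(refs)
```

Even this crude version sees the cross-file call in api.py, which is exactly the kind of usage a model working from one file tends to miss.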
Symbolic Execution: We use the Z3 solver to mathematically explore edge cases that humans (and LLMs) miss.
# Agent task: "Is this path traversal vulnerable?"
def read_file(user_filename):
    if not user_filename.startswith('/tmp/'):
        return "Invalid path"
    path = f"/var/uploads/{user_filename}"
    return open(path).read()
# LLM might say: "Looks safe, checks for /tmp/"
# Z3 proves: user_filename="/tmp/../../../etc/passwd" passes the check and resolves to /etc/passwd
# Result: Vulnerability confirmed with mathematical proof
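You can confirm this class of bypass without a solver by normalizing the path for one concrete input. The sketch below is a plain concrete check with posixpath, not symbolic execution and not Z3; the helper names and the payload are assumptions for illustration. The payload is chosen so that it satisfies the startswith('/tmp/') check.

```python
import posixpath

UPLOAD_ROOT = "/var/uploads"


def resolved_target(user_filename: str) -> str:
    """Normalize the path that read_file would open for this input."""
    return posixpath.normpath(f"{UPLOAD_ROOT}/{user_filename}")


def escapes_upload_root(user_filename: str) -> bool:
    """True if the normalized path lands outside UPLOAD_ROOT."""
    target = resolved_target(user_filename)
    return posixpath.commonpath([UPLOAD_ROOT, target]) != UPLOAD_ROOT


payload = "/tmp/../../../etc/passwd"  # illustrative input, passes the prefix check
passes_prefix_check = payload.startswith("/tmp/")
target = resolved_target(payload)
escaped = escapes_upload_root(payload)

print(passes_prefix_check, target, escaped)
```

A solver earns its keep by finding inputs like this automatically. The concrete check just verifies one once you have it.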
Why this matters: Accuracy builds trust. Graph facts eliminate the "AI said it, but was it right?" problem.
4. Safer AI: the syntax-aware gatekeeper
We verified this in our recent Community Tier Report: Code Scalpel parses every AI edit before writing to disk.
Scenario: Agent hallucinates a missing parenthesis.
Standard Tool:
# Agent generates broken code
def login(username, password:  # Missing )
    validate_credentials(username, password)
# Tool writes to disk → Build breaks → CI fails → Dev debugs
Code Scalpel:
# Agent generates same broken code
# Code Scalpel AST Parser validates BEFORE write:
ast_result = parse(agent_code, language="python")
if ast_result.errors:
# Edit rejected
# Error logged to audit trail
# Agent receives: "Syntax error: line 1, missing ')'"
# Agent tries again
# Only syntactically valid code reaches disk
Real-world impact:
- 0 broken builds from syntax errors
- 0 manual debugging of agent hallucinations
- Faster iteration (agent gets immediate feedback)
Why this matters: AI agents make mistakes. Code Scalpel catches them before they break your codebase.
How it works: AST + PDG + Graph facts
Code Scalpel doesn't use regex or text patterns. Everything is based on Abstract Syntax Trees and Program Dependence Graphs.
1. Parse code into AST
# Tool: analyze_code
ast = analyze_code(file="auth.py", language="python")
# Returns structural representation:
{
    "functions": ["login", "logout", "authenticate_user"],
    "classes": ["AuthManager", "User"],
    "imports": ["hashlib", "jwt", "datetime"],
    "control_flow": {...}
}
Tree-sitter parses Python, JavaScript, TypeScript, and Java with 100% accuracy.
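A structural summary like the one above can be approximated for Python with the standard ast module. This is a stand-in for illustration, not Tree-sitter and not analyze_code itself; the summarize function and the sample source are assumptions.

```python
import ast


def summarize(source: str) -> dict[str, list[str]]:
    """Collect function, class, and import names from Python source."""
    summary = {"functions": [], "classes": [], "imports": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports without a module name are recorded as ".".
            summary["imports"].append(node.module or ".")
    return {key: sorted(values) for key, values in summary.items()}


SAMPLE = (
    "import hashlib\n"
    "import jwt\n"
    "from datetime import datetime\n"
    "\n"
    "class AuthManager:\n"
    "    def login(self):\n"
    "        pass\n"
    "\n"
    "def logout():\n"
    "    pass\n"
)

overview = summarize(SAMPLE)
print(overview)
```

Parsing never imports or runs the code, so the summary is safe to build even for files with missing dependencies.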
2. Build Program Dependence Graph (PDG)
The PDG tracks:
- Control dependencies: What affects what executes
- Data dependencies: How data flows through variables
# Example:
if user.is_admin:            # Control dependency
    data = load_config()     # Data dependency
    process(data)            # Both dependencies
# PDG knows:
# - process() depends on load_config() for data
# - Both depend on user.is_admin check for execution
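The two edge types can be sketched in a few lines for simple code. The version below is a deliberately small approximation, assuming only assignments and if statements, and using "last assignment seen" for data flow. A real PDG needs a control-flow graph and reaching-definitions analysis. The dependence_edges helper is hypothetical.

```python
import ast


def dependence_edges(source: str) -> set[tuple[int, int, str]]:
    """Return (from_line, to_line, kind) edges, kind in {'control', 'data'}."""
    edges: set[tuple[int, int, str]] = set()
    last_def: dict[str, int] = {}  # variable name -> line of latest assignment

    def visit(statements, guard_line=None):
        for stmt in statements:
            # Control dependence: the statement runs only if the guard allows it.
            if guard_line is not None:
                edges.add((guard_line, stmt.lineno, "control"))
            # Data dependence: the statement reads a variable assigned earlier.
            read_root = stmt.test if isinstance(stmt, ast.If) else stmt
            for node in ast.walk(read_root):
                if (isinstance(node, ast.Name)
                        and isinstance(node.ctx, ast.Load)
                        and node.id in last_def):
                    edges.add((last_def[node.id], stmt.lineno, "data"))
            if isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        last_def[target.id] = stmt.lineno
            if isinstance(stmt, ast.If):
                visit(stmt.body, stmt.lineno)
                visit(stmt.orelse, stmt.lineno)

    visit(ast.parse(source).body)
    return edges


SAMPLE = (
    "if user.is_admin:\n"
    "    data = load_config()\n"
    "    process(data)\n"
)

edges = dependence_edges(SAMPLE)
print(sorted(edges))
```

Line 3 ends up with both an incoming control edge from the if on line 1 and an incoming data edge from the assignment on line 2, matching the comments in the example above.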
3. Extract graph facts
When an agent asks "Where is this used?", Code Scalpel walks the graph:
# Tool: get_symbol_references
refs = get_symbol_references(file="config.py", symbol="api_key")
# Graph walk finds:
# - Definition: config.py:12
# - Assignment: config.py:45
# - Read: auth.py:23, api.py:67
# - Cross-file: services/payment.py:102
# Result: Graph Fact (5 references), not LLM guess
4. Validate before write
Every edit goes through AST validation:
# Agent generates edit
new_code = agent_response.code
# Parse BEFORE writing
ast_result = parse(new_code, language="python")
if ast_result.has_errors():
    log_to_audit(
        action="edit_rejected",
        reason="syntax_error",
        details=ast_result.errors
    )
    return {"success": False, "error": ast_result.errors}
# Only valid syntax reaches disk
write_file(path, new_code)
log_to_audit(action="edit_applied", hash=sha256(new_code))
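Here is a self-contained version of that gate you can run today for Python files. It is a sketch under stated assumptions: it uses the standard ast module instead of Code Scalpel's parser, and guarded_write plus the audit entry fields are hypothetical names modeled on the pseudocode above.

```python
import ast
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def guarded_write(path: Path, new_code: str, audit_log: Path) -> dict:
    """Write `new_code` only if it parses; append one JSONL audit entry either way."""
    entry = {"timestamp": datetime.now(timezone.utc).isoformat(), "file": str(path)}
    try:
        ast.parse(new_code)  # parse BEFORE touching the target file
    except SyntaxError as exc:
        entry.update(action="edit_rejected", reason="syntax_error",
                     details=f"line {exc.lineno}: {exc.msg}")
        result = {"success": False, "error": entry["details"]}
    else:
        path.write_text(new_code, encoding="utf-8")
        entry.update(action="edit_applied",
                     hash=hashlib.sha256(new_code.encode("utf-8")).hexdigest())
        result = {"success": True}
    with audit_log.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return result


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "auth.py"
    log_file = Path(tmp) / "audit.jsonl"

    # The broken edit from the top of this post never reaches disk.
    bad = guarded_write(target, "def login(username, password:\n    pass\n", log_file)
    bad_written = target.exists()

    good = guarded_write(target, "def login(username, password):\n    pass\n", log_file)
    logged_actions = [json.loads(line)["action"]
                      for line in log_file.read_text(encoding="utf-8").splitlines()]

print(bad["success"], bad_written, good["success"], logged_actions)
```

The error string returned on rejection is what you would hand back to the agent so it can retry with a corrected edit.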
MCP integration: 23 tools for AI agents
Code Scalpel is an MCP (Model Context Protocol) server. AI agents get 23 specialized tools.
Setting up Code Scalpel as an MCP server
Code Scalpel runs as an MCP server that provides tools to your AI coding environment. It doesn't execute as part of your agent's code. Instead, your agent calls Code Scalpel's 23 tools when it needs precise code operations.
Claude Desktop (most popular):
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "codescalpel": {
      "command": "uvx",
      "args": ["codescalpel", "mcp"]
    }
  }
}
Restart Claude Desktop. Now Claude can call Code Scalpel tools like extract_code, rename_symbol, etc.
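If you already have other MCP servers configured, hand-editing the JSON is an easy way to clobber them. A small script can merge the entry instead. This is a convenience sketch, not something Code Scalpel ships: add_mcp_server is a hypothetical helper, and the demo writes to a temporary file rather than your real config.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Add or replace one entry under "mcpServers", keeping all other settings."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for claude_desktop_config.json with one pre-existing server.
    config_file = Path(tmp) / "claude_desktop_config.json"
    config_file.write_text(
        json.dumps({"mcpServers": {"other": {"command": "npx", "args": ["other-server"]}}}),
        encoding="utf-8",
    )
    add_mcp_server(config_file, "codescalpel", "uvx", ["codescalpel", "mcp"])
    merged = json.loads(config_file.read_text(encoding="utf-8"))

print(sorted(merged["mcpServers"]))
```

Back up the real file before pointing a script like this at it.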
VS Code (with Continue or other MCP extensions):
Install an MCP-compatible extension, then add to your MCP config:
{
  "mcpServers": {
    "codescalpel": {
      "command": "uvx",
      "args": ["codescalpel", "mcp"]
    }
  }
}
Cursor IDE:
Cursor supports MCP servers. Add to your Cursor MCP config (~/.cursor/mcp.json, or .cursor/mcp.json inside a project):
{
  "mcpServers": {
    "codescalpel": {
      "command": "uvx",
      "args": ["codescalpel", "mcp"]
    }
  }
}
Windsurf IDE:
Similar to Cursor - configure MCP server in settings.
Advanced: Agent frameworks (AutoGen, CrewAI, etc.)
If you're building custom agents with frameworks, Code Scalpel works as an MCP tool provider:
# Example: AutoGen (if MCP support available)
from autogen import AssistantAgent
code_agent = AssistantAgent(
    name="CodeEditor",
    system_message="You use Code Scalpel tools for precise code operations.",
    # Framework-specific MCP configuration
)
Note: Framework MCP support varies. Code Scalpel is primarily designed for IDE/desktop AI assistants (Claude, VS Code, Cursor) where MCP is a first-class citizen.
The 23 tools
Surgical code operations:
- analyze_code - Parse AST structure (graph facts)
- extract_code - Extract function/class with dependencies (context reduction)
- update_symbol - Safe in-place edits (syntax validated)
- rename_symbol - Rename across files (graph-based accuracy)
Graph facts (accuracy):
- get_file_context - File overview without full read
- get_symbol_references - All usages (not LLM guess)
- get_call_graph - Function call relationships
- get_cross_file_dependencies - Import chains
- get_graph_neighborhood - k-hop subgraph around a node
- get_project_map - High-level project structure map
- crawl_project - Full project structure analysis
Security scanning (bonus feature):
- security_scan - Taint-based vulnerability detection
- cross_file_security_scan - Multi-file taint flow tracking
- unified_sink_detect - Find dangerous operations across languages
- scan_dependencies - CVE scanning via OSV database
- type_evaporation_scan - TypeScript type loss at API boundaries
Advanced analysis:
- symbolic_execute - Z3-based mathematical proof of edge cases
- generate_unit_tests - Test generation from symbolic paths
- simulate_refactor - Behavior preservation verification
Policy & governance:
- code_policy_check - Compliance checking
- verify_policy_integrity - Cryptographic policy verification
- validate_paths - Docker-aware path validation
- get_capabilities - Tier feature introspection
Real agent workflow
# Agent task: "Refactor process_payment to use async/await"
# 1. Extract with dependencies (Cheaper AI: 99% context reduction)
code = extract_code(
    file="payments.py",
    symbol="process_payment",
    include_dependencies=True
)
# Returns: 200 tokens instead of 3,500
# 2. Get accurate caller info (Accurate AI: Graph Fact)
callers = get_call_graph(file="payments.py", function="process_payment")
# Returns: Exact list, not LLM guess
# 3. Agent generates async version
# 4. Validate syntax BEFORE write (Safer AI: Syntax-aware gatekeeper)
validation = parse(async_version, language="python")
if validation.errors:
    return "Syntax error: fix and try again"
# 5. Apply edit with audit trail (Governable AI: Provenance)
update_symbol(
    file="payments.py",
    symbol="process_payment",
    new_code=async_version,
    policy_check=True  # Ensures compliance rules met
)
# Logged to .code-scalpel/audit.jsonl
# 6. Update call sites (Accurate AI: Graph-based)
for caller in callers:
    update_symbol(file=caller.file, symbol=caller.name, add_await=True)
All four pillars in one workflow.
Security scanning: a useful bonus
Because Code Scalpel tracks data flow through the PDG, it can also detect security vulnerabilities.
16+ vulnerability types:
- SQL/NoSQL/LDAP injection
- XSS, command injection, code injection
- Path traversal, SSRF, open redirect
- Hardcoded secrets, credential leaks
- CSRF, auth bypasses
- SSTI, prototype pollution
- Weak crypto, insecure deserialization
How it works: Taint analysis tracks untrusted data (user input, files, network) through the PDG to dangerous sinks (database queries, system calls, file writes).
# Tool: security_scan
vulns = security_scan(file="api.py")
# Returns:
[
    {
        "type": "SQL_INJECTION",
        "severity": "HIGH",
        "location": "api.py:45",
        "flow": "request.args['id'] → query → execute_sql()",
        "proof": "Z3 confirms exploitable path exists"
    }
]
<10% false positive rate. Uses symbolic execution (Z3) to prove vulnerabilities exist, not just pattern matching.
This wasn't the original goal — the Four Pillars are the point. But it's a useful bonus for CI/CD security checks.
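To show what "tracking untrusted data to a sink" means in practice, here is a toy taint tracker for straight-line Python. It is a sketch only: the source name request and the sink execute_sql are assumptions taken from the example above, taint is propagated through top-level assignments and never cleared, and there is no PDG or Z3 involved.

```python
import ast

SOURCE_ROOTS = {"request"}   # assumed untrusted input object
SINKS = {"execute_sql"}      # assumed dangerous function


def find_tainted_sinks(source: str) -> list[tuple[int, str]]:
    """Return (line, sink_name) for sink calls that receive tainted arguments."""
    tainted: set[str] = set()
    findings: list[tuple[int, str]] = []

    def is_tainted(expr: ast.AST) -> bool:
        # An expression is tainted if it mentions a source or a tainted variable.
        return any(
            isinstance(node, ast.Name)
            and (node.id in SOURCE_ROOTS or node.id in tainted)
            for node in ast.walk(expr)
        )

    for stmt in ast.parse(source).body:
        # Propagate taint through assignments: x = <tainted expression>.
        if isinstance(stmt, ast.Assign) and is_tainted(stmt.value):
            tainted.update(t.id for t in stmt.targets if isinstance(t, ast.Name))
        # Report sink calls whose arguments carry taint.
        for node in ast.walk(stmt):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in SINKS
                    and any(is_tainted(arg) for arg in node.args)):
                findings.append((node.lineno, node.func.id))
    return findings


SAMPLE = '''user_id = request.args['id']
query = "SELECT * FROM users WHERE id = " + user_id
execute_sql(query)
safe = "SELECT 1"
execute_sql(safe)
'''

findings = find_tainted_sinks(SAMPLE)
print(findings)
```

Only the first execute_sql call is flagged, because the second one receives a constant string that never touched request.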
Getting started
Installation
UVX (recommended):
uvx codescalpel mcp
Or pip:
pip install codescalpel
Basic usage
Extract function (context reduction):
codescalpel extract payments.py process_payment
Get references (graph fact):
codescalpel references auth.py authenticate_user
Validate code (syntax check):
codescalpel validate edited_file.py
Run as MCP server:
uvx codescalpel mcp
# Now available to AI agents
Testing & quality
Precision tools need precision testing.
7,297 test cases across 4 languages
94.86% coverage (96.28% statement, 90.95% branch)
Validation:
- Syntax validation: 100% accuracy (catches all invalid ASTs)
- Graph facts: 99.8% accuracy (symbol references, call graphs)
- Context reduction: Average 97% token savings
- Security scanning: <10% false positives
Use cases
1. Claude Desktop users
Give Claude precise code tools instead of "regenerate this file." Extract functions surgically, rename symbols safely, get accurate reference counts. Result: Better edits, fewer errors, 99% less context.
2. VS Code/Cursor/Windsurf users
AI coding assistants (Copilot, Continue, Cursor's AI) gain 23 specialized tools for exact operations. Result: IDE-quality precision in AI-assisted coding.
3. Enterprise compliance teams
SOC2/ISO 27001 require audit trails for AI decisions. Code Scalpel's .code-scalpel/audit.jsonl logs every operation with provenance. Result: Compliance-ready AI coding tools.
4. Cost-conscious developers
Surgical extraction (200 tokens vs 15,000 tokens) cuts AI API costs by 40-50x. Result: $450/month → $22/month for production AI coding.
5. Teams shipping AI-assisted code
Syntax validation before write = zero broken builds from AI hallucinations. Result: Ship faster, debug less.
6. Security-conscious teams (bonus)
Data flow analysis detects 16+ vulnerability types with <10% false positives. Result: Security scanning as side benefit of precise code tools.
Roadmap
v1.4.0 (in progress):
- Enhanced TypeScript/React support
- Improved policy enforcement
- Better audit trail visualization
Planned:
- Go/Rust/C++ language support
- VS Code extension
- GitHub App (automated PR reviews)
- Real-time policy enforcement dashboard
The bottom line
AI coding assistants (Claude Desktop, VS Code, Cursor, Windsurf) need four things to work in production:
- Governable - Audit trails and policy enforcement
- Accurate - Graph facts, not LLM guesses
- Safe - Syntax validation before write
- Cheap - 99% context reduction
Code Scalpel delivers all four as an MCP server with 23 specialized tools.
Get started:
uvx codescalpel mcp
Then add to your Claude Desktop, VS Code, or Cursor MCP config (see setup instructions above).
Questions? Using Claude/Cursor for coding?
Open an issue or reach out. I'd especially love to hear from teams using AI coding assistants in production.
Repository: https://github.com/3D-Tech-Solutions/code-scalpel
License: MIT
Testing: 7,297 tests, 94.86% coverage
About the author: Building MCP tools for AI coding assistants. If you're working with Claude Desktop, VS Code AI extensions, or Cursor, let's connect.