I built an open-source CLI tool that uses AI and graph algorithms to help developers understand unfamiliar codebases 10x faster. It generates interactive dependency maps, detects critical files, and creates personalized learning paths.
🔗 GitHub: github.com/sonii-shivansh/CodeContext
⭐ Give it a star if you find it useful!
🎯 The Problem I Was Solving
We've all been there: You join a new team, clone a massive repository, and spend weeks trying to figure out where anything is. You ask senior developers the same questions everyone asks: "Where's the authentication logic?", "Which file should I start with?", "What depends on what?"
The brutal reality:
- New developers take 1-3 months to become productive
- Only 12% of companies do onboarding well
- Poor onboarding costs $240,000+ per senior developer annually
Existing tools like Sourcegraph are expensive, and Backstage requires complex infrastructure. I wanted something simple, fast, and free.
💡 The Solution: CodeContext
CodeContext is a Kotlin-based CLI tool that analyzes your codebase and generates:
- 🗺️ Interactive Dependency Graphs - Visualize your entire codebase structure with D3.js
- 🔥 Knowledge Hotspots - PageRank algorithm identifies the most critical files
- 🎓 Learning Paths - Topologically sorted "start here" reading order
- 🤖 AI Insights - Optional Claude integration for code explanations
- 📊 Team Contribution Maps - Identify knowledge silos and bus factor risks
- ⏳ Temporal Analysis - Track codebase evolution over time
Quick Demo
# Install
git clone https://github.com/sonii-shivansh/CodeContext.git
cd CodeContext
./gradlew installDist
# Analyze any Java/Kotlin project
./build/install/codecontext/bin/codecontext analyze /path/to/project
# View interactive report
open output/index.html
Output:
🚀 Starting CodeContext analysis...
📂 Scanning repository...
Found 247 files
🧠 Parsing code...
🕸️ Building dependency graph...
🗺️ Your Codebase Map
├─ 🔥 Hot Zones (Top 5):
│ ├─ UserService.kt (0.0847)
│ ├─ DatabaseConfig.kt (0.0623)
│ └─ ApiController.kt (0.0498)
✅ Report: output/index.html
✨ Complete in 3.2s
🏗️ Technical Deep Dive
The Architecture
CodeContext is built with:
- Kotlin 2.1.0 - Modern, concise, type-safe
- JGraphT - Graph algorithms (PageRank, topological sort)
- JavaParser - AST parsing for Java
- JGit - Git history analysis
- D3.js - Interactive visualizations
- Ktor - Optional REST API server
- Claude AI - Optional AI-powered insights
How It Works
1. Parallel File Scanning
suspend fun parseFiles(files: List<File>): List<ParsedFile> = coroutineScope {
    files.chunked(100).flatMap { chunk ->
        chunk.map { file ->
            async(Dispatchers.IO) {
                cacheManager?.getCachedParse(file) ?: parser.parse(file)
            }
        }.awaitAll()
    }
}
We use Kotlin coroutines to parse files in parallel, 100 at a time, and check a parse cache first so unchanged files are not re-parsed.
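To make the caching idea concrete, here is a minimal sketch of one way such a cache could decide that a file is unchanged. It assumes a fingerprint made of last-modified time and file size; `ParseCache`, `Fingerprint`, and `getOrParse` are hypothetical names for illustration and are not taken from CodeContext's `cacheManager`.

```kotlin
import java.io.File

// Hypothetical fingerprint: cheap metadata that changes when the file is edited.
data class Fingerprint(val lastModified: Long, val size: Long)

// Hypothetical in-memory cache; not CodeContext's actual cacheManager.
class ParseCache<T> {
    private val entries = HashMap<String, Pair<Fingerprint, T>>()
    var hits = 0
        private set

    // Returns the cached value if the file is unchanged, otherwise parses and stores it.
    fun getOrParse(file: File, parse: (File) -> T): T {
        val key = file.absolutePath
        val current = Fingerprint(file.lastModified(), file.length())
        val cached = entries[key]
        if (cached != null && cached.first == current) {
            hits++
            return cached.second
        }
        val parsed = parse(file)
        entries[key] = current to parsed
        return parsed
    }
}

fun main() {
    val file = File.createTempFile("Sample", ".kt").apply {
        writeText("package demo\nimport demo.Other\n")
        deleteOnExit()
    }
    val cache = ParseCache<List<String>>()
    var parseCalls = 0
    val parse = { f: File -> parseCalls++; f.readLines().filter { it.startsWith("import ") } }

    cache.getOrParse(file, parse)
    cache.getOrParse(file, parse) // second call is served from the cache
    println("parse calls: $parseCalls, cache hits: ${cache.hits}")
}
```

This version is not thread-safe. Used from parallel coroutines, it would need a concurrent map or a lock around `getOrParse`.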
2. Dependency Graph Construction
// Build fully-qualified class names
val classMap = parsedFiles.associate { parsed ->
    val fqcn = "${parsed.packageName}.${parsed.file.nameWithoutExtension}"
    fqcn to parsed.file.absolutePath
}
// Create edges from imports
parsedFiles.forEach { source ->
    source.imports.forEach { import ->
        classMap[import]?.let { targetPath ->
            graph.addEdge(source.file.absolutePath, targetPath)
        }
    }
}
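Here is a small self-contained run of the same import-resolution idea, using plain maps instead of JGraphT so it works without dependencies. `ParsedSource` and `buildEdges` are hypothetical stand-ins for the real parser output and graph builder.

```kotlin
// Hypothetical stand-in for the parser output; only the fields this step needs.
data class ParsedSource(val path: String, val packageName: String, val imports: List<String>)

// Maps each file path to the set of project files it imports.
fun buildEdges(parsed: List<ParsedSource>): Map<String, Set<String>> {
    // Assumes one top-level class per file, named after the file.
    val classMap = parsed.associate { p ->
        val className = p.path.substringAfterLast('/').substringBeforeLast('.')
        "${p.packageName}.$className" to p.path
    }
    return parsed.associate { source ->
        // Imports that do not resolve to a project file (JDK, libraries) are dropped.
        source.path to source.imports.mapNotNull { classMap[it] }.toSet()
    }
}

fun main() {
    val parsed = listOf(
        ParsedSource("src/app/UserService.kt", "app", listOf("app.db.DatabaseConfig", "java.util.UUID")),
        ParsedSource("src/app/db/DatabaseConfig.kt", "app.db", emptyList()),
    )
    buildEdges(parsed).forEach { (from, to) -> println("$from -> $to") }
}
```

Because the lookup is an exact match on the fully-qualified name, wildcard imports and same-package references (which need no import) produce no edge under this scheme.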
3. PageRank for Hotspot Detection
val pageRank = PageRank(graph, 0.85, 100) // damping factor, max iterations
graph.vertexSet().forEach { vertex ->
    pageRankScores[vertex] = pageRank.getVertexScore(vertex)
}
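Edges point from the importing file to the imported one, so a file scores high when many files depend on it, directly or transitively. Files with high PageRank scores are central to the codebase - change them and many files are affected.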
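For intuition about what the scores mean, here is a plain power-iteration PageRank over an adjacency map. It is a sketch swapped in for JGraphT's `PageRank`, not the code CodeContext runs. The `pageRank` function, its handling of files without imports, and the sample file names are assumptions for illustration.

```kotlin
// Power-iteration PageRank over an adjacency map (file -> files it imports).
fun pageRank(
    edges: Map<String, Set<String>>,
    damping: Double = 0.85,
    iterations: Int = 100,
): Map<String, Double> {
    val nodes = edges.keys + edges.values.flatten()
    val n = nodes.size
    var rank = nodes.associateWith { 1.0 / n }
    repeat(iterations) {
        // Rank held by files with no imports is spread evenly over all files.
        val dangling = nodes.filter { edges[it].isNullOrEmpty() }.sumOf { rank.getValue(it) }
        val next = nodes.associateWith { (1 - damping) / n + damping * dangling / n }.toMutableMap()
        for ((source, targets) in edges) {
            if (targets.isEmpty()) continue
            // Each file passes an equal share of its rank to every file it imports.
            val share = damping * rank.getValue(source) / targets.size
            for (target in targets) next[target] = next.getValue(target) + share
        }
        rank = next
    }
    return rank
}

fun main() {
    val edges: Map<String, Set<String>> = mapOf(
        "ApiController.kt" to setOf("UserService.kt"),
        "AuthMiddleware.kt" to setOf("UserService.kt"),
        "UserService.kt" to setOf("DatabaseConfig.kt"),
        "DatabaseConfig.kt" to emptySet(),
    )
    pageRank(edges).entries
        .sortedByDescending { it.value }
        .forEach { (file, score) -> println("%-20s %.4f".format(file, score)) }
}
```

In this toy graph `DatabaseConfig.kt` ranks first even though only one file imports it directly, because everything else depends on it transitively through `UserService.kt`.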
4. Topological Sort for Learning Paths
val iterator = TopologicalOrderIterator(graph)
val path = mutableListOf<String>()
iterator.forEachRemaining { path.add(it) }
path.reverse() // Dependencies first
Read files in dependency order: understand foundations before complex modules.
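The "dependencies first" ordering can also be sketched without a graph library. The version below uses Kahn's algorithm, named plainly here because it is swapped in for JGraphT's `TopologicalOrderIterator` plus `reverse()`. `readingOrder` and the sample file names are hypothetical, and the input is assumed to be acyclic.

```kotlin
// Orders files so that every file appears after the files it imports.
// Input: file -> files it imports. Assumes the graph has no cycles.
fun readingOrder(imports: Map<String, Set<String>>): List<String> {
    val files = imports.keys + imports.values.flatten()
    // Number of not-yet-listed dependencies per file.
    val remaining = files.associateWith { imports[it]?.size ?: 0 }.toMutableMap()
    // Reverse index: file -> files that import it.
    val importedBy = HashMap<String, MutableList<String>>()
    for ((file, deps) in imports) {
        for (dep in deps) importedBy.getOrPut(dep) { mutableListOf() }.add(file)
    }

    // Start with files that import nothing; sorted only to make the result deterministic.
    val ready = ArrayDeque(files.filter { remaining.getValue(it) == 0 }.sorted())
    val order = mutableListOf<String>()
    while (ready.isNotEmpty()) {
        val file = ready.removeFirst()
        order.add(file)
        for (dependent in importedBy[file].orEmpty()) {
            val left = remaining.getValue(dependent) - 1
            remaining[dependent] = left
            if (left == 0) ready.addLast(dependent)
        }
    }
    require(order.size == files.size) { "Graph contains a cycle" }
    return order
}

fun main() {
    val imports = mapOf(
        "UserService.kt" to setOf("Models.kt", "Utils.kt"),
        "ApiController.kt" to setOf("UserService.kt"),
    )
    println(readingOrder(imports))
    // prints [Models.kt, Utils.kt, UserService.kt, ApiController.kt]
}
```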
Handling Cycles
Real codebases have circular dependencies. We detect cycles and gracefully fall back:
val detector = CycleDetector(graph)
if (detector.detectCycles()) {
    // Fallback: sort by fewest dependencies
    return graph.vertexSet().sortedBy { graph.outDegreeOf(it) }
}
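Here is a dependency-free sketch of the same detect-then-fall-back idea, for readers who want to see what the cycle check involves. It uses a depth-first search in place of JGraphT's `CycleDetector`. `hasCycle`, `fallbackOrder`, and the alphabetical tie-break are assumptions for illustration.

```kotlin
// Returns true if the import graph (file -> files it imports) contains a cycle.
fun hasCycle(imports: Map<String, Set<String>>): Boolean {
    val inProgress = HashSet<String>() // files on the current DFS path
    val done = HashSet<String>()       // files already fully explored
    fun visit(file: String): Boolean {
        if (file in done) return false
        if (!inProgress.add(file)) return true // reached a file already on the current path
        val cyclic = imports[file].orEmpty().any { visit(it) }
        inProgress.remove(file)
        done.add(file)
        return cyclic
    }
    return imports.keys.any { visit(it) }
}

// Fallback ordering for cyclic graphs: fewest imports first, ties broken by name.
fun fallbackOrder(imports: Map<String, Set<String>>): List<String> =
    (imports.keys + imports.values.flatten())
        .sortedWith(compareBy<String> { imports[it]?.size ?: 0 }.thenBy { it })

fun main() {
    val imports = mapOf(
        "AuthService.kt" to setOf("UserService.kt"),
        "UserService.kt" to setOf("AuthService.kt"),
        "Models.kt" to emptySet<String>(),
    )
    if (hasCycle(imports)) println(fallbackOrder(imports))
    // prints [Models.kt, AuthService.kt, UserService.kt]
}
```

The recursive search is fine for a sketch, but a very deep import chain could exhaust the stack, so an explicit stack would be safer on large repositories.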
Git History Optimization
Naive approach: Run git log <file> for each file → Slow (1000 files = 1000 git calls)
Our approach: Single-pass diff analysis across commits → Fast
val commits = git.log().call().take(1000)
commits.forEach { commit ->
    // Root commits have no parent to diff against, so skip them
    val parent = commit.parents.firstOrNull() ?: return@forEach
    val diffs = git.diff()
        .setOldTree(prepareTreeParser(repository, parent.tree))
        .setNewTree(prepareTreeParser(repository, commit.tree))
        .call()
    // Process all changed files in one pass
}
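The "process all changed files in one pass" step could look roughly like the sketch below, which aggregates change counts and authors per file while walking the commit list once. `CommitChange`, `FileHistory`, and `summarize` are hypothetical simplifications and do not use JGit types.

```kotlin
// Hypothetical, simplified view of one commit's diff against its parent.
data class CommitChange(val author: String, val changedPaths: List<String>)

// Per-file statistics gathered in a single pass over the commit list.
data class FileHistory(var changes: Int = 0, val authors: MutableSet<String> = mutableSetOf())

fun summarize(commits: List<CommitChange>): Map<String, FileHistory> {
    val history = HashMap<String, FileHistory>()
    for (commit in commits) {
        // distinct() guards against a path being listed twice in one commit.
        for (path in commit.changedPaths.distinct()) {
            val entry = history.getOrPut(path) { FileHistory() }
            entry.changes++
            entry.authors.add(commit.author)
        }
    }
    return history
}

fun main() {
    val commits = listOf(
        CommitChange("Alice", listOf("UserService.kt", "Models.kt")),
        CommitChange("Bob", listOf("UserService.kt")),
        CommitChange("Alice", listOf("UserService.kt")),
    )
    summarize(commits).entries
        .sortedByDescending { it.value.changes }
        .forEach { (path, h) -> println("$path: ${h.changes} changes by ${h.authors.sorted()}") }
}
```

The cost grows with the number of commits and changed paths, not with the number of files in the repository, which is where the saving over one `git log` call per file comes from.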
🧪 Testing Strategy
We wrote 19+ comprehensive tests:
1. Property-Based Tests (Kotest)
"LearningPathGenerator should handle random dependency trees" {
    checkAll(1000, Arb.list(Arb.stringPattern("[a-z]{5}"), 5..20)) { names ->
        val graph = buildRandomGraph(names)
        val path = generator.generate(graph)
        // Verify: no crashes, all files included
    }
}
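The "all files included" check, plus the stronger "dependencies come first" property, can be written as a plain predicate that a property test calls on every generated graph. This is a sketch only; `isValidLearningPath` is a hypothetical helper, not part of the project's test suite.

```kotlin
// Checks two properties of a learning path over an import graph (file -> files it imports):
// every file appears exactly once, and no file precedes one of its imports.
fun isValidLearningPath(path: List<String>, imports: Map<String, Set<String>>): Boolean {
    val files = imports.keys + imports.values.flatten()
    if (path.size != files.size || path.toSet() != files) return false
    val position = path.withIndex().associate { (index, file) -> file to index }
    return imports.all { (file, deps) ->
        deps.all { dep -> position.getValue(dep) < position.getValue(file) }
    }
}

fun main() {
    val imports = mapOf("B" to setOf("A"), "C" to setOf("B"))
    println(isValidLearningPath(listOf("A", "B", "C"), imports)) // prints true
    println(isValidLearningPath(listOf("C", "B", "A"), imports)) // prints false
}
```

The ordering property can only hold for acyclic graphs. For graphs with cycles, where the generator falls back to a degree-based order, only the completeness check applies.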
2. Stress Tests
test("Analyze 1000 files with complex dependencies") {
    val files = generateComplexDependencies(1000)
    val graph = analyze(files)
    assert(graph.getTopHotspots(10).isNotEmpty())
}
3. Backend Verification
We use CodeContext to analyze itself:
val files = scan(".")
val graph = build(files)
// Verify: ImprovedAnalyzeCommand → RobustDependencyGraph edge exists
assert(graph.containsEdge(analyzeCommand, dependencyGraph))
🎨 The Report Output
The HTML report includes:
Interactive Dependency Graph
- Zoom, pan, and explore relationships
- Hover to see file details, authors, change frequency
- Click to highlight dependencies
Team Contribution Map
Identifies knowledge silos:
👥 Team Contribution Map
Developer Files Modified
Alice 156
Bob 89
Charlie 12 ⚠️ Bus factor risk!
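One plausible way to surface knowledge silos from git history is to look for files that only a single developer has ever modified. The sketch below assumes that definition. The article does not say how CodeContext decides what to flag, and `knowledgeSilos` is a hypothetical helper.

```kotlin
// Groups files that only one developer has ever modified by that developer.
// "Single-author file" is an assumed definition of a knowledge silo.
fun knowledgeSilos(fileAuthors: Map<String, Set<String>>): Map<String, List<String>> =
    fileAuthors
        .filterValues { it.size == 1 }
        .entries
        .groupBy({ it.value.single() }, { it.key })

fun main() {
    val fileAuthors = mapOf(
        "UserService.kt" to setOf("Alice", "Bob"),
        "DatabaseConfig.kt" to setOf("Alice"),
        "TokenManager.kt" to setOf("Alice"),
        "Utils.kt" to setOf("Charlie"),
    )
    knowledgeSilos(fileAuthors).forEach { (dev, files) ->
        println("$dev is the only contributor to ${files.size} file(s): $files")
    }
}
```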
Personalized Learning Path
🎓 Learning Path for Backend Developers
Week 1: Foundation
├─ Models.kt [Fundamental]
├─ Utils.kt [Fundamental]
└─ Task: Add a new data model
Week 2: Core Services
├─ UserService.kt [Hotspot! 0.8542]
├─ DatabaseConfig.kt [Core Logic]
└─ Task: Trace the authentication flow
🚀 Advanced Features
AI-Powered Code Insights
With Claude API integration:
codecontext ask "Where is the authentication logic?"
💡 Based on the analysis, authentication is handled in:
- AuthMiddleware.kt (intercepts requests)
- UserService.kt (validates credentials)
- TokenManager.kt (generates JWT tokens)
📁 Check these files:
- src/auth/AuthMiddleware.kt
- src/services/UserService.kt
🎯 Confidence: 92%
REST API Server Mode
codecontext serve --port 8080
# POST /analyze
curl -X POST http://localhost:8080/analyze \
-H "Content-Type: application/json" \
-d '{"repoPath": "/path/to/repo"}'
Perfect for CI/CD integration!
Temporal Analysis
Track codebase evolution:
codecontext evolution --months 6 --interval 30
📈 Evolution Report:
2024-06-15 | Files: 120 | Lines: 6,000
2024-07-15 | Files: 145 | Lines: 7,250
2024-08-15 | Files: 180 | Lines: 9,000
...
📊 Net Growth: 50%
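As a worked check on the numbers above: between the first and last of the three rows shown, files go from 120 to 180 and lines from 6,000 to 9,000, which is 50% in both cases if growth is measured as (last - first) / first. That formula is an assumption about how the report computes it, and `Snapshot` and `netGrowthPercent` are hypothetical names.

```kotlin
// One row of the evolution report.
data class Snapshot(val date: String, val files: Int, val lines: Int)

// Percentage change from the first value to the last (assumed formula).
fun netGrowthPercent(first: Int, last: Int): Double {
    require(first > 0) { "first must be positive" }
    return (last - first) * 100.0 / first
}

fun main() {
    val snapshots = listOf(
        Snapshot("2024-06-15", files = 120, lines = 6_000),
        Snapshot("2024-07-15", files = 145, lines = 7_250),
        Snapshot("2024-08-15", files = 180, lines = 9_000),
    )
    val first = snapshots.first()
    val last = snapshots.last()
    println("Files: ${netGrowthPercent(first.files, last.files)}%") // prints Files: 50.0%
    println("Lines: ${netGrowthPercent(first.lines, last.lines)}%") // prints Lines: 50.0%
}
```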
📊 Performance
Benchmarks on Spring PetClinic (247 files):
- Scanning: 0.3s
- Parsing: 1.2s
- Graph building: 0.8s
- Git analysis: 0.9s
- Report generation: 0.2s
- Total: 3.4s ⚡
With caching, subsequent runs: < 1s
🎯 What's Next
Short-term (v0.2.0)
- [ ] TypeScript/JavaScript support
- [ ] Python support
- [ ] IntelliJ IDEA plugin
- [ ] VS Code extension
Medium-term (v1.0.0)
- [ ] Multi-language support (Go, Rust, C#)
- [ ] Code complexity metrics
- [ ] Security vulnerability detection
- [ ] Custom report templates
Long-term
- [ ] Hosted SaaS version (no installation required)
- [ ] GitHub App integration
- [ ] Real-time collaboration features
- [ ] IDE-native experience
🤝 Contribute!
CodeContext is open-source (MIT License). We welcome contributions!
Good first issues:
- Add support for TypeScript
- Improve error messages
- Create video tutorials
- Write integration tests
How to contribute:
- Fork the repo
- Create a feature branch
- Add tests
- Submit a PR
💭 Lessons Learned
1. Start with the Problem, Not the Solution
I spent 2 weeks validating the problem before writing code. Talked to 20+ developers about their onboarding pain points.
2. Ship Fast, Iterate Faster
The first version took 4 weeks. I could've spent 6 months adding features, but shipping early got real user feedback.
3. Testing is Non-Negotiable
Property-based tests caught 3 critical bugs in graph cycle handling that I would never have found manually.
4. Documentation Sells
A great README with screenshots and examples gets more stars than perfect code without docs.
5. Open Source is a Marathon
Building the tool is 20% of the work. Marketing, docs, support, and community building are the other 80%.
🎉 Try It Today!
git clone https://github.com/sonii-shivansh/CodeContext.git
cd CodeContext
./gradlew installDist
./build/install/codecontext/bin/codecontext analyze .
⭐ Star the repo if you find it useful!
🐛 Report issues: GitHub Issues
💬 Join the discussion: GitHub Discussions
📚 Additional Resources
- GitHub: github.com/sonii-shivansh/CodeContext
- Documentation: API Docs
- Demo Video: YouTube (coming soon)
- Landing Page: codecontext.dev (coming soon)
🙏 Acknowledgments
Built with:
- Kotlin ❤️
- JGraphT for graph algorithms
- JavaParser for AST parsing
- D3.js for visualizations
- Claude AI for code insights
What do you think? Would this solve your codebase onboarding problems?
Drop a comment below! I'd love to hear your feedback and answer any questions. 🚀
If you enjoyed this post, follow me for more deep dives into developer tools and productivity hacks!