DEV Community

Shivansh Soni

I Built CodeContext: An AI-Powered Tool That Analyzes Any Codebase in Seconds

I built an open-source CLI tool that uses AI and graph algorithms to help developers understand unfamiliar codebases 10x faster. It generates interactive dependency maps, detects critical files, and creates personalized learning paths.

🔗 GitHub: github.com/sonii-shivansh/CodeContext  
Give it a star if you find it useful!


🎯 The Problem I Was Solving

We've all been there: You join a new team, clone a massive repository, and spend weeks trying to figure out where anything is. You ask senior developers the same questions everyone asks: "Where's the authentication logic?", "Which file should I start with?", "What depends on what?"

The brutal reality:

  • New developers take 1-3 months to become productive
  • Only 12% of companies do onboarding well
  • Poor onboarding costs $240,000+ per senior developer annually

Existing tools like Sourcegraph are expensive, and Backstage requires complex infrastructure. I wanted something simple, fast, and free.


💡 The Solution: CodeContext

CodeContext is a Kotlin-based CLI tool that analyzes your codebase and generates:

  1. 🗺️ Interactive Dependency Graphs - Visualize your entire codebase structure with D3.js
  2. 🔥 Knowledge Hotspots - PageRank algorithm identifies the most critical files
  3. 🎓 Learning Paths - Topologically sorted "start here" reading order
  4. 🤖 AI Insights - Optional Claude integration for code explanations
  5. 📊 Team Contribution Maps - Identify knowledge silos and bus factor risks
  6. ⏳ Temporal Analysis - Track codebase evolution over time

Quick Demo

# Install 
git clone https://github.com/sonii-shivansh/CodeContext.git
cd CodeContext
./gradlew installDist

# Analyze any Java/Kotlin project
./build/install/codecontext/bin/codecontext analyze /path/to/project

# View interactive report
open output/index.html

Output:

🚀 Starting CodeContext analysis...
📂 Scanning repository...
   Found 247 files
🧠 Parsing code...
🕸️ Building dependency graph...
🗺️ Your Codebase Map
├─ 🔥 Hot Zones (Top 5):
│ ├─ UserService.kt (0.0847)
│ ├─ DatabaseConfig.kt (0.0623)
│ └─ ApiController.kt (0.0498)
✅ Report: output/index.html
✨ Complete in 3.2s

🏗️ Technical Deep Dive

The Architecture

CodeContext is built with:

  • Kotlin 2.1.0 - Modern, concise, type-safe
  • JGraphT - Graph algorithms (PageRank, topological sort)
  • JavaParser - AST parsing for Java
  • JGit - Git history analysis
  • D3.js - Interactive visualizations
  • Ktor - Optional REST API server
  • Claude AI - Optional AI-powered insights

How It Works

1. Parallel File Scanning

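import java.io.File
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

// Parse files in batches of 100; each batch runs in parallel on the IO dispatcher
suspend fun parseFiles(files: List<File>): List<ParsedFile> = coroutineScope {
    files.chunked(100).flatMap { chunk ->
        chunk.map { file ->
            async(Dispatchers.IO) {
                // Reuse the cached result when available, otherwise parse the file
                cacheManager?.getCachedParse(file) ?: parser.parse(file)
            }
        }.awaitAll()
    }
}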

We use Kotlin coroutines to parse files in parallel, with intelligent caching to avoid re-parsing unchanged files.
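
To make the caching idea concrete, here is a minimal sketch of one way such a cache could work: it keys each file on a SHA-256 hash of its contents, so a file is re-parsed only when its bytes change. `ParsedSketch` and `ContentHashCache` are hypothetical names for illustration, not the actual CodeContext classes, and the real cache may use a different invalidation strategy.

```kotlin
import java.io.File
import java.security.MessageDigest
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the tool's parse result
data class ParsedSketch(val path: String, val imports: List<String>)

// Remembers the last parse result per file together with the content hash it was built from.
// ConcurrentHashMap keeps lookups safe from parallel workers; in a race two workers
// may both parse the same file, which costs time but not correctness.
class ContentHashCache(private val parse: (File) -> ParsedSketch) {
    private val entries = ConcurrentHashMap<String, Pair<String, ParsedSketch>>()
    val misses = AtomicInteger() // number of times the real parser had to run

    fun getOrParse(file: File): ParsedSketch {
        val hash = sha256(file.readBytes())
        val cached = entries[file.absolutePath]
        if (cached != null && cached.first == hash) return cached.second
        misses.incrementAndGet()
        val parsed = parse(file)
        entries[file.absolutePath] = hash to parsed
        return parsed
    }

    private fun sha256(bytes: ByteArray): String =
        MessageDigest.getInstance("SHA-256").digest(bytes)
            .joinToString("") { "%02x".format(it) }
}
```

Hashing still reads every file, so this trades disk reads for skipped parsing. Comparing `lastModified()` timestamps would be cheaper but can miss edits that preserve the timestamp.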

2. Dependency Graph Construction

// Build fully-qualified class names
val classMap = parsedFiles.associate { parsed ->
    val fqcn = "${parsed.packageName}.${parsed.file.nameWithoutExtension}"
    fqcn to parsed.file.absolutePath
}

// Create edges from imports
parsedFiles.forEach { source ->
    source.imports.forEach { import ->
        classMap[import]?.let { targetPath ->
            graph.addEdge(source.file.absolutePath, targetPath)
        }
    }
}
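
The same idea can be sketched without JGraphT, using a plain adjacency map. `SourceUnit` and `buildDependencyEdges` are hypothetical names used only for this illustration. Note one limitation of any import-based approach: classes in the same package reference each other without an import, and wildcard imports do not name a class, so those edges are not captured here.

```kotlin
// Hypothetical minimal view of a parsed source file
data class SourceUnit(
    val path: String,
    val packageName: String,
    val className: String,
    val imports: List<String>
)

// Builds an adjacency map: each file path maps to the set of project files it imports.
// Imports that do not resolve to a project file (for example library classes) are dropped.
fun buildDependencyEdges(units: List<SourceUnit>): Map<String, Set<String>> {
    val pathByFqcn = units.associate { "${it.packageName}.${it.className}" to it.path }
    return units.associate { unit ->
        val targets = unit.imports
            .mapNotNull { pathByFqcn[it] }   // keep only imports that point at project files
            .filter { it != unit.path }      // ignore self-references
            .toSet()
        unit.path to targets
    }
}
```

Every file appears as a key, even when it has no dependencies, which keeps later graph algorithms simple.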

3. PageRank for Hotspot Detection

val pageRank = PageRank(graph, 0.85, 100) // damping, iterations
graph.vertexSet().forEach { vertex ->
    pageRankScores[vertex] = pageRank.getVertexScore(vertex)
}

Files with high PageRank scores are central to the codebase - change them and many files are affected.
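
For readers who want to see what the library call does, here is a minimal power-iteration sketch of PageRank over an adjacency map. It is an illustration under stated assumptions, not JGraphT's implementation: edges point from a file to the files it imports, every file appears as a key, and rank held by files with no outgoing edges is spread evenly across all files (one common convention).

```kotlin
// Power-iteration PageRank over an adjacency map (file -> files it depends on).
// Every target must also appear as a key in the map.
fun pageRank(
    edges: Map<String, Set<String>>,
    damping: Double = 0.85,
    iterations: Int = 100
): Map<String, Double> {
    val nodes = edges.keys
    val n = nodes.size
    if (n == 0) return emptyMap()
    var rank: Map<String, Double> = nodes.associateWith { 1.0 / n }
    repeat(iterations) {
        // Rank sitting on dangling nodes (no outgoing edges) is shared by everyone
        val danglingMass = nodes.filter { edges.getValue(it).isEmpty() }.sumOf { rank.getValue(it) }
        val base = (1 - damping) / n + damping * danglingMass / n
        val next = nodes.associateWith { base }.toMutableMap()
        for ((source, targets) in edges) {
            if (targets.isEmpty()) continue
            // Each file passes an equal share of its rank to every file it imports
            val share = damping * rank.getValue(source) / targets.size
            for (target in targets) next[target] = next.getValue(target) + share
        }
        rank = next
    }
    return rank
}
```

Because rank flows along import edges, a file that many others import ends up with a high score, which is why it shows up as a hotspot.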

4. Topological Sort for Learning Paths

val iterator = TopologicalOrderIterator(graph)
val path = mutableListOf<String>()
iterator.forEachRemaining { path.add(it) }
path.reverse() // Dependencies first

Read files in dependency order: understand foundations before complex modules.
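
As an illustration of what "dependency order" means, here is a small sketch using Kahn's algorithm on a plain adjacency map. This is a swapped-in technique for explanation, not the JGraphT iterator used above, and `readingOrder` is a hypothetical name. It assumes every file appears as a key and returns null when a cycle makes a full ordering impossible.

```kotlin
// Kahn's algorithm on an adjacency map (file -> files it depends on).
// Returns files so that every dependency comes before the files that use it,
// or null if the graph contains a cycle.
fun readingOrder(dependsOn: Map<String, Set<String>>): List<String>? {
    // Number of dependencies of each file that have not been emitted yet
    val remaining = dependsOn.mapValues { it.value.size }.toMutableMap()
    // Reverse index: dependency -> files that depend on it
    val dependents = HashMap<String, MutableList<String>>()
    for ((file, deps) in dependsOn) {
        for (dep in deps) dependents.getOrPut(dep) { mutableListOf() }.add(file)
    }

    // Start with files that depend on nothing; sorting keeps the output deterministic
    val ready = ArrayDeque(remaining.filterValues { it == 0 }.keys.sorted())
    val order = mutableListOf<String>()
    while (ready.isNotEmpty()) {
        val file = ready.removeFirst()
        order.add(file)
        for (user in dependents[file].orEmpty()) {
            val left = remaining.getValue(user) - 1
            remaining[user] = left
            if (left == 0) ready.addLast(user)
        }
    }
    // Files stuck with unmet dependencies mean there is a cycle
    return if (order.size == dependsOn.size) order else null
}
```

With `Service.kt` depending on `Model.kt` and `Utils.kt`, and `Model.kt` depending on `Utils.kt`, the reading order starts at `Utils.kt` and ends at `Service.kt`.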

Handling Cycles

Real codebases have circular dependencies. We detect cycles and gracefully fall back:

val detector = CycleDetector(graph)
if (detector.detectCycles()) {
    // Fallback: sort by fewest dependencies
    return graph.vertexSet().sortedBy { graph.outDegreeOf(it) }
}
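
Here is a rough, library-free sketch of the same detect-then-fall-back idea, assuming an adjacency map from each file to the files it depends on. `hasCycle` and `fallbackOrder` are hypothetical names. The fallback is a heuristic: sorting by dependency count does not guarantee that every dependency precedes its users, it only tends to surface simple files first.

```kotlin
// Depth-first search with two sets: a cycle exists if we reach a node
// that is still on the current DFS path (a back edge).
// Recursive, so extremely deep dependency chains could exhaust the stack.
fun hasCycle(dependsOn: Map<String, Set<String>>): Boolean {
    val onPath = HashSet<String>()
    val done = HashSet<String>()
    fun visit(node: String): Boolean {
        if (node in done) return false
        if (!onPath.add(node)) return true // already on the current path
        val cyclic = dependsOn[node].orEmpty().any { visit(it) }
        onPath.remove(node)
        done.add(node)
        return cyclic
    }
    return dependsOn.keys.any { visit(it) }
}

// Fallback ordering: files with the fewest dependencies first, ties broken by name
fun fallbackOrder(dependsOn: Map<String, Set<String>>): List<String> =
    dependsOn.keys.sortedWith(
        compareBy<String> { dependsOn.getValue(it).size }.thenBy { it }
    )
```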

Git History Optimization

Naive approach: Run git log <file> for each file → Slow (1000 files = 1000 git calls)

Our approach: Single-pass diff analysis across commits → Fast

val commits = git.log().call().take(1000)
commits.forEach { commit ->
    // Root commits have no parent to diff against, so skip them
    val parent = commit.parents.firstOrNull() ?: return@forEach
    val diffs = git.diff()
        .setOldTree(prepareTreeParser(repository, parent.tree))
        .setNewTree(prepareTreeParser(repository, commit.tree))
        .call()

    // Process all changed files in one pass
}
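
The "process all changed files in one pass" step could look something like the sketch below. It assumes the diffs have already been flattened into a simple `CommitChange` record per commit. Both data classes and `aggregateHistory` are hypothetical names, not part of JGit or CodeContext.

```kotlin
// Hypothetical flattened view of one commit: its author plus the paths its diff touched
data class CommitChange(val author: String, val changedPaths: List<String>)

// Per-file summary built from the commit history
data class FileHistory(val changeCount: Int, val authors: Set<String>)

// Single pass over the commits: each changed path updates its counter and author set,
// so the cost grows with the total number of diff entries, not files times commits.
fun aggregateHistory(commits: List<CommitChange>): Map<String, FileHistory> {
    val counts = HashMap<String, Int>()
    val authors = HashMap<String, MutableSet<String>>()
    for (commit in commits) {
        for (path in commit.changedPaths) {
            counts[path] = (counts[path] ?: 0) + 1
            authors.getOrPut(path) { mutableSetOf() }.add(commit.author)
        }
    }
    return counts.mapValues { (path, count) -> FileHistory(count, authors.getValue(path)) }
}
```

The resulting map gives change frequency and contributors per file in one lookup, which is the kind of data the report's hover details and contribution map need.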

🧪 Testing Strategy

We wrote 19+ comprehensive tests:

1. Property-Based Tests (Kotest)

"LearningPathGenerator should handle random dependency trees" {
    checkAll(1000, Arb.list(Arb.stringPattern("[a-z]{5}"), 5..20)) { names ->
        val graph = buildRandomGraph(names)
        val path = generator.generate(graph)
        // Verify: no crashes, all files included
    }
}

2. Stress Tests

test("Analyze 1000 files with complex dependencies") {
    val files = generateComplexDependencies(1000)
    val graph = analyze(files)
    assert(graph.getTopHotspots(10).isNotEmpty())
}

3. Backend Verification

We use CodeContext to analyze itself:

val files = scan(".")
val graph = build(files)
// Verify: ImprovedAnalyzeCommand → RobustDependencyGraph edge exists
assert(graph.containsEdge(analyzeCommand, dependencyGraph))

🎨 The Report Output

The HTML report includes:

Interactive Dependency Graph

  • Zoom, pan, and explore relationships
  • Hover to see file details, authors, change frequency
  • Click to highlight dependencies

Team Contribution Map

Identifies knowledge silos:

👥 Team Contribution Map
Developer   Files Modified
Alice       156
Bob         89
Charlie     12  ⚠️ Bus factor risk!
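
One simple way to derive numbers like these is sketched below, assuming a map from each file to the developers who modified it. The function names are hypothetical, and treating single-owner files as a silo signal is my assumption for illustration. The exact rule CodeContext uses to flag bus factor risk is not shown in this post.

```kotlin
// Input for both functions: file path -> set of developers who modified it.

// Number of distinct files each developer has modified, highest first
fun filesPerDeveloper(authorsByFile: Map<String, Set<String>>): List<Pair<String, Int>> =
    authorsByFile.values.flatten()
        .groupingBy { it }.eachCount()
        .toList()
        .sortedWith(compareByDescending<Pair<String, Int>> { it.second }.thenBy { it.first })

// Files with exactly one contributor, mapped to that contributor:
// if that person leaves, nobody else has touched the code.
fun singleOwnerFiles(authorsByFile: Map<String, Set<String>>): Map<String, String> =
    authorsByFile.filterValues { it.size == 1 }.mapValues { it.value.first() }
```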

Personalized Learning Path

🎓 Learning Path for Backend Developers

Week 1: Foundation
├─ Models.kt [Fundamental]
├─ Utils.kt [Fundamental]
└─ Task: Add a new data model

Week 2: Core Services
├─ UserService.kt [Hotspot! 0.8542]
├─ DatabaseConfig.kt [Core Logic]
└─ Task: Trace the authentication flow

🚀 Advanced Features

AI-Powered Code Insights

With Claude API integration:

codecontext ask "Where is the authentication logic?"


💡 Based on the analysis, authentication is handled in:
   - AuthMiddleware.kt (intercepts requests)
   - UserService.kt (validates credentials)
   - TokenManager.kt (generates JWT tokens)


📁 Check these files:
   - src/auth/AuthMiddleware.kt
   - src/services/UserService.kt

🎯 Confidence: 92%

REST API Server Mode

codecontext serve --port 8080

# POST /analyze
curl -X POST http://localhost:8080/analyze \
  -H "Content-Type: application/json" \
  -d '{"repoPath": "/path/to/repo"}'

Perfect for CI/CD integration!
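
If your pipeline runs on the JVM, the same call can be made from Kotlin with the JDK's built-in HTTP client (Java 11+). This is a sketch that mirrors the curl command above. The function names are hypothetical, and since the response format is not documented in this post, the body is returned as a raw string.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Builds the same POST /analyze request as the curl command
fun analyzeRequest(baseUrl: String, repoPath: String): HttpRequest {
    // Escape backslashes and quotes so the path stays valid inside the JSON string
    val escaped = repoPath.replace("\\", "\\\\").replace("\"", "\\\"")
    return HttpRequest.newBuilder()
        .uri(URI.create("$baseUrl/analyze"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString("""{"repoPath": "$escaped"}"""))
        .build()
}

// Sends the request and returns the status code with the raw response body
fun runAnalysis(baseUrl: String, repoPath: String): Pair<Int, String> {
    val response = HttpClient.newHttpClient()
        .send(analyzeRequest(baseUrl, repoPath), HttpResponse.BodyHandlers.ofString())
    return response.statusCode() to response.body()
}
```

Calling `runAnalysis("http://localhost:8080", "/path/to/repo")` requires the server from `codecontext serve` to be running.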

Temporal Analysis

Track codebase evolution:

codecontext evolution --months 6 --interval 30


📈 Evolution Report:
2024-06-15 | Files: 120 | Lines: 6,000
2024-07-15 | Files: 145 | Lines: 7,250
2024-08-15 | Files: 180 | Lines: 9,000
...
📊 Net Growth: 50%
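
One plausible reading of the net growth figure is the relative change in line count between the first and last snapshot. The sketch below works under that assumption. `Snapshot` and `netGrowthPercent` are hypothetical names, and the tool's actual formula may differ.

```kotlin
// One row of the evolution report
data class Snapshot(val date: String, val files: Int, val lines: Int)

// Relative change in line count between the first and last snapshot, as a percentage
fun netGrowthPercent(snapshots: List<Snapshot>): Double {
    require(snapshots.size >= 2) { "need at least two snapshots" }
    val first = snapshots.first().lines
    require(first > 0) { "first snapshot must contain at least one line" }
    return (snapshots.last().lines - first) * 100.0 / first
}
```

For example, going from 6,000 lines to 9,000 lines is (9,000 - 6,000) / 6,000 = 50% growth.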

📊 Performance

Benchmarks on Spring PetClinic (247 files):

  • Scanning: 0.3s
  • Parsing: 1.2s
  • Graph building: 0.8s
  • Git analysis: 0.9s
  • Report generation: 0.2s
  • Total: 3.4s

With caching, subsequent runs: < 1s


🎯 What's Next

Short-term (v0.2.0)

  • [ ] TypeScript/JavaScript support
  • [ ] Python support
  • [ ] IntelliJ IDEA plugin
  • [ ] VS Code extension

Medium-term (v1.0.0)

  • [ ] Multi-language support (Go, Rust, C#)
  • [ ] Code complexity metrics
  • [ ] Security vulnerability detection
  • [ ] Custom report templates

Long-term

  • [ ] Hosted SaaS version (no installation required)
  • [ ] GitHub App integration
  • [ ] Real-time collaboration features
  • [ ] IDE-native experience

🤝 Contribute!

CodeContext is open-source (MIT License). We welcome contributions!

Good first issues:

  • Add support for TypeScript
  • Improve error messages
  • Create video tutorials
  • Write integration tests

How to contribute:

  1. Fork the repo
  2. Create a feature branch
  3. Add tests
  4. Submit a PR

💭 Lessons Learned

1. Start with the Problem, Not the Solution

I spent 2 weeks validating the problem before writing any code, talking to 20+ developers about their onboarding pain points.

2. Ship Fast, Iterate Faster

The first version took 4 weeks. I could've spent 6 months adding features, but shipping early got real user feedback.

3. Testing is Non-Negotiable

Property-based tests caught 3 critical bugs in graph cycle handling that I would never have found manually.

4. Documentation Sells

A great README with screenshots and examples gets more stars than perfect code without docs.

5. Open Source is a Marathon

Building the tool is 20% of the work. Marketing, docs, support, and community building are the other 80%.


🎉 Try It Today!

git clone https://github.com/sonii-shivansh/CodeContext.git
cd CodeContext
./gradlew installDist
./build/install/codecontext/bin/codecontext analyze .

⭐ Star the repo if you find it useful!

🐛 Report issues: GitHub Issues

💬 Join the discussion: GitHub Discussions


🙏 Acknowledgments

Built with:

  • Kotlin ❤️
  • JGraphT for graph algorithms
  • JavaParser for AST parsing
  • D3.js for visualizations
  • Claude AI for code insights

What do you think? Would this solve your codebase onboarding problems?

Drop a comment below! I'd love to hear your feedback and answer any questions. 🚀


If you enjoyed this post, follow me for more deep dives into developer tools and productivity hacks!