DEV Community

Shivansh Soni
I Built CodeContext: An AI-Powered Tool That Analyzes Any Codebase in Seconds

I built an open-source CLI tool that uses AI and graph algorithms to help developers understand unfamiliar codebases 10x faster. It generates interactive dependency maps, detects critical files, and creates personalized learning paths.

🔗 GitHub: github.com/sonii-shivansh/CodeContext
⭐ Give it a star if you find it useful!


🎯 The Problem I Was Solving

We've all been there: You join a new team, clone a massive repository, and spend weeks trying to figure out where anything is. You ask senior developers the same questions everyone asks: "Where's the authentication logic?", "Which file should I start with?", "What depends on what?"

The brutal reality:

  • New developers take 1-3 months to become productive
  • Only 12% of companies do onboarding well
  • Poor onboarding costs $240,000+ per senior developer annually

Existing tools like Sourcegraph are expensive, and Backstage requires complex infrastructure. I wanted something simple, fast, and free.


💡 The Solution: CodeContext

CodeContext is a Kotlin-based CLI tool that analyzes your codebase and generates:

  1. 🗺️ Interactive Dependency Graphs - Visualize your entire codebase structure with D3.js
  2. 🔥 Knowledge Hotspots - PageRank algorithm identifies the most critical files
  3. 🎓 Learning Paths - Topologically sorted "start here" reading order
  4. 🤖 AI Insights - Optional Claude integration for code explanations
  5. 📊 Team Contribution Maps - Identify knowledge silos and bus factor risks
  6. ⏳ Temporal Analysis - Track codebase evolution over time

Quick Demo

# Install 
git clone https://github.com/sonii-shivansh/CodeContext.git
cd CodeContext
./gradlew installDist

# Analyze any Java/Kotlin project
./build/install/codecontext/bin/codecontext analyze /path/to/project

# View interactive report
open output/index.html

Output:

🚀 Starting CodeContext analysis...
📂 Scanning repository...
   Found 247 files
🧠 Parsing code...
🕸️ Building dependency graph...
🗺️ Your Codebase Map
├─ 🔥 Hot Zones (Top 5):
│ ├─ UserService.kt (0.0847)
│ ├─ DatabaseConfig.kt (0.0623)
│ └─ ApiController.kt (0.0498)
✅ Report: output/index.html
✨ Complete in 3.2s

🏗️ Technical Deep Dive

The Architecture

CodeContext is built with:

  • Kotlin 2.1.0 - Modern, concise, type-safe
  • JGraphT - Graph algorithms (PageRank, topological sort)
  • JavaParser - AST parsing for Java
  • JGit - Git history analysis
  • D3.js - Interactive visualizations
  • Ktor - Optional REST API server
  • Claude AI - Optional AI-powered insights

How It Works

1. Parallel File Scanning

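import java.io.File
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

// Parse files in batches of 100 so only one batch of coroutines is in flight at a time
suspend fun parseFiles(files: List<File>): List<ParsedFile> = coroutineScope {
    files.chunked(100).flatMap { chunk ->
        chunk.map { file ->
            async(Dispatchers.IO) {
                // Reuse the cached result when there is one; otherwise parse the file
                cacheManager?.getCachedParse(file) ?: parser.parse(file)
            }
        }.awaitAll()
    }
}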

We use Kotlin coroutines to parse files in parallel on the IO dispatcher, in batches of 100. A parse cache is checked first, so unchanged files are not re-parsed.
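To make the caching idea concrete, here's a minimal stdlib-only sketch. It is an assumption about how such a cache could work, not CodeContext's actual `cacheManager`: the `ParseCache` class, the SHA-256 content key, and the `parse` lambda are hypothetical names introduced for illustration.

```kotlin
import java.security.MessageDigest

// Hypothetical cache: maps a hash of a file's content to its parsed result.
// Keying on content means identical content is parsed only once.
class ParseCache<T>(private val parse: (String) -> T) {
    private val entries = HashMap<String, T>()
    var misses = 0
        private set

    fun get(content: String): T {
        val key = sha256(content)
        return entries.getOrPut(key) {
            misses++            // only runs when this content has not been seen before
            parse(content)
        }
    }

    private fun sha256(text: String): String =
        MessageDigest.getInstance("SHA-256")
            .digest(text.toByteArray())
            .joinToString("") { "%02x".format(it) }
}

fun main() {
    // Stand-in "parser" that just counts lines
    val cache = ParseCache<Int> { source -> source.lines().size }
    cache.get("class A\nclass B")
    cache.get("class A\nclass B")   // served from the cache
    println(cache.misses)           // prints 1
}
```

Hashing content rather than comparing timestamps costs a read of every file, but it stays correct after a fresh checkout, where modification times change and content does not.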

2. Dependency Graph Construction

// Map each fully-qualified class name to the file that declares it
val classMap = parsedFiles.associate { parsed ->
    val fqcn = "${parsed.packageName}.${parsed.file.nameWithoutExtension}"
    fqcn to parsed.file.absolutePath
}

// Add an edge from each file to every project file it imports.
// Assumes every file path is already a vertex; JGraphT's addEdge throws otherwise.
parsedFiles.forEach { source ->
    source.imports.forEach { import ->
        // Imports that do not resolve to a project file (JDK, libraries) are skipped
        classMap[import]?.let { targetPath ->
            graph.addEdge(source.file.absolutePath, targetPath)
        }
    }
}
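Here's a self-contained sketch of the same idea that runs without JGraphT or JavaParser. It is a simplification under stated assumptions: imports are pulled out with regexes instead of a real parser, one class per file is assumed, and `SourceFile` and `buildImportGraph` are hypothetical names, not part of CodeContext.

```kotlin
// Hypothetical, simplified model of a source file: a name plus its raw text.
data class SourceFile(val name: String, val text: String)

val packageRegex = Regex("""^\s*package\s+([\w.]+)""", RegexOption.MULTILINE)
val importRegex = Regex("""^\s*import\s+([\w.]+)""", RegexOption.MULTILINE)

// Returns an adjacency map: file name -> names of project files it imports.
fun buildImportGraph(files: List<SourceFile>): Map<String, Set<String>> {
    // Fully-qualified class name -> file that declares it (one class per file assumed)
    val classMap = files.associate { file ->
        val pkg = packageRegex.find(file.text)?.groupValues?.get(1).orEmpty()
        val simpleName = file.name.substringBeforeLast('.')
        (if (pkg.isEmpty()) simpleName else "$pkg.$simpleName") to file.name
    }
    return files.associate { file ->
        val targets = importRegex.findAll(file.text)
            .mapNotNull { classMap[it.groupValues[1]] }  // drop imports from outside the project
            .filter { it != file.name }                  // ignore self-references
            .toSet()
        file.name to targets
    }
}

fun main() {
    val files = listOf(
        SourceFile("User.kt", "package app.model\nclass User"),
        SourceFile(
            "UserService.kt",
            "package app.service\nimport app.model.User\nimport java.io.File\nclass UserService"
        )
    )
    println(buildImportGraph(files))   // prints {User.kt=[], UserService.kt=[User.kt]}
}
```

Wildcard and static imports do not resolve to a single class name in this sketch, so they are silently dropped.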

3. PageRank for Hotspot Detection

import org.jgrapht.alg.scoring.PageRank

// JGraphT PageRank: damping factor 0.85, at most 100 iterations
val pageRank = PageRank(graph, 0.85, 100)
graph.vertexSet().forEach { vertex ->
    pageRankScores[vertex] = pageRank.getVertexScore(vertex)
}

Edges run from a file to the files it imports, so a high PageRank score marks a file that many others depend on, directly or through intermediaries. Change one of these and many files are affected.
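If you're curious what the scoring does under the hood, here's a plain power-iteration sketch over an adjacency map. It illustrates the textbook algorithm, not JGraphT's implementation, which may treat details such as dangling nodes and convergence tolerance differently. The function name and the sample file names are made up for the example.

```kotlin
// Power-iteration PageRank over an adjacency map (file -> files it imports).
// Every file must appear as a key. Files with no imports spread their rank evenly.
fun pageRank(
    graph: Map<String, Set<String>>,
    damping: Double = 0.85,
    iterations: Int = 100
): Map<String, Double> {
    val n = graph.size
    if (n == 0) return emptyMap()
    var ranks: Map<String, Double> = graph.keys.associateWith { 1.0 / n }
    repeat(iterations) {
        // Rank held by files without outgoing edges is redistributed to everyone
        val danglingMass = graph.filterValues { it.isEmpty() }.keys.sumOf { ranks.getValue(it) }
        val base = (1 - damping) / n + damping * danglingMass / n
        val next = graph.keys.associateWith { base }.toMutableMap()
        for ((source, targets) in graph) {
            if (targets.isEmpty()) continue
            val share = damping * ranks.getValue(source) / targets.size
            for (target in targets) next[target] = next.getValue(target) + share
        }
        ranks = next
    }
    return ranks
}

fun main() {
    val graph = mapOf(
        "Controller.kt" to setOf("Service.kt", "Utils.kt"),
        "Service.kt" to setOf("Utils.kt"),
        "Utils.kt" to emptySet<String>()
    )
    // Utils.kt ranks highest: both other files import it
    pageRank(graph).entries
        .sortedByDescending { it.value }
        .forEach { (file, score) -> println("$file %.4f".format(score)) }
}
```

The scores sum to 1, which is why individual values in the report look small even for the most central files.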

4. Topological Sort for Learning Paths

import org.jgrapht.traverse.TopologicalOrderIterator

// Edges point from a file to its imports, so the iterator yields importers first
val iterator = TopologicalOrderIterator(graph)
val path = mutableListOf<String>()
iterator.forEachRemaining { path.add(it) }
path.reverse() // Dependencies first

Read files in dependency order: understand foundations before complex modules.
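For readers who want to see the ordering without JGraphT, here's a sketch using Kahn's algorithm on a plain adjacency map. It is a swapped-in technique for illustration, not what `TopologicalOrderIterator` does internally, and `readingOrder` is a hypothetical name.

```kotlin
// Kahn's algorithm on an adjacency map (file -> files it depends on).
// Every file must appear as a key. Returns files with dependencies first.
// Throws IllegalArgumentException if the graph contains a cycle.
fun readingOrder(dependsOn: Map<String, Set<String>>): List<String> {
    // Number of not-yet-placed dependencies per file
    val remaining = dependsOn.mapValues { it.value.size }.toMutableMap()
    // Reverse index: file -> files that depend on it
    val dependents = HashMap<String, MutableList<String>>()
    for ((file, deps) in dependsOn) {
        for (dep in deps) dependents.getOrPut(dep) { mutableListOf() }.add(file)
    }

    val ready = ArrayDeque(dependsOn.keys.filter { remaining.getValue(it) == 0 })
    val order = mutableListOf<String>()
    while (ready.isNotEmpty()) {
        val file = ready.removeFirst()
        order.add(file)
        for (dependent in dependents[file].orEmpty()) {
            val left = remaining.getValue(dependent) - 1
            remaining[dependent] = left
            if (left == 0) ready.addLast(dependent)
        }
    }
    require(order.size == dependsOn.size) { "Graph contains a cycle" }
    return order
}

fun main() {
    val graph = mapOf(
        "ApiController.kt" to setOf("UserService.kt"),
        "UserService.kt" to setOf("Models.kt", "Utils.kt"),
        "Models.kt" to emptySet<String>(),
        "Utils.kt" to emptySet<String>()
    )
    println(readingOrder(graph))
    // prints [Models.kt, Utils.kt, UserService.kt, ApiController.kt]
}
```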

Handling Cycles

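Real codebases have circular dependencies, and a graph with cycles has no topological order. We detect cycles up front and fall back to ordering files by their number of outgoing dependencies: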

import org.jgrapht.alg.cycle.CycleDetector

val detector = CycleDetector(graph)
if (detector.detectCycles()) {
    // Fallback: files with the fewest outgoing dependencies come first
    return graph.vertexSet().sortedBy { graph.outDegreeOf(it) }
}
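The detect-then-fall-back pattern can be sketched on a plain adjacency map as well. This version uses a depth-first search rather than JGraphT's `CycleDetector`, and both function names are hypothetical.

```kotlin
// Depth-first search that tracks the current path to find a cycle.
// Graph: file -> files it depends on.
fun hasCycle(dependsOn: Map<String, Set<String>>): Boolean {
    val visiting = HashSet<String>()   // files on the current DFS path
    val done = HashSet<String>()       // files already fully explored
    fun visit(file: String): Boolean {
        if (file in done) return false
        if (!visiting.add(file)) return true   // reached a file already on the path
        val found = dependsOn[file].orEmpty().any { visit(it) }
        visiting.remove(file)
        done.add(file)
        return found
    }
    return dependsOn.keys.any { visit(it) }
}

// Fallback used when no topological order exists: fewest dependencies first.
fun fallbackOrder(dependsOn: Map<String, Set<String>>): List<String> =
    dependsOn.keys.sortedBy { dependsOn.getValue(it).size }

fun main() {
    val graph = mapOf(
        "A.kt" to setOf("B.kt"),
        "B.kt" to setOf("A.kt"),
        "C.kt" to emptySet<String>()
    )
    println(hasCycle(graph))        // prints true
    println(fallbackOrder(graph))   // prints [C.kt, A.kt, B.kt]
}
```

The recursion depth grows with the longest dependency chain, so a very deep graph would need an iterative version.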

Git History Optimization

Naive approach: Run git log <file> for each file → Slow (1000 files = 1000 git calls)

Our approach: Single-pass diff analysis across commits → Fast

val commits = git.log().call().take(1000).toList()
commits.forEach { commit ->
    // Diff against the first parent; a root commit has none, so skip it
    val parent = commit.parents.firstOrNull() ?: return@forEach
    val diffs = git.diff()
        .setOldTree(prepareTreeParser(repository, parent.tree))
        .setNewTree(prepareTreeParser(repository, commit.tree))
        .call()

    // Process all changed files in one pass
}
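What "process all changed files in one pass" could look like is sketched below. It leaves JGit out entirely: the changed paths per commit are given as input, where the real tool would take them from each commit's diff. `CommitInfo` and `summarizeHistory` are hypothetical names.

```kotlin
// Hypothetical, simplified commit record: who made it and which paths it touched.
data class CommitInfo(val author: String, val changedPaths: List<String>)

// One pass over the commit list yields per-file change counts and per-file author sets.
fun summarizeHistory(
    commits: List<CommitInfo>
): Pair<Map<String, Int>, Map<String, Set<String>>> {
    val changeCounts = HashMap<String, Int>()
    val authors = HashMap<String, MutableSet<String>>()
    for (commit in commits) {
        // distinct() guards against a path being listed twice in one commit
        for (path in commit.changedPaths.distinct()) {
            changeCounts[path] = (changeCounts[path] ?: 0) + 1
            authors.getOrPut(path) { mutableSetOf() }.add(commit.author)
        }
    }
    return changeCounts to authors
}

fun main() {
    val commits = listOf(
        CommitInfo("alice", listOf("UserService.kt", "Models.kt")),
        CommitInfo("bob", listOf("UserService.kt"))
    )
    val (counts, authors) = summarizeHistory(commits)
    println(counts["UserService.kt"])    // prints 2
    println(authors["UserService.kt"])   // prints [alice, bob]
}
```

The cost scales with the number of commits and changed paths, not with the number of files in the repository.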

🧪 Testing Strategy

We wrote 19+ comprehensive tests:

1. Property-Based Tests (Kotest)

"LearningPathGenerator should handle random dependency trees" {
    checkAll(1000, Arb.list(Arb.stringPattern("[a-z]{5}"), 5..20)) { names ->
        val graph = buildRandomGraph(names)
        val path = generator.generate(graph)
        // Verify: no crashes, all files included
    }
}
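The "Verify" comment above hides the interesting part: what property is being checked. Here's a hand-rolled sketch of one such property using only the standard library instead of Kotest. The checker, the random graph builder, and the choice of property are my assumptions for illustration; the project's real assertions may differ.

```kotlin
import kotlin.random.Random

// The property: every file appears exactly once, and each file
// comes after all of the files it depends on.
fun isValidReadingOrder(order: List<String>, dependsOn: Map<String, Set<String>>): Boolean {
    if (order.size != dependsOn.size || order.toSet() != dependsOn.keys) return false
    val position = order.withIndex().associate { (index, file) -> file to index }
    return dependsOn.all { (file, deps) ->
        deps.all { dep ->
            val depPosition = position[dep]
            depPosition != null && depPosition < position.getValue(file)
        }
    }
}

// Random acyclic graph: file i may only depend on files with a smaller index,
// so the names in index order always form a valid reading order.
fun randomDag(size: Int, random: Random): Map<String, Set<String>> =
    (0 until size).associate { i ->
        "f$i" to (0 until i).filter { random.nextInt(4) == 0 }.map { "f$it" }.toSet()
    }

fun main() {
    val random = Random(42)   // fixed seed keeps any failure reproducible
    repeat(1000) {
        val graph = randomDag(random.nextInt(5, 21), random)
        // In a real test, the order would come from the generator under test
        check(isValidReadingOrder(graph.keys.toList(), graph))
    }
    println("property held for 1000 random graphs")
}
```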

2. Stress Tests

test("Analyze 1000 files with complex dependencies") {
    val files = generateComplexDependencies(1000)
    val graph = analyze(files)
    assert(graph.getTopHotspots(10).isNotEmpty())
}

3. Backend Verification

We use CodeContext to analyze itself:

val files = scan(".")
val graph = build(files)
// Verify: ImprovedAnalyzeCommand → RobustDependencyGraph edge exists
assert(graph.containsEdge(analyzeCommand, dependencyGraph))

🎨 The Report Output

The HTML report includes:

Interactive Dependency Graph

  • Zoom, pan, and explore relationships
  • Hover to see file details, authors, change frequency
  • Click to highlight dependencies

Team Contribution Map

Identifies knowledge silos:

👥 Team Contribution Map
Developer   Files Modified
Alice       156
Bob         89
Charlie     12   ⚠️ Bus factor risk!
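Here's one way such a map could be derived from per-file author sets. The rule CodeContext uses to flag a bus factor risk isn't shown in this post, so the "only one developer has touched this file" signal below is an assumption, and both function names are hypothetical.

```kotlin
// Input: file -> developers who have modified it (for example, from Git history).
// Returns developer -> number of files modified, most active first.
fun filesPerDeveloper(authorsByFile: Map<String, Set<String>>): List<Pair<String, Int>> =
    authorsByFile.values.flatten()
        .groupingBy { it }.eachCount()
        .toList()
        .sortedByDescending { it.second }

// Files that only one developer has ever touched: one possible knowledge-silo signal.
fun singleOwnerFiles(authorsByFile: Map<String, Set<String>>): Map<String, String> =
    authorsByFile.filterValues { it.size == 1 }.mapValues { it.value.single() }

fun main() {
    val authorsByFile = mapOf(
        "Auth.kt" to setOf("Alice", "Bob"),
        "Billing.kt" to setOf("Alice"),
        "Legacy.kt" to setOf("Charlie")
    )
    println(filesPerDeveloper(authorsByFile))   // prints [(Alice, 2), (Bob, 1), (Charlie, 1)]
    println(singleOwnerFiles(authorsByFile))    // prints {Billing.kt=Alice, Legacy.kt=Charlie}
}
```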

Personalized Learning Path

🎓 Learning Path for Backend Developers

Week 1: Foundation
├─ Models.kt [Fundamental]
├─ Utils.kt [Fundamental]
└─ Task: Add a new data model

Week 2: Core Services
├─ UserService.kt [Hotspot! 0.8542]
├─ DatabaseConfig.kt [Core Logic]
└─ Task: Trace the authentication flow

🚀 Advanced Features

AI-Powered Code Insights

With Claude API integration:

codecontext ask "Where is the authentication logic?"


💡 Based on the analysis, authentication is handled in:
   - AuthMiddleware.kt (intercepts requests)
   - UserService.kt (validates credentials)
   - TokenManager.kt (generates JWT tokens)

📍 Check these files:
   - src/auth/AuthMiddleware.kt
   - src/services/UserService.kt

🎯 Confidence: 92%

REST API Server Mode

codecontext serve --port 8080

# POST /analyze
curl -X POST http://localhost:8080/analyze \
  -H "Content-Type: application/json" \
  -d '{"repoPath": "/path/to/repo"}'

Because the analysis is exposed over HTTP, it is easy to trigger from a CI/CD pipeline.
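If your pipeline runs on the JVM, the same request can be sent from Kotlin with the JDK's built-in HTTP client (Java 11+). The endpoint and the `repoPath` field come from the curl example above; the `analyzeRequest` helper is a hypothetical name, and the response format isn't documented in this post, so the sketch just prints the status and raw body.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Builds the POST /analyze request. Only backslashes and quotes are escaped,
// which covers typical file paths but is not a general JSON encoder.
fun analyzeRequest(baseUrl: String, repoPath: String): HttpRequest {
    val escaped = repoPath.replace("\\", "\\\\").replace("\"", "\\\"")
    return HttpRequest.newBuilder(URI.create("$baseUrl/analyze"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString("""{"repoPath": "$escaped"}"""))
        .build()
}

fun main() {
    // Requires `codecontext serve --port 8080` to be running locally
    val response = HttpClient.newHttpClient().send(
        analyzeRequest("http://localhost:8080", "/path/to/repo"),
        HttpResponse.BodyHandlers.ofString()
    )
    println("${response.statusCode()} ${response.body()}")
}
```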

Temporal Analysis

Track codebase evolution:

codecontext evolution --months 6 --interval 30


📈 Evolution Report:
2024-06-15 | Files: 120 | Lines: 6,000
2024-07-15 | Files: 145 | Lines: 7,250
2024-08-15 | Files: 180 | Lines: 9,000
...
📊 Net Growth: 50%
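The net growth figure is plain percentage change between the first and last snapshot. A small sketch, using only the three rows shown above and assuming growth is measured on file count (the lines column gives the same 50% here); `Snapshot` and `netGrowthPercent` are hypothetical names.

```kotlin
// One row of the evolution report
data class Snapshot(val date: String, val files: Int, val lines: Int)

// Percentage change in file count between the first and last snapshot
fun netGrowthPercent(snapshots: List<Snapshot>): Double {
    require(snapshots.size >= 2) { "need at least two snapshots" }
    val first = snapshots.first().files
    val last = snapshots.last().files
    require(first > 0) { "first snapshot must contain files" }
    return (last - first) * 100.0 / first
}

fun main() {
    val snapshots = listOf(
        Snapshot("2024-06-15", 120, 6_000),
        Snapshot("2024-07-15", 145, 7_250),
        Snapshot("2024-08-15", 180, 9_000)
    )
    println(netGrowthPercent(snapshots))   // prints 50.0
}
```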

📊 Performance

Benchmarks on Spring PetClinic (247 files):

  • Scanning: 0.3s
  • Parsing: 1.2s
  • Graph building: 0.8s
  • Git analysis: 0.9s
  • Report generation: 0.2s
  • Total: 3.4s ⚡

With caching, subsequent runs: < 1s


🎯 What's Next

Short-term (v0.2.0)

  • [ ] TypeScript/JavaScript support
  • [ ] Python support
  • [ ] IntelliJ IDEA plugin
  • [ ] VS Code extension

Medium-term (v1.0.0)

  • [ ] Multi-language support (Go, Rust, C#)
  • [ ] Code complexity metrics
  • [ ] Security vulnerability detection
  • [ ] Custom report templates

Long-term

  • [ ] Hosted SaaS version (no installation required)
  • [ ] GitHub App integration
  • [ ] Real-time collaboration features
  • [ ] IDE-native experience

🤝 Contribute!

CodeContext is open-source (MIT License). We welcome contributions!

Good first issues:

  • Add support for TypeScript
  • Improve error messages
  • Create video tutorials
  • Write integration tests

How to contribute:

  1. Fork the repo
  2. Create a feature branch
  3. Add tests
  4. Submit a PR

💭 Lessons Learned

1. Start with the Problem, Not the Solution

I spent 2 weeks validating the problem before writing code, talking to 20+ developers about their onboarding pain points.

2. Ship Fast, Iterate Faster

The first version took 4 weeks. I could've spent 6 months adding features, but shipping early got real user feedback.

3. Testing is Non-Negotiable

Property-based tests caught 3 critical bugs in graph cycle handling that I would never have found manually.

4. Documentation Sells

A great README with screenshots and examples gets more stars than perfect code without docs.

5. Open Source is a Marathon

Building the tool is 20% of the work. Marketing, docs, support, and community building are the other 80%.


🎉 Try It Today!

git clone https://github.com/sonii-shivansh/CodeContext.git
cd CodeContext
./gradlew installDist
./build/install/codecontext/bin/codecontext analyze .

⭐ Star the repo if you find it useful!

🐛 Report issues: GitHub Issues

💬 Join the discussion: GitHub Discussions


🙏 Acknowledgments

Built with:

  • Kotlin ❤️
  • JGraphT for graph algorithms
  • JavaParser for AST parsing
  • D3.js for visualizations
  • Claude AI for code insights

What do you think? Would this solve your codebase onboarding problems?

Drop a comment below! I'd love to hear your feedback and answer any questions. 🚀


If you enjoyed this post, follow me for more deep dives into developer tools and productivity hacks!
