Shivam Singh
Markarai Agentic AI Code Intelligence Platform: The AI That Understands Your Entire Codebase

🧠 Building an AI That Understands Your Entire Codebase (Technical Deep-Dive)

When you have a 100K+ line codebase, understanding it is hard.

Not just reading the code - but really understanding it. Knowing which functions
call which. Knowing what happens when you change something. Knowing the invisible
dependencies that will break your code.

We built Markar to solve this.

The Problem We Solved

Traditional code analysis tools are stateless. They look at a file in isolation.
They find syntax errors, style violations, basic security issues. But they don't
understand your code's structure, its relationships, or how it all fits together.

Result: You push code, tests pass locally, and then something breaks in production
because you didn't realize function A calls function B which calls function C across
4 different services.

Our Approach: Knowledge Graphs + AI Agents

Instead of static analysis, we built a living knowledge graph of your codebase.

Here's how it works:

  1. Code Parsing: We parse your entire repo using advanced AST analysis.
    Every function, class, method, import becomes a node.

  2. Dependency Mapping: We build edges between nodes. Function A calls Function B.
    Class X extends Class Y. Service A uses Library B. Everything gets connected.

  3. Real-time Updates: When code changes, the graph updates. No staleness.
    Always accurate.

  4. Agent Layer: On top of this graph, we run autonomous AI agents that reason
    about your code at scale.
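Steps 1 and 2 can be sketched in a few lines, assuming Python sources: walk the AST, make every function a node, and record a "calls" edge for each call site it contains. This is a stdlib-only, single-file toy; the real parser covers multiple languages and resolves calls across files.

```python
import ast

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each function name to the set of names it calls directly."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # A function node; its outgoing edges are the calls inside it.
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return graph

code = """
def validate(order): ...
def charge(order): ...
def pay(order):
    validate(order)
    charge(order)
"""
print(build_call_graph(code))
```

Feeding real modules through this and merging the results per file is essentially step 1; the edges it emits are the raw material for step 2.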

What Agents Can Do

Once you have a knowledge graph, you can build agents that actually understand context:

Impact Analysis Agent:

User asks: "What happens if I change the payment function?"

Agent:

  • Finds the payment function in the graph
  • Traverses all outgoing edges (what does it call?)
  • Traverses all incoming edges (what calls it?)
  • Builds a dependency tree 3-4 levels deep
  • Identifies all affected files, functions, and services
  • Estimates risk level based on criticality
  • Recommends tests to run
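The traversal at the heart of this agent is a depth-bounded breadth-first search. A minimal sketch, assuming the graph's edges are available as adjacency maps (a real deployment would issue Neo4j queries instead); the function names in the toy graph are made up for illustration:

```python
from collections import deque

def impact(graph: dict[str, set[str]], start: str, depth: int = 3) -> set[str]:
    """Everything reachable from `start` within `depth` hops (its blast radius)."""
    seen: set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if d == depth:  # stop expanding at the depth limit
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return seen

# Reverse ("what calls it?") edges for a toy payment service:
callers = {
    "charge_card": {"process_payment"},
    "process_payment": {"checkout", "retry_job"},
    "checkout": {"web_handler"},
}
print(impact(callers, "charge_card"))  # every function that could break
```

Running the same search over the forward edges answers "what does it call?"; running it over the reverse edges, as here, answers "what calls it?".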

Code Review Agent:

PR comes in with 20 files changed.

Agent:

  • Reads the changes in the context of the knowledge graph
  • Checks: "Is this function called 47 times? Should we be careful?"
  • Checks: "This change touches 3 services. Are they tested?"
  • Checks: "Does this violate a pattern we see elsewhere in the code?"
  • Checks: "Could this cause a race condition in this scenario?"
  • Provides specific, context-aware feedback
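The first of those checks is easy to sketch once the graph exists: given the functions a PR changes and each function's incoming-edge count, flag anything that is heavily called. The threshold and the caller counts below are illustrative values, not Markar's actual heuristics:

```python
def flag_hot_changes(changed: list[str], caller_count: dict[str, int],
                     threshold: int = 20) -> list[str]:
    """Warn about changed functions with many callers in the graph."""
    return [
        f"{fn} is called {caller_count[fn]} times - review with extra care"
        for fn in changed
        if caller_count.get(fn, 0) >= threshold
    ]

warnings = flag_hot_changes(
    ["format_date", "charge_card"],
    {"format_date": 47, "charge_card": 3},
)
print(warnings)  # flags format_date only
```

The caller counts come for free from the graph's incoming edges, which is exactly why a stateless, single-file tool cannot run this check.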

QA Agent:

Function gets added to codebase.

Agent:

  • Understands what the function does
  • Generates test cases based on:
    • Input/output types
    • Edge cases
    • Patterns from similar functions in the codebase
    • Known failure modes in your architecture
  • Runs the tests
  • Reports coverage
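The type-driven part of that generation can be sketched with the stdlib: read a function's type hints and emit one candidate case per combination of per-type edge values. This covers only the "input/output types + edge cases" portion; mining patterns and failure modes needs the graph and an LLM. The edge values below are hand-picked for illustration:

```python
import inspect

EDGE_CASES = {
    int: [0, -1, 2**31],
    str: ["", "a", "x" * 1000],
}

def candidate_inputs(fn) -> list[dict]:
    """One test-case dict per combination of per-parameter edge values."""
    cases: list[dict] = [{}]
    for name, param in inspect.signature(fn).parameters.items():
        values = EDGE_CASES.get(param.annotation, [None])
        # Cross the existing cases with every edge value for this parameter.
        cases = [{**c, name: v} for c in cases for v in values]
    return cases

def repeat(text: str, times: int) -> str:
    return text * times

cases = candidate_inputs(repeat)
print(len(cases))  # 3 str values x 3 int values = 9 candidate cases
```

Each dict can then be splatted into the function under test (`fn(**case)`) and the results checked against the inferred output type.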

Why This Matters

Most code analysis tools give you a list of issues. Markar gives you understanding.

The difference is like:

  • Tool: "Line 47 has a potential null pointer"
  • Markar: "Line 47 could crash because this function is called by 3 other critical services in production when X happens under load"

Context changes everything.

Real Numbers

We tested Markar on several open-source codebases:

  • Potpie repo (their own codebase): 1098 files, 196K lines → 9219 nodes,
    0 circular dependencies detected, 40 high-risk files identified

  • Average impact analysis: 50ms query time for dependency traversal
    (vs 5+ seconds for manual code review)

  • Test generation: 25 meaningful tests generated per new function
    (vs 5-10 manually written)

The Architecture

Code Repository
      ↓
AST Parser (Python, JS, Go, Rust)
      ↓
Knowledge Graph (Neo4j)
      ↓
Agent Layer (LLM + Graph Reasoning)
      ↓
Insights (Impact, Tests, Reviews, Security)

What's Next

We're currently working on:

  • Multi-language support expansion
  • Self-learning agents that improve over time
  • Architecture recommendation engine
  • Automated refactoring suggestions
  • Technical debt quantification

Open Questions We're Solving

  1. How do you make AI understand code context without hallucinating?
    → Answer: Graph-grounded reasoning. The AI only knows what's in the graph.

  2. How do you scale this to 1M+ line codebases?
    → Answer: Incremental updates, smart caching, distributed graph queries.

  3. How do you make this actually useful vs theoretical?
    → Answer: Focus on practical problems - impact analysis, test generation, reviews.
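The incremental-updates answer can be sketched concretely: hash each file's content and reparse only files whose hash changed, so a graph update costs O(changed files) rather than O(repo). The cache shape and function name here are illustrative, not Markar's internals:

```python
import hashlib

def files_to_reparse(files: dict[str, str], cache: dict[str, str]) -> list[str]:
    """Return paths whose content hash differs from the cached hash."""
    changed = []
    for path, content in files.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        if cache.get(path) != digest:
            changed.append(path)
            cache[path] = digest  # remember the new hash for the next run
    return changed

cache: dict[str, str] = {}
repo = {"pay.py": "def pay(): ...", "util.py": "def fmt(): ..."}
print(files_to_reparse(repo, cache))   # first run: everything is new
repo["pay.py"] = "def pay(x): ..."
print(files_to_reparse(repo, cache))   # second run: only pay.py changed
```

Only the nodes and edges owned by the returned files need to be rewritten in the graph, which is what keeps it "always accurate" without full re-indexing.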

If you're building developer tools, or just interested in AI + code understanding,
I'd love to hear your thoughts.

What's the hardest part about working with large codebases in your team?
