Fidelity Chauke

Posted on • Originally published at ntivo.hashnode.dev

Why I'm Building a Codebase Knowledge Graph, and Why I Chose Kotlin

For the longest time, I mistranslated the European folk belief about what it means to see a frog cross your path. I recently learned that in European folk tradition, a frog represents change, transformation, or renewal. Seeing one cross your path is an omen that you are about to enter a season of change.

My mistranslation stemmed from the fact that growing up, I was exposed to an African proverb: "A toad does not run in the daytime for nothing." The interpretation is that toads, which are nocturnal, usually prefer to stay put during the day. If you see one on the move, it is either because its life is in danger and it is being pursued, or because it is in pursuit of something. In a way, this understanding of the frog in motion still leads us to the concept of change, albeit a change necessitated by external, somewhat existential considerations.

A samurai frog walking along a misty mountain path

I have seen many "frogs" on the move these past few months because of the way AI is shaping how we work. While it's not always clear whether AI, like a predator, is pursuing the frog, or if the frog is pursuing AI, there is no denying we are in a season of change. And in case the metaphor is not landing: all this is to say that I, like many others, am upskilling. I am learning how AI works by building something with it. Namely, a Codebase Knowledge Graph.

What am I building?

Simply put, a knowledge graph is a way of storing information that also stores how that information is connected. A traditional database stores facts in rows and columns. A knowledge graph stores facts as a web of relationships.

So what is a Codebase Knowledge Graph? It is a knowledge graph that maps how code is connected. Not just within a single repository, but across an entire tech stack.

Think about how a typical cross-functional team works. You have the backend, mobile client apps for iOS and Android, and a web frontend. The design specs live somewhere, the tickets live somewhere else, and the documentation lives in yet another place. Then there is monitoring, telling you how the code actually behaves in production. All of these are connected, but the connections exist only in people's heads.

My idea is to map those connections explicitly. Code linked to the tickets that requested it. Tickets linked to the design specs that shaped them. Code linked to the monitors that watch it in production. All of it queryable, all of it traversable.

Imagine asking, "What handles biometric authentication on iOS?" and getting back not just the function, but the Jira ticket that requested it, the Confluence page that documents the decision, and the Datadog monitor watching it in production. All from a single question. Or asking, "What changed in the payments flow this quarter?" and getting a thread that connects the pull requests to the tickets to the design specs, across backend and mobile, without opening six different tools.

 A before-and-after diagram. On the left, six grey tool siloes — GitHub, Jira, Confluence, Figma, Datadog, and Notion — connected only by faint dashed lines, with no explicit connections between them. On the right, a plain-English query returns BiometricViewModel.authenticate() at the centre, connected to FEAT-42 in Jira, the Auth RFC in Confluence, a Datadog monitor, and the Figma login screen.

Instead of searching each tool separately and stitching the picture together yourself, the graph already knows how the pieces connect. That helps when planning and architecting a feature, because you can see how everything is connected. It can also help predict unintended side effects, where a change in one place affects a seemingly unrelated system elsewhere. Imagine an all-knowing clairvoyant that engineers can consult before going into battle.
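
To make the idea concrete, here is a minimal in-memory sketch of such a graph in Kotlin. The node names (the function, the ticket, the monitor) are the illustrative ones from the example above, not real project entities, and in the real system the graph lives in Neo4j rather than a Kotlin map:

```kotlin
// A tiny in-memory knowledge graph: code, tickets, docs, and monitors are
// nodes; their connections are edges. Everything here is illustrative.
data class Node(val id: String, val type: String)

class KnowledgeGraph {
    private val nodes = mutableMapOf<String, Node>()
    private val edges = mutableMapOf<String, MutableSet<String>>()

    fun add(node: Node) { nodes[node.id] = node }

    // Connections are undirected here, purely for simplicity.
    fun connect(a: String, b: String) {
        edges.getOrPut(a) { mutableSetOf() }.add(b)
        edges.getOrPut(b) { mutableSetOf() }.add(a)
    }

    // "Everything connected to this node" is one traversal, not a join.
    fun connectedTo(id: String): List<Node> =
        edges[id].orEmpty().mapNotNull { nodes[it] }
}

fun main() {
    val graph = KnowledgeGraph().apply {
        add(Node("BiometricViewModel.authenticate()", "function"))
        add(Node("FEAT-42", "jira-ticket"))
        add(Node("Auth RFC", "confluence-page"))
        add(Node("auth-latency-monitor", "datadog-monitor"))
        connect("BiometricViewModel.authenticate()", "FEAT-42")
        connect("BiometricViewModel.authenticate()", "Auth RFC")
        connect("BiometricViewModel.authenticate()", "auth-latency-monitor")
    }

    // One question, every connected artifact back.
    graph.connectedTo("BiometricViewModel.authenticate()")
        .forEach { println("${it.type}: ${it.id}") }
}
```

A real graph database adds typed, directed relationships and indexing on top, but the core payoff is the same: relationships are first-class data, so traversal is the primitive operation.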

This is admittedly an ambitious project, and the clairvoyant part is a bit tongue-in-cheek, but what better way to learn? It will also help me answer the question: is anything actually ambitious in the age of AI?

The Tech Stack

Ntivo architecture diagram: six layers flowing top to bottom. Sources (GitHub, Jira, Confluence, Figma, Datadog) feed into an ingestion layer (Tree-sitter, Gemini embedder), stored in Neo4j and Qdrant, queried by a Koog AI agent, served via Ktor API, consumed by web UI, MCP clients, and A2A agents.

Future articles will be more technical, so there will be time to dig deeper. For now, I want to introduce the tools I chose and the reasoning behind them.

Why Kotlin?

I have been writing Kotlin for over seven years. It is the language I think in. When I decided to learn AI by building something, the first question was whether I should learn Python, since that is where most of the AI ecosystem lives.

I decided not to. Not because Python is bad, but because I wanted to learn one thing at a time. If I tried to learn AI concepts and a new language simultaneously, I would not know which one to blame when things broke. Kotlin lets me focus entirely on the AI concepts because the language is already second nature.

There is also a broader argument here. In the age of coding agents, learning a new language is no longer as valuable as it used to be. The speed at which you can prototype and deliver working software with AI assistance means that time spent learning syntax and idioms is time not spent building. I have nothing against vibe coding side projects in unfamiliar languages. But for this project, vibe coding in a language I already know means I can juggle fewer unfamiliar concepts and concentrate on what I actually came to learn.

But this was not just a comfort choice. Kotlin has real advantages for building AI systems:

  • Type safety. When you are wiring together agents, tools, embeddings, and database queries, the compiler catches mistakes that Python would only surface at runtime. In AI systems where debugging can already be unpredictable, this matters.

  • Coroutines. AI workloads involve a lot of waiting. Waiting for an LLM to respond, waiting for an embedding to come back, waiting for a database query. Kotlin's coroutines handle concurrency naturally without the callback complexity you find in other languages.

  • JVM ecosystem. Neo4j, Qdrant, and most enterprise tools have mature Java clients. Kotlin uses them directly with no wrappers or workarounds needed.

  • One language, full stack. With Kotlin Multiplatform, I share data models between my server and my web frontend. The API contract is a single Kotlin file. Change it once, and both sides update.
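
As a small illustration of the coroutines point, here is a sketch of how three independent waits, an LLM call, an embedding, and a graph query, can overlap. The suspend functions are stand-ins that just delay; none of this is a real API call:

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Stand-ins for real calls: each one just suspends for a while.
suspend fun askLlm(prompt: String): String {
    delay(300)
    return "answer to: $prompt"
}

suspend fun embedText(text: String): List<Double> {
    delay(200)
    return listOf(0.1, 0.2)
}

suspend fun queryGraph(question: String): List<String> {
    delay(100)
    return listOf("FEAT-42")
}

// The three waits overlap instead of running back to back, so the total
// time is roughly the slowest call, not the sum of all three.
suspend fun gatherContext(question: String): Triple<String, List<Double>, List<String>> =
    coroutineScope {
        val answer = async { askLlm(question) }
        val vector = async { embedText(question) }
        val related = async { queryGraph(question) }
        Triple(answer.await(), vector.await(), related.await())
    }

fun main() = runBlocking {
    val (answer, _, related) = gatherContext("What handles biometric auth?")
    println(answer)
    println(related)
}
```

No callbacks, no futures chained by hand: the code reads top to bottom, and structured concurrency guarantees that if one call fails, the others are cancelled.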

There is also a timing argument. The AI space in Kotlin is early, but it is moving fast. JetBrains recently announced Koog for Java at JavaOne, signalling that they see this as an enterprise-grade framework. Being early in a growing ecosystem has advantages that being late to a crowded one does not.

What is Koog?

If you have spent any time in the AI space, you have probably heard of LangChain or LangGraph. These are Python frameworks that help you build AI agents, systems where an LLM does not just answer a question but can reason through steps, use tools, and make decisions.

Koog is JetBrains' answer to that, built natively in Kotlin. It is an open source framework (Apache 2.0) for building AI agents that can:

  • Talk to LLMs. Koog connects to Gemini, OpenAI, Anthropic, and others through a unified interface. You choose your model, and Koog handles the communication.

  • Use tools. You can give an agent access to functions, like "search the database" or "parse this code," and the LLM decides when and how to use them. In Koog, you just annotate a Kotlin function and register it. The framework handles the rest.

  • Follow strategies. This is where Koog gets interesting. Instead of a simple prompt-response loop, you can define agent graphs, step-by-step workflows where the agent moves through states, makes decisions, and branches based on results. If you have heard of LangGraph, this is the same concept.

  • Handle RAG. Retrieval-Augmented Generation is how you give an LLM access to your own data. Instead of relying on what the model was trained on, you retrieve relevant context from your knowledge graph and feed it into the prompt. Koog has built-in modules for this.
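
To show the shape of a RAG retrieval step without any real models, here is a toy Kotlin sketch. It fakes embeddings with bag-of-words vectors, ranks chunks by cosine similarity, and assembles a prompt. In the real system the embeddings come from Gemini and the vectors live in Qdrant; everything here is illustrative:

```kotlin
import kotlin.math.sqrt

// A chunk of retrievable context: a piece of code or documentation.
data class Chunk(val source: String, val text: String)

fun cosine(a: DoubleArray, b: DoubleArray): Double {
    val dot = a.indices.sumOf { a[it] * b[it] }
    val na = sqrt(a.sumOf { it * it })
    val nb = sqrt(b.sumOf { it * it })
    return if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
}

// Fake "embedding": a bag-of-words vector over a fixed vocabulary.
// A real embedder maps text to a dense learned vector instead.
fun embed(text: String, vocab: List<String>): DoubleArray {
    val words = text.lowercase().split(Regex("\\W+")).toSet()
    return DoubleArray(vocab.size) { if (vocab[it] in words) 1.0 else 0.0 }
}

// Retrieve the k most similar chunks and feed them into the prompt.
fun buildPrompt(question: String, chunks: List<Chunk>, vocab: List<String>, k: Int = 2): String {
    val q = embed(question, vocab)
    val top = chunks.sortedByDescending { cosine(embed(it.text, vocab), q) }.take(k)
    val context = top.joinToString("\n") { "[${it.source}] ${it.text}" }
    return "Answer using only this context:\n$context\n\nQuestion: $question"
}
```

The pipeline shape, embed the question, rank stored vectors by similarity, and splice the winners into the prompt, is the same whether the similarity search runs over two toy chunks or a Qdrant collection with millions of vectors.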

Why Koog over the alternatives?

I considered the options:

  • LangChain / LangGraph (Python) — The most popular choice, but Python-only. I would have needed to either learn Python or run a separate Python microservice alongside my Kotlin backend. That is added complexity I did not want.

  • LangChain4j (Java) — A Java port of LangChain concepts. It works, but it feels like Python translated to Java. The API is verbose and does not take advantage of Kotlin's language features like coroutines, extension functions, or DSL builders.

  • Spring AI (Java/Kotlin) — A solid option if you are already in the Spring ecosystem. I am not. Adding Spring would have meant adopting a large framework to get a small set of AI features.

  • Rolling my own — Calling the Gemini API directly is straightforward. But once you need tool calling, agent state management, and RAG pipelines, you are essentially building your own framework. Koog gives me that for free.

Koog won because it is Kotlin-native, it covers agents and RAG in a single framework, and it is backed by JetBrains. It is young, and I have already hit gaps (for example, the embedding module does not pass all parameters that Google's API supports, so I had to write a small workaround). But the foundation is solid, the community is active, and it is the only framework that lets me stay in Kotlin from top to bottom.

Completing the Architecture

A few more pieces worth mentioning:

  • Ktor is the web server. Lightweight, coroutine-native, and built by JetBrains. It stays out of the way, which is what I need when the application is still taking shape.

  • Gemini is the LLM I am using, specifically Google's Gemini 2.5 Flash model. It has a generous free tier, which matters when you are experimenting and sending hundreds of requests a day while learning. Its embedding API lets me convert text into vectors for semantic search. I chose it over OpenAI primarily for cost, but Koog makes it easy to swap models later if I need to.

  • Neo4j is the graph database where the knowledge graph lives. It stores entities and relationships natively as a graph, which means queries like "show me everything connected to this function" are fast and natural, not awkward joins across five tables.

  • Qdrant is the vector database. When I embed a piece of code or a document into a vector, Qdrant stores it and lets me search by meaning. Neo4j knows the explicit connections. Qdrant discovers the implicit ones.

  • Tree-sitter is a parser that reads source code and breaks it down into a structured tree. This is how I extract individual functions and classes from a codebase, the building blocks that become nodes in the knowledge graph.

  • Compose for Web is my frontend. It is Kotlin Multiplatform targeting the browser using WebAssembly. I come from mobile, so writing UI in Compose feels natural. It is early and has rough edges, but for a developer console where I can visually interact with my knowledge graph, it is more than enough.

What is next

Ntivo is open source from day one. The repository is on GitHub. It is early, it is messy in places, and it is very much a learning project. But it builds, it runs, and I am learning a great deal and having fun while I'm at it.
