TyKolt

I spent months trying to stop LLM hallucinations. Prompt engineering wasn't enough. So I wrote a graph engine in Rust.

I started this project after reading about AIRIS, a cognitive agent from SingularityNET that learns by interacting with a Minecraft world. Not because I cared about Minecraft — but because of the principle: an AI that learns by doing, in a way you can actually observe and trace.

That got me thinking. If an agent can learn from a simulated physical environment, could you do something similar in text? Could you build a system that accumulates knowledge through direct interaction with users, step by step, and where every piece of that knowledge is inspectable?

I tried. And I failed. Several times.

The purity trap

My first attempt was absurdly ambitious. I wanted to build everything from scratch — zero external libraries, zero implicit behavior, zero randomness. Every component had to be fully deterministic and transparent. No shortcuts.

It sounds principled. In practice, it was a dead end. I couldn't use any library that had opaque internals or non-deterministic behavior, which meant rewriting basic infrastructure from nothing. The project got slow, fragile, and impossible to maintain. Conceptual purity was killing the actual product.

So I stepped back and reframed the problem: the issue wasn't external code, it was what kind of external code. I started allowing dependencies again, but only ones that are deterministic, carry no implicit intelligence, and behave predictably. That was the first real turning point.

The second problem: architecture

Even after loosening the dependency rules, the project kept growing in the wrong direction. Too many components, unclear responsibilities, a fragmented codebase that was getting harder to reason about with every commit.

At some point I realized I was building the wrong thing. I was trying to make Kremis generate answers. But the actual problem was never generation — LLMs are already good at that. The problem was verification.

That's when the architecture flipped. Kremis became a sidecar: it doesn't produce responses, it validates them. It sits next to an LLM and checks whether what the model says is actually grounded in real data. The separation is strict — probabilistic inference on one side, deterministic logic on the other.

That restructuring is what made everything click.

What Kremis actually does

Kremis is a graph store written in Rust. You feed it structured data — entity, attribute, value triples — and it builds a deterministic graph. When you query it, you get back exactly what's in the graph. Nothing invented, nothing inferred.

The core engine has no randomness, no floating-point arithmetic, no pre-loaded knowledge. Same input, same output, every time. That constraint is what makes everything else trustworthy.
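The determinism guarantee is easier to see in miniature. Here's a toy sketch of the idea in Python (illustrative only, not the Kremis API; the class and method names are made up for this example): the same inserts always produce the same state, and a lookup returns only what was stored.

```python
# Toy entity-attribute-value store. Hypothetical names, not Kremis internals.
class EavStore:
    def __init__(self):
        # entity_id -> {attribute: value}
        self.entities = {}

    def ingest(self, entity_id, attribute, value):
        # Deterministic: no randomness, no inference, just storage.
        self.entities.setdefault(entity_id, {})[attribute] = value

    def lookup(self, entity_id, attribute):
        # Returns exactly what was stored, or None. Absent means absent.
        return self.entities.get(entity_id, {}).get(attribute)

store = EavStore()
store.ingest(1, "name", "Alice")
store.ingest(1, "role", "engineer")

print(store.lookup(1, "role"))      # engineer
print(store.lookup(1, "employer"))  # None
```

The point of the constraint is the second lookup: a question the store has no data for gets an explicit "nothing here", never a guess.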

Quick example

Say Kremis is running locally. You ingest some facts:

curl -X POST http://localhost:8080/signals \
     -H "Content-Type: application/json" \
     -d '{
       "signals": [
         {"entity_id": 1, "attribute": "name", "value": "Alice"},
         {"entity_id": 1, "attribute": "role", "value": "engineer"},
         {"entity_id": 1, "attribute": "works_on", "value": "Kremis"},
         {"entity_id": 1, "attribute": "knows", "value": "Bob"},
         {"entity_id": 2, "attribute": "name", "value": "Bob"},
         {"entity_id": 2, "attribute": "role", "value": "designer"},
         {"entity_id": 2, "attribute": "works_on", "value": "Kremis"},
         {"entity_id": 3, "attribute": "name", "value": "Kremis"},
         {"entity_id": 3, "attribute": "type", "value": "project"}
       ]
     }'
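If you'd rather script the ingestion than shell out to curl, the same request can be built with Python's standard library. The endpoint and payload shape mirror the curl call above; nothing beyond that is assumed about the API.

```python
import json
import urllib.request

# Same payload shape as the curl example (shortened here).
payload = {"signals": [
    {"entity_id": 1, "attribute": "name", "value": "Alice"},
    {"entity_id": 1, "attribute": "role", "value": "engineer"},
]}

req = urllib.request.Request(
    "http://localhost:8080/signals",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With a Kremis server running locally:
# urllib.request.urlopen(req)
```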

Now an LLM generates six claims about Alice. Kremis checks each one:

[FACT]          Alice is an engineer.
[FACT]          Alice works on the Kremis project.
[FACT]          Alice knows Bob.
[NOT IN GRAPH]  Alice holds a PhD in machine learning from MIT.
[NOT IN GRAPH]  Alice previously worked at DeepMind as a research lead.
[NOT IN GRAPH]  Alice manages a cross-functional team of 8 people.

Three grounded. Three fabricated. No "87% confidence" — just a binary answer.

Validation works by looking up the entity node, fetching its properties, and comparing against the claims. The repo includes a demo script that runs this whole flow — Python, standard library only. Pass --ollama to use a local model instead of mock claims.
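In spirit, the check reduces to an exact lookup and comparison. A minimal illustration in plain Python (this is not the repo's demo script; the dict stands in for the graph, and the claim format is an assumption for the example):

```python
# A property map standing in for the entity node's attributes.
graph = {
    "Alice": {"role": "engineer", "works_on": "Kremis", "knows": "Bob"},
}

def validate(entity, attribute, value):
    """Binary verdict: the exact triple is in the graph, or it isn't."""
    props = graph.get(entity, {})
    return "FACT" if props.get(attribute) == value else "NOT IN GRAPH"

print(validate("Alice", "role", "engineer"))  # FACT
print(validate("Alice", "degree", "PhD"))     # NOT IN GRAPH
```

No scoring, no similarity threshold: a claim either matches stored data exactly or it doesn't.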

Why not just a SQL table?

I considered it. But I didn't want to write a new query for every possible claim an LLM might generate. A graph gives you relationship traversal without that overhead.

That matters when the question isn't "what's Alice's role?" but "does Alice know someone who works on project X?" or "what connects these two entities?" Those are graph questions.

The data model is EAV (Entity, Attribute, Value). Signals attach properties to entity nodes, and ordered ingestion creates edges from co-occurrence. You get a connected structure you can query for properties, traversals, paths, intersections, and related context.
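To make the traversal idea concrete, here's a toy version of the "does Alice know someone who works on project X?" question (illustrative Python; the property maps stand in for entity nodes, and the real edge handling in Kremis is more involved than this):

```python
# Entity nodes as property maps, keyed by entity_id.
properties = {
    1: {"name": "Alice", "works_on": "Kremis", "knows": "Bob"},
    2: {"name": "Bob", "works_on": "Kremis"},
}

def knows_someone_on(entity_id, project):
    """Follow the 'knows' property, then check the neighbor's 'works_on'."""
    known_name = properties.get(entity_id, {}).get("knows")
    for props in properties.values():
        if props.get("name") == known_name and props.get("works_on") == project:
            return True
    return False

print(knows_someone_on(1, "Kremis"))  # True: Alice knows Bob, who works on Kremis
print(knows_someone_on(2, "Kremis"))  # False: no 'knows' edge from Bob
```

In SQL this is a self-join per hop; in a graph it's just following edges, which is the overhead the article is talking about.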

MCP integration

Kremis ships with an MCP server. If you use Claude Desktop, Cursor, or anything that speaks Model Context Protocol, you can point it at a running Kremis instance and the assistant queries the graph directly.

{
  "mcpServers": {
    "kremis": {
      "command": "/path/to/kremis-mcp",
      "env": {
        "KREMIS_URL": "http://localhost:8080",
        "KREMIS_API_KEY": "your-key-here"
      }
    }
  }
}

No API auth? Omit KREMIS_API_KEY.

The assistant gets 9 tools — ingest, lookup, traverse, path, intersect, status, properties, retract, hash. Instead of hallucinating about your data, it can just look it up.

What about RAG and vector DBs?

I tried the usual stack before building this. System prompts, careful prompt engineering, vector databases. None of it solved the core issue: retrieval can be accurate and the model still invents details that aren't there.

Vector DBs answer "find me documents similar to this query." That's useful for retrieval. But Kremis answers a different question: "is this specific fact in my data, yes or no?" Those are two different problems, and I got tired of pretending they're the same one.

Confidence scores didn't help either. An "87% confidence" doesn't tell me whether Alice has a PhD or not. I wanted a binary answer, and that's what Kremis gives.

What this is not

Kremis doesn't "understand" anything. The name means "cognitive substrate", but the system is much simpler than that sounds. It stores structure from signals it has processed. No intelligence. No reasoning. Just a graph.

It's also alpha software — currently v0.17.4. The API works, but I'm still making breaking changes before v1.0. Pin your version.

Architecture

Three members in one Rust workspace:

  • kremis-core — pure library, no async, no network, no side effects. The graph engine. Every function is deterministic.
  • kremis — CLI and HTTP API (axum). The binary that runs the server.
  • kremis-mcp — MCP bridge over stdio.

Persistence is either in-memory or via redb for ACID transactions and crash safety. There's also a Docker image. Apache 2.0.

Try it

git clone https://github.com/TyKolt/kremis.git
cd kremis
cargo build --release
cargo run -p kremis -- init
cargo run -p kremis -- ingest -f examples/sample_signals.json -t json

Then in another terminal:

cargo run -p kremis -- server

Then run the demo:

python examples/demo_honesty.py

Or, if you want to use a local model through Ollama:

python examples/demo_honesty.py --ollama

The repo is at github.com/TyKolt/kremis. Full docs at kremis.mintlify.app.

RAG handles retrieval. Kremis handles verification. I spent months conflating the two before I realized they need separate tools.


Disclosure: An initial draft of this article was generated with AI assistance. The technical content, architecture decisions, project history, and opinions are entirely my own. All code examples are from the actual repository.
