sh1zen

Posted on Jun 24

REQL: a relational entities query language context engine for coding agents

#agents #ai #llm #tooling

A few weeks ago, I pushed REQL to GitHub after working on it for quite some time.

I started building it around a problem I kept encountering with coding agents: before changing code, an agent needs to understand the repository, but most repositories are much larger than the context available to the model.

The usual alternatives are not ideal:

search for a few keywords and hope the relevant files are nearby;
recursively scan large parts of the repository;
place entire files into the prompt;
maintain a separate index that quickly becomes stale;
ask an LLM to repeatedly rediscover the same project structure.

REQL is my attempt to provide a more structured layer between a repository and the tools operating on it.

Before going into the implementation, I want to clarify an important point:

REQL is not another graph database, graph framework, or graph visualization product.

It uses a property graph internally because source code is naturally relational, but the graph itself is not the final product.

The actual purpose of REQL is to provide an embeddable repository context pipeline: one coherent path from project scanning and compilation to incremental updates, structured queries, graph traversal, and compact context generation for coding agents.

I hope it can be useful to other people working on coding assistants, repository analysis, automated refactoring, or developer tooling.

The project is available here:

GitHub: https://github.com/sh1zen/reql

The problem is not simply finding text

Traditional search is very good at answering questions such as:

Where does the string PaymentService appear?

But that is not always the question a coding agent actually needs to answer.

Before making a change, the agent may need to know:

where PaymentService is defined;
which module owns it;
which methods call it;
which interfaces it implements;
which tests exercise it;
which configuration values affect it;
whether it is mentioned in documentation;
whether another component serializes its output;
which files are likely to be affected by a modification.

Those relationships are often spread across imports, calls, inheritance, declarations, documentation, tests, configuration, and package structure.

A lexical search can provide entry points, but it does not provide the connected working set needed for a safe change.

At the other extreme, giving an agent a complete repository dump is expensive and noisy. The relevant information becomes harder to identify, context is wasted, and unrelated implementation details compete for attention.

REQL tries to occupy the space between those two approaches:

Start with deterministic matches, follow only relevant repository relationships, and return a bounded set of source-grounded records.

What REQL is

REQL is a local, storage-agnostic repository context engine for coding agents and developer tools.

It scans a project, classifies its artifacts, parses supported code and documentation, and compiles them into a property graph.

That graph can contain records representing:

projects, directories, files, and source artifacts;
modules, packages, classes, interfaces, functions, and methods;
imports and dependencies;
calls and references;
tests, endpoints, schemas, and configuration;
comments, docstrings, and document fragments;
static-analysis findings;
compilation runs and incremental deltas.

Relationships such as CONTAINS, DEFINES, IMPORTS, CALLS, REFERENCES, INHERITS, READS, WRITES, and RETURNS connect those records while preserving source provenance.

The graph is then used by the retrieval layer to identify a small, connected subset of the repository for a specific task.
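To make the idea of records and relationships with provenance more concrete, here is a minimal sketch in plain Python. It is not REQL's internal data model: the class names (Provenance, Node, Edge, TinyGraph), their fields, and the id format are assumptions made for illustration. Only the relationship names come from the list above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    """Where a record was found in the source tree (hypothetical shape)."""
    path: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str      # e.g. "Class", "Function", "File"
    name: str
    source: Provenance


@dataclass(frozen=True)
class Edge:
    from_id: str
    to_id: str
    type: str      # e.g. "CALLS", "DEFINES", "IMPORTS"
    source: Provenance


@dataclass
class TinyGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Refuse edges that point at records the graph does not know about.
        if edge.from_id not in self.nodes or edge.to_id not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(edge)

    def outgoing(self, node_id: str, edge_type: str) -> list[Node]:
        """Return the targets of edges of one type leaving node_id."""
        return [self.nodes[e.to_id] for e in self.edges
                if e.from_id == node_id and e.type == edge_type]


graph = TinyGraph()
graph.add_node(Node("class:PaymentController", "Class", "PaymentController",
                    Provenance("payments/controller.py", 10, 48)))
graph.add_node(Node("method:PaymentService.create_payment", "Method",
                    "PaymentService.create_payment",
                    Provenance("payments/service.py", 22, 61)))
graph.add_edge(Edge("class:PaymentController",
                    "method:PaymentService.create_payment", "CALLS",
                    Provenance("payments/controller.py", 31, 31)))

for callee in graph.outgoing("class:PaymentController", "CALLS"):
    print(callee.name, callee.source.path, callee.source.start_line)
```

The point of the sketch is that every node and every edge carries its own file and line range, so a retrieval result can always be traced back to the source.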

What REQL is not

REQL is not intended to compete with Neo4j or other general-purpose graph databases.

It does not require a separate graph server, and it is not primarily a tool for browsing or visualizing a graph.

The bundled backend persists the graph locally. Internally, storage is accessed through a GraphStore interface, so the local block store is the default implementation rather than an architectural constraint.

The distinction matters because I do not expect developers to redesign their systems around REQL.

The intended integration is closer to this:

repository
    |
    v
REQL compilation and retrieval layer
    |
    +----> coding agent
    |
    +----> developer tool
    |
    +----> automated analysis
    |
    +----> repository-aware workflow

The graph is an internal representation that supports the pipeline. The externally useful result is the focused context, structured query response, dependency slice, source location, or incremental project state returned to the caller.

The primary public API is exposed through MemoryGraph, while the storage backend can be replaced by implementing the GraphStore port.

One pipeline from repository compilation to context generation

The basic architecture can be summarized as follows:

Project directory
    |
    v
Recursive scan and file classification
    |
    v
Fingerprint and dirty-artifact detection
    |
    +----------------------+
    |                      |
    v                      v
Tree-sitter code       Document parsing
analysis               and segmentation
    |                      |
    +----------+-----------+
               |
               v
    Local property graph
    with source provenance
               |
               v
       Query tokenization
               |
               v
    Lexical seed discovery
               |
               v
     Bounded graph expansion
               |
               v
      Graph-aware ranking
               |
               v
Compact context, dependency slice,
source records, or query result

There are two sides to this pipeline.

The first is repository compilation. REQL converts code and supported documents into stable graph records.

The second is retrieval. REQL starts from deterministic lexical seed nodes, expands through a bounded portion of the graph, ranks the candidates, and returns only a limited working set.

The retrieval stage is deliberately bounded. It is not supposed to serialize the complete graph into an agent prompt.
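As an illustration of ranking candidates and then cutting them down to a limited working set, the following sketch scores each candidate by its lexical match and discounts it by graph distance from the nearest seed. The scoring formula, the decay value, and the Candidate fields are assumptions chosen for the example, not REQL's actual ranking function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_id: str
    lexical_score: float  # token overlap with the query; 0.0 if reached only by expansion
    depth: int            # hops from the nearest seed; 0 for a seed


def rank(candidates, limit, decay=0.5):
    """Order candidates by score, break ties by id, keep at most `limit`."""
    def score(c):
        # Seeds keep full weight; every extra hop multiplies the score by `decay`.
        return (1.0 + c.lexical_score) * decay ** c.depth

    # Sorting on (-score, node_id) makes the order fully deterministic.
    return sorted(candidates, key=lambda c: (-score(c), c.node_id))[:limit]


candidates = [
    Candidate("Logger", 0.0, 2),
    Candidate("PaymentSerializer", 0.0, 1),
    Candidate("PaymentResult", 0.0, 1),
    Candidate("PaymentController", 1.0, 1),
    Candidate("PaymentService.create_payment", 2.0, 0),
]

working_set = rank(candidates, limit=3)
print([c.node_id for c in working_set])
```

Breaking ties on the record id is what keeps the output identical across runs when two candidates score the same.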

Code analysis with Tree-sitter

REQL uses Tree-sitter-based parsing for its supported programming languages.

It recognizes more than 30 language families and source-file types, including Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Kotlin, Ruby, PHP, Swift, Lua, Scala, SQL, Terraform, Bash, PowerShell, Elixir, Julia, Zig, and others.

Python, JavaScript, and TypeScript currently receive the deepest extraction.

Depending on the language and adapter, the compiler can identify:

modules and packages;
classes and interfaces;
functions and methods;
useful variable declarations;
imports and dependencies;
call targets;
inheritance;
decorators and type information;
comments and docstrings;
tests, endpoints, schemas, and configuration records.

The compiler generates stable source fragments and technical relationships with file and line provenance. Recompiling an unchanged artifact should not create duplicate symbol records.
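One common way to get that property is to derive each record id from its identity rather than from insertion order. The sketch below is an assumption about how such ids could be built, not a description of REQL's id scheme: the function names, the choice of SHA-256, and the fields that go into the key are all hypothetical.

```python
import hashlib


def stable_record_id(path: str, kind: str, qualified_name: str) -> str:
    """Derive an id from what the record is, not from when it was seen."""
    # Normalise separators so the same file yields the same id on every platform.
    normalized = path.replace("\\", "/")
    key = "\x1f".join((normalized, kind, qualified_name))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def upsert(records: dict, path: str, kind: str, qualified_name: str,
           start_line: int) -> str:
    """Insert or overwrite a record under its stable id."""
    record_id = stable_record_id(path, kind, qualified_name)
    records[record_id] = {
        "path": path,
        "kind": kind,
        "name": qualified_name,
        "start_line": start_line,  # provenance may change without changing the id
    }
    return record_id


records = {}
first = upsert(records, "payments/service.py", "Function",
               "PaymentService.create_payment", 22)
# Recompiling the same artifact, even after the function moved down a few lines,
# overwrites the existing record instead of adding a second one.
second = upsert(records, "payments/service.py", "Function",
                "PaymentService.create_payment", 27)
print(first == second, len(records))
```

Leaving the line number out of the key is a design choice in this sketch: the record keeps its identity when code moves within a file, while the stored provenance is refreshed.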

This is static analysis rather than runtime tracing, so unresolved or dynamically generated relationships remain an expected limitation. The intention is not to claim perfect program comprehension, but to construct a useful and reproducible repository model from locally available evidence.

Why use a graph internally?

Consider a request such as:

Change the serialization format returned by the payment service.

A keyword search may find the service implementation, but the relevant change surface could also include:

an interface implemented by the service;
a controller that calls it;
a response schema;
a serializer;
tests asserting the old format;
documentation showing an example response;
an import or re-export exposing the type publicly.

Those records may not all contain the same words as the original request.

A graph representation makes it possible to begin with a lexical match and then follow explicit repository relationships:

PaymentController
    |
    | CALLS
    v
PaymentService.create_payment()
    |
    | RETURNS
    v
PaymentResult
    |
    | REFERENCES
    v
PaymentResponseSchema
    |
    | USED_BY
    v
PaymentSerializer

The graph is useful because it preserves connectivity.

However, expanding every reachable relationship would recreate the context-size problem. REQL therefore limits traversal depth, result count, context items, and other output dimensions.

The goal is not “retrieve the graph.”

The goal is:

Retrieve the smallest useful connected subgraph for the current task.
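A bounded expansion of this kind can be sketched as a breadth-first traversal with two hard limits. This is a simplified illustration, not REQL's traversal code: the function name, the adjacency format, and the two parameters are assumptions, and the sketch follows only outgoing edges. The example graph is the payment chain shown above.

```python
from collections import deque


def bounded_expand(adjacency, seeds, max_depth, max_nodes):
    """Breadth-first expansion that stops at max_depth hops or max_nodes records.

    Returns a dict mapping each visited node to its distance from the nearest seed.
    """
    depth_of = {}
    queue = deque()
    for seed in sorted(seeds):  # sorted so the visit order is deterministic
        if seed not in depth_of and len(depth_of) < max_nodes:
            depth_of[seed] = 0
            queue.append(seed)

    while queue:
        current = queue.popleft()
        depth = depth_of[current]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for _edge_type, neighbor in sorted(adjacency.get(current, [])):
            if neighbor in depth_of:
                continue
            if len(depth_of) >= max_nodes:
                return depth_of  # node budget exhausted
            depth_of[neighbor] = depth + 1
            queue.append(neighbor)
    return depth_of


adjacency = {
    "PaymentController": [("CALLS", "PaymentService.create_payment")],
    "PaymentService.create_payment": [("RETURNS", "PaymentResult")],
    "PaymentResult": [("REFERENCES", "PaymentResponseSchema")],
    "PaymentResponseSchema": [("USED_BY", "PaymentSerializer")],
}

reached = bounded_expand(adjacency, {"PaymentController"}, max_depth=2, max_nodes=10)
print(reached)
```

With a depth limit of 2 the traversal stops at PaymentResult and never reaches the schema or the serializer, which is exactly the trade-off the limits introduce: a smaller result in exchange for possibly missing distant records.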

Deterministic retrieval before model reasoning

REQL does not require an LLM call in its core indexing or retrieval path.

The default retrieval flow uses:

query tokenization;
deterministic lexical seed discovery;
bounded graph expansion;
graph-aware ranking;
filtering of generic or weakly related records;
compact context composition.

This has several practical properties.

The same project state and query should produce the same results. Indexing can run locally without sending source code to an external model. Retrieval can also be inspected and debugged through the nodes, edges, sources, and ranking metadata returned by the engine.
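The first two steps, tokenization and lexical seed discovery, can be sketched without any model call. The stopword list, the camel-case splitting rule, and the overlap score below are assumptions for illustration rather than REQL's actual tokenizer.

```python
import re

# Hypothetical stopword list; a real one would be longer.
STOPWORDS = frozenset({"a", "an", "and", "by", "how", "is", "of", "the", "to"})


def tokenize(text: str) -> list[str]:
    """Lowercase tokens, splitting camelCase and snake_case identifiers."""
    # Insert a space at lower-to-upper boundaries: "PaymentService" -> "Payment Service".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    tokens = re.findall(r"[a-z0-9]+", spaced.lower())
    return [t for t in tokens if t not in STOPWORDS]


def find_seeds(query: str, symbols: list[str], limit: int) -> list[tuple[str, int]]:
    """Return up to `limit` symbols sharing tokens with the query, best first."""
    query_tokens = set(tokenize(query))
    scored = []
    for name in symbols:
        overlap = len(query_tokens & set(tokenize(name)))
        if overlap:
            scored.append((name, overlap))
    # Sort by overlap, then by name, so equal scores always come out in the same order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


symbols = ["PaymentService", "authorize_payment", "PaymentAuthorizationPolicy", "Logger"]
seeds = find_seeds("how is payment authorization handled?", symbols, limit=3)
print(seeds)
```

The sketch also shows the limitation of purely lexical matching: authorize_payment matches only on "payment", because "authorize" and "authorization" are different tokens without stemming.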

LLMs can still be connected at the edges of the system. REQL is intended to provide structured evidence to a model, not to replace the model itself.

An agent can use REQL to determine what to inspect and then use normal file-reading, editing, testing, and debugging tools for the exact implementation work.

Installation

REQL currently requires Python 3.10 or newer.

Clone the repository and install it in editable mode:

git clone https://github.com/sh1zen/reql.git
cd reql

python -m pip install -e .

The installed CLI command is reql.

From the repository you want to analyze:

cd /path/to/your/project

reql project compile .

By default, the project graph is stored locally under:

.reql/memory.reql

The filesystem monitor uses the optional watchdog dependency.

Asking for compact repository context

Once the project has been compiled, you can request a compact context block:

reql query_context \
--query "how is payment authorization handled?"

query_context is intended to produce agent-ready output rather than a raw graph dump.

Its structured output can include:

likely owner symbols;
a focused working set;
relevant source snippets;
file and line references;
targeted file reads;
impact information;
related tests;
cleanup candidates when using cleanup mode.

JSON output is also available:

reql query_context \
--query "how is payment authorization handled?" \
--json

You can restrict the result to a specific scope:

reql query_context \
--query "payment authorization" \
--code

reql query_context \
--query "payment authorization" \
--docs

reql query_context \
--query "payment authorization" \
--test

For dead-code or unused-symbol investigations:

reql query_context \
--query "legacy payment adapter" \
--cleanup

The cleanup output is based on static-analysis candidates, so it should be treated as evidence for review rather than as an instruction to delete code automatically.
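For tools that call the CLI from Python, a thin wrapper along these lines may be enough. It uses only the query_context flags shown in this section; whether a scope flag, --cleanup, and --json can be combined in one call is an assumption, and so is the helper itself. The structure of the JSON output is not described here, so the wrapper returns the parsed value untouched.

```python
import json
import subprocess

# Scope flags as they appear in the commands above.
SCOPE_FLAGS = {"code": "--code", "docs": "--docs", "test": "--test"}


def build_query_context_argv(query, scope=None, cleanup=False, as_json=True):
    """Build the argument list for `reql query_context` (hypothetical helper)."""
    argv = ["reql", "query_context", "--query", query]
    if scope is not None:
        argv.append(SCOPE_FLAGS[scope])  # raises KeyError for an unknown scope
    if cleanup:
        argv.append("--cleanup")
    if as_json:
        argv.append("--json")
    return argv


def query_context(query, cwd=".", **options):
    """Run the CLI in `cwd` and return the parsed JSON output."""
    argv = build_query_context_argv(query, as_json=True, **options)
    completed = subprocess.run(
        argv, cwd=cwd, capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


print(build_query_context_argv("payment authorization", scope="docs"))
```

Passing the arguments as a list avoids shell quoting problems when the query contains quotes or question marks.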

Exploring dependencies around a concept

query_explore returns more specific dependency-oriented views:

reql query_explore \
--query "payment service serialization" \
--view owners \
--view callers \
--view serialization_paths \
--json

Available views include:

owners;
callers;
public_surface;
serialization_paths;
docs_mentions;
code.

This can be useful before a refactor because the caller may request only the relationship categories relevant to the task.

For example, an agent preparing to change a public type might begin with:

reql query_explore \
--query "PaymentResult" \
--view owners \
--view callers \
--view public_surface \
--view code \
--json

It can then perform targeted reads of the returned files rather than beginning with a repository-wide search.

Inspecting the retrieved graph

When more structural detail is needed, query_graph exposes the relevant nodes, edges, sources, and expansion metadata:

reql query_graph \
--query "payment retry behavior" \
--max-depth 2

For a smaller ranked list of relevant source text:

reql query_memories \
--query "payment retry behavior" \
--limit 8 \
--json

Despite the name, query_memories does not mean that REQL is extracting conversational memories from an LLM.

In project compilation mode, these records come from repository sources and supported documents. The command provides a unified compact retrieval path for clients that do not require detailed graph diagnostics.

Querying the repository with REQL

The project also includes its own query language.

It can be used to inspect graph records directly from the CLI:

reql query \
'SYMBOLS TYPE Function WHERE name CONTAINS "compile" LIMIT 20'

Find static-analysis candidates:

reql query \
'FINDINGS WHERE finding_type = "unused_variable" LIMIT 20'

Inspect call and reference relationships:

reql query \
'FIND edges WHERE type IN ["CALLS","REFERENCES"] RETURN from_id,to_id,type LIMIT 50'

Inspect recent incremental updates:

reql query 'DELTAS LIMIT 10'

Check compilation cache state:

reql query 'CACHE STATUS'

The query language supports filtering, ordering, limits, boolean composition, comparisons, membership, regular expressions, text operators, ranges, and null checks.

The intention is to make the compiled repository inspectable without requiring users to write code against the storage implementation.

Incremental compilation

A repository index is only useful while it reflects the current repository.

For this reason, REQL does not treat project compilation as a one-time import.

During compilation it calculates artifact fingerprints and stores cache metadata under:

.reql/artifact-cache.json

On subsequent runs, it compares the current project state with that cache and determines which artifacts are:

unchanged;
modified;
new;
deleted.

Only modified, new, and deleted artifacts need corresponding graph deltas; unchanged artifacts are skipped.
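The comparison itself is simple to sketch. The example below assumes content hashes as fingerprints and a flat path-to-fingerprint mapping as the cache; both are assumptions for illustration and say nothing about the actual layout of .reql/artifact-cache.json.

```python
import hashlib


def fingerprint_bytes(data: bytes) -> str:
    """Content fingerprint; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).hexdigest()


def classify(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two path -> fingerprint maps and bucket every artifact."""
    result = {"unchanged": [], "modified": [], "new": [], "deleted": []}
    for path in sorted(current):
        if path not in previous:
            result["new"].append(path)
        elif previous[path] == current[path]:
            result["unchanged"].append(path)
        else:
            result["modified"].append(path)
    # Anything cached but no longer present on disk has been deleted.
    result["deleted"] = sorted(set(previous) - set(current))
    return result


previous = {
    "a.py": fingerprint_bytes(b"x = 1\n"),
    "b.py": fingerprint_bytes(b"y = 2\n"),
    "old.py": fingerprint_bytes(b"pass\n"),
}
current = {
    "a.py": fingerprint_bytes(b"x = 1\n"),
    "b.py": fingerprint_bytes(b"y = 3\n"),
    "c.py": fingerprint_bytes(b"z = 0\n"),
}

state = classify(previous, current)
dirty = state["modified"] + state["new"] + state["deleted"]
print(state)
print("artifacts needing deltas:", dirty)
```

In this sketch only b.py, c.py, and old.py would produce graph deltas, while a.py is left alone.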

A manual refresh can be performed with:

reql project update .

The project state can be inspected with:

reql project status .

The compilation process records CompilationRun and GraphDelta entities, making the update history available to queries and reports.

If the disk cache is missing but compiled SourceArtifact records remain available, REQL can recover cache information from the graph rather than necessarily rebuilding every artifact from zero.

Watch mode

During active agent work, REQL can monitor the repository and incrementally process changed files:

reql project compile . --watch

Watch mode uses filesystem events and delegates updates to the same incremental compiler used by manual compilation.

That is important because watch mode is not a separate indexing implementation with different behavior. It is another trigger for the same compilation path.

A possible workflow becomes:

Compile the project
Retrieve context for the task
Read the targeted sources
Modify the code
Incrementally refresh changed artifacts
Query the updated repository state
Run tests and continue

This allows the repository context layer to follow the source tree while an agent or developer is working.

Integrating REQL with coding agents

REQL can install repository-context instructions for several coding-agent environments.

For example:

reql install --project

You can also select specific integrations:

reql install codex --project
reql install claude --project

The installer currently supports profiles for tools such as Claude Code, Codex, OpenCode, Kilo Code, Cursor, Gemini CLI, GitHub Copilot CLI, VS Code Copilot Chat, and generic AGENTS.md-compatible clients.

The generated instructions encourage agents to ask REQL for repository context before performing broad recursive discovery. Targeted file reads are still expected for exact edits, tests, and debugging.

MCP support

REQL also provides an optional MCP server.

After installation, it can be started with:

reql-mcp

For a read-only tool surface:

reql-mcp --read-only

A minimal Codex configuration looks like this:

[mcp_servers.reql]
command = "reql-mcp"
args = ["--read-only"]

A minimal Claude Desktop configuration looks like this:

{
  "mcpServers": {
    "reql": {
      "command": "reql-mcp",
      "args": ["--read-only"]
    }
  }
}

The MCP server exposes bounded operations for:

compact context retrieval;
graph retrieval;
dependency exploration;
ranked source-memory retrieval;
direct REQL queries;
project status;
project compilation and watch updates;
graph hub analysis.

Read-only tools are intentionally limited and do not return the complete graph.

The MCP layer is optional. The core compiler, storage, query language, retrieval system, and Python API do not depend on it.

Using the Python API

The main Python entry point is MemoryGraph.

A minimal example:

from reql import MemoryGraph


graph = MemoryGraph.open(".reql/memory.reql")

try:
    graph.compile_project(".")

    context = graph.query_context(
        "How does incremental compilation handle deleted files?"
    )

    print(context)
finally:
    graph.close()

You can also access lower-level or more structured operations:

from reql import MemoryGraph


graph = MemoryGraph.open(".reql/memory.reql")

try:
    graph.compile_project(".")

    graph_result = graph.query_graph(
        "incremental compilation",
        max_depth=2,
    )

    exploration = graph.query_explore(
        "incremental compilation",
        views=["owners", "callers", "code"],
    )

    memories = graph.query_memories(
        "incremental compilation",
        limit=8,
    )

    rows = graph.query(
        'SYMBOLS TYPE Function '
        'WHERE name CONTAINS "compile" '
        'LIMIT 20'
    )
finally:
    graph.close()

The public facade also exposes project updates, reports, cache inspection, graph deltas, community detection, and hub analysis.

Storage is replaceable

The bundled block-backed graph store is meant to make the default setup local and self-contained.

It is not intended to force applications to use one persistence implementation.

A different storage backend can be supplied by implementing the GraphStore interface:

from reql import MemoryGraph

store = MyGraphStore(...)
graph = MemoryGraph(store)

This separation keeps repository compilation and retrieval independent from a particular graph database.
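The general pattern behind a replaceable store is a port with interchangeable adapters. The sketch below illustrates that pattern only. GraphStoreLike and its four methods are invented for the example and are not the real GraphStore interface, whose required methods should be taken from the REQL source.

```python
from typing import Iterable, Optional, Protocol


class GraphStoreLike(Protocol):
    """Hypothetical storage port; method names are illustrative, not REQL's API."""

    def put_node(self, node_id: str, properties: dict) -> None: ...
    def put_edge(self, from_id: str, to_id: str, edge_type: str) -> None: ...
    def get_node(self, node_id: str) -> Optional[dict]: ...
    def edges_from(self, node_id: str) -> Iterable[tuple]: ...


class InMemoryStore:
    """One possible adapter: everything lives in dictionaries."""

    def __init__(self) -> None:
        self._nodes = {}
        self._edges = {}

    def put_node(self, node_id, properties):
        self._nodes[node_id] = dict(properties)

    def put_edge(self, from_id, to_id, edge_type):
        self._edges.setdefault(from_id, []).append((edge_type, to_id))

    def get_node(self, node_id):
        return self._nodes.get(node_id)

    def edges_from(self, node_id):
        return list(self._edges.get(node_id, []))


def callees(store: GraphStoreLike, node_id: str) -> list:
    """Engine-side code depends only on the port, never on a concrete store."""
    return [to_id for edge_type, to_id in store.edges_from(node_id)
            if edge_type == "CALLS"]


store = InMemoryStore()
store.put_node("PaymentController", {"kind": "Class"})
store.put_node("PaymentService.create_payment", {"kind": "Method"})
store.put_edge("PaymentController", "PaymentService.create_payment", "CALLS")
store.put_edge("PaymentController", "logging", "IMPORTS")

print(callees(store, "PaymentController"))
```

Because callees accepts anything that satisfies the protocol, swapping the dictionary-backed store for a database-backed one would not change the calling code.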

That is another reason I avoid presenting REQL as a graph-database alternative. A database is one implementation detail behind the context engine, not the central abstraction exposed to users.

Where I think REQL can be useful

Some of the workflows I had in mind while building it include:

Preparing a refactor

Before renaming or moving a symbol, retrieve its owners, callers, imports, public surface, tests, and documentation references.

Investigating an unfamiliar repository

Start with a bounded context block and a set of targeted source reads instead of recursively exploring the complete tree.

Supporting coding agents

Give the agent a repeatable repository context operation that is separate from its own prompt history.

Reviewing change impact

Inspect graph relationships around a class, function, endpoint, schema, or configuration value before applying a modification.

Connecting documentation and code

Compile supported documents alongside code and link explicit documentation mentions to symbols where possible.

Finding cleanup candidates

Use deterministic static analysis to identify possible unused symbols, then confirm them through source inspection and tests.

Building repository-aware tools

Use the Python API, CLI, query language, or MCP surface as an integration layer instead of implementing a new indexer for each tool.

Current limitations

REQL is still an alpha project.

There are several areas where it can improve.

Language support is not equally deep across all parsers. Python, JavaScript, and TypeScript currently have richer extraction than many of the other supported languages.

Static analysis cannot perfectly resolve dynamic calls, generated code, runtime dependency injection, reflection, or framework behavior that is not visible in the syntax tree.

Document-to-code linking is intentionally conservative and currently focuses on explicit mentions rather than speculative semantic associations.

Deterministic lexical retrieval also has trade-offs. It is inspectable and reproducible, but it will not infer every conceptual relationship that an embedding model or an LLM might recognize.

Finally, bounded context is a ranking problem. A result can be compact and still omit something important. Real repositories are needed to understand where those misses happen.

I would rather be explicit about those limits than present REQL as a complete solution to repository understanding.

Why I am publishing it now

I worked on REQL for a while before putting it on GitHub.

There is still a considerable amount I would like to improve, but keeping it private until every design question was resolved did not seem useful.

Repository context engines need to be tested against codebases with different:

languages;
architectures;
monorepo structures;
generated files;
framework conventions;
dependency patterns;
documentation styles;
project sizes.

That feedback cannot come only from the repository used to build the tool itself.

So I decided to publish the current implementation and ask people to try it.

Feedback and contributions are welcome

I hope REQL can be useful to developers working on coding agents, code intelligence, repository analysis, automated refactoring, or related tooling.

I would especially appreciate feedback about:

queries that return irrelevant context;
cases where an important owner or caller is missing;
languages that need deeper extraction;
repository structures that compile incorrectly;
integrations that are difficult to configure;
results that are too large or too limited;
performance on real projects;
parts of the architecture that should be simplified.

Concrete counterexamples are particularly valuable. A repository pattern that breaks retrieval is often more useful than a general suggestion that retrieval should be improved.

Contributions are also very welcome.

That includes bug reports, documentation improvements, language adapters, tests, retrieval experiments, integrations, and pull requests.

GitHub: https://github.com/sh1zen/reql

I hope some of you find it useful, and I would be interested to hear how it behaves outside the projects on which it was developed.