A College Project That Planted a Seed
Years ago I was on a university team trying to build a Go AI. We explored Monte Carlo simulation for lookahead search, basic neural networks for pattern recognition, and expert systems for encoding domain knowledge. None of them worked well enough on their own. Go's branching factor is enormous, so brute-force search fails quickly. Neural networks without the right training data go nowhere. And even carefully written rules eventually hit a wall against a skilled human opponent.
Then AlphaGo happened, and it was hard not to feel a little awe.
AlphaGo was not purely any one of those things. Its neural networks learned to evaluate board positions and suggest candidate moves, but a structured tree search still imposed discipline: constraining where the network could look, and how. Neither component could have done it alone. AlphaGo is probably not a textbook example of neuro-symbolic AI, but the general idea still struck me. Learned intuition, bounded by structure. I kept turning that over in my head.
This article is an attempt to explain how that early impression shaped the way we think about agents in Project Brain and why we believe that combining symbolic rules with LLM reasoning builds more reliable systems than relying on either alone.
The Problem: Small Models Make Mistakes
When you run a small language model locally — say, a 7B-parameter Qwen or Llama via Ollama — and ask it to drive an agent loop, things go wrong in predictable ways.
The model might call a tool with a missing required field. It might pass the right tool name but the wrong argument type. It might call the same tool twice with the same arguments, because repetition looked probable in its training distribution. In a worst case, it enters a quiet loop — not crashing, not reporting an error, just spinning — until you interrupt it or it runs out of context.
These are not exotic failures. They happen regularly. And they are frustrating because the model is not unintelligent — it usually understands the task. It just does not have a reliable internal sense of schema correctness, or of when it is stuck.
The natural response is to use a bigger, more capable model. That works, to a point. But bigger models cost more, run slower, and still hallucinate. More importantly, it feels like the wrong level to fix the problem. The model is failing at things that are fundamentally not about language understanding. Things like "does this argument have the right type" or "have I already done this exact step." Those are rule-shaped problems.
What if the environment around the model enforced those rules, instead of asking the model to remember them?
A Concrete Example: Lantern
Before getting into architecture, it helps to see the idea in action.
Lantern is a system health monitor that ships as an example with our agent engine, neuron. It runs on a 15-second tick and raises an alert — voiced aloud by a desktop avatar — when your machine's resources cross defined thresholds.
In neuron, a workflow is made up of roles — named steps that each do one specific job and pass their output to the next. Here is the rough shape of what happens on each tick:
- Six collector roles run in parallel, each reading one signal from the machine: memory usage, CPU usage, disk usage, load average, uptime, and a basic health check.
- An aggregator role waits until all six signals arrive, then bundles them into a single snapshot.
- A decision role checks the snapshot against hard thresholds — memory ≥ 70%, CPU ≥ 80%, disk ≥ 85%, and so on. If everything is fine, the workflow ends quietly. If something is wrong, it passes the alert details forward.
- A notifier role receives the alert details and writes a short natural-language message, as if the machine itself is speaking.
- A speaker role delivers that message to a desktop avatar, which speaks it aloud.
What is interesting here is how little of this involves a language model. The collectors, the aggregator, the decision role, and the speaker are entirely deterministic — they follow fixed rules, call specific tools, and route based on explicit conditions. Only the notifier uses an LLM, and its job is narrow: take a structured payload and write one or two sentences. The language model handles the language. Everything else is handled by rules.
The avatar speaks. And only one step in the entire pipeline involves probabilistic inference.
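To make the deterministic part concrete, the decision role's check amounts to something like the sketch below. The struct and function names are illustrative, not Lantern's actual code; only the thresholds come from the list above.

struct Snapshot {
    memory_pct: f64,
    cpu_pct: f64,
    disk_pct: f64,
}

// Pure rule evaluation: no model call, no guessing. An empty result
// means the machine is healthy and the workflow ends quietly.
fn check_thresholds(s: &Snapshot) -> Vec<String> {
    let mut alerts = Vec::new();
    if s.memory_pct >= 70.0 {
        alerts.push(format!("memory at {:.0}%", s.memory_pct));
    }
    if s.cpu_pct >= 80.0 {
        alerts.push(format!("cpu at {:.0}%", s.cpu_pct));
    }
    if s.disk_pct >= 85.0 {
        alerts.push(format!("disk at {:.0}%", s.disk_pct));
    }
    alerts
}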
What Neuro-Symbolic AI Means (at Least as We Use the Term)
Neuro-symbolic AI is a research direction that tries to combine two very different styles of reasoning. I am not an academic, so I will describe it in practical terms:
- Neural (LLM) reasoning is probabilistic and flexible. A language model is good at understanding intent, generating natural language, and reasoning over context. It is not good at guaranteeing correctness. It will hallucinate.
- Symbolic reasoning uses deterministic rules and formal constraints. A rule that says "memory usage ≥ 70% triggers an alert" will fire exactly when stated, every time, without guessing.
The key insight, at least the one we keep coming back to, is that these two styles are not competing. They are complementary. The language model handles the parts of the task that are language-shaped. The symbolic layer handles the parts that are rule-shaped. And the symbolic layer can also act as a guardrail — catching the cases where the language model steps outside its lane.
Neuron's Architecture
Neuron is an agentic execution engine written in Rust. Its contract is simple: receive a RunRequest, emit a stream of RunEvents, return a CompletionEnvelope. Inside, the architecture is built around three cooperating layers.
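Sketched as types, that contract looks roughly like this. The three type names come from the description above; the trait and signatures are a simplification of mine, not neuron's real API.

struct RunRequest;         // what to run: agent profile, input payload, available tools
struct RunEvent;           // streamed progress: proposed steps, tool calls, rejections
struct CompletionEnvelope; // final outcome plus run metadata

trait AgentEngine {
    fn run(
        &mut self,
        request: RunRequest,
        on_event: &mut dyn FnMut(RunEvent), // events are emitted as the run progresses
    ) -> CompletionEnvelope;
}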
The Agent Loop
The heart of neuron is the Engine. It runs a loop:
loop {
    step = planner.next_step(state)      // LLM or symbolic: "what should happen next?"
    result = evaluator.evaluate(step)    // symbolic rules: "is this allowed?"
    if rejected → push rejection, replan
    if approved → execute tool or complete
}
The planner proposes. The evaluator disposes. These two are deliberately separate — neither knows about the other's implementation.
The Dual Planner: Neural and Symbolic
A NeuroPlanner talks to Claude, OpenAI, Gemini, Ollama, or any OpenAI-compatible server. It reconstructs the full conversation from the current state, sends a request to the model, and maps the response to either a tool call or a completion. The model drives the agenda.
A SymbolicPlanner runs a list of rules in order. The first rule that produces a step wins. Rules can hard-route to a specific tool, evaluate numeric thresholds, match on input payload fields, or call an MCP server. No model inference required.
Both implement the same Planner trait. The engine does not care which it gets.
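In spirit, that shared boundary looks something like the sketch below. Only Planner and next_step come from neuron's own vocabulary; the other names are placeholders for illustration.

// Sketch of the planner boundary. Only the idea is faithful here:
// a neural planner and a symbolic planner answer the same question
// ("what should happen next?") behind the same interface.
struct SessionState {
    transcript: Vec<String>, // prior steps, tool results, rejections
}

enum Step {
    CallTool { name: String, args: String },
    Complete { summary: String },
}

trait Planner {
    fn next_step(&self, state: &SessionState) -> Step;
}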
In Lantern, every pipeline role except one uses planner_mode = "symbolic". Only the notifier uses planner_mode = "llm". The architecture tries to allocate model intelligence where it adds value, and nowhere else.
The Evaluator's Rules Engine
Every proposed step, regardless of whether a neural or symbolic planner produced it, passes through the Evaluator before execution. The evaluator runs a stack of rules:
- ToolSchemaValidationRule — validates that the proposed tool call provides all required arguments with correct types. If a small model calls search_text without the required pattern field, this rule rejects the step with a tool_schema_error. The engine passes the rejection back to the planner as context for the next attempt. Most of the time, the model corrects itself on the first retry.
- RedundantSuccessfulToolCallRule — detects when a model proposes calling the same tool with the same arguments when a successful result already exists in the session. Loop behaviour caught at the rule level.
- AllowedToolsRule — enforces an explicit allowlist of tool names declared in the agent profile. A model cannot reach for a tool outside what the profile permits.
- PlannerTokenBudgetRule — stops a run when the conversation history approaches the context window limit, preventing silent context truncation from producing nonsense.
This is the symbolic layer acting as a guardrail. It does not make the LLM smarter. It tries to make the consequences of LLM mistakes recoverable.
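As a rough illustration, here is a self-contained sketch in the spirit of RedundantSuccessfulToolCallRule. Every name and type in it is an assumption made for the example, not neuron's real interface.

#[derive(PartialEq)]
struct ToolCall {
    tool: String,
    args: String, // serialized arguments, kept simple for the sketch
}

enum Verdict {
    Approve,
    Reject(String),
}

// A guardrail rule: reject a step if an identical call already succeeded.
// This is exactly the quiet-loop behaviour described earlier.
fn check_redundancy(proposed: &ToolCall, successful_calls: &[ToolCall]) -> Verdict {
    if successful_calls.iter().any(|done| done == proposed) {
        Verdict::Reject("identical successful tool call already exists".to_string())
    } else {
        Verdict::Approve
    }
}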
Synapse: Coordinating Multiple Roles
The engine handles a single agent loop. When you need multiple specialized roles to work in parallel and hand off results — like Lantern's six collectors feeding into a single aggregator — Synapse provides an event-driven runtime on top of neuron.
Synapse runs a worker pool. Each worker dequeues a topic event, looks up the subscriber role, and invokes it. When a role completes, it emits a new event to the next topic. The workflow graph defined in TOML becomes a live event routing table:
signal_memory_collector -> signal_aggregator
signal_cpu_collector -> signal_aggregator
signal_disk_collector -> signal_aggregator
signal_health_collector -> signal_aggregator
signal_load_collector -> signal_aggregator
signal_uptime_collector -> signal_aggregator
signal_aggregator -> signal_policy_gate
signal_policy_gate -> signal_notifier [when=alert]
signal_policy_gate -> complete [when=healthy]
signal_notifier -> signal_speaker
signal_speaker -> complete
For Lantern, the aggregator uses a time-windowed aggregation: it waits until it has received signals from all six collectors for the same window, then emits a single composite event downstream. The policy gate always sees a complete, consistent snapshot before it makes a decision — not a partial view.
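A minimal sketch of that windowed aggregation, with made-up names rather than Synapse's actual implementation:

use std::collections::HashMap;

const EXPECTED_SIGNALS: usize = 6; // memory, cpu, disk, load, uptime, health

#[derive(Default)]
struct WindowBuffer {
    pending: HashMap<u64, HashMap<String, f64>>, // window id -> collected signals
}

impl WindowBuffer {
    // Record one collector's signal; return a complete snapshot only once
    // every expected collector has reported for that window.
    fn push(&mut self, window: u64, signal: &str, value: f64) -> Option<HashMap<String, f64>> {
        let slot = self.pending.entry(window).or_default();
        slot.insert(signal.to_string(), value);
        if slot.len() == EXPECTED_SIGNALS {
            self.pending.remove(&window)
        } else {
            None
        }
    }
}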
Synapse is how we compose symbolic and neural planners into a larger workflow without any of them needing to know about the others. A symbolic collection stage feeds into a symbolic policy stage, which feeds into a neural language-generation stage, which feeds into a symbolic delivery stage. Each role does the job it is best suited for.
A Working Hypothesis
We think the pattern above points toward a broader principle for agent design, though we hold it loosely.
LLMs seem to be good at understanding intent, summarizing information, generating language, and reasoning across context. They seem less reliable at guaranteeing correctness, respecting schemas, enforcing numeric thresholds, and avoiding repetitive behaviour. Most of those failures have a deterministic counterpart that is straightforward to express as a rule.
The tentative idea is this: agent tools that work well might be those that structure the environment so the LLM only handles the parts of the task it is suited for. Symbolic rules are not a constraint on capability — they may actually amplify it, because they let you trust a small, inexpensive, locally-running model in contexts where you otherwise could not.
In practice this might mean:
- Distinguishing clearly between steps that need language intelligence and steps that need deterministic correctness, and assigning each to the right planner mode.
- Treating evaluation rules as first-class citizens of the agent runtime, not as prompt engineering afterthoughts.
- Expressing workflows as explicit graphs with typed routing conditions, not as emergent behaviour from a single monolithic prompt.
AlphaGo did not win by handing the whole problem to a neural network. Whatever the exact details, it seemed to work because each component did what it was good at.
We are still figuring out what that looks like for agents. This is our current best attempt.
Project Brain is our ongoing experiment in building AI-native tooling on these principles. Neuron is the execution engine. Synapse is the coordination layer. Medium is the avatar that makes it a little more personal.
