DEV Community

Cover image for I built a reasoning harness for LLM agents. Here's what an agent receives when it calls it.
Frank Brsrk
Frank Brsrk

Posted on

I built a reasoning harness for LLM agents. Here's what an agent receives when it calls it.

Most LLM agent failures aren't model failures. They're shape-of-reasoning failures.

Sycophancy. Drift under multi-turn pressure. Doubling down on hallucinations. Ignoring a critical RAG document. These aren't bugs that a model update fixes. They're structural properties of how the substrate generates tokens left to right with no internal verification step. You can't patch them with a better system prompt.

I built Ejentum to intervene at the layer where these failures actually live: a reasoning harness for LLM agents. An external API that delivers a structured cognitive operation to the agent at inference time, mid-task. No fine-tuning. No new model.

Here's what an agent receives when it calls the harness, in 8 frames.

  1. Same model. Different reasoning.


Same prompt, same temperature. A cognitive operation drops into the agent's context between prompt and response. Works on any modern LLM that follows structured instructions (Claude, GPT, Gemini, Llama).

  1. The catalog


The agent posts a short task statement to the API. Behind it sits a catalog of 679 cognitive operations across four modes — 311 in reasoning alone, 128 in code, 139 in anti-deception, 101 in memory. The API embedding-matches your task to the one operation that fits. Stateless, one per call.

curl -X POST https://api.ejentum.com/logicv1 \
-H "Authorization: Bearer $EJENTUM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "Engineering lead insists we keep the legacy Postgres setup because we have invested 18 months in it. About to recommend either continuing or executing the rewrite.",
"mode": "reasoning"
}'

  1. What comes back


Six structured fields land in the agent's context before it generates a single token:

NEGATIVE GATE — the named failure mode this operation prevents
PROCEDURE — numbered reasoning steps in plain English
REASONING TOPOLOGY — the same steps as an executable DAG
TARGET PATTERN — what correct reasoning looks like
FALSIFICATION TEST — a self-check on the agent's draft
AMPLIFY / SUPPRESS — continuous biases during generation

  1. A real injection


The catalog matched the Postgres query above to a simulation-mode operation about preserving optionality under irreversible commitment. Here's an excerpt of what the agent actually received:

[NEGATIVE GATE]
Committing the entire infrastructure budget to AWS with a three-year contract
locks in the best pricing and simplifies our architecture.

[PROCEDURE]
Step 1: List all available strategic paths; note whether each is reversible
or irreversible.
Step 2: Simulate outcomes under at least 3 scenarios.
Step 3: Score on flexibility, upside, downside. Combine into optionality score.
Step 4: If any high-optionality path is about to be foreclosed, flag immediately.
Step 5: Recommend the action maximizing optionality-adjusted expected value.

[REASONING TOPOLOGY]
S1:list_paths → CLASSIFY(reversible | irreversible) → FORK
→ M{anchored to optimistic?} --working→ S2b:simulate_pessimistic
--failing→ FREEFORM → RE-ENTER at S2b
→ JOIN → C{optionality_score} → G1{high_path_foreclosing?}
→ OUT:balanced_portfolio

[FALSIFICATION TEST]
If a decision commits to a single path without preserving reversible
alternatives, optionality balancing was bypassed.

Amplify: portfolio diversity, upside capture, downside protection
Suppress: single path optimization, commitment premium
This is the literal response, not pseudocode.

  1. The agent walks the topology


The agent doesn't read the topology. It walks it. Each node is a step the model performs in its own reasoning trace. Decision gates branch on real conditions. Parallel branches run and rejoin.

The load-bearing piece: meta-cognitive checkpoints (M-nodes) where the model pauses mid-reasoning, observes its own state, and branches on the answer. On benchmark MC-016 this lifted the score to 22/25 against a 19/25 baseline — a +3 lift just from making meta-cognition mandatory inside the procedure rather than optional outside it.

  1. Three corrections, in parallel


The six fields group into three corrections that fire at the same time while the model is still writing:

Trajectory — bends the response from wrong shape to right
Process — gives the model a sequence to walk
Output control — gates the draft, blocks the model's default agreeable behavior
This is what separates the harness from output validators (which check after generation) and system prompts (which advise before but don't shape generation itself).

  1. The schedule

Each directive fires at a specific moment in the inference loop. Add this to your agent's system prompt:

When an Ejentum cognitive operation arrives in your context:

walk_topology = node_by_node # do not paraphrase the DAG
m_nodes = mandatory_pause # branch on the self-observation answer
suppress = hard_refusal_list # not a suggestion, refuse outright
falsify = gate_before_emit # if test fails, re-walk the topology
augment = scaffold_only # the response is still your output
Five rules. Five different temporal shapes. The contract is a schedule, not a list.

  1. Ship it

Three integration paths, depending on your stack:

1. Stdio MCP for IDE-native agents

npx -y ejentum-mcp

→ Claude Code, Cursor, Codex, Antigravity, Cline, Windsurf, Continue

2. Hosted HTTPS MCP for workflows

curl https://api.ejentum.com/mcp \
-H "Authorization: Bearer $EJENTUM_API_KEY"

→ n8n MCP Client, Heym, remote agents

3. Python SDK for CrewAI and custom agents

pip install crewai-ejentum
Free tier: 100 calls, no card required.

The harness doesn't make a model smarter. It prevents a model from getting dumber over the length of a real task.

If you're shipping anything multi-turn under pressure — medical reasoning, code review, financial recommendations, legal analysis — the reasoning layer needs structural support that doesn't depend on the model getting it right on its own.

Drop a scenario in the comments and I'll pick one and run it end-to-end as a follow-up.

Links

ejentum.com
github.com/ejentum/ejentum-mcp
Paper: Under Pressure

Top comments (0)