Enterprise AI doesn't need a better model. It needs smarter agent logic.

#ai #machinelearning #llm #agents

Most enterprise AI pilots aren't failing because the model is too weak. They're failing because the model has no idea where it is. IBM Research dropped a post this week making the case that the missing layer isn't a better LLM — it's agent logic: domain-specific software primitives that give the model a map before it starts driving.

"Agent logic is software primitives, such as knowledge graphs, algorithms, program analysis libraries, which operate at the agentic layer (within an agent harness) and can intentionally steer the LLM in the direction of the enterprise workflow, reducing the context space."

What IBM actually built

Four production use cases, four sets of hard numbers:

Legacy code understanding (COBOL/PL/I): ~30× lower token consumption vs. a baseline LLM-only approach, while maintaining performance on codebases of up to 1M lines. Program analysis libraries chunked the problem; the LLM only touched what mattered.
Test generation (Aster library): 15× fewer tokens, +20–45% improvement in code coverage vs. zero-shot LLMs. Structured test harnesses replaced raw prompting.
Incident response (Instana I3 agent): 4× improvement over a ReAct + GPT-5.1 baseline. A knowledge graph scoped the LLM to local reasoning: no sprawling context, no hallucinated blast radius.
Compliance automation: success rates went from single digits to 80%+ (using Claude Sonnet 4), 1.3–2× better than fixed-planning agents. The structured workflow did what prompt engineering never could.
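To make "scoped the LLM to local reasoning" concrete, here is a minimal sketch assuming a hypothetical service-dependency graph. A breadth-first search collects only the services within a fixed number of hops of the alerting one, and only those go into the prompt. The graph, the service names, and the functions `local_scope` and `build_prompt` are invented for illustration. This is not how Instana I3 is implemented.

```python
from collections import deque

# Hypothetical service-dependency graph: each service maps to the services it calls.
# A real harness would load this from topology/observability data.
DEPENDENCIES: dict[str, list[str]] = {
    "checkout": ["payments", "inventory"],
    "payments": ["fraud-check", "ledger-db"],
    "inventory": ["catalog-db"],
    "fraud-check": [],
    "ledger-db": [],
    "catalog-db": [],
    "search": ["catalog-db"],
    "recommendations": ["search", "user-profile"],
    "user-profile": [],
}


def undirected_neighbors(graph: dict[str, list[str]]) -> dict[str, set[str]]:
    """Treat call edges as undirected so a fault can propagate up or down the chain."""
    adj: dict[str, set[str]] = {node: set() for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            adj.setdefault(src, set()).add(dst)
            adj.setdefault(dst, set()).add(src)
    return adj


def local_scope(graph: dict[str, list[str]], alerting: str, max_hops: int) -> set[str]:
    """Breadth-first search from the alerting service, stopping at max_hops."""
    adj = undirected_neighbors(graph)
    seen = {alerting}
    frontier = deque([(alerting, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def build_prompt(alerting: str, scope: set[str]) -> str:
    """Describe only the in-scope services to the LLM."""
    lines = [f"Alert on service: {alerting}", "Relevant services:"]
    lines += [f"- {name}" for name in sorted(scope)]
    return "\n".join(lines)


scope = local_scope(DEPENDENCIES, "payments", max_hops=1)
print(sorted(scope))  # ['checkout', 'fraud-check', 'ledger-db', 'payments']
print(build_prompt("payments", scope))
```

Raising `max_hops` widens the context gradually. With `max_hops=1`, the prompt for a `payments` alert mentions four services instead of all nine.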

There's also a real-estate asset-maintenance pilot: analysis time dropped from 15–20 minutes to 15–30 seconds (a 97% reduction), and asset coverage jumped from 1% to 30%.
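Here is a quick check on the 97% figure, assuming "reduction" means 1 − after/before on analysis time. The stated ranges bound it between about 96.7% (30 s vs. 15 min) and 98.75% (15 s vs. 20 min), so 97% sits near the conservative end of that range.

```python
def reduction(before_s: float, after_s: float) -> float:
    """Fractional reduction in analysis time: 1 - after/before."""
    return 1 - after_s / before_s


# Ranges from the pilot, converted to seconds.
BEFORE_S = (15 * 60, 20 * 60)  # 15 to 20 minutes
AFTER_S = (15, 30)             # 15 to 30 seconds

# Smallest reduction: fastest "before" paired with slowest "after".
low = reduction(BEFORE_S[0], AFTER_S[1])
# Largest reduction: slowest "before" paired with fastest "after".
high = reduction(BEFORE_S[1], AFTER_S[0])
print(f"reduction between {low:.2%} and {high:.2%}")
```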

The pattern

Every one of these wins follows the same shape. The LLM has the generative capability. What it lacks is domain structure: the graph of what entities exist, the algorithms for breaking a 1M-line codebase into tractable chunks, the rules that constrain compliance decisions.
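Python's standard `ast` module can illustrate the chunking idea in miniature. Instead of pasting a whole file into the prompt, a program-analysis step keeps only the entry function and the functions it calls. This is a hypothetical stand-in: IBM's work targets COBOL and PL/I with its own analysis libraries, which are not shown here. `call_slice` and the sample source are invented for illustration, and call resolution is deliberately naive (bare-name calls only).

```python
import ast

# Hypothetical module standing in for a much larger codebase.
SOURCE = '''
def parse(raw):
    return raw.strip().split(",")

def validate(fields):
    return len(fields) == 3

def report(fields):
    return " | ".join(fields)

def unrelated_export():
    return "never needed for this question"

def handle(raw):
    fields = parse(raw)
    if validate(fields):
        return report(fields)
    return None
'''


def call_slice(source: str, entry: str) -> str:
    """Return source for `entry` plus every top-level function it transitively calls."""
    tree = ast.parse(source)
    funcs = {node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)}

    needed: list[str] = []
    stack = [entry]
    while stack:
        name = stack.pop()
        if name in needed or name not in funcs:
            continue  # already collected, or a builtin/external name such as len
        needed.append(name)
        # Collect direct calls by bare name, e.g. parse(raw); method calls are ignored.
        for node in ast.walk(funcs[name]):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                stack.append(node.func.id)

    # Emit chunks in original file order so the model sees coherent code.
    ordered = [n for n in funcs if n in needed]
    return "\n\n".join(ast.get_source_segment(source, funcs[n]) for n in ordered)


chunk = call_slice(SOURCE, "handle")
print(chunk)
print(f"{len(chunk)} of {len(SOURCE)} characters sent to the model")
```

A real analyzer would also resolve methods, imports, and data dependencies. The point is only that the slice handed to the model excludes `unrelated_export`, and that the selection happens in ordinary code before any prompt is written.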

Agent logic provides that structure programmatically — not through prompts, not through fine-tuning, not through a bigger context window. It's a software layer that runs above the model and below the task.
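A layer "above the model and below the task" can be sketched as a small harness in which code, not the model, decides which step runs next and which answers are acceptable. Everything here is hypothetical: `Step`, `run`, the two-step compliance workflow, and the `scripted` stand-in for an LLM are illustrations. The sketch assumes that any model can be wrapped as a prompt-in, text-out function. It is not IBM's compliance agent.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: any LLM can be wrapped as a function from prompt text to answer text.
LLM = Callable[[str], str]


@dataclass
class Step:
    name: str
    prompt: str
    allowed: frozenset[str]  # answers the harness will accept for this step
    # Deterministic gate: run this step only if the condition holds on earlier results.
    when: Callable[[dict[str, str]], bool] = lambda results: True


# Hypothetical compliance workflow; branching is decided by code, not by the model.
WORKFLOW = [
    Step("data_class", "Classify the record as PUBLIC, INTERNAL, or RESTRICTED.",
         frozenset({"PUBLIC", "INTERNAL", "RESTRICTED"})),
    Step("encryption", "Is the record encrypted at rest? Answer YES or NO.",
         frozenset({"YES", "NO"}),
         when=lambda r: r.get("data_class") == "RESTRICTED"),
]


def run(workflow: list[Step], llm: LLM, max_retries: int = 2) -> dict[str, str]:
    results: dict[str, str] = {}
    for step in workflow:
        if not step.when(results):
            continue  # rule says this step does not apply
        for _ in range(max_retries + 1):
            answer = llm(step.prompt).strip().upper()
            if answer in step.allowed:
                results[step.name] = answer
                break
        else:
            # No valid answer after all attempts: escalate instead of guessing.
            results[step.name] = "NEEDS_HUMAN_REVIEW"
    return results


def scripted(answers: list[str]) -> LLM:
    """Fake model that returns canned answers in order, for testing the harness."""
    it = iter(answers)
    return lambda prompt: next(it)


print(run(WORKFLOW, scripted(["restricted", "maybe", "YES"])))
# {'data_class': 'RESTRICTED', 'encryption': 'YES'}
```

The model still makes the judgment call at each step. The harness constrains where that judgment can go, skips steps the rules rule out, and escalates rather than accepting an out-of-policy answer.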

Think of it like GPS: you don't need a smarter driver. You need a map.

This matters because the usual enterprise response to AI underperformance is to swap models or write better prompts. Both are fighting the wrong battle. The gap is architectural.

What to do

If you're an AI/ML engineer: Stop asking "which model?" Start asking "what does the model need to know to stay on track?" Build the graph or the index before you build the prompt.
If you're an engineering leader: Treat agent logic as an infrastructure investment, not a model selection problem. The ROI numbers here (30×, 97%, 80%) aren't coming from the model — they're coming from the harness.
If you're evaluating enterprise AI vendors: Ask what agent logic layer they ship. If it's "great prompts," push harder.

The bottleneck has shifted. The models are good enough. The architecture around them isn't.

Source: IBM Research — Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

✏️ Drafted with KewBot (AI), edited and approved by Drew.