Prashanth Velidandi

Posted on Jun 30

From RAG to Skill Function: A New Architecture for Enterprise AI Knowledge

#ai #rag #llm #agents

Enterprise AI knowledge systems have a scaling problem.

RAG was the answer for years: retrieve relevant chunks, feed them to the model, generate an answer. It works until it doesn't. Retrieval misses relevant passages, chunking breaks context, multi-document reasoning fails, and pipelines grow complex. As knowledge bases grow, the problems compound.

Long context models offered a partial fix. Skip retrieval entirely, load the whole document. Better understanding, simpler architecture. But you're still paying full prefill cost on every query, and a single context window can't hold an entire enterprise knowledge base.

We built something different. We're calling it Skill Function.

The Problem with RAG

RAG's core limitation isn't the retrieval algorithm; it's the architecture. The model can only reason over what gets retrieved. If the retrieval step misses something, the model never sees it. The answer comes out wrong not because the model can't reason, but because it never saw the evidence.

The specific failure modes:

Retrieval quality limits answer quality. Chunking breaks document structure, tables, cross-references, and long-range dependencies.
Multi-document reasoning is hard. Missing even one relevant chunk leads to incomplete answers.
Production pipelines are complex. Multiple retrieval stages, reranking, metadata filtering, hybrid search — each adds failure surface.
No deep document understanding. RAG retrieves passages, not comprehension. It works for lookup, fails for analysis.
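To make the chunking failure concrete, here is a minimal sketch. It assumes a naive fixed-size character chunker and a keyword match standing in for vector search; the policy text is invented for illustration. Real pipelines use smarter splitting and overlap, but the same kind of miss can still occur.

```python
def chunk_fixed(text: str, size: int) -> list[str]:
    """Split text into consecutive fixed-size character chunks (no overlap)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def retrieve(chunks: list[str], term: str) -> list[str]:
    """Stand-in for vector search: return chunks that contain the term."""
    return [c for c in chunks if term in c]


# Hypothetical document: Section 7 modifies the rule stated in Section 2.
doc = (
    "Section 2: Refunds are capped at 500 USD. "
    "Section 7: The cap in Section 2 does not apply to enterprise plans."
)

chunks = chunk_fixed(doc, size=42)
hits = retrieve(chunks, "capped")

for hit in hits:
    print(repr(hit))
# Only the Section 2 chunk comes back. The exception in Section 7 sits in
# other chunks, so a model answering from `hits` never sees it.
```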

Long Context Solves Some of This

Recent models with 128k–1M token context windows change the equation. Load the entire document, skip retrieval, let the model reason over everything.

The improvements are real:

No retrieval errors
Document structure preserved
Better cross-section reasoning
Simpler architecture — no vector database, no chunking pipeline

But long context introduces new problems at scale:

Cost. Every query reprocesses the entire document, even when only a small section is relevant.
Latency. Prefill over 100k+ tokens delays the first output token on every query.
Scalability. Enterprise knowledge bases contain thousands of documents. No single context window holds all of it.

Long context is a meaningful improvement for individual documents. It doesn't solve enterprise-scale knowledge.
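The cost and scalability points come down to simple arithmetic. The sketch below uses hypothetical token counts and assumes no prompt caching, so every query re-reads the full document. It counts input tokens only and makes no claim about pricing.

```python
def prefill_tokens(doc_tokens: int, query_tokens: int, num_queries: int) -> int:
    """Total input tokens processed when every query re-reads the whole document."""
    return num_queries * (doc_tokens + query_tokens)


def fits_in_window(num_docs: int, avg_doc_tokens: int, window_tokens: int) -> bool:
    """True if the whole corpus could be loaded into a single context window."""
    return num_docs * avg_doc_tokens <= window_tokens


# Hypothetical workload: one 100k-token document, 1,000 short queries.
total = prefill_tokens(doc_tokens=100_000, query_tokens=50, num_queries=1_000)
print(total)  # 100050000

# Hypothetical corpus: 5,000 documents of 20k tokens against a 1M-token window.
print(fits_in_window(num_docs=5_000, avg_doc_tokens=20_000, window_tokens=1_000_000))  # False
```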

Skill Function

A Skill Function is a protected AI capability hosted as a callable endpoint in the cloud.

The architecture has two components:

Document Skill — A Document Skill specializes in a single document or related document collection. Instead of retrieving chunks, it loads the entire document into its context. Deep understanding, no retrieval, full structure preserved. Each Document Skill runs in its own isolated context — its own model, its own knowledge, its own reasoning.

Orchestrator Skill — An Orchestrator Skill organizes multiple Document Skills into a hierarchy. It maintains a summary of each sub-skill's expertise. When a query arrives, the Orchestrator determines which Document Skills are relevant, invokes them on demand, and synthesizes their results.

An Orchestrator can invoke Document Skills or other Orchestrator Skills — forming a hierarchical knowledge tree that scales to arbitrarily large knowledge bases.
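One way to picture the two components is as a tree of two node types. This is a sketch only: the class and field names (`DocumentSkill`, `OrchestratorSkill`, `summary`, `children`) are assumptions made for illustration, not the platform's actual data model, and no model calls are made.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class DocumentSkill:
    name: str
    summary: str    # short description the parent Orchestrator keeps
    document: str   # full text, loaded only into this skill's own context


@dataclass
class OrchestratorSkill:
    name: str
    summary: str
    # Children may be Document Skills or other Orchestrators.
    children: list["Skill"] = field(default_factory=list)


Skill = Union[DocumentSkill, OrchestratorSkill]


def depth(skill: Skill) -> int:
    """Number of levels in the tree rooted at this skill."""
    if isinstance(skill, DocumentSkill):
        return 1
    return 1 + max((depth(child) for child in skill.children), default=0)


def document_names(skill: Skill) -> list[str]:
    """Names of all Document Skills reachable from this skill, left to right."""
    if isinstance(skill, DocumentSkill):
        return [skill.name]
    return [name for child in skill.children for name in document_names(child)]


# Hypothetical tree: an Orchestrator nested inside the root Orchestrator.
finance = OrchestratorSkill("finance", "Quarterly financial reports", [
    DocumentSkill("q1-report", "Q1 results", "(full Q1 report text)"),
    DocumentSkill("q2-report", "Q2 results", "(full Q2 report text)"),
])
root = OrchestratorSkill("company", "All company knowledge", [
    DocumentSkill("legal-handbook", "Contract and NDA policy", "(full handbook text)"),
    finance,
])

print(depth(root))           # 3
print(document_names(root))  # ['legal-handbook', 'q1-report', 'q2-report']
```

Note that the Orchestrator holds only each child's summary and a reference to it; the full document text lives in the leaf.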

How It Works

User sends a query to the root Orchestrator Skill
Orchestrator selects relevant sub-skills based on their summaries
Selected sub-skills process the query using full document context
Orchestrator aggregates results and produces the final answer

Only the skills relevant to the query execute. Everything else stays idle. Context stays clean.
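The four steps can be sketched as a recursive routing function. Everything here is hypothetical: the `Skill` class, the skill names, and especially `is_relevant`, which uses shared keywords as a stand-in for whatever model-based selection the Orchestrator actually performs. The leaf does not call a model; it only reports how much context it would have had.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    summary: str                     # what the parent Orchestrator sees
    document: str = ""               # full text; set on Document Skills
    children: list["Skill"] = field(default_factory=list)  # set on Orchestrators


def is_relevant(summary: str, query: str) -> bool:
    """Stand-in for model-based routing: true if summary and query share a word."""
    return bool(set(summary.lower().split()) & set(query.lower().split()))


def run(skill: Skill, query: str, executed: list[str]) -> list[str]:
    """Route the query down the tree and collect answers from the skills that ran."""
    executed.append(skill.name)
    if not skill.children:
        # A real Document Skill would call a model with the whole document in context.
        return [f"[{skill.name}] answered using {len(skill.document)} chars of context"]
    results: list[str] = []
    for child in skill.children:
        if is_relevant(child.summary, query):            # step 2: select by summary
            results.extend(run(child, query, executed))  # step 3: invoke on demand
    return results


root = Skill("root", "company knowledge", children=[
    Skill("legal", "contracts nda liability", document="(full legal handbook text)"),
    Skill("hr", "payroll leave benefits", document="(full HR handbook text)"),
    Skill("finance", "revenue budget forecast", document="(full finance report text)"),
])

executed: list[str] = []
answers = run(root, "what is our parental leave policy", executed)  # step 1
final_answer = "\n".join(answers)                                   # step 4: aggregate
print(executed)  # ['root', 'hr']
```

The `executed` list records which skills ran: the legal and finance skills are never invoked for this query.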

Instead of forwarding the entire conversation history, each Skill Function receives only the current query and a concise summary of relevant conversation history. This eliminates context pollution and keeps each skill focused.
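As a rough illustration of that payload, the sketch below builds a request containing only the current query and a short history summary. The field names (`query`, `history_summary`) are assumptions, and the summarizer is a crude stand-in that keeps the last two turns; a real system would presumably use a model to summarize. The conversation is invented.

```python
def summarize_history(history: list[str], max_turns: int = 2, max_chars: int = 80) -> str:
    """Stand-in summarizer: keep the last few turns, each truncated."""
    recent = history[-max_turns:] if max_turns > 0 else []
    return " | ".join(turn[:max_chars] for turn in recent)


def build_skill_request(query: str, history: list[str]) -> dict[str, str]:
    """Payload for one skill: the current query plus a short history summary."""
    return {"query": query, "history_summary": summarize_history(history)}


# Hypothetical conversation, for illustration only.
history = [
    "user: hi, I have a few HR questions",
    "assistant: sure, go ahead",
    "user: how many vacation days do new hires get?",
    "assistant: 20 days per year, per the HR handbook",
]
request = build_skill_request("does that include public holidays?", history)
print(request["history_summary"])
```

The earlier small talk never reaches the skill, which is the point: the skill's context holds its document, the query, and just enough history to resolve "that".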

Skill Function vs RAG

Aspect	RAG	Skill Function
Knowledge Unit	Document chunks	Specialized Document Skills
Knowledge Access	Vector retrieval	AI skill routing
Document Understanding	Partial chunks	Complete documents
Multi-document Reasoning	Retrieve multiple chunks	Coordinate multiple skills
Scalability	Larger vector databases	Hierarchical skill tree
Context Usage	Retrieved chunks	Only relevant skills execute
Engineering	Chunking, embeddings, retrieval tuning	Skill organization and orchestration

RAG treats enterprise knowledge as a searchable database. Skill Function treats it as a network of specialized AI experts coordinated through hierarchical orchestration.

Skill Function vs Claude-style Skills

Claude-style skills (SKILL.md files) load multiple skills into a shared context window. More skills means more context competition, slower responses, and a hard ceiling on how many skills can run together.

Skill Function assigns a dedicated context per skill. Each Document Skill operates independently with its own long-context environment. Knowledge doesn't compete for a shared window.

Because contexts are isolated, skills compose recursively without exhausting a global context:

A Document Skill can be called by an Orchestrator Skill
An Orchestrator Skill can call other Orchestrator Skills
The hierarchy scales to arbitrary depth

Claude-style Skills: One global context → simplicity, but limited scaling and context competition.

Skill Function: Many isolated contexts → hierarchical composition, scalable knowledge depth, controllable execution.

What This Looks Like in Practice

Upload a PDF. We automatically convert it into skill experts — each section becomes its own Document Skill with its own model, context, and reasoning.

On our platform, you can combine those experts into an Orchestrator Skill. Skills call other skills. Your query automatically reaches the right expert.

The whole thing is exposed as an MCP server.
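For readers unfamiliar with MCP: clients invoke a server's tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message. The envelope follows the MCP specification, but the tool name `company_knowledge` and its `query` argument are placeholders, since the post does not document the server's actual tools or schema.

```python
import json


def tools_call_request(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument schema, not taken from the platform's docs.
payload = tools_call_request(1, "company_knowledge", {"query": "What is our NDA term?"})
print(payload)
```

In practice an MCP client library would handle this framing, along with the initialization handshake and tool discovery that precede a call.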

For example: take your company knowledge across legal, finance, HR, and product — turn each into a Document Skill, combine them into one Orchestrator, and query across your entire company knowledge base. The right expert answers every time.

No vector database. No embeddings. No retrieval step. No document size limit. 70–90% cheaper than loading everything into one context window.
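The size of the saving depends on how much of the knowledge base a query actually touches. The sketch below is not a verification of the figure above; it only shows which quantities drive it. All numbers are hypothetical, only input tokens are counted, and `routing_tokens` is an assumed overhead for the Orchestrator reading sub-skill summaries.

```python
def prefill_saving(total_tokens: int, invoked_tokens: int, routing_tokens: int) -> float:
    """Fraction of input tokens avoided versus loading everything into one context."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 1 - (invoked_tokens + routing_tokens) / total_tokens


# Hypothetical: four 100k-token documents, one skill invoked, 2k tokens of routing.
saving = prefill_saving(total_tokens=400_000, invoked_tokens=100_000, routing_tokens=2_000)
print(round(saving, 3))  # 0.745
```

Under these assumptions, a query that needs every document saves nothing, while one that needs a single small document saves more.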

Try It

We're testing this now. Try it free at inferx.net, or reach out to me at prashanth@inferx.net.

Happy to answer questions in the comments.

Top comments (2)

Alex Shev • Jun 30

The useful distinction here is that enterprise knowledge should not only be retrieved, it should become callable with boundaries. RAG answers a question; a skill-style interface can encode what the system is allowed to do with the answer, which is where governance gets easier.

Prashanth Velidandi • Jun 30

That’s exactly right, and that’s a dimension we haven’t written about yet. When knowledge is encapsulated in isolated skill contexts, you can enforce boundaries at the skill level: what it can access, what it can return, and who can invoke it. Governance becomes structural rather than a layer bolted on top. That’s a meaningful shift for regulated industries.