Stop Stuffing Context Windows: Dynamic Tool Pruning with Spring AI Vector Routing
In 2026, building enterprise-grade Java agents means managing thousands of potential database, API, and legacy system tools. If you are still hardcoding all your @Tool definitions into your LLM context on every single turn, you are burning cash, spiking latency, and blowing past model context limits.
Why Most Developers Get This Wrong
- The Global Registry Anti-Pattern: Blindly registering every Spring Bean annotated with @Tool into the ChatClient configuration, expecting Claude 3.5 Sonnet or GPT-4o to magically sort through 500+ tool definitions without hallucinating.
- Ignoring the Cognitive Tax: As the tool count grows, tool-selection accuracy tends to degrade, partly because of "lost in the middle" effects in long contexts.
- Static Schema Overhead: Forcing the LLM to process thousands of lines of JSON schema metadata on every single API payload, which adds tokens and latency to every request.
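To put a rough number on that overhead, here is a back-of-the-envelope sketch. The 150-tokens-per-tool average is an assumption picked for illustration, not a measurement, and ToolSchemaOverhead is a hypothetical helper; measure your own schemas with your model's tokenizer before drawing conclusions.

```java
// Back-of-the-envelope estimate of tokens spent on tool definitions per request.
// All numbers below are illustrative assumptions, not benchmarks.
class ToolSchemaOverhead {

    /** Tokens consumed by tool schemas on one request, given an average schema size. */
    static long schemaTokensPerRequest(int toolCount, int avgTokensPerTool) {
        return (long) toolCount * avgTokensPerTool;
    }

    public static void main(String[] args) {
        int avgTokensPerTool = 150; // assumed average schema size

        long allTools = schemaTokensPerRequest(500, avgTokensPerTool);
        long prunedTools = schemaTokensPerRequest(5, avgTokensPerTool);

        System.out.println("all 500 tools: " + allTools + " tokens/request");    // 75000
        System.out.println("top 5 tools:   " + prunedTools + " tokens/request"); // 750
    }
}
```

Under that assumption, sending every definition costs 100 times more schema tokens per turn than sending a pruned top 5, before the user has typed a word.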
The Right Way
Treat your tool definitions as semantic documents: index their metadata in a vector database and query them dynamically at runtime based on the user's intent.
- Index tool schemas (names, descriptions, and parameters) into a high-performance vector store (like PgVector or Milvus) using Spring AI's VectorStore during application bootstrap.
- Implement a two-step agentic pipeline: first, run a fast similarity search on the user's raw prompt to retrieve only the top 3-5 most relevant tools.
- Inject only those retrieved tool definitions dynamically into the ChatOptions of your ChatClient call for that specific turn.
- Enforce a strict similarity threshold to prevent injecting irrelevant tools when the user's query is purely conversational.
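The index-then-route idea above can be sketched without any framework. The snippet below is a minimal in-memory stand-in, not Spring AI code: ToolRouterSketch and IndexedTool are hypothetical names, and the hand-written vectors stand in for real embeddings. In a real application the vector store and embedding model would do this work; the sketch only shows how top-K and the similarity threshold interact. It assumes Java 17 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory stand-in for "index tool metadata, then route by similarity".
// Hypothetical types; real embeddings would come from an embedding model.
class ToolRouterSketch {

    /** A tool name plus the embedding of its description. */
    record IndexedTool(String name, double[] embedding) {}

    /** A tool name plus its similarity to the current query. */
    record ScoredTool(String name, double score) {}

    private final List<IndexedTool> index = new ArrayList<>();

    /** Bootstrap step: store one tool's embedded metadata. */
    void add(String name, double[] embedding) {
        index.add(new IndexedTool(name, embedding));
    }

    /** Query step: at most topK tool names whose cosine similarity meets the threshold. */
    List<String> route(double[] queryEmbedding, int topK, double threshold) {
        return index.stream()
            .map(t -> new ScoredTool(t.name(), cosine(queryEmbedding, t.embedding())))
            .filter(s -> s.score() >= threshold)   // drop weak matches first
            .sorted(Comparator.comparingDouble(ScoredTool::score).reversed())
            .limit(topK)
            .map(ScoredTool::name)
            .toList();
    }

    /** Cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        ToolRouterSketch router = new ToolRouterSketch();
        router.add("lookupOrder", new double[] {1, 0, 0});
        router.add("refundPayment", new double[] {0.9, 0.1, 0});
        router.add("getWeather", new double[] {0, 0, 1});

        // A query close to the order tools returns both; the weather tool is filtered out.
        System.out.println(router.route(new double[] {1, 0, 0}, 3, 0.8));
        // A query that resembles nothing in the index returns an empty list.
        System.out.println(router.route(new double[] {0, 1, 0}, 3, 0.8));
    }
}
```

The empty result in the second call is the point of the threshold: a purely conversational prompt should end up with no tools attached rather than the three least-bad matches.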
Show Me The Code
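// Dynamic Tool Routing with Spring AI
// Written against the Spring AI 1.0 builder API. Earlier milestones spelled these
// calls SearchRequest.query(..).withTopK(..) and withFunctionCallbacks(..).
// vectorStore, chatClient, and toolRegistry are injected fields; toolRegistry is an
// application-specific lookup from tool name to callback, not a Spring AI class.
public ChatResponse executeWithDynamicTools(String userPrompt) {
    // Step 1: retrieve only the tools whose indexed metadata matches the prompt.
    List<Document> relevantTools = vectorStore.similaritySearch(
        SearchRequest.builder()
            .query(userPrompt)
            .topK(3)
            .similarityThreshold(0.8)
            .build()
    );

    // Each indexed document is expected to carry its tool name in metadata.
    String[] activeToolNames = relevantTools.stream()
        .map(doc -> doc.getMetadata().get("tool_name").toString())
        .toArray(String[]::new);

    // Step 2: expose only those tools to the model for this turn.
    return chatClient.prompt(userPrompt)
        .options(OpenAiChatOptions.builder()
            .toolCallbacks(toolRegistry.getCallbacks(activeToolNames))
            .build())
        .call()
        .chatResponse();
}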
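The snippet above leans on a toolRegistry that is never shown. One possible shape is sketched below, in plain Java with no Spring dependency. Everything in it is an assumption: ToolRegistry is a hypothetical class, and the type parameter T stands in for whatever callback type your Spring AI version expects. It is one way to do the lookup, not the library's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical name-to-callback registry; T stands in for the real callback type.
class ToolRegistry<T> {

    private final Map<String, T> callbacksByName = new LinkedHashMap<>();

    /** Registers (or replaces) the callback for a tool name. */
    void register(String toolName, T callback) {
        callbacksByName.put(toolName, callback);
    }

    /**
     * Returns callbacks in the order the names were given.
     * Names with no registered callback are skipped, so a stale
     * vector-index entry does not break the request.
     */
    List<T> getCallbacks(String... toolNames) {
        List<T> selected = new ArrayList<>();
        for (String name : toolNames) {
            T callback = callbacksByName.get(name);
            if (callback != null) {
                selected.add(callback);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        ToolRegistry<String> registry = new ToolRegistry<>();
        registry.register("lookupOrder", "cb-order");
        registry.register("getWeather", "cb-weather");

        // "missing" is not registered, so it is dropped from the result.
        System.out.println(registry.getCallbacks("getWeather", "missing", "lookupOrder"));
    }
}
```

Skipping unknown names is a deliberate choice here: the vector index and the registry can drift apart, and a missing tool is usually better handled by calling the model with fewer tools than by failing the whole turn.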
Key Takeaways
- Context Optimization: Tool definitions consume valuable tokens; dynamic pruning keeps your context windows lean and your API bills low.
- Decoupled Architecture: Using Spring AI's VectorStore coupled with a custom callback registry allows teams to scale tools independently without redeploying the core agent.
- Improved Accuracy: Restricting the LLM's choices to a small, relevant subset of tools reduces the chance of hallucinated or off-target tool calls.