DEV Community

Machine coding Master

Stop Stuffing Context Windows: Dynamic Tool Pruning with Spring AI Vector Routing

In 2026, building enterprise-grade Java agents means managing thousands of potential database, API, and legacy system tools. If you are still hardcoding all your @Tool definitions into your LLM context on every single turn, you are burning cash, spiking latency, and blowing past model context limits.

Why Most Developers Get This Wrong

  • The Global Registry Anti-Pattern: Blindly registering every Spring Bean annotated with @Tool into the ChatClient configuration, expecting Claude 3.5 Sonnet or GPT-4o to magically sort through 500+ tool definitions without hallucinating.
  • Ignoring the Cognitive Tax: Tool-selection accuracy tends to degrade as the tool count grows, partly because models attend less reliably to material buried in the middle of a long context (the "lost in the middle" effect).
  • Static Schema Overhead: Sending thousands of lines of JSON schema metadata with every request adds input tokens, latency, and cost to each call, whether or not the turn needs a tool.
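To put a rough number on the schema overhead, here is a back-of-the-envelope sketch. The 150-tokens-per-tool figure is an assumption chosen for illustration, not a measurement; real schemas vary widely, and SchemaOverhead is a made-up class name.

```java
// Back-of-the-envelope estimate of per-turn schema overhead.
// All numbers are illustrative assumptions, not benchmarks.
class SchemaOverhead {

    // Tokens spent on tool definitions in a single request.
    static long schemaTokensPerTurn(int toolCount, int avgTokensPerTool) {
        return (long) toolCount * avgTokensPerTool;
    }

    public static void main(String[] args) {
        int avgTokensPerTool = 150; // assumed average size of one JSON tool schema
        long allTools = schemaTokensPerTurn(500, avgTokensPerTool);
        long pruned = schemaTokensPerTurn(5, avgTokensPerTool);
        System.out.println("all tools: " + allTools + " tokens per turn"); // 75000
        System.out.println("top 5:     " + pruned + " tokens per turn");   // 750
        System.out.println("reduction: " + (allTools / pruned) + "x");     // 100x
    }
}
```

Under these assumptions the full registry costs 75,000 input tokens on every turn before the user has said anything, against 750 for a pruned set of five.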

The Right Way

Treat your tool definitions as semantic documents: index their metadata in a vector database and query them dynamically at runtime based on the user's intent.

  • Index tool schemas (names, descriptions, and parameters) into a high-performance vector store (like PgVector or Milvus) using Spring AI's VectorStore during application bootstrap.
  • Implement a two-step agentic pipeline. Step one: run a fast similarity search on the user's raw prompt to retrieve only the top 3-5 most relevant tools.
  • Step two: inject only those retrieved tool definitions into the ChatOptions of your ChatClient call for that specific turn.
  • Enforce a strict similarity threshold to prevent injecting irrelevant tools when the user's query is purely conversational.
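The retrieval step and the threshold guard can be sketched without any framework. This is a minimal in-memory stand-in for a vector store, not Spring AI code: ToolRouter, ToolDoc, and the hand-written three-dimensional vectors are all hypothetical, and a real system would get its embeddings from an embedding model.

```java
import java.util.Comparator;
import java.util.List;

// In-memory stand-in for a vector store, used only to illustrate top-k retrieval
// with a similarity cutoff. ToolRouter, ToolDoc, and the vectors are hypothetical.
class ToolRouter {

    // One indexed tool: its name plus the embedding of its description.
    record ToolDoc(String name, double[] embedding) {}

    // A tool name paired with its similarity to the current query.
    private record Scored(String name, double score) {}

    private final List<ToolDoc> index;

    ToolRouter(List<ToolDoc> index) {
        this.index = List.copyOf(index);
    }

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Names of at most topK tools whose similarity meets the threshold, best first.
    // An empty list means "send no tools this turn".
    List<String> route(double[] queryEmbedding, int topK, double threshold) {
        return index.stream()
            .map(doc -> new Scored(doc.name(), cosine(queryEmbedding, doc.embedding())))
            .filter(s -> s.score() >= threshold)
            .sorted(Comparator.comparingDouble(Scored::score).reversed())
            .limit(topK)
            .map(Scored::name)
            .toList();
    }

    public static void main(String[] args) {
        ToolRouter router = new ToolRouter(List.of(
            new ToolDoc("refund_order", new double[] {1, 0, 0}),
            new ToolDoc("lookup_invoice", new double[] {0.9, 0.1, 0}),
            new ToolDoc("reset_password", new double[] {0, 1, 0})));

        // A billing-style query matches two tools.
        System.out.println(router.route(new double[] {1, 0, 0}, 3, 0.8)); // [refund_order, lookup_invoice]
        // A query unrelated to every tool clears no threshold, so nothing is injected.
        System.out.println(router.route(new double[] {0, 0, 1}, 3, 0.8)); // []
    }
}
```

The second call is the point of the threshold: without the filter, a plain top-k search would still return three tools for a purely conversational prompt.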

Heads up: if you want to see these patterns applied to real interview problems, javalld.com has full machine coding solutions with traces.

Show Me The Code (or Example)

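import java.util.List;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.document.Document;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.ai.vectorstore.SearchRequest;

// Dynamic tool routing with Spring AI (builder-style API, Spring AI 1.0).
// Assumes injected vectorStore and chatClient fields, plus toolRegistry, a custom
// (non-Spring) registry whose getCallbacks(...) is assumed to return List<ToolCallback>.
public ChatResponse executeWithDynamicTools(String userPrompt) {
    // Retrieve at most 3 tool documents scoring 0.8 or higher against the prompt.
    List<Document> relevantTools = vectorStore.similaritySearch(
        SearchRequest.builder()
            .query(userPrompt)
            .topK(3)
            .similarityThreshold(0.8)
            .build()
    );

    // Each document was indexed with its tool's name under the "tool_name" key.
    String[] activeToolNames = relevantTools.stream()
        .map(doc -> doc.getMetadata().get("tool_name").toString())
        .toArray(String[]::new);

    // Expose only the retrieved tools to the model for this turn.
    return chatClient.prompt(userPrompt)
        .options(OpenAiChatOptions.builder()
            .toolCallbacks(toolRegistry.getCallbacks(activeToolNames))
            .build())
        .call()
        .chatResponse();
}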
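Note that toolRegistry is not a Spring AI class; the method above relies on it to turn tool names back into callbacks. One possible shape for it is sketched below. Everything here is hypothetical: the class name, the generic parameter T (standing in for whichever callback type your Spring AI version expects), and the choice to silently skip unknown names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry behind toolRegistry.getCallbacks(...). T stands for
// whatever callback type the Spring AI version in use expects.
// Not thread-safe: this sketch assumes registration finishes at startup.
class ToolRegistry<T> {

    private final Map<String, T> callbacksByName = new LinkedHashMap<>();

    // Register a callback under the same name stored in the "tool_name" metadata.
    void register(String toolName, T callback) {
        callbacksByName.put(toolName, callback);
    }

    // Resolve names to callbacks, preserving order and skipping unknown names
    // (for example, a stale vector entry for a tool that has been removed).
    List<T> getCallbacks(String... toolNames) {
        List<T> resolved = new ArrayList<>();
        for (String name : toolNames) {
            T callback = callbacksByName.get(name);
            if (callback != null) {
                resolved.add(callback);
            }
        }
        return resolved;
    }
}
```

Skipping unknown names keeps a stale index entry from failing the whole turn; a stricter variant could log or throw instead, which would surface drift between the vector index and the registry sooner.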

Key Takeaways

  • Context Optimization: Tool definitions consume valuable tokens; dynamic pruning keeps your context windows lean and your API bills low.
  • Decoupled Architecture: Pairing Spring AI's VectorStore with a custom tool callback registry lets teams add or retire tools by updating the index and the registry, without touching the agent's prompt logic.
  • Improved Accuracy: Restricting the LLM's choices to a small, relevant subset of tools reduces the chance that it calls an irrelevant or nonexistent tool. It does not eliminate that risk, so still validate tool calls before executing them.