The release of NodeLLM 1.16 marks a significant milestone in our journey to provide production-grade infrastructure for AI applications. While earlier releases focused on basic integration and safety, version 1.16 focuses on surgical control and multimodal parity.
As agentic workflows become more complex, the ability to guide model behavior with precision—and handle failures gracefully—becomes the difference between a toy and a tool.
🎨 Advanced Image Manipulation
NodeLLM 1.16 introduces high-fidelity image editing and manipulation support. This moves beyond simple text-to-image generation into the realm of In-painting, Masking, and Variations.
Surgical Image Edits
You can now pass source images and masks to the paint() method. For OpenAI providers, this automatically routes requests to the /v1/images/edits endpoint using the gpt-image-1 model, which is specialized for manipulation tasks.
```javascript
const llm = createLLM({ provider: "openai" });

// Modify an existing logo using a mask
const response = await llm.paint("Add a futuristic robot head to the logo", {
  model: "gpt-image-1",
  images: ["logo.png"],
  mask: "logo-mask.png",
  size: "1024x1024"
});

await response.save("edited-logo.png");
```
Image Variations & Asset Support
Generate visual variations of a source image without a prompt, or pass base64/URL assets seamlessly. The underlying BinaryUtils handles the conversion to provider-standard multipart formats, so you don't have to worry about binary boundaries or mime-types.
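BinaryUtils itself is internal, but the kind of normalization it performs can be sketched. The hypothetical helper below (not part of the public API) classifies an asset string as a data URL, remote URL, or local file, and picks the mime-type it would attach to the multipart request:

```javascript
// Hypothetical sketch of asset normalization; NodeLLM's real BinaryUtils
// is internal and may differ. Classifies an image asset string and
// extracts the mime-type where possible.
function classifyAsset(asset) {
  // Inline data URL: "data:image/png;base64,...."
  const dataUrl = asset.match(/^data:([\w/+.-]+);base64,(.*)$/);
  if (dataUrl) {
    return { kind: "base64", mimeType: dataUrl[1], data: dataUrl[2] };
  }
  // Remote URL: passed through, or downloaded before upload
  if (/^https?:\/\//.test(asset)) {
    return { kind: "url", mimeType: null, data: asset };
  }
  // Fallback: treat as a local file path; infer mime-type from the extension
  const ext = asset.split(".").pop().toLowerCase();
  const mimes = { png: "image/png", jpg: "image/jpeg", jpeg: "image/jpeg", webp: "image/webp" };
  return { kind: "file", mimeType: mimes[ext] ?? "application/octet-stream", data: asset };
}
```

Whatever form you pass — `"logo.png"`, an `https://` URL, or a `data:` URI — the library resolves it to provider-standard multipart bytes before the request leaves your process.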
🛠️ Precision Tool Orchestration
One of the most useful features for agentic workflows is the ability to force (or prevent) tool usage at specific turns. NodeLLM 1.16 introduces the choice and calls directives.
Tool Choice
You can now mandate tool usage or force a specific tool, similar to OpenAI's tool_choice but normalized across all major providers (Anthropic, Gemini, Bedrock, and Mistral).
- required: The model must call at least one tool.
- "get_weather": The model must call the specific tool named get_weather.
- none: Tools are disabled for this turn, even if defined.
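To picture what "normalized across providers" means, here is a hypothetical sketch (not NodeLLM's actual adapter code) of how a single choice value could map to the native payload shapes that OpenAI and Anthropic document for their APIs:

```javascript
// Hypothetical sketch: mapping a normalized `choice` directive to
// provider-native payloads. NodeLLM's real mapping lives inside each
// provider adapter and may differ in detail.
function mapToolChoice(choice, provider) {
  if (provider === "openai") {
    // OpenAI accepts the strings "required"/"none", or a function object
    if (choice === "required" || choice === "none") return { tool_choice: choice };
    return { tool_choice: { type: "function", function: { name: choice } } };
  }
  if (provider === "anthropic") {
    // Anthropic uses typed objects: "any" forces some tool, "tool" names one
    if (choice === "required") return { tool_choice: { type: "any" } };
    if (choice === "none") return { tool_choice: { type: "none" } };
    return { tool_choice: { type: "tool", name: choice } };
  }
  throw new Error(`Unsupported provider: ${provider}`);
}
```

Your application code only ever sees the normalized form; the translation to each provider's dialect happens at the adapter boundary.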
Sequential Execution (calls: 'one')
Modern models often attempt to perform multiple tool calls in parallel. While efficient, this can lead to "parallel hallucinations" where later calls depend on the output of earlier ones. Use calls: 'one' to force the model to proceed sequentially, turn-by-turn.
```javascript
const chat = llm.chat("gpt-4o");

// Force a specific tool and disable parallel calls for reliability
const response = await chat.ask("What is the temperature in London?", {
  choice: "get_weather",
  calls: "one"
});
```
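Conceptually, sequential execution amounts to a turn loop that runs at most one tool call before handing control back to the model. The sketch below is a hypothetical driver with a stubbed model interface, not NodeLLM's internal loop:

```javascript
// Hypothetical sketch of a `calls: "one"` turn loop.
// `model(history)` resolves to either { toolCalls: [...] } or { text: "..." }.
async function runSequential(model, tools, prompt) {
  const history = [{ role: "user", content: prompt }];
  for (;;) {
    const reply = await model(history);
    if (!reply.toolCalls || reply.toolCalls.length === 0) return reply.text;
    // Execute only the FIRST proposed call; any extras are dropped, so the
    // model must re-propose them on the next turn with fresh context.
    const call = reply.toolCalls[0];
    const result = await tools[call.name](call.args);
    history.push({ role: "tool", name: call.name, content: result });
  }
}
```

Because each tool result lands in the history before the next proposal is made, later calls can no longer "hallucinate" the outputs of earlier ones.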
🛡️ AI Self-Correction for Tool Failures
Building on the Self-Correction middleware introduced in v1.15, version 1.16 hardens the tool execution pipeline.
If a model attempts to call a non-existent tool, NodeLLM now catches the error and returns a descriptive "unavailable tool" response along with the list of valid tools. This allows the model to instantly self-correct its proposal without throwing an application-level exception. Similarly, arguments failing Zod validation are fed back to the model as "Invalid Arguments" results, enabling agents to fix their own mistakes.
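The feedback loop can be sketched as follows. This is a hypothetical simplification: a plain predicate stands in for the Zod schema NodeLLM actually validates against, and the exact wording of the error results may differ:

```javascript
// Hypothetical sketch of the self-correction behaviour. A simple
// `validate` predicate stands in for a real Zod schema.
function executeToolCall(call, tools) {
  const tool = tools[call.name];
  if (!tool) {
    // Unknown tool: return a descriptive result instead of throwing,
    // listing the valid tools so the model can self-correct.
    return {
      error: `Unavailable tool "${call.name}". Valid tools: ${Object.keys(tools).join(", ")}`
    };
  }
  if (!tool.validate(call.args)) {
    // Validation failure: also fed back as a result, not an exception
    return { error: `Invalid Arguments for "${call.name}".` };
  }
  return { result: tool.run(call.args) };
}
```

Either way, the error becomes just another tool result in the conversation, and the agent's next turn can repair the call without your application ever seeing an exception.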
🎙️ Advanced Transcription & Diarization
Our audio support has also received a major upgrade. The Transcription interface now supports Word-level Timestamps and enhanced Diarization (speaker tracking).
- Fine-grained Timestamps: Use timestamp_granularities in OpenAI/Mistral to get precise sub-second timing for every word.
- ORM Parity: The Transcription class now includes .meta and .raw getters, ensuring the persistence layer captures the full provider response.
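To illustrate what word-level timestamps plus diarization enable together, the sketch below folds a timestamped word list into per-speaker segments. The data shape is hypothetical, not NodeLLM's exact response schema:

```javascript
// Hypothetical sketch: group word-level timestamps into speaker segments.
// Each word is assumed to look like { word, start, end, speaker }.
function toSpeakerSegments(words) {
  const segments = [];
  for (const w of words) {
    const last = segments[segments.length - 1];
    if (last && last.speaker === w.speaker) {
      last.text += ` ${w.word}`;
      last.end = w.end; // extend the running segment
    } else {
      segments.push({ speaker: w.speaker, text: w.word, start: w.start, end: w.end });
    }
  }
  return segments;
}
```

With sub-second timing on every word, the segment boundaries fall exactly where the speakers change rather than at coarse sentence edges.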
Getting Started
NodeLLM 1.16.0 is a "Big Release" that brings your AI infrastructure closer to the standard expected of modern production applications.
```shell
npm install @node-llm/core@1.16.0
```
For the complete list of architectural refinements and bug fixes, please see our Commit History and CHANGELOG.