When running SmolAgents' CodeAgent (the CodeAct approach) for tool calling, we often observe that smaller open-source models struggle with complex tool-use tasks, and sometimes fail even at simple ones. Careful prompt engineering can mitigate this problem, but it is not a sustainable solution, especially in dynamic agentic systems where any workflow change can disrupt tool-calling accuracy.
To address the issue at its core, the ideal approach is to train or fine-tune models to use tools effectively. However, this is a non-trivial task: it requires setting up complex machine learning pipelines tightly integrated with the agentic system, which is challenging for most developers.
To make this process easier, we’ve developed ToolBrain, a lightweight, MIT-licensed open-source library that removes the need to build these pipelines from scratch. For more information, see https://github.com/ToolBrain/ToolBrain
✨ Key Features
🤖 Learning algorithms: Supports GRPO, DPO, and supervised learning.
🎯 Flexible rewards: Define your own reward functions or use LLM-as-judge (see the sketch after this list).
🔧 Tool management: Scalable retrieval for managing large tool collections.
📊 Knowledge distillation: Distill large teacher models into smaller student models for efficiency.
🚀 Zero-learn: Automatically generate training tasks.
⚡ Efficient training: Supports FP16 finetuning, LoRA, Unsloth, and BitsAndBytes for resource-efficient training.
🧠 Multiple agent frameworks: Supports SmolAgents and LangChain, with more coming soon.
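As a concrete illustration of the flexible-rewards feature, here is a minimal sketch of a custom reward function. It assumes a reward is simply a plain Python callable that scores the agent's final answer against a gold answer and returns a float; the parameter names and exact signature ToolBrain expects are assumptions here, so check the repository for the precise interface.

def reward_contains_gold_answer(agent_output: str, gold_answer: str) -> float:
    """
    Hypothetical custom reward: 1.0 if the gold answer appears in the
    agent's final output, 0.0 otherwise. An LLM-as-judge reward would
    follow the same pattern, except the score would come from a judge
    model rather than a string comparison.
    """
    return 1.0 if gold_answer.strip() in str(agent_output) else 0.0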
Examples
from smolagents import tool, TransformersModel, CodeAgent
from toolbrain import Brain
from toolbrain.rewards import reward_exact_match
# --- 1. Define Tools and Reward Function (User-defined) ---
@tool
def add(a: int, b: int) -> int:
    """
    Add two integers.

    Args:
        a (int): First addend.
        b (int): Second addend.

    Returns:
        int: Sum of a and b.
    """
    return a + b
# --- 2. Prepare Training Data ---
training_dataset = [
    {
        "query": "Use the add tool to calculate 5 + 7",
        "gold_answer": "12"
    }
]
# --- 3. Create Agent ---
model = TransformersModel(
    model_id="Qwen/Qwen2.5-0.5B-Instruct",  # use a bigger model for better results
    max_new_tokens=128
)

agent = CodeAgent(
    model=model,
    tools=[add],
    max_steps=1
)
# --- 4. Create Brain ---
brain = Brain(
    agent,                           # Agent instance
    algorithm="GRPO",                # Algorithm choice: "GRPO", "DPO", or supervised learning
    reward_func=reward_exact_match   # Reward function; any Python callable can serve as a reward
)

# --- 5. Train the Agent with GRPO ---
brain.train(training_dataset, num_iterations=10)
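Because the reward argument is just a Python callable, the built-in reward_exact_match above can be swapped for a custom function such as the reward_contains_gold_answer sketch shown earlier. The snippet below is a minimal sketch under that assumption; it reuses the Brain and train calls from the example, and the exact arguments ToolBrain passes to a reward function may differ from those assumed here.

# Hypothetical variant: plug in the custom reward sketched above.
custom_brain = Brain(
    agent,
    algorithm="GRPO",                         # "DPO" and supervised learning are also supported
    reward_func=reward_contains_gold_answer   # any Python callable can act as the reward
)
custom_brain.train(training_dataset, num_iterations=10)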
Results
The plot below illustrates how ToolBrain improves the tool-use accuracy of the small Qwen/Qwen2.5-0.5B-Instruct model after just 20 GRPO training steps.