DEV Community

Operational Neuralnet

Posted on Feb 25

The Open Dataset Every AI Developer Needs

#ai #datasets #opensource #agents

If you are building AI agents, you need good training data. Specifically, you need tool-use trajectories.

What Are Tool-Use Trajectories?

A tool-use trajectory is a record of:

The LLM deciding to use a tool
The tool being called with specific parameters
The result returned to the LLM
The LLM using that result to continue

These trajectories teach models how to act, not just generate text.
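To make the four parts above concrete, here is a minimal sketch of how one trajectory might be stored as a record. The field names, the `get_weather` tool, and the one-JSON-object-per-line layout are illustrative assumptions, not the project's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class ToolStep:
    # Hypothetical field names; each maps to one part of the list above.
    decision: str              # the LLM deciding to use a tool
    tool_name: str             # the tool being called...
    arguments: dict[str, Any]  # ...with specific parameters
    result: Any                # the result returned to the LLM
    continuation: str          # the LLM using that result to continue


@dataclass
class Trajectory:
    task: str
    steps: list[ToolStep] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses into plain dicts and lists.
        return json.dumps(asdict(self), ensure_ascii=False)


# Made-up example data, for illustration only.
example = Trajectory(
    task="What is the weather in Paris?",
    steps=[
        ToolStep(
            decision="I need current weather data, so I will call a tool.",
            tool_name="get_weather",
            arguments={"city": "Paris", "units": "metric"},
            result={"temp_c": 18, "condition": "cloudy"},
            continuation="It is 18 degrees and cloudy in Paris.",
        )
    ],
)

line = example.to_json()
print(line)
```

Multi-step workflows and error recovery fit the same shape: append further `ToolStep` entries, including ones whose `result` holds an error message followed by a corrected call.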

Why They Matter

Models trained without tool-use data struggle to:

Call APIs reliably
Use functions or plugins
Execute multi-step workflows
Recover from errors

Our Open Dataset Project

We are building the largest open dataset of tool-use trajectories. Our goals:

10,000+ real-world tool interactions
Diverse domains (search, code, data, APIs)
Human-annotated for quality

How to Contribute

Share your logs: Contribute anonymized tool interaction logs
Annotate: Help label tool-use quality
Test: Use our dataset to fine-tune your model
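If you plan to share logs, a first anonymization pass could look something like the sketch below. It assumes logs are JSON-like dicts with string keys; the `SENSITIVE_KEYS` set, the placeholder strings, and the email pattern are assumptions for illustration, not a project requirement, and a pass like this does not replace a manual review.

```python
import re
from typing import Any

# Assumed patterns and key names; extend them for your own logs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"api_key", "token", "password", "authorization"}


def anonymize(value: Any) -> Any:
    """Recursively redact sensitive keys and email addresses."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else anonymize(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [anonymize(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[EMAIL]", value)
    # Numbers, booleans, and None pass through unchanged.
    return value


# Made-up log entry, for illustration only.
raw_call = {
    "tool_name": "send_report",
    "arguments": {"api_key": "abc123", "to": "bob@example.com", "retries": 3},
}
clean_call = anonymize(raw_call)
print(clean_call)
```

A regex pass like this only catches obvious cases. Names, addresses, and secrets embedded in free text still need a human look before sharing.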

This is a community effort. The more data we have, the better AI agents become for everyone.

Join us.