One of the biggest hurdles in training local LLMs is data quality. If your training set is 90% AI boilerplate, your fine-tune will be 90% useless.
We just released the synthetic_generator skill for Skillware. It's a modular tool that:
- Orchestrates combinatorial personas to hit edge cases.
- Validates data diversity using a zero-dependency entropy score.
- Plugs directly into your Python scripts to build massive datasets automatically.
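To make the first two points concrete, here is a minimal standard-library sketch of the general idea. It is not Skillware's actual API: the attribute lists and the names `persona_grid` and `token_entropy` are hypothetical, and the real skill's entropy score may be computed differently. It expands a few persona attributes into every combination, then scores a batch of texts with Shannon entropy over whitespace tokens.

```python
import itertools
import math
from collections import Counter

# Hypothetical persona attributes for illustration; not taken from Skillware.
ROLES = ["nurse", "sysadmin", "student"]
TONES = ["terse", "frustrated", "curious"]
GOALS = ["debug an error", "ask for a summary"]


def persona_grid(*axes):
    """Return every combination of the given attribute lists."""
    return list(itertools.product(*axes))


def token_entropy(texts):
    """Shannon entropy, in bits, of the whitespace-token distribution."""
    counts = Counter(tok.lower() for text in texts for tok in text.split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


personas = persona_grid(ROLES, TONES, GOALS)
prompts = [
    f"You are a {tone} {role} who wants to {goal}."
    for role, tone, goal in personas
]
```

Three roles, three tones, and two goals give 18 distinct personas. A batch that repeats the same words scores close to 0 bits, while a more varied vocabulary scores higher, so a low score is a cheap warning sign of boilerplate.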
Run it locally with Ollama or scale with Gemini.
pip install skillware
Read the Skill Card: synthetic_generator.md