DEV Community

Sahil Kapoor


DeepFabric is a Game Changer: 🚀 Build ⛓️-of-💭 Reasoning Datasets in Minutes Using Natural Prompts 💬

Stop Spending Weeks on Dataset Creation. Start Training Better Models Today.

As developers, we've all been there. You have a brilliant idea for a Chain-of-Thought (CoT) model, but then reality hits: you need training data. Quality training data. A lot of quality training data.

The traditional path? Weeks of manual data curation, complex prompt engineering, or expensive data labeling. Most of us end up abandoning the project or settling for subpar datasets that produce mediocre models.

What if I told you there's a tool that can generate professional-grade CoT datasets in minutes using natural language prompts?

Enter DeepFabric - and it's about to change how you think about dataset creation forever.

The Problem: Dataset Creation is Broken

Before DeepFabric, creating CoT datasets meant:

  • 📝 Manual curation: Spending days writing examples by hand
  • 🔧 Complex prompt engineering: Wrestling with intricate templates
  • 💸 Expensive services: Paying premium rates for quality data
  • 🎯 Limited diversity: Struggling to create varied, non-repetitive examples
  • ⚖️ Quality vs. quantity: Choosing between good data and enough data

Most developers either gave up or shipped models trained on insufficient data.

The Solution: DeepFabric's Triple Threat

DeepFabric doesn't just solve the dataset problem; it obliterates it with three CoT formats that cover the most common use cases:

1. 🔥 Free-text CoT (GSM8K Style)

Perfect for mathematical reasoning and step-by-step problem solving.

deepfabric generate \
  --mode tree \
  --provider openai \
  --model gpt-4o-mini \
  --depth 2 \
  --degree 2 \
  --num-steps 4 \
  --topic-prompt "Mathematical word problems and logical reasoning" \
  --generation-system-prompt "You are a math tutor creating educational problems" \
  --conversation-type cot_freetext \
  --dataset-save-as math_reasoning.jsonl

Output format:

{
  "question": "Sarah has 24 apples. She gives away 1/3 to her neighbors and keeps 1/4 for herself. How many apples are left?",
  "chain_of_thought": "First, I need to find 1/3 of 24 apples. 24 ÷ 3 = 8 apples given to neighbors. Next, I need to find 1/4 of 24 apples. 24 ÷ 4 = 6 apples kept for herself. Total apples used: 8 + 6 = 14 apples. Apples left: 24 - 14 = 10 apples.",
  "final_answer": "10 apples"
}
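To train on records like this one, you will usually reshape them into a chat format first. The sketch below assumes only the three keys shown above (question, chain_of_thought, final_answer) and a generic "messages" list of role/content dicts; the helper names freetext_to_chat and convert_jsonl, and the "Final answer:" template, are hypothetical choices for illustration, not part of DeepFabric.

```python
import json
from typing import Any


def freetext_to_chat(record: dict[str, Any]) -> dict[str, list[dict[str, str]]]:
    """Turn one cot_freetext record into a hypothetical chat-style training example."""
    # Fail loudly if a record is missing one of the keys shown in the example output.
    for key in ("question", "chain_of_thought", "final_answer"):
        if key not in record:
            raise KeyError(f"record is missing {key!r}")
    # Reasoning first, then the answer, so the model learns to think before answering.
    assistant_text = (
        f"{record['chain_of_thought']}\n\nFinal answer: {record['final_answer']}"
    )
    return {
        "messages": [
            {"role": "user", "content": record["question"]},
            {"role": "assistant", "content": assistant_text},
        ]
    }


def convert_jsonl(src: str, dst: str) -> int:
    """Convert every non-blank line of a cot_freetext JSONL file; return the count."""
    count = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # tolerate blank lines in the file
            fout.write(json.dumps(freetext_to_chat(json.loads(line))) + "\n")
            count += 1
    return count
```

For example, convert_jsonl("math_reasoning.jsonl", "math_reasoning_chat.jsonl") would write a chat-formatted copy of the file produced by the command above. Keeping the reasoning inside the assistant turn is one common choice; if your trainer supports it, you could keep the trace in a separate field instead.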

2. 🏗️ Structured CoT (Conversation-Based)

Ideal for educational dialogues and systematic problem-solving.

deepfabric generate \
  --mode graph \
  --provider ollama \
  --model qwen3:32b \
  --topic-prompt "Computer science algorithms and data structures" \
  --conversation-type cot_structured \
  --reasoning-style logical \
  --dataset-save-as cs_reasoning.jsonl

Output format:

{
  "messages": [
    {"role": "user", "content": "How would you implement a binary search algorithm?"},
    {"role": "assistant", "content": "I'll walk you through implementing binary search step by step..."}
  ],
  "reasoning_trace": [
    {"step": 1, "reasoning": "Define the search space with left and right pointers"},
    {"step": 2, "reasoning": "Calculate middle index to divide the array"},
    {"step": 3, "reasoning": "Compare target with middle element"}
  ],
  "final_answer": "Here's the complete binary search implementation..."
}
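Because reasoning_trace is a list of numbered steps, it is easy to sanity-check before training. A minimal sketch, assuming the field names shown above (step, reasoning): it flags traces whose steps are not numbered 1, 2, 3, ... or that have empty reasoning, and renders a trace as plain text. The helpers check_trace and render_trace are hypothetical, not DeepFabric functions.

```python
from typing import Any


def check_trace(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a cot_structured record's reasoning_trace."""
    problems: list[str] = []
    trace = record.get("reasoning_trace", [])
    if not trace:
        problems.append("reasoning_trace is empty or missing")
    for expected, item in enumerate(trace, start=1):
        # Steps should be numbered consecutively, starting at 1.
        if item.get("step") != expected:
            problems.append(f"expected step {expected}, got {item.get('step')!r}")
        # Every step should carry some non-whitespace reasoning text.
        if not str(item.get("reasoning", "")).strip():
            problems.append(f"step {expected} has no reasoning text")
    return problems


def render_trace(record: dict[str, Any]) -> str:
    """Render the trace as a numbered list, one step per line."""
    return "\n".join(
        f"{item['step']}. {item['reasoning']}" for item in record["reasoning_trace"]
    )
```

Filtering out records where check_trace returns a non-empty list is a cheap way to drop malformed generations before they reach your training set.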

3. 🚀 Hybrid CoT (Best of Both Worlds)

Combines natural-language reasoning with structured steps, which makes it a good fit for complex domains.

deepfabric generate \
  --provider gemini \
  --model gemini-2.5-flash \
  --topic-prompt "Scientific reasoning and physics problems" \
  --conversation-type cot_hybrid \
  --num-steps 8 \
  --dataset-save-as science_hybrid.jsonl

Output format:

{
  "question": "A ball is thrown upward with initial velocity 20 m/s. When will it hit the ground?",
  "chain_of_thought": "This is a projectile motion problem. I need to use kinematic equations...",
  "reasoning_trace": [
    {"concept": "Initial conditions", "value": "v₀ = 20 m/s, y₀ = 0"},
    {"concept": "Kinematic equation", "value": "y = v₀t - ½gt²"},
    {"concept": "Ground impact", "value": "y = 0, solve for t"}
  ],
  "final_answer": "The ball hits the ground after 4.08 seconds"
}
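The hybrid example's final answer is easy to check by hand. Setting y = v₀t - ½gt² to zero gives t(v₀ - ½gt) = 0, so the non-zero root is t = 2v₀/g. The sketch below assumes g = 9.81 m/s² (the example does not state its value of g), launch and landing at the same height, and no air resistance; with g = 9.8 m/s² the result still rounds to 4.08 s.

```python
def time_to_ground(v0: float, g: float = 9.81) -> float:
    """Flight time for a ball launched straight up from y = 0 until it returns to y = 0.

    Solves y(t) = v0*t - 0.5*g*t**2 = 0 for the non-zero root t = 2*v0/g.
    Air resistance is ignored.
    """
    if v0 <= 0 or g <= 0:
        raise ValueError("v0 and g must be positive")
    return 2 * v0 / g


print(round(time_to_ground(20.0), 2))  # 4.08
```

A check like this is worth running on a sample of generated physics or math records, since a model can produce a plausible-looking trace with a wrong final number.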

Why Developers Are Going Crazy for DeepFabric

⚑ Speed That Will Blow Your Mind

# 100 generation steps at a batch size of 10; wall-clock time depends on provider, model, and rate limits
deepfabric generate config.yaml --num-steps 100 --batch-size 10

🧠 Smart Topic Generation

DeepFabric doesn't just generate random examples. It first builds a hierarchical topic tree, which spreads your dataset across distinct subtopics and reduces redundancy:

Mathematical Reasoning
├── Algebra Problems
│   ├── Linear Equations
│   └── Quadratic Functions
└── Geometry Problems
    ├── Area Calculations
    └── Volume Problems
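One way to picture how a tree like this spreads coverage is to list its root-to-leaf paths, each of which names one specific subtopic. The sketch below hard-codes the example tree as nested dictionaries and does not call DeepFabric; treating each path as a separate generation seed is an assumption for illustration, not a description of DeepFabric's internals. For a full tree in which every node has `degree` children, the number of leaves is degree^depth: depth 2 and degree 2 (the values in the first CLI example) give 4 leaves, and depth 3 and degree 3 give 27.

```python
# The example tree above as nested dicts; a leaf maps to an empty dict.
TREE = {
    "Mathematical Reasoning": {
        "Algebra Problems": {"Linear Equations": {}, "Quadratic Functions": {}},
        "Geometry Problems": {"Area Calculations": {}, "Volume Problems": {}},
    }
}


def leaf_paths(tree: dict, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every root-to-leaf path in depth-first order."""
    paths: list[tuple[str, ...]] = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            paths.extend(leaf_paths(children, path))  # descend into subtopics
        else:
            paths.append(path)  # no children: this node is a leaf
    return paths


def full_tree_leaves(degree: int, depth: int) -> int:
    """Leaf count of a full tree where every node has `degree` children."""
    return degree**depth


for path in leaf_paths(TREE):
    print(" > ".join(path))
```

The exponential growth is the practical takeaway: raising depth or degree by one multiplies the number of leaf subtopics, and with it the number of model calls needed to build the tree.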

🔧 YAML Configuration = Zero Complexity

No more complex prompt engineering. Just describe what you want:

# cot_config.yaml
dataset_system_prompt: "You are a helpful AI that solves problems step-by-step"

topic_tree:
  topic_prompt: "Programming challenges and algorithms"
  provider: "ollama"
  model: "qwen3:32b"
  depth: 3
  degree: 3

data_engine:
  conversation_type: "cot_hybrid"
  reasoning_style: "logical"
  instructions: "Create coding problems that require systematic thinking"

dataset:
  creation:
    num_steps: 50
    batch_size: 5

Then run: deepfabric generate cot_config.yaml

🌐 Multi-Provider Freedom

Switch between providers based on your needs:

  • OpenAI GPT-4 for complex reasoning
  • Ollama for local, private generation
  • Gemini for fast bulk creation
  • Anthropic Claude for nuanced problems

📤 Instant Hugging Face Integration

deepfabric generate config.yaml --hf-repo username/my-cot-dataset

Your dataset is uploaded automatically, along with a generated dataset card. You'll need a Hugging Face access token with write permission; beyond that, no manual uploads, no fuss.

Real-World Impact: What Developers Are Building

🎓 Educational AI: Teachers creating personalized math tutoring datasets
🤖 Agent Training: Developers building reasoning agents for complex tasks
📊 Research: ML researchers generating evaluation benchmarks
💼 Enterprise: Companies creating domain-specific reasoning models

The Numbers Don't Lie

  • ⏱️ 95% faster than manual dataset creation
  • 📈 10x more diverse examples per domain
  • 💰 80% cost reduction compared to data labeling services
  • 🎯 Little to no prompt engineering required

Ready to Transform Your ML Pipeline?

Getting started takes literally 30 seconds:

# Install
pip install deepfabric

# Make your OpenAI API key available (needed for --provider openai)
export OPENAI_API_KEY="your-api-key"

# Generate your first CoT dataset
deepfabric generate \
  --topic-prompt "Your domain here" \
  --conversation-type cot_freetext \
  --num-steps 10 \
  --provider openai \
  --model gpt-4o-mini

# Watch the magic happen ✨
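Once the run finishes, a quick way to confirm you got the format you asked for is to inspect the keys of each JSONL record. The sketch below infers the format purely from the key sets shown in the three output examples earlier in this post; detect_format and summarize_jsonl are hypothetical helpers, and real DeepFabric output may carry extra fields (which this check tolerates, since it only tests for required keys).

```python
import json
from collections import Counter
from typing import Any


def detect_format(record: dict[str, Any]) -> str:
    """Guess the CoT format from a record's keys, based on the example outputs."""
    keys = set(record)
    # Check hybrid first: its keys are a superset of the free-text keys.
    if {"question", "chain_of_thought", "reasoning_trace", "final_answer"} <= keys:
        return "cot_hybrid"
    if {"messages", "reasoning_trace", "final_answer"} <= keys:
        return "cot_structured"
    if {"question", "chain_of_thought", "final_answer"} <= keys:
        return "cot_freetext"
    return "unknown"


def summarize_jsonl(path: str) -> Counter:
    """Count how many records in a JSONL file match each format."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                counts[detect_format(json.loads(line))] += 1
    return counts
```

Point summarize_jsonl at whichever file your run wrote; for the command above, every record should come back as cot_freetext, and any "unknown" entries are worth a closer look.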

What's Next?

The ML community is moving fast, and quality training data is often the bottleneck. DeepFabric removes much of that bottleneck.

Whether you're building the next breakthrough in reasoning AI or just need better training data for your side project, DeepFabric gives you superpowers.

Stop spending weeks on dataset creation. Start building better models today.


Try DeepFabric Now:


What kind of CoT dataset will you build first? Drop a comment and let's discuss! 🚀


Tags: #MachineLearning #AI #Datasets #ChainOfThought #Python #OpenSource #MLOps #DataScience #DeepLearning #ArtificialIntelligence

Top comments (2)

Luke Hinds

Thanks Sahil, appreciate you covering DeepFabric! Glad you're enjoying it!
