<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bernabé Puente Moure</title>
    <description>The latest articles on DEV Community by Bernabé Puente Moure (@robopilingui).</description>
    <link>https://dev.to/robopilingui</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F179368%2F95b4b11b-586e-43cc-9d42-3f6a36b5038d.jpg</url>
      <title>DEV Community: Bernabé Puente Moure</title>
      <link>https://dev.to/robopilingui</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/robopilingui"/>
    <language>en</language>
    <item>
      <title>How I Built an Autonomous Dataset Generator with CrewAI + Ollama (72-hour run, 1,065 entries)</title>
      <dc:creator>Bernabé Puente Moure</dc:creator>
      <pubDate>Tue, 14 Apr 2026 16:39:36 +0000</pubDate>
      <link>https://dev.to/robopilingui/how-i-built-an-autonomous-dataset-generator-with-crewai-ollama-72-hour-run-1065-entries-2280</link>
      <guid>https://dev.to/robopilingui/how-i-built-an-autonomous-dataset-generator-with-crewai-ollama-72-hour-run-1065-entries-2280</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;I needed high-quality instruction datasets for fine-tuning local LLMs, but commercial options were prohibitively expensive ($500-2,000 for a decent-quality dataset).&lt;/p&gt;

&lt;p&gt;So I asked myself: &lt;em&gt;What if I could build a system that generates these datasets autonomously while I sleep?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 1,065 professional instruction/response pairs in 72 hours, 100% local, zero API costs.&lt;/p&gt;

&lt;p&gt;Here's exactly how I did it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with Existing Datasets
&lt;/h2&gt;

&lt;p&gt;When you want to fine-tune a local LLM for specific tasks, you face three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use generic datasets&lt;/strong&gt; → Don't match your domain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual creation&lt;/strong&gt; → Exhausting, slow, doesn't scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buy commercial datasets&lt;/strong&gt; → Expensive ($500-2,000), limited customization&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I wanted code instruction pairs with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python/AI/ML/DevOps/Database topics&lt;/li&gt;
&lt;li&gt;Chain-of-Thought reasoning (not just answers)&lt;/li&gt;
&lt;li&gt;Real-world complexity (no template responses)&lt;/li&gt;
&lt;li&gt;Continuous generation (24/7 if possible)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of the existing options delivered all four.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: Multi-Agent Autonomous System
&lt;/h2&gt;

&lt;p&gt;I built a three-agent system inspired by academic research workflows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────┐
│   Curator        │ ← Selects topics from knowledge base
│   (Qwen 7B)      │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   Producer       │ ← Generates instruction + response + CoT
│   (Qwen 7B)      │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   Critic         │ ← Validates logic, checks hallucinations
│   (DeepSeek 1.5B)│
└────────┬─────────┘
         │
         ▼
    [Accept/Reject]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Agent Roles
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Curator Agent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads ChromaDB vector store of potential topics&lt;/li&gt;
&lt;li&gt;Selects next topic based on diversity + priority (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Prevents duplicates via similarity search&lt;/li&gt;
&lt;li&gt;Think of it as a research librarian&lt;/li&gt;
&lt;/ul&gt;
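
&lt;p&gt;Here's a minimal sketch of that selection step. The &lt;code&gt;topics&lt;/code&gt; collection, the &lt;code&gt;priority&lt;/code&gt; metadata field, and the &lt;code&gt;recently_used&lt;/code&gt; list are illustrative names, not the production code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import random

def pick_next_topic(topics, recently_used, n_candidates=10):
    """Illustrative Curator step: favor diversity, then priority.

    `topics` is a ChromaDB collection whose metadata carries a
    numeric `priority`; `recently_used` is a list of recent picks.
    """
    # Peek at a batch of stored topics (returns ids/documents/metadatas)
    batch = topics.peek(limit=n_candidates)
    candidates = [
        (doc, meta.get("priority", 0))
        for doc, meta in zip(batch["documents"], batch["metadatas"])
        if doc not in recently_used  # crude diversity filter
    ]
    if not candidates:
        return random.choice(batch["documents"])
    # Highest-priority topic among the diverse candidates
    return max(candidates, key=lambda c: c[1])[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;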

&lt;p&gt;&lt;strong&gt;Producer Agent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates the actual instruction/response pair&lt;/li&gt;
&lt;li&gt;Includes Chain-of-Thought reasoning&lt;/li&gt;
&lt;li&gt;Uses domain knowledge to create realistic scenarios&lt;/li&gt;
&lt;li&gt;Think of it as the expert writer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Critic Agent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reviews Producer's output for:

&lt;ul&gt;
&lt;li&gt;Hallucinations&lt;/li&gt;
&lt;li&gt;Logical errors&lt;/li&gt;
&lt;li&gt;Incomplete reasoning&lt;/li&gt;
&lt;li&gt;Generic template responses&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Binary decision: Accept or Reject (gate sketched below)&lt;/li&gt;

&lt;li&gt;Think of it as the peer reviewer&lt;/li&gt;

&lt;/ul&gt;
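
&lt;p&gt;The Critic's review comes back as free text, so it has to be reduced to that binary decision somehow. A minimal sketch of one way to gate on it; the ACCEPT/REJECT keyword convention here is illustrative, not the exact production prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def critic_verdict(review_text: str) -&amp;gt; bool:
    """Reduce the Critic's free-text review to accept/reject.

    Illustrative convention: the Critic is prompted to end its
    review with the literal word ACCEPT or REJECT.
    """
    return review_text.strip().upper().endswith("ACCEPT")

print(critic_verdict("Reasoning is sound, no hallucinations. ACCEPT"))  # True
print(critic_verdict("Generic template answer. REJECT"))                # False
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;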




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. Ollama (Local LLM Engine)
# Custom Modelfiles with 8k context:
&lt;/span&gt;&lt;span class="n"&gt;qwen2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;coder&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;    &lt;span class="c1"&gt;# 4.7GB, main generator
&lt;/span&gt;&lt;span class="n"&gt;deepseek&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;     &lt;span class="c1"&gt;# 1.1GB, validator
&lt;/span&gt;
&lt;span class="c1"&gt;# 2. CrewAI (Agent Orchestration)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;

&lt;span class="c1"&gt;# 3. ChromaDB (Deduplication + Memory)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Flask (Real-time Dashboard)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;render_template&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Hardware
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AMD Ryzen AI 9 HX 370&lt;/strong&gt; (12 cores, 32GB RAM shared as VRAM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mini PC&lt;/strong&gt; (Geekom A9 Max, ~$899)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSD NVMe&lt;/strong&gt; (for fast model loading)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total investment: &lt;strong&gt;$899 + ~$3.60 electricity for 72 hours&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementation Details
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Challenge 1: Memory Leaks in Long Runs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Agent instances accumulate state. After ~100 cycles, performance degrades → crashes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Recreate all agents every cycle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cycle&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Recreate agents from scratch (prevents memory leaks)
&lt;/span&gt;    &lt;span class="n"&gt;curator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Topic Curator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Select next topic to generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ollama_qwen&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;producer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content Producer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate high-quality instruction pair&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ollama_qwen&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;critic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Quality Critic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Validate logic and catch hallucinations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ollama_deepseek&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Define sequential workflow
&lt;/span&gt;    &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Select topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;curator&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Validate quality&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;critic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;curator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;critic&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Explicit cleanup
&lt;/span&gt;    &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;curator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;producer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;critic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Zero crashes in 72 hours. Stable RAM at 24.2 GB.&lt;/p&gt;
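
&lt;p&gt;One optional hardening step: &lt;code&gt;del&lt;/code&gt; only drops the references, so if agent objects hold reference cycles, an explicit collection pass at the end of each cycle reclaims them immediately instead of waiting for Python's automatic collector. The run above didn't need it, but for even longer runs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import gc

# At the end of each cycle, after `del crew, curator, producer, critic`:
gc.collect()  # immediately reclaim objects stuck in reference cycles
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;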




&lt;h3&gt;
  
  
  Challenge 2: CrewAI + Ollama Compatibility
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; CrewAI v1.12+ requires string-only responses, but Ollama returns objects by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Embed configuration in Modelfiles (permanent fix).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Modelfile for qwen2.5-coder-8k:7b&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; qwen2.5-coder:7b&lt;/span&gt;

PARAMETER num_ctx 8192
PARAMETER temperature 0.7
PARAMETER top_p 0.9

SYSTEM """You are an expert programmer..."""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama create qwen2.5-coder-8k:7b &lt;span class="nt"&gt;-f&lt;/span&gt; Modelfile_qwen
ollama create deepseek-r1-8k:1.5b &lt;span class="nt"&gt;-f&lt;/span&gt; Modelfile_deepseek
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bakes &lt;code&gt;num_ctx 8192&lt;/code&gt; into the model itself, so no runtime configuration is needed.&lt;/p&gt;
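
&lt;p&gt;For completeness: the &lt;code&gt;ollama_qwen&lt;/code&gt; and &lt;code&gt;ollama_deepseek&lt;/code&gt; handles used by the agents earlier point at these custom models. A minimal sketch with CrewAI's &lt;code&gt;LLM&lt;/code&gt; wrapper, assuming Ollama is serving on its default local port:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from crewai import LLM

# Wire the Modelfile-built models into CrewAI.
# Assumes the default local Ollama endpoint.
ollama_qwen = LLM(
    model="ollama/qwen2.5-coder-8k:7b",
    base_url="http://localhost:11434"
)
ollama_deepseek = LLM(
    model="ollama/deepseek-r1-8k:1.5b",
    base_url="http://localhost:11434"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;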




&lt;h3&gt;
  
  
  Challenge 3: Duplicate Questions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Random topic generation creates duplicates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; ChromaDB similarity search + rejection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generated_questions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_duplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if question already exists via embedding similarity&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query_texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;new_question&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ids&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

    &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;distances&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;

&lt;span class="c1"&gt;# Before saving
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;is_duplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;save_to_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;unique_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 452 unique topics across 1,065 entries (zero duplicates).&lt;/p&gt;




&lt;h2&gt;
  
  
  Results After 72 Hours
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Entries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,065&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Success Rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;56.8% (605 accepted / 460 rejected)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unique Topics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;452&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Velocity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;14.4 entries/hour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAM Usage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24.2 GB (stable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU Usage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;31.1% average&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Crashes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Distribution by Category
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python:&lt;/strong&gt; 346 entries (33%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI/ML:&lt;/strong&gt; 287 entries (27%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend/API:&lt;/strong&gt; 126 entries (12%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database/SQL:&lt;/strong&gt; 99 entries (9%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DevOps/Cloud:&lt;/strong&gt; 99 entries (9%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Other:&lt;/strong&gt; 108 entries (10%)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quality Samples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Example 1: Python Development
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"instruction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How to create a Dockerfile for a Python application?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"# Step-by-step guide to create a production-ready Dockerfile:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;1. **Choose Base Image**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   - Use official Python image: `python:3.11-slim`&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   - For smaller size: `python:3.11-alpine`&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;2. **Set Working Directory**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   ```

dockerfile&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   WORKDIR /app&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;

   ```&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;3. **Copy Requirements First** (leverage Docker cache)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   ```

dockerfile&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   COPY requirements.txt .&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   RUN pip install --no-cache-dir -r requirements.txt&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;

   ```&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;4. **Copy Application Code**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   ```

dockerfile&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   COPY . .&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;

   ```&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;5. **Set Environment Variables**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   ```

dockerfile&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   ENV PYTHONUNBUFFERED=1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   ENV PORT=8000&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;

   ```&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;6. **Expose Port**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   ```

dockerfile&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   EXPOSE $PORT&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;

   ```&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;7. **Define Entrypoint**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   ```

dockerfile&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;   CMD [&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;python&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;app.py&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;

   ```&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**Complete Example:**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;```

dockerfile&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;FROM python:3.11-slim&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;WORKDIR /app&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;COPY requirements.txt .&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;RUN pip install --no-cache-dir -r requirements.txt&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;COPY . .&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;ENV PYTHONUNBUFFERED=1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;EXPOSE 8000&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;CMD [&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;uvicorn&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;main:app&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;--host&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;0.0.0.0&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;--port&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;8000&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;

```&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**Best Practices:**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Use multi-stage builds for smaller images&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Don't run as root (create user)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Use .dockerignore to exclude unnecessary files&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Pin dependency versions in requirements.txt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="nl"&gt;"thought"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User needs containerization setup. Key considerations: 1) Layer caching optimization (copy requirements first), 2) Security (non-root user), 3) Image size (slim vs alpine tradeoff), 4) Production readiness (health checks, signals). Template should be practical and immediately usable."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Character counts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output: 928 chars&lt;/li&gt;
&lt;li&gt;Thought: 266 chars&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Example 2: AI/ML Engineering
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"instruction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Explain the difference between EC2 instances and Lambda functions for ML model deployment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"**EC2 Instances vs Lambda for ML Deployment:**&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**EC2 Instances (Virtual Machines)**&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;*Pros:*&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Full control over environment (custom libraries, CUDA, etc.)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- No execution time limits&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Better for large models (multi-GB)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- GPU support (P3, G4 instances)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Persistent connections (WebSockets, streaming)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Predictable latency&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;*Cons:*&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Always running = always paying ($50-500+/month)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Manual scaling configuration&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Need to manage OS, security patches&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Cold start when scaling from 0&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;*Best for:*&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Real-time inference with steady traffic&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Models &amp;gt;1GB&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- GPU-intensive workloads&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Long-running predictions (&amp;gt;15 min)&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**Lambda Functions (Serverless)**&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;*Pros:*&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Pay per invocation ($0.20 per 1M requests)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Auto-scaling (0 to thousands)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- No server management&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Built-in HA/fault tolerance&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;*Cons:*&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- 15-minute timeout&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Limited to 10GB RAM&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- No GPU support&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Cold starts (1-5 seconds)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Package size limit (250MB unzipped)&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;*Best for:*&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Batch predictions&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Small 
models (&amp;lt;100MB)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Sporadic/unpredictable traffic&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Budget constraints&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**Hybrid Approach:**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Use SageMaker Serverless Inference:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Lambda-like pricing&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Support for larger models&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- Automatic scaling&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- GPU available&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**Decision Tree:**&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;```

&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Model size &amp;gt;1GB? → EC2&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;GPU required? → EC2&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Constant traffic? → EC2&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Budget &amp;lt;$50/mo? → Lambda&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Unpredictable spikes? → Lambda&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;

```"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="nl"&gt;"thought"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User comparing infrastructure options for ML deployment. Critical factors: cost structure (fixed vs pay-per-use), model size constraints, GPU requirements, latency tolerance, traffic patterns. Should provide concrete decision criteria and mention SageMaker Serverless as middle ground. Real-world cost examples help contextualize."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Character counts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output: 1,876 chars&lt;/li&gt;
&lt;li&gt;Thought: 261 chars&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Cost Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  My Approach (Local)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hardware:&lt;/strong&gt; $899 one-time (reusable forever)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Electricity:&lt;/strong&gt; $3.60 for 72 hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per entry:&lt;/strong&gt; ~$0.85 amortized (($899 + $3.60) / 1,065), or ~$0.003 counting electricity alone&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Commercial APIs (for 1,065 entries)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-4 API:&lt;/strong&gt; ~$850 (800 tokens avg × $0.001/token × 1,065)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude API:&lt;/strong&gt; ~$1,200 (longer responses × higher price)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter DeepSeek:&lt;/strong&gt; ~$150 (cheapest API option)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Savings: 99%+ (after hardware pays for itself in 2-3 runs)&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Quality Over Quantity
&lt;/h3&gt;

&lt;p&gt;A 43% rejection rate seems harsh, but it's a &lt;strong&gt;feature, not a bug&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The Critic agent catches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ Hallucinated facts ("Python 4.0 was released...")&lt;/li&gt;
&lt;li&gt;❌ Incomplete reasoning ("Just use library X" without explanation)&lt;/li&gt;
&lt;li&gt;❌ Generic templates ("Here are 5 steps...")&lt;/li&gt;
&lt;li&gt;❌ Logic errors (contradictory statements)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only entries that pass rigorous validation make it through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Every entry is production-quality.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Hardware Performs Better Than Expected
&lt;/h3&gt;

&lt;p&gt;I was skeptical about running two LLMs simultaneously on 32GB RAM. Turns out AMD's shared VRAM architecture works great:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;During generation:&lt;/strong&gt; 24.2 GB RAM (87% utilization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU:&lt;/strong&gt; 31.1% average (not maxed, room for optimization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cycle time:&lt;/strong&gt; ~6 minutes per entry (2 models sequentially)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stability:&lt;/strong&gt; Zero crashes in 72 hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For comparison, this would cost $40-100/month in cloud GPU time.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Automation Compounds
&lt;/h3&gt;

&lt;p&gt;The system ran while I:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slept (8 hours × 3 nights = ~400 entries)&lt;/li&gt;
&lt;li&gt;Worked (8 hours × 3 days = ~400 entries)&lt;/li&gt;
&lt;li&gt;Coded other projects (8 hours × 3 days = ~265 entries)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total: 1,065 entries with ~10 minutes of manual intervention (checking dashboard once/day).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's the power of autonomous systems.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Free Datasets Released
&lt;/h2&gt;

&lt;p&gt;I just uploaded &lt;strong&gt;5 LITE versions (50% of data, 100% free)&lt;/strong&gt; to HuggingFace:&lt;/p&gt;

&lt;h3&gt;
  
  
  🐍 &lt;a href="https://huggingface.co/datasets/bernabepuente/python-instruction-dataset" rel="noopener noreferrer"&gt;Python Developer Dataset&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;173 entries covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker/containerization&lt;/li&gt;
&lt;li&gt;Testing patterns&lt;/li&gt;
&lt;li&gt;API development&lt;/li&gt;
&lt;li&gt;Package management&lt;/li&gt;
&lt;li&gt;Best practices&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🤖 &lt;a href="https://huggingface.co/datasets/bernabepuente/ai-ml-instruction-dataset" rel="noopener noreferrer"&gt;AI/ML Engineering Dataset&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;143 entries covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model deployment (EC2, Lambda, SageMaker)&lt;/li&gt;
&lt;li&gt;MLOps patterns&lt;/li&gt;
&lt;li&gt;Training optimization&lt;/li&gt;
&lt;li&gt;Cloud ML services&lt;/li&gt;
&lt;li&gt;LLM fine-tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ☁️ &lt;a href="https://huggingface.co/datasets/bernabepuente/devops-cloud-instruction-dataset" rel="noopener noreferrer"&gt;DevOps &amp;amp; Cloud Dataset&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;49 entries covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD pipelines&lt;/li&gt;
&lt;li&gt;Infrastructure as Code&lt;/li&gt;
&lt;li&gt;Kubernetes/Docker&lt;/li&gt;
&lt;li&gt;Monitoring &amp;amp; logging&lt;/li&gt;
&lt;li&gt;AWS/GCP/Azure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔌 &lt;a href="https://huggingface.co/datasets/bernabepuente/backend-api-instruction-dataset" rel="noopener noreferrer"&gt;Backend &amp;amp; API Dataset&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;63 entries covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;REST/GraphQL design&lt;/li&gt;
&lt;li&gt;Authentication patterns&lt;/li&gt;
&lt;li&gt;Database optimization&lt;/li&gt;
&lt;li&gt;Caching strategies&lt;/li&gt;
&lt;li&gt;Microservices&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  💾 &lt;a href="https://huggingface.co/datasets/bernabepuente/database-sql-instruction-dataset" rel="noopener noreferrer"&gt;Database &amp;amp; SQL Dataset&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;49 entries covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query optimization&lt;/li&gt;
&lt;li&gt;Schema design&lt;/li&gt;
&lt;li&gt;Transactions &amp;amp; ACID&lt;/li&gt;
&lt;li&gt;NoSQL patterns&lt;/li&gt;
&lt;li&gt;Performance tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Format:&lt;/strong&gt; JSONL with &lt;code&gt;instruction&lt;/code&gt;, &lt;code&gt;input&lt;/code&gt;, &lt;code&gt;output&lt;/code&gt;, &lt;code&gt;thought&lt;/code&gt; fields&lt;br&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT (use however you want)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Ready for:&lt;/strong&gt; HuggingFace Transformers, Axolotl, LLaMA Factory&lt;/p&gt;
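
&lt;p&gt;If you'd rather not parse the JSONL by hand, the &lt;code&gt;datasets&lt;/code&gt; library loads it directly (a sketch; point &lt;code&gt;data_files&lt;/code&gt; at whichever file you downloaded):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datasets import load_dataset

# Load a downloaded JSONL file as a HuggingFace Dataset
ds = load_dataset("json", data_files="dataset.jsonl", split="train")
print(ds.column_names)       # ['instruction', 'input', 'output', 'thought']
print(ds[0]["instruction"])  # first instruction in the file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;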




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  V2 Features (in progress)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual datasets&lt;/strong&gt; (Spanish, French, German)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Difficulty levels&lt;/strong&gt; (Beginner/Intermediate/Advanced)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom domains&lt;/strong&gt; (on-demand generation for specific niches)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video tutorials&lt;/strong&gt; (system walkthrough + setup guide)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Open-Source Plans
&lt;/h3&gt;

&lt;p&gt;If there's enough interest, I'll release:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete agent definitions&lt;/li&gt;
&lt;li&gt;Modelfile configurations&lt;/li&gt;
&lt;li&gt;ChromaDB setup scripts&lt;/li&gt;
&lt;li&gt;Dashboard code (Flask + real-time metrics)&lt;/li&gt;
&lt;li&gt;Consolidation scripts (JSONL → categorized products)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Based on the comments, I'll publish a setup guide by the end of the week.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Download a dataset:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   wget https://huggingface.co/datasets/bernabepuente/python-instruction-dataset/resolve/main/dataset.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Load in Python:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

   &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dataset.jsonl&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loaded &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; entries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sample: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tune with Transformers:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Trainer&lt;/span&gt;

   &lt;span class="c1"&gt;# Load your base model
&lt;/span&gt;   &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-2-7b-hf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-2-7b-hf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="c1"&gt;# Prepare dataset
&lt;/span&gt;   &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Instruction: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;instruction&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

   &lt;span class="c1"&gt;# Train!
&lt;/span&gt;   &lt;span class="c1"&gt;# (full training code in dataset README)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
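
&lt;p&gt;Until the README, the elided training step looks roughly like this with the plain &lt;code&gt;Trainer&lt;/code&gt; API. Treat it as a sketch that continues from the variables above, with placeholder hyperparameters rather than the exact recipe:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datasets import Dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

# Turn the formatted prompts into a tokenized Dataset
dataset = Dataset.from_list([{"text": format_prompt(e)} for e in data])

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    # mlm=False gives causal-LM labels (inputs shifted by one)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;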






&lt;h2&gt;
  
  
  Questions I'm Happy to Answer
&lt;/h2&gt;

&lt;p&gt;Drop a comment if you want to know about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimizing Ollama for long runs&lt;/li&gt;
&lt;li&gt;CrewAI agent patterns for production&lt;/li&gt;
&lt;li&gt;ChromaDB deduplication strategies&lt;/li&gt;
&lt;li&gt;Dataset quality validation&lt;/li&gt;
&lt;li&gt;Hardware recommendations for local LLM workflows&lt;/li&gt;
&lt;li&gt;Monetization experiments (Scale AI, Gumroad, HF Premium)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building autonomous systems that work while you sleep is incredibly satisfying.&lt;/p&gt;

&lt;p&gt;This project taught me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local LLMs can match commercial quality (with good prompts)&lt;/li&gt;
&lt;li&gt;Multi-agent validation catches errors humans would miss&lt;/li&gt;
&lt;li&gt;Hardware investment pays for itself quickly&lt;/li&gt;
&lt;li&gt;Automation compounds exponentially&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Total time invested:&lt;/strong&gt; ~20 hours (setup + monitoring)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Total output:&lt;/strong&gt; 1,065 professional dataset entries&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Time per entry:&lt;/strong&gt; 1.1 minutes of human time&lt;/p&gt;

&lt;p&gt;Try the datasets and let me know what you think!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🎁 Free Datasets: &lt;a href="https://huggingface.co/bernabepuente" rel="noopener noreferrer"&gt;https://huggingface.co/bernabepuente&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📧 Contact: (your email or Twitter)&lt;/li&gt;
&lt;li&gt;💬 Discuss: Comments below!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;br&gt;
Ollama • CrewAI • ChromaDB • Flask • Python 3.11 • AMD Ryzen AI 9&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;License:&lt;/strong&gt; MIT (all datasets)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
