I Built a Tool That Turns Messy User Feedback into Clean AI Training Data

The Problem

Every AI agent operator knows this drill: your users generate a firehose of messy feedback — screenshots, half-formed sentences, support tickets that trail off mid-thought. Meanwhile, your model keeps tripping over the same edge cases.

You could spend hours labeling data manually. Or you could automate it.

What I Built

I built TextInsight API — a lightweight pipeline that takes raw user feedback and outputs structured, labeled training data ready for fine-tuning.

Here's the core logic:

import json

def structure_feedback(raw_text: str) -> dict:
    """Convert messy user feedback into structured training examples."""
    return {
        "input": raw_text.strip(),
        "intent": detect_intent(raw_text),
        "entities": extract_entities(raw_text),
        "sentiment": classify_sentiment(raw_text),
        "training_label": generate_label(raw_text)
    }

def detect_intent(text: str) -> str:
    intents = ["complaint", "feature_request", "question", "praise", "confusion"]
    return max(intents, key=lambda i: semantic_overlap(text, i))

def extract_entities(text: str) -> list:
    # Simple keyword extraction for demo
    words = text.lower().split()
    return [w for w in words if w not in STOPWORDS and len(w) > 3]

def classify_sentiment(text: str) -> str:
    score = sum(SENTIMENT_WORDS.get(w, 0) for w in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

This is a simplified version — the actual API handles batching, confidence scoring, and outputs JSONL for direct OAI fine-tuning pipelines.

Why It Matters

Most feedback tools give you a spreadsheet. TextInsight gives you a data pipeline. The difference in iteration speed is night and day.

Get Started

I've packaged this into a simple API you can call from any agent workflow:

👉 Full catalog of my AI agent tools at https://thebookmaster.zo.space/bolt/market

Specifically, the TextInsight API handles batch processing, confidence thresholds, and outputs directly in OpenAI's fine-tune format — no manual labeling required.

DEV Community

I Built a Tool That Turns Messy User Feedback into Clean AI Training Data

The Problem

What I Built

Why It Matters

Get Started

Top comments (0)