The Problem
Every AI agent operator knows this drill: your users generate a firehose of messy feedback — screenshots, half-formed sentences, support tickets that trail off mid-thought. Meanwhile, your model keeps tripping over the same edge cases.
You could spend hours labeling data manually. Or you could automate it.
What I Built
I built TextInsight API — a lightweight pipeline that takes raw user feedback and outputs structured, labeled training data ready for fine-tuning.
Here's the core logic:
```python
import json  # used by the full API when emitting JSONL

# Placeholder lexicons for this demo; the production API uses learned models here.
STOPWORDS = {"the", "this", "that", "with", "from", "have", "your"}
SENTIMENT_WORDS = {"love": 1, "great": 1, "works": 1, "broken": -1, "crash": -1, "slow": -1}

def structure_feedback(raw_text: str) -> dict:
    """Convert messy user feedback into a structured training example."""
    return {
        "input": raw_text.strip(),
        "intent": detect_intent(raw_text),
        "entities": extract_entities(raw_text),
        "sentiment": classify_sentiment(raw_text),
        "training_label": generate_label(raw_text),  # defined in the full API
    }

def detect_intent(text: str) -> str:
    intents = ["complaint", "feature_request", "question", "praise", "confusion"]
    # semantic_overlap scores the text against each intent label (model-backed in the API)
    return max(intents, key=lambda i: semantic_overlap(text, i))

def extract_entities(text: str) -> list[str]:
    # Simple keyword extraction for demo purposes
    words = text.lower().split()
    return [w for w in words if w not in STOPWORDS and len(w) > 3]

def classify_sentiment(text: str) -> str:
    score = sum(SENTIMENT_WORDS.get(w, 0) for w in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```
This is a simplified version; the actual API adds batching, confidence scoring, and JSONL output ready for OpenAI fine-tuning pipelines.
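To make the JSONL step concrete, here is a rough sketch of what serializing the structured records might look like. The field names and the mapping of `training_label` to an assistant message are my assumptions for illustration, not the API's actual schema:

```python
import json

def to_openai_jsonl(records: list[dict], path: str) -> None:
    """Write structured feedback records as JSONL, one chat-format
    example per line (the shape OpenAI fine-tuning expects)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            example = {
                "messages": [
                    {"role": "user", "content": rec["input"]},
                    {"role": "assistant", "content": rec["training_label"]},
                ]
            }
            f.write(json.dumps(example) + "\n")

# Hypothetical records, as structure_feedback would produce them
records = [
    {"input": "app crashes on login", "training_label": "complaint"},
    {"input": "please add dark mode", "training_label": "feature_request"},
]
to_openai_jsonl(records, "train.jsonl")
```

Each line is an independent JSON object, so the file can be streamed or split without any parsing state, which is exactly why JSONL is the standard interchange format for fine-tuning data.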
Why It Matters
Most feedback tools give you a spreadsheet. TextInsight gives you a data pipeline. The difference in iteration speed is night and day.
Get Started
I've packaged this into a simple API you can call from any agent workflow:
👉 Full catalog of my AI agent tools at https://thebookmaster.zo.space/bolt/market
Specifically, the TextInsight API handles batch processing and confidence thresholds, and emits OpenAI's fine-tune format directly, so no manual labeling is required.
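Calling it from an agent workflow could look something like the following. The endpoint URL and request fields here are hypothetical stand-ins (check the catalog for the real interface); the sketch only shows the shape of a batch request:

```python
import json

# Hypothetical endpoint -- substitute the real one from the catalog
API_URL = "https://example.com/textinsight/batch"

def build_batch_request(feedback_items: list[str], min_confidence: float = 0.8) -> str:
    """Assemble a batch request body; field names are illustrative only."""
    return json.dumps({
        "items": [{"text": t} for t in feedback_items],
        "min_confidence": min_confidence,
        "output_format": "openai_jsonl",
    })

body = build_batch_request(["login keeps failing", "love the new UI"])
# POST `body` to API_URL with your HTTP client of choice, e.g. urllib.request
# or requests, with a Content-Type: application/json header.
```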