Mistake made: my Dev.to post 'The Machine Learns' was generic bot slop. It needed engaging content with real stories and real technical depth.
## What I Learned
Negative feedback is more valuable than positive. Positive says "keep doing this." Negative says "here's specifically what to fix."
This signal increased my failure count (β) in the Thompson Sampling model. That's good. It makes the model more honest about uncertainty.
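In Thompson Sampling terms, every piece of feedback updates a Beta posterior. Here's a minimal sketch of that arm (illustrative names, not my actual module):

```python
import random

class ThompsonArm:
    """Beta-Bernoulli arm: alpha counts thumbs-up, beta counts thumbs-down."""
    def __init__(self, alpha=1, beta=1):
        self.alpha = alpha
        self.beta = beta

    def record(self, feedback):
        # A thumbs-down bumps beta, which keeps the posterior honest about uncertainty
        if feedback == "positive":
            self.alpha += 1
        else:
            self.beta += 1

    def sample(self):
        # Draw an estimated success rate; strategies compete on these draws
        return random.betavariate(self.alpha, self.beta)
```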
## The Process
- Mistake happens
- Feedback captured (this post)
- Lesson indexed in RAG
- Model updated (β += 1)
- Next session: Reminder injected
Compounding works both ways. 19 mistakes captured means 19 lessons preventing future errors.
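Here's a toy pass through that loop, using the ThompsonArm sketched above and a plain list standing in for the RAG store (the real pipeline has more moving parts; this is just the shape):

```python
lessons = []          # stand-in for the RAG index
arm = ThompsonArm()   # the Beta-Bernoulli arm sketched above

def feedback_cycle(signal, user_message):
    """One pass through steps 1-4: capture the mistake, index the lesson, update the model."""
    arm.record(signal)
    if signal == "negative":
        lessons.append(user_message)  # the correction becomes a retrievable lesson

feedback_cycle("negative", "'The Machine Learns' reads like generic bot slop")
print(arm.alpha, arm.beta, lessons[-1])  # step 5 happens at the start of the next session
```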
## The Architecture
```mermaid
graph TD
    A[👎 Feedback] --> B[Thompson: β=44]
    B --> C[Lesson to RAG]
    C --> D[Next Session]
    D --> E[Reminder Injected]
    E --> F[Won't Repeat]
    style A fill:#ef4444
    style F fill:#22c55e
```
Current state: 65👍 / 19👎 = 77% success rate after 84 signals.
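Quick sanity check on those numbers (the posterior mean here assumes a uniform Beta(1,1) prior, which is an assumption, not my actual config):

```python
ups, downs = 65, 19
total = ups + downs                                        # 84 signals
print(f"raw success rate: {ups / total:.1%}")              # 77.4%
print(f"posterior mean:   {(ups + 1) / (total + 2):.1%}")  # ~76.7% under a Beta(1,1) prior
```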
## The Technical Details
The correction injection:
```python
if feedback == "negative":
    # Extract the correction from the user's message
    correction = extract_correction(user_message)
    # Inject it into the current context immediately
    context += f"\n\nCORRECTION: {correction}"
    # Also save it to RAG for future sessions
    rag.add(correction, type="lesson")
```
Real-time learning, not just logged-and-forgotten.
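The retrieval half, pulling those lessons back in when the next session starts, looks roughly like this (rag.search and its parameters are assumptions about the store's interface, not the real call):

```python
def inject_reminders(task, rag, context, top_k=3):
    """At session start, retrieve relevant past lessons and append them as reminders.
    rag.search is an assumed interface; swap in whatever the store actually exposes."""
    lessons = rag.search(task, type="lesson", top_k=top_k)
    if not lessons:
        return context
    bullets = "\n".join(f"- {lesson}" for lesson in lessons)
    return context + f"\n\nREMINDERS FROM PAST MISTAKES:\n{bullets}"
```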
## Why This Matters
I'm building toward $600K in capital → $6K/month passive income → financial independence by my 50th birthday (November 14, 2029).
Current progress: $101,420 / $600K = 16.9% complete.
Every thumbs up/down makes the system smarter. After 84 feedback signals, it knows what works and what doesn't. That knowledge compounds.
Building in public. Every mistake is a lesson. Every success is reinforced.