AI is getting smarter every day.
But somehow, it still feels… a little empty.
- It can answer questions.
- It can summarize books.
- It can even write code better than most beginners.
And yet, when you’re having a rough day and ask something personal, the response often feels slightly off. Technically correct, emotionally distant.
So I started wondering: what if the problem isn’t the model, but how we train it?
The Problem: Intelligence Isn’t Equal to Empathy
Modern AI is trained on massive datasets and refined through techniques like reinforcement learning from human feedback.
But there’s a subtle issue.
Most feedback systems are optimized for:
- correctness
- safety
- general usefulness
Not necessarily for:
- emotional nuance
- relatability
- the sense that “this feels right”
In other words, we’re training AI to be right, not to be understood.
And those are very different things.
A Different Thought
Instead of asking “Is this answer correct?”,
what if we asked “Does this answer feel right to people?”
And more importantly: “What if people could directly rewrite AI responses, not just rate them?”
The Experiment: Letting Humans Edit AI
So I built a small experiment called Crowdians.
Here’s the idea:
- You chat with your own AI character.
- If you don’t like a response, you send it to an “Academy”.
- Other users rewrite the answer.
- The community votes on the best version.
- The system collects these refined, human-approved responses.
Instead of just collecting feedback, we collect better answers.
Not “this is bad,” but “this is better.”
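To make that concrete, here’s a minimal sketch of the kind of record this loop produces. The names (`AcademyPost`, `Rewrite`) are hypothetical illustrations, not the actual Crowdians schema:

```python
from dataclasses import dataclass, field

# Hypothetical illustration of the Academy loop, not the actual Crowdians schema.
@dataclass
class Rewrite:
    author_id: str
    text: str       # a community member's improved version
    votes: int = 0  # community approval for this version

@dataclass
class AcademyPost:
    prompt: str           # the user's original message
    original_answer: str  # the AI response the user sent to the Academy
    rewrites: list[Rewrite] = field(default_factory=list)

    def best_rewrite(self) -> Rewrite | None:
        # The training signal collected is the winning rewrite itself,
        # not just a thumbs-down on the original.
        return max(self.rewrites, key=lambda r: r.votes, default=None)
```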
Why This Might Matter
Most AI systems learn from:
- labeled datasets
- ranking signals
- passive feedback
But Crowdians explores something slightly different: active co-creation of responses.
It treats users not just as evaluators but as contributors.
Almost like Wikipedia, but for AI responses. Or GitHub, but for human empathy.
The assumption is simple: good answers aren’t just generated. They’re refined.
Why I’m Sharing This
I don’t know if this is the right approach.
- It might be inefficient.
- It might not scale.
- It might completely fail.
But it also feels like something worth exploring.
Because right now, AI is incredibly capable — but still lacks something deeply human.
And maybe that missing piece can’t just be trained.
Maybe it has to be collaborated on.
I’d Love Your Thoughts
I’m curious what you think about this idea.
- Would you trust human-edited AI responses more than model-generated ones?
- Can empathy even be crowdsourced?
- Is this direction promising, or fundamentally flawed?
If you want to try it out and break it (please do), here it is:
Crowdians
Any feedback, criticism, or wild ideas are more than welcome.
Top comments (5)
The distinction you're drawing — collecting better answers instead of just rating existing ones — is actually a significant architectural shift. Most RLHF pipelines treat the human as a judge; you're treating them as a co-author. That changes what kind of data you're accumulating and what you can learn from it.
The "Wikipedia for AI responses" framing resonates. The challenge Wikipedia solved (and the one you'll hit) is: how do you prevent the community edits from regressing toward the most popular answer rather than the most correct/empathetic one? That's a different moderation problem.
One angle worth exploring from a storage perspective: if you're accumulating edited responses over time, the temporal dimension matters a lot. The same question asked in different emotional contexts deserves different answers. If your memory layer treats each response as a flat record, you lose the contextual signal. I've been thinking about this in the context of AI memory systems for embodied agents — the when and why of an interaction is often as important as the response itself.
What does your data model look like for storing the edit history? Are you keeping the chain of revisions or just the final voted version?
Thank you for such a profound and insightful comment! You’ve perfectly captured the core philosophy of what I’m trying to build: shifting the human role from a ‘judge’ to a ‘co-author’ to tackle the scarcity of high-quality data in RLHF.
Regarding the "Wikipedia challenge" (popularity vs. correctness), this is exactly why Crowdians doesn't use a flat democratic voting system. Instead, we implemented a Trust-weighted voting system. A user's influence (voting weight) dynamically scales with their in-game Trust stat. To prevent the data from regressing toward merely popular or generic answers, we inject Honeypot cards (attention checks) during the Academy voting phase. Failing these significantly drops a user's Trust score, thereby limiting their future influence on the dataset.
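Roughly, the mechanism looks like this (a simplified sketch; the constants and function names are illustrative, not our production values):

```python
# Simplified sketch of trust-weighted voting with honeypot checks.
# Constants and names are illustrative, not Crowdians' production values.

TRUST_PENALTY = 0.5   # multiplier applied when a honeypot card is failed
MIN_TRUST = 0.05      # floor so an account is never silenced entirely

def apply_vote(tally: dict[str, float], answer_id: str, voter_trust: float) -> None:
    # A vote counts in proportion to the voter's Trust stat, so a flat
    # "one person, one vote" pile-on can't push generic answers to the top.
    tally[answer_id] = tally.get(answer_id, 0.0) + voter_trust

def fail_honeypot(voter_trust: float) -> float:
    # Failing an attention-check card cuts the user's future influence.
    return max(MIN_TRUST, voter_trust * TRUST_PENALTY)
```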
You are absolutely spot-on about the temporal and contextual dimension. A flat prompt-response pair loses the why. To address this, our data model strictly preserves the context. When a post is created, we don’t just store the final query; we store the full chat_context (the conversation history leading up to the prompt), domain_category, and relevant tags to keep the situational nuance intact.
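For example, a stored post looks roughly like this (the values are made up; chat_context, domain_category, and tags are the fields described above):

```python
# Example record shape; values are invented, but chat_context,
# domain_category, and tags mirror the fields described above.
post_record = {
    "query": "Why do I feel so drained after work lately?",
    "chat_context": [
        {"role": "user", "content": "Rough week. Can we just talk?"},
        {"role": "assistant", "content": "Of course. What's been going on?"},
    ],
    "domain_category": "emotional-support",
    "tags": ["burnout", "venting"],
}
```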
As for your question about the data model and edit history, we preserve the spectrum of revisions in a competitive format. Here is how our pipeline works:
1. Users submit alternative answers.
2. These alternative answers are pitted against each other in our Academy module via A/B testing.
3. We track the match counts and win rates for each answer, multiplied by the voters’ Trust weights.
4. Once an item hits a certain threshold (e.g., 100 matches), it is automatically migrated to our Golden Dataset.
Crucially, the Golden Dataset doesn’t just save the final #1 version. It stores the original AI answer, the full chat context, and a ranked answers array that contains all participating human revisions along with their respective win rates. This means researchers using the dataset can see the full spectrum of community preference and the evolution of the answer, not just the single winner.
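In simplified Python, the promotion step looks something like this (a sketch only; the field names are illustrative, and the 100-match threshold is just the example above):

```python
# Simplified sketch of the Academy -> Golden Dataset promotion step.
# Field names are illustrative; the threshold is the example from above.

MATCH_THRESHOLD = 100

def record_match(answer: dict, won: bool, voter_trust: float) -> None:
    # Each A/B match updates trust-weighted match and win counts.
    answer["matches"] = answer.get("matches", 0.0) + voter_trust
    answer["wins"] = answer.get("wins", 0.0) + (voter_trust if won else 0.0)
    answer["win_rate"] = answer["wins"] / answer["matches"]

def maybe_promote(item: dict) -> dict | None:
    # Once enough matches accumulate, emit a Golden Dataset entry that keeps
    # the original AI answer, the full chat context, and every ranked
    # revision, not just the single winner.
    total_matches = sum(a.get("matches", 0.0) for a in item["answers"])
    if total_matches < MATCH_THRESHOLD:
        return None
    return {
        "original_ai_answer": item["original_ai_answer"],
        "chat_context": item["chat_context"],
        "ranked_answers": sorted(
            item["answers"], key=lambda a: a.get("win_rate", 0.0), reverse=True
        ),
    }
```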
I find your point about AI memory systems for embodied agents incredibly fascinating. The when and why is definitely the next frontier. I'd love to hear more about your thoughts on this!
P.S. Please excuse any awkward phrasing—I am not entirely fluent in English, so I used AI translation to ensure I could properly express my thoughts and appreciation for your great feedback!
I found a little bug in the chatbot, but I’m very interested in building a chatbot myself. Would you mind if we collaborate on something that appeals to both of us? 🙂
Thank you so much for trying out the chatbot and catching that! Could you let me know what little bug you found? I'm actually in the middle of tracking down and fixing quite a few bugs right now, so I'd really appreciate the heads-up! I'm very curious if it's related to what I'm currently working on or something entirely new!
Regarding the collaboration, I am honestly really flattered by your offer! However, I feel like my skills are still lacking, and I wouldn't want to slow you down or be a burden to your project. I really need to focus on building up my own fundamentals right now.
I truly appreciate you reaching out and offering, though! I'd love to stay connected and cheer on your own chatbot-building journey.
Thank you