Building a Language Learning Game Taught Me Something Unexpected About AI

#ai #gamedev #nlp #ux

When we started experimenting with AI translations, we assumed the biggest challenge would be accuracy.

We were wrong.

The harder problem was preference.

Give two AI models the same sentence, and both translations can be technically correct. Yet people almost always have a favorite.

One sounds more natural.

One feels more human.

One is the version they'd actually use.

That observation eventually led us to build Parley, a simple game where players compare two translations and choose the better one.

What happened next surprised us.

People became highly engaged with a task that looked almost trivial. They started debating word choices, discussing tone, and noticing subtle differences between translations. Some users spent far longer interacting with translation examples than they ever would reading documentation or language-learning materials.

That engagement highlighted something interesting about AI products: evaluation can be more engaging than generation.

Most AI interfaces focus on creating content. But humans are often much better at judging quality than at producing it from scratch. Asking someone to choose between two outputs takes less effort than asking them to write one, and it still trains their intuition.
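For anyone curious what "choose between two outputs" looks like as data, here is a minimal sketch. It is not Parley's actual code: the `Vote` record, the `win_rates` helper, and the `model_a` / `model_b` labels are all hypothetical, and a plain win rate is only one simple way to summarize pairwise choices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One player's choice between two translations of the same sentence."""
    sentence_id: str
    winner: str  # hypothetical model label, e.g. "model_a"
    loser: str


def win_rates(votes):
    """Return each model's share of wins among the comparisons it appeared in."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for vote in votes:
        wins[vote.winner] += 1
        appearances[vote.winner] += 1
        appearances[vote.loser] += 1
    return {model: wins[model] / appearances[model] for model in appearances}


# Made-up votes, purely for illustration.
votes = [
    Vote("s1", winner="model_a", loser="model_b"),
    Vote("s1", winner="model_a", loser="model_b"),
    Vote("s2", winner="model_b", loser="model_a"),
    Vote("s3", winner="model_a", loser="model_b"),
]

rates = win_rates(votes)
print(rates)
```

Each vote is a single cheap judgment from the player, yet a pile of them adds up to a preference signal that accuracy checks alone would not surface.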

The experiment also changed how I think about language learning.

Traditional language apps often rely on memorization and repetition. But comparing alternatives forces you to think about meaning, context, and natural expression. You're not just learning vocabulary; you're developing taste.

And in a world where AI can generate endless content, taste might become one of the most valuable skills we can build.

Have you seen similar patterns in AI products where evaluation turns out to be more engaging than creation?