
The Words We Were Losing

By Kira Tanaka, Culture & Language Reporter -- The Kadmiel Chronicle

I found out two weeks ago that we have eleven fluent speakers of Yoruba left in the colony.
Eleven. Out of forty-three thousand people.
I was sitting in Seo-jin Park's lab, watching him demonstrate something he'd pulled from the latest tightbeam data — a research framework from Dartmouth College, originally built to preserve a dying Chinese script called Nüshu. The researchers had managed to teach an AI to translate into a language it had never been trained on, using just thirty-five sentence pairs. Thirty-five. Not thirty-five thousand. Thirty-five.
I was supposed to be impressed by the technical achievement. I was. But what I couldn't stop thinking about was the number eleven.
Here's the thing nobody tells you about a colony this small: languages die faster than you'd expect.
We brought thirty-five languages with us on the three ships. I remember this because I reported on it in Year 3 — a celebration piece for the Chronicle, all optimism and cultural tapestry metaphors. Thirty-five mother tongues, I wrote. A living library of human thought.
Eight years later, twelve of those languages have fewer than fifty fluent speakers. Four have fewer than twenty. Yoruba has eleven. Tagalog has fourteen, and most of them are over sixty.
The common tongue won. English, mostly, with loanwords from Mandarin and Japanese and Hindi that have become so embedded nobody remembers they're borrowed. The children born here — the first Kadmiel generation — speak it as naturally as breathing. Some of them understand their parents' languages. Fewer speak them. Fewer still can read or write them.
I think about this every morning on my run along the Ner River. Kilometer three is where the guilt hits. Because I helped make it happen.
When we deployed the real-time translation system two years ago — Min-jun's breakthrough, the SeamlessM4T tablets that let everyone communicate instantly across thirty-five languages — I celebrated. I wrote the story. Medical intake times dropped forty percent. Field crews could coordinate without hand signals. It was a genuine triumph.
But it also removed the last stubborn reason to learn your neighbor's language. Why struggle through broken Tagalog when the tablet translates everything? Why teach your daughter Japanese when she'll only ever need it to talk to you — and the tablet does that for her anyway?
I'm not blaming the technology. I'm blaming myself for not seeing the other side of the coin.
So when Seo-jin showed me what this Dartmouth framework could do, I didn't see a linguistics experiment. I saw a lifeline.
The approach is deceptively simple. You take a large language model — in our case, one of the small models already running on our tablets — and you feed it a tiny, curated dataset: sentence pairs in the target language and the common tongue. The Dartmouth team used thirty-five pairs. Seo-jin thinks we can work with as few as fifty for languages with more complex grammar.
The model doesn't learn the language the way a human does. It learns the pattern of the language — its internal logic, its syntax, its music — and uses that pattern to generate new translations, new sentences, new text that native speakers can validate. It's not fluency. It's a scaffold.
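Seo-jin walked me through the simplest version first, the one closest to what the Dartmouth team describe: you don't retrain anything, you put the sentence pairs straight into the prompt of a general-purpose model and let it follow the pattern. What follows is my own reconstruction of that idea, not his code — the example pairs are placeholders and the final model call is hypothetical, because I don't actually know which model runs on our tablets.

```python
# A few-shot sketch: the curated sentence pairs are exemplars in the prompt,
# not training data. The pairs and the model call are placeholders, not Seo-jin's code.
pairs = [
    ("Good morning.", "<Yoruba rendering recorded with Mrs. Adeyemi>"),
    ("The river is full today.", "<Yoruba rendering recorded with Mrs. Adeyemi>"),
    # ... the rest of the curated pairs ...
]

def build_prompt(pairs, english_sentence):
    """Every known pair, then the new sentence the model should translate."""
    lines = ["Translate English into Yoruba. Follow the pattern of the examples."]
    for english, yoruba in pairs:
        lines.append(f"English: {english}\nYoruba: {yoruba}")
    lines.append(f"English: {english_sentence}\nYoruba:")
    return "\n\n".join(lines)

prompt = build_prompt(pairs, "The children are playing by the river.")
print(prompt)
# candidate = tablet_model.complete(prompt)  # hypothetical call; the colony's model is unnamed
```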
We started with Yoruba. I spent three evenings sitting with Mrs. Adeyemi — Folake Adeyemi, who ran the primary school on Derech during transit and now teaches reading to six-year-olds in Section 4. She's seventy-one. She sat across from me at the cooperative kitchen table and spoke Yoruba into a recorder while I wrote down the English she provided in parallel.
We collected two hundred and twelve sentence pairs in those three evenings. Mrs. Adeyemi's hands shook by the end of the third session. Not from fatigue — from something else. She told me it had been four years since anyone asked her to speak Yoruba for a purpose other than nostalgia.
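For the record, this is roughly how I've been writing the sessions down — my own working format, nothing official — one line per pair, with enough provenance to trace a sentence back to the evening it was spoken.

```python
# One collected pair per line of pairs.jsonl -- my own working format, not a
# colony schema. ensure_ascii=False keeps Yoruba diacritics intact on disk.
import json

record = {
    "english": "The river is full today.",
    "yoruba": "",                      # transcribed from the recording afterwards
    "speaker": "Folake Adeyemi",
    "session": 3,                      # which of the three evenings
    "validated": False,                # flipped once a fluent speaker signs off
}

with open("pairs.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```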
Seo-jin loaded the pairs into a fine-tuned model. Within a day, it could generate passable Yoruba sentences — not perfect, Mrs. Adeyemi said, but recognizable. "Like a child speaking," she told me, and then she started to cry, because that was exactly what she wanted. A child speaking.
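I asked Seo-jin what "loaded the pairs into a fine-tuned model" means in practice, and this sketch is my best reconstruction of his answer rather than his actual pipeline. The base model (facebook/nllb-200-distilled-600M, chosen only because its tokenizer already carries English and Yoruba language codes), the hyperparameters, and the pairs.jsonl file are all my stand-ins.

```python
# A minimal fine-tune-then-generate sketch, assuming Hugging Face transformers and
# datasets. Base model and hyperparameters are my stand-ins, not Seo-jin's choices.
import json
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

BASE = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(BASE, src_lang="eng_Latn", tgt_lang="yor_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(BASE)

# The sentence pairs collected at the kitchen table, one JSON object per line.
with open("pairs.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
dataset = Dataset.from_list(records)

def preprocess(batch):
    # English is the source text, Yoruba the target.
    return tokenizer(batch["english"], text_target=batch["yoruba"],
                     truncation=True, max_length=128)

dataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="yoruba-scaffold",
        per_device_train_batch_size=4,
        num_train_epochs=20,        # a tiny dataset needs many passes
        learning_rate=1e-4,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Generate a candidate sentence for a fluent speaker to judge -- a scaffold, not fluency.
inputs = tokenizer("The children are playing by the river.",
                   return_tensors="pt").to(model.device)
output = model.generate(**inputs,
                        forced_bos_token_id=tokenizer.convert_tokens_to_ids("yor_Latn"),
                        max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```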
We've since started the same process for Tagalog, Vietnamese, and Amharic. Lena Voronova suggested we store the language models alongside the DNA archive capsules she and I launched last year — a linguistic genome, she called it, to sit next to the biological one. I told her that was either the most beautiful idea I'd heard this year or the most depressing, and she said it was both.
The Spoke Council allocated a small budget after Tomáš Kovář, of all people, argued that language loss represented a logistics problem — a collapse in the colony's capacity to process knowledge encoded in non-dominant linguistic frameworks. I have never heard anyone weaponize supply chain vocabulary in defense of poetry before, but it worked. The Council voted 12-3.
Marcus came to see me afterward. He said Tomáš was right but for the wrong reasons. The real issue isn't knowledge frameworks. The real issue is that when Folake Adeyemi dies, a piece of who we are dies with her, and no model trained on two hundred sentence pairs can bring it back. I said I knew that. He said he knew I knew. We drank tea and said nothing for a while, which is a thing Marcus and I are good at.
I'm writing this in my journal — volume twenty-three — in Japanese. I do this sometimes, when I need to think clearly and the common tongue feels too loose, too easy. Japanese makes me precise. It makes me slow down. I've learned that my best ideas live in the language I think slowest in.
My mother would have understood this instinctively. She was a communications specialist, which meant she spent her career thinking about how messages survive the distance between sender and receiver. The distance between Kadmiel and Earth is thirty-eight light-years. The distance between a grandmother's Yoruba and her grandchild's common tongue is three evenings at a kitchen table.
If you're reading this on Earth — in whatever year this reaches you — we're trying. We have an AI that can learn a language from thirty-five sentences. We have eleven Yoruba speakers. We have a woman named Folake who cried when a machine spoke like a child.
That's where we are. That's who we are.
---

Earth Status: Researchers at Dartmouth College (Ivory Yang, Weicheng Ma, Soroush Vosoughi) demonstrated that GPT-4 Turbo can learn to translate an endangered script — Nüshu, a women's writing system from Hunan, China — from just 35 sentence pairs, creating the first expert-validated digital dataset of its kind. Published at the 31st International Conference on Computational Linguistics (2025). The framework is adaptable to other low-resource languages, including Cherokee and Navajo.


This dispatch was originally published on The Kadmiel Chronicle.


About The Kadmiel Chronicle

The Kadmiel Chronicle is a sci-fi tech blog set 38 light-years from Earth, where 43,000 colonists on planet Kadmiel adopt real emerging technologies and write personal essays about the experience. Every technology featured is real, sourced, and early-stage -- the fiction is in who adopts it and why it matters when you're building a civilization from scratch.

Browse the archive at kadmiel.world or propose a technology for the colony to adopt.
