The Road to Localization in an Open Source Project

Edit: Dear Spanish-speaking readers, I am very pleased to bring to your attention a Spanish translation of this blog entry: https://www.ibidem-translations.com/edu/traduccion-localizacion-codigo-abierto/ Please check the comments for more details about it.

When I decided to go down the localization road in my open source (OS) project, I had to learn a lot and overcome some problems. In this blog post I would like to describe exactly that.

The hurdles of localization

There are a lot of issues that make localizations difficult:

You either do it all the way or not at all. Nobody wants to work with a partially localized product.
It takes up capacity on an ongoing basis. With every new sprint, every new feature, there are new texts that need to be localized.
Expertise in the target language is necessary, and languages are very difficult to learn, as almost everyone probably knows from their own experience.
It delays and complicates the process. Are localizations done in parallel with development? What if texts change afterwards? Also, the language experts might not see the texts in the context of the app. Or do you do the localizations sequentially and after the feature is developed? But then you have to expect that the release will be delayed because of that.
Whether you tackle localizations at all or not is a question in software development that makes a serious difference. For localizations, an abstraction layer is necessary, because you want to keep texts localized and interchangeable at certain points. The ease of not offering localizations at all is tempting, because it can save a lot of effort and complexity in software development.

That's a challenge even for companies. And now I'm standing in front of my project as a recreational open source developer and thinking about its localization. For open source projects (especially ones with as little manpower as mine) the hurdle is all the bigger due to lack of capacity. It's unfortunate, because many open source applications could be relevant to people who can't afford commercial alternatives and who also don't know English or the other commonly localized languages. That is, where the benefit seems greatest, the hurdle is even greater. To illustrate this, I would like to take you on a short journey through my own experiences with localizing my OS project.

A bit of history of my personal experiences with localizing an OS project

First of all, a little disclaimer. I was born in Russia and came to Germany when I was six. That means I speak German, English and very, very bad Russian (sorted by skill).

So I'm starting my OS project, which is intended for private accounting (it's still at "not ready enough" for release, unfortunately). It is supposed to help people get their finances under control. People who don't have money problems don't need to rely on it. So it has to be free! It should also reach as many people as possible, including those who can't speak English. "Localizations, yes or no?" is thus answered with a clear "F*** yeah!".

At first I decided to localize only in English and German, because these languages were good enough for my needs. Russian I didn't dare to do, because I never went to school in Russia and was therefore far from mastering the terminology. This alone shows that when you work on a project all by yourself, you can only handle localization as far as your skills go. And mine are unfortunately very limited. Too limited! However, this forced me to structure the code so that it is localizable, and for further languages "only" the existing texts have to be translated.

For further languages I needed help, and I looked for it. I found it with my aunt. She was also born in Russia and works in the field of finance. Perfect! In addition, I had a Hungarian girlfriend at that time, who thankfully agreed to help (Szia, Magyarország. Hogy vagy?). I had their agreement, but I was still hesitant to take advantage of it. The program should first be in a state that is worth something. That is another realization: I did not want to let these helpers, who are dear to me, work for me regularly and completely free of charge when it might turn out to be unnecessary. I just had a bad feeling about that.

But I used the help once, and then my hobby project was translated into four languages (out of 7,111; source: Wikipedia). Uh yeah. And then, consciously or subconsciously, I procrastinated on features that needed new texts. I preferred to deal with other things that didn't require new texts, which is not bad per se, but it inhibits the development of the real purpose of the project. When I started again with new features, I put the English values into the Russian and Hungarian localizations as placeholders for the new texts. This is how the project went until a few weeks ago, when I developed a new solution for myself. I wasn't eager to ask my aunt. And the relationship with my Hungarian girlfriend had ended in the meantime.

The solution I came up with

I solved (most of) the problems I had with localizations with the help of two projects. These are very specialized and designed for the C#/WPF/DeepL tech stack. Those who also work with C#/WPF/DeepL are welcome to try these projects. I am looking forward to feedback. I go into the technical details in the wikis of the respective projects (MrMeeseeks.ResXTranslationCombinator/MrMeeseeks.ResXToViewModelGenerator). Feel free to have a look there if you are interested. However, the concepts will certainly be transferable to other tech stacks as well. This will be the topic here.

Unfortunately I won't be able to spare you a short dry definition of the terminology, so let's get it over with quickly!

My projects distinguish between four categories of localization files:

The default file - this specifies the localization keys and texts that will be localized.
The automatic files - same keys as the default file, but with translated texts. One for each supported language.
The overriding files - same keys as the default file, but with translated texts. One per supported language at most. These offer the possibility to manually and selectively override automatic translations.
The combined files - same keys as the default file, but with translated texts. One for each supported language. The texts are combined from an automatic and, if necessary, an override file.
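To make the relationship between the four categories more concrete, here is a minimal Python sketch of how a combined file could be derived for one language. It is only an illustration of the concept, not the TranslationCombinator's actual code (which targets C# and .resx files). The function name `combine`, the use of plain dictionaries instead of files, and the rule that an empty override counts as "not overridden" are all assumptions made for this example.

```python
def combine(default, automatic, override):
    """Build one combined file (key -> text) for a single language.

    All three arguments are plain dicts standing in for localization files.
    Hypothetical helper for illustration only.
    """
    combined = {}
    for key in default:
        manual = override.get(key, "")
        # Assumption: a non-empty override wins; otherwise the automatic
        # translation is used, and a missing one leaves the text empty.
        combined[key] = manual if manual else automatic.get(key, "")
    return combined


default = {"Greeting": "Hello", "Farewell": "Goodbye"}
automatic_de = {"Greeting": "Hallo", "Farewell": "Auf Wiedersehen"}
override_de = {"Farewell": "Tschüss"}  # a language expert prefers this wording

combined_de = combine(default, automatic_de, override_de)
print(combined_de)
```

The default file only contributes the set of keys here, which is why every other file can be described as "same keys as the default file".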

Now a rough description of the two projects and what they do:

MrMeeseeks.ResXTranslationCombinator - as the name and the definition of the terminology suggest: this project gets the default file and the overriding files at the beginning. From this it generates the automatic and combined files. I will simply call it the TranslationCombinator from now on.
MrMeeseeks.ResXToViewModelGenerator - this project initially gets the default file and the combined files. From this it generates localization ViewModels that can be conveniently used according to the Model-View-ViewModel Pattern (MVVM). I'll just call it the ViewModelGenerator from now on.

I'll be happy to torture you with more conceptual details and my personal experiences with these projects.

The TranslationCombinator

The TranslationCombinator is implemented as a GitHub Action step. If you follow the workflow - documented and recommended in the repository - the step reacts as soon as changes are made to localization files. Then it uses the translation service (I chose DeepL; be aware that you need an account to access the DeepL API) to create or supplement the automatic files. After that, taking into account any overriding files, the combined files are created or supplemented. Last but not least, a pull request is created if the process resulted in changes to the files.

Ideally, the developers only need to make changes to the default file as soon as they need new localizations. Ideally, there is no need for the language experts to get active. However, they can provide overrides to the localizations when the need arises. Everything else is done by the TranslationCombinator. This also allows for completely asynchronous collaboration between developers and language experts.

Some conceptual choices:

The translation service automatically detects the source language, so you can choose any of the supported languages in the default file without explicitly specifying which one it is.
Automatic files act as caches. Translations are created only for missing values. If the maintainers want to have a value translated again automatically, they can simply delete the value.
TranslationCombinator creates a template file for overriding files. If you want to override texts of a certain language, it is enough to rename the template file; there is no need to copy and paste it. In that case the template file will be recreated in the next pull request.
You can manually override a non-supported language if you completely fill in an overriding file. However, in this case you should make sure that you supply the manual translations for new keys, otherwise these localizations will remain empty.
A lesson I had to learn: The TranslationCombinator should sort the keys of the generated files! Otherwise the pull request diff will explode.
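Two of these choices, the cache behavior and the key sorting, can be sketched in a few lines of Python. Again, this is a hedged illustration and not the real implementation: `update_automatic` is a hypothetical name, the files are modeled as dictionaries, and `fake_translate` is a stand-in for the translation service so that the example runs without a DeepL account. Dropping keys that no longer exist in the default file is also an assumption of this sketch.

```python
def update_automatic(default, automatic, translate):
    """Return the new automatic file for one language.

    Existing non-empty translations are reused (the cache); only missing
    or deleted values are sent to the translation function.
    """
    updated = {}
    # Sorting the keys keeps the generated file stable, so the pull request
    # diff only shows real changes.
    for key in sorted(default):
        cached = automatic.get(key, "")
        updated[key] = cached if cached else translate(default[key])
    return updated


calls = []

def fake_translate(text):
    """Stand-in for the translation service; records what was requested."""
    calls.append(text)
    return f"<de:{text}>"


default = {"Title": "Accounts", "Greeting": "Hello", "Farewell": "Goodbye"}
automatic_de = {"Greeting": "Hallo", "Title": ""}  # "Title" was deleted on purpose

result = update_automatic(default, automatic_de, fake_translate)
print(result)
print(calls)
```

Only "Goodbye" (new key) and "Accounts" (deleted value) reach the translator in this run; the cached "Hallo" is left alone.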

The ViewModelGenerator

The localization files have an XML format which is not designed for direct use in MVVM projects. This is where the ViewModelGenerator helps out. It takes the default file and the combined files and generates a set of ViewModel interfaces and classes from them. These can then be read directly from elements in the View and ViewModel layers. They also provide a convenient and performant way to switch languages at runtime.
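The real ViewModelGenerator emits C# interfaces and classes, so the following Python snippet is only a rough analogue of the idea: read the key/value pairs out of the XML and expose them through one object whose language can be switched at runtime. The two inline documents are heavily simplified stand-ins for .resx files (real ones also carry schema and header elements), and `read_resx` and `Localizer` are hypothetical names invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for a default file and a combined file.
RESX_EN = """<root>
  <data name="Greeting" xml:space="preserve"><value>Hello</value></data>
  <data name="Farewell" xml:space="preserve"><value>Goodbye</value></data>
</root>"""

RESX_DE = """<root>
  <data name="Greeting" xml:space="preserve"><value>Hallo</value></data>
  <data name="Farewell" xml:space="preserve"><value>Auf Wiedersehen</value></data>
</root>"""


def read_resx(xml_text):
    """Collect name -> value pairs from the <data> elements."""
    root = ET.fromstring(xml_text)
    return {
        data.get("name"): data.findtext("value", default="")
        for data in root.iter("data")
    }


class Localizer:
    """Holds all languages in memory; switching is a single assignment."""

    def __init__(self, tables, language):
        self._tables = tables
        self.language = language

    def text(self, key):
        return self._tables[self.language][key]


localizer = Localizer({"en": read_resx(RESX_EN), "de": read_resx(RESX_DE)}, "en")
print(localizer.text("Greeting"))
localizer.language = "de"  # switch at runtime, no reload needed
print(localizer.text("Greeting"))
```

Because every language is loaded up front in this sketch, a switch costs no more than a dictionary lookup. What the sketch cannot show is the compile-time safety of generated C# properties, where a misspelled key would fail at build time instead of at runtime.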

The sample project

I have created a third repository. This one is just a sample project using the other two projects in combination. If you want to see a complete example in action, feel free to check it out. Please note that I focused on the localization process there. This means that the rest of the code is not up to my usual standards. Here is a small animation of the result (I go through all languages once "slowly" and then a few times at full speed):

You can also have a look at the new localization workflow at the previously mentioned accounting hobby project (Project BFF).

Final thoughts

Are all localization problems solved now? No, certainly not. But this is two steps closer to the goal of minimizing human effort. Now I don't have to burden my aunt and my ex-girlfriend with more work. Uh yeah. The great thing is that you automatically get a base localization generated and language experts are always welcome to contribute if they want. There is no pressure on anyone to take care of the localizations, but anyone who wants to has the opportunity to become active. In my opinion, this is worth its weight in gold for an open source project.

Top comments (1)

Dima • Jul 6 '22

This blog post came to the attention of Spanish-based translator Chema Bescós, who asked if he could translate it for his blog "Ideas Worth Translating". Since it is a great honor for me I of course accepted. His translation can be found here:
ibidem-translations.com/edu/traduc...
The blog contains translations of other articles of technical nature. I think Chema would be very happy if you take a look:
ibidem-translations.com/edu/

All that remains for me to say is: Thank you very much, Chema :)