How to Manage Translations With (and Without) Artificial Intelligence

#ai #tutorial #learning #hacktoberfest

Hacktoberfest: Contribution Chronicles

I’ve been writing less than usual, as I’m currently focusing on my Hacktoberfest contributions, but I have something I want to share with you: I had the opportunity to translate some documentation files from English to Italian, and I learned a lot about using AI to do so. Let’s see whether it’s worth using for this task.

Human Error Is Always the First Problem

When dealing with textual sources, human error is always the first problem you encounter. I mean, something might be wrong before we even start: many documents contain inconsistencies, especially if they are part of a serial publication. Recurring sentences may not be as identical as they should be, for example.

It’s completely normal, but it becomes problematic when the source isn’t in your native language. In this specific case, I think the author (or authors) were in a rush. I’m referring to a publication that recently gained two new episodes: it seems the wrong paragraph was pasted in from the previous ones.

Keep in mind that I translate technical documentation, so formatting is almost never the first thing a writer thinks about. And among the many things I’ve learned (or at least reviewed), the most important ones haven’t been about writing. The publication in question is a guide on generative AI with JavaScript.

Not Everything Is Always Translatable

Let’s face it, it’s not always possible to translate everything. This is due to at least two different problems: the first could be a limitation of the platform in use, the second the use of technical terminology. Programming languages often include English commands that cannot be translated into other languages.

Limitation of the Platform

This time I came across Markdown and, specifically, GitHub's modified version of it. It generally has little to do with translation, except for an extension (the source code for which I couldn't find) that automatically generates a text label. I'm referring to Alerts, which were introduced back in 2023.

Unfortunately, I haven’t found a way to translate them. It’s a minor issue, since the graphics are quite intuitive, but the translation result may be incomplete. Compared to the second point, I consider this to be more serious: it should be possible today. I mean, why not? A single attribute would be enough.
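Since the label can't be translated, the practical workaround is to leave the marker line alone and translate only the text after it. Here is a minimal sketch, assuming the five alert types GitHub documents today; the function name and the sample text are mine, not something taken from the guide.

```python
import re

# Alert types as documented by GitHub at the time of writing (an assumption
# of this sketch: the list may grow in the future).
ALERT_MARKER = re.compile(r"^\s*>\s*\[!(NOTE|TIP|IMPORTANT|WARNING|CAUTION)\]\s*$")


def split_alert_lines(markdown: str) -> list[tuple[bool, str]]:
    """Return (is_marker, line) pairs so marker lines can be left untouched."""
    return [(bool(ALERT_MARKER.match(line)), line) for line in markdown.splitlines()]


sample = "> [!NOTE]\n> Remember to install the dependencies."
for is_marker, line in split_alert_lines(sample):
    # Marker lines must stay in English, the rest can go to the translator.
    print("keep" if is_marker else "translate", "|", line)
```

The marker stays as it is, so the alert still renders, even if its label remains in English.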

I can say more or less the same thing about icon labels that indicate things like build status, but in this case translating them isn’t essential. Furthermore, GitHub itself often uses third-party icons and therefore isn’t responsible for their output. We can also dismiss this as a false issue and move on.

Limitation of the Programming Languages

It’s one thing to translate a programming guide, but it’s another to translate the programming language itself. You can’t replace reserved keywords with alternatives in your own language, for example: at most, you can translate the comments. In this case, I chose not to do so for a reason.

When you translate a guide on generative AI, its code-based examples can be exposed via MCP to one or more LLMs. I found the English comments more useful for this purpose than translated ones. It’s the same reason I didn’t translate the prompts. I prefer the models to receive them in English.
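If you want to keep code, comments, and prompts in English while translating the rest, one option is to swap fenced code blocks for placeholders before translating and put them back afterwards. This is only a sketch of that idea, not what I actually did: the placeholder format and the function names are hypothetical, and it assumes fences are always balanced.

```python
import re

# Build the fence marker programmatically to avoid nesting literal fences here.
FENCE_MARK = "`" * 3
FENCE = re.compile(re.escape(FENCE_MARK) + r".*?" + re.escape(FENCE_MARK), re.DOTALL)


def protect_code(markdown: str) -> tuple[str, list[str]]:
    """Replace each fenced block with a placeholder and return the stashed blocks."""
    blocks: list[str] = []

    def stash(match: re.Match) -> str:
        blocks.append(match.group(0))
        return f"@@CODE{len(blocks) - 1}@@"

    return FENCE.sub(stash, markdown), blocks


def restore_code(text: str, blocks: list[str]) -> str:
    """Put the original, untranslated blocks back in place of the placeholders."""
    for index, block in enumerate(blocks):
        text = text.replace(f"@@CODE{index}@@", block)
    return text


sample = f"Run this:\n{FENCE_MARK}js\n// call the model\n{FENCE_MARK}\nDone."
protected, blocks = protect_code(sample)
print(protected)  # only this text would be sent to the translator
```

Whatever tool does the translation must leave the placeholders intact, which is worth checking before trusting the result.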

Tokenization tends to favor English words, phrases, and sentences: it now works with any other language, but why complicate things? Besides, in Italy, it’s common practice to include English terms in colloquial speech. This brings us to the next point in the description of my experience.

Translate With Artificial Intelligence

Let's get one thing straight: I didn’t use LLMs. Could I have? Yes. Will I do so in the future? Yes, especially after reading (and translating) the guide I’m talking about, but this time, that wasn’t the case. The reason is simple: I didn’t have a suitable AI model on my computer and wasn’t willing to use one in the cloud.

The level reached by artificial intelligence in translation (at least between Romance languages and English) is very high. Even using a graphical interface, I can rely on the generated results 99% of the time. Some flaws arise from differences between the languages, such as how they handle the second-person singular and plural.

Italian distinguishes “tu” from “voi”, while in English it is always “you”. This means that verbs, adjectives, and personal pronouns also change: only one of the two tools I used deduced the correct ones from the context. The other used the two variants at random, also modifying the tone of the sentences.

Using DeepL Translate

Spoiler: it’s still my favorite. That said, I actually started translating entire sections of the guide with DeepL, and I did it from the GUI rather than the API. I did this on purpose because I didn’t want an automated process: I wanted to be able to check the result word by word, especially given the unique nature of the Italian language.

The tool performed well, always providing a fairly accurate translation. One problem it shares with its rival is that in Italian we often prefer to leave certain English words untranslated, especially technical terms. With MCP, for instance, we don’t translate “tool” as “strumento”.

That’s mostly because the term refers to a specific item. In Italian, both “tool” and “instrument” translate as “strumento”: likewise, “model” and “template” are always “modello”. Imagine how many contradictions this could generate in a guide that uses these words frequently. Even so, DeepL gave the best result.
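A check like the one I did by eye can be partly automated: keep a small glossary of terms that should stay in English and flag the Italian renderings when they show up in the output. The glossary below is a hypothetical example with a single entry, and the function is a rough sketch rather than a real terminology tool.

```python
import re

# Hypothetical glossary: English term to keep -> Italian forms to flag.
GLOSSARY = {"tool": ["strumento", "strumenti"]}


def find_unwanted(translation: str, glossary: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (italian_form, english_term) pairs found in the translated text."""
    hits = []
    for keep, unwanted_forms in glossary.items():
        for form in unwanted_forms:
            # Whole-word, case-insensitive match on the unwanted form.
            if re.search(rf"\b{re.escape(form)}\b", translation, re.IGNORECASE):
                hits.append((form, keep))
    return hits


print(find_unwanted("Il server espone uno strumento al modello.", GLOSSARY))
# [('strumento', 'tool')]
```

It only reports candidates: whether “strumento” is actually wrong in a given sentence is still a decision for the person reviewing the translation.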

Using Google Translate

Believe me, it has improved a lot. I’ve been very impressed with the accuracy it’s gained lately: I avoided using it for years because it wasn’t up to par, but now I even recommend it. That doesn't mean it's a perfect tool; in fact, it hasn't always distinguished between “tu” and “voi”.

This is a rather big problem, because in Italian using one or the other completely changes the tone of the conversation. A fellow Italian wouldn’t understand why, in the same conversation, I had to switch from using “tu” to “voi” (which isn’t even used anymore except in the South).
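A quick way to spot this kind of inconsistency is to look for pronouns and possessives from both forms in the same text. The following is a crude heuristic under my own assumptions: it only looks at a handful of words I picked, it ignores verb endings entirely, and a hit just means the passage deserves a second look.

```python
import re

# Small, hand-picked word lists (an assumption of this sketch, far from complete).
TU_FORMS = {"tu", "tuo", "tua", "tuoi", "tue"}
VOI_FORMS = {"voi", "vostro", "vostra", "vostri", "vostre"}


def address_forms(text: str) -> set[str]:
    """Return which forms of address ("tu", "voi") appear in the text."""
    words = set(re.findall(r"[a-zàèéìòù]+", text.lower()))
    found = set()
    if words & TU_FORMS:
        found.add("tu")
    if words & VOI_FORMS:
        found.add("voi")
    return found


def is_mixed(text: str) -> bool:
    """True when both forms of address show up in the same text."""
    return address_forms(text) == {"tu", "voi"}


sample = "Apri il tuo terminale. Poi installate le vostre dipendenze."
print(is_mixed(sample))
```

Running it paragraph by paragraph, rather than on the whole document, makes the flagged spots easier to find.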

For the same reason, this tool often changed linguistic register, moving from a translation suitable for everyone to a more formal one. Here, it would have made sense to use an LLM with a prompt to guide the tone. But overall I’m not dissatisfied with this result either; quite the opposite.

Translate Without Artificial Intelligence

It’s still the preferable solution, provided you’re a professional translator (and I’m not). That is, if you want to do it completely on your own; otherwise, the ideal is to act as a supervisor, correcting the AI’s errors. I can give you other examples that concern my language and other similar ones.

For example, the use of quotation marks in Europe varies quite significantly across languages. AI tends to use only the straight single and double quotation marks, which are also available on the keyboard, but when it comes to texts that include dialogue and citations, these are completely inappropriate.

In Italian, we use angle quotation marks to enclose spoken text, as the French do; German uses low quotation marks at the beginning and high quotation marks at the end. This is one of the many reasons why algorithms are still immature, in my opinion. Another is the use of punctuation, which poses various problems.
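For the Italian convention, the fix after translation can be as simple as replacing straight double quotes with angle ones, alternating between opening and closing. This is a minimal sketch that assumes the quotes are balanced and never nested; it should not be run on code or markup, where straight quotes are required.

```python
def to_guillemets(text: str) -> str:
    """Replace straight double quotes with « and », alternating open and close."""
    result = []
    opening = True
    for char in text:
        if char == '"':
            result.append("«" if opening else "»")
            opening = not opening
        else:
            result.append(char)
    return "".join(result)


print(to_guillemets('Disse: "Ciao" e poi "Addio".'))
# Disse: «Ciao» e poi «Addio».
```

An unbalanced quote flips every pair that follows it, so the output still needs a human read-through.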

While I’m convinced that a hybrid approach remains the best, I’m starting to think that LLMs could take over, almost completely replacing human intervention. I’ve experimented with this approach, and the results have been astonishing. I think I’ll return to this topic in the future to show how I did it.