DEV Community

Kevin Luo


Dear AI, can you translate the Rails Guide for me?

TL;DR

I used the ChatGPT API to translate the Rails Guide into different languages:

Update on 2023/08/12

I added 3 more languages.

What's the Rails Guide?

I guess people reading this article already know Rails, but just in case, I'll briefly introduce Ruby on Rails and the Rails Guide. Feel free to skip this section if you already know them.

Ruby on Rails is a full-stack web application framework. With Rails, you can easily and safely build a website that reads data from your database and returns it as an API payload or renders it in the user's browser. The Rails Guide is the user manual that teaches developers how to use Rails. It is also crowd-sourced and lives in the same GitHub repository as Rails itself. Its quality is very high because it has been reviewed and revised again and again by numerous seasoned Rails developers. I definitely recommend that anyone who wants to learn Ruby on Rails read the guide first.

Why translate the Rails Guide?

Translating the Rails Guide is not about diversity. The Ruby on Rails guide is written exclusively in English, and that is totally fine. However, there are many talented developers around the world who just can't read English well. It's a real pity that they miss the chance to get to know this wonderful and powerful web framework just because the information isn't available in their language. I believe that translating the Rails Guide gives people all over the world a better chance to learn Rails.

Why use generative AI to translate the Rails Guide?

First of all, generative AI can produce more human-like text. Moreover, given more context, it can produce more accurate and suitable translations. You have probably read articles that you could immediately tell were translated by Google Translate because they felt so unnatural.

Second, although there are already many repositories of the Rails Guide in different languages (https://guides.rubyonrails.org/contributing_to_ruby_on_rails.html#translating-rails-guides), most of them are out of date. Those repositories also depend on volunteers' efforts. The Rails community used to have enthusiastic fans who were willing to help translate the guide, but since the popularity of Rails plummeted, there haven't been enough volunteers to continue the work. Using generative AI to translate documents saves time and human effort: a single person can easily refine the translated result, which also means the translations can be updated more frequently. It could be a more sustainable approach.

Proposed Workflow

My original plan was simple.

  1. Write a script to read the Rails Guide files and send their content to ChatGPT for translation into a specified language.
  2. Then use the existing Rails Guide script to generate HTML files, just like the current translation workflow.

I planned to wrap the code in a class, AiTranslator, so it would look like this:

[Image: Original Idea]
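As a rough illustration, the original plan might look like the sketch below. It is a hypothetical class, named NaiveAiTranslator to keep it apart from the real AiTranslator in the repository, and it assumes translations are written to guides/source/<language>/, the directory the guides generator is expected to read when GUIDES_LANGUAGE is set. The translator is injected as a callable so the sketch runs without network access.

```ruby
require "fileutils"

# Hypothetical sketch of the "original idea": send each guide file to the
# model in one request and write the result where the guides generator can
# pick it up. ASSUMPTIONS: sources are <source_dir>/*.md, translations go to
# <source_dir>/<language>/, and `translator` is any callable that takes a
# String and returns the translated String (e.g. a ChatGPT API wrapper).
class NaiveAiTranslator
  def initialize(source_dir, target_language, translator)
    @source_dir = source_dir
    @target_language = target_language
    @translator = translator
  end

  def translate_all
    Dir.glob(File.join(@source_dir, "*.md")).sort.each do |path|
      output_path = File.join(@source_dir, @target_language, File.basename(path))
      FileUtils.mkdir_p(File.dirname(output_path))
      # The whole file goes out in a single request: this is the step that
      # runs into the token limit.
      File.write(output_path, @translator.call(File.read(path)))
    end
  end
end

# Usage with a stand-in translator instead of a real API call:
# NaiveAiTranslator.new("guides/source", "jp", ->(text) { text }).translate_all
```

Sending a whole file per request is what makes this version simple, and also what breaks it for long guides.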

However, it was not as simple as I imagined 😅

Challenges

This seemingly simple task came with many challenges. Here are the most significant ones.

Tokens

ChatGPT and other generative AI models can only handle a limited number of tokens per request, and that limit covers both the input and the output. Tokens are not the same as characters or words, although their counts are correlated. Tokens are also the unit OpenAI uses to bill you.

The most popular model at the moment, gpt-3.5-turbo, only allows 4,097 tokens per request. Remember, that limit covers both input and output. That means I can't just upload a whole file to ChatGPT; I have to process each file piece by piece.
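Because the 4,097-token budget has to cover the prompt, the chunk, and the translated reply, a rough pre-flight check can help. The sketch below assumes OpenAI's rule of thumb that 100 tokens is roughly 75 English words, plus a hypothetical `output_ratio` for how much larger the reply is than the chunk. Real counts depend on the tokenizer and the target language, so an actual tokenizer would give more reliable numbers than word counts.

```ruby
# Rough token budgeting for a single gpt-3.5-turbo request
# (a 4,097-token limit shared by input and output).
# ASSUMPTION: ~100 tokens per 75 English words (OpenAI's rule of thumb).
CONTEXT_LIMIT = 4097

def estimated_tokens(text)
  (text.split.length * 100 / 75.0).ceil
end

# ASSUMPTION: the reply is about `output_ratio` times the chunk's token count;
# translations into non-Latin scripts often need more tokens than English.
def fits_in_one_request?(chunk, system_prompt, output_ratio: 2.0)
  input = estimated_tokens(system_prompt) + estimated_tokens(chunk)
  output = (estimated_tokens(chunk) * output_ratio).ceil
  input + output <= CONTEXT_LIMIT
end

prompt = "Translate the technical document to French without adding any new content."
chunk = (["word"] * 700).join(" ")
puts estimated_tokens(chunk)             # 934
puts fits_in_one_request?(chunk, prompt) # true
```

Under these assumptions, a 700-word chunk plus its prompt and reply comes to roughly 2,800 tokens, leaving some headroom, while a 2,000-word chunk would not fit.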

You may think: that's easy, just send one or two sentences per ChatGPT API call and you'll never exceed the limit.

You're right. However, each ChatGPT request is independent; requests don't share any context. Let me show you an example using the ChatGPT web page. If I ask ChatGPT "Do you know NBA?" and then ask "Who's the champion of 2019?", it answers that it's the Toronto Raptors.

[Image: context example 1]

However, if I ask only "Who's the champion of 2019?" in a new session, ChatGPT can't answer because it lacks context.

[Image: context example 2]
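The same thing happens through the API. Below is a minimal sketch of request bodies for OpenAI's Chat Completions endpoint, which takes a `model` and a list of `messages`, each with a `role` and `content`. It only builds Ruby hashes (no HTTP call), and `chat_payload` is a hypothetical helper. Each request carries only what is in its own `messages` array, so context has to be re-sent explicitly; the approach in this article instead packs more of the document into each single request.

```ruby
require "json"

# The Chat Completions API is stateless: the model only sees what is in the
# `messages` array of the current request. To "remember" an earlier turn,
# the client has to send it again. (Illustrative payloads only.)
def chat_payload(history, question, model: "gpt-3.5-turbo")
  messages = history.flat_map do |user_text, assistant_text|
    [{ role: "user", content: user_text }, { role: "assistant", content: assistant_text }]
  end
  messages << { role: "user", content: question }
  { model: model, messages: messages }
end

# With context: the earlier NBA turn travels along with the new question.
# The assistant reply is a placeholder, not a real model response.
with_context = chat_payload([["Do you know NBA?", "Yes, I know the NBA."]],
                            "Who's the champion of 2019?")
# Without context: a fresh request that contains only the question.
without_context = chat_payload([], "Who's the champion of 2019?")

puts JSON.pretty_generate(with_context)
```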

Unlike Google Translate, which works like an enhanced dictionary, a generative AI model is better treated like a very smart student: the more input you give it, the better the result it returns. So I want to feed ChatGPT as much text as possible, so it has enough context to translate the Rails Guide properly.

My approach looks like this:



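buffer = []
result = ''
File.readlines(file).each do |line|
  # Flush only at a blank line (a paragraph boundary) once the buffer
  # holds more than @buffer_size words
  if line == "\n" && buffer.join.split.length > @buffer_size
    translated_text = ai_translate(buffer.join)[:text]
    result += translated_text + "\n"
    buffer = []
  else
    buffer << line
  end
end
# Translate whatever is left after the last flush
result += ai_translate(buffer.join)[:text] unless buffer.empty?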


  1. I declare a buffer = [] at the beginning.
  2. I iterate over the file line by line, adding each line to buffer.
  3. When the number of words in the buffer exceeds a threshold, I send the buffered content to the ChatGPT API. The threshold, @buffer_size, defaults to 700 words. It's just an empirical magic number.
  4. Since paragraphs in Markdown are separated by blank lines, I only flush the buffer at a blank line, so a whole paragraph is always translated in one ChatGPT request.

Prompt phrase

The prompt given to the generative AI model affects the result drastically. I tried many different combinations and eventually settled on this:



LANGUAGES = {
  'zh-TW' => "Traditional Chinese used in Taiwan(台灣繁體中文).",
  'lt' => 'Lithuanian',
  'fr' => 'French',
  'pt-BR' => 'Brazilian Portuguese',
  'th' => 'Thai',
  'zh-CN' => 'Simplified Chinese',
}
system_prompt ||= "Translate the technical document to #{LANGUAGES[@target_language]} without adding any new content."


  • Translate the technical document: this tells the model that it is translating an excerpt of a technical document, so it knows not to translate elements such as code blocks.
  • LANGUAGES[@target_language]: I don't know whether this problem is unique to Traditional Chinese. Although both are written Chinese, the terminology, writing style and tone of Traditional Chinese in Taiwan are very different from those of Simplified Chinese, so I have to specify the language precisely to get the result I want.
  • without adding any new content.: It is also important to tell ChatGPT not to add extra information, because we're translating an article. Otherwise, it behaves like those annoying students in class who keep talking and adding needless information.
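Here is a minimal sketch of how this prompt could be packaged into a Chat Completions request body. It assumes the prompt is sent as the `system` message and the buffered Markdown chunk as the `user` message; the article doesn't show the request itself, so that split is an assumption, and `translation_request` is a hypothetical helper. The hash is a trimmed copy of LANGUAGES.

```ruby
require "json"

# Trimmed copy of the LANGUAGES hash, enough for the example.
LANGUAGES = {
  'zh-TW' => "Traditional Chinese used in Taiwan(台灣繁體中文).",
  'fr' => 'French',
}

# Hypothetical helper: the instruction goes in the system message and the
# buffered Markdown chunk goes in the user message. `fetch` raises KeyError
# for an unknown language code instead of silently interpolating nil.
def translation_request(chunk, target_language, model: "gpt-3.5-turbo")
  system_prompt = "Translate the technical document to #{LANGUAGES.fetch(target_language)} without adding any new content."
  {
    model: model,
    messages: [
      { role: "system", content: system_prompt },
      { role: "user", content: chunk }
    ]
  }
end

body = translation_request("Rails is a web application framework.\n", 'fr')
puts JSON.pretty_generate(body)
```

Using `fetch` is a deliberate choice: a typo in the language code fails loudly rather than producing a prompt like "Translate the technical document to  without adding any new content."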

Markdown parsing

The Rails Guide is full of code blocks with code examples, and it makes sense not to split a code block across separate requests. So I made the line reader a simple state machine: it switches to the :codeblock state when it starts reading a code block and won't call the ChatGPT API until that block ends.



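state = :readline
buffer = []
result = ''
fence = '`' * 3 # three backticks, built this way so Dev.to doesn't break the code block
File.readlines(file).each do |line|
  if line.include?(fence)
    # A fence line opens or closes a code block; keep it with the block
    buffer << line
    state = state == :codeblock ? :readline : :codeblock
  elsif line == "\n" && state == :readline && buffer.join.split.length > @buffer_size
    # Flush only at a blank line outside code blocks
    translated_text = ai_translate(buffer.join)[:text]
    result += translated_text + "\n"
    buffer = []
  else
    buffer << line
  end
end
# Translate whatever is left after the last flush
result += ai_translate(buffer.join)[:text] unless buffer.empty?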


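To see the state machine's behavior without calling the API, here is a self-contained sketch that returns the chunks instead of translating them. `chunk_markdown` is a hypothetical helper that follows the same loop, plus a final flush of whatever is left in the buffer.

```ruby
FENCE = "`" * 3 # three backticks

# Hypothetical helper: split Markdown lines into chunks at blank lines once a
# chunk exceeds `buffer_size` words, but never inside a fenced code block.
def chunk_markdown(lines, buffer_size)
  chunks = []
  buffer = []
  state = :readline
  lines.each do |line|
    if line.include?(FENCE)
      buffer << line
      state = state == :codeblock ? :readline : :codeblock
    elsif line == "\n" && state == :readline && buffer.join.split.length > buffer_size
      chunks << buffer.join
      buffer = []
    else
      buffer << line
    end
  end
  chunks << buffer.join unless buffer.empty?
  chunks
end

lines = [
  "Intro text here\n", "\n",
  FENCE + "ruby\n", "a = 1\n", "\n", "b = 2\n", FENCE + "\n", "\n",
  "Outro\n"
]
chunks = chunk_markdown(lines, 1)
chunks.each_with_index { |c, i| puts "--- chunk #{i}\n#{c}" }
```

With a threshold of 1 word, the blank line inside the Ruby block does not trigger a flush, so the code block stays in one chunk; the next blank line outside it does.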

Anchors

When you open any Rails Guide page, you'll see a Chapters block on the right that serves as a table of contents.

[Image: the Chapters block]

That table is generated automatically by a script. Each heading (<h1>, <h2>, <h3>, etc.) is assigned an id based on the heading's text. For example, if the title in the Markdown is "Guide Assumptions",



### Guide Assumptions



it is rendered in the final HTML as



<h3 id="guide-assumptions">...</h3>



The links in the table of contents then point to the elements with those id values.
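To make the title-to-id mapping concrete, here is a minimal sketch. `heading_id` is hypothetical and uses a simplified slug rule; the real Rails guides generator has its own rules, so actual ids may differ in details. Under this rule, a translated heading yields an id made of the translated text, so each translated page ends up with different ids from the English one.

```ruby
# ASSUMPTION: a simplified slug rule for illustration (downcase, whitespace
# to hyphens, drop other punctuation). Not the generator's exact algorithm.
def heading_id(title)
  title.strip.downcase.gsub(/[[:space:]]+/, "-").gsub(/[^\p{Word}-]/, "")
end

puts heading_id("Guide Assumptions") # guide-assumptions
puts heading_id("What's New?")       # whats-new
puts heading_id("指南假設")          # 指南假設
```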

This works fine in the original Rails Guide: when you click a link in the Chapters block, the browser jumps to the corresponding section. However, the links break once all the titles are translated. After some investigation, I found the problem was related to Turbo; I suspect it's a bug in Turbo. My current workaround is to disable Turbo for the links in the Chapters block.



<ol class="chapters" data-turbo="false">
...
</ol>



Code

Repository: https://github.com/kevinluo201/rails-guide-ai
This repo is forked from the Rails repo so that it can pull in updates to the guide's files. It adds only two new files:

  • guides/rails_guides/ai_translator.rb: the main program.
  • guides/ai_translate.rb: the entry point.

If you want to play around with it, follow these steps:

  1. Set a new environment variable called OPENAI_ACCESS_TOKEN to your personal OpenAI access token.
  2. Add a new language in RailsGuide::AiTranslator, for example, 'jp' => 'Japanese'.
  3. Open a terminal, go to guides/ and start translating by running:

ruby ./ai_translate.rb jp

  4. You can also translate a single file by adding its filename after the command:

ruby ./ai_translate.rb jp getting_started.md

  5. After all files are translated, run the existing Rails script to generate the HTML, CSS and JS:

bundle exec rake guides:generate:html GUIDES_LANGUAGE=jp

Unfortunately, it is likely to fail. Usually that's because there are duplicated titles, which lead to duplicated ids in the HTML. You can fix it by finding the title that causes the problem and changing it slightly. Other problems can show up depending on the target language; just keep fixing them until the process finishes.
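Finding the duplicated titles by hand is tedious. Below is a minimal sketch of a hypothetical helper (not part of the repository) that lists heading texts appearing more than once in each translated Markdown file. It assumes ATX headings (`## Title`) and setext headings (a title underlined with `===` or `---`), skips fenced code blocks, and compares raw heading text rather than the generator's actual ids, so treat its output as a hint.

```ruby
FENCE = "`" * 3 # three backticks

# Hypothetical helper: heading texts that occur more than once in a file.
def duplicate_headings(markdown)
  in_code = false
  previous = ""
  counts = Hash.new(0)
  markdown.each_line do |line|
    if line.start_with?(FENCE)
      in_code = !in_code
    elsif !in_code
      if (m = line.match(/\A#+\s+(.+?)\s*\z/))
        counts[m[1]] += 1                # ATX heading: "## Title"
      elsif line.match?(/\A(=+|-+)\s*\z/) && !previous.strip.empty?
        counts[previous.strip] += 1      # setext heading: title on the previous line
      end
    end
    previous = line
  end
  counts.select { |_, n| n > 1 }.keys
end

# Report duplicates for every .md file in the given directory.
if __FILE__ == $PROGRAM_NAME
  Dir.glob(File.join(ARGV.fetch(0, "."), "*.md")).sort.each do |path|
    dups = duplicate_headings(File.read(path))
    puts "#{path}: #{dups.join(', ')}" unless dups.empty?
  end
end
```

Saved under a name of your choice (for example, a hypothetical find_duplicate_headings.rb), it could be run as `ruby find_duplicate_headings.rb source/jp` from guides/.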




Help Wanted

This is only an experimental project for now, and there are several issues that could be improved. If you think it's an interesting topic, feel free to discuss it with me.

Current Issues

Anchor links

The table of contents problem is solved by disabling Turbo. However, there are also anchor links scattered throughout the articles. They can't be converted to the correct URLs smoothly, especially when they refer to an anchor on another page.

Versioning

The Rails Guide has versions; each version is essentially a snapshot of the guide at a particular time. I haven't thought of a good way to manage them yet.

Different models

I'm currently using gpt-3.5-turbo. I live in Canada, so I can't use Google's Bard. Feel free to change the code so it can switch between models, such as GPT-4 or Llama 2.

EPUB

EPUB files can be generated by the Rails Guide script. However, I get errors when importing them into EPUB reader software such as Books on macOS. I think it may be related to the broken anchor links.

Other stuff

If you have any ideas that could make this project more sustainable, please discuss them with me. For example, since it's a guide for Rails, why not build it as a Rails app?

Conclusion

The quality of the AI translation is not perfect, but it is acceptable, and I'm not too concerned about it. As far as I can see, the token limit and the trained model are the most significant factors, and I believe swapping the current model (gpt-3.5-turbo) for a more advanced one will solve this in the future. The result shows that this workflow really works, and that's the most important lesson for me.

As for the cost, I ran many experiments for this idea and translated the Rails Guides into 6 different languages. It cost me about $27 in total, so each translation cost less than $5 on average ($27 / 6 = $4.50). The actual cost per language should be even lower, because many experiments simply failed.

[Image: usage chart]

*Due to its good quality and low cost, generative AI might be a good solution for translating the technical documentation of open-source projects.*

Buy me a coffee

Finally, if you like what I'm doing, you can buy me a coffee 😉☕️
[Buy Me A Coffee](https://www.buymeacoffee.com/kevinluo)

Top comments (12)

Yohei Yasukawa

Thanks for your informative article on translation! I am one of core maintainers of the Japanese version of Rails Guides.

there are already many repositories of rails guide in different languages, guides.rubyonrails.org/contributin.... However, the problem is that they're all out of date.

That may be true of some translation projects, but not of the Japanese one at least. Our repository has been very active since we released it in 2014, as have the rorlakr/rails-guides and morsbox/rusrails repos. For the Japanese one, you can see how actively we maintain it here: github.com/yasslab/railsguides.jp

But anyway, I know some other translation projects are not actively maintained, and I'm personally interested in your approach. So I'd be glad if the information above helps make your article more precise. :)

Kevin Luo

Thanks for your comment!

However, the problem is that they're all out of date.

Sorry, I shouldn't have used "All". I checked the repos and agree with you that the Japanese translation is pretty active. I'm so envious of that 🥹 If I remember correctly, not only do open-source communities' documents often have up-to-date Japanese translations, but Japanese translations of new computer books are also released very quickly after the originals are published.

Anyway, I think it's still a good idea to use an LLM like ChatGPT to generate an initial version of a translation. It can save volunteers a lot of time. Since translating an open-source project's documentation is unpaid, making volunteers' lives easier could make people more willing to participate and to stay with the project longer.

Yohei Yasukawa

Yeah, using "Some" or maybe "Most of" instead of "All" makes this article more precise. ;)


P.S.

In the Japanese translation project we already use AI-powered tools like DeepL Pro for a draft translation since 2018. And yes, it helps a lot!

Also we do fundraising for the Japanese documentation, which helps to continue the project and reduce the cost to maintain. It definitely helps to learn, especially for new Rails developers in Japan. ;)

Our project is well-documented in Japanese but not in English because most of our expected users are Japanese speakers. But I hope our example helps to translate in other languages. :D

Kevin Luo

I've updated it.

Isis Tejeda

Hi Kevin. Nice job on this. I was recently thinking of using ChatGPT to update my current version.
I'm the one who translated the Rails 6 guide into Spanish (and yes, it's not fully updated; a change messed up the styling). It's a lot of work and hours to translate it the 'traditional' way, and due to time constraints, yes, it's hard to keep them maintained.

Will try to come back and see how your project continues. I'm curious if others are using the translations you did.

Kevin Luo

Hey @isis, I haven't checked its status recently because my son was born after I finished this side project. 😁
Here are today's Google Analytics results for those websites:
I think most users are from Asia.

Anyway, after more than a year, LLM technology has improved a lot; for example, the maximum number of tokens is now much larger than it was. I recently heard about the concept of "embeddings": store the document in a vector database first, so accessing it won't count any tokens. I think what we'd better do now is wait a little longer. Let the tech giants compete among themselves and push LLMs' limits as far as possible; then we can start again. And I don't think that will happen in the very near future.

Kevin Luo

@isis My other thought is that it might never achieve the highest quality with the current LLM approach, because nondeterminism is in the nature of LLMs. Maybe we should instead provide open-source project translators with a better, AI-assisted translation tool.

Michael Tharrington

Wow, this is awesome, Kevin!

Nezir Zahirovic

Great job!

Jett Liya

In the vast landscape of web development, mastering frameworks like Ruby on Rails is pivotal for building robust and scalable applications. However, delving into the Rails Guide can often feel like navigating a labyrinth of technical jargon and complex concepts.

Enter AI, the beacon of hope for developers seeking clarity amidst the intricacies of the Rails Guide. With its advanced algorithms and natural language processing capabilities, AI offers a transformative solution to the challenge of translation.

Imagine a world where every line of the Rails Guide is effortlessly translated into clear, concise language, accessible to developers of all levels. No longer do you need to grapple with obscure terminology or convoluted explanations – AI bridges the gap between complexity and comprehension with ease.

From beginners seeking to grasp fundamental concepts to seasoned developers navigating advanced features, AI-driven translation promises to unlock the full potential of the Rails Guide for all. With AI as our ally, we can embark on a journey of discovery and mastery within the realm of Ruby on Rails, empowered by clarity and insight.

So, dear AI, can you translate the Rails Guide for me? The answer is a resounding yes – and with AI by our side, the possibilities are endless.

If you want to learn more about AI and AI tools, visit this website:
aichief.com/

Renny Ren

You may use the gpt-3.5-turbo-16k model to address the token issue.

Kevin Luo

Yeah, but that only mitigates the problem a bit, since it still can't take in a whole article at once.