A while back, one of the things I did to get familiar with Ruby on Rails was to translate the Rails Guides to Spanish. Yes, it takes a lot of time. And just like any documentation, it gets outdated rather quickly.
A little background
The original translation took months of work, done before work, after work, or on the weekends.
MR to update Spanish translation
Since then, there was always the intent, and very slow progress, toward updating to version 7, but it never got done. And now we can use Google Translate or ChatGPT, so why would we really need translated guides?
This month, I started to look up what the current versions were and whether any work had been done on translations. I learned a few things.
1: I found an article by someone already using AI to translate them: "Dear AI, can you translate the Rails Guide for me?"
2: The translation links were removed from Rails in Oct 2023. It appears Rails wants to find a way to support them natively.
3: Rails 7.2 had just come out.
All of this got me to test what OpenAI could do for translating the guides to Spanish, instead of doing them manually.
Thank you to Kevin for taking the step to share your article a year ago about translating the Rails Guides into various languages. I started off with the same file names, but most of the content has been edited.
So how much would this cost:
According to GPT
“The GPT-4 model can handle up to approximately 25,000 words in a single interaction, which includes both the input message and the response generated by the model. Since 25,000 words is roughly equivalent to 100,000 characters, this would be the upper limit for the total number of characters that can be processed in one interaction.
Given that this total includes both the message you send and the model's response, the maximum length of the message you could send would be somewhat less than 100,000 characters. A good estimate would be to assume that about half of this capacity could be used for your input message, meaning the maximum length of a message you could send might be around 50,000 characters, allowing enough space for a response of similar length.”
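If a file ever did exceed that rough budget, one option would be to split it before sending. The sketch below is only an illustration of that idea, not something the project does: `split_into_chunks` and `MAX_INPUT_CHARS` are hypothetical names, the 50,000 figure is just the estimate from the quote, and splitting on blank lines could still cut through a fenced code block that contains blank lines.

```ruby
# Rough input budget taken from the quoted estimate, not a measured limit.
MAX_INPUT_CHARS = 50_000

# Hypothetical helper: groups paragraphs (separated by blank lines) into
# chunks of at most max_chars characters. A single paragraph longer than
# max_chars is kept whole rather than cut mid-sentence.
def split_into_chunks(text, max_chars = MAX_INPUT_CHARS)
  chunks = []
  current = ""
  text.split(/\n{2,}/).each do |paragraph|
    candidate = current.empty? ? paragraph : "#{current}\n\n#{paragraph}"
    if current.empty? || candidate.length <= max_chars
      current = candidate
    else
      chunks << current
      current = paragraph
    end
  end
  chunks << current unless current.empty?
  chunks
end
```

Each chunk could then be translated on its own and the results joined back together with blank lines.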
I wanted to know how much this was going to cost. At the time of doing this, the gpt-4o-mini model was the more cost-efficient option for running the files through translation, compared with gpt-4o-2024-08-06.
I did a couple of test runs with both models and compared the translated output. I preferred the output of gpt-4o-2024-08-06: it came out closer to what I wanted it to look like, even with the same prompts. As I started to read the content of the guides, I also preferred the language it used.
By default it also did not translate code blocks.
A few things I did in my code:
I used the openai-ruby gem.
I also wanted to handle the translation of all the files in a single method, instead of writing explicit instructions and a separate translation method for each type of file.
I ended up with this.
So I am just sending in the file and the prompts that instruct the model.
There is still a separate method for each of the 3 types of files in the guides. This was done just to give each type different prompts.
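A minimal sketch of that shape, not the project's actual code. It assumes a `client` object that responds to `chat(parameters:)` and returns a Hash shaped like a Chat Completions response, which is how the community ruby-openai gem behaves; the gem used in the project may differ. The prompt texts and the `PROMPTS` lookup are hypothetical, and the sketch picks the prompt from a hash by file extension rather than using one method per file type.

```ruby
MODEL = "gpt-4o-2024-08-06"

# Hypothetical prompts, one per file type found in the guides.
PROMPTS = {
  ".md"   => "Translate this Markdown guide to Spanish. Leave code blocks and link targets unchanged.",
  ".yaml" => "Translate only the human-readable values in this YAML file to Spanish.",
  ".erb"  => "Translate only the visible text in this ERB template to Spanish."
}.freeze

# Single translation method: the file content goes in as the user message,
# the prompt for that file type goes in as the system message.
def translate(client, content, prompt)
  response = client.chat(
    parameters: {
      model: MODEL,
      messages: [
        { role: "system", content: prompt },
        { role: "user", content: content }
      ]
    }
  )
  response.dig("choices", 0, "message", "content")
end

# Looks up the prompt by extension ("index.html.erb" resolves to ".erb").
def translate_file(client, path)
  translate(client, File.read(path), PROMPTS.fetch(File.extname(path)))
end
```

Passing the client in as an argument keeps the method easy to try out with a stand-in object before spending any tokens.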
Some of the differences that I saw between models and the initial work:
Bulleted items were coming back with dashes instead of bullet points.
Some of the headings in the markdown files were also coming out like:

    #Action Mailer Basics

instead of

    Action Mailer Basics
    ====================
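One way to catch this after the fact, separate from any prompt changes, is a small post-processing pass over the translated Markdown. This is only a sketch under assumptions: `setext_h1` is a hypothetical helper, it rewrites only top-level `#` headings, and it treats any line starting with three backticks as a code fence so that Ruby comments inside code blocks are left alone.

```ruby
FENCE = "`" * 3 # the Markdown code-fence marker

# Hypothetical helper: turns "#Title" or "# Title" lines into the
# underlined (setext) style used by the guides, skipping fenced code.
def setext_h1(markdown)
  in_fence = false
  lines = markdown.each_line(chomp: true).flat_map do |line|
    in_fence = !in_fence if line.start_with?(FENCE)
    match = in_fence ? nil : line.match(/\A#\s*([^#\s].*)\z/)
    match ? [match[1], "=" * match[1].length] : [line]
  end
  lines.join("\n")
end
```

Note that the result has no trailing newline, so one would need to be added back before writing the file.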
I also wanted to keep the hyperlinks in the same place in the markdown file instead of at the bottom of the page.
This also got rid of the error where links within the page did not work, so I no longer had to edit those manually.
The link to the project is: https://github.com/latinadeveloper/railsguides.es/tree/es-translation-7-2
There is a directory, es-7-mini-model, that has the files translated with the mini model, in case anyone is curious about the output and translation.
I only did the translation for the website and chose not to do the EPUB (Kindle) version for now.
One thing I did run into was the 429 error, Too Many Requests.
I didn't do any work on translating the files in batches, so that will be a follow-up on this project. Some inner page links are still not fully generated by OpenAI; further work on the prompts is needed there.
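A common way to deal with 429 responses is to retry with exponential backoff. The sketch below is not part of the project: `RateLimitError` is a hypothetical stand-in for whatever error the HTTP client actually raises on a 429, and `with_backoff` is an illustrative name.

```ruby
# Hypothetical error class standing in for the client's real 429 error.
class RateLimitError < StandardError; end

# Runs the block and, on RateLimitError, waits base_delay, then 2x, 4x, ...
# before trying again. Gives up and re-raises after max_attempts tries.
# `sleeper` is injectable so the helper can be exercised without waiting.
def with_backoff(max_attempts: 5, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue RateLimitError
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

Usage would look like `with_backoff { translate_one_file(path) }`, where `translate_one_file` is whatever call hits the API and is expected to raise the rate-limit error.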
Total tokens: 772322, Total cost: $3.86161
Screenshot from start to finish, including set up.
Breakdown of each file.
File Name | Total Tokens | Cost ($) |
---|---|---|
7_0_release_notes.md | 5765 | 0.028825 |
7_1_release_notes.md | 16434 | 0.08217 |
active_record_callbacks.md | 12023 | 0.060115 |
active_record_composite_primary_keys.md | 21377 | 0.106885 |
active_record_encryption.md | 4598 | 0.02299 |
active_record_migrations.md | 11944 | 0.05972 |
active_record_multiple_databases.md | 29971 | 0.149855 |
active_record_postgresql.md | 11692 | 0.05846 |
active_record_querying.md | 12190 | 0.06095 |
active_record_validations.md | 39487 | 0.197435 |
active_storage_overview.md | 28775 | 0.143875 |
active_support_instrumentation.md | 28302 | 0.14151 |
action_mailer_basics.md | 15902 | 0.07951 |
api_app.md | 17878 | 0.08939 |
autoloading_and_reloading_constants.md | 9556 | 0.04778 |
configuring.md | 14584 | 0.07292 |
api_documentation_guidelines.md | 85618 | 0.42809 |
asset_pipeline.md | 7723 | 0.038615 |
association_basics.md | 23343 | 0.116715 |
caching_with_rails.md | 41625 | 0.208125 |
classic_to_zeitwerk_howto.md | 14966 | 0.07483 |
command_line.md | 8878 | 0.04439 |
contributing_to_ruby_on_rails.md | 13416 | 0.06708 |
debugging_rails_applications.md | 18054 | 0.09027 |
development_dependencies_install.md | 22538 | 0.11269 |
documents.yaml | 4042 | 0.02021 |
engines.md | 2288 | 0.01144 |
error_reporting.md | 23233 | 0.116165 |
form_helpers.md | 3673 | 0.018365 |
generators.md | 29046 | 0.14523 |
getting_started.md | 9454 | 0.04727 |
getting_started_with_devcontainer.md | 33916 | 0.16958 |
i18n.md | 2437 | 0.012185 |
index.html.erb | 27371 | 0.136855 |
initialization.md | 762 | 0.00381 |
layout.html.erb | 9338 | 0.04669 |
layouts_and_rendering.md | 3821 | 0.019105 |
maintenance_policy.md | 26178 | 0.13089 |
plugins.md | 2088 | 0.01044 |
rails_application_templates.md | 7718 | 0.03859 |
rails_on_rack.md | 3825 | 0.019125 |
routing.md | 5289 | 0.026445 |
ruby_on_rails_guides_guidelines.md | 29928 | 0.14964 |
security.md | 2555 | 0.012775 |
testing.md | 36505 | 0.182525 |
threading_and_code_execution.md | 38265 | 0.191325 |
tuning_performance_for_deployment.md | 5691 | 0.028455 |
working_with_javascript_in_rails.md | 6391 | 0.031955 |
upgrading_ruby_on_rails.md | 6372 | 0.03186 |
Total | 772322 | 3.86161 |
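The cost column follows directly from the token counts. Dividing any row's cost by its tokens gives a flat $5 per million tokens; that rate is inferred from the table, not taken from a price list, and since OpenAI bills input and output tokens at different rates the actual invoice may differ. A quick check, with `cost_for` as a hypothetical helper name:

```ruby
# Flat rate inferred from the table above (cost / tokens).
USD_PER_MILLION_TOKENS = 5

def cost_for(tokens)
  tokens * USD_PER_MILLION_TOKENS / 1_000_000.0
end

puts cost_for(5_765)    # 7_0_release_notes.md → 0.028825
puts cost_for(772_322)  # total → 3.86161
```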
After all of this, I still need to read them, and I would love for other native Spanish speakers to contribute if things are off.
At the end of the day, I wanted to have them updated in case it helps anyone who prefers to read them in their own language, and it was a fun way to try out OpenAI.
🐝 Inspired