DEV Community

JimmyLiao for Google Developer Experts


Free Academic Paper Translation with TranslateGemma

The Problem: You're Wasting Hours on Research Papers

Here's a situation you might recognize:

You open an arXiv paper. It's groundbreaking work in your field. You need to understand it. But after 20 minutes staring at the abstract, you've only grasped about 60% of what's happening.

So you start the copy-paste dance:

  1. Highlight a paragraph 📋
  2. Open DeepL in another tab
  3. Paste and translate
  4. Copy translation back to your notes
  5. Lose all formatting 😫
  6. Repeat 47 more times...

Three hours later, you're exhausted, your notes are a mess, and you're not even sure you understood the methodology correctly.

What if I told you there's a better way?

In this post, I'll show you how to translate entire arXiv papers into beautiful bilingual HTML, original and translation side by side, using Google's TranslateGemma model on a free Colab GPU.

Set it up once, translate forever. Let's dive in.


🎯 What Makes This Different?

Before we jump into the tutorial, let's understand why this approach beats traditional tools:

TranslateGemma is Like a Specialized Translator for Academics

Think of general translation APIs (DeepL, Google Translate) as generalist interpreters. They're great at casual conversations but sometimes stumble on domain-specific jargon.

TranslateGemma is like hiring a PhD student who speaks both languages. It understands:

  • Technical terminology in context
  • Academic writing conventions
  • The difference between "model" (ML model) and "model" (fashion model)
  • How to preserve mathematical notation

The Bilingual HTML is Like Having Training Wheels

Instead of reading pure translation, you get:

┌─────────────────────────────────┬─────────────────────────────────┐
│ Original (English)              │ Translation (Your Language)     │
├─────────────────────────────────┼─────────────────────────────────┤
│ This work introduces Gemma...   │ 本研究介紹了 Gemma...           │
│ ...                             │ ...                             │
└─────────────────────────────────┴─────────────────────────────────┘
         ↑ Navigate with ← → keys ↑

This means:

  • ✅ Learn English while reading in your language
  • ✅ Check translations when something feels off
  • ✅ Build vocabulary by seeing terms in context

⚡ Why Free Colab T4 GPU Changes Everything

The old way:

  • Pay $0.01-0.02 per page for API credits
  • Or run models locally (if you have a GPU)
  • Or keep copy-pasting...

The new way:

  • Google Colab gives you free T4 GPU access (15GB VRAM)
  • TranslateGemma 4B fits comfortably in that memory
  • Zero cost for reasonable daily usage
| What You Get                          | The Cost |
|---------------------------------------|----------|
| Tesla T4 GPU (15GB)                   | $0       |
| TranslateGemma 4B model               | $0       |
| ~3 minutes per page                   | $0       |
| Unlimited papers (within daily quota) | $0       |

The catch? About 3 minutes per page translation time. But honestly? That's the time you'd spend copy-pasting anyway, and you get much better results.


🚀 Let's Build This: 10-Minute Setup

Instead of drowning you in theory, let's get your first paper translated. We'll explain what's happening as we go.

Prerequisites (5 minutes setup, one-time)

You'll need:

  1. Google Account (for Colab)
  2. HuggingFace Account (sign up free)
  3. HF Token with read access (create here)
  4. Accept Gemma Terms (click here)

Step 1: Open Notebook & Enable GPU (1 minute)

Click this badge:

👉 Click here to open in Google Colab

Then:

  1. Runtime menu → Change runtime type
  2. Select T4 GPU from dropdown
  3. Click Save

Why T4? It's the free-tier GPU that's perfect for this task: enough memory for the 4B model, but not overkill.


Step 2: Run Environment Detection (30 seconds)

Execute the first code cell (click ▶️ or press Shift+Enter):

# This auto-detects whether you're on Colab, GCP, or local Jupyter
ENV = detect_environment()

Output:

================================================================================
🔍 Environment Detection
================================================================================
🖥️  Environment: COLAB
🐍 Python: 3.10
📂 Working dir: /content
================================================================================
✅ Environment: COLAB - Ready!
================================================================================

Environment Detection Output

The notebook automatically detects your runtime environment

What's happening here?
The notebook adapts to your environment automatically. Same notebook works on:

  • Google Colab (most users)
  • GCP Custom Runtime (advanced)
  • Local Jupyter (if you have GPU)

No need to modify code; it just works™.


Step 3: Install Dependencies (2 minutes)

Next cell installs packages based on your environment:

# Colab gets lightweight dependencies
!pip install -q huggingface_hub transformers accelerate \
             sentencepiece protobuf pymupdf pillow \
             opencc-python-reimplemented

Key package: opencc-python-reimplemented

This ensures that if you're translating to Traditional Chinese (Taiwan/Hong Kong), you get 基於 not 基于. Small details matter in academic writing.

Just click ▶️ and wait for installation to complete.


Step 4: Authenticate with HuggingFace (1 minute)

The notebook will prompt for your HF token:

๐Ÿ“ Please enter HuggingFace Token:
   ๐Ÿ’ก Tip: Use Colab Secrets (๐Ÿ”‘ icon) for better security
   1. Get token: https://huggingface.co/settings/tokens
   2. Accept model: https://huggingface.co/google/translategemma-4b-it

Token: โ–ˆ
Enter fullscreen mode Exit fullscreen mode

Paste your token and press Enter. Done.

Security tip: Use Colab's built-in secrets manager (🔑 sidebar icon) instead of pasting tokens directly if you're sharing notebooks.
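
If you want the token lookup to behave the same everywhere, here's a minimal sketch. The HF_TOKEN secret name and the fallback order are my own convention, not something the notebook mandates:

```python
import os
from getpass import getpass

def get_hf_token() -> str:
    """Resolve the HuggingFace token: env var, then Colab secrets, then a hidden prompt."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get("HF_TOKEN")    # the 🔑 secrets manager in the sidebar
    except ImportError:
        # Local Jupyter: fall back to a hidden interactive prompt
        return getpass("HuggingFace token: ")
```

Store the token once under the 🔑 icon (or an environment variable) and every future session picks it up without pasting.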


Step 5: Load the Model (First run: 5 min, After: 30 sec)

This is where the magic happens:

from transformers_backend import TransformersBackend

backend = TransformersBackend()
result = backend.load_model()

First run output:

🚀 Loading TranslateGemma (4B)...
   ⏳ Downloading model (~8.6GB) on first run...

Downloading: 100% |████████████████████| 8.6G/8.6G [04:32<00:00, 31.5MB/s]

✅ Model loaded!
📍 Device: cuda:0
📊 Load time: 37.8s
💾 Memory: 13.8 GB used / 15.0 GB total
🎉 Ready to translate!

What just happened?

  • Downloaded TranslateGemma 4B (8.6GB) to Colab's disk
  • Loaded model into GPU memory
  • Cached for future runs (next time: 30 seconds!)

Grab a coffee ☕ on first run. It's worth the wait.


Step 6: Configure Your Translation (30 seconds)

Now for the fun part: telling it what to translate.

# Which paper?
ARXIV_ID = "2403.08295"  # Gemma paper (or any arXiv ID)

# Which pages?
SECTIONS = {
    "abstract": (1, 3),  # Pages 1-3
}

# What languages?
SOURCE_LANG = "en"
TARGET_LANG = "zh-TW"  # Traditional Chinese (Taiwan)

# Generate beautiful HTML?
SAVE_HTML = True
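
Under the hood, a SECTIONS dict like the one above simply expands into a list of page numbers to process. A tiny helper (my own illustration, not the notebook's actual code) makes the mapping explicit:

```python
def pages_from_sections(sections: dict[str, tuple[int, int]]) -> list[int]:
    """Expand {name: (first_page, last_page)} ranges into a sorted list of unique pages."""
    pages = set()
    for first, last in sections.values():
        # Ranges are inclusive on both ends, matching the (1, 3) = "pages 1-3" convention
        pages.update(range(first, last + 1))
    return sorted(pages)

# The config above selects pages 1-3:
print(pages_from_sections({"abstract": (1, 3)}))  # → [1, 2, 3]
```

Overlapping sections are de-duplicated, so each page is only translated once.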

Customization examples:

Translate intro section only:

SECTIONS = {
    "intro": (2, 5),
}

Translate to Japanese:

TARGET_LANG = "ja"

Translate everything:

SECTIONS = {
    "full": (1, 20),  # All pages
}

Supported languages: 50+ including zh-TW, zh-CN, ja, ko, fr, de, es, pt, ru, etc.


Step 7: Hit Translate! (3 min per page)

Execute the translation cell:

# Download PDF from arXiv
pdf_path, total_pages = download_arxiv(ARXIV_ID)

# Translate page by page with progress bar
from tqdm import tqdm

results = []
with tqdm(total=3, desc="📖 Translating") as pbar:
    for page_num in range(1, 4):
        text = extract_text(pdf_path, page_num)
        result = backend.translate(text,
                                   source_lang=SOURCE_LANG,
                                   target_lang=TARGET_LANG)
        results.append(result)
        pbar.update(1)

Real output from my test:

📥 Downloading arXiv:2403.08295
✅ Downloaded: 2403.08295.pdf (17 pages)

================================================================================
🚀 Translation Started
================================================================================
📊 Pages: 3
⏱️  Est. time: ~9 minutes

📖 Translating: 33% |████████▌         | 1/3 [02:48<05:36, 168.05s/page]
✅ Page 1: 168.05s

📖 Translating: 67% |█████████████████ | 2/3 [05:51<02:43, 163.25s/page]
✅ Page 2: 163.25s

📖 Translating: 100% |█████████████████| 3/3 [08:37<00:00, 166.29s/page]
✅ Page 3: 166.29s

================================================================================
✅ Translation Complete!
================================================================================
📊 Pages: 3
⏱️  Total: 8 min 37 sec
⚡ Avg: 2.8 min/page
================================================================================

Translation Progress

Live translation progress with tqdm showing real-time status per page

What's happening under the hood?

For each page, the backend:

  1. Sends text to TranslateGemma with a simple, direct prompt:

Translate the following text from en to Traditional Chinese (Taiwan, 繁體中文).
Only output the translation, do not include explanations:

[Original text here]

Translation:

  2. Model generates translation using GPU acceleration
  3. Extracts clean translation from output
  4. Applies OpenCC post-processing (for zh-TW)

Pro tip: Pages with heavy math/tables take similar time; the model handles them well.


Step 8: View Results in Notebook

Immediately after translation, you'll see:

================================================================================
📄 Page 1 - ABSTRACT
================================================================================

📝 Original:
--------------------------------------------------------------------------------
This work introduces Gemma, a family of lightweight, state-of-the-art open
models built from the research and technology used to create Gemini models.
Gemma models demonstrate strong performance across academic benchmarks for
language understanding, reasoning, and safety.

🌐 Translation:
--------------------------------------------------------------------------------
論文摘要：
Gemma 是一系列基於 Gemini 的輕量級、先進的開源模型。這些模型在語言理解、
推理和安全性等方面的表現優異，並在 18 項文字任務中，在同等規模的開源模型
中表現更佳。

Notice the quality:

  • "lightweight" → "輕量級" ✅ (not "輕" or "光")
  • "state-of-the-art" → "先進" ✅ (contextually appropriate)
  • "benchmarks" → "基準測試" ✅ (technical term)
  • Traditional Chinese: 基於 ✅ (not 基于)

This is way better than copy-pasting into Google Translate.


Step 9: Download Interactive HTML (10 seconds)

The final cell generates a self-contained HTML file:

# Generate bilingual HTML
filename = f"arxiv_{ARXIV_ID}_{SOURCE_LANG}-{TARGET_LANG}.html"

# Auto-download in Colab
from google.colab import files
files.download(filename)

Output:

💾 HTML saved: arxiv_2403.08295_en-zh-TW.html
📂 Full path: /content/arxiv_2403.08295_en-zh-TW.html
📊 Size: 143.2 KB
📄 Pages: 3

📥 To view the full HTML:
   1. Download: Right-click 'arxiv_2403.08295_en-zh-TW.html' in Files panel → Download
   2. Or use auto-download (Colab native only)

Open the HTML in your browser:

Bilingual HTML Interface - Header

Clean header with title, language pair, date, and keyboard-friendly navigation

Bilingual HTML Interface - Side-by-Side Layout

Original English (left) and Traditional Chinese translation (right) in perfect sync

What you're seeing:

  • Header: arXiv:2403.08295 Bilingual Translation
  • Metadata: en → zh-TW | 2026-01-19 14:22
  • Navigation: ◄ Prev | Page 1 (1/3) | Next ▶
  • Hint bar: 💡 Use ← → keys (yellow background for visibility)
  • Section header: 📄 ABSTRACT - Page 1 ⏱️ 179.74s (shows translation time)
  • Dual columns: Gray background for original, white for translation

Features:

  • ✅ Side-by-side original + translation (never lose context)
  • ✅ Keyboard navigation (← → arrow keys for fast reading)
  • ✅ Page counter with progress ("Page 1 (1/3)")
  • ✅ Translation time per page (⏱️ 179.74s shown in purple header)
  • ✅ Works offline (no internet needed after download)
  • ✅ Mobile responsive (columns stack vertically on small screens)
  • ✅ Clean typography (monospace for original, sans-serif for translation)

This is your forever-reference for that paper. Share it, annotate it, or keep it for later.
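
The notebook's HTML cell is longer than this, but the core idea, one hidden div per page with paired columns, fits in a few lines. A stripped-down sketch; the structure and class names here are my assumptions, not the notebook's exact markup:

```python
from html import escape

def build_bilingual_html(pages: list[tuple[str, str]], title: str) -> str:
    """pages = [(original_text, translated_text), ...]; page 0 starts visible."""
    blocks = []
    for i, (src, dst) in enumerate(pages):
        display = "block" if i == 0 else "none"  # only the first page is shown initially
        blocks.append(
            f'<div class="page" id="page-{i}" style="display:{display}">'
            f'<div class="columns"><pre>{escape(src)}</pre>'
            f'<div>{escape(dst)}</div></div></div>'
        )
    body = "".join(blocks)
    return f"<html><head><title>{escape(title)}</title></head><body>{body}</body></html>"

html = build_bilingual_html([("Hello", "你好")], "arXiv:2403.08295 Bilingual Translation")
```

Because every page ships inside the one file and JavaScript only toggles display, the result works offline and loads instantly.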


🔬 Translation Quality: Let's Be Honest

I tested this on the Gemma Technical Report (arXiv:2403.08295), a genuinely complex paper with:

  • Model architecture details
  • Training methodology
  • Benchmark results (tables)
  • Mathematical notation
  • Lots of jargon ("multi-query attention", "RoPE embeddings", "supervised fine-tuning")

Sample: Original Text

The Gemma model architecture is based on the transformer decoder (Vaswani et al., 2017).
The core parameters of the architecture are summarized in Table 1. Models are trained on
a context length of 8192 tokens. We also utilize several improvements proposed after the
original transformer paper, and list them below:

Multi-Query Attention (Shazeer, 2019). Notably, the 7B model uses multi-head attention
while the 2B checkpoints use multi-query attention (with num_kv_heads = 1), based on
ablations that showed that multi-query attention works well at small scales.

TranslateGemma Output

Gemma 模型架構基於 Transformer 解碼器（Vaswani 等人，2017）。架構的核心參數
總結於表 1 中。模型是在 8192 個 token 的上下文長度上訓練的。我們還使用了原始
Transformer 論文之後提出的幾項改進，並在下面列出：

多查詢注意力（Shazeer，2019）。值得注意的是，7B 模型使用多頭注意力，而 2B 檢查
點使用多查詢注意力（num_kv_heads = 1），這是基於消融研究顯示多查詢注意力在小規模
下效果良好。

My Assessment

| Aspect                  | Rating | Notes                                                      |
|-------------------------|--------|------------------------------------------------------------|
| Technical Accuracy      | ⭐⭐⭐⭐⭐  | "multi-query attention" → "多查詢注意力" is spot-on          |
| Terminology Consistency | ⭐⭐⭐⭐⭐  | Same term translated same way throughout                   |
| Grammar & Flow          | ⭐⭐⭐⭐⭐  | Reads naturally in target language                         |
| Format Preservation     | ⭐⭐⭐⭐⭐  | Keeps paragraphs, citations, structure intact              |
| Context Understanding   | ⭐⭐⭐⭐   | Gets that "ablations" means ablation studies (not medical) |

Where it shines:

  • ✅ Technical jargon (transformers, attention mechanisms, tokens)
  • ✅ Citations format preserved: (Vaswani et al., 2017)
  • ✅ Numbers and variables unchanged: 8192, 7B, num_kv_heads
  • ✅ Academic tone maintained

Minor quirks:

  • โš ๏ธ Sometimes literal translation where paraphrase would be smoother
  • โš ๏ธ Very occasional wrong word choice (maybe 1-2 per page)

Compared to:

  • DeepL: Better for general text, but struggles with ML terminology
  • Google Translate: Faster, but often mistranslates domain terms
  • GPT-4/Claude API: Similar quality, but costs $0.01-0.02 per page
  • Human translator: Obviously better, but $$$$ and slow

For free academic translation, this is unbeatable.


⚡ Performance & Cost: The Real Numbers

Let me share actual benchmarks from my testing:

My Setup

  • Platform: Google Colab Free Tier
  • GPU: Tesla T4 (15GB VRAM)
  • Model: TranslateGemma 4B (~8.6GB)
  • Test paper: Gemma Report (arXiv:2403.08295)

Timing Breakdown

| Operation       | First Run         | Subsequent Runs |
|-----------------|-------------------|-----------------|
| Model download  | ~5 min (one-time) | -               |
| Model loading   | 37.8 sec          | 30 sec (cached) |
| Translation     | 165-170 sec/page  | Same            |
| HTML generation | <1 sec            | <1 sec          |

Total for 3 pages:

  • First ever run: ~15 minutes (including model download)
  • After model cached: ~9 minutes (just translation time)
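
If you want a quick back-of-envelope estimate before committing to a long run, the math is linear. Using the ~168 s/page observed above (a constant pulled from this one test run; your mileage will vary):

```python
def estimate_minutes(pages: int, sec_per_page: float = 168.0,
                     first_run: bool = False, download_min: float = 5.0) -> float:
    """Rough wall-clock estimate; add the one-time model download on a first run."""
    minutes = pages * sec_per_page / 60
    return minutes + download_min if first_run else minutes

print(round(estimate_minutes(3), 1))                  # → 8.4 (matches the ~9 min run above)
print(round(estimate_minutes(3, first_run=True), 1))  # → 13.4
```

Model loading and pip installs add a couple more minutes on top, which is why the first full run lands around 15 minutes.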

GPU Usage

📊 GPU Memory:
   Total: 15.0 GB
   Model: ~8.6 GB
   Working: ~1.2 GB
   Available: ~5.2 GB

📊 Utilization:
   During translation: 95-100%
   Idle: 0%

The T4 is fully utilized during translation; that's why it's relatively fast.

Cost Comparison (10-page paper)

| Method                 | Time     | Cost    | Quality |
|------------------------|----------|---------|---------|
| TranslateGemma + Colab | ~30 min  | $0.00   | ⭐⭐⭐⭐⭐   |
| Claude 3.5 API         | ~2 min   | $0.10   | ⭐⭐⭐⭐    |
| DeepL Pro API          | ~1 min   | $0.20   | ⭐⭐⭐⭐    |
| Google Translate       | Instant  | $0.00   | ⭐⭐⭐     |
| Human translator       | 2-3 days | $50-200 | ⭐⭐⭐⭐⭐   |

My take: If you're reading 5-10 papers per week, the time investment of TranslateGemma pays off in quality + zero cost. For one-off urgent translations, APIs are faster.


๐Ÿ› ๏ธ Power User Tips

Once you've got the basics down, here are some pro moves:

Tip 1: Batch Translate Strategically

Don't translate entire papers blindly. Use this approach:

SECTIONS = {
    "abstract": (1, 1),      # Quick scan: worth deep reading?
    "intro": (2, 4),         # Context and motivation
    "conclusion": (15, 16),  # Main takeaways
}

Read these first (10 min translation). If it's relevant, come back for:

SECTIONS = {
    "method": (5, 10),
    "results": (11, 14),
}

Why? You'll save time on papers that aren't relevant to your work.


Tip 2: Translate to Multiple Languages

Learning Japanese and Chinese? Do this:

for lang in ["zh-TW", "ja"]:
    TARGET_LANG = lang
    # Re-run the translation cell (Step 7) for this language,
    # then the HTML cell (Step 9) to save a separate file

Now you have 3 versions to compare:

  • Original English
  • Traditional Chinese
  • Japanese

Great for building technical vocabulary across languages.


Tip 3: Fix the "Pages 2-3 Didn't Translate" Bug

If you're using an older version, pages with lots of charts/tables might fail to translate (they just return the original text).

We fixed this recently! Update to latest:

cd trans-gemma && git pull

What we changed:

  • Switched from a complex chat template to a simple direct prompt
  • More robust extraction logic
  • Better handling of mixed-content pages (text + figures)

Tip 4: Run Locally If You Have GPU

Don't want to depend on Colab quotas? Run locally:

git clone https://github.com/jimmyliao/trans-gemma.git
cd trans-gemma
pip install -e ".[examples]"

# Open notebook
jupyter notebook arxiv-reader.ipynb

The notebook auto-detects your environment (Colab vs Local) and adapts. Just works.

Requirements:

  • Python 3.10+
  • NVIDIA GPU with 10GB+ VRAM (or use CPU, but very slow)
  • ~15GB disk space for model

Tip 5: Customize the HTML Output

The generated HTML uses vanilla JavaScript and can be easily customized. Open the notebook cell that generates HTML and modify:

Change color scheme:

.header {
  background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
}

Add dark mode:

@media (prefers-color-scheme: dark) {
  body { background: #1a1a1a; color: #e0e0e0; }
}

Adjust layout ratio:

.columns {
  grid-template-columns: 45% 55%; /* Favor translation side */
}

🤔 Common Questions

Q: "Colab says GPU unavailable?"

A: Free tier has daily quotas (typically refreshes every 12-24 hours). Try:

  1. Wait a few hours and retry
  2. Try off-peak times (evenings/weekends in US timezones)
  3. Switch Google accounts if you have multiple
  4. Upgrade to Colab Pro ($10/month) for guaranteed GPU

Q: "Model download stuck at 45%?"

A: Network hiccups happen. Try:

  1. Restart runtime: Runtime → Restart runtime
  2. Clear outputs: Edit → Clear all outputs
  3. Re-run from Step 5: Model downloads resume where they left off

If still stuck after 15 minutes, it's likely a HuggingFace server issue. Wait 30 min and retry.


Q: "Translation has simplified + traditional Chinese mixed?"

A: This should be fixed in the latest version. We added opencc-python-reimplemented to the backend.

If still happening:

cd trans-gemma && git pull

Then restart notebook.


Q: "Can I translate non-arXiv PDFs?"

A: The current notebook is optimized for arXiv URLs. For local PDFs, modify:

import fitz  # PyMuPDF

# Instead of:
pdf_path, total = download_arxiv(ARXIV_ID)

# Use:
pdf_path = "/content/your_paper.pdf"
total = len(fitz.open(pdf_path))

Then run translation cells as normal.


Q: "Is this safe for commercial use?"

A: Tricky question:

  • Code (MIT license): Yes, use commercially
  • Gemma model: Read Terms of Use
  • Colab: Free tier meant for learning/research

My advice: Use for research/learning. If you're making money from translations, consider:

  • Running on your own GPU
  • Using Colab Pro (legitimized commercial use)
  • Checking Gemma's commercial terms carefully

🧠 How This Actually Works (For the Curious)

Let me pull back the curtain on the technical implementation.

Architecture Overview

┌──────────────┐
│   User       │
│  (Browser)   │
└──────┬───────┘
       │ 1. Click Colab link
       ▼
┌──────────────────────────────────┐
│   Colab Notebook                 │
│   ┌──────────────────────────┐   │
│   │ Environment Detection    │   │
│   └──────────┬───────────────┘   │
│              │ 2. Auto-config    │
│              ▼                   │
│   ┌──────────────────────────┐   │
│   │ Download arXiv PDF       │   │
│   │ (via urllib)             │   │
│   └──────────┬───────────────┘   │
│              │ 3. Extract text   │
│              ▼                   │
│   ┌──────────────────────────┐   │
│   │ PyMuPDF (page-by-page)   │   │
│   └──────────┬───────────────┘   │
│              │ 4. Send to model  │
│              ▼                   │
│   ┌──────────────────────────┐   │
│   │ TranslateGemma 4B        │◄──┼─ HuggingFace Hub
│   │ (on T4 GPU)              │   │   (model download)
│   └──────────┬───────────────┘   │
│              │ 5. Post-process   │
│              ▼                   │
│   ┌──────────────────────────┐   │
│   │ OpenCC (if zh-TW)        │   │
│   └──────────┬───────────────┘   │
│              │ 6. Generate HTML  │
│              ▼                   │
│   ┌──────────────────────────┐   │
│   │ Bilingual HTML           │   │
│   │ (side-by-side layout)    │   │
│   └──────────┬───────────────┘   │
│              │ 7. Download       │
└──────────────┼───────────────────┘
               ▼
       ┌───────────────┐
       │  User's PC    │
       │  (HTML file)  │
       └───────────────┘

Key Technical Decisions

1. Simple Prompt Over Chat Template

Initially, we used HuggingFace's apply_chat_template():

# Old approach (failed on pages with tables/math)
messages = [{
    "role": "user",
    "content": [{"type": "text", "text": text, ...}]
}]
inputs = tokenizer.apply_chat_template(messages, ...)

Problem: Pages with heavy formatting confused the template, and extraction logic failed.

Fix: Switched to dead-simple prompt:

# New approach (rock solid)
prompt = f"""Translate the following text from {source_lang} to Traditional Chinese (Taiwan, 繁體中文). Only output the translation, do not include explanations:

{text}

Translation:"""

inputs = tokenizer(prompt, return_tensors="pt")

Result: 100% success rate across all page types.


2. OpenCC Post-Processing

TranslateGemma 4B tends to output Simplified Chinese by default, even when asked for Traditional.

Solution: Always post-process for zh-TW:

if target_lang == "zh-TW":
    from opencc import OpenCC
    cc = OpenCC('s2twp')  # Simplified → Traditional (Taiwan phrases)
    translation = cc.convert(translation)

This ensures:

  • 基于 → 基於
  • 轻量级 → 輕量級
  • 这些 → 這些

Taiwan readers notice these details!


3. Dynamic Environment Detection

Same notebook runs on Colab, GCP, or local Jupyter:

import os

def detect_environment():
    try:
        import google.colab
        return 'colab'
    except ImportError:
        pass

    if os.path.exists('/opt/conda/envs/py310'):
        return 'gcp'

    return 'local'

Then:

if ENV == 'colab':
    # Lightweight installs
elif ENV == 'gcp':
    # Custom runtime configs
else:
    # Local includes PyTorch

Why? Colab has PyTorch pre-installed; local doesn't. One notebook, zero friction.
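Purely as illustration, that branching amounts to a lookup table. The package lists below are my guesses at the shape of it, not the notebook's exact sets:

```python
# Base deps match the Step 3 install; extras vary per detected environment.
BASE = ["huggingface_hub", "transformers", "accelerate", "pymupdf",
        "opencc-python-reimplemented"]
EXTRAS = {
    "colab": [],         # PyTorch ships pre-installed on Colab
    "gcp":   [],         # custom runtime is already configured
    "local": ["torch"],  # local Jupyter must bring its own PyTorch
}

def install_list(env: str) -> list[str]:
    """Packages to pip-install for a given environment (unknown envs treated as local)."""
    return BASE + EXTRAS.get(env, EXTRAS["local"])
```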


4. Progressive HTML Generation

Instead of rendering the entire paper at once, the HTML shows one page at a time:

// Page navigation with keyboard shortcuts
let currentPage = 0;
const pageCount = document.querySelectorAll('.page').length;

function showPage(n) {
  if (n < 0 || n >= pageCount) return;  // stay within bounds
  document.querySelectorAll('.page').forEach(p => p.style.display = 'none');
  document.getElementById(`page-${n}`).style.display = 'block';
  currentPage = n;
}

document.addEventListener('keydown', e => {
  if (e.key === 'ArrowLeft') showPage(currentPage - 1);
  if (e.key === 'ArrowRight') showPage(currentPage + 1);
});

Benefit: Even 50-page papers load instantly in the browser.


🎯 Who This Is (and Isn't) For

✅ Perfect For:

Graduate Students

  • Reading 5-10 papers per week
  • Budget: $0
  • Time: Can wait 3 min/page for quality translations
  • Bonus: Learn English terminology via side-by-side reading

Non-native English Researchers

  • Deep-reading important papers
  • Want to understand, not just skim
  • Appreciate bilingual layout for learning

AI/ML Engineers Keeping Current

  • Track latest arXiv preprints
  • Translate abstract + intro first, decide if worth full read
  • Free tier is plenty for 2-3 papers daily

โŒ Not Ideal For:

Urgent Deadlines

  • If you need a paper translated in 5 minutes, use Claude/GPT-4 API
  • They're faster (~10 sec/page), but they cost $0.01-0.02 per page

Large-Scale Translation

  • Translating 100 papers → you'll hit Colab quotas
  • Consider running on your own GPU or cloud instance

Commercial Translation Services

  • Check Gemma Terms of Use carefully
  • May need different licensing

🔮 What's Next for This Project

I'm actively developing this, and here's what's coming:

Short-term (Next Month)

  • 🔜 Auto-language detection from paper metadata
  • 🔜 DOCX/Markdown output formats (not just HTML)
  • 🔜 Batch mode: Translate multiple papers in one go
  • 🔜 Dark mode for HTML output

Medium-term (Next Quarter)

  • 🔜 Gemma 2/3 support when released
  • 🔜 Terminology glossary extraction (build your own vocab list)
  • 🔜 Figure/table captions translation
  • 🔜 API mode for programmatic access

Long-term (This Year)

  • 🔜 Web UI (no notebook required)
  • 🔜 Mobile app for reading on-the-go
  • 🔜 Community translations (share & reuse)

Want to contribute? Pull requests welcome! → GitHub


🚀 Your Turn: Translate Your First Paper

Alright, you've read 3000+ words about this. Time to actually try it.

5-Minute Challenge:

  1. Pick a paper: Go to arXiv.org, find something interesting
  2. Copy the ID: It looks like 2403.08295 (from URL)
  3. Click this badge: Open In Colab
  4. Enable T4 GPU: Runtime → Change runtime type → T4 GPU
  5. Run all cells: Runtime → Run all (or Ctrl+F9)
  6. Wait ~10 minutes (first run includes model download)
  7. Download the HTML and open in browser

Boom. You just translated an academic paper for free.


💬 Let's Make This Better Together

I built this because I was frustrated with copy-pasting papers into translators. It solved my problem. Now I'm sharing it with you.

If this helped you, follow for updates!



๐Ÿ™ Acknowledgments

This project builds on incredible work by:

  • Google DeepMind for TranslateGemma and Gemma family
  • HuggingFace for making model distribution seamless
  • Google Colab for free GPU access
  • PyMuPDF team for reliable PDF parsing
  • OpenCC project for Chinese conversion

Open source makes projects like this possible. Thank you! 🙌


P.S. If you made it this far, you're either genuinely interested or an excellent skimmer. Either way, I appreciate you reading. Now go translate something! 📚✨

Questions? Comments? Horror stories about academic translation? Drop them below! 👇
