Building Language Translation Systems with an LLM: A Step-by-Step Guide

#learnai #oxlo #ai

We are going to build a document translation pipeline that uses an LLM to translate text between languages while preserving tone and formatting. This is useful for localizing support tickets, product documentation, or user-generated content without managing separate translation services. We will run everything against Oxlo.ai, where each request has a flat price regardless of prompt length.

What you'll need

Python 3.10 or higher
The OpenAI SDK: pip install openai
An Oxlo.ai API key from https://portal.oxlo.ai

Step 1: Configure the Oxlo.ai client

First, we point the OpenAI SDK at Oxlo.ai. I use Llama 3.3 70B as the default workhorse because it handles multilingual tasks reliably. The base URL and key are the only changes needed to make existing OpenAI-compatible code run on Oxlo.ai.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY")
)

# Quick connectivity test
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Translate 'Hello, world' to Spanish."}],
)
print(response.choices[0].message.content)
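
Note that os.environ.get returns None when the variable is unset, so a missing key can surface later as an error that never mentions OXLO_API_KEY. A minimal sketch of a fail-fast check follows; the helper name require_env is hypothetical and not part of the OpenAI SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail early with the variable name so the fix is obvious
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place, the client could be created with api_key=require_env("OXLO_API_KEY") instead of os.environ.get("OXLO_API_KEY").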

Step 2: Define the system prompt

A strong system prompt keeps the model from adding explanations or losing formatting. We explicitly tell it to return only the translation and to preserve markdown, line breaks, and code blocks.

SYSTEM_PROMPT = """You are a professional translator. Translate the text provided by the user into the target language specified.

Rules:
- Return ONLY the translated text. Do not add explanations, notes, or quotation marks around the output.
- Preserve all original formatting, including line breaks, markdown syntax, and code blocks.
- Maintain the original tone (formal, casual, technical).
- Do not translate proper nouns or brand names unless explicitly instructed.
- If the source text is already in the target language, return it unchanged.

Source language: {source_lang}
Target language: {target_lang}
"""

Step 3: Build the core translator

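Next, we wrap the API call in a reusable function. The model is a parameter: it defaults to Llama 3.3 70B, and for non-English language pairs I pass Qwen 3 32B instead because its multilingual pretraining is particularly strong. The function injects the source and target language into the system prompt before each call, and it returns empty or whitespace-only input unchanged without calling the API.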

def translate_chunk(text: str, source_lang: str, target_lang: str, model: str = "llama-3.3-70b") -> str:
    if not text.strip():
        return text

    prompt = SYSTEM_PROMPT.format(source_lang=source_lang, target_lang=target_lang)

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": text},
        ],
        temperature=0.3,
    )
    # message.content is typed as optional in the SDK, so guard against None
    content = response.choices[0].message.content or ""
    return content.strip()
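
The function leaves the choice of model to the caller. One way to encode the rule described above (Llama when English is part of the pair, Qwen otherwise) is a small selector. This is only a sketch: pick_model is a hypothetical helper, it assumes languages are passed as English names such as "Spanish", and the model IDs are the ones used in this guide.

```python
def pick_model(source_lang: str, target_lang: str) -> str:
    """Choose a model ID based on whether English is part of the language pair."""
    langs = {source_lang.strip().lower(), target_lang.strip().lower()}
    if "english" in langs:
        return "llama-3.3-70b"
    # Neither side is English: use the model preferred for non-English pairs
    return "qwen-3-32b"
```

A call would then look like translate_chunk(text, "German", "Japanese", model=pick_model("German", "Japanese")).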

Step 4: Chunk long documents

Long documents can exceed what a model handles well in a single prompt, so we split on double newlines to preserve paragraph boundaries. Each paragraph becomes its own request. Because Oxlo.ai uses flat per-request pricing instead of per-token metering, the cost of a document depends on how many paragraphs it has, not on how long they are. This makes document-level translation predictable: one request per non-empty paragraph.

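def translate_document(text: str, source_lang: str, target_lang: str, model: str = "qwen-3-32b") -> str:
    # Split on blank lines so each paragraph is translated as its own request
    paragraphs = text.split("\n\n")
    translated = []

    for para in paragraphs:
        # Defaults to Qwen 3 32B; pass model="llama-3.3-70b" to use the Step 3 default
        result = translate_chunk(para, source_lang, target_lang, model=model)
        translated.append(result)

    # Rejoin with the same separator to restore paragraph boundaries
    return "\n\n".join(translated)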
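
Since every call to translate_chunk is one request, the request count equals the number of non-empty paragraphs. If that count matters, adjacent paragraphs can be packed into larger chunks before translating. The sketch below is one way to do that under a character budget. The name pack_paragraphs and the max_chars parameter are hypothetical, and a sensible budget depends on the model's context window.

```python
def pack_paragraphs(text: str, max_chars: int = 4000) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in text.split("\n\n"):
        # +2 accounts for the "\n\n" separator used when rejoining
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [para], len(para)
        else:
            current.append(para)
            size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can be passed to translate_chunk and the results joined with "\n\n". Larger chunks mean fewer requests, but they also give the model more room to alter paragraph breaks, so it is worth checking that the paragraph count survives translation.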

Step 5: Add a validation pass

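Translation quality matters, so we add a back-translation check. We send the translated text back through Kimi K2.6 to translate it into the original language, then compare the result with the original. The comparison here is a simple length-ratio heuristic: it flags a paragraph for review when the back-translation is less than half or more than twice the length of the original. That catches gross problems such as truncated output or added commentary, but it does not measure meaning, so treat PASS as "nothing obviously broken" rather than "verified". Kimi K2.6 handles long context and reasoning well, which makes it a reasonable choice for the back-translation itself.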

def back_translate(text: str, original_lang: str) -> str:
    # The language of the translated text is not passed in, so ask the model to detect it
    prompt = SYSTEM_PROMPT.format(source_lang="auto-detect", target_lang=original_lang)
    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": text},
        ],
        temperature=0.3,
    )
    # message.content is typed as optional in the SDK, so guard against None
    return (response.choices[0].message.content or "").strip()

def validate_translation(original: str, translated: str, source_lang: str) -> dict:
    back = back_translate(translated, source_lang)
    # Simple heuristic: length similarity
    ratio = len(back) / max(len(original), 1)
    status = "PASS" if 0.5 <= ratio <= 2.0 else "REVIEW"
    return {
        "original": original,
        "translated": translated,
        "back_translated": back,
        "status": status,
    }
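
The length ratio only catches gross failures. A slightly stronger check, still lexical rather than semantic, compares the words of the original with those of the back-translation. The sketch below uses difflib.SequenceMatcher from the standard library. The function name similarity_status and the 0.6 threshold are illustrative assumptions, not tuned values.

```python
from difflib import SequenceMatcher


def similarity_status(original: str, back_translated: str, threshold: float = 0.6) -> tuple[float, str]:
    """Score word-level overlap between the original and its back-translation."""
    a = original.lower().split()
    b = back_translated.lower().split()
    # ratio() is 2 * matches / total words, so 1.0 means identical word sequences
    score = SequenceMatcher(None, a, b).ratio()
    return score, "PASS" if score >= threshold else "REVIEW"
```

A legitimate paraphrase also lowers the score, so REVIEW here is a prompt for a human to look at the paragraph, not a verdict that the translation is wrong.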

Run it

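The block below translates a Spanish product announcement into English and validates the result. Save the code from Steps 1 through 5 as translate.py, append this block, set the OXLO_API_KEY environment variable, and run python translate.py. You can delete the connectivity test from Step 1 first so its output does not precede the translation.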

if __name__ == "__main__":
    source = """Estamos encantados de anunciar el lanzamiento de Oxlo.ai.

Nuestra plataforma ofrece inferencia de IA con precios fijos por solicitud. No importa cuán largo sea su prompt, el costo es el mismo.

Esto hace que Oxlo.ai sea ideal para cargas de trabajo de contexto largo y agentes autónomos."""

    print("Translating document...")
    english = translate_document(source, "Spanish", "English")
    print("\n--- Translation ---\n")
    print(english)

    print("\n--- Validation ---\n")
    # Validate the second paragraph
    para = source.split("\n\n")[1]
    eng_para = english.split("\n\n")[1]
    result = validate_translation(para, eng_para, "Spanish")
    print(f"Status: {result['status']}")
    print(f"Back-translated: {result['back_translated']}")

Expected output (exact wording will vary between runs and models):

Translating document...

--- Translation ---

We are delighted to announce the launch of Oxlo.ai.

Our platform offers AI inference with fixed pricing per request. No matter how long your prompt is, the cost remains the same.

This makes Oxlo.ai ideal for long-context workloads and autonomous agents.

--- Validation ---

Status: PASS
Back-translated: Our platform offers AI inference with fixed pricing per request. It does not matter how long your prompt is, the cost is the same.

Next steps

Wire the translator into a FastAPI endpoint so other services can submit documents asynchronously. You can also inject a custom glossary into the system prompt for domain-specific terms like medical or legal vocabulary. If you need higher throughput, look at Oxlo.ai's Premium plan for priority queue access and compare request-based pricing against your current token-based provider at https://oxlo.ai/pricing.
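
As one illustration of the glossary idea, the sketch below appends fixed term mappings to a system prompt. The helper with_glossary and the sample term are hypothetical. It is meant to be applied after SYSTEM_PROMPT.format(...), so that braces inside glossary terms never pass through str.format.

```python
def with_glossary(prompt: str, glossary: dict[str, str]) -> str:
    """Append a fixed-terminology section to an already formatted system prompt."""
    if not glossary:
        return prompt
    # One line per term, quoting both sides so multi-word terms stay unambiguous
    lines = [f'- "{src}" -> "{dst}"' for src, dst in glossary.items()]
    return (
        prompt.rstrip()
        + "\n\nGlossary (always use these translations):\n"
        + "\n".join(lines)
        + "\n"
    )
```

For example, with_glossary(prompt, {"infarto": "heart attack"}) adds a single glossary line to the end of the prompt, and an empty glossary leaves the prompt unchanged.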