Diogo Heleno

Posted on • Originally published at m21global.com

Building Translation Workflows: Matching Service Levels to Document Types in Code

Translation services aren't one-size-fits-all, and neither should your translation workflow automation be. After reading about when to use standard translation, I realized most developers approach translation integration backwards: they start with a single API and force all content through the same pipeline.

Instead, we should build routing logic that matches document types to appropriate service levels. Here's how to architect a translation system that automatically selects the right approach based on content characteristics.

The Three-Tier Translation Architecture

Most translation workflows can be mapped to three service tiers:

  • Tier 1: AI-assisted translation for reference material
  • Tier 2: Standard translation for internal operational docs
  • Tier 3: Premium translation for external/legal content

Your system should route documents automatically based on metadata, not manual decisions.
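Before any routing logic exists, it helps to pin the three tiers down as named constants rather than bare strings. A minimal sketch (the `Tier` and `TIER_SLAS` names are illustrative, not from the original post):

```python
from enum import Enum

class Tier(str, Enum):
    AI_ASSISTED = 'ai_assisted'   # AI + human spot-check: reference material
    STANDARD = 'standard'         # single qualified linguist: internal docs
    PREMIUM = 'premium'           # multi-linguist review: external/legal content

# Keep SLA metadata next to the tier definition, not scattered across call sites
TIER_SLAS = {
    Tier.AI_ASSISTED: {'sla_hours': 24},
    Tier.STANDARD: {'sla_days': 5},
    Tier.PREMIUM: {'sla_days': 10, 'review_rounds': 2},
}
```

Using a `str`-backed enum means the values still serialize cleanly to JSON and YAML while typos become import-time errors instead of silent misroutes.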

Document Classification Logic

Start by building a classifier that evaluates three key factors:

```python
class DocumentClassifier:
    def __init__(self):
        self.external_keywords = ['contract', 'agreement', 'legal', 'compliance', 'investor']
        self.internal_paths = ['/training/', '/internal/', '/procedures/']
        self.high_risk_extensions = ['.pdf', '.docx']  # often formal docs

    def classify_document(self, doc_path, content, metadata):
        risk_score = 0

        # Audience indicators: legal/compliance vocabulary suggests external use
        if any(keyword in content.lower() for keyword in self.external_keywords):
            risk_score += 3

        # File path patterns: anything outside known internal paths is riskier
        if not any(path in doc_path for path in self.internal_paths):
            risk_score += 2

        # Formal file formats carry slightly more risk
        if any(doc_path.endswith(ext) for ext in self.high_risk_extensions):
            risk_score += 1

        # Explicit metadata flags trump heuristics
        if metadata.get('regulatory', False):
            risk_score += 3
        if metadata.get('public_facing', False):
            risk_score += 2

        return self.map_risk_to_tier(risk_score)

    def map_risk_to_tier(self, score):
        if score >= 5:
            return 'premium'      # multi-linguist review
        elif score >= 2:
            return 'standard'     # single qualified linguist
        else:
            return 'ai_assisted'  # AI + human review
```
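A quick sanity check of the scoring thresholds, restated as a standalone function so the example runs on its own:

```python
def map_risk_to_tier(score):
    # Same thresholds as DocumentClassifier.map_risk_to_tier above
    if score >= 5:
        return 'premium'
    elif score >= 2:
        return 'standard'
    return 'ai_assisted'

# A vendor agreement: 'agreement' keyword (+3), path outside internal
# folders (+2), public_facing flag (+2) -> score 7 -> premium
print(map_risk_to_tier(7))  # → premium
print(map_risk_to_tier(3))  # → standard
print(map_risk_to_tier(0))  # → ai_assisted
```

Worth noticing: the non-internal-path check alone contributes 2 points, so any document outside known internal folders already lands in the standard tier. That is probably the right failure mode (erring toward more review), but it is a deliberate choice to make.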

Translation Service Router

With classification in place, build a router that connects to different translation APIs based on tier:

```python
class TranslationRouter:
    def __init__(self):
        self.classifier = DocumentClassifier()
        self.services = {
            'ai_assisted': AITranslationService(),
            'standard': StandardTranslationService(),
            'premium': PremiumTranslationService()
        }

    async def translate_document(self, document):
        tier = self.classifier.classify_document(
            document.path,
            document.content,
            document.metadata
        )

        service = self.services[tier]

        # Route with appropriate SLA expectations
        if tier == 'ai_assisted':
            return await service.translate(document, sla_hours=24)
        elif tier == 'standard':
            return await service.translate(document, sla_days=5)
        else:  # premium
            return await service.translate(document, sla_days=10, review_rounds=2)
```
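One edge case worth guarding: if classification ever returns a label that isn't registered, `self.services[tier]` raises a bare `KeyError` and the whole job fails. A cheap defensive lookup (sketched with strings standing in for service objects) falls back to the standard tier instead:

```python
def pick_service(services, tier, default='standard'):
    # Unknown tiers fall back to the default rather than crashing the job;
    # erring toward more human review is the safer direction
    return services.get(tier, services[default])

services = {'ai_assisted': 'AI', 'standard': 'STD', 'premium': 'PREM'}
print(pick_service(services, 'premium'))  # → PREM
print(pick_service(services, 'unknown'))  # → STD
```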

Handling Different Service APIs

Each tier typically uses different providers with different capabilities:

```python
import os

import openai


class AITranslationService:
    def __init__(self):
        # Async client so translate() can be awaited
        self.client = openai.AsyncOpenAI()

    async def translate(self, document, sla_hours):
        # Fast, good for reference material
        prompt = (
            f"Translate this {document.source_lang} text "
            f"to {document.target_lang}: {document.content}"
        )

        response = await self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )

        return {
            'translation': response.choices[0].message.content,
            'confidence': 0.85,  # rough AI confidence estimate
            'review_required': len(document.content) > 5000
        }


class StandardTranslationService:
    def __init__(self):
        # Connect to a professional translation API
        self.api_key = os.getenv('TRANSLATION_SERVICE_API_KEY')

    async def translate(self, document, sla_days):
        payload = {
            'content': document.content,
            'source_lang': document.source_lang,
            'target_lang': document.target_lang,
            'service_level': 'standard',
            'use_translation_memory': True,
            'glossary_id': document.metadata.get('glossary_id')
        }

        # submit_job / poll_for_completion wrap the provider's REST API
        job = await self.submit_job(payload)
        return await self.poll_for_completion(job.id, sla_days)
```

Metadata-Driven Configuration

Make your routing decisions transparent by storing them in document metadata:

```yaml
# document_config.yaml
document_types:
  training_manual:
    default_tier: standard
    audience: internal
    risk_level: low

  user_agreement:
    default_tier: premium
    audience: external
    risk_level: high
    requires_legal_review: true

  product_catalog:
    default_tier: ai_assisted
    audience: mixed
    risk_level: low
    volume_expected: high
```
```python
import yaml  # PyYAML


def load_document_config():
    with open('document_config.yaml', 'r') as f:
        return yaml.safe_load(f)


def override_classification(doc_type, base_classification):
    # An explicit per-type config wins over the heuristic classification
    config = load_document_config()
    doc_config = config['document_types'].get(doc_type)

    if doc_config:
        return doc_config['default_tier']
    return base_classification
```
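The override logic is easy to verify in isolation. This variant takes the parsed config as an argument instead of reading the file, so the example is self-contained:

```python
def override_classification(config, doc_type, base_classification):
    # Known document types use their configured tier; unknown ones
    # keep whatever the heuristic classifier decided
    doc_config = config['document_types'].get(doc_type)
    if doc_config:
        return doc_config['default_tier']
    return base_classification

config = {'document_types': {
    'user_agreement': {'default_tier': 'premium'},
    'product_catalog': {'default_tier': 'ai_assisted'},
}}

print(override_classification(config, 'user_agreement', 'standard'))  # → premium
print(override_classification(config, 'blog_post', 'standard'))       # → standard
```

Passing the config in also makes the function trivially testable and keeps file I/O at the edges of the system.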

Monitoring and Cost Control

Track usage patterns to optimize your routing logic:

```python
from collections import defaultdict
from datetime import datetime


class TranslationMetrics:
    def __init__(self):
        self.metrics = defaultdict(list)

    def track_job(self, tier, word_count, cost, completion_time):
        self.metrics[tier].append({
            'word_count': word_count,
            'cost': cost,
            'completion_time': completion_time,
            'cost_per_word': cost / word_count,
            'timestamp': datetime.now()
        })

    def analyze_efficiency(self):
        for tier, jobs in self.metrics.items():
            avg_cost_per_word = sum(j['cost_per_word'] for j in jobs) / len(jobs)
            avg_completion = sum(j['completion_time'] for j in jobs) / len(jobs)

            print(f"{tier}: ${avg_cost_per_word:.4f}/word, {avg_completion:.1f}h avg")
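One subtlety in `analyze_efficiency`: averaging per-job `cost_per_word` weights a 100-word job the same as a 10,000-word one. Total spend over total volume gives the true blended rate, and the two can diverge noticeably:

```python
jobs = [
    {'word_count': 100, 'cost': 10.0},     # 0.10/word, tiny job
    {'word_count': 10000, 'cost': 400.0},  # 0.04/word, bulk job
]

# Per-job average over-weights the small job
per_job = sum(j['cost'] / j['word_count'] for j in jobs) / len(jobs)

# Blended rate: total spend over total volume
blended = sum(j['cost'] for j in jobs) / sum(j['word_count'] for j in jobs)

print(f"{per_job:.4f}")  # → 0.0700
print(f"{blended:.4f}")  # → 0.0406
```

Which one you want depends on the question: per-job averages answer "what does a typical job cost per word?", the blended rate answers "what is this tier costing us overall?".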

Integration Example

Put it all together in a simple workflow:

```python
import time


async def process_translation_request(file_path, target_languages):
    document = Document.from_file(file_path)
    router = TranslationRouter()
    metrics = TranslationMetrics()

    results = {}

    for lang in target_languages:
        document.target_lang = lang

        start_time = time.time()
        result = await router.translate_document(document)
        completion_time = time.time() - start_time

        # Track metrics (assumes each service includes 'tier_used'
        # and 'cost' in its result dict)
        metrics.track_job(
            result['tier_used'],
            len(document.content.split()),
            result['cost'],
            completion_time
        )

        results[lang] = result

    return results
```

Key Takeaways

Building intelligent translation routing saves both time and money. Instead of manually deciding which service to use, let your code make those decisions based on document characteristics and business rules.

The key is matching service level to actual risk—not every document needs premium translation, but external-facing content definitely does. Start with simple classification rules and refine them based on your actual usage patterns.

Your translation workflow should be as thoughtful as any other part of your architecture. Route intelligently, monitor costs, and adjust based on real data.
