Olawale Adepoju for AWS Community Builders

Posted on • Originally published at dev.classmethod.jp

Translate multiple source language documents into numerous destination languages using Amazon Translate

To connect with a worldwide audience of consumers, clients, and investors, businesses must translate business-critical information such as promotional materials, guidebooks, and online ordering into different languages. Determining the source language in each document before calling a translation task poses challenges.

Overview

The automated language detection capability for batch translation jobs in Amazon Translate lets you translate a batch of documents written in different languages with a single translation job. This removes the need to orchestrate the translation workflow yourself by first detecting and classifying the dominant language of each document. Amazon Translate also supports translation into multiple target languages (up to 10) in one job.

Automated source language detection for batch translation jobs enables you to translate documents written in various supported languages in a single operation. You can also specify up to ten target languages. Amazon Translate uses Amazon Comprehend to determine the dominant language in each source document and uses it as the source language.
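
To make "dominant language" concrete, here is a minimal sketch of picking the highest-scoring language from a response shaped like the one returned by Amazon Comprehend's DetectDominantLanguage operation (a Languages list of LanguageCode and Score entries). The function name and the sample data are hypothetical and only for illustration. Amazon Translate performs this step for you when the source language is set to auto, so a batch job does not need this code.

```python
def dominant_language(response):
    """Return the language code with the highest confidence score."""
    languages = response.get("Languages", [])
    if not languages:
        raise ValueError("no languages detected")
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"]


# Hypothetical sample shaped like a DetectDominantLanguage response.
sample = {
    "Languages": [
        {"LanguageCode": "fr", "Score": 0.12},
        {"LanguageCode": "en", "Score": 0.87},
    ]
}
print(dominant_language(sample))
```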

Create a batch translation job via the console

In this post, we use batch translation to automatically identify the source language of each document and translate it into multiple languages (Japanese and Spanish). Both the input and the output are stored in Amazon S3.

NOTE: Batch translation is supported in the following AWS Regions:

US East (N. Virginia)
US East (Ohio)
US West (Oregon)
Asia Pacific (Seoul)
Europe (Frankfurt)
Europe (Ireland)
Europe (London)


Optionally, you can choose whether the output should use a formal or an informal tone, and you can enable profanity masking to mask profane words or phrases.
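
When the job is started from the SDK instead of the console, these options correspond to the Settings parameter of start_text_translation_job. The sketch below builds that dictionary; the helper name build_translation_settings is hypothetical, and not every target language supports formality or profanity masking, so check the Amazon Translate documentation for your language pair.

```python
def build_translation_settings(formality=None, mask_profanity=False):
    """Build a Settings dict for start_text_translation_job.

    formality: "FORMAL", "INFORMAL", or None to leave it unset.
    mask_profanity: when True, ask Amazon Translate to mask profane words.
    """
    settings = {}
    if formality is not None:
        formality = formality.upper()
        if formality not in ("FORMAL", "INFORMAL"):
            raise ValueError("formality must be FORMAL or INFORMAL")
        settings["Formality"] = formality
    if mask_profanity:
        settings["Profanity"] = "MASK"
    return settings


print(build_translation_settings("formal", mask_profanity=True))
```

The resulting dictionary would be passed as Settings=build_translation_settings(...) alongside the other job parameters.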

Then, as part of the configuration, we create an AWS Identity and Access Management (IAM) role. The role must have access to both the input and output S3 buckets.
After the job is created, you can track the progress of the batch translation job in the Translation jobs section of the console.
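
If you create the role yourself rather than letting the console do it, the role needs a trust policy that lets Amazon Translate assume it and a permissions policy covering the two buckets. The following is a sketch of what such policies typically look like, assuming read access on the input bucket and write access on the output bucket. The function name and bucket names are hypothetical, and the exact set of actions should be verified against the Amazon Translate documentation.

```python
import json


def build_role_policies(input_bucket, output_bucket):
    """Return (trust_policy, permissions_policy) as plain dicts."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Lets the Amazon Translate service assume this role.
                "Principal": {"Service": "translate.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read the source documents.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [f"arn:aws:s3:::{input_bucket}/*"],
            },
            {   # List the contents of both buckets.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": [
                    f"arn:aws:s3:::{input_bucket}",
                    f"arn:aws:s3:::{output_bucket}",
                ],
            },
            {   # Write the translated documents.
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": [f"arn:aws:s3:::{output_bucket}/*"],
            },
        ],
    }
    return trust_policy, permissions_policy


# Hypothetical bucket names, for illustration only.
trust, permissions = build_role_policies("my-input-bucket", "my-output-bucket")
print(json.dumps(permissions, indent=2))
```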


After the translation job completes, check the output S3 bucket location to confirm that the documents were translated into each target language.

The input consists of two files in two different languages, so four output documents are expected: each of the two source documents is translated into the two target languages.
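
The count follows from pairing every input document with every target language. A small sketch of that arithmetic, using hypothetical file names (the actual output object names are chosen by Amazon Translate and are not modeled here):

```python
from itertools import product


def expected_translations(input_files, target_languages):
    """Pair every input document with every target language."""
    return list(product(input_files, target_languages))


# Hypothetical input file names; 'ja' and 'es' are the targets used in this post.
pairs = expected_translations(["first.txt", "second.txt"], ["ja", "es"])
for file_name, language in pairs:
    print(file_name, "->", language)
print(len(pairs))  # 2 inputs x 2 targets = 4
```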


Create a batch translation job via the AWS SDK

The start_text_translation_job call in the AWS SDK for Python (Boto3) translates the documents in your source S3 bucket. Provide the following values:
InputDataConfig - The location of your input documents in the S3 bucket.
OutputDataConfig - The S3 location where your output documents will be stored.
DataAccessRoleArn - The ARN of an IAM role that grants Amazon Translate access to your input and output S3 buckets.
SourceLanguageCode - Use auto so that the source language of each document is detected automatically.
TargetLanguageCodes - You can specify up to ten target languages.

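import boto3

client = boto3.client('translate')


def lambda_handler(event, context):
    # Start an asynchronous batch translation job. With 'auto', Amazon
    # Translate detects the source language of each input document.
    response = client.start_text_translation_job(
        JobName='Translation-job',
        InputDataConfig={
            'S3Uri': 's3://<<REPLACE-WITH-YOUR-INPUT-BUCKET>>/input',
            'ContentType': 'text/plain'
        },
        OutputDataConfig={
            'S3Uri': 's3://<<REPLACE-WITH-YOUR-OUTPUT-BUCKET>>/output'
        },
        DataAccessRoleArn='<<REPLACE-WITH-THE-IAM-ROLE-ARN>>',
        SourceLanguageCode='auto',
        TargetLanguageCodes=['ja', 'es']
    )
    # Return the job ID and initial status so the job can be tracked later.
    return {'JobId': response['JobId'], 'JobStatus': response['JobStatus']}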
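
Because the job runs asynchronously, the call above returns before any document is translated. As an SDK counterpart to watching the Translation jobs area in the console, the sketch below polls describe_text_translation_job until the job reaches a terminal status. The function name, the polling interval, and the retry limit are assumptions chosen for illustration; the client is passed in as an argument so the function works with any Boto3 Translate client.

```python
import time

# Job statuses after which the job will not change state again.
TERMINAL_STATUSES = {"COMPLETED", "COMPLETED_WITH_ERROR", "FAILED", "STOPPED"}


def wait_for_translation_job(client, job_id, poll_seconds=30, max_polls=120):
    """Poll a batch translation job until it finishes and return its status.

    client: a Boto3 Translate client, e.g. boto3.client('translate').
    job_id: the JobId returned by start_text_translation_job.
    """
    for _ in range(max_polls):
        response = client.describe_text_translation_job(JobId=job_id)
        status = response["TextTranslationJobProperties"]["JobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

For example, wait_for_translation_job(client, response['JobId']) could be called from a script after the job is started. Inside a Lambda function, a long polling loop may exceed the function timeout, so checking the status from a separate scheduled invocation is likely a better fit.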