Discussion on: How would you create a translator app?

Rule-based translation can be fairly simple to implement and can get... some results. An implementation can be as simple as a top-down or bottom-up parser: for each substring of words in the input, you keep track of its possible translations and the part of speech of each translation (plus any other information that might be useful, for example for getting correct agreement in the output language).

Rule-based translation won't work at real-world scale or on real-world text, because human language is too complex and too nuanced to write down all of the rules accurately.

However, if you focus on the subset of a language that is taught at the beginning of a foreign language class, you might actually be able to get some answers for simple exercise-style phrases and sentences.

For example, here are a few translation "rules" to go from Japanese to English:

先生 --> teacher (noun)
英語 --> English (noun)
帽子 --> hat (noun)
noun 1 の noun 2 --> 1's 2 (noun)
これ --> this (noun)
それ --> that (noun)
noun 1 は noun 2 です。 --> 1 is 2.

これ は 先生 の 帽子 です。 --> this is teacher's hat.
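
To make the bottom-up idea concrete, here is a minimal sketch in Python of the rules above. It assumes the input is already split into words by spaces (as in the example), keeps only one translation per substring, and uses names I made up for illustration (LEXICON, translate, is_noun). It is a toy, not a real parser.

```python
# Single-word rules: Japanese word -> (English, part of speech).
LEXICON = {
    "先生": ("teacher", "noun"),
    "英語": ("English", "noun"),
    "帽子": ("hat", "noun"),
    "これ": ("this", "noun"),
    "それ": ("that", "noun"),
}


def is_noun(entry):
    """True if a chart entry exists and is tagged as a noun."""
    return entry is not None and entry[1] == "noun"


def translate(sentence):
    """Translate a space-separated sentence, or return None if no rule fits."""
    tokens = sentence.split()
    n = len(tokens)

    # chart[(i, j)] holds one (english, part_of_speech) for tokens[i:j].
    chart = {}
    for i, token in enumerate(tokens):
        if token in LEXICON:
            chart[(i, i + 1)] = LEXICON[token]

    # Bottom-up: build translations of longer spans from shorter ones.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j - 1):
                left = chart.get((i, k))

                # Rule: noun1 の noun2 --> 1's 2 (noun)
                right = chart.get((k + 1, j))
                if tokens[k] == "の" and is_noun(left) and is_noun(right):
                    chart.setdefault((i, j), (f"{left[0]}'s {right[0]}", "noun"))

                # Rule: noun1 は noun2 です。 --> 1 is 2.
                right = chart.get((k + 1, j - 1))
                if (tokens[k] == "は" and tokens[j - 1] == "です。"
                        and is_noun(left) and is_noun(right)):
                    chart.setdefault((i, j), (f"{left[0]} is {right[0]}.", "sentence"))

    result = chart.get((0, n))
    return result[0] if result else None


print(translate("これ は 先生 の 帽子 です。"))
```

Anything containing a word or pattern outside these few rules simply comes back as None, which is a fair picture of how brittle this approach is.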

There are lots of problems with this, though! For example,

英語の先生 --> English teacher (NOT English's teacher)

So in some cases, multiple rules will match, and you will need some way to decide which is "better". (This might be a good application for ML based on statistics from a corpus!)
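
As a rough sketch of what "deciding which is better" could look like, the Python below generates both renderings of "noun 1 の noun 2" and picks the one seen more often in a corpus. The second rule (plain "1 2"), the function names, and especially the counts are all assumptions on my part: the numbers are invented for illustration and do not come from any real corpus.

```python
# Hypothetical phrase counts, invented for illustration only.
# A real system would collect these from a large English corpus.
TOY_COUNTS = {
    "English teacher": 50,
    "English's teacher": 1,
    "teacher's hat": 20,
    "teacher hat": 2,
}


def candidates(noun1, noun2):
    """Both renderings of 'noun1 の noun2' produced by the competing rules."""
    return [f"{noun1}'s {noun2}", f"{noun1} {noun2}"]


def best_translation(noun1, noun2, counts=TOY_COUNTS):
    """Pick the candidate seen most often; unseen phrases count as zero."""
    return max(candidates(noun1, noun2), key=lambda phrase: counts.get(phrase, 0))


print(best_translation("English", "teacher"))
print(best_translation("teacher", "hat"))
```

When neither candidate appears in the counts, this falls back to the possessive form, simply because it is listed first.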

An example of this all falling apart is the so-called "eel sentence" (ウナギ文).

私 は ウナギ です。 --> Literally, "I am an eel."

This is obviously a strange sentence. However, it could be a totally normal answer when someone asks what you're going to order at a restaurant:

注文 は 何 ですか。 --> What is your order?
私 は ウナギ です。 --> I'll have eel. (My order is eel)

And this is why computers still struggle, and will continue to struggle, with language for a long time!

ItsASine (Kayla) • Jul 31 '18

This is a fantastic example (and info on Rule-Based Translation!), thanks!

A human will always be best for look-and-feel kind of stuff like this (though I suppose it's more hear-and-speak?) but it should be a nifty way to apply the lessons learned from Duolingo :)