DEV Community

ItsASine (Kayla)

How would you create a translator app?

With summer coming to a close, I'll have more free time opening up. I was thinking of picking up the most recent item on my Side Projects of Doom list: a translator.

Things I would want:

  • English to a single specific language
    • No English to Thing and Thing to English like Google/Bing. Just one way.
    • No English to ALL THE THINGS like Google/Bing. Only focused on the one path.
  • Doesn't need to be pretty or scalable
    • This is not a commercial venture -- I'll just be playing
    • The only user is me... and whoever may click on the link to it when I post in #showdev
    • It'll be open sourced and search engine indexed, but since it's a niche use case and not unique, I really doubt anyone will care but me

Keeping those two things in mind, the big sexy option of getting into an NLP solution seems excessive to me. I don't have an army of computers to train with and don't see the need to bring in AWS when the only person using the fine-tuned ML algorithm would be me.

Yes, I could use Google Translate and be done with it, but where's the fun in that? Now I can learn new tech AND a new language. I can leave the natural language processing to the Googles and Bings of the world.

My initial thought is to do it in JavaScript, staying in the realm of what I know while adding to my skills, so that needing to learn fundamentals doesn't become a barrier. I don't know if that's the right tech for the job, though. I'd just do straight if-this-then-that with objects or something: first match on sentences, then phrases, then words, and keep the original word if it's unknown. Maybe I'd learn ES6+ and more TypeScript along the way.
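A minimal sketch of that sentence-then-phrase-then-word lookup, in TypeScript. The `dictionary` table, its placeholder values, and the function names are all made up for illustration; punctuation handling is left out entirely.

```typescript
// Hypothetical lookup table: keys are lowercase English sentences, phrases, or words.
// The values are placeholders, not real translations.
const dictionary: Record<string, string> = {
  "how are you": "<how-are-you>",
  "good morning": "<good-morning>",
  "good": "<good>",
  "friend": "<friend>",
};

function lookup(key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(dictionary, key) ? dictionary[key] : undefined;
}

// Try the whole sentence, then the longest phrase starting at each position,
// then single words; keep unknown words as they are.
function translateSentence(sentence: string): string {
  const words = sentence.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const output: string[] = [];
  let i = 0;
  while (i < words.length) {
    let matched = false;
    // Longest candidate first, so a sentence match beats a phrase,
    // and a phrase beats a single word.
    for (let end = words.length; end > i; end--) {
      const hit = lookup(words.slice(i, end).join(" "));
      if (hit !== undefined) {
        output.push(hit);
        i = end;
        matched = true;
        break;
      }
    }
    if (!matched) {
      output.push(words[i]); // unknown word: pass it through unchanged
      i++;
    }
  }
  return output.join(" ");
}
```

With this table, `translateSentence("good dog")` would translate "good" and leave "dog" untouched, which is the "keep the original word" fallback.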

So... keeping in mind that this is fun learning rather than an enterprise project, what would you do if you were in my place? I've only gotten as far as making a Duolingo account to start learning some grammar :)

Top comments (6)

Priyansh Jain

Reminds me of my high school coding days.
I had a go at it back then, and they had recently taught us about prefix and postfix notations.
Hindi sentences generally follow Subject-Object-Verb order (main (I) khel (games) khelta hun (play)), while English follows Subject-Verb-Object (I play games). So I just treated the verb as an operator, and the subject and object as operands.
Hindi is then like ab+ (postfix) and English is like a+b (infix). I just applied the infix-to-postfix algorithm, and vice versa, to translate. My translator would successfully translate simple sentences that used few words!
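The operator/operand idea might look roughly like this in TypeScript. This is only a guess at the approach, not the original program: the `verbs` list and `glossary` table are hypothetical, the glosses are copied from the example above, and the conversion is a one-precedence-level version of the usual infix-to-postfix stack algorithm.

```typescript
// Verbs act as operators; everything else is an operand.
// Both tables are tiny hypothetical samples.
const verbs: string[] = ["play"];
const glossary: Record<string, string> = {
  i: "main",
  games: "khel",
  play: "khelta hun",
};

// Infix (subject verb object) to postfix (subject object verb).
function svoToSov(sentence: string): string[] {
  const output: string[] = [];
  const pending: string[] = []; // operator stack
  for (const word of sentence.toLowerCase().split(/\s+/)) {
    if (verbs.indexOf(word) !== -1) {
      // Flush earlier operators first (left-associative), then hold this one.
      while (pending.length > 0) output.push(pending.pop() as string);
      pending.push(word);
    } else {
      output.push(word);
    }
  }
  while (pending.length > 0) output.push(pending.pop() as string);
  return output;
}

// Reorder first, then swap each word for its gloss if one is known.
function translate(sentence: string): string {
  return svoToSov(sentence)
    .map((w) => (Object.prototype.hasOwnProperty.call(glossary, w) ? glossary[w] : w))
    .join(" ");
}

console.log(translate("I play games")); // main khel khelta hun
```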
Great times those.

ItsASine (Kayla)

I'm trying to go from Subject Verb Object to Object Verb Subject, so hopefully I'll have as much fun to consider as you had :)

Dian Fay

If I had to do this and NLP were out of the question, I'd start by tokenizing input text at the sentence and word level to get an array of arrays of words. Next task is to identify each word, which is monumentally tricky for a few reasons:

  • tenses, plurals, possessives, conjugations, and declensions all affect spelling
  • the input data isn't necessarily coming from a perfect typist
  • homographs exist

Taking as given the existence of lookup tables for all these combinations (Wiktionary is a godsend), the problem of no or multiple matches can be at least approached by introducing a confidence factor. Exact match = 100% confidence. n matches = 100/n % confidence each. No matches, and things get fun: I'd start by searching the lookup tables for similar values using the Levenshtein distance algorithm, factoring the distance and perhaps word length into match confidence.
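As a rough sketch of that confidence idea in TypeScript: the `lexicon` table, the `maxDistance` cutoff, and the exact scoring formula for fuzzy matches are assumptions made for illustration. Only the "exact = 100%, n matches = 100/n each" rule comes from the description above.

```typescript
// Classic dynamic-programming Levenshtein distance
// (insert, delete, substitute each cost 1).
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

interface Candidate {
  match: string;
  confidence: number; // percent
}

// Hypothetical lookup table: surface form -> possible dictionary entries.
const lexicon: Record<string, string[]> = {
  lead: ["lead (verb)", "lead (metal)"], // homograph: two entries
  cat: ["cat (noun)"],
};

function identify(word: string, maxDistance: number = 2): Candidate[] {
  const exact = Object.prototype.hasOwnProperty.call(lexicon, word) ? lexicon[word] : [];
  if (exact.length > 0) {
    // n exact matches share 100% evenly.
    return exact.map((m) => ({ match: m, confidence: 100 / exact.length }));
  }
  // No exact match: fall back to near neighbours, scaling the score
  // by edit distance relative to word length (an assumed formula).
  const fuzzy: Candidate[] = [];
  for (const key of Object.keys(lexicon)) {
    const d = levenshtein(word, key);
    if (d <= maxDistance) {
      const score = 100 * (1 - d / Math.max(word.length, key.length));
      for (const m of lexicon[key]) {
        fuzzy.push({ match: m, confidence: score / lexicon[key].length });
      }
    }
  }
  return fuzzy.sort((x, y) => y.confidence - x.confidence);
}
```

Here `identify("lead")` returns two candidates at 50% each, while a typo such as `identify("cot")` falls through to the fuzzy branch and comes back with `cat (noun)` at reduced confidence.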

Result so far is an array of (arrays of (arrays of [match, confidence] tuples) word-candidates) sentences. And this is probably about where I'd give up on going it alone, because the next step is teaching a computer grammar in both the origin and target languages in order to assign each word a position and part of speech in a mappable sentence structure. That's not even an NLP question, it's a "dedicate your entire career to scratching the surface" question -- just ask Noam Chomsky!
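For what it's worth, that nested result could be typed along these lines in TypeScript. The type names, the naive `tokenize` splitter (it only splits on `.`, `!`, `?` and whitespace), and the `identifyWord` callback are all assumptions for illustration.

```typescript
type WordCandidate = [string, number];        // [match, confidence]
type WordCandidates = WordCandidate[];        // all candidates for one word
type SentenceCandidates = WordCandidates[];   // one entry per word
type DocumentCandidates = SentenceCandidates[]; // one entry per sentence

// Naive tokenizer: sentences on . ! ?, words on whitespace.
function tokenize(text: string): string[][] {
  return text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => s.toLowerCase().split(/\s+/));
}

// Run a word-identification function over every token,
// keeping the sentence/word nesting intact.
function analyse(
  text: string,
  identifyWord: (word: string) => WordCandidates
): DocumentCandidates {
  return tokenize(text).map((sentence) => sentence.map((word) => identifyWord(word)));
}

// Usage with a stand-in identifier that trusts every word completely.
const result = analyse("I play games. You win!", (w) => [[w, 100]]);
console.log(JSON.stringify(result));
```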

Curtis Fenner • Edited

Rule-based translation can be fairly simple to implement and can get... some result. Implementing it can be as simple as writing a top-down or bottom-up parser: for each substring of words in the input, you keep track of its translations and the part of speech for each translation (and any other information that might be useful, for example for getting correct agreement in the output language).

Rule-based translation won't work at real-world scale on real-world text, because human language is too complex and too nuanced to write down all of the rules accurately.

However, if you focus on the subset of a language that is learned in the beginning of a foreign language class, you might actually be able to get some answers for the simple exercise-style phrases/sentences.

For example, here are a few translation "rules" to go from Japanese to English:

先生 --> teacher (noun)
英語 --> English (noun)
帽子 --> hat (noun)
noun 1 の noun 2 --> 1's 2 (noun)
これ --> this (noun)
それ --> that (noun)
noun 1 は noun 2 です。 --> 1 is 2.

これ は 先生 の 帽子 です。 --> this is teacher's hat.

There are lots of problems with this, though! For example,

英語の先生 --> English teacher (NOT English's teacher)

So in some cases, multiple rules will match, and you will need some way to decide which is "better". (This might be a good application for ML based on statistics from a corpus!)
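A bare-bones TypeScript sketch of the rules above, applied bottom-up. It assumes the input is already split on spaces as in the examples, and the `Item` shape and function names are invented for illustration. It has no way to choose between competing rules, so it reproduces the 英語の先生 problem as well.

```typescript
interface Item {
  text: string;
  pos: "noun" | "other";
}

// Lexical rules: every entry here is a noun.
const nouns: Record<string, string> = {
  "先生": "teacher",
  "英語": "English",
  "帽子": "hat",
  "これ": "this",
  "それ": "that",
};

function has(table: Record<string, string>, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(table, key);
}

function translate(input: string): string {
  // Step 1: apply the lexical rules token by token.
  const items: Item[] = input
    .split(/\s+/)
    .filter((tok) => tok.length > 0)
    .map((tok): Item =>
      has(nouns, tok) ? { text: nouns[tok], pos: "noun" } : { text: tok, pos: "other" }
    );

  // Step 2: noun 1 の noun 2 --> 1's 2 (noun), repeated left to right.
  let i = 0;
  while (i + 2 < items.length) {
    if (items[i].pos === "noun" && items[i + 1].text === "の" && items[i + 2].pos === "noun") {
      const merged: Item = { text: items[i].text + "'s " + items[i + 2].text, pos: "noun" };
      items.splice(i, 3, merged); // stay at i so chains keep reducing
    } else {
      i++;
    }
  }

  // Step 3: noun 1 は noun 2 です。 --> 1 is 2.
  if (
    items.length === 4 &&
    items[0].pos === "noun" && items[1].text === "は" &&
    items[2].pos === "noun" && items[3].text === "です。"
  ) {
    return items[0].text + " is " + items[2].text + ".";
  }
  return items.map((it) => it.text).join(" ");
}

console.log(translate("これ は 先生 の 帽子 です。")); // this is teacher's hat.
console.log(translate("英語 の 先生"));               // English's teacher
```

One possible way to pick the "better" rule would be to check a table of more specific phrases before the general の rule, though that is just one option among several.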

An example of this all falling apart is so-called "eel sentences" (ウナギ文).

私 は ウナギ です。--> Literally, "I am an eel"

This is obviously a strange sentence. However, it could be a totally normal answer to what you're going to order at a restaurant:

注文 は 何 ですか。 --> What is your order?
私 は ウナギ です。 --> I'll have eel. (My order is eel)

And this is why computers still struggle, and will continue to struggle, with language for a long time!

ItsASine (Kayla)

This is a fantastic example (and info on Rule-Based Translation!), thanks!

A human will always be best for look-and-feel kind of stuff like this (though I suppose it's more hear-and-speak?) but it should be a nifty way to apply the lessons learned from Duolingo :)

Juan F Gonzalez

You can use the Watson API for language processing. It's not overkill, because you can use just what you need and then scale it later if it proves to be interesting.
That way you would be learning a new language and get to play around with the Watson functions (which are pretty cool).