Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It's free, unlimited, and source-available on GitHub. Star us to help other devs discover the project, give it a try, and share your feedback so we can improve it.
In the previous article, we discussed the limitations of RNNs: they struggled to capture long-range dependencies and could not process input in parallel.
How was translation done earlier?
There are many ways to approach translation, but in the early days it was done with seq2seq (sequence-to-sequence) models.
For example:
Translation: "The cat sat on the mat" -> "Le chat s'est assis sur le tapis"
These models used RNNs under the hood. Let's see how.
Input: many words (the source sentence)
Output: many words (the translated sentence)
An encoder RNN reads the source sentence, and a decoder RNN generates the translation from it.
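Here's a minimal sketch of such an encoder-decoder, assuming PyTorch (the classic papers used various RNN flavors; a GRU is used here for brevity). The `Seq2Seq` class name and the vocabulary/hidden sizes are illustrative placeholders, not from any specific paper:

```python
# A minimal seq2seq (encoder-decoder) sketch, assuming PyTorch.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, hidden=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, hidden)
        self.tgt_embed = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # The encoder reads the whole source sentence...
        _, context = self.encoder(self.src_embed(src_ids))
        # ...and compresses it into ONE fixed-size vector (`context`).
        # The decoder must generate the entire translation from that alone.
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), context)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, 1000, (1, 6))  # "The cat sat on the mat" as token ids
tgt = torch.randint(0, 1000, (1, 8))  # French tokens (teacher forcing)
logits = model(src, tgt)              # shape: (1, 8, 1000)
```

Note the single `context` vector in the middle: that design choice is exactly where the trouble starts.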
This had a major flaw: when the input sequence was long, the output quality dropped sharply, making the model's accuracy very poor for long sentences.
The root cause was that the encoder had to squeeze the entire sentence into a single fixed-size context vector. For long inputs, that vector became a lossy summary, so the decoder got confused and couldn't predict the correct output.
How was the long-sentence problem solved?
To overcome this, the decoder needed additional context about the input.
That's where the attention mechanism comes into the picture, as another improvement over plain seq2seq models.
Instead of one fixed context vector, the decoder computes a fresh context vector at every output step: a weighted combination of all the encoder's hidden states. This gives it access to the full input sequence throughout decoding, as sketched below.
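As a rough illustration, here's what one attention step might look like, again assuming PyTorch. This is simplified dot-product attention in the spirit of the Bahdanau/Luong mechanisms, and `attention_context` is a hypothetical helper name:

```python
# One attention step: build a context vector from all encoder states.
import torch
import torch.nn.functional as F

def attention_context(decoder_state, encoder_states):
    # decoder_state:  (batch, hidden)          - decoder's current state
    # encoder_states: (batch, src_len, hidden) - one vector PER source word
    scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2))  # (batch, src_len, 1)
    weights = F.softmax(scores.squeeze(2), dim=1)                   # attention weights
    # Context = weighted sum over ALL encoder states, recomputed at every
    # decoding step, so no single vector must hold the whole sentence.
    context = torch.bmm(weights.unsqueeze(1), encoder_states)       # (batch, 1, hidden)
    return context.squeeze(1), weights

enc = torch.randn(1, 6, 256)  # encoder states for a 6-word sentence
dec = torch.randn(1, 256)     # decoder state at the current step
ctx, w = attention_context(dec, enc)
print(w)  # the weights show which source words the decoder is "looking at"
```

The key design point: the weights are recomputed for every output word, so translating a long sentence no longer hinges on one compressed vector.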
What's next?
In this article, we discussed how translation was done with seq2seq models and how the attention mechanism improved it for long sentences.
Feedback and contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: https://github.com/HexmosTech/git-lrc