Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It's free, unlimited, and source-available on GitHub. Star us to help other devs discover the project, and do give it a try and share your feedback so we can improve the product.
In this article, I will discuss the paper "Attention Is All You Need", which introduced the Transformer architecture. It was published in 2017 by Google researchers and has become one of the most influential models in NLP.
What is the background?
The goal of machine learning is to learn a mapping from inputs to outputs.
For example:
Predicting a house's price from its square footage, number of bedrooms and bathrooms, locality, etc.
In email spam detection, the input is the email text and the output is a label: spam or not spam.
These mappings are typically learned by neural networks.
A neural network is a sequence of layers, where each layer transforms the output of the previous one.
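To make that concrete, here is a minimal sketch of the house-price example as a stack of layers. The layer sizes and weights are arbitrary placeholders (not a trained model), just to show each layer transforming the previous layer's output:

```python
import numpy as np

# Minimal sketch of a neural network as a sequence of layers.
# Sizes and weights are arbitrary placeholders, not a trained model.
rng = np.random.default_rng(0)
layer_sizes = [4, 16, 16, 1]  # 4 input features -> two hidden layers -> 1 output (price)
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    *hidden, last = weights
    for W in hidden:
        x = np.maximum(0.0, x @ W)  # each layer transforms the previous layer's output (ReLU)
    return x @ last                 # final linear layer produces the prediction

features = np.array([1500.0, 3.0, 2.0, 1.0])  # sqft, bedrooms, bathrooms, locality code
print(forward(features))  # an (untrained) price prediction
```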
But these models had a major limitation: they could not capture long-range dependencies in the input.
What problem did they solve?
Earlier models were based on Recurrent Neural Networks (RNNs), which process one token per time step.
This design had two main problems:
- It struggled to capture long-range dependencies in the input.
- It could not process the input in parallel, since each time step depends on the previous step's output (see the sketch below).
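Here is a minimal sketch of that sequential bottleneck. The sizes and weights are invented for illustration (this is not the paper's code); the point is that each hidden state depends on the previous one, so the time steps must run strictly in order:

```python
import numpy as np

# Minimal sketch of an RNN's sequential bottleneck.
# Sizes and weights are invented for illustration.
rng = np.random.default_rng(0)
seq_len, d_hidden = 5, 8
W_x = rng.normal(size=(d_hidden, d_hidden))  # input-to-hidden weights
W_h = rng.normal(size=(d_hidden, d_hidden))  # hidden-to-hidden weights
x = rng.normal(size=(seq_len, d_hidden))     # fake token embeddings

h = np.zeros(d_hidden)
for t in range(seq_len):
    # Step t needs h from step t-1, so this loop cannot be parallelized.
    h = np.tanh(x[t] @ W_x + h @ W_h)
print(h)  # final hidden state: the whole sequence is squeezed into this one vector
```

Notice that information from the first token has to survive every intermediate step to influence the final state, which is also why long-range dependencies fade.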
How was this solved?
The Transformer architecture was introduced to solve these problems.
It is based on the attention mechanism, which allows the model to focus on the most relevant parts of the input sequence, no matter how far apart they are, and to process all positions in parallel.
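As a small preview of the next article, here is a minimal sketch of scaled dot-product attention, the core operation of the paper: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The inputs and shapes below are made up for illustration:

```python
import numpy as np

# Minimal sketch of scaled dot-product attention from the paper:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
# Inputs and shapes are made up for illustration.

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how relevant each key is to each query
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key positions
    return weights @ V                              # weighted sum of the values

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))
print(attention(Q, K, V).shape)  # (5, 8): every position attends to every other at once
```

Unlike the RNN loop above, this is just a few matrix multiplications over the whole sequence at once, which is what makes it parallelizable.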
That is a simple, high-level explanation; I will wrap up this article here.
Conclusion
In this article, we discussed the background of the Attention Is All You Need paper and the problem it solved.
In the next article, we will discuss the Transformer architecture in detail with an example.
Any feedback or contributions are welcome! git-lrc is online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: https://github.com/HexmosTech/git-lrc
