
Ganesh Kumar

Attention Is All You Need - Part 6

Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It's free, unlimited, and source-available on GitHub. Star us to help other devs discover the project, and do give it a try and share your feedback so we can improve it.

In the previous article, we discussed why traditional RNN models didn't work well for long sentences.

In this article, we will discuss how a single attention head works and why it is not enough on its own.

Why Is a Single Attention Head Not Enough?

In a single attention head, the context of each word is maintained by relating words/tokens to one another.

But this also comes with a limitation: one head can capture only one kind of contextual relationship at a time.

For example, if a sentence has multiple meanings, a single attention head will focus on only one of them.

Example:

In a 5-coach-long train, in the first coach a man was sleeping, in the second coach a man was standing, in the third coach a man was playing, and in the fourth coach a man was eating food.

So the word "man" in the example above needs contextual understanding across all the coaches at once, and that is where a single attention head fails.
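To make this concrete, here is a minimal sketch of what one attention head computes: scaled dot-product attention, where every token scores its similarity to every other token and takes a weighted mix of their value vectors. This is an illustration with random toy vectors, not the paper's full implementation (no learned projections or masking here).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(Q, K, V):
    """Scaled dot-product attention for one head.

    Q, K, V: (seq_len, d_k) arrays.
    Each output row is a weighted mix of all value vectors,
    weighted by how similar that token's query is to every key.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # (seq_len, seq_len) similarity matrix
    weights = softmax(scores, axis=-1) # one attention distribution per token
    return weights @ V

# Toy example: 4 tokens, each an 8-dim vector (self-attention: Q = K = V = x).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = single_head_attention(x, x, x)
print(out.shape)  # (4, 8): one contextualized vector per token
```

The key point for this article: `weights` is a *single* attention pattern per token, so one head commits to one way of relating the tokens.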

To solve this, multi-head attention was introduced.

Multi-Head Attention

In multi-head attention, several heads work in parallel and independently, each attending over its own projected subspace. Together they capture multiple kinds of contextual relationships for each word/token.
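A minimal sketch of that idea: split the model dimension across several heads, let each head run its own scaled dot-product attention through its own projections, then concatenate the results. The projection weights here are random for illustration (in a real Transformer they are learned), and details like biases and masking are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Toy multi-head self-attention (random weights, no biases/masking).

    x: (seq_len, d_model), with d_model divisible by num_heads.
    Each head projects x into its own d_k-dim subspace and attends
    independently, so different heads can focus on different relationships
    (e.g. which "man" belongs to which coach in our train example).
    """
    seq_len, d_model = x.shape
    d_k = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Per-head projections (learned in practice, random here).
        Wq = rng.normal(size=(d_model, d_k))
        Wk = rng.normal(size=(d_model, d_k))
        Wv = rng.normal(size=(d_model, d_k))
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = Q @ K.T / np.sqrt(d_k)         # (seq_len, seq_len)
        heads.append(softmax(scores, axis=-1) @ V)
    # Concatenate all heads and mix them with an output projection.
    concat = np.concatenate(heads, axis=-1)      # (seq_len, d_model)
    Wo = rng.normal(size=(d_model, d_model))
    return concat @ Wo

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))  # 5 tokens, d_model = 16
out = multi_head_attention(x, num_heads=4, rng=rng)
print(out.shape)  # (5, 16)
```

Because each head gets its own Q/K/V projections, the heads produce different attention patterns over the same sentence, which is exactly the extra capacity a single head lacks.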

What next?

Finally, we have some idea of what is happening under the hood.

I want to pause the upcoming exploration here for a bit.

Even at a surface level I learned many things, but I couldn't yet get into the details.

I think in about 2 weeks I will explore those areas in depth and share my learnings.

Reference: Vaswani et al., "Attention Is All You Need" (NeurIPS 2017) — https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

git-lrc

Feedback and contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: https://github.com/HexmosTech/git-lrc
