
Ganesh Kumar

Attention Is All You Need - Part 6

Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It's free, unlimited, and source-available on GitHub. Star us to help other devs discover the project, and do give it a try and share your feedback so we can improve it.

In the previous article, we discussed why traditional RNN models didn't work well for long sentences.

In this article, we will discuss how a single attention head works and why it is not enough on its own.

Why Is a Single Attention Head Not Enough?

In a single attention head, the context of each word is maintained by relating words/tokens to one another.

But this also comes with a limitation: one head can capture only one kind of contextual relationship at a time.

For example, if a sentence has multiple meanings, a single attention head will focus on only one of them.

Example:

In a 5-coach-long train, in the first coach a man was sleeping, in the second coach a man was standing, in the third coach a man was playing, and in the fourth coach a man was eating food.

So the word "man" in the example above needs contextual understanding across all the coaches at once, and that is where a single attention head fails.
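To make this concrete, here is a minimal sketch of what one attention head computes: scaled dot-product attention, where every token scores its similarity to every other token and takes a weighted mix of their value vectors. This is an illustration with random toy vectors, not the paper's full implementation (no learned projections or masking here).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(Q, K, V):
    """Scaled dot-product attention for one head.

    Q, K, V: (seq_len, d_k) arrays.
    Each output row is a weighted mix of all value vectors,
    weighted by how similar that token's query is to every key.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # (seq_len, seq_len) similarity matrix
    weights = softmax(scores, axis=-1) # one attention distribution per token
    return weights @ V

# Toy example: 4 tokens, each an 8-dim vector (self-attention: Q = K = V = x).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = single_head_attention(x, x, x)
print(out.shape)  # (4, 8): one contextualized vector per token
```

The key point for this article: `weights` is a *single* attention pattern per token, so one head commits to one way of relating the tokens.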

To solve this, multi-head attention was introduced.

Multi-Head Attention

In multi-head attention, several heads work in parallel and independently, each attending over its own projected subspace. Together they capture multiple kinds of contextual relationships for each word/token.
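A minimal sketch of that idea: split the model dimension across several heads, let each head run its own scaled dot-product attention through its own projections, then concatenate the results. The projection weights here are random for illustration (in a real Transformer they are learned), and details like biases and masking are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Toy multi-head self-attention (random weights, no biases/masking).

    x: (seq_len, d_model), with d_model divisible by num_heads.
    Each head projects x into its own d_k-dim subspace and attends
    independently, so different heads can focus on different relationships
    (e.g. which "man" belongs to which coach in our train example).
    """
    seq_len, d_model = x.shape
    d_k = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Per-head projections (learned in practice, random here).
        Wq = rng.normal(size=(d_model, d_k))
        Wk = rng.normal(size=(d_model, d_k))
        Wv = rng.normal(size=(d_model, d_k))
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = Q @ K.T / np.sqrt(d_k)         # (seq_len, seq_len)
        heads.append(softmax(scores, axis=-1) @ V)
    # Concatenate all heads and mix them with an output projection.
    concat = np.concatenate(heads, axis=-1)      # (seq_len, d_model)
    Wo = rng.normal(size=(d_model, d_model))
    return concat @ Wo

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))  # 5 tokens, d_model = 16
out = multi_head_attention(x, num_heads=4, rng=rng)
print(out.shape)  # (5, 16)
```

Because each head gets its own Q/K/V projections, the heads produce different attention patterns over the same sentence, which is exactly the extra capacity a single head lacks.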

What next?

Finally, we have some idea of what is happening under the hood.

I want to pause the upcoming exploration here for a bit.

Even at a surface level I learned many things, but I couldn't yet get into the details.

I think in about 2 weeks I will explore those areas in depth and share my learnings.

Reference: Vaswani et al., "Attention Is All You Need" (NeurIPS 2017) — https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

git-lrc

Feedback and contributors are welcome! It's online, source-available, and ready for anyone to use.
⭐ Star it on GitHub: https://github.com/HexmosTech/git-lrc
