In this post, we will discuss the Transformer, an attention-based model that significantly speeds up training. This sequence-processing model uses no recurrent or convolutional layers; instead, it is built entirely from attention and fully connected (feed-forward) layers.
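To make the attention mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer. The shapes and random inputs are purely illustrative; a real model would learn the query, key, and value projections.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D inputs."""
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled to stabilize the softmax
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key axis (numerically stable form)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V

# Toy example: 3 tokens, each represented by a 4-dimensional vector
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per token
```

Because every token attends to every other token in a single matrix operation, the whole sequence can be processed in parallel, which is where the training speedup over recurrent models comes from.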