
Ajay Krupal K

Transformers: The Architecture of AI

Pre-requisite:
What are neural networks?: https://dev.to/ajaykrupalk/neural-networks-the-artificial-brain-behind-ai-115p

OpenAI's GPT-3 and GPT-4, the models behind ChatGPT, and Google's BERT are a few examples of Transformers. But what are transformers, and how do they power today's most widely used chat models?

What are transformers?

A transformer is a neural network architecture that transforms one sequence into another, for example an English sentence into its French translation. Each transformer has two parts: an Encoder and a Decoder. The encoder is associated with the input sequence and the decoder with the output sequence.
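
To make that split concrete, here is a minimal sketch using PyTorch's built-in nn.Transformer; the layer counts, dimensions, and random tensors are illustrative, not taken from any particular model:

```python
import torch
import torch.nn as nn

# Encoder-decoder transformer; 512-dim embeddings and 8 attention
# heads are PyTorch's defaults, used here only for illustration.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 1, 512)  # input sequence: 10 tokens, batch of 1
tgt = torch.rand(7, 1, 512)   # output sequence so far: 7 tokens

out = model(src, tgt)         # encoder reads src; decoder builds the output
print(out.shape)              # torch.Size([7, 1, 512])
```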


How do Transformers work?

Transformers work through sequence-to-sequence learning: they take a sequence of tokens (e.g., words in a sentence) and produce an output sequence, such as the next token in the text. The input is processed by passing it through a stack of encoder layers.
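
As a toy illustration of what a sequence of tokens looks like, assuming a hypothetical five-word vocabulary (real models use learned subword tokenizers):

```python
# Hypothetical word-level vocabulary, for illustration only.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

sentence = "the cat sat on the".split()
token_ids = [vocab[word] for word in sentence]
print(token_ids)  # [0, 1, 2, 3, 0] -- the model would predict the next token
```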


The encoder generates encodings that define which parts of the input sequence are relevant to each other and passes these encodings to the next encoder layer.
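
In code, this layer-to-layer hand-off is just a loop. A minimal sketch, assuming PyTorch and illustrative sizes:

```python
import torch
import torch.nn as nn

# A stack of six encoder layers; sizes are illustrative.
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=512, nhead=8) for _ in range(6)]
)

x = torch.rand(10, 1, 512)  # embeddings of the 10 input tokens
for layer in layers:
    x = layer(x)            # each layer refines the encodings and hands them on
print(x.shape)              # torch.Size([10, 1, 512]) -- the final encodings
```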

The decoder takes all of the encodings and uses their derived context to generate the output sequence.
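
A matching sketch for the decoder side, again assuming PyTorch; here memory stands for the encoder's encodings:

```python
import torch
import torch.nn as nn

decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=512, nhead=8), num_layers=6
)

memory = torch.rand(10, 1, 512)  # all encodings produced by the encoder
tgt = torch.rand(3, 1, 512)      # embeddings of the output tokens so far

out = decoder(tgt, memory)       # the decoder attends over the encodings
print(out.shape)                 # torch.Size([3, 1, 512])
```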

Transformers are typically trained in a semi-supervised fashion: they are first pre-trained in an unsupervised manner on a large, unlabelled data set, and afterwards fine-tuned through supervised learning to improve performance on a specific task.
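
With the Hugging Face transformers library, the two phases look roughly like this; the checkpoint, task, and label below are illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# 1) Load weights that were pre-trained on a large unlabelled corpus.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# 2) Fine-tune with labelled examples (one supervised step shown).
inputs = tokenizer("Transformers are great!", return_tensors="pt")
labels = torch.tensor([1])                  # hypothetical "positive" label
loss = model(**inputs, labels=labels).loss
loss.backward()                             # gradients for a supervised update
```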

Attention Mechanism

Transformers are different from earlier architectures in that they do not have to process the data strictly in order. Instead, they use something called an Attention Mechanism, which provides context for each item in the input sequence by relating it to every other item. This makes them more efficient, and better suited to tasks like translation, where the meaning of the text as a whole is considered rather than just the order of the words.
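
At the core of this is scaled dot-product attention. A minimal NumPy sketch, assuming single-head self-attention without the learned query/key/value projections that real transformers add:

```python
import numpy as np

def attention(Q, K, V):
    # score every query against every key, scaled for numerical stability
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # softmax turns the scores into weights over the sequence
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    # each output token is a context-weighted mix of the value vectors
    return weights @ V

x = np.random.rand(5, 64)        # 5 tokens with 64-dim embeddings
print(attention(x, x, x).shape)  # (5, 64): every token now carries context
```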
