In 2017, the original transformer architecture introduced two main components:
- an encoder
- a decoder
These two parts were connected so they could work together: the encoder turned the input sequence into a rich representation, and the decoder attended to that representation while generating the output.
This original design is known as an encoder–decoder transformer.
## Decoders Can Work on Their Own
Over time, researchers realized that the decoder alone was powerful enough for many tasks.
Using only a decoder, models could:
- generate text
- continue sentences
- perform translation and other language tasks
These are called decoder-only transformers.
As we discussed in the article dedicated to them, these models form the foundation of systems like ChatGPT.
## Encoders Can Also Work Independently
In a similar way, encoder-based models turned out to be very useful on their own.
This idea forms the foundation of models like BERT and many others.
These are called encoder-only transformers.
## Building Blocks of Encoder-Only Transformers
Encoder-only transformers use the same core components we explored earlier:
- Word embeddings convert words into vectors of numbers
- Positional encoding keeps track of word order
- Self-attention helps establish relationships between words
When these layers are combined, they create a new representation for each token that captures:
- meaning
- position
- relationships with other words
These representations are called context-aware embeddings or contextualized embeddings.
## Why Context-Aware Embeddings Are Useful
Because similar text ends up with similar embedding vectors, comparing context-aware embeddings can help group together:
- similar sentences
- similar paragraphs
- similar documents
This capability is one of the foundations of Retrieval-Augmented Generation (RAG).
RAG works by:
- Breaking documents into smaller chunks of text
- Using an encoder-only transformer to generate embeddings for each chunk
- Comparing embeddings to find the most relevant information, as shown in the sketch below
## Other Uses of Encoder-Only Transformers
Context-aware embeddings can also be used as inputs for machine learning models.
For example:
- neural networks can use them for sentiment classification
- logistic regression models can also use them for classification tasks, as sketched below
That wraps up encoder-only transformers.
In the next article, we will explore reinforcement learning in neural networks.
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:

```bash
ipm install repo-name
```

… and you’re done! 🚀

