sangjun_park

Poly-Encoder

Introduction

The Poly-encoder is a neural network architecture used in dialogue systems and information retrieval. Developed by Facebook AI Research and published in 2020, it combines the strengths of Bi-encoders and Cross-encoders to improve both performance and efficiency.

Why are we using the Poly-encoder?

The most compelling reason to use the Poly-encoder is performance. It offers a balance between the speed of Bi-encoders and the accuracy of Cross-encoders, which makes it particularly effective for tasks that require both efficiency and precision.

Let's compare Bi-encoders, Cross-encoders, and Poly-encoders to see which one is the best fit for your use case.

Bi-encoders
  • Speed: Very fast. The query and candidates are encoded independently, so candidate embeddings can be pre-computed and cached.
  • Accuracy: Lower compared to Cross-encoders. They lack direct interaction between the query and candidates, potentially missing nuanced relationships.

Cross-encoders
  • Speed: Very slow, especially with large candidate sets. They process the query and each candidate together, requiring a separate computation for every pair.
  • Accuracy: Very high, much better than Bi-encoders or Poly-encoders. Direct interaction between the query and candidates allows the model to capture complex relationships.

Poly-encoders
  • Speed: Medium. Faster than Cross-encoders and slightly slower than Bi-encoders.
  • Accuracy: Better than Bi-encoders and approaching that of Cross-encoders. The multiple code vectors and attention mechanism allow for more nuanced matching.

Poly-encoders achieve this balance through pre-computation, an attention mechanism, and multiple code vectors, as the sketch below illustrates.
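
To make the speed difference concrete, here is a minimal PyTorch sketch of the two scoring regimes. The `cross_model` argument is a hypothetical stand-in for any trained pairwise scorer; only the shape of the computation matters here.

```python
import torch

def bi_encoder_scores(query_vec, cand_vecs):
    # Bi-encoder: cand_vecs can be pre-computed offline and cached,
    # so scoring N candidates is a single matrix-vector product.
    return cand_vecs @ query_vec  # shape (N,)

def cross_encoder_scores(query, candidates, cross_model):
    # Cross-encoder: every (query, candidate) pair needs its own
    # full forward pass, so query-time cost grows with N.
    return torch.stack([cross_model(query, c) for c in candidates])

# Demo with random vectors: 1,000 cached candidates, one query.
scores = bi_encoder_scores(torch.randn(64), torch.randn(1000, 64))
print(scores.shape)  # torch.Size([1000])
```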

The Structure of Poly-Encoder

[Figure: Poly-Encoder architecture diagram]

This diagram illustrates the architecture of the Poly-encoder model, which is designed for efficient and effective text matching.

Context Encoder

The context encoder takes multiple inputs (In_x 1, In_x 2, ..., In_x N_x) and processes them to produce one output embedding per input (Out_x 1, Out_x 2, ..., Out_x N_x). These outputs represent different aspects or features of the input context.
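
As a rough sketch, any transformer encoder fills this role; here a small, randomly initialized `nn.TransformerEncoder` stands in for the BERT-style encoder the paper uses (the sizes are made up for illustration).

```python
import torch
import torch.nn as nn

vocab_size, dim, n_ctx = 1000, 64, 12

embed = nn.Embedding(vocab_size, dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)

tokens = torch.randint(0, vocab_size, (1, n_ctx))  # In_x 1..N_x
ctx_out = encoder(embed(tokens))                   # Out_x 1..N_x
print(ctx_out.shape)                               # torch.Size([1, 12, 64])
```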

Query Codes

This is a set of m learnable vectors (Code 1, ..., Code m). These act as global features that the model learns to extract from the context.
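
In code, the query codes are nothing more than a learnable parameter matrix, trained end to end with the rest of the model (a sketch; m and the dimension are arbitrary here):

```python
import torch
import torch.nn as nn

m, dim = 4, 64
# Code 1 .. Code m: randomly initialized, learned during training.
codes = nn.Parameter(torch.randn(m, dim))
```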

Attention Mechanism

For each query code, an attention mechanism is applied over the context encoder outputs. This produces a set of embeddings (Emb 1, ..., Emb m) that capture different aspects of the context relevant to each query code.
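
This step is standard dot-product attention, with the codes as queries and the context outputs as keys and values. A minimal sketch with random tensors (in practice these come from the encoder and the learned codes above):

```python
import torch
import torch.nn.functional as F

m, n_ctx, dim = 4, 12, 64
codes = torch.randn(m, dim)        # Code 1..m (learned in practice)
ctx_out = torch.randn(n_ctx, dim)  # Out_x 1..N_x from the context encoder

# Each code attends over all context outputs.
weights = F.softmax(codes @ ctx_out.T, dim=-1)  # (m, N_x)
embs = weights @ ctx_out                        # Emb 1..m, shape (m, dim)
```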

Context Embedding (Ctxt Emb)

Another attention mechanism combines the embeddings from the previous step into a single context embedding. Here the candidate embedding acts as the query, so the resulting embedding represents the entire context, weighted by its relevance to the candidate.
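
Concretely, this final step in the Poly-encoder paper attends over the m embeddings using the candidate embedding as the query (sketch with random tensors):

```python
import torch
import torch.nn.functional as F

m, dim = 4, 64
embs = torch.randn(m, dim)    # Emb 1..m from the previous step
cand_emb = torch.randn(dim)   # Cand emb (see the candidate encoder below)

w = F.softmax(embs @ cand_emb, dim=-1)  # (m,)
ctxt_emb = w @ embs                     # Ctxt Emb, shape (dim,)
```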

Candidate Encoder

This is similar to the context encoder, but it processes candidate responses. The outputs (Out_y 1, Out_y 2, ..., Out_y N_y) represent features of the candidate.

Candidate Aggregator

It combines the candidate encoder outputs into a single candidate embedding (Cand emb).
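
A simple aggregator, and the one used in the original paper, is taking the first token's output; mean pooling over all outputs is another common choice. A sketch:

```python
import torch

n_cand, dim = 10, 64
cand_out = torch.randn(1, n_cand, dim)  # Out_y 1..N_y

# Reduce the per-token outputs to a single vector: first-token pooling.
cand_emb = cand_out[:, 0, :]            # Cand emb, shape (1, dim)
```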

Scoring

The final score is computed by combining the context embedding and the candidate embedding, typically as a dot product. This score represents the relevance, or degree of match, between the context and the candidate.
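
Putting the last two steps together: because candidate embeddings are context-independent, they can be pre-computed, so only the cheap final attention and dot product run at query time. A sketch scoring many candidates at once:

```python
import torch
import torch.nn.functional as F

m, dim, n_cands = 4, 64, 1000
embs = torch.randn(m, dim)             # Emb 1..m for the current context
cand_embs = torch.randn(n_cands, dim)  # pre-computed Cand emb vectors

# For every candidate: attend over the m context features, then dot-product.
w = F.softmax(cand_embs @ embs.T, dim=-1)     # (n_cands, m)
ctxt_embs = w @ embs                          # per-candidate Ctxt Emb
scores = (ctxt_embs * cand_embs).sum(dim=-1)  # (n_cands,)
print(scores.shape)                           # torch.Size([1000])
```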
