
Write your own Hybrid Search with RRF and ‘Ember’ rerank

Embedding models power vector search through vector databases, but there is a pitfall: dense vectors alone don't always return the best results for a given query. Combining them with sparse retrieval such as BM25 can improve search quality dramatically. And that is not the only improvement available: we can also use a reranker to reorder the vector search responses for a better hybrid search.

To keep it simple, I will use Weaviate as the data store, LLMRails for embedding our texts, and its Ember model for reranking. Since LLMRails serves its embedding model behind an API rather than as a downloadable model, we will call the API for embeddings. First, we fetch the vector and text search results:

import requests
import weaviate

w_client = weaviate.Client(url='Your Weaviate Url')
schema = 'Your schema name'

def embed_text(query):
    # Call the LLMRails embeddings API for the given text
    response = requests.post('https://api.llmrails.com/v1/embeddings',
        headers={'X-API-KEY': 'Your API key at https://console.llmrails.com/api-keys'},
        json={
            'input': [query],
            'model': 'embedding-english-v1'
        }
    )

    return response.json()['data'][0]['embedding']


def search_vector(query):
    # Dense retrieval: embed the query and run a near-vector search
    embeddings = embed_text(query)
    response = (
        w_client.query.get(schema, ['text'])
        .with_near_vector({'vector': embeddings})
        .with_additional(['id', 'score', 'explainScore', 'distance'])
        .with_limit(5)
        .do()
    )
    mapping = {item['_additional']['id']: item['text'] for item in response['data']['Get'][schema]}

    # Reorder the hits with the reranker (implemented below)
    ranks = rerank(query, list(mapping.values()))
    reordered_ids = [list(mapping.keys())[i] for i in ranks]
    return {i: mapping[i] for i in reordered_ids}


def search_text(query):
    # Sparse retrieval: BM25 keyword search over the same schema
    response = (
        w_client.query.get(schema, ['text'])
        .with_bm25(query)
        .with_additional(['id', 'score', 'explainScore', 'distance'])
        .with_limit(5)
        .do()
    )
    return {item['_additional']['id']: item['text'] for item in response['data']['Get'][schema]}
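These functions assume your Weaviate class is already populated with documents whose vectors came from the same LLMRails model. If you still need to index data, a minimal sketch could look like this (the documents list and its contents are just placeholders):

def index_documents(documents):
    # Store each text together with the vector produced by the same
    # embedding model, so near-vector search and BM25 query the same objects
    for doc in documents:
        w_client.data_object.create(
            data_object={'text': doc},
            class_name=schema,
            vector=embed_text(doc)
        )

index_documents(['your first document', 'your second document'])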

In search_vector we call the rerank function; here is its implementation.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def rerank(query, all_documents):
    # Load the reranker; in production you would load this once at startup
    # instead of on every call
    name = 'llmrails/ember-v1'
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)
    model.eval()

    # Score each (query, document) pair
    pairs = [(query, doc) for doc in all_documents]

    with torch.no_grad():
        inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
        scores = model(**inputs, return_dict=True).logits.view(-1).float()
        # Return document indices ordered from most to least relevant
        sorted_indices = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
        return sorted_indices
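As a quick sanity check of what rerank returns, here is a tiny made-up example (the documents and the shown output are illustrative):

docs = [
    'BM25 is a sparse, keyword-based retrieval method.',
    'Weaviate is a vector database.'
]
order = rerank('what is a vector database', docs)
# order is a list of indices, e.g. [1, 0], meaning the second
# document was scored as more relevant to the query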

After getting all results from the text search and the reranked vector search, it is time to merge them with RRF (Reciprocal Rank Fusion), which is widely used in industry (e.g. Elasticsearch) to combine ranked result lists. For each document id, RRF adds 1 / (rank + k) for every list the document appears in, so documents that rank highly in both lists accumulate the highest fused score. For example, with k = 60, a document ranked first (rank 0) in both lists scores 1/60 + 1/60, beating a document that appears first in only one list.

def reciprocal_rank_fusion(query, v_resp, t_resp, k=60):
    fused_scores = {}

    # Each result list contributes 1 / (rank + k) per document;
    # k=60 is the constant commonly used for RRF
    for resp in (v_resp, t_resp):
        for rank, doc in enumerate(resp):
            if doc not in fused_scores:
                fused_scores[doc] = 0

            previous_score = fused_scores[doc]
            fused_scores[doc] += 1 / (rank + k)
            print(f"Updating score for {doc} from {previous_score} to {fused_scores[doc]} based on rank {rank} in query '{query}'")

    # Sort by fused score and keep the top 5 document ids
    reranked_results = {doc: score for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)}
    return list(reranked_results.keys())[:5]
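To see the fusion in action, here is a tiny demonstration with made-up ids (the dicts mimic the id-to-text shape returned by the search functions):

v_resp = {'a': 'text a', 'b': 'text b'}
t_resp = {'a': 'text a', 'c': 'text c'}
reciprocal_rank_fusion('demo', v_resp, t_resp)
# 'a' appears at rank 0 in both lists, so it accumulates 1/60 + 1/60
# and wins; the function returns ['a', 'b', 'c']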

Finally, we implement the hybrid search function that ties everything together.

def hybrid_reranked(query):
    v_resp = search_vector(query)  # dense results, already reranked
    t_resp = search_text(query)    # sparse BM25 results

    # Fuse both rankings, then look each fused id up in whichever
    # response contains it
    hybrid = reciprocal_rank_fusion(query, v_resp, t_resp)
    new_dict = {key: v_resp.get(key, t_resp.get(key)) for key in hybrid}
    return new_dict

hybrid_reranked('What is the range of parameters for the large language models (LLMs) developed in this work?')


Do not forget to have a look at https://www.llmrails.com/ and https://docs.llmrails.com/. It is one of the best and most affordable embedding and RAG solutions in the industry.
