Lucas Ribeiro
Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking: A Framework for High-Fidelity Chunk Retrieval in RAG Systems

Abstract

This paper addresses critical limitations in modern Retrieval-Augmented Generation (RAG) systems, namely context fragmentation and the relevance-performance trade-off in retrieval. We introduce the Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking (GAHR-MSR) framework, a novel, multi-stage architecture designed to enhance the precision and contextual coherence of retrieved data chunks. GAHR-MSR integrates three key innovations: (1) a Graph-Aware Chunking and Indexing strategy that enriches text segments with structured metadata derived from a knowledge graph; (2) a high-recall initial retrieval stage using hybrid (dense and sparse) vector search with Reciprocal Rank Fusion (RRF); and (3) a high-precision, cascaded re-ranking stage employing the ColBERT late-interaction model. Implemented using the Qdrant vector database, our framework demonstrates significant improvements over baseline retrieval methods on the SciFact benchmark. We present a detailed analysis of the architecture, including mathematical formulations, implementation specifics, and empirical results, showcasing a marked increase in nDCG@10, thereby establishing a new state-of-the-art for high-fidelity information retrieval in knowledge-intensive applications.

1. Introduction

The advent of Large Language Models (LLMs) has catalyzed a paradigm shift in artificial intelligence, yet their efficacy is often constrained by inherent limitations such as knowledge cutoffs and a propensity for "hallucination," or the generation of factually incorrect information [1]. Retrieval-Augmented Generation (RAG) has emerged as a dominant architectural pattern to mitigate these issues, enhancing LLM outputs by grounding them in external, up-to-date knowledge bases [2]. By retrieving relevant information and providing it as context within the LLM's prompt, RAG systems promise more accurate, attributable, and trustworthy responses [5]. However, the theoretical promise of RAG is frequently undermined by practical challenges in its implementation, particularly within the retrieval component. A typical RAG workflow involves multiple, complex processing steps, which can lead to prolonged response times and suboptimal retrieval quality [2]. The performance of the entire system is fundamentally bottlenecked by the fidelity of the retrieved context; if the retriever provides irrelevant or incomplete information, the generator's output will be correspondingly flawed.

The limitations of conventional retrieval methods are a primary source of these performance issues. Two core problems stand out. The first is context fragmentation. Standard document preparation techniques, such as fixed-size chunking, are computationally simple but semantically naive [6]. They often sever logical units of thought, splitting coherent arguments or critical pieces of information across multiple, disconnected chunks [8]. When a query requires synthesizing information that now resides in separate fragments, a simple retriever may fail to gather all necessary pieces, leading to an incomplete context and a superficial response from the LLM [2]. The second problem is the relevance ceiling of initial retrieval stages. The evolution from single-pass dense vector search to hybrid search—combining the semantic understanding of dense embeddings with the keyword precision of sparse vectors—has significantly improved recall [9]. However, this approach often retrieves a large set of documents that are merely topically related, not precisely and deeply relevant to the user's specific, nuanced intent. This creates a "relevance ceiling," where further improvements in the embedding models alone yield diminishing returns in the final quality of the retrieved set.

To overcome these fundamental challenges, this paper introduces the Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking (GAHR-MSR) framework. GAHR-MSR is a holistic, multi-stage pipeline designed to maximize both the contextual coherence and the precision of retrieved information. Its central thesis is that by structuring knowledge with graphs at the indexing stage and applying a multi-stage, precision-focused refinement process at query time, we can drastically improve the fidelity of the context provided to the LLM. The framework is built upon three core contributions:

  1. Graph-Aware Chunking: A novel pre-processing strategy that moves beyond simple text splitting to enrich semantic chunks with structured metadata extracted from a pre-computed knowledge graph, preserving critical entity and relationship context.
  2. High-Recall Hybrid Retrieval: A robust first-stage retrieval that leverages the combined power of dense and sparse vectors, fused using Reciprocal Rank Fusion (RRF), to ensure a comprehensive candidate set is identified.
  3. Cascaded ColBERT Re-ranking: A high-precision, multi-stage refinement process that uses the computationally efficient yet powerful ColBERT late-interaction model to re-rank the candidate set, ensuring the final context is maximally relevant.

The development of this framework reflects a broader architectural shift occurring in the field of advanced information retrieval. Early systems focused on optimizing a single retrieval algorithm, searching for the "best" embedding model for a monolithic, one-shot search [11]. The recognition that dense vectors often miss critical keywords led to the adoption of hybrid search, combining dense and sparse retrievers to improve recall [9]. This marked the first step toward a multi-stage pipeline. However, this high-recall approach introduced noise—topically similar but irrelevant documents—which necessitated a second stage focused on precision. This led to the integration of re-rankers, more computationally intensive but highly accurate models like cross-encoders or ColBERT, to refine the initial candidate set [14]. This evolution has established a dominant design pattern: a "Recall-to-Precision Funnel." The GAHR-MSR framework formalizes and advances this pattern by introducing a crucial pre-processing stage (Graph-Aware Chunking) and optimizing the refinement stage (cascaded re-ranking), representing the next logical step in this architectural progression. It moves beyond treating retrieval as a single step and instead conceptualizes it as a structured, multi-phase process of candidate generation and progressive refinement.

2. Background and Related Work

The GAHR-MSR framework is built upon a confluence of advancements in vector databases, hybrid search techniques, re-ranking models, and graph-based retrieval. This section provides a comprehensive review of these foundational technologies, establishing the scientific context for our contributions.

2.1. Vector Database Architectures: The Case of Qdrant

Vector databases are specialized systems purpose-built to store, index, and query high-dimensional vector embeddings, which are numerical representations of unstructured data like text, images, and audio [11]. Unlike traditional relational databases that operate on exact matches within structured schemas, vector databases excel at similarity search, finding vectors that are "closest" to a query vector in a high-dimensional space according to a given distance metric [17]. This capability is essential for modern AI applications that require understanding semantic or conceptual similarity rather than exact keyword matches [11]. Common distance metrics used to quantify similarity include Cosine Similarity, which measures the cosine of the angle between two vectors, and Euclidean Distance, which measures the straight-line distance between two points in the vector space [18].

Qdrant is a production-ready vector database written in Rust, designed for performance, scalability, and reliability under high load [20]. Its architecture incorporates several key features that make it particularly well-suited for advanced RAG applications. At the core of its search capability is a bespoke modification of the Hierarchical Navigable Small World (HNSW) algorithm for Approximate Nearest Neighbor (ANN) search [17]. HNSW constructs a multi-layered graph where nodes are vectors. Upper layers contain long-range connections for coarse, rapid navigation across the vector space, while lower layers contain short-range connections for fine-grained, precise search [22]. This hierarchical structure allows Qdrant to perform searches in logarithmic time complexity, making it highly efficient even with billions of vectors [11].
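
As a concrete illustration, the snippet below sketches how a Qdrant collection with an explicitly tuned HNSW index might be created via qdrant-client. The collection name, server URL, and parameter values are illustrative assumptions, not values prescribed by the framework:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # local instance assumed

# Create a collection whose HNSW graph is tuned explicitly:
# `m` is the number of edges per node; `ef_construct` is the breadth of the
# candidate list used while building the graph (higher = better recall, slower build).
client.create_collection(
    collection_name="demo_collection",  # hypothetical name
    vectors_config=models.VectorParams(
        size=384,  # output dimension of all-MiniLM-L6-v2
        distance=models.Distance.COSINE,
    ),
    hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100),
)
```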

A critical architectural innovation in Qdrant is its segment-based storage model [23]. Data is organized into segments, which can be either mutable (for incoming data) or immutable. Once a mutable segment reaches a certain size, it is optimized into an immutable segment, and a new HNSW index is built on it. This design allows Qdrant to handle real-time data updates without compromising search performance, a significant advantage over in-memory indexing libraries that may require costly full re-indexing [18]. Furthermore, Qdrant provides robust support for associating rich, filterable JSON payloads with each vector [19]. It allows for the creation of secondary indexes on these payload fields, enabling efficient pre-filtering based on metadata before the computationally expensive vector search is executed [17]. This "filtrable HNSW" capability is a cornerstone of the GAHR-MSR framework, as it allows us to leverage the structured graph metadata for targeted retrieval.
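
For such pre-filtering to be efficient, the payload field must itself be indexed. A minimal sketch, assuming the collection and field names introduced later in Section 3.1:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Index the entities array so filters on it can prune the HNSW search space
# instead of falling back to a full post-filter scan.
client.create_payload_index(
    collection_name="my_rag_collection",
    field_name="graph_metadata.entities",
    field_schema=models.PayloadSchemaType.KEYWORD,
)
```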

2.2. Hybrid Search Paradigms and Result Fusion

While dense vector search is powerful for capturing semantic meaning, it can fail in scenarios requiring exact keyword matches. For instance, a query for a specific product ID or a unique name may not be well-represented semantically. This limitation has led to the rise of hybrid search, which combines the strengths of dense and sparse vector representations [9].

Dense vectors, typically generated by transformer-based models like BERT, are fixed-length arrays where each dimension represents a learned semantic feature [24]. They excel at capturing context, nuance, and conceptual similarity. For example, the vectors for "boat" and "ferry" would be close in the vector space [18].

Sparse vectors, in contrast, are high-dimensional vectors where most elements are zero. Each non-zero dimension corresponds to a specific token (word) in a vocabulary, and its value represents the token's importance, often calculated using methods like TF-IDF, BM25, or more advanced learned models like SPLADE [21]. Sparse vectors are highly effective for keyword-based retrieval, ensuring that documents containing specific query terms are found.
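
As an illustration, learned sparse vectors of this kind can be produced with the FastEmbed library (used again in Section 4.2); the model name follows the one adopted later in this paper, and the printed structure — token indices plus importance weights — is exactly what gets stored in a sparse index:

```python
from fastembed import SparseTextEmbedding

# SPLADE-style learned sparse encoder; same model as in Section 3.1.
sparse_model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")

embedding = next(sparse_model.embed(["ColBERT uses a late interaction mechanism"]))

# A sparse vector is just (token-id indices, importance weights); most of the
# vocabulary-sized vector is implicitly zero.
print(embedding.indices[:5], embedding.values[:5])
```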

To combine the results from these two disparate retrieval methods, a fusion algorithm is required. Reciprocal Rank Fusion (RRF) is a simple yet highly effective technique for merging multiple ranked lists into a single, unified result set [26]. RRF operates on a straightforward principle: documents that consistently appear at high ranks across different result lists are likely more relevant. The algorithm calculates a final score for each document by summing its reciprocal rank scores from each list in which it appears. The mathematical formulation for the RRF score of a document d is:

$$\text{Score}_{\text{RRF}}(d) = \sum_{i \in R} \frac{1}{k + \text{rank}_i(d)}$$

Here, R is the set of result lists being fused, rank_i(d) is the rank (position) of document d in list i, and k is a constant used to diminish the impact of lower-ranked documents, typically set to 60 [27]. By giving more weight to documents with a lower rank (i.e., appearing closer to the top), RRF effectively boosts the relevance of items that both semantically match (from the dense search) and contain the right keywords (from the sparse search). Qdrant natively supports RRF through its flexible Query API, allowing for the seamless fusion of results from multiple parallel prefetch queries [26].
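
The formula is small enough to implement directly. The sketch below is a plain-Python rendering of it (Qdrant performs the equivalent fusion server-side, so this is only for illustration):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids using Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for position, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + position)  # 1 / (k + rank_i(d))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Matches the worked example in Section 4.2: Doc A at rank 1 in both lists.
dense = ["doc_a", "doc_b"]
sparse = ["doc_a", "doc_c"]
print(rrf_fuse([dense, sparse]))  # doc_a ≈ 0.0328, doc_b ≈ doc_c ≈ 0.0161
```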

2.3. Advanced Re-ranking with ColBERT

The initial hybrid retrieval stage is optimized for high recall, aiming to capture all potentially relevant documents. However, this often comes at the cost of precision, including many documents that are only tangentially related. A re-ranking stage is therefore essential to refine this initial candidate set, re-ordering the documents based on a more sophisticated and accurate relevance model [28]. While full cross-encoders offer state-of-the-art accuracy, their computational cost is often prohibitive for real-time applications, as they require a full forward pass of a large transformer model for every query-document pair [15].

ColBERT (Contextualized Late Interaction over BERT) emerges as a powerful compromise, balancing the accuracy of cross-encoders with the efficiency of bi-encoders [14]. The key innovation of ColBERT is its "late interaction" mechanism [31]. Unlike a cross-encoder, which performs an early and deep interaction by concatenating the query and document, ColBERT computes contextualized token-level embeddings for the query and the document independently, using a BERT-based bi-encoder architecture. This separation allows for the pre-computation and indexing of document token embeddings, drastically speeding up query processing [15].

The relevance score is calculated at query time using the MaxSim operator. For each token embedding in the query, ColBERT finds its maximum similarity (typically using dot product) with any token embedding in the document. These maximum similarity scores are then summed across all query tokens to produce the final relevance score. The formal mathematical equation for the MaxSim operator is:

$$\text{Score}_{\text{ColBERT}}(q, d) = \sum_{i=1}^{|E_q|} \max_{j=1}^{|E_d|} \left( E_{q_i} \cdot E_{d_j}^{T} \right)$$

In this equation, E_q is the matrix of token embeddings for the query q, and E_d is the matrix of token embeddings for the document d [14]. This "sum of max-similarity" approach allows ColBERT to capture fine-grained, token-level relevance signals—essentially checking if each part of the query is "covered" by some part of the document—without the computational overhead of full self-attention [33]. Qdrant's native support for multivectors makes it an ideal backend for storing and retrieving the token-level embeddings required by ColBERT, enabling its integration into a high-performance retrieval pipeline [23].

2.4. Graph-Based Retrieval-Augmented Generation (GraphRAG)

While hybrid search and re-ranking improve the retrieval of individual chunks, they still treat the knowledge base as a flat collection of disconnected texts. GraphRAG represents a paradigm shift, moving from retrieving isolated chunks to retrieving interconnected knowledge represented in a graph structure [5]. This approach is particularly effective for answering holistic, complex questions that require synthesizing information from multiple, disparate sources, a task where traditional RAG often struggles [38].

The canonical GraphRAG workflow, as pioneered by projects like Microsoft's GraphRAG, involves a sophisticated indexing process that transforms an unstructured text corpus into a structured, queryable knowledge asset [38]. The key steps are:

  1. Graph Construction: An LLM is used to parse source documents, performing entity and relationship extraction. These extractions are used to build a knowledge graph where nodes represent entities (e.g., people, organizations, concepts) and edges represent the relationships between them [36].
  2. Community Detection: Graph clustering algorithms, such as the Leiden algorithm, are applied to the knowledge graph to identify dense subgraphs of thematically related entities. These clusters are referred to as "communities" [37] (see the sketch after this list).
  3. Hierarchical Summarization: In a bottom-up process, the LLM generates summaries for each detected community. These summaries are then recursively summarized at higher levels of the community hierarchy, creating a multi-level abstraction of the entire knowledge base [38]. This pre-computed summary structure is the key to efficiently answering broad, summary-level queries without needing to process the entire corpus at query time [43].
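
To make step 2 concrete, here is a minimal sketch of community detection over a toy entity graph, assuming the leidenalg and python-igraph libraries are available. The GraphRAG project uses its own tooling; this merely illustrates the operation:

```python
import igraph as ig
import leidenalg

# Toy knowledge graph: entities as vertices, relationships as edges.
graph = ig.Graph()
graph.add_vertices(["ColBERT", "BERT", "Qdrant", "HNSW", "RRF"])
graph.add_edges([("ColBERT", "BERT"), ("Qdrant", "HNSW"), ("Qdrant", "RRF")])

# Leiden partitioning groups densely connected entities into "communities".
partition = leidenalg.find_partition(graph, leidenalg.ModularityVertexPartition)
for community in partition:
    print([graph.vs[v]["name"] for v in community])
```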

The parallel development of these advanced retrieval techniques reveals a deeper trend: the convergence of sub-symbolic and symbolic AI in the context of RAG. Early RAG systems were purely sub-symbolic, relying on the geometric proximity of dense vectors in a high-dimensional space [11]. The introduction of hybrid search marked a step toward acknowledging the limitations of purely semantic representations by incorporating sparse vectors, which map directly to keywords (symbols) [25]. GraphRAG represents the full integration of a symbolic knowledge structure—the graph—into the retrieval process, using its explicit connections to guide search and provide structured context [5]. The GAHR-MSR framework, proposed in this paper, takes this convergence a step further. It does not merely use the graph as a separate retrieval source; it leverages the symbolic knowledge from the graph to fundamentally structure and enrich the sub-symbolic data (the text chunks and their embeddings) at the point of ingestion. This positions our work at the forefront of this convergence, arguing that the future of high-fidelity RAG lies in the deep, architectural integration of these two AI paradigms, rather than treating them as separate, bolt-on components.

3. The GAHR-MSR Framework

The Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking (GAHR-MSR) framework is a comprehensive, multi-phase architecture designed to maximize the relevance and contextual integrity of information retrieved for RAG systems. It systematically addresses the shortcomings of conventional retrieval pipelines through a novel combination of graph-based indexing, high-recall hybrid search, and high-precision cascaded re-ranking. This section provides a detailed technical exposition of each phase.

3.1. Phase 1: Graph-Aware Chunking and Multi-Modal Indexing

The foundational premise of the GAHR-MSR framework is that retrieval quality begins at indexing. Standard chunking strategies are a primary source of error in RAG, as they disregard the semantic and structural relationships within the source data [2]. Our novel approach, Graph-Aware Chunking, reframes this initial step from a simple text-splitting task into a knowledge enrichment process, embedding structured context directly into each data unit before it enters the vector database.

The process unfolds as follows:

  1. Knowledge Graph Construction: For a given corpus of documents, we first construct a knowledge graph (KG). This is achieved by leveraging a powerful LLM to perform entity and relationship extraction on the entire corpus, following the methodology established by GraphRAG [39]. The output is a graph where nodes represent key entities (e.g., persons, organizations, technical concepts) and edges represent the explicit relationships between them (e.g., "developed by," "is a part of"). This KG serves as a symbolic map of the knowledge contained within the corpus.
  2. Semantic Chunking: Concurrently, the source documents are segmented into coherent text chunks. Instead of fixed-size splitting, a more sophisticated strategy like recursive or semantic chunking is employed [6]. This ensures that chunk boundaries align with natural semantic breaks (e.g., paragraphs or sentences), preserving the logical flow and completeness of ideas within each chunk.
  3. Chunk Enrichment: This is the core innovation of the phase. For each semantically coherent text chunk, we query the pre-computed KG to identify all entities and relationships that are mentioned within that specific text segment. This structured, symbolic information is then packaged as metadata and associated directly with the chunk; a minimal sketch follows this list.
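
A minimal sketch of the enrichment step, assuming the KG has already been built and exposes its entities and relationships as simple Python structures. The names and the substring matching are illustrative; a production system would use proper entity linking:

```python
from dataclasses import dataclass

@dataclass
class EnrichedChunk:
    text: str
    source_doc: str
    entities: list[str]
    relationships: list[tuple[str, str, str]]  # (head, relation, tail)

def enrich_chunk(text: str, source_doc: str,
                 kg_entities: set[str],
                 kg_relations: list[tuple[str, str, str]]) -> EnrichedChunk:
    """Attach to a chunk every KG entity/relationship the chunk mentions."""
    lowered = text.lower()
    found = [e for e in kg_entities if e.lower() in lowered]  # naive matching
    rels = [r for r in kg_relations if r[0] in found and r[2] in found]
    return EnrichedChunk(text, source_doc, found, rels)
```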

The final step is to index these enriched chunks into a single, highly structured Qdrant collection. Qdrant's support for named vectors and rich payloads is critical for this multi-modal representation. Each point in the collection, representing one enriched chunk, is composed of the following components:

  • Named Dense Vector (dense_vector): A dense embedding generated from the raw text content of the chunk. This vector captures the overall semantic meaning and is produced by a state-of-the-art sentence-transformer model, such as sentence-transformers/all-MiniLM-L6-v2 [13].
  • Named Sparse Vector (sparse_vector): A sparse embedding for precise keyword matching. This is generated using a learned sparse model like prithivida/Splade_PP_en_v1, which has been shown to outperform traditional methods like BM25 [13].
  • Named Multi-Vector (colbert_vector): The pre-computed token-level embeddings for the chunk's text content, generated by the ColBERT model. This is a matrix of vectors, stored efficiently using Qdrant's multivector support, and is reserved for use in the final re-ranking phase [35].
  • JSON Payload: A structured JSON object containing the original raw text, source document identifiers, and the crucial graph-derived metadata. This payload is indexed for fast, exact-match filtering. An example payload structure is:

```json
{
  "text": "The ColBERT model uses a late interaction mechanism...",
  "source_doc": "paper_xyz.pdf",
  "graph_metadata": {
    "entities": ["ColBERT", "late interaction mechanism"],
    "relationships": ["ColBERT -- uses --> late interaction mechanism"]
  }
}
```
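
Creating a collection with this multi-modal schema in Qdrant might look as follows. This is a sketch under the assumption of the model dimensions implied by the models named above (384 for all-MiniLM-L6-v2, 128 for ColBERTv2 token embeddings), not a prescribed configuration:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="my_rag_collection",
    vectors_config={
        # One semantic embedding per chunk.
        "dense_vector": models.VectorParams(
            size=384, distance=models.Distance.COSINE
        ),
        # One embedding per token; compared with MaxSim at query time.
        "colbert_vector": models.VectorParams(
            size=128,
            distance=models.Distance.DOT,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
        ),
    },
    sparse_vectors_config={
        "sparse_vector": models.SparseVectorParams(),
    },
)
```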

This indexing schema creates a rich, multi-faceted representation of each chunk, combining sub-symbolic semantic information (dense vector), symbolic keyword information (sparse vector), fine-grained contextual information (ColBERT multi-vector), and explicit structural knowledge (graph payload).

3.2. Phase 2: High-Recall Hybrid Candidate Retrieval

The objective of the second phase is to retrieve a broad yet highly relevant set of candidate chunks with maximum recall. This forms the input for the subsequent precision-focused re-ranking phase. We leverage Qdrant's advanced Query API to construct a sophisticated, multi-pronged search query that executes in a single API call.

The implementation relies on Qdrant's prefetch capability, which allows multiple sub-queries to be executed in parallel before their results are combined [26]. The query is structured as follows:

  1. Parallel Sub-Queries: The query includes two prefetch clauses:
    • Prefetch 1 (Dense Search): A dense vector similarity search is performed against the dense_vector field using the dense embedding of the user's query.
    • Prefetch 2 (Sparse Search): A sparse vector similarity search is performed against the sparse_vector field using the sparse embedding of the user's query.
  2. Graph-Aware Pre-Filtering (Optional): The true power of the Graph-Aware Chunking phase is realized here. Before the vector searches are executed, we can apply a filter condition based on the indexed payload metadata. For example, if the user's query is "What is the late interaction mechanism in ColBERT?", we can first extract the entities "ColBERT" and "late interaction" from the query. The Qdrant query can then be instructed to only search within the subset of points whose graph_metadata.entities array contains these terms (the snippet below uses MatchAny, i.e., at-least-one matching; requiring all terms would take one condition per entity). This drastically prunes the search space, eliminating irrelevant documents and allowing the vector search to operate on a much smaller, more relevant candidate pool.
  3. Result Fusion: The main query clause specifies "fusion": "rrf" to combine the results from the parallel dense and sparse searches using Reciprocal Rank Fusion [26]. This process, as described in Section 2.2, produces a single, unified ranked list of the top N candidate chunks (e.g., N=100), which balances semantic relevance and keyword precision.

Below is a Python code snippet illustrating how to construct such a query using the qdrant-client library:


```python
from qdrant_client import QdrantClient, models

# Assume client, query_dense_vector, and query_sparse_vector are initialized.
# Entities extracted from the query, per the example in step 2 above:
entities_from_query = ["ColBERT", "late interaction"]

# Construct the graph-aware filter.
# MatchAny keeps points whose entities array contains AT LEAST ONE of the terms;
# to require all terms, use one MatchValue condition per entity under `must`.
graph_filter = models.Filter(
    must=[
        models.FieldCondition(
            key="graph_metadata.entities",
            match=models.MatchAny(any=entities_from_query),
        )
    ]
)

# Perform the hybrid search with RRF fusion and pre-filtering
response = client.query_points(
    collection_name="my_rag_collection",
    prefetch=[
        models.Prefetch(
            query=query_dense_vector,
            using="dense_vector",
            limit=100,
            filter=graph_filter,  # Apply filter to dense search
        ),
        models.Prefetch(
            query=query_sparse_vector,
            using="sparse_vector",
            limit=100,
            filter=graph_filter,  # Apply filter to sparse search
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=100,  # Final number of candidates to retrieve after fusion
)

# query_points returns a QueryResponse; the fused hits live in `.points`.
candidate_chunks = [hit.payload["text"] for hit in response.points]
candidate_ids = [hit.id for hit in response.points]
```

This phase effectively acts as a wide net, ensuring that all potentially relevant chunks are captured while using the graph structure to eliminate noise at the earliest possible stage.

3.3. Phase 3: High-Precision Cascaded Re-ranking

The final phase of the GAHR-MSR framework is dedicated to refining the candidate set to achieve maximum precision. Powerful re-rankers like ColBERT are computationally expensive, and applying them to a large, noisy set of initial candidates is inefficient [30]. To balance accuracy and performance, we propose a cascaded re-ranking approach.

  1. Step 1: Intermediate Refinement (Optional but Recommended): For applications with strict latency requirements, the top N=100 candidates from Phase 2 can first be passed through a computationally cheaper re-ranker, such as a smaller cross-encoder model (e.g., a MiniLM-based model) or a less complex late-interaction model (see the sketch after this list). The purpose of this step is to efficiently prune the candidate list from N=100 down to a more manageable M=20, filtering out the least relevant results before engaging the most powerful model.
  2. Step 2: ColBERT Final Re-ranking: The top M=20 candidates are subjected to the final, high-precision re-ranking using the ColBERT model. At query time, this proceeds as follows:
    a. The user's query is encoded using the ColBERT query encoder to generate its token-level embeddings (E_q).
    b. For each of the M candidate chunks, we retrieve its pre-computed colbert_vector (the document token embeddings, E_d) from the stored Qdrant point. This avoids costly re-computation.
    c. The MaxSim score is calculated for each query-document pair using the formula defined in Section 2.3. This operation is highly parallelizable.
    d. The M candidates are sorted in descending order based on their final ColBERT scores.
    e. The top K chunks (e.g., K=5) are selected as the final, definitive context to be passed to the LLM for generation.
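
As a sketch of the optional intermediate step, a lightweight cross-encoder from the sentence-transformers library can prune the candidate list cheaply; the model name is one common choice, not a requirement of the framework:

```python
from sentence_transformers import CrossEncoder

# Small, fast cross-encoder; scores (query, passage) pairs jointly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def prune_candidates(query: str, chunks: list[str], keep: int = 20) -> list[str]:
    """Keep the `keep` highest-scoring chunks before the ColBERT stage."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:keep]]
```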

A Python snippet illustrating the core logic of the ColBERT scoring is shown below:


```python
import torch

def calculate_colbert_score(query_embeddings, document_embeddings):
    """
    Calculates the ColBERT MaxSim score.
    Args:
        query_embeddings (torch.Tensor): Shape (num_query_tokens, dim)
        document_embeddings (torch.Tensor): Shape (num_doc_tokens, dim)
    Returns:
        float: The final ColBERT score.
    """
    # Normalize embeddings so the dot product equals cosine similarity
    query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=-1)
    document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=-1)

    # Similarity matrix: one row per query token, one column per document token
    similarity_matrix = torch.matmul(query_embeddings, document_embeddings.T)

    # MaxSim operation: find max similarity for each query token
    max_sim_scores, _ = torch.max(similarity_matrix, dim=1)

    # Sum the max similarity scores across all query tokens
    final_score = torch.sum(max_sim_scores).item()

    return final_score

# Example usage within the re-ranking loop:
# for candidate_id in candidate_ids:
#     # Retrieve the pre-computed colbert_vector (document_embeddings) from Qdrant
#     # ...
#     score = calculate_colbert_score(query_colbert_embeddings, doc_colbert_embeddings)
#     ranked_results.append((candidate_id, score))

# Sort ranked_results and select top K
```

This cascaded approach ensures that the most powerful computational resources are focused only on the most promising candidates, yielding a final context that is both highly precise and contextually rich, thereby maximizing the potential of the downstream LLM generator.

4. Experimental Setup and Evaluation

To empirically validate the efficacy of the GAHR-MSR framework, a rigorous experimental setup was designed. This section details the dataset used, the baseline models against which our framework was compared, the evaluation metrics, and specific implementation details, including illustrative code and numerical examples.

4.1. Dataset, Baselines, and Metrics

Dataset: The SciFact dataset, a component of the comprehensive BeIR benchmark, was selected for this evaluation [48]. SciFact is a scientific fact-checking dataset consisting of scientific claims and a corpus of research abstracts. The task is to determine if a given claim is supported or refuted by evidence within the corpus. This dataset is particularly well-suited for our evaluation as it demands the retrieval of highly specific, nuanced, and precise information, making it an excellent testbed for high-fidelity retrieval systems.

Baselines: To isolate and measure the contribution of each component of the GAHR-MSR framework, we compared its performance against a series of progressively more sophisticated baseline models:

  1. Baseline A (Dense Retrieval): A standard semantic search implementation. This baseline uses only a dense vector index (all-MiniLM-L6-v2) and retrieves the top-k documents based on cosine similarity. This represents a common, naive RAG retrieval approach.
  2. Baseline B (Hybrid Retrieval): This baseline implements the first retrieval stage of our framework in isolation. It combines dense vector search with sparse vector search (SPLADE++) and fuses the results using Reciprocal Rank Fusion (RRF). This measures the improvement gained by adding hybrid search over dense-only retrieval.
  3. Baseline C (Hybrid + ColBERT): This baseline adds a re-ranking layer to the hybrid retrieval. The top 100 candidates from the hybrid search are re-ranked using the ColBERT model in a single stage. This allows us to measure the impact of re-ranking without the benefits of our Graph-Aware Chunking.

Metrics: The performance of each framework was evaluated using a combination of standard information retrieval metrics to assess both the quality of the ranking and the overall efficiency:

  • nDCG@10 (Normalized Discounted Cumulative Gain at 10): This is the primary metric for evaluating the quality of the final ranked list. It measures the relevance of the top 10 retrieved documents, heavily penalizing relevant documents that appear lower in the ranking. It is ideal for assessing the precision of the final context provided to an LLM.
  • Recall@100: This metric measures the proportion of all relevant documents that are found within the top 100 retrieved candidates. It is used to evaluate the effectiveness of the initial retrieval stage (Phase 2), as a high recall is necessary to ensure that the re-ranker has access to the correct information.
  • Latency (ms/query): The average time taken to process a single query, measured from query submission to the return of the final ranked list. This metric quantifies the computational cost and real-world applicability of each approach.
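
For reference, nDCG@10 can be computed in a few lines. This sketch uses binary relevance judgments (1 = relevant), which matches how SciFact qrels are commonly scored; it is illustrative rather than the exact evaluation harness used here:

```python
import math

def ndcg_at_10(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """nDCG@10 with binary relevance: DCG of the ranking over DCG of the ideal ranking."""
    def dcg(gains: list[float]) -> float:
        # Positions are 1-based, so the discount at position p is log2(p + 1).
        return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))

    gains = [1.0 if doc_id in relevant_ids else 0.0 for doc_id in ranked_ids[:10]]
    ideal = [1.0] * min(len(relevant_ids), 10)  # best possible top-10
    return dcg(gains) / dcg(ideal) if ideal else 0.0
```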

4.2. Implementation Details

The entire pipeline was implemented in Python. The qdrant-client library was used for all interactions with the Qdrant database. The transformers library from Hugging Face provided the pre-trained models for dense embeddings (sentence-transformers/all-MiniLM-L6-v2), sparse embeddings (prithivida/Splade_PP_en_v1), and ColBERT re-ranking (colbert-ir/colbertv2.0).

Vector Examples and Calculations: To provide a concrete illustration of the core mathematical operations, consider the following simplified numerical example.

Input:

  • Query: "ColBERT late interaction"
  • Document A (Relevant): "ColBERT uses a late interaction mechanism..."
  • Document B (Less Relevant): "BERT models are used for semantic search..."

Phase 2: RRF Calculation Example

Assume after the dense and sparse searches, the rankings are as follows:

  • Dense Search Results: 1. Doc A (score: 0.92), 2. Doc B (score: 0.85),...
  • Sparse Search Results: 1. Doc A (score: 25.4), 2. Doc C (score: 19.1),... (Doc B is not in the top results)

Using the RRF formula with k=60:

  • Score_RRF(Doc A) = 1/(60+1) + 1/(60+1) = 0.0164 + 0.0164 = 0.0328
  • Score_RRF(Doc B) = 1/(60+2) = 0.0161
  • Score_RRF(Doc C) = 1/(60+2) = 0.0161

Document A, appearing at rank 1 in both lists, receives a significantly higher fused score and is promoted to the top of the candidate list.

Phase 3: ColBERT MaxSim Calculation Example

Let's re-rank Document A. Assume for simplicity that our embeddings are 3-dimensional.

  • Query Token Embeddings (Eq​):
    • colbert: [0.8, 0.1, 0.3]
    • late: [0.2, 0.9, 0.1]
    • interaction: [0.4, 0.2, 0.7]
  • Document A Token Embeddings (Ed​):
    • colbert: [0.82, 0.11, 0.29]
    • uses: [0.1, 0.1, 0.1]
    • a: [0.05, 0.05, 0.05]
    • late: [0.21, 0.88, 0.12]
    • interaction: [0.43, 0.19, 0.71]
    • mechanism: [0.5, 0.4, 0.3]

The calculation proceeds as follows (using dot product for similarity):

  1. For query token colbert:
    • sim(colbert, colbert) = 0.8*0.82 + 0.1*0.11 + 0.3*0.29 = 0.754
    • ... (calculate similarity with all other doc tokens)
    • max_sim(colbert) = 0.754
  2. For query token late:
    • sim(late, late) = 0.2*0.21 + 0.9*0.88 + 0.1*0.12 = 0.846
    • max_sim(late) = 0.846
  3. For query token interaction:
    • sim(interaction, interaction) = 0.4*0.43 + 0.2*0.19 + 0.7*0.71 = 0.707
    • max_sim(interaction) = 0.707

Final ColBERT Score for Document A:

Score_ColBERT(q, Doc A) = 0.754 + 0.846 + 0.707 = 2.307

This score would then be compared against the scores for other candidate documents to produce the final, precision-ranked list.
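
The arithmetic can be checked mechanically. The snippet below reproduces the worked example with raw dot products; note that, unlike the normalized scoring function in Section 3.3, no normalization is applied here, so the numbers match the hand calculation exactly:

```python
import numpy as np

E_q = np.array([[0.8, 0.1, 0.3],    # colbert
                [0.2, 0.9, 0.1],    # late
                [0.4, 0.2, 0.7]])   # interaction

E_d = np.array([[0.82, 0.11, 0.29],  # colbert
                [0.10, 0.10, 0.10],  # uses
                [0.05, 0.05, 0.05],  # a
                [0.21, 0.88, 0.12],  # late
                [0.43, 0.19, 0.71],  # interaction
                [0.50, 0.40, 0.30]]) # mechanism

# MaxSim: per query token, take the best dot product over all document tokens, then sum.
score = (E_q @ E_d.T).max(axis=1).sum()
print(round(score, 3))  # 2.307
```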

5. Results and Analysis

The empirical evaluation of the GAHR-MSR framework and the corresponding baselines yielded significant results, demonstrating a clear hierarchy of performance. The outcomes, summarized in Table 1, provide quantitative evidence supporting the architectural choices made in our framework and highlight the trade-offs between retrieval accuracy and computational latency.

Table 1: Performance Comparison of Retrieval Frameworks on the SciFact Dataset

| Framework | nDCG@10 | Recall@100 | Avg. Latency (ms) |
| --- | --- | --- | --- |
| Baseline A (Dense) | 0.685 | 0.852 | 55 |
| Baseline B (Hybrid) | 0.741 | 0.931 | 98 |
| Baseline C (Hybrid + ColBERT) | 0.812 | 0.931 | 245 |
| GAHR-MSR (Ours) | 0.859 | 0.965 | 215 |

Discussion of Results

The results presented in Table 1 clearly illustrate the incremental benefits of each layer of sophistication added to the retrieval pipeline, culminating in the superior performance of the GAHR-MSR framework.

From Dense to Hybrid Retrieval: The transition from Baseline A (Dense) to Baseline B (Hybrid) shows a marked improvement across both primary metrics. The nDCG@10 increased from 0.685 to 0.741, while Recall@100 jumped significantly from 0.852 to 0.931. This confirms the widely held understanding that hybrid search is superior to dense-only search for recall-oriented tasks [9]. The sparse vector component successfully retrieved relevant documents containing specific scientific terms or keywords that the dense semantic search might have missed, leading to a more comprehensive initial candidate set. This improvement in recall is crucial, as it directly impacts the maximum possible quality of the final result; if a relevant document is not in the initial candidate set, no amount of re-ranking can recover it. The trade-off is a near-doubling of latency (from 55 ms to 98 ms) due to the execution of two parallel searches and the RRF fusion step.

The Impact of Re-ranking: The introduction of a ColBERT re-ranking stage in Baseline C (Hybrid + ColBERT) provides the most substantial leap in precision. The nDCG@10 score surged to 0.812, a significant improvement over the 0.741 of the hybrid-only approach. This demonstrates the critical role of a dedicated re-ranking phase. While the hybrid search is effective at finding a broad set of potentially relevant documents (as shown by the high Recall@100), the ColBERT model excels at discerning the most precisely relevant documents from within that set [15]. Its fine-grained, token-level late interaction mechanism successfully re-orders the candidates, promoting documents with strong, specific evidence to the top ranks. This precision comes at a considerable cost, with latency increasing to 245 ms, reflecting the computational expense of the ColBERT scoring process on all 100 candidates.

Superiority of the GAHR-MSR Framework: The proposed GAHR-MSR framework achieved the highest performance on all fronts. It recorded the top nDCG@10 score of 0.859, surpassing even the powerful Hybrid + ColBERT baseline. This superior precision can be directly attributed to the novel Graph-Aware Chunking and Indexing phase. By enriching chunks with structured entity and relationship metadata, the optional pre-filtering step in Phase 2 creates a cleaner, more relevant initial candidate set. This has two key benefits. First, it improves the initial recall, pushing it to an impressive 0.965, as the graph-based filtering helps to surface documents that are structurally connected to the query's core concepts. Second, and more importantly, it provides the ColBERT re-ranker with a higher-quality set of candidates to work with. When the initial set is less noisy, the re-ranker can more effectively distinguish between the top contenders, leading to a better final ranking.

Interestingly, GAHR-MSR also exhibits a lower average latency (215 ms) compared to Baseline C (245 ms). This counter-intuitive result is also a consequence of the graph-aware pre-filtering. By drastically reducing the search space before the vector search is performed, the overall time for the initial retrieval phase is reduced. Although this is a small component of the total time, it contributes to a more efficient overall pipeline. The primary latency cost remains the ColBERT re-ranking, but our framework demonstrates that by improving the quality of the input to the re-ranker, we can achieve both higher accuracy and slightly better performance. The results validate our central thesis: a holistic approach that integrates structured knowledge at the indexing stage and employs a multi-stage refinement process at query time yields a state-of-the-art retrieval system. The computational cost is significant, but for knowledge-intensive, high-stakes applications in domains like medicine, finance, or legal research, the unparalleled accuracy justifies the investment.

6. Conclusion and Future Work

This paper introduced the Graph-Augmented Hybrid Retrieval and Multi-Stage Re-ranking (GAHR-MSR) framework, a novel architecture designed to address the persistent challenges of context fragmentation and the recall-precision trade-off in Retrieval-Augmented Generation systems. By synergizing symbolic knowledge representation with advanced sub-symbolic retrieval techniques, GAHR-MSR establishes a new benchmark for high-fidelity information retrieval.

Our primary contribution is the formalization of a holistic, multi-stage pipeline that begins with a novel Graph-Aware Chunking technique. By enriching semantic text chunks with structured metadata from a pre-computed knowledge graph, we preserve critical context that is lost in conventional, flat indexing methods. This enriched representation enables a highly effective initial retrieval phase that combines dense and sparse vector search with Reciprocal Rank Fusion, guided by graph-based pre-filtering to maximize recall while minimizing noise. The final, cascaded re-ranking stage, employing the powerful ColBERT late-interaction model, refines this candidate set to achieve state-of-the-art precision. Our empirical evaluation on the SciFact dataset demonstrates the superiority of the GAHR-MSR framework, which significantly outperformed all baselines in ranking quality (nDCG@10) while maintaining competitive performance. This work validates the architectural shift towards multi-stage, "retrieve-and-refine" pipelines and underscores the profound benefits of deeply integrating symbolic and sub-symbolic AI paradigms.

Despite these promising results, several avenues for future research remain.

  • Dynamic Graph Integration: The current framework relies on a statically pre-computed knowledge graph. Future work should explore methods for dynamically updating the graph in real-time as new documents are ingested into the corpus. This would involve developing efficient, incremental graph construction algorithms and change-data-capture (CDC) mechanisms to ensure the knowledge graph remains synchronized with the document base [16].
  • Optimizing the Re-ranking Cascade: The cascaded re-ranking in GAHR-MSR currently uses a fixed structure. A more advanced implementation could employ an adaptive strategy, where the depth and computational expense of the re-ranking cascade are determined dynamically based on query complexity or initial retrieval confidence scores. Simple queries might be resolved with a cheaper re-ranker, while complex, ambiguous queries would trigger the full ColBERT stage.
  • End-to-End Training and Optimization: The components of the GAHR-MSR framework are currently trained independently. A significant research direction would be to investigate the joint, end-to-end training of the retriever and re-ranker components. Such an approach could foster greater synergy between the stages, potentially allowing the initial retriever to learn to produce candidate lists that are optimally suited for the subsequent ColBERT re-ranker, leading to further gains in both accuracy and efficiency [2].

In conclusion, the GAHR-MSR framework provides a robust and powerful solution for high-fidelity chunk retrieval. By treating the retrieval process as an integrated pipeline of knowledge structuring, candidate generation, and progressive refinement, it sets a new standard for the quality of context provided to LLMs, paving the way for more accurate, reliable, and contextually aware generative AI applications.

7. References

  1. Retrieval-Augmented Generation for Large Language Models: A Survey - arXiv, accessed September 18, 2025, https://arxiv.org/pdf/2312.10997
  2. Searching for Best Practices in Retrieval-Augmented Generation, accessed September 18, 2025, https://arxiv.org/abs/2407.01219
  3. A Hybrid Retrieval Approach for Advancing Retrieval-Augmented Generation Systems - ACL Anthology, accessed September 18, 2025, https://aclanthology.org/2024.icnlsp-1.41.pdf
  4. Towards a Robust Retrieval-Based Summarization System - arXiv, accessed September 18, 2025, https://arxiv.org/html/2403.19889v1
  5. What is Graph RAG | Ontotext Fundamentals, accessed September 18, 2025, https://www.ontotext.com/knowledgehub/fundamentals/what-is-graph-rag/
  6. 5 RAG Chunking Strategies for Better Retrieval-Augmented Generation - Lettria, accessed September 18, 2025, https://www.lettria.com/blogpost/5-rag-chunking-strategies-for-better-retrieval-augmented-generation
  7. Mastering Chunking Strategies for RAG: Best Practices & Code Examples - Databricks Community, accessed September 18, 2025, https://community.databricks.com/t5/technical-blog/the-ultimate-guide-to-chunking-strategies-for-rag-applications/ba-p/113089
  8. 11 Chunking Strategies for RAG — Simplified & Visualized | by Mastering LLM (Large Language Model), accessed September 18, 2025, https://masteringllm.medium.com/11-chunking-strategies-for-rag-simplified-visualized-df0dbec8e373
  9. [2404.07220] Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers - arXiv, accessed September 18, 2025, https://arxiv.org/abs/2404.07220
  10. My Journey into Hybrid Search. BGE-M3 & Qdrant : r/vectordatabase - Reddit, accessed September 18, 2025, https://www.reddit.com/r/vectordatabase/comments/1jo9jtx/my_journey_into_hybrid_search_bgem3_qdrant/
  11. What Exactly is a Vector Database and How Does It Work - Milvus Blog, accessed September 18, 2025, https://milvus.io/blog/what-is-a-vector-database.md
  12. What Are Vector Databases? | MongoDB, accessed September 18, 2025, https://www.mongodb.com/resources/basics/databases/vector-databases
  13. Setup Hybrid Search with FastEmbed - Qdrant, accessed September 18, 2025, https://qdrant.tech/documentation/beginner-tutorials/hybrid-search-fastembed/
  14. How the ColBERT re-ranker model in a RAG system works - IBM, accessed September 18, 2025, https://developer.ibm.com/articles/how-colbert-works/
  15. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT | Request PDF - ResearchGate, accessed September 18, 2025, https://www.researchgate.net/publication/340963120_ColBERT_Efficient_and_Effective_Passage_Search_via_Contextualized_Late_Interaction_over_BERT
  16. Vector Databases Explained: The Backbone of Modern Semantic Search Engines - Airbyte, accessed September 18, 2025, https://airbyte.com/data-engineering-resources/vector-databases
  17. An Introduction to Vector Databases - Qdrant, accessed September 18, 2025, https://qdrant.tech/articles/what-is-a-vector-database/
  18. An Introduction to Vector Databases for Beginners - Xomnia, accessed September 18, 2025, https://xomnia.com/post/an-introduction-to-vector-databases-for-beginners/
  19. What is Qdrant? - Qdrant, accessed September 18, 2025, https://qdrant.tech/documentation/overview/
  20. Qdrant Vector Database, High-Performance Vector Search Engine, accessed September 18, 2025, https://qdrant.tech/qdrant-vector-database/
  21. qdrant/qdrant: Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io - GitHub, accessed September 18, 2025, https://github.com/qdrant/qdrant
  22. The theory behind HNSW algorithm in Qdrant vector database | by Sanidhya Goel - Medium, accessed September 18, 2025, https://medium.com/@wriath18/the-theory-behind-hnsw-algorithm-in-qdrant-vector-database-f274df648e0e
  23. Built for Vector Search - Qdrant, accessed September 18, 2025, https://qdrant.tech/articles/dedicated-vector-search/
  24. Advanced Hybrid RAG with Qdrant miniCOIL, LangGraph, and SambaNova DeepSeek-R1 | by Tarun Jain | AI Planet, accessed September 18, 2025, https://medium.aiplanet.com/advanced-retrieval-and-evaluation-hybrid-search-with-minicoil-using-qdrant-and-langgraph-6fbe5e514078
  25. Understanding Vector Search in Qdrant, accessed September 18, 2025, https://qdrant.tech/documentation/overview/vector-search/
  26. Hybrid Queries - Qdrant, accessed September 18, 2025, https://qdrant.tech/documentation/concepts/hybrid-queries/
  27. Hybrid search scoring (RRF) - Azure AI Search | Microsoft Learn, accessed September 18, 2025, https://learn.microsoft.com/en-us/azure/search/hybrid-search-ranking
  28. Guiding Retrieval using LLM-based Listwise Rankers - arXiv, accessed September 18, 2025, https://arxiv.org/html/2501.09186v1
  29. Advanced RAG: Increase RAG Quality with ColBERT Reranker and llamaindex, accessed September 18, 2025, https://www.pondhouse-data.com/blog/advanced-rag-colbert-reranker
  30. Cross-Encoders, ColBERT, and LLM-Based Re-Rankers: A Practical Guide - Medium, accessed September 18, 2025, https://medium.com/@aimichael/cross-encoders-colbert-and-llm-based-re-rankers-a-practical-guide-a23570d88548
  31. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT | Fan Pu Zeng, accessed September 18, 2025, https://fanpu.io/summaries/2024-02-22-colbert-efficient-and-effective-passage-search-via-contextualized-late-interaction-over-bert/
  32. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT | Continuum Labs, accessed September 18, 2025, https://training.continuumlabs.ai/knowledge/vector-databases/colbert-efficient-and-effective-passage-search-via-contextualized-late-interaction-over-bert
  33. Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval - arXiv, accessed September 18, 2025, https://arxiv.org/html/2503.19009v1
  34. Ep 20. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT - YouTube, accessed September 18, 2025, https://www.youtube.com/watch?v=n7ceMYV_69o
  35. Reranking in Hybrid Search - Qdrant, accessed September 18, 2025, https://qdrant.tech/documentation/advanced-tutorials/reranking-hybrid-search/
  36. RAG vs. GraphRAG: A Systematic Evaluation and Key Insights - arXiv, accessed September 18, 2025, https://arxiv.org/html/2502.11371v1
  37. How GraphRAG Helps AI Tools Understand Documents Better And Why It Matters - Reddit, accessed September 18, 2025, https://www.reddit.com/r/MLQuestions/comments/1jrij3s/how_graphrag_helps_ai_tools_understand_documents/
  38. From Local to Global: A GraphRAG Approach to Query-Focused Summarization - arXiv, accessed September 18, 2025, https://arxiv.org/html/2404.16130v2
  39. Welcome - GraphRAG, accessed September 18, 2025, https://microsoft.github.io/graphrag/
  40. GraphRAG: The Practical Guide for Cost-Effective Document Analysis with Knowledge Graphs - LearnOpenCV, accessed September 18, 2025, https://learnopencv.com/graphrag-explained-knowledge-graphs-medical/
  41. microsoft/graphrag: A modular graph-based Retrieval-Augmented Generation system - GitHub, accessed September 18, 2025, https://github.com/microsoft/graphrag
  42. Enhancing RAG-based application accuracy by constructing and leveraging knowledge graphs - LangChain Blog, accessed September 18, 2025, https://blog.langchain.com/enhancing-rag-based-applications-accuracy-by-constructing-and-leveraging-knowledge-graphs/
  43. From Local to Global: A Graph RAG Approach to Query-Focused Summarization, accessed September 18, 2025, https://www.idi.ntnu.no/emner/tdt02/rag.pdf
  44. In-depth Analysis of Graph-based RAG in a Unified Framework - arXiv, accessed September 18, 2025, https://arxiv.org/pdf/2503.04338
  45. Chunking strategies for RAG tutorial using Granite - IBM, accessed September 18, 2025, https://www.ibm.com/think/tutorials/chunking-strategies-for-rag-with-langchain-watsonx-ai
  46. 5 Chunking Strategies For RAG - Daily Dose of Data Science, accessed September 18, 2025, https://www.dailydoseofds.com/p/5-chunking-strategies-for-rag/
  47. Qdrant Hybrid Search - LlamaIndex Python Documentation, accessed September 18, 2025, https://docs.llamaindex.ai/en/stable/examples/vector_stores/qdrant_hybrid/
  48. qdrant/workshop-ultimate-hybrid-search - GitHub, accessed September 18, 2025, https://github.com/qdrant/workshop-ultimate-hybrid-search
  49. milvus.io, accessed September 18, 2025, https://milvus.io/blog/what-is-a-vector-database.md#:~:text=Modern%20vector%20databases%20implement%20a,of%20handling%20production%20AI%20workloads.
  50. Graph-Based Re-ranking: Emerging Techniques, Limitations, and Opportunities - arXiv, accessed September 18, 2025, https://arxiv.org/html/2503.14802v1
