When Search Understands You: Semantic Search and RAG Chatbots with OpenSearch

Introduction

This project is a Flask-based note management system that goes beyond traditional CRUD functionality by integrating hybrid semantic and lexical search with semantic highlighting. It also introduces a RAG-powered chatbot that enables users to interact with their notes conversationally, making note retrieval more intuitive and context-aware.

You can find the full source code and implementation details on the Mantis Interns GitHub repository:
Mantis Interns GitHub – Notebook

Technologies Used

  • Python & Flask: Used to build a lightweight and modular backend, allowing rapid iteration during the training period.
  • SQLite: Chosen for its simplicity and ease of setup while still being sufficient for managing user data and notes.
  • OpenSearch: Used to implement both lexical and semantic search, enabling hybrid search capabilities and serving as the retrieval layer for the RAG chatbot.
  • Tailwind CSS: Helped in building a clean and responsive UI without spending excessive time on custom styling.
  • LLM Integration: Used to enable conversational access to user notes by generating context-aware responses based on retrieved documents.

Problem Statement

Traditional keyword search often falls short when users don’t remember the exact wording of their notes. The challenge grows when the goal is to interact with those notes conversationally.

Dashboard

We solved this problem with OpenSearch; the image below shows the results for the query “technology of art”.

Search Results

System Overview

The system is designed as a modular Flask-based application where authentication, note management, search, and conversational access are handled as separate but connected components. The architecture focuses on simplicity, clear data flow, and extensibility.

System Diagram

This diagram illustrates the high-level components of the system and how requests flow between the client, backend services, and external search components.

The chatbot relies on a Retrieval-Augmented Generation (RAG) pipeline, where OpenSearch retrieves the most relevant notes before constructing a constrained context for the language model.

OpenSearch Setup

OpenSearch was deployed using Docker in a single-node configuration to simplify local development while enabling advanced search features. The setup supports keyword-based search, vector similarity search, and ML-powered pipelines, with persistence enabled through Docker volumes.

The configuration focuses on:

  • Single-node OpenSearch cluster
  • Enabled k-NN vector search
  • ML Commons support for embeddings and inference
  • REST-based integration with the Flask backend

Detailed setup and configuration steps are available in the project repository.
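For readers who want to reproduce the setup, a minimal single-node configuration looks roughly like the sketch below. The image tag, admin password, and volume name are illustrative placeholders, not the project’s exact configuration:

```yaml
# Minimal single-node OpenSearch for local development (illustrative only).
services:
  opensearch:
    image: opensearchproject/opensearch:2.13.0
    environment:
      - discovery.type=single-node                       # no cluster formation
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=ChangeMe_123!  # required by recent versions
    ports:
      - "9200:9200"   # REST API used by the Flask backend
      - "9600:9600"   # performance analyzer
    volumes:
      - opensearch-data:/usr/share/opensearch/data       # persistence across restarts
volumes:
  opensearch-data:
```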

Hybrid Search with OpenSearch

Why Hybrid Search?

Keyword-based search works well for exact matches but fails when users search by meaning rather than specific terms. Semantic search improves recall but may lack precision on its own. Combining both approaches results in more accurate and reliable note retrieval.

Design Overview

When a user submits a query:

  • A lexical search (BM25) is executed on note text fields
  • A semantic search is performed using vector similarity
  • Results from both searches are merged and ranked

This hybrid approach balances precision and semantic relevance.
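The query flow above can be sketched as a single OpenSearch `hybrid` query body that carries both clauses. The field names (`content`, `content_vector`) and the `model_id` are assumptions for illustration, not the project’s actual schema:

```python
def build_hybrid_query(query_text, model_id, k=10):
    """Combine a lexical (BM25) clause with a semantic (neural) clause."""
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical: classic BM25 full-text matching
                    {"match": {"content": {"query": query_text}}},
                    # Semantic: embed the query and run k-NN over the vector field
                    {"neural": {"content_vector": {
                        "query_text": query_text,
                        "model_id": model_id,
                        "k": k,
                    }}},
                ]
            }
        },
    }
```

The body is then posted to the notes index’s `_search` endpoint through a search pipeline that normalizes and combines the two score sets.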

Embeddings and Indexing

A sentence-transformer model is used to generate vector embeddings for note content. To keep the backend simple, an ingest pipeline automatically generates embeddings during indexing, allowing Flask to send only raw note data.

The notes index is designed to support:

  • Text fields for keyword search
  • Vector fields for semantic similarity
  • Metadata filtering by user, category, and tags
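Put together, the ingest pipeline and index mapping might look like the following sketch. The pipeline name, model ID, field names, and embedding dimension (384 would fit a MiniLM-style sentence-transformer) are assumptions, not the project’s exact values:

```python
# Ingest pipeline: ML Commons generates the embedding at index time,
# so Flask only ever sends raw note text.
EMBED_PIPELINE = {
    "description": "Generate note embeddings during indexing",
    "processors": [
        {"text_embedding": {
            "model_id": "<embedding_model_id>",            # placeholder
            "field_map": {"content": "content_vector"},    # source -> vector field
        }}
    ],
}

# Index mapping: text fields for BM25, a knn_vector for similarity,
# and keyword fields for metadata filtering.
NOTES_MAPPING = {
    "settings": {
        "index.knn": True,                                  # enable k-NN search
        "default_pipeline": "note-embedding-pipeline",      # placeholder name
    },
    "mappings": {
        "properties": {
            "title":    {"type": "text"},
            "content":  {"type": "text"},
            "category": {"type": "keyword"},
            "tags":     {"type": "keyword"},
            "user_id":  {"type": "keyword"},
            "content_vector": {"type": "knn_vector", "dimension": 384},
        }
    },
}
```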

Ranking and Highlighting

Search results are ranked using Reciprocal Rank Fusion (RRF) to combine lexical and semantic scores effectively. Semantic highlighting is applied to surface the most relevant text segments, improving result interpretability.
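The idea behind RRF is simple: a document’s fused score is the sum of 1 / (k + rank) over every result list it appears in, so documents ranked well by both lexical and semantic search rise to the top. A minimal sketch of the scoring rule (the real fusion happens inside OpenSearch):

```python
def rrf_merge(result_lists, k=60):
    """result_lists: lists of doc IDs, best first. Returns the fused ranking."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Each appearance contributes 1 / (k + rank); k damps the
            # influence of any single list's top positions.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical  = ["n3", "n1", "n7"]   # BM25 ranking
semantic = ["n1", "n9", "n3"]   # vector-similarity ranking
rrf_merge([lexical, semantic])  # → ['n1', 'n3', 'n9', 'n7']
```

Note how `n1`, ranked second lexically but first semantically, overtakes `n3`: agreement between the two rankings is rewarded.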

RAG Chatbot Design

Motivation

While hybrid search improves note discovery, it still requires users to manually inspect results. To provide a more natural and conversational experience, a Retrieval-Augmented Generation (RAG) chatbot was introduced, allowing users to interact with their notes using natural language questions.

The goal was to ensure that responses are:

  • Grounded in the user’s own notes
  • Context-aware
  • Free from hallucinated or unrelated information

High-Level Design

The chatbot follows a RAG pipeline where OpenSearch acts as the retrieval layer and a large language model (LLM) handles response generation.

At a high level:

  • OpenSearch retrieves the most relevant notes based on the user query
  • Selected note fields are used to build a constrained context
  • The LLM generates a response strictly based on this context

This design ensures that the chatbot answers are rooted in actual user data rather than general knowledge.
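Since OpenSearch handles both steps, the pipeline can be expressed as a search pipeline with a retrieval-augmented-generation response processor. The model ID, field list, and prompt below are illustrative placeholders:

```python
# Illustrative RAG search pipeline definition (registered via the
# _search/pipeline API); exact values are assumptions.
RAG_PIPELINE = {
    "response_processors": [
        {"retrieval_augmented_generation": {
            "model_id": "<llm_model_id>",   # remote LLM registered in ML Commons
            "context_field_list": ["title", "content", "category", "tags"],
            "system_prompt": (
                "You are a personal notes assistant. Answer only from the "
                "provided notes; if the answer is not there, say so."
            ),
        }}
    ]
}
```

A search against the notes index through this pipeline then returns both the retrieved hits and a generated answer grounded in them.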

LLM Integration via OpenSearch

Instead of calling the LLM directly from the Flask backend, the model is integrated through OpenSearch’s ML framework using a remote connector. This allows OpenSearch to orchestrate both retrieval and generation in a single pipeline.

Key benefits of this approach:

  • Reduced backend complexity
  • Centralized control over prompts and context
  • Easier experimentation with different models
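A remote connector in ML Commons is declared as a JSON blueprint that tells OpenSearch how to call the external model API. The sketch below uses an OpenAI-style chat endpoint purely as an example; the project’s actual provider, model, and credentials are not stated in the post:

```python
# Illustrative remote connector blueprint (assumptions throughout).
CONNECTOR = {
    "name": "notes-chatbot-llm",
    "description": "Remote LLM used by the RAG pipeline",
    "version": 1,
    "protocol": "http",
    "parameters": {"model": "gpt-4o-mini"},          # example model name
    "credential": {"openAI_key": "<api-key>"},       # stored encrypted by OpenSearch
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "https://api.openai.com/v1/chat/completions",
            "headers": {"Authorization": "Bearer ${credential.openAI_key}"},
            "request_body": (
                '{"model": "${parameters.model}", '
                '"messages": ${parameters.messages}}'
            ),
        }
    ],
}
```

Swapping models then means registering a new connector rather than touching the Flask code.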

Context Construction

To minimize noise and token usage, only selected fields from retrieved notes are included in the context:

  • Title
  • Content
  • Category
  • Tags

The system prompt guides the model to behave as a personal notes assistant, encouraging accurate, polite, and context-bound responses. If relevant information is missing, the model is instructed to acknowledge this explicitly.
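Context construction from those four fields can be sketched as a small helper; the exact formatting and the character budget are assumptions for illustration:

```python
def build_context(notes, max_chars=4000):
    """Flatten selected note fields into a bounded context string for the LLM."""
    parts = []
    for note in notes:
        parts.append(
            f"Title: {note['title']}\n"
            f"Category: {note.get('category', '-')}\n"
            f"Tags: {', '.join(note.get('tags', []))}\n"
            f"Content: {note['content']}"
        )
    # Separate notes clearly and hard-cap the total size as a rough
    # proxy for the token budget.
    return "\n\n---\n\n".join(parts)[:max_chars]
```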

Session and Message Management

On the application side, chat sessions and messages are persisted to maintain conversational continuity. Each session is isolated per user, ensuring that retrieved context and generated responses remain private and relevant.
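A plausible SQLite schema for this persistence layer is sketched below; the table and column names are assumptions, since the post does not show the project’s actual models:

```python
import sqlite3

# Illustrative schema: sessions are keyed to a user, messages to a session.
SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_sessions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id    INTEGER NOT NULL,               -- isolates sessions per user
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS chat_messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id INTEGER NOT NULL REFERENCES chat_sessions(id),
    role       TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
    content    TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")  # the project uses a file-backed DB
conn.executescript(SCHEMA)
```

Replaying a session’s messages in order then restores the conversational context for the next LLM call.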

Chatbot UI

Conclusion

This project demonstrates how a traditional CRUD-based application can be incrementally enhanced into a smart, conversational system. By integrating hybrid search and a RAG-based chatbot, the notes application evolved beyond simple keyword matching into a more intuitive and meaningful user experience.

Using OpenSearch as both a retrieval and orchestration layer simplified the architecture while enabling advanced capabilities such as semantic search, contextual highlighting, and grounded text generation. The design choices made throughout the project prioritized clarity, modularity, and practical trade-offs suitable for a real-world application.

Overall, this experience reinforced the importance of combining solid system design with modern search and AI techniques, and highlighted how thoughtful integration can significantly improve usability without adding unnecessary complexity.
