<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mantis Stajyer Blogu</title>
    <description>The latest articles on DEV Community by Mantis Stajyer Blogu (@mantis-stajyer).</description>
    <link>https://dev.to/mantis-stajyer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10192%2F1f9970ae-dd88-4412-a937-899af528389d.jpeg</url>
      <title>DEV Community: Mantis Stajyer Blogu</title>
      <link>https://dev.to/mantis-stajyer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mantis-stajyer"/>
    <language>en</language>
    <item>
      <title>When Search Understands You: Semantic Search and RAG Chatbots with OpenSearch</title>
      <dc:creator>ismail kattan</dc:creator>
      <pubDate>Mon, 05 Jan 2026 14:33:04 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/when-search-understands-you-semantic-search-and-rag-chatbots-with-opensearch-1cce</link>
      <guid>https://dev.to/mantis-stajyer/when-search-understands-you-semantic-search-and-rag-chatbots-with-opensearch-1cce</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This project is a Flask-based note management system that goes beyond traditional CRUD functionality by integrating hybrid semantic and lexical search with semantic highlighting. It also introduces a RAG-powered chatbot that enables users to interact with their notes conversationally, making note retrieval more intuitive and context-aware.&lt;/p&gt;

&lt;p&gt;You can find the full source code and implementation details on the Mantis Interns GitHub repository:&lt;br&gt;
&lt;a href="https://github.com/Mantis-Software-Company-Interns/Notebook" rel="noopener noreferrer"&gt;Mantis Interns GitHub – Notebook&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python &amp;amp; Flask:&lt;/strong&gt; Used to build a lightweight and modular backend, allowing rapid iteration during the training period.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite:&lt;/strong&gt; Chosen for its simplicity and ease of setup while still being sufficient for managing user data and notes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenSearch:&lt;/strong&gt; Used to implement both lexical and semantic search, enabling hybrid search capabilities and serving as the retrieval layer for the RAG chatbot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS:&lt;/strong&gt; Helped in building a clean and responsive UI without spending excessive time on custom styling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Integration:&lt;/strong&gt; Used to enable conversational access to user notes by generating context-aware responses based on retrieved documents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Problem Statement
&lt;/h2&gt;

&lt;p&gt;Traditional keyword search is often insufficient when users don’t remember exact words or phrasing of their notes. This becomes more challenging when the goal is to interact with notes conversationally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7hefkv32cxu8g8suf8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7hefkv32cxu8g8suf8v.png" alt="Dashboard" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We solved this problem with OpenSearch; the image below shows the results of the “technology of art” query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgg0d8whcx57gyck6xpcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgg0d8whcx57gyck6xpcb.png" alt="Search Results" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  System Overview
&lt;/h2&gt;

&lt;p&gt;The system is designed as a modular Flask-based application where authentication, note management, search, and conversational access are handled as separate but connected components. The architecture focuses on simplicity, clear data flow, and extensibility.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9yxt675y4itssm4q3cw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9yxt675y4itssm4q3cw.png" alt="System Diagram" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This diagram illustrates the high-level components of the system and how requests flow between the client, backend services, and external search components.&lt;/p&gt;

&lt;p&gt;The chatbot relies on a Retrieval-Augmented Generation (RAG) pipeline, where OpenSearch retrieves the most relevant notes before constructing a constrained context for the language model.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenSearch Setup
&lt;/h2&gt;

&lt;p&gt;OpenSearch was deployed using Docker in a single-node configuration to simplify local development while enabling advanced search features. The setup supports keyword-based search, vector similarity search, and ML-powered pipelines, with persistence enabled through Docker volumes.&lt;/p&gt;

&lt;p&gt;The configuration focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-node OpenSearch cluster&lt;/li&gt;
&lt;li&gt;Enabled k-NN vector search&lt;/li&gt;
&lt;li&gt;ML Commons support for embeddings and inference&lt;/li&gt;
&lt;li&gt;REST-based integration with the Flask backend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Detailed setup and configuration steps are available in the project repository.&lt;/p&gt;
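
&lt;p&gt;As an illustration, a minimal Compose file along these lines can bring up such a single-node instance. The image tag, memory limits, and the ML Commons setting are assumptions, and security/password options vary by OpenSearch version, so treat this as a sketch rather than the project's actual configuration:&lt;/p&gt;

```yaml
# Minimal single-node OpenSearch for local development (a sketch;
# security and admin-password settings vary by OpenSearch version).
services:
  opensearch:
    image: opensearchproject/opensearch:2.14.0
    environment:
      - discovery.type=single-node
      - OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m
      # Let ML Commons run models on this single data node.
      - plugins.ml_commons.only_run_on_ml_node=false
    ports:
      - "9200:9200"
    volumes:
      # Docker volume provides the persistence mentioned above.
      - opensearch-data:/usr/share/opensearch/data
volumes:
  opensearch-data:
```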

&lt;h2&gt;
  
  
  Hybrid Search with OpenSearch
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Hybrid Search?
&lt;/h3&gt;

&lt;p&gt;Keyword-based search works well for exact matches but fails when users search by meaning rather than specific terms. Semantic search improves recall but may lack precision on its own. Combining both approaches results in more accurate and reliable note retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design Overview
&lt;/h3&gt;

&lt;p&gt;When a user submits a query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A lexical search (BM25) is executed on note text fields&lt;/li&gt;
&lt;li&gt;A semantic search is performed using vector similarity&lt;/li&gt;
&lt;li&gt;Results from both searches are merged and ranked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hybrid approach balances precision and semantic relevance.&lt;/p&gt;
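
&lt;p&gt;The flow above can be sketched as a single hybrid query body. The field names (&lt;code&gt;content&lt;/code&gt;, &lt;code&gt;embedding&lt;/code&gt;) are illustrative rather than the project's actual mapping, and in practice OpenSearch pairs such a query with a search pipeline or client-side fusion:&lt;/p&gt;

```python
def build_hybrid_query(query_text, query_embedding, k=10):
    """Build an OpenSearch hybrid query combining BM25 and k-NN clauses.

    Field names ("content", "embedding") are assumptions for
    illustration; adapt them to the real index mapping.
    """
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical (BM25) clause over the note text.
                    {"match": {"content": {"query": query_text}}},
                    # Semantic clause via vector similarity.
                    {"knn": {"embedding": {"vector": query_embedding, "k": k}}},
                ]
            }
        },
    }
```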

&lt;h3&gt;
  
  
  Embeddings and Indexing
&lt;/h3&gt;

&lt;p&gt;A sentence-transformer model is used to generate vector embeddings for note content. To keep the backend simple, an ingest pipeline automatically generates embeddings during indexing, allowing Flask to send only raw note data.&lt;/p&gt;

&lt;p&gt;The notes index is designed to support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text fields for keyword search&lt;/li&gt;
&lt;li&gt;Vector fields for semantic similarity&lt;/li&gt;
&lt;li&gt;Metadata filtering by user, category, and tags&lt;/li&gt;
&lt;/ul&gt;
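
&lt;p&gt;A mapping supporting these three requirements might look like the following sketch. The 384-dimension value assumes a MiniLM-style sentence-transformer and the field names are illustrative; adjust both to the model and schema actually deployed:&lt;/p&gt;

```python
# Sketch of an index body covering text search, k-NN similarity,
# and metadata filtering. Field names and the embedding dimension
# (384, typical of MiniLM-style models) are assumptions.
NOTES_INDEX_BODY = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "user_id": {"type": "keyword"},
            "category": {"type": "keyword"},
            "tags": {"type": "keyword"},
            "embedding": {"type": "knn_vector", "dimension": 384},
        }
    },
}
```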

&lt;h3&gt;
  
  
  Ranking and Highlighting
&lt;/h3&gt;

&lt;p&gt;Search results are ranked using Reciprocal Rank Fusion (RRF) to combine lexical and semantic scores effectively. Semantic highlighting is applied to surface the most relevant text segments, improving result interpretability.&lt;/p&gt;
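
&lt;p&gt;RRF itself is compact enough to show in full; this is a generic implementation of the fusion step, not the project's code:&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    Each ranking is a list of document ids, best first. Every document
    earns 1 / (k + rank) per list it appears in; k=60 is the constant
    commonly used in the RRF literature.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

&lt;p&gt;A document ranked well by both the lexical and the semantic list rises above one that only a single list favors, which is exactly the balancing behavior hybrid search is after.&lt;/p&gt;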

&lt;h2&gt;
  
  
  RAG Chatbot Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Motivation
&lt;/h3&gt;

&lt;p&gt;While hybrid search improves note discovery, it still requires users to manually inspect results. To provide a more natural and conversational experience, a Retrieval-Augmented Generation (RAG) chatbot was introduced, allowing users to interact with their notes using natural language questions.&lt;/p&gt;

&lt;p&gt;The goal was to ensure that responses are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grounded in the user’s own notes&lt;/li&gt;
&lt;li&gt;Context-aware&lt;/li&gt;
&lt;li&gt;Free from hallucinated or unrelated information&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  High-Level Design
&lt;/h3&gt;

&lt;p&gt;The chatbot follows a RAG pipeline where OpenSearch acts as the retrieval layer and a large language model (LLM) handles response generation.&lt;/p&gt;

&lt;p&gt;At a high level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenSearch retrieves the most relevant notes based on the user query&lt;/li&gt;
&lt;li&gt;Selected note fields are used to build a constrained context&lt;/li&gt;
&lt;li&gt;The LLM generates a response strictly based on this context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design ensures that the chatbot answers are rooted in actual user data rather than general knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM Integration via OpenSearch
&lt;/h3&gt;

&lt;p&gt;Instead of calling the LLM directly from the Flask backend, the model is integrated through OpenSearch’s ML framework using a remote connector. This allows OpenSearch to orchestrate both retrieval and generation in a single pipeline.&lt;/p&gt;

&lt;p&gt;Key benefits of this approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced backend complexity&lt;/li&gt;
&lt;li&gt;Centralized control over prompts and context&lt;/li&gt;
&lt;li&gt;Easier experimentation with different models&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Context Construction
&lt;/h3&gt;

&lt;p&gt;To minimize noise and token usage, only selected fields from retrieved notes are included in the context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Title&lt;/li&gt;
&lt;li&gt;Content&lt;/li&gt;
&lt;li&gt;Category&lt;/li&gt;
&lt;li&gt;Tags&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system prompt guides the model to behave as a personal notes assistant, encouraging accurate, polite, and context-bound responses. If relevant information is missing, the model is instructed to acknowledge this explicitly.&lt;/p&gt;
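
&lt;p&gt;A helper along these lines can assemble that constrained context from the four fields; the exact layout is an assumption, not the project's verbatim prompt format:&lt;/p&gt;

```python
def build_context(notes):
    """Assemble a constrained LLM context from retrieved notes.

    Only title, content, category, and tags are included; all other
    fields are dropped to limit noise and token usage. The layout is
    illustrative, not the project's actual prompt template.
    """
    blocks = []
    for note in notes:
        tags = ", ".join(note.get("tags", []))
        blocks.append(
            f"Title: {note['title']}\n"
            f"Category: {note.get('category', 'uncategorized')}\n"
            f"Tags: {tags}\n"
            f"Content: {note['content']}"
        )
    # Blank line between notes keeps document boundaries visible to the model.
    return "\n\n".join(blocks)
```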

&lt;h3&gt;
  
  
  Session and Message Management
&lt;/h3&gt;

&lt;p&gt;On the application side, chat sessions and messages are persisted to maintain conversational continuity. Each session is isolated per user, ensuring that retrieved context and generated responses remain private and relevant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fworzmr4wyd2jrpiowt72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fworzmr4wyd2jrpiowt72.png" alt="Chatbot UI" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project demonstrates how a traditional CRUD-based application can be incrementally enhanced into a smart, conversational system. By integrating hybrid search and a RAG-based chatbot, the notes application evolved beyond simple keyword matching into a more intuitive and meaningful user experience.&lt;/p&gt;

&lt;p&gt;Using OpenSearch as both a retrieval and orchestration layer simplified the architecture while enabling advanced capabilities such as semantic search, contextual highlighting, and grounded text generation. The design choices made throughout the project prioritized clarity, modularity, and practical trade-offs suitable for a real-world application.&lt;/p&gt;

&lt;p&gt;Overall, this experience reinforced the importance of combining solid system design with modern search and AI techniques, and highlighted how thoughtful integration can significantly improve usability without adding unnecessary complexity.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>opensearch</category>
      <category>flask</category>
      <category>ai</category>
    </item>
    <item>
      <title>Fitera: AI-Powered Nutrition and Fitness Tracking Application</title>
      <dc:creator>Meryem Sude Gök</dc:creator>
      <pubDate>Thu, 31 Jul 2025 12:47:46 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/fitera-ai-powered-nutrition-and-fitness-tracking-application-3iic</link>
      <guid>https://dev.to/mantis-stajyer/fitera-ai-powered-nutrition-and-fitness-tracking-application-3iic</guid>
      <description>&lt;h2&gt;
  
  
  Project Overview
&lt;/h2&gt;

&lt;p&gt;Fitera is a comprehensive web application designed to help users track their nutrition habits, exercise routines, and health goals. The application features an AI-powered chatbot for personalized nutrition and fitness advice, detailed nutritional analysis, and comprehensive health monitoring capabilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gr3l6ozyntpj3dm6g88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gr3l6ozyntpj3dm6g88.png" alt=" " width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can find the source code at &lt;a href="https://github.com/Mantis-Software-Company-Interns/Fitera" rel="noopener noreferrer"&gt;the Mantis Interns' GitHub&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technology Stack
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Backend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.10+&lt;/strong&gt; with Flask web framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flask-Smorest&lt;/strong&gt; for API documentation and validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; database with SQLAlchemy ORM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JWT&lt;/strong&gt; authentication system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Claude&lt;/strong&gt; AI model for chatbot functionality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt; methodology for context-aware responses&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Frontend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vue.js 3&lt;/strong&gt; with Composition API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vue Router&lt;/strong&gt; for navigation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vuetify&lt;/strong&gt; for UI components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Axios&lt;/strong&gt; for HTTP requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vite&lt;/strong&gt; as build tool&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  User Management
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Secure registration and login system&lt;/li&gt;
&lt;li&gt;Profile management with height, weight, and goal tracking&lt;/li&gt;
&lt;li&gt;Allergy and diet preference settings&lt;/li&gt;
&lt;li&gt;BMI calculation and health metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Nutrition Tracking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Daily meal logging (breakfast, lunch, dinner, snacks)&lt;/li&gt;
&lt;li&gt;Detailed macro and micronutrient analysis&lt;/li&gt;
&lt;li&gt;Water consumption tracking&lt;/li&gt;
&lt;li&gt;Comprehensive meal history with nutritional insights&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Exercise Tracking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Exercise logging with duration and intensity&lt;/li&gt;
&lt;li&gt;Walking and step tracking&lt;/li&gt;
&lt;li&gt;Exercise recommendations based on user profile&lt;/li&gt;
&lt;li&gt;Performance analysis and progress tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Health Monitoring
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Sleep quality logging&lt;/li&gt;
&lt;li&gt;Weight tracking over time&lt;/li&gt;
&lt;li&gt;Health goal setting and progress monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI-Powered Chatbot
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Nutrition consultation and advice&lt;/li&gt;
&lt;li&gt;Exercise recommendations&lt;/li&gt;
&lt;li&gt;Health guidance and tips&lt;/li&gt;
&lt;li&gt;Personalized responses based on user data&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Technical Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Database Architecture
&lt;/h3&gt;

&lt;p&gt;The application uses PostgreSQL with the following key tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;users&lt;/code&gt; - User profiles and preferences&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;food&lt;/code&gt; - Comprehensive food database&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;macro_nutrients&lt;/code&gt; - Protein, carbs, fat tracking&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;micro_nutrients&lt;/code&gt; - Vitamins and minerals&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;meal_log&lt;/code&gt; - Daily meal records&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;exercise_log&lt;/code&gt; - Exercise tracking&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;water_log&lt;/code&gt; - Hydration monitoring&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sleep_log&lt;/code&gt; - Sleep quality data&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;weight_log&lt;/code&gt; - Weight progression&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI Chatbot Implementation
&lt;/h3&gt;

&lt;p&gt;The chatbot utilizes Anthropic's Claude 3.5 Sonnet model with a custom RAG (Retrieval-Augmented Generation) system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Retrieval&lt;/strong&gt;: The system extracts relevant information from the database using keyword-based search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Engineering&lt;/strong&gt;: User queries are enhanced with retrieved context and conversation history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response Generation&lt;/strong&gt;: Claude generates personalized responses based on the enriched context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation Management&lt;/strong&gt;: Maintains conversation history for contextual continuity&lt;/li&gt;
&lt;/ol&gt;
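
&lt;p&gt;Step 1 can be sketched as a simple keyword-overlap retriever. The scoring here is illustrative only; the real system builds its documents from the database tables described above:&lt;/p&gt;

```python
def keyword_retrieve(query, documents, top_n=3):
    """Keyword-based context retrieval of the kind step 1 describes.

    Scores each document by how many query terms it shares, then keeps
    the best matches. A sketch under simplified assumptions, not the
    application's actual retrieval code.
    """
    terms = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(terms.intersection(doc.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    # Most overlapping documents first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]
```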

&lt;h3&gt;
  
  
  RAG Methodology
&lt;/h3&gt;

&lt;p&gt;The application implements a simplified RAG system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Builds documents from database tables (food, nutrients, exercise data)&lt;/li&gt;
&lt;li&gt;Performs keyword-based similarity search&lt;/li&gt;
&lt;li&gt;Retrieves relevant context for user queries&lt;/li&gt;
&lt;li&gt;Enhances AI responses with domain-specific information&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Development Challenges and Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Database Integration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Managing complex nutritional data with multiple related tables&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Implemented a normalized database schema with proper relationships and efficient querying&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Context Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Providing relevant, personalized responses without overwhelming the AI model&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Developed a targeted document retrieval system that extracts only the most relevant information from the database&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-time Data Processing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Handling concurrent user requests and maintaining data consistency&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Implemented proper database transactions and connection pooling&lt;/p&gt;

&lt;h3&gt;
  
  
  Frontend-Backend Communication
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Ensuring seamless data flow between Vue.js frontend and Flask backend&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Designed RESTful APIs with proper error handling and data validation&lt;/p&gt;




&lt;h2&gt;
  
  
  User Experience Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Modern Interface
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Responsive design that works on desktop and mobile devices&lt;/li&gt;
&lt;li&gt;Light/dark theme support for user preference&lt;/li&gt;
&lt;li&gt;Intuitive navigation with clear visual hierarchy&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Personalization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;User-specific recommendations based on profile data&lt;/li&gt;
&lt;li&gt;Adaptive interface that learns from user behavior&lt;/li&gt;
&lt;li&gt;Customizable dashboard with preferred metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Visualization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Progress charts for weight, exercise, and nutrition goals&lt;/li&gt;
&lt;li&gt;Nutritional breakdown with visual representations&lt;/li&gt;
&lt;li&gt;Historical data analysis with trend identification&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  API Architecture
&lt;/h2&gt;

&lt;p&gt;The application provides comprehensive REST APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication endpoints for user management&lt;/li&gt;
&lt;li&gt;CRUD operations for all health tracking features&lt;/li&gt;
&lt;li&gt;AI chatbot integration with conversation management&lt;/li&gt;
&lt;li&gt;Data export and import capabilities&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Security Implementation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;JWT-based authentication with secure token management&lt;/li&gt;
&lt;li&gt;Password hashing using bcrypt&lt;/li&gt;
&lt;li&gt;Environment variable configuration for sensitive data&lt;/li&gt;
&lt;li&gt;Input validation and sanitization&lt;/li&gt;
&lt;li&gt;CORS configuration for secure cross-origin requests&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Performance Optimization
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Database indexing for fast query execution&lt;/li&gt;
&lt;li&gt;Efficient document retrieval for AI context&lt;/li&gt;
&lt;li&gt;Frontend caching strategies&lt;/li&gt;
&lt;li&gt;Optimized API response times&lt;/li&gt;
&lt;li&gt;Minimal dependency footprint&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Fitera represents a modern approach to health and fitness tracking, combining traditional data management with cutting-edge AI technology. The application successfully bridges the gap between comprehensive health monitoring and personalized guidance, providing users with both the tools to track their progress and the intelligence to make informed decisions about their health.&lt;/p&gt;




&lt;h2&gt;
  
  
  Future Development
&lt;/h2&gt;

&lt;p&gt;While the current version provides a solid foundation for nutrition and fitness tracking, potential future enhancements could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integration with wearable devices for automatic data collection&lt;/li&gt;
&lt;li&gt;Advanced machine learning for predictive health insights&lt;/li&gt;
&lt;li&gt;Social features for community support and motivation&lt;/li&gt;
&lt;li&gt;Mobile application development for enhanced accessibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project serves as an excellent example of how modern web technologies can be combined with AI to create meaningful, user-centric health applications.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Fitera was developed as a comprehensive health tracking solution, showcasing the potential of AI-enhanced personal health management tools.&lt;/em&gt; &lt;/p&gt;

</description>
      <category>python</category>
      <category>vue</category>
      <category>langchain</category>
      <category>rag</category>
    </item>
    <item>
      <title>Tagwise: The Story Behind an AI-Powered Bookmark Categorization Project</title>
      <dc:creator>ebrargunay</dc:creator>
      <pubDate>Sat, 31 May 2025 12:00:35 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/tagwise-the-story-behind-an-ai-powered-bookmark-categorization-project-4jdp</link>
      <guid>https://dev.to/mantis-stajyer/tagwise-the-story-behind-an-ai-powered-bookmark-categorization-project-4jdp</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;INTRODUCTION&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Tagwise is a project that aims to solve a common problem faced by many internet users: organizing bookmarks. Today, users save hundreds of links, but managing and categorizing them becomes a time-consuming and messy process. Tagwise was created to automate this task and make users’ lives easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Starting the Project: The Naming Process&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The first step of the project was to find a suitable name. We wanted a name that clearly reflected the function of the site and communicated its purpose to users. Since the project is based on tagging and categorizing bookmarks, the word “&lt;strong&gt;tag&lt;/strong&gt;” stood out. In addition, because the system offers smart suggestions, we added the word “&lt;strong&gt;wise&lt;/strong&gt;.” Combining these two words, the name “&lt;strong&gt;Tagwise&lt;/strong&gt;” was born — a name that both describes the function and suggests intelligence.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Logo Design: Colors and Symbols&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After deciding on the name, we moved on to designing the visual identity of the project. For the logo, we decided to use shapes and symbols that reflect artificial intelligence. This was important to emphasize the tech-savvy and smart structure of the system.&lt;/p&gt;

&lt;p&gt;When choosing colors, we went with &lt;strong&gt;blue&lt;/strong&gt; and &lt;strong&gt;yellow&lt;/strong&gt;. Blue represents trust and professionalism, while yellow symbolizes energy and creativity. This combination aligned well with our goal of offering a user-friendly and calming interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Technical Foundation and Technologies Used&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The technical foundation of Tagwise is built on modern web technologies and AI systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We used &lt;strong&gt;Django&lt;/strong&gt; and &lt;strong&gt;Django REST Framework&lt;/strong&gt; for backend development.&lt;/li&gt;
&lt;li&gt;The database was built with &lt;strong&gt;PostgreSQL&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For handling HTTP requests and HTML parsing, we used &lt;strong&gt;httpx&lt;/strong&gt; and &lt;strong&gt;BeautifulSoup4&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selenium&lt;/strong&gt; and &lt;strong&gt;webdriver_manager&lt;/strong&gt; were added for web automation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the AI side, we integrated &lt;strong&gt;OpenAI GPT-4o&lt;/strong&gt; and &lt;strong&gt;Google Gemini API&lt;/strong&gt;.&lt;br&gt;
For YouTube links, we used &lt;strong&gt;yt-dlp&lt;/strong&gt; and &lt;strong&gt;youtube-transcript-api&lt;/strong&gt; to extract titles, descriptions, and transcripts when available.&lt;/p&gt;

&lt;p&gt;To enable users to search their bookmarks using natural language, we implemented a chatbot using &lt;strong&gt;LangChain&lt;/strong&gt; and &lt;strong&gt;FAISS&lt;/strong&gt;, allowing semantic search over the stored content.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How the System Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. URL Processing:&lt;/strong&gt;&lt;br&gt;
When a user submits a link, it is fetched with httpx and parsed with BeautifulSoup4 to extract the title, description, and main content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. YouTube Links:&lt;/strong&gt;&lt;br&gt;
For YouTube URLs, video titles and descriptions are retrieved via yt-dlp. If transcripts are available, they’re also extracted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Fallback with Screenshots:&lt;/strong&gt;&lt;br&gt;
In cases where HTML content cannot be fetched, Selenium is used to capture a screenshot of the page. This image is then analyzed by the AI model for categorization. Screenshots also serve as thumbnails when one is not provided by the source site.&lt;/p&gt;
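
&lt;p&gt;As a small example of the YouTube handling in step 2, a helper like this can pull the video ID out of the URL before it is handed to yt-dlp. This is a hedged sketch covering the two common URL shapes, not the project's actual code:&lt;/p&gt;

```python
from urllib.parse import urlparse, parse_qs

def youtube_video_id(url):
    """Extract a YouTube video id from a bookmark URL, if any.

    Illustrative helper: handles youtu.be short links and the
    standard watch?v= form, returning None for non-YouTube URLs.
    """
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        # Short links carry the id as the path: youtu.be/VIDEO_ID
        return parsed.path.lstrip("/")
    if parsed.hostname in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        # Standard links carry the id in the query string: watch?v=VIDEO_ID
        return parse_qs(parsed.query).get("v", [None])[0]
    return None
```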

&lt;h3&gt;
  
  
  &lt;strong&gt;Categorization Approach&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Categorization is handled entirely through large language models (LLMs). The extracted content is sent to GPT-4o or Gemini API for category prediction.&lt;br&gt;
Note: The system does &lt;strong&gt;not&lt;/strong&gt; use vector stores, RAG, or embedding techniques for categorization.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Chatbot and Vector Store Usage&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The chatbot allows users to query their bookmark archives in natural language. It works by embedding the content and storing it in a &lt;strong&gt;FAISS&lt;/strong&gt; vector store via &lt;strong&gt;LangChain&lt;/strong&gt;.&lt;br&gt;
When users type a query, the system uses &lt;strong&gt;retrieval-augmented generation (RAG)&lt;/strong&gt; to fetch relevant bookmarks and present them as answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Importantly&lt;/strong&gt;, this embedding and vector store functionality is used &lt;strong&gt;only for the chatbot&lt;/strong&gt;, not for categorization.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Challenges and Solutions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;- HTML Content Access:&lt;/strong&gt;&lt;br&gt;
When content could not be retrieved via standard HTTP requests, Selenium was used to capture screenshots for AI-based analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Missing Transcripts on YouTube:&lt;/strong&gt;&lt;br&gt;
When YouTube transcripts were unavailable, categorization relied only on video titles and descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Missing Thumbnails:&lt;/strong&gt;&lt;br&gt;
If a link didn’t provide a thumbnail, a screenshot of the page was used instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Tagwise offers a smart, user-friendly solution to organize and categorize bookmarks automatically.&lt;br&gt;
It was developed as part of an internship program and, while there are currently no plans to extend the project further, the experience and system created during this process lay a strong foundation for future applications.&lt;/p&gt;

</description>
      <category>uidesign</category>
      <category>bookmarkmanager</category>
      <category>productdevelopement</category>
      <category>uxdesign</category>
    </item>
    <item>
      <title>Tagwise: Technical Review of AI-Powered Bookmark Categorization Project</title>
      <dc:creator>Onur Ceyhan</dc:creator>
      <pubDate>Thu, 29 May 2025 22:46:24 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/tagwise-technical-review-of-ai-powered-bookmark-categorization-project-8k</link>
      <guid>https://dev.to/mantis-stajyer/tagwise-technical-review-of-ai-powered-bookmark-categorization-project-8k</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Tagwise is a straightforward and effective AI-powered web application developed as an internship project to automatically categorize bookmarked links.&lt;/p&gt;

&lt;p&gt;You can check out the project at &lt;a href="https://github.com/Mantis-Software-Company-Interns/tagwise" rel="noopener noreferrer"&gt;the Mantis Interns' GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This article clearly discusses the project's technical infrastructure, methodologies, and developed solutions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahjygsgy3gwdtbfys79i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahjygsgy3gwdtbfys79i.png" alt="Tagwise Overview" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Objective
&lt;/h2&gt;

&lt;p&gt;Modern internet users frequently bookmark hundreds of links, but manually organizing these links is often time-consuming. Tagwise aims to automate this task, quickly and accurately categorizing bookmarks from a single URL input.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend Framework:&lt;/strong&gt; Django, Django REST Framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; PostgreSQL (psycopg2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Requests:&lt;/strong&gt; httpx&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML Parsing:&lt;/strong&gt; BeautifulSoup4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Automation:&lt;/strong&gt; Selenium, webdriver_manager&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Artificial Intelligence:&lt;/strong&gt; OpenAI GPT-4o, Google Gemini API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Integration:&lt;/strong&gt; yt-dlp, youtube-transcript-api&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Store &amp;amp; Chatbot:&lt;/strong&gt; LangChain, FAISS (used only for chatbot functionality)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  System Workflow and Process
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;URL Processing&lt;/strong&gt;&lt;br&gt;
Users enter only the URL. The content from the URL is retrieved in HTML format using httpx. HTML content is parsed into the title, description, and main content using BeautifulSoup4.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Special Process for YouTube Links&lt;/strong&gt;&lt;br&gt;
For YouTube links, video titles and descriptions are fetched using yt-dlp. If available, transcripts (subtitles) are retrieved using youtube-transcript-api. The gathered content is then sent to the AI for categorization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alternative Content Capture (Selenium)&lt;/strong&gt;&lt;br&gt;
For sites where HTML content cannot be fetched or parsed, a screenshot of the page is captured using Selenium. This screenshot is sent as visual data to the AI model for category determination. Additionally, if the site lacks a thumbnail (og:image), the Selenium screenshot is automatically used as a thumbnail.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
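The URL-processing step above can be sketched as follows. This is a minimal stand-in using only the Python standard library (the project itself uses httpx and BeautifulSoup4); `MetadataParser` and `extract_metadata` are hypothetical helpers, not names from the codebase:

```python
from html.parser import HTMLParser

# Stand-in for the project's httpx + BeautifulSoup4 step: parse an
# already-fetched HTML document into its title and meta description.
class MetadataParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_metadata(html: str) -> dict:
    parser = MetadataParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "description": parser.description}

page = '<html><head><title>Sample</title><meta name="description" content="A demo page"></head></html>'
print(extract_metadata(page))
```

In the real pipeline the HTML string would come from an httpx request rather than a literal.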

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13mnadqiiod3pcddu0k4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13mnadqiiod3pcddu0k4.png" alt="System Workflow" width="563" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Category Assignment Approach
&lt;/h2&gt;

&lt;p&gt;The categorization is performed entirely by large language models (LLMs). Using prompt engineering, the extracted content is sent directly to OpenAI GPT-4o or the Google Gemini API, which determines the category automatically. Vector stores, RAG, and embeddings are not used in the categorization step.&lt;/p&gt;
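The prompt-driven approach can be illustrated with a sketch. The category list, prompt wording, and helper names here are assumptions for illustration, not the project's actual prompt; in the real system the prompt would be sent to GPT-4o or Gemini:

```python
# Illustrative prompt construction and reply parsing for LLM-based
# categorization. Categories and wording are assumptions for the sketch.
CATEGORIES = ["Technology", "Science", "Entertainment", "Education", "Other"]

def build_prompt(title: str, description: str, content: str) -> str:
    return (
        "Classify the following bookmarked page into exactly one category.\n"
        f"Allowed categories: {', '.join(CATEGORIES)}\n\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
        f"Content excerpt: {content[:500]}\n\n"
        "Answer with the category name only."
    )

def parse_category(reply: str) -> str:
    # Tolerate extra whitespace or casing in the model's reply.
    cleaned = reply.strip().title()
    return cleaned if cleaned in CATEGORIES else "Other"

prompt = build_prompt("Intro to Django", "A web framework tutorial", "Django is ...")
print(parse_category(" technology \n"))
```

Constraining the model to a fixed list and normalizing the reply keeps the stored categories consistent even when the model answers loosely.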

&lt;h2&gt;
  
  
  Chatbot Feature and Vector Store Usage
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcnqi9jf2780fqrkf374.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcnqi9jf2780fqrkf374.png" alt="Chatbot Functionality" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project also includes a chatbot feature allowing users to query their bookmark archives in natural language. This chatbot operates by converting bookmark content into embeddings via LangChain, which are then stored in a FAISS vector store. When a user query is received, relevant content is retrieved using the Retrieval-Augmented Generation (RAG) methodology, and presented to the user. These vector store and embedding operations are exclusively for chatbot functionality and are not involved in the categorization process.&lt;/p&gt;
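The retrieval half of this RAG flow can be sketched without the real stack. The project uses LangChain embeddings stored in FAISS; below, a toy bag-of-words "embedding" and a linear scan stand in for both, purely to show nearest-neighbour retrieval over bookmark content:

```python
import math

# Toy stand-in for the RAG retrieval step (LangChain + FAISS in the
# real project): embed texts as word-count vectors, rank by cosine.
def embed(text: str, vocab: list) -> list:
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list, vocab: list, k: int = 1) -> list:
    q = embed(query, vocab)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d, vocab)), reverse=True)
    return ranked[:k]

vocab = ["django", "search", "video", "bookmark"]
docs = ["django rest tutorial bookmark", "video about search engines"]
print(retrieve("how does search work", docs, vocab))
```

The retrieved snippets would then be placed into the chatbot's prompt so the LLM answers from the user's own bookmarks.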

&lt;h2&gt;
  
  
  Challenges Encountered and Solutions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fetching HTML Content:&lt;/strong&gt; Selenium screenshot solutions were employed for content that could not be directly fetched with httpx and BeautifulSoup4.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Transcript Absence:&lt;/strong&gt; Categorization was conducted solely based on video titles and descriptions when transcripts were unavailable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thumbnail Absence:&lt;/strong&gt; Selenium screenshots were utilized as thumbnails when og:image or similar visuals were missing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Tagwise offers a simple yet efficient solution for categorizing bookmarks automatically. The project was developed as part of an internship, and no further development is planned for the time being.&lt;/p&gt;

&lt;p&gt;Feel free to reach out with your questions and comments!&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>langchain</category>
      <category>ai</category>
    </item>
    <item>
<title>How does Full-Text Search work?</title>
      <dc:creator>Elif Albakır</dc:creator>
      <pubDate>Thu, 30 Jan 2025 12:27:05 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/tam-metin-aramasi-full-text-search-nasil-calisir-3k1c</link>
      <guid>https://dev.to/mantis-stajyer/tam-metin-aramasi-full-text-search-nasil-calisir-3k1c</guid>
      <description>&lt;p&gt;Full-Text Search dokümanlar içinde serbest metin üzerinden arama yapılmasına olanak sağlayan, Web arama motorlarında ve web sayfalarında en çok kullanılan arama metodlarından biridir. &lt;/p&gt;

&lt;p&gt;Full-text search lets you quickly and accurately find, among large blocks of data, the documents that match a keyword searched across text documents gathered from any source. &lt;/p&gt;

&lt;p&gt;Full-text search works as follows:&lt;br&gt;
First, an inverted index is built so the data can be searched quickly. Using this index, the TF value (term frequency) and IDF value (inverse document frequency) are computed; these two values are multiplied to build a vector for each document, and the angle between each document vector and the query vector is measured (cosine similarity). The smaller the angle between the query vector and a document vector, the more relevant that document is. &lt;/p&gt;

&lt;p&gt;Beyond full-text search, the TF-IDF value is used in a variety of settings, including document classification, topic modeling, and stop-word filtering.&lt;/p&gt;

&lt;p&gt;An inverted index is a structure that records, for every word appearing in your documents, which documents contain that word. Instead of scanning each document line by line, the words are split out and indexed column-wise, which makes searching efficient.  &lt;/p&gt;
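The inverted index described above can be built in a few lines; the document IDs and contents below are illustrative:

```python
from collections import defaultdict

# Build an inverted index: for every word, record the IDs of the
# documents that contain it. Document contents are illustrative.
docs = {
    1: "mantis yazılım gelistirme",
    2: "tam metin arama",
    3: "mantis arama motoru",
}

inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted_index[word].add(doc_id)

print(sorted(inverted_index["mantis"]))  # documents containing "mantis"
print(sorted(inverted_index["arama"]))   # documents containing "arama"
```

A query term is then answered by a single dictionary lookup instead of a scan over every document.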

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstr90d1l9sysayyk2p0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstr90d1l9sysayyk2p0w.png" alt="The inverted index" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TF stands for term frequency: how often a word occurs within a document.&lt;/p&gt;

&lt;p&gt;IDF stands for inverse document frequency. It is computed by taking the logarithm of the total number of documents divided by the number of documents in the collection that contain the word in question. For example, if there are 100 documents and the searched word appears in only 10 of them, the IDF value is 1. Because the gap between the total document count and the count of documents containing the term can be very large, the logarithm is taken to normalize the value.&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;idf=log⁡Nnk
 idf = \log{\frac{N}{n_{k}}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;i&lt;/span&gt;&lt;span class="mord mathnormal"&gt;df&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;n&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;N&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose 
nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;N = total number of documents&lt;br&gt;
n_k = number of documents containing the term k&lt;/p&gt;

&lt;p&gt;The TF and IDF values are then multiplied to measure how important the word is within the document. As a result, common words and suffixes receive low weights, while specific words receive higher importance. &lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;wik=tfik⋅idfk
  w_{ik} = tf_{ik}  \cdot idf_{k}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;t&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;i&lt;/span&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord 
mathnormal"&gt;f&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;tf_ik = frequency of the term k in document i&lt;/p&gt;
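The TF-IDF weight can be computed directly from these definitions. The sketch below assumes a base-10 logarithm, consistent with the article's example (100 documents, term in 10 of them, giving IDF = 1):

```python
import math

# Compute TF-IDF weights: w_ik = tf_ik * idf_k, with
# idf_k = log10(N / n_k), matching the article's worked example.
def idf(total_docs: int, docs_with_term: int) -> float:
    return math.log10(total_docs / docs_with_term)

def tfidf(tf: int, total_docs: int, docs_with_term: int) -> float:
    return tf * idf(total_docs, docs_with_term)

print(idf(100, 10))       # the article's example: N=100, n_k=10
print(tfidf(3, 100, 10))  # a term occurring 3 times in the document
```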

&lt;p&gt;For document vectors to be compared reliably, the term weights must be normalized. &lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;N(wik)=tfik⋅log⁡Nnk∑k=0M−1(tfik)2⋅[log⁡(Nnk)]2
N(w_{ik}) = \frac{tf_{ik}  \cdot \log{\frac{N}{n_{k}}}}{\sqrt{\sum_{k=0}^{M-1} (tf_{ik})^{2} \cdot \left[ \log\left( \frac{N}{n_{k}} \right)  \right] ^{2} }}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;N&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop"&gt;&lt;span class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;span class="mrel 
mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;M&lt;/span&gt;&lt;span class="mbin mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;t&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span 
class="minner"&gt;&lt;span class="mopen delimcenter"&gt;&lt;span class="delimsizing size2"&gt;[&lt;/span&gt;&lt;/span&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;&lt;span class="delimsizing size2"&gt;(&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;N&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mclose delimcenter"&gt;&lt;span class="delimsizing size2"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;&lt;span class="delimsizing size2"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;t&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span 
class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;N&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;M = total number of terms&lt;/p&gt;

&lt;p&gt;Document vectors are built from the TF-IDF values computed for each word. For example, a vector built from the words Mantis and Yazılım would look like this: &lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;D1→=[wmantisd1,wyazılımd1]
\overrightarrow{D_{1}} = \left[ w_{mantis_{d1}}, w_{yazılım_{d1}} \right]
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;D&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;[&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;man&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;t&lt;/span&gt;&lt;span class="mord 
mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;s&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;d&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;y&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;a&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;z&lt;/span&gt;&lt;span class="mord latin_fallback mtight"&gt;ı&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;l&lt;/span&gt;&lt;span class="mord latin_fallback mtight"&gt;ı&lt;/span&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;m&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span 
class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;d&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;To find the relevant documents, we need to measure how close the query vector is to each document vector. Cosine Similarity does this by computing the angle between two vectors: the smaller the angle, the more related the vectors are. The processed query is thus matched against the documents in the inverted index, ranked by relevance, and returned to the user.&lt;/p&gt;

&lt;p&gt;For two vectors A and B, cosine similarity is computed with the following formula:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;cos⁡(θ)=∑i=1nAiBi∑i=1nAi2⋅∑i=1nBi2
\cos(\theta) = \frac{\sum\limits_{i=1}^{n} A_i B_i}{\sqrt{\sum\limits_{i=1}^{n} A_i^2} \cdot \sqrt{\sum\limits_{i=1}^{n} B_i^2}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop"&gt;cos&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;θ&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;A&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span 
class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;B&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;A&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;B&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
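&lt;p&gt;To make the formula concrete, here is a minimal pure-Python sketch (the example vectors are made up for illustration):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # Numerator: dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: product of the vector magnitudes.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.0, 1.0, 1.0]
doc = [0.0, 2.0, 2.0]    # same direction as the query -> similarity 1.0
other = [1.0, 0.0, 0.0]  # orthogonal to the query -> similarity 0.0

print(cosine_similarity(query, doc))    # 1.0 (up to rounding)
print(cosine_similarity(query, other))  # 0.0
```

&lt;p&gt;Because vectors pointing in the same direction score 1.0 regardless of magnitude, cosine similarity compares documents of different lengths fairly.&lt;/p&gt;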


&lt;p&gt;BM25 is an improved version of TF-IDF and is used to compute document ranking scores. It returns the documents that best match the words you search for, paying attention to term frequency, term position, and length: it checks how often a search term occurs in a document and where it occurs (for example, in the title or in the body), while the length and specificity of the search terms help make the result more accurate. This algorithm powers the search engines behind many of the websites we use (Solr, Elasticsearch, OpenSearch, ...), and you may also encounter it in email filtering, product recommendation systems, and chatbots.&lt;/p&gt;
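&lt;p&gt;A simplified sketch of the BM25 score for a single term may help; &lt;code&gt;k1&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; are the usual tuning constants, and the tiny corpus below is made up for the example (real implementations such as Lucene's add further refinements):&lt;/p&gt;

```python
import math

def bm25_score(term, doc, corpus, k1=1.5, b=0.75):
    """Simplified BM25 score of one term for one document (a list of tokens)."""
    n_docs = len(corpus)
    df = sum(1 for d in corpus if term in d)             # document frequency
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    tf = doc.count(term)                                 # term frequency
    avg_len = sum(len(d) for d in corpus) / n_docs       # average doc length
    # Term-frequency saturation, normalized by document length.
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
    return idf * norm

corpus = [
    ["opensearch", "lexical", "search"],
    ["semantic", "search", "with", "vectors"],
    ["cooking", "recipes"],
]
scores = [bm25_score("search", d, corpus) for d in corpus]
# The two documents containing "search" outscore the one that does not.
```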

&lt;h3&gt;
  
  
CONCLUSION
&lt;/h3&gt;

&lt;p&gt;The goal of this article is to explain how Full-Text Search lets the search engines we use every day, such as Google and Yandex, serve the documents you are looking for. Full-Text Search computes TF and IDF to resolve the connections between words and documents, and through them reaches the documents related to the queried terms. TF-IDF essentially measures the relevance of the words you enter in order to surface the best document for you. You have probably noticed that the top results in a search engine are usually the most relevant to your topic; Full-Text Search is the reason. If you understand how Full-Text Search works, you can write more effective and more useful queries.&lt;/p&gt;

&lt;h3&gt;
  
  
REFERENCES
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.turhost.com/blog/tf-idf-nedir/" rel="noopener noreferrer"&gt;https://www.turhost.com/blog/tf-idf-nedir/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://medium.com/algorithms-data-structures/tf-idf-term-frequency-inverse-document-frequency-53feb22a17c6" rel="noopener noreferrer"&gt;https://medium.com/algorithms-data-structures/tf-idf-term-frequency-inverse-document-frequency-53feb22a17c6&lt;/a&gt;&lt;br&gt;
&lt;a href="https://medium.com/@kamillgun/full-text-search-e22a1251539" rel="noopener noreferrer"&gt;https://medium.com/@kamillgun/full-text-search-e22a1251539&lt;/a&gt;&lt;br&gt;
&lt;a href="https://erolakgul.net/2015/09/13/full-text-search-mimarisi/" rel="noopener noreferrer"&gt;https://erolakgul.net/2015/09/13/full-text-search-mimarisi/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://barisakdas.medium.com/bm25-best-match-re%C5%9Fevancy-algoritmas%C4%B1-nedir-a72f4103031c" rel="noopener noreferrer"&gt;https://barisakdas.medium.com/bm25-best-match-re%C5%9Fevancy-algoritmas%C4%B1-nedir-a72f4103031c&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What Is an LLM, and How Do Transformers Work?</title>
      <dc:creator>Elif Albakır</dc:creator>
      <pubDate>Fri, 24 Jan 2025 08:45:07 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/llm-nedir-transformers-nasil-calisir-4ne3</link>
      <guid>https://dev.to/mantis-stajyer/llm-nedir-transformers-nasil-calisir-4ne3</guid>
      <description>&lt;p&gt;Genel anlamda aslında LLM (Large Language Models) yani bir diğer adıyla Büyük Dil Modelleri hepimizin günlük yaşantısında bir şekilde kullandığı bir yapay zekâ modelidir. Buna örnek göstermek istersek en popüler örneklerinden birisi ChatGPT’dir. LLM’i farklı yapan şey nedir dersek ortaya kesinlikle insan zekasına yakın bir yaratıcılık ve düşünme yetisi diyebiliriz. LLM daha insanvari bir yapıya sahip olarak, insan duygularını analiz edebilme, kelime tahminlerinde bulunabilme ve kelimeler arasında bağlantı kurabilme gibi özelliklere sahiptir. LLM bunu verilen cümlelere göre sonraki kelimeleri tahmin ederek yapıyor. Bunu basitçe cep telefonunuzdaki kelime tamamlama özelliği gibi düşünebiliriz. Elbette LLM bunu kendiliğinden yapmıyor. Büyük miktarda metin verisiyle eğitmek ve üzerinde ince ayarlar yapmak gerekiyor. Yani LLM'ler büyük veri setleri ile eğitilmiş, kelimeler arasındaki bağlamsal ve dilbilgisel bağlantıları anlayarak uygun kelimeleri seçen bir derin öğrenme modelidir.&lt;/p&gt;

&lt;p&gt;It does this through self-supervised learning techniques, and this is where the Transformer architecture comes in: it is what gives these artificial neural networks capabilities such as remembering context and making suggestions. Let's walk step by step through how the Transformer architecture turns inputs into outputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tldraw.com/ro/HSpqDRQVvmDtVNtKNgenB?d=v-2220.-2059.7945.3719.page" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgc3bcwen8h7a7zvwqd0w.png" alt="Transformers mimarisi" width="800" height="1311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
1. Tokenization
&lt;/h3&gt;

&lt;p&gt;Queries given to the model are split into units the model can handle, called tokens. These units can be words, affixes, or special characters. But the model understands numbers rather than words, so the tokens are mapped to the token IDs inside the model in order to be converted into embeddings, the language the model actually works with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy7gdygwvaxnj94i1vaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy7gdygwvaxnj94i1vaq.png" alt="Tokenizasyon işlemi" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the model to start working, it needs to know that the entered query has ended. Special boundary tokens (&lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;) are used for this; put simply, they act like the period at the end of a sentence. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4upn9ttjeen2ff38buj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4upn9ttjeen2ff38buj.png" alt="Tokenizasyon işlemi ve token idler" width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
2. Embedding
&lt;/h3&gt;

&lt;p&gt;Before being processed by the model, these tokens are converted into high-dimensional numeric vectors (tensors) and represented in a vector space. This is called embedding. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhoq3oxkt9y7rzzf3ooo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhoq3oxkt9y7rzzf3ooo.png" alt="Embeding işlemi" width="462" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
3. Positional Encoding
&lt;/h3&gt;

&lt;p&gt;With positional encoding, the order of the tokens is added to the embeddings, which are then sent to the encoder layer. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F788ktn0umvrzkr1rbvnw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F788ktn0umvrzkr1rbvnw.png" alt="Pozisyonel kodlama işlemi ve encoder katmanı" width="445" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
4. Encoder
&lt;/h3&gt;

&lt;p&gt;The encoder layer takes variable-length input and turns it into a fixed-size representation; here the embeddings also pass through self-attention and feed-forward operations. Self-attention is the operation that lets the model relate tokens to one another and propose words: it builds the context and semantic coherence between tokens. These operations run in parallel.  &lt;/p&gt;
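&lt;p&gt;The core of self-attention is scaled dot-product attention: each token's query is compared with every token's key, the scores are normalized with softmax, and the value vectors are mixed accordingly. Here is a minimal single-head sketch (the toy vectors are made up, and real models also apply learned Q/K/V projection matrices):&lt;/p&gt;

```python
import math

def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        # Softmax over the scores -> attention weights that sum to 1.
        exps = [math.exp(s) for s in scores]
        weights = [e / sum(exps) for e in exps]
        # Each output vector is a weighted mix of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

x = [[1.0, 0.0], [0.0, 1.0]]  # two toy token vectors; Q = K = V for simplicity
mixed = attention(x, x, x)
# Each token attends most strongly to itself here, but still mixes in the other.
```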

&lt;h3&gt;
  
  
5. Decoder
&lt;/h3&gt;

&lt;p&gt;The input representation produced in the encoder layer goes to the decoder layer. When this layer sees the boundary token (&lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;), candidate continuations for the next word are computed. These candidates are converted into embeddings and sent to the linear layer so that their probabilities can be calculated; the probabilities produced in the linear layer are normalized in the softmax layer, and the first candidate word is fed back into the decoder. Using this candidate together with the input representation, the following words are predicted. When the model finishes its answer, it emits the boundary token (&lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;) to signal that the output is complete.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi19bjkkw3e4ij1oe8ico.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi19bjkkw3e4ij1oe8ico.png" alt="Decoder katmanı" width="417" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
6. Linear Layer
&lt;/h3&gt;

&lt;p&gt;In the linear layer, word order is preserved and respected; this layer computes the distribution of scores over the words the model might predict next. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yhbsgup6urpbcdsvjmj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yhbsgup6urpbcdsvjmj.png" alt="Lineer katman" width="372" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
7. Softmax
&lt;/h3&gt;

&lt;p&gt;The word scores predicted in the linear layer are sent to the softmax layer. Whatever the input (positive, negative, etc.), the softmax function maps it to a value between 0 and 1; in other words, it normalizes the scores into a probability distribution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3zptum1eqzzp3f4s3ki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3zptum1eqzzp3f4s3ki.png" alt="Softmax katmanı" width="330" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
Conclusion
&lt;/h2&gt;

&lt;p&gt;This article was written to explain, at a basic level, what an LLM is and how the Transformer architecture works. Because LLMs are trained on large datasets and use the Transformer architecture, they can produce output appropriate to a query, predict the next word, and give creative answers by working with probabilities. These abilities have reached levels that are revolutionary for NLP, and the Transformer architecture is the building block underneath them. LLMs have enabled communication between humans and machines and increased that interaction, and thanks to the new dimension they have given natural language processing (NLP), the field continues to develop toward far more advanced levels.&lt;/p&gt;

&lt;h2&gt;
  
  
References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://rpradeepmenon.medium.com/introduction-to-large-language-models-and-the-transformer-architecture-534408ed7e61" rel="noopener noreferrer"&gt;https://rpradeepmenon.medium.com/introduction-to-large-language-models-and-the-transformer-architecture-534408ed7e61&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/llm/transformers?hl=tr" rel="noopener noreferrer"&gt;https://developers.google.com/machine-learning/crash-course/llm/transformers?hl=tr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/what-is/large-language-model/" rel="noopener noreferrer"&gt;https://aws.amazon.com/what-is/large-language-model/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bulutistan.com/blog/large-language-model-llm-nedir-uygulama-ornekleri/" rel="noopener noreferrer"&gt;https://bulutistan.com/blog/large-language-model-llm-nedir-uygulama-ornekleri/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/wjZofJX0v4M" rel="noopener noreferrer"&gt;https://youtu.be/wjZofJX0v4M&lt;/a&gt; &lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
