<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mantis Stajyer Blogu</title>
    <description>The latest articles on DEV Community by Mantis Stajyer Blogu (@mantis-stajyer).</description>
    <link>https://dev.to/mantis-stajyer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F10192%2F1f9970ae-dd88-4412-a937-899af528389d.jpeg</url>
      <title>DEV Community: Mantis Stajyer Blogu</title>
      <link>https://dev.to/mantis-stajyer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mantis-stajyer"/>
    <language>en</language>
    <item>
      <title>When Search Understands You: Semantic Search and RAG Chatbots with OpenSearch</title>
      <dc:creator>ismail kattan</dc:creator>
      <pubDate>Mon, 05 Jan 2026 14:33:04 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/when-search-understands-you-semantic-search-and-rag-chatbots-with-opensearch-1cce</link>
      <guid>https://dev.to/mantis-stajyer/when-search-understands-you-semantic-search-and-rag-chatbots-with-opensearch-1cce</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This project is a Flask-based note management system that goes beyond traditional CRUD functionality by integrating hybrid semantic and lexical search with semantic highlighting. It also introduces a RAG-powered chatbot that enables users to interact with their notes conversationally, making note retrieval more intuitive and context-aware.&lt;/p&gt;

&lt;p&gt;You can find the full source code and implementation details on the Mantis Interns GitHub repository:&lt;br&gt;
&lt;a href="https://github.com/Mantis-Software-Company-Interns/Notebook" rel="noopener noreferrer"&gt;Mantis Interns GitHub – Notebook&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python &amp;amp; Flask:&lt;/strong&gt; Used to build a lightweight and modular backend, allowing rapid iteration during the training period.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite:&lt;/strong&gt; Chosen for its simplicity and ease of setup while still being sufficient for managing user data and notes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenSearch:&lt;/strong&gt; Used to implement both lexical and semantic search, enabling hybrid search capabilities and serving as the retrieval layer for the RAG chatbot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS:&lt;/strong&gt; Helped in building a clean and responsive UI without spending excessive time on custom styling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Integration:&lt;/strong&gt; Used to enable conversational access to user notes by generating context-aware responses based on retrieved documents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Problem Statement
&lt;/h2&gt;

&lt;p&gt;Traditional keyword search is often insufficient when users don’t remember exact words or phrasing of their notes. This becomes more challenging when the goal is to interact with notes conversationally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7hefkv32cxu8g8suf8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7hefkv32cxu8g8suf8v.png" alt="Dashboard" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We solved this problem with OpenSearch; the image below shows the results of the “technology of art” query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgg0d8whcx57gyck6xpcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgg0d8whcx57gyck6xpcb.png" alt="Search Results" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  System Overview
&lt;/h2&gt;

&lt;p&gt;The system is designed as a modular Flask-based application where authentication, note management, search, and conversational access are handled as separate but connected components. The architecture focuses on simplicity, clear data flow, and extensibility.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9yxt675y4itssm4q3cw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk9yxt675y4itssm4q3cw.png" alt="System Diagram" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This diagram illustrates the high-level components of the system and how requests flow between the client, backend services, and external search components.&lt;/p&gt;

&lt;p&gt;The chatbot relies on a Retrieval-Augmented Generation (RAG) pipeline, where OpenSearch retrieves the most relevant notes before constructing a constrained context for the language model.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenSearch Setup
&lt;/h2&gt;

&lt;p&gt;OpenSearch was deployed using Docker in a single-node configuration to simplify local development while enabling advanced search features. The setup supports keyword-based search, vector similarity search, and ML-powered pipelines, with persistence enabled through Docker volumes.&lt;/p&gt;

&lt;p&gt;The configuration focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-node OpenSearch cluster&lt;/li&gt;
&lt;li&gt;Enabled k-NN vector search&lt;/li&gt;
&lt;li&gt;ML Commons support for embeddings and inference&lt;/li&gt;
&lt;li&gt;REST-based integration with the Flask backend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Detailed setup and configuration steps are available in the project repository.&lt;/p&gt;
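
&lt;p&gt;As an illustration, a minimal Compose file along these lines can bring up such a single-node instance. The image tag, memory limits, and the ML Commons setting are assumptions, and security/password options vary by OpenSearch version, so treat this as a sketch rather than the project's actual configuration:&lt;/p&gt;

```yaml
# Minimal single-node OpenSearch for local development (a sketch;
# security and admin-password settings vary by OpenSearch version).
services:
  opensearch:
    image: opensearchproject/opensearch:2.14.0
    environment:
      - discovery.type=single-node
      - OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m
      # Let ML Commons run models on this single data node.
      - plugins.ml_commons.only_run_on_ml_node=false
    ports:
      - "9200:9200"
    volumes:
      # Docker volume provides the persistence mentioned above.
      - opensearch-data:/usr/share/opensearch/data
volumes:
  opensearch-data:
```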

&lt;h2&gt;
  
  
  Hybrid Search with OpenSearch
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Hybrid Search?
&lt;/h3&gt;

&lt;p&gt;Keyword-based search works well for exact matches but fails when users search by meaning rather than specific terms. Semantic search improves recall but may lack precision on its own. Combining both approaches results in more accurate and reliable note retrieval.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design Overview
&lt;/h3&gt;

&lt;p&gt;When a user submits a query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A lexical search (BM25) is executed on note text fields&lt;/li&gt;
&lt;li&gt;A semantic search is performed using vector similarity&lt;/li&gt;
&lt;li&gt;Results from both searches are merged and ranked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hybrid approach balances precision and semantic relevance.&lt;/p&gt;
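
&lt;p&gt;The flow above can be sketched as a single hybrid query body. The field names (&lt;code&gt;content&lt;/code&gt;, &lt;code&gt;embedding&lt;/code&gt;) are illustrative rather than the project's actual mapping, and in practice OpenSearch pairs such a query with a search pipeline or client-side fusion:&lt;/p&gt;

```python
def build_hybrid_query(query_text, query_embedding, k=10):
    """Build an OpenSearch hybrid query combining BM25 and k-NN clauses.

    Field names ("content", "embedding") are assumptions for
    illustration; adapt them to the real index mapping.
    """
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical (BM25) clause over the note text.
                    {"match": {"content": {"query": query_text}}},
                    # Semantic clause via vector similarity.
                    {"knn": {"embedding": {"vector": query_embedding, "k": k}}},
                ]
            }
        },
    }
```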

&lt;h3&gt;
  
  
  Embeddings and Indexing
&lt;/h3&gt;

&lt;p&gt;A sentence-transformer model is used to generate vector embeddings for note content. To keep the backend simple, an ingest pipeline automatically generates embeddings during indexing, allowing Flask to send only raw note data.&lt;/p&gt;

&lt;p&gt;The notes index is designed to support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text fields for keyword search&lt;/li&gt;
&lt;li&gt;Vector fields for semantic similarity&lt;/li&gt;
&lt;li&gt;Metadata filtering by user, category, and tags&lt;/li&gt;
&lt;/ul&gt;
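
&lt;p&gt;A mapping supporting these three requirements might look like the following sketch. The 384-dimension value assumes a MiniLM-style sentence-transformer and the field names are illustrative; adjust both to the model and schema actually deployed:&lt;/p&gt;

```python
# Sketch of an index body covering text search, k-NN similarity,
# and metadata filtering. Field names and the embedding dimension
# (384, typical of MiniLM-style models) are assumptions.
NOTES_INDEX_BODY = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "user_id": {"type": "keyword"},
            "category": {"type": "keyword"},
            "tags": {"type": "keyword"},
            "embedding": {"type": "knn_vector", "dimension": 384},
        }
    },
}
```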

&lt;h3&gt;
  
  
  Ranking and Highlighting
&lt;/h3&gt;

&lt;p&gt;Search results are ranked using Reciprocal Rank Fusion (RRF) to combine lexical and semantic scores effectively. Semantic highlighting is applied to surface the most relevant text segments, improving result interpretability.&lt;/p&gt;
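
&lt;p&gt;RRF itself is compact enough to show in full; this is a generic implementation of the fusion step, not the project's code:&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    Each ranking is a list of document ids, best first. Every document
    earns 1 / (k + rank) per list it appears in; k=60 is the constant
    commonly used in the RRF literature.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

&lt;p&gt;A document ranked well by both the lexical and the semantic list rises above one that only a single list favors, which is exactly the balancing behavior hybrid search is after.&lt;/p&gt;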

&lt;h2&gt;
  
  
  RAG Chatbot Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Motivation
&lt;/h3&gt;

&lt;p&gt;While hybrid search improves note discovery, it still requires users to manually inspect results. To provide a more natural and conversational experience, a Retrieval-Augmented Generation (RAG) chatbot was introduced, allowing users to interact with their notes using natural language questions.&lt;/p&gt;

&lt;p&gt;The goal was to ensure that responses are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grounded in the user’s own notes&lt;/li&gt;
&lt;li&gt;Context-aware&lt;/li&gt;
&lt;li&gt;Free from hallucinated or unrelated information&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  High-Level Design
&lt;/h3&gt;

&lt;p&gt;The chatbot follows a RAG pipeline where OpenSearch acts as the retrieval layer and a large language model (LLM) handles response generation.&lt;/p&gt;

&lt;p&gt;At a high level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenSearch retrieves the most relevant notes based on the user query&lt;/li&gt;
&lt;li&gt;Selected note fields are used to build a constrained context&lt;/li&gt;
&lt;li&gt;The LLM generates a response strictly based on this context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design ensures that the chatbot answers are rooted in actual user data rather than general knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  LLM Integration via OpenSearch
&lt;/h3&gt;

&lt;p&gt;Instead of calling the LLM directly from the Flask backend, the model is integrated through OpenSearch’s ML framework using a remote connector. This allows OpenSearch to orchestrate both retrieval and generation in a single pipeline.&lt;/p&gt;

&lt;p&gt;Key benefits of this approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced backend complexity&lt;/li&gt;
&lt;li&gt;Centralized control over prompts and context&lt;/li&gt;
&lt;li&gt;Easier experimentation with different models&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Context Construction
&lt;/h3&gt;

&lt;p&gt;To minimize noise and token usage, only selected fields from retrieved notes are included in the context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Title&lt;/li&gt;
&lt;li&gt;Content&lt;/li&gt;
&lt;li&gt;Category&lt;/li&gt;
&lt;li&gt;Tags&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system prompt guides the model to behave as a personal notes assistant, encouraging accurate, polite, and context-bound responses. If relevant information is missing, the model is instructed to acknowledge this explicitly.&lt;/p&gt;
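
&lt;p&gt;A helper along these lines can assemble that constrained context from the four fields; the exact layout is an assumption, not the project's verbatim prompt format:&lt;/p&gt;

```python
def build_context(notes):
    """Assemble a constrained LLM context from retrieved notes.

    Only title, content, category, and tags are included; all other
    fields are dropped to limit noise and token usage. The layout is
    illustrative, not the project's actual prompt template.
    """
    blocks = []
    for note in notes:
        tags = ", ".join(note.get("tags", []))
        blocks.append(
            f"Title: {note['title']}\n"
            f"Category: {note.get('category', 'uncategorized')}\n"
            f"Tags: {tags}\n"
            f"Content: {note['content']}"
        )
    # Blank line between notes keeps document boundaries visible to the model.
    return "\n\n".join(blocks)
```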

&lt;h3&gt;
  
  
  Session and Message Management
&lt;/h3&gt;

&lt;p&gt;On the application side, chat sessions and messages are persisted to maintain conversational continuity. Each session is isolated per user, ensuring that retrieved context and generated responses remain private and relevant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fworzmr4wyd2jrpiowt72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fworzmr4wyd2jrpiowt72.png" alt="Chatbot UI" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project demonstrates how a traditional CRUD-based application can be incrementally enhanced into a smart, conversational system. By integrating hybrid search and a RAG-based chatbot, the notes application evolved beyond simple keyword matching into a more intuitive and meaningful user experience.&lt;/p&gt;

&lt;p&gt;Using OpenSearch as both a retrieval and orchestration layer simplified the architecture while enabling advanced capabilities such as semantic search, contextual highlighting, and grounded text generation. The design choices made throughout the project prioritized clarity, modularity, and practical trade-offs suitable for a real-world application.&lt;/p&gt;

&lt;p&gt;Overall, this experience reinforced the importance of combining solid system design with modern search and AI techniques, and highlighted how thoughtful integration can significantly improve usability without adding unnecessary complexity.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>opensearch</category>
      <category>flask</category>
      <category>ai</category>
    </item>
    <item>
      <title>Fitera: AI-Powered Nutrition and Fitness Tracking Application</title>
      <dc:creator>Meryem Sude Gök</dc:creator>
      <pubDate>Thu, 31 Jul 2025 12:47:46 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/fitera-ai-powered-nutrition-and-fitness-tracking-application-3iic</link>
      <guid>https://dev.to/mantis-stajyer/fitera-ai-powered-nutrition-and-fitness-tracking-application-3iic</guid>
      <description>&lt;h2&gt;
  
  
  Project Overview
&lt;/h2&gt;

&lt;p&gt;Fitera is a comprehensive web application designed to help users track their nutrition habits, exercise routines, and health goals. The application features an AI-powered chatbot for personalized nutrition and fitness advice, detailed nutritional analysis, and comprehensive health monitoring capabilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gr3l6ozyntpj3dm6g88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9gr3l6ozyntpj3dm6g88.png" alt=" " width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can find the source code at &lt;a href="https://github.com/Mantis-Software-Company-Interns/Fitera" rel="noopener noreferrer"&gt;the Mantis Interns' GitHub&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technology Stack
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Backend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.10+&lt;/strong&gt; with Flask web framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flask-Smorest&lt;/strong&gt; for API documentation and validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; database with SQLAlchemy ORM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JWT&lt;/strong&gt; authentication system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Claude&lt;/strong&gt; AI model for chatbot functionality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt; methodology for context-aware responses&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Frontend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vue.js 3&lt;/strong&gt; with Composition API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vue Router&lt;/strong&gt; for navigation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vuetify&lt;/strong&gt; for UI components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Axios&lt;/strong&gt; for HTTP requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vite&lt;/strong&gt; as build tool&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  User Management
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Secure registration and login system&lt;/li&gt;
&lt;li&gt;Profile management with height, weight, and goal tracking&lt;/li&gt;
&lt;li&gt;Allergy and diet preference settings&lt;/li&gt;
&lt;li&gt;BMI calculation and health metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Nutrition Tracking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Daily meal logging (breakfast, lunch, dinner, snacks)&lt;/li&gt;
&lt;li&gt;Detailed macro and micronutrient analysis&lt;/li&gt;
&lt;li&gt;Water consumption tracking&lt;/li&gt;
&lt;li&gt;Comprehensive meal history with nutritional insights&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Exercise Tracking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Exercise logging with duration and intensity&lt;/li&gt;
&lt;li&gt;Walking and step tracking&lt;/li&gt;
&lt;li&gt;Exercise recommendations based on user profile&lt;/li&gt;
&lt;li&gt;Performance analysis and progress tracking&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Health Monitoring
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Sleep quality logging&lt;/li&gt;
&lt;li&gt;Weight tracking over time&lt;/li&gt;
&lt;li&gt;Health goal setting and progress monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI-Powered Chatbot
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Nutrition consultation and advice&lt;/li&gt;
&lt;li&gt;Exercise recommendations&lt;/li&gt;
&lt;li&gt;Health guidance and tips&lt;/li&gt;
&lt;li&gt;Personalized responses based on user data&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Technical Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Database Architecture
&lt;/h3&gt;

&lt;p&gt;The application uses PostgreSQL with the following key tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;users&lt;/code&gt; - User profiles and preferences&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;food&lt;/code&gt; - Comprehensive food database&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;macro_nutrients&lt;/code&gt; - Protein, carbs, fat tracking&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;micro_nutrients&lt;/code&gt; - Vitamins and minerals&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;meal_log&lt;/code&gt; - Daily meal records&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;exercise_log&lt;/code&gt; - Exercise tracking&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;water_log&lt;/code&gt; - Hydration monitoring&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sleep_log&lt;/code&gt; - Sleep quality data&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;weight_log&lt;/code&gt; - Weight progression&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI Chatbot Implementation
&lt;/h3&gt;

&lt;p&gt;The chatbot utilizes Anthropic's Claude 3.5 Sonnet model with a custom RAG (Retrieval-Augmented Generation) system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Retrieval&lt;/strong&gt;: The system extracts relevant information from the database using keyword-based search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Engineering&lt;/strong&gt;: User queries are enhanced with retrieved context and conversation history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response Generation&lt;/strong&gt;: Claude generates personalized responses based on the enriched context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation Management&lt;/strong&gt;: Maintains conversation history for contextual continuity&lt;/li&gt;
&lt;/ol&gt;
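
&lt;p&gt;Step 1 can be sketched as a simple keyword-overlap retriever. The scoring here is illustrative only; the real system builds its documents from the database tables described above:&lt;/p&gt;

```python
def keyword_retrieve(query, documents, top_n=3):
    """Keyword-based context retrieval of the kind step 1 describes.

    Scores each document by how many query terms it shares, then keeps
    the best matches. A sketch under simplified assumptions, not the
    application's actual retrieval code.
    """
    terms = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(terms.intersection(doc.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    # Most overlapping documents first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]
```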

&lt;h3&gt;
  
  
  RAG Methodology
&lt;/h3&gt;

&lt;p&gt;The application implements a simplified RAG system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Builds documents from database tables (food, nutrients, exercise data)&lt;/li&gt;
&lt;li&gt;Performs keyword-based similarity search&lt;/li&gt;
&lt;li&gt;Retrieves relevant context for user queries&lt;/li&gt;
&lt;li&gt;Enhances AI responses with domain-specific information&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Development Challenges and Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Database Integration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Managing complex nutritional data with multiple related tables&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Implemented a normalized database schema with proper relationships and efficient querying&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Context Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Providing relevant, personalized responses without overwhelming the AI model&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Developed a targeted document retrieval system that extracts only the most relevant information from the database&lt;/p&gt;

&lt;h3&gt;
  
  
  Real-time Data Processing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Handling concurrent user requests and maintaining data consistency&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Implemented proper database transactions and connection pooling&lt;/p&gt;

&lt;h3&gt;
  
  
  Frontend-Backend Communication
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Challenge&lt;/strong&gt;: Ensuring seamless data flow between Vue.js frontend and Flask backend&lt;br&gt;
&lt;strong&gt;Solution&lt;/strong&gt;: Designed RESTful APIs with proper error handling and data validation&lt;/p&gt;




&lt;h2&gt;
  
  
  User Experience Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Modern Interface
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Responsive design that works on desktop and mobile devices&lt;/li&gt;
&lt;li&gt;Light/dark theme support for user preference&lt;/li&gt;
&lt;li&gt;Intuitive navigation with clear visual hierarchy&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Personalization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;User-specific recommendations based on profile data&lt;/li&gt;
&lt;li&gt;Adaptive interface that learns from user behavior&lt;/li&gt;
&lt;li&gt;Customizable dashboard with preferred metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Visualization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Progress charts for weight, exercise, and nutrition goals&lt;/li&gt;
&lt;li&gt;Nutritional breakdown with visual representations&lt;/li&gt;
&lt;li&gt;Historical data analysis with trend identification&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  API Architecture
&lt;/h2&gt;

&lt;p&gt;The application provides comprehensive REST APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication endpoints for user management&lt;/li&gt;
&lt;li&gt;CRUD operations for all health tracking features&lt;/li&gt;
&lt;li&gt;AI chatbot integration with conversation management&lt;/li&gt;
&lt;li&gt;Data export and import capabilities&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Security Implementation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;JWT-based authentication with secure token management&lt;/li&gt;
&lt;li&gt;Password hashing using bcrypt&lt;/li&gt;
&lt;li&gt;Environment variable configuration for sensitive data&lt;/li&gt;
&lt;li&gt;Input validation and sanitization&lt;/li&gt;
&lt;li&gt;CORS configuration for secure cross-origin requests&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Performance Optimization
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Database indexing for fast query execution&lt;/li&gt;
&lt;li&gt;Efficient document retrieval for AI context&lt;/li&gt;
&lt;li&gt;Frontend caching strategies&lt;/li&gt;
&lt;li&gt;Optimized API response times&lt;/li&gt;
&lt;li&gt;Minimal dependency footprint&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Fitera represents a modern approach to health and fitness tracking, combining traditional data management with cutting-edge AI technology. The application successfully bridges the gap between comprehensive health monitoring and personalized guidance, providing users with both the tools to track their progress and the intelligence to make informed decisions about their health.&lt;/p&gt;




&lt;h2&gt;
  
  
  Future Development
&lt;/h2&gt;

&lt;p&gt;While the current version provides a solid foundation for nutrition and fitness tracking, potential future enhancements could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integration with wearable devices for automatic data collection&lt;/li&gt;
&lt;li&gt;Advanced machine learning for predictive health insights&lt;/li&gt;
&lt;li&gt;Social features for community support and motivation&lt;/li&gt;
&lt;li&gt;Mobile application development for enhanced accessibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project serves as an excellent example of how modern web technologies can be combined with AI to create meaningful, user-centric health applications.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Fitera was developed as a comprehensive health tracking solution, showcasing the potential of AI-enhanced personal health management tools.&lt;/em&gt; &lt;/p&gt;

</description>
      <category>python</category>
      <category>vue</category>
      <category>langchain</category>
      <category>rag</category>
    </item>
    <item>
      <title>Tagwise: The Story Behind an AI-Powered Bookmark Categorization Project</title>
      <dc:creator>ebrargunay</dc:creator>
      <pubDate>Sat, 31 May 2025 12:00:35 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/tagwise-the-story-behind-an-ai-powered-bookmark-categorization-project-4jdp</link>
      <guid>https://dev.to/mantis-stajyer/tagwise-the-story-behind-an-ai-powered-bookmark-categorization-project-4jdp</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;INTRODUCTION&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Tagwise is a project that aims to solve a common problem faced by many internet users: organizing bookmarks. Today, users save hundreds of links, but managing and categorizing them becomes a time-consuming and messy process. Tagwise was created to automate this task and make users’ lives easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Starting the Project: The Naming Process&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The first step of the project was to find a suitable name. We wanted a name that clearly reflected the function of the site and communicated its purpose to users. Since the project is based on tagging and categorizing bookmarks, the word “&lt;strong&gt;tag&lt;/strong&gt;” stood out. In addition, because the system offers smart suggestions, we added the word “&lt;strong&gt;wise&lt;/strong&gt;.” Combining these two words, the name “&lt;strong&gt;Tagwise&lt;/strong&gt;” was born — a name that both describes the function and suggests intelligence.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Logo Design: Colors and Symbols&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After deciding on the name, we moved on to designing the visual identity of the project. For the logo, we decided to use shapes and symbols that reflect artificial intelligence. This was important to emphasize the tech-savvy and smart structure of the system.&lt;/p&gt;

&lt;p&gt;When choosing colors, we went with &lt;strong&gt;blue&lt;/strong&gt; and &lt;strong&gt;yellow&lt;/strong&gt;. Blue represents trust and professionalism, while yellow symbolizes energy and creativity. This combination aligned well with our goal of offering a user-friendly and calming interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Technical Foundation and Technologies Used&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The technical foundation of Tagwise is built on modern web technologies and AI systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We used &lt;strong&gt;Django&lt;/strong&gt; and &lt;strong&gt;Django REST Framework&lt;/strong&gt; for backend development.&lt;/li&gt;
&lt;li&gt;The database was built with &lt;strong&gt;PostgreSQL&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For handling HTTP requests and HTML parsing, we used &lt;strong&gt;httpx&lt;/strong&gt; and &lt;strong&gt;BeautifulSoup4&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selenium&lt;/strong&gt; and &lt;strong&gt;webdriver_manager&lt;/strong&gt; were added for web automation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the AI side, we integrated &lt;strong&gt;OpenAI GPT-4o&lt;/strong&gt; and &lt;strong&gt;Google Gemini API&lt;/strong&gt;.&lt;br&gt;
For YouTube links, we used &lt;strong&gt;yt-dlp&lt;/strong&gt; and &lt;strong&gt;youtube-transcript-api&lt;/strong&gt; to extract titles, descriptions, and transcripts when available.&lt;/p&gt;

&lt;p&gt;To enable users to search their bookmarks using natural language, we implemented a chatbot using &lt;strong&gt;LangChain&lt;/strong&gt; and &lt;strong&gt;FAISS&lt;/strong&gt;, allowing semantic search over the stored content.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How the System Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. URL Processing:&lt;/strong&gt;&lt;br&gt;
When a user submits a link, it is fetched with httpx and parsed with BeautifulSoup4 to extract the title, description, and main content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. YouTube Links:&lt;/strong&gt;&lt;br&gt;
For YouTube URLs, video titles and descriptions are retrieved via yt-dlp. If transcripts are available, they’re also extracted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Fallback with Screenshots:&lt;/strong&gt;&lt;br&gt;
In cases where HTML content cannot be fetched, Selenium is used to capture a screenshot of the page. This image is then analyzed by the AI model for categorization. Screenshots also serve as thumbnails when one is not provided by the source site.&lt;/p&gt;
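
&lt;p&gt;As a small example of the YouTube handling in step 2, a helper like this can pull the video ID out of the URL before it is handed to yt-dlp. This is a hedged sketch covering the two common URL shapes, not the project's actual code:&lt;/p&gt;

```python
from urllib.parse import urlparse, parse_qs

def youtube_video_id(url):
    """Extract a YouTube video id from a bookmark URL, if any.

    Illustrative helper: handles youtu.be short links and the
    standard watch?v= form, returning None for non-YouTube URLs.
    """
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        # Short links carry the id as the path: youtu.be/VIDEO_ID
        return parsed.path.lstrip("/")
    if parsed.hostname in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        # Standard links carry the id in the query string: watch?v=VIDEO_ID
        return parse_qs(parsed.query).get("v", [None])[0]
    return None
```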

&lt;h3&gt;
  
  
  &lt;strong&gt;Categorization Approach&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Categorization is handled entirely through large language models (LLMs). The extracted content is sent to GPT-4o or Gemini API for category prediction.&lt;br&gt;
Note: The system does &lt;strong&gt;not&lt;/strong&gt; use vector stores, RAG, or embedding techniques for categorization.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Chatbot and Vector Store Usage&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The chatbot allows users to query their bookmark archives in natural language. It works by embedding the content and storing it in a &lt;strong&gt;FAISS&lt;/strong&gt; vector store via &lt;strong&gt;LangChain&lt;/strong&gt;.&lt;br&gt;
When users type a query, the system uses &lt;strong&gt;retrieval-augmented generation (RAG)&lt;/strong&gt; to fetch relevant bookmarks and present them as answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Importantly&lt;/strong&gt;, this embedding and vector store functionality is used &lt;strong&gt;only for the chatbot&lt;/strong&gt;, not for categorization.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Challenges and Solutions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;- HTML Content Access:&lt;/strong&gt;&lt;br&gt;
When content could not be retrieved via standard HTTP requests, Selenium was used to capture screenshots for AI-based analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Missing Transcripts on YouTube:&lt;/strong&gt;&lt;br&gt;
When YouTube transcripts were unavailable, categorization relied only on video titles and descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Missing Thumbnails:&lt;/strong&gt;&lt;br&gt;
If a link didn’t provide a thumbnail, a screenshot of the page was used instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Tagwise offers a smart, user-friendly solution to organize and categorize bookmarks automatically.&lt;br&gt;
It was developed as part of an internship program and, while there are currently no plans to extend the project further, the experience and system created during this process lay a strong foundation for future applications.&lt;/p&gt;

</description>
      <category>uidesign</category>
      <category>bookmarkmanager</category>
      <category>productdevelopement</category>
      <category>uxdesign</category>
    </item>
    <item>
      <title>Tagwise: Technical Review of AI-Powered Bookmark Categorization Project</title>
      <dc:creator>Onur Ceyhan</dc:creator>
      <pubDate>Thu, 29 May 2025 22:46:24 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/tagwise-technical-review-of-ai-powered-bookmark-categorization-project-8k</link>
      <guid>https://dev.to/mantis-stajyer/tagwise-technical-review-of-ai-powered-bookmark-categorization-project-8k</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Tagwise is a straightforward and effective AI-powered web application developed as an internship project to automatically categorize bookmarked links.&lt;/p&gt;

&lt;p&gt;You can check out the project at &lt;a href="https://github.com/Mantis-Software-Company-Interns/tagwise" rel="noopener noreferrer"&gt;the Mantis Interns' GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This article clearly discusses the project's technical infrastructure, methodologies, and developed solutions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahjygsgy3gwdtbfys79i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahjygsgy3gwdtbfys79i.png" alt="Tagwise Overview" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Objective
&lt;/h2&gt;

&lt;p&gt;Modern internet users frequently bookmark hundreds of links, but manually organizing these links is often time-consuming. Tagwise aims to automate this task, quickly and accurately categorizing bookmarks from a single URL input.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend Framework:&lt;/strong&gt; Django, Django REST Framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; PostgreSQL (psycopg2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Requests:&lt;/strong&gt; httpx&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML Parsing:&lt;/strong&gt; BeautifulSoup4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Automation:&lt;/strong&gt; Selenium, webdriver_manager&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Artificial Intelligence:&lt;/strong&gt; OpenAI GPT-4o, Google Gemini API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Integration:&lt;/strong&gt; yt-dlp, youtube-transcript-api&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Store &amp;amp; Chatbot:&lt;/strong&gt; LangChain, FAISS (used only for chatbot functionality)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  System Workflow and Process
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;URL Processing&lt;/strong&gt;&lt;br&gt;
Users enter only the URL. The content from the URL is retrieved in HTML format using httpx. HTML content is parsed into the title, description, and main content using BeautifulSoup4.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Special Process for YouTube Links&lt;/strong&gt;&lt;br&gt;
For YouTube links, video titles and descriptions are fetched using yt-dlp. If available, transcripts (subtitles) are retrieved using youtube-transcript-api. The gathered content is then sent to the AI for categorization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alternative Content Capture (Selenium)&lt;/strong&gt;&lt;br&gt;
For sites where HTML content cannot be fetched or parsed, a screenshot of the page is captured using Selenium. This screenshot is sent as visual data to the AI model for category determination. Additionally, if the site lacks a thumbnail (og:image), the Selenium screenshot is automatically used as a thumbnail.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
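The URL-processing step above can be sketched as follows. This is a minimal stand-in using only the Python standard library (the project itself uses httpx and BeautifulSoup4); `MetadataParser` and `extract_metadata` are hypothetical helpers, not names from the codebase:

```python
from html.parser import HTMLParser

# Stand-in for the project's httpx + BeautifulSoup4 step: parse an
# already-fetched HTML document into its title and meta description.
class MetadataParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_metadata(html: str) -> dict:
    parser = MetadataParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "description": parser.description}

page = '<html><head><title>Sample</title><meta name="description" content="A demo page"></head></html>'
print(extract_metadata(page))
```

In the real pipeline the HTML string would come from an httpx request rather than a literal.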

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13mnadqiiod3pcddu0k4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13mnadqiiod3pcddu0k4.png" alt="System Workflow" width="563" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Category Assignment Approach
&lt;/h2&gt;

&lt;p&gt;The categorization is performed entirely by large language models (LLMs). Using prompt engineering, the extracted content is sent directly to OpenAI GPT-4o or the Google Gemini API, which determines the category automatically. Vector stores, RAG, and embeddings are not used in the categorization step.&lt;/p&gt;
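The prompt-driven approach can be illustrated with a sketch. The category list, prompt wording, and helper names here are assumptions for illustration, not the project's actual prompt; in the real system the prompt would be sent to GPT-4o or Gemini:

```python
# Illustrative prompt construction and reply parsing for LLM-based
# categorization. Categories and wording are assumptions for the sketch.
CATEGORIES = ["Technology", "Science", "Entertainment", "Education", "Other"]

def build_prompt(title: str, description: str, content: str) -> str:
    return (
        "Classify the following bookmarked page into exactly one category.\n"
        f"Allowed categories: {', '.join(CATEGORIES)}\n\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
        f"Content excerpt: {content[:500]}\n\n"
        "Answer with the category name only."
    )

def parse_category(reply: str) -> str:
    # Tolerate extra whitespace or casing in the model's reply.
    cleaned = reply.strip().title()
    return cleaned if cleaned in CATEGORIES else "Other"

prompt = build_prompt("Intro to Django", "A web framework tutorial", "Django is ...")
print(parse_category(" technology \n"))
```

Constraining the model to a fixed list and normalizing the reply keeps the stored categories consistent even when the model answers loosely.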

&lt;h2&gt;
  
  
  Chatbot Feature and Vector Store Usage
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcnqi9jf2780fqrkf374.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcnqi9jf2780fqrkf374.png" alt="Chatbot Functionality" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project also includes a chatbot feature allowing users to query their bookmark archives in natural language. This chatbot operates by converting bookmark content into embeddings via LangChain, which are then stored in a FAISS vector store. When a user query is received, relevant content is retrieved using the Retrieval-Augmented Generation (RAG) methodology, and presented to the user. These vector store and embedding operations are exclusively for chatbot functionality and are not involved in the categorization process.&lt;/p&gt;
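The retrieval half of this RAG flow can be sketched without the real stack. The project uses LangChain embeddings stored in FAISS; below, a toy bag-of-words "embedding" and a linear scan stand in for both, purely to show nearest-neighbour retrieval over bookmark content:

```python
import math

# Toy stand-in for the RAG retrieval step (LangChain + FAISS in the
# real project): embed texts as word-count vectors, rank by cosine.
def embed(text: str, vocab: list) -> list:
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list, vocab: list, k: int = 1) -> list:
    q = embed(query, vocab)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d, vocab)), reverse=True)
    return ranked[:k]

vocab = ["django", "search", "video", "bookmark"]
docs = ["django rest tutorial bookmark", "video about search engines"]
print(retrieve("how does search work", docs, vocab))
```

The retrieved snippets would then be placed into the chatbot's prompt so the LLM answers from the user's own bookmarks.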

&lt;h2&gt;
  
  
  Challenges Encountered and Solutions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fetching HTML Content:&lt;/strong&gt; Selenium screenshot solutions were employed for content that could not be directly fetched with httpx and BeautifulSoup4.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Transcript Absence:&lt;/strong&gt; Categorization was conducted solely based on video titles and descriptions when transcripts were unavailable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thumbnail Absence:&lt;/strong&gt; Selenium screenshots were utilized as thumbnails when og:image or similar visuals were missing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Tagwise offers a simple yet efficient solution for categorizing bookmarks automatically. The project was developed as part of an internship, and no further development is planned for the time being.&lt;/p&gt;

&lt;p&gt;Feel free to reach out with your questions and comments!&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>langchain</category>
      <category>ai</category>
    </item>
    <item>
<title>How does Full-Text Search work?</title>
      <dc:creator>Elif Albakır</dc:creator>
      <pubDate>Thu, 30 Jan 2025 12:27:05 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/tam-metin-aramasi-full-text-search-nasil-calisir-3k1c</link>
      <guid>https://dev.to/mantis-stajyer/tam-metin-aramasi-full-text-search-nasil-calisir-3k1c</guid>
      <description>&lt;p&gt;Full-Text Search dokümanlar içinde serbest metin üzerinden arama yapılmasına olanak sağlayan, Web arama motorlarında ve web sayfalarında en çok kullanılan arama metodlarından biridir. &lt;/p&gt;

&lt;p&gt;Full-text search lets you quickly and accurately find, among large blocks of data, the documents that match a keyword searched across text documents gathered from any source. &lt;/p&gt;

&lt;p&gt;Full-text search works as follows:&lt;br&gt;
First, an inverted index is built so the data can be searched quickly. Using this index, the TF value (term frequency) and IDF value (inverse document frequency) are computed; these two values are multiplied to build a vector for each document, and the angle between each document vector and the query vector is measured (cosine similarity). The smaller the angle between the query vector and a document vector, the more relevant that document is. &lt;/p&gt;

&lt;p&gt;Beyond full-text search, the TF-IDF value is used in a variety of settings, including document classification, topic modeling, and stop-word filtering.&lt;/p&gt;

&lt;p&gt;An inverted index is a structure that records, for every word appearing in your documents, which documents contain that word. Instead of scanning each document line by line, the words are split out and indexed column-wise, which makes searching efficient.  &lt;/p&gt;
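The inverted index described above can be built in a few lines; the document IDs and contents below are illustrative:

```python
from collections import defaultdict

# Build an inverted index: for every word, record the IDs of the
# documents that contain it. Document contents are illustrative.
docs = {
    1: "mantis yazılım gelistirme",
    2: "tam metin arama",
    3: "mantis arama motoru",
}

inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted_index[word].add(doc_id)

print(sorted(inverted_index["mantis"]))  # documents containing "mantis"
print(sorted(inverted_index["arama"]))   # documents containing "arama"
```

A query term is then answered by a single dictionary lookup instead of a scan over every document.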

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstr90d1l9sysayyk2p0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstr90d1l9sysayyk2p0w.png" alt="The inverted index" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TF stands for term frequency: how often a word occurs within a document.&lt;/p&gt;

&lt;p&gt;IDF stands for inverse document frequency. It is computed by taking the logarithm of the total number of documents divided by the number of documents in the collection that contain the word in question. For example, if there are 100 documents and the searched word appears in only 10 of them, the IDF value is 1. Because the gap between the total document count and the count of documents containing the term can be very large, the logarithm is taken to normalize the value.&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;idf=log⁡Nnk
 idf = \log{\frac{N}{n_{k}}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;i&lt;/span&gt;&lt;span class="mord mathnormal"&gt;df&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;n&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;N&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose 
nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;N = total number of documents&lt;br&gt;
n_k = number of documents containing the term k&lt;/p&gt;

&lt;p&gt;The TF and IDF values are then multiplied to measure how important the word is within the document. As a result, common words and suffixes receive low weights, while specific words receive higher importance. &lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;wik=tfik⋅idfk
  w_{ik} = tf_{ik}  \cdot idf_{k}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;t&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;i&lt;/span&gt;&lt;span class="mord mathnormal"&gt;d&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord 
mathnormal"&gt;f&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;tf_ik = frequency of the term k in document i&lt;/p&gt;
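The TF-IDF weight can be computed directly from these definitions. The sketch below assumes a base-10 logarithm, consistent with the article's example (100 documents, term in 10 of them, giving IDF = 1):

```python
import math

# Compute TF-IDF weights: w_ik = tf_ik * idf_k, with
# idf_k = log10(N / n_k), matching the article's worked example.
def idf(total_docs: int, docs_with_term: int) -> float:
    return math.log10(total_docs / docs_with_term)

def tfidf(tf: int, total_docs: int, docs_with_term: int) -> float:
    return tf * idf(total_docs, docs_with_term)

print(idf(100, 10))       # the article's example: N=100, n_k=10
print(tfidf(3, 100, 10))  # a term occurring 3 times in the document
```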

&lt;p&gt;For document vectors to be compared reliably, the term weights must be normalized. &lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;N(wik)=tfik⋅log⁡Nnk∑k=0M−1(tfik)2⋅[log⁡(Nnk)]2
N(w_{ik}) = \frac{tf_{ik}  \cdot \log{\frac{N}{n_{k}}}}{\sqrt{\sum_{k=0}^{M-1} (tf_{ik})^{2} \cdot \left[ \log\left( \frac{N}{n_{k}} \right)  \right] ^{2} }}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;N&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop"&gt;&lt;span class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;span class="mrel 
mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;M&lt;/span&gt;&lt;span class="mbin mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;t&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span 
class="minner"&gt;&lt;span class="mopen delimcenter"&gt;&lt;span class="delimsizing size2"&gt;[&lt;/span&gt;&lt;/span&gt;&lt;span class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;&lt;span class="delimsizing size2"&gt;(&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;N&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mclose delimcenter"&gt;&lt;span class="delimsizing size2"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;&lt;span class="delimsizing size2"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;t&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;ik&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span 
class="mop"&gt;lo&lt;span&gt;g&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;N&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;M = total number of terms&lt;/p&gt;

&lt;p&gt;Document vectors are built from the TF-IDF values computed for each word. For example, a vector built from the words Mantis and Yazılım would look like this: &lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;D1→=[wmantisd1,wyazılımd1]
\overrightarrow{D_{1}} = \left[ w_{mantis_{d1}}, w_{yazılım_{d1}} \right]
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;D&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="minner"&gt;&lt;span class="mopen delimcenter"&gt;[&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;man&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;t&lt;/span&gt;&lt;span class="mord 
mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;s&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;d&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;y&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;a&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;z&lt;/span&gt;&lt;span class="mord latin_fallback mtight"&gt;ı&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;l&lt;/span&gt;&lt;span class="mord latin_fallback mtight"&gt;ı&lt;/span&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;m&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span 
class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size3 size1 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;d&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose delimcenter"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;To find the relevant documents, we need to measure how close the query vector is to each document vector. Cosine Similarity does this by computing the angle between two vectors: the smaller the angle, the more related the vectors are. The processed query is thus matched against the documents in the inverted index, ranked by relevance, and returned to the user.&lt;/p&gt;

&lt;p&gt;For two vectors A and B, cosine similarity is computed with the following formula:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;cos⁡(θ)=∑i=1nAiBi∑i=1nAi2⋅∑i=1nBi2
\cos(\theta) = \frac{\sum\limits_{i=1}^{n} A_i B_i}{\sqrt{\sum\limits_{i=1}^{n} A_i^2} \cdot \sqrt{\sum\limits_{i=1}^{n} B_i^2}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mop"&gt;cos&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;θ&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;A&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;⋅&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord sqrt"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span class="svg-align"&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span 
class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;B&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="hide-tail"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop op-limits"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span 
class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;A&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;B&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
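&lt;p&gt;To make the formula concrete, here is a minimal pure-Python sketch (the example vectors are made up for illustration):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # Numerator: dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: product of the vector magnitudes.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.0, 1.0, 1.0]
doc = [0.0, 2.0, 2.0]    # same direction as the query -> similarity 1.0
other = [1.0, 0.0, 0.0]  # orthogonal to the query -> similarity 0.0

print(cosine_similarity(query, doc))    # 1.0 (up to rounding)
print(cosine_similarity(query, other))  # 0.0
```

&lt;p&gt;Because vectors pointing in the same direction score 1.0 regardless of magnitude, cosine similarity compares documents of different lengths fairly.&lt;/p&gt;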


&lt;p&gt;BM25 is an improved version of TF-IDF and is used to compute document ranking scores. It returns the documents that best match the words you search for, paying attention to term frequency, term position, and length: it checks how often a search term occurs in a document and where it occurs (for example, in the title or in the body), while the length and specificity of the search terms help make the result more accurate. This algorithm powers the search engines behind many of the websites we use (Solr, Elasticsearch, OpenSearch, ...), and you may also encounter it in email filtering, product recommendation systems, and chatbots.&lt;/p&gt;
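&lt;p&gt;A simplified sketch of the BM25 score for a single term may help; &lt;code&gt;k1&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; are the usual tuning constants, and the tiny corpus below is made up for the example (real implementations such as Lucene's add further refinements):&lt;/p&gt;

```python
import math

def bm25_score(term, doc, corpus, k1=1.5, b=0.75):
    """Simplified BM25 score of one term for one document (a list of tokens)."""
    n_docs = len(corpus)
    df = sum(1 for d in corpus if term in d)             # document frequency
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    tf = doc.count(term)                                 # term frequency
    avg_len = sum(len(d) for d in corpus) / n_docs       # average doc length
    # Term-frequency saturation, normalized by document length.
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
    return idf * norm

corpus = [
    ["opensearch", "lexical", "search"],
    ["semantic", "search", "with", "vectors"],
    ["cooking", "recipes"],
]
scores = [bm25_score("search", d, corpus) for d in corpus]
# The two documents containing "search" outscore the one that does not.
```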

&lt;h3&gt;
  
  
CONCLUSION
&lt;/h3&gt;

&lt;p&gt;The goal of this article is to explain how Full-Text Search lets the search engines we use every day, such as Google and Yandex, serve the documents you are looking for. Full-Text Search computes TF and IDF to resolve the connections between words and documents, and through them reaches the documents related to the queried terms. TF-IDF essentially measures the relevance of the words you enter in order to surface the best document for you. You have probably noticed that the top results in a search engine are usually the most relevant to your topic; Full-Text Search is the reason. If you understand how Full-Text Search works, you can write more effective and more useful queries.&lt;/p&gt;

&lt;h3&gt;
  
  
REFERENCES
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.turhost.com/blog/tf-idf-nedir/" rel="noopener noreferrer"&gt;https://www.turhost.com/blog/tf-idf-nedir/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://medium.com/algorithms-data-structures/tf-idf-term-frequency-inverse-document-frequency-53feb22a17c6" rel="noopener noreferrer"&gt;https://medium.com/algorithms-data-structures/tf-idf-term-frequency-inverse-document-frequency-53feb22a17c6&lt;/a&gt;&lt;br&gt;
&lt;a href="https://medium.com/@kamillgun/full-text-search-e22a1251539" rel="noopener noreferrer"&gt;https://medium.com/@kamillgun/full-text-search-e22a1251539&lt;/a&gt;&lt;br&gt;
&lt;a href="https://erolakgul.net/2015/09/13/full-text-search-mimarisi/" rel="noopener noreferrer"&gt;https://erolakgul.net/2015/09/13/full-text-search-mimarisi/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://barisakdas.medium.com/bm25-best-match-re%C5%9Fevancy-algoritmas%C4%B1-nedir-a72f4103031c" rel="noopener noreferrer"&gt;https://barisakdas.medium.com/bm25-best-match-re%C5%9Fevancy-algoritmas%C4%B1-nedir-a72f4103031c&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What Is an LLM, and How Do Transformers Work?</title>
      <dc:creator>Elif Albakır</dc:creator>
      <pubDate>Fri, 24 Jan 2025 08:45:07 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/llm-nedir-transformers-nasil-calisir-4ne3</link>
      <guid>https://dev.to/mantis-stajyer/llm-nedir-transformers-nasil-calisir-4ne3</guid>
      <description>&lt;p&gt;Genel anlamda aslında LLM (Large Language Models) yani bir diğer adıyla Büyük Dil Modelleri hepimizin günlük yaşantısında bir şekilde kullandığı bir yapay zekâ modelidir. Buna örnek göstermek istersek en popüler örneklerinden birisi ChatGPT’dir. LLM’i farklı yapan şey nedir dersek ortaya kesinlikle insan zekasına yakın bir yaratıcılık ve düşünme yetisi diyebiliriz. LLM daha insanvari bir yapıya sahip olarak, insan duygularını analiz edebilme, kelime tahminlerinde bulunabilme ve kelimeler arasında bağlantı kurabilme gibi özelliklere sahiptir. LLM bunu verilen cümlelere göre sonraki kelimeleri tahmin ederek yapıyor. Bunu basitçe cep telefonunuzdaki kelime tamamlama özelliği gibi düşünebiliriz. Elbette LLM bunu kendiliğinden yapmıyor. Büyük miktarda metin verisiyle eğitmek ve üzerinde ince ayarlar yapmak gerekiyor. Yani LLM'ler büyük veri setleri ile eğitilmiş, kelimeler arasındaki bağlamsal ve dilbilgisel bağlantıları anlayarak uygun kelimeleri seçen bir derin öğrenme modelidir.&lt;/p&gt;

&lt;p&gt;It does this through self-supervised learning techniques, and this is where the Transformer architecture comes in: it is what gives these artificial neural networks capabilities such as remembering context and making suggestions. Let's walk step by step through how the Transformer architecture turns inputs into outputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tldraw.com/ro/HSpqDRQVvmDtVNtKNgenB?d=v-2220.-2059.7945.3719.page" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgc3bcwen8h7a7zvwqd0w.png" alt="Transformers mimarisi" width="800" height="1311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
1. Tokenization
&lt;/h3&gt;

&lt;p&gt;Queries given to the model are split into units the model can handle, called tokens. These units can be words, affixes, or special characters. But the model understands numbers rather than words, so the tokens are mapped to the token IDs inside the model in order to be converted into embeddings, the language the model actually works with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy7gdygwvaxnj94i1vaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy7gdygwvaxnj94i1vaq.png" alt="Tokenizasyon işlemi" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the model to start working, it needs to know that the entered query has ended. Special boundary tokens (&lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;) are used for this; put simply, they act like the period at the end of a sentence. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4upn9ttjeen2ff38buj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4upn9ttjeen2ff38buj.png" alt="Tokenizasyon işlemi ve token idler" width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
2. Embedding
&lt;/h3&gt;

&lt;p&gt;Before being processed by the model, these tokens are converted into high-dimensional numeric vectors (tensors) and represented in a vector space. This is called embedding. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhoq3oxkt9y7rzzf3ooo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhoq3oxkt9y7rzzf3ooo.png" alt="Embeding işlemi" width="462" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
3. Positional Encoding
&lt;/h3&gt;

&lt;p&gt;With positional encoding, the order of the tokens is added to the embeddings, which are then sent to the encoder layer. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F788ktn0umvrzkr1rbvnw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F788ktn0umvrzkr1rbvnw.png" alt="Pozisyonel kodlama işlemi ve encoder katmanı" width="445" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
4. Encoder
&lt;/h3&gt;

&lt;p&gt;The encoder layer takes variable-length input and turns it into a fixed-size representation; here the embeddings also pass through self-attention and feed-forward operations. Self-attention is the operation that lets the model relate tokens to one another and propose words: it builds the context and semantic coherence between tokens. These operations run in parallel.  &lt;/p&gt;
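&lt;p&gt;The core of self-attention is scaled dot-product attention: each token's query is compared with every token's key, the scores are normalized with softmax, and the value vectors are mixed accordingly. Here is a minimal single-head sketch (the toy vectors are made up, and real models also apply learned Q/K/V projection matrices):&lt;/p&gt;

```python
import math

def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        # Softmax over the scores -> attention weights that sum to 1.
        exps = [math.exp(s) for s in scores]
        weights = [e / sum(exps) for e in exps]
        # Each output vector is a weighted mix of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

x = [[1.0, 0.0], [0.0, 1.0]]  # two toy token vectors; Q = K = V for simplicity
mixed = attention(x, x, x)
# Each token attends most strongly to itself here, but still mixes in the other.
```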

&lt;h3&gt;
  
  
5. Decoder
&lt;/h3&gt;

&lt;p&gt;The input representation produced in the encoder layer goes to the decoder layer. When this layer sees the boundary token (&lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;), candidate continuations for the next word are computed. These candidates are converted into embeddings and sent to the linear layer so that their probabilities can be calculated; the probabilities produced in the linear layer are normalized in the softmax layer, and the first candidate word is fed back into the decoder. Using this candidate together with the input representation, the following words are predicted. When the model finishes its answer, it emits the boundary token (&lt;code&gt;&amp;lt;|endoftext|&amp;gt;&lt;/code&gt;) to signal that the output is complete.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi19bjkkw3e4ij1oe8ico.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi19bjkkw3e4ij1oe8ico.png" alt="Decoder katmanı" width="417" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
6. Linear Layer
&lt;/h3&gt;

&lt;p&gt;In the linear layer, word order is preserved and respected; this layer computes the distribution of scores over the words the model might predict next. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yhbsgup6urpbcdsvjmj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yhbsgup6urpbcdsvjmj.png" alt="Lineer katman" width="372" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
7. Softmax
&lt;/h3&gt;

&lt;p&gt;The word scores predicted in the linear layer are sent to the softmax layer. Whatever the input (positive, negative, etc.), the softmax function maps it to a value between 0 and 1; in other words, it normalizes the scores into a probability distribution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3zptum1eqzzp3f4s3ki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3zptum1eqzzp3f4s3ki.png" alt="Softmax katmanı" width="330" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
Conclusion
&lt;/h2&gt;

&lt;p&gt;This article was written to explain, at a basic level, what an LLM is and how the Transformer architecture works. Because LLMs are trained on large datasets and use the Transformer architecture, they can produce output appropriate to a query, predict the next word, and give creative answers by working with probabilities. These abilities have reached levels that are revolutionary for NLP, and the Transformer architecture is the building block underneath them. LLMs have enabled communication between humans and machines and increased that interaction, and thanks to the new dimension they have given natural language processing (NLP), the field continues to develop toward far more advanced levels.&lt;/p&gt;

&lt;h2&gt;
  
  
References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://rpradeepmenon.medium.com/introduction-to-large-language-models-and-the-transformer-architecture-534408ed7e61" rel="noopener noreferrer"&gt;https://rpradeepmenon.medium.com/introduction-to-large-language-models-and-the-transformer-architecture-534408ed7e61&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/llm/transformers?hl=tr" rel="noopener noreferrer"&gt;https://developers.google.com/machine-learning/crash-course/llm/transformers?hl=tr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/what-is/large-language-model/" rel="noopener noreferrer"&gt;https://aws.amazon.com/what-is/large-language-model/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bulutistan.com/blog/large-language-model-llm-nedir-uygulama-ornekleri/" rel="noopener noreferrer"&gt;https://bulutistan.com/blog/large-language-model-llm-nedir-uygulama-ornekleri/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/wjZofJX0v4M" rel="noopener noreferrer"&gt;https://youtu.be/wjZofJX0v4M&lt;/a&gt; &lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
