<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Onur Ceyhan</title>
    <description>The latest articles on DEV Community by Onur Ceyhan (@onur_ceyhan_2c76958adb396).</description>
    <link>https://dev.to/onur_ceyhan_2c76958adb396</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3199297%2Ff4d4b935-db1b-4b72-af7d-16252bad316b.png</url>
      <title>DEV Community: Onur Ceyhan</title>
      <link>https://dev.to/onur_ceyhan_2c76958adb396</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/onur_ceyhan_2c76958adb396"/>
    <language>en</language>
    <item>
      <title>Tagwise: Technical Review of AI-Powered Bookmark Categorization Project</title>
      <dc:creator>Onur Ceyhan</dc:creator>
      <pubDate>Thu, 29 May 2025 22:46:24 +0000</pubDate>
      <link>https://dev.to/mantis-stajyer/tagwise-technical-review-of-ai-powered-bookmark-categorization-project-8k</link>
      <guid>https://dev.to/mantis-stajyer/tagwise-technical-review-of-ai-powered-bookmark-categorization-project-8k</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Tagwise is a simple, effective AI-powered web application, built as an internship project, that automatically categorizes bookmarked links.&lt;/p&gt;

&lt;p&gt;You can check out the project on &lt;a href="https://github.com/Mantis-Software-Company-Interns/tagwise" rel="noopener noreferrer"&gt;the Mantis interns' GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This article walks through the project's technical infrastructure, methodology, and the solutions developed along the way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahjygsgy3gwdtbfys79i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahjygsgy3gwdtbfys79i.png" alt="Tagwise Overview" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Objective
&lt;/h2&gt;

&lt;p&gt;Internet users often bookmark hundreds of links, and organizing them by hand is time-consuming. Tagwise automates this task, categorizing each bookmark quickly from nothing more than its URL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend Framework:&lt;/strong&gt; Django, Django REST Framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; PostgreSQL (psycopg2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Requests:&lt;/strong&gt; httpx&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML Parsing:&lt;/strong&gt; BeautifulSoup4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Automation:&lt;/strong&gt; Selenium, webdriver_manager&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Artificial Intelligence:&lt;/strong&gt; OpenAI GPT-4o, Google Gemini API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Integration:&lt;/strong&gt; yt-dlp, youtube-transcript-api&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Store &amp;amp; Chatbot:&lt;/strong&gt; LangChain, FAISS (used only for chatbot functionality)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  System Workflow and Process
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;URL Processing&lt;/strong&gt;&lt;br&gt;
The user supplies only the URL. Tagwise fetches the page's HTML with httpx, then uses BeautifulSoup4 to extract the title, description, and main content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Special Process for YouTube Links&lt;/strong&gt;&lt;br&gt;
For YouTube links, video titles and descriptions are fetched using yt-dlp. If available, transcripts (subtitles) are retrieved using youtube-transcript-api. The gathered content is then sent to the AI for categorization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Alternative Content Capture (Selenium)&lt;/strong&gt;&lt;br&gt;
For sites whose HTML cannot be fetched or parsed, Selenium captures a screenshot of the page. The screenshot is sent to the AI model as an image, and the model determines the category from it. In addition, if the site has no thumbnail (og:image), the Selenium screenshot is used as the thumbnail.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
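&lt;p&gt;The metadata extraction in step 1 can be sketched as follows. This is a minimal illustration, not the project's code: it swaps BeautifulSoup4 for Python's standard-library &lt;code&gt;html.parser&lt;/code&gt; so it runs without dependencies, and the names &lt;code&gt;PageMetadataParser&lt;/code&gt; and &lt;code&gt;extract_metadata&lt;/code&gt; are hypothetical. Joining all visible text is only a rough proxy for "main content".&lt;/p&gt;

```python
from html.parser import HTMLParser


class PageMetadataParser(HTMLParser):
    """Collects title, meta description, og:image and visible text from raw HTML.

    Hypothetical stand-in for the BeautifulSoup4 step, using only the standard library.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.og_image = None
        self._in_title = False
        self._skip_depth = 0  # nesting depth inside script/style tags
        self._text_parts = []  # visible text, a rough proxy for "main content"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            # Standard meta tags use name=, Open Graph tags use property=.
            key = (attrs.get("name") or attrs.get("property") or "").lower()
            content = (attrs.get("content") or "").strip()
            if key == "description" and not self.description:
                self.description = content
            elif key == "og:image" and self.og_image is None:
                self.og_image = content or None

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self._text_parts.append(data.strip())


def extract_metadata(html_text):
    """Parse an HTML string into the fields that are sent to the LLM."""
    parser = PageMetadataParser()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "og_image": parser.og_image,
        "content": " ".join(parser._text_parts),
    }
```

&lt;p&gt;In the described pipeline, &lt;code&gt;extract_metadata&lt;/code&gt; would receive the response body fetched with httpx; an empty title and empty content could serve as a signal to fall back to the Selenium screenshot.&lt;/p&gt;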

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13mnadqiiod3pcddu0k4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F13mnadqiiod3pcddu0k4.png" alt="System Workflow" width="563" height="768"&gt;&lt;/a&gt;&lt;/p&gt;
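&lt;p&gt;The routing between these three paths, plus the thumbnail fallback, might look like the sketch below. The host list, the function names, and the order of checks are assumptions for illustration; the article does not specify how Tagwise detects YouTube links or decides that HTML parsing has failed.&lt;/p&gt;

```python
from urllib.parse import urlparse

# Hypothetical host list; the project's actual YouTube detection rule is not documented.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url):
    """True when the URL's host is a known YouTube host (hostname is lowercased by urlparse)."""
    return (urlparse(url).hostname or "") in YOUTUBE_HOSTS


def choose_pipeline(url, html_ok):
    """Pick the content source for a URL.

    html_ok: whether the httpx + BeautifulSoup4 step produced usable content (assumed flag).
    """
    if is_youtube_url(url):
        return "youtube"  # yt-dlp metadata plus optional transcript
    if html_ok:
        return "html"  # title, description and main content go to the LLM
    return "screenshot"  # Selenium screenshot sent to the model as an image


def choose_thumbnail(og_image, screenshot_path):
    """Prefer the page's og:image; fall back to the Selenium screenshot."""
    return og_image if og_image else screenshot_path
```

&lt;p&gt;Checking for YouTube first means video pages never go through generic HTML scraping, which matches the separate treatment described in step 2.&lt;/p&gt;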

&lt;h2&gt;
  
  
  Category Assignment Approach
&lt;/h2&gt;

&lt;p&gt;Categorization is performed entirely by large language models (LLMs). The extracted content is placed into an engineered prompt and sent directly to OpenAI GPT-4o or the Google Gemini API, and the model's reply determines the category. Vector stores, RAG, and embeddings are not used in category determination.&lt;/p&gt;
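&lt;p&gt;A prompt-based categorization call could be structured like this sketch. The prompt wording, the JSON reply format, the 4000-character limit, and the &lt;code&gt;call_llm&lt;/code&gt; parameter are all hypothetical: &lt;code&gt;call_llm&lt;/code&gt; stands in for whichever GPT-4o or Gemini client is used, so the example runs without API keys.&lt;/p&gt;

```python
import json

# Hypothetical prompt; Tagwise's real prompt text is not published in this article.
PROMPT_TEMPLATE = (
    "You categorize bookmarks. Reply with JSON only, in the form "
    '{{"category": "..."}}.\n\n'
    "Title: {title}\nDescription: {description}\nContent:\n{content}"
)


def build_prompt(title, description, content, max_chars=4000):
    # Trim long pages so the request stays within a fixed size budget (assumed limit).
    return PROMPT_TEMPLATE.format(
        title=title, description=description, content=content[:max_chars]
    )


def parse_category(reply_text):
    """Extract the category from the model's reply, tolerating prose around the JSON."""
    start, end = reply_text.find("{"), reply_text.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        data = json.loads(reply_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    category = data.get("category")
    return category.strip() if isinstance(category, str) else None


def categorize(title, description, content, call_llm):
    """call_llm: hypothetical function that sends a prompt string to an LLM and returns its text."""
    return parse_category(call_llm(build_prompt(title, description, content)))
```

&lt;p&gt;Passing the model call in as a function keeps the prompt logic independent of the provider, which suits a project that can use either OpenAI or Gemini.&lt;/p&gt;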

&lt;h2&gt;
  
  
  Chatbot Feature and Vector Store Usage
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcnqi9jf2780fqrkf374.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcnqi9jf2780fqrkf374.png" alt="Chatbot Functionality" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project also includes a chatbot that lets users query their bookmark archive in natural language. Bookmark content is converted into embeddings via LangChain and stored in a FAISS vector store. When a user asks a question, the most relevant bookmarks are retrieved from the store and used as context for the answer generated for the user; this is the Retrieval-Augmented Generation (RAG) approach. The vector store and embedding operations serve only the chatbot and play no part in categorization.&lt;/p&gt;
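&lt;p&gt;The retrieval step behind the chatbot can be illustrated with a dependency-free sketch. Here a bag-of-words &lt;code&gt;Counter&lt;/code&gt; and a linear cosine-similarity scan stand in for LangChain embeddings and the FAISS index; &lt;code&gt;TinyVectorStore&lt;/code&gt; and &lt;code&gt;build_rag_prompt&lt;/code&gt; are hypothetical names. Only the shape of the flow (embed, store, search, then prompt the LLM with the results) carries over to the real system.&lt;/p&gt;

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store; FAISS would replace the linear scan with an index."""

    def __init__(self):
        self._items = []  # (vector, bookmark text) pairs

    def add(self, text):
        self._items.append((embed(text), text))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_rag_prompt(question, store, k=2):
    # Retrieved bookmarks become the context the LLM answers from.
    context = "\n".join("- " + doc for doc in store.search(question, k))
    return f"Answer using only these bookmarks:\n{context}\n\nQuestion: {question}"
```

&lt;p&gt;FAISS replaces the linear scan with an index designed for fast nearest-neighbor search over dense vectors, which becomes important as the bookmark archive grows.&lt;/p&gt;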

&lt;h2&gt;
  
  
  Challenges Encountered and Solutions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fetching HTML Content:&lt;/strong&gt; When a page could not be fetched with httpx or parsed with BeautifulSoup4, a Selenium screenshot was used instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Transcript Absence:&lt;/strong&gt; Categorization was conducted solely based on video titles and descriptions when transcripts were unavailable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thumbnail Absence:&lt;/strong&gt; Selenium screenshots were used as thumbnails when og:image or a similar preview image was missing.&lt;/li&gt;
&lt;/ul&gt;
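&lt;p&gt;The transcript fallback for YouTube links can be sketched as below. It assumes the yt-dlp title and description and the transcript (already converted to a list of caption strings, or &lt;code&gt;None&lt;/code&gt; when unavailable) have been fetched beforehand; &lt;code&gt;build_video_content&lt;/code&gt; and the &lt;code&gt;max_chars&lt;/code&gt; limit are hypothetical.&lt;/p&gt;

```python
def build_video_content(title, description, transcript_lines=None, max_chars=6000):
    """Combine video metadata with an optional transcript into text for the LLM.

    transcript_lines: caption strings (assumed pre-extracted), or None if no transcript exists.
    """
    parts = ["Title: " + title.strip(), "Description: " + description.strip()]
    if transcript_lines:
        # Drop blank captions and cap the transcript length (assumed limit).
        transcript = " ".join(line.strip() for line in transcript_lines if line.strip())
        parts.append("Transcript: " + transcript[:max_chars])
    # With no transcript, categorization relies on title and description alone.
    return "\n".join(parts)
```

&lt;p&gt;Because the transcript is optional, the same function covers both cases listed above without a separate code path.&lt;/p&gt;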

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Tagwise offers a simple, efficient way to categorize bookmarks automatically. It was developed as part of an internship, and no further development is planned for the time being.&lt;/p&gt;

&lt;p&gt;Feel free to reach out with your questions and comments!&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>langchain</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
