
ebrargunay for Mantis Stajyer Blogu

Tagwise: The Story Behind an AI-Powered Bookmark Categorization Project

Introduction

Tagwise tackles a problem many internet users share: organizing bookmarks. People save hundreds of links, but sorting and categorizing them by hand is slow and messy. Tagwise was created to automate that work and make users' lives easier.

Starting the Project: The Naming Process

The first step of the project was to find a suitable name. We wanted one that clearly reflected what the site does and communicated its purpose to users. Since the project is based on tagging and categorizing bookmarks, the word “tag” stood out. Because the system also offers smart suggestions, we added the word “wise.” We combined the two into “Tagwise,” a name that both describes the function and suggests intelligence.

Logo Design: Colors and Symbols

After deciding on the name, we moved on to the project's visual identity. For the logo, we chose shapes and symbols associated with artificial intelligence, to emphasize that the system is technology-driven and smart.

When choosing colors, we went with blue and yellow. Blue represents trust and professionalism, while yellow symbolizes energy and creativity. This combination aligned well with our goal of offering a user-friendly and calming interface.

Technical Foundation and Technologies Used

The technical foundation of Tagwise is built on modern web technologies and AI systems.

  • We used Django and Django REST Framework for backend development.
  • The database was built with PostgreSQL.
  • For handling HTTP requests and HTML parsing, we used httpx and BeautifulSoup4.
  • Selenium and webdriver_manager were added for web automation.

On the AI side, we integrated OpenAI GPT-4o and the Google Gemini API.
For YouTube links, we used yt-dlp and youtube-transcript-api to extract titles, descriptions, and transcripts when available.

To enable users to search their bookmarks using natural language, we implemented a chatbot using LangChain and FAISS, allowing semantic search over the stored content.

How the System Works

1. URL Processing:
When a user submits a link, it is fetched using httpx and parsed with BeautifulSoup4 to extract title, description, and main content.

2. YouTube Links:
For YouTube URLs, video titles and descriptions are retrieved via yt-dlp. If a transcript is available, it is also extracted with youtube-transcript-api.

3. Fallback with Screenshots:
When the HTML content cannot be fetched, Selenium is used to capture a screenshot of the page. This image is then analyzed by the AI model for categorization. The screenshot also serves as the bookmark's thumbnail when the source site does not provide one.
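
The three steps above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code: the names `is_youtube_url` and `process_bookmark`, the list of YouTube hostnames, the shape of the result dictionary, and the three injected callables are all assumptions made for illustration. In Tagwise, those callables would presumably wrap httpx with BeautifulSoup4, yt-dlp, and Selenium respectively.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# Hostnames treated as YouTube in this sketch (an assumption, not an exhaustive list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True when the URL's host is one of the known YouTube hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def process_bookmark(
    url: str,
    fetch_html: Callable[[str], Optional[dict]],
    fetch_youtube: Callable[[str], dict],
    take_screenshot: Callable[[str], bytes],
) -> dict:
    """Route a URL to the matching extractor, falling back to a screenshot.

    The three callables are hypothetical stand-ins for the real extractors;
    injecting them keeps the routing logic easy to test without network access.
    """
    # Step 2: YouTube links get their own metadata extractor.
    if is_youtube_url(url):
        return {"source": "youtube", **fetch_youtube(url)}

    # Step 1: try the regular HTML path first.
    try:
        page = fetch_html(url)
    except Exception:
        # Broad on purpose for the sketch; real code might catch httpx.HTTPError.
        page = None
    if page:
        return {"source": "html", **page}

    # Step 3: no usable HTML, so capture a screenshot for the AI model instead.
    return {"source": "screenshot", "image": take_screenshot(url)}
```

Passing the extractors in as arguments is a design choice of this sketch: it lets the fallback order be checked with simple fakes before any real fetching code exists.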

Categorization Approach

Categorization is handled entirely by large language models (LLMs). The extracted content is sent to GPT-4o or the Gemini API for category prediction.
Note: The system does not use vector stores, RAG, or embedding techniques for categorization.
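
To make the idea concrete, here is a minimal sketch of LLM-only categorization. The post does not describe the actual prompt, the category list, or the response format, so everything here is an assumption: the fixed list of allowed categories, the JSON reply shape, the `"Other"` default, and the `ask_llm` callable, which stands in for a real call to GPT-4o or the Gemini API.

```python
import json
from typing import Callable, Sequence


def build_prompt(title: str, description: str, content: str,
                 categories: Sequence[str], max_chars: int = 4000) -> str:
    """Assemble a plain-text prompt asking the model to pick one category."""
    return (
        "Pick the single best category for this bookmark.\n"
        f"Allowed categories: {', '.join(categories)}\n"
        'Reply with JSON only, like {"category": "<name>"}.\n\n'
        f"Title: {title}\n"
        f"Description: {description}\n"
        # Truncate long pages so the prompt stays within a size budget.
        f"Content: {content[:max_chars]}"
    )


def parse_category(reply: str, categories: Sequence[str],
                   default: str = "Other") -> str:
    """Read the category from the model's reply, falling back to a default."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return default
    if not isinstance(data, dict):
        return default
    name = str(data.get("category", "")).strip().lower()
    # Accept only categories from the allowed list (case-insensitive match).
    for category in categories:
        if category.lower() == name:
            return category
    return default


def categorize(title: str, description: str, content: str,
               categories: Sequence[str],
               ask_llm: Callable[[str], str]) -> str:
    """Send the prompt through the injected LLM call and validate the answer."""
    prompt = build_prompt(title, description, content, categories)
    return parse_category(ask_llm(prompt), categories)
```

Validating the reply against the allowed list guards against the model inventing a category or returning malformed output, which is a common concern when an LLM is the only classifier.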

Chatbot and Vector Store Usage

The chatbot allows users to query their bookmark archives in natural language. It works by embedding the bookmark content and storing the embeddings in a FAISS vector store via LangChain.
When a user types a query, the system uses retrieval-augmented generation (RAG): it retrieves the most relevant bookmarks from the vector store and uses them to build the answer.

Importantly, this embedding and vector store functionality is used only for the chatbot, not for categorization.
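
The retrieval half of this flow can be illustrated without any external services. The sketch below is deliberately not the LangChain and FAISS setup the project uses: it swaps in a toy bag-of-words "embedding" and a brute-force cosine-similarity search so the idea runs with the standard library alone. The class name `BookmarkIndex` and its methods are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class BookmarkIndex:
    """In-memory stand-in for a vector store such as FAISS."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, url: str, text: str) -> None:
        """Embed the bookmark text and keep it alongside its URL."""
        self._items.append((url, embed(text)))

    def search(self, query: str, k: int = 3) -> List[str]:
        """Return up to k URLs ranked by similarity, skipping zero-score items."""
        q = embed(query)
        scored = [(cosine(q, vec), url) for url, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [url for score, url in scored[:k] if score > 0]


index = BookmarkIndex()
index.add("https://a.example/django", "Building a REST API with Django REST Framework")
index.add("https://b.example/pasta", "Easy weeknight pasta recipes")
top_hits = index.search("django rest api tutorial", k=1)
```

In a RAG setup, the bookmarks returned by `search` would then be passed to the language model as context so it can phrase an answer around them.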

Challenges and Solutions

- HTML Content Access:
When content could not be retrieved via standard HTTP requests, Selenium was used to capture screenshots for AI-based analysis.

- Missing Transcripts on YouTube:
When YouTube transcripts were unavailable, categorization relied only on video titles and descriptions.

- Missing Thumbnails:
If a link didn’t provide a thumbnail, a screenshot of the page was used instead.
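
The last two fallbacks come down to small "use it if present, otherwise substitute" decisions. A minimal sketch, assuming hypothetical helper names (`build_video_text`, `choose_thumbnail`) and assuming thumbnails are referenced by a URL or file path string:

```python
from typing import Optional


def build_video_text(title: str, description: str,
                     transcript: Optional[str]) -> str:
    """Join the available YouTube fields into one text for categorization.

    When no transcript exists, only the title and description are used.
    """
    parts = [title, description]
    if transcript:
        parts.append(transcript)
    # Drop empty fields so the result has no stray blank sections.
    return "\n\n".join(p.strip() for p in parts if p and p.strip())


def choose_thumbnail(source_thumbnail: Optional[str],
                     screenshot_path: str) -> str:
    """Prefer the site's own thumbnail; otherwise use the page screenshot."""
    return source_thumbnail or screenshot_path
```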

Conclusion

Tagwise offers a smart, user-friendly solution to organize and categorize bookmarks automatically.
It was developed as part of an internship program and, while there are currently no plans to extend the project further, the experience and system created during this process lay a strong foundation for future applications.
