<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Piotr</title>
    <description>The latest articles on DEV Community by Piotr (@rumca-js).</description>
    <link>https://dev.to/rumca-js</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2300993%2F4731864a-3e42-44b6-943c-ba2734af9ab5.png</url>
      <title>DEV Community: Piotr</title>
      <link>https://dev.to/rumca-js</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rumca-js"/>
    <language>en</language>
    <item>
      <title>Web Crawling and RSS Reading Made Easy</title>
      <dc:creator>Piotr</dc:creator>
      <pubDate>Fri, 31 Jan 2025 15:46:11 +0000</pubDate>
      <link>https://dev.to/rumca-js/web-crawling-and-rss-reading-made-easy-5340</link>
      <guid>https://dev.to/rumca-js/web-crawling-and-rss-reading-made-easy-5340</guid>
      <description>&lt;p&gt;Tired of building yet another RSS client or web crawler?&lt;/p&gt;

&lt;p&gt;Don't worry - Crawler Buddy is here to save the day! This project makes it easy to crawl web pages and return digestible responses in JSON format.&lt;/p&gt;

&lt;p&gt;Key Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No more reliance on external tools: Forget about yt-dlp or Beautiful Soup for link metadata extraction.&lt;/li&gt;
&lt;li&gt;Standardized metadata: Get consistent fields like title, description, date_published, and more.&lt;/li&gt;
&lt;li&gt;Bot protection? No problem: Access RSS feeds—even on sites with tricky bot protection—without custom HTTP wrappers.&lt;/li&gt;
&lt;li&gt;Automatic feed detection: It can automatically discover RSS feed URLs for websites and YouTube channels in many cases.&lt;/li&gt;
&lt;li&gt;Simplified data handling: Skip parsing RSS files. Just consume easy-to-use JSON.&lt;/li&gt;
&lt;li&gt;Unified interface: Access all metadata from a single, simple interface.&lt;/li&gt;
&lt;li&gt;Containerized Docker environment: Isolate problems from your host OS for seamless operation.&lt;/li&gt;
&lt;li&gt;Scalability: Whether you're running a single server or multiple, Crawler Buddy fits your needs.&lt;/li&gt;
&lt;li&gt;UTF-8 encoding: Say goodbye to encoding issues, because everything is returned as UTF-8.&lt;/li&gt;
&lt;/ul&gt;
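
&lt;p&gt;To give a feel for what automatic feed detection involves, here is a minimal sketch of one common approach: scanning a page for link elements with rel="alternate" and an RSS or Atom MIME type. It is not taken from Crawler Buddy's code; the names FeedLinkParser and discover_feeds are made up for this illustration, and only the Python standard library is used.&lt;/p&gt;

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that conventionally mark a feed in a page's head section.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of link tags that advertise an RSS or Atom feed.

    Hypothetical helper for illustration, not part of Crawler Buddy.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative feed paths against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(page_html, base_url):
    """Return absolute feed URLs advertised by the given HTML document."""
    parser = FeedLinkParser(base_url)
    parser.feed(page_html)
    return parser.feeds
```

&lt;p&gt;Calling discover_feeds with a page's HTML and its URL returns a list of absolute feed URLs, or an empty list when the page advertises none. Sites that do not declare their feeds this way would need other heuristics.&lt;/p&gt;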

&lt;p&gt;Available Crawlers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RequestsCrawler: Python requests&lt;/li&gt;
&lt;li&gt;CrawleeScript: Crawlee with Beautiful Soup&lt;/li&gt;
&lt;li&gt;PlaywrightScript: Crawlee with Playwright&lt;/li&gt;
&lt;li&gt;SeleniumUndetected: Undetected Selenium&lt;/li&gt;
&lt;li&gt;SeleniumChromeHeadless: Selenium in headless mode&lt;/li&gt;
&lt;li&gt;SeleniumChromeFull: Full Selenium mode&lt;/li&gt;
&lt;li&gt;StealthRequestsCrawler: Stealthy requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to learn more?&lt;br&gt;
Check out the official repository: &lt;a href="https://github.com/rumca-js/crawler-buddy" rel="noopener noreferrer"&gt;Crawler Buddy GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>rss</category>
      <category>python</category>
    </item>
    <item>
      <title>Django bookmark management software</title>
      <dc:creator>Piotr</dc:creator>
      <pubDate>Tue, 29 Oct 2024 13:22:36 +0000</pubDate>
      <link>https://dev.to/rumca-js/django-bookmark-management-software-2ngg</link>
      <guid>https://dev.to/rumca-js/django-bookmark-management-software-2ngg</guid>
      <description>&lt;h1&gt;Overview&lt;/h1&gt;

&lt;p&gt;Two years ago, I started a personal project with a big goal: creating a truly complete RSS client. I know what you're probably thinking—aren't there already thousands of RSS clients out there? It's true, but I believe none of them have yet delivered the ultimate user experience.&lt;/p&gt;

&lt;p&gt;Of course, there are some fantastic tools in the realm of bookmark managers and RSS clients, like the impressive &lt;a href="https://github.com/goniszewski/grimoire" rel="noopener noreferrer"&gt;Grimoire&lt;/a&gt; project. There's also a wealth of other resources on GitHub’s &lt;a href="https://github.com/awesome-selfhosted/awesome-selfhosted?tab=readme-ov-file#bookmarks-and-link-sharing" rel="noopener noreferrer"&gt;Awesome Selfhosted&lt;/a&gt; list.&lt;/p&gt;

&lt;p&gt;After much trial and error, I realized what I truly wanted from a bookmark manager:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Self-hostable&lt;/strong&gt;: No syncing across external platforms. I want my bookmarks secure and fully managed on my own server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable&lt;/strong&gt;: It must handle thousands of bookmarks with ease.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Powerful search and tagging&lt;/strong&gt;: With so many bookmarks, an efficient search and tagging system is essential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comment and note support&lt;/strong&gt;: I need the ability to add detailed notes or context to each bookmark.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File over function&lt;/strong&gt;: The ability to import/export in multiple formats is a must.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open Source&lt;/strong&gt;: I want full transparency, and I aim to prevent the "enshittification" that often creeps into closed systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small footprint&lt;/strong&gt;: I want it to run on a Raspberry Pi or a small NAS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking at other RSS clients, I found that very few could meet my criteria. Many, in my opinion, fall short in features or flexibility.&lt;/p&gt;

&lt;h1&gt;Introducing Django-link-archive&lt;/h1&gt;

&lt;p&gt;I’ve developed most of these features in my project, &lt;em&gt;Django-link-archive&lt;/em&gt;, which has become my primary tool for managing bookmarks. It’s transformed how I navigate content online—I control what I want to see and avoid the distractions pushed by social media algorithms.&lt;/p&gt;

&lt;p&gt;Take a look if you’re interested:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/rumca-js/Django-link-archive" rel="noopener noreferrer"&gt;Django-link-archive GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Seeking Feedback&lt;/h3&gt;

&lt;p&gt;Now, I'm looking for feedback. Are there other requirements you’d expect from a robust RSS client or bookmark manager? Any features you find especially useful?&lt;/p&gt;

&lt;p&gt;I've already received insightful ideas from the Reddit community. For example, I recently added a kiosk-like feature where the list of entries refreshes periodically. I also integrated jQuery, making interactions much more fluid.&lt;/p&gt;

&lt;h1&gt;Additional Projects&lt;/h1&gt;

&lt;p&gt;As I continued to work with RSS data, I was able to build out some related repositories, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/rumca-js/Internet-Places-Database" rel="noopener noreferrer"&gt;Internet Places Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/rumca-js/RSS-Link-Database" rel="noopener noreferrer"&gt;RSS Link Database&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In some ways, this project has evolved into a simplified web crawler. I’ve added options for switching the backend "browser" mechanism between &lt;em&gt;requests&lt;/em&gt;, &lt;em&gt;Selenium&lt;/em&gt;, and &lt;em&gt;Crawlee&lt;/em&gt;. This setup is entirely configurable through a GUI, so I can assign specific crawling methods to particular domains. For instance, Spotify might require a full Selenium browser, while Crawlee performs better with other domains.&lt;/p&gt;
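
&lt;p&gt;As a rough illustration of per-domain crawler assignment, the sketch below maps domains to crawler names and falls back to a default. It is not the project's actual code: the rule table, the example.org entry, and the pick_crawler function are assumptions made for this example, and the real project manages its rules through the GUI.&lt;/p&gt;

```python
from urllib.parse import urlsplit

# Hypothetical per-domain rules; the real project stores its rules via a GUI.
CRAWLER_RULES = {
    "spotify.com": "selenium",
    "example.org": "crawlee",
}
DEFAULT_CRAWLER = "requests"


def pick_crawler(url, rules=CRAWLER_RULES, default=DEFAULT_CRAWLER):
    """Return the crawler name for the most specific matching domain rule."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Try "open.spotify.com", then "spotify.com", then "com".
    for start in range(len(parts)):
        candidate = ".".join(parts[start:])
        if candidate in rules:
            return rules[candidate]
    return default
```

&lt;p&gt;With these rules, a URL on open.spotify.com resolves to the Selenium crawler because it is a subdomain of spotify.com, while any domain without a rule falls back to plain requests.&lt;/p&gt;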

&lt;p&gt;Maintaining this ecosystem solo has been a lot of work, and things do occasionally break. Still, I’m excited to share this with the community and hear your thoughts!&lt;/p&gt;

&lt;p&gt;Thank you for reading, and I look forward to any feedback you may have.&lt;/p&gt;

</description>
      <category>python</category>
      <category>django</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
