<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Дмитрий</title>
    <description>The latest articles on DEV Community by Дмитрий (@_c6b40244e4cdb3f10).</description>
    <link>https://dev.to/_c6b40244e4cdb3f10</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3658068%2F38598a29-856f-4798-9bfd-9ad90f0b63e9.png</url>
      <title>DEV Community: Дмитрий</title>
      <link>https://dev.to/_c6b40244e4cdb3f10</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/_c6b40244e4cdb3f10"/>
    <language>en</language>
    <item>
      <title>Stop feeding garbage to your LLM: How to get clean Markdown from Documentation</title>
      <dc:creator>Дмитрий</dc:creator>
      <pubDate>Sat, 13 Dec 2025 14:09:02 +0000</pubDate>
      <link>https://dev.to/_c6b40244e4cdb3f10/stop-feeding-garbage-to-your-llm-how-to-get-clean-markdown-from-documentation-1l69</link>
      <guid>https://dev.to/_c6b40244e4cdb3f10/stop-feeding-garbage-to-your-llm-how-to-get-clean-markdown-from-documentation-1l69</guid>
      <description>&lt;p&gt;Building a RAG (Retrieval-Augmented Generation) pipeline sounds easy until you hit the data ingestion step.&lt;/p&gt;

&lt;p&gt;If you are trying to build a "Chat with Docs" app for a modern framework (like Next.js, Stripe, or Supabase), you know the pain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Hydration issues:&lt;/strong&gt; A plain &lt;code&gt;fetch&lt;/code&gt; or &lt;code&gt;BeautifulSoup&lt;/code&gt; request returns an empty &lt;code&gt;div&lt;/code&gt; because the content is rendered client-side by JavaScript.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Noise:&lt;/strong&gt; You scrape the content, but you also get the navbar, the footer, the "Copyright 2025" line, and the "Sign Up" button. All of this junk wastes context-window tokens.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Broken formatting:&lt;/strong&gt; Code blocks lose their structure, and tables turn into a mess.&lt;/li&gt;
&lt;/ol&gt;
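The hydration problem is easy to demonstrate. The snippet below uses a made-up static response (not any real site's HTML) to show why stripping tags from a non-hydrated page gets you nothing:

```python
import re

# A made-up stand-in for what a JS-rendered docs site serves to a plain
# HTTP client: an empty root element plus a script bundle.
static_html = """<html><body>
  <div id="root"></div>
  <script src="/static/js/main.js"></script>
</body></html>"""

# Naively stripping tags from the static response yields nothing useful,
# because the real content only appears after the page hydrates in a browser.
text = re.sub(r"<[^>]+>", "", static_html).strip()
print(repr(text))  # '' -- the docs text simply isn't in the response
```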

&lt;h3&gt;The Solution&lt;/h3&gt;

&lt;p&gt;I got tired of fixing these issues manually for every project, so I built a specialized Apify Actor designed for RAG pipelines.&lt;/p&gt;

&lt;p&gt;It does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Uses a headless browser&lt;/strong&gt; to wait for the page to fully hydrate.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Smart extraction:&lt;/strong&gt; It identifies the main content area (&lt;code&gt;&amp;lt;article&amp;gt;&lt;/code&gt;, &lt;code&gt;main&lt;/code&gt;, etc.) and strips away the UI noise.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Markdown conversion:&lt;/strong&gt; It turns the HTML into clean Markdown, preserving code blocks and tables.&lt;/li&gt;
&lt;/ol&gt;
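Steps 2 and 3 can be sketched with Python's standard library. The actor's actual implementation isn't public, so the selectors and Markdown rules below are illustrative assumptions:

```python
from html.parser import HTMLParser

class DocsToMarkdown(HTMLParser):
    """Keep only <main>/<article> content and emit simple Markdown.
    The selectors and Markdown rules here are illustrative assumptions."""
    CONTENT = {"main", "article"}
    NOISE = {"nav", "footer", "aside", "script", "style"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.in_content = 0  # nesting depth inside content containers
        self.in_noise = 0    # nesting depth inside UI noise
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTENT:
            self.in_content += 1
        elif tag in self.NOISE:
            self.in_noise += 1
        elif self.in_content and not self.in_noise:
            if tag in self.HEADINGS:
                self.out.append("\n" + self.HEADINGS[tag])
            elif tag == "pre":
                self.out.append("\n```\n")
            elif tag == "p":
                self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in self.CONTENT:
            self.in_content -= 1
        elif tag in self.NOISE:
            self.in_noise -= 1
        elif self.in_content and not self.in_noise:
            if tag == "pre":
                self.out.append("\n```\n")
            elif tag in self.HEADINGS or tag == "p":
                self.out.append("\n")

    def handle_data(self, data):
        if self.in_content and not self.in_noise:
            self.out.append(data)

    def markdown(self):
        return "".join(self.out).strip()

page = """<nav>Home | Docs | Sign Up</nav>
<main><h1>Quick Start</h1><p>Install the SDK.</p>
<pre>npm install crawlee</pre></main>
<footer>Copyright 2025</footer>"""

converter = DocsToMarkdown()
converter.feed(page)
print(converter.markdown())
```

The navbar and copyright line never reach the output, while the heading and code block survive as Markdown.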

&lt;h3&gt;How to use it&lt;/h3&gt;

&lt;p&gt;You can try it for free on Apify. Just plug in the URL of the documentation (e.g., &lt;code&gt;https://docs.stripe.com/&lt;/code&gt;) and you get JSON/Markdown output ready for your vector database.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Link to the tool:&lt;/strong&gt; &lt;a href="https://apify.com/hedelka/tech-docs-scraper" rel="noopener noreferrer"&gt;https://apify.com/hedelka/tech-docs-scraper&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm currently using it to feed Pinecone for my personal projects. Let me know if it helps with your data ingestion layer!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webscraping</category>
      <category>rag</category>
      <category>python</category>
    </item>
    <item>
      <title>Build Better RAG Pipelines: Scraping Technical Docs to Clean Markdown</title>
      <dc:creator>Дмитрий</dc:creator>
      <pubDate>Fri, 12 Dec 2025 03:56:06 +0000</pubDate>
      <link>https://dev.to/_c6b40244e4cdb3f10/build-better-rag-pipelines-scraping-technical-docs-to-clean-markdown-461b</link>
      <guid>https://dev.to/_c6b40244e4cdb3f10/build-better-rag-pipelines-scraping-technical-docs-to-clean-markdown-461b</guid>
      <description>&lt;p&gt;Building a RAG (Retrieval-Augmented Generation) pipeline usually starts with a simple goal: &lt;strong&gt;"I want to chat with this documentation."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You write a quick script to scrape the site, feed it into your vector database, and... the results are garbage. 🗑️&lt;/p&gt;

&lt;h3&gt;The Problem with Generic Scraping&lt;/h3&gt;

&lt;p&gt;If you simply &lt;code&gt;curl&lt;/code&gt; a documentation page or use a generic crawler, your LLM context gets flooded with noise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Navigation menus&lt;/strong&gt; repeated on every single page ("Home &amp;gt; Docs &amp;gt; API...").&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Sidebars&lt;/strong&gt; that confuse semantic search.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Footers, cookie banners, and scripts.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Broken code blocks&lt;/strong&gt; that lose their language tags.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your retrieval system ends up matching the "Terms of Service" link in the footer instead of the actual API method you were looking for.&lt;/p&gt;

&lt;h3&gt;The Solution: A Framework-Aware Scraper&lt;/h3&gt;

&lt;p&gt;I built &lt;a href="https://apify.com/hedelka/tech-docs-scraper" rel="noopener noreferrer"&gt;Tech Docs to LLM-Ready Markdown&lt;/a&gt; to solve this exact problem. &lt;/p&gt;

&lt;p&gt;Instead of treating every page as a bag of HTML tags, this actor &lt;strong&gt;detects the documentation framework&lt;/strong&gt; (Docusaurus, GitBook, MkDocs, etc.) and intelligently extracts &lt;em&gt;only&lt;/em&gt; the content you care about.&lt;/p&gt;





&lt;h3&gt;🚀 Key Features for RAG Pipelines&lt;/h3&gt;

&lt;p&gt;Here is why this is better than writing your own BeautifulSoup script:&lt;/p&gt;

&lt;h4&gt;1. Smart Framework Detection&lt;/h4&gt;

&lt;p&gt;It automatically identifies the underlying tech stack and applies specialized extraction rules.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  ✅ &lt;strong&gt;Docusaurus&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  ✅ &lt;strong&gt;GitBook&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  ✅ &lt;strong&gt;MkDocs&lt;/strong&gt; (Material)&lt;/li&gt;
&lt;li&gt;  ✅ &lt;strong&gt;ReadTheDocs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  ✅ &lt;strong&gt;VuePress / Nextra&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
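The detection internals aren't published, but a plausible heuristic is to look for each generator's telltale markers in the raw HTML. The signatures below are assumptions based on those frameworks' default output, not the actor's actual rules:

```python
# Telltale markers for common docs generators. These signatures are
# illustrative assumptions, not the actor's actual detection rules.
FRAMEWORK_SIGNATURES = {
    "docusaurus": ['<meta name="generator" content="docusaurus', "docusaurus"],
    "mkdocs": ['<meta name="generator" content="mkdocs', "mkdocs"],
    "gitbook": ["gitbook"],
    "readthedocs": ["readthedocs"],
    "vuepress": ["vuepress"],
}

def detect_framework(html: str) -> str:
    """Return the first framework whose marker appears in the page."""
    page = html.lower()
    for framework, markers in FRAMEWORK_SIGNATURES.items():
        if any(marker in page for marker in markers):
            return framework
    return "generic"

sample = '<html><head><meta name="generator" content="Docusaurus v3"></head></html>'
print(detect_framework(sample))  # docusaurus
```

Once the framework is known, the scraper can apply content selectors that are guaranteed to exist in that theme instead of guessing.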

&lt;h4&gt;2. Auto-Cleaning&lt;/h4&gt;

&lt;p&gt;It automatically strips out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Sidebars &amp;amp; Top Navigation&lt;/li&gt;
&lt;li&gt;  "Edit this page" links&lt;/li&gt;
&lt;li&gt;  Table of Contents (redundant for embeddings)&lt;/li&gt;
&lt;li&gt;  Footers &amp;amp; Legal text&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;3. RAG-First Output Format 🤖&lt;/h4&gt;

&lt;p&gt;This is the game-changer. The scraper doesn't just output text; it outputs structured data designed for vector DBs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;doc_id&lt;/code&gt;&lt;/strong&gt;: A stable, unique hash of the URL (great for deduplication).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;section_path&lt;/code&gt;&lt;/strong&gt;: The breadcrumb path (e.g., &lt;code&gt;Guides &amp;gt; Advanced &amp;gt; Configuration&lt;/code&gt;). Essential for filtering retrieval results!&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;chunk_index&lt;/code&gt;&lt;/strong&gt;: Built-in chunking support so you don't have to re-chunk huge pages later.&lt;/li&gt;
&lt;/ul&gt;
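The exact hashing and chunking schemes aren't documented, so here is one way you could reproduce them yourself. Truncated SHA-256 is an assumption that merely matches the 16-hex-character shape of the sample `doc_id`:

```python
import hashlib

def doc_id(url: str) -> str:
    """Stable 16-hex-char id derived from the URL.
    Truncated sha256 is an assumption; it only mirrors the sample's shape."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]

def chunk(markdown: str, max_words: int = 200) -> list[dict]:
    """Split a page into fixed-size word windows, tagging each with chunk_index."""
    words = markdown.split()
    return [
        {"chunk_index": i, "content": " ".join(words[start:start + max_words])}
        for i, start in enumerate(range(0, len(words), max_words))
    ]

url = "https://crawlee.dev/docs/introduction"
print(doc_id(url))  # same input -> same id, run after run
print(len(chunk("word " * 450, max_words=200)))  # 3
```

A deterministic id means re-crawling the same docs upserts into your vector DB instead of duplicating vectors.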

&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"doc_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acdb145c14f4310b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Introduction | Crawlee"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"section_path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Guides &amp;gt; Quick Start &amp;gt; Introduction"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"# Introduction&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Crawlee covers your crawling..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"framework"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docusaurus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"wordCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;358&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"crawledAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-12-12T03:34:46.151Z"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;🛠️ Integration with LangChain&lt;/h3&gt;

&lt;p&gt;Since the output is structured, loading it into LangChain is trivial using the &lt;code&gt;ApifyDatasetLoader&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyDatasetLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.docstore.document&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;

&lt;span class="c1"&gt;# Load results from Apify Dataset
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyDatasetLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;dataset_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_DATASET_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dataset_mapping_function&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# &amp;lt;--- Filter by section later!
&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Your docs are now ready for embeddings!
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loaded &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; clean documents.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
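The `section` metadata pays off at query time, e.g. restricting retrieval to one part of the docs. A minimal sketch over plain dicts (your vector store's filtering API will differ; this only shows the idea):

```python
# Hypothetical retrieved chunks; in practice these come from your vector store.
docs = [
    {"content": "...", "metadata": {"section": "Guides > Quick Start > Introduction"}},
    {"content": "...", "metadata": {"section": "API Reference > Crawlers"}},
    {"content": "...", "metadata": {"section": "Guides > Advanced > Configuration"}},
]

def in_section(docs, prefix):
    """Keep only chunks whose breadcrumb path starts with the given prefix."""
    return [d for d in docs if d["metadata"]["section"].startswith(prefix)]

guides = in_section(docs, "Guides")
print(len(guides))  # 2
```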



&lt;h3&gt;📉 Cost &amp;amp; Performance&lt;/h3&gt;

&lt;p&gt;The actor uses a custom lightweight extraction engine (on top of Cheerio), so it's extremely fast and cheap.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Pricing:&lt;/strong&gt; Pay-per-result ($0.50 per 1,000 pages).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Speed:&lt;/strong&gt; Can process hundreds of pages per minute.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Try it out&lt;/h3&gt;

&lt;p&gt;If you are building an AI assistant for a library, SDK, or internal docs, give it a shot. It saves hours of data cleaning time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/hedelka/tech-docs-scraper" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Try Tech Docs Scraper&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Let me know in the comments if there are other documentation frameworks you'd like me to add! 👇&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>rag</category>
      <category>ideas</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
