<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Miguel Álvarez</title>
    <description>The latest articles on DEV Community by Miguel Álvarez (@malvads).</description>
    <link>https://dev.to/malvads</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3746027%2F99e06ff5-b4eb-4e3a-a3f8-a8ac49051b44.png</url>
      <title>DEV Community: Miguel Álvarez</title>
      <link>https://dev.to/malvads</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/malvads"/>
    <language>en</language>
    <item>
      <title>Slopper GitHub Action: Fighting AI Slop contributions on Open Source Projects</title>
      <dc:creator>Miguel Álvarez</dc:creator>
      <pubDate>Sat, 06 Jun 2026 18:04:42 +0000</pubDate>
      <link>https://dev.to/malvads/slopper-github-action-fighting-ai-slop-contributions-on-open-source-projects-57pl</link>
      <guid>https://dev.to/malvads/slopper-github-action-fighting-ai-slop-contributions-on-open-source-projects-57pl</guid>
      <description>&lt;p&gt;Open source maintainers are drowning in AI-generated PRs that look clean but add nothing. curl, the Linux kernel, Godot, and Node.js have all been hit: polished descriptions, passing CI, and zero real value. The community calls it "slop," and it's becoming one of the biggest problems in open source today.&lt;/p&gt;

&lt;p&gt;Slopper is a very experimental open source initiative to fight back. It's a GitHub Action that sits in your repo and analyzes every pull request, weighing author reputation, commit patterns, code quality, and behavioral signals to answer one question: does this PR actually add value?&lt;/p&gt;

&lt;p&gt;⚠️ This is early-stage, highly experimental software. It works, but expect rough edges, false positives, and breaking changes. I'm sharing it now because the problem is urgent and I'd rather iterate in the open with community feedback than polish it in private.&lt;br&gt;
It &lt;em&gt;tries&lt;/em&gt; to catch the patterns maintainers have been reporting: phantom fixes for bugs that don't exist, unnecessary refactoring, duplicate functionality, documentation that restates the obvious, spray-and-pray accounts submitting to dozens of unrelated repos, and reputation farming in critical infrastructure.&lt;br&gt;
It posts a risk score (0–10) with a detailed breakdown, applies labels, and, if you configure it to, can auto-close high-risk PRs or request reviewers.&lt;br&gt;
For deeper analysis, you can optionally add an AI provider.&lt;br&gt;
Setup is one workflow file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;malvads/slopper@v1&lt;/span&gt;
&lt;span class="na"&gt;  with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;    github-token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And an optional .slopper config in your repo root to tune everything:&lt;br&gt;
&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;vouched&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="s"&gt;    - dependabot[bot]&lt;/span&gt;
&lt;span class="s"&gt;    - renovate[bot]&lt;/span&gt;
&lt;span class="s"&gt;    - trusted-maintainer&lt;/span&gt;

&lt;span class="na"&gt;  banned&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="s"&gt;    - known-slop-account&lt;/span&gt;

&lt;span class="na"&gt;  actions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;    auto_close&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;      enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;      threshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9&lt;/span&gt;
&lt;span class="na"&gt;    auto_request_review&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;      enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;      threshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt;
&lt;span class="na"&gt;      reviewers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="s"&gt;        - senior-maintainer&lt;/span&gt;

&lt;span class="na"&gt;  thresholds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;    low&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
&lt;span class="na"&gt;    medium&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;span class="na"&gt;    high&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;

&lt;span class="na"&gt;  label_thresholds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;    ai_likely&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;70&lt;/span&gt;
&lt;span class="na"&gt;    ai_possibly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;40&lt;/span&gt;
&lt;span class="na"&gt;    spray_score&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
&lt;span class="na"&gt;    new_account_days&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
&lt;span class="na"&gt;    activity_burst_prs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="na"&gt;    spray_weights&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;      repos&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;40&lt;/span&gt;
&lt;span class="na"&gt;      volume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
&lt;span class="na"&gt;      merge_ratio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
&lt;span class="na"&gt;      account_age&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;

&lt;span class="na"&gt;  ignore_paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="s"&gt;    - "*.md"&lt;/span&gt;
&lt;span class="s"&gt;    - "docs/**"&lt;/span&gt;
&lt;span class="s"&gt;    - "*.lock"&lt;/span&gt;

&lt;span class="na"&gt;  rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;    require_description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;    require_linked_issue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
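
&lt;p&gt;To make the numbers above concrete, here is a minimal Python sketch of how a 0–10 risk score could be turned into a tier and a set of actions. It is an illustration only, not Slopper's actual code: the tier names, the function names, and the "at or above the threshold" comparison are all assumptions; only the threshold values come from the example config.&lt;/p&gt;

```python
import bisect

# Values copied from the example .slopper config.
THRESHOLDS = {"low": 2, "medium": 5, "high": 8}
AUTO_REQUEST_REVIEW_AT = 6
AUTO_CLOSE_AT = 9


def risk_level(score):
    """Map a 0-10 score to a tier name (hypothetical tiers and cutoffs)."""
    cutoffs = [THRESHOLDS["low"], THRESHOLDS["medium"], THRESHOLDS["high"]]
    levels = ["minimal", "low", "medium", "high"]
    # bisect_right counts how many cutoffs the score has reached.
    return levels[bisect.bisect_right(cutoffs, score)]


def planned_actions(score):
    """List the actions that would fire for a score (assumed semantics)."""
    actions = []
    if score >= AUTO_REQUEST_REVIEW_AT:
        actions.append("request_review")
    if score >= AUTO_CLOSE_AT:
        actions.append("close")
    return actions
```

&lt;p&gt;Under these assumptions, a PR scoring 7 would be tiered as medium and get a review request, while a 9 would also be closed.&lt;/p&gt;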



&lt;p&gt;MIT licensed, built for the community. Feedback, bug reports, and ideas are very welcome, especially from anyone already dealing with this problem.&lt;br&gt;
  Repo: &lt;a href="https://github.com/malvads/Slopper" rel="noopener noreferrer"&gt;https://github.com/malvads/Slopper&lt;/a&gt;&lt;br&gt;
  Action: &lt;a href="https://github.com/marketplace/actions/slopper-ai-slop-detector" rel="noopener noreferrer"&gt;https://github.com/marketplace/actions/slopper-ai-slop-detector&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>githubactions</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Mojo: A Lightweight C++ Web Crawler for converting websites to RAG ready data (Fast, Simple, CI/CD-Friendly)</title>
      <dc:creator>Miguel Álvarez</dc:creator>
      <pubDate>Sun, 01 Feb 2026 23:17:13 +0000</pubDate>
      <link>https://dev.to/malvads/mojo-a-lightweight-c-web-crawler-for-converting-websites-to-rag-ready-data-fast-simple-36ia</link>
      <guid>https://dev.to/malvads/mojo-a-lightweight-c-web-crawler-for-converting-websites-to-rag-ready-data-fast-simple-36ia</guid>
      <description>&lt;p&gt;When building RAG systems or LLM-powered pipelines, you often don’t need a massive distributed crawler or a cloud scraping platform.&lt;/p&gt;

&lt;p&gt;Most of the time, you just want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Crawl a website deeply&lt;/li&gt;
&lt;li&gt;Convert pages into clean text (Markdown)&lt;/li&gt;
&lt;li&gt;Feed them into embeddings or downstream processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, many existing tools introduce complexity or overhead:&lt;/p&gt;

&lt;p&gt;Scrapy is extremely powerful and flexible, but requires writing spiders, managing Python dependencies, and building custom pipelines.&lt;/p&gt;

&lt;p&gt;Apify offers a full scraping platform, but relies on cloud infrastructure, subscriptions, and heavier runtime environments (Node.js/Python).&lt;/p&gt;

&lt;p&gt;Firecrawl and similar APIs are great for large-scale ingestion, but can be overkill if you want reproducible, local-first CI workflows.&lt;/p&gt;

&lt;p&gt;That’s why I built Mojo, a lightweight, cross-platform C++ web crawler designed specifically for LLM/RAG workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built Mojo
&lt;/h2&gt;

&lt;p&gt;Mojo focuses on one simple thing: efficiently crawling websites and producing clean, structured output suitable for LLM pipelines.&lt;/p&gt;

&lt;p&gt;Compared to Python/Node-based crawlers, Mojo is significantly faster and lighter on CPU/RAM, making it ideal for cloud jobs, Lambdas, CI pipelines or cheap servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Example
&lt;/h2&gt;

&lt;p&gt;Crawl an entire documentation site up to depth 2 and export everything as Markdown:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;./mojo -d 2 https://docs.example.com -o ./docs&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;
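
&lt;p&gt;Once the crawl finishes, the output directory can be fed to an embedding step. The following is a minimal Python sketch of that hand-off, assuming Mojo writes one &lt;code&gt;.md&lt;/code&gt; file per page somewhere under the &lt;code&gt;-o&lt;/code&gt; directory (the file layout, the function names, and the 1000-character chunk size are assumptions for illustration, not part of Mojo).&lt;/p&gt;

```python
from pathlib import Path


def chunk_text(text, max_chars=1000):
    """Group paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars and current:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def load_chunks(output_dir, max_chars=1000):
    """Yield (relative_path, chunk) for every .md file under output_dir."""
    root = Path(output_dir)
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for chunk in chunk_text(text, max_chars):
            yield str(path.relative_to(root)), chunk
```

&lt;p&gt;Each (path, chunk) pair from &lt;code&gt;load_chunks("./docs")&lt;/code&gt; can then be passed to whatever embedding model or vector store you use.&lt;/p&gt;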

&lt;h2&gt;
  
  
  For JS-rendered websites (SPAs):
&lt;/h2&gt;



&lt;p&gt;&lt;code&gt;./mojo --render https://spa-example.com -o ./docs_rendered&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Note: &lt;code&gt;--render&lt;/code&gt; requires Chromium or Chrome to be installed on the machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using proxies:
&lt;/h2&gt;



&lt;p&gt;&lt;code&gt;./mojo -p socks5://127.0.0.1:9050 https://target.com&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Or with a proxy list:
&lt;/h2&gt;



&lt;p&gt;&lt;code&gt;./mojo --config example_config.yaml https://target.com&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Perfect for CI/CD Pipelines
&lt;/h2&gt;

&lt;p&gt;Mojo was built with automation in mind.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example GitHub Actions workflow:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;name: Generate docs with Mojo

on:
  workflow_dispatch:
  schedule:
    - cron: '0 3 * * *'

jobs:
  crawl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Download Mojo
        run: |
          curl -L -o mojo https://github.com/malvads/mojo/releases/download/v0.1.0/mojo-0.1.0-linux-x86_64
          chmod +x mojo

      - name: Run crawler
        run: ./mojo -d 2 https://docs.example.com -o ./generated_docs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  When Should You Use Mojo?
&lt;/h2&gt;

&lt;p&gt;Mojo is a good fit if you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Want fast website → Markdown conversion&lt;/li&gt;
&lt;li&gt;Prefer local tools over cloud services&lt;/li&gt;
&lt;li&gt;Care about performance and reproducibility&lt;/li&gt;
&lt;li&gt;Are building RAG, search, or LLM pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You might prefer a heavier framework if you need advanced per-page scraping logic or complex data extraction workflows.&lt;/p&gt;

&lt;p&gt;But for most LLM ingestion use cases, Mojo keeps things simple and efficient.&lt;/p&gt;

&lt;p&gt;Mojo is fully open source under the MIT license.&lt;/p&gt;

&lt;p&gt;Feel free to check it out: &lt;a href="https://github.com/malvads/mojo" rel="noopener noreferrer"&gt;https://github.com/malvads/mojo&lt;/a&gt; :)&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>rag</category>
      <category>showdev</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
