Ben Halpern, for The DEV Team
Fighting Spam at Scale: How We Use Gemini to Protect the DEV Community

Eliminating spam has been a massive priority for us at DEV (and the wider Forem ecosystem). If you’ve been with us for a while, you may recall that spam was a significant problem in the past. While no platform can claim to be 100% spam-free, the situation is much improved today.

Our primary goal with these recent updates is simple: get super low-quality content off the platform before a human moderator ever has to deal with it.

Moderator burnout is real. By automating the removal of the "obvious" junk, we allow our mods to focus on nuanced community interactions rather than deleting hundreds of crypto-scam posts.

The Hybrid Approach: Algorithms + AI

We don’t rely on just one tool. We use a combination of upstream algorithmic detection and "call-outs" to Gemini for independent analysis.

  1. Algorithmic Upstream Action: If we detect a clear trend—for example, a massive burst of posts linking to a specific spam website—we take action upstream. We prefer to handle these efficiently without needing to query an LLM for every single instance.
  2. Gemini Analysis: For individual posts that require judgment, we send a custom prompt to Gemini 3. We try to err on the side of "no false positives," but when the AI detects clear indicators of spam or harmful content, it applies a label that triggers automated workflows.
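As a rough sketch of how a two-tier pipeline like this can be wired up (the class, rule, and return values here are hypothetical illustrations, not DEV's actual implementation):

```ruby
# Hypothetical two-tier spam triage: cheap algorithmic rules run first,
# and only posts that pass them incur an LLM call.
class SpamTriage
  # Example upstream rule: domains seen in a recent burst of spam posts.
  BLOCKED_DOMAINS = ["spam-site.example"].freeze

  def initialize(article)
    @article = article
  end

  def call
    # Clear trends are handled upstream without querying the LLM.
    return :blocked_upstream if matches_upstream_rules?

    # Individual posts that require judgment go to the model.
    ask_llm_for_label
  end

  private

  def matches_upstream_rules?
    BLOCKED_DOMAINS.any? { |domain| @article[:body].include?(domain) }
  end

  def ask_llm_for_label
    # Placeholder for the Gemini call in this sketch.
    :needs_llm_review
  end
end
```

The point of the early return is cost and latency: a burst of posts linking to one known-bad domain never touches the model at all.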

Under the Hood: The ContentModerationLabeler

To give you a better picture of how this works technically, let's look at our ContentModerationLabeler service. This Ruby class is responsible for building the context and asking Gemini to categorize the post.

We hone these custom prompts over time to ensure accuracy. Here is how we define the assessment criteria within the prompt. We tell Gemini exactly what we are looking for, distinguishing between "Safety," "Quality," and "Spam":

def build_prompt
  # ... (Context setup)

  <<~PROMPT
    Analyze the following article and assign it a content moderation label based on quality, relevance, and community standards.

    # ...

    **Assessment Criteria:**

    1. **Safety First**: Is the content harmful, exploitative, or inciting violence/hostility?
    2. **Content Quality**: Is the content well-written, informative, and valuable?
    3. **Community Relevance**: Does the content align with the community's purpose and interests?
    4. **Authenticity**: Does the content appear to be written by a real person with genuine insights?
    5. **Spam Indicators**: Are there signs of promotional content, low-effort posts, or automated generation?
    6. **Community Building**: Does the content foster discussion and community engagement?

    # ...
  PROMPT
end
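Once Gemini returns a label, automated workflows can key off it. A minimal sketch of that dispatch step (the label names and actions below are illustrative assumptions, not the actual Forem label set):

```ruby
# Hypothetical mapping from a moderation label to an automated action.
AUTOMATED_ACTIONS = {
  "clear_and_obvious_spam" => :unpublish_and_flag,
  "likely_spam"            => :queue_for_moderator,
  "okay"                   => :no_action
}.freeze

def action_for(label)
  # Err on the side of no false positives: anything unrecognized
  # falls through to a human moderator rather than auto-removal.
  AUTOMATED_ACTIONS.fetch(label, :queue_for_moderator)
end
```

Only the most unambiguous label triggers removal without a human in the loop, which matches the "no false positives" bias described above.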

Context is King

One key to our success is that we don't just send Gemini the text of the article. We also build a User Context.

A post that looks "okay" might actually be spam if it comes from a brand-new account with zero history. Conversely, a trusted member with badges and years of history gets the benefit of the doubt. We feed these metrics into the prompt so Gemini has the full picture:

def build_user_context
  user = @article.user
  <<~USER_CONTEXT
    Author: #{user.name} (@#{user.username})
    Member since: #{user.created_at.strftime('%B %Y')}
    Badge achievements: #{user.badge_achievements_count}
    Articles published: #{user.articles.published.count}
    Comments made: #{user.comments.count}
    Profile summary: #{user.profile&.summary || 'No summary provided'}
  USER_CONTEXT
end
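For illustration, here is what a heredoc like that produces for a sample author, with stand-in `Struct`s in place of the real ActiveRecord models and the counts passed in directly (a self-contained sketch, not the production code):

```ruby
# Stand-ins for the ActiveRecord models so the heredoc can run on its own.
Profile = Struct.new(:summary)
User = Struct.new(:name, :username, :created_at, :badge_achievements_count, :profile)

def build_user_context(user, articles_count, comments_count)
  <<~USER_CONTEXT
    Author: #{user.name} (@#{user.username})
    Member since: #{user.created_at.strftime('%B %Y')}
    Badge achievements: #{user.badge_achievements_count}
    Articles published: #{articles_count}
    Comments made: #{comments_count}
    Profile summary: #{user.profile&.summary || 'No summary provided'}
  USER_CONTEXT
end

# A brand-new account with no profile reads very differently to the model
# than a long-time member with badges and a publishing history.
user = User.new("Ada", "ada", Time.utc(2019, 3, 1), 12, nil)
puts build_user_context(user, 42, 300)
```

Note the `&.` safe navigation: a missing profile degrades to "No summary provided" rather than raising, so the prompt is always well-formed.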

View the ContentModerationLabeler on GitHub

The "Spam Peak" Irony

One interesting observation from our data concerns the volume of spam itself. Ironically, the amount of labeled spam peaked in the summer of 2025, right before we fully integrated these new systems.

*Chart: labeled spam volume over time, peaking in summer 2025*

This happens because deterrence works. As soon as bad actors realize their automated scripts are hitting a wall and their posts are being nuked immediately by our automated systems, they stop wasting resources on our platform. That high barrier to entry leads to fewer attempts over time.

Moving Forward

This system will continue to evolve. We are constantly tweaking our prompts and our upstream algorithms to adapt to new spam tactics.

While there are several services that handle different types of spam across the platform, you can view the full source code for this specific approach in our open-source repository:

GitHub: forem/forem — For empowering community 🌱
