If you've been on DEV over the last few days, we apologize for all the spam in the feed.
The spam fight is an ongoing battle for any platform like ours, and it's something we always need to improve on, especially as we propagate our open source code into the universe via Forem more and more.
We have had several spam mitigation efforts in place, but they're only as good as the patterns we're preventing, and this bout with spammers is providing us with opportunities to really close the loop on some of the big outstanding issues.
The biggest problem with our current spam tactics is observability. We have functionality in place, but it's a little buried, and it's hard for us to get a sense of what is going on. So our improvements need to focus on letting us identify when spammy tactics are occurring, so we can adjust our code.
Fighting spam is a multifaceted issue that touches rate limiting, user experience for early users, balancing concerns over false negatives vs false positives, etc.
Patterns
One quality of spam is that its patterns are usually easy to spot. That is because spam needs scale to be effective, as well as a particular outcome in mind. Totally chaotic spam does not have the same incentives as precise spam, though chaos is also worth fighting.
We already fight off certain patterns, but we need to do more of that, and we need the actions to be as observable as possible. Currently we just don't surface the issue to the people involved often enough, and it's hard to be aggressive when you're treating things too much like a black box.
Add ability for admin to add anti-spam terms #10615
What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
Description
This adds the ability for admins to modify a list of terms that may indicate spam. But unlike past "quiet" spam indicators, this automatically creates a vomit reaction that an admin can later manually reverse, or at least be aware of. In the past we have modified spam-related scores, but that hasn't fit effectively into our workflows. I think this reversible action should be how we raise spam automatically in general.
For comments, I decided to limit this to newer accounts because we're examining the whole comment and not just the title. But this can be modified over time. If a support admin sees false positives for a term, they should consider removing it. We can alter the logic over time to keep false-positive scenarios to a minimum.
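A minimal Python sketch of the flow described above (not Forem's actual code): an admin-editable term list, a title-only check for articles, a whole-body check for comments from newer accounts, and a reversible "vomit" reaction in place of a silent score change. The class names, the seven-day "new account" cutoff, and the whole-word matching rule are assumptions for illustration, not what PR #10615 implements.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical cutoff: only comments from accounts younger than this are scanned.
NEW_ACCOUNT_WINDOW = timedelta(days=7)


@dataclass
class SpamReaction:
    """A system-created "vomit" reaction that stays visible to mods and can be undone."""
    target_id: int
    reason: str
    category: str = "vomit"
    reversed: bool = False

    def reverse(self) -> None:
        # A human mod decided this was a false positive.
        self.reversed = True


@dataclass
class SpamTermChecker:
    terms: List[str] = field(default_factory=list)  # admin-editable list

    def _first_hit(self, text: str) -> Optional[str]:
        terms = [t for t in self.terms if t.strip()]  # ignore blank entries
        if not terms:
            return None
        # Case-insensitive match on any term that isn't embedded in a longer word.
        alternation = "|".join(re.escape(t) for t in terms)
        match = re.search(rf"(?<!\w)(?:{alternation})(?!\w)", text, re.IGNORECASE)
        return match.group(0) if match else None

    def check_article(self, article_id: int, title: str) -> Optional[SpamReaction]:
        # Articles: only the title is scanned (assumption based on the PR text).
        hit = self._first_hit(title)
        return SpamReaction(article_id, f"matched term: {hit}") if hit else None

    def check_comment(self, comment_id: int, body: str, author_created_at: datetime,
                      now: Optional[datetime] = None) -> Optional[SpamReaction]:
        # Comments: the whole body is scanned, but only for newer accounts.
        now = now or datetime.now(timezone.utc)
        if now - author_created_at > NEW_ACCOUNT_WINDOW:
            return None
        hit = self._first_hit(body)
        return SpamReaction(comment_id, f"matched term: {hit}") if hit else None
```

Because the result is a reaction object rather than a hidden score, a mod can see why it fired (`reason`) and undo it with `reverse()`.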
This is the start of a pattern-based spam prevention approach that raises the issue to human mods. This will get more sophisticated over time. It will pair with more adjustments to rate limiting and onboarding.
Further adjustments to the feed are also forthcoming, to ensure that even if there is spam, it affects fewer users directly.
As an open source company we look forward to squashing these issues in the open and sharing all of our learnings going forward.
Happy coding ❤️
Comments
Ironically, the spammers are probably reading this post and that pull request. Such is the nature of open source. In the long run, we seek to make this a game they cannot win, because we will all collaborate to ensure our mitigation strategies rely on sophistication, not obfuscation.
Hey Ben, here are a few more generic rules that could be added. They're similar to the ones I implemented for a game a while ago:
Could easily be done with numbers, random words from a dictionary, etc. Filtering out the numbers will not have any lasting effect.
From what I see, it's mostly one post per account, so that won't help.
Allowing a new account to make a single post is enough; any limitation on top of that will mostly get in the way of legitimate users.
For the long term, I had other suggestions on the list. There is never perfect spam protection; you will always have services like account creators, captcha bypassers, and others. The goal is to mitigate current and future threats as much as possible by taking past and possible attempts into consideration.
Sorry, but I saw a few single accounts with 24 posts in 10 minutes, and those accounts had been created that same day.
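One way to catch the pattern described above (many posts within minutes from a same-day account) is a sliding-window rate limit scoped to young accounts. A rough sketch, assuming hypothetical limits (3 posts per 10 minutes for accounts under a day old) and in-memory storage standing in for whatever a real deployment would use:

```python
from collections import deque
from datetime import datetime, timedelta


class NewAccountPostLimiter:
    """Sliding-window post limit that only applies to recently created accounts."""

    def __init__(self, max_posts=3, window=timedelta(minutes=10), young_for=timedelta(days=1)):
        self.max_posts = max_posts    # hypothetical limit per window
        self.window = window          # length of the sliding window
        self.young_for = young_for    # accounts older than this are not limited
        self._recent = {}             # user_id -> deque of accepted post timestamps

    def allow_post(self, user_id: int, account_created_at: datetime, now: datetime) -> bool:
        if now - account_created_at >= self.young_for:
            return True
        times = self._recent.setdefault(user_id, deque())
        # Forget posts that have slid out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_posts:
            return False  # a real system might also raise a reversible flag here
        times.append(now)
        return True
```

Only accepted posts are recorded, so a blocked account regains capacity as its earlier posts age out of the window.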
Captcha is the devil for accessibility, so never ever add captcha anywhere... ever. There are other ways to validate without bombing your application's accessibility for users with access needs.
Otherwise good points overall.
I totally disagree; there are now invisible captchas, for example 😊
Which is inaccessible too.
Captcha is always among the top 10 issues users with access needs bring up, regardless of the type of captcha. Captcha in its current form is a problem, not a solution. The users have spoken 🤷♂️
Really? We have it on 13 of our commercial solutions and have never had an issue with it, so :/
Let's agree to disagree.
Captcha has improved, but it’s not accessible to all users yet or even a plurality. Issues remain. Old versions in the wild. Etc.
There are valid captcha alternatives that solve many of the issues with captchas (all versions). Still, as I said, captcha has improved; it's just not all the way there for users with access needs or those with privacy concerns.
I couldn't determine the details of your spam prevention system from just this one post, but do you have a spam classifier already set up? I started using POPFile, a Bayesian filter along the lines Paul Graham described, to filter my POP email back in the early 2000s, and it was amazingly effective, with at least 99% accuracy and a similarly low false positive rate. And that was just a bag-of-words Naïve Bayes classifier; even a small deep neural net with word embeddings would probably do much better.
Anyway, if you classify each submission as it comes in, submissions could be automatically flagged for review and hidden if the spam likelihood exceeds some threshold (95%?). Then mods could attend to the flagged queue, which would hopefully consist almost entirely of obvious spam.
Apologies if I've just described something you've already been doing for ages -- but just in case :)
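For illustration, here is a bag-of-words Naïve Bayes filter in the spirit of the comment above, with add-one (Laplace) smoothing and the suggested 95% threshold for hiding a submission and queueing it for review. The class, the `triage` helper, and the toy training texts are hypothetical; a real filter would need a much larger labeled corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # P(spam | text) = 1 / (1 + exp(ham - spam)), computed without overflow.
        diff = log_scores["ham"] - log_scores["spam"]
        if diff > 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))


def triage(spam_filter, text, threshold=0.95):
    # Above the threshold: hide the submission and put it in the mod queue.
    return "hide_and_flag" if spam_filter.spam_probability(text) > threshold else "publish"
```

With three toy spam posts about "customer care" numbers and three ordinary tech titles as training data, "customer care number call" scores above 0.99, while "unit tests in rust" scores around 0.05.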
Thank you for taking care of it, I know how hard it is to fight bad actors, it's a neverending thing! 💜
Question to DEV team: if we notice such spam, is it useful for you if we report it, or does it just add noise to your list of reports and cause more harm than good?
I was going to address this in an email or something. Glad this was taken seriously.
P.S.
Honestly though, I'd report more spam posts if the verification process (recognizing cars and fire hydrants) weren't so rigorous.
Sometimes I give up midway after doing it 10 times in a row.
Thanks for listening to us. I recently reported 4 or 5 spam accounts.
Ben, as suggested in another discussion, can't we get a flag option? Above one threshold, a user's messages get, say, delayed. Above a second threshold, their ability to post is (temporarily) revoked. For fairness, false flagging would be penalized too.
This way you delegate the problem, and I'm sure most members are willing to help. It's at least a solution until something better is in place.
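The two-threshold flag idea could be sketched like this. The threshold values, the per-member `flag_weight`, and the 0.5 penalty for false flags are hypothetical choices; the sketch only shows how delaying, revoking, and penalizing false flags fit together.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical thresholds, compared against the summed weights of distinct reporters.
DELAY_THRESHOLD = 3.0
REVOKE_THRESHOLD = 6.0


@dataclass(eq=False)  # identity comparison, so each member counts once per target
class Member:
    name: str
    flag_weight: float = 1.0  # how much this member's flags count


@dataclass
class FlagLedger:
    flags: Dict[str, List[Member]] = field(default_factory=dict)

    def flag(self, target: str, reporter: Member) -> str:
        reporters = self.flags.setdefault(target, [])
        if reporter not in reporters:  # ignore repeat flags from the same member
            reporters.append(reporter)
        return self.status(target)

    def score(self, target: str) -> float:
        return sum(m.flag_weight for m in self.flags.get(target, []))

    def status(self, target: str) -> str:
        score = self.score(target)
        if score >= REVOKE_THRESHOLD:
            return "posting_revoked"  # temporary, until a mod reviews
        if score >= DELAY_THRESHOLD:
            return "delayed"
        return "normal"

    def resolve_as_false_flag(self, target: str, penalty: float = 0.5) -> None:
        # A mod cleared the target: drop its flags and penalize everyone who flagged it.
        for member in self.flags.pop(target, []):
            member.flag_weight = max(0.0, member.flag_weight - penalty)
```

Weighting flags by each reporter's track record is one way to make false flagging costly without banning anyone outright.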
I too reported 2 spam posts, but both times I got one of those infinite captchas, the kind you get in Tor Browser. I still did it twice because I love dev.to, but the third time I didn't have that much patience.
From what I have seen, the spam posts have 4 buzzwords and a phone number with a random letter or letters attached.
I think a regex-based spam filter could combat the issue effectively.
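A rough Python sketch of that regex idea. It looks for runs of ten or more digits (with spaces, dots, dashes, or parentheses between them) where one or two letters are glued directly onto a digit, and only calls a post spam if it also contains some buzzwords. The buzzword list is a placeholder, since the comment doesn't name the four words, and the pattern will miss other obfuscation styles.

```python
import re

# Placeholder buzzwords: the actual words from the spam wave aren't listed.
BUZZWORDS = ("customer care", "helpline", "toll free", "support number")

# Ten or more digits; each digit may carry 1-2 letters glued directly to it
# (not followed by a further letter), then optional separators.
PHONE_LIKE = re.compile(r"\+?(?:\d(?:[A-Za-z]{1,2}(?![A-Za-z]))?[\s().\-]*){10,}")


def obfuscated_phone_numbers(text):
    """Return phone-like digit runs that have letters mixed into them."""
    hits = []
    for match in PHONE_LIKE.finditer(text):
        candidate = match.group(0).strip()
        if any(ch.isalpha() for ch in candidate):
            hits.append(candidate)
    return hits


def looks_like_phone_spam(text, min_buzzwords=2):
    lowered = text.lower()
    buzz = sum(1 for word in BUZZWORDS if word in lowered)
    return bool(obfuscated_phone_numbers(text)) and buzz >= min_buzzwords
```

Requiring letters to touch a digit keeps an ordinary number followed by a word (for example "1800 123 4567 now") from being treated as obfuscated.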
Glad to find this post! I just scrolled through, reported a few, and stumbled upon this. Glad to see it's being addressed. Thanks to the DEV team!
If you find 10 digits in a title, it's most probably spam.
That's what I was thinking about this current batch of SPAM.
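The digit-count heuristic is easy to express, and it also shows why a match should raise a reversible flag rather than block outright: ordinary titles full of years can trip it too. The function names and the `threshold` parameter are hypothetical.

```python
def digit_count(text: str) -> int:
    """Count decimal digits anywhere in the text."""
    return sum(ch.isdecimal() for ch in text)


def title_looks_spammy(title: str, threshold: int = 10) -> bool:
    # A full phone number usually contributes ten or more digits on its own.
    return digit_count(title) >= threshold
```

For example, "Top 10 tips from 2021: 1999 vs 2020" contains 14 digits and would be flagged, a false positive a mod would need to reverse.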
Now they are spamming Forem issues. Seems like the fix made them salty, haha.
Those fuckers
I really don't understand how that could be effective for the spammer?! Who reads that and thinks, oh I must call that number immediately. 🤔
I'd bet it's an SEO thing. Having those words and numbers in other places might bump their website a bit.
It's kind of like when WordPress websites get hacked and the abuser creates thousands of pages linking to their websites.
It is. Not only are they trying to game SEO, it's also a kind of phishing attempt. They get Google to display the wrong number in its top results for legitimate brands (like Google Pay) and end up scamming unaware folks who think these are legitimate customer care phone numbers or websites. This kind of fraud has been making the rounds in India recently.
Pretty toxic stuff. 😔
Is AI/machine learning something you're all looking into for this problem? Sorry if it was mentioned; I skimmed most of it.
I appreciate the action you all are taking regarding SPAM. I'll keep reporting it when I see it. I will say, reporting SPAM accounts, comments, posts, etc. made for a productive alternative to doomscrolling.
I did have a question. If I have to view a post or comment to confirm it's spam before reporting it, does that view count toward the algorithm that determines which posts should be more visible? Or does reporting/vomiting/marking as abuse cancel out any views, etc.?
Yesterday I reported two threads that were spam. Thanks, Ben, for making the DEV community great again.