DEV Community


Discussion on: Protect Your Contact Information From Crawlers

Dirty-Co.de • Edited

I usually encode the email part (and other sensitive data) as hexadecimal HTML character references, like this:


mailto:info@example.com

which will render as

mailto:info@example.com

I have my email embedded like this and haven't received any spam so far. The advantage is that users still see the correct values, because the browser renders the entities as normal characters (a normal user won't even notice that the characters are written as hex HTML codes). Clicking the mail address works too, and no JS is required.

@edit: fun fact: this form also interpreted the hex encoded values :)
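
Because the comment form decoded the entities, the snippet above shows up as plain text. For anyone who wants to see the encoded form, here is a minimal Python sketch. The helper name `to_hex_entities` is made up for illustration, and encoding every single character is an assumption; it is not necessarily how the markup above was produced.

```python
import html


def to_hex_entities(text: str) -> str:
    """Encode every character as a hexadecimal HTML character reference."""
    return "".join(f"&#x{ord(ch):x};" for ch in text)


address = "info@example.com"  # placeholder address from the comment above

# Encode both the href value and the visible link text.
href = to_hex_entities("mailto:" + address)
label = to_hex_entities(address)
anchor = f'<a href="{href}">{label}</a>'
print(anchor)

# Decoding the references (as a browser does) gives back the original text.
assert html.unescape(href) == "mailto:" + address
```

The printed `<a>` element can be pasted into a page as-is; the browser decodes the references in both the attribute and the link text, so the link displays and works normally.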

kor3k • Edited

yes, this was a nice approach, but the problem is that it only works against "text" crawlers. nowadays there are many "headless browsers", which actually render the page's DOM in memory, even run JavaScript code, and then crawl the output.

this of course applies to Mr. Heinlein's approach as well.

in both cases, just add recaptcha and you'll be good... for now...
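
To illustrate why the encoding only stops naive crawlers: a headless browser is not even required, because any crawler that decodes character references before matching sees the plain address. This is a minimal Python sketch of such a hypothetical harvester; `harvest_addresses` and the deliberately simple `EMAIL_RE` pattern are assumptions for illustration, not code taken from a real crawler.

```python
import html
import re

# Deliberately simple pattern; real harvesters may use something else.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest_addresses(page_source: str) -> list[str]:
    """Decode HTML character references, then collect anything email-like."""
    decoded = html.unescape(page_source)
    return sorted(set(EMAIL_RE.findall(decoded)))


# Build a page fragment whose mailto link is fully hex-encoded.
encoded = "".join(f"&#x{ord(ch):x};" for ch in "mailto:info@example.com")
sample = f'<a href="{encoded}">contact</a>'

# A crawler matching the raw source finds nothing ...
assert EMAIL_RE.findall(sample) == []
# ... but one that decodes the references first recovers the address.
assert harvest_addresses(sample) == ["info@example.com"]
```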

Dirty-Co.de

I know that this is possible, but luckily there don't seem to be many crawlers out there yet that harvest emails by executing JavaScript, I guess. Maybe that "business" is not so interesting any more? The hex solution is simple and easy to implement, but it doesn't keep all bots out.

And whenever I can avoid captchas, I will. Example: dirty-co.de/user-experience/wenn-d... - I wanted to get information about a package that was supposed to be delivered by DHL, but their captcha didn't load properly, so I was stuck there ... bad UX!

kor3k

lol, i had the exact same captcha-not-shown problem with the huawei website some time ago.

but google's recaptcha is pretty neat lately. you don't even have to copy/write anything if recaptcha evaluates you as "human" (which it does in most cases).

Dirty-Co.de

Ah well, I didn't read carefully enough ;). If you have to comply with the GDPR (as we do here in Germany), you may run into a new issue when trying to use Google Recaptcha ... :S

Bastian Heinlein Author

This! But it's not only regulations like the GDPR: I personally wouldn't want to give Google more information about the people using my websites and, even more importantly, about the websites' usage.

kor3k

oh, i didn't think about that in the GDPR context. why is there a problem with recaptcha? (i'm more into tech than law stuff, so i don't know).

Bastian Heinlein Author

This is my imperfect understanding: While it is still possible to use Google Recaptcha, it causes a lot of privacy headaches, because Google processes personal data and places cookies. The latter means that you'll need at least some kind of cookie banner and have to make sure cookies are only placed after the user has explicitly allowed it. But more important is the former: Google not only processes personal data, it possibly does so in the US or somewhere else. This means, as I understand it, that you'll need some kind of contract with them to protect your and your users' interests. That is usually a standard contract, but it is still a legally binding contract.

And in some related cases of which I'm aware, courts ruled that you could theoretically be partially responsible if your contract partner violates privacy regulations.

While this is to the best of my knowledge, there is of course no guarantee that my knowledge, which is probably several months out of date, is still correct.

Bastian Heinlein Author

Nice, it seems like the easiest solutions work best, sometimes :-)
