Properly validating e-mail addresses

tux0r on August 30, 2018

If you are a developer of "web applications", you probably have written (or copy-pasted) some code which tries to validate an entered e-mail addr...
 

^.+@.+\..+$ is the regex I use. It needs something before an @, something between that @ and a following dot, and something after that dot.
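As a rough illustration, here is how that pattern could be applied in Python. This is only a sketch: the names LOOSE_EMAIL and looks_like_email are made up for the example, and the check tests the shape of the string, not whether the address exists.

```python
import re

# The loose pattern from the comment: something, "@", something, ".", something.
LOOSE_EMAIL = re.compile(r"^.+@.+\..+$")


def looks_like_email(address: str) -> bool:
    """Return True if the address passes the loose shape check."""
    return LOOSE_EMAIL.match(address) is not None


if __name__ == "__main__":
    for candidate in ("user@example.com", "user@localhost", "example.com"):
        print(candidate, looks_like_email(candidate))
```

Note that an address such as user@localhost is rejected because nothing after the @ contains a dot, which is one of the valid-but-unusual cases the comment accepts losing.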

Does it accept some invalid email addresses? Probably. I ask users to verify their email address anyway; worst case scenario, they enter an invalid address and the email bounces. That one is on them.

Does it deny some valid email addresses? Maybe, but in my opinion if your email address is so weird it doesn't have an @ and a . in it, then you already know you're a signup error waiting to happen.

For a little while, I was hearing complaints from users who accidentally typed a space while entering their email address. So I added a validator to make sure the email address has no spaces. I know this denies even more valid email addresses, but my goal isn't to match the RFC perfectly, it's to allow the maximum number of users to sign up with the minimum number of problems. This prioritizes a large number of users who make mistakes over a tiny number of users who do not, which may not be fair but makes sense as a business decision.
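A minimal sketch of that extra rule, assuming it runs before the loose regex and that the user gets a short message back. The function name validate_signup_email and the message texts are invented for this example.

```python
import re
from typing import Tuple

# Same loose shape check as the regex quoted earlier in this comment.
LOOSE_EMAIL = re.compile(r"^.+@.+\..+$")


def validate_signup_email(address: str) -> Tuple[bool, str]:
    """Return (accepted, message) for a signup form field."""
    # Reject any whitespace first: it is far more often a typo than
    # a deliberate (quoted) local part.
    if any(ch.isspace() for ch in address):
        return False, "Email address must not contain spaces."
    if not LOOSE_EMAIL.match(address):
        return False, "Email address must look like name@domain.tld."
    return True, ""
```

This deliberately rejects RFC-valid quoted local parts such as "john smith"@example.com, which is exactly the trade-off described above.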

 

+1

I'm not really here to impress cranks who have garbage-fire emails because the spec allows it.

 

More than that, in most languages and frameworks there are already some perfectly working validation mechanisms, so there's pretty much no need to validate an e-mail address with "our" own logic.

Adding this small C library is awesome anyways, thanks for the ref.

 

I have tried some of the "perfectly working" mechanisms. Even if your language has one, it will most likely not cover corner cases. (I admit to not having tried every single one.)

 

In my case, PHP's filter_var function is known to work technically "perfectly" (though I admit I haven't tried any extreme cases, only some fairly serious ones).

I have just tested a local address with an emoji. PHP does not accept that.

It seems the PHP implementation is perfect, but it checks against an older RFC.

Aren't emojis in domain names a hazardous spec anyway? (I'm talking in terms of support and implementations.)

So the PHP implementation becomes increasingly less usable as more and more Unicode domains are registered.

Hazardous, but rule-compliant. libvldmail has a compiler flag for that, so you could make it reject them if needed.

Yeah, beyond the purely technical ability to validate an email address against the rules, it's a weird one on so many levels:

theguardian.com/technology/2017/ap...

I can see pros and cons whichever way you go, though having a dedicated library does help with faster updates compared with built-in functions that might take years to reach their next release.

 

I once read a similar article. Yes, the RFC is not implemented correctly anywhere (besides your library now, I guess 😉), but a user with an unusual email address will have more serious trouble than registering on your site, since basically nothing on the web will allow that address to be used (or created in the first place).

 

That problem will fade as more web developers integrate my library! ;-)

 

Wait, wait, wait... your library?
I'm calling the :oncoming_police_car:

Because you have been so successful in framing your advertising post as an informational post that I took the bait.
Fortunately, police don't care about truth or justice, they just want to inflict some damage. This is the rare case when they're just what the doctor ordered.

I'm not advertising, I'm explaining. There is a lot of advertising on DEV. I'm not a company. I don't care how many people know my software. I don't sell anything to anyone.

Please complain to actual advertisers instead.

 

Whether Unicode is allowed in the name part rather depends on whether the SMTP servers involved support the SMTPUTF8 extension. In my experience they mostly don't. AWS's SES and Sendgrid don't for example.

 

That's not the problem of the validator though...

 

Well it is if you are letting through addresses your own SMTP server doesn't support.
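One way to connect the two concerns is to let the validator know what the outgoing mail server can do. The sketch below assumes the application already knows whether its server advertises SMTPUTF8 (with Python's smtplib this could be checked via SMTP.has_extn("smtputf8") after EHLO). The function names are hypothetical.

```python
def needs_smtputf8(address: str) -> bool:
    """Return True if the local part contains non-ASCII characters.

    Only the local part is checked: a non-ASCII domain can be sent in its
    IDNA (punycode) form, but a non-ASCII local part needs SMTPUTF8.
    An input without "@" has an empty local part and returns False here.
    """
    local_part, _, _domain = address.rpartition("@")
    return not local_part.isascii()


def deliverable_by_our_server(address: str, server_supports_smtputf8: bool) -> bool:
    """Accept the address only if our own SMTP server could send to it."""
    return server_supports_smtputf8 or not needs_smtputf8(address)
```

This does not validate the address itself; it only filters out addresses the configured server could not deliver to, and would sit alongside whatever validator is in use.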

 

user registers with an IP address for a domain

Oh, such fun! I definitely want this happening to me and my apps!

And no, we shouldn't be internationalizing domains, we should be de-nationalizing people.

/en/ shouldn't just be the default aliased to /, it should be the only language of the web.

 

Or German, which is the most-spoken language in Europe (and soon, when the only English-speaking countries leave the EU, even more relevant here)...

 

Then you should have said Chinese.
I'm talking about the language (and charset) of legacy systems, and the language of programming language keywords.

 
 

I am not sure. I have not tried any of those. If you find something is missing, please submit a proper bug report (or even a fix).

From looking through that page, it seems to respect RFC 822, which has been declared obsolete.

 

Interesting. RFC 5322 gives us this regex as the standard for an email address, and that seems to be baked into most implementations these days.

 
 

which I have developed to solve this very problem once and for all.

You cannot do that as long as the internet evolves.

 