
Regex Bootcamp (or Nobody Seems to be Using HTML5 Validation)

George Jempty ・4 min read

In this post I will be referring back to simplified versions of my regexes (regular expressions) from my previous post on form validation. That reminds me, though, that comments on the post before that one suggested I was somehow amiss in not relying on HTML5 validation at all.

So before I wrote another post I decided to check an example on w3schools that tests input against a regex pattern. First, if you look at the code, it seems this sort of validation is only useful on submit? But I've been doing validation on blur and/or keyup, so HTML5 validation wouldn't seem to apply.

Furthermore, I think a lot of real-world sites are also not relying on HTML5 validation. For instance, I intentionally entered 2 characters instead of 3 to force an error, and saw this:

I doubt I have ever seen such an error message (I am using Chrome) since the introduction of HTML5. Rather, every site seems to be customizing validation to their needs.

I'm assuming that, like mine in the two posts linked above, real-world validation frequently does rely on regex patterns. So have a look at my simplified validation for zip codes (rather than zip "plus 4") at the following revision of my previous gist, or just follow along from the JavaScript code below without all the HTML, bearing in mind that I won't cover any of the code besides the regexes.

  const zip = document.getElementById('zip');
  const zipErrEl = document.getElementById('zip-error');
  const errorMsg = "Please enter 5 digits";
  const keyupRegex = new RegExp("^\\d{0,5}$");
  const blurRegex = new RegExp("^\\d{5}$");

  function validateZip(regex) {
    if (regex.test(this.value)) {
      zipErrEl.innerHTML = '';
    }
    else {
      zipErrEl.innerHTML = errorMsg;
    }
  }

  zip.focus();

  zip.addEventListener('keyup', function() {
    validateZip.call(this, keyupRegex);
  });

  zip.addEventListener('blur', function() {
    validateZip.call(this, blurRegex);
  });

First let's quickly define what a regular expression is. Very succinctly, it describes a pattern for matching (or not) a string.

Let us break down the first regex above:
const keyupRegex = new RegExp("^\\d{0,5}$");

There are two ways to define regexes in JavaScript. One is to create a new RegExp object as above; the other, perhaps more common, is to use regex "literal" syntax between slashes, where the above would instead be:

/^\d{0,5}$/

Note that either form can be assigned to a variable and so named meaningfully; the new RegExp form I used is mainly needed when the pattern has to be built from a string. Next note the first difference between the two formats:

  1. new RegExp has 2 backslashes in front of the "d"
  2. regex literal syntax has just one backslash in front

In a regex, \d stands for a digit character. new RegExp takes a string as its argument, and the backslash is also the escape character within a JavaScript string, so it must itself be "escaped" with another backslash in front for the \d to reach the regex intact.
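The difference in backslashes can be checked directly. The following is a minimal sketch, runnable in Node or a browser console; the variable names are only illustrative and are not taken from the gist.

```javascript
// Inside a string literal, "\d" is not a recognized escape sequence,
// so the backslash is dropped and the string is just "d".
const singleBackslash = "\d";
const doubleBackslash = "\\d";

console.log(singleBackslash === "d"); // true
console.log(doubleBackslash.length);  // 2: a backslash followed by "d"

// The string form with doubled backslashes yields the same pattern
// as the literal form.
const fromString = new RegExp("^\\d{0,5}$");
console.log(fromString.source === /^\d{0,5}$/.source); // true

// With a single backslash the pattern silently becomes
// "zero to five letter d's", which no longer matches digits.
const broken = new RegExp("^\d{0,5}$");
console.log(broken.test("123")); // false
console.log(broken.test("ddd")); // true
```

Note that the broken version raises no error; it simply matches the wrong thing, which can make this mistake hard to spot.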

Now let's go through the new RegExp characters one by one. First, the "^" means that the string must adhere to the pattern from its very first character. Skipping to the last character, "$" means that nothing is allowed between the end of what the pattern describes and the end of the string being matched.

These two symbols, "^" and "$", are collectively known as "anchors". When both occur as in our example, the entire string must match the pattern, with nothing extra in front or at the end. Note that if you want to match one of these two symbols literally within a string, you must escape it with a backslash.
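To see what the anchors contribute, here is a small sketch comparing an anchored and an unanchored pattern against the same input. The sample strings are made up for illustration.

```javascript
const text = "abc12345xyz";

// Without anchors, the pattern may match anywhere inside the string.
const unanchored = new RegExp("\\d{5}").test(text);
console.log(unanchored); // true

// With both anchors, the whole string must be exactly five digits.
const anchored = new RegExp("^\\d{5}$").test(text);
console.log(anchored); // false
const exact = new RegExp("^\\d{5}$").test("12345");
console.log(exact); // true

// To match a literal "$", escape it with a backslash (doubled here
// because the pattern is written as a string). "\\d+" means one or
// more digits.
const literalDollar = new RegExp("^\\$\\d+$").test("$25");
console.log(literalDollar); // true
```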

So now all that's left to consider is the {0,5}. It's a "quantifier" and quantifies what comes before it, specifically the digit character \d.

This specific form of quantifier means that 0 to 5 instances of what comes before it are allowed. So with all of this information, we now know what the entire pattern matches: 0 to 5 digit characters, with nothing before and nothing after.
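As a quick check of that reading, the sketch below runs the keyup pattern against a handful of sample inputs. The inputs are arbitrary examples chosen to hit each case.

```javascript
const keyupRegex = new RegExp("^\\d{0,5}$");

// Empty string, one digit, five digits, six digits, and a non-digit
const samples = ["", "9", "90210", "902101", "9021a"];
const results = samples.map((s) => keyupRegex.test(s));

samples.forEach((s, i) => {
  console.log(JSON.stringify(s), results[i]);
});
// ""        true
// "9"       true
// "90210"   true
// "902101"  false (six digits is over the limit)
// "9021a"   false (contains a non-digit)
```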

Without going into all of the ins and outs, the keyup event handler "delegates" to the validateZip function, which immediately tests whether the string matches the pattern like so:

regex.test(this.value);
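The reason this.value works is that the handlers invoke the function with call, which sets this to the input element. Here is a rough sketch of that mechanism without a browser: a plain object stands in for the input, and validateZipMessage and its placeholder message are hypothetical names, not part of the gist.

```javascript
const keyupRegex = new RegExp("^\\d{0,5}$");

// Hypothetical variant of validateZip: same test, but it returns the
// message instead of writing it into the page.
function validateZipMessage(regex) {
  return regex.test(this.value) ? "" : "Invalid zip code"; // placeholder message
}

// A plain object standing in for the <input> element.
const fakeInput = { value: "123" };

// call() makes fakeInput the `this` inside the function.
const okMessage = validateZipMessage.call(fakeInput, keyupRegex);
console.log(okMessage === ""); // true, "123" is a valid partial entry

fakeInput.value = "12x";
const errMessage = validateZipMessage.call(fakeInput, keyupRegex);
console.log(errMessage); // "Invalid zip code"
```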

As to the reasoning for performing keyup validation in this manner, it lets the user type between 0 and 5 digits without getting a premature error, for instance being told after typing just one digit that it's not a valid zip code. It will even let them enter one digit, back up, and enter a different digit if the first one was wrong, since we are "matching" as few as zero digits. Of course, if they enter something other than a digit, or a sixth digit, they will get an error.

The only difference in the blur validation is that the digit character quantifier is {5} instead. This means that there must be precisely that many digits, not a range. So if, for instance, they enter only 4 digits, they get no error as they type, because the keyup handler allows that with its quantifier of {0,5}; but when they click out of the field, they will get an error.
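The interplay between the two patterns can be simulated without a browser. This sketch types a four-digit value one character at a time against the keyup pattern, then applies the blur pattern to the result; the typed value is just an example.

```javascript
const keyupRegex = new RegExp("^\\d{0,5}$");
const blurRegex = new RegExp("^\\d{5}$");

// Simulate typing "9021" one character at a time.
const typed = "9021";
const keyupResults = [];
for (let i = 1; i <= typed.length; i++) {
  keyupResults.push(keyupRegex.test(typed.slice(0, i)));
}
console.log(keyupResults); // [ true, true, true, true ], no error while typing

// Leaving the field with only four digits fails the stricter pattern.
const blurIncomplete = blurRegex.test(typed);
console.log(blurIncomplete); // false

// A complete five-digit entry passes both.
const blurComplete = blurRegex.test("90210");
console.log(blurComplete); // true
```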

You can't even come close to doing any of this with HTML5 validation. Look for an upcoming installment breaking down some other parts of this code. In the meantime I highly recommend the following "playground" for honing your regex skills.


George Jempty

@dexygen

- Full-stack/front-end web developer since 1999 - Speaker at technical user meetings - Writer of pre-publication technical reviews

Discussion


Regexes are easy and useful in the backend, but I feel like they are extremely user-unfriendly.
How do I explain the "requested format" to the end user?
That's why I feel like something like a declaratively composed validator is more useful, since it can say which attribute of the string is wrong.

 

You realize regexes ARE declarative? "Common declarative languages include those of database query languages (e.g., SQL, XQuery), regular expressions": en.wikipedia.org/wiki/Declarative_...

 

Sure, and you could even reverse-engineer some messages, but you'd get something like

Phone must begin with "00" or "+" or nothing, followed by "1" or "7" or "20" or "27" or "3...

Whereas with a proper syntax description you could automatically insert a "+", delete a leading "00", and if the following two digits aren't a real country code have an informative error like

Please include the country code in the phone number

 

"you cannot assign a regex defined as a literal to a variable"

I'm not sure if I got this right, and this could probably be a stupid question. But by saying you cannot assign, do you mean that the following is invalid or frowned upon?

const myRegex = /^\d{0,5}$/;
 

Hmm, I could be wrong. I was trying something like this, but I think it was barfing on the \d; as soon as I used new RegExp and escaped the \d with a backslash, it worked.