There comes a time for every web developer when they have to do some type of input validation. A form isn't a blog post where a user can wax poetic about their love of Yahoo Mail in an email field. Eventually, there need to be word count limits, checks for specific characters, and simple validation techniques that stop the user from sending a junk POST request.
However, what if you need to validate a URL? And to add another layer to the problem, what if you only want the hostname of a URL: no paths, no protocol, just the "dubya dubya dubya dot" (www.) and the .com?
Let's start with knowing if a URL is a URL. A link requires a second-level and a top-level domain (the walmart and the .com in walmart.com, respectively) and a scheme (https://). Without these parts, the link doesn't link to anything and is no different from a line of text.
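To make those parts concrete, here is a minimal sketch that pulls them out of a string with the built-in URL constructor. The describeUrl helper and its field names are hypothetical, made up for this illustration, and the "last two labels" logic is a simplification that assumes a single-label top-level domain (it would mislabel something like example.co.uk).

```typescript
// Hypothetical shape for the parts discussed above.
type UrlParts = {
  scheme: string;
  hostname: string;
  topLevelDomain: string | undefined;
  secondLevelDomain: string | undefined;
};

// Hypothetical helper: split a URL string into scheme, hostname, and domain labels.
const describeUrl = (input: string): UrlParts => {
  const url = new URL(input); // throws a TypeError if the input cannot be parsed
  const labels = url.hostname.split('.');
  return {
    scheme: url.protocol, // includes the trailing colon, e.g. "https:"
    hostname: url.hostname,
    // Simplification: assumes the TLD is the last label and the
    // second-level domain is the label right before it.
    topLevelDomain: labels[labels.length - 1],
    secondLevelDomain: labels[labels.length - 2],
  };
};

const parts = describeUrl('https://www.walmart.com/cart');
console.log(parts.hostname, parts.secondLevelDomain, parts.topLevelDomain);
```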
But now that we know the parts of a URL, we reach a fork in our development path. Should the validation restrict the user at the field or sanitize the user input when the data is sent to the server?
There are merits and deficiencies to either option:
Validation Before Submission
If you restrict the user from submitting an invalid URL, you force them to submit the exact input structure you need, so the server can take the data without any extra work. In this case, the pattern attribute of the input element, combined with some regex, allows for some good old-fashioned field validation.
Here's an example of this approach:
<input type="text" pattern="https?://.*">
However, it comes with the downside of restricting the user. It requires the user's input to contain specific parts, and if you just need there to be a .com, then a long regex pattern might be overkill.
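If the goal is a bare hostname rather than a full URL, the pattern can be tried out in TypeScript before it goes into the markup. This is only a sketch: HOSTNAME_PATTERN is an assumed pattern written for this example (it is deliberately loose and does not cover internationalized domains), and matchesPattern is a hypothetical helper that imitates the way the pattern attribute has to match the whole value rather than a substring.

```typescript
// Assumed pattern for a bare hostname such as "www.walmart.com":
// one or more dot-terminated labels followed by an alphabetic TLD.
// In an HTML attribute the backslashes would be written singly.
const HOSTNAME_PATTERN = '([a-zA-Z0-9\\-]+\\.)+[a-zA-Z]{2,}';

// Hypothetical helper: anchor the pattern so the entire value must match,
// which is how the pattern attribute is applied to a field's value.
const matchesPattern = (pattern: string, value: string): boolean =>
  new RegExp(`^(?:${pattern})$`, 'u').test(value);

console.log(matchesPattern(HOSTNAME_PATTERN, 'www.walmart.com'));
console.log(matchesPattern(HOSTNAME_PATTERN, 'https://walmart.com'));
```

The first call accepts the hostname; the second is rejected because the scheme's colon and slashes are not allowed by the pattern.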
Validation After Submission
On the other hand, if you choose to sanitize the data after the user submits it, the user can type anything and the server decides what to do with the data. JavaScript's URL constructor does the validation for you, throwing a TypeError if the input is invalid, and it also lets you extract specific parts of the URL like the origin or hostname.
Here's an example of this approach:
export const formatWebsiteAfterDomain = (website: string): string => {
  const websiteTrimmed = website.trim();
  if (!websiteTrimmed.length) {
    return '';
  }

  // The URL constructor rejects input without a scheme, so add one if missing.
  const hasProtocol = /:\/\//.test(websiteTrimmed);
  const updatedWebsite = hasProtocol
    ? websiteTrimmed
    : `https://${websiteTrimmed}`;

  try {
    const url = new URL(updatedWebsite);
    // Drop the path, query, and hash; also drop the scheme if we added it ourselves.
    return hasProtocol ? url.origin : url.origin.replace('https://', '');
  } catch (_err) {
    // Not parseable as a URL: hand back the trimmed input unchanged.
    return websiteTrimmed;
  }
};
However, because you give the user so much freedom in their input, it requires some compromises in what the server does with the data. If the user enters an invalid URL, what do you do with it? Do you catch the TypeError and notify the user, or do you just allow the server to consume what the user sent? Furthermore, the URL constructor mostly checks that the input parses as an absolute URL with a scheme (such as https:// or http://), which may be too little validation for your uses.
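One way to layer stricter checks on top of the URL constructor is sketched below. The parseHostname function, the HostnameResult type, and the specific rules (http or https only, at least one dot in the hostname) are assumptions chosen for illustration, not requirements of the constructor. Returning a result object instead of throwing leaves the caller free to either notify the user or fall back to the raw input.

```typescript
// Hypothetical result type: either a clean hostname or a reason for rejection.
type HostnameResult =
  | { ok: true; hostname: string }
  | { ok: false; reason: string };

// Hypothetical helper: parse user input and apply checks beyond "has a scheme".
const parseHostname = (input: string): HostnameResult => {
  const trimmed = input.trim();
  if (!trimmed) {
    return { ok: false, reason: 'empty input' };
  }

  // Prepend https:// when the input has no scheme of its own.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const candidate = hasScheme ? trimmed : `https://${trimmed}`;

  let url: URL;
  try {
    url = new URL(candidate);
  } catch (_err) {
    return { ok: false, reason: 'not a parseable URL' };
  }

  // Assumed rule: only web links are acceptable.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return { ok: false, reason: 'unsupported scheme' };
  }
  // Assumed rule: require a dot so that inputs like "localhost" are rejected.
  if (!url.hostname.includes('.')) {
    return { ok: false, reason: 'hostname has no top-level domain' };
  }
  return { ok: true, hostname: url.hostname };
};

console.log(parseHostname(' https://www.walmart.com/cart?x=1 '));
console.log(parseHostname('ftp://walmart.com'));
```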
In the end, the path taken depends on the specific edge cases of your problem. A combination of both solutions might be the most comprehensive and versatile, or one of the choices might be just enough. The user can put in any input, and your solution will be determined by the amount of freedom you're willing to give them. However, what remains universal is that the user's ability to type anything will always force the user and the developer to come to some sort of compromise (often the developer gets a specific input pattern and the user gets to use their application).
But since the peculiarities of user input are eternal, there will always be developers frantically pushing out solutions so their web apps don't break when users try to paste images in the URL field of a form.