I recently needed a way of checking whether the syntax of a URL is valid. You'd think that JavaScript (a language designed for the web) would have an easy way of doing so, but I found that the answer is not so simple.
What makes a URI valid or invalid? As specified in RFC 3986 - URI Generic Syntax (Berners-Lee, et al. 2005), there is a set of rules and allowed characters that make up a URI. Here I will discuss the methods of validating a URI in JavaScript:
URL Constructor
The simplest and most obvious method: try to create an instance of JavaScript's native URL object using the string.
new URL('https://url-with-invalid-[chars].com/');
> Uncaught TypeError: Failed to construct 'URL': Invalid URL
If the string is invalid, the constructor throws a TypeError. Generally, throwing Errors as flow control is considered bad practice in most programming languages, as it can be a resource-intensive operation. However, as you will see later, this is not such an issue here.
Wrapping the constructor in a try/catch makes it more usable:
function isUriValid(string) {
  try {
    new URL(string);
    return true;
  } catch {
    return false;
  }
}
Based on my tests this seems to take < 0.2 ms to compute.
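The post does not say how the timings were taken, so the following is only a rough sketch of one way to measure a validator. It assumes an environment where `performance.now()` is available globally (modern browsers and recent Node.js versions); the helper name `averageMs` and the run count are hypothetical choices for illustration.

```javascript
// Validator under test: the same try/catch approach as above.
function isUriValid(string) {
  try {
    new URL(string);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper: average wall-clock time of fn(input), in milliseconds.
function averageMs(fn, input, runs = 1000) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    fn(input);
  }
  return (performance.now() - start) / runs;
}

console.log(averageMs(isUriValid, 'https://example.com/some/path?x=1'));
console.log(averageMs(isUriValid, 'not a url'));
```

Averaging over many runs smooths out timer resolution and warm-up effects; the absolute numbers will vary by machine and JavaScript engine.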
Pros:
- Simple and easy to read and understand.
Cons:
- Throwing Errors as flow control is bad practice.
- Does not throw if invalid characters are used (e.g. http://£*(){}.com/).
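If the try/catch is the main objection, newer browsers and Node.js versions expose a static `URL.canParse()` method that performs the same parse and returns a boolean instead of throwing. The sketch below is not part of the original comparison: it prefers `URL.canParse()` when present and falls back to the constructor otherwise. The function name `isUriParsable` is a hypothetical one chosen for illustration, and it shares the same blind spot for unusual characters.

```javascript
function isUriParsable(string) {
  // Use the non-throwing static method where the runtime provides it.
  if (typeof URL.canParse === 'function') {
    return URL.canParse(string);
  }
  // Fallback for older runtimes: same check via try/catch.
  try {
    new URL(string);
    return true;
  } catch {
    return false;
  }
}

console.log(isUriParsable('https://example.com/')); // true
console.log(isUriParsable('not a url'));            // false
```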
DOM Element
This improves on the previous method by returning true/false natively instead of throwing an error, but it still has many of the same shortcomings. It works by taking advantage of how an anchor element resolves its href: a string that is not an absolute URL is resolved against the current page, so the element's host ends up matching the window's host, and a string that cannot be parsed at all leaves the host empty.
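function isUriValid(string) {
  const a = document.createElement('a');
  // Assign the string under test; the browser parses it when href is set.
  a.href = string;
  // An empty host, or the page's own host, means the string was not an absolute URL.
  return a.host !== '' && a.host !== window.location.host;
}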
In my tests this took 0.2 - 0.6 ms to complete.
Pros:
- No Error throwing.
Cons:
- Returns false if the protocol (e.g. https:// or ftp://) is missing.
- Does not return false when invalid characters are used (e.g. http://*£){.com).
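The anchor trick above roughly mirrors what happens when the URL constructor is given a base URL: relative input is resolved against the base instead of being rejected. The sketch below reproduces that behaviour outside the DOM so it can be tried in Node.js as well. `PAGE_URL` is a hypothetical stand-in for `window.location.href`, and `isAbsoluteUrlOnOtherHost` is an illustrative name, not something from the original code.

```javascript
// Hypothetical stand-in for window.location.href in a browser.
const PAGE_URL = 'https://example.com/page';

function isAbsoluteUrlOnOtherHost(string, pageUrl = PAGE_URL) {
  let resolved;
  try {
    // Relative input is resolved against pageUrl, much as an anchor
    // resolves its href against the current page.
    resolved = new URL(string, pageUrl);
  } catch {
    return false;
  }
  return resolved.host !== '' && resolved.host !== new URL(pageUrl).host;
}

console.log(new URL('invalid url', PAGE_URL).host);                 // example.com
console.log(isAbsoluteUrlOnOtherHost('invalid url'));               // false
console.log(isAbsoluteUrlOnOtherHost('https://other.example/'));    // true
console.log(isAbsoluteUrlOnOtherHost('https://example.com/about')); // false
```

The last call shows a side effect of comparing hosts: a perfectly valid URL that points at the page's own host is reported as invalid too.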
RegEx
Using a good old regular expression, albeit a rather lengthy one. The benefit of this method is that it can handle all characters. However, I would not recommend it, because it is VERY slow for long URLs and the compute time grows with the length of the string. It's worth noting that you could modify it to make it quicker if, for example, you only want to validate the hostname. Here is an example of one written by somebody else.
function isUriValid(string) {
  const re = new RegExp('^([a-z]*:\\/\\/)?' + // protocol
    '((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|' + // hostname
    '((\\d{1,3}\\.){3}\\d{1,3}))' + // or ipv4 address
    '(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*' + // port and path
    '(\\?[;&a-z\\d%_.~+=-]*)?' + // query string
    '(\\#[-a-z\\d_]*)?$', 'i' // hash
  );
  return re.test(string);
}
In my tests this took 1100 - 1300 ms with a 30 character string.
Pros:
- Handles all domains and invalid characters.
- Flexibility to be modified for custom use cases.
Cons:
- Hard to read.
- Very slow.
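The slowness is probably down to the hostname part of the pattern, where a repeated group sits inside another repeated group (`([a-z\d-]*[a-z\d])*`). Nested quantifiers like this are a common cause of heavy backtracking when the input does not match. As a rough sketch of the hostname-only variant mentioned above, the pattern below makes the inner group optional instead of repeatable, which should keep matching time roughly proportional to the input length. `HOSTNAME_RE` and `isHostnameValid` are hypothetical names, and this is an illustration, not a drop-in replacement for the full expression.

```javascript
// Each label starts and ends with a letter or digit, with hyphens allowed
// in between. The inner group is optional (?) instead of repeated (*),
// so there is only one way to split a label and far less backtracking.
const HOSTNAME_RE = /^([a-z\d]([a-z\d-]*[a-z\d])?\.)+[a-z]{2,}$/i;

function isHostnameValid(hostname) {
  return HOSTNAME_RE.test(hostname);
}

console.log(isHostnameValid('example.com'));            // true
console.log(isHostnameValid('sub.example-site.co.uk')); // true
console.log(isHostnameValid('-bad.com'));               // false
console.log(isHostnameValid('a'.repeat(5000)));         // false
```

Like the original expression, this requires at least one dot and an alphabetic top-level label, so a bare name such as localhost is rejected.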
Conclusion
None of these methods is ideal in my opinion, but if I had to pick one I'd go for the URL class constructor. However, this doesn't give you 100% validation, as it doesn't check for some invalid characters.
The best way to truly validate a URL is to use a combination of the URL constructor and RegEx. Below I have detailed a function that should handle all possible URL cases.
Complete Method
function isUriValid(url) {
  let uri;
  try { uri = new URL(url); }
  catch { return false; }
  // Validate the Protocol (uri.protocol is the scheme plus the colon, e.g. 'https:', without the slashes):
  if (!/^[a-z]+:$/.test(uri.protocol)) { return false; }
  // Now Validate the Hostname:
  if (uri.hostname !== '') {
    if (!/^[a-zA-Z\d]+[a-zA-Z\d.@-]*[a-zA-Z\d]+$/.test(uri.hostname)) { return false; }
  }
  // The Port (1 to 65535):
  if (uri.port !== '') {
    if (!/^([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])$/.test(uri.port)) { return false; }
  }
  // Now the Path (one or more '/'-separated segments):
  if (uri.pathname !== '/') {
    if (!/^(\/[-a-zA-Z\d%_.~+]*)+$/.test(uri.pathname)) { return false; }
  }
  // Search Parameters:
  if (uri.search !== '') {
    if (!/^[?;&a-zA-Z\d%_.~+=-]+$/.test(uri.search)) { return false; }
  }
  // Lastly the URI Fragment:
  if (uri.hash !== '') {
    if (!/^#[-a-z\d_]+$/i.test(uri.hash)) { return false; }
  }
  return true;
}
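Each check in the function above targets one component of the parsed URL, so it helps to see what those properties actually contain. The small sketch below prints them for a sample address; `urlParts` is a hypothetical helper and the example URL is made up for illustration.

```javascript
// Hypothetical helper: collect the URL components the validator inspects.
function urlParts(string) {
  const { protocol, hostname, port, pathname, search, hash } = new URL(string);
  return { protocol, hostname, port, pathname, search, hash };
}

console.log(urlParts('https://example.com:8080/docs/page?x=1&y=2#top'));
// protocol: 'https:'      (scheme plus colon, no slashes)
// hostname: 'example.com'
// port:     '8080'
// pathname: '/docs/page'
// search:   '?x=1&y=2'    (includes the leading '?')
// hash:     '#top'        (includes the leading '#')

// The default port for a scheme is reported as an empty string.
console.log(urlParts('https://example.com:443/').port === ''); // true
```

This is why the protocol check cannot look for the slashes, and why the search and fragment patterns have to allow for the leading '?' and '#'.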
If you know of any methods of validating URLs other than those stated above, please leave a comment on this post, as I'd love to hear some other options.