DEV Community

Prachi Tripathi


Regex learning by creating URL regex

In software development, regular expressions (Regex) are handy tools for pattern matching and text processing.
One common use of pattern matching is validating input field values in a UI form.
The goal of this article is to teach you how to create a regex for URL validation.
Let's get started! 🚀

What does a typical URL look like?
https://www.google.com

Here's how to add a pattern for each part of this URL -

1. Addition of prefix (http vs https) -

Websites may use either http or https, so the 's' may or may not be present in the URL. The '?' quantifier makes the preceding character or group optional. To match a forward slash (/), escape it with a backslash (\), because in languages that write a regex between forward slashes (JavaScript, for example) an unescaped forward slash ends the pattern.
So let's add the pattern to the Regex like this - ((http)s?:\/\/)
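
To see the prefix pattern in action, here is a minimal sketch in Python (the language choice and the helper name `has_http_prefix` are assumptions for illustration, not part of the original article). Python does not need the forward slashes escaped, but it accepts `\/` as a literal slash, so the pattern can be used unchanged.

```python
import re

# Prefix pattern from this step: "http", an optional "s", then "://".
PREFIX = re.compile(r"((http)s?:\/\/)")


def has_http_prefix(url: str) -> bool:
    """Hypothetical helper: True if url starts with http:// or https://."""
    # re.match only looks for a match at the beginning of the string.
    return PREFIX.match(url) is not None


print(has_http_prefix("https://www.google.com"))  # True
print(has_http_prefix("http://www.google.com"))   # True
print(has_http_prefix("ftp://www.google.com"))    # False
```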

2. Addition of sub-domain (www) -

The example we took above uses www as a sub-domain name, which is the most common, but let's also look at these URLs - https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html and https://todoist.com.
www isn't always a subdomain, and it's not always in the URL.
Check out the URL for dev.io.
Regex uses the \d metacharacter to match a digit. This part may or may not contain a number.
We use a Kleene Star (*) to represent 0 or more repetitions of the preceding character or group.
To match a literal period (.), escape it with a backslash (\), since an unescaped period matches any single character (except a newline, by default).
So let's add the pattern to the Regex for this part - (www\d*\.)?
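
The difference between an escaped and an unescaped period is easy to miss, so here is a small Python sketch comparing the two. The names `UNESCAPED`, `ESCAPED` and `subdomain_prefix` are made up for this example.

```python
import re

UNESCAPED = re.compile(r"(www\d*.)?")  # "." matches any character
ESCAPED = re.compile(r"(www\d*\.)?")   # "\." matches only a literal period


def subdomain_prefix(pattern: re.Pattern, host: str) -> str:
    """Hypothetical helper: return the text consumed by the optional www group."""
    # The whole group is optional, so match() always succeeds,
    # possibly with an empty string.
    return pattern.match(host).group(0)


print(repr(subdomain_prefix(ESCAPED, "www.google.com")))   # 'www.'
print(repr(subdomain_prefix(ESCAPED, "www3.ntu.edu.sg")))  # 'www3.'
print(repr(subdomain_prefix(ESCAPED, "todoist.com")))      # ''
print(repr(subdomain_prefix(UNESCAPED, "wwwxyz.com")))     # 'wwwx'
print(repr(subdomain_prefix(ESCAPED, "wwwxyz.com")))       # ''
```

With the unescaped period, "wwwx" is wrongly accepted as a www-style subdomain; the escaped version only accepts www, optional digits, and a real period.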

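3. Addition of domain name -

For this part we allow all alphanumeric characters (A-Z, a-z, 0-9) and special characters like _, ., -, ~, / and #. A domain name on its own only uses letters, digits, hyphens and periods; the extra characters let the same pattern cover a path that follows it.
To represent alphanumeric characters, the \w metacharacter is used in Regex. It also matches the underscore, so _ does not need to be listed separately.
Kleene Plus (+) represents 1 or more repetitions of the preceding character or group.
We use square brackets ([]) to list the specific characters that are allowed. Inside the brackets, a hyphen between two characters defines a range (.-~ would mean "every character from . to ~"), so the literal hyphen goes last.
So let's write the Regex pattern for this part - ([\w.~\/#-]+)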
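
One pitfall worth checking when listing characters inside square brackets: a hyphen placed between two characters is read as a range, not as a literal hyphen. The Python sketch below (with the made-up helper `matches_char`) shows how `[.-~]` behaves compared with a class that keeps the hyphen last.

```python
import re

RANGE_BY_ACCIDENT = re.compile(r"[.-~]")  # every character from "." to "~"
HYPHEN_LAST = re.compile(r"[.~-]")        # literal ".", "~" and "-"
WORD_CHAR = re.compile(r"\w")             # letters, digits and underscore


def matches_char(pattern: re.Pattern, ch: str) -> bool:
    """Hypothetical helper: True if the single character ch is in the class."""
    return pattern.fullmatch(ch) is not None


print(matches_char(RANGE_BY_ACCIDENT, "@"))  # True: "@" lies between "." and "~"
print(matches_char(RANGE_BY_ACCIDENT, "-"))  # False: "-" comes before "."
print(matches_char(HYPHEN_LAST, "@"))        # False
print(matches_char(HYPHEN_LAST, "-"))        # True
print(matches_char(WORD_CHAR, "_"))          # True: \w already covers "_"
```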

4. Addition of extension (.com) -

The extension, and anything that follows it, can contain the same characters as the domain name, plus a question mark and an equals sign (both appear in query strings).
So let's also write the Regex pattern for this part - ([\w.~\/?=#-]+)
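
As a rough check of the extension part, here is a Python sketch. It assumes a variant of the character class with the hyphen placed last, and a leading escaped period so that each repetition starts with a literal dot; `is_extension` is a hypothetical helper name.

```python
import re

# Assumed variant: a literal period followed by one or more allowed
# characters, repeated one or more times.
EXTENSION = re.compile(r"(\.[\w.~\/?=#-]+)+")


def is_extension(text: str) -> bool:
    """Hypothetical helper: True if the whole text fits the extension part."""
    return EXTENSION.fullmatch(text) is not None


print(is_extension(".com"))                 # True
print(is_extension(".com/search?q=regex"))  # True
print(is_extension("com"))                  # False: no leading period
print(is_extension(".com page"))            # False: spaces are not allowed
```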

Now finalising our Regex pattern for the URL - ^(((https:\/\/)|(http:\/\/))(www\d*\.)?[\w.~\/#-]+(\.[\w.~\/?=#-]+)+)$
Here the caret (^) and the dollar sign ($) are special metacharacters that mark the start and the end of the line.
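
To try the whole thing out, here is a minimal Python sketch of a validator. It uses a variant of the final pattern that assumes the periods are escaped, the www group is optional, and the hyphen sits last in each character class; the name `is_valid_url` is invented for this example.

```python
import re

# Assumed variant of the final pattern: scheme, optional www group,
# domain part, then one or more ".something" parts.
URL_PATTERN = re.compile(
    r"^(((https:\/\/)|(http:\/\/))(www\d*\.)?[\w.~\/#-]+(\.[\w.~\/?=#-]+)+)$"
)


def is_valid_url(url: str) -> bool:
    """Hypothetical helper: True if url matches the pattern above."""
    return URL_PATTERN.match(url) is not None


urls = [
    "https://www.google.com",
    "https://todoist.com",
    "https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html",
    "ftp://example.com",   # wrong scheme
    "https://google",      # no extension
    "www.google.com",      # no scheme
]
for url in urls:
    print(url, is_valid_url(url))
```

The first three URLs are accepted and the last three are rejected. A pattern like this is a simple format check for form input, not a full URL parser.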

That's all for now folks! 🎉

This might not be the best Regex for all use cases, so let me know what yours is.

Keep learning and sharing! 🚀
