DEV Community

Muhammad

Posted on • Originally published at muhammadraza.me on

Using Regex To Extract Links.

Did you know we can use this regular expression to extract links?

(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+

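This matches the URLs in a piece of text, so a short Python script can extract them. Because the scheme part is optional, it also matches bare addresses such as example.com, along with any other dotted token.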

import re

text = "<CONTAINING URLS>"
# A raw string passes the regex backslashes through to `re` unchanged.
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)
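
To see what the pattern actually returns, here is a small sketch run on made-up input. The sample strings and the `extract_links` helper name are assumptions for illustration, not part of the original post. It also shows one side effect of the optional scheme: a dotted token such as a version number is picked up as a "link" too.

```python
import re

# Pattern from the post, compiled once from a raw string.
LINK_RE = re.compile(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+')


def extract_links(text):
    """Return every substring of `text` matched by the pattern (hypothetical helper)."""
    return LINK_RE.findall(text)


# Invented sample input, not taken from the post.
sample = "See https://example.com/docs?page=2 and ftp://files.example.org/pub for details"
print(extract_links(sample))
# ['https://example.com/docs?page=2', 'ftp://files.example.org/pub']

# The scheme is optional, so a plain dotted token matches as well.
print(extract_links("Released v1.2 today"))
# ['v1.2']
```

If only full URLs are wanted, one option is to drop the `?` after the scheme group so that `http://`, `https://`, or `ftp://` becomes mandatory.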

Latest comments (2)

Sundeep

github.com/madisonmay/CommonRegex would be better suited for such tasks. It has methods for extracting links, times, dates, phone numbers, etc.

Ben Sinclair

Or why it matches ftp (so we're not just talking web addresses) but not any other schemes, and how to expand it to do so?