
Using Regex To Extract Links.

Muhammad (mraza007) ・ Originally published at muhammadraza.me ・ 1 min read

Did you know we can use this regular expression to extract links?

(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+

This matches the URLs in a piece of text, with or without an http, https, or ftp scheme, so we can write a short Python script to extract them.

import re

text = "<CONTAINING URLS>"
# Raw string, so the backslashes reach the regex engine unchanged
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)
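
To see what the pattern actually picks up, here is a minimal sketch run on a made-up sample string. The names `URL_PATTERN`, `sample`, and `matches` are just illustrative, and the sample text is an assumption, not something from a real file.

```python
import re

# Same pattern as above, compiled once and written as a raw string.
URL_PATTERN = re.compile(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+')

# Hypothetical sample text, made up for illustration.
sample = "Docs at https://example.com/docs?page=2 and ftp://files.example.org/pub, pi is 3.14"

# findall returns the full match here because the pattern has no capturing groups.
matches = URL_PATTERN.findall(sample)
print(matches)
# ['https://example.com/docs?page=2', 'ftp://files.example.org/pub', '3.14']
```

Note that `3.14` is matched too: the scheme is optional, so anything of the form "characters, dot, characters" counts. If that is too loose for your input, filtering the results afterwards is probably the simplest fix.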

Discussion


Or why it matches ftp (so we're not just talking web addresses) but not any other schemes, and how to expand it to do so?
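
One possible way to expand it, as a rough sketch: build the scheme part of the pattern from a list instead of hard-coding `https?|ftp`. The helper `build_url_pattern`, the constant `BODY`, and the sample text are hypothetical names and data made up for this example. The rest of the pattern is kept as in the post.

```python
import re

# Character class reused from the post's pattern for host, path, and query.
BODY = r"[\w/\-?=%.]+"

def build_url_pattern(schemes=("http", "https", "ftp")):
    """Compile the post's pattern with a configurable set of schemes (hypothetical helper)."""
    # Longest first, so "https" is tried before "http".
    ordered = sorted(schemes, key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in ordered)
    return re.compile(rf"(?:(?:{alternatives})://)?{BODY}\.{BODY}")

# Hypothetical sample text, made up for illustration.
text = "clone ssh://git.example.com/repo or fetch sftp://files.example.org/pub"

default_urls = build_url_pattern().findall(text)
extended_urls = build_url_pattern(("http", "https", "ftp", "sftp", "ssh")).findall(text)

print(default_urls)   # ['//git.example.com/repo', 'ftp://files.example.org/pub']
print(extended_urls)  # ['ssh://git.example.com/repo', 'sftp://files.example.org/pub']
```

With the default schemes the unknown ones are not rejected, they are just cut off: `ssh://` loses its scheme and `sftp://` is matched from `ftp://` onward. Listing the schemes explicitly avoids that, at the cost of having to maintain the list.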

 

github.com/madisonmay/CommonRegex would be better suited for such tasks. It has methods for extracting links, times, dates, phone numbers, etc.