DEV Community

Muhammad

Posted on • Originally published at muhammadraza.me on

Using Regex To Extract Links.

Did you know that we can use the following regular expression to extract links?

(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+
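To make the pattern easier to read, here is a minimal sketch that spells out the same expression with Python's re.VERBOSE flag, one piece per line. The names LINK_PATTERN, ORIGINAL and sample are only illustrative, and the sample text is made up.

```python
import re

# The same pattern, written with re.VERBOSE so each part can carry a comment.
LINK_PATTERN = re.compile(r"""
    (?:(?:https?|ftp)://)?   # optional scheme: http://, https:// or ftp://
    [\w/\-?=%.]+             # one or more word characters or / - ? = % .
    \.                       # a literal dot
    [\w/\-?=%.]+             # one or more of the same characters again
""", re.VERBOSE)

# The one-line form from the article, for comparison.
ORIGINAL = re.compile(r"(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+")

# Made-up sample text (assumption, not from the article).
sample = "docs at https://example.com/a?b=1 and ftp://files.example.org/pub"
print(LINK_PATTERN.findall(sample))
# ['https://example.com/a?b=1', 'ftp://files.example.org/pub']
```

Because every group is non-capturing, findall returns the whole match for each link rather than just the scheme.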

This pattern matches the URLs in a piece of text, so a short Python script using re.findall can extract them:

import re

text = "<CONTAINING URLS>"
# Raw string, so the backslashes reach the regex engine unchanged.
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)
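It may help to see what the pattern does on ordinary prose. The scheme is optional and the dot is part of the allowed characters, so file names such as readme.txt also match, and a full stop that ends a sentence gets pulled into the link. The sketch below is one possible way to tidy that up; the helper extract_links, its require_scheme parameter and the sample text are hypothetical and not part of the original script.

```python
import re

# Same pattern as in the article, compiled once.
URL_RE = re.compile(r"(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+")

def extract_links(text, require_scheme=False):
    """Return matches of URL_RE with trailing dots from sentence endings removed."""
    links = [match.rstrip(".") for match in URL_RE.findall(text)]
    if require_scheme:
        # Keep only matches that start with one of the schemes the pattern knows.
        schemes = ("http://", "https://", "ftp://")
        links = [link for link in links if link.startswith(schemes)]
    return links

# Made-up sample text (assumption, not from the article).
sample = "Read https://example.com/docs. Mirror: ftp://example.org/pub, notes in readme.txt"

print(extract_links(sample))
# ['https://example.com/docs', 'ftp://example.org/pub', 'readme.txt']
print(extract_links(sample, require_scheme=True))
# ['https://example.com/docs', 'ftp://example.org/pub']
```

Filtering after the match keeps the regular expression itself unchanged, which seems simpler than making the scheme mandatory inside the pattern.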

Discussion (3)

Ben Sinclair

Or why it matches ftp (so we're not just talking web addresses) but not any other schemes, and how to expand it to do so?

Sundeep

github.com/madisonmay/CommonRegex would be better suited for such tasks. It has methods for various tasks, such as extracting links, times, dates, phone numbers, etc.