Using Regex To Extract Links.

#todayilearned #learn #python #beginners

Did you know we can use this regular expression to extract links?

(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+

This matches the URLs in a piece of text, so we can write a short Python script to extract them.

import re

text = "<CONTAINING URLS>"
# Raw string (r'...') so the backslashes reach the regex engine unchanged
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)
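To see what the pattern actually picks up, here is a small sketch run against a made-up sample string. The names `URL_PATTERN` and `sample` and the example addresses are only assumptions for illustration; they are not part of the original snippet.

```python
import re

# Same pattern as above, compiled once so it can be reused.
URL_PATTERN = re.compile(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+')

# Hypothetical sample text, made up for this example.
sample = "Docs at https://example.com/docs?page=2 and files at ftp://files.example.org/pub"

urls = URL_PATTERN.findall(sample)
print(urls)
# ['https://example.com/docs?page=2', 'ftp://files.example.org/pub']

# The scheme part is optional, so any dotted token matches too,
# including things that are not links at all.
print(URL_PATTERN.findall("see notes.txt"))
# ['notes.txt']
```

Since the scheme is optional, the pattern is quite permissive, so it is probably worth filtering the results if only real web addresses are wanted.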

Latest comments (2)

Sundeep • Oct 8 '19

github.com/madisonmay/CommonRegex would be better suited for such tasks. It has methods for various tasks like extracting links, times, dates, phone numbers, etc.

Ben Sinclair • Oct 8 '19

Or why it matches ftp (so we're not just talking web addresses) but not any other schemes, and how to expand it to do so?