Did you know we can use this regular expression to extract links?
(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+
This matches the URLs in a text, and we can write a short Python script to extract them. Because the scheme part is optional, it also matches other dotted tokens such as file names.
import re

text = "<CONTAINING URLS>"
# Raw string so the regex backslashes are passed through unchanged
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)
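To see what the pattern actually picks up, here is a small sketch run against a made-up sample string. The sample text and the names `URL_PATTERN`, `sample`, and `matches` are assumptions for illustration, not part of the original script.

```python
import re

# Same pattern as in the post, compiled once as a raw string.
URL_PATTERN = re.compile(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+')

# Hypothetical sample text: two real URLs and one plain file name.
sample = (
    "Docs at https://example.com/docs?page=2 and files at "
    "ftp://files.example.org/pub, see notes.txt"
)

# findall returns whole matches because the pattern has no capturing groups.
matches = URL_PATTERN.findall(sample)
print(matches)
# ['https://example.com/docs?page=2', 'ftp://files.example.org/pub', 'notes.txt']
```

Note that notes.txt is returned too: with the scheme optional, anything of the form "something.something" counts as a match, so the results may need filtering afterwards.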
Latest comments (2)
github.com/madisonmay/CommonRegex would be better suited for such tasks. It has methods for extracting links, times, dates, phone numbers, etc.
Or why it matches ftp (so we're not just talking web addresses) but not any other schemes, and how to expand it to do so?
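On expanding it to other schemes, one possible approach is to swap the fixed `https?|ftp` alternation for a generic scheme prefix. The sketch below assumes the scheme shape from RFC 3986 (a letter followed by letters, digits, `+`, `.`, or `-`); the names `ORIGINAL`, `GENERIC`, and the sample string are hypothetical and not from the post.

```python
import re

# The pattern from the post: only http, https and ftp are recognised as schemes.
ORIGINAL = re.compile(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+')

# Assumed variation: any RFC 3986-style scheme followed by "://".
GENERIC = re.compile(
    r'(?:[A-Za-z][A-Za-z0-9+.\-]*:\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+'
)

# Hypothetical sample with schemes the original pattern does not list.
sample = "ssh://git.example.com/repo and ws://chat.example.net/socket"

original_matches = ORIGINAL.findall(sample)
generic_matches = GENERIC.findall(sample)

print(original_matches)
# ['//git.example.com/repo', '//chat.example.net/socket']
print(generic_matches)
# ['ssh://git.example.com/repo', 'ws://chat.example.net/socket']
```

With the original pattern the unknown scheme is cut off, because ":" is not in the character class and the match restarts at the slashes. This variation still only covers schemes written with "://", so forms like mailto: would need separate handling.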