DEV Community

Anthony Slater

Refactoring Verify-URL

abhaseen / Verify-URL

A Python script that verifies the return code of URLs.

The first thing I noticed was that the entire program was written inside the main() function. To help me understand how it worked, I decided to refactor Verify-URL into smaller functions.

I split main() into the following functions:

  • get_args()
  • check_args()
  • get_urls()
  • verify_url()
  • print_status()
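
As a rough illustration of that split, here is a minimal sketch of how the five functions might fit together. Everything in it is an assumption made for the example rather than the project's actual code: the single positional `target` argument, the function signatures, the placeholder body of get_urls(), and the use of the standard-library urllib in place of whatever HTTP client the script really uses.

```python
import argparse
import sys
import urllib.error
import urllib.request


def get_args(argv=None):
    """Parse the command line. The single `target` argument is an assumption."""
    parser = argparse.ArgumentParser(description="Verify the return code of URLs.")
    parser.add_argument("target", help="a URL, or a file containing links")
    return parser.parse_args(argv)


def check_args(args):
    """Stop early if the input is obviously unusable."""
    if not args.target.strip():
        sys.exit("error: empty target")
    return args


def get_urls(target):
    """Return the URLs to check.

    Placeholder: treats the target as one URL. The real tool also
    extracts links from a file.
    """
    return [target]


def verify_url(url, timeout=5):
    """Return the HTTP status code for `url`, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError):
        # Unreachable host, timeout, or a string that is not a valid URL.
        return None


def print_status(url, status):
    """Print one result line and return it so it can be inspected."""
    line = f"{status} {url}"
    print(line)
    return line


def main(argv=None):
    args = check_args(get_args(argv))
    for url in get_urls(args.target):
        print_status(url, verify_url(url))
```

With this layout each step can be exercised on its own, for example calling get_urls() directly on a test file without touching the network.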

Everything looked good and worked as expected... except when I supplied a filename.

$ python verify_url.py src.html
Invalid URL '': No schema supplied. Perhaps you meant http://?

I traced the error to get_urls(), where it looked as though None was being appended to what was supposed to be a list of strings. url.get('href') returns None for any <a> tag that has no href attribute.

for url in soup.find_all('a'):
  urls.append(url.get('href'))
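
To see how a None can end up in the list, here is a small stand-alone sketch. It uses the standard library's html.parser as a stand-in for the parser in the snippet above, and the AnchorCollector class and sample HTML are made up for the demonstration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, with no filtering at all."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # dict.get returns None when the attribute is missing.
            self.urls.append(dict(attrs).get("href"))


collector = AnchorCollector()
collector.feed('<a href="https://example.com">ok</a><a name="top">no href</a>')
print(collector.urls)  # ['https://example.com', None]
```

The second anchor has no href, so a None sits in the list next to the real URL.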

To fix this, I cast every item to a string and kept only those that start with 'http':

for url in soup.find_all('a'):
  href = str(url.get('href'))
  if href.startswith('http'):
    urls.append(href)
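
The cast works because str(None) is the string 'None', which fails the 'http' prefix check. A possible alternative, sketched below, tests the type explicitly instead of casting. The keep_http_links() helper and the sample values are hypothetical and not part of Verify-URL.

```python
def keep_http_links(hrefs):
    """Keep only values that are strings starting with 'http'."""
    return [
        href for href in hrefs
        if isinstance(href, str) and href.startswith("http")
    ]


candidates = [
    "https://example.com",
    None,                      # an <a> tag with no href
    "",                        # an empty href
    "#top",                    # a same-page anchor
    "mailto:me@example.com",
    "http://example.org/page",
]
print(keep_http_links(candidates))
# ['https://example.com', 'http://example.org/page']
```

Either way, relative links such as "/about" are dropped too, which seems reasonable for a tool that requests each URL directly.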

I ran the tool again from the command line and supplied a file to check. This time it ran smoothly and printed a colourful list.

I submitted a pull request, so hopefully these changes pass the author's test! That wasn't supposed to rhyme...
