DEV Community

Anthony Slater


Concurrent Futures and Showing Progress

I made a few updates to Gustavo: one major, one minor, and some refactoring.

GitHub: slaterslater / gustavo

Print a colourized report of all HTTP urls in a file

The main issue I wanted to address was performance. I found that Gus took far too long to make HTTP requests compared to other similar tools. I didn't add any timing code to get precise measurements, but it felt like a list of 800 URLs could take anywhere from 5 to 8 minutes to process. Gus checked a list of URLs one after the other, and I wanted them checked (practically) all at once.

This project is one of my first times working with Python, and I was really impressed with how easy it was to implement threading. After importing concurrent.futures, I found that a list comprehension made for an elegant solution.

with concurrent.futures.ThreadPoolExecutor() as executor:
  results = [executor.submit(get_status, url) for url in urlist]

That's all it took to dramatically improve the performance. The first line creates a ThreadPoolExecutor, which manages a pool of worker threads; leaving the with block waits for all submitted work to finish. The second line is a list comprehension which submits get_status() for each URL in the list and collects the Future objects that submit() returns.
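
Since I never measured the difference, here is a minimal sketch of how it could be timed. It is not Gus's actual code: get_status() below is a hypothetical stand-in that sleeps instead of making a request, and the "GOOD" description, the example.com URLs, and the helper names are assumptions made for illustration.

```python
import concurrent.futures
import time


def get_status(url):
    """Hypothetical stand-in for Gus's get_status(): sleeps instead of making a request."""
    time.sleep(0.05)  # pretend the network round trip takes 50 ms
    return {"url": url, "code": 200, "desc": "GOOD"}


def check_sequential(urlist):
    # One request at a time: total time is roughly the sum of all the waits.
    return [get_status(url) for url in urlist]


def check_threaded(urlist):
    # submit() returns a Future immediately; leaving the with block waits
    # until every submitted call has finished.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(get_status, url) for url in urlist]
    # Reading results in submission order keeps them aligned with urlist.
    return [future.result() for future in futures]


def timed(func, urlist):
    # perf_counter() is a monotonic clock intended for measuring intervals.
    start = time.perf_counter()
    results = func(urlist)
    return results, time.perf_counter() - start


urlist = [f"https://example.com/{n}" for n in range(20)]
seq_results, seq_seconds = timed(check_sequential, urlist)
thr_results, thr_seconds = timed(check_threaded, urlist)
print(f"sequential: {seq_seconds:.2f}s, threaded: {thr_seconds:.2f}s")
```

The speedup comes from overlapping the time spent waiting on the network, so it is bounded by the number of worker threads in the pool rather than growing without limit.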

During processing, the iterative version of Gus would update the console with a print statement, e.g. "Checking URL 123 of 456." This didn't make much sense in the threaded version of Gus, as it would print "Checking URL 456 of 456." in a fraction of a second and then wait for the get_status() calls to complete. I thought a progress bar would be a nice touch.
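
To see why the old counter raced ahead, here is a small sketch under the same assumptions as before (a hypothetical get_status() that sleeps, and made-up example.com URLs). Counting inside the submit loop only measures how fast work is queued; counting inside concurrent.futures.as_completed() follows the requests as they actually finish.

```python
import concurrent.futures
import sys
import time


def get_status(url):
    """Hypothetical stand-in for a real HTTP check."""
    time.sleep(0.01)
    return {"url": url, "desc": "GOOD"}


def check_with_progress(urlist, stream=sys.stdout):
    total = len(urlist)
    statuses = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # This line finishes almost instantly: submit() only queues the work.
        futures = [executor.submit(get_status, url) for url in urlist]
        # as_completed() yields each future as it finishes, so this counter
        # tracks real progress instead of queueing.
        completed = concurrent.futures.as_completed(futures)
        for done, future in enumerate(completed, start=1):
            statuses.append(future.result())
            # "\r" returns to the start of the line so the counter overwrites itself.
            stream.write(f"\rChecked URL {done} of {total}")
            stream.flush()
    stream.write("\n")
    return statuses


statuses = check_with_progress([f"https://example.com/{n}" for n in range(10)])
```

Note that as_completed() yields results in completion order, not submission order, so statuses may not line up with the input list.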

GitHub: verigak / progress

Easy to use progress bars for Python

In the end I used a Spinner, which advances its animation each time a request completes. I noticed that once the Spinner finished, it left itself on the console line while the program continued. I read through the source code and found that the Spinner class has a writeln() method which takes a string. Passing it the ANSI escape sequence "\033[F" moves the cursor back up to the previous line. Thank you very much, Stack Overflow, for answering the question: Is there go up line character?

The updated function looks like this:

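import concurrent.futures
from progress.spinner import Spinner
# get_status() and the out formatters come from the project's own modules (urls.py, out.py)

def process_list(urlist, wanted, output):
  processed = []
  # pick the formatter that matches the requested output type
  formatted = out.rtf_format if output == 'rtf' else out.json_format if output == 'json' else out.std_format
  spinner = Spinner('Checking URLs ')
  with concurrent.futures.ThreadPoolExecutor() as executor:
    # submit every URL up front; each call returns a Future right away
    results = [executor.submit(get_status, url) for url in urlist]
    # handle each result as soon as its request finishes
    for connection in concurrent.futures.as_completed(results):
      status = connection.result()
      if status['desc'] in wanted:
        processed.append(formatted(status))
      spinner.next()
  spinner.writeln("\033[F")  # move cursor to the beginning of previous line
  spinner.finish()
  return processed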
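
The "\033[F" trick is not specific to the progress library. As a rough sketch, assuming a terminal that understands ANSI escape sequences, the same cleanup can be done with plain writes. The names CURSOR_UP, ERASE_LINE and clear_previous_line() are made up for this example.

```python
import sys

CURSOR_UP = "\033[F"   # CSI F: move the cursor to the start of the previous line
ERASE_LINE = "\033[K"  # CSI K: erase from the cursor to the end of the line


def clear_previous_line(stream=sys.stdout):
    """Move up one line and blank it, so the next write reuses that line."""
    stream.write(CURSOR_UP + ERASE_LINE)
    stream.flush()


print("Checking URLs |")  # a leftover status line
clear_previous_line()     # go back up to that line and erase it
print("Report:")          # lands where the status line used to be
```

If the output is redirected to a file, the escape codes end up in the file as raw characters, so a real tool might want to check sys.stdout.isatty() first.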

The remaining changes I made were to refactor gus.py into smaller related files:

| File     | Purpose                                    |
|----------|--------------------------------------------|
| gus.py   | creates a Gus class                        |
| const.py | any constant values needed for processing  |
| args.py  | handles all command line argument parsing  |
| urls.py  | all HTTP checking functions                |
| out.py   | output formatting functions                |
