For the past several weeks, I've been working on Gustavo, a URL-checking command-line tool written in Python. I wanted to gain some experience forking repos, setting up remotes, and merging branches, so I got permission to work on a new feature for checkThatLink: allowing -i or --ignore to be passed along with a path to a file. The program should open the file, read its contents, and, if any URLs are present, omit their domains from the HTTP status-checking process.
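As a rough illustration, here is one way such an option could be declared with Python's standard argparse module. Only the -i/--ignore flag comes from the feature description; the hypothetical buildParser() function, the positional file argument, and the help strings are assumptions, not checkThatLink's actual code.

```python
import argparse


def buildParser():
    """Hypothetical parser; only the -i/--ignore option comes from the feature description."""
    parser = argparse.ArgumentParser(description='Check the HTTP status of URLs in a file.')
    # Positional argument: the file whose URLs get checked (assumed name)
    parser.add_argument('file', help='file containing URLs to check')
    # New option: path to a file whose URLs' domains should be skipped
    parser.add_argument('-i', '--ignore', metavar='IGNORE_FILE',
                        help='file listing URLs whose domains will not be checked')
    return parser


args = buildParser().parse_args(['links.txt', '-i', 'ignore.txt'])
print(args.file, args.ignore)  # links.txt ignore.txt
```

If -i is left off, args.ignore defaults to None, so a simple truthiness check on the ignore path is enough to skip the feature entirely.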
To begin, I needed to understand how this program worked before I could add any features. It just so happens that Gustavo and checkThatLink are very similar in purpose and language. I'm by no means an expert in Python, but I felt pretty comfortable reading through the source code. As I walked through the program, I determined it followed these steps:
1. Create a parsed command-line arguments object and pass it into a new checkFile object: checkFile(args)
2. checkFile initializes all of its member variables from the args object
3. checkFile runs its main function, checkThatFile()
4. Each line of the source file is inspected to see if it contains a URL
5. An HTTP connection is made for each URL requesting the HEAD, and the URL and its response status are appended to a list of dictionaries, allLinks
6. A series of conditional statements determines which function should generate the output from allLinks
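A minimal sketch of steps 2 through 5, assuming the parsed arguments expose the source file as args.file. The class and method names checkFile, checkThatFile, and allLinks come from the steps above; the URL regex, the getStatus helper, and the 'url'/'status' dictionary keys are hypothetical stand-ins, not checkThatLink's actual code.

```python
import re
import urllib.request
from urllib.error import HTTPError

# Hypothetical URL pattern; checkThatLink's real regex may differ
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


class checkFile:
    """Rough sketch of steps 2-5 above, not checkThatLink's actual code."""

    def __init__(self, args):
        # Step 2: copy what we need from the parsed arguments (assumes args.file)
        self.filename = args.file
        self.allLinks = []

    def getStatus(self, url):
        # Step 5: send a HEAD request; return the status code, or None if unreachable
        request = urllib.request.Request(url, method='HEAD')
        try:
            with urllib.request.urlopen(request, timeout=5) as response:
                return response.status
        except HTTPError as err:
            # 4xx/5xx responses are raised as HTTPError but still carry a code
            return err.code
        except (OSError, ValueError):
            # DNS failures, refused connections, timeouts, malformed URLs
            return None

    def checkThatFile(self):
        # Step 3: the main routine
        with open(self.filename) as src:
            for line in src:
                # Step 4: look for URLs on each line
                for url in URL_PATTERN.findall(line):
                    # Step 5: record the URL together with its response status
                    self.allLinks.append({'url': url, 'status': self.getStatus(url)})
        # Step 6 (choosing an output function) is omitted here
        return self.allLinks
```

Keeping the HEAD request in its own method makes it easy to swap out, for example to fake responses in a test without touching the network.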
I didn't want the program to waste time checking a link that would later be omitted from the output, so I decided to insert my check between steps 4 and 5.
I first created a function that would open the provided file of URLs to ignore, use a regular expression to pick out all the valid URLs, and return an ignoreList of domains.
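To show the URL-to-domain step on its own, here is a small sketch that works on the ignore file's text rather than the file itself. The helper name getIgnoreDomains is hypothetical; it reuses the same line-anchored URL pattern as getIgnoreList and assumes "domain" means the URL's host name as reported by urllib.parse.urlparse.

```python
import re
from urllib.parse import urlparse


def getIgnoreDomains(text):
    """Hypothetical helper: return the set of domains listed in an ignore file's text."""
    # Lines that start with http:// or https:// (the pattern getIgnoreList uses)
    urls = re.findall(r'^https?://.*[^\s/]', text, flags=re.MULTILINE)
    # Reduce each URL to its host name, e.g. 'https://example.com/a' -> 'example.com'
    hosts = (urlparse(url).hostname for url in urls)
    return {host for host in hosts if host}


sample = '# sites to skip\nhttps://example.com/page\nhttp://foo.org/\nnot a url\n'
print(sorted(getIgnoreDomains(sample)))  # ['example.com', 'foo.org']
```

Returning a set rather than a list makes the later membership check constant-time and drops duplicate domains for free.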
Next, I added a condition before step 5 so that each URL's domain would be checked against the domains in the ignoreList.
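A sketch of what that condition could look like, assuming the ignore list has already been reduced to a set of lowercase host names. The isIgnored function is a hypothetical name, and comparing host names exactly is an assumption: subdomains such as sub.example.com are not treated as matching example.com.

```python
from urllib.parse import urlparse


def isIgnored(url, ignoreDomains):
    """Hypothetical check run between steps 4 and 5, before any HEAD request."""
    # Compare only the host name, so paths, ports and letter case don't matter
    return urlparse(url).hostname in ignoreDomains


urls = ['https://example.com/a', 'https://Example.com:8080/b', 'https://keep.org/c']
ignoreDomains = {'example.com'}
toCheck = [url for url in urls if not isIgnored(url, ignoreDomains)]
print(toCheck)  # ['https://keep.org/c']
```

Filtering before the request, rather than after it, is what saves the network round trip for every ignored link.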
At this point, I thought I was finished. I had the author review my work before merging, and I was told that some updates were needed. The issue requirements stated that if the program receives an ignore file that contains neither comments nor URLs, the program should exit. It was a pretty straightforward update to make:
- I added another regular expression to find comments.
- If neither regular expression (URLs or comments) finds a match, the program exits.
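```python
import re
import sys

# Method of the checkFile class
def getIgnoreList(self, ignoreFile):
    found = []
    try:
        if ignoreFile:
            with open(ignoreFile) as src:
                text = src.read()
                # URLs to ignore: lines starting with http:// or https://
                found = re.findall(r'^https?://.*[^\s/]', text, flags=re.MULTILINE)
                # Comments: lines starting with #
                comment = re.search(r'^#.*', text, flags=re.MULTILINE)
                # Exit if the file contains neither URLs nor comments
                if not found and not comment:
                    sys.exit(1)
                return found
        return found
    except (OSError, UnicodeDecodeError):
        # Only file errors land here, so the sys.exit(1) above is not swallowed
        print(f'Error with {ignoreFile}')
        sys.exit(1)
```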
The update above passed the test, and the code was merged upstream!
On the other side of the ball, a volunteer implemented the same ignore feature in Gustavo, and we worked through the bugs together. I found it pretty easy to make a new branch from the contributor's remote and make some fixes. I hope I wasn't too overbearing in this regard: rather than just pointing out each bug, I made suggestions and provided the code to fix it.
I did learn the hard way that it's a good idea to run git status or git branch to check where you are in your git tree before fetching or pulling. It wasn't too serious; I just got issue-8 and issue-8-fix mixed up. I'm getting more comfortable with git each week, but clearly I need more practice.