In this post I’ll talk about my experience building a command-line interface application with Ruby: generating a gem directory, scraping the data with Nokogiri, and putting together the user interface.
After several weeks of gradually levelling up my Ruby-wielding skills through Flatiron School, this was my first solo project: a command-line interface application. In the exercises leading up to it, there had been tests to pass or fail to tell me when I was headed in the right direction. But this time my only guides were my vision of the end result and these project requirements:
- Provide a CLI.
- CLI must provide access to data from a webpage.
- The data provided must go at least one level deep, generally by showing the user a list of available data and then being able to drill down into a specific item.
- Use good object-oriented design patterns.
Before I could write a line of code, I needed to pick a website to scrape. This would dictate the kind of data my application would provide access to, i.e., its purpose. And I needed to choose something I could scrape without too much difficulty.
Thankfully, I had a website in my back pocket that I suspected would work just fine. The Hack Design website provides lessons about design in various categories. Its pages lent themselves to the one-layer-deep model that was required. And they were rendered as static HTML, putting them just in the range of difficulty I was hoping for. Being able to access these lessons from the minimal environment of the command line struck me as a cool idea. So after a cursory assessment of whether I’d be able to scrape the content I needed, I decided to go for it.
I knew at the beginning that I wanted to be able to wrap my application in the self-contained, distributable format of a gem. Bundler made it easy to get started with a scaffold directory.
The first time you run the `bundle gem` command, you have the option of including a `LICENSE.txt`. I chose to include one. I updated my `.gemspec` file with details about my gem: a short summary of what my gem would do and a slightly more detailed description of the same. As I nudged its functionality forward, I added a short list of other gems it would depend on to function.
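A `.gemspec` along these lines captures those details. This is an illustrative sketch, not my actual file: the gem name, version, author, and file list here are placeholders.

```ruby
# hack_design.gemspec -- illustrative sketch; names and versions are placeholders
Gem::Specification.new do |spec|
  spec.name        = "hack_design"   # assumed gem name
  spec.version     = "0.1.0"
  spec.authors     = ["Your Name"]   # placeholder
  spec.summary     = "Browse Hack Design lessons from the command line."
  spec.description = "A CLI that scrapes hackdesign.org and lets you " \
                     "read design lesson content in the terminal."
  spec.license     = "MIT"           # whichever license you chose

  spec.files       = Dir["lib/**/*.rb"] + ["bin/hack-design"]
  spec.executables = ["hack-design"]

  # Runtime dependencies, added as the gem's functionality grew
  spec.add_dependency "nokogiri"
  spec.add_dependency "colorize"
end
```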
I started by writing code for a user interface that would mimic the intended behaviour of my application. The user would be greeted by a welcome message and options to view categories, view lessons, view a random lesson, or quit. I didn’t have any categories or lessons yet, so I threw in some fillers to start. This code lives in `lib/hack_design/cli.rb`. I would call it from an executable file, `bin/hack-design`, to run my program.
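The early menu loop looked roughly like this. It’s a simplified sketch with filler data; the actual method names and menu text in my `cli.rb` differ.

```ruby
module HackDesign
  class CLI
    CATEGORIES = ["Typography", "Color"]       # filler data for now
    LESSONS    = ["Lesson One", "Lesson Two"]  # filler data for now

    # Entry point called from bin/hack-design
    def call
      puts "Welcome to Hack Design!"
      loop do
        puts "\n1. View categories  2. View lessons  3. Random lesson  4. Quit"
        print "> "
        case $stdin.gets.strip
        when "1" then CATEGORIES.each { |c| puts c }
        when "2" then LESSONS.each_with_index { |l, i| puts "#{i + 1}. #{l}" }
        when "3" then puts LESSONS.sample
        when "4" then break
        else puts "Please enter 1, 2, 3, or 4."
        end
      end
      puts "Goodbye!"
    end
  end
end
```

The executable then just needs to require the library and run `HackDesign::CLI.new.call`.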
I didn’t have any tests, but I knew what I expected my code to do. Debugging was a matter of trial-and-error-ing my way to success. This technique would carry me through the development of my app.
I intended to make the data two layers deep, providing content (including exercises) from within a lesson, and lessons from within a category. Taking my cue from the UI I had built, I created classes to model a `Category`, a `Lesson`, an `Exercise`, and a `Scraper`. I used `./bin/console` to test-drive these classes by looking for expected behaviour. However, as I set out to teach my `Scraper` how to find and gather categories, life threw me a few curve balls, which effectively stalled my progress for about two weeks.
When I returned, I was anxious about the time I had lost. Closer investigation of the data from my `Scraper` soon revealed that scraping each category would prove more difficult than I had first imagined. So I thought about it. First and foremost, I wanted to provide users with lesson content. Was listing categories really essential to the purpose of the application? I decided not, and began to simplify.
I threw out the `Category` and `Exercise` classes. I was left with what I needed: a `Lesson` class whose objects would organize data about each lesson, a `Scraper` to gather this data from the website, and a `CLI` to manage the user interface. I refactored the UI to reflect the changes. It would list lessons to choose from, not categories. And the data was now just one layer deep.
With my UI up and running, it was time to supply it with real data. I knew it would take a few tries to get the search queries right. From my previous attempt at scraping, I also knew that too many requests would very likely get me blocked. I decided to solve this problem by copying the pages I’d be scraping into a fixtures folder. This would allow me to keep my requests to the live site to a minimum.
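Switching between fixtures and the live site can be as simple as a flag that decides whether to read a local file or make a request. A minimal sketch, assuming a `fixtures/site/lessons.html` path like the one described below:

```ruby
require "open-uri"

# During development, read the saved copy; flip USE_FIXTURES to false
# to make a real request against the live site.
USE_FIXTURES = true

def lessons_page_html
  if USE_FIXTURES
    File.read("fixtures/site/lessons.html")            # local copy made with curl
  else
    URI.open("https://hackdesign.org/lessons/").read   # live request, used sparingly
  end
end
```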
The way Hack Design is set up, all 51 lessons can be found listed on a lessons homepage. From that page, each lesson links to its own page. Copying down the source code of the homepage was as simple as using cURL to get the HTML and shovelling it into a new file.
```shell
curl http://hackdesign.org/lessons/ >> fixtures/site/lessons.html
```
The other 51 pages, however, would pose a greater challenge. There was no way I was going to navigate to each page and copy down hundreds of lines of HTML for 51 individual files. I wrote a little Bash script to do it instead.
```shell
#!/bin/bash
# get-lesson.sh
typeset -i i END
let END=50 i=0
echo "Script starting now…"
while ((i<=END)); do
  curl https://hackdesign.org/lessons/$i -O
  let i++
done
echo "Done"
```
Before long, I had all the files I needed to get to scraping. For this task, I used an HTML parser called Nokogiri, which can search documents using CSS. I used CSS selectors to zero in on the HTML elements containing the data I wanted. In keeping with the setup of the Hack Design website, the `Scraper` has two methods:
- `::scrape_lessons_page` scrapes the lessons homepage. It creates a hash for each lesson, adds the lesson title and URL to that hash, then adds that hash to an array of all the lessons.
- `::scrape_lesson` scrapes a lesson's content page. It adds the instructor name and other content for a particular lesson to a hash and returns the result.
The `CLI` is where the `Scraper` and `Lesson` classes join forces to produce `Lesson` objects that model individual lessons filled with content. Here, hashes act as the glue between the classes, carrying data from the `Scraper` to the `Lesson` and allowing them to work together while remaining independent and focused in purpose.
A title and a URL are all I need to create a `Lesson` object. I created 51 `Lesson` objects by iterating over the array of hashes from `::scrape_lessons_page`. Then I passed the URL of each `Lesson` object to `::scrape_lesson`, which returned a hash of data I could add to my ready-made `Lesson` objects. This process went smoothly until… it didn’t.
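That assembly step might look something like this. The attribute names and the `add_content` helper are my reconstruction, not the exact code:

```ruby
class Lesson
  attr_accessor :title, :url, :instructor, :exercises

  @@all = []

  # A title and a URL are all that's needed to create a Lesson.
  def initialize(title:, url:)
    @title = title
    @url   = url
    @@all << self
  end

  def self.all
    @@all
  end

  # Merge a hash of scraped content into a ready-made Lesson object.
  def add_content(hash)
    @instructor = hash[:instructor]
    @exercises  = hash[:exercises]
    self
  end
end

# Tying it together (Scraper calls shown as comments, since they hit the site):
# Scraper.scrape_lessons_page(html).each { |h| Lesson.new(title: h[:title], url: h[:url]) }
# Lesson.all.each { |l| l.add_content(Scraper.scrape_lesson(fetch(l.url))) }
```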
One of the pages was breaking my `Scraper`. Using Pry and some temporary tweaks to my code, I was able to track down the page that was causing trouble. It turns out lesson 41 has a list within one of its exercises. My `Scraper`, however, was identifying exercises as list items. On discovering this list within a list, it would classify the nested list item as another exercise and then crash when it didn't find the typical contents of an exercise inside. It needed a way to differentiate between an exercise and its content. To do this, I made the CSS selectors a little more specific. I had fun debugging this one.
`Lesson` objects made filling out the user interface wonderfully straightforward. I could just loop through all the `Lesson` objects and display the information they contained accordingly. I added a few methods to enable navigation from one lesson’s content to the next without returning to the list, and added a bit of color with the colorize gem. And then I was done. I had built a Ruby command-line application!
To wrap up, I packaged, installed, and tested my gem. Bundler makes packaging and releasing your gem as easy as it makes getting started. I chose not to release my gem, but I have included instructions on how to package and install it locally. Check out my code here.
Completing this project was an exercise in confronting fear of failure. Every step was absolutely worth it. I am proud of this simpler, final product and really excited about what’s next.
Thanks for taking the time to read this post. See you in the next one!
Cover Art: Mayumi Matsumoto