Syd

Nokogiri...CLI...Gems...Oh My!

My experience did not prepare me for this: Web Scraping! My latest endeavors in coding bootcamp have brought me to something completely different in a language I am still learning...and it was a challenge.

Honestly, web scraping was not the difficult part. Nor was finding the right CSS or figuring out what to do with the data. It was setting up my data & methods and then breaking my code so badly that I could not get it to work again before I had to submit it. self.ouch

Instead of wallowing in painful memories, I would like to discuss some of the items I used to build my first CLI application. Flatiron School introduced my cohort to Nokogiri. I know what you are thinking. No, it is not a tasty snack. The word actually translates to 'saw' as in hacksaw, handsaw, table saw, but not "I saw (past tense) dead people". It is actually a decent HTML and XML parser, which makes it handy for web scraping. It was easy to install and set up. Since it is widely used, there is a lot of great documentation on the web around it.

Setup: please start in your project directory.
In your terminal:
`gem install nokogiri`

Back in your editor
(in your Gemfile)
`gem "nokogiri"`

(in your scraper file)
`require 'nokogiri'`
`require 'open-uri'`

def nameofyourgetpagemethod
  # URI.open comes from open-uri; a bare open() no longer accepts URLs in Ruby 3.0+
  Nokogiri::HTML(URI.open("http://somepage.com"))
end

Here are some of my favorite links:

During the building of my CLI, I switched sites often as I felt I was not getting the "right" data I wanted to use. Fortunately, Nokogiri was able to handle any site I threw at it as long as I picked the right CSS selectors. I was able to use your average everyday CSS selectors or even table selectors. There was a bit of plug and play as I figured it out. Thank goodness for 'binding.pry'! I was tempted to say forget it and try getting data from an API instead; however, I was already halfway through.

My biggest challenge and the one that hurt me the most was gemifying my project. The day the project was due (soft due date) with moments to spare, I decided to refactor the code a bit to see if I could complete the extra challenge of turning my little thing into a Ruby Gem. Well...there is a reason we are always told to commit early and commit often. I did not complete the Gem challenge, but stay tuned...it is coming. For now, don't be scared to scrape a site for your own data needs. It honestly is not that bad.
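
For anyone curious what the gem part involves, the heart of it is a gemspec file. Below is a bare-bones sketch, not what my project ended up with: the gem name `my_cli_scraper`, the version, the author, and the file layout are all placeholders I am assuming for the example.

```ruby
# my_cli_scraper.gemspec (every name and value here is a placeholder)
Gem::Specification.new do |spec|
  spec.name        = "my_cli_scraper"
  spec.version     = "0.1.0"
  spec.summary     = "A small CLI that scrapes a site with Nokogiri"
  spec.authors     = ["Your Name"]

  # Ship the library code and the command-line executable
  spec.files       = Dir["lib/**/*.rb"] + Dir["bin/*"]
  spec.bindir      = "bin"
  spec.executables = ["my_cli_scraper"]

  # Nokogiri gets installed alongside the gem
  spec.add_dependency "nokogiri"
end
```

With a file like that in the project root, `gem build my_cli_scraper.gemspec` should package it up. And commit before you start refactoring!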

Making a Gem isn't that bad either
