I was working on a project that required scraping a web page, so I looked into the options and found Nokogiri.
Nokogiri is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors.
To get the document I used HTTParty.
HTTParty describes itself as a gem that "makes http fun" and makes consuming RESTful web services dead easy.
For this example, I will be scraping the RubyGems search page at https://rubygems.org/search?query=%s, where %s stands for the search term.
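As a small sketch of how the `%s` placeholder could be filled in, the snippet below uses only Ruby's standard library. The constant `SEARCH_URL` and the helper `search_url` are hypothetical names introduced for this illustration; escaping the term is an extra precaution so that spaces or special characters do not break the URL.

```ruby
require 'uri'

# Hypothetical constant holding the search URL template from the text.
SEARCH_URL = 'https://rubygems.org/search?query=%s'

# Hypothetical helper: escapes the search term and substitutes it for %s.
def search_url(term)
  format(SEARCH_URL, URI.encode_www_form_component(term))
end

puts search_url('yiya')          # https://rubygems.org/search?query=yiya
puts search_url('ruby on rails') # https://rubygems.org/search?query=ruby+on+rails
```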
Script
The final script is given below:
require 'httparty'
require 'nokogiri'
class RubygemsScrapper
  attr_accessor :parse_page
  # Fetch the rubygems.org search page for the query string q and parse it
  def initialize(q)
    doc = HTTParty.get("https://rubygems.org/search?query=#{q}")
    @parse_page = Nokogiri::HTML(doc.body)
  end
  # Return the first result's version, or -1 if not found
  def get_latest_version
    parse_page.css('.gems__gem').css('.gems__gem__version').children[0].text
  rescue StandardError
    -1 # not found
  end
  # Return the first result's link on rubygems.org, or -1 if not found
  def get_link
    "https://rubygems.org" + parse_page.css('.gems__gem').attribute('href').value
  rescue StandardError
    -1 # not found
  end
end
# Calling the scraper
scrapper = RubygemsScrapper.new('yiya')
p scrapper.get_latest_version
p scrapper.get_link
This class takes the name of the gem to search for and returns the latest version of the first result, along with a link to it.