Web Scraping Script in Ruby

Sulman Baig. Originally published at sulmanweb.com.

I was working on a project where I had to scrape a web page, so I looked into the options and found Nokogiri.
Nokogiri is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors.
To get the document I used HTTParty.
HTTParty's tagline says it "makes http fun" and makes consuming RESTful web services dead easy.
For this example, I will be scraping https://rubygems.org/search?query=%s, where %s is the search term.
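The search term ends up in the URL's query string, so it is worth percent-encoding it before the request. A small sketch using only Ruby's standard library follows; `search_url` is a hypothetical helper name chosen for this example.

```ruby
require 'uri'

# Hypothetical helper: build the search URL for a term,
# encoding the query value so spaces and symbols are safe.
def search_url(term)
  "https://rubygems.org/search?#{URI.encode_www_form(query: term)}"
end

puts search_url('yiya')  # => https://rubygems.org/search?query=yiya
puts search_url('a&b')   # => https://rubygems.org/search?query=a%26b
```

Without encoding, a term such as `a&b` would be read by the server as two separate parameters.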

Script

The final script is given below:

require 'httparty'
require 'nokogiri'

class RubygemsScrapper
  attr_accessor :parse_page

  # Fetch the RubyGems search page for the query string q and parse it
  def initialize(q)
    # Passing the term via :query lets HTTParty URL-encode it
    response = HTTParty.get('https://rubygems.org/search', query: { query: q })
    @parse_page = Nokogiri::HTML(response.body)
  end

  # Get the first result's version, or -1 if not found
  def get_latest_version
    parse_page.css('.gems__gem').css('.gems__gem__version').children[0].text
  rescue StandardError
    -1 # not found
  end

  # Get the first result's link on rubygems.org, or -1 if not found
  def get_link
    'https://rubygems.org' + parse_page.css('.gems__gem').attribute('href').value
  rescue StandardError
    -1 # not found
  end
end

# Call the scraper (kept outside the class so it runs after the class is defined)
scrapper = RubygemsScrapper.new('yiya')
p scrapper.get_latest_version
p scrapper.get_link

This class takes the name of the gem to search for and returns the first result's latest version and a link to it.

Sulman Baig

@sulmanweb

Software Engineer @ MailMunch. Experienced in Ruby on Rails, NodeJS and VueJS web application development. Working in industry for more than 7 years now.
