JaredHarbison
Meet Your New Queens

Rails + React + Redux - Pt 4


This post focuses on some of the more challenging tasks I encountered while scraping data from Fandom and wiring it into the schema established in the last post. In retrospect I would scrape seasons before queens, so I've accounted for that order here: this post defines the get_queens method, and get_seasons will follow in the next one. The gist is heavily commented!


Let's get started!


1. Def get_seasons in season.rb to scrape the list of season names from Fandom, concatenate each season name into an array of URLs for each Season's Fandom page, then iterate through the array to .create!() an instance of each Season.

__see previous post for the gist__
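The gist itself lives in the previous post, but as a rough sketch of the URL-building idea (the `season_urls` helper name and the assumption that season names map directly onto wiki paths are mine, not from the gist):

```ruby
# Hypothetical sketch: turn scraped season names into Fandom wiki URLs.
# The base URL is real; the helper name and the direct name-to-path
# mapping are illustrative assumptions.
BASE_URL = "https://rupaulsdragrace.fandom.com/wiki/"

def season_urls(season_names)
  season_names.map do |name|
    # wiki paths replace spaces with underscores
    BASE_URL + name.split(' ').join('_')
  end
end
```

Each resulting URL would then be opened with Nokogiri and the parsed page fed into Season.create!.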

2. Def get_queens in queen.rb to scrape the list of queens' names from Fandom, concatenate each queen's name into an array of URLs for each Queen's Fandom page, then iterate through the array to .create!() an instance of each Queen and her attributes (including associations for Quotes and Trivia).

require 'open-uri'

class Queen < ApplicationRecord
  has_many :trivia, dependent: :destroy
  has_many :quotes, dependent: :destroy
  has_many :appearances
  has_many :seasons, through: :appearances
  accepts_nested_attributes_for :trivia, :quotes, :seasons, :appearances

  def self.get_queens
    #### define the url containing the index of queens then open it with Nokogiri
    queen_index_url = "https://rupaulsdragrace.fandom.com/wiki/Category:Queens"
    queen_index_doc = Nokogiri::HTML(URI.open(queen_index_url))
    #### scrape an array of queens' names from the Nokogiri doc then remove the duplicates
    queen_list = queen_index_doc.css('.tabber').last.css('.thumbimage')
    cleaned_queen_list = queen_list.select.with_index { |_, i| i.even? }
    #### iterate through the US queens in the array and concatenate a url for each queen
    cleaned_queen_list[1..185].each do |queen|
      queen_url = "https://rupaulsdragrace.fandom.com/wiki/#{queen.attr("title")}"
      #### transliterate each url, run it through a regex, then open it with Nokogiri
      I18n.enforce_available_locales = false
      cleaned_queen_url = I18n.transliterate(queen_url).split(' ').join('_').gsub(/\(.+/, '')
      queen_doc = Nokogiri::HTML(URI.open(cleaned_queen_url))
      #### scrape each queen's seasons into an array for association to seasons through appearances
      queen_seasons = queen_doc.xpath(
        '//*[@data-source="Season"]/div[@class="pi-data-value pi-font"]/a')
        .map { |e| e.text.gsub(/[^0-9a-z%&!\n\/(). ]/i, '') }
      #### scrape each queen's biographical information
      queen_drag_name = queen_doc.css(
        "#mw-content-text > aside > section:nth-child(3) > div:nth-child(2) > div")
        .text.gsub(/[^0-9a-z%&!\n\/(). ]/i, '')
      queen_real_name = queen_doc.css(
        "#mw-content-text > aside > section:nth-child(3) > div:nth-child(3) > div")
        .text.split(' ').join(' ')
      #### scrape each queen's primary image (assumes the portable-infobox thumbnail class)
      queen_primary_image = queen_doc.css('.pi-image-thumbnail').attr('src')
      queen_date_of_birth = queen_doc.xpath(
        '//*[@data-source="birth year"]/div[@class="pi-data-value pi-font"]/text()')
        .text[0..-3]
      queen_ethnicity = queen_doc.xpath(
        '//*[@data-source="Ethnicity"]/div[@class="pi-data-value pi-font"]/text()')
        .map { |e| e.text.gsub(/[^0-9a-z%&!\n\/(). ]/i, '') }
      queen_current_city = queen_doc.xpath(
        '//*[@data-source="Current City"]/div[@class="pi-data-value pi-font"]/text()').text
      #### Fandom labels this "Birthplace" or "Hometown"; scrape for both and combine them into one
      variable_queen_birthplace = queen_doc.xpath(
        '//*[@data-source="Birthplace"]/div[@class="pi-data-value pi-font"]/text()').text
      variable_queen_hometown = queen_doc.xpath(
        '//*[@data-source="Hometown"]/div[@class="pi-data-value pi-font"]/text()').text
      queen_hometown = variable_queen_birthplace + variable_queen_hometown
      #### scrape each queen's social/web links; treat "Site" and "Official Website" as one
      queen_instagram = queen_doc.xpath('//a[text()="Instagram"]').attribute('href')
      queen_twitter = queen_doc.xpath('//a[text()="Twitter"]').attribute('href')
      queen_facebook = queen_doc.xpath('//a[text()="Facebook"]').attribute('href')
      queen_youtube = queen_doc.xpath('//a[text()="YouTube"]').attribute('href')
      queen_imdb = queen_doc.xpath('//a[text()="IMDB Page"]').attribute('href')
      queen_wikipedia = queen_doc.xpath('//a[text()="Wikipedia"]').attribute('href')
      variable_queen_site = queen_doc.xpath('//a[text()="Site"]').attribute('href')
      variable_queen_official_site = queen_doc.xpath('//a[text()="Official Website"]').attribute('href')
      queen_website = variable_queen_site || variable_queen_official_site
      #### scrape each queen_doc after any element that has the Trivia id until the end of the section,
      #### then run the trivia through a regex
      queen_trivia = queen_doc.xpath(
        '//*[preceding::*[@id="Trivia"] and following::*[@class="mw-headline"]]//following-sibling::li')
        .map { |e| e.text.split(' ').join(' ').gsub(/[^0-9a-z%&!\n\/(). ]/i, '') }
      #### scrape each queen's quotes under the various headings Fandom uses, then combine them:
      #### scrape each queen_doc after any element that has the matching id until the end of the section,
      #### then run the quotes through a regex
      variable_fandom_quotes = queen_doc.xpath(
        '//*[preceding::*[@id="Quotes"] and following::*[@class="mw-headline"]]//following-sibling::li')
        .map { |e| e.text.gsub(/[^0-9a-z%&!\n\/(). ]/i, '') }
      variable_memorable_quotes = queen_doc.xpath(
        '//*[preceding::*[@id="Memorable_Quotes"] and following::*[@class="mw-headline"]]//following-sibling::li')
        .map { |e| e.text.gsub(/[^0-9a-z%&!\n\/(). ]/i, '') }
      variable_memorable_quotes_C2A0 = queen_doc.xpath(
        '//*[preceding::*[@id="Memorable_Quotes.C2.A0"] and following::*[@class="mw-headline"]]//following-sibling::li')
        .map { |e| e.text.gsub(/[^0-9a-z%&!\n\/(). ]/i, '') }
      queen_quotes = variable_fandom_quotes + variable_memorable_quotes + variable_memorable_quotes_C2A0
      #### create an instance of a queen for each url, assigning all the appropriate attributes
      Queen.create!(
        drag_name: queen_drag_name,
        real_name: queen_real_name,
        primary_image: queen_primary_image,
        date_of_birth: queen_date_of_birth,
        hometown: queen_hometown,
        current_city: queen_current_city,
        ethnicity: queen_ethnicity,
        instagram: queen_instagram,
        twitter: queen_twitter,
        facebook: queen_facebook,
        youtube: queen_youtube,
        website: queen_website,
        imdb: queen_imdb,
        wikipedia: queen_wikipedia,
        #### for each instance of a queen, create instances of her trivia and quotes
        trivia_attributes: queen_trivia.map { |trivium| { content: trivium } },
        quotes_attributes: queen_quotes.map { |quote| { content: quote } }
        #### associations to Seasons through Appearances could be defined here using the scraped array of seasons,
        #### but I'll define those associations from within the get_seasons method using the array of contestants
      )
    end
  end
end
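One detail worth calling out: nearly every scraped string above runs through the same allowlist gsub. Pulled out as a standalone helper (the `sanitize` name is mine, for illustration; no Rails needed), it does this:

```ruby
# The sanitizing regex used throughout get_queens: it keeps letters,
# digits, the characters % & ! \n / ( ) . and spaces, and strips
# everything else (footnote daggers, smart quotes, stray markup).
def sanitize(text)
  text.gsub(/[^0-9a-z%&!\n\/(). ]/i, '')
end
```

Anything outside the allowlist simply disappears, so a scraped name like "Yvie Oddly†" comes back as "Yvie Oddly".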

3. With Seasons and Queens instantiated, iterate through the Seasons and .create!() an Appearance for each episode per Queen, with her appropriate appearance attributes.

__see the next post for the gist__
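That method arrives in the next post, but as a rough, hypothetical sketch of the shape of the data involved (plain Ruby; the helper and hash keys are made up for illustration, not from the actual gist), pairing each scraped contestant with each episode might look like:

```ruby
# Hypothetical sketch: given a season's scraped contestant names and its
# episode count, build one attributes hash per (queen, episode) pair.
def appearance_attributes(contestants, episode_count)
  contestants.flat_map do |queen_name|
    (1..episode_count).map do |episode|
      { queen_name: queen_name, episode: episode }
    end
  end
end
```

Each hash would then feed an Appearance.create! keyed to the matching Queen and Season records.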

That's all folks!
