Rails + React + Redux - Pt 3
This post covers some of the more challenging tasks I ran into while scraping data from Fandom and Wikipedia and wiring that data into the schema established in the last post. It focuses on defining the get_seasons method; the get_queens method will follow. The gists are heavily commented!
Let's get started!
1. Define get_seasons in season.rb to scrape the list of season names from Fandom and .create!() an instance of each Season, interpolate each season's number into the URL of that Season's Wikipedia page, then iterate through the array of URLs to create the Episodes for each Season.
require 'open-uri'

class Season < ApplicationRecord
  has_many :episodes
  accepts_nested_attributes_for :episodes

  def get_seasons
    I18n.enforce_available_locales = false

    #### define the location of the season index and open it with Nokogiri
    #### (URI.open comes from open-uri; a bare open(url) no longer works on Ruby 3)
    seasons_index_url = "https://rupaulsdragrace.fandom.com/wiki/Category:Seasons"
    seasons_index_doc = Nokogiri::HTML(URI.open(seasons_index_url))

    #### define an array of season names and instantiate an object for each name
    seasons_list = seasons_index_doc.xpath('//td[1]/a[1]').map { |season| season.text }
    seasons_list.each { |season| Season.create!(season_name: season) }

    #### define an array of urls for each season, distinguishing between the original series and All Stars
    seasons_urls = seasons_list.map do |season|
      if season.starts_with?("All Stars")
        #### take the single-digit season number from the end of the name to interpolate into the url
        all_stars_season = season[-1, 1]
        "https://en.wikipedia.org/wiki/RuPaul%27s_Drag_Race_All_Stars_(season_#{all_stars_season})"
      else
        #### "Season 10" and up are longer than 8 characters, so take the last two digits
        rpdr_season = season.length > 8 ? season[-2, 2] : season[-1, 1]
        "https://en.wikipedia.org/wiki/RuPaul%27s_Drag_Race_(season_#{rpdr_season})"
      end
    end

    #### iterate through the season urls to open each one with Nokogiri
    seasons_urls.each_with_index do |season_url, index|
      season_doc = Nokogiri::HTML(URI.open(season_url))
      #### assumes Season ids match creation order, starting at 1 on a fresh database
      season_id = index + 1

      #### scrape the season's episode header row and reject any empty cells
      season_episodes = season_doc.xpath('//*[@id="mw-content-text"]/div/table[3]/tbody/tr[1]/th/text() | //*[@id="mw-content-text"]/div/table[3]/tbody/tr[1]/th/b/text()').map { |episode| episode.text }
      season_episodes = season_episodes.reject { |episode| episode.length > 3 || episode.blank? }

      #### create a unique episode identifier (i.e. S4E10) to avoid future collisions
      season_episodes_codes = season_episodes.map { |episode| "S#{season_id}E#{episode}" }

      #### scrape the contestants list for easier appearance creation (not yet stored anywhere in this method)
      season_contestants = season_doc.xpath('//*[@id="mw-content-text"]/div/table[3]/tbody/tr/td[1]/b').map { |contestant| contestant.text.downcase }

      #### iterate through the episodes array to create each Episode
      season_episodes.each_with_index do |episode, episode_index|
        Episode.create(
          season_id: season_id,
          episode_name: episode,
          episode_code: season_episodes_codes[episode_index]
        )
      end
    end
  end
end
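The URL-building and episode-code steps in get_seasons are easy to try out on their own, without Rails or a network connection. Here is a minimal sketch of that logic as plain Ruby methods. The method names (season_number, wikipedia_season_url, episode_code) are my own for illustration and are not in the gist, and the regex is a swapped-in alternative to slicing by string length.

```ruby
# Hypothetical helper names; the gist does this inline inside get_seasons.
WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/"

# Pull the trailing digits off a season name, e.g. "Season 10" -> "10".
# Returns nil when the name does not end in a number.
def season_number(season_name)
  match = season_name.match(/(\d+)\z/)
  match && match[1]
end

# Build the Wikipedia URL for a season name, distinguishing All Stars
# from the original series.
def wikipedia_season_url(season_name)
  number = season_number(season_name)
  return nil if number.nil?

  if season_name.start_with?("All Stars")
    "#{WIKIPEDIA_BASE}RuPaul%27s_Drag_Race_All_Stars_(season_#{number})"
  else
    "#{WIKIPEDIA_BASE}RuPaul%27s_Drag_Race_(season_#{number})"
  end
end

# Combine a season id and an episode number into a code like "S4E10".
def episode_code(season_id, episode_number)
  "S#{season_id}E#{episode_number}"
end

puts wikipedia_season_url("Season 10")
puts wikipedia_season_url("All Stars 4")
puts episode_code(4, "10")
```

Matching on trailing digits means a two-digit All Stars season would also work, which the length check in the gist does not cover.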
2. Define get_queens in queen.rb to scrape the list of queens' names from Fandom, turn each queen's name into the URL of her Fandom page, then iterate through the array of URLs to .create!() an instance of each Queen and her attributes (including associations for Quotes and Trivia).
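The gist for get_queens is coming in a later post, but the name-to-URL step can be sketched now. This is only a rough illustration: it assumes Fandom page titles follow the usual MediaWiki convention of replacing spaces with underscores, and fandom_queen_url is a hypothetical helper name. Some queens may have page titles that do not match their names exactly, so treat the result as a best guess.

```ruby
require 'uri'

# Base URL taken from the seasons index used in get_seasons.
FANDOM_BASE = "https://rupaulsdragrace.fandom.com/wiki/"

# Hypothetical helper: turn a queen's name into her (assumed) Fandom URL.
# Spaces become underscores, and other special characters such as
# apostrophes are percent-encoded.
def fandom_queen_url(queen_name)
  title = queen_name.strip.tr(" ", "_")
  FANDOM_BASE + URI.encode_www_form_component(title)
end

queen_names = ["BeBe Zahara Benet", "Eureka O'Hara", "Ongina"]
queen_urls = queen_names.map { |name| fandom_queen_url(name) }
puts queen_urls
```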
3. With Seasons and Queens instantiated, iterate through the Seasons and .create!() an Appearance for each Queen in each Episode, along with the attributes for that appearance.
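To give a feel for step 3, here is a minimal sketch in plain Ruby of pairing each queen with each episode she appears in. Everything about the data shape is an assumption: progress_rows stands in for a scraped contestant table (queen name mapped to one result cell per episode), and the keys queen_name, episode_code and result are hypothetical, not the real Appearance columns.

```ruby
# Hypothetical helper: build one attribute hash per (queen, episode) pair.
# progress_rows maps a queen's name to her per-episode results, in episode order.
def build_appearances(season_id, episode_numbers, progress_rows)
  progress_rows.flat_map do |queen_name, results|
    # zip pads with nil when a queen has fewer results than there are episodes
    pairs = episode_numbers.zip(results)
    pairs.map do |episode_number, result|
      # A missing or blank cell is treated as "did not appear in this episode"
      next if result.nil? || result.strip.empty?

      {
        queen_name: queen_name.downcase, # downcased like season_contestants in get_seasons
        episode_code: "S#{season_id}E#{episode_number}",
        result: result.strip
      }
    end.compact
  end
end

# Made-up sample rows, only to show the shape of the input.
sample_rows = {
  "Queen A" => ["WIN", "SAFE"],
  "Queen B" => ["ELIM"]
}
appearances = build_appearances(4, ["1", "2"], sample_rows)
appearances.each { |appearance| puts appearance.inspect }
```

In the Rails app, each hash would then need its queen name and episode code resolved to real records before being passed to .create!().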