JaredHarbison

Rank Performances

Rails + React + Redux - Pt 6


This post revisits the get_seasons and get_appearances methods to add the most complex part of the scrape so far: condensing each queen's placement in each episode into a numerical rank.


Let's get started!


1. Revisit get_seasons in season.rb to scrape, clean, and transform each queen's rank in each episode, then store the ranks on the episode for use in get_appearances.
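
Before the full method, the core of this step (turning the text of a progress-table cell into a number) can be sketched on its own. This is a minimal sketch, not the code from season.rb: `RANK_RULES` and `rank_for` are hypothetical names, and the keywords and values are simply the ones used in get_seasons.

```ruby
# Ordered keyword groups paired with their rank.
# Order matters: "WINNER" has to be checked before "WIN".
RANK_RULES = [
  [%w[WINNER], 10],
  [%w[RUNUP LOST], 9],
  [%w[MISSCON], 8],
  [%w[WIN], 7],
  [%w[HIGH], 6],
  [%w[TOP], 5],
  [%w[SAFE RUNNING], 4],
  [%w[LOW], 3],
  [%w[BTM], 2],
  [%w[ELIM OUT JUROR RTRN], 1]
].freeze

# Hypothetical helper: return the rank of the first rule whose
# keywords appear in the cell text, or 0 when nothing matches.
def rank_for(cell_text)
  _keywords, rank = RANK_RULES.find do |keywords, _rank|
    keywords.any? { |keyword| cell_text.include?(keyword) }
  end
  rank || 0
end

puts rank_for("WINNER")
puts rank_for("RUNNING")
```

One Ruby detail to watch when matching several keywords: `"RUNUP" || "LOST"` evaluates to just `"RUNUP"`, because a non-nil string is truthy, so `include?("RUNUP" || "LOST")` only ever checks the first keyword. Checking each keyword with `any?` avoids that.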

require 'open-uri'

class Season < ApplicationRecord
  has_many :episodes
  accepts_nested_attributes_for :episodes

  def get_seasons
    I18n.enforce_available_locales = false
    #### define the location of the season index and open it with Nokogiri
    #### (URI.open comes from open-uri; the bare open(url) form was removed in Ruby 3.0)
    seasons_index_url = "https://rupaulsdragrace.fandom.com/wiki/Category:Seasons"
    seasons_index_doc = Nokogiri::HTML(URI.open(seasons_index_url))
    #### define an array of season names
    seasons_list = seasons_index_doc.xpath('//td[1]/a[1]').map {|season| season.text}
    #### define an array of urls for each season, distinguishing between the two series...
    seasons_urls = seasons_list.map do |season|
      if season.starts_with?("All Stars")
        #### ... by taking the number from the end of the season name to interpolate it into the url...
        all_stars_season = season[-1, 1]
        "https://rupaulsdragrace.fandom.com/wiki/RuPaul%27s_Drag_Race_All_Stars_(Season_#{all_stars_season})"
      else
        #### ... and accommodating double digit season numbers in the original series
        rpdr_season = season.length > 8 ? season[-2, 2] : season[-1, 1]
        "https://rupaulsdragrace.fandom.com/wiki/RuPaul%27s_Drag_Race_(Season_#{rpdr_season})"
      end
    end
    #### now create a Season object for each of the listed seasons, with the season name and fandom url
    seasons_list.each.with_index do |season, index|
      Season.create!(
        season_name: season,
        fandom_season_url: seasons_urls[index]
      )
    end
    #### iterate through the season urls to open each one with Nokogiri and predict the season ids
    #### (this assumes the seasons table started empty, so ids run 1, 2, 3...)
    seasons_urls.map.with_index do |season_url, index|
      season_doc = Nokogiri::HTML(URI.open(season_url))
      season_id = index + 1
      #### scrape simple details for each season_doc
      season_premiere = season_doc.xpath('//*[@data-source="premiere"]/div[@class="pi-data-value pi-font"]/text()').text
      season_finale = season_doc.xpath('//*[@data-source="finale"]/div[@class="pi-data-value pi-font"]/text()').text
      season_judges = season_doc.xpath('//*[@data-source="judges"]/div[@class="pi-data-value pi-font"]/a/text()').map {|judge| judge.text}
      #### scrape the season's header row from the progress table
      table_headers = season_doc.xpath('//*[@class="wikitable"]//following-sibling::th').map {|episode| episode.text}
      #### find the contestants column, identify the index integer, then add 1 and turn it into a string to prepare for xPath
      #### (starts_with? takes several prefixes; " Contest" || "Contest" would only ever check the first one)
      find_contestants_column = table_headers.map do |header|
        header.starts_with?(" Contest", "Contest")
      end
      contestants_column_index = find_contestants_column.find_index(true)
      contestants_column_number = contestants_column_index + 1
      contestants_column_number_string = contestants_column_number.to_s
      #### concatenate the column index into the xPath string and pull the contestants from the column
      contestants_column_xpath = '//*[@id="mw-content-text"]//td[' + contestants_column_number_string + ']/b/a'
      season_contestants = season_doc.xpath(contestants_column_xpath).map {|header| header.text.gsub(/[^0-9a-z%&!\n\/(). ]/i, '').strip}
      #### find the episode title column, identify the index integer, then add 1 and turn it into a string to prepare for xPath
      episodes_table_headers = season_doc.xpath('//center/table[@class="wikitable"]//th').map {|episode| episode.text}
      find_episode_title_column = episodes_table_headers.map do |header|
        header.starts_with?("Title", " Episode Title")
      end
      title_column_index = find_episode_title_column.find_index(true)
      title_column_number = title_column_index.to_i + 1
      title_column_number_string = title_column_number.to_s
      title_column_xpath = '//center/table[@class="wikitable"]//td[' + title_column_number_string + ']'
      #### pull the episode titles column into an array and reject the blanks
      episode_titles_column = season_doc.xpath(title_column_xpath).map {|header| header.text}
      episode_titles = episode_titles_column.reject {|episode| episode.blank?}
      #### iterate through the headers to find the episode columns, pull the episode number, and leave a blank for every other column
      episode_number_headers = table_headers.map do |episode|
        if episode.starts_with?("Ep.", "Ep. ", " Ep.", " Ep.")
          episode.gsub(/[^0-9]/, '')
        else
          ""
        end
      end
      episode_indices = episode_number_headers.map.with_index {|header, index| header.to_i >= 1 ? index : ""}
      episode_indices.reject! {|x| x.blank?}
      #### define a range of cell indices containing episode numbers, turn the contestants into row numbers, then scrape the rows
      episode_indices_range = episode_indices.first..episode_indices.last
      row_numbers = season_contestants.map.with_index {|cont, index| index + 2}
      row_lookups = row_numbers.map do |row|
        '//*[@class="wikitable"]//following-sibling::tr[' + row.to_s + ']/td'
      end
      #### clean up the cell contents then use the strings to apply numerical ranks
      draft_lookups = row_lookups.map do |lookup|
        season_doc.xpath(lookup).map {|el| el.text.squish}
      end
      draft_final = draft_lookups.map do |lookup|
        lookup[episode_indices_range]
      end
      #### order matters here: "WINNER" has to be checked before "WIN"
      final_ranks = draft_final.map do |ranks|
        ranks.map do |rank|
          if rank.include?("WINNER")
            10
          elsif ["RUNUP", "LOST"].any? {|keyword| rank.include?(keyword)}
            9
          elsif rank.include?("MISSCON")
            8
          elsif rank.include?("WIN")
            7
          elsif rank.include?("HIGH")
            6
          elsif rank.include?("TOP")
            5
          elsif ["SAFE", "RUNNING"].any? {|keyword| rank.include?(keyword)}
            4
          elsif rank.include?("LOW")
            3
          elsif rank.include?("BTM")
            2
          elsif ["ELIM", "OUT", "JUROR", "RTRN"].any? {|keyword| rank.include?(keyword)}
            1
          else
            0
          end
        end
      end
      #### combine contestants names and ranks due to sqlite database limitations
      season_contestants = season_contestants.zip(final_ranks)
      #### create unique episode keys of the same length (i.e. S04E10) by zero-padding both numbers to two digits
      episode_numbers = episode_number_headers.reject {|episode| episode.blank?}
      season_episodes_codes = episode_numbers.map do |episode|
        format("S%02dE%02d", season_id, episode.to_i)
      end
      #### iterate through the episodes array to create each Episode
      episode_numbers.map.with_index do |episode, index|
        Episode.create(
          season_id: season_id,
          episode_name: episode_titles[index],
          episode_code: season_episodes_codes[index],
          contestants: season_contestants
        )
      end
    end
  end
end
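
The "find the column by its header, then build the xPath" pattern used for the contestants and episode title columns can be sketched in isolation. This is a simplified illustration under stated assumptions: `column_xpath` is a hypothetical helper, the header list is made up, and it uses plain Ruby's `start_with?` (Rails' `starts_with?` is an alias for it) plus `strip` instead of listing the leading-space variants.

```ruby
# Hypothetical helper: build the xPath for a table column by finding its header.
# xPath positions are 1-based, so the Ruby index needs + 1.
def column_xpath(headers, *prefixes)
  index = headers.index { |header| header.strip.start_with?(*prefixes) }
  # Without this guard a missing header would raise on nil + 1
  return nil if index.nil?

  "//*[@id=\"mw-content-text\"]//td[#{index + 1}]/b/a"
end

# Made-up header row for illustration
headers = ["Rank", " Contestant", "Ep. 1", "Ep. 2"]
puts column_xpath(headers, "Contest")
```

Returning nil when no header matches makes a changed page layout easy to spot, instead of failing later with a NoMethodError.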

2. Revisit get_appearances in appearance.rb to instantiate every appearance with the queen's numerical rank for that episode.

class Appearance < ApplicationRecord
  belongs_to :queen
  belongs_to :episode

  def get_appearances
    #### iterate through each season, pulling episode ids and setting the index of each iterative episode's ranks
    Season.all.each do |season|
      season.episodes.each.with_index do |episode, index|
        episode_id = episode.id
        rank_index = index + 2
        #### clean the contestants array for Appearance creation
        contestants = episode.contestants.split(", ").map do |contestant|
          contestant.gsub(/[^0-9a-z%&!\n\/(). ]/i, '').strip
        end
        contestants = contestants.in_groups_of(season.episodes.length + 1)
        #### iterate through the array of contestants to create an Appearance for each Episode
        contestants.map do |contestant|
          Appearance.create(
            episode_id: episode_id,
            #### use the contestants list as it appeared on the season's Fandom page to find the corresponding Queens
            queen_id: Queen.find_or_create_by(drag_name: contestant[0]).id,
            #### set the queen's numerical rank for her appearance in each episode
            rank: contestant[rank_index]
          )
        end
      end
    end
  end
end
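
To make the split-and-group step easier to follow, here is a minimal plain-Ruby sketch of what happens to the stored contestants value. The assumptions are loud ones: the shape of the stored string is my guess at how the zipped array ends up in the sqlite column, the names and ranks are made-up sample data, and `each_slice` stands in for ActiveSupport's `in_groups_of` so the snippet runs outside Rails (unlike `in_groups_of`, it doesn't pad a short final group with nil).

```ruby
# Assumed shape of Episode#contestants: the zipped [name, ranks] array saved as a string.
# "Queen A" / "Queen B" and their ranks are made-up sample data.
stored = '[["Queen A", [4, 2, 1]], ["Queen B", [4, 7, 6]]]'
episode_count = 3

# Split on ", " and strip the brackets and quotes, the same way get_appearances does
tokens = stored.split(", ").map do |token|
  token.gsub(/[^0-9a-z%&!\n\/(). ]/i, '').strip
end

# Each queen contributes one name plus one rank per episode
groups = tokens.each_slice(episode_count + 1).to_a

groups.each { |group| puts group.inspect }
```

Each group starts with the queen's name, followed by one rank per episode. The ranks come back as strings after the cleanup.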

I'd love to refactor the get_XXXX methods from the previous posts as soon as possible, but I may move on to some UI in my post next week!


That's all folks!
