
Dave Parr


Webscraping with rvest and theming ggplot


ggplot2 is the ‘default’ plotting
library in R. It’s a very old package now, but has been kept up-to-date
and is one of the core ‘tidyverse’ packages.
rvest is also a tidyverse package that
deals with web scraping, inspired by equivalents like “Beautiful Soup”.

There is a table of hex colour codes for each pokemon type. I’d like to
be able to use this for plots made with my pokedex package.

Webscraping with rvest

Get the data

library(rvest)

read_html("") %>% 
  html_nodes(".wikitable") %>%   # every node with class "wikitable"
  .[[1]] %>%                     # keep the first match
  html_table() -> pokemon_colour_table

This very simple pipe goes to the URL, detects all HTML nodes with a
class of "wikitable" and puts them in a list. It then takes the first
element (the only one in this case), converts it into a table, and
assigns it to the variable pokemon_colour_table.
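
To try the same steps without touching the network, here is a minimal sketch that runs the pipe's building blocks against an inline HTML snippet. The table rows are invented for illustration only, and minimal_html() is an rvest helper that builds a small document from a string.

```r
library(rvest)

# A tiny stand-in page; the rows are made up and are not the scraped data.
page <- minimal_html('
  <table class="wikitable">
    <tr><th>Type</th><th>Hex</th></tr>
    <tr><td>Fire</td><td>F08030</td></tr>
    <tr><td>Water</td><td>6890F0</td></tr>
  </table>
  <table class="other"><tr><td>ignored</td></tr></table>
')

tables <- html_nodes(page, ".wikitable")  # every node with class "wikitable"
length(tables)                            # only one table matches here

demo_table <- html_table(tables[[1]])     # first match, parsed into a tibble
demo_table
```

The second table has a different class, so the selector skips it, which is the same filtering the real pipe relies on.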

Clean the data

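library(dplyr)
library(stringr)

pokemon_colour_table %>%
  janitor::clean_names() %>%         # snake case, de-duplicated column names
  slice(1:75) %>%                    # keep only the first part of the table
  select(-video_game_types_3) %>%    # drop the third column
  rename(type_full = video_game_types, colour = video_game_types_2) %>%
  filter(type_full != "") %>%        # remove rows with empty labels
  mutate(
    # strip the label down to the bare type name, in lower case
    type = tolower(str_trim(str_remove_all(type_full, "color|light|dark|\\:"))),
    # flag light/dark variants; default hues match neither pattern and stay NA
    colour_var = case_when(
      str_detect(type_full, "light") ~ "light",
      str_detect(type_full, "dark") ~ "dark"
    )
  ) %>%
  mutate(colour = paste0("#", colour)) %>%   # turn the code into a hex string
  select(-type_full, type, colour_var, colour) -> type_colours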

Cleaning the data is the more irritating part, as always. First,
janitor::clean_names() does a bunch of sane default things to make
sure our column names are snake case, with no mad characters or
duplication etc. Then, as we only want the first part of the table we
slice it, and as we only want the first 2 columns, we drop the third. We
then give the remaining columns sane names, and remove rows that have
empty strings.

The meat of the data cleaning comes next, parsing the label column to
get just the type out and convert it to lower case and putting it into a
new column, then conditionally checking if the row is a variant
light/dark hue, or the default, and making a column to represent that.
Finally we convert the colour code to an actual hex string.
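
As a small illustration of that parsing step, the sketch below applies the same mutate() logic to three hypothetical labels. The label format and hex values are assumptions made for the example, not rows copied from the scraped table.

```r
library(dplyr)
library(stringr)

# Hypothetical labels, assumed to resemble the scraped label column.
demo <- tibble::tibble(
  type_full = c("Fire color:", "Fire light color:", "Fire dark color:"),
  colour    = c("F08030", "F5AC78", "9C531F")
)

demo %>%
  mutate(
    # remove the unwanted words and the colon, trim, then lower-case
    type = tolower(str_trim(str_remove_all(type_full, "color|light|dark|\\:"))),
    # rows matching neither pattern fall through to NA (the default hue)
    colour_var = case_when(
      str_detect(type_full, "light") ~ "light",
      str_detect(type_full, "dark") ~ "dark"
    ),
    # prefix the code so it becomes a usable hex string
    colour = paste0("#", colour)
  ) -> demo_clean

demo_clean
```

Because case_when() has no fallback branch here, the default row ends up with NA in colour_var, which is what makes it easy to filter for later.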

Format for ggplot2 colour scale

ggplot2 wants the scale as a named vector. Making this in a tidy way is
very straightforward.

type_colours %>%
  filter(is.na(colour_var)) %>%   # keep only the default hues
  select(-colour_var) %>%
  mutate(colour = set_names(colour, type)) %>%
  pull(colour) -> pokemon_type_scale_colours

In this particular case we keep only the rows that do not have a
colour_var value, i.e. the defaults, drop the colour_var column, and
set the names of the colour column to the values of the type column.
We have to do this because scale_*_manual() in ggplot2 matches colours
to data by name when it is given a named vector, where the names are
the levels of the type categorical variable, and the contents are the
hex colour codes for those types. Then when we pull that column out we
will have a named vector.
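
To make the name matching concrete, here is a minimal sketch with a hand-written named vector. The names demo_scale and demo_data and the values in them are invented for illustration.

```r
library(ggplot2)

# Invented values: names are the type labels, contents are hex codes.
demo_scale <- c(fire = "#F08030", water = "#6890F0")

demo_data <- data.frame(
  type  = c("water", "fire"),
  value = c(3, 5)
)

# Colours are matched to the fill variable by name,
# so the order of demo_scale does not need to follow the data.
p <- ggplot(demo_data, aes(x = type, y = value, fill = type)) +
  geom_col() +
  scale_fill_manual(values = demo_scale)
```

Printing p draws the two bars, each filled with the colour stored under its type name.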


Add a font with showtext

Keeping the video game flavour, let’s also make a quick theme using a
video game font. We can use
showtext to easily add the
“Press Start 2P” font from Google Fonts.

library(showtext)
font_add_google("Press Start 2P")

Then, starting from theme_minimal() we can replace the default font,
and rotate the text labels on the bottom axis.

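theme_pokedex <- function() {
  # `+` updates only the listed properties; %+replace% swaps in whole
  # elements and leaves every unspecified property of `text` as NULL
  theme_minimal() +
    theme(
      text = element_text(family = "Press Start 2P"),
      axis.text.x = element_text(angle = -90)
    )
}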


To demonstrate, let’s make a simple plot showing the key stats of the
eeveelutions.

pokemon %>% 
  filter(evolution_chain_id == 67) %>% 
  select(identifier, hp:speed, type_1) %>% 
  pivot_longer(cols = c(hp:speed),
               names_to = "stat") %>% 
  ggplot(aes(x = stat, y = value, fill = type_1)) +
  geom_col() +
  facet_wrap(. ~ identifier) +
  scale_fill_manual(values = pokemon_type_scale_colours) +
  labs(title = "eeveelutions stats")
