DEV Community

Joohyun "Steve" Kim


Ruby CLI application: scraping, object relationships and single source of truth

There’s something exciting about building a CLI application, or about the CLI in general. For an average user like me, who doesn’t have a clue about the inner workings of a computer and has been confined to the comfort of GUIs, even typing commands into the Terminal can make you feel like a notorious hacker in a spy movie. So successfully building an entire CLI-based application from scratch was a truly rewarding experience.

The Goal

I decided to build a very basic application for Premier League football (or soccer, in some countries) that shows the current league standings, all the clubs in the league, and information about each club, and perhaps even stats for individual players (although I never got that far). It started off as a breeze: I thought I had a pretty good grasp of the concepts involved, how I was going to obtain the data by scraping, and so on. To be fair, I had a great guide to follow in a video demo done by my bootcamp instructor. It all made so much sense while I was watching, but as with most things in life, it’s one thing to watch someone do something and another entirely to do it yourself.

My first despair

Scraping consumed the bulk of my time. Not fully understanding how to use Nokogiri’s methods effectively was my downfall. I was fixated on chaining .css selectors, when I later discovered I could have grabbed the same data far more easily by attaching ids and classes directly to tag names. For instance, a line in my scraper class that grabs a piece of data like this:

.css('.tableBodyContainer.isPL').css('tr:not(.expandable)').css('.long').text

could have just as easily accomplished the same thing using:

.search('tbody.tableBodyContainer.isPL span.long').text

My second despair

I knew two important rules about building relationships going into this project. First, objects need to maintain a ‘single source of truth’ when building relationships across classes, and the way to do that is to make the object that “belongs to” another object responsible for holding the relationship. Second, once that’s in place, all the remaining relationships should be established only through methods. Simple enough, right? The only problem was that this seemed much easier in my head when the relationship was A -< B >- C, as opposed to what I actually had: A -< B -< C. Instead of B keeping track of both A and C, I needed B to be accountable for A and C to be accountable for B, and then somehow build methods that let A interact with C and vice versa. After building and re-building my classes over and over, and hours of rubber duck debugging, I got it done.
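Stripped of the football-specific attributes, the belongs-to chain looks roughly like this (the class names match my project, but the slimmed-down attributes are just for illustration):

```ruby
class League
  @@all = []
  attr_reader :name

  def initialize(name)
    @name = name
    @@all << self
  end

  def self.all
    @@all
  end
end

class Club
  @@all = []
  attr_reader :name, :league

  # Club "belongs to" League, so Club holds the relationship
  def initialize(name, league)
    @name = name
    @league = league
    @@all << self
  end

  def self.all
    @@all
  end
end

class Player
  @@all = []
  attr_reader :name, :club

  # Player "belongs to" Club, so Player holds the relationship
  def initialize(name, club)
    @name = name
    @club = club
    @@all << self
  end

  def self.all
    @@all
  end
end
```

Only the belongs-to side stores anything; everything else (a league’s clubs, a club’s players) gets derived through methods, so no relationship is recorded in two places.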

league = League.find_or_create_by_name(league_name)
new_club = Club.new(name, league, position, matches_played, matches_won, matches_drawn, matches_lost, goals_for, goals_against, goal_diff, points)
Player.new(new_club, player_number, player_name, player_position)

My Club class was keeping track of my League class and my Player class was keeping track of my Club class.
Then I went on to build methods in my League class that could communicate with my Player class, like so:

def clubs
    Club.all.select {|club| club.league == self}
end

def players
    Player.all.select {|player| self.clubs.include?(player.club)}
end

And then finally an instance method inside my Player class to access my League class:

def league
    League.all.find {|league| league.players.include?(self)}
end

Final thoughts

One thing I have yet to figure out is how to delay my deeper-level scrapes until they’re needed, instead of scraping all of my data up front when the application first runs; right now the scraping simply takes too long. Ideally I would like to store the URL for each deeper scrape as an instance variable and pass it into a scraper method as needed, but this is proving a lot more difficult than I anticipated, mainly because of how my second scrape is designed and how the logic in my CLI class is currently built. Hopefully, as I dive deeper into programming and become more skillful, I will find a more elegant solution.
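One pattern I might try is memoization: store the club’s detail-page URL at creation, and only run the deep scrape the first time an attribute is actually requested. A rough sketch (the `details` method and its return values are hypothetical stand-ins, not my real scraper):

```ruby
class Club
  attr_reader :name, :url

  def initialize(name, url)
    @name = name
    @url = url  # stored now, scraped later
  end

  # Scrape the deep page only on first access, then cache the result
  def details
    @details ||= scrape_details
  end

  private

  # Placeholder for the deep scrape; a real version would
  # fetch @url and parse it with Nokogiri
  def scrape_details
    puts "Scraping #{@url}..."
    { stadium: "TBD", manager: "TBD" }
  end
end
```

The `||=` operator means the expensive scrape runs at most once per club, on the first call to `details`; every later call returns the cached hash.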
