Prior to beginning Flatiron's remote software engineering program, I had a little experience programming with JavaScript. But my experience only really extended to essential mechanics and syntax. In other words, I had never really created a program, even a simplistic program, from the ground up. So, I was EXTREMELY excited to begin my first major project to do just that: creating a Ruby CLI (command line interface) program.
I studied film in college, so I really wanted to create a basic CLI program that interacted in some way with film-related data. After giving it some consideration, I decided to create a program that could display and sort data related to the American Film Institute's Top 100 film list.
You can take a look at the list here: https://www.afi.com/afis-100-years-100-movies-10th-anniversary-edition/
Now, by no means do I consider this to be a particularly definitive list of the 100 best American films. In fact, there is a lot that I dislike about the list (prime example: The Sixth Sense is ranked 89th, AHEAD of films like Goodfellas, Pulp Fiction, Do the Right Thing, and Blade Runner). That being said, the website is a really good resource that contains all of the relevant information on one webpage. If you're a film buff like me, here are the kinds of questions you're prone to ask while reviewing such a list:
- Was x, y, or z director included in the AFI Top 100? (Spoiler alert: if you're a fan of David Lynch, the answer is no!)
- How many films from a certain director were included?
- How many films from a certain actor were included?
- How many films from a certain year were included?
These are just A FEW of the questions you might want answered. In essence, I wanted to create a CLI program that would take in all of that AFI data and give the user the ability to answer some of these questions.
Scraping the Website
First things first, I knew I would need a scraper class to take in all of the relevant data from the webpage. Here is a section of the website to give you an idea of all of the info I wanted to capture.
On the most basic level, here is what I wanted my scraper to do:
- Scrape the information related to each INDIVIDUAL film (name, title, rank, year, director, writer, cast, producer, cinematographer, editor, and production company).
- Create a hash containing all of this data, something like: {:name=>"Citizen Kane", :rank=>1, :year=>"1941", :director=>"Orson Welles"}
- Finally, use each hash to instantiate a new instance of a Movies class, with all of those keys and values set as attributes, and add each instance to a class-level array.
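As a minimal sketch of the last two steps, assuming a class named Movie, a reduced set of four attributes, and a class-level array exposed through a hypothetical Movie.all method (none of which are taken from the original project), the hash-to-instance step could look something like this:

```ruby
# Hypothetical Movie class: the class name, the attribute list, and the
# Movie.all helper are assumptions made for illustration only.
class Movie
  attr_reader :name, :rank, :year, :director

  # Class-level array holding every instance created so far.
  @@all = []

  # Takes a hash like the one built by the scraper and stores each
  # value as an attribute, then registers the instance in @@all.
  def initialize(attributes)
    @name = attributes[:name]
    @rank = attributes[:rank]
    @year = attributes[:year]
    @director = attributes[:director]
    @@all << self
  end

  # Returns every Movie instance created so far.
  def self.all
    @@all
  end
end

# Sample hashes standing in for scraper output.
scraped = [
  { name: "Citizen Kane", rank: 1, year: "1941", director: "Orson Welles" },
  { name: "The Godfather", rank: 2, year: "1972", director: "Francis Ford Coppola" }
]

scraped.each { |attributes| Movie.new(attributes) }
```

Keeping the collection on the class itself means the rest of the program can ask Movie.all for the data without needing to know anything about the scraper.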
Scraping was by far the most complicated and labor-intensive part of the project because it's where MOST of the work of the program is actually being done! Once I was able to effectively scrape the data and use it to instantiate each new instance in the Movies class, it was just a matter of accessing that data and sorting it in the various ways the program needed.
Let's take a look at that site picture again to give you a sense of why scraping was a little tricky...
There are essentially two levels of data here. Level 1 is the movie title, year, and rank, for instance:
So, right when we begin we can set the title, year, and rank, but THEN we need to iterate into the SECOND LEVEL. Level 2 includes cast, director, writer, producer, cinematographer, editor, and production company:
Working with this data presented its own unique challenges because, as you can see, it's bizarrely formatted and includes these numbered ids for every individual credited EXCEPT for the cast members. Here is what my code initially looked like just to spit this information out in a way that was appropriate:
So much splitting and, thankfully, just a dash of Regex! After it was all said and done, however, I was able to instantiate Movie instances that looked like this:
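To give a rough idea of the split-and-regex cleanup described above, here is a small sketch. The raw strings, the id numbers, and the clean_credits helper are all invented for illustration; the actual formatting on the AFI page (and the code I wrote for it) may well differ.

```ruby
# Hypothetical helper: splits a comma-separated credit string into names
# and strips a trailing numeric id from each one, if present.
def clean_credits(raw)
  raw.split(",")
     .map { |name| name.sub(/\s*\d+\s*\z/, "").strip }
     .reject(&:empty?)
end

# Made-up sample strings: credited crew carry numeric ids, cast do not.
raw_writers = "Herman J. Mankiewicz 23456, Orson Welles 12345"
raw_cast = "Orson Welles, Joseph Cotten"

writers = clean_credits(raw_writers)
cast = clean_credits(raw_cast)
```

Because the regex only removes digits at the very end of each name, cast members (who have no ids) pass through unchanged, so the same helper can be used for both kinds of credit.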
From here, it was just a matter of coding the methods for the most important things I wanted the program to do:
- Display the list
- Display the artists
- Sort by artists
- Sort by year
- Display the information of each individual movie
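A few of these features can be sketched with plain Enumerable methods. In the sketch below, MovieRecord is a hypothetical stand-in for the real movie class, the method names are my own assumptions, and the four sample rows are only there to make the example runnable:

```ruby
# Hypothetical stand-in for the real movie class.
MovieRecord = Struct.new(:name, :rank, :year, :director, keyword_init: true)

MOVIES = [
  MovieRecord.new(name: "Citizen Kane", rank: 1, year: "1941", director: "Orson Welles"),
  MovieRecord.new(name: "The Godfather", rank: 2, year: "1972", director: "Francis Ford Coppola"),
  MovieRecord.new(name: "Casablanca", rank: 3, year: "1942", director: "Michael Curtiz"),
  MovieRecord.new(name: "The Godfather Part II", rank: 32, year: "1974", director: "Francis Ford Coppola")
].freeze

# Display the list: one formatted line per movie, ordered by rank.
def display_list(movies)
  movies.sort_by(&:rank).map { |movie| "#{movie.rank}. #{movie.name} (#{movie.year})" }
end

# Sort by year: years are stored as strings, so convert before comparing.
def sort_by_year(movies)
  movies.sort_by { |movie| movie.year.to_i }
end

# Sort by artist: here, every movie credited to a given director.
def films_by_director(movies, director)
  movies.select { |movie| movie.director == director }
end

puts display_list(MOVIES)
puts films_by_director(MOVIES, "Francis Ford Coppola").size
```

Each method takes the collection as an argument and returns a new array, which keeps the display logic in the CLI separate from the data itself.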