Originally posted on my blog: https://snir.dev/blog/apartments-bot/
Starting and completing a weekend project is often challenging. Feature creep, tech stack choices and unwarranted technical excellence (100% test coverage for a weekend project, anyone?) are your enemies. They push the project timeline beyond one weekend, which in turn usually results in the project never being done.
After countless such projects, I got to a place where I can confidently estimate a small project, define its features and use tech as a tool to complete it, without letting ideological concepts and unnecessary perfectionism get in my way.
This weekend I built a small project that will help me find my next apartment, and it is a great case study to demonstrate the lessons I've learned.
I'm looking for a new apartment to rent. In my local market, most landlords only advertise in private Facebook groups. Each group is dedicated to a different city or area, which for me meant 7 different groups to cover my search area.
Going through the ads in Facebook groups is a mess:
- You need to individually visit several group pages in every "scan".
- Posts with recent comments are bumped up, making you go through posts you already saw.
- There is no easy way to filter posts.
- The whole process is tiresome, which might cause me to miss great, time-sensitive opportunities (these are high-demand locations).
Before we dive in, if you are interested in the end result, my code is freely available on GitHub: https://github.com/snird/apartments_bot
Manually scanning multiple Facebook groups every few hours is a long, error-prone process. My automated solution had these requirements:
- Visit all groups automatically and collect all ads.
- Remember which ads it already sent me, and filter those out.
- Filter ads by some text, in my case "3 rooms", to get only relevant ads.
- Send them to me in a way that is easily shareable with my partner.
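The "remember" and "filter" requirements can be sketched in a few lines of Node. This is only an illustration, not the code from the repository: the ad shape (`{ id, text }`), the function names and the JSON file used to store the IDs are all assumptions.

```javascript
const fs = require('fs');

// Load the IDs of ads that were already handled; a missing file means a first run.
function loadSeenIds(file) {
  if (!fs.existsSync(file)) return new Set();
  return new Set(JSON.parse(fs.readFileSync(file, 'utf8')));
}

// Persist the IDs so the next run can skip them.
function saveSeenIds(file, seenIds) {
  fs.writeFileSync(file, JSON.stringify([...seenIds]));
}

// Hypothetical ad shape: { id: string, text: string }.
// Returns ads that were not seen before and whose text contains the keyword
// (case-insensitive). Every ad it looks at is recorded in seenIds, so bumped
// posts are not processed again.
function selectNewRelevantAds(ads, seenIds, keyword) {
  const needle = keyword.toLowerCase();
  const fresh = [];
  for (const ad of ads) {
    if (seenIds.has(ad.id)) continue;
    seenIds.add(ad.id);
    if (ad.text.toLowerCase().includes(needle)) fresh.push(ad);
  }
  return fresh;
}
```

A run would load the set, call `selectNewRelevantAds(scrapedAds, seenIds, '3 rooms')`, send whatever comes back, and save the set again. A flat JSON file is probably enough at this scale; a database would be the kind of over-engineering this post warns about.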
I wanted to solve each of these requirements in the most pragmatic way possible. My top priority was to make it work, and be done with it within the span of a weekend. In the rest of this article I will share with you my technical decisions, reflecting on why I made them.
My initial thought was to use the Facebook API to retrieve the posts in an easy-to-consume way. Unfortunately, some of the groups I needed are defined as "secret" and you cannot get their posts using the API.
With API usage out of the question, I had to resort to web scraping. I have strong experience with Node.js, so it was a no-brainer for me to go that route. Searching the Node ecosystem, Puppeteer came up, so I started using it.
Puppeteer's default is a headless browser. Every run is a clean start with no prior state (cookies, sessions, etc.), which means that I would have to authenticate to Facebook every time. Handling authentication in code seemed out of scope for something that is really a one-time thing: connecting to Facebook.
Puppeteer can also attach to a full regular Chrome session through its remote debugging socket. This allowed me to open Chrome, log in to my Facebook account manually and then let my script use this Chrome instance. Not everything has to be solved by programming. I have goals I want to stick to: functionality and time constraints.
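Attaching to an already running Chrome might look roughly like the sketch below. It assumes Chrome was started by hand with `--remote-debugging-port=9222` (the port is an arbitrary choice) and that you are already logged in. `withLoggedInChrome` is a hypothetical helper name, and the second parameter exists only so the Puppeteer module can be swapped out.

```javascript
// Assumption: Chrome was started manually with remote debugging enabled, e.g.
//   chrome --remote-debugging-port=9222
// and the Facebook login was done by hand in that window.
const DEBUG_URL = 'http://127.0.0.1:9222';

// Attach to the running Chrome, open a tab, hand it to `task`, then clean up.
// `puppeteer` is only required when no replacement module is passed in.
async function withLoggedInChrome(task, puppeteer = require('puppeteer')) {
  const browser = await puppeteer.connect({ browserURL: DEBUG_URL });
  const page = await browser.newPage();
  try {
    return await task(page);
  } finally {
    await page.close();
    // disconnect() detaches the script but leaves Chrome (and the login) running.
    await browser.disconnect();
  }
}
```

Calling `withLoggedInChrome(async (page) => { await page.goto(groupUrl); return page.title(); })`, where `groupUrl` is a placeholder for one of the group pages, reuses the cookies of the manual session, so the script never has to deal with the login form.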
Interested in reading the rest of the article?
It's available on my blog: https://snir.dev/blog/apartments-bot/