TL;DR
Use the Python facebook-scraper module together with the built-in sqlite3 and "re" regex modules to scrape, parse, filter, and store Facebook posts from apartment rental groups. You can run it in the browser at the bottom of this post!
Design
I am looking for a new apartment rental, but I don't have a lot of patience for reading endless Facebook posts and manually parsing the details. Instead, I want to run a CLI program and get a list of relevant apartments to investigate further. I want to filter by number of rooms and price, and to keep the pictures and the original post text to read.
Since this is built on top of many wonderful open-source libraries, I will use the "we" case for this post. Consider the code below as MIT licensed.
We want a CLI application that can fill a sqlite3 database with relevant Facebook posts, so we delegate the CLI documentation and flags to Python's built-in argparse module.
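As a rough illustration of that delegation, here is a minimal argparse sketch. The flag names (`--group`, `--min-rooms`, `--max-price`, `--db`) and their defaults are assumptions made up for this example and may not match the flags in the full code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI parser. All flag names here are illustrative assumptions."""
    parser = argparse.ArgumentParser(
        description="Scrape apartment rental posts from a public Facebook group into sqlite."
    )
    parser.add_argument("--group", required=True, help="Facebook group ID or name to scrape")
    parser.add_argument("--min-rooms", type=float, default=1.0, help="minimum number of rooms")
    parser.add_argument("--max-price", type=int, default=10_000, help="maximum monthly rent")
    parser.add_argument("--db", default="posts.db", help="path of the sqlite database file")
    return parser


# Passing an explicit list keeps the example reproducible; a real run
# would call parse_args() with no arguments to read sys.argv instead.
args = build_parser().parse_args(["--group", "example-group", "--min-rooms", "2.5"])
print(args.group, args.min_rooms, args.max_price, args.db)
```

argparse also generates the `--help` text from the `help=` strings, which is the "CLI documentation" part we get for free.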
We delegate the responsibility to scrape public Facebook groups to the excellent Facebook-Scraper module.
kevinzg / facebook-scraper
Scrape Facebook public pages without an API key
We convert the posts from that module into our own Data Transfer Object (DTO), a FacebookPost class. Using our own DTO class lets us add type annotations to all our functions and methods, which increases our velocity and our confidence in the code through static analysis and IntelliSense.
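A minimal sketch of what such a DTO could look like, using a dataclass. The field names and the `post_from_dict` helper are assumptions for illustration. In a real run the dicts would come from the facebook-scraper module's `get_posts` generator, and the dictionary keys used here (`post_id`, `text`, `time`, `post_url`, `images`) are assumed to match what it yields, so check them against the version you install.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class FacebookPost:
    """DTO for one scraped post; the field names are assumptions for this sketch."""
    post_id: str
    text: str
    time: Optional[datetime] = None
    post_url: Optional[str] = None
    images: List[str] = field(default_factory=list)
    price: Optional[int] = None      # filled in later by a parsing step
    rooms: Optional[float] = None    # filled in later by a parsing step


def post_from_dict(raw: Dict[str, Any]) -> FacebookPost:
    """Convert one scraper dict into the DTO, tolerating missing keys."""
    return FacebookPost(
        post_id=str(raw.get("post_id") or ""),
        text=raw.get("text") or "",
        time=raw.get("time"),
        post_url=raw.get("post_url"),
        images=list(raw.get("images") or []),
    )


# A hand-written dict standing in for one item yielded by the scraper.
sample = {"post_id": "123", "text": "3 rooms, 5500", "images": ["https://example.com/a.jpg"]}
post = post_from_dict(sample)
print(post.post_id, post.rooms, len(post.images))
```

Keeping the conversion in one function means the rest of the program only ever sees typed `FacebookPost` objects, never raw dicts.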
Once we have a FacebookPost DTO, we can pass it through "append" builder functions, which use regular expressions to extract the price and the number of rooms from the post text. Since I am looking for an apartment in Israel, the regexes look for the price and the number of rooms in Hebrew.
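To make the "append" idea concrete, here is a small sketch. The two patterns are simplified assumptions, not the regexes from the full code: one looks for a number followed by the Hebrew word for "rooms" (חדרים), the other for a number followed by a shekel marker. The `FacebookPost` class is a cut-down stand-in so the example runs on its own.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FacebookPost:
    """Minimal stand-in for the post DTO, just enough for this sketch."""
    text: str
    price: Optional[int] = None
    rooms: Optional[float] = None


# Illustrative patterns only. Rooms: "3 חדרים" or "2.5 חדרים".
ROOMS_RE = re.compile(r'(\d+(?:\.\d+)?)\s*חדרים')
# Price: "5,500 ₪", "5500 ש"ח" or "5500 שח".
PRICE_RE = re.compile(r'(\d{1,3}(?:,\d{3})+|\d{3,6})\s*(?:₪|ש"ח|שח)')


def append_rooms(post: FacebookPost) -> FacebookPost:
    """Set post.rooms when the text mentions a room count; otherwise leave it as None."""
    match = ROOMS_RE.search(post.text)
    if match:
        post.rooms = float(match.group(1))
    return post


def append_price(post: FacebookPost) -> FacebookPost:
    """Set post.price when the text contains a shekel amount; otherwise leave it as None."""
    match = PRICE_RE.search(post.text)
    if match:
        post.price = int(match.group(1).replace(",", ""))
    return post


# Each builder returns the post, so the calls can be chained.
post = append_price(append_rooms(FacebookPost(text="דירת 3 חדרים ברמת גן, 5,500 ₪ לחודש")))
print(post.rooms, post.price)
```

Real posts are much messier than this sample (prices without a currency marker, room counts written as words), so patterns like these will miss some posts and should be treated as a starting point.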
We compose a set of filters, all of which run on instances of our FacebookPost DTO class. In this example, I filter on price and number of rooms. Arguments to the CLI control the number of rooms and the price we are looking for.
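One way to compose such filters is to treat each one as a function from a post to a boolean and keep only the posts that pass all of them. The sketch below assumes this shape. The names `max_price`, `min_rooms`, and `apply_filters` are hypothetical, and the `FacebookPost` class is a cut-down stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class FacebookPost:
    """Minimal stand-in for the post DTO, just enough for this sketch."""
    text: str
    price: Optional[int] = None
    rooms: Optional[float] = None


PostFilter = Callable[[FacebookPost], bool]


def max_price(limit: int) -> PostFilter:
    """Keep posts with a parsed price at or below the limit."""
    return lambda post: post.price is not None and post.price <= limit


def min_rooms(minimum: float) -> PostFilter:
    """Keep posts with a parsed room count at or above the minimum."""
    return lambda post: post.rooms is not None and post.rooms >= minimum


def apply_filters(posts: Iterable[FacebookPost], filters: List[PostFilter]) -> List[FacebookPost]:
    """Return only the posts that satisfy every filter."""
    return [post for post in posts if all(f(post) for f in filters)]


posts = [
    FacebookPost("cheap and big", price=5000, rooms=3),
    FacebookPost("too expensive", price=9000, rooms=4),
    FacebookPost("too small", price=4000, rooms=1.5),
    FacebookPost("nothing parsed"),
]
# The limits would normally come from the CLI arguments.
kept = apply_filters(posts, [max_price(6000), min_rooms(2.5)])
print([post.text for post in kept])
```

In this sketch a post whose price or room count could not be parsed is dropped. Whether to drop or keep such posts is a design choice, and keeping them would only require changing the `is not None` checks.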
We also created a post "printer" based on the FacebookPost DTO. One challenge we had was printing the right-to-left Hebrew text in the console. By using the wonderful python-bidi module, I was able to print the right-to-left and left-to-right languages correctly.
MeirKriheli / python-bidi
BIDI algorithm related functions
However, because I eventually want to display the posts from a sqlite database, I removed the python-bidi requirement, as console printing is no longer relevant. I primarily use sqlitebrowser to view the data and run SQL analysis on it.
sqlitebrowser / sqlitebrowser
Official home of the DB Browser for SQLite (DB4S) project. Previously known as "SQLite Database Browser" and "Database Browser for SQLite". Website at:
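For the storage step, a sketch along these lines would be enough to get posts into a table that DB Browser for SQLite can open. The table name, column names, and the `save_posts` helper are assumptions for this example, and the `FacebookPost` class is a cut-down stand-in.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FacebookPost:
    """Minimal stand-in for the post DTO, just enough for this sketch."""
    post_id: str
    text: str
    price: Optional[int] = None
    rooms: Optional[float] = None


def save_posts(conn: sqlite3.Connection, posts: Iterable[FacebookPost]) -> None:
    """Create the table if needed and upsert the given posts by post_id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               post_id TEXT PRIMARY KEY,
               text    TEXT NOT NULL,
               price   INTEGER,
               rooms   REAL
           )"""
    )
    # INSERT OR REPLACE means re-scraping the same post overwrites its row
    # instead of failing on the primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO posts (post_id, text, price, rooms) VALUES (?, ?, ?, ?)",
        [(p.post_id, p.text, p.price, p.rooms) for p in posts],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # a real run would pass a file path instead
save_posts(conn, [FacebookPost("1", "3 rooms", 5500, 3.0), FacebookPost("2", "studio", 3800, 1.0)])
save_posts(conn, [FacebookPost("1", "3 rooms, updated", 5400, 3.0)])
rows = conn.execute("SELECT post_id, price FROM posts ORDER BY price").fetchall()
print(rows)
```

The `?` placeholders let sqlite3 handle quoting, which matters here because post text is arbitrary user input.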
The full code
Below is the full code. You can run it in the browser without installing anything, using the repl.it embed below.
If you have questions, let me know in a comment below!
Below is the same code as a GitHub Gist, which you can run locally using:
$ git clone https://gist.github.com/barakplasma/34e8edf1640a4265479e9183fba38e47
$ cd 34e8edf1640a4265479e9183fba38e47
$ pip install -r requirements.txt
Next up (if there's interest): How to build a static site using the sqlite database!
Leave a comment below if you want to see this next!