**Disclaimer: this is hobbyist work. There are better ways to do this!**
Last week I spotted a post asking how to scrape Facebook events. I had been wanting to do that for a long time, so I decided to give it a try, and the following is the result.
1- The tooling !
For this you'll need:
- Modern browser dev tools
- Simple HTTP client to test things.
- JS/Browser HTTP client (optional).
- Web server to serve as a proxy to avoid CORS problems (optional).
2- Browser network tools
Network tools are the most underrated dev tools of all time.
In this tutorial we'll use some of Firefox's network tools features, namely filters and Copy as cURL.
The first thing you want to do is visit fb.com/events, open the network tools, and apply some filter magic.
Facebook makes a POST request to /events, and what we want is to intercept that request and analyze its content.
method:POST regexp:events/
Then click the request line, and let's explore the results.
If nothing appears, scroll down a bit until the browser makes another POST request.
3- Let's explore the thing
The next step is exploring the request's body and learning from it.
To be honest, I don't know what all these params mean or what they are for, but OH WELL !
First, let's right-click on the request and choose Copy as cURL from the copy menu.
Alright, let's do some more. Next, paste the command into your terminal (assuming you have curl or some curl-compatible tool) and Tadaaa !!
The command is a little bit long, isn't it ?? Let's strip it down to:
curl 'https://www.facebook.com/events/discover/query/' --data 'suggestion_token=%7B%22city%22%3A%22default_108085075885850%22%7D&timezone_id=1&__a=1' > events.json
This is all you need to get your events right away ! The most useful parameter is suggestion_token, which contains preferences about the location you want events from, the topic, and the date ...
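The `--data` string in the command above is an ordinary URL-encoded form body, and `suggestion_token` is a small JSON object inside it. Here is a minimal Python sketch that decodes the token and rebuilds the same body. It assumes only the `city` key shown in the command (keys for topic or date are not shown in this post, so they are not guessed here), and `build_body` is a hypothetical helper name:

```python
import json
from urllib.parse import parse_qs, urlencode

# The body copied from the stripped-down curl command.
raw_body = (
    "suggestion_token=%7B%22city%22%3A%22default_108085075885850%22%7D"
    "&timezone_id=1&__a=1"
)

# parse_qs decodes the percent-escapes and returns {name: [values]}.
params = {name: values[0] for name, values in parse_qs(raw_body).items()}
token = json.loads(params["suggestion_token"])
print(token)  # {'city': 'default_108085075885850'}


def build_body(token, timezone_id="1"):
    """Re-encode a token dict into the same form body (hypothetical helper)."""
    return urlencode({
        # Compact separators reproduce the JSON exactly as seen in the request.
        "suggestion_token": json.dumps(token, separators=(",", ":")),
        "timezone_id": timezone_id,
        "__a": "1",
    })


rebuilt = build_body(token)
```

Round-tripping like this is a cheap way to check that you understood the encoding before you start changing values in the token.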
4- Hakuna Matata ! Let's make something
Now we need to use our events for something (display them in a nice web page, for example).
To do so, we need a browser HTTP client library (or the fetch API, or just our old friend XHR).
But this alone won't give us what we want, because the bloody browser will block our cross-origin request (for our own good ofc).
So in this case we will need to set up a proxy (web server) which will do the fetching for us. Then we will get the JSON on the client via fetch or jQuery or axios or whatever.
Steps again:
- Because we are too lazy to rewrite the HTTP request in our favorite language, we're gonna use this website to translate it from cURL syntax to whatever syntax :p . curl to whatever
import requests
data = { 'suggestion_token': '{"city":"default_108085075885850"}', 'timezone_id': '1', '__a': '1'}
response = requests.post('https://www.facebook.com/events/discover/query/', data=data)
- Set up a web server to serve what has to be served. Again, we're too lazy to set up a proper proxy (using Flask or whatever) or at least write a decent one. So we will write a sh*ty one in 1 or 2 minutes .. Daaaamn, this is so messed up :p
from http.server import HTTPServer, BaseHTTPRequestHandler
import requests

class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        data = {
            'suggestion_token': '{"city":"default_108085075885850"}',
            'timezone_id': '1',
            '__a': '1'
        }
        response = requests.post('https://www.facebook.com/events/discover/query/', data=data)
        self.send_response(200)
        self.send_header("Access-Control-Allow-Origin", "*")  # this is the key line of code !
        self.end_headers()
        self.wfile.write(response.text.encode())

PORT = 8080
httpd = HTTPServer(("", PORT), SimpleHTTPRequestHandler)
httpd.serve_forever()
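The proxy above always answers 200, and an exception inside `requests.post` would leave the browser without a proper response. Here is a slightly more defensive sketch using only the standard library. `make_handler`, `fetch_events` and `serve` are hypothetical names introduced for illustration. `fetch_events` stands for any zero-argument callable that returns the upstream body as a string, for example a small wrapper around the `requests.post(...)` call above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(fetch_events):
    """Return a handler class that serves whatever fetch_events() returns.

    fetch_events is a hypothetical zero-argument callable returning a str.
    """
    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                status, body = 200, fetch_events()
            except Exception:
                # Upstream failed: answer 502 instead of crashing the request.
                status, body = 502, '{"error": "upstream request failed"}'
            payload = body.encode("utf-8")
            self.send_response(status)
            # Still the key line: without it the browser blocks the response.
            self.send_header("Access-Control-Allow-Origin", "*")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # silence the default per-request logging

    return ProxyHandler


def serve(fetch_events, port=8080):
    """Blocking helper: serve forever on the given port."""
    HTTPServer(("", port), make_handler(fetch_events)).serve_forever()
```

Passing the fetch function in, rather than hard-coding it, also lets you try the proxy with a fake function before pointing it at the real endpoint.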
- Fetch it using axios or fetch or something and put it inside your beautiful web page.
axios.get('http://localhost:8080')
  .then(function (response) {
    // the body is in response.data; beware the for(;;); at the beginning of the JSON data
    parseAndAddToMyBeautifulWebPage(response);
  });
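The `for(;;);` guard mentioned in the comment has to be removed before the body parses as JSON. One option is to do that once in Python, on the proxy side. Here is a minimal sketch. `parse_events` is a hypothetical helper, the sample body is made up for illustration, and the exact spelling of the guard (with or without a space after `for`) is an assumption, so the pattern accepts both:

```python
import json
import re

# Matches the guard with or without whitespace after "for" (assumed spelling;
# adjust it to whatever your response actually starts with).
GUARD = re.compile(r"^\s*for\s*\(;;\);")


def parse_events(raw_text):
    """Strip the leading guard, if any, and parse the remainder as JSON."""
    return json.loads(GUARD.sub("", raw_text, count=1))


sample = 'for (;;);{"payload": {"events": []}}'  # made-up body for illustration
print(parse_events(sample))  # {'payload': {'events': []}}
```

A body without the guard passes through unchanged, so the helper is safe to apply to every response.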
That's it ! Thank you for reading :D, Sorry for the mess.
Oldest comments (2)
This was a fun little pass the time project. :) thanks
Great post, thank you! I've been wondering about this for a while too.
Of course Facebook has changed things since your post to make it a little harder, but it's still possible by intercepting one of the multiple calls to facebook.com/api/graphql/ after clicking on 'more events'.