I was going through a project that scrapes Twitter, but it no longer works properly because Twitter has changed its front-end code structure and even the way tweets are fetched from the backend. Sending an HTTP request and parsing the returned HTML no longer yields the tweet data, and I needed more data than Twitter's API offers. So I created this project, which runs a headless web browser to get the tweet data.
What data do we get?
Key | Type | Description |
tweet_id | String | Post identifier (an integer cast to a string) |
username | String | Username of the profile |
name | String | Name of the profile |
profile_picture | String | Link to the profile picture |
replies | Integer | Number of replies to the tweet |
retweets | Integer | Number of retweets of the tweet |
likes | Integer | Number of likes on the tweet |
is_retweet | Boolean | Whether the tweet is a retweet |
retweet_link | String | Link to the retweet if the tweet is a retweet, otherwise an empty string |
posted_time | String | Time when the tweet was posted, in ISO 8601 format |
content | String | Content of the tweet as text |
hashtags | Array | Hashtags present in the tweet, if any |
mentions | Array | Mentions present in the tweet, if any |
images | Array | Image links, if the tweet contains images |
videos | Array | Video links, if the tweet contains videos |
tweet_url | String | URL of the tweet |
link | String | Link to an external website, if the tweet contains one |
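To make the shape of a scraped record concrete, here is a minimal sketch of how one tweet could be modeled in Python. The field names and types come from the table above; the `Tweet` class, its `posted_at` helper, and the sample values are hypothetical illustrations, not part of the project.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Tweet:
    """Hypothetical container mirroring the keys in the table above."""

    tweet_id: str          # integer identifier kept as a string
    username: str
    name: str
    profile_picture: str
    replies: int
    retweets: int
    likes: int
    is_retweet: bool
    retweet_link: str      # empty string when the tweet is not a retweet
    posted_time: str       # ISO 8601 timestamp
    content: str
    hashtags: List[str] = field(default_factory=list)
    mentions: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)
    videos: List[str] = field(default_factory=list)
    tweet_url: str = ""
    link: str = ""

    def posted_at(self) -> datetime:
        """Parse posted_time, accepting a trailing 'Z' as UTC."""
        return datetime.fromisoformat(self.posted_time.replace("Z", "+00:00"))


# Made-up sample values, for illustration only.
sample = Tweet(
    tweet_id="1234567890",
    username="example_user",
    name="Example User",
    profile_picture="https://example.com/avatar.jpg",
    replies=2,
    retweets=5,
    likes=11,
    is_retweet=False,
    retweet_link="",
    posted_time="2021-05-01T10:15:00Z",
    content="Hello #india",
    hashtags=["#india"],
    tweet_url="https://twitter.com/example_user/status/1234567890",
)
record = asdict(sample)  # plain dict, ready for JSON or CSV export
```

Keeping `posted_time` as a string matches the table, while the helper gives a real `datetime` when sorting or filtering is needed.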
What can we scrape?
- Tweets from any profile that exists on Twitter.
- Tweets matching a keyword, like "google".
- Tweets matching a hashtag, like "#india".
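As a rough illustration of these three targets, the sketch below builds the page URL a headless browser might be pointed at for each one. The function names are hypothetical, and the URL patterns are my assumption about Twitter's public pages, not something taken from the project's code.

```python
from urllib.parse import quote

BASE_URL = "https://twitter.com"  # assumed base URL


def profile_url(username: str) -> str:
    """Page listing a profile's tweets; a leading '@' is dropped."""
    return f"{BASE_URL}/{quote(username.lstrip('@'))}"


def keyword_url(keyword: str) -> str:
    """Search results page for a keyword (URL-encoded)."""
    return f"{BASE_URL}/search?q={quote(keyword)}"


def hashtag_url(tag: str) -> str:
    """Page for a hashtag; a leading '#' is dropped."""
    return f"{BASE_URL}/hashtag/{quote(tag.lstrip('#'))}"
```

For example, `hashtag_url("#india")` gives `https://twitter.com/hashtag/india`, which the browser would then load and scroll to collect tweets.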
What if the IP is getting blocked due to too many requests?
- The project lets you set a proxy, either authenticated or unauthenticated.
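To show the difference between the two kinds of proxy, here is a small sketch that parses a proxy string and reports whether it carries credentials. The `host:port` and `user:password@host:port` formats, the `Proxy` type, and `parse_proxy` are assumptions made for illustration; check the repository for the format the project actually expects.

```python
from typing import NamedTuple, Optional


class Proxy(NamedTuple):
    """Hypothetical parsed proxy setting."""

    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None

    @property
    def is_authenticated(self) -> bool:
        return self.username is not None


def parse_proxy(value: str) -> Proxy:
    """Parse 'host:port' or 'user:password@host:port' (assumed formats)."""
    # Split on the last '@' so a password may itself contain '@'.
    credentials, sep, address = value.rpartition("@")
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    if not sep:
        return Proxy(host, int(port))
    username, _, password = credentials.partition(":")
    return Proxy(host, int(port), username, password)
```

An unauthenticated proxy such as `203.0.113.10:8080` parses with no username, while `user:secret@203.0.113.10:8080` parses with both credentials set.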
To learn more about its usage, check out the entire repository here.