Scrape Twitter profiles and hashtags

Sajid Shaikh — Mon, 01 Nov 2021 07:18:10 +0000

I was going through a project that scrapes Twitter, but it no longer works properly because Twitter has changed its front-end code structure and even the way tweets are fetched from the backend. Sending an HTTP request and parsing the returned HTML to get tweet data no longer works, and I needed more data than Twitter's API offers. So I created this project, which uses a headless web browser to get the tweet data.

What data do we get?

Key	Type	Description
tweet_id	String	Post identifier (an integer cast to a string)
username	String	Username of the profile
name	String	Name of the profile
profile_picture	String	Link to the profile picture
replies	Integer	Number of replies to the tweet
retweets	Integer	Number of retweets of the tweet
likes	Integer	Number of likes on the tweet
is_retweet	Boolean	Whether the tweet is a retweet
retweet_link	String	Link to the retweet if it is one, otherwise an empty string
posted_time	String	Time when the tweet was posted, in ISO 8601 format
content	String	Content of the tweet as text
hashtags	Array	Hashtags present in the tweet, if any
mentions	Array	Mentions present in the tweet, if any
images	Array	Image links, if the tweet contains images
videos	Array	Video links, if the tweet contains videos
tweet_url	String	URL of the tweet
link	String	Link to an external website, if the tweet contains one
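As a rough sketch, one record with the keys above could be typed in Python as shown here. The `Tweet` name and every sample value are hypothetical and only illustrate the shape of the data; they are not taken from the project.

```python
from datetime import datetime
from typing import List, TypedDict


class Tweet(TypedDict):
    """Hypothetical type mirroring the keys listed in the table above."""
    tweet_id: str
    username: str
    name: str
    profile_picture: str
    replies: int
    retweets: int
    likes: int
    is_retweet: bool
    retweet_link: str
    posted_time: str
    content: str
    hashtags: List[str]
    mentions: List[str]
    images: List[str]
    videos: List[str]
    tweet_url: str
    link: str


# Made-up placeholder record, for illustration only.
sample: Tweet = {
    "tweet_id": "1234567890",
    "username": "example_user",
    "name": "Example User",
    "profile_picture": "https://example.com/avatar.png",
    "replies": 1,
    "retweets": 2,
    "likes": 3,
    "is_retweet": False,
    "retweet_link": "",
    "posted_time": "2021-11-01T07:18:10+00:00",
    "content": "Hello #india from @example_user",
    "hashtags": ["india"],
    "mentions": ["example_user"],
    "images": [],
    "videos": [],
    "tweet_url": "https://twitter.com/example_user/status/1234567890",
    "link": "",
}

# posted_time is an ISO 8601 string, so it parses into a datetime.
posted = datetime.fromisoformat(sample["posted_time"])
# tweet_id is an integer stored as a string, so it converts cleanly.
numeric_id = int(sample["tweet_id"])
```

Keeping `tweet_id` as a string avoids precision problems if the data is later consumed by tools that store large integers as floating-point numbers.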

What can we scrape?

Tweets from any profile that exists on Twitter.
Tweets matching a keyword, such as "google".
Tweets with a hashtag, such as "#india".
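The project's internals are not shown in this post. Purely as an illustration, the three kinds of target could map to public Twitter page URLs that a headless browser would open. The helper names below are hypothetical, and the URL patterns are an assumption about the public web front end, not the project's actual code.

```python
from urllib.parse import quote

BASE_URL = "https://twitter.com"  # assumption: public web front end


def profile_url(username: str) -> str:
    """Page listing a profile's tweets; a leading '@' is dropped."""
    return f"{BASE_URL}/{quote(username.lstrip('@'), safe='')}"


def search_url(query: str) -> str:
    """Search page for a keyword or hashtag; '#' gets percent-encoded."""
    return f"{BASE_URL}/search?q={quote(query, safe='')}"


print(profile_url("google"))   # https://twitter.com/google
print(search_url("google"))    # https://twitter.com/search?q=google
print(search_url("#india"))    # https://twitter.com/search?q=%23india
```

The hashtag has to be percent-encoded because a raw `#` in a URL starts the fragment and would never reach the server as part of the query.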

What if the IP is getting blocked due to too many requests?

The scraper supports proxies, both authenticated and unauthenticated.
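The exact proxy syntax the project expects is documented in its repository. As an assumption for illustration, proxies are commonly written as `host:port` (unauthenticated) or `user:password@host:port` (authenticated). The hypothetical `parse_proxy` helper below splits both forms; the addresses are documentation-range placeholders.

```python
from typing import Optional, TypedDict
from urllib.parse import urlsplit


class Proxy(TypedDict):
    host: str
    port: int
    username: Optional[str]
    password: Optional[str]


def parse_proxy(proxy: str) -> Proxy:
    """Split 'host:port' or 'user:password@host:port' into its parts."""
    # Prefixing "//" makes urlsplit treat the string as a network location.
    parts = urlsplit("//" + proxy)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected host:port, got {proxy!r}")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,  # None when unauthenticated
        "password": parts.password,  # None when unauthenticated
    }


plain = parse_proxy("203.0.113.10:8080")
authed = parse_proxy("user:secret@203.0.113.10:8080")
```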

To learn more about its usage, check the entire repository here.

Scrape Facebook public pages without an API key or limitations

Sajid Shaikh — Mon, 04 Jan 2021 15:02:44 +0000

Facebook's API is really difficult to set up and is rate limited as well. Why not get public data with some automation instead? Here's a Python library that does the job.

Install it from PyPI:

pip install facebook-page-scraper

Or install it from source. First download it using git:

git clone https://github.com/shaikhsajid1111/facebook_page_scraper.git

Then open a terminal inside the folder and run:

python3 setup.py install

How to use it?
It's simple! Just import the class from the package, instantiate it, and start scraping.

Suppose I want posts from Facebook AI:

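from facebook_page_scraper import Facebook_scraper

# Set up the arguments, then instantiate the Facebook_scraper class
page_name = "facebookai"  # page name as it appears in the page URL
posts_count = 10          # number of posts to scrape
browser = "firefox"       # browser used for scraping

facebook_ai = Facebook_scraper(page_name, posts_count, browser)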

That was the instantiation part. If you want the data in JSON format, just call the scrap_to_json() method, like this:

json_data = facebook_ai.scrap_to_json()
print(json_data)

And you will get JSON output like this:

{
    "1730063790503900": {
        "name": "Facebook AI",
        "shares": 65,
        "reactions": {
            "likes": 305,
            "loves": 31,
            "wow": 7,
            "cares": 0,
            "sad": 0,
            "angry": 0,
            "haha": 0
        },
        "reaction_count": 343,
        "comments": 11,
        "content": "We\u2019re training computer vision models that leverage Transformers, a deep neural network architecture. Data-efficient image Transformers (DeiT) use less data and computing resources to produce high-performance image classification AI models.  We hope to advance the field of computer vision by sharing this work with the broader community, making large-scale systems that train AI models more accessible to researchers and engineers.",
        "posted_on": "2020-12-24T04:05:27",
        "video": "",
        "image": [
            "https://scontent-bom1-2.xx.fbcdn.net/v/t39.2365-6/p540x282/131570013_988138305044034_3894567585410559092_n.png?_nc_cat=109&ccb=2&_nc_sid=eaa83b&_nc_ohc=mAeDelparrEAX-3Mk7E&_nc_ht=scontent-bom1-2.xx&_nc_tp=30&oh=3fedb0e3cea6ad6f934ca20f77bec624&oe=600CB4C9"
        ],
        "post_url": "https://www.facebook.com/facebookai/posts/1730063790503900"
    },
    ...
}
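Assuming scrap_to_json() returns a JSON string shaped like the output above (which the print call suggests), it can be loaded with the standard json module and walked post by post. The sketch below uses a trimmed copy of the sample output with only a few of its fields.

```python
import json

# Trimmed copy of the sample output above (one post, a few fields).
json_data = """
{
    "1730063790503900": {
        "name": "Facebook AI",
        "shares": 65,
        "reactions": {"likes": 305, "loves": 31, "wow": 7, "cares": 0,
                      "sad": 0, "angry": 0, "haha": 0},
        "reaction_count": 343,
        "comments": 11,
        "posted_on": "2020-12-24T04:05:27"
    }
}
"""

# The top-level object maps each post ID to that post's data.
posts = json.loads(json_data)

for post_id, post in posts.items():
    # Summing the per-type reactions should match reaction_count.
    total = sum(post["reactions"].values())
    print(post_id, post["posted_on"], total, post["comments"])
```

For the sample post, the per-type reactions (305 + 31 + 7) add up to the reported reaction_count of 343.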

If you want to save the data directly to a CSV file, just call the scrap_to_csv() method, like this:

filename = "data_file"  # file name without the .csv extension
directory = r"E:\data"  # directory where the CSV file will be saved (raw string keeps the backslash literal)
facebook_ai.scrap_to_csv(filename, directory)
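Once the file is written, it can be read back with Python's standard csv module. This is a minimal sketch with a hypothetical `load_posts` helper. It assumes the library appends ".csv" to the file name (as the comment above implies), writes a header row, and uses UTF-8 encoding; the column names are not listed in this post, so none are assumed.

```python
import csv
import os
from typing import Dict, List


def load_posts(directory: str, filename: str) -> List[Dict[str, str]]:
    """Read <directory>/<filename>.csv back as a list of row dicts."""
    # Assumption: the scraper saved the file as filename + ".csv".
    path = os.path.join(directory, filename + ".csv")
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling load_posts(directory, filename) with the same two values passed to scrap_to_csv() would then return one dictionary per saved post, keyed by whatever header the file contains. All values come back as strings, so counts need converting with int() before doing arithmetic.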


DEV Community: Sajid Shaikh
