DEV Community


Twint (Twitter Intelligence): An Advanced Twitter Scraper

vishwasnarayanre ・5 min read

Twint is a Python-based specialized Twitter scraping application that helps you to scrape Tweets from Twitter accounts without using Twitter's API.

Twint uses Twitter's search operators to let you scrape Tweets from specific accounts, scrape Tweets related to particular topics, hashtags, and trends, or extract sensitive information from Tweets such as e-mail addresses and phone numbers. I find this very useful, and you can get quite creative with it.
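To give a feel for what "extracting e-mail addresses and phone numbers" involves, here is a rough sketch using regular expressions. The patterns are hypothetical and chosen for illustration only; they are not the rules Twint itself uses, and the loose phone pattern will produce false positives (a date such as 2015-12-20 matches it).

```python
import re

# Hypothetical, illustrative patterns; Twint's own matching rules may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose phone pattern: optional "+", then 8-15 digits that may be separated
# by single spaces, dots or dashes. Dates like 2015-12-20 will also match.
PHONE_RE = re.compile(r"\+?\d(?:[\s.-]?\d){7,14}")


def find_contacts(text: str) -> dict:
    """Return any e-mail addresses and phone-like numbers found in a Tweet's text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [m.group(0) for m in PHONE_RE.finditer(text)],
    }


if __name__ == "__main__":
    # Made-up Tweet text, only to show the output shape
    print(find_contacts("Reach me at jane.doe@example.com or +1 555 010 9999"))
```

Running it on a batch of scraped Tweet texts gives a quick, if noisy, way to spot the kind of Tweets that the --email and --phone options target.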

Twint also makes special queries to Twitter, enabling you to scrape a Twitter user's followers, the Tweets they have liked, and the accounts they follow, without any authentication, API, Selenium, or browser emulation.

A few advantages of using Twint over the Twitter API:

Before we start with the installation, keep in mind that Twint essentially scrapes the information that is publicly available on the Twitter website, so it is subject to the same limits. Twitter restricts scrolling when viewing a user's timeline, which means that with Profile or Favorites you will be able to get only about 3200 Tweets. As a result, you will not be able to retrieve all of a user's Tweets when running those modes.

Let's start with the installation:

git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt

or via pip

pip3 install twint

or, to install the latest version directly from GitHub:

pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

or with pipenv:

pipenv install git+https://github.com/twintproject/twint.git#egg=twint

Twint is flexible enough that you can also use it directly from the command line.
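If you end up calling the twint command from your own scripts, it is safer to pass the arguments as a list than to build a shell string, because search terms with spaces then need no quoting. A minimal sketch, assuming only the -u, -s, --since, -o, --csv and --json flags; build_twint_command and run_twint are hypothetical helpers, not part of Twint:

```python
import shlex
import subprocess
from typing import List, Optional


def build_twint_command(
    username: Optional[str] = None,
    search: Optional[str] = None,
    since: Optional[str] = None,
    output: Optional[str] = None,
    fmt: Optional[str] = None,
) -> List[str]:
    """Assemble a twint argument list (hypothetical helper, not part of Twint)."""
    if fmt not in (None, "csv", "json"):
        raise ValueError("fmt must be 'csv', 'json' or None")
    cmd = ["twint"]
    if username:
        cmd += ["-u", username]    # whose Tweets to scrape
    if search:
        cmd += ["-s", search]      # search term
    if since:
        cmd += ["--since", since]  # e.g. "2015-12-20" or "2015-12-20 20:30:15"
    if output:
        cmd += ["-o", output]      # output file
    if fmt:
        cmd.append("--" + fmt)     # --csv or --json
    return cmd


def run_twint(cmd: List[str]) -> None:
    """Run the command without a shell, so spaces in arguments need no quoting."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_twint_command(username="username", search="TensorFlow",
                              since="2015-12-20", output="file.csv", fmt="csv")
    print(shlex.join(cmd))  # show the equivalent shell command
    # run_twint(cmd)        # uncomment to actually run it (needs twint installed)
```

shlex.join (Python 3.8+) is only used to display the equivalent shell command; subprocess.run receives the list directly.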

A few simple examples to help you understand the basics:

  • twint -u username - Scrape all the Tweets of a user (excludes retweets but includes the user's own Tweets and replies).

  • twint -u username -s TensorFlow - Scrape all Tweets from the user's timeline containing TensorFlow, along with the other information about each Tweet.

  • twint -s TensorFlow - Collect every Tweet containing TensorFlow from everyone's Tweets.

  • twint -u username --year 2019 - Collect Tweets that were tweeted before 2019.

  • twint -u username --since "2015-12-20 20:30:15" - Collect Tweets that were tweeted since 2015-12-20 20:30:15.

  • twint -u username --since 2015-12-20 - Collect Tweets that were tweeted since 2015-12-20 00:00:00.

  • twint -u username -o file.txt - Scrape Tweets and save to file.txt.

  • twint -u username -o file.csv --csv - Scrape Tweets and save them as a CSV file. Because the result is a fairly structured dataset, it is a convenient starting point for machine learning.

  • twint -u username --email --phone - Show only Tweets that might contain phone numbers or email addresses (retweets are ignored).

  • twint -s "Some-celebrity-profile-id" --verified - Display Tweets by verified users that Tweeted about "Some-celebrity-profile-id".

  • twint -g="12.880058,77.385935,1km" -o file.csv --csv - Scrape Tweets from within a 1 km radius of the given latitude and longitude and export them to a CSV file.

  • twint -u username -es localhost:9200 - Output Tweets to Elasticsearch; localhost:9200 is the host and port of the Elasticsearch instance (9200 is Elasticsearch's default port).

  • twint -u username -o file.json --json - Scrape Tweets and save them as a JSON file.

  • twint -u username --database tweets.db - Save Tweets to a SQLite database, where they are stored in SQL tables.

  • twint -u username --followers - Scrape a Twitter user's followers.

  • twint -u username --following - Scrape who a Twitter user follows.

  • twint -u username --favorites - Collect all the Tweets a user has favorited (gathers ~3200 Tweets).

  • twint -u username --following --user-full - Collect full user information about the accounts a person follows.

  • twint -u username --timeline - Use an effective method to gather Tweets from a user's profile (Gathers ~3200 Tweets, including retweets & replies).

  • twint -u username --retweets - Use a quick method to gather the last 900 Tweets (that includes retweets) from a user's profile.

  • twint -u username --resume resume_file.txt - Resume a search starting from the last saved scroll-id.
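The -g option takes a latitude, a longitude and a radius. Twitter does the actual filtering on its side, but if you want to double-check the coordinates of geotagged Tweets locally, the usual tool is the haversine great-circle distance. A minimal sketch, assuming a spherical Earth of radius 6371 km and radii given in km; parse_geo, haversine_km and within_radius are hypothetical helpers, not Twint functions:

```python
import math


def parse_geo(geo: str):
    """Split a "lat,lon,radius" string such as "12.880058,77.385935,1km"."""
    lat, lon, radius = geo.split(",")
    if not radius.endswith("km"):
        raise ValueError("this sketch only handles radii given in km")
    return float(lat), float(lon), float(radius[:-2])


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth (radius 6371 km)."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def within_radius(geo: str, lat: float, lon: float) -> bool:
    """True if the point (lat, lon) lies inside the circle described by geo."""
    c_lat, c_lon, radius_km = parse_geo(geo)
    return haversine_km(c_lat, c_lon, lat, lon) <= radius_km


if __name__ == "__main__":
    geo = "12.880058,77.385935,1km"
    print(within_radius(geo, 12.884558, 77.385935))  # about 0.5 km north of the centre
```

The haversine formula is used here because it stays numerically stable for the short distances involved in a 1 km search radius.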

If you want to explore further, you can find more commands in the wiki section of the Twint GitHub repository.

If you are a programmer like me and would rather work from Python, Twint can also be used as a module. Here is a basic example:

import twint

# Configure a search for one user's Tweets
c = twint.Config()
c.Username = "username"
# Print each Tweet using a custom template
c.Format = "ID {id} | Username {username}"

twint.run.Search(c)

The above is just boilerplate: it searches a user's Tweets and prints each one using the custom format, showing only its ID and username.
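To get a feel for what c.Format does, here is a sketch that assumes it works by plain placeholder substitution. This is an illustration, not Twint's code; render is a hypothetical helper and the field values are made up.

```python
def render(template: str, fields: dict) -> str:
    """Fill {name} placeholders in a Format-style template (hypothetical helper).

    Placeholders with no matching field are left as-is instead of raising.
    """
    out = template
    for name, value in fields.items():
        out = out.replace("{" + name + "}", str(value))
    return out


if __name__ == "__main__":
    # Made-up values, only to show the substitution
    print(render("ID {id} | Username {username}", {"id": 123, "username": "jack"}))
```

Using str.replace rather than str.format means an unknown placeholder such as {missing} simply stays in the output instead of raising a KeyError.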

import twint

c = twint.Config()
c.Username = "username"
# Keep only these Tweet fields in the output
c.Custom["tweet"] = ["id", "username"]
# Write the results to tweets.csv in CSV format
c.Output = "tweets.csv"
c.Store_csv = True

twint.run.Search(c)

You can use the above code to extract only the fields you need (here, id and username) and store them in a CSV (Comma-Separated Values) file.
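Once tweets.csv exists, it can be read back with Python's standard csv module. A minimal sketch, assuming the file has a header row naming id and username columns (matching the Custom fields above); count_tweets_per_user is a hypothetical helper that counts Tweets per user and skips repeated IDs, e.g. if the same scrape was written into the file twice.

```python
import csv
from collections import Counter


def count_tweets_per_user(path: str) -> Counter:
    """Count unique Tweet IDs per username in a CSV with 'id' and 'username' columns.

    Assumes the first row of the file is a header naming those columns.
    """
    seen = set()
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["id"] in seen:
                continue  # this Tweet was already counted
            seen.add(row["id"])
            counts[row["username"]] += 1
    return counts
```

For example, count_tweets_per_user("tweets.csv").most_common(10) returns the ten usernames with the most Tweets in the file.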

Twint makes scraping very easy, which makes it one of the best ways to gather information about a user. For example, this scrapes the Tweets of the @twitter account:

import twint

# Scrape the Tweets posted by the @twitter account
c = twint.Config()
c.Username = "twitter"

twint.run.Search(c)

To get targeted information about the account itself (its profile details rather than its Tweets), use twint.run.Lookup instead; like the other modes, its output can be saved in any of the supported file formats.

import twint

# Look up the @twitter account's profile information instead of its Tweets
c = twint.Config()
c.Username = "twitter"

twint.run.Lookup(c)

What exactly is shadow banning? What should we do in such a situation?

A person can be shadow banned for a variety of reasons. Whatever the cause, you will no longer be able to find that person's Tweets through search. This does not mean that the user has been temporarily or permanently suspended; if we visit the user's profile page, we can still see his/her Tweets. A different approach is suggested in cases like this one. It won't return many Tweets, but for the time being it is the better option.

You can also look for Tweets sent to the user, as well as Tweets that mention him/her. Another option is to keep the results in RAM (random-access memory) instead of writing them to a file, although this is generally not recommended, especially for large scrapes.
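If you already have a batch of scraped Tweets in memory, picking out the ones that mention a given user can be done locally. A minimal sketch, assuming each Tweet is a dict whose text is stored under a "tweet" key (an assumption for illustration); mentions_of is a hypothetical helper, not a Twint function:

```python
import re
from typing import Iterable, List


def mentions_of(username: str, tweets: Iterable[dict]) -> List[dict]:
    """Return the Tweets whose text mentions @username (case-insensitive).

    The lookarounds stop "@jack" from matching "@jackson" or an address like "me@jack.com".
    """
    pattern = re.compile(r"(?<!\w)@" + re.escape(username) + r"(?!\w)", re.IGNORECASE)
    return [t for t in tweets if pattern.search(t.get("tweet", ""))]
```

re.escape keeps usernames containing regex metacharacters from being misinterpreted, and records without a "tweet" key are simply skipped.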

If you do want to keep the results in memory as a live variable, store them as a pandas DataFrame; this is why pandas is also one of Twint's dependencies.


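# Scrape the latest Tweets from a user into a pandas DataFrame

import twint

c = twint.Config()
c.Limit = 20             # stop after roughly 20 Tweets
c.Username = "username"  # replace with the account you want to scrape
c.Pandas = True          # keep the results in memory as a pandas DataFrame

twint.run.Search(c)

# DataFrame filled in by Twint's pandas storage module
Tweets_df = twint.storage.panda.Tweets_df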
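Once Tweets_df is filled, you can analyse it directly with pandas. As a library-agnostic sketch, the hypothetical summarize helper below works on plain dicts, such as those produced by Tweets_df.to_dict("records"). The column names date and nlikes are assumptions, so check them against your own DataFrame.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def summarize(records: Iterable[dict]) -> Tuple[Dict[str, int], Optional[dict]]:
    """Count Tweets per day and find the most-liked one.

    Assumes (hypothetically) each record has a 'date' value whose string form
    begins with YYYY-MM-DD and a numeric 'nlikes' field.
    """
    per_day: Counter = Counter()
    top: Optional[dict] = None
    for r in records:
        per_day[str(r["date"])[:10]] += 1  # keep only the YYYY-MM-DD part
        if top is None or int(r["nlikes"]) > int(top["nlikes"]):
            top = r  # on ties, the earliest record seen is kept
    return dict(per_day), top
```

For example, per_day, top = summarize(Tweets_df.to_dict("records")) gives a day-by-day count and the single most-liked Tweet; calling str() on the date keeps the helper working whether the column holds strings or timestamps.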

I hope you find this article useful, and thank you for reading!
