As of the first day of 2025, the best free tool for scraping tweets from X (formerly Twitter) seems to be agent-twitter-client, from the team behind ElizaOS & ai16z.
No API keys and no visible rate limits for a multitude of actions on an active Twitter client sounds like a win-win for all involved parties! Well... maybe not for X.
Their README.md is a treasure trove of information on the available features and the best way to use them to get the results you expect.
Configure environment variables for authentication.
TWITTER_USERNAME= # Account username
TWITTER_PASSWORD= # Account password
TWITTER_EMAIL= # Account email
PROXY_URL= # HTTP(s) proxy for requests (necessary for browsers)
# Twitter API v2 credentials for tweet and poll functionality
TWITTER_API_KEY= # Twitter API Key
TWITTER_API_SECRET_KEY= # Twitter API Secret Key
TWITTER_ACCESS_TOKEN= # Access Token for Twitter API v2
TWITTER_ACCESS_TOKEN_SECRET= # Access Token Secret for Twitter API v2
For read-only operations, just the TWITTER_USERNAME, TWITTER_PASSWORD & TWITTER_EMAIL fields are sufficient. No need to even bother with the rest of the environment variables.
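A minimal read-only login sketch, assuming Node 18+, the dotenv package for loading the .env file above, and the login / isLoggedIn methods listed in the README (verify against the version you install):
import { Scraper } from 'agent-twitter-client';
import 'dotenv/config'; // load TWITTER_USERNAME, TWITTER_PASSWORD, TWITTER_EMAIL from .env

async function main() {
  const scraper = new Scraper();

  // Username, password & email are enough for read-only scraping
  await scraper.login(
    process.env.TWITTER_USERNAME,
    process.env.TWITTER_PASSWORD,
    process.env.TWITTER_EMAIL
  );

  console.log('Logged in:', await scraper.isLoggedIn());
}

main().catch(console.error);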
To use this tool effectively in your scripts, bots, scrapers, etc., do check out the DOs & DONTs below.
DOs
Cache the cookies as mentioned in the README wherever you run the application, and avoid making multiple login requests in your scripts. Each of those login requests will be visible on your X account.
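A rough sketch of that caching pattern. getCookies and setCookies come from the README, but the key / value / domain / path fields used to rebuild the cookie strings are an assumption about the serialized cookie shape, so adjust for your version:
import fs from 'node:fs';

const COOKIE_FILE = './cookies.json';

async function ensureLogin(scraper) {
  // Reuse a cached session if one exists, so we don't trigger a fresh login
  if (fs.existsSync(COOKIE_FILE)) {
    const saved = JSON.parse(fs.readFileSync(COOKIE_FILE, 'utf8'));
    await scraper.setCookies(
      saved.map((c) => `${c.key}=${c.value}; Domain=${c.domain}; Path=${c.path}`)
    );
    if (await scraper.isLoggedIn()) return;
  }

  // Fall back to a real login (this one shows up on your X account) and cache it
  await scraper.login(
    process.env.TWITTER_USERNAME,
    process.env.TWITTER_PASSWORD,
    process.env.TWITTER_EMAIL
  );
  fs.writeFileSync(COOKIE_FILE, JSON.stringify(await scraper.getCookies()));
}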
Expect random, rather long pauses between queries, especially when fetching more than 100,000 tweets. The tool fetches 20 tweets at a time, but depending on the max-tweets argument you pass, it will keep looping until that many are fetched. Just wait for it to resume (a bulk-fetch sketch follows the AsyncGenerator example below).
Conditional searches are possible, but they return a maximum of 50 tweets even if you pass a larger number as an argument. Search for plain words instead of trying multi-term or pattern matches (see the operator-based example under the SearchMode note below).
The tweets search method is an AsyncGenerator. To get the desired tweets, use the following syntax:
https://github.com/elizaOS/agent-twitter-client/issues/24
import { Scraper, SearchMode } from 'agent-twitter-client';

const scraper = new Scraper(); // log in or restore cookies first (see above)
const mentions = scraper.searchTweets('#nodejs', 20, SearchMode.Latest);

for await (const mention of mentions) {
  console.log(mention);
}
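For the bigger pulls mentioned earlier, the same for await pattern applies. A rough sketch, assuming a logged-in scraper as above and using getTweets from the README (the nodejs handle and the 1,000 cap are just placeholders):
// Pull up to 1,000 tweets from a single profile. The library pages roughly
// 20 tweets per request, so expect random long pauses on big pulls -
// let it sit and it will resume on its own.
const collected = [];
for await (const tweet of scraper.getTweets('nodejs', 1000)) {
  collected.push(tweet);
  if (collected.length % 100 === 0) {
    console.log(`fetched ${collected.length} tweets so far...`);
  }
}
console.log(`done: ${collected.length} tweets`);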
Pass the appropriate SearchMode in your queries. READ THE F*****G MANUAL!
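For reference, a quick sketch of the modes and of an operator-based ("conditional") query of the kind the 50-tweet cap above applies to. The enum values listed are taken from the upstream twitter-scraper API, so confirm them against your installed version:
import { SearchMode } from 'agent-twitter-client';

// Available modes: SearchMode.Top, SearchMode.Latest, SearchMode.Photos,
// SearchMode.Videos, SearchMode.Users

// Operator-based query - capped at ~50 results no matter what count you pass,
// so stick to plain keyword searches for large pulls
const results = scraper.searchTweets(
  'from:nodejs "release" since:2024-06-01',
  50,
  SearchMode.Latest
);
for await (const tweet of results) {
  console.log(tweet.id, tweet.text);
}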
Check the supported Media Types
// Image formats and their MIME types
const imageTypes = {
'.jpg': 'image/jpeg',
'.jpeg': 'image/jpeg',
'.png': 'image/png',
'.gif': 'image/gif'
};
// Video format
const videoTypes = {
'.mp4': 'video/mp4'
};
- Check the media upload limitations
Maximum 4 images per tweet
Only 1 video per tweet
Maximum video file size: 512MB
Supported image formats: JPG, PNG, GIF
Supported video format: MP4
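Putting the two media points together, a hedged upload sketch. The sendTweet(text, replyToId, mediaData) shape and the { data, mediaType } pairs follow the README example, and the file paths are placeholders:
import fs from 'node:fs';

// Up to 4 images per tweet, or 1 MP4 video (max 512MB)
const mediaData = [
  { data: fs.readFileSync('./chart.png'), mediaType: 'image/png' },
  { data: fs.readFileSync('./photo.jpg'), mediaType: 'image/jpeg' },
];

await scraper.sendTweet('Fresh scrape results are in!', undefined, mediaData);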
- Use residential IPs if & when scaling up
- Create burner accounts or purchase accounts if required.
DONTs
Even though there is an option, DO NOT pass a custom fetch function when creating a Scraper instance. Unless you know exactly what you are doing and understand the magic going on under the hood, this option is best left untouched: a custom fetch function leads to rate limits almost instantly, within a few hundred fetched tweets (see the short illustration after these DONTs).
Do not use your personal account credentials unless you are willing to burn the account if it gets shadow-banned, rate-limited or perma-banned.
Do not use the same account credentials across regions, countries, etc. via VPNs or proxies. It greatly increases the chances of your account getting banned.
Do not use the credentials on cloud providers like AWS, GCP, Azure, DigitalOcean, Hetzner, etc. Their IP ranges are well known, and significant account activity from such IPs will likely trigger a review, rate limit or ban of your account.
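To make the first DONT concrete: the options-object form below is how the upstream twitter-scraper exposes a custom fetch, so treat it as illustrative rather than gospel.
// Safe: let the library manage fetch, headers and request pacing itself
const scraper = new Scraper();

// Risky: a custom fetch tends to trip rate limits within a few hundred tweets
// const scraper = new Scraper({ fetch: myCustomFetch });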
Now that you are caught up on the DOs & DONTs of Twitter scraping with the hottest library of the moment, GO & GET THOSE TWEETS to feed your RAGs, LLMs, etc.
Happy scraping in 2025. Feedback on what works and any hacks for the above are very welcome. Help build the knowledge base for your fellow scrapers!
We are Simplr.sh | Everything web, but just Simplr!
Find our Open Source work @ https://github.com/simplr-sh