DEV Community

Simplr

Originally published at blog.simplr.sh


DOs & DONTs for Twitter Scraping 2025

As of the very first day of 2025, the best free tool for scraping tweets from X (formerly Twitter) seems to be ElizaOS' agent-twitter-client, from the team behind ElizaOS & ai16z.

It requires no API keys and shows no visible rate limits for a multitude of actions on an active Twitter client, which seems like a win-win for all involved parties! Well... maybe not for X 😅.

Their README.md is a treasure trove of information on the available features and the best way to use them to get the expected results.

Configure environment variables for authentication.

TWITTER_USERNAME=    # Account username
TWITTER_PASSWORD=    # Account password
TWITTER_EMAIL=       # Account email
PROXY_URL=           # HTTP(s) proxy for requests (necessary for browsers)

# Twitter API v2 credentials for tweet and poll functionality
TWITTER_API_KEY=               # Twitter API Key
TWITTER_API_SECRET_KEY=        # Twitter API Secret Key
TWITTER_ACCESS_TOKEN=          # Access Token for Twitter API v2
TWITTER_ACCESS_TOKEN_SECRET=   # Access Token Secret for Twitter API v2

For read-only operations, TWITTER_USERNAME, TWITTER_PASSWORD & TWITTER_EMAIL are sufficient; there is no need to bother with the rest of the environment variables.
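A small fail-fast check can confirm those three variables are present before the script ever attempts a login. This is a minimal sketch: `readOnlyCredentials` and `READ_ONLY_VARS` are hypothetical names, not part of agent-twitter-client, and the function only checks that each value is non-empty.

```typescript
// Hypothetical list: the three variables the post says are enough for read-only use.
const READ_ONLY_VARS = ['TWITTER_USERNAME', 'TWITTER_PASSWORD', 'TWITTER_EMAIL'] as const;

type ReadOnlyCredentials = Record<(typeof READ_ONLY_VARS)[number], string>;

// Hypothetical helper: throws a single error naming every missing variable,
// and deliberately ignores the TWITTER_API_* / TWITTER_ACCESS_* variables.
function readOnlyCredentials(env: Record<string, string | undefined>): ReadOnlyCredentials {
  const missing = READ_ONLY_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const creds = {} as ReadOnlyCredentials;
  for (const name of READ_ONLY_VARS) {
    creds[name] = env[name] as string;
  }
  return creds;
}
```

In a Node script you would call `readOnlyCredentials(process.env)` once at startup and hand the resulting values to `scraper.login()`; check the README for that method's exact argument order.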

To use this tool effectively in your scripts, bots, scrapers, etc., do check out the DOs & DONTs.

DOs

  1. Cache the cookies, as described in the README, wherever you run the application, and avoid repeated login requests in your scripts: each login is visible on your X account.

  2. Expect occasional long, random pauses between queries, especially when fetching more than 100,000 tweets. The tool fetches 20 tweets at a time, but it keeps looping until it has fetched the maximum number of tweets you pass as an argument, so do wait for it to resume.

  3. Conditional searches are possible, but they return a maximum of 50 tweets even if a larger number is passed as an argument. Search for plain words instead of attempting multiple-match or pattern searches.

  4. The tweet search method returns an AsyncGenerator, so iterate over it with for await to get the desired tweets (see https://github.com/elizaOS/agent-twitter-client/issues/24):

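import { Scraper, SearchMode } from 'agent-twitter-client';

const scraper = new Scraper(); // log in or restore cached cookies first, per the README

// searchTweets() returns an AsyncGenerator: tweets are fetched lazily as you iterate
const mentions = scraper.searchTweets('#nodejs', 20, SearchMode.Latest);

for await (const mention of mentions) {
  console.log(mention);
}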
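Items 3 and 4 combine naturally: instead of one conditional or pattern query (capped at 50 results), run one plain-word search per term, iterate each AsyncGenerator with `for await`, and merge the results. The sketch below assumes only that each tweet may carry a string `id`; `SearchFn`, `TweetLike` and `searchWords` are hypothetical names, and with the real library you would presumably pass something like `(q, n) => scraper.searchTweets(q, n, SearchMode.Latest)` as `search`.

```typescript
// Minimal shape we rely on; the library's own tweet type carries many more fields.
interface TweetLike {
  id?: string;
  text?: string;
}

// Hypothetical stand-in for a search call that yields tweets lazily.
type SearchFn = (query: string, maxTweets: number) => AsyncGenerator<TweetLike>;

// Run one plain-word search per term, merge the results and drop duplicates by id.
async function searchWords(
  search: SearchFn,
  words: string[],
  perWord: number,
  limit: number,
): Promise<TweetLike[]> {
  const seen = new Set<string>();
  const results: TweetLike[] = [];
  for (const word of words) {
    for await (const tweet of search(word, perWord)) {
      if (!tweet.id || seen.has(tweet.id)) continue; // skip id-less tweets and repeats
      seen.add(tweet.id);
      results.push(tweet);
      // Leaving the loop early (here via return) calls the generator's return(),
      // so it stops instead of fetching further pages.
      if (results.length >= limit) return results;
    }
  }
  return results;
}
```

Returning from inside `for await` closes the current generator, which is why the cap stops further page requests rather than just discarding tweets.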
  5. Pass the appropriate SearchMode in your queries.

  6. READ THE F*****G MANUAL!

  7. Check the supported media types:

// Image formats and their MIME types
const imageTypes = {
  '.jpg':  'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png':  'image/png',
  '.gif':  'image/gif'
};

// Video format
const videoTypes = {
  '.mp4': 'video/mp4'
};
  8. Check the media upload limitations:
Maximum 4 images per tweet
Only 1 video per tweet
Maximum video file size: 512MB
Supported image formats: JPG, PNG, GIF
Supported video format: MP4
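The media tables and limits above can be turned into a pre-flight check that runs before any upload is attempted. A sketch, assuming file names carry their extension and that "512MB" means 512 MiB; `mimeFor` and `checkMedia` are hypothetical helpers, not library functions.

```typescript
// Extension -> MIME tables, copied from the supported media types above.
const imageTypes: Record<string, string> = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
};
const videoTypes: Record<string, string> = { '.mp4': 'video/mp4' };

// Limits from the list above; treating 512MB as 512 MiB is an assumption.
const MAX_IMAGES = 4;
const MAX_VIDEOS = 1;
const MAX_VIDEO_BYTES = 512 * 1024 * 1024;

interface MediaFile {
  name: string;
  sizeBytes: number;
}

// Look up the MIME type from the (case-insensitive) extension, or undefined if unsupported.
function mimeFor(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf('.');
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : '';
  return imageTypes[ext] ?? videoTypes[ext];
}

// Returns a list of problems for one tweet's attachments; an empty list means OK.
function checkMedia(files: MediaFile[]): string[] {
  const problems: string[] = [];
  let images = 0;
  let videos = 0;
  for (const file of files) {
    const mime = mimeFor(file.name);
    if (mime === undefined) {
      problems.push(`${file.name}: unsupported type`);
    } else if (mime.startsWith('image/')) {
      images++;
    } else {
      videos++;
      if (file.sizeBytes > MAX_VIDEO_BYTES) problems.push(`${file.name}: video larger than 512MB`);
    }
  }
  if (images > MAX_IMAGES) problems.push(`too many images: ${images} > ${MAX_IMAGES}`);
  if (videos > MAX_VIDEOS) problems.push(`too many videos: ${videos} > ${MAX_VIDEOS}`);
  return problems;
}
```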
  9. Use residential IPs if and when scaling up.
  10. Create burner accounts, or purchase accounts, if required.
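Going back to the first DO, caching cookies can be as simple as writing them to a JSON file after a successful login and reading them back on the next start. A minimal sketch, assuming cookies are stored as plain strings; `loadCachedCookies`, `saveCookies` and the cache path are hypothetical.

```typescript
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

// Hypothetical helper: returns cached cookie strings, or null when a fresh login is needed.
function loadCachedCookies(path: string): string[] | null {
  if (!existsSync(path)) return null;
  try {
    const parsed: unknown = JSON.parse(readFileSync(path, 'utf8'));
    // Accept only a non-empty array of strings; anything else means "log in again".
    if (Array.isArray(parsed) && parsed.length > 0 && parsed.every((c) => typeof c === 'string')) {
      return parsed as string[];
    }
  } catch {
    // A corrupt or unreadable cache file is treated the same as a missing one.
  }
  return null;
}

// Hypothetical helper: persist cookie strings after a successful login.
function saveCookies(path: string, cookies: string[]): void {
  writeFileSync(path, JSON.stringify(cookies), 'utf8');
}
```

With agent-twitter-client, the README's `scraper.getCookies()` and `scraper.setCookies()` are where this plugs in: if `loadCachedCookies` returns an array, pass it to `setCookies()`; otherwise log in once and save the fresh cookies. If `getCookies()` returns tough-cookie `Cookie` objects, as it appears to, calling `toString()` on each one gives a storable string.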

DONTs

  1. Even though the option exists, DO NOT pass a custom fetch function when creating a Scraper instance. Unless you know exactly what you are doing and understand the magic happening under the hood, leave this option untouched: passing a custom fetch function leads to rate limits almost instantly, within a few hundred fetched tweets.

  2. Do not use your personal account credentials unless you are willing to burn the account if it gets shadow-banned, rate-limited or permanently banned.

  3. Do not use the same account credentials across regions, countries, etc. via VPNs or proxies. Doing so greatly increases the chances of the account getting banned.

  4. Do not use the credentials on cloud providers such as AWS, GCP, Azure, DigitalOcean, Hetzner, etc. The IP ranges of these providers are well known, and significant account activity from them will likely trigger a review, rate limit or ban of your account.
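DONT 3 can be enforced in code by pinning each account to the first proxy it is used with and refusing to switch afterwards. A sketch only: `AccountProxyPins` is a hypothetical helper, the proxy URLs in the test are placeholders, and the pins live in memory, so a real script would need to persist them (for example next to its cookie cache) to survive restarts.

```typescript
// Hypothetical registry: each account keeps the first proxy it was used with,
// so the same credentials never hop between regions.
class AccountProxyPins {
  private readonly pins = new Map<string, string>();

  // Returns the proxy to use for this account, or throws if it would change region.
  proxyFor(account: string, requestedProxy: string): string {
    const pinned = this.pins.get(account);
    if (pinned === undefined) {
      this.pins.set(account, requestedProxy); // first use: pin it
      return requestedProxy;
    }
    if (pinned !== requestedProxy) {
      throw new Error(`Account ${account} is pinned to ${pinned}; refusing to switch to ${requestedProxy}`);
    }
    return pinned;
  }
}
```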

Now that you are caught up on the DOs & DONTs of Twitter scraping with the hottest library of the moment, GO & GET THOSE TWEETS to feed your RAGs, LLMs, etc.

Happy scraping in 2025! Feedback on what works, and any hacks for the above, are very welcome. Help build the knowledge base for your fellow scrapers!

We are Simplr.sh | Everything web, but just Simplr!

Find our Open Source work @ https://github.com/simplr-sh
