drake

Modifying Scrapy for JA3 Spoofing

  • A third-party library already exists, but it is updated slowly and is not very mature
  • Library name: scrapy-ja3

  • Usage option 1: add the following setting directly to the settings.py configuration file
# JA3 spoofing
DOWNLOAD_HANDLERS = {
    'http': 'scrapy_ja3.download_handler.JA3DownloadHandler',
    'https': 'scrapy_ja3.download_handler.JA3DownloadHandler'
}
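
If a project already defines other entries in DOWNLOAD_HANDLERS, the two JA3 entries can be merged in instead of replacing the whole dict. A minimal sketch in plain Python; `with_ja3_handlers` is a hypothetical helper name, not part of scrapy-ja3, and the handler path is the one shown above:

```python
# Hypothetical helper (not part of scrapy-ja3): merge the JA3 handler into
# an existing settings dict without dropping other configured handlers.
JA3_HANDLER = 'scrapy_ja3.download_handler.JA3DownloadHandler'


def with_ja3_handlers(settings):
    """Return a copy of `settings` whose http/https handlers use JA3."""
    merged = dict(settings)
    # Copy the nested dict too, so the caller's settings are not mutated.
    handlers = dict(merged.get('DOWNLOAD_HANDLERS', {}))
    handlers.update({'http': JA3_HANDLER, 'https': JA3_HANDLER})
    merged['DOWNLOAD_HANDLERS'] = handlers
    return merged
```

The result can be assigned to `custom_settings` on a spider or used to build the values placed in settings.py.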
  • Usage option 2: configure it in the spider file (with nothing added to settings.py)

from scrapy import Request, Spider


class Ja3TestSpider(Spider):
    name = 'ja3_test'

    custom_settings = {
        'DOWNLOAD_HANDLERS': {
            'http': 'scrapy_ja3.download_handler.JA3DownloadHandler',
            'https': 'scrapy_ja3.download_handler.JA3DownloadHandler',
        }
    }

    def start_requests(self):
        start_urls = [
            'https://tls.browserleaks.com/json',
        ]
        for url in start_urls:
            yield Request(url=url, callback=self.parse_ja3)

    def parse_ja3(self, response):
        self.logger.info(response.text)
        self.logger.info("ja3_hash: " + response.json()['ja3_hash'])
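
The last line of `parse_ja3` raises an exception if the response is not JSON or has no `ja3_hash` field. A more defensive variant might look like the sketch below; `extract_ja3_hash` is a hypothetical helper name, and it assumes only that the body is a JSON object carrying a string `ja3_hash` key, as the spider above does:

```python
import json


def extract_ja3_hash(body):
    """Return the 'ja3_hash' value from a JSON body, or None if unavailable."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON (JSONDecodeError is a ValueError).
        return None
    if not isinstance(data, dict):
        return None
    value = data.get('ja3_hash')
    return value if isinstance(value, str) else None
```

Inside the spider this could be called as `extract_ja3_hash(response.text)`, logging a warning when it returns `None`.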

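  • Installing the dependencies:

scrapy-ja3 does not support the latest version of Scrapy, so the first two
dependencies must be pinned to specific versions; otherwise you will run into all kinds of dependency problems.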

pip install Twisted==22.10.0
pip install Scrapy==2.9.0
pip install scrapy-ja3
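
To confirm that the environment actually ended up with the pinned versions, a small check along these lines may help. It is a sketch using only the standard library's `importlib.metadata`; `PINS` and `find_pin_mismatches` are hypothetical names, and the version numbers are copied from the commands above:

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the install commands above.
PINS = {'Twisted': '22.10.0', 'Scrapy': '2.9.0'}


def find_pin_mismatches(pins, get_version=version):
    """Return {package: installed_version_or_None} for every pin not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

Calling `find_pin_mismatches(PINS)` returns an empty dict when both pins are satisfied; any entry names a package that is missing or at a different version.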