drake

Adapting Scrapy for JA3 Spoofing

  • A third-party library already exists, but it is updated slowly and is not very mature
  • Library name: scrapy-ja3

  • Option 1: add the following setting directly to the settings.py configuration file
# JA3 spoofing
DOWNLOAD_HANDLERS = {
    'http': 'scrapy_ja3.download_handler.JA3DownloadHandler',
    'https': 'scrapy_ja3.download_handler.JA3DownloadHandler'
}
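The handler in the setting above is referenced by a dotted path string, so a typo or a missing install is easy to overlook. As a minimal sketch (the helper name `can_load` is hypothetical and not part of Scrapy or scrapy-ja3), the path can be checked with the standard library before starting a crawl:

```python
from importlib import import_module


def can_load(dotted_path):
    """Return True if 'package.module.Name' can be imported and resolved."""
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        # No dot at all, so there is no module part to import.
        return False
    try:
        module = import_module(module_path)
    except ImportError:
        # Covers ModuleNotFoundError when the package is not installed.
        return False
    return hasattr(module, attr)


HANDLER = "scrapy_ja3.download_handler.JA3DownloadHandler"
print(HANDLER, "importable:", can_load(HANDLER))
```

If this reports that the handler is not importable, the likely cause is that scrapy-ja3 is not installed in the active environment.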
  • Option 2: configure it in the spider file (leave settings.py unchanged)

from scrapy import Request, Spider


class Ja3TestSpider(Spider):
    name = 'ja3_test'

    custom_settings = {
        'DOWNLOAD_HANDLERS': {
            'http': 'scrapy_ja3.download_handler.JA3DownloadHandler',
            'https': 'scrapy_ja3.download_handler.JA3DownloadHandler',
        }
    }

    def start_requests(self):
        start_urls = [
            'https://tls.browserleaks.com/json',
        ]
        for url in start_urls:
            yield Request(url=url, callback=self.parse_ja3)

    def parse_ja3(self, response):
        self.logger.info(response.text)
        self.logger.info("ja3_hash: " + response.json()['ja3_hash'])
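For context on the `ja3_hash` value logged by the spider: a JA3 fingerprint is the MD5 digest of a string built from five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, and point formats), with the values inside each field joined by `-` and the fields joined by `,`. The sketch below illustrates that construction only. The function names and the sample field values are made up for illustration and do not come from scrapy-ja3.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Join the five ClientHello fields into a JA3 string."""
    fields = [ciphers, extensions, curves, point_formats]
    # Values within a field are dash-separated; an empty field stays empty.
    parts = [str(tls_version)] + ["-".join(str(v) for v in f) for f in fields]
    return ",".join(parts)


def ja3_hash(ja3_text):
    """Return the 32-character hex MD5 digest of a JA3 string."""
    return hashlib.md5(ja3_text.encode("ascii")).hexdigest()


# Illustrative values only, not a real browser fingerprint.
example = ja3_string(771, [4865, 4866, 49195], [0, 10, 11], [29, 23], [0])
print(example)
print(ja3_hash(example))
```

Because the order of values is part of the string, reordering the cipher suites alone is enough to produce a different hash, which is why changing the handshake changes the reported fingerprint.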

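  • Installing the dependencies:

scrapy-ja3 does not support the latest version of Scrapy.
The first two dependencies must be pinned to the versions below; otherwise you are very likely to run into dependency conflicts.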

pip install Twisted==22.10.0
pip install Scrapy==2.9.0
pip install scrapy-ja3
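After installing, it can help to confirm that the environment really has the pinned versions, since a later `pip install` may upgrade them silently. A minimal sketch using only the standard library follows; the helper name `find_mismatches` is hypothetical, and the pins are the ones listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the install commands above.
PINS = {"Twisted": "22.10.0", "Scrapy": "2.9.0"}


def find_mismatches(pins, get_version=version):
    """Return a list of messages for packages that are missing or not at the pinned version."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in find_mismatches(PINS) or ["all pinned versions match"]:
        print(line)
```

The `get_version` parameter exists only so the check can be exercised without the packages installed; in normal use the default lookup is enough.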
