Juan Carlos

Faster Than Requests with MultiThread Web Scraper

  • Alternative HTTP client, new version 0.9, API for humans.
  • Added a built-in multi-thread web scraper, usable as a one-liner.
  • Added a built-in multi-thread file downloader, usable as a one-liner.
  • 1 file, 0 dependencies, ~100 lines of code, Python 2.7 to 3.8, Alpine & ARM.
  • GitHub Actions CI building from scratch.
  • GitHub Actions CI running unit tests from scratch.
  • Examples for the web scraper and the file downloader.
  • Extras for Data Science, Web Scraping, HTTP REST JSON APIs.
  • Examples, Dockerfile, tests, FAQ, CoC, debug helpers, JSON helpers.
  • Docs list every function with detailed arguments and return values, including types.
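| Library  | Speed  | Files | LOC  | Dependency    | Devs | Scraper |
|----------|--------|-------|------|---------------|------|---------|
| PyWGET   | 152.39 | 1     | 338  | Wget          | >17  | No      |
| Requests | 15.58  | >20   | 2558 | >=7           | >527 | No      |
| Urllib   | 4.00   | ???   | 1200 | 0 (std lib)   | ???  | No      |
| Urllib3  | 3.55   | >40   | 5242 | >5 (SSL)      | >188 | No      |
| PyCurl   | 0.75   | >15   | 5932 | Curl, LibCurl | >50  | No      |
| FTR      | 0.45   | 1     | 99   | 0             | 1    | Yes, 2  |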

Hello World

import faster_than_requests as requests

requests.get("http://httpbin.org/get")
  • GET, POST, PATCH, PUT, DELETE and more.

Multi-Thread Web Scraper Built-in

requests.scrapper(["http://example.org", "http://example.io"], threads=True)
  • There are 2 ready-made web scrapers built in, each usable as a one-liner.
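
For readers curious what a thread-based scraper does in general, here is a minimal sketch using only the Python standard library. It is not the library's implementation (which is written in Nim), and every name in it (`LinkCollector`, `extract_links`, `fetch_html`, `scrape_links`) is hypothetical. It also assumes that "scraping" means collecting the `href` of each link, which may differ from what `scrapper` returns.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch_html(url):
    """Download one page and decode it as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_links(urls, fetcher=fetch_html, max_workers=4):
    """Fetch every URL in a thread pool and map each URL to its links."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input URLs.
        pages = list(pool.map(fetcher, urls))
    return {url: extract_links(page) for url, page in zip(urls, pages)}
```

The `fetcher` parameter exists only so the sketch can be tried without network access, for example by passing a function that returns canned HTML.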

Multi-Thread File Downloader Built-in

requests.download2([("http://example.org/foo.jpg", "output.jpg"), ], threads=True)
  • Pass delay=1000 for a 1 second sleep between downloads.
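
To illustrate the idea of a delay between downloads, the following standard-library sketch saves a list of `(url, filename)` pairs one after another and sleeps between them. It is an assumption-laden illustration, not how `download2` works internally: the names `save_url` and `download_all` and the `delay_ms` parameter are hypothetical, and the sketch is sequential rather than threaded.

```python
import time
from urllib.request import urlopen


def save_url(url, filename):
    """Download one URL and write its bytes to filename (hypothetical helper)."""
    with urlopen(url, timeout=10) as response, open(filename, "wb") as out:
        out.write(response.read())
    return filename


def download_all(pairs, delay_ms=0, saver=save_url, sleep=time.sleep):
    """Save each (url, filename) pair in order, pausing between downloads."""
    saved = []
    for index, (url, filename) in enumerate(pairs):
        # Sleep only between downloads, never before the first one.
        if index > 0 and delay_ms > 0:
            sleep(delay_ms / 1000)  # milliseconds to seconds
        saved.append(saver(url, filename))
    return saved
```

A call such as `download_all([("http://example.org/foo.jpg", "output.jpg")], delay_ms=1000)` mirrors the shape of the one-liner above; with a single pair, no sleep happens at all.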

Multi-Thread Bulk GET

requests.get2str2(["http://example.org", "http://example.io"], threads=True)
  • Pass threads=False to disable multi-threading.
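
The effect of a `threads` switch can be sketched with the standard library alone. This is only an illustration of the general technique, assuming a thread pool under the hood; the names `fetch_text` and `bulk_get` are hypothetical and the code says nothing about how `get2str2` is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_text(url):
    """Download one URL and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def bulk_get(urls, threads=True, fetcher=fetch_text, max_workers=8):
    """Fetch all URLs and return their bodies in input order."""
    if not threads:
        # Sequential path: one request at a time.
        return [fetcher(url) for url in urls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Threaded path: requests overlap while each thread waits on the network.
        return list(pool.map(fetcher, urls))
```

Both paths return the same list in the same order; the threaded path only changes how long the caller waits when several requests are slow.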

GitHub

πŸπŸ˜ΌπŸ‘

Discussion (5)

Josh

I keep seeing the double-p in scrapper and think, "Like, a scrappy fighter?" Is it possible you intended to call the method scraper?

kcespedes

I'm passing headers to faster_than_requests, but it returns 400 Bad Request.

This is my code sample:

import faster_than_requests as requests

headers = [("Host", "api.sample.com"), ("Connection", "Keep-Alive"), ("Accept-Encoding", "gzip")]

response = requests.post(url=self.offer_url, body=self.offer_data, http_headers=headers, proxy_url=self.proxyURL)

print(response["status"])

Any help would be greatly appreciated. I currently use the requests library, but it is too slow.

Lex

Cool, I like the color scheme of your text editor...animus, I'm going to pay attention to this library...animus

Rohan Sawant

Great post, Juan!

If you think this is interesting, check out async for HTTP requests. I bet it will blow your mind!

rhymes

Nice, I see you implemented it in Nim eheh!

Have you tried httpx's async support as well?