Faster Than Requests with MultiThread Web Scraper

Juan Carlos

  • Alternative HTTP client, new version 0.9, API for humans.
  • Added a built-in multi-thread web scraper one-liner.
  • Added a built-in multi-thread file downloader one-liner.
  • 1 file, 0 dependencies, ~100 lines of code, Python 2.7 to 3.8, Alpine & ARM.
  • GitHub Actions CI building from scratch.
  • GitHub Actions CI running unit tests from scratch.
  • Examples for the web scraper and the file downloader.
  • Extras for data science, web scraping, and HTTP REST JSON APIs.
  • Examples, Dockerfile, tests, FAQ, CoC, debug helpers, JSON helpers.
  • Docs cover all functions, with detailed arguments and return types.
| Library  | Speed  | Files | LOC  | Dependency    | Devs | Scraper |
|----------|--------|-------|------|---------------|------|---------|
| PyWGET   | 152.39 | 1     | 338  | Wget          | >17  | No      |
| Requests | 15.58  | >20   | 2558 | >=7           | >527 | No      |
| Urllib   | 4.00   | ???   | 1200 | 0 (std lib)   | ???  | No      |
| Urllib3  | 3.55   | >40   | 5242 | >5 (SSL)      | >188 | No      |
| PyCurl   | 0.75   | >15   | 5932 | Curl, LibCurl | >50  | No      |
| FTR      | 0.45   | 1     | 99   | 0             | 1    | Yes, 2  |

Hello World

requests.get("http://httpbin.org/get")
Enter fullscreen mode Exit fullscreen mode
  • GET, POST, PATCH, PUT, DELETE and more.
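
The call returns a response object. Here is a minimal inspection sketch; the dictionary-style access and the "status" key mirror the usage shown in the comment thread below and are an assumption that may differ between versions:

```python
import faster_than_requests as requests

# Assumed response shape: dictionary-style access with a "status" key,
# following the usage in the comments below; keys may vary by version.
response = requests.get("http://httpbin.org/get")
print(response["status"])
```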

Multi-Thread Web Scraper Built-in

```python
requests.scrapper(["http://example.org", "http://example.io"], threads=True)
```
  • There are 2 ready-made web scrapers built in, each an easy-to-use one-liner.

Multi-Thread File Downloader Built-in

```python
requests.download2([("http://example.org/foo.jpg", "output.jpg")], threads=True)
```
  • Pass delay=1000 for a 1 second sleep between downloads, as sketched below.
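
A minimal sketch combining the downloader with the delay option from the bullet above; that delay=1000 means a 1 second pause is taken from the post, but passing it alongside threads=True is an assumption:

```python
import faster_than_requests as requests

# Assumption: delay=1000 sleeps 1 second between downloads, per the
# bullet above; exact keyword behavior may vary by library version.
requests.download2(
    [("http://example.org/foo.jpg", "foo.jpg"),
     ("http://example.org/bar.jpg", "bar.jpg")],
    threads=True,
    delay=1000,
)
```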

Multi-Thread Bulk GET

```python
requests.get2str2(["http://example.org", "http://example.io"], threads=True)
```
  • Pass threads=False to disable multi-threading; a usage sketch follows.
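
A sketch of consuming the bulk results, assuming get2str2 returns one response body string per URL in order (inferred from the function name; not confirmed by the post):

```python
import faster_than_requests as requests

urls = ["http://example.org", "http://example.io"]
# Assumption: get2str2 returns one body string per URL, in input order.
bodies = requests.get2str2(urls, threads=True)
for url, body in zip(urls, bodies):
    print(url, len(body))
```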

GitHub

πŸπŸ˜ΌπŸ‘

Top comments (5)

Josh • Edited on

I keep seeing the double-p in scrapper and think, "Like, a scrappy fighter?" Is it possible you intended to call the method scraper?

kcespedes

I'm passing headers to faster_than_requests but it gives 400 Bad Request.

This is my code sample:

```python
import faster_than_requests as requests

headers = [("Host", "api.sample.com"), ("Connection", "Keep-Alive"), ("Accept-Encoding", "gzip")]

response = requests.post(url=self.offer_url, body=self.offer_data, http_headers=headers, proxy_url=self.proxyURL)

print(response["status"])
```

Any help would be greatly appreciated. I currently use the requests library, but it is too slow.

Lex

Cool, I like the color scheme of your text editor... animus, I'm gonna pay attention to this library... animus

Rohan Sawant

Great post, Juan!

If you think this is interesting, check out Async for HTTP requests; I bet it will blow your mind! 🦄

rhymes

Nice, I see you implemented it in Nim eheh!

Have you tried httpx's async support as well?
