dataflowkit.com, a web scraping platform for gophers, is always looking for contributors!
We launched recently and are looking for people who can help spread the word about our framework.
Here are some facts about Dataflow kit (DFK):
Dataflow kit is fast. It takes about 4-6 seconds to fetch and then parse 50 pages.
Dataflow kit is suitable for processing fairly large volumes of data. Our tests show that parsing approximately 4 million pages takes about 7 hours.
Headless Chrome is used to extract data from JavaScript-driven web pages;
Data scraping from paginated websites;
Automatic processing of infinitely scrolling pages;
Scraping of websites behind a login form;
Cookie and session handling;
Following links and processing detail pages;
Managing delays between requests per domain;
Following robots.txt directives;
Support for various storage types. Diskv and Cassandra are currently available;
Saving results as CSV, JSON, or XML.