DEV Community


Discussion on: What are your programming goals for 2017?

Ben Halpern (author)

Fascinating. From a quick glance, the HTML standard does look really interesting. I've been reading the CommonMark Markdown spec myself lately. How did you first get interested in the browser project?

tbodt

My company does a lot of web scraping; it's basically the entire business. Originally we were using Selenium and PhantomJS, but we started running into scaling issues, and now a scraping grid consists of 32 servers, each with 8 cores and each costing hundreds of dollars a month. The servers mostly sit at around 30% CPU usage. We have about $300k worth of free servers from various hosting companies, so improving efficiency isn't a high priority, but something will have to be done eventually.
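For a rough sense of the headroom described above, here is the back-of-the-envelope arithmetic, taking the quoted figures (32 servers, 8 cores each, about 30% average CPU) at face value. Per-server cost is left out because the comment only says "hundreds of dollars".

```python
# Back-of-the-envelope capacity estimate using the figures quoted in the comment.
servers = 32
cores_per_server = 8
avg_cpu_utilization = 0.30  # "mostly at like 30% CPU usage"

total_cores = servers * cores_per_server        # 256 cores in the grid
busy_cores = total_cores * avg_cpu_utilization  # ~76.8 cores doing useful work
idle_cores = total_cores - busy_cores           # ~179.2 cores sitting idle

print(f"total={total_cores} busy~{busy_cores:.1f} idle~{idle_cores:.1f}")
```

In other words, roughly 70% of the grid's CPU capacity (about 179 of 256 cores) is idle on average, which is the inefficiency that "something will have to be done eventually" refers to.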

The obvious alternative to Selenium is to just make plain HTTP requests, but we have to crawl a lot of really crappy sites that use JavaScript for no apparent reason, and we want to be able to add a new site without spending a lot of time figuring out how to spoof its form submissions. So we're just making our own browser. It uses V8 to run JavaScript, which meant I had to write a C++ extension for Python.
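To illustrate why plain HTTP requests fall short on such sites, here is a minimal sketch using only Python's standard-library `html.parser`. The page, its `products` list, and the `ListItemCollector` class are hypothetical, made up for this example; it is not the author's browser or V8 extension. A parser that only sees the raw HTML (what an HTTP request returns) finds the list empty, because nothing executes the inline `<script>`. A browser with a JavaScript engine such as V8 would run the script and see the items.

```python
from html.parser import HTMLParser

# Hypothetical page: the <ul> is empty in the HTML source and is only
# populated when a browser executes the inline script.
RAW_HTML = """
<html><body>
  <ul id="products"></ul>
  <script>
    const list = document.getElementById("products");
    for (const name of ["Widget", "Gadget"]) {
      const li = document.createElement("li");
      li.textContent = name;
      list.appendChild(li);
    }
  </script>
</body></html>
"""


class ListItemCollector(HTMLParser):
    """Collects the text of every <li> element present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        # Script bodies also arrive here as plain text, but never inside an <li>.
        if self._in_li:
            self.items[-1] += data


# What an HTTP-only scraper sees: the script never runs, so no <li> exists.
parser = ListItemCollector()
parser.feed(RAW_HTML)
parser.close()
print(parser.items)

# Sanity check: the same parser works when the items are in the static markup.
static = ListItemCollector()
static.feed("<ul><li>Widget</li><li>Gadget</li></ul>")
static.close()
print(static.items)
```

The first list comes back empty while the second contains both items, which is the gap a JavaScript-executing browser closes without reverse-engineering each site's requests by hand.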

Admittedly, it's not the most useful thing I could be doing, but it's hella fun.

Ben Halpern (author)

Well, whether or not this specific activity is "useful", I'm sure you'll get a hell of a lot out of reading the whole HTML standard!
