We are the Surfsky team, and today we would like to tell you about our product and how it can be useful to you. Surfsky is currently in the alpha stage, so you can try it for free; in return, please share your feedback.
What is Surfsky?
Surfsky is a cloud-based browser. It launches in the cloud and offers an interface to connect automation libraries and frameworks, such as Puppeteer and Playwright. It also has a convenient web inspector that allows you to see what is happening on a page to help you write the necessary data extraction algorithm correctly.
But the main feature is advanced fingerprint spoofing to bypass security systems.
Why would I need such a browser?
As you have probably guessed already, for data extraction. In an ideal world, each service would come with an API for data collection. In the real one, however, not every service offers such an API, and those that do might limit the data it provides, its speed, or its terms of use. That’s why, to get around these restrictions, information is often collected from publicly available pages.
You need a browser to collect publicly available information automatically. This is because the services you’d like to scrape are often built as single-page applications, that is, websites whose content is partially or completely rendered with JavaScript. To display such a web page correctly and fully, it has to be opened in a browser.
To make collecting information from multiple services easier for you, we have created a browser that launches in the cloud at your request. Using it, you can easily extract all the necessary data from the pages that interest you.
Why not use a regular browser?
One of the most common problems encountered when web scraping is getting banned, restricted, or blocked. Services use special systems to identify bots and cases of illegal data usage or access, and consequently block page rendering: they either cover the requested page with a CAPTCHA or don’t show it at all.
In our browser we have done everything to prevent websites from thinking that you’re a bot, so that you are always able to get the necessary information. We collect digital fingerprints of real users, analyze them, and use them in our browser to spoof parameters checked by security systems.
Such checks include, but are not limited to:
- (in)consistency of the GPU data and its parameters;
- (in)consistency of system fonts;
- (in)consistency of the network connection and geolocation data;
- presence of automation tools.
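To make the checks above more concrete, here is a minimal sketch of what a consistency check might look like from the detecting side. Everything in it is an assumption made for illustration: the `findInconsistencies` function, the fingerprint field names, and the thresholds are hypothetical and do not describe Surfsky or any real security system.

```javascript
// Illustrative only: the fingerprint fields below are hypothetical,
// not the format used by Surfsky or by any specific detection system.
function findInconsistencies(fp) {
  const issues = []
  // GPU: a Windows user agent that reports an Apple GPU does not add up
  if (fp.userAgent.includes('Windows') && fp.gpuRenderer.includes('Apple')) {
    issues.push('gpu')
  }
  // Fonts: a desktop browser exposing almost no system fonts is unusual
  if (fp.fonts.length < 5) {
    issues.push('fonts')
  }
  // Network vs. geolocation: browser time zone differs from the IP's time zone
  if (fp.timezone !== fp.ipTimezone) {
    issues.push('geo')
  }
  // Automation: navigator.webdriver is typically true in automated browsers
  if (fp.webdriver === true) {
    issues.push('automation')
  }
  return issues
}

// A deliberately inconsistent sample fingerprint
const suspicious = {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
  gpuRenderer: 'Apple M1',
  fonts: ['Arial'],
  timezone: 'Europe/Berlin',
  ipTimezone: 'America/New_York',
  webdriver: true,
}
console.log(findInconsistencies(suspicious))
```

A fingerprint collected from a real user would normally pass all four checks at once, which is why the parameters have to be spoofed together and not one by one.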
Changing the browser fingerprint alone is sometimes not enough to avoid detection, so we have added support for several proxy types: HTTP, HTTPS, SOCKS5, and SSH, as well as support for the OpenVPN protocol.
As an example, let us open the website https://nowsecure.nl, which uses Cloudflare defenses, in both headless Chrome running on a server and Surfsky.
The server result:
Surfsky:
This is how easily Surfsky can bypass Cloudflare defensive measures.
How it works: some examples.
Let’s see how we can launch our browser in the cloud and connect to it using Node.js.
After registering, you will receive an API access token that you will need to use to make requests. To launch the browser, you send a single request specifying the proxy through which you would like the browser to run.
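const axios = require('axios')

// Your API access token, received after registration
const API_TOKEN = '<your-api-token>'

const BROWSER_API = axios.create({
  // axios needs a full URL with a scheme; https is assumed here
  baseURL: 'https://api-public.surfsky.io',
  timeout: 100000, // milliseconds
})

// Create a one-time browser profile behind the given proxy.
// Note: `await` has to run inside an async function.
const response = await BROWSER_API.post(
  '/profiles/one_time',
  { proxy: 'http.your-favourite-proxy.com' },
  { headers: { 'X-Cloud-Api-Token': API_TOKEN } }
)
// WebSocket endpoint used to connect an automation library
const wsEndpoint = response.data.ws_url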
That is all it takes: with a single request you get a browser that is fully ready to work. Let’s open amazon.com and take a screenshot of it using the Playwright framework:
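const { chromium } = require('playwright')

// Connect to the cloud browser over the Chrome DevTools Protocol
const browser = await chromium.connectOverCDP(wsEndpoint)
const page = await browser.newPage()
await page.goto('https://amazon.com')
// Save a screenshot of the loaded page to a local file
await page.screenshot({ path: 'screen.png' })
await browser.close()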
We have easily launched the browser, opened the page that we wanted, and taken a screenshot of it.
Using our service, you can launch as many browser instances as you need from the same device without any restrictions and work with them as if they were running on your own system.
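As a rough sketch of launching several instances at once, the requests can be sent in parallel and the results collected afterwards. The `launchMany` helper and the `createProfile` callback are hypothetical names introduced here; `createProfile` is assumed to wrap the POST request to `/profiles/one_time` shown earlier and to resolve to a WebSocket endpoint.

```javascript
// Launch one cloud browser per proxy, in parallel.
// `createProfile(proxy)` is a hypothetical callback that is assumed to send
// the profile creation request and resolve to a WebSocket endpoint string.
async function launchMany(createProfile, proxies) {
  // allSettled waits for every request, so one failure does not hide the rest
  const results = await Promise.allSettled(
    proxies.map((proxy) => createProfile(proxy))
  )
  const endpoints = []
  const failures = []
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      endpoints.push(result.value)
    } else {
      failures.push({ proxy: proxies[i], error: result.reason })
    }
  })
  return { endpoints, failures }
}

// Usage (inside an async function):
// const { endpoints, failures } = await launchMany(createProfile, proxyList)
```

Using `Promise.allSettled` here, not `Promise.all`, is a design choice: a single proxy that fails to start does not discard the browsers that did launch.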
If you already have browser automation code, you can simply replace the step that launches the browser with a single HTTP request, and everything will continue working exactly as before, with no additional changes necessary.
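One way to picture that swap, assuming your existing code uses Playwright, is a small helper that either launches a local browser or attaches to a remote one. The `openBrowser` name is hypothetical; `chromium.launch()` and `chromium.connectOverCDP()` are existing Playwright methods, and `wsEndpoint` is the value obtained from the request shown earlier.

```javascript
// Return a browser for the rest of the automation code to use.
// `chromium` is Playwright's browser type: require('playwright').chromium
// `wsEndpoint` is optional; when present, attach to the cloud browser.
async function openBrowser(chromium, wsEndpoint) {
  if (wsEndpoint) {
    // Cloud browser: connect over the Chrome DevTools Protocol
    return chromium.connectOverCDP(wsEndpoint)
  }
  // No endpoint given: fall back to launching a local browser
  return chromium.launch()
}

// Usage (inside an async function):
// const { chromium } = require('playwright')
// const browser = await openBrowser(chromium, wsEndpoint)
// const page = await browser.newPage()
```

The code that works with `browser` and `page` afterwards stays the same in both cases.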
We thank you for reading! We hope that you are now interested in trying Surfsky out. We are currently in the alpha testing stage, and our team will be happy to receive any constructive feedback. You can register to take part in the alpha testing here: https://surfsky.io/.
We hope to see you onboard soon!