Bornfight

Turn any website into an API, with no code

Rudolf Jurišić (napravicukod)

Why scrape in the first place

There are many reasons why you might want to extract data from a public website. The most common one is that the data you want is not available through an API.

Use cases

  • Scrape products from your favorite webshop, add a notification mechanism, and never miss a discount again.
  • Your sales team needs a list of potential clients that are listed in some huge directory.
  • Scrape a real estate directory so you can be the first to make an offer on that cozy condo you are looking for.

Whatever the reason and the use case, scraping is an automated way of extracting data from websites.

Let's code

As a developer, your first instinct is to solve problems by writing code. But as a problem solver, you should not presume your problem is unique. Look for an existing solution first.
Also, the title promises no coding :)

Parsehub

Parsehub is a web scraping GUI tool for fetching and manipulating data from a webpage. It helps you create an API output for a given website. You can even sanitize the extracted content with regular expressions or a replace function.
The input is a URL and the output is a structured JSON file.

An example

For example, the input is the URL of the Bornfight careers page, and the output is formatted JSON containing all the data you want to use.


{
  "jobs": [
    {
      "name": "Sales and Account Manager - m/f",
      "url": "https://www.bornfight.com/careers/strategic-partnerships-executive/",
      "location": "Zagreb",
      "due_date": "Open until filled",
      "type": "Full time job"
    },
    {
      "name": "iOS Developer - m/f",
      "url": "https://www.bornfight.com/careers/ios-developer/",
      "location": "Zagreb / remote",
      "due_date": "Open until filled",
      "type": "Full time job"
    },
    {
      "name": "Office Assistant (student job) - m/f",
      "url": "https://www.bornfight.com/careers/office-assistant-student-job/",
      "location": "Zagreb",
      "due_date": "Open until filled",
      "type": "Student job (part-time)"
    }
  ]
}
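
Once you have output in this shape, consuming it takes only a few lines. The following is a minimal Python sketch, not part of the tool itself: it assumes the JSON has the "jobs" list shown above, and the helper name jobs_matching is made up for illustration.

```python
import json

# Sample data in the same shape as the output above (trimmed to a few fields).
raw = """
{
  "jobs": [
    {"name": "iOS Developer - m/f", "location": "Zagreb / remote", "type": "Full time job"},
    {"name": "Office Assistant (student job) - m/f", "location": "Zagreb", "type": "Student job (part-time)"}
  ]
}
"""


def jobs_matching(data, keyword):
    """Return the jobs whose location contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        job for job in data.get("jobs", [])
        if keyword in job.get("location", "").lower()
    ]


data = json.loads(raw)
remote_jobs = jobs_matching(data, "remote")
for job in remote_jobs:
    print(job["name"], "|", job["location"])
```

The same filter could feed a notification mechanism like the one mentioned in the use cases.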

How to

This is a short video for the given example. It demonstrates the basic features of the tool.

Scrape multiple pages

To add more relevant data to your API, you can instruct the tool to click on each job posting, "visit" that single page, and add more data to your JSON output.

What else?

  • click through the page navigation and AJAX links
  • use conditional statements
  • create flows with multiple templates
  • scroll
  • hover
  • sanitize data by string replacement and regex
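
To make the last item concrete, here is what regex-based sanitizing looks like when written by hand in Python. This is only an illustration of the idea, not how Parsehub implements it; the function name clean_job_name and the choice to strip the " - m/f" suffix are assumptions made for the example.

```python
import re


def clean_job_name(name):
    """Drop a trailing ' - m/f' marker and collapse repeated whitespace."""
    # Remove the suffix only when it sits at the very end of the string.
    name = re.sub(r"\s*-\s*m/f\s*$", "", name, flags=re.IGNORECASE)
    # Collapse any run of whitespace into a single space.
    return re.sub(r"\s+", " ", name).strip()


print(clean_job_name("iOS Developer - m/f"))
```

In the tool you would configure an equivalent pattern in the GUI instead of writing this code.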

How to get the data

You can download the extracted data in JSON or CSV format, but better yet, you can access it via the Parsehub API.
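
If you do end up writing a little code on the consuming side, fetching the latest results could look roughly like the sketch below. It assumes the v2 endpoint layout as I understand it from the Parsehub documentation (projects/<project token>/last_ready_run/data with api_key and format parameters) and that the body may arrive gzip-compressed; check both against the current docs. The function names are mine, and the token and key are placeholders.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Assumed base URL of the v2 API; verify against the official documentation.
API_BASE = "https://www.parsehub.com/api/v2"


def last_ready_run_url(project_token, api_key, fmt="json"):
    """Build the URL for the data of the most recent finished run."""
    query = urllib.parse.urlencode({"api_key": api_key, "format": fmt})
    return f"{API_BASE}/projects/{project_token}/last_ready_run/data?{query}"


def decode_body(body):
    """Decode a response body, unpacking gzip if the magic bytes are present."""
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


def fetch_last_ready_run(project_token, api_key):
    """Download and parse the latest extracted data (performs a network call)."""
    url = last_ready_run_url(project_token, api_key)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return decode_body(resp.read())


# Usage, with your own credentials:
# data = fetch_last_ready_run("YOUR_PROJECT_TOKEN", "YOUR_API_KEY")
```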


Parsehub API

Through the API you can trigger extraction runs, fetch the extracted data, and manage the projects you have in the tool.
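
As a rough sketch of the automation part, starting a run from a script might look like this. It assumes a POST to projects/<project token>/run with form fields api_key and, optionally, start_url, which is how I understand the v2 API; treat the endpoint, the field names, and the response shape as things to verify in the docs. The helper names and the example.com URL are hypothetical.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Assumed base URL of the v2 API; verify against the official documentation.
API_BASE = "https://www.parsehub.com/api/v2"


def build_run_request(project_token, api_key, start_url=None):
    """Prepare (but do not send) the POST request that starts a run."""
    fields = {"api_key": api_key}
    if start_url:
        # Optional override of the page the project starts from.
        fields["start_url"] = start_url
    data = urllib.parse.urlencode(fields).encode("ascii")
    url = f"{API_BASE}/projects/{project_token}/run"
    return urllib.request.Request(url, data=data, method="POST")


def start_run(project_token, api_key, start_url=None):
    """Send the request and return the parsed JSON reply (network call)."""
    request = build_run_request(project_token, api_key, start_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        body = resp.read()
    if body[:2] == b"\x1f\x8b":  # unpack gzip if the reply is compressed
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


# Usage, with your own credentials:
# run = start_run("YOUR_PROJECT_TOKEN", "YOUR_API_KEY")
```

Splitting request building from sending keeps the part you can inspect locally separate from the part that needs credentials.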

Conclusion

Parsehub is a powerful scraping tool. It can handle complex scraping scenarios and is a good fit for most use cases. Follow the guided tutorial when you create your first project, and check the documentation to find out more.

Parsing the Bornfight careers page is a good first exercise. And if you're interested in joining our team but there is no open position that fits, you can send an open application :)

If you have any questions, feel free to ask in the comments.

Discussion (3)

Renato Ruk

Really awesome method, did not know about it! Thanks!

Rudolf Jurišić Author

You're welcome Renato!