Introduction
In this post, we will analyze two different approaches to extracting information from a website. The first approach is web scraping from the frontend, which didn't work effectively for our specific use case. The second approach is using a GraphQL API to fetch data directly from the backend. We will dive into why the API-based approach is more advantageous than web scraping in our case.
Our task was to retrieve Ethereum addresses from a particular website by setting its Blockchain category filter to Ethereum and its Lock Creation Date filter to the previous day.
Where I Went Wrong
Initially, we attempted to scrape data from the website with a frontend script, but it was neither efficient nor accurate. Here's the baseline when a user applies the filters manually:
Let's focus on the `lower` and `upper` attributes of the request. The image below shows a manual request:
This video shows an automated request using a script:
```python
from datetime import date, timedelta
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

URL = "https://www.team.finance/view-all-coins"

# start a Chrome session managed by webdriver-manager
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)
driver.maximize_window()
driver.get(URL)
print("Chrome Browser Invoked")

# the absolute XPaths below were copied from DevTools and break easily
# whenever the page layout changes
filter_button = driver.find_element(By.XPATH, "/html/body/div[1]/main/div[2]/div[2]/div[1]/div/div/button[3]")
ActionChains(driver).click(filter_button).perform()
print("Filter button clicked")

eth = driver.find_element(By.XPATH, "/html/body/div[3]/div/div/div/div/div[2]/div[2]/div[2]/div/div[1]/input")
ActionChains(driver).click(eth).perform()
print("Ethereum checked")

# type the dates into the datepicker inputs instead of clicking the calendar
end_date = driver.find_element(By.XPATH, "/html/body/div[3]/div/div/div/div/div[2]/div[2]/div[3]/div/div[2]/div/div/input")
today = date.today()
end_date.send_keys(Keys.CONTROL + 'a' + Keys.BACKSPACE)
end_date.send_keys(f"{today}" + Keys.ENTER)
print(f"End Date entered: {today}")

start_date = driver.find_element(By.XPATH, "/html/body/div[3]/div/div/div/div/div[2]/div[2]/div[3]/div/div[1]/div/div/input")
yesterday = today - timedelta(days=1)
start_date.send_keys(Keys.CONTROL + 'a' + Keys.BACKSPACE)
start_date.send_keys(f"{yesterday}" + Keys.ENTER)
print(f"Start Date entered: {yesterday}")

apply_filter = driver.find_element(By.XPATH, "/html/body/div[3]/div/div/div/div/div[2]/div[3]/div/button[1]")
ActionChains(driver).click(apply_filter).perform()
print("Apply filter")

# driver.quit()
```
My script simulated typing into the datepicker instead of clicking through it. I chose this approach because simulating keystrokes was easier to code than clicking the calendar widget. However, the two methods produced different values than I expected. Now take a look at the `lower` and `upper` attributes of the automated request:
The boundaries above are incorrect because they do not always fall at midnight (00:00) of the previous day. After discussing the problem with a friend, we decided to try a backend approach using the GraphQL API.
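For reference, here is a minimal sketch (my own, not part of the original script) of how boundaries that always point at midnight UTC of the previous day could be computed with the standard library; the variable names and example dates are only illustrative:

```python
from datetime import datetime, time, timedelta, timezone

# midnight (00:00 UTC) at the start of today is the upper boundary
today = datetime.now(timezone.utc).date()
upper = datetime.combine(today, time.min, tzinfo=timezone.utc)

# the previous day starts exactly 24 hours earlier
lower = upper - timedelta(days=1)

print(lower.isoformat())  # e.g. 2023-05-14T00:00:00+00:00
print(upper.isoformat())  # e.g. 2023-05-15T00:00:00+00:00
```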
Why API (Backend) is Better than Web Scraping (Frontend)
By inspecting the website using the Developer Tools, we discovered that the client sends GraphQL requests to the server. GraphQL is a query language and runtime for APIs, enabling clients to request exactly the data they need. This allows for more efficient and flexible data retrieval than traditional REST APIs.
In our case, the GraphQL API offers several advantages over web scraping:
- Accuracy: The data we need is fetched directly from the backend, ensuring accuracy and consistency with the website's actual data.
- Efficiency: Using the API reduces the need to parse and extract information from the HTML code, making the process more efficient.
- Reliability: APIs are designed to be consumed programmatically, making them more reliable than scraping the ever-changing structure of a web page.
- Flexibility: GraphQL allows us to request only the data we need, reducing the amount of data transferred and processed (see the example below).
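As a quick illustration of that last point, a query names only the fields it actually needs. The snippet below is a hypothetical, trimmed-down version of the query used later in the walkthrough; the field names `tokenAddress` and `createdAt` come from that query, while the simplified filter shape is an assumption:

```python
# hypothetical minimal query: only the fields we care about, nothing else
minimal_query_str = """
query Previous1DEth($lower: DateTime!, $upper: DateTime!) {
  lockEvents(
    filter: {createdAt: {between: {lower: $lower, upper: $upper}}}
    paging: {limit: 10, offset: 0}
  ) {
    nodes {
      tokenAddress  # the only field we really need
      createdAt
    }
  }
}
"""
```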
Code Walkthrough
In the Jupyter Notebook that finally worked, we used the `gql` library to interact with the GraphQL API. Here's a high-level overview of the process:
- Install and import the required libraries.
```python
try:
    import gql
except ImportError as e:
    print(e)
    %pip install gql[all]

from json import dump
from gql import gql, Client
from gql.transport.aiohttp import AIOHTTPTransport
from datetime import datetime, timezone, timedelta
```

- Define the GraphQL endpoint URL and set the time range for the previous 24 hours.
```python
# constants
URL = 'https://team-finance-backend-origdfl2wq-uc.a.run.app/graphql'
HOURS = 24
UTC_FORMAT = '%Y-%m-%dT%H:%M:%S.%f%z'
LOG_FILE = 'result.json'
```

- Create a GraphQL client using the AIOHTTPTransport and the endpoint URL.
```python
# Select your transport with a defined url endpoint
transport = AIOHTTPTransport(url=URL)

# Create a GraphQL client using the defined transport
client = Client(transport=transport)

# Get the current datetime and the previous 24 hours from now in UTC
now_utc = datetime.now(timezone.utc)
previous_1d_utc = now_utc - timedelta(hours=HOURS)

# format the datetimes in ISO 8601 format with UTC offset
upper = now_utc.strftime(UTC_FORMAT)
lower = previous_1d_utc.strftime(UTC_FORMAT)
```

- Define the GraphQL query with placeholders for the time range and pagination limit.
```python
query_str = """
query Previous1DEth($lower: DateTime!, $upper: DateTime!, $limit: Int!) {
  lockEvents(
    sorting: {field: unlockTime, direction: ASC}
    filter: {
      unlockTime: {gt: 0}
      and: [
        {chainId: {in: ["0x1"]}},
        {createdAt: {between: {lower: $lower, upper: $upper}}},
      ]
    }
    paging: {limit: $limit, offset: 0}
  ) {
    pageInfo {
      hasNextPage
      hasPreviousPage
    }
    nodes {
      unlockTime
      tokenAddress
      network
      tokenTotalSupply
      chainId
      createdAt
      tokenId
      isWithdrawn
    }
  }
}
"""

# create GraphQL request
query = gql(query_str)
```

- Execute the query asynchronously, increasing the pagination limit until all data is retrieved.
```python
limit = 10  # increase the limit if there is pagination until exhausted
hasNextPage = True
while hasNextPage:
    variables = {
        "lower": lower,
        "upper": upper,
        "limit": limit
    }
    result = await client.execute_async(query, variable_values=variables)
    hasNextPage = result['lockEvents']['pageInfo']['hasNextPage']
    limit += 10
```

- Extract the token addresses from the query result and store the complete response in a JSON file.
```python
# store the complete response for log
with open(LOG_FILE, 'w') as f:
    dump(result, f)

nodes = result['lockEvents']['nodes']
addresses = [address['tokenAddress'] for address in nodes]
print(addresses)
print(f"{len(addresses)} addresses")
```
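A side note on the pagination loop in step 5: it re-runs the whole query with a progressively larger limit until `hasNextPage` is false, which refetches the earlier rows on every pass. Since the query already exposes a `paging` argument with `limit` and `offset`, an offset-based loop is a possible alternative. The sketch below is mine, not the notebook's; it reuses `client`, `lower`, and `upper` from the walkthrough and assumes the API honors an `$offset` variable the way the `paging` argument suggests:

```python
# sketch: assumes lockEvents accepts an arbitrary offset in its paging argument
paged_query = gql("""
query Previous1DEthPaged($lower: DateTime!, $upper: DateTime!, $limit: Int!, $offset: Int!) {
  lockEvents(
    sorting: {field: unlockTime, direction: ASC}
    filter: {
      unlockTime: {gt: 0}
      and: [
        {chainId: {in: ["0x1"]}},
        {createdAt: {between: {lower: $lower, upper: $upper}}},
      ]
    }
    paging: {limit: $limit, offset: $offset}
  ) {
    pageInfo { hasNextPage }
    nodes { tokenAddress createdAt }
  }
}
""")

PAGE_SIZE = 100
offset = 0
all_nodes = []
has_next_page = True

while has_next_page:
    variables = {"lower": lower, "upper": upper, "limit": PAGE_SIZE, "offset": offset}
    result = await client.execute_async(paged_query, variable_values=variables)
    all_nodes.extend(result['lockEvents']['nodes'])  # accumulate each page
    has_next_page = result['lockEvents']['pageInfo']['hasNextPage']
    offset += PAGE_SIZE

print(f"{len(all_nodes)} nodes fetched")
```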
This approach allowed us to fetch the Ethereum addresses directly from the backend, providing a more efficient and reliable solution compared to web scraping. The complete code is available here for exploration.
Conclusion
In this post, we compared two approaches to extracting information from a website: web scraping from the frontend and using the GraphQL API from the backend. For our specific use case, leveraging the GraphQL API proved to be a more advantageous solution due to its accuracy, efficiency, reliability, and flexibility.
It's important to note that the best approach may vary depending on the specific website and data requirements. In some cases, web scraping might be the only option if no API is available. However, when possible, using an API is generally a more efficient and reliable way to fetch data programmatically.
We hope this post provided valuable insights into the benefits of using APIs, particularly GraphQL, over web scraping when extracting data from websites. I will be busy filling out my current project in this repo, but there is more to come! Check out my portfolio repository here:
DAIly: A bunch of Data Analysis and Artificial Intelligence notebooks

- Ideas: This directory might contain notes or outlines of potential data analysis or AI projects that I'm considering working on in the future. These might be brainstorming notebooks, rough outlines or slide decks of project ideas, or notes on interesting data sources or tools that I want to explore further.
- Tips: This directory might contain more practical information, such as code snippets or tutorials that I've found helpful in my data analysis and AI work. These could be tips on how to use specific libraries or tools, how to preprocess data for analysis, or how to approach common data analysis or AI tasks.
- Fantastic Docs and Where to Find Them: Reading and understanding any documentation with minimum effort on Google Colab.
Follow me anywhere