DEV Community

Cover image for Web Scraping Google With Rust
Serpdog
Serpdog

Posted on • Edited on • Originally published at serpdog.io

Web Scraping Google With Rust

Web Scraping Google With Rust

Introduction

The Rust programming language was developed by Mozilla in 2010, as an alternative to C and C++. It is widely used for web and game development.

In this tutorial, we will learn web scraping Google Search Results using Rust. This language has gained little popularity, but due to its speed, and reliability, it can be used as an alternative option for other web scraping languages.

This blog post will mainly focus on how to extract data from Google Search Results and some pros and cons of this programming language.

Web Scraping Google With Rust 1

We are going to use two libraries, request and scraper, in this tutorial for scraping and parsing the raw HTML data.

By the end of this article, you will have a basic understanding of scraping Google Search Results with Rust. You can also leverage this knowledge for future web scraping projects with other programming languages.

Why Rust for Web Scraping?

Rust is a high-level powerful programming language that has given a good amount of significance to its performance and concurrency. Its asynchronous programming structure allows scraping servers to handle multiple requests simultaneously, which makes it a reliable choice for handling large amounts of data and complex web scraping tasks.

Another advantage of using Rust is that it has several powerful libraries for web scraping such as reqwest that can help developers extract and process web pages easily. Also, libraries like html5ever and scraper are blazingly fast when it comes to HTML parsing.

Overall, Rust can be an efficient choice for not only scraping Google but also for other web scrapings tasks, which is only possible due to its insane emphasis on performance and its robust system of libraries.

Scraping Google Search Results With Rust

Scraping Google Search Results With Rust is pretty easy. We will extract the first page results from Google, consisting of the title, link, and description. I must say, it will take a while for beginners to get used to Ruby syntax. But yeah, practice makes you better!

The scraping would be in two parts:

  1. Fetching the raw HTML.

  2. HTML parsing to filter and get the required data.

Set-Up:

For beginners, to install Rust on your device, you can watch these videos:

  1. How to install Rust on Windows?

  2. How to install Rust on macOS?

Requirements:

After installing Rust successfully, you can run this command to create a Rust project folder:

cargo new project
Enter fullscreen mode Exit fullscreen mode

Then, we will install two libraries that we will use in this tutorial.

  1. reqwest — It will be used for making HTTP requests at the target URL.

  2. scraper — It will be used for selecting the HTML elements and parsing the extracted HTML data.

You can add these libraries to your cargo.toml file like this:

[dependencies]
reqwest = { version = “0.11.14”, features = [“blocking”] }
scraper = “0.14.0”
Enter fullscreen mode Exit fullscreen mode

Now, these libraries are finally accessible in the src/main.rs file.

Process:

So, we are done with the setup. You can open your project file in your editor and write the below code to import both the required crates:

use reqwest::blocking::Client;
use scraper::{Html, Selector};
Enter fullscreen mode Exit fullscreen mode

Then, we will create the main function to extract the results.

fn main() -> Result<(), Box<dyn std::error::Error>> 
{ 
 let client = Client::new(); 
 let query = "rust+programming+language" 
 let url = format!(“https://www.google.com/search?q={}" , query); 
 let res = client.get(&url).send()?;
 let html = res.text()?;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation:

  1. After creating the function, we created an instance of the Client struct from the reqwest crate to make HTTP requests.

  2. Then, we create a string query and set it to “rust programming language”, which we will pass in the below-formatted string url.

  3. In the next line, we made a connection with the URL by sending an HTTP request.

  4. After that, we try to get the HTML content returned from the HTTP request.

    let fragment = Html::parse_document(&html);

Now, with the help of Html object from the scraper crate, we convert the extracted HTML into a document object model.

Then, with the help of a Selector object from the scraper crate, we will select all the matching div with the class name g.

    let selector = Selector::parse("div.g").unwrap();
Enter fullscreen mode Exit fullscreen mode

If you iterate the Google Search Page, you will get to know that every organic result is inside the div with class g.

Web Scraping Google With Rust 2

Let us now iterate over all these selected divs to get the required search data.

    for element in fragment.select(&selector) {
        let title_selector = Selector::parse("h3").unwrap();
        let title_element = element.select(&title_selector).next().unwrap();
        let title = title_element.text().collect::<Vec<_>>().join("");

        let link_selector = Selector::parse(".yuRUbf > a").unwrap();
        let link_element = element.select(&link_selector).next().unwrap();
        let link = link_element.value().attr("href").unwrap();

        let snippet_selector = Selector::parse(".VwiC3b").unwrap();
        let snippet_element = element.select(&snippet_selector).next().unwrap();
        let snippet = snippet_element.text().collect::<Vec<_>>().join("");

        println!("Title: {}", title);
        println!("Link: {}", link);
        println!("Snippet: {}", snippet);
    }
    Ok(())
}
Enter fullscreen mode Exit fullscreen mode

After creating a new Selector object for the h3 tag, we will select the first title that matches this selector from the element variable and extract the text content inside it. And then, we set the title to the text content inside our title_element.

We have followed the same process for the link and snippet, but a little difference in the link is we have extracted the href attribute of the anchor(represented by a in the code) tag.

You can find the tags for the title, link, and description under the container div.g.

Web Scraping Google With Rust 3

After running the code successfully, your results should look like this:

Title: Rust Programming Language
Link: https://www.rust-lang.org/
Snippet: Rust has great documentation, a friendly compiler with useful error messages, and top-notch tooling — an integrated package manager and build tool, smart multi- ...

Title: Rust (programming language) - Wikipedia
Link: https://en.wikipedia.org/wiki/Rust_(programming_language)
Snippet: Rust is a multi-paradigm systems programming language focused on safety, especially safe concurrency.

Title: Introduction to Rust Programming Language - GeeksforGeeks
Link: https://www.geeksforgeeks.org/introduction-to-rust-programming-language/
Snippet: Rust is a blazing fast and memory-efficient static compiled language with a rich type system and ownership model. It can be used to power ...
Enter fullscreen mode Exit fullscreen mode

So, this is the fundamental way of scraping Google Search results with Rust.

This solution is not suggested for mass scraping, as it can result in your IP block. But, if you want to consider a more hassle-free and streamlined solution, you can try this Google SERP API to scrape Google Search Results. Serpdog rotates millions of residential proxies on its end so that its users can scrape Google peacefully without encountering any blockage or CAPTCHAs.

Benefits of Using Rust

Web Scraping Google With Rust 4

  1. Rust can handle a large amount of data thanks to its robust concurrency model.

  2. Extreme level of performance by Rust makes it a popular choice for game development, system-level programming, and other intensive applications.

  3. It supports reverse compatibility, allowing it to run older code even on the latest versions of the language.

Disadvantages Of Using Rust

Web Scraping Google With Rust 5

  1. The code can be challenging to understand for beginners because of the complex syntax.

  2. Community support has not grown much, so the developer has to struggle even more to solve a small error.

Conclusion

In this tutorial, we learned to scrape Google Search Results with Rust. We also learned some benefits and disadvantages of using Rust.

Please do not hesitate to message me if I missed something. If you think we can complete your custom scraping projects, feel free to contact us.

Follow me on Twitter. Thanks for reading!

Additional Resources

  1. Web Scraping Google With Node JS

  2. Web Scraping Google With Java

  3. Web Scraping Google With Ruby

  4. Web Scraping Yelp Data

Top comments (1)

Collapse
 
crawlbase profile image
Crawlbase • Edited

Thanks alot! It's Great tutorial on web scraping Google with Rust! Appreciate the detailed explanation and step-by-step process. Rust's emphasis on performance and concurrency makes it a promising choice for such tasks.
And yes, if you're looking for hassle-free scraping solutions, do check out Crawlbase .