Felipe Ishihara

Web Scraping With PowerShell

PowerShell is a command-line shell and scripting language that you can use to automate tasks, manage systems, and perform several operations.

It has been the default shell for Windows since 2016, but unless you're a system or server administrator, chances are you've rarely used it. Most people don't realize how powerful it is.

But why PowerShell? It depends on your use case, but it's useful for quickly checking our APIs without having to set up anything or change your project. You can also schedule scripts to run periodically.

I'm using PowerShell 5.1, but the examples below run on newer versions and PowerShell Core. If you want to upgrade it in Windows, please refer to Microsoft's documentation.

If you're not a Windows user, don't worry! PowerShell is cross-platform, and you can check how to install it on Linux and macOS.

The Basics of PowerShell

Here's PowerShell in a nutshell:

  • In PowerShell, named commands are called cmdlets (pronounced command-lets).
  • Cmdlets follow a Verb-Noun naming convention, such as Invoke-RestMethod.
  • Variables in PowerShell always start with a $, as in PHP.
  • By convention, variables in PowerShell use PascalCase.
  • Everything is an object in PowerShell.
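As a quick illustration of these points, here is a minimal sketch you can paste into any PowerShell session. It uses only the built-in Get-Date cmdlet; the variable names are just examples.

```powershell
# Get-Date follows the Verb-Noun convention and returns a DateTime object.
$CurrentDate = Get-Date

# Because the result is an object, it exposes properties and methods
# that we can reach with dot notation.
$TypeName = $CurrentDate.GetType().Name
$Year = $CurrentDate.Year

"Type: $TypeName, year: $Year"
```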

For this tutorial, the main cmdlet we're going to use is Invoke-RestMethod. This cmdlet sends a request to a REST API and returns an object whose shape depends on the response. A JSON response, for example, is converted into an object whose properties mirror the JSON keys.

To understand Invoke-RestMethod better, let's use two other cmdlets first:

  • Invoke-WebRequest
  • ConvertFrom-Json

Invoke-WebRequest is PowerShell's version of cURL. It makes a request and returns a response. ConvertFrom-Json converts a JSON string into an object (or into a hash table if you pass the -AsHashtable switch, which is available in PowerShell 6 and later).
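To see what ConvertFrom-Json does on its own, here is a small offline sketch. The JSON string is a made-up fragment, not real API output.

```powershell
# A hand-written JSON string standing in for an API response body.
$JsonText = '{"search_metadata": {"status": "Success", "total_time_taken": 1.16}}'

# ConvertFrom-Json turns the string into an object with nested properties.
$Parsed = $JsonText | ConvertFrom-Json

# Nested keys become nested properties.
$Parsed.search_metadata.status
```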

Using SerpApi

Let's take the URL shown under "Easy integration" on SerpApi's web page and pass it to Invoke-WebRequest using the -Uri parameter:

Invoke-WebRequest -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY"

This will give us a response like this (with some of its content trimmed for brevity):

StatusCode        : 200
StatusDescription : OK
Content           : {...}
RawContent        : HTTP/1.1 200 OK
                    Connection: keep-alive
                    CF-Ray: 883ac74bedb8f655-NRT
                    CF-Cache-Status: EXPIRED
                    Vary: Accept-Encoding
                    referrer-policy: strict-origin-when-cross-origin
                    serpapi-search-id: 664350bfe93...
Forms             : {}
Headers           : {...}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        : System.__ComObject
RawContentLength  : 48676

The JSON we actually want is inside the Content property. We could pipe the output of Invoke-WebRequest into the Select-Object cmdlet and use the -ExpandProperty parameter with Content as the property to expand. Since everything is an object in PowerShell, we can also access Content by using dot notation:

# Getting Content with Select-Object
Invoke-WebRequest -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY" | Select-Object -ExpandProperty Content

# Getting Content with dot notation
(Invoke-WebRequest -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY").Content

Either way, we can now access the JSON we want:

{
  "search_metadata": {
    "id": "664350bfe93ff45eb2993ec0",
    "status": "Success",
    "json_endpoint": "https://serpapi.com/searches/3bc827959d2dd083/664350bfe93ff45eb2993ec0.json",
    "created_at": "2024-05-14 11:53:35 UTC",
    "processed_at": "2024-05-14 11:53:35 UTC",
    "google_url": "https://www.google.com/search?q=Coffee&oq=Coffee&uule=w+CAIQICIaQXVzdGluLFRleGFzLFVuaXRlZCBTdGF0ZXM&hl=en&gl=us&sourceid=chrome&ie=UTF-8",
    "raw_html_file": "https://serpapi.com/searches/3bc827959d2dd083/664350bfe93ff45eb2993ec0.html",
    "total_time_taken": 1.16
  },
  ...
}

We can then pipe this into the ConvertFrom-Json cmdlet to convert the JSON string into an object we can use. To make it easier to access later, we'll assign everything to a variable. Here's what your command should look like:

$Json = Invoke-WebRequest -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY" | Select-Object -ExpandProperty Content | ConvertFrom-Json
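If you'd like to try this pipeline without spending API credits, the sketch below swaps the web request for a hand-built object. $FakeResponse and its contents are assumptions for illustration; only the Content property mimics what Invoke-WebRequest returns.

```powershell
# Stand-in for the object returned by Invoke-WebRequest (hypothetical data).
$FakeResponse = [pscustomobject]@{
    StatusCode = 200
    Content    = '{"search_metadata": {"id": "example-id", "status": "Success"}}'
}

# Same pipeline as in the command above: expand Content, then parse the JSON string.
$FromPipeline = $FakeResponse | Select-Object -ExpandProperty Content | ConvertFrom-Json

# Equivalent version using dot notation.
$FromDotNotation = $FakeResponse.Content | ConvertFrom-Json

$FromPipeline.search_metadata.status
$FromDotNotation.search_metadata.status
```

Both routes hand the same string to ConvertFrom-Json, so the resulting objects have the same properties.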

Now let's go back to Invoke-RestMethod. What it does is wrap everything we just did in a single command. Instead of running the command above, we could use:

$Json = Invoke-RestMethod -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY"

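Since we assigned the result to a variable, there's no output this time. You can type the variable name and press Enter to have its entire content printed to the console. Keep in mind that $Json is now an object, not a JSON string, so to save it as JSON you need to convert it back with ConvertTo-Json first. The -Depth parameter matters here because ConvertTo-Json only serializes two levels of nesting by default. You can then redirect the output to a file in the current working directory by using the > operator:

$Json | ConvertTo-Json -Depth 10 > out.json

You can now see the JSON response inside the out.json file. If you're having encoding problems, consider using the Out-File cmdlet with its -Encoding parameter instead of the > operator. If you want to export part of the response as a CSV instead, take a look at the Export-Csv cmdlet, which writes to a file itself through its -Path parameter.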
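Here is a minimal sketch of saving an object as JSON with Out-File and as CSV with Export-Csv, using a hand-built object instead of a live response. The sample data, the file names, and the choice of the temp directory are all assumptions for illustration.

```powershell
# Hypothetical data shaped like a small part of the API response.
$Sample = [pscustomobject]@{
    organic_results = @(
        [pscustomobject]@{ position = 1; title = "Example A"; link = "https://example.com/a" }
        [pscustomobject]@{ position = 2; title = "Example B"; link = "https://example.com/b" }
    )
}

$TempDir  = [System.IO.Path]::GetTempPath()
$JsonPath = Join-Path $TempDir "out.json"
$CsvPath  = Join-Path $TempDir "out.csv"

# Convert the object back to a JSON string and pick the encoding explicitly.
$Sample | ConvertTo-Json -Depth 10 | Out-File -FilePath $JsonPath -Encoding utf8

# Export-Csv writes the file itself; each result becomes one row.
$Sample.organic_results | Export-Csv -Path $CsvPath -NoTypeInformation -Encoding utf8

# Read both files back to confirm what was written.
$RoundTrip = Get-Content -Path $JsonPath -Raw | ConvertFrom-Json
$CsvRows   = @(Import-Csv -Path $CsvPath)
```

CSV is a flat format, so it works best on a list of similar objects such as the individual results rather than on the whole nested response.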

We can access keys inside this $Json object by using dot notation like we did before when accessing the response Content property.

For example, $Json.search_metadata will return all the keys and values inside search_metadata, and $Json.search_metadata.id will return just the value 664350bfe93ff45eb2993ec0.

For keys whose values are arrays, you can use bracket notation to access specific elements inside the array.

For example, $Json.organic_results will return all 8 search results, while $Json.organic_results[0] will return the first one.

You can then use dot notation again to get a specific value from that organic result. For example, $Json.organic_results[0].link will return the first organic result's URL.
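The same navigation can be tried offline. The sketch below parses a hand-written JSON string; the IDs, titles, and example.com links are placeholders, not real search results.

```powershell
# Hypothetical JSON with the same shape as the keys discussed above.
$SampleText = @'
{
  "search_metadata": { "id": "example-id", "status": "Success" },
  "organic_results": [
    { "position": 1, "title": "Example A", "link": "https://example.com/a" },
    { "position": 2, "title": "Example B", "link": "https://example.com/b" }
  ]
}
'@

$SampleJson = $SampleText | ConvertFrom-Json

# Dot notation for nested keys.
$SearchId = $SampleJson.search_metadata.id

# Bracket notation for one array element, then dot notation for its value.
$FirstLink = $SampleJson.organic_results[0].link

# Dot notation on the whole array collects that property from every element.
$AllLinks = $SampleJson.organic_results.link
```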

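Instead of putting everything on a single line, you can also pass the query parameters as a hash table. For a GET request, Invoke-RestMethod appends a hash table given to -Body to the URI as query parameters: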

$Uri = "https://serpapi.com/search.json"

$Parameters = @{
    q = "Coffee"
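    # Plain spaces here: the values are URL-encoded for us, so a literal + would be sent as a plus sign
    location = "Austin, Texas, United States"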
    hl = "en"
    gl = "us"
    google_domain = "google.com"
    api_key = "YOUR_API_KEY"
}

$Json = Invoke-RestMethod -Uri $Uri -Body $Parameters
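To get a feel for what happens to the hash table, the sketch below builds a query string by hand. This is only an approximation for illustration, not the cmdlet's actual implementation, and the exact escaping Invoke-RestMethod applies may differ (for example in how spaces are encoded). The point is that each value gets URL-encoded, so a literal + inside a value is escaped rather than treated as a space.

```powershell
# [ordered] keeps the keys in the order they were written.
$Parameters = [ordered]@{
    q        = "Coffee"
    location = "Austin, Texas, United States"
    hl       = "en"
}

# URL-encode every key and value, then join the pairs with "&".
$Pairs = foreach ($Key in $Parameters.Keys) {
    "{0}={1}" -f [uri]::EscapeDataString($Key), [uri]::EscapeDataString($Parameters[$Key])
}
$QueryString = $Pairs -join "&"

$FullUri = "https://serpapi.com/search.json?$QueryString"
$FullUri
```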

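Note: If you don't want to retype everything each time, you can also save it in a PowerShell script file. Open a text editor, paste the snippet of code, and save the file with a .ps1 extension. You can then run it from a PowerShell prompt by typing its path (for example .\search.ps1, where search.ps1 is whatever name you chose), or by right-clicking the file and choosing Run with PowerShell. By default, double-clicking a .ps1 file on Windows opens it in an editor instead of running it, and your execution policy may need to allow local scripts.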

Wrapping up

I hope this beginner's tutorial was able to showcase some of PowerShell's capabilities. It's pretty much a full-fledged programming language, so this is just a small taste of its power. You can use PowerShell for many of the same tasks you would use something like Python for.

While this isn't an in-depth tutorial, if you want to parse the HTML directly, you could combine Invoke-WebRequest with the PSParseHTML module or the AngleSharp .NET library. With this, you can scrape data from web pages, not just the search results we provide.

Feel free to access our Google Search Engine Results API and modify the parameters to test our API, and don't forget to sign up for a free account to get 100 credits/month if you haven't already. That's plenty for testing and simple task automation.

If you have any questions or concerns, feel free to contact our team at contact@serpapi.com!

Learn more about PowerShell
