<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Felipe Ishihara</title>
    <description>The latest articles on DEV Community by Felipe Ishihara (@felipe_ishihara).</description>
    <link>https://dev.to/felipe_ishihara</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1602806%2Fa4b77c53-eed6-4712-aa43-c1b7dea2335e.jpg</url>
      <title>DEV Community: Felipe Ishihara</title>
      <link>https://dev.to/felipe_ishihara</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/felipe_ishihara"/>
    <language>en</language>
    <item>
      <title>Web Scraping With PowerShell</title>
      <dc:creator>Felipe Ishihara</dc:creator>
      <pubDate>Tue, 11 Jun 2024 07:20:39 +0000</pubDate>
      <link>https://dev.to/felipe_ishihara/web-scraping-with-powershell-1nad</link>
      <guid>https://dev.to/felipe_ishihara/web-scraping-with-powershell-1nad</guid>
      <description>&lt;p&gt;PowerShell is a command-line shell and scripting language that you can use to automate tasks, manage systems, and perform several operations.&lt;/p&gt;

&lt;p&gt;It has been the default shell for Windows since 2016, but unless you're a system or server administrator, chances are you've rarely used it. Most people don't realize how powerful it is.&lt;/p&gt;

&lt;p&gt;But why PowerShell? Well, it depends on your use case, but it's useful for quickly checking our APIs without having to set up anything or change your project. You can also schedule scripts so they run periodically.&lt;/p&gt;

&lt;p&gt;I'm using PowerShell 5.1, but the examples below run on newer versions and PowerShell Core. If you want to upgrade it in Windows, please refer to &lt;a href="https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell-on-windows?view=powershell-7.4"&gt;Microsoft's documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you’re not a Windows user, don’t worry! PowerShell is cross-platform, and you can check how to install it on &lt;a href="https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell-on-linux?view=powershell-7.4"&gt;Linux&lt;/a&gt; and &lt;a href="https://learn.microsoft.com/en-us/powershell/scripting/install/installing-powershell-on-macos?view=powershell-7.4"&gt;macOS&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;The Basics of PowerShell&lt;/h2&gt;

&lt;p&gt;Here's PowerShell in a nutshell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In PowerShell, named commands are called &lt;code&gt;cmdlets&lt;/code&gt; (pronounced &lt;em&gt;command-lets&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cmdlets&lt;/code&gt; follow a &lt;em&gt;Verb-Noun&lt;/em&gt; convention.&lt;/li&gt;
&lt;li&gt;Variables in PowerShell always start with a &lt;code&gt;$&lt;/code&gt;, as they do in PHP.&lt;/li&gt;
&lt;li&gt;By convention, variables in PowerShell use PascalCase.&lt;/li&gt;
&lt;li&gt;Everything is an object in PowerShell.&lt;/li&gt;
&lt;/ul&gt;
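
&lt;p&gt;As a rough illustration of the points above, the snippet below calls a Verb-Noun cmdlet, stores the result in a PascalCase variable, and reads members of the returned object. The variable name &lt;code&gt;$Today&lt;/code&gt; is only a placeholder chosen for this sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Get-Date follows the Verb-Noun convention: the verb is "Get", the noun is "Date"
$Today = Get-Date

# The result is a .NET object rather than plain text, so it has properties and methods
$Today.Year
$Today.GetType().Name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;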

&lt;p&gt;For this tutorial we're going to use a single cmdlet: &lt;code&gt;Invoke-RestMethod&lt;/code&gt;. This cmdlet sends a request to a REST API and returns an object whose shape depends on the response. JSON and XML responses, for example, are deserialized into objects automatically.&lt;/p&gt;

&lt;p&gt;To understand &lt;code&gt;Invoke-RestMethod&lt;/code&gt; better, let's use two other cmdlets first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Invoke-WebRequest&lt;/li&gt;
&lt;li&gt;ConvertFrom-Json&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Invoke-WebRequest&lt;/code&gt; is PowerShell's version of cURL. It makes a request and returns a response. &lt;code&gt;ConvertFrom-Json&lt;/code&gt; converts a JSON string into an object (or, in PowerShell 6 and later, into a hash table when you pass &lt;code&gt;-AsHashtable&lt;/code&gt;).&lt;/p&gt;
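
&lt;p&gt;To get a feel for &lt;code&gt;ConvertFrom-Json&lt;/code&gt; without making a request, here is a minimal sketch that parses a small, made-up JSON string. The &lt;code&gt;$Raw&lt;/code&gt; and &lt;code&gt;$Parsed&lt;/code&gt; names and the JSON content are assumptions for illustration only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A small, made-up JSON string standing in for an API response
$Raw = '{"search_metadata": {"id": "abc123", "status": "Success"}}'

# ConvertFrom-Json turns the string into an object with nested properties
$Parsed = $Raw | ConvertFrom-Json

# Read a nested value from the parsed object
$Parsed.search_metadata.status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;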

&lt;h2&gt;Using SerpApi&lt;/h2&gt;

&lt;p&gt;Let's take the example URL from the "Easy integration" section of SerpApi's web page and pass it to PowerShell using the &lt;code&gt;-Uri&lt;/code&gt; parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Invoke-WebRequest &lt;span class="nt"&gt;-Uri&lt;/span&gt; &lt;span class="s2"&gt;"https://serpapi.com/search.json?q=Coffee&amp;amp;location=Austin,+Texas,+United+States&amp;amp;hl=en&amp;amp;gl=us&amp;amp;google_domain=google.com&amp;amp;api_key=YOUR_API_KEY"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will give us a response like this (with some of its content trimmed for brevity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;StatusCode        : 200
StatusDescription : OK
Content           : &lt;span class="o"&gt;{&lt;/span&gt;...&lt;span class="o"&gt;}&lt;/span&gt;
RawContent        : HTTP/1.1 200 OK
                    Connection: keep-alive
                    CF-Ray: 883ac74bedb8f655-NRT
                    CF-Cache-Status: EXPIRED
                    Vary: Accept-Encoding
                    referrer-policy: strict-origin-when-cross-origin
                    serpapi-search-id: 664350bfe93...
Forms             : &lt;span class="o"&gt;{}&lt;/span&gt;
Headers           : &lt;span class="o"&gt;{&lt;/span&gt;...&lt;span class="o"&gt;}&lt;/span&gt;
Images            : &lt;span class="o"&gt;{}&lt;/span&gt;
InputFields       : &lt;span class="o"&gt;{}&lt;/span&gt;
Links             : &lt;span class="o"&gt;{}&lt;/span&gt;
ParsedHtml        : System.__ComObject
RawContentLength  : 48676
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The JSON we actually want is inside the &lt;code&gt;Content&lt;/code&gt; property. We could &lt;a href="https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_pipelines?view=powershell-7.4"&gt;pipe&lt;/a&gt; the output of &lt;code&gt;Invoke-WebRequest&lt;/code&gt; into the &lt;code&gt;Select-Object&lt;/code&gt; cmdlet and pass &lt;code&gt;Content&lt;/code&gt; to its &lt;code&gt;-ExpandProperty&lt;/code&gt; parameter, which returns that property's value on its own. Since everything is an object in PowerShell, we can also access &lt;code&gt;Content&lt;/code&gt; by using dot notation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Getting Content with Select-Object&lt;/span&gt;
Invoke-WebRequest &lt;span class="nt"&gt;-Uri&lt;/span&gt; &lt;span class="s2"&gt;"https://serpapi.com/search.json?q=Coffee&amp;amp;location=Austin,+Texas,+United+States&amp;amp;hl=en&amp;amp;gl=us&amp;amp;google_domain=google.com&amp;amp;api_key=YOUR_API_KEY"&lt;/span&gt; | Select-Object &lt;span class="nt"&gt;-ExpandProperty&lt;/span&gt; Content

&lt;span class="c"&gt;# Getting Content with dot notation&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;Invoke-WebRequest &lt;span class="nt"&gt;-Uri&lt;/span&gt; &lt;span class="s2"&gt;"https://serpapi.com/search.json?q=Coffee&amp;amp;location=Austin,+Texas,+United+States&amp;amp;hl=en&amp;amp;gl=us&amp;amp;google_domain=google.com&amp;amp;api_key=YOUR_API_KEY"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;.Content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Either way, we can now access the JSON we want:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"search_metadata"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"664350bfe93ff45eb2993ec0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Success"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"json_endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://serpapi.com/searches/3bc827959d2dd083/664350bfe93ff45eb2993ec0.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-05-14 11:53:35 UTC"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"processed_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-05-14 11:53:35 UTC"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"google_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.google.com/search?q=Coffee&amp;amp;oq=Coffee&amp;amp;uule=w+CAIQICIaQXVzdGluLFRleGFzLFVuaXRlZCBTdGF0ZXM&amp;amp;hl=en&amp;amp;gl=us&amp;amp;sourceid=chrome&amp;amp;ie=UTF-8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"raw_html_file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://serpapi.com/searches/3bc827959d2dd083/664350bfe93ff45eb2993ec0.html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_time_taken"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.16&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can then pipe this into the &lt;code&gt;ConvertFrom-Json&lt;/code&gt; cmdlet to convert the JSON string into an object we can use. To make it easier to access later, we'll assign everything to a variable. Here's what your command should look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$Json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; Invoke-WebRequest &lt;span class="nt"&gt;-Uri&lt;/span&gt; &lt;span class="s2"&gt;"https://serpapi.com/search.json?q=Coffee&amp;amp;location=Austin,+Texas,+United+States&amp;amp;hl=en&amp;amp;gl=us&amp;amp;google_domain=google.com&amp;amp;api_key=YOUR_API_KEY"&lt;/span&gt; | Select-Object &lt;span class="nt"&gt;-ExpandProperty&lt;/span&gt; Content | ConvertFrom-Json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's go back to &lt;code&gt;Invoke-RestMethod&lt;/code&gt;. It wraps everything we just did into a single command: it sends the request, reads the response body, and parses the JSON. Instead of running the command above, we could use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$Json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; Invoke-RestMethod &lt;span class="nt"&gt;-Uri&lt;/span&gt; &lt;span class="s2"&gt;"https://serpapi.com/search.json?q=Coffee&amp;amp;location=Austin,+Texas,+United+States&amp;amp;hl=en&amp;amp;gl=us&amp;amp;google_domain=google.com&amp;amp;api_key=YOUR_API_KEY"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since we assigned the result to a variable, there's no output this time. You can type the variable name and press &lt;code&gt;Enter&lt;/code&gt; to print its entire content to the console. You can also &lt;a href="https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_redirection?view=powershell-7.4"&gt;redirect&lt;/a&gt; the output to a file in the current working directory by using the &lt;code&gt;&amp;gt;&lt;/code&gt; operator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$Json&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; out.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



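&lt;p&gt;You can now open the &lt;code&gt;out.json&lt;/code&gt; file to see the result. Keep in mind that &lt;code&gt;$Json&lt;/code&gt; is a PowerShell object at this point, so the &lt;code&gt;&amp;gt;&lt;/code&gt; operator writes PowerShell's formatted text view of that object rather than the raw JSON. To save actual JSON, convert the object back with &lt;code&gt;ConvertTo-Json&lt;/code&gt; before writing it. If you're having encoding problems, consider using the &lt;a href="https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/out-file?view=powershell-7.4"&gt;Out-File&lt;/a&gt; cmdlet, which lets you pick the encoding, instead of the &lt;code&gt;&amp;gt;&lt;/code&gt; operator. If you want to export results as a CSV instead, take a look at the &lt;a href="https://learn.microsoft.com/en-gb/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-7.4"&gt;Export-CSV&lt;/a&gt; cmdlet, which writes the file itself through its &lt;code&gt;-Path&lt;/code&gt; parameter.&lt;/p&gt;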
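
&lt;p&gt;Here is a minimal sketch of saving a parsed response as JSON and as CSV. It uses a made-up &lt;code&gt;$Sample&lt;/code&gt; object so it runs without an API key, and the file names &lt;code&gt;sample.json&lt;/code&gt; and &lt;code&gt;sample.csv&lt;/code&gt; are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A made-up object standing in for the parsed API response
$Sample = '{"organic_results": [{"position": 1, "title": "Coffee", "link": "https://example.com/a"}, {"position": 2, "title": "Tea", "link": "https://example.com/b"}]}' | ConvertFrom-Json

# Convert the object back to JSON text; -Depth keeps nested levels from being cut off
$Sample | ConvertTo-Json -Depth 10 | Out-File -FilePath sample.json -Encoding utf8

# Export a flat list of results as CSV; Export-Csv writes the file on its own
$Sample.organic_results | Select-Object position, title, link | Export-Csv -Path sample.csv -NoTypeInformation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;CSV is a flat format, so it works best on a list of simple objects such as the individual results, not on the whole nested response.&lt;/p&gt;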

&lt;p&gt;We can access keys inside this &lt;code&gt;$Json&lt;/code&gt; object by using dot notation like we did before when accessing the response &lt;code&gt;Content&lt;/code&gt; property.&lt;/p&gt;

&lt;p&gt;For example, &lt;code&gt;$Json.search_metadata&lt;/code&gt; will return all the keys and values inside &lt;code&gt;search_metadata&lt;/code&gt;, and &lt;code&gt;$Json.search_metadata.id&lt;/code&gt; will return just the value &lt;code&gt;664350bfe93ff45eb2993ec0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For keys that have arrays as their values, you can use bracket notation to access specific elements inside the array.&lt;/p&gt;

&lt;p&gt;For example, &lt;code&gt;$Json.organic_results&lt;/code&gt; will return all 8 search results, while &lt;code&gt;$Json.organic_results[0]&lt;/code&gt; will return the first one.&lt;/p&gt;

&lt;p&gt;You can then use dot notation again to get a specific value from this specific organic result. For example, &lt;code&gt;$Json.organic_results[0].link&lt;/code&gt; will return the first organic result's URL.&lt;/p&gt;
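
&lt;p&gt;The access patterns above can be tried offline with a small stand-in object. This is only a sketch: the &lt;code&gt;$Sample&lt;/code&gt; data is made up to mirror the keys mentioned above, and the variable names are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A made-up response with the same shape as the keys discussed above
$Sample = @'
{
  "search_metadata": { "id": "abc123" },
  "organic_results": [
    { "position": 1, "title": "First result",  "link": "https://example.com/first" },
    { "position": 2, "title": "Second result", "link": "https://example.com/second" }
  ]
}
'@ | ConvertFrom-Json

# Dot notation walks into nested objects
$Sample.search_metadata.id

# Bracket notation picks one element of an array; indexes start at 0
$FirstLink = $Sample.organic_results[0].link
$FirstLink

# Asking an array for a property returns that property from every element
$AllLinks = $Sample.organic_results.link
$AllLinks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;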

&lt;p&gt;You can also use the snippet of code below instead of having everything inside a single line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$Uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://serpapi.com/search.json"&lt;/span&gt;

&lt;span class="nv"&gt;$Parameters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; @&lt;span class="o"&gt;{&lt;/span&gt;
    q &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Coffee"&lt;/span&gt;
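    &lt;span class="c"&gt;# Plain text here: values passed through -Body are URL-encoded for us&lt;/span&gt;
    location &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Austin, Texas, United States"&lt;/span&gt;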
    hl &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"en"&lt;/span&gt;
    gl &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us"&lt;/span&gt;
    google_domain &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"google.com"&lt;/span&gt;
    api_key &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"YOUR_API_KEY"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$Json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; Invoke-RestMethod &lt;span class="nt"&gt;-Uri&lt;/span&gt; &lt;span class="nv"&gt;$Uri&lt;/span&gt; &lt;span class="nt"&gt;-Body&lt;/span&gt; &lt;span class="nv"&gt;$Parameters&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
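
&lt;p&gt;When a hash table is passed through &lt;code&gt;-Body&lt;/code&gt; on a GET request, its keys and values are URL-encoded and appended to the URI as a query string, so values are best written as plain text instead of being pre-encoded. The sketch below only approximates that step by hand with &lt;code&gt;[uri]::EscapeDataString&lt;/code&gt;. It is not the cmdlet's internal code, and details such as how spaces are encoded may differ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Ordered hash table so the output order is predictable in this sketch
$Parameters = [ordered]@{
    q        = "Coffee"
    location = "Austin, Texas, United States"
}

# Encode each value and build key=value pairs
$Pairs = foreach ($Entry in $Parameters.GetEnumerator()) {
    "{0}={1}" -f $Entry.Key, [uri]::EscapeDataString($Entry.Value)
}

# Join the pairs into a single query string
$QueryString = $Pairs -join "&amp;amp;"
$QueryString
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;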



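&lt;p&gt;Note: If you don’t want to retype the commands every time, you can also save everything in a PowerShell script file. Open a text editor, paste the snippet of code, and save the file with a &lt;code&gt;.ps1&lt;/code&gt; extension. You can then run it from a PowerShell prompt by typing its path, for example &lt;code&gt;.\YourScript.ps1&lt;/code&gt; (the file name is up to you), or by right-clicking the file and choosing "Run with PowerShell". On a default Windows setup, double-clicking a &lt;code&gt;.ps1&lt;/code&gt; file opens it in an editor instead of running it, and your execution policy may need to allow local scripts.&lt;/p&gt;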

&lt;h2&gt;Wrapping up&lt;/h2&gt;

&lt;p&gt;I hope this beginner's tutorial was able to showcase some of PowerShell's capabilities. It's a full-fledged scripting language built on .NET, so this is just a small taste of its power. You can use it for many of the same automation tasks you might otherwise write in a language like Python.&lt;/p&gt;

&lt;p&gt;While this isn’t an in-depth tutorial, if you want to parse the HTML directly, you could combine &lt;code&gt;Invoke-WebRequest&lt;/code&gt; with the &lt;a href="https://github.com/EvotecIT/PSParseHTML"&gt;PSParseHTML&lt;/a&gt; module or the &lt;a href="https://github.com/AngleSharp/AngleSharp"&gt;AngleSharp&lt;/a&gt; .NET library. With these, you can scrape data from other web pages, not just the search results we provide.&lt;/p&gt;

&lt;p&gt;Feel free to access our &lt;a href="https://serpapi.com/search-api"&gt;Google Search Engine Results API&lt;/a&gt; and modify the parameters to test our API, and don't forget to &lt;a href="https://serpapi.com/users/sign_up"&gt;sign up&lt;/a&gt; for a free account to get 100 credits/month if you haven't already. That's plenty for testing and simple task automation.&lt;/p&gt;

&lt;p&gt;If you have any questions or concerns, feel free to contact our team at &lt;u&gt;&lt;a href="mailto:contact@serpapi.com"&gt;contact@serpapi.com&lt;/a&gt;&lt;/u&gt;!&lt;/p&gt;

&lt;h2&gt;Learn more about PowerShell&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/training/modules/introduction-to-powershell/"&gt;Microsoft Learn: Introduction to PowerShell&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-gb/powershell/"&gt;PowerShell Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://devblogs.microsoft.com/scripting/weekend-scripter-the-best-ways-to-learn-powershell/"&gt;Weekend Scripter: The Best Ways to Learn PowerShell&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-webrequest?view=powershell-7.4"&gt;Invoke-WebRequest&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/convertfrom-json?view=powershell-7.4"&gt;ConvertFrom-Json&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/powershell/module/Microsoft.PowerShell.Utility/Select-Object?view=powershell-7.4"&gt;Select-Object&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-restmethod?view=powershell-7.4"&gt;Invoke-RestMethod&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>windows</category>
      <category>scraping</category>
      <category>powershell</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
