<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Illia Zub</title>
    <description>The latest articles on DEV Community by Illia Zub (@ilyazub).</description>
    <link>https://dev.to/ilyazub</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F135726%2Feafe5529-ccd7-4c0c-8e74-e84de060cac1.jpg</url>
      <title>DEV Community: Illia Zub</title>
      <link>https://dev.to/ilyazub</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ilyazub"/>
    <language>en</language>
    <item>
      <title>How we reverse-engineered Google Maps pagination</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Tue, 05 Dec 2023 11:01:52 +0000</pubDate>
      <link>https://dev.to/serpapi/how-we-reverse-engineered-google-maps-pagination-38e9</link>
      <guid>https://dev.to/serpapi/how-we-reverse-engineered-google-maps-pagination-38e9</guid>
      <description>&lt;p&gt;In this story, you'll see the process of decoding URL parameters for pagination on Google Maps. It involved deobfuscation of Closure-compiled JavaScript, reverse-engineering of &lt;a href="https://en.wikipedia.org/wiki/Protocol_Buffers"&gt;Protobuf&lt;/a&gt; data structures, and a bit of math. We tried to decode URL parameters by ourselves, by using &lt;a href="https://github.com/marin-m/pbtk"&gt;&lt;code&gt;pbtk&lt;/code&gt;&lt;/a&gt;, and attempted to outsource this work. In the end, we succeeded after several pair programming sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Google Maps pagination works
&lt;/h2&gt;

&lt;p&gt;We can get the link for the next page only by clicking on the “next page” button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3pJlJAuO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:510/0%2A6wua4iOqwQtsdLOL.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3pJlJAuO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:510/0%2A6wua4iOqwQtsdLOL.png" alt="Next page button on Google Maps results" width="408" height="146"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Links look like this.&lt;/p&gt;

&lt;p&gt;Page 1&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://www.google.com/search?tbm=map&amp;amp;authuser=0&amp;amp;hl=en&amp;amp;gl=us&amp;amp;pb=!4m8!1m3!1d24182.00605141337!2d-74.0083012!3d40.7455096!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m18!2m3!5m1!6e2!20e3!6m11!4b1!23b1!26i1!27i1!41i2!45b1!63m0!67b1!73m0!74i150000!89b1!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m57!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m42!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1sjWLeXbmnHIXt-gTm2ouwDg%3A23!2zMWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6MjM!7e81!24m40!1m12!13m6!2b1!3b1!4b1!6i1!8b1!9b1!18m4!3b1!4b1!5b1!6b1!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m2!1e3!1e6!24b1!25b1!26b1!30m1!2b1!36b1!43b1!52b1!55b1!56m2!1b1!3b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m9!3b1!4b1!6b1!8m2!1b1!3b1!9b1!12b1!14b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m40!1m39!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIMHKBc!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIQHKBg!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIUHKBk!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIYHKBo!10m2!1m1!1e2!3m1!1u2!3m1!1u1!3m1!1u3!4BIAE!59BQ2dBd0Fn&amp;amp;q=Coffee&amp;amp;tch=1&amp;amp;ech=1&amp;amp;psi=jWLeXbmnHIXt-gTm2ouwDg.1574855312836.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Page 2&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://www.google.com/search?tbm=map&amp;amp;authuser=0&amp;amp;hl=en&amp;amp;gl=us&amp;amp;pb=!4m8!1m3!1d24182.00605141337!2d-74.0083012!3d40.7455096!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m18!2m3!5m1!6e2!20e3!6m11!4b1!23b1!26i1!27i1!41i2!45b1!63m0!67b1!73m0!74i150000!89b1!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m57!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m42!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1sjWLeXbmnHIXt-gTm2ouwDg%3A78!2zMWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6Nzg!7e81!24m40!1m12!13m6!2b1!3b1!4b1!6i1!8b1!9b1!18m4!3b1!4b1!5b1!6b1!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m2!1e3!1e6!24b1!25b1!26b1!30m1!2b1!36b1!43b1!52b1!55b1!56m2!1b1!3b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m9!3b1!4b1!6b1!8m2!1b1!3b1!9b1!12b1!14b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m40!1m39!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIMHKBc!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIQHKBg!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIUHKBk!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIYHKBo!10m2!1m1!1e2!3m1!1u2!3m1!1u1!3m1!1u3!4BIAE!59BQ2dBd0Fn&amp;amp;q=Coffee&amp;amp;tch=1&amp;amp;ech=2&amp;amp;psi=jWLeXbmnHIXt-gTm2ouwDg.1574855312836.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They lead to a &lt;code&gt;f.txt&lt;/code&gt; file that contains the next page results.&lt;/p&gt;
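&lt;p&gt;One visible difference between the two links is the token that follows &lt;code&gt;!2z&lt;/code&gt; inside &lt;code&gt;pb&lt;/code&gt;. It decodes cleanly as unpadded Base64, which can be checked with a small sketch. The helper name below is hypothetical, and what the trailing number means is not something this check establishes.&lt;/p&gt;

```ruby
# Tokens copied from the page 1 and page 2 links above (the part after "!2z").
PAGE_1_TOKEN = "MWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6MjM"
PAGE_2_TOKEN = "MWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6Nzg"

# Hypothetical helper: restores the "=" padding, maps the URL-safe alphabet
# to the standard one, and decodes with String#unpack1("m") (Base64).
def decode_z_token(token)
  padded = token + "=" * ((4 - token.length % 4) % 4)
  padded.tr("-_", "+/").unpack1("m")
end

puts decode_z_token(PAGE_1_TOKEN) # 1i:2,t:12696,e:1,p:jWLeXbmnHIXt-gTm2ouwDg:23
puts decode_z_token(PAGE_2_TOKEN) # 1i:2,t:12696,e:1,p:jWLeXbmnHIXt-gTm2ouwDg:78
```

&lt;p&gt;The decoded text contains the same value as the &lt;code&gt;psi&lt;/code&gt; query parameter, followed by a number that differs between the two pages.&lt;/p&gt;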

&lt;h2&gt;
  
  
  The Plan
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Find out how the &lt;code&gt;pb (protobuf)&lt;/code&gt; string is constructed.&lt;/li&gt;
&lt;li&gt;Generate the next page link by setting the required parameters.&lt;/li&gt;
&lt;li&gt;Catch and parse the data.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Decoding the URL parameters for Google Maps pagination
&lt;/h3&gt;

&lt;p&gt;Google Maps URLs contain the &lt;code&gt;pb&lt;/code&gt; parameter, which holds string-encoded &lt;a href="https://en.wikipedia.org/wiki/Protocol_Buffers"&gt;Protobuf&lt;/a&gt;. The format is the same as for the &lt;code&gt;data&lt;/code&gt; parameter in the browser URL on Google Maps. It contains &lt;code&gt;!&lt;/code&gt;-separated values. There are &lt;a href="https://stackoverflow.com/q/47017387/1291371"&gt;several&lt;/a&gt; &lt;a href="https://stackoverflow.com/a/34275131/1291371"&gt;answers&lt;/a&gt; on Stack Overflow, &lt;a href="https://gist.github.com/jeteon/e71fa21c1feb48fe4b5eeec045229a0c"&gt;gists&lt;/a&gt; on &lt;a href="https://gist.github.com/mingalevme/04702bb7e5e361448cbe44cb7b3895d5"&gt;GitHub&lt;/a&gt;, &lt;a href="http://blog.themillhousegroup.com/2016/08/deep-diving-into-google-pb-embedded-map.html"&gt;some&lt;/a&gt; &lt;a href="https://andrewwhitby.com/2014/09/09/google-maps-new-embed-format/"&gt;blog&lt;/a&gt; &lt;a href="https://medium.com/@supun1001/how-to-generate-google-embed-links-programmatically-for-iframes-for-routes-only-d6dc225e59e8"&gt;posts&lt;/a&gt; &lt;a href="https://medium.com/@marin_m/how-i-found-a-5-000-google-maps-xss-by-fiddling-with-protobuf-963ee0d9caff"&gt;about&lt;/a&gt; decoding, and even a kinda official &lt;a href="https://github.com/protobufjs/protobuf.js/wiki/How-to-reverse-engineer-a-buffer-by-hand"&gt;guide on reverse engineering protobuf&lt;/a&gt;, but none of them cover pagination.&lt;/p&gt;
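&lt;p&gt;To get a feel for the &lt;code&gt;!&lt;/code&gt;-separated format, a &lt;code&gt;pb&lt;/code&gt; string can be split into tokens. The sketch below is a hypothetical exploration helper, not part of our code base. It assumes that every token has the shape "field number, one type letter, value"; reading the letters as types (for example &lt;code&gt;d&lt;/code&gt; for a double or &lt;code&gt;m&lt;/code&gt; for a nested message) follows the community write-ups linked above and is an assumption rather than a documented format.&lt;/p&gt;

```ruby
# Hypothetical helper for exploring a pb string; not part of the original code.
# Each token is assumed to look like FIELD_NUMBER + TYPE_LETTER + VALUE.
PB_TOKEN_REGEX = /\A(\d+)([a-zA-Z])(.*)\z/m

def pb_tokens(pb)
  pb.split("!").reject { |token| token.empty? }.map do |token|
    match = PB_TOKEN_REGEX.match(token)
    if match
      { field: match[1].to_i, type: match[2], value: match[3] }
    else
      # The token did not fit the assumed shape, so keep it as-is.
      { field: nil, type: nil, value: token }
    end
  end
end

# The first few tokens of the page 1 link shown earlier.
sample = "!4m8!1m3!1d24182.00605141337!2d-74.0083012!3d40.7455096"

pb_tokens(sample).each do |token|
  puts format("field=%-3s type=%s value=%s", token[:field], token[:type], token[:value])
end
```

&lt;p&gt;Printing the tokens one per line makes it much easier to compare two long &lt;code&gt;pb&lt;/code&gt; strings than reading them inline.&lt;/p&gt;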

&lt;p&gt;We tried to use &lt;a href="https://github.com/marin-m/pbtk"&gt;&lt;code&gt;pbtk&lt;/code&gt;&lt;/a&gt;, but it couldn't extract the structures and crashed. Several attempts at reading the pretty-printed obfuscated JavaScript didn't work either.&lt;/p&gt;

&lt;p&gt;After pairing with &lt;a href="https://aciddjus.medium.com"&gt;Milos&lt;/a&gt;, we identified most of the variables in the Google Maps pagination URI: &lt;code&gt;latitude&lt;/code&gt;, &lt;code&gt;longitude&lt;/code&gt;, &lt;code&gt;altitude_in_feets&lt;/code&gt;, &lt;code&gt;pagination_offset&lt;/code&gt;, and a parameter that is equal to &lt;code&gt;psi&lt;/code&gt; but whose meaning we don't know. &lt;code&gt;psi&lt;/code&gt; changes after each page reload and is stored in &lt;code&gt;window.APP_OPTIONS[11]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5Oet1MON--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:875/0%2AU3E541ZTP9H0nx5B.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5Oet1MON--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:875/0%2AU3E541ZTP9H0nx5B.png" alt="" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;psi parameter in the window.APP_OPTIONS on Google Maps&lt;/p&gt;

&lt;p&gt;Another moving part is a list of filters, but we don't know how to parse them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gh-JdAaJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:875/0%2AwGQ0fJOvKgDPIIt_.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gh-JdAaJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:875/0%2AwGQ0fJOvKgDPIIt_.png" alt="" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;List of filters in Google Maps URLs for pagination&lt;/p&gt;

&lt;p&gt;We realized that we could make the first request to Google Maps, extract the variables, and construct the pagination URI, as we do for our &lt;a href="https://serpapi.com/youtube-search-api"&gt;API to scrape YouTube&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;
  &lt;span class="n"&gt;long&lt;/span&gt; &lt;span class="o"&gt;||=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;73.91476977539236&lt;/span&gt;
  &lt;span class="n"&gt;lat&lt;/span&gt; &lt;span class="o"&gt;||=&lt;/span&gt; &lt;span class="mf"&gt;40.68525694561602&lt;/span&gt;
  &lt;span class="n"&gt;alt&lt;/span&gt; &lt;span class="o"&gt;||=&lt;/span&gt; &lt;span class="mf"&gt;120027.44487325678&lt;/span&gt;

  &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;||=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;

  &lt;span class="n"&gt;psi&lt;/span&gt; &lt;span class="o"&gt;||=&lt;/span&gt; &lt;span class="s2"&gt;"b24JYPPGOoaJrwTXlbHACw"&lt;/span&gt;

  &lt;span class="n"&gt;google_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;query_scheme_and_domain&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/search?tbm=map&amp;amp;authuser=0&amp;amp;hl=en&amp;amp;gl=ua&amp;amp;pb=!4m12!1m3!1d&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;alt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;!2d&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;!3d&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;long&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;!2m3!1f0!2f0!3f0!3m2!1i1920!2i549!4f13.1!7i20!8i&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;!10b1!12m8!1m1!18b1!2m3!5m1!6e2!20e3!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1s&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;psi&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;!2s1i%3A2%2Ct%3A12696%2Ce%3A1%2Cp%3A&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;psi&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span 
class="s2"&gt;%3A1273!7e81!24m56!1m16!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m7!3b1!4b1!5b1!6b1!9b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i549!1m6!1m2!1i1870!2i0!2m2!1i1920!2i549!1m6!1m2!1i0!2i0!2m2!1i1920!2i20!1m6!1m2!1i0!2i529!2m2!1i1920!2i549!31b1!34m16!2b1!3b1!4b1!6b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m73!1m68!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNUJKBg!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNYJKBk!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNcJKBo!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNgJKBs!10m2!1m1!1e2!2m7!1u16!4sVisited!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNkJKBw!10m2!16m1!1e1!2m7!1u16!4sHaven%27t+visited!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNoJKB0!10m2!16m1!1e2!3m1!1u3!3m1!1u2!3m2!1u1!3e1!3m11!1u16!2m4!1m2!16m1!1e1!2sVisited!2m4!1m2!16m1!1e2!2sHaven%27t+visited!4BIAE!2e2!3m2!1b1!3b1!59BQ2dBd0Fn!65m0!69i540&amp;amp;q=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;safe_query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;tch=1&amp;amp;ech=1&amp;amp;psi=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;psi&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.1611230833287.1"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The quick check confirmed that the algorithm works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bundle &lt;span class="nb"&gt;exec &lt;/span&gt;rails runner &lt;span class="s1"&gt;'puts Search.new(engine: :google_maps, q: "coffee", lat: 36.3996184, long: -113.9511419, alt: 2124931.1267513777, offset: 20, psi: "rZkJYJuoINHmrgTP1IVI").query_randomized'&lt;/span&gt; | xargs curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nt"&gt;-x&lt;/span&gt; &lt;span class="nv"&gt;$HTTP_PROXY&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="s1"&gt;'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.46'&lt;/span&gt; - &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; f.txt

&lt;span class="c"&gt;# "https://www.google.com/search?tbm=map&amp;amp;authuser=0&amp;amp;hl=en&amp;amp;gl=ua&amp;amp;pb=!4m8!1m3!1d2124931.1267513777!2d36.3996184!3d-113.9511419!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1sgpYJYKSwMsH9rgT0_7nAAg!2zMWk6Mix0OjEyNjk2LGU6MSxwOmdwWUpZS1N3TXNIOXJnVDBfN25BQWc6MjM!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m17!2b1!3b1!4b1!6b1!7b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m72!1m68!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMgJKBY!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMkJKBc!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMoJKBg!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMsJKBk!10m2!1m1!1e2!2m7!1u16!4sVisited!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMwJKBo!10m2!16m1!1e1!2m7!1u16!4sHaven%27t+visited!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCM0JKBs!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2sVisited!2m4!1m2!16m1!1e2!2sHaven%27t+visited!3m1!1u2!3m2!1u1!3e0!3m1!1u3!4BIAE!2e2!3m1!3b1!59BQ2dBd0Fn!65m0!69i540&amp;amp;q=coffee&amp;amp;tch=1&amp;amp;ech=1&a
mp;amp;psi=rZkJYJuoINHmrgTP1IVI.1611241093531.1"&lt;/span&gt;

&lt;span class="c"&gt;# Hit next page in browser, copy URL and curl it.&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nt"&gt;-x&lt;/span&gt; &lt;span class="nv"&gt;$HTTP_PROXY&lt;/span&gt; &lt;span class="nt"&gt;-A&lt;/span&gt; &lt;span class="s1"&gt;'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.46'&lt;/span&gt; &lt;span class="s1"&gt;'https://www.google.com/search?tbm=map&amp;amp;authuser=0&amp;amp;hl=en&amp;amp;gl=ua&amp;amp;pb=!4m8!1m3!1d2124931.1267513777!2d-113.9511419!3d36.3996184!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1srZkJYJuoINHmrgTP1IVI!2zMWk6Mix0OjEyNjk2LGU6MSxwOnJaa0pZSnVvSU5IbXJnVFAxSVZJOjIz!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m17!2b1!3b1!4b1!6b1!7b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m72!1m68!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCN8KKBY!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOAKKBc!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOEKKBg!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sK
HU9qAQkQ_KkBCOIKKBk!10m2!1m1!1e2!2m7!1u16!4sVisited!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOMKKBo!10m2!16m1!1e1!2m7!1u16!4sHaven%27t+visited!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOQKKBs!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2sVisited!2m4!1m2!16m1!1e2!2sHaven%27t+visited!3m1!1u2!3m2!1u1!3e0!3m1!1u3!4BIAE!2e2!3m1!3b1!59BQ2dBd0Fn!65m0!69i540&amp;amp;q=coffee&amp;amp;tch=1&amp;amp;ech=1&amp;amp;psi=rZkJYJuoINHmrgTP1IVI.1611241905488.1'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; correct.txt

&lt;span class="c"&gt;# Compare f.txt and correct.txt in a text editor - they almost the same.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We didn't want to add new parameters (&lt;code&gt;long&lt;/code&gt;, &lt;code&gt;lat&lt;/code&gt;, &lt;code&gt;alt&lt;/code&gt;) to our API specifically for pagination, so we tried to find ways to &lt;a href="https://groups.google.com/g/google-maps-js-api-v3/c/hDRO4oHVSeM"&gt;compute&lt;/a&gt; &lt;code&gt;alt&lt;/code&gt; &lt;a href="https://gis.stackexchange.com/a/13010/176905"&gt;from&lt;/a&gt; &lt;a href="https://docs.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system#ground-resolution-and-map-scale"&gt;zoom&lt;/a&gt;. But the results of &lt;a href="https://gis.stackexchange.com/a/178905/176905"&gt;those&lt;/a&gt; formulas don't match the altitude in the pagination URLs that Google Maps uses.&lt;/p&gt;

&lt;p&gt;Also, altitude depends on the number of pixels per inch, which differs between devices, and Google re-scales the map to fit all places on it. (This turned out to be irrelevant.) &lt;a href="https://aciddjus.medium.com"&gt;Milos&lt;/a&gt; combined multiple formulas from the Google Maps JS code into a single formula that converts altitude to zoom for a given latitude.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;EARTH_RADIUS_IN_METERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6371010&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TILE_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SCREEN_PIXEL_HEIGHT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;

&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;zoom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;altitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PI&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;13.1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;SCREEN_PIXEL_HEIGHT&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PI&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;TILE_SIZE&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;altitude&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;EARTH_RADIUS_IN_METERS&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span 
class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PI&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;LN2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;zoom&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;24182.00605141337&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;40.7455096&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// =&amp;gt; 14&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last missing part is the reverse formula.&lt;/p&gt;

&lt;h3&gt;
  
  
  Convert zoom to an altitude
&lt;/h3&gt;

&lt;p&gt;We've simplified the &lt;a href="https://www.wolframalpha.com/input/?i=z+%3D+f%28a%2C+l%29+%3D+ln%281+%2F+tan%28PI+%2F+180+*+13.1+%2F+2%29+*+%28768+%2F+2%29+*+2+*+PI+%2F+%28256+*+a+%2F+%286371010+*+cos%28PI+%2F+180+*+l%29%29%29%29+%2F+ln%282%29"&gt;zoom formula in Wolfram Alpha&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vGaSMTNg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/0%2AI-ab_aG8zo6Fa0g0" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vGaSMTNg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/0%2AI-ab_aG8zo6Fa0g0" alt="" width="415" height="126"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The initial formula for zoom = f(alt, lat)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PFfaunQA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:369/0%2ArjI3IlKWakiZOc1P" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PFfaunQA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:369/0%2ArjI3IlKWakiZOc1P" alt="" width="295" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The simplified formula in Wolfram Alpha. Pretty nice, huh?&lt;/p&gt;

&lt;p&gt;After several hours of reading middle-school math books on logarithmic equations, we reversed the formula.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0EFsYJpI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:875/0%2AQyj6rPMTiVf1rzsf" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0EFsYJpI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:875/0%2AQyj6rPMTiVf1rzsf" alt="z = f(alt, lat); alt = f(z, lat); and my reflection on the whiteboard" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Ruby code of &lt;code&gt;alt = f(zoom, lat)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;EARTH_RADIUS_IN_METERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6371010&lt;/span&gt;
&lt;span class="no"&gt;TILE_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;
&lt;span class="no"&gt;SCREEN_PIXEL_HEIGHT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;
&lt;span class="no"&gt;RADIUS_X_PIXEL_HEIGHT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;27.3611&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;EARTH_RADIUS_IN_METERS&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;SCREEN_PIXEL_HEIGHT&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;altitude&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;zoom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;RADIUS_X_PIXEL_HEIGHT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;latitude&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;PI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;zoom&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;TILE_SIZE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
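&lt;p&gt;A round trip is one way to gain confidence in the inversion. The sketch below ports the JavaScript zoom formula to Ruby and solves it for altitude without the rounded &lt;code&gt;27.3611&lt;/code&gt; factor, so converting altitude to zoom and back should reproduce the input. The method names and the &lt;code&gt;VIEW_ANGLE_IN_DEGREES&lt;/code&gt; constant name are assumptions made for this illustration; only the numbers come from the formulas above.&lt;/p&gt;

```ruby
EARTH_RADIUS_IN_METERS = 6_371_010
TILE_SIZE = 256
SCREEN_PIXEL_HEIGHT = 768
VIEW_ANGLE_IN_DEGREES = 13.1 # value from the JavaScript formula; the name is an assumption

def to_radians(degrees)
  degrees * Math::PI / 180
end

# Ruby port of the JavaScript zoom formula.
def zoom_from_altitude(altitude, latitude)
  scale = 1 / Math.tan(to_radians(VIEW_ANGLE_IN_DEGREES) / 2) *
          (SCREEN_PIXEL_HEIGHT / 2) * 2 * Math::PI /
          (TILE_SIZE * altitude / (EARTH_RADIUS_IN_METERS * Math.cos(to_radians(latitude))))
  Math.log2(scale)
end

# The same formula solved for altitude, keeping the exact factor
# PI / tan(view angle / 2) instead of the rounded 27.3611.
def altitude_from_zoom(zoom, latitude)
  Math::PI / Math.tan(to_radians(VIEW_ANGLE_IN_DEGREES) / 2) *
    SCREEN_PIXEL_HEIGHT * EARTH_RADIUS_IN_METERS * Math.cos(to_radians(latitude)) /
    (2**zoom * TILE_SIZE)
end

altitude = 24182.00605141337
latitude = 40.7455096

zoom = zoom_from_altitude(altitude, latitude)
puts zoom.round(3)                      # close to 14, as in the JavaScript example
puts altitude_from_zoom(zoom, latitude) # should reproduce the altitude above
```

&lt;p&gt;Because the factor is computed rather than rounded, the only difference between the input and the round-trip result should be floating-point noise.&lt;/p&gt;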



&lt;h3&gt;
  
  
  Code to generate &lt;code&gt;pb&lt;/code&gt; parameter
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;pb&lt;/code&gt; parameter for the Google Maps pagination is a function of &lt;code&gt;ll&lt;/code&gt; from the URL and the &lt;code&gt;offset&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ll&lt;/code&gt; contains the &lt;code&gt;latitude&lt;/code&gt; and &lt;code&gt;longitude&lt;/code&gt;, each of which can be negative or positive, and the &lt;code&gt;zoom&lt;/code&gt;. We decided to extract those parameters with a regular expression.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="no"&gt;PAGINATION_PARAMETERS_REGEX&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;%r{
  &lt;/span&gt;&lt;span class="se"&gt;\A&lt;/span&gt;&lt;span class="sr"&gt;                                      # Start of string
  (?:&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*)                                 # initial possible whitespace
  @(?&amp;lt;latitude&amp;gt;[-+]?&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;{1,2}(?:[.,]&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+)?)  # latitude: @10.78472
  (?:&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*,&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*)                             # separator between latitude and longitude
  (?&amp;lt;longitude&amp;gt;[-+]?&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;{1,3}(?:[.,]&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+)?)  # longitude: -110
  (?:&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*,&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*)                             # separator between longitude and zoom
  (?&amp;lt;zoom&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;{1,2}(?:[.,]&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+)?)z           # zoom: 9.22
  &lt;/span&gt;&lt;span class="se"&gt;\z&lt;/span&gt;&lt;span class="sr"&gt;                                      # End of string
}x&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
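&lt;p&gt;As a quick illustration of what the expression accepts, here is a compact variant with positional groups applied to a sample &lt;code&gt;ll&lt;/code&gt; value. The constant name, the &lt;code&gt;parse_ll&lt;/code&gt; helper, and the sample string (the coordinates from the links above with a zoom of 14) are assumptions made for this sketch.&lt;/p&gt;

```ruby
# Compact variant of the expression above with positional groups;
# the constant name is hypothetical.
LL_REGEX = /\A\s*@([-+]?\d{1,2}(?:[.,]\d+)?)\s*,\s*([-+]?\d{1,3}(?:[.,]\d+)?)\s*,\s*(\d{1,2}(?:[.,]\d+)?)z\z/

# Hypothetical helper: returns a hash of floats, or nil when ll does not match.
def parse_ll(ll)
  match = LL_REGEX.match(ll)
  return nil unless match

  latitude, longitude, zoom = match.captures
  { latitude: latitude.to_f, longitude: longitude.to_f, zoom: zoom.to_f }
end

p parse_ll("@40.7455096,-74.0083012,14z") # latitude, longitude, and zoom as floats
p parse_ll("40.7455096,-74.0083012")      # nil: no leading "@" and no zoom
```

&lt;p&gt;Returning &lt;code&gt;nil&lt;/code&gt; for malformed input lets the caller decide how to handle a missing or invalid &lt;code&gt;ll&lt;/code&gt;.&lt;/p&gt;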



&lt;p&gt;Conversion of the &lt;code&gt;ll&lt;/code&gt; URL parameter to &lt;code&gt;pb&lt;/code&gt; for the specific results offset (&lt;code&gt;start&lt;/code&gt;) on Google Maps looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pagination&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;extracted_parameters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PAGINATION_PARAMETERS_REGEX&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;extracted_parameters&lt;/span&gt;

  &lt;span class="s2"&gt;"!4m8!1m3!1d"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;altitude&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extracted_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:zoom&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extracted_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:latitude&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;"!2d"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;extracted_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:longitude&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;"!3d"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;extracted_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:latitude&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;"!3m2!1i1024!2i768!4f13.1!7i20!8i"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;"!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1s!2z!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m16!2b1!3b1!4b1!6b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m53!1m49!2m7!1u3!4s!5e1!9s!10m2!3m1!1e1!2m7!1u2!4s!5e1!9s!10m2!2m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2s!2m4!1m2!16m1!1e2!2s!3m1!1u2!3m1!1u3!4BIAE!2e2!3m1!3b1!59B!65m0!69i540"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Putting it all together
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# https://regex101.com/r/nOoiJ6/2&lt;/span&gt;
&lt;span class="no"&gt;PAGINATION_PARAMETERS_REGEX&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;%r{
  &lt;/span&gt;&lt;span class="se"&gt;\A&lt;/span&gt;&lt;span class="sr"&gt;                                      # Start of string
  (?:&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*)                                 # initial possible whitespace
  @(?&amp;lt;latitude&amp;gt;[-+]?&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;{1,2}(?:[.,]&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+)?)  # latitude: @10.78472
  (?:&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*,&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*)                             # separator between latitude and longitude
  (?&amp;lt;longitude&amp;gt;[-+]?&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;{1,3}(?:[.,]&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+)?)  # longitude: -110
  (?:&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*,&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*)                             # separator between longitude and zoom
  (?&amp;lt;zoom&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;{1,2}(?:[.,]&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="sr"&gt;+)?)z           # zoom: 9.22
  &lt;/span&gt;&lt;span class="se"&gt;\z&lt;/span&gt;&lt;span class="sr"&gt;                                      # End of string
}x&lt;/span&gt;

&lt;span class="no"&gt;EARTH_RADIUS_IN_METERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6371010&lt;/span&gt;
&lt;span class="no"&gt;TILE_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;
&lt;span class="no"&gt;SCREEN_PIXEL_HEIGHT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;
&lt;span class="no"&gt;RADIUS_X_PIXEL_HEIGHT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;27.3611&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;EARTH_RADIUS_IN_METERS&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;SCREEN_PIXEL_HEIGHT&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pagination&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;extracted_parameters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PAGINATION_PARAMETERS_REGEX&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;extracted_parameters&lt;/span&gt;

  &lt;span class="s2"&gt;"!4m8!1m3!1d"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;altitude&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extracted_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:zoom&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extracted_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:latitude&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;"!2d"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;extracted_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:longitude&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;"!3d"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;extracted_parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:latitude&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;"!3m2!1i1024!2i768!4f13.1!7i20!8i"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="s2"&gt;"!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1s!2z!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m16!2b1!3b1!4b1!6b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m53!1m49!2m7!1u3!4s!5e1!9s!10m2!3m1!1e1!2m7!1u2!4s!5e1!9s!10m2!2m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2s!2m4!1m2!16m1!1e2!2s!3m1!1u2!3m1!1u3!4BIAE!2e2!3m1!3b1!59B!65m0!69i540"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;altitude&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;zoom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="no"&gt;RADIUS_X_PIXEL_HEIGHT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;latitude&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;PI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;zoom&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="no"&gt;TILE_SIZE&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
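
&lt;p&gt;To get a feel for how the altitude value behaves, here is a minimal sketch that reuses the constants and formula from the code above. The helper name &lt;code&gt;altitude_value&lt;/code&gt; and the sample zoom levels and latitudes are assumptions made for illustration only; unlike &lt;code&gt;altitude&lt;/code&gt;, the helper returns a Float rather than a String so the numbers can be compared.&lt;/p&gt;

```ruby
# Constants copied from the article's code.
EARTH_RADIUS_IN_METERS = 6371010
TILE_SIZE = 256
SCREEN_PIXEL_HEIGHT = 768
RADIUS_X_PIXEL_HEIGHT = 27.3611 * EARTH_RADIUS_IN_METERS * SCREEN_PIXEL_HEIGHT

# Same formula as `altitude`, but returning a Float instead of a String.
# The method name is hypothetical and exists only for this illustration.
def altitude_value(zoom, latitude)
  latitude_in_radians = latitude * Math::PI / 180
  (RADIUS_X_PIXEL_HEIGHT * Math.cos(latitude_in_radians)) / ((2**zoom) * TILE_SIZE)
end

# Sample inputs chosen for illustration, not taken from real requests.
[10, 11, 12].each do |zoom|
  puts format("zoom %d at the equator: %.1f", zoom, altitude_value(zoom, 0.0))
end
puts format("zoom 10 at latitude 60: %.1f", altitude_value(10, 60.0))
```

&lt;p&gt;Each extra zoom level halves the value, and moving away from the equator scales it by the cosine of the latitude, so at latitude 60 it is half of the equatorial value for the same zoom.&lt;/p&gt;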



&lt;h3&gt;
  
  
  Parse the paginated data
&lt;/h3&gt;

&lt;p&gt;We already have an &lt;a href="https://serpapi.com/google-maps-api"&gt;API to scrape Google Maps&lt;/a&gt; that extracts data from the inline JavaScript in the HTML. &lt;a href="https://aciddjus.medium.com"&gt;Milos&lt;/a&gt; refactored it to support extraction both from inline JS in the HTML and from pagination responses. In both cases, the parser pulls the data out of deeply nested arrays and objects.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;It's possible to extract data from complex single-page applications without browser automation. For us, it's more fun to understand how the scraped website works than to tune timeouts in &lt;code&gt;waitFor&lt;/code&gt; function calls. This approach also runs faster and is simpler to maintain. If this is something that excites you, we'd love for you to &lt;a href="https://serpapi.com/careers"&gt;join us&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>serpapi</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Playwright’s getByRole is 1.5x slower than CSS selectors</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Tue, 14 Nov 2023 19:05:15 +0000</pubDate>
      <link>https://dev.to/serpapi/playwrights-getbyrole-is-15x-slower-than-css-selectors-21m0</link>
      <guid>https://dev.to/serpapi/playwrights-getbyrole-is-15x-slower-than-css-selectors-21m0</guid>
      <description>&lt;p&gt;&lt;em&gt;Playwright’s getByRole uses querySelectorAll('*') and matches elements by their accessible name.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the realm of web automation and testing with Playwright, understanding the performance of various locator strategies is key. This article delves into the &lt;code&gt;getByRole&lt;/code&gt; locator's efficiency compared to CSS selectors, offering insights into the technical workings and practical implications of these choices.&lt;/p&gt;

&lt;p&gt;I wanted to use &lt;code&gt;Page#getByRole&lt;/code&gt;'s underlying CSS selectors in the &lt;a href="https://serpapi.com"&gt;SerpApi&lt;/a&gt; codebase. But the &lt;code&gt;getByRole&lt;/code&gt; locator turned out to be 1.5 times slower than the &lt;a href="https://serpapi.com/blog/web-scraping-with-css-selectors-using-python/#selectors_types"&gt;standard CSS selectors&lt;/a&gt;, which prompted an investigation into the root cause. This performance gap, which likely stems from Playwright’s use of &lt;a href="https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/server/injected/roleSelectorEngine.ts#L163-L173"&gt;&lt;code&gt;querySelectorAll('*')&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/server/injected/roleUtils.ts#L402-L711"&gt;matching elements by the accessible name&lt;/a&gt;, matters for developers who prioritize speed in their automation scripts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep Dive: How &lt;code&gt;getByRole&lt;/code&gt; Works
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;getByRole&lt;/code&gt; function in Playwright is more than just a method to locate web elements; it's a complex mechanism with multiple layers of interaction within the Playwright architecture. Let's demystify this process with an example. Consider this code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Enter address manually&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command sets off a cascade of actions within Playwright:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Page#getByRole &lt;a href="https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/client/locator.ts#L175-L177"&gt;creates a &lt;code&gt;Locator&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Locator#click delegates the call to &lt;a href="https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/client/locator.ts#L95-L97"&gt;&lt;code&gt;Frame#click&lt;/code&gt;, passing the &lt;code&gt;Locator#_selector&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Frame#click delegates to &lt;a href="https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/client/frame.ts#L284-L286"&gt;&lt;code&gt;Channel#click&lt;/code&gt;&lt;/a&gt;. &lt;code&gt;Frame&lt;/code&gt; inherits &lt;code&gt;_channel&lt;/code&gt; from &lt;code&gt;ChannelOwner&lt;/code&gt;. &lt;a href="https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/client/channelOwner.ts#L138-L160"&gt;&lt;code&gt;ChannelOwner#_channel&lt;/code&gt;&lt;/a&gt; is a JS Proxy object based on the &lt;code&gt;EventEmitter&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Client &lt;code&gt;Frame&lt;/code&gt; dispatches an event to the server &lt;a href="https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/server/frames.ts#L1144-L1149"&gt;&lt;code&gt;Frame#click&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;FrameSelector#resolveInjectedForSelector injects the &lt;a href="https://github.com/microsoft/playwright/blob/2afd857642c26980e56f269d05df72d4d69f57e7/packages/playwright-core/src/server/dom.ts#L83-L109"&gt;&lt;code&gt;FrameExecutionContext#injectedScript&lt;/code&gt;&lt;/a&gt; script to the page, controlled by Playwright. The &lt;code&gt;InjectedScript#constructor&lt;/code&gt; adds the &lt;a href="https://github.com/microsoft/playwright/blob/2afd857642c26980e56f269d05df72d4d69f57e7/packages/playwright-core/src/server/injected/injectedScript.ts#L125"&gt;engine for &lt;code&gt;getByRole&lt;/code&gt; locator&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;createRoleEngine calls &lt;a href="https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/server/injected/roleSelectorEngine.ts#L179-L195"&gt;&lt;code&gt;parseAttributeSelector&lt;/code&gt; and &lt;code&gt;queryRole&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;queryRole calls &lt;a href="https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/server/injected/roleSelectorEngine.ts#L129-L161"&gt;&lt;code&gt;querySelectorAll('*')&lt;/code&gt; and matches the element&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By comparison, a locator with a regular CSS selector just traverses the DOM and is about 1.5 times faster than &lt;code&gt;getByRole&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

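&lt;p&gt;Comparative tests show that a regular locator using CSS selectors outperforms &lt;code&gt;getByRole&lt;/code&gt; by about 1.5 times (in the sample run below, 497 ms versus 678 ms for 100 lookups, roughly 1.4x). Interestingly, the &lt;code&gt;$.then&lt;/code&gt; approach was the slowest, taking more than twice as long as the CSS locator in our tests.&lt;br&gt;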
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;getByRole&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Enter address manually&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeEnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;getByRole&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;locator&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.ektjNL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeEnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;locator&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$.then&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.ektjNL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeEnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$.then&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Output:&lt;/span&gt;
&lt;span class="c1"&gt;// getByRole: 677.5ms&lt;/span&gt;
&lt;span class="c1"&gt;// locator: 497.306ms&lt;/span&gt;
&lt;span class="c1"&gt;// $.then: 1.135s&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In web automation, understanding Playwright's locators — &lt;code&gt;getByRole&lt;/code&gt; and CSS selectors — is key. &lt;code&gt;getByRole&lt;/code&gt; excels in clarity, while CSS selectors win in speed. This matters in large test suites where every second counts. Choose wisely: &lt;code&gt;getByRole&lt;/code&gt; for readability, CSS selectors for efficiency.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>serpapi</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>When counting lines in Ruby randomly failed SerpApi deployments</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Thu, 07 Sep 2023 09:49:04 +0000</pubDate>
      <link>https://dev.to/serpapi/when-counting-lines-in-ruby-randomly-failed-serpapi-deployments-40aj</link>
      <guid>https://dev.to/serpapi/when-counting-lines-in-ruby-randomly-failed-serpapi-deployments-40aj</guid>
      <description>&lt;p&gt;Recently, we observed occasional failed deployments on just two of our servers. The affected servers were even closing my regular SSH connections. In this story, you'll learn how we reduced memory usage and made one piece of SerpApi code 1.5x faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;str.count($/)&lt;/code&gt; was 1.5x faster compared to &lt;code&gt;str.lines.count&lt;/code&gt; and didn't allocate additional memory.&lt;/p&gt;
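
&lt;p&gt;The allocation difference can be observed with a minimal sketch like the one below. The input string and the &lt;code&gt;allocated_objects&lt;/code&gt; helper are assumptions made for illustration and are not part of the SerpApi code; the helper simply compares &lt;code&gt;GC.stat(:total_allocated_objects)&lt;/code&gt; before and after a block runs.&lt;/p&gt;

```ruby
# Hypothetical input: 10_000 short lines, each terminated by "\n".
html = "some markup\n" * 10_000

# Hypothetical helper: number of Ruby objects allocated while the block runs.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# `lines` builds an Array holding one new String per line before counting.
by_lines = allocated_objects { html.lines.count }

# `count($/)` scans the string for the line separator without building that Array.
by_count = allocated_objects { html.count($/) }

puts "lines.count allocated #{by_lines} objects"
puts "count($/)   allocated #{by_count} objects"
```

&lt;p&gt;Note that the two calls agree only when the string ends with a line separator: &lt;code&gt;"a\nb".lines.count&lt;/code&gt; is 2, while &lt;code&gt;"a\nb".count($/)&lt;/code&gt; is 1.&lt;/p&gt;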

&lt;h2&gt;
  
  
  Investigation
&lt;/h2&gt;

&lt;p&gt;Only two servers experienced the random deployment failures.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#&amp;lt;Thread:0x000055a170560e70 digital_ocean.rb:80 run&amp;gt; terminated with exception (report_on_exception is true):
SerpApi/vendor/bundle/ruby/2.7.0/gems/net-ssh-5.2.0/lib/net/ssh/transport/server_version.rb:54:in `readpartial': Connection reset by peer (Errno::ECONNRESET)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These servers also randomly closed my SSH connections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ssh server-2
Last login: Fri Feb 24 14:23:29 2023 from &lt;span class="o"&gt;{&lt;/span&gt;remote_ip&lt;span class="o"&gt;}&lt;/span&gt;
client_loop: send disconnect: Broken pipe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DigitalOcean's server graphs showed that memory usage was close to 95% on these two servers. The load average occasionally peaked at 12, compared to 2 on the other servers.&lt;/p&gt;

&lt;p&gt;We checked the Puma server flamegraph. Most of the wall time in production was spent in &lt;code&gt;SearchSplitter#do_one_request&lt;/code&gt; and the Puma thread pool.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fserpapi.com%2Fblog%2Fcontent%2Fimages%2F2023%2F09%2Fimage-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fserpapi.com%2Fblog%2Fcontent%2Fimages%2F2023%2F09%2Fimage-1.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We used &lt;a href="https://github.com/rbspy/rbspy" rel="noopener noreferrer"&gt;&lt;code&gt;rbspy&lt;/code&gt;&lt;/a&gt; to generate the flamegraph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;rbspy record &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$PID_OF_PUMA_PROCESS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The flamegraph didn't reveal anything actionable, so we moved on to memory profiling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory profiling
&lt;/h3&gt;

&lt;p&gt;Here's the script we used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s1"&gt;'memory_profiler'&lt;/span&gt;

&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;email: &lt;/span&gt;&lt;span class="s2"&gt;"me"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;MemoryProfiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;threads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;threads&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;search&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;engine: &lt;/span&gt;&lt;span class="s2"&gt;"google"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;q: &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; * &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;user: &lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="no"&gt;SearchProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:join&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pretty_print&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It turned out that the top allocator was line counting in the response validator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:html&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:html&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;match?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&amp;lt;div id=\"main\"&amp;gt;&amp;lt;div id=\"cnt\"&amp;gt;&amp;lt;div class=\"dodTBe\" id=\"sfcnt\"&amp;gt;&amp;lt;\/div&amp;gt;&amp;lt;H1&amp;gt;.+&amp;lt;\/H1&amp;gt;.+&amp;lt;p&amp;gt;.+&amp;lt;\/p&amp;gt;$/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why were &lt;code&gt;String#lines&lt;/code&gt; and &lt;code&gt;Array#count&lt;/code&gt; the top allocators of the entire app?&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://ruby-doc.org/core-2.6/String.html#method-i-lines" rel="noopener noreferrer"&gt;&lt;code&gt;String#lines&lt;/code&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The HTML file size varies from 180 KB (regular organic results) to 1.3 MB (Google Shopping with &lt;code&gt;num=100&lt;/code&gt;). &lt;a href="https://ruby-doc.org/core-2.6/String.html#method-i-lines" rel="noopener noreferrer"&gt;&lt;code&gt;String#lines&lt;/code&gt;&lt;/a&gt; allocated an array of line strings several times per search because we send &lt;a href="https://serpapi.com/ludicrous-speed" rel="noopener noreferrer"&gt;multiple requests concurrently for each search&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks to &lt;a href="https://gist.github.com/guilhermesimoes/d69e547884e556c3dc95?permalink_comment_id=4502636" rel="noopener noreferrer"&gt;@guilhermesimoes's gist&lt;/a&gt;, we learned that &lt;code&gt;str.each_line.count&lt;/code&gt; should be faster. It still wasn't optimal, so we looked for a better solution.&lt;/p&gt;
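&lt;p&gt;As a minimal sketch of the three approaches discussed here (the sample string is made up for illustration and is not from our codebase), all of them agree when every line ends with a newline:&lt;/p&gt;

```ruby
# Illustrative sample only: three newline-terminated lines.
html = "first\nsecond\nthird\n"

# String#lines builds an array holding one new string per line.
via_lines = html.lines.count

# String#each_line without a block returns an enumerator, so no array
# is built, but each line is still yielded as a separate string.
via_each_line = html.each_line.count

# String#count treats its argument as a set of characters and counts
# matches without creating any per-line strings.
via_count = html.count($/)

puts [via_lines, via_each_line, via_count].inspect # prints [3, 3, 3]
```

&lt;p&gt;Note that &lt;code&gt;count($/)&lt;/code&gt; counts newline characters rather than lines, so for a string that doesn't end with a newline it returns one less than &lt;code&gt;lines.count&lt;/code&gt;. Because &lt;code&gt;String#count&lt;/code&gt; takes a character set, it only works as a line count while &lt;code&gt;$/&lt;/code&gt; is the default &lt;code&gt;"\n"&lt;/code&gt;.&lt;/p&gt;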

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;The solution was super simple — &lt;code&gt;str.count($/)&lt;/code&gt;. Here's the diff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="gd"&gt;- elsif response[:html].lines.count &amp;lt; 50 &amp;amp;&amp;amp; response[:html].match?(/&amp;lt;div id=\"main\"&amp;gt;&amp;lt;div id=\"cnt\"&amp;gt;&amp;lt;div class=\"dodTBe\" id=\"sfcnt\"&amp;gt;&amp;lt;\/div&amp;gt;&amp;lt;H1&amp;gt;.+&amp;lt;\/H1&amp;gt;.+&amp;lt;p&amp;gt;.+&amp;lt;\/p&amp;gt;$/)
&lt;/span&gt;&lt;span class="gi"&gt;+ elsif response[:html].count($/) &amp;lt; 50 &amp;amp;&amp;amp; response[:html].match?(/&amp;lt;div id=\"main\"&amp;gt;&amp;lt;div id=\"cnt\"&amp;gt;&amp;lt;div class=\"dodTBe\" id=\"sfcnt\"&amp;gt;&amp;lt;\/div&amp;gt;&amp;lt;H1&amp;gt;.+&amp;lt;\/H1&amp;gt;.+&amp;lt;p&amp;gt;.+&amp;lt;\/p&amp;gt;$/)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To make sure the problem was solved, we benchmarked multiple ways of counting string lines in Ruby. We reused and adapted the gist above, excluding &lt;code&gt;File#read&lt;/code&gt; from the benchmark and adding &lt;a href="https://ruby-doc.org/core-2.6/String.html#method-i-count" rel="noopener noreferrer"&gt;&lt;code&gt;String#count&lt;/code&gt;&lt;/a&gt; to it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ruby-doc.org/core-2.6/String.html#method-i-count" rel="noopener noreferrer"&gt;&lt;code&gt;String#count&lt;/code&gt;&lt;/a&gt; was roughly 1.5 to 1.7 times faster than the other options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Warming up --------------------------------------
                size    31.000  i/100ms
              length    75.000  i/100ms
               count    77.000  i/100ms
   each_line + count    81.000  i/100ms
           count($/)   196.000  i/100ms
Calculating -------------------------------------
                size      1.529k (±33.9%) i/s -      4.774k in   5.015361s
              length      1.434k (±38.8%) i/s -      5.025k in   5.139834s
               count      1.335k (±40.7%) i/s -      4.697k in   5.079353s
   each_line + count      1.411k (±39.5%) i/s -      5.022k in   5.110146s
           count($/)      2.231k (± 2.6%) i/s -     11.172k in   5.012323s

Comparison:
           count($/):     2230.5 i/s
                size:     1529.0 i/s - 1.46x  (± 0.00) slower
              length:     1434.2 i/s - 1.56x  (± 0.00) slower
   each_line + count:     1411.0 i/s - 1.58x  (± 0.00) slower
               count:     1334.9 i/s - 1.67x  (± 0.00) slower
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"benchmark/ips"&lt;/span&gt;

&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"spec/data/google/superhero-movies-mobile-63f582a0defa1345501c6b50-2023-02-23.html"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="no"&gt;Benchmark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ips&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"length"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"each_line + count"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;each_line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"count($/)"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="vg"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compare!&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Memory usage
&lt;/h3&gt;

&lt;p&gt;Unlike &lt;code&gt;lines&lt;/code&gt;/&lt;code&gt;each_line&lt;/code&gt;/etc., &lt;code&gt;count($/)&lt;/code&gt; doesn't allocate a new array.&lt;/p&gt;

&lt;p&gt;We used the awesome &lt;a href="https://github.com/Shopify/heap-profiler" rel="noopener noreferrer"&gt;&lt;code&gt;heap-profiler&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://github.com/zombocom/heapy" rel="noopener noreferrer"&gt;&lt;code&gt;heapy&lt;/code&gt;&lt;/a&gt; Ruby gems to profile memory usage.&lt;/p&gt;
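&lt;p&gt;For a rough sanity check without extra gems, the difference is also visible through &lt;code&gt;GC.stat&lt;/code&gt;. This is only a sketch: the &lt;code&gt;allocated_objects&lt;/code&gt; helper and the synthetic input are assumptions for illustration, not the profiling setup we used.&lt;/p&gt;

```ruby
# Hypothetical helper: number of objects allocated while the block runs,
# based on the interpreter-wide :total_allocated_objects counter.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Synthetic stand-in for an HTML response: 10,000 short lines.
html = "some markup\n" * 10_000

with_lines = allocated_objects { html.lines.count }
with_count = allocated_objects { html.count($/) }

puts "lines.count allocated #{with_lines} objects"
puts "count($/)   allocated #{with_count} objects"
```

&lt;p&gt;The exact numbers depend on the Ruby version, but the first should be at least one object per line, while the second should stay close to zero.&lt;/p&gt;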

&lt;h4&gt;
  
  
  &lt;code&gt;lines&lt;/code&gt;/&lt;code&gt;readlines&lt;/code&gt;/&lt;code&gt;each_line&lt;/code&gt;/etc.
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;html.lines.count&lt;/code&gt; allocated a new array and referenced the original string on each iteration of the benchmark.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ bundle exec heapy read ./tmp/lines_count/allocated.heap 49 --lines=1

Analyzing Heap (Generation: 49)
-------------------------------

allocated by memory (204879705) (in bytes)
==============================
  204872652  tmp/html_length_vs_count_vs_size_bench.rb:6

object count (5406)
==============================
  5301  tmp/html_length_vs_count_vs_size_bench.rb:6

High Ref Counts
==============================
  5300  tmp/html_length_vs_count_vs_size_bench.rb:6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also used the predefined &lt;a href="https://ruby-doc.org/docs/ruby-doc-bundle/Manual/man-1.4/variable.html#slash" rel="noopener noreferrer"&gt;&lt;code&gt;$/&lt;/code&gt; line separator&lt;/a&gt; to allocate even less memory.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;count($/)&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;Most of these memory allocations and all of the object reference counts were gone when we used &lt;code&gt;String#count($/)&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ bundle exec heapy read ./tmp/count_nl/allocated.heap 48 --lines=1

Analyzing Heap (Generation: 48)
-------------------------------

allocated by memory (2547465) (in bytes)
==============================
  2540804  tmp/html_length_vs_count_vs_size_bench.rb:4

object count (105)
==============================
  27  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172

High Ref Counts
==============================
  73  /usr/local/lib/ruby/gems/2.7.0/gems/activesupport/lib/active_support/deprecation/proxy_wrappers.rb:172
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Code
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"heap-profiler"&lt;/span&gt;

&lt;span class="no"&gt;HeapProfiler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tmp/lines_count'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;File&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"spec/data/google/superhero-movies-mobile-63f582a0defa1345501c6b50-2023-02-23.html"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

  &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;times&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;# 100.times { html.count($/) &amp;lt; 50 }&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Comparison process
&lt;/h4&gt;

&lt;p&gt;The heap diff comparison was a bit manual because &lt;a href="https://github.com/zombocom/heapy#diff-2-heap-dumps" rel="noopener noreferrer"&gt;&lt;code&gt;heapy diff&lt;/code&gt;&lt;/a&gt; didn't provide a &lt;em&gt;diff&lt;/em&gt;. We commented and uncommented &lt;code&gt;100.times { html.lines.count &amp;lt; 50 }&lt;/code&gt; and replaced paths in the command above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;## Profile heap&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bundle &lt;span class="nb"&gt;exec &lt;/span&gt;rails r tmp/html_length_vs_count_vs_size_bench.rb

&lt;span class="c"&gt;## Read summary of heap allocations&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bundle &lt;span class="nb"&gt;exec &lt;/span&gt;heapy &lt;span class="nb"&gt;read&lt;/span&gt; ./tmp/count_nl/allocated.heap

&lt;span class="c"&gt;## Read a specific generation (48) limiting number of lines to output (1)&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;bundle &lt;span class="nb"&gt;exec &lt;/span&gt;heapy &lt;span class="nb"&gt;read&lt;/span&gt; ./tmp/count_nl/allocated.heap 48 &lt;span class="nt"&gt;--lines&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Immediately after the fix was deployed, memory usage on the affected servers decreased and stabilized. Then memory usage fluctuated again, but deployments and SSH connections stabilized.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fserpapi.com%2Fblog%2Fcontent%2Fimages%2F2023%2F09%2Fimage-14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fserpapi.com%2Fblog%2Fcontent%2Fimages%2F2023%2F09%2Fimage-14.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Observations and final thoughts
&lt;/h2&gt;

&lt;p&gt;Our initial assumption was that the Puma graceful restart was the cause. During the &lt;code&gt;phased-restart&lt;/code&gt;, Puma spawned additional workers to switch to the new code version (which was expected). It wasn't clear why SSH connections were dropping on only two DigitalOcean droplets.&lt;/p&gt;

&lt;p&gt;Doubling the amount of RAM would also solve the problem, but it wouldn't be as efficient at this point. The fix was deployed half a year ago and the issue is definitely solved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Update Sep 20th, 2023
&lt;/h3&gt;

&lt;p&gt;Thanks to &lt;a href="https://www.reddit.com/r/ruby/comments/16d4ha6/comment/k1f5mc6/?utm_source=share&amp;amp;utm_medium=web2x&amp;amp;context=3" rel="noopener noreferrer"&gt;&lt;code&gt;@Freaky&lt;/code&gt; from Reddit&lt;/a&gt; for the wonderful feedback and cooperation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@Freaky&lt;/code&gt; tracked down a &lt;a href="https://bugs.ruby-lang.org/issues/19875" rel="noopener noreferrer"&gt;performance regression&lt;/a&gt; in &lt;code&gt;String#count&lt;/code&gt; in Ruby 3.1+&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@Freaky&lt;/code&gt; brought his old &lt;a href="https://github.com/Freaky/fast-bytecount/" rel="noopener noreferrer"&gt;SIMD bytecount C port&lt;/a&gt; out of mothballs&lt;/li&gt;
&lt;li&gt;MRI maintainer nobu is &lt;a href="https://github.com/nobu/ruby/tree/mm_bytecount" rel="noopener noreferrer"&gt;evaluating it&lt;/a&gt; for inclusion in MRI&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@Freaky&lt;/code&gt; noted that the bytecount Rust crate &lt;a href="https://github.com/llogiq/bytecount/issues/85" rel="noopener noreferrer"&gt;uses an SSE4.1 intrinsic in SSE2 code&lt;/a&gt; and submitted a fix&lt;/li&gt;
&lt;li&gt;A similar &lt;a href="https://github.com/BurntSushi/aho-corasick/pull/130" rel="noopener noreferrer"&gt;fix for the aho-corasick Rust crate&lt;/a&gt; was made in response&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;I love it when you get this ripple effect from something that initially seems pretty innocuous.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;If you enjoy working on such challenges, come work here with us: &lt;a href="https://serpapi.com/careers#open-roles" rel="noopener noreferrer"&gt;https://serpapi.com/careers#open-roles&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>performance</category>
      <category>ruby</category>
      <category>serpapi</category>
    </item>
    <item>
      <title>Reverse Engineering Poocoin API (Part 1)</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Thu, 24 Aug 2023 08:25:40 +0000</pubDate>
      <link>https://dev.to/ilyazub/reverse-engineering-poocoin-api-part-1-835</link>
      <guid>https://dev.to/ilyazub/reverse-engineering-poocoin-api-part-1-835</guid>
      <description>&lt;ul&gt;
&lt;li&gt;What will be scraped&lt;/li&gt;
&lt;li&gt;Explanation&lt;/li&gt;
&lt;li&gt;Links&lt;/li&gt;
&lt;li&gt;Outro&lt;/li&gt;
&lt;/ul&gt;




&lt;h2 id="what_will_be_scraped"&gt;What will be scraped&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155510155-acddea16-a29c-4588-8e8e-68435bb351da.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155510155-acddea16-a29c-4588-8e8e-68435bb351da.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2 id="explanation"&gt;Explanation&lt;/h2&gt;

&lt;p&gt;We want to reverse engineer the URL's &lt;code&gt;data&lt;/code&gt; parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# full url 
https://poocoin.app/api2/candles-bsc?data=[YmLNHK6TU[Kblm4UXqKeF2FTYSSW2KsZ32XfnO6TU[KblJ1UVeONWKGSYebblF{U2S[fWqIVYeObmVzXYqCN1:VTUCQSGmOWHiWUWSCOl6VWU[OSFG2UVSCe2eqTYOKcYixZmetNFmrc4qOblG{TX25e2GYVnukcW[7Z4mKOlmrRkSTS2FyXXuHb1:IXUS[bl1zUVeSN16uVYiOb2qsVWSKfl2GXUSSb1[IUlSLcWqVRYeOblqFVnmKd1mucIWlS2[6[H2Hd1mrc3mOWG[1TXm4bWmuSoqbWYi4TXqwbV2J[{GQSWl1Uoq[OF6V[HiOSFqGUnqkNl2sWYeOWFG5XX2KNWG7VUKSWHirUWWXSV6FVlW[flVzTX5xQR%3E%3E

# data parameter
[YmLNHK6TU[Kblm4UXqKeF2FTYSSW2KsZ32XfnO6TU[KblJ1UVeONWKGSYebblF{U2S[fWqIVYeObmVzXYqCN1:VTUCQSGmOWHiWUWSCOl6VWU[OSFG2UVSCe2eqTYOKcYixZmetNFmrc4qOblG{TX25e2GYVnukcW[7Z4mKOlmrRkSTS2FyXXuHb1:IXUS[bl1zUVeSN16uVYiOb2qsVWSKfl2GXUSSb1[IUlSLcWqVRYeOblqFVnmKd1mucIWlS2[6[H2Hd1mrc3mOWG[1TXm4bWmuSoqbWYi4TXqwbV2J[{GQSWl1Uoq[OF6V[HiOSFqGUnqkNl2sWYeOWFG5XX2KNWG7VUKSWHirUWWXSV6FVlW[flVzTX5xQR%3E%3E
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;data&lt;/code&gt; parameter contains the token id, time interval, date, and a limit argument, which I believe sets the number of candles to display.&lt;/li&gt;
&lt;/ul&gt;
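
&lt;p&gt;To make the goal concrete: the steps below end up showing that the parameter is base64 text whose character codes are shifted up by one. Here is a Ruby sketch of reversing that, under assumptions: the helper names are hypothetical, and the second base64 layer is inferred from the first decoded characters (&lt;code&gt;eyJ0by&lt;/code&gt; looks like the start of base64-encoded JSON), so treat it as a guess rather than Poocoin's documented format.&lt;/p&gt;

```ruby
require "uri"

# Hypothetical helper: move every character code by `delta`.
def shift_chars(text, delta)
  text.each_char.map { |char| (char.ord + delta).chr }.join
end

# Hypothetical helper: undo URL escaping, then the +1 shift, then one base64 layer.
def unshift_and_decode(data)
  unescaped = URI.decode_www_form_component(data) # "%3E%3E" is the escaped form of the shifted "==" padding
  shift_chars(unescaped, -1).unpack1("m")         # "m" is the base64 directive
end

# First eight characters of the data parameter shown above.
puts unshift_and_decode("[YmLNHK6") # prints eyJ0by

# Round trip on a made-up payload (not a real Poocoin request),
# assuming the site applies base64 twice before shifting.
payload = '{"example":"value"}'
encoded = shift_chars([[payload].pack("m0")].pack("m0"), 1)
puts unshift_and_decode(encoded).unpack1("m") # prints {"example":"value"}
```

&lt;p&gt;On the full parameter, a second &lt;code&gt;unpack1("m")&lt;/code&gt; on the result would be expected to yield the JSON with the token id, interval, date, and limit, though that is not verified here.&lt;/p&gt;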

&lt;p&gt;Open dev tools, find the relevant XHR request, and go to the JS source:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155734199-a9cd96a4-4399-445c-b289-271b7a78534a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155734199-a9cd96a4-4399-445c-b289-271b7a78534a.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Format JavaScript code:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155734872-d013c96a-dc74-48a5-b3b3-5fca9ae720bf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155734872-d013c96a-dc74-48a5-b3b3-5fca9ae720bf.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Go up the stack trace to see how the &lt;code&gt;data&lt;/code&gt; URL argument is built:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F154670571-ac27d02c-81a2-4aa8-9107-808fa3b88a4e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F154670571-ac27d02c-81a2-4aa8-9107-808fa3b88a4e.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Evaluate the &lt;code&gt;F&lt;/code&gt;, &lt;code&gt;W&lt;/code&gt;, &lt;code&gt;B&lt;/code&gt;, and &lt;code&gt;I&lt;/code&gt; variables to understand what's happening:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F154671155-250845f1-db4b-4c82-bc5a-ecf3043732c8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F154671155-250845f1-db4b-4c82-bc5a-ecf3043732c8.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;B = w(476)&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155934403-325148f2-4373-4fa3-84cb-104256bb5b67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155934403-325148f2-4373-4fa3-84cb-104256bb5b67.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;w()&lt;/code&gt; returns a function name:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155935122-5d8f907a-6142-4873-b7e9-2bb9a7794d65.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F78694043%2F155935122-5d8f907a-6142-4873-b7e9-2bb9a7794d65.png" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;map&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;70209pSjRdP&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;13mXPpTk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;32970804XTFOSy&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;33EDoYDa&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;16914prTfjD&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;charCodeAt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;680Pigxyq&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;5194335PIAMSW&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stringify&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;430dfxvAD&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span 
class="s2"&gt;21ASPctV&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;l=1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;4868dULKHq&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;955100otFPGh&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;substring&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;QWRkcmVzcyI6IjB4MGM1REEwZjA3OTYyZGQwMjU2YzA3OTI0ODY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;4483450KFkYTE&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2303RvgaFj&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lpr&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;host&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;indexOf&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;w&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;467&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;I&lt;/code&gt; contains base64-encoded characters, each shifted up by one character code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;obfuscatedFunctionName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;w&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;476&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// B = w(476),&lt;/span&gt;
&lt;span class="nx"&gt;obfuscatedFunctionName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;obfuscatedFunctionName&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;QWRkcmVzcyI6IjB4MGM1REEwZjA3OTYyZGQwMjU2YzA3OTI0ODY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;// =&amp;gt; true&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;base64CharactersOfParamsBase64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;btoa&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;prefixOfEncodedParams&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;obfuscatedFunctionName&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;suffixOfEncodedParams&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// I = (I = (I = (I = (I = btoa("" + W + B + S)).split(""))&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;base64CharactersPlusOneOfParamsBase64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;base64CharactersOfParamsBase64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;charCodePlusOne&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;charCodeAt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;// e[w(488)](0) + 1; w(488) === "charCodeAt"&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;charCodePlusOne&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// w(482) === "map"&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromCharCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which is effectively the same as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;base64CharactersPlusOneOfParamsBase64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;base64CharactersOfParamsBase64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromCharCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;charCodeAt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example value and test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;base64CharactersPlusOneOfParamsBase64&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[YmLNHK6TU[Kblm4UXqKeF2FTYSSW2KsZ32XfnO6TU[KblJ1UVeONWKGSYebblF{U2S[fWqIVYeObmVzXYqCN1:VTUCQSGmOWHiWUWSGOl6VRU[OSFG2UVSCe2eqTYOKcYixZmetNFmrc4qOblG{TX25e2GYVnukcW[7Z4mKOlmrRkSOWGKFUmSsNF2rTYmOWFFzUXqofmqIVUKTSFVyUmeKOWqFRYe[WHtxUUKKOV6FSUGOflFzUnmKd1mucIWlS2[6[H2Hd1mrc3mOWG[1TXm4bWmuSoqbWYi4TXqwbV2J[4mPSHyrXlSCNV6FXUWPNmlxUWeSN11xXYiSWHe5Xn2Ge2KrWYmPfnyuXUKPcF1zUlePflKrTX5xQR&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;// =&amp;gt; true&lt;/span&gt;
&lt;span class="nx"&gt;base64CharactersPlusOneOfParamsBase64&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;I&lt;/span&gt; &lt;span class="c1"&gt;// =&amp;gt; true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now compare it with the string in the request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://poocoin.app/api2/candles-bsc?data=[YmLNHK6TU[Kblm4UXqKeF2FTYSSW2KsZ32XfnO6TU[KblJ1UVeONWKGSYebblF{U2S[fWqIVYeObmVzXYqCN1:VTUCQSGmOWHiWUWSGOl6VRU[OSFG2UVSCe2eqTYOKcYixZmetNFmrc4qOblG{TX25e2GYVnukcW[7Z4mKOlmrRkSOWGKFUmSsNF2rTYmOWFFzUXqofmqIVUKTSFVyUmeKOWqFRYe[WHtxUUKKOV6FSUGOflFzUnmKd1mucIWlS2[6[H2Hd1mrc3mOWG[1TXm4bWmuSoqbWYi4TXqwbV2J[4mPSHyrXlSCNV6FXUWPNmlxUWeSN11xXYiSWHe5Xn2Ge2KrWYmPfnyuXUKPcF1zUlePflKrTX5xQR%3E%3E
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
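
&lt;p&gt;Because the transformation is just base64 followed by a fixed shift, it can be undone by reversing the two steps. Below is a minimal sketch, not code from the site: &lt;code&gt;shiftChars&lt;/code&gt;, &lt;code&gt;encodeParams&lt;/code&gt;, &lt;code&gt;decodeParams&lt;/code&gt;, and the sample payload are hypothetical names and values, and it assumes a runtime where &lt;code&gt;btoa&lt;/code&gt; and &lt;code&gt;atob&lt;/code&gt; are globals (browsers, recent Node.js). A value copied from the request would need URL-decoding first, since the trailing &lt;code&gt;%3E%3E&lt;/code&gt; is the percent-encoded form of the shifted &lt;code&gt;==&lt;/code&gt; padding.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical helper: shift every character code by a fixed offset.
function shiftChars(text, offset) {
  return text
    .split("")
    .map(function (ch) {
      return String.fromCharCode(ch.charCodeAt(0) + offset)
    })
    .join("")
}

// Mirrors the snippet above: base64-encode first, then shift by +1.
function encodeParams(plain) {
  return shiftChars(btoa(plain), 1)
}

// Undo the steps in reverse order: shift by -1, then base64-decode.
function decodeParams(encoded) {
  return atob(shiftChars(encoded, -1))
}

// Made-up ASCII payload, for illustration only.
const sample = '{"hypothetical":"payload"}'
const roundTrip = decodeParams(encodeParams(sample))
console.log(roundTrip === sample) // true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The round trip only shows that the two steps are reversible; what the decoded payload of a real request contains is not something this sketch assumes.&lt;/p&gt;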



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F154679736-8a4bb4b1-3352-4e64-8f11-b35fff49b72d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F154679736-8a4bb4b1-3352-4e64-8f11-b35fff49b72d.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;In the next part, we'll write the actual code that will construct the URL and retrieve the data.&lt;/p&gt;

&lt;h2 id="outro"&gt;Outro&lt;/h2&gt;

&lt;p&gt;If you have anything to share, any questions, suggestions, or something that isn't working correctly, feel free to reach out via Twitter at &lt;a href="https://twitter.com/ilyazub_" rel="noopener noreferrer"&gt;@ilyazub_&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. Thanks to &lt;a class="mentioned-user" href="https://dev.to/dmitryzub"&gt;@dmitryzub&lt;/a&gt; for help on this project and for co-authoring this article.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>reverseengineering</category>
      <category>webscraping</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Mastering SerpApi: An In-depth Features, Best Practices, and Competitive Edge</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Thu, 24 Aug 2023 07:58:30 +0000</pubDate>
      <link>https://dev.to/serpapi/mastering-serpapi-an-in-depth-features-best-practices-and-competitive-edge-h59</link>
      <guid>https://dev.to/serpapi/mastering-serpapi-an-in-depth-features-best-practices-and-competitive-edge-h59</guid>
      <description>&lt;p&gt;In this episode, Ryan and Illia share answers to frequently asked questions about SerpApi usage, features, and best practices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=x_SQ-_qRpTE&amp;amp;list=PLGt0Yb3JKV12RPWTuDGXIS6N5Ex1aYJBx&amp;amp;index=1&amp;amp;pp=gAQBiAQB"&gt;YouTube&lt;/a&gt; | &lt;a href="https://podcasters.spotify.com/pod/show/ilyazub/episodes/Mastering-SerpApi-In-depth-Features--Best-Practices--and-Competitive-Edge--SerpApiPodcast-Ep-11-e27uqd1"&gt;Spotify&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Episode Breakdown
&lt;/h2&gt;

&lt;p&gt;00:00:00 - Video Intro&lt;br&gt;
00:01:30 - An introduction to SerpApi, including its key features and benefits&lt;br&gt;
00:02:55 - Comparison of SerpApi to competitors discussing why SerpApi is a superior choice&lt;br&gt;
00:04:00 - Why use SerpApi compared to an official Google API&lt;br&gt;
00:04:55 - Clarification about data warehouse&lt;br&gt;
00:05:33 - SLA and success rate&lt;br&gt;
00:07:03 - Addressing the legality of web scraping, terms of service, and the US Legal Shield&lt;br&gt;
00:08:30 - the US Legal Shield offering&lt;br&gt;
00:09:05 - Understanding searches and credits, including how they are counted, and discussing duplicates&lt;br&gt;
00:11:00 - Search Archive API, no_cache parameter&lt;br&gt;
00:12:00 - async parameter and multi-threading&lt;br&gt;
00:14:45 - Ludicrous Speed plans for fast latency-sensitive use-cases &lt;br&gt;
00:15:45 - API Status page showcase&lt;br&gt;
00:16:40 - Hourly throughput limit&lt;br&gt;
00:18:15 - Traffic spike management&lt;br&gt;
00:19:30 - Incident response and downtime handling&lt;br&gt;
00:21:40 - unit, integration, and e2e tests&lt;br&gt;
00:22:30 - Experimental AI-based data extraction&lt;br&gt;
00:23:40 - Sponsorship and discounts &lt;br&gt;
00:25:30 - Langchain, BabyAGI and SerpApi cooperation started during Replit's hackathon about LLMs&lt;br&gt;
00:28:00 - Google SERPs and Google Ads inconsistencies&lt;br&gt;
00:30:00 - Improving search engine ranking&lt;br&gt;
00:32:00 - SerpApi's followers on Twitter SEO&lt;br&gt;
00:33:48 - Using the dashboard effectively, including how to find the API key and how to use the account API for credit tracking&lt;br&gt;
00:42:00 - Wrapping up and explaining what will be discussed in the next webinar&lt;/p&gt;




&lt;p&gt;At its core, the &lt;a href="https://www.youtube.com/playlist?list=PLGt0Yb3JKV12RPWTuDGXIS6N5Ex1aYJBx"&gt;SerpApi Podcast&lt;/a&gt; revolves around search engine results scraping, covering areas like parsing, evasion of blocking, web automation, proxies, the legality of scraping, performance enhancement, data extraction, and validation.&lt;/p&gt;

&lt;p&gt;Practical applications for SERP data scraping span a wide array, including programmatic search engine optimization (SEO) and local SEO, machine learning (ML), artificial intelligence (AI), large language models (LLMs), news monitoring, open-source intelligence (OSINT), voice assistants, and e-commerce competitor analysis.&lt;/p&gt;

</description>
      <category>serpapipodcast</category>
      <category>webscraping</category>
      <category>softwareengineering</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Software Engineer's Life Balance with Dmytro Vasyliev | #SerpApiPodcast, Episode 10</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Sat, 24 Jun 2023 19:51:00 +0000</pubDate>
      <link>https://dev.to/serpapi/software-engineers-life-balance-with-dmytro-vasyliev-serpapipodcast-episode-10-2a13</link>
      <guid>https://dev.to/serpapi/software-engineers-life-balance-with-dmytro-vasyliev-serpapipodcast-episode-10-2a13</guid>
      <description>&lt;p&gt;In the 10th episode of &lt;a href="https://youtube.com/playlist?list=PLGt0Yb3JKV12RPWTuDGXIS6N5Ex1aYJBx"&gt;SerpApi Podcast&lt;/a&gt;, we discuss work-life balance with Dmytro Vasyliev: "How can you find a balance between your professional activity and personal life?", "How can you maintain your health and avoid burnout?", etc.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/YITDzpvFiU4"&gt;Watch this episode&lt;/a&gt; · &lt;a href="https://podcasters.spotify.com/pod/show/ilyazub/episodes/Software-Engineers-Life-Balance-with-Dmytro-Vasyliev--SerpApiPodcast--Episode-10-e25an6r"&gt;Listen this episode&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Chapters
&lt;/h2&gt;

&lt;p&gt;[00:00:00] Introduction&lt;br&gt;
[00:02:03] What is life balance for Dmytro?&lt;br&gt;
[00:03:00] How Dmytro organizes his day?&lt;br&gt;
[00:05:10] Were you always organized in life, or did you have to do something in order to feel better?&lt;br&gt;
[00:06:35] About life spheres&lt;br&gt;
[00:12:10] Burnout and how to handle it&lt;br&gt;
[00:14:51] Work-life balance: what helps to be more focused?&lt;br&gt;
[00:29:55] Work from home or from office?&lt;br&gt;
[00:33:08] Work with a coach: when, why and what the result is&lt;br&gt;
[00:38:35] The first time with a psychologist&lt;br&gt;
[00:42:20] Training with a personal trainer - how does it work?&lt;br&gt;
[00:46:26] A few exercises to keep an eye on life balance&lt;br&gt;
[00:50:32] Organization of a to-do list&lt;br&gt;
[00:58:08] Epilogue&lt;/p&gt;




&lt;p&gt;SerpApi Podcast is about SERP (search engine results pages) and e-commerce data scraping: parsing, circumvention of blocking, web automation, proxies, legal part of scraping, performance, data extraction and validation.&lt;/p&gt;

&lt;p&gt;Use cases for SERP data scraping: programmatic search engine optimization (SEO) and local SEO, machine learning (ML), artificial intelligence (AI), large language models (LLMs), news monitoring, open-source intelligence (OSINT), voice assistants, e-commerce competitor research.&lt;/p&gt;

&lt;p&gt;The podcast is brought to you by SerpApi team: &lt;a href="https://serpapi.com"&gt;https://serpapi.com&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>From Software Engineer to Manager with Oleksii Gyturo | #SerpApiPodcast, Episode 9</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Fri, 23 Jun 2023 16:46:21 +0000</pubDate>
      <link>https://dev.to/serpapi/from-software-engineer-to-manager-with-oleksii-gyturo-serpapipodcast-episode-9-1921</link>
      <guid>https://dev.to/serpapi/from-software-engineer-to-manager-with-oleksii-gyturo-serpapipodcast-episode-9-1921</guid>
      <description>&lt;p&gt;In this episode, we discuss career growth in management. Oleksii, as an aspiring manager, faced a multitude of questions and challenges. Career growth depends not only on how well a manager copes with the tasks, but also on how open they are to new knowledge and experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/U7CNEdoTWRc"&gt;Watch this episode&lt;/a&gt; · &lt;a href="https://podcasters.spotify.com/pod/show/ilyazub/episodes/From-Software-Engineer-to-Manager--SerpApiPodcast--Episode-9-e23ogmv"&gt;Listen this episode&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Show notes
&lt;/h2&gt;

&lt;p&gt;[00:00:00] Introduction&lt;br&gt;
[00:02:11] Did you grow from another position or have been hired as the head of mobile?&lt;br&gt;
[00:03:29] You started as an individual contributor, but then you started working on more and more. Why?&lt;br&gt;
[00:06:55] What was a result of involving you in a leadership strategy?&lt;br&gt;
[00:10:32] Experience exchange (from respondent to interviewer)&lt;br&gt;
[00:20:14] How do you deal with multiple sources of information about the business?&lt;br&gt;
[00:36:04] What is Illia’s mission at SerpApi?&lt;br&gt;
[00:56:02] Long term planning or being spontaneous? How did you manage that?&lt;br&gt;
[01:06:38] Who is an engineering manager?&lt;br&gt;
[01:10:09] Why have you decided to consult businesses?&lt;br&gt;
[01:12:29] Do you treat consulting as a business?&lt;br&gt;
[01:14:34] Epilogue (few words from Oleksii)&lt;/p&gt;




&lt;p&gt;SerpApi Podcast is about SERP (search engine results pages) and e-commerce data scraping: parsing, circumvention of blocking, web automation, proxies, legal part of scraping, performance, data extraction and validation.&lt;/p&gt;

&lt;p&gt;Use cases for SERP data scraping: programmatic search engine optimization (SEO) and local SEO, machine learning (ML), artificial intelligence (AI), large language models (LLMs), news monitoring, open-source intelligence (OSINT), voice assistants, e-commerce competitor research.&lt;/p&gt;

&lt;p&gt;The podcast is brought to you by SerpApi team: &lt;a href="https://serpapi.com"&gt;https://serpapi.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serpapipodcast</category>
    </item>
    <item>
      <title>Insights from an Engineering Director (#SerpApiPodcast, Ep. 8)</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Wed, 17 May 2023 14:14:59 +0000</pubDate>
      <link>https://dev.to/serpapi/insights-from-an-engineering-director-serpapipodcast-ep-8-4p9l</link>
      <guid>https://dev.to/serpapi/insights-from-an-engineering-director-serpapipodcast-ep-8-4p9l</guid>
      <description>&lt;p&gt;The success of the company primarily depends on the effective actions and efforts of talented people. Today we would like to introduce you to Milos, one of the Engineering Directors at SerpApi.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/U7CNEdoTWRc"&gt;Watch this episode&lt;/a&gt; · &lt;a href="https://podcasters.spotify.com/pod/show/ilyazub/episodes/Insights-from-an-Engineering-Director-with-Milos--SerpApiPodcast--Episode-8-e23og6j"&gt;Listen this episode&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Show Notes
&lt;/h2&gt;

&lt;p&gt;[00:00] - Intro&lt;br&gt;
[01:26] - What are Milos' responsibilities at SerpApi?&lt;br&gt;
[02:19] - How did you become a manager?&lt;br&gt;
[03:25] - How do you feel about switching from individual contributor to manager?&lt;br&gt;
[05:16] - Where did you gain that knowledge, internally or externally?&lt;br&gt;
[07:41] - Do you prioritize individual tasks?&lt;br&gt;
[09:46] - How do you feel about contributing via others instead of contributing personally?&lt;br&gt;
[11:18] - What is the upper limit for you (number of subordinates)?&lt;br&gt;
[13:22] - What's a goal of a weekly meeting?&lt;br&gt;
[15:13] - How does your team interact with you?&lt;br&gt;
[16:36] - How do you understand and know what is valuable for the business?&lt;br&gt;
[18:19] - Hobbies&lt;br&gt;
[20:29] - How do you organize your time?&lt;br&gt;
[22:23] - Do you work without interruption?&lt;br&gt;
[23:12] - What do you put in the calendar besides meetings?&lt;br&gt;
[24:26] - Milos' actions when he can’t follow the plan?&lt;br&gt;
[25:00] - What do you do when you finish your work early?&lt;br&gt;
[26:12] - Long term plans or small steps?&lt;br&gt;
[30:40] - How have you changed since you started working in SerpApi?&lt;br&gt;
[32:41] - What are your plans for the future?&lt;br&gt;
[34:20] - What is the feedback about you from the people you manage?&lt;br&gt;
[36:36] - How do you manage disagreements with the people you manage?&lt;br&gt;
[38:15] - How do you decide when to say "no"?&lt;br&gt;
[40:12] - How do you manage to deal with a large amount of work?&lt;br&gt;
[45:48] - What would you like to be changed in your work?&lt;br&gt;
[50:59] - What are your suggestions for yourself in the past?&lt;br&gt;
[53:26] - How do you deal with anxiety?&lt;br&gt;
[56:16] - Do you have a role model?&lt;br&gt;
[57:13] - Epilogue (few words from Milos)&lt;/p&gt;

&lt;h2&gt;
  
  
  Links and resources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/playlist?list=PLGt0Yb3JKV12RPWTuDGXIS6N5Ex1aYJBx"&gt;#SerpApiPodcast&lt;/a&gt; is about SERP (search engine results pages) and ecommerce data scraping: parsing, circumvention of blocking, web automation, proxies, legal part of scraping, performance, data extraction and validation.&lt;/p&gt;

&lt;p&gt;Use cases for SERP data scraping: SEO, local SEO, ML models, news monitoring, OSINT, voice assistant, ecommerce competitor research.&lt;/p&gt;

&lt;p&gt;The podcast is brought to you by SerpApi team: &lt;a href="https://serpapi.com"&gt;https://serpapi.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>podcast</category>
      <category>serpapi</category>
      <category>webscraping</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>How to reverse engineer a JSON API on a single page application</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Mon, 26 Dec 2022 13:13:19 +0000</pubDate>
      <link>https://dev.to/serpapi/how-to-reverse-engineer-a-json-api-on-a-single-page-application-4faj</link>
      <guid>https://dev.to/serpapi/how-to-reverse-engineer-a-json-api-on-a-single-page-application-4faj</guid>
      <description>&lt;p&gt;Websites like Bing Image Search and Walmart render pages with JavaScript and deliver page content via JSON APIs. While it's possible to &lt;a href="https://serpapi.com/blog/puppeteer-antipatterns/#using-puppeteer-when-other-tools-are-more-appropriate" rel="noopener noreferrer"&gt;scrape dynamic web pages using browser automation&lt;/a&gt;, I prefer fetching data from the API endpoints directly. It &lt;em&gt;usually&lt;/em&gt; (not always) works faster and more reliably.&lt;/p&gt;

&lt;p&gt;I was debugging Bing Image Search to help implement our new &lt;a href="https://github.com/serpapi/public-roadmap/issues/399" rel="noopener noreferrer"&gt;Bing Reverse Image Search API&lt;/a&gt;. Initially, I used &lt;a href="https://mitmproxy.org/" rel="noopener noreferrer"&gt;&lt;code&gt;mitmproxy&lt;/code&gt;&lt;/a&gt; because &lt;a href="https://developer.chrome.com/docs/devtools/search/#search-loaded-resources" rel="noopener noreferrer"&gt;&lt;code&gt;Ctrl+Shift+F&lt;/code&gt; in the browser dev tools&lt;/a&gt; didn't find the request. Then I figured out how to filter network requests in the browser dev tools, examined the response, and made a draft data adapter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Algorithms to reverse engineer a JSON API on the SPA
&lt;/h2&gt;

&lt;p&gt;I've used two ways to reverse engineer the JSON API behind Bing Image Search: &lt;code&gt;mitmproxy&lt;/code&gt; and browser developer tools. I explain the devtools process first because it's used more often.&lt;/p&gt;

&lt;h3&gt;
  
  
  Browser devtools
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;Ctrl+F&lt;/code&gt; in the Network tab of browser dev tools.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209423751-14a7a2a1-c8c9-43af-951c-4304648b02eb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209423751-14a7a2a1-c8c9-43af-951c-4304648b02eb.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Go to the &lt;code&gt;Preview&lt;/code&gt; tab of the JSON response.&lt;/li&gt;
&lt;li&gt;Expand JS object recursively (my Brave Browser doesn't search in the collapsed JSON 😕)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209423926-a0f2fd3d-8f58-4106-9432-2418e1233125.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209423926-a0f2fd3d-8f58-4106-9432-2418e1233125.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;code&gt;Ctrl+F&lt;/code&gt; the target string&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209424004-7bbbf0b5-6aeb-43c4-9bee-a7bb05cb24b7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209424004-7bbbf0b5-6aeb-43c4-9bee-a7bb05cb24b7.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="5"&gt;
&lt;li&gt;Copy property path&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209423862-bf8a78b5-45bb-4a09-8e1b-b63b176e99f2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209423862-bf8a78b5-45bb-4a09-8e1b-b63b176e99f2.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="6"&gt;
&lt;li&gt;Navigate up and down in JS object (with arrow keys) to learn its structure and create an adapter.&lt;/li&gt;
&lt;li&gt;Copy as cURL and transform response with &lt;code&gt;jq&lt;/code&gt; to check my assumption.&lt;/li&gt;
&lt;/ol&gt;
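
&lt;p&gt;The search-and-copy-path steps above can also be approximated in code once the response is saved. This is a minimal sketch, assuming the response is already parsed into a JavaScript object; &lt;code&gt;findPath&lt;/code&gt; is a hypothetical helper and the sample object is made up, not Bing's actual response shape.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical helper: depth-first search for a string value that contains
// the target text, returning the list of keys that leads to it (or null).
function findPath(node, target, path) {
  if (typeof node === "string") {
    return node.includes(target) ? path : null
  }
  if (node === null || typeof node !== "object") {
    return null
  }
  // Object.keys works for arrays too and yields their indexes as strings.
  for (const key of Object.keys(node)) {
    const found = findPath(node[key], target, path.concat(key))
    if (found !== null) {
      return found
    }
  }
  return null
}

// Made-up response shape, for illustration only.
const response = {
  results: [
    { title: "Other" },
    { title: "Freshsales CRM", hostPageUrl: "https://example.com" }
  ]
}

const path = findPath(response, "Freshsales", [])
console.log(path.join(".")) // results.1.title
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The returned path can then be rewritten by hand as a &lt;code&gt;jq&lt;/code&gt; filter to check the assumption against the cURL output.&lt;/p&gt;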

&lt;h3&gt;
  
  
  &lt;code&gt;mitmproxy&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://developer.chrome.com/docs/devtools/search/#search-loaded-resources" rel="noopener noreferrer"&gt;&lt;code&gt;Ctrl+Shift+F&lt;/code&gt; in the browser dev tools&lt;/a&gt; no longer searches across all responses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209424769-a5b19cbe-832d-49a5-8b5f-abe1dd2fbc66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209424769-a5b19cbe-832d-49a5-8b5f-abe1dd2fbc66.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've proxied the browser network connections via &lt;a href="https://mitmproxy.org/" rel="noopener noreferrer"&gt;&lt;code&gt;mitmproxy&lt;/code&gt;&lt;/a&gt;. Then I &lt;a href="https://docs.mitmproxy.org/stable/concepts-filters/" rel="noopener noreferrer"&gt;filtered&lt;/a&gt; response bodies with &lt;code&gt;~bs "TEXT_FROM_THE_HTML_ELEMENT_I_AM_LOOKING_FOR"&lt;/code&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Start &lt;code&gt;mitmproxy&lt;/code&gt; with view filter
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;mitmproxy &lt;span class="nt"&gt;--view-filter&lt;/span&gt; &lt;span class="s1"&gt;'~bs "Freshsales"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Start chromium-based browser with the target URL and the following flags and parameters&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Proxy requests via &lt;code&gt;mitmproxy&lt;/code&gt;: &lt;code&gt;--proxy-server='http://127.0.0.1:8080'&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Use incognito mode (1) with a temporary user profile (2), ignoring insecure connections (3) and certificate errors (4): &lt;code&gt;--temp-profile -incognito --user-data-dir="`mktemp -d`" --no-first-run --ignore-certificate-errors --allow-insecure-localhost&lt;/code&gt;. (&lt;em&gt;I ignore certificate errors in a temporary browser profile to avoid installing &lt;code&gt;mitmproxy&lt;/code&gt;'s certificates system-wide.&lt;/em&gt;)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;brave-browser &lt;span class="s1"&gt;'https://www.bing.com/images/search?view=detailV2&amp;amp;insightstoken=bcid_RLKVsIV2BwkFXg*ccid_spWwhXYH&amp;amp;form=SBIHMP&amp;amp;iss=SBIUPLOADGET&amp;amp;sbisrc=ImgPicker&amp;amp;idpbck=1&amp;amp;sbifsz=927+x+524+%c2%b7+25.15+kB+%c2%b7+png&amp;amp;sbifnm=serpapi-serpbear.png&amp;amp;thw=927&amp;amp;thh=524&amp;amp;ptime=223&amp;amp;dlen=34344&amp;amp;expw=798&amp;amp;exph=451&amp;amp;selectedindex=0&amp;amp;id=-1051855017&amp;amp;ccid=spWwhXYH&amp;amp;vt=2&amp;amp;sim=11'&lt;/span&gt; &lt;span class="nt"&gt;--proxy-server&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'http://127.0.0.1:8080'&lt;/span&gt;  &lt;span class="nt"&gt;--temp-profile&lt;/span&gt; &lt;span class="nt"&gt;-incognito&lt;/span&gt; &lt;span class="nt"&gt;--user-data-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nb"&gt;mktemp&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--no-first-run&lt;/span&gt; &lt;span class="nt"&gt;--ignore-certificate-errors&lt;/span&gt; &lt;span class="nt"&gt;--allow-insecure-localhost&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209424552-595a65a4-5b89-43fd-8020-583fefb151fb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209424552-595a65a4-5b89-43fd-8020-583fefb151fb.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;code&gt;mitmproxy&lt;/code&gt; will display the matched requests&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209424575-7516b36f-2883-4192-bcb9-c8e7c929e9e0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F282605%2F209424575-7516b36f-2883-4192-bcb9-c8e7c929e9e0.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;mitmproxy&lt;/code&gt; can be used to find the HTTP request with the needed data in addition to browser dev tools. At some point, I'll explore &lt;a href="https://www.tcpdump.org/" rel="noopener noreferrer"&gt;&lt;code&gt;tcpdump&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://www.wireshark.org/" rel="noopener noreferrer"&gt;&lt;code&gt;wireshark&lt;/code&gt;&lt;/a&gt; to reverse engineer websites for web scraping and share the learnings with you.&lt;/p&gt;

&lt;p&gt;If you have anything to share, any questions, suggestions, or something that isn't working correctly, feel free to reach out via Twitter at &lt;a href="https://twitter.com/ilyazub_" rel="noopener noreferrer"&gt;@ilyazub_&lt;/a&gt;, or &lt;a href="https://twitter.com/serp_api" rel="noopener noreferrer"&gt;@serp_api&lt;/a&gt;, or Mastodon at &lt;a href="https://fosstodon.org/@iz@fosstodon.org" rel="noopener noreferrer"&gt;@iz&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webscraping</category>
    </item>
    <item>
      <title>Avoiding Puppeteer Antipatterns</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Wed, 25 May 2022 14:05:42 +0000</pubDate>
      <link>https://dev.to/serpapi/avoiding-puppeteer-antipatterns-40cc</link>
      <guid>https://dev.to/serpapi/avoiding-puppeteer-antipatterns-40cc</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/puppeteer/puppeteer/"&gt;Puppeteer&lt;/a&gt; is a popular browser automation library for NodeJS commonly used for web scraping and end-to-end testing. Since Puppeteer offers a rich API that performs complex interactions with the browser in real-time, there's plenty of room for misunderstandings and &lt;a href="https://www.agilealliance.org/glossary/antipattern"&gt;antipatterns&lt;/a&gt; to creep into your scripts.&lt;/p&gt;

&lt;p&gt;In this post, we'll share 9 Puppeteer antipatterns I've used or seen in Puppeteer code over the past few years. While the list isn't exhaustive, we hope it'll elevate your understanding and appreciation of the tool.&lt;/p&gt;

&lt;p&gt;I've assumed that readers arrive with a working familiarity with Puppeteer and have written at least a few scripts with it, enough to have encountered some of the quirks and pitfalls of automation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Overusing &lt;code&gt;waitForTimeout&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/puppeteer/puppeteer/tree/7748730163bc1a14cbb30881809ea529844f887e#q-is-puppeteer-replacing-seleniumwebdriver"&gt;Puppeteer's readme&lt;/a&gt; says it best:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Puppeteer has event-driven architecture, which removes a lot of potential flakiness. There's no need for evil &lt;code&gt;sleep(1000)&lt;/code&gt; calls in puppeteer scripts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The trouble is, Puppeteer has &lt;a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagewaitfortimeoutmilliseconds"&gt;&lt;code&gt;page.waitForTimeout(milliseconds)&lt;/code&gt;&lt;/a&gt;, which has identical semantics to the evil &lt;code&gt;sleep(1000)&lt;/code&gt; call! It can be handy to have this escape hatch in the API, but it's also easy to abuse.&lt;/p&gt;

&lt;p&gt;Sleeping is evil because it introduces a &lt;a href="https://en.wikipedia.org/wiki/Race_condition"&gt;race condition&lt;/a&gt; that involves two possible outcomes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GJp5eB0r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/race_condition.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GJp5eB0r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/race_condition.PNG" alt="Diagram showing the race condition between the browser state and the Puppeteer driver for pessimistic and optimistic delays" width="309" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sleeping causes a race condition between the browser state and the Puppeteer driver&lt;/p&gt;

&lt;p&gt;One outcome is when the sleep duration is too optimistic and the driver wakes up before the browser is in the desired state. In this case, the issue is correctness: the driver will likely wind up missing data from responses that haven't arrived, or throw errors when attempting to manipulate elements that don't exist.&lt;/p&gt;

&lt;p&gt;The other outcome is when the sleep duration is too pessimistic and the browser reaches the desired state before the driver wakes up. In this case, the issue is efficiency: elapsed time between the desired state appearing and the driver waking up is wasted. Even a small extra delay becomes painful when incurred repeatedly, such as during automated testing.&lt;/p&gt;

&lt;p&gt;Furthermore, a sleep that seems to work can create an illusion of robustness. A duration that's long enough today might be too short tomorrow. Sleeping introduces an overfitted, environment-specific value that may work on one machine, but can easily fail on another. That other machine is often a production deploy on the cloud, where debugging discrepancies against a working local version can be difficult.&lt;/p&gt;

&lt;p&gt;The alternatives to &lt;code&gt;waitForTimeout&lt;/code&gt; include &lt;a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagewaitforselectorselector-options"&gt;&lt;code&gt;waitForSelector&lt;/code&gt;&lt;/a&gt;, which blocks until a selector appears, or the more general &lt;a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagewaitforfunctionpagefunction-options-args"&gt;&lt;code&gt;waitForFunction&lt;/code&gt;&lt;/a&gt;, which blocks until a predicate condition becomes true. Puppeteer tests the condition in a tight &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/window/requestAnimationFrame"&gt;&lt;code&gt;requestAnimationFrame&lt;/code&gt;&lt;/a&gt; loop or upon DOM changes with a &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver"&gt;&lt;code&gt;MutationObserver&lt;/code&gt;&lt;/a&gt;. Both &lt;code&gt;waitForSelector&lt;/code&gt; and &lt;code&gt;waitForFunction&lt;/code&gt; adhere to the event-driven model and eliminate race conditions.&lt;/p&gt;
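
&lt;p&gt;As a minimal sketch of swapping a sleep for an explicit predicate, the helper below waits until a given number of elements exist. The name &lt;code&gt;waitForRows&lt;/code&gt;, the selector and the thresholds are hypothetical, and &lt;code&gt;page&lt;/code&gt; is assumed to be a Puppeteer &lt;code&gt;Page&lt;/code&gt;:&lt;/p&gt;

```javascript
// A minimal sketch, not the article's own code: the helper name, selector and
// thresholds are hypothetical. `page` is assumed to be a Puppeteer Page.
function waitForRows(page, selector, minRows, timeout = 10000) {
  // The predicate runs in the browser; `selector` and `minRows` are
  // serialized and passed in as its arguments.
  return page.waitForFunction(
    (sel, min) => document.querySelectorAll(sel).length >= min,
    {timeout},
    selector,
    minRows
  );
}

// Instead of `await page.waitForTimeout(3000)`:
// await waitForRows(page, "table.stats tr", 5);
```

&lt;p&gt;The promise resolves as soon as the predicate is truthy and rejects once the timeout elapses, so the script neither proceeds too early nor idles longer than it has to.&lt;/p&gt;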

&lt;p&gt;There are plenty of occasions when &lt;code&gt;page.waitForTimeout&lt;/code&gt; is appropriate, however. It can help with debugging, throttling interaction to simulate human behavior, rate limiting polling of a resource and as a last resort for handling pesky situations that don't respond well to other approaches. But sleeping as the default option when it's entirely possible to block on an explicit predicate is an antipattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Assuming Puppeteer's API works like the native browser API
&lt;/h2&gt;

&lt;p&gt;Puppeteer's API has different semantics from the native browser API. For example, Puppeteer's &lt;a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pageclickselector-options"&gt;&lt;code&gt;page.click()&lt;/code&gt;&lt;/a&gt; seems like a straightforward wrapper on the browser's native &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/click"&gt;&lt;code&gt;HTMLElement.click()&lt;/code&gt;&lt;/a&gt;, but it actually operates quite differently and hides a fair amount of complexity under the hood.&lt;/p&gt;

&lt;p&gt;Instead of invoking the click event handler directly on the element as the native &lt;code&gt;.click()&lt;/code&gt; does, clicking in Puppeteer scrolls the element into view, moves the mouse onto the element, presses one of a few mouse buttons, optionally triggers a delay, then releases the mouse button. You can also trigger multiple clicks. In other words, Puppeteer performs a click like a human would.&lt;/p&gt;

&lt;p&gt;Neither approach is better, but it's a mistake to assume they're the same and indiscriminately use one or the other across the board.&lt;/p&gt;

&lt;p&gt;There are times when using the native browser click makes it possible to access an element the mouse is unable to reach via Puppeteer's click, such as when another element is on top of it. In other cases, such as testing, it's desirable to click with the mouse as a human would, using a &lt;a href="https://w3c.github.io/uievents/#trusted-events"&gt;trusted event&lt;/a&gt;. Puppeteer's &lt;a href="https://github.com/puppeteer/puppeteer#q-whats-the-difference-between-a-trusted-and-untrusted-input-event"&gt;docs&lt;/a&gt; state:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For automation purposes it's important to generate trusted events. &lt;strong&gt;All input events generated with Puppeteer are trusted and fire proper accompanying events.&lt;/strong&gt; If, for some reason, one needs an untrusted event, it's always possible to hop into a page context with &lt;code&gt;page.evaluate&lt;/code&gt; and generate a fake event:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button[type=submit]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference in behavior might be most noticeable for &lt;code&gt;page.click()&lt;/code&gt;, but it's worth keeping the distinction in mind when working with the rest of Puppeteer's API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Never using &lt;code&gt;"domcontentloaded"&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The default event for Puppeteer's navigation API, such as &lt;a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagegotourl-options"&gt;&lt;code&gt;page.goto(url, {waitUntil: "some event"})&lt;/code&gt;&lt;/a&gt;, is &lt;code&gt;{waitUntil: "load"}&lt;/code&gt;. I'm guilty of blindly relying on this default without questioning its implications, but it's worth investigating a bit relative to other options: &lt;code&gt;"domcontentloaded"&lt;/code&gt;, &lt;code&gt;"networkidle2"&lt;/code&gt; and &lt;code&gt;"networkidle0"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event"&gt;MDN says of the &lt;code&gt;load&lt;/code&gt; event&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;code&gt;load&lt;/code&gt; event is fired when the whole page has loaded, including all dependent resources such as stylesheets and images. This is in contrast to &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Document/DOMContentLoaded_event"&gt;&lt;code&gt;DOMContentLoaded&lt;/code&gt;&lt;/a&gt;, which is fired as soon as the page DOM has been loaded, without waiting for resources to finish loading.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Waiting for &lt;code&gt;load&lt;/code&gt; is a useful choice in scenarios where it's important to see the page as the user would. Example use cases include snapping screenshots or generating PDFs. &lt;code&gt;"networkidle0"&lt;/code&gt; and &lt;code&gt;"networkidle2"&lt;/code&gt; resolve the navigation promise once there have been no more than 0 or 2 open network connections, respectively, for at least 500 milliseconds. These settings have similar use cases to &lt;code&gt;"load"&lt;/code&gt; and offer more control for handling pages that might keep a polling connection or two active after load, or when the driver needs to flush requests before proceeding.&lt;/p&gt;

&lt;p&gt;On the other hand, if the goal is to scrape a table of text statistics that arrives from a network request, there's no sense in waiting around for anything more than &lt;code&gt;"domcontentloaded"&lt;/code&gt; and the lone data request, which might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;domcontentloaded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;yourSelector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;visible&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...where &lt;code&gt;yourSelector&lt;/code&gt; won't be injected into the page until the desired data arrives.&lt;/p&gt;
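
&lt;p&gt;When the payload of the data request is what you're after, another option is to wait for that response directly with &lt;code&gt;page.waitForResponse&lt;/code&gt;. The following is a sketch under assumptions: the helper names and the &lt;code&gt;"/api/stats"&lt;/code&gt; URL fragment are hypothetical, and the endpoint is assumed to return JSON:&lt;/p&gt;

```javascript
// A sketch under assumptions: the helper names and the "/api/stats" fragment
// are hypothetical. `page` is assumed to be a Puppeteer Page.
const matchesResponse = fragment => res =>
  res.ok() ? res.url().includes(fragment) : false;

async function gotoAndReadJson(page, url, fragment) {
  // Start listening before navigating so a fast response can't be missed.
  const [response] = await Promise.all([
    page.waitForResponse(matchesResponse(fragment)),
    page.goto(url, {waitUntil: "domcontentloaded"}),
  ]);
  return response.json();
}

// const stats = await gotoAndReadJson(page, url, "/api/stats");
```

&lt;p&gt;Registering the wait before calling &lt;code&gt;page.goto&lt;/code&gt; avoids a race in which the response arrives before the script starts listening for it.&lt;/p&gt;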

&lt;h2&gt;
  
  
  Not blocking images and resources
&lt;/h2&gt;

&lt;p&gt;If the goal is to scrape a simple piece of data, there's no sense in wasting time and bandwidth requesting all images, stylesheets and/or scripts on a page. When possible, consider disabling JS with &lt;a href="https://github.com/puppeteer/puppeteer/blob/v13.5.1/docs/api.md#pagesetjavascriptenabledenabled"&gt;&lt;code&gt;page.setJavaScriptEnabled(false)&lt;/code&gt;&lt;/a&gt;, stopping the browser with &lt;code&gt;page.evaluate(() =&amp;gt; window.stop())&lt;/code&gt; or &lt;a href="https://github.com/puppeteer/puppeteer/blob/v13.5.1/docs/api.md#pagesetrequestinterceptionvalue"&gt;enabling request interception&lt;/a&gt; to block images as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRequestInterception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;request&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resourceType&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps keep your scripts snappy and reduces waste.&lt;/p&gt;
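
&lt;p&gt;The same handler extends naturally to several resource types. The sketch below is one possible arrangement; the particular set of blocked types is an assumption and should be tuned per site, since blocking stylesheets changes layout and may not suit scripts that depend on visibility or element position:&lt;/p&gt;

```javascript
// A sketch: the set of blocked types is an assumption; tune it per site.
const BLOCKED_TYPES = new Set(["image", "stylesheet", "font", "media"]);

// Aborts requests for blocked resource types and lets everything else through.
function handleRequest(req) {
  if (BLOCKED_TYPES.has(req.resourceType())) {
    return req.abort();
  }
  return req.continue();
}

// await page.setRequestInterception(true);
// page.on("request", handleRequest);
```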

&lt;p&gt;As an aside, there's no need to &lt;code&gt;await page.on&lt;/code&gt;, which is strictly callback-driven and does not return a promise. Nonetheless, it can be handy to &lt;a href="https://javascript.info/promisify"&gt;promisify&lt;/a&gt; &lt;code&gt;page.on&lt;/code&gt; so results can be awaited later. While most events are available in a promise-based function, others such as &lt;a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#event-dialog"&gt;&lt;code&gt;"dialog"&lt;/code&gt;&lt;/a&gt; aren't.&lt;/p&gt;

&lt;p&gt;You can use &lt;code&gt;.once&lt;/code&gt; and &lt;code&gt;.off&lt;/code&gt; methods to ensure the handler is removed when no longer needed.&lt;/p&gt;
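
&lt;p&gt;One way to promisify an event is to wrap &lt;code&gt;.once&lt;/code&gt; in a promise. This is a sketch rather than a Puppeteer API: &lt;code&gt;nextEvent&lt;/code&gt; is a hypothetical helper, and the &lt;code&gt;#delete&lt;/code&gt; button in the usage comment is invented for illustration:&lt;/p&gt;

```javascript
// A sketch: `nextEvent` is a hypothetical helper. `emitter` is assumed to
// expose `.once(eventName, handler)`, as Puppeteer's Page does.
function nextEvent(emitter, eventName) {
  // `.once` removes the handler after the first event, so nothing lingers.
  return new Promise(resolve => emitter.once(eventName, resolve));
}

// Register the listener first, and dismiss the dialog as soon as it fires:
// const messagePromise = nextEvent(page, "dialog").then(async dialog => {
//   const message = dialog.message();
//   await dialog.dismiss();
//   return message;
// });
// await page.click("#delete"); // hypothetical button that opens a confirm()
// const message = await messagePromise;
```

&lt;p&gt;Handling the dialog inside the chained callback, rather than after the click has been awaited, keeps the page from sitting blocked on an open dialog.&lt;/p&gt;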

&lt;h2&gt;
  
  
  Avoiding &lt;code&gt;page.evaluate&lt;/code&gt; when trusted events aren't necessary
&lt;/h2&gt;

&lt;p&gt;One can easily wind up frustrated when refactoring working browser console code using jQuery or vanilla JS to Puppeteer's trusted interface: &lt;code&gt;page.$&lt;/code&gt;, &lt;code&gt;page.$eval&lt;/code&gt;, &lt;code&gt;page.type&lt;/code&gt;, &lt;code&gt;page.click&lt;/code&gt;, and so on. This is a common situation since experimenting with the DOM in the browser console is a typical first step in building a Puppeteer script. These refactors often go sour when attempting to pass complex structures such as &lt;a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#class-elementhandle"&gt;ElementHandles&lt;/a&gt; between the browser and Node contexts, or encountering behavioral differences in methods like &lt;code&gt;page.click()&lt;/code&gt;, discussed above.&lt;/p&gt;

&lt;p&gt;Puppeteer provides &lt;a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pageevaluatepagefunction-args"&gt;&lt;code&gt;page.evaluate()&lt;/code&gt;&lt;/a&gt; as a highly-generalized function to run code in the browser. It supports passing serializable data and returning deserialized results. For times when trusted events aren't needed, plopping a chunk of working browser code inside a &lt;code&gt;page.evaluate&lt;/code&gt; and letting it handle the heavy lifting alleviates the time spent on a rewrite and helps mitigate the potential for subtle regressions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Misusing developer tools-generated selectors
&lt;/h2&gt;

&lt;p&gt;The developer tools in modern browsers let you copy CSS selectors and XPaths to the clipboard easily:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b-OpuP9r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/copy_css_path-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b-OpuP9r--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/copy_css_path-2.png" alt="Copying an element's CSS selector in Chrome" width="463" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copying an element's CSS selector in Chrome&lt;/p&gt;

&lt;p&gt;This is a boon for developers who might not be familiar with CSS selectors or XPath, and can save time for those who are.&lt;/p&gt;

&lt;p&gt;The problem is, these selectors and paths can be overly rigid and lead to brittle scripts that break if a wrapper is added as a parent further up the tree, or a sibling element shows up unexpectedly after an interaction on the page.&lt;/p&gt;

&lt;p&gt;For example, Chrome gives the very strict CSS selector &lt;code&gt;#answer-60796572 &amp;gt; div &amp;gt; div.answercell.post-layout--right &amp;gt; div.s-prose.js-post-body &amp;gt; pre&lt;/code&gt; when &lt;code&gt;#answer-60796572 pre&lt;/code&gt; might be more resilient to handling dynamic behavior.&lt;/p&gt;

&lt;p&gt;Choosing how to select elements is an art, not a science. It takes time to build a feel for which specificity is appropriate for a particular use case. The browser-generated selectors and paths are handy, but are ripe for misuse. When using these selectors, it's wise to take a step back and consider the broader context such as site-specific behavior and script goals.&lt;/p&gt;

&lt;p&gt;SerpApi's post &lt;a href="https://serpapi.com/blog/web-scraping-with-css-selectors-using-python/"&gt;Web Scraping with CSS Selectors using Python&lt;/a&gt; provides a nice introduction to CSS selectors in a web scraping context. (Python isn't essential to the selector information.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Not using the return value of &lt;code&gt;.waitForSelector&lt;/code&gt; and &lt;code&gt;.waitForXPath&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;A common, albeit minor, antipattern is to make an extra call to retrieve an element after waiting for it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// or:&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForXPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;$x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's cleaner and more precise to use the return value of the &lt;code&gt;waitFor&lt;/code&gt; calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForSelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// or:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitForXPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you've obtained the ElementHandle, you can use &lt;code&gt;elem.evaluate(el =&amp;gt; el.textContent)&lt;/code&gt; rather than &lt;code&gt;page.evaluate(el =&amp;gt; el.textContent, elem)&lt;/code&gt;. See &lt;a href="https://stackoverflow.com/a/68294113/6243352"&gt;a Stack Overflow answer of mine&lt;/a&gt; for details on passing nested ElementHandles back to the browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using a separate HTML parser with Puppeteer
&lt;/h2&gt;

&lt;p&gt;Puppeteer already has access to the full power of browser JS and operates on the page in real-time, so it's an antipattern to bring in a secondary HTML parser like &lt;a href="https://cheerio.js.org/"&gt;Cheerio&lt;/a&gt; without good reason. Using Cheerio with Puppeteer involves taking serialized snapshots of the entire DOM (i.e. using &lt;a href="https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagecontent"&gt;&lt;code&gt;page.content()&lt;/code&gt;&lt;/a&gt;), then asking Cheerio to re-parse the HTML before being able to make a selection. This can be slow, adds a potentially confusing layer of indirection between the live page and the separate HTML parser and creates an additional opportunity for misunderstanding the state of the application.&lt;/p&gt;

&lt;p&gt;This approach can be beneficial for debugging and when HTML snapshots or particular features of Cheerio are needed (like &lt;a href="https://github.com/jquery/sizzle/wiki#selectors"&gt;sizzle selectors&lt;/a&gt;), but the common case is to use Puppeteer on its own.&lt;/p&gt;
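
&lt;p&gt;For the common case, a selection that might otherwise go through &lt;code&gt;page.content()&lt;/code&gt; and Cheerio can usually be expressed with &lt;code&gt;page.$$eval&lt;/code&gt;. A minimal sketch, where &lt;code&gt;textsOf&lt;/code&gt; is a hypothetical helper and &lt;code&gt;page&lt;/code&gt; is assumed to be a Puppeteer &lt;code&gt;Page&lt;/code&gt;:&lt;/p&gt;

```javascript
// A sketch: `textsOf` is a hypothetical helper; `page` is assumed to be a
// Puppeteer Page. The callback runs in the browser against the live DOM,
// and only the resulting array of strings is sent back to Node.
function textsOf(page, selector) {
  return page.$$eval(selector, els => els.map(el => el.textContent.trim()));
}

// Instead of cheerio.load(await page.content()):
// const headings = await textsOf(page, "h2");
```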

&lt;h2&gt;
  
  
  Using Puppeteer when other tools are more appropriate
&lt;/h2&gt;

&lt;p&gt;Web browser automation is a heavy and slow proposition. It involves launching a browser process, then making network calls to navigate the browser to a page, then manipulating the page to accomplish a goal. Sometimes that goal can be achieved directly using a simple HTTP request to a public API or retrieving data baked into a page's static HTML. Using a request library like &lt;code&gt;fetch&lt;/code&gt; and an HTML parser like Cheerio, when possible, can be a faster and easier method than Puppeteer.&lt;/p&gt;

&lt;p&gt;When collecting requirements for a scraping task, take the time to make sure the data isn't hidden in plain sight. This usually involves viewing the page source to see the static HTML and analyzing network traffic using the browser's developer tools to determine where the data you're after is coming from. Reading the website's FAQ and documentation often points to a public API. Occasionally, other sites offer the same information in a more accessible format.&lt;/p&gt;

&lt;p&gt;For one-time scrapes, I'll often select the data I need using browser JS and copy it to the clipboard from the console by hand instead of reaching for Puppeteer. Userscripts, extensions and bookmarklets are other lightweight ways to manipulate a page without Puppeteer or Node.&lt;/p&gt;

&lt;p&gt;While these alternatives might not cover many of your Puppeteer needs, it's good to keep more than one tool in your belt.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;In this article, I've presented a handful of Puppeteer antipatterns to be on the lookout for. I hope that a critical examination of them will help keep your Puppeteer code clean, fast, reliable and easy to maintain. As the tool grows, I'll be curious to see how these patterns evolve along with it.&lt;/p&gt;

&lt;p&gt;Keep in mind that these guidelines are rules-of-thumb and might need to be broken from time to time. The key is to be aware of the tradeoffs when using them.&lt;/p&gt;

&lt;p&gt;Happy automating!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article is current as of Puppeteer version 13.5.1.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SerpApi Integration for Zapier (wip)</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Thu, 17 Mar 2022 14:33:55 +0000</pubDate>
      <link>https://dev.to/serpapi/serpapi-integration-for-zapier-beta-4ggf</link>
      <guid>https://dev.to/serpapi/serpapi-integration-for-zapier-beta-4ggf</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;SerpApi Integration for Zapier is work-in-progress.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Idea
&lt;/h2&gt;

&lt;p&gt;Two years ago we wrote a &lt;a href="https://medium.com/serpapi/track-daily-google-rankings-for-specific-query-and-domain-via-serpapi-integration-in-zapier-beta-f8c69da88a24"&gt;note about a SerpApi Integration for Zapier&lt;/a&gt;. Currently, we're working on supporting all search parameters that SerpApi has.&lt;/p&gt;

&lt;p&gt;The work-in-progress SerpApi Integration for Zapier works similarly to the &lt;code&gt;SERPAPI_RANK&lt;/code&gt; function from the &lt;a href="https://workspace.google.com/marketplace/app/serpapi_search_engine_results_and_ranks/562749385480"&gt;SerpApi Add-On for Google Sheets&lt;/a&gt; (&lt;a href="https://medium.com/@elizabeth_38962/how-to-scrape-google-results-into-a-spreadsheet-serpapi-google-sheets-plugin-e2915e73299f"&gt;tutorial&lt;/a&gt;). It is a Zapier Action that sends a search to SerpApi with the specified parameters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PxdwNA_H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/image-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PxdwNA_H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/image-1.png" alt="Parameters for the SerpApi Action in Zapier" width="800" height="741"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Returned data
&lt;/h2&gt;

&lt;p&gt;The Action then returns the search result that matches the domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JCGKkmC0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/2022-03-17_16-08.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JCGKkmC0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/2022-03-17_16-08.png" alt="Test results of the SerpApi Integration for Zapier" width="800" height="647"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Use cases
&lt;/h2&gt;

&lt;p&gt;A matched search result, &lt;em&gt;for example&lt;/em&gt;, can be passed to the next Zapier Action like "Create Record in Airtable" or "Create Record in Google Sheets".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IJnslUoY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/2022-03-17_16-12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IJnslUoY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/2022-03-17_16-12.png" alt="Using SerpApi result in the Create Record in Airtable Action in Zapier" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MNcGSqNd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/image--1-.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MNcGSqNd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://serpapi.com/blog/content/images/2022/03/image--1-.png" alt="Zapier Action creates a record in Airtable" width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  SerpApi + Zapier
&lt;/h2&gt;

&lt;p&gt;SerpApi integration for Zapier is in development. We're working on &lt;a href="https://github.com/serpapi/public-roadmap/issues/69"&gt;accepting more search parameters&lt;/a&gt; that SerpApi supports.&lt;/p&gt;

&lt;h2&gt;
  
  
  Outro
&lt;/h2&gt;

&lt;p&gt;If you have anything to share, any questions, suggestions, or something that isn't working correctly, reach out via Twitter at &lt;a href="https://twitter.com/ilyazub_"&gt;@ilyazub_&lt;/a&gt;, or &lt;a href="https://twitter.com/serp_api"&gt;@serp_api&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Yours,&lt;br&gt;
Illia, and the rest of the SerpApi Team.&lt;/p&gt;




&lt;p&gt;Join us on &lt;a href="https://www.reddit.com/r/SerpApi/"&gt;Reddit&lt;/a&gt; | &lt;a href="https://twitter.com/serp_api"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.youtube.com/channel/UCUgIHlYBOD3yA3yDIRhg_mg"&gt;YouTube&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/serpapi/public-roadmap/issues/new"&gt;Request a Feature 💫 or Report a Bug 🐞&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nocode</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>Scrape Walmart Search for a specific store</title>
      <dc:creator>Illia Zub</dc:creator>
      <pubDate>Thu, 02 Dec 2021 11:50:33 +0000</pubDate>
      <link>https://dev.to/serpapi/scrape-walmart-search-for-a-specific-store-5c2d</link>
      <guid>https://dev.to/serpapi/scrape-walmart-search-for-a-specific-store-5c2d</guid>
      <description>&lt;p&gt;For requests from outside the US, Walmart responds with results for Sacramento.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--muEHxbVj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142628959-2feeeb6c-d085-40bb-a282-26c758ce2f31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--muEHxbVj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142628959-2feeeb6c-d085-40bb-a282-26c758ce2f31.png" alt="image" width="439" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But how do you search for products that are available in a specific store? Any Walmart store can be chosen without browser automation, just by setting the relevant cookies in a plain HTTP request.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To figure this out on your own, knowledge of JS and browser dev tools is enough. Some Ruby knowledge is required to follow this post.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Location cookies
&lt;/h2&gt;

&lt;p&gt;I updated the location several times and checked the cookies in the browser's Dev Tools -&amp;gt; Application -&amp;gt; Cookies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ws5gD-mT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142677221-555a6ff9-3ff3-4f3f-b02b-1a33645b61d9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ws5gD-mT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142677221-555a6ff9-3ff3-4f3f-b02b-1a33645b61d9.png" alt="image" width="800" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Several cookies are updated after choosing a different location: &lt;code&gt;locGuestData&lt;/code&gt;, &lt;code&gt;locDataV3&lt;/code&gt;, &lt;code&gt;assortmentStoreId&lt;/code&gt;, &lt;code&gt;ACID&lt;/code&gt;, &lt;code&gt;hasACID&lt;/code&gt;, and &lt;code&gt;hasLocData&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;location-data&lt;/code&gt; also looks relevant, but it contains the postal code and address of a store I haven't chosen. Maybe it was used before Walmart migrated to the GraphQL API.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;locDataV3&lt;/code&gt; and &lt;code&gt;locGuestData&lt;/code&gt; are Base64- and URI-encoded JSON objects. &lt;code&gt;locDataV3&lt;/code&gt; contains more data than &lt;code&gt;locGuestData&lt;/code&gt;, but the data of &lt;code&gt;locGuestData&lt;/code&gt; can be used for both cookies.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ACID&lt;/code&gt; is a UUID. It can be generated on the client.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;hasACID&lt;/code&gt; and &lt;code&gt;hasLocData&lt;/code&gt; are flags.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding &lt;code&gt;locGuestData&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Let's check what's inside this cookie value to understand how to set the store ID.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example of encoded &lt;code&gt;locGuestData&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;When sending requests to Walmart, &lt;code&gt;locGuestData&lt;/code&gt; is a Base64-encoded string.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjp0cnVlLCJwaWNrdXAiOnsibm9kZUlkIjoiNDExNSIsInRpbWVzdGFtcCI6MTYzNzMyODUwMDUyM30sInBvc3RhbENvZGUiOnsiYmFzZSI6Ijc4MTU0IiwidGltZXN0YW1wIjoxNjM3MzI4NTAwNTIzfSwidmFsaWRhdGVLZXkiOiJwcm9kOnYyOjUyNzNlMDFjLTA4NzAtNGUwOS05ODU4LTAzYTI2ZDQ5N2ZhOSJ9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Example of decoded &lt;code&gt;locGuestData&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;This Base64 string is an encoded JSON object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;decodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;atob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjp0cnVlLCJwaWNrdXAiOnsibm9kZUlkIjoiNDExNSIsInRpbWVzdGFtcCI6MTYzNzMyODUwMDUyM30sInBvc3RhbENvZGUiOnsiYmFzZSI6Ijc4MTU0IiwidGltZXN0YW1wIjoxNjM3MzI4NTAwNTIzfSwidmFsaWRhdGVLZXkiOiJwcm9kOnYyOjUyNzNlMDFjLTA4NzAtNGUwOS05ODU4LTAzYTI2ZDQ5N2ZhOSJ9&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;intent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SHIPPING&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;storeIntent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;PICKUP&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mergeFlag&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pickup&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;nodeId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;4115&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1637328500523&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;postalCode&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;78154&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1637328500523&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;validateKey&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;prod:v2:5273e01c-0870-4e09-9858-03a26d497fa9&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After changing the Walmart store several times, I noticed that &lt;code&gt;nodeId&lt;/code&gt; and &lt;code&gt;postalCode.base&lt;/code&gt; change with it.&lt;/p&gt;
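
&lt;p&gt;Since the rest of this post uses Ruby, here is a minimal sketch of the same decoding step in Ruby. The helper name &lt;code&gt;decode_loc_guest_data&lt;/code&gt; is made up for illustration, and it assumes the cookie value is Base64 wrapped around optionally percent-encoded JSON, as the JavaScript one-liner above suggests.&lt;/p&gt;

```ruby
require "base64"
require "json"

# Hypothetical helper (not from the original post): reverses the cookie encoding.
# Same order as the JavaScript one-liner: Base64-decode, percent-decode, parse JSON.
def decode_loc_guest_data(cookie_value)
  raw = Base64.decode64(cookie_value) # binary string
  # Replace each %XX escape with the byte it stands for; "+" is left untouched.
  json = raw.gsub(/%(\h\h)/) { $1.hex.chr }.force_encoding(Encoding::UTF_8)
  JSON.parse(json)
end

# The encoded example value shown earlier in this section.
SAMPLE_LOC_GUEST_DATA = "eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjp0cnVlLCJwaWNrdXAiOnsibm9kZUlkIjoiNDExNSIsInRpbWVzdGFtcCI6MTYzNzMyODUwMDUyM30sInBvc3RhbENvZGUiOnsiYmFzZSI6Ijc4MTU0IiwidGltZXN0YW1wIjoxNjM3MzI4NTAwNTIzfSwidmFsaWRhdGVLZXkiOiJwcm9kOnYyOjUyNzNlMDFjLTA4NzAtNGUwOS05ODU4LTAzYTI2ZDQ5N2ZhOSJ9"

decoded = decode_loc_guest_data(SAMPLE_LOC_GUEST_DATA)
puts decoded.dig("pickup", "nodeId")   # 4115
puts decoded.dig("postalCode", "base") # 78154
```

&lt;p&gt;Decoding a cookie copied from the browser this way is a quick check of which store and postal code it carries.&lt;/p&gt;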

&lt;h4&gt;
  
  
  Generate &lt;code&gt;timestamp&lt;/code&gt; and &lt;code&gt;acid&lt;/code&gt; for &lt;code&gt;locGuestData&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;timestamp&lt;/code&gt; and &lt;code&gt;acid&lt;/code&gt; can be generated on every request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
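&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"securerandom"&lt;/span&gt;

&lt;span class="c1"&gt;# Milliseconds since the Unix epoch, matching the 13-digit value in the decoded cookie&lt;/span&gt;
&lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_i&lt;/span&gt;

&lt;span class="n"&gt;acid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;SecureRandom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid&lt;/span&gt;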
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Base64-encode location data
&lt;/h4&gt;

&lt;p&gt;Next, let's Base64-encode that JSON string as Walmart expects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
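&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"base64"&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"json"&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"securerandom"&lt;/span&gt;

&lt;span class="c1"&gt;# Milliseconds since the Unix epoch, matching the 13-digit value in the decoded cookie&lt;/span&gt;
&lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_i&lt;/span&gt;

&lt;span class="n"&gt;acid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;SecureRandom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid&lt;/span&gt;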

&lt;span class="n"&gt;location_guest_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="ss"&gt;intent: &lt;/span&gt;&lt;span class="s2"&gt;"SHIPPING"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;storeIntent: &lt;/span&gt;&lt;span class="s2"&gt;"PICKUP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;mergeFlag: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="ss"&gt;pickup: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="ss"&gt;nodeId: &lt;/span&gt;&lt;span class="n"&gt;store_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;timestamp: &lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="ss"&gt;postalCode: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="ss"&gt;base: &lt;/span&gt;&lt;span class="n"&gt;postal_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;timestamp: &lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="ss"&gt;validateKey: &lt;/span&gt;&lt;span class="s2"&gt;"prod:v2:&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;acid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;encoded_location_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlsafe_encode64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location_guest_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Create cookie string
&lt;/h4&gt;

&lt;p&gt;Finally, a location cookie string contains all the required fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="sx"&gt;%(ACID=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;acid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;; hasACID=true; hasLocData=1; locDataV3=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;encoded_location_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;; assortmentStoreId=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;store_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;; locGuestData=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;encoded_location_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Complete function to create Walmart location cookie
&lt;/h3&gt;

&lt;p&gt;Putting it all together.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
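&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"base64"&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"json"&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"securerandom"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;location_cookie&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;postal_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="c1"&gt;# Plain-Ruby replacement for ActiveSupport's blank?&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;store_id&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;empty?&lt;/span&gt;

  &lt;span class="c1"&gt;# Milliseconds since the Unix epoch, matching the 13-digit value in the decoded cookie&lt;/span&gt;
  &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_f&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to_i&lt;/span&gt;

  &lt;span class="n"&gt;acid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;SecureRandom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid&lt;/span&gt;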

  &lt;span class="n"&gt;location_guest_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="ss"&gt;intent: &lt;/span&gt;&lt;span class="s2"&gt;"SHIPPING"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;storeIntent: &lt;/span&gt;&lt;span class="s2"&gt;"PICKUP"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;mergeFlag: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="ss"&gt;pickup: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="ss"&gt;nodeId: &lt;/span&gt;&lt;span class="n"&gt;store_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;timestamp: &lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="ss"&gt;postalCode: &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="ss"&gt;base: &lt;/span&gt;&lt;span class="n"&gt;postal_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="ss"&gt;timestamp: &lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="ss"&gt;validateKey: &lt;/span&gt;&lt;span class="s2"&gt;"prod:v2:&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;acid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;encoded_location_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlsafe_encode64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location_guest_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

  &lt;span class="sx"&gt;%(ACID=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;acid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;; hasACID=true; hasLocData=1; locDataV3=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;encoded_location_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;; assortmentStoreId=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;store_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;; locGuestData=&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;encoded_location_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sx"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



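&lt;p&gt;Then make an HTTP request using the language and libraries you've chosen. In the example below, &lt;code&gt;getLocationCookie&lt;/code&gt; stands for a JavaScript port of the &lt;code&gt;location_cookie&lt;/code&gt; function above.&lt;br&gt;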
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;got&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;got&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;STORE_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;4115&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;POSTAL_CODE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;78154&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;locationCookie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getLocationCookie&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;STORE_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;POSTAL_CODE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;htmlResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;got&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.walmart.com/search?q=cookie&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;cookie&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;locationCookie&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4A1l-I_N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142636729-44dce11c-3566-4bb9-a34a-a4a5988990b2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4A1l-I_N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142636729-44dce11c-3566-4bb9-a34a-a4a5988990b2.png" alt="image" width="338" height="56"&gt;&lt;/a&gt;&lt;/p&gt;
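
&lt;p&gt;For readers staying in Ruby, the same request can be sketched with the standard library's &lt;code&gt;Net::HTTP&lt;/code&gt;. The names &lt;code&gt;build_search_request&lt;/code&gt; and &lt;code&gt;fetch_search_html&lt;/code&gt; are hypothetical, and the sketch sends only the cookie header, like the JavaScript example above.&lt;/p&gt;

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a Walmart search request
# that carries the location cookie string.
def build_search_request(query, cookie)
  uri = URI("https://www.walmart.com/search")
  uri.query = URI.encode_www_form(q: query)
  request = Net::HTTP::Get.new(uri)
  request["Cookie"] = cookie
  request
end

# Hypothetical helper: sends the request over HTTPS and returns the HTML body.
def fetch_search_html(query, cookie)
  request = build_search_request(query, cookie)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request).body
  end
end

# Building the request needs no network access, so it is easy to inspect.
request = build_search_request("cookie", "assortmentStoreId=4115; hasLocData=1")
puts request.path      # /search?q=cookie
puts request["Cookie"] # assortmentStoreId=4115; hasLocData=1
```

&lt;p&gt;In real use, pass the string returned by &lt;code&gt;location_cookie&lt;/code&gt; as the second argument of &lt;code&gt;fetch_search_html&lt;/code&gt;.&lt;/p&gt;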

&lt;h2&gt;
  
  
  Where to get store ID and postal code
&lt;/h2&gt;

&lt;p&gt;Of course, we wouldn't hard-code the store ID and postal code into a web scraping program. A &lt;a href="https://raw.githubusercontent.com/ilyazub/walmart-store-locator/master/output/walmart_stores.csv"&gt;CSV of 4.6k stores&lt;/a&gt; can be used to find the store ID dynamically.&lt;/p&gt;

&lt;p&gt;Programmatic usage of the CSV is out of the scope of this post. All that is needed is to find the store ID and postal code for a specific location in the table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Updating a list of Walmart stores IDs and locations
&lt;/h3&gt;

&lt;p&gt;Walmart provides several sources to find stores. Data can be populated from one of those sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.walmart.com/store/finder"&gt;Store Finder&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.walmart.com/store/directory"&gt;US Store Directory&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Store Directory
&lt;/h4&gt;

&lt;p&gt;Store Directory contains links on four levels: country, states, cities, and stores. To get the data, iterate over all elements on the specific level and make subsequent requests.&lt;/p&gt;

&lt;h5&gt;
  
  
  States
&lt;/h5&gt;

&lt;p&gt;Assuming the country is the US, the 51 state codes (the 50 states plus the District of Columbia) can be hard-coded. The Walmart front-end requests data from the JSON endpoint &lt;code&gt;https://www.walmart.com/store/electrode/api/store-directory&lt;/code&gt;. It accepts the &lt;code&gt;st&lt;/code&gt; search parameter.&lt;/p&gt;

&lt;p&gt;Example: &lt;code&gt;https://www.walmart.com/store/electrode/api/store-directory?st=AL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It returns a list of cities. Each city object contains &lt;code&gt;city&lt;/code&gt;, and &lt;code&gt;storeId&lt;/code&gt; or &lt;code&gt;storeCount&lt;/code&gt;. The city with &lt;code&gt;storeId&lt;/code&gt; contains a single store. The city with &lt;code&gt;storeCount&lt;/code&gt; contains multiple stores.&lt;/p&gt;
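
&lt;p&gt;The distinction between the two kinds of city entries can be sketched in Ruby as follows. The payload is made up to match the description above (it is not a real API response), and &lt;code&gt;split_cities&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```ruby
require "json"

# Made-up payload shaped like the description above; not a real API response.
sample_response = '[{"city":"Decatur","storeCount":2},{"city":"Sampleville","storeId":5744}]'

# Hypothetical helper: splits city entries into IDs of single-store cities
# and names of multi-store cities. Assumes an entry has either key, not both.
def split_cities(cities)
  single, multiple = cities.partition { |entry| entry.key?("storeId") }
  [single.map { |entry| entry["storeId"] }, multiple.map { |entry| entry["city"] }]
end

store_ids, multi_store_cities = split_cities(JSON.parse(sample_response))
puts store_ids.inspect          # [5744]
puts multi_store_cities.inspect # ["Decatur"]
```

&lt;p&gt;Store IDs from the first list lead to HTML store pages, and city names from the second list lead to another JSON request, as described next.&lt;/p&gt;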

&lt;h5&gt;
  
  
  Single store in a city
&lt;/h5&gt;

&lt;p&gt;Request to a specific store returns an HTML page. Example: &lt;code&gt;https://www.walmart.com/store/5744&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--neMLzgo9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142639733-32c22320-5cfd-4cb9-b744-8a463dd221f5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--neMLzgo9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142639733-32c22320-5cfd-4cb9-b744-8a463dd221f5.png" alt="image" width="361" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The store address and postal code should be extracted from the HTML. The store ID is already in the URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;postalCode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.store-address-postal[itemprop=postalCode]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.store-address[itemprop=address]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Multiple stores in a city
&lt;/h5&gt;

&lt;p&gt;A request for a city with multiple stores returns a JSON response. For a city with a single store, the response is an empty array (&lt;code&gt;[]&lt;/code&gt;), so we have to parse the HTML of the store page instead.&lt;/p&gt;

&lt;p&gt;Example request for multiple stores&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://www.walmart.com/store/electrode/api/store-directory?st=AL&amp;amp;city=Decatur
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample store from the response&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"displayName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Neighborhood Market"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"storeName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Neighborhood Market"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1203 6th Ave Se"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"phone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"256-822-6366"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"postalCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"35601"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"storeId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2488&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
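
&lt;p&gt;Note that &lt;code&gt;storeId&lt;/code&gt; is a number in this response, while &lt;code&gt;nodeId&lt;/code&gt; in the decoded cookie is a string. Below is a small Ruby sketch of turning a store object into the two arguments of &lt;code&gt;location_cookie&lt;/code&gt;. The helper name &lt;code&gt;cookie_inputs&lt;/code&gt; is hypothetical.&lt;/p&gt;

```ruby
require "json"

# Trimmed copy of the sample store object above.
store = JSON.parse('{"address":"1203 6th Ave Se","postalCode":"35601","storeId":2488}')

# Hypothetical helper: returns [store_id, postal_code] as strings,
# because nodeId is a string ("4115") in the decoded cookie.
def cookie_inputs(store)
  [store.fetch("storeId").to_s, store.fetch("postalCode").to_s]
end

store_id, postal_code = cookie_inputs(store)
puts store_id    # 2488
puts postal_code # 35601
```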



&lt;h5&gt;
  
  
  Putting all together
&lt;/h5&gt;

&lt;p&gt;Pseudo-code to collect store IDs and locations for all US states.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;STATES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AL&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TX&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CA&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;walmartStores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;STATES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;cities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://www.walmart.com/store/electrode/api/store-directory?st=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;storeId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;storeCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;city&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storeId&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;storeCount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://www.walmart.com/store/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;storeId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseHTML&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;postalCode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.store-address-postal[itemprop=postalCode]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.store-address[itemprop=address]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="nx"&gt;walmartStores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;postalCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;storeId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;storeId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;storeId&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;storeCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;stores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://www.walmart.com/store/electrode/api/store-directory?st=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;amp;city=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="c1"&gt;// push mutates the array; concat would return a new one and drop the result&lt;/span&gt;
      &lt;span class="nx"&gt;walmartStores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;stores&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;walmart_stores.csv&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;walmartStores&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
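
&lt;p&gt;The final &lt;code&gt;csv.write&lt;/code&gt; step of the pseudo-code could look like this in Ruby, using the &lt;code&gt;csv&lt;/code&gt; library. This is only a sketch: &lt;code&gt;stores_to_csv&lt;/code&gt; is a hypothetical helper, and the column order is an assumption that does not necessarily match the linked CSV file.&lt;/p&gt;

```ruby
require "csv"
require "json"

# Column order is an assumption; the keys follow the sample store object above.
COLUMNS = %w[storeId postalCode address].freeze

# Hypothetical helper: serializes an array of store hashes (string keys) to CSV text.
def stores_to_csv(stores)
  CSV.generate do |csv|
    csv.add_row(COLUMNS) # header row
    stores.each { |store| csv.add_row(store.values_at(*COLUMNS)) }
  end
end

stores = JSON.parse('[{"storeId":2488,"postalCode":"35601","address":"1203 6th Ave Se"}]')
puts stores_to_csv(stores)
# storeId,postalCode,address
# 2488,35601,1203 6th Ave Se

# To save the result: File.write("walmart_stores.csv", stores_to_csv(stores))
```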



&lt;h4&gt;
  
  
  Existing programs to scrape Walmart Stores
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://grep.app/search?q=walmart.com/store/"&gt;Search on GitHub&lt;/a&gt; via grep.app shows four relevant repositories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://grep.app/search?q=walmart.com/store/"&gt;Akamai EdgeWorkers example&lt;/a&gt;, but it contains only 471 Walmart stores.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://raw.githubusercontent.com/akamai/edgeworkers-examples/master/edgecompute/examples/personalization/storelocator/data/locations.json | jq &lt;span class="s1"&gt;'.elements[].tags | select(."ref:walmart" != null) | .ref'&lt;/span&gt; | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;
471
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/scrapehero/walmart_store_locator/blob/8935e4d743e5507453a2a99294028fa6016cb3c9/walmart_store_locator.py"&gt;&lt;code&gt;scrapehero/walmart_store_locator&lt;/code&gt;&lt;/a&gt; which scrapes stores by postal codes. But finding a list of actual postal codes turned out to be harder than finding a list of actual US states.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/theriley106/WaltonAnalytics/blob/3a8fabc25ec757d94f5354ff2fc7c239a6c51fc7/Analytics.py"&gt;&lt;code&gt;theriley106/WaltonAnalytics&lt;/code&gt;&lt;/a&gt; which is great to extract data from Walmart but not Walmart stores.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/GUI/covid-vaccine-spotter/blob/3995274876a72990ae7699afa6e692b538efe48b/src/providers/Walmart/Stores.js#L129"&gt;&lt;code&gt;GUI/covid-vaccine-spotter&lt;/code&gt;&lt;/a&gt; which scrapes stores by postal codes. But finding a list of actual postal codes turned out to be harder than finding a list of actual US states.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I played with Rust and came up with &lt;a href="https://github.com/ilyazub/walmart-store-locator"&gt;this (rough) program&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After I worked through the compilation errors, it ran well, thanks to this helpful &lt;a href="https://gendignoux.com/blog/2021/04/01/rust-async-streams-futures-part1.html"&gt;blog post about async streams in Rust&lt;/a&gt;. Every time the program compiled, it actually worked. Fixing compilation errors is hard for a non-Rustacean, but there was no need to debug the program at runtime, which is great.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Scraping Walmart is fairly easy: the search results page contains inline JSON data for all of its products.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OgNb-o12--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142681839-a99751e2-5a81-4cd7-8784-bbbaaec301ec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OgNb-o12--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://user-images.githubusercontent.com/282605/142681839-a99751e2-5a81-4cd7-8784-bbbaaec301ec.png" alt="image" width="757" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To specify the location for plain HTTP requests to Walmart, update the location cookies.&lt;/p&gt;

&lt;p&gt;If you have anything to share, any questions or suggestions, or something that isn't working correctly, feel free to drop a comment in the comment section or reach out via Twitter at &lt;a href="https://twitter.com/ilyazub_"&gt;@ilyazub_&lt;/a&gt; or &lt;a href="https://twitter.com/serp_api"&gt;@serp_api&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Yours,&lt;br&gt;
Ilya, and the rest of the SerpApi Team.&lt;/p&gt;




&lt;p&gt;Join us on &lt;a href="https://www.reddit.com/r/SerpApi/"&gt;Reddit&lt;/a&gt; | &lt;a href="https://twitter.com/serp_api"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.youtube.com/channel/UCUgIHlYBOD3yA3yDIRhg_mg"&gt;YouTube&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>walmart</category>
    </item>
  </channel>
</rss>
