In this story, you'll see the process of decoding URL parameters for pagination on Google Maps. It involved deobfuscation of Closure-compiled JavaScript, reverse-engineering of Protobuf data structures, and a bit of math. We tried to decode URL parameters by ourselves, by using pbtk
, and attempted to outsource this work. In the end, we succeeded after several pair programming sessions.
How Google Maps pagination works
We can get the link for the next page only by clicking on the “next page” button.
Links look like this.
Page 1
https://www.google.com/search?tbm=map&authuser=0&hl=en&gl=us&pb=!4m8!1m3!1d24182.00605141337!2d-74.0083012!3d40.7455096!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m18!2m3!5m1!6e2!20e3!6m11!4b1!23b1!26i1!27i1!41i2!45b1!63m0!67b1!73m0!74i150000!89b1!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m57!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m42!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1sjWLeXbmnHIXt-gTm2ouwDg%3A23!2zMWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6MjM!7e81!24m40!1m12!13m6!2b1!3b1!4b1!6i1!8b1!9b1!18m4!3b1!4b1!5b1!6b1!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m2!1e3!1e6!24b1!25b1!26b1!30m1!2b1!36b1!43b1!52b1!55b1!56m2!1b1!3b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m9!3b1!4b1!6b1!8m2!1b1!3b1!9b1!12b1!14b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m40!1m39!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIMHKBc!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIQHKBg!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIUHKBk!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIYHKBo!10m2!1m1!1e2!3m1!1u2!3m1!1u1!3m1!1u3!4BIAE!59BQ2dBd0Fn&q=Coffee&tch=1&ech=1&psi=jWLeXbmnHIXt-gTm2ouwDg.1574855312836.1
Page 2
https://www.google.com/search?tbm=map&authuser=0&hl=en&gl=us&pb=!4m8!1m3!1d24182.00605141337!2d-74.0083012!3d40.7455096!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m18!2m3!5m1!6e2!20e3!6m11!4b1!23b1!26i1!27i1!41i2!45b1!63m0!67b1!73m0!74i150000!89b1!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m57!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m42!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1sjWLeXbmnHIXt-gTm2ouwDg%3A78!2zMWk6Mix0OjEyNjk2LGU6MSxwOmpXTGVYYm1uSElYdC1nVG0yb3V3RGc6Nzg!7e81!24m40!1m12!13m6!2b1!3b1!4b1!6i1!8b1!9b1!18m4!3b1!4b1!5b1!6b1!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m2!1e3!1e6!24b1!25b1!26b1!30m1!2b1!36b1!43b1!52b1!55b1!56m2!1b1!3b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m9!3b1!4b1!6b1!8m2!1b1!3b1!9b1!12b1!14b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m40!1m39!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIMHKBc!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIQHKBg!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIUHKBk!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwj35K6aqYrmAhWFtp4KHWbtAuYQ_KkBCIYHKBo!10m2!1m1!1e2!3m1!1u2!3m1!1u1!3m1!1u3!4BIAE!59BQ2dBd0Fn&q=Coffee&tch=1&ech=2&psi=jWLeXbmnHIXt-gTm2ouwDg.1574855312836.1
They lead to a f.txt
file that contains the next page results.
The Plan
- Find out how the
pb (protobuf)
string is constructed. - Generate the next page link by setting the required parameters.
- Catch and parse the data.
Decoding the URL parameters for Google Maps pagination
Google Maps URLs contain the pb
parameter contains string-encoded Protobuf. The format is the same as for the data
parameter in the browser URL on Google Maps. It contains !
-separated values. There are several answers on StackOverflow, gists on GitHub, some blog posts about decoding, and even a kinda official guide on reverse engineering protobuf, but none of this touches pagination.
We tried to use pbtk
but it wasn't able to extract structures and crashed. Several attempts of reading pretty-printed obfuscated JavaScript didn't work.
After pairing with Milos, we found out most of the variables in Google Maps pagination URI: latitude
, longitude
, altitude_in_feets
, pagination_offset
, some parameter that is equal to psi but I don't know what it's meaning
. psi
changes after each page reload and it's in window.APP_OPTIONS[11]
.
psi parameter in the window.APP_OPTIONS on Google Maps
Another moving part is a list of filters, but we don't know how to parse them.
List of filters in Google Maps URLs for pagination
We understood that we can make the first request for Google Maps, extract variables and construct pagination URI. Like for our API to scrape YouTube.
if offset
long ||= -73.91476977539236
lat ||= 40.68525694561602
alt ||= 120027.44487325678
offset ||= 40
psi ||= "b24JYPPGOoaJrwTXlbHACw"
google_query = "#{query_scheme_and_domain}/search?tbm=map&authuser=0&hl=en&gl=ua&pb=!4m12!1m3!1d#{alt}!2d#{lat}!3d#{long}!2m3!1f0!2f0!3f0!3m2!1i1920!2i549!4f13.1!7i20!8i#{offset}!10b1!12m8!1m1!18b1!2m3!5m1!6e2!20e3!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1s#{psi}!2s1i%3A2%2Ct%3A12696%2Ce%3A1%2Cp%3A#{psi}%3A1273!7e81!24m56!1m16!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m7!3b1!4b1!5b1!6b1!9b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i549!1m6!1m2!1i1870!2i0!2m2!1i1920!2i549!1m6!1m2!1i0!2i0!2m2!1i1920!2i20!1m6!1m2!1i0!2i529!2m2!1i1920!2i549!31b1!34m16!2b1!3b1!4b1!6b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m73!1m68!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNUJKBg!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNYJKBk!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNcJKBo!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNgJKBs!10m2!1m1!1e2!2m7!1u16!4sVisited!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNkJKBw!10m2!16m1!1e1!2m7!1u16!4sHaven%27t+visited!5e1!9s0ahUKEwiPrMCgia3uAhUjlYsKHanUBmYQ_KkBCNoJKB0!10m2!16m1!1e2!3m1!1u3!3m1!1u2!3m2!1u1!3e1!3m11!1u16!2m4!1m2!16m1!1e1!2sVisited!2m4!1m2!16m1!1e2!2sHaven%27t+visited!4BIAE!2e2!3m2!1b1!3b1!59BQ2dBd0Fn!65m0!69i540&q=#{safe_query}&tch=1&ech=1&psi=#{psi}.1611230833287.1"
end
The quick check confirmed that the algorithm works.
$ bundle exec rails runner 'puts Search.new(engine: :google_maps, q: "coffee", lat: 36.3996184, long: -113.9511419, alt: 2124931.1267513777, offset: 20, psi: "rZkJYJuoINHmrgTP1IVI").query_randomized' | xargs curl -s -k -x $HTTP_PROXY -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.46' - > f.txt
# "https://www.google.com/search?tbm=map&authuser=0&hl=en&gl=ua&pb=!4m8!1m3!1d2124931.1267513777!2d36.3996184!3d-113.9511419!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1sgpYJYKSwMsH9rgT0_7nAAg!2zMWk6Mix0OjEyNjk2LGU6MSxwOmdwWUpZS1N3TXNIOXJnVDBfN25BQWc6MjM!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m17!2b1!3b1!4b1!6b1!7b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m72!1m68!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMgJKBY!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMkJKBc!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMoJKBg!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMsJKBk!10m2!1m1!1e2!2m7!1u16!4sVisited!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCMwJKBo!10m2!16m1!1e1!2m7!1u16!4sHaven%27t+visited!5e1!9s0ahUKEwizhPrmpK3uAhXBvosKHfR_DigQ_KkBCM0JKBs!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2sVisited!2m4!1m2!16m1!1e2!2sHaven%27t+visited!3m1!1u2!3m2!1u1!3e0!3m1!1u3!4BIAE!2e2!3m1!3b1!59BQ2dBd0Fn!65m0!69i540&q=coffee&tch=1&ech=1&psi=rZkJYJuoINHmrgTP1IVI.1611241093531.1"
# Hit next page in browser, copy URL and curl it.
$ curl -s -k -x $HTTP_PROXY -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.46' 'https://www.google.com/search?tbm=map&authuser=0&hl=en&gl=ua&pb=!4m8!1m3!1d2124931.1267513777!2d-113.9511419!3d36.3996184!3m2!1i1024!2i768!4f13.1!7i20!8i20!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1srZkJYJuoINHmrgTP1IVI!2zMWk6Mix0OjEyNjk2LGU6MSxwOnJaa0pZSnVvSU5IbXJnVFAxSVZJOjIz!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m17!2b1!3b1!4b1!6b1!7b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m72!1m68!2m7!1u3!4sOpen+now!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCN8KKBY!10m2!3m1!1e1!2m7!1u2!4sTop+rated!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOAKKBc!10m2!2m1!1e1!2m7!1u1!4sCheap!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOEKKBg!10m2!1m1!1e1!2m7!1u1!4sUpscale!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOIKKBk!10m2!1m1!1e2!2m7!1u16!4sVisited!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOMKKBo!10m2!16m1!1e1!2m7!1u16!4sHaven%27t+visited!5e1!9s0ahUKEwig-8Lpp63uAhVRs4sKHU9qAQkQ_KkBCOQKKBs!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2sVisited!2m4!1m2!16m1!1e2!2sHaven%27t+visited!3m1!1u2!3m2!1u1!3e0!3m1!1u3!4BIAE!2e2!3m1!3b1!59BQ2dBd0Fn!65m0!69i540&q=coffee&tch=1&ech=1&psi=rZkJYJuoINHmrgTP1IVI.1611241905488.1' > correct.txt
# Compare f.txt and correct.txt in a text editor - they almost the same.
We didn't want to add new parameters (long
, lat
, alt
) to our API for pagination specifically, so we tried to found ways to convert alt
from zoom. But those formulas don't equal the altitude in pagination URLs that Google Maps use.
Also, altitude depends on the number of pixels per inch which is different on different devices, and Google re-scales map to fit all places on the map. (This was irrelevant actually). Milos combined multiple formulas from the JS code in Google Maps to the formula to convert altitude to zoom for the given latitude.
const EARTH_RADIUS_IN_METERS = 6371010
const TILE_SIZE = 256
const SCREEN_PIXEL_HEIGHT = 768
var zoom = (altitude, latitude) => Math.log(1 / Math.tan(Math.PI / 180 * 13.1 / 2) * (SCREEN_PIXEL_HEIGHT / 2) * 2 * Math.PI / (TILE_SIZE * altitude / (EARTH_RADIUS_IN_METERS * Math.cos(Math.PI / 180 * latitude)))) / Math.LN2;
zoom(24182.00605141337, 40.7455096);
// => 14
The last missing part is the reverse formula.
Convert zoom to an altitude
We've simplified the zoom formula in Wolfram Alpha.
The initial formula for zoom = f(alt, lat)
The simplified formula in Wolfram Alpha. Pretty nice, huh?
After several hours of reading middle-school math books on logarithmic equations, we reversed the formula.
The Ruby code of alt = f(zoom, lat)
:
EARTH_RADIUS_IN_METERS = 6371010
TILE_SIZE = 256
SCREEN_PIXEL_HEIGHT = 768
RADIUS_X_PIXEL_HEIGHT = 27.3611 * EARTH_RADIUS_IN_METERS * SCREEN_PIXEL_HEIGHT
def altitude(zoom, latitude)
(RADIUS_X_PIXEL_HEIGHT * Math.cos((latitude * Math::PI) / 180)) / ((2 ** zoom) * TILE_SIZE)
end
Code to generate pb
parameter
The pb
parameter for the Google Maps pagination is a function of ll
from the URL and theoffset
.
ll
can contain negative and positive latitude
, longitude
, and zoom
. We decided to extract those parameters with regular expression.
PAGINATION_PARAMETERS_REGEX = %r{
\A # Start of string
(?:\s*) # initial possible whitespace
@(?<latitude>[-+]?\d{1,2}(?:[.,]\d+)?) # latitude: @10.78472
(?:\s*,\s*) # separator between latitude and longitude
(?<longitude>[-+]?\d{1,3}(?:[.,]\d+)?) # longitude: @-110
(?:\s*,\s*) # separator between longitude and zoom
(?<zoom>\d{1,2}(?:[.,]\d+)?)z # zoom: 9.22
\z # End of string
}x
Conversion of the ll
URL parameter to pb
for the specific results offset (start
) on Google Maps looks like this.
def pagination(ll, start)
extracted_parameters = ll.match(PAGINATION_PARAMETERS_REGEX)
return "" unless extracted_parameters
"!4m8!1m3!1d" +
altitude(extracted_parameters[:zoom].to_f, extracted_parameters[:latitude].to_f) +
"!2d" +
extracted_parameters[:longitude] +
"!3d" +
extracted_parameters[:latitude] +
"!3m2!1i1024!2i768!4f13.1!7i20!8i" +
(start || "0") +
"!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1s!2z!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m16!2b1!3b1!4b1!6b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m53!1m49!2m7!1u3!4s!5e1!9s!10m2!3m1!1e1!2m7!1u2!4s!5e1!9s!10m2!2m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2s!2m4!1m2!16m1!1e2!2s!3m1!1u2!3m1!1u3!4BIAE!2e2!3m1!3b1!59B!65m0!69i540"
end
Putting all together
# https://regex101.com/r/nOoiJ6/2
PAGINATION_PARAMETERS_REGEX = %r{
\A # Start of string
(?:\s*) # initial possible whitespace
@(?<latitude>[-+]?\d{1,2}(?:[.,]\d+)?) # latitude: @10.78472
(?:\s*,\s*) # separator between latitude and longitude
(?<longitude>[-+]?\d{1,3}(?:[.,]\d+)?) # longitude: @-110
(?:\s*,\s*) # separator between longitude and zoom
(?<zoom>\d{1,2}(?:[.,]\d+)?)z # zoom: 9.22
\z # End of string
}x
EARTH_RADIUS_IN_METERS = 6371010
TILE_SIZE = 256
SCREEN_PIXEL_HEIGHT = 768
RADIUS_X_PIXEL_HEIGHT = 27.3611 * EARTH_RADIUS_IN_METERS * SCREEN_PIXEL_HEIGHT
def pagination(ll, start)
extracted_parameters = ll.match(PAGINATION_PARAMETERS_REGEX)
return "" unless extracted_parameters
"!4m8!1m3!1d" +
altitude(extracted_parameters[:zoom].to_f, extracted_parameters[:latitude].to_f) +
"!2d" +
extracted_parameters[:longitude] +
"!3d" +
extracted_parameters[:latitude] +
"!3m2!1i1024!2i768!4f13.1!7i20!8i" +
(start || "0") +
"!10b1!12m25!1m1!18b1!2m3!5m1!6e2!20e3!6m16!4b1!23b1!26i1!27i1!41i2!45b1!49b1!63m0!67b1!73m0!74i150000!75b1!89b1!105b1!109b1!110m0!10b1!16b1!19m4!2m3!1i360!2i120!4i8!20m65!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i240!7m50!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!22m3!1s!2z!7e81!24m55!1m15!13m7!2b1!3b1!4b1!6i1!8b1!9b1!20b0!18m6!3b1!4b1!5b1!6b1!13b0!14b0!2b1!5m5!2b1!3b1!5b1!6b1!7b1!10m1!8e3!14m1!3b1!17b1!20m4!1e3!1e6!1e14!1e15!24b1!25b1!26b1!29b1!30m1!2b1!36b1!43b1!52b1!54m1!1b1!55b1!56m2!1b1!3b1!65m5!3m4!1m3!1m2!1i224!2i298!89b1!26m4!2m3!1i80!2i92!4i8!30m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!34m16!2b1!3b1!4b1!6b1!8m4!1b1!3b1!4b1!6b1!9b1!12b1!14b1!20b1!23b1!25b1!26b1!37m1!1e81!42b1!46m1!1e9!47m0!49m1!3b1!50m53!1m49!2m7!1u3!4s!5e1!9s!10m2!3m1!1e1!2m7!1u2!4s!5e1!9s!10m2!2m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e1!2m7!1u16!4s!5e1!9s!10m2!16m1!1e2!3m11!1u16!2m4!1m2!16m1!1e1!2s!2m4!1m2!16m1!1e2!2s!3m1!1u2!3m1!1u3!4BIAE!2e2!3m1!3b1!59B!65m0!69i540"
end
def altitude(zoom, latitude)
((RADIUS_X_PIXEL_HEIGHT * Math.cos((latitude * Math::PI) / 180)) / ((2 ** zoom) * TILE_SIZE)).to_s
end
Parse the paginated data
We already have an API to scrape Google Maps that extracts data from the inline JavaScript in the HTML. Milos refactored it to support extraction from inline JS in HTML and from pagination responses. What we can say here is our parser gets the data from the deeply nested arrays and objects.
It's possible to extract data from complex single-page applications without browser automation. For us, it's more fun to understand how the scraped website works instead of tuning timeouts in waitFor
function calls. It also runs faster and is simpler to maintain. If this is something that excites you, we'd love for you to join us.
Top comments (0)