<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Moura</title>
    <description>The latest articles on DEV Community by Daniel Moura (@dcmoura).</description>
    <link>https://dev.to/dcmoura</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F615877%2Faf82300b-dfcf-410e-908c-720d409e8c42.jpeg</url>
      <title>DEV Community: Daniel Moura</title>
      <link>https://dev.to/dcmoura</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dcmoura"/>
    <language>en</language>
    <item>
      <title>Command-line data analytics made easy with SPyQL</title>
      <dc:creator>Daniel Moura</dc:creator>
      <pubDate>Sun, 06 Nov 2022 11:08:17 +0000</pubDate>
      <link>https://dev.to/dcmoura/command-line-data-analytics-made-easy-with-spyql-2akd</link>
      <guid>https://dev.to/dcmoura/command-line-data-analytics-made-easy-with-spyql-2akd</guid>
      <description>&lt;p&gt;The command-line is incredibly powerful when it comes to data processing. Still, many of us working with data do not take advantage of it. I can think of some reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Poor readability&lt;/strong&gt;: the focus is on minimising how much you need to type and not so much on how readable a sequence of commands is;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steep learning curve&lt;/strong&gt;: many commands, with many options;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Looks outdated:&lt;/strong&gt; some of these tools are around since the 70s and target delimited text files (and not modern formats like JSON and YAML).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These motivated me to write a command-line tool that focus on readability, easiness to learn and modern data formats, while leveraging the command-line ecosystem. On top of that, it also leverages the Python ecosystem! Meet &lt;a href="https://github.com/dcmoura/spyql" rel="noopener noreferrer"&gt;SPyQL - SQL with Python in the middle&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fromtimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;purchase_ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;purchase_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;sum_agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'my_purchases.csv'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;department&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s1"&gt;'IT'&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;purchase_ts&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;Null&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  SPyQL in Action
&lt;/h2&gt;

&lt;p&gt;I think the best way for getting to know SPyQL and getting comfortable with the command-line is to open a terminal and solve a problem. In this case, we will try to understand the geographical distribution of cell towers. Let's start!&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;Let's start by installing SPyQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; spyql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and check its version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="nt"&gt;--version&lt;/span&gt;
spyql, version 0.8.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's also install &lt;a href="https://github.com/dcmoura/matplotcli" rel="noopener noreferrer"&gt;MatplotCLI&lt;/a&gt;, a utility for creating plots from the command-line that leverages &lt;a href="https://matplotlib.org" rel="noopener noreferrer"&gt;Matplotlib&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; matplotcli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we will download some sample data (you can alternatively copy-paste the URL to your browser and download the file from there):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;wget https://raw.githubusercontent.com/dcmoura/blogposts/master/spyql_cell_towers/sample.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This CSV file contains data about cell towers that were added to the &lt;a href="https://www.opencellid.org/" rel="noopener noreferrer"&gt;OpenCellid&lt;/a&gt; database on 2022 September 10 (&lt;code&gt;OCID-diff-cell-export-2022-09-10-T000000.csv&lt;/code&gt; file from the &lt;a href="https://www.opencellid.org/" rel="noopener noreferrer"&gt;OpenCellid&lt;/a&gt; project redistributed without modifications under the &lt;a href="https://creativecommons.org/licenses/by-sa/4.0/" rel="noopener noreferrer"&gt;Creative Commons Attribution-ShareAlike 4.0 International License&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Inspecting the data
&lt;/h3&gt;

&lt;p&gt;Let's look to the data by getting the first 3 lines of the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-3&lt;/span&gt; sample.csv
radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal
GSM,262,2,852,2521,0,10.948628,50.170324,15762,200,1,1294561074,1662692508,0
GSM,262,2,852,2501,0,10.940241,50.174076,10591,200,1,1294561074,1662692508,0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You could do this same operation with SPyQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"SELECT * FROM csv LIMIT 2"&lt;/span&gt; &amp;lt; sample.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"SELECT * FROM csv('sample.csv') LIMIT 2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that we are telling to get 2 rows of data and not 3 rows of the file (where the first is the header). &lt;/p&gt;

&lt;p&gt;One advantage of SPyQL is that we can change the output format easily. Let's change the output to JSON and look to the first record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"SELECT * FROM csv('sample.csv') LIMIT 1 
TO json(indent=2)"&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"radio"&lt;/span&gt;: &lt;span class="s2"&gt;"GSM"&lt;/span&gt;,
  &lt;span class="s2"&gt;"mcc"&lt;/span&gt;: 262,
  &lt;span class="s2"&gt;"net"&lt;/span&gt;: 2,
  &lt;span class="s2"&gt;"area"&lt;/span&gt;: 852,
  &lt;span class="s2"&gt;"cell"&lt;/span&gt;: 2521,
  &lt;span class="s2"&gt;"unit"&lt;/span&gt;: 0,
  &lt;span class="s2"&gt;"lon"&lt;/span&gt;: 10.948628,
  &lt;span class="s2"&gt;"lat"&lt;/span&gt;: 50.170324,
  &lt;span class="s2"&gt;"range"&lt;/span&gt;: 15762,
  &lt;span class="s2"&gt;"samples"&lt;/span&gt;: 200,
  &lt;span class="s2"&gt;"changeable"&lt;/span&gt;: 1,
  &lt;span class="s2"&gt;"created"&lt;/span&gt;: 1294561074,
  &lt;span class="s2"&gt;"updated"&lt;/span&gt;: 1662692508,
  &lt;span class="s2"&gt;"averageSignal"&lt;/span&gt;: 0
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Querying the data
&lt;/h3&gt;

&lt;p&gt;Let's first count how many records we have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"SELECT count_agg(*) AS n FROM csv('sample.csv')"&lt;/span&gt;
n
45745
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that aggregation functions have the suffix &lt;code&gt;_agg&lt;/code&gt; to avoid conflicts with Python's functions like &lt;code&gt;min&lt;/code&gt;, &lt;code&gt;max&lt;/code&gt; and &lt;code&gt;sum&lt;/code&gt;.&lt;br&gt;
Now, let's count how many cell towers we have by radio type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"SELECT radio, count_agg(*) AS n 
FROM csv('sample.csv') GROUP BY 1 ORDER BY 2 DESC TO pretty"&lt;/span&gt;
radio        n
&lt;span class="nt"&gt;-------&lt;/span&gt;  &lt;span class="nt"&gt;-----&lt;/span&gt;
GSM      31549
LTE      12996
UMTS      1182
CDMA        16
NR           2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the pretty printing output. We can plot the above results by setting the output format to JSON and piping results into &lt;a href="https://github.com/dcmoura/matplotcli" rel="noopener noreferrer"&gt;MatplotCLI&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"SELECT radio, count_agg(*) AS n 
FROM csv('sample.csv') GROUP BY 1 ORDER BY 2 DESC TO json"&lt;/span&gt; |
plt &lt;span class="s2"&gt;"bar(radio, n)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt75i5y04hdk576hxbpb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt75i5y04hdk576hxbpb.png" alt="Matplolib plot created by MatplotCLI using the output of a SPyQL query"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How easy was that? &lt;strong&gt;:-)&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Querying the data: a more complex example
&lt;/h3&gt;

&lt;p&gt;Now, let's get the top 5 countries with more cell towers added on that day:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"SELECT mcc, count_agg(*) AS n 
FROM csv('sample.csv') GROUP BY 1 ORDER BY 2 DESC LIMIT 5 
TO pretty"&lt;/span&gt;
  mcc      n
&lt;span class="nt"&gt;-----&lt;/span&gt;  &lt;span class="nt"&gt;-----&lt;/span&gt;
  262  24979
  440   5085
  208   4573
  310   2084
  311    799
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MCC stands for Mobile Country Code (262 is the code for Germany). The first digit of the MCC identifies the region. Here's an exert from &lt;a href="https://en.wikipedia.org/wiki/Mobile_country_code" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0: Test networks
2: Europe
3: North America and the Caribbean
4: Asia and the Middle East
5: Australia and Oceania
6: Africa
7: South and Central America
9: Worldwide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's copy paste the above list of regions and create a new file named &lt;code&gt;mcc_geo.txt&lt;/code&gt;. On the mac this is as easy as &lt;code&gt;$ pbpaste &amp;gt; mcc_geo.txt&lt;/code&gt;, but you can also paste this into a text editor and save it. &lt;/p&gt;

&lt;p&gt;Now, let's ask SPyQL to open this file as a CSV and print its contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"SELECT * FROM csv('mcc_geo.txt') TO pretty"&lt;/span&gt;
  col1  col2
&lt;span class="nt"&gt;------&lt;/span&gt;  &lt;span class="nt"&gt;-------------------------------&lt;/span&gt;
     0  Test networks
     2  Europe
     3  North America and the Caribbean
     4  Asia and the Middle East
     5  Australia and Oceania
     6  Africa
     7  South and Central America
     9  Worldwide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SPyQL detected that the separator is a colon and that the file has no header. We will use the &lt;em&gt;colN&lt;/em&gt; syntax to address the &lt;em&gt;Nth&lt;/em&gt; column. &lt;/p&gt;

&lt;p&gt;Now, let's create a single JSON object with as many key-value pairs as input rows. Let the 1st column of the input be the &lt;em&gt;key&lt;/em&gt; and the 2nd column be the &lt;em&gt;value&lt;/em&gt;, and save the result to a new file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"SELECT dict_agg(col1,col2) AS json 
FROM csv('mcc_geo.txt') TO json('mcc_geo.json', indent=2)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use &lt;code&gt;cat&lt;/code&gt; to inspect the output file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;mcc_geo.json                                               
&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"0"&lt;/span&gt;: &lt;span class="s2"&gt;"Test networks"&lt;/span&gt;,
  &lt;span class="s2"&gt;"2"&lt;/span&gt;: &lt;span class="s2"&gt;"Europe"&lt;/span&gt;,
  &lt;span class="s2"&gt;"3"&lt;/span&gt;: &lt;span class="s2"&gt;"North America and the Caribbean"&lt;/span&gt;,
  &lt;span class="s2"&gt;"4"&lt;/span&gt;: &lt;span class="s2"&gt;"Asia and the Middle East"&lt;/span&gt;,
  &lt;span class="s2"&gt;"5"&lt;/span&gt;: &lt;span class="s2"&gt;"Australia and Oceania"&lt;/span&gt;,
  &lt;span class="s2"&gt;"6"&lt;/span&gt;: &lt;span class="s2"&gt;"Africa"&lt;/span&gt;,
  &lt;span class="s2"&gt;"7"&lt;/span&gt;: &lt;span class="s2"&gt;"South and Central America"&lt;/span&gt;,
  &lt;span class="s2"&gt;"9"&lt;/span&gt;: &lt;span class="s2"&gt;"Worldwide "&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basically, we have aggregated all input lines into a Python dictionary, and then saved it as a JSON file. Try removing the &lt;code&gt;AS json&lt;/code&gt; alias from the &lt;code&gt;SELECT&lt;/code&gt; to understand why we need it :-) &lt;/p&gt;

&lt;p&gt;Now, let's get the statistics by region instead of by MCC. For this, we will load the JSON file that we just created (with the &lt;code&gt;-J&lt;/code&gt; option) and do a dictionary lookup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="nt"&gt;-Jgeo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;mcc_geo.json &lt;span class="s2"&gt;"SELECT geo[mcc//100] AS region,
count_agg(*) AS n FROM csv('sample.csv') GROUP BY 1 
ORDER BY 2 DESC TO pretty"&lt;/span&gt;
region                               n
&lt;span class="nt"&gt;-------------------------------&lt;/span&gt;  &lt;span class="nt"&gt;-----&lt;/span&gt;
Europe                           35601
Asia and the Middle East          5621
North America and the Caribbean   3247
Australia and Oceania              894
Africa                             381
South and Central America            1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We do an integer division by 100 to get the 1st digit of the MCC and then we lookup this digit on the JSON we just created (which is loaded as a Python dictionary). This how we do a JOIN in SPyQL, via a dictionary lookup :-)&lt;/p&gt;

&lt;h3&gt;
  
  
  Leveraging Python libs on your queries
&lt;/h3&gt;

&lt;p&gt;Another advantage of SPyQL is that we can leverage the Python ecosystem. Let's try to do some more geographical statistics. Let's count towers by H3 cell (resolution 5) for Europe. First, we need to install the &lt;a href="https://github.com/uber/h3-py" rel="noopener noreferrer"&gt;H3 lib&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; h3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we can convert latitude-longitude pairs into H3 cells, count how many towers we have by H3 cell, and save results into a CSV:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;spyql &lt;span class="s2"&gt;"IMPORT h3 SELECT h3.geo_to_h3(lat, lon, 5) AS cell,
count_agg(*) AS n FROM csv('sample.csv') WHERE mcc//100==2
GROUP BY 1 TO csv('towers_by_h3_res5.csv')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visualising these results is fairly simple with Kepler. Just go to &lt;a href="https://kepler.gl/demo" rel="noopener noreferrer"&gt;kepler.gl/demo&lt;/a&gt; and open the above file. You should see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd449mi56ctxevehmaj6z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd449mi56ctxevehmaj6z.png" alt="Kepler visualization of aggregations by H3 cell (resolution 5) from SPyQL"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;I hope you enjoyed SPyQL and that I could show you how simple it makes to query data from the command-line. In this first post about SPyQL, we are just scratching the surface. There is a lot more we can do. We have barely leveraged the Shell and Python ecosystems in this article. And we worked with a small file (SPyQL can handle GB-size files without compromising system resources). So, stay tuned!&lt;/p&gt;

&lt;p&gt;Try out SPyQL and reach back to me with your thoughts. Thank you!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;SPyQL repo: &lt;a href="https://github.com/dcmoura/spyql" rel="noopener noreferrer"&gt;github.com/dcmoura/spyql&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SPyQL documentation: &lt;a href="https://spyql.readthedocs.io" rel="noopener noreferrer"&gt;spyql.readthedocs.io&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MatplotCLI repo: &lt;a href="https://github.com/dcmoura/matplotcli" rel="noopener noreferrer"&gt;github.com/dcmoura/matplotcli&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;You can find me on &lt;a href="https://twitter.com/daniel_c_moura" rel="noopener noreferrer"&gt;Tweeter&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/dmoura/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://vimeo.com/danielcmoura" rel="noopener noreferrer"&gt;Vimeo&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;Article originally published at &lt;a href="https://danielcmoura.com/blog/2022/spyql-cell-towers/" rel="noopener noreferrer"&gt;danielcmoura.com&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>python</category>
      <category>tutorial</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
