<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: James McPherson</title>
    <description>The latest articles on DEV Community by James McPherson (@jmcp).</description>
    <link>https://dev.to/jmcp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F155090%2Fdb29d00b-5029-4e09-bb86-8d2b206f8c21.jpeg</url>
      <title>DEV Community: James McPherson</title>
      <link>https://dev.to/jmcp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jmcp"/>
    <language>en</language>
    <item>
      <title>I know, I'll use a regex!</title>
      <dc:creator>James McPherson</dc:creator>
      <pubDate>Fri, 14 Aug 2020 20:48:53 +0000</pubDate>
      <link>https://dev.to/jmcp/i-know-i-ll-use-a-regex-6go</link>
      <guid>https://dev.to/jmcp/i-know-i-ll-use-a-regex-6go</guid>
      <description>&lt;p&gt;&lt;em&gt;This post appeared first on my blog, at &lt;a href="https://www.jmcpdotcom.com/blog/posts/2020-08-08-i-know-ill-use-a-regex/"&gt;https://www.jmcpdotcom.com/blog/posts/2020-08-08-i-know-ill-use-a-regex/&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This past week, a colleague asked me for help with a shell script that he had come across while investigating how we run one of our data ingestion pipelines. The shell script was  designed to clean input CSV files if they had lines which didn't match a specific pattern.&lt;/p&gt;

&lt;p&gt;Now to start with, the script was run over a directory and used a &lt;em&gt;very&lt;/em&gt; gnarly bit of shell &lt;a href="https://en.wikipedia.org/wiki/Glob_(programming)"&gt;globbing&lt;/a&gt; to generate a list of files in a subdirectory. That list was then iterated over to check for a &lt;code&gt;.csv&lt;/code&gt; extension. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Please save your eye-rolls and "but couldn't they..." for later&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once that list of files had been weeded to only contain CSVs, each of those files was catted and read line by line to see if the line matched a desired pattern - using shell regular expression parsing. If the line did not match the pattern, it was deleted. The matching lines were then written to a new file.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Again, please save your eye-rolls and "but couldn't they..." for later&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The klaxons went off for my colleague when he saw the regex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;    &lt;span class="nv"&gt;NEW&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;f&lt;/span&gt;&lt;span class="p"&gt;%.csv&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;_clean.csv&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="nv"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
      &lt;span class="nb"&gt;read
      &lt;/span&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; line &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="k"&gt;do
            &lt;/span&gt;&lt;span class="nv"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;buffer&lt;/span&gt;&lt;span class="k"&gt;}${&lt;/span&gt;&lt;span class="nv"&gt;line&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$buffer&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;~ ^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;0-9]&lt;span class="o"&gt;{&lt;/span&gt;4&lt;span class="o"&gt;}&lt;/span&gt;-&lt;span class="o"&gt;([&lt;/span&gt;0][0-9]|1[0-2]&lt;span class="o"&gt;)&lt;/span&gt;-&lt;span class="o"&gt;([&lt;/span&gt;0-2][0-9]|3[01]&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span 
class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,[^,]&lt;span class="k"&gt;*&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span 
class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span 
class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,.&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;then
                  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$buffer&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
                  &lt;span class="nv"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
            &lt;span class="k"&gt;else
                  &lt;/span&gt;&lt;span class="nv"&gt;buffer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;buffer&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; "&lt;/span&gt;
            &lt;span class="k"&gt;fi
      done&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt; &amp;lt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;f&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NEW&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My eyes got whiplash. To make it easier to understand, let's put each element of the pattern on a single line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;    ^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;0-9]&lt;span class="o"&gt;{&lt;/span&gt;4&lt;span class="o"&gt;}&lt;/span&gt;-&lt;span class="o"&gt;([&lt;/span&gt;0][0-9]|1[0-2]&lt;span class="o"&gt;)&lt;/span&gt;-&lt;span class="o"&gt;([&lt;/span&gt;0-2][0-9]|3[01]&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="o"&gt;[&lt;/span&gt;^,]&lt;span class="k"&gt;*&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    &lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;^&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;,
    .&lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which is really something. The first field matches a date format - "yyyy-mm-dd" (which is ok), then we have 12 fields where we care that they are enclosed in double quotes, one field that we want to &lt;em&gt;not&lt;/em&gt; be quoted, another 12 fields which are quoted again, and any other fields we don't care about.&lt;/p&gt;

&lt;p&gt;Wow. &lt;/p&gt;
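&lt;p&gt;To make that field-by-field reading concrete, here is a minimal Python sketch (not part of the original script) that assembles the same pattern from three building blocks. The names &lt;code&gt;DATE&lt;/code&gt;, &lt;code&gt;QUOTED&lt;/code&gt;, &lt;code&gt;UNQUOTED&lt;/code&gt; and &lt;code&gt;row_matches&lt;/code&gt; are made up for illustration, and it assumes Python's &lt;code&gt;re&lt;/code&gt; module treats this particular pattern the same way the shell does.&lt;/p&gt;

```python
import re

# Building blocks taken from the shell pattern above.
DATE = r'"[0-9]{4}-(0[0-9]|1[0-2])-([0-2][0-9]|3[01])"'
QUOTED = r'"[^"]*"'
UNQUOTED = r'[^,]*'

# A date, 12 quoted fields, 1 unquoted field, 12 quoted fields, then anything.
parts = [DATE] + [QUOTED] * 12 + [UNQUOTED] + [QUOTED] * 12
ROW = re.compile("^" + ",".join(parts) + ",.*$")


def row_matches(line):
    """Return True when the line has the shape the shell regex expects."""
    return ROW.match(line) is not None
```

&lt;p&gt;Writing it this way at least makes the 12 / 1 / 12 layout visible at a glance, although it is still the same regex underneath.&lt;/p&gt;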

&lt;p&gt;I told my colleague that this wasn't a good way of doing things (he agreed).&lt;/p&gt;

&lt;p&gt;There are better ways to achieve this, so let's walk through them.&lt;/p&gt;

&lt;p&gt;Firstly, the shell &lt;a href="https://en.wikipedia.org/wiki/Glob_(programming)"&gt;globbing&lt;/a&gt;. There's a Unix command to generate a list of filesystem entries which match particular criteria. It's called &lt;a href="https://www.gnu.org/software/findutils/manual/html_mono/find.html"&gt;find&lt;/a&gt;. If we want a list of files which have a 'csv' extension we do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;    &lt;span class="nv"&gt;$ &lt;/span&gt;find DIR &lt;span class="nt"&gt;-type&lt;/span&gt; f &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="se"&gt;\*&lt;/span&gt;.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can use '.' or '*' or any way of representing a DIRectory in the filesystem.&lt;/p&gt;

&lt;p&gt;Now since we want this in a list to iterate over, let's put it in a variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;    &lt;span class="nv"&gt;$ CSVfiles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt; find DIR &lt;span class="nt"&gt;-type&lt;/span&gt; f &lt;span class="se"&gt;\(&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="se"&gt;\*&lt;/span&gt;.csv &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="se"&gt;\*&lt;/span&gt;.CSV &lt;span class="se"&gt;\)&lt;/span&gt; &lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(You can redirect stderr to /dev/null, with &lt;em&gt;2&amp;gt;/dev/null&lt;/em&gt; inside the parens if you'd like).&lt;/p&gt;
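&lt;p&gt;If you would rather build that list from Python than from the shell, a rough equivalent can be sketched with &lt;code&gt;pathlib&lt;/code&gt;. The function name &lt;code&gt;find_csv_files&lt;/code&gt; is hypothetical, and note one difference: this version accepts the extension in any mix of upper and lower case, not only &lt;code&gt;.csv&lt;/code&gt; and &lt;code&gt;.CSV&lt;/code&gt;.&lt;/p&gt;

```python
from pathlib import Path


def find_csv_files(top):
    """Return regular files under top whose extension is .csv, in any case."""
    return sorted(
        path
        for path in Path(top).rglob("*")
        if path.is_file() and path.suffix.lower() == ".csv"
    )
```

&lt;p&gt;Calling &lt;code&gt;find_csv_files(".")&lt;/code&gt; walks the current directory recursively, much as &lt;code&gt;find . -type f&lt;/code&gt; does.&lt;/p&gt;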

&lt;p&gt;Now that we've got our list, we can move to the second phase - removing lines which do not match our pattern. Let's try this first with &lt;a href="https://www.gnu.org/software/gawk"&gt;awk&lt;/a&gt;. Awk has the concept of a &lt;a href="https://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators"&gt;Field Separator&lt;/a&gt;, and since CSV files are Comma-&lt;em&gt;Separated&lt;/em&gt;-Value files, let's make use of that feature. We also know that we are only really interested in two fields - the first (yyyy-mm-dd) and the fourteenth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;    &lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;','&lt;/span&gt; &lt;span class="s1"&gt;'$1 ~ /"[0-9]{4}-([0][0-9]|1[0-2])-([0-2][0-9]|3[01])"/ &amp;amp;&amp;amp;
        $14 !~ /".*"/ {print}'&lt;/span&gt; &amp;lt; &lt;span class="nv"&gt;$old&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$new&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's still rather ugly but considerably easier to read. For the record, the bare ~ is awk's regular expression match operator, and !~ is its negation (does not match).&lt;/p&gt;

&lt;p&gt;We could also do this with grep, but at the cost of using more of that horrible regex. &lt;/p&gt;

&lt;p&gt;In my opinion a better method is to cons up a Python script for this validation purpose, and we don't need to use the CSV module.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

    &lt;span class="c1"&gt;# "r+" opens the file for reading and writing without truncating it.&lt;/span&gt;
    &lt;span class="n"&gt;infile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/path/to/file.csv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"r+"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# One read(), split into a list of lines without their newlines.&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;infile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;linecount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Deleting the loop variable would not change the list, so collect the keepers.&lt;/span&gt;
    &lt;span class="n"&gt;kept&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

        &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;","&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;togo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# The date field is quoted in the file, so strip the quotes first.&lt;/span&gt;
            &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strptime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'"'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"%Y-%m-%d"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;togo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="c1"&gt;# The fourteenth field (index 13) must be present, unquoted and numeric.&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="s"&gt;'"'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;isnumeric&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;togo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;togo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;kept&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kept&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;linecount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# We've modified the input, so have to write out a new version, but&lt;/span&gt;
        &lt;span class="c1"&gt;# let's overwrite our input file rather than creating a new instance.&lt;/span&gt;
        &lt;span class="n"&gt;infile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seek&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;infile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;writelines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;kept&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# The new content is shorter, so cut off what is left of the old content.&lt;/span&gt;
        &lt;span class="n"&gt;infile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;truncate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;infile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script is pretty close to how I would write it in C (could you tell?).&lt;/p&gt;

&lt;p&gt;We first open the file (for reading &lt;em&gt;and&lt;/em&gt; writing) and read in every line, which yields us a list. While it's not the most memory efficient way of approaching this problem, it does make processing more efficient because  it's one &lt;code&gt;read()&lt;/code&gt;, rather than one-read-per-line. We store the number of lines that we've read in for comparison at the end of our loop, and then start the processing. &lt;/p&gt;

&lt;p&gt;Since this is a &lt;a href="https://docs.python.org/3.8/library/csv.html"&gt;CSV&lt;/a&gt; file we know we can &lt;code&gt;split()&lt;/code&gt; on the comma, and having done so, we check that we can parse the first field as a date. We're not assigning the result of &lt;code&gt;datetime.strptime()&lt;/code&gt; to a variable because we only care that we &lt;em&gt;can&lt;/em&gt; parse it, rather than what the object's value is. The second check is that the fourteenth field contains no double quote character, and that its content is in fact numeric. If either of these checks fails, we know to delete the line from our input.&lt;/p&gt;

&lt;p&gt;Finally, if we have in fact had to delete any lines, we rewind our file (I was going to write pointer, but it's a File object. Told you it was close to C!) to the start, and write out each line of input with a newline character before closing the file.&lt;/p&gt;
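&lt;p&gt;One side effect of the date check is worth spelling out: &lt;code&gt;strptime()&lt;/code&gt; rejects calendar dates that do not exist, which the regex character classes cannot do. A small sketch, with a hypothetical helper name &lt;code&gt;is_valid_date&lt;/code&gt; and assuming the field may still carry its surrounding double quotes:&lt;/p&gt;

```python
from datetime import datetime


def is_valid_date(field):
    """Return True when field, minus surrounding double quotes, is a real date."""
    try:
        datetime.strptime(field.strip('"'), "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

&lt;p&gt;With this helper, &lt;code&gt;is_valid_date('"2020-02-31"')&lt;/code&gt; is false, whereas the date portion of the shell pattern would happily accept the 31st of February.&lt;/p&gt;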

&lt;p&gt;Whenever I think about regexes, &lt;em&gt;especially&lt;/em&gt; the ones I've written in C over the years, I think about this quote which &lt;a href="http://regex.info/blog/2006-09-15/247"&gt;Jeffrey Friedl&lt;/a&gt; wrote about a long time ago:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Some people, when confronted with a problem, think
“I know, I'll use regular expressions.”

Now they have two problems. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It was true when I first heard it some time during my first year of uni, and still true today. &lt;/p&gt;

</description>
      <category>regex</category>
      <category>shell</category>
    </item>
    <item>
      <title>A shell and GNU awk oneliner</title>
      <dc:creator>James McPherson</dc:creator>
      <pubDate>Tue, 28 Jul 2020 22:56:27 +0000</pubDate>
      <link>https://dev.to/jmcp/a-shell-and-gnu-awk-oneliner-2l6e</link>
      <guid>https://dev.to/jmcp/a-shell-and-gnu-awk-oneliner-2l6e</guid>
      <description>&lt;p&gt;I'm updating a test suite for $work, and need to handle input data with differing numbers of columns. The input is CSV files, which is fairly friendly. The number of files I've got to test my code against is extensive and I wanted to quickly figure out how many of each type I have.&lt;/p&gt;

&lt;p&gt;Enter the one-liner:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ for f in *csv; do awk -F, 'END {print ARGV[ARGC-1], ":\t", NF}' "$f"; done
file1:      10
file2:      11
file3:      14
file4:      10
....
file100:     14
$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For each file in the list, once GNU awk has read it all in and split each line using , as the field delimiter (the &lt;code&gt;END&lt;/code&gt; block runs after the last record), we print the filename (&lt;code&gt;ARGV[ARGC-1]&lt;/code&gt;) and the number of fields, &lt;code&gt;NF&lt;/code&gt;, in the last record. If I wanted to print the count of the number of records (lines) in the file, I could add &lt;code&gt;NR&lt;/code&gt; to the arguments to print.&lt;/p&gt;
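&lt;p&gt;For comparison, the same count can be sketched in Python. The function name &lt;code&gt;last_record_field_count&lt;/code&gt; is made up for this example, and it assumes UTF-8 text files and a plain split on the delimiter, with no handling of quoted commas, just like &lt;code&gt;awk -F,&lt;/code&gt;.&lt;/p&gt;

```python
def last_record_field_count(path, delimiter=","):
    """Return the number of delimiter-separated fields on the last line of path."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            stripped = line.rstrip("\n")
            # An empty line has no fields, matching awk's NF of 0.
            count = len(stripped.split(delimiter)) if stripped else 0
    return count
```

&lt;p&gt;Looping over the files and printing the name next to &lt;code&gt;last_record_field_count(name)&lt;/code&gt; gives the same kind of listing as the one-liner, at the cost of a few more lines.&lt;/p&gt;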

&lt;p&gt;I hope you found this useful.&lt;/p&gt;

&lt;p&gt;If you want to explore GNU Awk more, the documentation is at &lt;a href="https://www.gnu.org/software/gawk/manual/"&gt;https://www.gnu.org/software/gawk/manual&lt;/a&gt;&lt;/p&gt;

</description>
      <category>quick</category>
      <category>shell</category>
      <category>awk</category>
    </item>
    <item>
      <title>A (not so) brief introduction to using the Python database API</title>
      <dc:creator>James McPherson</dc:creator>
      <pubDate>Sun, 28 Jun 2020 03:55:11 +0000</pubDate>
      <link>https://dev.to/jmcp/a-not-so-brief-introduction-to-using-the-python-database-api-5cp7</link>
      <guid>https://dev.to/jmcp/a-not-so-brief-introduction-to-using-the-python-database-api-5cp7</guid>
      <description>&lt;p&gt;I've had this post in the works for a while now, and I'm pleased to say that I've finally hit 'publish'. &lt;/p&gt;

&lt;p&gt;Earlier this year I promised a friend at #PyladiesBne that I would write something about using databases with #Python, specifically #Oracle and #PostgreSQL. Here it is: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.jmcpdotcom.com/blog/posts/2020-06-03-a-brief-introduction-to-the-python-database-api/"&gt;https://www.jmcpdotcom.com/blog/posts/2020-06-03-a-brief-introduction-to-the-python-database-api/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's not actually brief, because I got sidetracked. You know how it is when you have this dataset and realise that you can ask more questions than you expected....&lt;/p&gt;

&lt;p&gt;That's just an occupational hazard for a #DataEngineer!&lt;/p&gt;

</description>
      <category>python</category>
      <category>database</category>
      <category>oracle</category>
      <category>postgres</category>
    </item>
    <item>
      <title>How I started learning Apache Spark</title>
      <dc:creator>James McPherson</dc:creator>
      <pubDate>Tue, 15 Oct 2019 03:26:22 +0000</pubDate>
      <link>https://dev.to/jmcp/how-i-started-learning-apache-spark-3g8o</link>
      <guid>https://dev.to/jmcp/how-i-started-learning-apache-spark-3g8o</guid>
      <description>&lt;p&gt;I've realised over the years that the best way for me to start learning a new language, toolkit or technology is to dive right in and start trying to solve problems with it.&lt;/p&gt;

&lt;p&gt;This is most definitely true for &lt;a href="https://spark.apache.org"&gt;Apache Spark&lt;/a&gt;, which I had to learn recently in order to prepare for a #DataScience interview.&lt;/p&gt;

&lt;p&gt;I wrote a utility to Extract information from my 6+ years of PV Inverter data, Transform it and Load it (#ETL) into #DataFrames, which I query for record dates, minimum and maximum output, and daily average output. In keeping with my standard practice, I've put that code on &lt;a href="https://github.com/jmcp/solar-spark"&gt;GitHub&lt;/a&gt;, and written a blog post about the process. See more (much more!) at &lt;a href="https://www.jmcpdotcom.com/blog/posts/2019-10-11-apache-spark-init/"&gt;https://www.jmcpdotcom.com/blog/posts/2019-10-11-apache-spark-init/&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Apache #Spark, #ETL, #Python
&lt;/h1&gt;

</description>
      <category>apachespark</category>
      <category>etl</category>
      <category>python</category>
    </item>
    <item>
      <title>Sentiment Analysis of Australian political hashtags</title>
      <dc:creator>James McPherson</dc:creator>
      <pubDate>Mon, 14 Oct 2019 22:25:11 +0000</pubDate>
      <link>https://dev.to/jmcp/sentiment-analysis-of-australian-political-hashtags-456k</link>
      <guid>https://dev.to/jmcp/sentiment-analysis-of-australian-political-hashtags-456k</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OOmWdV4Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/qkil6en1msxmeb75ilsh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OOmWdV4Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/qkil6en1msxmeb75ilsh.png" alt="Sentiment Analysis"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's time for another post from my &lt;a href="https://www.jmcpdotcom.com/blog"&gt;home blog&lt;/a&gt;. This is another #Python #microservice, which ties together some #NaturalLanguageProcessing (using the &lt;a href="https://www.nltk.org"&gt;nltk toolkit&lt;/a&gt;), the @Twitter search API and #SentimentAnalysis with some basic #MachineLearning. To visualise all this I used the &lt;a href="https://c3js.org"&gt;C3.js&lt;/a&gt; charting library, and put together some more JavaScript for periodic (30 second) and flowing visuals.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.jmcpdotcom.com/blog/posts/2019-10-04-microservices-part-2/"&gt;https://www.jmcpdotcom.com/blog/posts/2019-10-04-microservices-part-2/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm going to return to this project shortly, to update the tweet source corpus. The source I've used for this project is based on British tweets, and while Australia and Britain share many things, I'm not completely confident that our sarcasm and cynicism are reflected in the corpus.&lt;/p&gt;

</description>
      <category>sentimentanalysis</category>
      <category>python</category>
      <category>microservices</category>
    </item>
    <item>
      <title>A microservice making electorate info more accessible</title>
      <dc:creator>James McPherson</dc:creator>
      <pubDate>Fri, 04 Oct 2019 02:32:17 +0000</pubDate>
      <link>https://dev.to/jmcp/a-microservice-making-electorate-info-more-accessible-1d9k</link>
      <guid>https://dev.to/jmcp/a-microservice-making-electorate-info-more-accessible-1d9k</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F8yms4l8ktj52iyw40ixe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F8yms4l8ktj52iyw40ixe.png" alt="the results page"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Over at my &lt;a href="https://www.jmcpdotcom.com/blog" rel="noopener noreferrer"&gt;home blog&lt;/a&gt; I've written about a #Python #microservice I created to make the task of finding your Australian electorate more accessible. It covers both Federal divisions and state+territory lower house electorates. I also go into detail about the #ETL process I went through to get the data I needed.&lt;/p&gt;

&lt;p&gt;See more (a lot more!) at &lt;a href="https://www.jmcpdotcom.com/blog/posts/2019-09-27-microservices-part-1/" rel="noopener noreferrer"&gt;A microservice for electorate maps&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Flask @googlemaps #webscraping #BeautifulSoup #GeoJSON #KML
&lt;/h1&gt;

</description>
      <category>microservices</category>
      <category>python</category>
      <category>flask</category>
      <category>etl</category>
    </item>
  </channel>
</rss>
