<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: zchtodd</title>
    <description>The latest articles on DEV Community by zchtodd (@zchtodd).</description>
    <link>https://dev.to/zchtodd</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F225197%2F6fc0ce86-886b-4fdc-ad8d-85fe30f76b03.jpg</url>
      <title>DEV Community: zchtodd</title>
      <link>https://dev.to/zchtodd</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zchtodd"/>
    <language>en</language>
    <item>
      <title>#SQL30 Day 16: Dev Badges</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Fri, 18 Oct 2019 12:10:53 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-16-dev-badges-5hk4</link>
      <guid>https://dev.to/zchtodd/sql30-day-16-dev-badges-5hk4</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days.  In each post I'll describe &lt;strong&gt;my&lt;/strong&gt; solution to the last day's challenge.  I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments!  Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #16: Dev Badges
&lt;/h3&gt;

&lt;p&gt;If you post a lot on dev.to you might have received one of their nifty badges.  I just got my 4 week streak badge!  It wasn't long until I started thinking about how you could identify posting streaks in SQL.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;day16&lt;/strong&gt; schema contains a &lt;strong&gt;post&lt;/strong&gt; table with a handful of columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;id&lt;/li&gt;
&lt;li&gt;user_id&lt;/li&gt;
&lt;li&gt;created_at&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The challenge is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you write a query that will return all users that are currently eligible for the 4 week streak badge?  (At least one post per week for the last 4 weeks.)&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox Connection Details
&lt;/h3&gt;

&lt;p&gt;I have a PostgreSQL database ready for you to play with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J-Td0C2W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zpoed03yjhvjosbii3z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J-Td0C2W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zpoed03yjhvjosbii3z7.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The password is the same as the username!  To query the &lt;strong&gt;post&lt;/strong&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day16&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Solution to Challenge #15: Fuzzy Matching
&lt;/h3&gt;

&lt;p&gt;Yesterday we dipped a toe into the world of fuzzy matching in databases.  As a reminder, we had two tables full of info about movies, but from independent sources.  We wanted to see what PostgreSQL and other databases offer when it comes to fuzzy matching.&lt;/p&gt;

&lt;p&gt;This was the challenge from yesterday:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What fuzzy matching methods can be used to join the movie datasets together on keys such as the director name?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We have the &lt;strong&gt;movie&lt;/strong&gt; and &lt;strong&gt;movie_plot&lt;/strong&gt; tables that should contain quite a bit of overlapping data.&lt;/p&gt;

&lt;p&gt;Let's try a naive approach first and join using strict equality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;director&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;movie&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;d1&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;director&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;movie_plot&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;d2&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Here I join the distinct set of directors from both tables together using a straightforward equality comparison.  I get &lt;strong&gt;478&lt;/strong&gt; directors in common using this technique.  &lt;/p&gt;

&lt;p&gt;What techniques can we try to discover more matches?&lt;/p&gt;

&lt;h3&gt;
  
  
  Levenshtein (Edit Distance)
&lt;/h3&gt;

&lt;p&gt;Are we missing any potential matches due to small differences in spelling?  &lt;/p&gt;

&lt;p&gt;The Levenshtein distance between two strings defines the minimum number of insertions, substitutions, or deletions that would be required to make them equivalent.  The Levenshtein function is also sometimes referred to as edit distance, because although there are other variations on edit distance, it is the most commonly used.&lt;/p&gt;

&lt;p&gt;Before you can try out Levenshtein distance, however, you'll need to enable the &lt;strong&gt;fuzzystrmatch&lt;/strong&gt; extension in PostgreSQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;fuzzystrmatch&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Let's say we wanted to retry the query from above, but instead of requiring an exact match, we relax the comparison so that the strings can have an edit distance of 1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;director&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;movie&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;d1&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;director&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;movie_plot&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;d2&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;levenshtein&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This time I come up with &lt;strong&gt;493&lt;/strong&gt; matches.  Here are some of the extra matches that turned up with an edit distance of 1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NG-ZwNBO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/2ddrru32uxsxfcshn9uy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NG-ZwNBO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/2ddrru32uxsxfcshn9uy.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a whole this seems to be a net improvement, but fuzzy matching is always a balancing act between improved accuracy and false positives.  For example, it seems likely that Darren Aronofosky and Darren Arronofsky are the same person, while David Dobkin and David Doblin are probably not.&lt;/p&gt;

&lt;p&gt;The usefulness of edit distance can depend on the data it is applied to.  We can see with names, for example, that a single edit can be very important.&lt;/p&gt;

&lt;h3&gt;
  
  
  Soundex and Metaphone
&lt;/h3&gt;

&lt;p&gt;Soundex belongs to a family of comparison functions that rely on the spoken sound of a word instead of comparing the characters themselves.  Soundex was created almost a century ago to help with the U.S. census, and as a result, it doesn't handle non-English names very well.&lt;/p&gt;

&lt;p&gt;The metaphone function is a more modern alternative.  Let's take a look at an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
      &lt;span class="n"&gt;split_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;split_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
          &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;director&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt;
          &lt;span class="n"&gt;day15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;movie&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;d1&lt;/span&gt;
  &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
      &lt;span class="n"&gt;split_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;split_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;director&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
          &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;director&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt;
          &lt;span class="n"&gt;day15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;movie_plot&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;d2&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;dmetaphone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dmetaphone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;dmetaphone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dmetaphone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Most of the work here is involved in splitting up the director name so that I can run the metaphone function on first and last names separately.&lt;/p&gt;

&lt;p&gt;Here's what I get back with the above query:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IJf1Fy3X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/llq7qj2s408sholsrbdn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IJf1Fy3X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/llq7qj2s408sholsrbdn.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are some good matches here, like Andrey and Andrei Gorlenko, but there are also a few inexplicable false positives, like Anna Biller and Wayne Blair.&lt;/p&gt;

&lt;h3&gt;
  
  
  PG_TRGM
&lt;/h3&gt;

&lt;p&gt;PostgreSQL also supports &lt;a href="https://www.postgresql.org/docs/9.6/pgtrgm.html"&gt;trigram matching&lt;/a&gt; through the PG_TRGM extension.  Trigrams are a simple, yet powerful, concept that involves breaking strings down to sets of 3 consecutive letters.  The number of sets a pair of strings have in common determines their trigram score.&lt;/p&gt;

&lt;p&gt;Getting started is pretty easy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;pg_trgm&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;movie&lt;/strong&gt; table has a comma delimited list of actors.  Let's see if we can use trigrams to fuzzy match on a specific actor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;movie&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="s1"&gt;'Patrick Stewart'&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;ANY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string_to_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;','&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;%&lt;/strong&gt; operator is used in place of the normal &lt;strong&gt;=&lt;/strong&gt; to apply trigram matching.  The &lt;strong&gt;ANY&lt;/strong&gt; keyword means the expression is considered true if there is a trigram match on any of the actors in that movie.&lt;/p&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>#SQL30 Day 15: Fuzzy Matching</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Thu, 17 Oct 2019 12:25:30 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-15-fuzzy-matching-2pm2</link>
      <guid>https://dev.to/zchtodd/sql30-day-15-fuzzy-matching-2pm2</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days.  In each post I'll describe &lt;strong&gt;my&lt;/strong&gt; solution to the last day's challenge.  I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments!  Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #15: Fuzzy Matching
&lt;/h3&gt;

&lt;p&gt;Today I'd like to tackle the subject of fuzzy matching in SQL.  Not every problem is as black and white as a simple equality check.  You might have used the SQL &lt;strong&gt;LIKE&lt;/strong&gt; operator before, but there are times when even that won't suffice.&lt;/p&gt;

&lt;p&gt;Imagine you have two totally separate data-sets that represent the same entities, and you want to join them together.  Because they come from different sources, they don't share any kind of reliable key.  You may need to join them using names, addresses, or other such "noisy" information.&lt;/p&gt;

&lt;p&gt;For this challenge, I've populated two tables with data about movies.  Both tables contain movie titles, director names, and names from the cast.&lt;/p&gt;

&lt;p&gt;Here's the challenge itself:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What fuzzy matching methods can be used to join the movie datasets together on keys such as the director name?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I won't show a desired output for this challenge because it's more of an open-ended invitation to experiment.  Tomorrow I'll show a number of ways you can fuzzy match in PostgreSQL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox Connection Details
&lt;/h3&gt;

&lt;p&gt;I have a PostgreSQL database ready for you to play with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J-Td0C2W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zpoed03yjhvjosbii3z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J-Td0C2W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zpoed03yjhvjosbii3z7.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The password is the same as the username!  To query the &lt;strong&gt;movie&lt;/strong&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day15&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Solution to Challenge #14: UFO Sightings
&lt;/h3&gt;

&lt;p&gt;If you'll recall yesterday's challenge, we were trying to decipher any patterns that might exist when analyzing UFO sightings over time.&lt;/p&gt;

&lt;p&gt;This was the challenge from yesterday:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Given UFO sightings from 1906 to present day, can you produce a monthly count of sightings by shape?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And as a reminder, this was the format of the output we were after:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DNCynQ0O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/oj3uhddxoeka3dkkhbjz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DNCynQ0O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/oj3uhddxoeka3dkkhbjz.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It seems like July and August are a busy time for UFO sightings!&lt;/p&gt;

&lt;p&gt;Here's the SQL that produces the above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Jan"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Feb"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Mar"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Apr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"May"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Jun"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Jul"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Aug"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Sep"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Oct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Nov"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"Dec"&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;date_sighted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="k"&gt;EXTRACT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MONTH&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;date_sighted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sighting_m&lt;/span&gt; 
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day14&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ufo&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day14&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ufo&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sightings&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The main trick here, similar to the solution found in &lt;a href="https://dev.to/zchtodd/sql30-day-5-hierarchies-4eg"&gt;day 5&lt;/a&gt;, is to turn rows into columns using &lt;strong&gt;CASE WHEN&lt;/strong&gt; statements, so that I can show the months across the top.&lt;/p&gt;

&lt;p&gt;Some databases, such as SQL Server, provide a built-in &lt;strong&gt;PIVOT&lt;/strong&gt; function that makes this a bit easier (or at least less verbose).  In PostgreSQL, we'll have to settle for the &lt;strong&gt;CASE WHEN&lt;/strong&gt; trick.  &lt;/p&gt;

&lt;p&gt;Each month has a &lt;strong&gt;sum&lt;/strong&gt; with a conditional inside of it that produces a 1 only if the month of the input row matches.  Whenever the row doesn't match, the output of the conditional is NULL, which is evaluated as zero in the sum.&lt;/p&gt;

&lt;p&gt;Selecting from a subquery instead of directly from the UFO table is optional, and I do that only so I can avoid repeating the month extraction over and over in the &lt;strong&gt;CASE WHEN&lt;/strong&gt; statements.  Likewise, the &lt;strong&gt;WHERE&lt;/strong&gt; clause is optional too, and just serves to eliminate a lot of noisy one-off shapes from the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>#SQL30 Day 14: UFO Sightings</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Wed, 16 Oct 2019 11:06:03 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-14-ufo-sightings-3kci</link>
      <guid>https://dev.to/zchtodd/sql30-day-14-ufo-sightings-3kci</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days.  In each post I'll describe &lt;strong&gt;my&lt;/strong&gt; solution to the last day's challenge.  I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments!  Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #14: UFO Sightings
&lt;/h3&gt;

&lt;p&gt;For a bit of fun I decided to load an &lt;a href="https://www.kaggle.com/NUFORC/ufo-sightings#scrubbed.csv"&gt;open data-set of UFO sightings&lt;/a&gt; into PostgreSQL.  Each row represents a reported sighting, including the date, shape of the object, and the duration of the sighting.&lt;/p&gt;

&lt;p&gt;Let's see if we can deduce some kind of seasonal pattern to UFO sightings.  Here's the challenge itself: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Given UFO sightings from 1906 to present day, can you produce a monthly count of sightings by shape?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's an example of the output we should get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4vfYrzg6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ozc1hfrsnr30yoszng6c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4vfYrzg6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/ozc1hfrsnr30yoszng6c.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox Connection Details
&lt;/h3&gt;

&lt;p&gt;I have a PostgreSQL database ready for you to play with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J-Td0C2W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zpoed03yjhvjosbii3z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J-Td0C2W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zpoed03yjhvjosbii3z7.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The password is the same as the username!  To query the &lt;strong&gt;ufo&lt;/strong&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day14&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ufo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Solution to Challenge #13: Permutations
&lt;/h3&gt;

&lt;p&gt;This was the challenge from yesterday:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you generate all of the permutations of a given set in SQL?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Specifically, we wanted all of the permutations from the set {1, 2, 3}.  My first thought here was to use a &lt;strong&gt;CROSS JOIN&lt;/strong&gt;, as that kind of join will produce every possible pairing between two tables.  &lt;/p&gt;

&lt;p&gt;That's not quite what we want, however, because we'd like each item in the permutation to be distinct.  In other words, {1, 1, 1} is not a valid permutation.  Luckily, that's not too hard to enforce by means of a &lt;strong&gt;WHERE&lt;/strong&gt; clause.&lt;/p&gt;

&lt;p&gt;Let's take a look.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt; 
&lt;span class="k"&gt;CROSS&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;
&lt;span class="k"&gt;CROSS&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;v3&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; 
&lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
&lt;span class="n"&gt;v2&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt;
&lt;span class="n"&gt;v3&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;WHERE&lt;/strong&gt; clause checks that each element of the permutation does not equal any other element, thus ensuring that there's no repetition.&lt;/p&gt;

&lt;p&gt;You would need to add more joins to account for a larger set.  Can anyone think of an approach that works no matter the size of the input set?&lt;/p&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>#SQL30 Day 13: Permutations</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Tue, 15 Oct 2019 10:45:19 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-13-permutations-1o9k</link>
      <guid>https://dev.to/zchtodd/sql30-day-13-permutations-1o9k</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days.  In each post I'll describe &lt;strong&gt;my&lt;/strong&gt; solution to the last day's challenge.  I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments!  Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #13: Permutations
&lt;/h3&gt;

&lt;p&gt;This is a seemingly simple, yet somewhat tricky challenge for today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you generate all of the permutations of a given set in SQL?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There is no table available on the sandbox server for today's challenge, but the source data is easy to create using something like the &lt;a href="https://www.citusdata.com/blog/2018/03/14/fun-with-sql-generate-sql/"&gt;generate_series&lt;/a&gt; function in PostgreSQL.&lt;/p&gt;

&lt;p&gt;For this challenge I'm using the set {1, 2, 3}.  Here's an example of the output we should get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--76bm216n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/4s82y6dkmfgce7xf7fvt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--76bm216n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/4s82y6dkmfgce7xf7fvt.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution to Challenge #12: Greatest Discount
&lt;/h3&gt;

&lt;p&gt;The key trick to solving yesterday's challenge is to think in terms of events, rather than ranges of times.  In other words, a discount starting is an event, and a discount ending is an event.  When a discount starts the overall cumulative rate rises, and conversely, the rate falls when a discount ends.&lt;/p&gt;

&lt;p&gt;To break down a range into separate events we can use &lt;strong&gt;UNION ALL&lt;/strong&gt; to combine queries that select the start and end dates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day12&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discount&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;end_date&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day12&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discount&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Negating the rate in the second query will allow us to do a running total on the rates, thus giving the cumulative rate at each point in time.  &lt;/p&gt;

&lt;p&gt;Whenever a phrase like "running total" or "rolling average" appears, it's a safe bet that &lt;a href="https://dev.to/helenanders26/sql-301-why-you-need-sql-window-functions-part-1-6e1"&gt;window functions&lt;/a&gt; are part of the solution.  Let's look at how we can combine the first query with a window function to find the cumulative discount rate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cumulative_disc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day12&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discount&lt;/span&gt;
  &lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;end_date&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day12&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discount&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dts&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cumulative_disc&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;ORDER BY&lt;/strong&gt; within the &lt;strong&gt;OVER&lt;/strong&gt; clause limits the sum to the current row and all rows prior to it (sorted by date).  From there it's just a matter of sorting the entire query to get the greatest possible discount.&lt;/p&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>#SQL30 Day 12: Greatest Discount</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Mon, 14 Oct 2019 11:42:54 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-12-greatest-discount-296e</link>
      <guid>https://dev.to/zchtodd/sql30-day-12-greatest-discount-296e</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days.  In each post I'll describe &lt;strong&gt;my&lt;/strong&gt; solution to the last day's challenge.  I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments!  Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #12: Greatest Discount
&lt;/h3&gt;

&lt;p&gt;For today's challenge I'll pose a problem that involves finding date overlaps in the data.  Here's the challenge itself:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Suppose you have a table of discount rates, each of which is only valid for a limited time, but discounts are cumulative.  Can you find the greatest possible cumulative discount rate?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's an example of the data we have to play with:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ncPbRikP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/8pdqt6rtxcd976dbxzmh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ncPbRikP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/8pdqt6rtxcd976dbxzmh.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The greatest cumulative discount, given the data above, would be &lt;strong&gt;0.35&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox Connection Details
&lt;/h3&gt;

&lt;p&gt;I have a PostgreSQL database ready for you to play with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J-Td0C2W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zpoed03yjhvjosbii3z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J-Td0C2W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zpoed03yjhvjosbii3z7.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The password is the same as the username!  To query the &lt;strong&gt;discount&lt;/strong&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day12&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;discount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Solution to Challenge #11: Soft Deletes
&lt;/h3&gt;

&lt;p&gt;If you've read yesterday's challenge, you'll know I opted to change things up a bit, and instead of asking for an answer in the form of a query, I posed a more open-ended design question.&lt;/p&gt;

&lt;p&gt;The challenge focused on how we could implement the &lt;strong&gt;soft delete&lt;/strong&gt; pattern in a database, while asking a number of questions about possible issues that might affect the design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What happens to related child data when a parent is soft deleted?  If a user deletes an "account" record, what happens to all of the related "order" records?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The entire application is now responsible for making sure that queries filter out all "deleted" records.  Write enough queries, and sooner or later someone will forget to filter out the deleted records.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What affect will deleted records have on constraints that might be in place?  If I have a unique constraint on a field called &lt;strong&gt;name&lt;/strong&gt;, should users be able to create new records that re-use an existing name on a soft-deleted record?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I thought through this problem, I realized that there are many complications that come with adding soft deletes.  In fact, if you &lt;a href="https://stackoverflow.com/questions/2549839/are-soft-deletes-a-good-idea"&gt;look at the most popular StackOverflow answer&lt;/a&gt; regarding soft deletes, quite a few people advocate avoiding this pattern if at all possible.&lt;/p&gt;

&lt;p&gt;Assuming we believe the complexity is worthwhile, how can we implement soft deletes in a way that doesn't add a great deal of bug-prone overhead?&lt;/p&gt;

&lt;p&gt;Let's talk through some possible solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding a deleted_on column
&lt;/h3&gt;

&lt;p&gt;In this scenario a &lt;strong&gt;deleted_on&lt;/strong&gt; date-time column would be added to any table that allows soft deletes.  A null value indicates a currently active record.&lt;/p&gt;

&lt;p&gt;The pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solution is easy to implement.&lt;/li&gt;
&lt;li&gt;Unique constraints can still function if the database supports &lt;a href="https://www.postgresql.org/docs/current/indexes-partial.html"&gt;partial indexes&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Querying the table becomes more complex.  Every query needs an additional criteria, or a view must be created to handle the filtering.  This may not be an attractive solution because ORMs don't tend to have great support for creating, maintaining, or accessing data through views.&lt;/li&gt;
&lt;li&gt;The behavior of relationships between tables is not very well defined.  Foreign keys lose meaning if deleting a record does not really mean deleting it.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Archive tables
&lt;/h3&gt;

&lt;p&gt;Instead of adding a deleted flag or timestamp, we can introduce archive tables that are (nearly) identical to the original version.  Any deleted records are first inserted into an associated archive table prior to removal from the original.&lt;/p&gt;

&lt;p&gt;Archive tables should have the same columns as their associated original, but would not have unique constraints.  Copying records that are about to be deleted would need to be done in application code, or in the database itself via a trigger.&lt;/p&gt;

&lt;p&gt;The pros:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queries can operate without knowledge of soft deletes.
&lt;/li&gt;
&lt;li&gt;Foreign keys function as they normally would.  Any cascading deletes or updates result in the expected behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Depending on how many tables need the soft delete pattern, the size of the schema can balloon.  Maintaining the archive tables as schema migrations occur would be painful to do by hand.  Some databases, such as PostgreSQL, support &lt;a href="https://www.postgresql.org/docs/10/tutorial-inheritance.html"&gt;table inheritance&lt;/a&gt;, which could make maintenance easier.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>#SQL30 Day 11: Soft Deletes</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Sun, 13 Oct 2019 12:41:21 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-11-soft-deletes-di5</link>
      <guid>https://dev.to/zchtodd/sql30-day-11-soft-deletes-di5</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days.  In each post I'll describe &lt;strong&gt;my&lt;/strong&gt; solution to the last day's challenge.  I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments!  Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;p&gt;I'll add connection details for a PostgreSQL database containing test data for the challenge.  Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #11: Soft Deletes
&lt;/h3&gt;

&lt;p&gt;I'm changing things up a bit for this challenge and focusing more on database design instead of a specific query.  Instead of posing a single challenge, I'll present a series of questions about a certain database design topic.  &lt;/p&gt;

&lt;p&gt;I hope you'll chime in!  I have &lt;strong&gt;an&lt;/strong&gt; answer to these questions, but I don't pretend to have the best answers.&lt;/p&gt;

&lt;p&gt;This design challenge centers around the idea of &lt;strong&gt;soft deletes&lt;/strong&gt;.  Normally, when data is deleted in a database, it's irrevocably gone.  It's not uncommon, however, that you might want to preserve a record while hiding it in your user interface.&lt;/p&gt;

&lt;p&gt;A simple way to go about this is to add a &lt;strong&gt;deleted&lt;/strong&gt; boolean field to your table.  In some cases, it might really be just that simple, and there's not much else to worry about.&lt;/p&gt;

&lt;p&gt;But here are some questions about this design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;What happens to related child data when a parent is soft deleted?  If a user deletes an "account" record, what happens to all of the related "order" records?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The entire application is now responsible for making sure that queries filter out all "deleted" records.  Write enough queries, and sooner or later someone will forget to filter out the deleted records.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What affect will deleted records have on constraints that might be in place?  If I have a unique constraint on a field called &lt;strong&gt;name&lt;/strong&gt;, should users be able to create new records that re-use an existing name on a soft-deleted record?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Solution to Challenge #10: Random Sampling
&lt;/h3&gt;

&lt;p&gt;This is the question we were trying to answer with yesterday's SQL challenge:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you write a query to efficiently return a random sample of records from a table?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before going any further, let's first look at the simplest possible way to achieve this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;measurement&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;(The above works in PostgreSQL and MySQL, but in SQL Server you'll need to use the &lt;strong&gt;NEWID()&lt;/strong&gt; function.)&lt;/p&gt;

&lt;p&gt;If your table is of a reasonable size, you probably don't need to overthink this any further.  But if you're dealing with millions of rows, this approach can become quite slow.&lt;/p&gt;

&lt;p&gt;Why is that?  This query will need to scan through every row in the table and generate a random number.  Once every row has a random number, the entire table will need to be sorted on that value.  If the table is big enough, that sort will most likely spill over into a temporary table stored on disk.&lt;/p&gt;

&lt;p&gt;A more efficient solution would be to generate N random numbers first, and then select rows out of your table by ID.  The random numbers could be generated outside of the database, and then put into a select like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;measurement&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This of course has some pitfalls, including the fact that your primary key could have gaps if you've deleted any rows.  &lt;/p&gt;

&lt;h3&gt;
  
  
  More resources
&lt;/h3&gt;

&lt;p&gt;There is a lot of discussion online about selecting random rows.  Specifically for PostgreSQL, &lt;a href="https://stackoverflow.com/questions/8674718/best-way-to-select-random-rows-postgresql"&gt;there is a fascinating answer&lt;/a&gt; on StackOverflow that delves into some very advanced ways to fetch random records. &lt;/p&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with!  I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>#SQL30 Day 10: Random Sampling</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Sat, 12 Oct 2019 11:17:14 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-10-random-sampling-556f</link>
      <guid>https://dev.to/zchtodd/sql30-day-10-random-sampling-556f</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days.  In each post I'll describe &lt;strong&gt;my&lt;/strong&gt; solution to the last day's challenge.  I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments!  Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;p&gt;I'll add connection details for a PostgreSQL database containing test data for the challenge.  Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #10: Random Sampling
&lt;/h3&gt;

&lt;p&gt;For today's challenge I'll delve into a topic that is simple on the surface, but also has a surprising amount of subtlety.&lt;/p&gt;

&lt;p&gt;Here's the challenge itself:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you write a query to efficiently return a random sample of records from a table?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We'll re-visit the &lt;strong&gt;measurement&lt;/strong&gt; table from the day 3 challenge.  Obviously, your results will vary from mine, but the goal is to return a random sampling of records from the table.&lt;/p&gt;

&lt;p&gt;Here's what my output looks like when grabbing 10 random records:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ft92l4bv5jb8b9pufpao1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Ft92l4bv5jb8b9pufpao1.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox Connection Details
&lt;/h3&gt;

&lt;p&gt;I have a PostgreSQL database ready for you to play with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzpoed03yjhvjosbii3z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzpoed03yjhvjosbii3z7.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The password is the same as the username!  To query the &lt;strong&gt;employee&lt;/strong&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;measurement&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Solution to Challenge #7: Grouping Sets
&lt;/h3&gt;

&lt;p&gt;This is the question we were trying to answer with yesterday's SQL challenge:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Given supermarket sales history, can you display sales not only by location and category, but also at the month and yearly level?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Remember, in this challenge we wanted the output to appear something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwk41v9vcstmzp3i3sqpi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwk41v9vcstmzp3i3sqpi.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That is, we wanted sales summed up to the month, but with a yearly subtotal for each location and category.  &lt;/p&gt;

&lt;p&gt;Not long ago, I probably would have reached for Python to do this kind of multi-level aggregation, but it turns out there's a perfectly good way to do this in SQL.&lt;/p&gt;

&lt;p&gt;Group by, apparently, has a hidden super power called grouping sets.  &lt;/p&gt;

&lt;p&gt;Grouping sets allow you to group at several different levels.  Let's take a look at the query that produced the output above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;   &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;Date_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;Date_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'year'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="k"&gt;Sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;     &lt;span class="n"&gt;day7&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supermarket&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;grouping&lt;/span&gt; &lt;span class="k"&gt;sets&lt;/span&gt; &lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see in the output, any row produced by a set missing columns (in this case month), will have a null value for that column.&lt;/p&gt;

&lt;p&gt;SQL also has a few shorthand notations, such as &lt;strong&gt;ROLLUP&lt;/strong&gt; and &lt;strong&gt;CUBE&lt;/strong&gt;, that produce commonly needed grouping sets.&lt;/p&gt;

&lt;p&gt;I could change the above query to use &lt;strong&gt;ROLLUP&lt;/strong&gt; like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;   &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;Date_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;Date_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'year'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="k"&gt;Sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;     &lt;span class="n"&gt;day7&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supermarket&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;ROLLUP&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This would actually be equivalent to the following sets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;((city, product_line), yearly, monthly): Sales broken down to the month. &lt;/li&gt;
&lt;li&gt;((city, product_line), yearly)): Sales broken down to the year.&lt;/li&gt;
&lt;li&gt;((city, product_line)): All sales for each city/product line.&lt;/li&gt;
&lt;li&gt;(): All sales, for all time, for all city/product combinations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with!  I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>#SQL30 Day 9: Longest Span</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Fri, 11 Oct 2019 11:26:32 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-9-longest-span-59am</link>
      <guid>https://dev.to/zchtodd/sql30-day-9-longest-span-59am</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!&lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments! Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;p&gt;I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #9: Longest Span
&lt;/h3&gt;

&lt;p&gt;For this challenge we'll revisit the stock market data from #2, but this time we'll try something a bit more difficult.  &lt;/p&gt;

&lt;p&gt;Here's the challenge:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Given a table of daily trading data, can you find the longest stretch of consecutive days during which the stock price rose?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's an example output showing roughly what you're looking for.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F76qu8i09qa7a77u1fkqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F76qu8i09qa7a77u1fkqn.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox Connection Details
&lt;/h3&gt;

&lt;p&gt;I have a PostgreSQL database ready for you to play with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzpoed03yjhvjosbii3z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzpoed03yjhvjosbii3z7.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The password is the same as the username!  To query the &lt;strong&gt;stockprice&lt;/strong&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day9&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stockprice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Solution to Challenge #8: Duplicates
&lt;/h3&gt;

&lt;p&gt;This is the question we were trying to answer with yesterday's SQL challenge:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you write a select query that could be used in a delete statement to make the rows of a table unique according to a natural key?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The solution to this question proved to be trickier than I would have thought.  If we wanted to delete &lt;strong&gt;all&lt;/strong&gt; instances of any employee that had duplicates, it would be fairly easy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day8&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day8&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;HAVING&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unfortunately, the above query will remove any employee with duplicates entirely from the table.  We want exactly one reference to each employee.&lt;/p&gt;

&lt;p&gt;One means of doing this would be to group the employees by name, and then assign a rank to each row in the group.  We could then delete all rows from each group that aren't the first occurrence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;employee_rnk&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;row_number&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rnk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day8&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day8&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;employee_rnk&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rnk&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://www.postgresql.org/docs/current/functions-window.html" rel="noopener noreferrer"&gt;row_number()&lt;/a&gt; window function will assign an increasing integer value to each row of a partition.  &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;USING&lt;/strong&gt; clause in a delete statement acts essentially the same as having multiple tables listed in the &lt;strong&gt;FROM&lt;/strong&gt; part of a query.&lt;/p&gt;

&lt;p&gt;Unlike SQL Server, you can't perform a join inside of a delete in PostgreSQL, but oddly enough, you can put tables together by combining &lt;strong&gt;USING&lt;/strong&gt; and a &lt;strong&gt;WHERE&lt;/strong&gt; clause.  Once the table and the CTE are joined together, it's just a matter of filtering down to where the row number is not equal to 1, indicating a duplicate to be deleted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>#SQL30 Day 8: Deleting Duplicates</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Thu, 10 Oct 2019 11:09:40 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-8-deleting-duplicates-5fg7</link>
      <guid>https://dev.to/zchtodd/sql30-day-8-deleting-duplicates-5fg7</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days.  In each post I'll describe &lt;strong&gt;my&lt;/strong&gt; solution to the last day's challenge.  I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments!  Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;p&gt;I'll add connection details for a PostgreSQL database containing test data for the challenge.  Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #8: Deleting Duplicates
&lt;/h3&gt;

&lt;p&gt;Today I'd like to pose a challenge focused on detecting duplicates within a table.  The ultimate goal is to delete all duplicates from the table, while preserving one row from each group, such that we're left only with the unique values.&lt;/p&gt;

&lt;p&gt;Deletes aren't allowed on the sandbox server, so I'll stop short of asking for the actual delete query, and instead ask only for the select query that could be passed to the delete.&lt;/p&gt;

&lt;p&gt;Here's the challenge itself:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can you write a select query that could be used in a delete statement to make the rows of a table unique according to a natural key?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We'll re-visit the &lt;strong&gt;employee&lt;/strong&gt; table from day 5 for this challenge.  Here's a sample from the table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fp3r06kav4q4rrwa2xcgw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fp3r06kav4q4rrwa2xcgw.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that Hans Volkman is repeated 3 times.  We'd like to end up with only one Hans Volkman after running the delete statement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox Connection Details
&lt;/h3&gt;

&lt;p&gt;I have a PostgreSQL database ready for you to play with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzpoed03yjhvjosbii3z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzpoed03yjhvjosbii3z7.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The password is the same as the username!  To query the &lt;strong&gt;employee&lt;/strong&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day8&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;employee&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Solution to Challenge #7: Grouping Sets
&lt;/h3&gt;

&lt;p&gt;This is the question we were trying to answer with yesterday's SQL challenge:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Given supermarket sales history, can you display sales not only by location and category, but also at the month and yearly level?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Remember, in this challenge we wanted the output to appear something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwk41v9vcstmzp3i3sqpi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fwk41v9vcstmzp3i3sqpi.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That is, we wanted sales summed up to the month, but with a yearly subtotal for each location and category.  &lt;/p&gt;

&lt;p&gt;Not long ago, I probably would have reached for Python to do this kind of multi-level aggregation, but it turns out there's a perfectly good way to do this in SQL.&lt;/p&gt;

&lt;p&gt;Group by, apparently, has a hidden super power called grouping sets.  &lt;/p&gt;

&lt;p&gt;Grouping sets allow you to group at several different levels.  Let's take a look at the query that produced the output above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;   &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;Date_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;Date_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'year'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="k"&gt;Sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;     &lt;span class="n"&gt;day7&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supermarket&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;grouping&lt;/span&gt; &lt;span class="k"&gt;sets&lt;/span&gt; &lt;span class="p"&gt;(((&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see in the output, any row produced by a set missing columns (in this case month), will have a null value for that column.&lt;/p&gt;

&lt;p&gt;SQL also has a few shorthand notations, such as &lt;strong&gt;ROLLUP&lt;/strong&gt; and &lt;strong&gt;CUBE&lt;/strong&gt;, that produce commonly needed grouping sets.&lt;/p&gt;

&lt;p&gt;I could change the above query to use &lt;strong&gt;ROLLUP&lt;/strong&gt; like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;   &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;Date_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'month'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;Date_part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'year'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="k"&gt;Sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt;     &lt;span class="n"&gt;day7&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supermarket&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;ROLLUP&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;monthly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="n"&gt;yearly&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This would actually be equivalent to the following sets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;((city, product_line), yearly, monthly): Sales broken down to the month. &lt;/li&gt;
&lt;li&gt;((city, product_line), yearly)): Sales broken down to the year.&lt;/li&gt;
&lt;li&gt;((city, product_line)): All sales for each city/product line.&lt;/li&gt;
&lt;li&gt;(): All sales, for all time, for all city/product combinations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with!  I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>#SQL30 Day 7: Grouping Sets</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Wed, 09 Oct 2019 10:39:25 +0000</pubDate>
      <link>https://dev.to/zchtodd/sql30-day-7-grouping-sets-2co0</link>
      <guid>https://dev.to/zchtodd/sql30-day-7-grouping-sets-2co0</guid>
      <description>&lt;p&gt;Welcome to the SQL showdown series!  &lt;/p&gt;

&lt;h3&gt;
  
  
  What is this and how does it work?
&lt;/h3&gt;

&lt;p&gt;I'm committing to publishing a SQL challenge every day for 30 days.  In each post I'll describe &lt;strong&gt;my&lt;/strong&gt; solution to the last day's challenge.  I'll follow that up with a full description of the challenge for the next day.&lt;/p&gt;

&lt;p&gt;Write your own solution in the comments!  Let's see who can come up with the most creative solutions.&lt;/p&gt;

&lt;p&gt;I'll add connection details for a PostgreSQL database containing test data for the challenge.  Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge #7: Grouping Sets
&lt;/h3&gt;

&lt;p&gt;In this challenge we'll explore the idea of &lt;strong&gt;grouping sets&lt;/strong&gt; by rolling up data in several different ways.&lt;/p&gt;

&lt;p&gt;The question for today is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Given supermarket sales history, can you display sales not only by location and category, but also at the month and yearly level?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's an example to give you a better idea of the output you're after:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fq9lizorckkd5qtwrmj34.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fq9lizorckkd5qtwrmj34.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see above, sales are summed up at the month level, but each location/product combination also has a subtotal row for the year.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;day7&lt;/strong&gt; schema contains only the &lt;strong&gt;supermarket&lt;/strong&gt; table.  Here's a sample of that table:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fqpyaut2mucig5b8zeni1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fqpyaut2mucig5b8zeni1.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sandbox Connection Details
&lt;/h3&gt;

&lt;p&gt;I have a PostgreSQL database ready for you to play with.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzpoed03yjhvjosbii3z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fzpoed03yjhvjosbii3z7.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The password is the same as the username!  To query the **** table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;day7&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supermarket&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Solution for Challenge #6
&lt;/h3&gt;

&lt;p&gt;This is the question we were trying to answer with yesterday's SQL challenge:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Given supermarket sales history by location and category, can you produce a report without gaps?  In other words, if the Mandalay location had no sales in the home and lifestyle category on a certain day, we'd still like to see a row with zero.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you've done data analysis for a while, you've probably had this requirement come up.  Understandably, users would rather see a row with a default zero value than a missing time range on a report.&lt;/p&gt;

&lt;p&gt;My solution is to generate all of the possible combinations first, and then join to the actual sales data to display that when it's available.  &lt;/p&gt;

&lt;p&gt;In this case, we have locations, categories, and dates.  There are 3 stores and 6 categories, so that should produce 18 rows.  Generating the dates is probably the trickiest part of this, but PostgreSQL provides a handy function that makes life easier in that regard.&lt;/p&gt;

&lt;p&gt;First, let's take a look at the solution and then we'll work through it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;date_range&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="k"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;min_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;max_date&lt;/span&gt;
      &lt;span class="k"&gt;FROM&lt;/span&gt;
        &lt;span class="n"&gt;day6&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supermarket&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
      &lt;span class="o"&gt;*&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
          &lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt;
          &lt;span class="n"&gt;day6&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supermarket&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;s1&lt;/span&gt;
      &lt;span class="k"&gt;CROSS&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
          &lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;product_line&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt;
          &lt;span class="n"&gt;day6&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supermarket&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;s2&lt;/span&gt;
      &lt;span class="k"&gt;CROSS&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt;
          &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;min_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 day'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt;
          &lt;span class="n"&gt;date_range&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;
  &lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;day6&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supermarket&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_line&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final select uses &lt;strong&gt;coalesce&lt;/strong&gt; to turn NULL into zero where there was no actual sales data.  I then use a &lt;a href="https://www.postgresql.org/docs/current/queries-with.html" rel="noopener noreferrer"&gt;common table expression&lt;/a&gt; to get the first and last day of sales, which is fed into &lt;strong&gt;generate_series&lt;/strong&gt; to produce every day between those dates.&lt;/p&gt;

&lt;p&gt;That gives us every day, but we still need to pair that up with store locations and categories.  This is where the reclusive, rarely seen &lt;strong&gt;CROSS JOIN&lt;/strong&gt; comes into play.  If you've never seen one, it's equivalent to writing a &lt;strong&gt;JOIN&lt;/strong&gt; with an ON clause of &lt;strong&gt;1=1&lt;/strong&gt; or &lt;strong&gt;true&lt;/strong&gt;, which gives us every possible combination.&lt;/p&gt;

&lt;p&gt;Now that we have every combination, we can &lt;strong&gt;LEFT JOIN&lt;/strong&gt; to the sales data.  I'm using a &lt;strong&gt;LEFT JOIN&lt;/strong&gt; because I want a row even where there was no matching sale.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Good luck!
&lt;/h3&gt;

&lt;p&gt;Have fun, and I can't wait to see what you come up with!  I'll be back tomorrow with a solution to this problem and a new challenge.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>postgres</category>
      <category>challenge</category>
    </item>
    <item>
      <title>What database do you use?</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Wed, 09 Oct 2019 01:01:05 +0000</pubDate>
      <link>https://dev.to/zchtodd/what-database-do-you-use-2aob</link>
      <guid>https://dev.to/zchtodd/what-database-do-you-use-2aob</guid>
      <description>&lt;p&gt;I'm curious what databases people are using for their projects, and why you might have decided on that.&lt;/p&gt;

&lt;p&gt;I'll list some of the databases I've used and my experiences with them.&lt;/p&gt;

&lt;h3&gt;
  
  
  PostgreSQL
&lt;/h3&gt;

&lt;p&gt;I'm using PostgreSQL for a Python project, alongside SQLAlchemy as the ORM.  The data model for the project is fairly complex, so it's been good to have SQL for the flexibility.  &lt;/p&gt;

&lt;p&gt;One downside so far has been that some of the queries have gotten pretty gnarly, and modifying them can feel like moving blocks in a Jenga tower.  &lt;/p&gt;

&lt;p&gt;Performance can also be a little unpredictable.  For instance, one query either took ~400ms or 20 seconds to execute.  In that case, the query was executing immediately after a big insert, and the table statistics were out of date, causing the planner to generate a really poor plan.  &lt;/p&gt;

&lt;p&gt;Issuing an ANALYZE command refreshed the table stats, and the planner started producing the 400ms version of the query.  As far as I know, it's not possible to pin a query plan in PostgreSQL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cassandra
&lt;/h3&gt;

&lt;p&gt;My experience with Cassandra was mixed.  I think this was because the project was not well suited to an eventually consistent key-value store.  The application developers spent a lot of time coming to terms with eventual consistency, lack of transactions, and the rigidity of the data model.&lt;/p&gt;

&lt;p&gt;Cassandra can, however, support an amazingly high write through-put.  If you don't need the flexibility of SQL, and a single machine can't support your demands, then Cassandra might be a viable option.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are you using?
&lt;/h3&gt;

&lt;p&gt;I'd love to hear what databases you're using, as well as the positives and negatives that come with them.&lt;/p&gt;

</description>
      <category>discuss</category>
    </item>
    <item>
      <title>Building parsers for fun and profit with Pyparsing</title>
      <dc:creator>zchtodd</dc:creator>
      <pubDate>Tue, 08 Oct 2019 21:17:27 +0000</pubDate>
      <link>https://dev.to/zchtodd/building-parsers-for-fun-and-profit-with-pyparsing-4l9e</link>
      <guid>https://dev.to/zchtodd/building-parsers-for-fun-and-profit-with-pyparsing-4l9e</guid>
      <description>&lt;p&gt;I sometimes need to either create a domain specific language, or at least be able to parse a subset of some existing language. This can be incredibly useful.&lt;/p&gt;

&lt;p&gt;Recently, I worked on an application in which users create filters that classify data. The filters are boolean expressions that are constructed using a friendly web interface, but power users can directly edit the filters as text. This direct editing mode, however, means that we’ll need to parse the text to perform validations and to understand what the user intends.&lt;/p&gt;

&lt;p&gt;Parsing is a huge subject and can be an equally huge undertaking. Luckily, this is well trodden ground, and there’s no need to start from scratch. The &lt;a href="https://github.com/pyparsing/pyparsing"&gt;pyparsing&lt;/a&gt; module is a powerful toolkit that takes care of the time consuming and error prone aspects of parsing, allowing you to get directly to the essence of what you’re trying to do.&lt;/p&gt;

&lt;p&gt;In my case, what I’m trying to do is parse a simplified version of the SQL where clause.&lt;/p&gt;

&lt;p&gt;Building anything with Pyparsing involves starting with the basic building blocks of your desired grammar. The basic elements of my simplified SQL grammar are numbers, strings, operators and identifiers (the actual variables or column names).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pyparsing&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;

&lt;span class="n"&gt;lparen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Suppress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"("&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rparen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Suppress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;")"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;and_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AND"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;or_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OR"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;operator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oneOf&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;"="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"!="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;gt;="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;="&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;alphaword&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;alphanums&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"_"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QuotedString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quoteChar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"."&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OneOrMore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alphaword&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Pyparsing takes advantage of operator overloading in Python to make grammar construction compact and elegant. Each time two Pyparsing objects are combined they will form a new object that is immediately returned, thus allowing you to build up a grammar. Every object has a parseString method, meaning all the subsets of your grammar can be independently tested.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parseString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"1.32"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s"&gt;'1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'32'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The default result of calling &lt;strong&gt;parseString&lt;/strong&gt; is comparable to what you’d expect to see during the lexing or tokenizing phase of a compiler. This is a useful and necessary step in the parsing process, but it leaves a ton of work for us to do.&lt;/p&gt;

&lt;p&gt;At the end of the day, I need to understand the structure of the filter expression, which is best represented as a tree.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;setParseAction&lt;/strong&gt; method is what you’ll need if you want to encode the structure of a grammar into a tree. The &lt;strong&gt;setParseAction&lt;/strong&gt; method accepts any callable object, meaning you can pass functions or class objects.  As your various subgrammars are encountered in an input, Pyparsing will invoke your &lt;strong&gt;setParseAction&lt;/strong&gt; callables, with the results being passed into higher level grammar constructs.&lt;/p&gt;

&lt;p&gt;Let’s look again at the last example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"'{}'"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Identifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rval&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;

&lt;span class="n"&gt;lparen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Suppress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"("&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rparen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Suppress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;")"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;and_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AND"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;or_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OR"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;operator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oneOf&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;"="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"!="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;gt;="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;="&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;alphaword&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;alphanums&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"_"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QuotedString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quoteChar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"'"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"."&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OneOrMore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alphaword&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;Identifier&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;condition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operator&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If we try parsing a test string again, we’ll see that the result is returned in terms of the classes defined above, instead of as a string that has been tokenized.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parseString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"x = 4"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Identifier&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="mh"&gt;0x7f327b9ed240&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;
&lt;span class="s"&gt;'='&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;rval&lt;/span&gt;
&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Number&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="mh"&gt;0x7f327b9ed320&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The AND/OR part of the grammar is where things could start to become a little tricky, but it’s also where Pyparsing really shines. Up to this point, the grammar has been very simple. When we introduce AND/OR to this grammar, two important things have happened: the grammar now has operator precedence, and has also become recursive.&lt;/p&gt;

&lt;p&gt;If you’re familiar with SQL you probably know that AND has higher precedence than OR, just as multiplication has higher precedence than addition.  We’ll have to be careful how we construct our grammar so that operator precedence is encoded properly.&lt;/p&gt;

&lt;p&gt;Let’s take a look at how to add operator precedence and nested expression parsing to this grammar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Expression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;and_conditions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"("&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" OR "&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;and_condition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;and_condition&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;and_conditions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;")"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AndCondition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conditions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;" AND "&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;condition&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conditions&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conditions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"("&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;")"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rval&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"'{}'"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Identifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;

&lt;span class="n"&gt;lparen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Suppress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"("&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rparen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Suppress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;")"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;and_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AND"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;or_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OR"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;operator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oneOf&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="s"&gt;"="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"!="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;gt;="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"&amp;lt;="&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;alphaword&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;alphanums&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"_"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;QuotedString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quoteChar&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"'"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"."&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OneOrMore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Word&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nums&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alphaword&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;Identifier&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Forward&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;condition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;identifier&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operator&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Condition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;condition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;condition&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lparen&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rparen&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;and_condition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;condition&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZeroOrMore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;and_&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="n"&gt;AndCondition&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;and_condition&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ZeroOrMore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;or_&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;and_condition&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setParseAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Expression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;It can be helpful to plan out a grammar in terms of something like Backus Naur form before committing it to code. Sketching out this simplified SQL grammar, I came up with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# OP := = | != | &amp;gt; | &amp;lt; | &amp;lt;= | &amp;gt;=
# CONDITION := &amp;lt;word&amp;gt; &amp;lt;op&amp;gt; &amp;lt;number&amp;gt; | (&amp;lt;expression&amp;gt;)
# AND_CONDITION := &amp;lt;condition&amp;gt; | &amp;lt;condition&amp;gt; AND &amp;lt;condition&amp;gt;
# EXPRESSION := &amp;lt;and_condition&amp;gt; | &amp;lt;and_condition&amp;gt; OR &amp;lt;and_condition&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;The proper precedence is taken care of because the tree formed by our grammar places OR at a higher level, forcing AND conditions to be resolved first. You might also have noticed that conditions can contain expressions, while conditions are themselves part of expressions. This is a circular dependency, but is also required if we’re going to make nested expressions possible.&lt;/p&gt;

&lt;p&gt;Luckily, Pyparsing provides the Forward class for just such cases, allowing us to use part of our grammar before we’ve actually defined it.&lt;/p&gt;

&lt;p&gt;Now that we can parse this grammar, what can we do from here? As just one example, I’ve added generate methods to each component class that will produce a string representing that part of the tree. This will produce almost an exact replica of the input string, but in the AndCondition I’ve modified things slightly so that AND’d conditions are grouped together within parentheses in the output.&lt;/p&gt;

&lt;p&gt;This is a simple example of the kinds of transformations you can do once you’re able to parse a grammar. Of course, you could also perform all kinds of validations, or even execute the logic itself.&lt;/p&gt;

&lt;p&gt;Hopefully you’ve enjoyed this brief intro to Pyparsing. I may do more tutorials or examples in the future, because parsing is a pretty fun subject.&lt;/p&gt;

</description>
      <category>python</category>
      <category>computerscience</category>
    </item>
  </channel>
</rss>
