<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AJ Kerrigan</title>
    <description>The latest articles on DEV Community by AJ Kerrigan (@ajkerrigan).</description>
    <link>https://dev.to/ajkerrigan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F30169%2Fd4e5fdb0-f4a8-4201-a35e-185bec33f193.png</url>
      <title>DEV Community: AJ Kerrigan</title>
      <link>https://dev.to/ajkerrigan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ajkerrigan"/>
    <language>en</language>
    <item>
      <title>Curses, space heists and found family - My freefall into Jessie Kwak fandom</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Mon, 24 Jun 2024 17:59:27 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/curses-space-heists-and-found-family-my-freefall-into-jessie-kwak-fandom-4oj0</link>
      <guid>https://dev.to/ajkerrigan/curses-space-heists-and-found-family-my-freefall-into-jessie-kwak-fandom-4oj0</guid>
      <description>&lt;ul&gt;
&lt;li&gt;My Jessie Kwak Jumpstart&lt;/li&gt;
&lt;li&gt;But Wait, There's More!&lt;/li&gt;
&lt;li&gt;...And More to Come (Kickstarter)!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;TL;DR:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;I've fallen in love with Jessie Kwak's writing - character-rich speculative fiction stories, many of which take place in the same expanding world. I gush in the hope that there are other folks who will love her work but haven't come across it yet.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  My Jessie Kwak Jumpstart
&lt;/h3&gt;

&lt;p&gt;A while back I read the collection &lt;a href="https://www.powells.com/book/dispatches-from-anarres-9781942436485"&gt;Dispatches from Anarres&lt;/a&gt; - stories from a bunch of Portland authors in tribute to Ursula K. Le Guin. As with most collections, a few stories stuck with me long after I finished. Atop that list was &lt;em&gt;Black as Thread&lt;/em&gt;, from an author I had never read before. I love it when that happens! Jessie Kwak's teaser for the story &lt;a href="https://www.jessiekwak.com/short-stories/"&gt;on her site&lt;/a&gt; is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Sewing curses into the boots of foreign occupiers may be helping win the war, but what does it do to the soul?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Coming as part of a Le Guin tribute, the story immediately reminded me of the &lt;a href="https://theanarchistlibrary.org/mirror/u/uk/ursula-k-le-guin-the-carrier-bag-theory-of-fiction.pdf"&gt;Carrier Bag Theory of Fiction&lt;/a&gt; essay. Because sure there's that word "war", but the story isn't about battles and commanders and heroes. As the Carrier Bag essay notes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;That is why I like novels: instead of heroes they have people in them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Something about the world, the characters, the writing style... it hooked me hard, and I had to find more from the author who created it.&lt;/p&gt;

&lt;h3&gt;
  
  
  But Wait, There's More!
&lt;/h3&gt;

&lt;p&gt;It's a beautiful thing - finding an author that you can connect with immediately, and learning that there are several books already out in the world waiting for you. Jessie provides a couple &lt;a href="https://store.jessiekwak.com/pages/jessie-kwak-book-reading-order"&gt;recommended entry points&lt;/a&gt; for her fiction, which I found helpful. I latched onto the &lt;a href="https://store.jessiekwak.com/collections/nanshe-chronicles"&gt;Nanshe Chronicles&lt;/a&gt; because... how could I not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Are you looking for fun spaceship-based adventures that read like an episode of your favorite TV show — complete with heists, trick flying, witty banter, and a crew of wary newcomers slowly transforming into found family?&lt;/p&gt;

&lt;p&gt;(Thank Firefly, Cowboy Bebop, and Leverage with a splash of Indiana Jones.)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Leverage doesn't mean anything to me, but the rest combine nicely and feel spot-on in retrospect. I found a lot of &lt;em&gt;Expanse&lt;/em&gt; vibes in the series too, including one particular trope that surfaced in both series (for what it's worth, I vastly prefer how it was handled in the Nanshe Chronicles).&lt;/p&gt;

&lt;p&gt;There's a certain optimism/pessimism slider that's tough to nail, particularly in speculative fiction. Some feels willfully pessimistic and depressing enough that it's a slog to read. Go too far the other way, and things are so happy huggy that I can't suspend disbelief long enough to enjoy a book. The Nanshe Chronicles hits a sweet spot - the world isn't perfect, but I have people to love and root for.&lt;/p&gt;

&lt;h3&gt;
  
  
  ...And More to Come (Kickstarter)!
&lt;/h3&gt;

&lt;p&gt;I've read the Bulari Saga and all that's available from the Nanshe Chronicles so far (the two series take place in the same world). The thriller &lt;em&gt;From Earth and Bone&lt;/em&gt; had a totally different vibe, but worked for me also.&lt;/p&gt;

&lt;p&gt;And today there's a &lt;a href="https://www.kickstarter.com/projects/jessiekwak/bulari-saga-travel-guide"&gt;Kickstarter launching&lt;/a&gt; for a collection that will dig deeper into the world of the &lt;a href="https://www.jessiekwak.com/bulari-saga/"&gt;Bulari Saga&lt;/a&gt;. It's intended as a love letter to fans, particularly those who have supported the series since it launched in 2019. It wasn't on my radar then, but I'm so glad it is now. And I hope this post reaches a few eyes that will love these stories as much as I do.&lt;/p&gt;

&lt;p&gt;Cheers folks. One more thought from that Carrier Bag essay to send you off on your travels:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Science fiction properly conceived, like all serious fiction, however funny, is a way of trying to describe what is in fact going on, what people actually do and feel, how people relate to everything else in this vast stack, this belly of the universe, this womb of things to be and tomb of things that were, this unending story.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>books</category>
    </item>
    <item>
      <title>Git: Pull and Fetch, When and Why?</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Sun, 31 Mar 2024 04:10:38 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/git-pull-and-fetch-when-and-why-1bh1</link>
      <guid>https://dev.to/ajkerrigan/git-pull-and-fetch-when-and-why-1bh1</guid>
      <description>&lt;ul&gt;
&lt;li&gt;The Question&lt;/li&gt;
&lt;li&gt;Simple Case&lt;/li&gt;
&lt;li&gt;More Complex Case&lt;/li&gt;
&lt;li&gt;Is This Working?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Question
&lt;/h2&gt;

&lt;p&gt;I saw &lt;a href="https://fediverse.speckledmonkey.com/@glyph@mastodon.social/112179953375723103" rel="noopener noreferrer"&gt;a poll&lt;/a&gt; pop up in my Mastodon feed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjz8orlnw2fyq0tb6lbv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjz8orlnw2fyq0tb6lbv7.png" alt="Mastodon poll about git pull vs. git fetch"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And my brain snagged on it. Because it reminded me that I developed some preferences, built them into aliases, and then shoved them comfortably below the level of conscious thought. And hey, that hasn't caused a problem for me so far! It's still good to re-evaluate these things from time to time though. So when I need to grab some upstream changes with git, what do I do?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; I don't intend this as expert guidance - I am neither expert nor guide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple Case
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Situation:&lt;/strong&gt; My active local branch is behind its upstream counterpart.&lt;/p&gt;

&lt;p&gt;I...&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;git f&lt;/code&gt; (an alias which in my head means "freshen up").&lt;/li&gt;
&lt;li&gt;Under the covers, this runs a &lt;code&gt;git pull&lt;/code&gt; and I'm up to date. Yay!&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  More Complex Case
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Situation&lt;/strong&gt;: I'm working on a branch and want to pull updates in from another branch (typically &lt;code&gt;main&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;I...&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;git f main&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Under the covers, this runs a &lt;code&gt;git fetch&lt;/code&gt; command that updates my local main branch from its upstream remote, but leaves my current branch alone.&lt;/li&gt;
&lt;li&gt;Then depending on the situation, I might:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;git merge main&lt;/code&gt; (perhaps I'm making minor maintainer edits to a community pull request)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;git rebase main&lt;/code&gt; (it's my own junk and I just want to be updated)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;git rebase -i main&lt;/code&gt; (I've got some fine-tuning to do)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Is This Working?
&lt;/h2&gt;

&lt;p&gt;I haven't &lt;em&gt;really&lt;/em&gt; interrogated my instincts in a while, so does this flow still work for me?&lt;/p&gt;

&lt;p&gt;Yeah, I think so. It tucks most of my "get me up to date" behavior behind a single alias, and deliberately leaves me to make a judgement call in cases where the next step is less certain.&lt;/p&gt;

&lt;p&gt;For the record, here's the full picture of what &lt;code&gt;git f&lt;/code&gt; does:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euxo&lt;/span&gt; pipefail

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="k"&gt;:-}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;git pull &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;span class="k"&gt;else
    &lt;/span&gt;git rev-parse &lt;span class="nt"&gt;--abbrev-ref&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;@&lt;span class="o"&gt;{&lt;/span&gt;upstream&lt;span class="o"&gt;}&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
        &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'{split ($0,ref,"/"); system("git fetch -v --prune " ref[1] " " ref[2] ":" ref[2]); }'&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That &lt;code&gt;else&lt;/code&gt; clause feels a bit weird. But it means when I run &lt;code&gt;git f main&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;That &lt;code&gt;git rev-parse&lt;/code&gt; bit finds whichever remote is upstream of &lt;code&gt;main&lt;/code&gt;. Let's say that's called &lt;code&gt;origin&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Something like &lt;code&gt;git fetch origin main:main&lt;/code&gt; updates my local &lt;code&gt;main&lt;/code&gt; branch, but leaves my active branch alone.

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--prune&lt;/code&gt; does a bit of tidying while we're at it, removing local branches that were tracking remote branches that no longer exist&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;There's undoubtedly a better/smarter way to handle this workflow, express it in bash, or both.  Suggestions welcome in the comments!&lt;/p&gt;

</description>
      <category>git</category>
    </item>
    <item>
      <title>Keep your AWS CLI config fresh with Cog</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Thu, 28 Mar 2024 18:34:04 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/keep-your-aws-cli-config-fresh-with-cog-1oi6</link>
      <guid>https://dev.to/ajkerrigan/keep-your-aws-cli-config-fresh-with-cog-1oi6</guid>
      <description>&lt;ul&gt;
&lt;li&gt;The Problem&lt;/li&gt;
&lt;li&gt;An Opinionated Solution&lt;/li&gt;
&lt;li&gt;
How I Use It

&lt;ul&gt;
&lt;li&gt;Install Cog and aws-sso-config-builder&lt;/li&gt;
&lt;li&gt;Update the AWS CLI config file&lt;/li&gt;
&lt;li&gt;Refresh my config file&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Should &lt;em&gt;you&lt;/em&gt; use this?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I have...&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access to an ever-shifting set of AWS accounts and permissions sets, courtesy of &lt;a href="https://docs.aws.amazon.com/singlesignon/latest/userguide/what-is.html"&gt;IAM Identity Center&lt;/a&gt; (formerly AWS SSO).&lt;/li&gt;
&lt;li&gt;Undying fondness for &lt;a href="https://github.com/99designs/aws-vault/"&gt;aws-vault&lt;/a&gt; to securely cache my session credentials.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I want...&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A set of AWS CLI profiles automatically defined in &lt;code&gt;~/.aws/config&lt;/code&gt;, based on all of the permission sets available to me through IAM Identity Center.&lt;/li&gt;
&lt;li&gt;Those profiles to "just work", in that:

&lt;ul&gt;
&lt;li&gt;They transparently refresh short-lived credentials or launch a browser-based SSO login window as needed.&lt;/li&gt;
&lt;li&gt;They don't trip up external utilities that read an AWS CLI configuration.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;The ability to add profiles manually for special cases, without interfering with the autogenerated profiles.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  An Opinionated Solution
&lt;/h2&gt;

&lt;p&gt;For a long time, I solved this problem with custom scripts. And that was fine! My workflow felt so specific to my tastes that it wasn't worth packaging.&lt;/p&gt;

&lt;p&gt;But then a couple things happened:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I saw some of &lt;a href="https://nedbatchelder.com/code/cog/"&gt;Cog's&lt;/a&gt; use cases and realized my custom scripts were just a low budget reimplementation of what Cog already did well.&lt;/li&gt;
&lt;li&gt;I was exploring different Python packaging tools, but none of my previous work or personal experience gave me a reason to use &lt;a href="https://hatch.pypa.io/"&gt;Hatch&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I decided to convert my custom nonsense into a Hatch-based package that would play nicely with Cog:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pypi.org/project/aws-sso-config-builder/"&gt;https://pypi.org/project/aws-sso-config-builder/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bird.&lt;br&gt;
Bird.&lt;br&gt;
Stone.&lt;/p&gt;
&lt;h2&gt;
  
  
  How I Use It
&lt;/h2&gt;

&lt;p&gt;The aws-sso-config-builder tool doesn't &lt;em&gt;require&lt;/em&gt; this workflow - the &lt;a href="https://github.com/ajkerrigan/aws-sso-config-builder?tab=readme-ov-file#usage"&gt;usage section&lt;/a&gt; of the README includes alternatives. But this is how &lt;em&gt;I&lt;/em&gt; work:&lt;/p&gt;
&lt;h3&gt;
  
  
  Install Cog and aws-sso-config-builder
&lt;/h3&gt;

&lt;p&gt;Use &lt;a href="https://pipx.pypa.io/"&gt;pipx&lt;/a&gt; to install Cog and my aws-sso-config-builder tool in the same environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;pipx install --verbose cogapp
pipx inject --verbose cogapp aws-sso-config-builder
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Update the AWS CLI config file
&lt;/h3&gt;

&lt;p&gt;Add a block like this to the &lt;code&gt;~/.aws/config&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# [[[cog
# import cog
# from aws_sso_config_builder.gen_config import generate_config_blocks
#
# cog.outl(generate_config_blocks(
#     sso_directories=["home", "work"],
#     profile_template="""
#         [profile {profile_name}-sso]
#         sso_session = {sso_session}
#         sso_account_id = {account_id}
#         sso_role_name = {role_name}
#         output = json
#         region = us-east-2
#
#         [profile {profile_name}]
#         credential_process = {aws_vault_path} exec --json {profile_name}-sso
#         output = json
#         region = us-east-2
#     """,
#     regex_replacements={
#         "Production": "prod",
#         "Sandbox": "sbx"
#     },
#     aws_vault_path="/home/aj/go/bin/aws-vault",
# ))
# ]]]
# [[[end]]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM Identity Center directory names / organizations that I have access to&lt;/li&gt;
&lt;li&gt;The skeleton of a config block that should be generated for each account/permission set&lt;/li&gt;
&lt;li&gt;A few regular expressions to customize / normalize profile names&lt;/li&gt;
&lt;li&gt;The path to my aws-vault binary&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Refresh my config file
&lt;/h3&gt;

&lt;p&gt;Now I have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cog available&lt;/li&gt;
&lt;li&gt;aws-sso-config-builder providing a mechanism for looking up permission sets&lt;/li&gt;
&lt;li&gt;A template block in my config file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So when I run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;cog -r ~/.aws/config
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It replaces everything between &lt;code&gt;]]]&lt;/code&gt; and &lt;code&gt;[[[end]]]&lt;/code&gt; with one rendered &lt;code&gt;profile_template&lt;/code&gt; block per permission set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="err"&gt;...template&lt;/span&gt; &lt;span class="err"&gt;block&lt;/span&gt; &lt;span class="err"&gt;from&lt;/span&gt; &lt;span class="err"&gt;above...&lt;/span&gt;
&lt;span class="err"&gt;]]]&lt;/span&gt;
&lt;span class="err"&gt;...many&lt;/span&gt; &lt;span class="err"&gt;other&lt;/span&gt; &lt;span class="err"&gt;profiles...&lt;/span&gt;

&lt;span class="nn"&gt;[profile sbx-AJ-AWSReadOnlyAccess-sso]&lt;/span&gt;
&lt;span class="py"&gt;sso_session&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;stacklet&lt;/span&gt;
&lt;span class="py"&gt;sso_account_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;111111111111&lt;/span&gt;
&lt;span class="py"&gt;sso_role_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;AWSReadOnlyAccess&lt;/span&gt;
&lt;span class="py"&gt;output&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;json&lt;/span&gt;
&lt;span class="py"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;us-east-2&lt;/span&gt;

&lt;span class="nn"&gt;[profile sbx-AJ-AWSReadOnlyAccess]&lt;/span&gt;
&lt;span class="py"&gt;credential_process&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/home/aj/go/bin/aws-vault exec --json sbx-AJ-AWSReadOnlyAccess-sso&lt;/span&gt;
&lt;span class="py"&gt;output&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;json&lt;/span&gt;
&lt;span class="py"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;us-east-2&lt;/span&gt;

&lt;span class="nn"&gt;[profile sbx-AJ-AWSAdministratorAccess-sso]&lt;/span&gt;
&lt;span class="py"&gt;sso_session&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;stacklet&lt;/span&gt;
&lt;span class="py"&gt;sso_account_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;111111111111&lt;/span&gt;
&lt;span class="py"&gt;sso_role_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;AWSAdministratorAccess&lt;/span&gt;
&lt;span class="py"&gt;output&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;json&lt;/span&gt;
&lt;span class="py"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;us-east-2&lt;/span&gt;

&lt;span class="nn"&gt;[profile sbx-AJ-AWSAdministratorAccess]&lt;/span&gt;
&lt;span class="py"&gt;credential_process&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/home/aj/go/bin/aws-vault exec --json sbx-AJ-AWSAdministratorAccess-sso&lt;/span&gt;
&lt;span class="py"&gt;output&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;json&lt;/span&gt;
&lt;span class="py"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;us-east-2&lt;/span&gt;
&lt;span class="nn"&gt;[[[end]&lt;/span&gt;&lt;span class="err"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And from there, it's easy for me to switch among any CLI profiles I have. The profile names are verbose enough that it's very clear which account/permissions I have, and through some &lt;a href="https://github.com/ajkerrigan/aws-sso-config-builder?tab=readme-ov-file#fish-convenience-functions"&gt;shell helpers&lt;/a&gt; it's still friendly to bounce around:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;❯ asp sbx-AJ-AWSReadOnlyAccess

~ via 🐍v3.12.2 (cogapp) on ☁️  sbx-AJ-AWSReadOnlyAccess (us-east-2)
❯
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Should &lt;em&gt;you&lt;/em&gt; use this?
&lt;/h2&gt;

&lt;p&gt;Probably not. But if you got this far in the post... maybe? Thanks for reading in any case, you deserve some &lt;a href="https://www.gimmesomeoven.com/swedish-cinnamon-buns-kanelbullar/"&gt;kanelbullar&lt;/a&gt;. Boy those things are delicious.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cli</category>
      <category>python</category>
    </item>
    <item>
      <title>Stuff I Learned during Hanukkah of Data 2023</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Mon, 18 Dec 2023 20:57:27 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/stuff-i-learned-during-hanukkah-of-data-2023-4da6</link>
      <guid>https://dev.to/ajkerrigan/stuff-i-learned-during-hanukkah-of-data-2023-4da6</guid>
      <description>&lt;p&gt;
  Background on Hanukkah of Data, motivation behind this post
  &lt;p&gt;&lt;a href="https://hanukkah.bluebird.sh/"&gt;Hanukkah of Data&lt;/a&gt; is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. I think of it as "&lt;a href="https://adventofcode.com/"&gt;Advent of Code&lt;/a&gt; meets &lt;a href="https://mystery.knightlab.com/"&gt;SQL Murder Mystery&lt;/a&gt;".&lt;/p&gt;

&lt;p&gt;Last year I worked through the challenges using &lt;a href="https://www.visidata.org/"&gt;VisiData&lt;/a&gt;, &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;, and &lt;a href="https://pandas.pydata.org/"&gt;Pandas&lt;/a&gt;. I walked through my thought process and solutions in &lt;a href="https://dev.to/ajkerrigan/series/21177"&gt;a series of posts&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This&lt;/em&gt; year I decided to commit to a set of tools on day 1 (&lt;a href="https://pola.rs/"&gt;Polars&lt;/a&gt; and &lt;a href="https://jupyter.org/"&gt;Jupyter&lt;/a&gt;) and use them for the whole challenge. It seemed silly to do a whole new meandering walkthrough, so instead I'll highlight a few things that stuck out after finishing the challenge and sitting on it for a few days. Here we go!&lt;/p&gt;



&lt;/p&gt;

&lt;p&gt;Here are a few things that stuck with me after using Polars and Jupyter for &lt;a href="https://hanukkah.bluebird.sh/"&gt;Hanukkah of Data&lt;/a&gt; this year:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Parsing Dates at Read Time Can Be Fiddly

&lt;ul&gt;
&lt;li&gt;Helping SQLite Help Me&lt;/li&gt;
&lt;li&gt;Falling Back to Explicit Post-Read Parsing&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;nbdime Rocks&lt;/li&gt;
&lt;li&gt;Polars LazyFrames are Neat&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Parsing Dates at Read Time Can Be Fiddly
&lt;/h3&gt;

&lt;p&gt;While I had no trouble parsing dates from an existing Polars DataFrame, I hit two bumps trying to parse dates from strings while reading a &lt;a href="https://sqlite.org/"&gt;SQLite&lt;/a&gt; file with &lt;a href="https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_database.html"&gt;read_database()&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;
  
  
  Helping SQLite Help Me
&lt;/h4&gt;

&lt;p&gt;First, I got an error that looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ComputeError: could not append value: "2017-01-31 00:11:08"
of type: str to the builder; make sure that all rows have
the same schema or consider increasing `infer_schema_length`

it might also be that a value overflows the data-type's
capacity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That turned out to be related to &lt;a href="https://github.com/pola-rs/polars/issues/11912"&gt;pola-rs/polars#11912&lt;/a&gt;, and &lt;a href="https://github.com/pola-rs/polars/issues/11246#issuecomment-1731633736"&gt;this linked comment&lt;/a&gt; provided a deceptively simple solution - use &lt;a href="https://docs.python.org/3/library/sqlite3.html?highlight=sqlite3#sqlite3.PARSE_DECLTYPES"&gt;PARSE_DECLTYPES&lt;/a&gt; when creating the connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="p"&gt;con = sqlite3.connect(
&lt;/span&gt;&lt;span class="gd"&gt;-    "5784/noahs.sqlite"
&lt;/span&gt;&lt;span class="gi"&gt;+    "5784/noahs.sqlite", detect_types=sqlite3.PARSE_DECLTYPES
&lt;/span&gt;)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Falling Back to Explicit Post-Read Parsing
&lt;/h4&gt;

&lt;p&gt;With that fixed, I still hit overflow errors on the &lt;code&gt;birthdate&lt;/code&gt; column. As far as I've been able to tell from &lt;a href="https://github.com/pola-rs/polars/issues/12900#issuecomment-1842170343"&gt;here&lt;/a&gt; and elsewhere, this is because birthdate was a simple &lt;code&gt;YYYY-MM-DD&lt;/code&gt; format string rather than a full ISO format date? In any case, I couldn't find a way to parse that at read time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: I expect this is a failure in my brain and fingers, rather than Polars.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So I was able to parse a couple datetimes at read time, and leaned on &lt;code&gt;.str.strptime()&lt;/code&gt; to parse birthdates after reading:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="gp"&gt;        ...&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;schema_overrides&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ordered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;with_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;birthdate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strptime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  nbdime Rocks
&lt;/h3&gt;

&lt;p&gt;I remember hearing about &lt;a href="https://nbdime.readthedocs.io/"&gt;nbdime&lt;/a&gt; and thinking it sounded useful, but I've never really needed it since I rarely use Jupyter in the first place. But then I made some changes to my &lt;a href="https://github.com/ajkerrigan/hanukkah-of-data/blob/main/2023/Hanukkah%20of%20Data%202023.ipynb"&gt;Hanukkah of Data 2023 notebook&lt;/a&gt; to work with the follow-up "speed run" challenge (a new dataset and slightly tweaked clues), and the native Git diff was too noisy to be useful. nbdime came to the rescue! Here are the changes I had to make for days 2 and 3 during the speed run:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7zKWdACX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h4izzotq7m61i2sjb366.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7zKWdACX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h4izzotq7m61i2sjb366.png" alt="Snippet of a Notebook Diff with nbdime's nbdiff-web tool" width="800" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h4izzotq7m61i2sjb366.png"&gt;(Full-size image)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;
  Text representation of this diff
  &lt;br&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;&lt;span class="p"&gt;nbdiff 2023/Hanukkah of Data 2023.ipynb (656616004639c75824de43284ff34bbdc9d89f37) 2023/Hanukkah of Data 2023.ipynb
&lt;/span&gt;&lt;span class="gd"&gt;--- 2023/Hanukkah of Data 2023.ipynb (656616004639c75824de43284ff34bbdc9d89f37)  (no timestamp)
&lt;/span&gt;&lt;span class="gi"&gt;+++ 2023/Hanukkah of Data 2023.ipynb  2023-12-18 11:43:03.005108
&lt;/span&gt;## modified /cells/12/source:
&lt;span class="p"&gt;@@ -1,7 +1,12 @@&lt;/span&gt;
&lt;span class="gd"&gt;-day2 = df.filter(
-    (pl.col("name").str.contains(r"J.*P"))
-    &amp;amp; (pl.col("desc").str.contains(r"(?i)(coffee|bagel)"))
-    &amp;amp; (pl.col("ordered").str.strptime(pl.Datetime).dt.year() == 2017)
&lt;/span&gt;&lt;span class="gi"&gt;+day2 = (
+    df.filter(
+        (pl.col("name").str.contains(r"D.*?S"))
+        &amp;amp; (pl.col("desc").str.contains(r"(?i)(coffee|bagel)"))
+        &amp;amp; (pl.col("ordered").dt.year() == 2017)
+    )
+    .group_by("phone", "citystatezip")
+    .agg(pl.col("orderid").count().alias("ordercount"))
+    .top_k(1, by="ordercount")
&lt;/span&gt; )
&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="gd"&gt;-day2.select("phone").unique().item()
&lt;/span&gt;&lt;span class="gi"&gt;+day2.select("phone")
&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;## modified /cells/14/source:
&lt;span class="p"&gt;@@ -1,13 +1,8 @@&lt;/span&gt;
 neighborhood = day2.select("citystatezip").unique().item()
&lt;span class="err"&gt;
&lt;/span&gt; day3 = df.filter(
&lt;span class="gd"&gt;-    (
-        pl.col("birthdate")
-        .str.strptime(pl.Datetime)
-        .dt.ordinal_day()
-        .is_between(173, 203)
-    )
-    &amp;amp; (pl.col("birthdate").str.strptime(pl.Datetime).dt.year() % 12 == 7)
&lt;/span&gt;&lt;span class="gi"&gt;+    (pl.col("birthdate").dt.ordinal_day().is_between(266, 296))
+    &amp;amp; (pl.col("birthdate").dt.year() % 12 == 11)
&lt;/span&gt;     &amp;amp; (pl.col("citystatezip") == neighborhood)
 )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;/p&gt;

&lt;h3&gt;
  
  
  Polars LazyFrames are Neat
&lt;/h3&gt;

&lt;p&gt;For this eight-day challenge, I knew I would be reusing the same core dataset and just slicing it different ways. Reading the full data set into a starter DataFrame seemed fine for that use case, but I still wanted to fiddle with the Polars &lt;a href="https://pola-rs.github.io/polars/user-guide/lazy/using/"&gt;Lazy API&lt;/a&gt; a bit.&lt;/p&gt;

&lt;p&gt;On Day 4, for example, I needed to rank customers by the number of bakery items they bought before 5am. That involves a few steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter: I only care about bakery orders (an order sku that starts with BKY) in the 0400-0459 time range.&lt;/li&gt;
&lt;li&gt;Aggregate: Count the filtered orders by customer&lt;/li&gt;
&lt;li&gt;Sort: Show the top customers by order count&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which I can do with an eager/non-lazy call like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;day4&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;starts_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BKY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ordered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group_by&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orderid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ordercount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ordercount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;day4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ordercount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or this very similar looking lazified version, which only requires two changes (calling &lt;code&gt;df.lazy()&lt;/code&gt; up front to queue operations, and &lt;code&gt;collect()&lt;/code&gt; later to combine/perform them):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;day4_lazy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lazy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;starts_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BKY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ordered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group_by&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orderid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ordercount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ordercount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;day4_lazy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ordercount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The neat part is that before executing those queued operations, the &lt;code&gt;explain()&lt;/code&gt; and &lt;code&gt;show_graph()&lt;/code&gt; methods can offer peeks at the &lt;a href="https://pola-rs.github.io/polars/user-guide/lazy/query-plan/#graphviz-visualization"&gt;query plan&lt;/a&gt;. And it lets you choose whether or not that plan display includes LazyFrame optimizations. Check it out, here's the unoptimized version of my day 4 operations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GzDZ8OW7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7i690wimtmuwc2837rva.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GzDZ8OW7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7i690wimtmuwc2837rva.png" alt="Unoptimized Query Plan" width="392" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;
  Text version
  &lt;br&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SLICE[offset: 0, len: 5]
  SORT BY [col("ordercount")]
    AGGREGATE
        [col("orderid").count().alias("ordercount")] BY [col("name"), col("phone")] FROM
      FILTER [(col("sku").str.starts_with([Utf8(BKY)])) &amp;amp; ([(col("ordered").dt.hour()) == (4)])] FROM

      DF ["customerid", "name", "address", "citystatezip"]
 PROJECT */18 COLUMNS
 SELECTION: "None"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;/p&gt;

&lt;p&gt;And here's the optimized version:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--o4beqF1K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8cmv5ysfp79axh9z1m1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--o4beqF1K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8cmv5ysfp79axh9z1m1a.png" alt="Optimized Query Plan" width="392" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;
  Text version
  &lt;br&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SORT BY [col("ordercount")]
  AGGREGATE
    [col("orderid").count().alias("ordercount")] BY [col("name"), col("phone")] FROM
    DF ["customerid", "name", "address", "citystatezip"]
 PROJECT 5/18 COLUMNS
 SELECTION: "[(col(\"sku\").str.starts_with([Utf8(BKY)])) &amp;amp; ([(col(\"ordered\").dt.hour()) == (4)])]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;/p&gt;

&lt;p&gt;It's pretty cool to see what Polars does under the hood to combine operations and make them more efficient.&lt;/p&gt;

&lt;p&gt;I didn't get the full benefit of &lt;a href="https://pola-rs.github.io/polars/user-guide/lazy/optimizations/"&gt;all available lazy optimizations&lt;/a&gt; because of the data I had and how I was using it. It's still very neat stuff to be aware of though!&lt;br&gt;
&lt;/p&gt;

</description>
      <category>python</category>
      <category>polars</category>
      <category>jupyter</category>
      <category>hanukkahofdata</category>
    </item>
    <item>
      <title>Kitty Image Quality in the Terminal... Without Giving Up Tmux</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Thu, 23 Mar 2023 03:04:55 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/kitty-image-quality-in-the-terminal-without-giving-up-tmux-16be</link>
      <guid>https://dev.to/ajkerrigan/kitty-image-quality-in-the-terminal-without-giving-up-tmux-16be</guid>
      <description>&lt;ul&gt;
&lt;li&gt;Motivation&lt;/li&gt;
&lt;li&gt;Is this actually a problem?&lt;/li&gt;
&lt;li&gt;
Time for a workaround

&lt;ul&gt;
&lt;li&gt;Kitty remote control&lt;/li&gt;
&lt;li&gt;Kitty tabs&lt;/li&gt;
&lt;li&gt;Timg&lt;/li&gt;
&lt;li&gt;Duct taping it together&lt;/li&gt;
&lt;li&gt;Video demo&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Better options&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;This is a bit cheeky and simplistic, but also reasonably accurate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kitty has a &lt;a href="https://sw.kovidgoyal.net/kitty/graphics-protocol/"&gt;graphics protocol&lt;/a&gt; that allows a terminal to render high-resolution images&lt;/li&gt;
&lt;li&gt;A number of tools support kitty's graphics protocol&lt;/li&gt;
&lt;li&gt;Tmux &lt;a href="https://github.com/tmux/tmux/issues/3395#issuecomment-1327312472"&gt;does not&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;...but I like tmux&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Is this actually a problem?
&lt;/h2&gt;

&lt;p&gt;One obvious solution here would be to... &lt;em&gt;not&lt;/em&gt; use tmux. Kitty has its own support for windows/tabs/panes, and by all accounts it's quite full-featured. There are other terminals out there too. And hey, I'm all for trying out new tools. But some tools are ingrained enough in my fingers that I'm hesitant to discard them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time for a workaround
&lt;/h2&gt;

&lt;p&gt;Since at least for the moment I'm not ready to give up tmux, what's the lowest friction way to still get kitty-quality images? The flow doesn't have to be pretty, just the image 😎. So far the way that feels smoothest to me combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://sw.kovidgoyal.net/kitty/remote-control/#remote-control-via-a-socket"&gt;kitty remote control&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sw.kovidgoyal.net/kitty/launch/"&gt;kitty tabs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/hzeller/timg/"&gt;timg&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Kitty remote control
&lt;/h3&gt;

&lt;p&gt;I start a kitty session at login and keep it always a keyboard shortcut away. I've updated my startup command to &lt;a href="https://sw.kovidgoyal.net/kitty/remote-control/#remote-control-via-a-socket"&gt;listen on a unix socket&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;kitty -o allow_remote_control=yes --listen-on unix:/tmp/kitty_remote_control
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Kitty tabs
&lt;/h3&gt;

&lt;p&gt;With my kitty session listening on a socket, I can run commands &lt;em&gt;inside tmux&lt;/em&gt; that run in new kitty tabs &lt;em&gt;outside of tmux&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;kitty @ --to unix:/tmp/kitty_remote_control launch --type=tab --title='my new tab'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Timg
&lt;/h3&gt;

&lt;p&gt;Kitty's &lt;a href="https://sw.kovidgoyal.net/kitty/graphics-protocol/"&gt;graphics protocol page&lt;/a&gt; lists a number of tools that use it. While I haven't tried them all, I've become particularly fond of &lt;a href="https://github.com/hzeller/timg/"&gt;timg&lt;/a&gt;. It has some friendly options for things like upscaling and handling transparency/alpha channels. It also has the option to gracefully fall back from kitty-quality to half or quarter pixels in cases where a blocky image is OK.&lt;/p&gt;

&lt;h3&gt;
  
  
  Duct taping it together
&lt;/h3&gt;

&lt;p&gt;With a &lt;code&gt;kimg&lt;/code&gt; script like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euxo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;shift

&lt;/span&gt;kitty @ &lt;span class="nt"&gt;--to&lt;/span&gt; unix:/tmp/kitty_remote_control launch &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;tab &lt;span class="nt"&gt;--title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--no-response&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    /bin/bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"timg -pk -B gray --title='%f (%wx%h)' &lt;/span&gt;&lt;span class="nv"&gt;$*&lt;/span&gt;&lt;span class="s2"&gt; '&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;realpath&lt;/span&gt; &lt;span class="nv"&gt;$FILE&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;' &amp;amp;&amp;amp; read -sn1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I can use &lt;code&gt;kimg &amp;lt;image file&amp;gt;&lt;/code&gt; to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open a new kitty tab outside tmux&lt;/li&gt;
&lt;li&gt;Run timg to display the image&lt;/li&gt;
&lt;li&gt;Wait for a keypress to close the kitty tab and return to my tmux home&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Video demo
&lt;/h3&gt;

&lt;p&gt;This short video shows the difference between opening an image in tmux directly vs. in a dedicated kitty tab:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/-ERPQVY4wkk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Better options
&lt;/h2&gt;

&lt;p&gt;If you've read this post and thought "wow that's pretty gross, why doesn't he just..." then &lt;em&gt;great&lt;/em&gt;! First of all, thanks for reading. And second, please share your own suggestions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Just keep in mind that if the suggestion is "abandon tmux" I'll be listening with my hands over my ears for now.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tmux</category>
      <category>kitty</category>
      <category>terminal</category>
    </item>
    <item>
      <title>Opening CSVs with VisiData in Tmux, from the Firefox Flatpak App</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Sat, 18 Feb 2023 15:20:04 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/today-i-learned-opening-csvs-with-visidata-in-tmux-from-the-firefox-flatpak-app-4m07</link>
      <guid>https://dev.to/ajkerrigan/today-i-learned-opening-csvs-with-visidata-in-tmux-from-the-firefox-flatpak-app-4m07</guid>
      <description>&lt;ul&gt;
&lt;li&gt;
The Flatpak Wrinkle

&lt;ul&gt;
&lt;li&gt;Create a Flatpak-Specific Desktop Entry&lt;/li&gt;
&lt;li&gt;Let Firefox Talk to the System Bus&lt;/li&gt;
&lt;li&gt;Tell Firefox to Use VisiData as a Default App&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

The Setup Outside of Flatpak

&lt;ul&gt;
&lt;li&gt;Desktop Entry&lt;/li&gt;
&lt;li&gt;VisiData as a Default Application&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I use &lt;a href="https://www.visidata.org" rel="noopener noreferrer"&gt;VisiData&lt;/a&gt; as my default app for CSVs. There are a couple moving parts, and I had ignored the Firefox/Flatpak piece until recently. While this all feels very specific to my usage patterns, some of these bits were not at all obvious to me at first. So maybe something in here will be a useful reference to someone else or future me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Flatpak Wrinkle
&lt;/h2&gt;

&lt;p&gt;I had a mostly-working setup (details) that opened CSVs in VisiData just fine, until I started using the Firefox Flatpak app. I kept thinking "I'll fix this &lt;em&gt;someday&lt;/em&gt;..." and then a &lt;a href="https://github.com/saulpw/visidata/issues/1737#issuecomment-1435335934" rel="noopener noreferrer"&gt;comment on a GitHub issue&lt;/a&gt; reminded me about it.&lt;/p&gt;

&lt;p&gt;So the keys to fix my Flatpak setup were...&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a Flatpak-Specific Desktop Entry
&lt;/h3&gt;

&lt;p&gt;I needed to copy my existing &lt;code&gt;VisiData.desktop&lt;/code&gt; file to &lt;code&gt;~/.var/app/org.mozilla.firefox/data/applications/VisiData.desktop&lt;/code&gt; and tweak the &lt;code&gt;Exec&lt;/code&gt; line to use &lt;code&gt;flatpak-spawn&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Desktop Entry]
Encoding=UTF-8
Version=1.0
Type=Application
Terminal=false
Exec=/usr/bin/bash -c "flatpak-spawn --host i3-focus-or-launch.sh "kitty" "kitty" &amp;amp;&amp;amp; flatpak-spawn --host /usr/bin/tmux new-window -n vd ~/.local/bin/vd %U"
MimeType=application/vnd.oasis.opendocument.spreadsheet;application/vnd.oasis.opendocument.spreadsheet-template;application/vnd.sun.xml.calc;application/vnd.sun.xml.calc.template;application/msexcel;application/vnd.ms-excel;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;application/vnd.ms-excel.sheet.macroEnabled.12;application/vnd.openxmlformats-officedocument.spreadsheetml.template;application/vnd.ms-excel.template.macroEnabled.12;application/vnd.ms-excel.sheet.binary.macroEnabled.12;text/csv;application/x-dbf;text/spreadsheet;application/csv;application/excel;application/tab-separated-values;application/vnd.lotus-1-2-3;application/vnd.oasis.opendocument.chart;application/vnd.oasis.opendocument.chart-template;application/x-dbase;application/x-dos_ms_excel;application/x-excel;application/x-msexcel;application/x-ms-excel;application/x-quattropro;application/x-123;text/comma-separated-values;text/tab-separated-values;text/x-comma-separated-values;text/x-csv;application/vnd.oasis.opendocument.spreadsheet-flat-xml;application/vnd.ms-works;application/clarisworks;application/x-iwork-numbers-sffnumbers;application/x-starcalc;
Name=VisiData
Icon=~/code/visidata.org/static/vd-rtd.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Let Firefox Talk to the System Bus
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flatpak override --user --talk-name org.freedesktop.Flatpak org.mozilla.firefox
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tell Firefox to Use VisiData as a Default App
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;~/.var/app/org.mozilla.firefox/config/mimeapps.list&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Default Applications]
...snip...
text/csv=VisiData.desktop
application/csv=VisiData.desktop
application/x-csv=VisiData.desktop
application/vnd.oasis.opendocument.spreadsheet=VisiData.desktop
application/vnd.oasis.opendocument.spreadsheet-flat-xml=VisiData.desktop
application/vnd.oasis.opendocument.spreadsheet-template=VisiData.desktop
application/vnd.sun.xml.calc=VisiData.desktop
application/vnd.sun.xml.calc.template=VisiData.desktop
application/msexcel=VisiData.desktop
application/vnd.ms-excel=VisiData.desktop
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet=VisiData.desktop
application/vnd.ms-excel.sheet.macroenabled.12=VisiData.desktop
application/vnd.openxmlformats-officedocument.spreadsheetml.template=VisiData.desktop
application/vnd.ms-excel.template.macroenabled.12=VisiData.desktop
application/vnd.ms-excel.sheet.binary.macroenabled.12=VisiData.desktop
application/x-dbf=VisiData.desktop
text/spreadsheet=VisiData.desktop

[Added Associations]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Setup Outside of Flatpak
&lt;/h2&gt;

&lt;p&gt;I'm running Regolith Desktop 2.1 on Ubuntu 22.04, but this &lt;em&gt;should&lt;/em&gt; generalize to Ubuntu+Gnome pairings in the same version neighborhood.&lt;/p&gt;

&lt;h3&gt;
  
  
  Desktop Entry
&lt;/h3&gt;

&lt;p&gt;I have a file like this at &lt;code&gt;~/.local/share/applications/VisiData.desktop&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Desktop Entry]
Encoding=UTF-8
Version=1.0
Type=Application
Terminal=false
Exec=/usr/bin/sh -c "i3-focus-or-launch.sh kitty kitty &amp;amp;&amp;amp; /usr/bin/tmux new-window -n vd ~/.local/bin/vd %U"
MimeType=application/vnd.oasis.opendocument.spreadsheet;application/vnd.oasis.opendocument.spreadsheet-template;application/vnd.sun.xml.calc;application/vnd.sun.xml.calc.template;application/msexcel;application/vnd.ms-excel;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;application/vnd.ms-excel.sheet.macroEnabled.12;application/vnd.openxmlformats-officedocument.spreadsheetml.template;application/vnd.ms-excel.template.macroEnabled.12;application/vnd.ms-excel.sheet.binary.macroEnabled.12;text/csv;application/x-dbf;text/spreadsheet;application/csv;application/excel;application/tab-separated-values;application/vnd.lotus-1-2-3;application/vnd.oasis.opendocument.chart;application/vnd.oasis.opendocument.chart-template;application/x-dbase;application/x-dos_ms_excel;application/x-excel;application/x-msexcel;application/x-ms-excel;application/x-quattropro;application/x-123;text/comma-separated-values;text/tab-separated-values;text/x-comma-separated-values;text/x-csv;application/vnd.oasis.opendocument.spreadsheet-flat-xml;application/vnd.ms-works;application/clarisworks;application/x-iwork-numbers-sffnumbers;application/x-starcalc;
Name=VisiData
Icon=~/code/visidata.org/static/vd-rtd.png
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;Exec&lt;/code&gt; string is pretty specific to my workflow, but it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Focuses my terminal&lt;/li&gt;
&lt;li&gt;Creates a new tmux window named &lt;code&gt;vd&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Opens the file/URL with VisiData in that window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The list of MIME types was copy/pasted from Libre Office Calc, since this started with "anywhere Calc is trying to open, use VisiData instead".&lt;/p&gt;

&lt;h3&gt;
  
  
  VisiData as a Default Application
&lt;/h3&gt;

&lt;p&gt;I update my &lt;code&gt;~/.config/mimeapps.list&lt;/code&gt; file so my desktop environment knows to use VisiData for a bunch of stuff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Added Associations]
...snip...
text/csv=VisiData.desktop;
application/csv=VisiData.desktop;
text/x-csv=VisiData.desktop;
text/spreadsheet=VisiData.desktop;

[Default Applications]
...snip...
text/csv=VisiData.desktop
application/csv=VisiData.desktop
text/x-csv=VisiData.desktop
text/spreadsheet=VisiData.desktop
application/vnd.oasis.opendocument.spreadsheet=VisiData.desktop
application/vnd.oasis.opendocument.spreadsheet-flat-xml=VisiData.desktop
application/vnd.oasis.opendocument.spreadsheet-template=VisiData.desktop
application/vnd.sun.xml.calc=VisiData.desktop
application/vnd.sun.xml.calc.template=VisiData.desktop
application/msexcel=VisiData.desktop
application/vnd.ms-excel=VisiData.desktop
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet=VisiData.desktop
application/vnd.ms-excel.sheet.macroenabled.12=VisiData.desktop
application/vnd.openxmlformats-officedocument.spreadsheetml.template=VisiData.desktop
application/vnd.ms-excel.template.macroenabled.12=VisiData.desktop
application/vnd.ms-excel.sheet.binary.macroenabled.12=VisiData.desktop
application/x-dbf=VisiData.desktop
text/spreadsheet=VisiData.desktop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, this is mostly yoinking Libre Office Calc-related entries out of &lt;code&gt;/usr/share/applications/gnome-mimeapps.list&lt;/code&gt; and overriding them to target VisiData for my user.&lt;/p&gt;

</description>
      <category>flatpak</category>
      <category>visidata</category>
      <category>tmux</category>
      <category>todayilearned</category>
    </item>
    <item>
      <title>Hanukkah of Data 2022 - Puzzle 8 and Wrap-Up</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Fri, 30 Dec 2022 21:58:21 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-8-p9i</link>
      <guid>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-8-p9i</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://hanukkah.bluebird.sh/" rel="noopener noreferrer"&gt;Hanukkah of Data&lt;/a&gt; is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the &lt;a href="https://www.visidata.org/blog/2022/hanukkah-of-data/" rel="noopener noreferrer"&gt;introductory post&lt;/a&gt; for a bit more detail, but the pitch in my head is "&lt;a href="https://adventofcode.com/" rel="noopener noreferrer"&gt;Advent of Code&lt;/a&gt; meets &lt;a href="https://mystery.knightlab.com/" rel="noopener noreferrer"&gt;SQL Murder Mystery&lt;/a&gt;". This post walks through my approach to the eighth puzzle.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thinking&lt;/li&gt;
&lt;li&gt;Doing (Datasette)&lt;/li&gt;
&lt;li&gt;Doing (Pandas)&lt;/li&gt;
&lt;li&gt;Wrap-Up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking
&lt;/h2&gt;

&lt;p&gt;For the last puzzle, we're looking for the person who still probably has the rug! Turns out he's a bit of a pack rat:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;She wound up getting a newer and more expensive rug, so she gave it to an acquaintance of hers who collects all sorts of junk. Apparently he owns an entire set of Noah’s collectibles! He probably still has the rug, even.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The plan here will be find what counts as a Noah's collectible and see who has the most of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing (Datasette)
&lt;/h2&gt;

&lt;p&gt;Based on a peek at the products table, it looks like collectibles all have a 'COL' sku prefix. So we should be able to find customers who have ordered the most distinct items with that prefix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;colcount&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
  &lt;span class="n"&gt;noahs_order_detail&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
  &lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="s1"&gt;'COL%'&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;phone&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
  &lt;span class="n"&gt;colcount&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And yes, Travis has more than 2x the collectible count of any other customer!&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing (Pandas)
&lt;/h2&gt;

&lt;p&gt;A pretty straight SQL--&amp;gt;pandas translation seems to work fine here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;^COL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nunique&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sku&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;This was a lot of fun! Some specific takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Huge thanks to &lt;a href="https://bluebird.sh/about/#" rel="noopener noreferrer"&gt;The Devottys&lt;/a&gt; for putting this challenge together

&lt;ul&gt;
&lt;li&gt;Saul &amp;amp; Anja pour a lot of heart and smarts into VisiData and the data/terminal-loving community&lt;/li&gt;
&lt;li&gt;I'm consistently impressed by Dwimmer's artwork, he's magic at the terminal&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;There's something dangerous or funky about using &lt;code&gt;category&lt;/code&gt; dtypes in pandas with &lt;code&gt;groupby()&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;I got various errors or saw operations take 40x longer, though this is at least partly my inexperience / ignorance&lt;/li&gt;
&lt;li&gt;Issues like &lt;a href="https://github.com/pandas-dev/pandas/issues/36698" rel="noopener noreferrer"&gt;https://github.com/pandas-dev/pandas/issues/36698&lt;/a&gt; suggest I'm not alone!&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Working through problems with multiple tools can sometimes help you rethink your approach in all of them&lt;/li&gt;

&lt;li&gt;You don't have to &lt;a href="https://til.simonwillison.net/sqlite" rel="noopener noreferrer"&gt;be Simon Willison&lt;/a&gt; to get a lot out of SQLite&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I should also point out the official wrap-up post &lt;a href="https://www.visidata.org/blog/2022/hanukkah-of-data-wrap-up/" rel="noopener noreferrer"&gt;here&lt;/a&gt;, which links to other community posts.&lt;/p&gt;

</description>
      <category>hanukkahofdata</category>
      <category>datasette</category>
      <category>python</category>
      <category>pandas</category>
    </item>
    <item>
      <title>Hanukkah of Data 2022 - Puzzle 7</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Fri, 30 Dec 2022 21:27:07 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-7-2699</link>
      <guid>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-7-2699</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://hanukkah.bluebird.sh/"&gt;Hanukkah of Data&lt;/a&gt; is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the &lt;a href="https://www.visidata.org/blog/2022/hanukkah-of-data/"&gt;introductory post&lt;/a&gt; for a bit more detail, but the pitch in my head is "&lt;a href="https://adventofcode.com/"&gt;Advent of Code&lt;/a&gt; meets &lt;a href="https://mystery.knightlab.com/"&gt;SQL Murder Mystery&lt;/a&gt;". This post walks through my approach to the seventh puzzle.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thinking&lt;/li&gt;
&lt;li&gt;
Doing (Datasette)

&lt;ul&gt;
&lt;li&gt;Emily's Orders&lt;/li&gt;
&lt;li&gt;Orders on Emily Days&lt;/li&gt;
&lt;li&gt;Matching by SKU Prefix&lt;/li&gt;
&lt;li&gt;Fine, I'll Filter on Description&lt;/li&gt;
&lt;li&gt;The Whole is Uglier Than the Sum of Its Parts&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
Doing (Pandas)

&lt;ul&gt;
&lt;li&gt;Tricky Colors&lt;/li&gt;
&lt;li&gt;Getting Chainy&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking
&lt;/h2&gt;

&lt;p&gt;It turns out that one day while Emily was robbing Noah's blind, another customer bought the same item in a different color:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Well I turned around and sure enough this cute guy was holding something I had bought. He said ‘I got almost exactly the same thing!’ We laughed about it and wound up swapping items because he had wanted the color I got. We had a moment when our eyes met and my heart stopped for a second. I asked him to get some food with me and we spent the rest of the day together.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So we'll need to find orders on the same day as Emily, where the item is &lt;em&gt;almost&lt;/em&gt; the same as what Emily bought...&lt;/p&gt;

&lt;p&gt;Tackling this in pandas felt a bit daunting to me, so I figured I'd start with SQL. Looking through the data I saw that color &lt;em&gt;usually&lt;/em&gt; showed up inside parentheses in a description. That seemed something worth pursuing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing (Datasette)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Emily's Orders
&lt;/h3&gt;

&lt;p&gt;Breaking the problem down into smaller pieces, I figured we'd want to have Emily's orders first. So I set up a common table expression for those:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;emily_orders&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
    &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt;
    &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders_items&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt;
    &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Emily Randolph'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Orders on Emily Days
&lt;/h3&gt;

&lt;p&gt;Then I pulled in non-Emily orders that were placed on the same day as one of Emily's:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;em_ordered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;em_sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
  &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt;
    &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt;
        &lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;from&lt;/span&gt;
        &lt;span class="n"&gt;emily_orders&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was too many results, so I fine-tuned the order time to look for non-Emily orders placed within 5 minutes of an Emily order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;emily_orders&lt;/span&gt; &lt;span class="n"&gt;em&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'Emily Randolph'&lt;/span&gt;
    &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="k"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Matching by SKU Prefix
&lt;/h3&gt;

&lt;p&gt;That was still too many results, so I tried filtering on orders that matched the sku prefix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;emily_orders&lt;/span&gt; &lt;span class="n"&gt;em&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'Emily Randolph'&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders_items&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
  &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fine, I'll Filter on Description
&lt;/h3&gt;

&lt;p&gt;I was trying to avoid comparing descriptions, but ran out of other ideas before finding a definitive match. So I tried matching just the portion of a description in front of any parentheses by adding this to the select:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;case&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;instr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stripped_desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;case&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;instr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;em_stripped_desc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And matching it in the &lt;code&gt;WHERE&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;where&lt;/span&gt;
  &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;stripped_desc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;em_stripped_desc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This felt pretty awkward to me as any latent SQL Server or Postgres instincts were useless. But it did work!&lt;/p&gt;

&lt;p&gt;I think there are better ways to do this with extensions and/or full-text search support. This felt like a really ugly way to work around a lack of fuzzy matching or string splitting operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Whole is Uglier Than the Sum of Its Parts
&lt;/h3&gt;

&lt;p&gt;That was a bumpy ride, and I ended up with this monstrosity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;emily_orders&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
    &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt;
    &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders_items&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt;
    &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Emily Randolph'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;em_ordered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;em_sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;case&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;instr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;stripped_desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;case&lt;/span&gt;
    &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;instr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;em_stripped_desc&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
  &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;select&lt;/span&gt;
      &lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;from&lt;/span&gt;
      &lt;span class="n"&gt;emily_orders&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;-- rough cut&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;emily_orders&lt;/span&gt; &lt;span class="n"&gt;em&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'Emily Randolph'&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="k"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt; &lt;span class="c1"&gt;-- fine tune&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders_items&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
  &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;em&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;stripped_desc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;em_stripped_desc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But also an answer, so I'll take it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing (Pandas)
&lt;/h2&gt;

&lt;p&gt;Because the SQL approach felt so gross to me, I tried to rethink it rather than just "porting" it to pandas. The start was the same - finding a group of not-Emily orders on the same day as Emily orders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'ordered_date'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;
&lt;span class="n"&gt;emily&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"Emily Randolph"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;not_emily&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;"Emily Randolph"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;merged&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;not_emily&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emily&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"ordered_date"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But then I tried to come up with a better way to indicate "descriptions that differ just by color". This was tricky! I experimented a bit with &lt;a href="https://docs.python.org/3/library/difflib.html"&gt;difflib&lt;/a&gt; which is fun for looking at close string matches. But in the end I leaned on a little helper function to split a description into a set of lowercased words with the parentheses and other punctuation discarded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;re&lt;/span&gt;

&lt;span class="n"&gt;stemset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s"&gt;"\w+"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which I could use in a &lt;code&gt;transform()&lt;/code&gt;, and then use &lt;code&gt;apply()&lt;/code&gt; to find the symmetric difference of those sets between each Emily and not-Emily order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"desc_diff"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"desc_x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stemset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"desc_y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stemset&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc_x&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the color wasn't &lt;em&gt;always&lt;/em&gt; present in an item description, I figured I could look for cases where the symmetric difference between description wordsets had either 1 or 2 words. Combining that with the "orders within 5 minutes of each other" condition, that left this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;merged&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered_x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered_y&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;total_seconds&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
        &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc_diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tricky Colors
&lt;/h3&gt;

&lt;p&gt;I already had the answer by this point, but there was one thing nagging me: how would I check for specifically a &lt;em&gt;color&lt;/em&gt; difference rather than just a 1-2 &lt;em&gt;word&lt;/em&gt; difference? This is where I had to shake an angry yet appreciative fist at the puzzle design. Because while I could find lists of color names in packages like &lt;a href="https://github.com/ubernostrum/webcolors/blob/3a072f02027c433a40a0b9c35201f57280506f86/src/webcolors/constants.py#L88-L236"&gt;webcolors&lt;/a&gt;, matplotlib, Crayola APIs, etc... the relevant colors in this puzzle were outside the core set of Crayola/HTML/CSS colors. The best alternative I found was to pull out a distinct set of colors like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;colors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s"&gt;'\(([a-z]+)\)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groups&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...and then check to be sure my &lt;code&gt;desc_diff&lt;/code&gt; column contained only words from that set. In this case that ended up not being necessary or useful, but hey I was curious.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Chainy
&lt;/h3&gt;

&lt;p&gt;Some pandas users like to chain their methods together rather than leaving a trail of intermediate variables. I don't have a consistent preference, but it feels like good practice to work both ways. So if I tried to chainify this pandas code, it would look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;"Emily Randolph"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"Emily Randolph"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"ordered_date"&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;desc_diff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s"&gt;"desc_x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stemset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"desc_y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stemset&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc_x&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc_y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered_x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered_y&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;total_seconds&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
            &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc_diff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I hope that when I browse other people's solutions I find a cleaner chained version, because this looks gross to me 😅.&lt;/p&gt;

</description>
      <category>hanukkahofdata</category>
      <category>datasette</category>
      <category>python</category>
      <category>pandas</category>
    </item>
    <item>
      <title>Hanukkah of Data 2022 - Puzzle 6</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Fri, 30 Dec 2022 21:07:43 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-6-ljk</link>
      <guid>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-6-ljk</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://hanukkah.bluebird.sh/"&gt;Hanukkah of Data&lt;/a&gt; is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the &lt;a href="https://www.visidata.org/blog/2022/hanukkah-of-data/"&gt;introductory post&lt;/a&gt; for a bit more detail, but the pitch in my head is "&lt;a href="https://adventofcode.com/"&gt;Advent of Code&lt;/a&gt; meets &lt;a href="https://mystery.knightlab.com/"&gt;SQL Murder Mystery&lt;/a&gt;". This post walks through my approach to the sixth puzzle.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thinking&lt;/li&gt;
&lt;li&gt;Doing (Pandas)&lt;/li&gt;
&lt;li&gt;Doing (Datasette)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking
&lt;/h2&gt;

&lt;p&gt;Day 6 brings us a cheapskate!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“She refused to buy a new rug for herself–she said they were way too expensive. She’s always been very frugal, and she clips every coupon and shops every sale at Noah’s Market. In fact I like to tease her that Noah actually loses money whenever she comes in the store.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So we're looking for someone who's buying things on sale...&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing (Pandas)
&lt;/h2&gt;

&lt;p&gt;We have the wholesale cost of items, along with the unit price and quantity of each order. That means we can define the profit for each as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'profit'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qty&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unit_price&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wholesale_cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which means we can see which customer represented the lowest profit (or greatest loss) in aggregate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s"&gt;'phone'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;'profit'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s"&gt;'sum'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'profit'&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Emily, you're a thief!&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing (Datasette)
&lt;/h2&gt;

&lt;p&gt;Having a &lt;code&gt;noahs_order_detail&lt;/code&gt; view with the same structure as my pandas DataFrame makes this conversion pretty smooth:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qty&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unit_price&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;wholesale_cost&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;profit&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
  &lt;span class="n"&gt;noahs_order_detail&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;phone&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
  &lt;span class="n"&gt;profit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>hanukkahofdata</category>
      <category>datasette</category>
      <category>python</category>
      <category>pandas</category>
    </item>
    <item>
      <title>Hanukkah of Data 2022 - Puzzle 5</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Fri, 30 Dec 2022 20:51:38 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-5-191h</link>
      <guid>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-5-191h</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://hanukkah.bluebird.sh/" rel="noopener noreferrer"&gt;Hanukkah of Data&lt;/a&gt; is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the &lt;a href="https://www.visidata.org/blog/2022/hanukkah-of-data/" rel="noopener noreferrer"&gt;introductory post&lt;/a&gt; for a bit more detail, but the pitch in my head is "&lt;a href="https://adventofcode.com/" rel="noopener noreferrer"&gt;Advent of Code&lt;/a&gt; meets &lt;a href="https://mystery.knightlab.com/" rel="noopener noreferrer"&gt;SQL Murder Mystery&lt;/a&gt;". This post walks through my approach to the fifth puzzle.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thinking&lt;/li&gt;
&lt;li&gt;Doing (Pandas)&lt;/li&gt;
&lt;li&gt;Doing (Datasette)&lt;/li&gt;
&lt;li&gt;Refining / Reusing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking
&lt;/h2&gt;

&lt;p&gt;You know what Hanukkah of Data was missing? A cat lady! And that's where the rug went next:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I listed it on Freecycle, and a woman in Queens Village came to pick it up. She was wearing a ‘Noah’s Market’ sweatshirt, and it was just covered in cat hair. When I suggested that a clowder of cats might ruin such a fine tapestry, she looked at me funny and said she only had ten or eleven cats and they were getting quite old and had cataracts now so they probably wouldn’t notice some old rug anyway.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Extracting some facts&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The next rug owner had a bunch of cats&lt;/li&gt;
&lt;li&gt;The cats are old&lt;/li&gt;
&lt;li&gt;She had a Noah's Market sweatshirt (seemed relevant initially)&lt;/li&gt;
&lt;li&gt;Lives in Queens Village&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Doing (Pandas)
&lt;/h2&gt;

&lt;p&gt;So who's buying cat stuff?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lots of people, cool. But who from Queens Village?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Queens Village&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yikes Anita, that's a ton of cat stuff. Eyeballing the results it also looks like she's specifically buying senior cat food, which jives with the clues. We can bake that into the query and roll up by quantity though, to make things even clearer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;senior.*cat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;regex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Queens Village&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;citystatezip&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;qty&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Doing (Datasette)
&lt;/h2&gt;

&lt;p&gt;Again the second tool feels more straightforward. The pandas work turns into this SQL query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qty&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total_items&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
  &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders_items&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="s1"&gt;'Queens Village%'&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="s1"&gt;'%senior%cat%'&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
  &lt;span class="n"&gt;total_items&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Though I &lt;em&gt;did&lt;/em&gt; bump my head on this at first, as I was using &lt;code&gt;count()&lt;/code&gt; rather than &lt;code&gt;sum()&lt;/code&gt;. Whoops!&lt;/p&gt;

&lt;h2&gt;
  
  
  Refining / Reusing
&lt;/h2&gt;

&lt;p&gt;By this point I was reusing the same core query with the same joins a lot, and realized I probably should have created a view for it already. So:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;view&lt;/span&gt; &lt;span class="n"&gt;noahs_order_detail&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shipped&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;qty&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;unit_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wholesale_cost&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
  &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders_items&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>hanukkahofdata</category>
      <category>datasette</category>
      <category>python</category>
      <category>pandas</category>
    </item>
    <item>
      <title>Hanukkah of Data 2022 - Puzzle 4</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Fri, 30 Dec 2022 20:34:48 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-4-1og5</link>
      <guid>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-4-1og5</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://hanukkah.bluebird.sh/" rel="noopener noreferrer"&gt;Hanukkah of Data&lt;/a&gt; is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the &lt;a href="https://www.visidata.org/blog/2022/hanukkah-of-data/" rel="noopener noreferrer"&gt;introductory post&lt;/a&gt; for a bit more detail, but the pitch in my head is "&lt;a href="https://adventofcode.com/" rel="noopener noreferrer"&gt;Advent of Code&lt;/a&gt; meets &lt;a href="https://mystery.knightlab.com/" rel="noopener noreferrer"&gt;SQL Murder Mystery&lt;/a&gt;". This post walks through my approach to the fourth puzzle.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thinking&lt;/li&gt;
&lt;li&gt;Doing (Pandas)&lt;/li&gt;
&lt;li&gt;Doing (Datasette)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking
&lt;/h2&gt;

&lt;p&gt;On the fourth day, we find a woman who likes to eat pastries early in the morning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“A few weeks later my bike chain broke on the way home, and I needed to get it fixed before work the next day. Thankfully, this woman I met on Tinder came over at 5am with her bike chain repair kit and some pastries from Noah’s. Apparently she liked to get up before dawn and claim the first pastries that came out of the oven.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Doing (Pandas)
&lt;/h2&gt;

&lt;p&gt;This time I turned things around and started with Pandas. Because I had explored the data a bit prior to starting puzzles, I noticed that the &lt;code&gt;sku&lt;/code&gt; column started with a category/department prefix. Baked goods all seemed to have a bakery prefix of &lt;code&gt;BKY&lt;/code&gt;, so finding orders with that sku prefix seemed like a good start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BKY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was far too many results (as expected), so the next step was to find bakery orders early in the morning. Let's say between 4am and 9am, which should cover cases where someone got up before dawn and got the first pastries out of the oven:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BKY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was still a bunch of rows! But the clues make it sound like this is a habit. So how about looking at which customers get those early morning pastries most often?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BKY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;tail&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That suggests Cristina Booker as the clear winner, and narrowing that time range from 4-9am to 4-6am makes it even clearer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing (Datasette)
&lt;/h2&gt;

&lt;p&gt;Adapting the pandas logic to SQL left me with this query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ordercount&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
  &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders_items&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="s1"&gt;'BKY%'&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%H'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ordered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;ordercount&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>hanukkahofdata</category>
      <category>datasette</category>
      <category>python</category>
      <category>pandas</category>
    </item>
    <item>
      <title>Hanukkah of Data 2022 - Puzzle 3</title>
      <dc:creator>AJ Kerrigan</dc:creator>
      <pubDate>Fri, 30 Dec 2022 20:02:14 +0000</pubDate>
      <link>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-3-nnl</link>
      <guid>https://dev.to/ajkerrigan/hanukkah-of-data-2022-puzzle-3-nnl</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://hanukkah.bluebird.sh/"&gt;Hanukkah of Data&lt;/a&gt; is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the &lt;a href="https://www.visidata.org/blog/2022/hanukkah-of-data/"&gt;introductory post&lt;/a&gt; for a bit more detail, but the pitch in my head is "&lt;a href="https://adventofcode.com/"&gt;Advent of Code&lt;/a&gt; meets &lt;a href="https://mystery.knightlab.com/"&gt;SQL Murder Mystery&lt;/a&gt;". This post walks through my approach to the third puzzle.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
Thinking

&lt;ul&gt;
&lt;li&gt;Tool Choices&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
Doing (Datasette)

&lt;ul&gt;
&lt;li&gt;Joining and Selecting&lt;/li&gt;
&lt;li&gt;Filtering&lt;/li&gt;
&lt;li&gt;All Together Now&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Doing (Pandas)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking
&lt;/h2&gt;

&lt;p&gt;In the third puzzle, we learn that the rug passed from a contractor to someone in his neighborhood:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“At last I couldn’t deal with the rug taking up my whole bathtub, so I gave it to this guy who lived in my neighborhood. He said that he was naturally assertive because he was a Aries born in the year of the Dog, so maybe he was able to clean it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Extracting some facts about the guy in the contractor's neighborhood:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aries

&lt;ul&gt;
&lt;li&gt;Birth date between March 21 and April 19&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Born in the year of the dog

&lt;ul&gt;
&lt;li&gt;1958, 1970, etc&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Lives in the contractor's neighborhood

&lt;ul&gt;
&lt;li&gt;To start, look at the same zip code&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Tool Choices
&lt;/h3&gt;

&lt;p&gt;VisiData would work great for this puzzle, but I was having fun practicing with the combo of Datasette and Pandas so I kept rolling with it for the rest of the challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing (Datasette)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Joining and Selecting
&lt;/h3&gt;

&lt;p&gt;Here we're only looking at customer data, so there are no joins required. A select/from block like this works just fine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
  &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Filtering
&lt;/h3&gt;

&lt;p&gt;We're looking for someone in the same neighborhood as the contractor from the previous puzzle. So let's look in that zip code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;where&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="s1"&gt;'%11420'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...who was born in the year of the dog. A search took me to &lt;a href="https://www.chinahighlights.com/travelguide/chinese-zodiac/dog.htm"&gt;this page&lt;/a&gt; which says that's every 12 years, most recently 2018. In a Python REPL, &lt;code&gt;2018 % 12 == 2&lt;/code&gt; so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%Y'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last bit looking for an Aries is a nice touch - a fun subtle bit of puzzle design from the Devottys. Because horoscope borders cross months, it makes things just a bit trickier than looking at birth month alone. My first attempt was to convert the birthdate to "date of the year" (with a bit of padding to cover leap years), but that felt a bit quirky afterward. Tweaking it to use this felt a little clearer and more explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="c1"&gt;-- March 21 - April 19&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%m%d'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;between&lt;/span&gt; &lt;span class="mi"&gt;321&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="mi"&gt;419&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  All Together Now
&lt;/h3&gt;

&lt;p&gt;That combines into a tidier/simpler than Puzzle 2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phone&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
  &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customerid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;orders_items&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sku&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt;
  &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt; &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="s1"&gt;'%11420'&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%Y'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%m%d'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;between&lt;/span&gt; &lt;span class="mi"&gt;321&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="mi"&gt;419&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Doing (Pandas)
&lt;/h2&gt;

&lt;p&gt;After doing all the thinking and building a SQL query, the Pandas piece feels a little anti-climactic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;day_of_year&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;79&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;111&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'11420'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using the arguably clearer "month + day" formulation felt a bit more awkward in Pandas, but maybe that's because there's a better way to do this than faithfully porting my SQL 🤔:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'%m%d'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;321&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;419&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;birthdate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citystatezip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'11420'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>hanukkahofdata</category>
      <category>datasette</category>
      <category>python</category>
      <category>pandas</category>
    </item>
  </channel>
</rss>
