DEV Community

AJ Kerrigan
AJ Kerrigan

Posted on

1

Hanukkah of Data 2022 - Puzzle 8 and Wrap-Up

Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the eighth puzzle.

Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).

Thinking

For the last puzzle, we're looking for the person who still probably has the rug! Turns out he's a bit of a pack rat:

She wound up getting a newer and more expensive rug, so she gave it to an acquaintance of hers who collects all sorts of junk. Apparently he owns an entire set of Noah’s collectibles! He probably still has the rug, even.

The plan here will be find what counts as a Noah's collectible and see who has the most of them.

Doing (Datasette)

Based on a peek at the products table, it looks like collectibles all have a 'COL' sku prefix. So we should be able to find customers who have ordered the most distinct items with that prefix:

select
  name,
  phone,
  count(distinct sku) as colcount
from
  noahs_order_detail
where
  sku like 'COL%'
group by
  name,
  phone
order by
  colcount desc
Enter fullscreen mode Exit fullscreen mode

And yes, Travis has more than 2x the collectible count of any other customer!

Doing (Pandas)

A pretty straight SQL-->pandas translation seems to work fine here:

df[
  df.sku.str.contains('^COL')
].groupby(
  ['name','phone']
).agg(
  {'sku':'nunique'}
).sort_values(by='sku').tail()
Enter fullscreen mode Exit fullscreen mode

Wrap-Up

This was a lot of fun! Some specific takeaways:

  • Huge thanks to The Devottys for putting this challenge together
    • Saul & Anja pour a lot of heart and smarts into VisiData and the data/terminal-loving community
    • I'm consistently impressed by Dwimmer's artwork, he's magic at the terminal
  • There's something dangerous or funky about using category dtypes in pandas with groupby()
  • Working through problems with multiple tools can sometimes help you rethink your approach in all of them
  • You don't have to be Simon Willison to get a lot out of SQLite

I should also point out the official wrap-up post here, which links to other community posts.

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

Top comments (0)

A Workflow Copilot. Tailored to You.

Pieces.app image

Our desktop app, with its intelligent copilot, streamlines coding by generating snippets, extracting code from screenshots, and accelerating problem-solving.

Read the docs

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay