Forem

AJ Kerrigan
AJ Kerrigan

Posted on

1

Hanukkah of Data 2022 - Puzzle 5

Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the fifth puzzle.

Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).

Thinking

You know what Hanukkah of Data was missing? A cat lady! And that's where the rug went next:

“I listed it on Freecycle, and a woman in Queens Village came to pick it up. She was wearing a ‘Noah’s Market’ sweatshirt, and it was just covered in cat hair. When I suggested that a clowder of cats might ruin such a fine tapestry, she looked at me funny and said she only had ten or eleven cats and they were getting quite old and had cataracts now so they probably wouldn’t notice some old rug anyway.

Extracting some facts

  • The next rug owner had a bunch of cats
  • The cats are old
  • She had a Noah's Market sweatshirt (seemed relevant initially)
  • Lives in Queens Village

Doing (Pandas)

So who's buying cat stuff?

df[(df.desc.str.contains(r'cat',case=False))]
Enter fullscreen mode Exit fullscreen mode

Lots of people, cool. But who from Queens Village?

df[
  (df.desc.str.contains(r'cat',case=False))
  & (df.citystatezip.str.contains('Queens Village'))
]
Enter fullscreen mode Exit fullscreen mode

Yikes Anita, that's a ton of cat stuff. Eyeballing the results it also looks like she's specifically buying senior cat food, which jives with the clues. We can bake that into the query and roll up by quantity though, to make things even clearer:

df[
  (df.desc.str.contains(r'senior.*cat', case=False, regex=True))
  & (df.citystatezip.str.contains('Queens Village'))
].groupby(['name', 'citystatezip', 'phone']).agg({'qty': sum})
Enter fullscreen mode Exit fullscreen mode

Doing (Datasette)

Again the second tool feels more straightforward. The pandas work turns into this SQL query:

select
  c.name,
  c.citystatezip,
  c.phone,
  sum(i.qty) as total_items
from
  customers c
  join orders o on c.customerid = o.customerid
  join orders_items i on o.orderid = i.orderid
  join products p on i.sku = p.sku
where
  c.citystatezip like 'Queens Village%'
  and p.desc like '%senior%cat%'
group by
  c.name,
  c.citystatezip,
  c.phone
order by
  total_items desc
Enter fullscreen mode Exit fullscreen mode

Though I did bump my head on this at first, as I was using count() rather than sum(). Whoops!

Refining / Reusing

By this point I was reusing the same core query with the same joins a lot, and realized I probably should have created a view for it already. So:

create view noahs_order_detail as
select
  c.*,
  o.orderid,
  o.ordered,
  o.shipped,
  o.total,
  i.qty,
  i.unit_price,
  p.sku,
  p.desc,
  p.wholesale_cost
from
  customers c
  join orders o on c.customerid = o.customerid
  join orders_items i on o.orderid = i.orderid
  join products p on i.sku = p.sku
Enter fullscreen mode Exit fullscreen mode

Image of Timescale

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Read more

Top comments (0)

AWS Security LIVE!

Tune in for AWS Security LIVE!

Join AWS Security LIVE! for expert insights and actionable tips to protect your organization and keep security teams prepared.

Learn More

👋 Kindness is contagious

Discover a treasure trove of wisdom within this insightful piece, highly respected in the nurturing DEV Community enviroment. Developers, whether novice or expert, are encouraged to participate and add to our shared knowledge basin.

A simple "thank you" can illuminate someone's day. Express your appreciation in the comments section!

On DEV, sharing ideas smoothens our journey and strengthens our community ties. Learn something useful? Offering a quick thanks to the author is deeply appreciated.

Okay