Hanukkah of Data is a series of data-themed puzzles, where you solve puzzles to move your way through a holiday-themed story using a fictional dataset. See the introductory post for a bit more detail, but the pitch in my head is "Advent of Code meets SQL Murder Mystery". This post walks through my approach to the fifth puzzle.
Warning: This post unavoidably contains spoilers. If you'd rather do the puzzles on your own first, please close this browser tab and run away :).
Thinking
You know what Hanukkah of Data was missing? A cat lady! And that's where the rug went next:
“I listed it on Freecycle, and a woman in Queens Village came to pick it up. She was wearing a ‘Noah’s Market’ sweatshirt, and it was just covered in cat hair. When I suggested that a clowder of cats might ruin such a fine tapestry, she looked at me funny and said she only had ten or eleven cats and they were getting quite old and had cataracts now so they probably wouldn’t notice some old rug anyway.
Extracting some facts
- The next rug owner had a bunch of cats
- The cats are old
- She had a Noah's Market sweatshirt (seemed relevant initially)
- Lives in Queens Village
Doing (Pandas)
So who's buying cat stuff?
df[(df.desc.str.contains(r'cat',case=False))]
Lots of people, cool. But who from Queens Village?
df[
(df.desc.str.contains(r'cat',case=False))
& (df.citystatezip.str.contains('Queens Village'))
]
Yikes Anita, that's a ton of cat stuff. Eyeballing the results it also looks like she's specifically buying senior cat food, which jives with the clues. We can bake that into the query and roll up by quantity though, to make things even clearer:
df[
(df.desc.str.contains(r'senior.*cat', case=False, regex=True))
& (df.citystatezip.str.contains('Queens Village'))
].groupby(['name', 'citystatezip', 'phone']).agg({'qty': sum})
Doing (Datasette)
Again the second tool feels more straightforward. The pandas work turns into this SQL query:
select
c.name,
c.citystatezip,
c.phone,
sum(i.qty) as total_items
from
customers c
join orders o on c.customerid = o.customerid
join orders_items i on o.orderid = i.orderid
join products p on i.sku = p.sku
where
c.citystatezip like 'Queens Village%'
and p.desc like '%senior%cat%'
group by
c.name,
c.citystatezip,
c.phone
order by
total_items desc
Though I did bump my head on this at first, as I was using count()
rather than sum()
. Whoops!
Refining / Reusing
By this point I was reusing the same core query with the same joins a lot, and realized I probably should have created a view for it already. So:
create view noahs_order_detail as
select
c.*,
o.orderid,
o.ordered,
o.shipped,
o.total,
i.qty,
i.unit_price,
p.sku,
p.desc,
p.wholesale_cost
from
customers c
join orders o on c.customerid = o.customerid
join orders_items i on o.orderid = i.orderid
join products p on i.sku = p.sku
Top comments (0)