DEV Community

Alok Chaudhary
Alok Chaudhary

Posted on

How a simple left join can be your biggest nightmare.

What if someone ask you a basic question - You have a table item_store having stores and there corresponding item details and table item_store_old that is having legacy data for all the item's and stores, now I need to find out if item_store_old is having all the stores which are present in item_store ?

The first thought is ahh that's pretty simple I will write a query something like this

Select * from item_store store
left join
item_store_old old_store
on store.store_id = old_store.store_id
where old_store.store_id is null

In case the above query return's 0 row's I can confirm that item_store_old is having all the store_id's which are present in item_store.

When I ran the above query on BQ it kept on running for 3 hrs, and I have to kill the job.

Reason - item_store_old was having 400 million records
and item_store was having 15 million records and I was trying to join these tables which had no partition.

This is the problem statement which I faced recently at my day to day work, interested to know what all I tried before I got a working solution. I will come up with another post, let me know in the comments what all you would have tried.

Note the data was in the form of parquet files and I loaded those in BigQuery tables in order to query that data.

Top comments (0)