## DEV Community

zchtodd

Posted on • Updated on

# #SQL30 Day 1: Wildfires

Welcome to the SQL showdown series!

### What is this and how does it work?

I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.

Write your own solution in the comments! Let's see who can come up with the most creative solutions.

I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!

### Challenge #1: Wildfire Causes

In this challenge we'll use the 1.88 Million Wildfires data-set from Kaggle.

The question is:

What are the details of the largest recorded wildfires (by acreage) for each potential cause of wildfires?

We want the name of the fire, its cause, the state it happened in, the date it was discovered, and the date at which it was considered contained. Each output row should be the largest fire that has ever occurred for that particular cause of wildfire.

The wildfire table has the following columns:

• name
• cause
• acres
• disc_date
• cont_date
• state

### Sandbox Connection Details

I have a PostgreSQL database ready for you to play with.

The password is the same as the username! To query the wildfire table:

``````SELECT * FROM day1.wildfire;
``````

### Good luck!

Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.

Helen Anderson

What a great idea, thanks for setting this up :D

This may be a little convoluted but ....

I would check out how many causes there were first:

``````select distinct cause
from day1.wildfire
order by cause

-- 13 rows

``````

Then use a Window Function to order the fires by their cause and assign a rank in descending order.

Then use a CTE to isolate the Window Function results and return all rows where the rank = 1.

``````with rankings as
(select
fire_name,
cause,
disc_date,
cont_date,
state,
acres,
dense_rank() over(partition by cause order by acres desc) as fire_rank
from
day1.wildfire
order by cause, fire_rank
)

select *
from rankings
where rankings.fire_rank = 1

-- 20 seconds to execute

``````

Then do some checks to make sure I've got the right results

``````select fire_name, max(acres) as acresmax
from day1.wildfire
where cause  = 'Campfire'
group by fire_name
order by acresmax desc
``````

zchtodd

That's an awesome solution, thanks for writing that. I don't think it's convoluted at all.

In my solution I went without window functions and did a join to a subquery. The interesting thing is that I could potentially join to the wrong row if there were more than 1 fire with the same cause and max acres burned, whereas here I think any ties would end up with the same rank, but it's guaranteed that there's no mixing and matching of data.

I hope you like the rest of the series! I started it because I was one of those developers that thought they knew SQL, when really I knew SQL-92, and modern SQL has so much more to offer.

Sam Mason

bit late to the party! google pointed me to here and it seems interesting, hopefully I can get to the end...

given this is Postgres I'd use the `DISTINCT ON` operator, as it seems perfectly suited to the task:

``````SELECT DISTINCT ON (cause) * FROM wildfire ORDER BY cause, acres DESC;
``````

in this case you can also use an `IN` query:

``````SELECT *
FROM day1.wildfire
WHERE (cause, acres) IN (
SELECT cause, MAX(acres) AS acres
FROM day1.wildfire
GROUP BY cause
)
ORDER BY acres DESC;
``````

which gives me the same thing. but has different semantics so does different things in the presence of non-unique values. for some reason I thought it could do unexpected things when `NULL` values were involved, but a quick read suggests that's just `NOT IN`

Saurabh Sikchi

I think my query is quite a bit easier than WINDOWing and CTE:

```SELECT w1.* FROM day1.wildfire w1 INNER JOIN ( SELECT cause, MAX(acres) as max_acres FROM day1.wildfire GROUP BY cause ) w2 ON w1.cause = w2.cause AND w1.acres = w2.max_acres;```

zchtodd

Nice! I think of window functions as the heavy artillery... not always necessary when a group by might do the job.

Anjan Kant