#SQL30 Day 9: Longest Span

#sql #database #postgres #challenge

Welcome to the SQL showdown series!

What is this and how does it work?

I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.

Write your own solution in the comments! Let's see who can come up with the most creative solutions.

I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!

Challenge #9: Longest Span

For this challenge we'll revisit the stock market data from #2, but this time we'll try something a bit more difficult.

Here's the challenge:

Given a table of daily trading data, can you find the longest stretch of consecutive days during which the stock price rose?

Here's an example output showing roughly what you're looking for.

Sandbox Connection Details

I have a PostgreSQL database ready for you to play with.

The password is the same as the username! To query the stockprice table:

SELECT * FROM day9.stockprice;

Solution to Challenge #8: Duplicates

This is the question we were trying to answer with yesterday's SQL challenge:

Can you write a select query that could be used in a delete statement to make the rows of a table unique according to a natural key?

The solution to this question proved to be trickier than I would have thought. If we wanted to delete all instances of any employee that had duplicates, it would be fairly easy.

DELETE FROM day8.employee WHERE name IN (
  SELECT name FROM day8.employee GROUP BY name HAVING count(*) > 1
);

Unfortunately, the above query will remove any employee with duplicates entirely from the table. We want exactly one reference to each employee.

One means of doing this would be to group the employees by name, and then assign a rank to each row in the group. We could then delete all rows from each group that aren't the first occurrence.

WITH employee_rnk AS (
  SELECT row_number() OVER (PARTITION BY name) AS rnk, name FROM day8.employee
)
DELETE FROM day8.employee s USING employee_rnk r WHERE s.name = r.name AND r.rnk != 1;

The row_number() window function will assign an increasing integer value to each row of a partition.

The USING clause in a delete statement acts essentially the same as having multiple tables listed in the FROM part of a query.

Unlike SQL Server, you can't perform a join inside of a delete in PostgreSQL, but oddly enough, you can put tables together by combining USING and a WHERE clause. Once the table and the CTE are joined together, it's just a matter of filtering down to where the row number is not equal to 1, indicating a duplicate to be deleted.