The Problem
You are given a table, Projects
, containing three columns: Task_ID
, Start_Date
, and End_Date
. It is guaranteed that the difference between the End_Date
and the Start_Date
is equal to 1 day for each row in the table.
If the End_Date
of the tasks are consecutive, then they are part of the same project. Samantha is interested in finding the total number of different projects completed.
Write a query to output the start and end dates of projects listed by the number of days it took to complete the project in ascending order. If there is more than one project that have the same number of completion days, then order by the start date of the project.
Explanation
Input:
Task_ID | Start_Date | End_Date |
---|---|---|
1 | 2015-10-01 | 2015-10-02 |
2 | 2015-10-02 | 2015-10-03 |
3 | 2015-10-03 | 2015-10-04 |
4 | 2015-10-13 | 2015-10-14 |
5 | 2015-10-14 | 2015-10-15 |
6 | 2015-10-28 | 2015-10-29 |
7 | 2015-10-30 | 2015-10-31 |
Output:
Project_Start_Date | Project_End_Date |
---|---|
2015-10-28 | 2015-10-29 |
2015-10-30 | 2015-10-31 |
2015-10-13 | 2015-10-15 |
2015-10-01 | 2015-10-04 |
The Solution
Here, we'll discuss two different SQL approaches that make use of window functions and date arithmetic to solve this problem. Both approaches have different methods of grouping tasks into projects.
Source Code 1
This approach groups tasks into projects based on their Start_Date
. If a task's Start_Date
is consecutive to another task's End_Date
, they belong to the same project. The ROW_NUMBER
window function generates a sequence of numbers, which is subtracted from the Start_Date
to create a grouping identifier.
WITH CTE AS (
SELECT
Start_Date,
End_Date,
DATEADD(day, - ROW_NUMBER() OVER(ORDER BY Start_Date), Start_Date) [Grouping]
FROM Projects
)
SELECT
MIN(Start_Date) [Project_Start_Date],
MAX(End_Date) [Project_End_Date]
FROM CTE
GROUP BY Grouping
ORDER BY DATEDIFF(day, MIN(Start_Date), MAX(End_Date)), MIN(Start_Date)
Source Code 2
This approach, on the other hand, groups tasks into projects based on their End_Date
. Similar to the first approach, ROW_NUMBER
is used to generate a sequence of numbers, which is subtracted from the End_Date
to create a grouping identifier.
WITH CTE AS (
SELECT
Start_Date,
End_Date,
DATEADD(day, - ROW_NUMBER() OVER(ORDER BY End_Date), End_Date) [Grouping]
FROM Projects
)
SELECT
MIN(Start_Date) [Project_Start_Date],
MAX(End_Date) [Project_End_Date]
FROM CTE
GROUP BY Grouping
ORDER BY DATEDIFF(day, MIN(Start_Date), MAX(End_Date)), MIN(Start_Date)
Conclusion
Both of these solutions offer an efficient way to tackle the problem using window functions and date arithmetic. They provide an excellent showcase of some of the more advanced features of SQL and how they can be used to solve complex problems.
While these solutions have been shown to be effective on HackerRank, it's important to keep in mind that performance can vary based on different factors like the specific RDBMS used, the database structure, and the size of the data.
You can find the original problem at HackerRank.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)