Retiago Drago

Posted on

# Trips and Users | LeetCode | MSSQL

## The Problem

The challenge involves two tables: `Trips` and `Users`. The `Trips` table holds information about all taxi trips, including the unique trip id, client id, driver id, city id, status of the trip, and the date of the request. The `Users` table maintains records of all users, encompassing the unique user id, user's status (banned or not), and their role (client, driver, partner).

The task is to compute the cancellation rate each day between "2013-10-01" and "2013-10-03". The cancellation rate is defined as the ratio of the number of cancelled requests by unbanned users (both client and driver) to the total number of requests made by unbanned users that day. The cancellation rate should be expressed up to two decimal places. The tables' schemas are as follows:

Trips

Column Name Type
id int
client_id int
driver_id int
city_id int
status enum
request_at date

Users

Column Name Type
users_id int
banned enum
role enum

## Explanation

Consider the following input:

Trips

id client_id driver_id city_id status request_at
1 1 10 1 completed 2013-10-01
2 2 11 1 cancelled_by_driver 2013-10-01
3 3 12 6 completed 2013-10-01
4 4 13 6 cancelled_by_client 2013-10-01
5 1 10 1 completed 2013-10-02
6 2 11 6 completed 2013-10-02
7 3 12 6 completed 2013-10-02
8 2 12 12 completed 2013-10-03
9 3 10 12 completed 2013-10-03
10 4 13 12 cancelled_by_driver 2013-10-03

Users

users_id banned role
1 No client
2 Yes client
3 No client
4 No client
10 No driver
11 No driver
12 No driver
13 No driver

For this data, the expected output is:

Day Cancellation Rate
2013-10-01 0.33
2013-10-02 0.00
2013-10-03 0.50

### Explanation:

On 2013-10-01:

• There were 4 requests in total, 2 of which were canceled.
• The request with Id=2 was made by a banned client (User_Id=2), so it is excluded from the calculation.
• Hence, considering unbanned users, there were 3 requests in total, 1 of which was canceled.
• Therefore, the Cancellation Rate is (1 / 3) = 0.33

On 2013-10-02:

• There were 3 requests in total, none of which were canceled.
• The request with Id=6 was made by a banned client, so it is excluded from the calculation.
• Thus, considering unbanned users, there were 2 requests in total, none of which were canceled.
• Hence, the Cancellation Rate is (0 / 2) = 0.00

On 2013-10-03:

• There were 3 requests in total, 1 of which was canceled.
• The request with Id=8 was made by a banned client, so it is excluded from the calculation.
• Hence, considering unbanned users, there were 2 requests in total, 1 of which was canceled.
• Therefore, the Cancellation Rate is (1 / 2) = 0.50

The cancellation rate for each day is calculated considering only unbanned users (both client and driver) and requests that occurred on the day. If a request was made by a banned user, it is ignored in the calculation. The final cancellation rate is presented as a fraction of the total requests made by unbanned users that were canceled, rounded to two decimal places.

## The Solution

Let's delve into three different approaches to solve this problem. As we progress from Source Code 1 to Source Code 3, the code gets more efficient and easier to read, even though all three of them provide the correct results.

### Source Code 1

This SQL query makes use of a Common Table Expression (CTE) called 'cancellation_rates'. It generates a temporary result set by joining the `Trips` table with the `Users` table twice, once for the client and once for the driver. This query uses window functions `COUNT(*) OVER()` to get the number of unbanned requests and unbanned statuses for each day. It then calculates the cancellation rate in the main `SELECT` query.

This query might seem a bit complex due to the use of window functions and CTE. However, it provides a clear separation of calculating the unbanned request count and the actual calculation of cancellation rate.

The performance of this query is good but not the best, with a runtime of 572ms, beating 45.44% of other submissions.

``````WITH cancellation_rates AS (
SELECT DISTINCT
t.request_at AS Day,
t.status,
COUNT(*) OVER(PARTITION BY t.request_at, t.status) AS unbanned_stat,
COUNT(*) OVER(PARTITION BY t.request_at) AS unbanned_req
WHERE
c.banned = 'No'
AND d.banned = 'No'
AND  t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
)
SELECT
Day,
CASE
WHEN status = 'completed' AND unbanned_stat = unbanned_req THEN 0
ELSE ROUND(SUM(unbanned_stat) OVER (PARTITION BY Day, unbanned_stat) * 1.0 / unbanned_req, 2)
END AS 'Cancellation Rate'
FROM cancellation_rates
WHERE
status LIKE 'cancelled%'
OR (status = 'completed' AND unbanned_stat = unbanned_req)
``````

### Source Code 2

This query simplifies the approach by removing the use of a CTE. Instead, it calculates the cancellation rate directly within the main `SELECT` statement using conditional aggregation. It uses a `CASE` statement inside the `SUM()` function to count the number of cancellations and then divides that by the total number of requests.

This approach is easier to understand but has a slightly worse performance, with a runtime of 741ms, beating 12.61% of other submissions.

``````SELECT
t.request_at AS Day,
ISNULL(
ROUND(
CAST(
SUM(
CASE
WHEN t.status LIKE 'cancelled%' THEN 1
ELSE 0
END
) AS FLOAT) /
CAST(
COUNT(*) AS FLOAT),
2),
0) AS 'Cancellation Rate'
FROM
Trips t
JOIN
Users c ON t.client_id = c.users_id AND c.banned = 'No'
JOIN
Users d ON t.driver_id = d.users_id AND d.banned = 'No'
WHERE
t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY
t.request_at
``````

### Source Code 3

The third query is even more simplified and performs better than the previous two queries. It removes the explicit `JOIN` operations, replacing them with subqueries in the `WHERE` clause to filter out banned users.

This not only improves readability but also enhances performance because the query doesn't need to create large intermediate join tables. Instead, it filters rows directly based on client_id and driver_id, which is more efficient.

This query has the best performance among the three, with a runtime of 435ms, beating 81.43% of other submissions.

``````SELECT
t.request_at AS Day,
ISNULL(
ROUND(
SUM(
CASE
WHEN t.status LIKE 'cancelled%' THEN 1.0
ELSE 0.0
END
) /
COUNT(*),
2),
0) AS 'Cancellation Rate'
FROM
Trips t
WHERE
t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
AND t.client_id IN (SELECT users_id FROM Users WHERE banned = 'No' AND role = 'client')
AND t.driver_id IN (SELECT users_id FROM Users WHERE banned = 'No' AND role = 'driver')
GROUP BY
t.request_at
``````

## Conclusion

From these solutions, we can learn that while different SQL queries can achieve the same result, their performance can vary based on how we leverage the SQL features and functions. Using joins and subqueries strategically can have a significant impact on the query execution speed.

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.

## ranggakd - Link in Bio & Creator Tools | Beacons

@ranggakd | center details summary summary Oh hello there I m a an Programmer AI Tech Writer Data Practitioner Statistics Math Addict Open Source Contributor Quantum Computing Enthusiast details center.

beacons.ai