Retiago Drago

Posted on

# Average Time of Process per Machine | LeetCode | MSSQL

## The Problem

Table: Activity

Column Name Type
machine_id int
process_id int
activity_type enum
timestamp float

The table shows the user activities for a factory website. `(machine_id, process_id, activity_type)` is the primary key of this table. `machine_id` is the ID of a machine. `process_id` is the ID of a process running on the machine with ID `machine_id`. `activity_type` is an ENUM of type ('start', 'end'). `timestamp` is a float representing the current time in seconds. 'start' means the machine starts the process at the given timestamp and 'end' means the machine ends the process at the given timestamp. The 'start' timestamp will always be before the 'end' timestamp for every `(machine_id, process_id)` pair.

Problem: Write an SQL query to find the average time each machine takes to complete a process. The time to complete a process is the 'end' timestamp minus the 'start' timestamp. The average time is calculated by the total time to complete every process on the machine divided by the number of processes that were run. The resulting table should have the `machine_id` along with the average time as `processing_time`, which should be rounded to 3 decimal places. Return the result table in any order.

## Explanation

Input:
Activity table:

machine_id process_id activity_type timestamp
0 0 start 0.712
0 0 end 1.520
0 1 start 3.140
0 1 end 4.120
1 0 start 0.550
1 0 end 1.550
1 1 start 0.430
1 1 end 1.420
2 0 start 4.100
2 0 end 4.512
2 1 start 2.500
2 1 end 5.000

Output:

machine_id processing_time
0 0.894
1 0.995
2 1.456

Explanation:
There are 3 machines running 2 processes each. Machine 0's average time is ((1.520 - 0.712) + (4.120 - 3.140)) / 2 = 0.894. Machine 1's average time is ((1.550 - 0.550) + (1.420 - 0.430)) / 2 = 0.995. Machine 2's average time is ((4.512 - 4.100) + (5.000 - 2.500)) / 2 = 1.456.

## The Solution

This post presents four distinct solutions to the problem. These solutions vary in their approach, performance, and complexity. Let's explore each one and highlight their strengths, weaknesses, and main structural differences.

### Source Code 1

The first solution uses a simple SELECT statement with a CASE WHEN clause to calculate the sum of 'end' and 'start' timestamps for each machine. It then divides this sum by the count of distinct process_ids to find the average processing time.

``````SELECT
machine_id,
ROUND(
SUM(
CASE
WHEN activity_type = 'start' THEN -timestamp
ELSE timestamp
END
) / COUNT(DISTINCT process_id)
, 3) [processing_time]
FROM Activity
GROUP BY machine_id
``````

The runtime of this code is 824ms, beating 11.26% of submissions.

### Source Code 2

The second solution also uses a SELECT statement, but replaces the CASE WHEN clause with the IIF function, which simplifies the code slightly. The rest of the code is identical to Source Code 1.

``````SELECT
machine_id,
ROUND(SUM(IIF(activity_type = 'start', -timestamp, timestamp)) / COUNT(DISTINCT process_id), 3) [processing_time]
FROM Activity
GROUP BY machine_id
``````

The runtime of this code is 631ms, beating 27.1% of submissions.

### Source Code 3

The third solution differs significantly from the first two. It uses a subquery to calculate the sum of 'start' and 'end' timestamps for each machine. The main SELECT statement then subtracts the total 'start' timestamp from the total 'end' timestamp and divides the result by the count of distinct process_ids.

``````SELECT
r.machine_id,
ROUND((r.total_end-r.total_start) / (SELECT COUNT(DISTINCT process_id) FROM Activity), 3) [processing_time]
FROM (
SELECT
machine_id,
SUM(IIF(activity_type = 'start', timestamp, 0)) [total_start],
SUM(IIF(activity_type = 'end', timestamp, 0)) [total_end]
FROM Activity
GROUP BY machine_id
) r
``````

The runtime of this code is 517ms, beating 46.61% of submissions.

### Source Code 4

The final solution uses two CTEs (Common Table Expressions) to calculate the total time taken by each machine and the total number of processes. The main SELECT statement then calculates the average processing time as in Source Code 3.

``````WITH processes AS (
SELECT COUNT(DISTINCT process_id) [total_process]
FROM Activity
), total_time AS (
SELECT
machine_id,
SUM(IIF(activity_type = 'start', timestamp, 0)) [total_start],
SUM(IIF(activity_type = 'end', timestamp, 0)) [total_end]
FROM Activity
GROUP BY machine_id
)
SELECT
machine_id,
ROUND((total_end-total_start) / (SELECT total_process FROM processes), 3) [processing_time]
FROM total_time
``````

The runtime of this code is 791ms, beating 12.81% of submissions.

## Conclusion

The third solution (Source Code 3) demonstrates the best overall performance on LeetCode. However, it's worth noting that performance can vary significantly depending on the specific characteristics and configurations of the real-world RDMS. Therefore, choosing the best solution for a particular use case would require a comprehensive understanding of the data structure, data volume, and the performance of different SQL constructs within that specific environment.

Here's a ranking of these solutions based on LeetCode performance:

1. Source Code 3
2. Source Code 2
3. Source Code 4
4. Source Code 1

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.