1661 Average Time of Process per Machine

1661 Average Time of Process per Machine: This is a really interesting question where we write a compact code of 1 line which is equivalent to multiline CTE code. Read more

Company: MAANG

Difficulty: Easy

SourceLeetCode

SQL Interview Question

Q. 1661 Average Time of Process per Machine

1661 Average Time of Process per Machine
CREATE TABLE Activity (
    machine_id INT,
    process_id INT,
    activity_type TEXT CHECK(activity_type IN ('start', 'end')),
    timestamp FLOAT
);

INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 0, 'start', 0.712);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 0, 'end', 1.52);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 1, 'start', 3.14);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 1, 'end', 4.12);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 0, 'start', 0.55);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 0, 'end', 1.55);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 1, 'start', 0.43);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 1, 'end', 1.42);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 0, 'start', 4.1);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 0, 'end', 4.512);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 1, 'start', 2.5);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 1, 'end', 5.0);

View code on DB Fiddle

Solution

Solution1

WITH ProcessTimes AS (
    SELECT
        machine_id,
        process_id,
        MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time,
        MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time
    FROM Activity
    GROUP BY machine_id, process_id
),
ProcessingDurations AS (
    SELECT
        machine_id,
        (end_time - start_time) AS processing_time
    FROM ProcessTimes
)
SELECT
    machine_id,
    ROUND(AVG(processing_time), 3) AS processing_time
FROM ProcessingDurations
GROUP BY machine_id
ORDER BY machine_id;

Explanation

To solve this problem using the CTE (Common Table Expression) method, we’ve broken the process into three steps:

Step 1: Extracting Start and End Times (CTE: ProcessTimes)

The first thing we needed to do was transform the start and end timestamps from the row level to the column level for each machine_id and process_id. In the provided Activity table, the start and end times appear as separate rows for the same process_id. To perform calculations efficiently, these values need to appear in separate columns (start_time and end_time) for each process_id.

To achieve this, we used a CASE statement inside the first CTE (ProcessTimes):

  • CASE Statement:
CASE 
    WHEN activity_type = 'start' THEN timestamp 
    ELSE NULL 
END

This captures the timestamp for start and end rows.

Using MAX:
Wrapping the CASE statements in MAX allows us to satisfy SQL’s grouping rules when using GROUP BY on machine_id and process_id:

MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time,
MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time

Does MAX change the values? No, since the problem guarantees only one start and end timestamp per process_id. MAX simply ensures compliance with GROUP BY. If multiple values existed, MAX would pick the latest, which still works for duration calculations.

Resulting Output (ProcessTimes):

machine_idprocess_idstart_timeend_time
000.7121.520
003.1404.120
100.5501.550
Step 2: Calculating Process Duration (CTE: ProcessingDurations)

In this step, we calculate the duration of each process by subtracting the start_time from the end_time:

SELECT 
    machine_id, 
    (end_time - start_time) AS processing_time
FROM ProcessTimes

This creates a table (ProcessingDurations) where each row represents the duration of a single process for a specific machine. The result looks like this:

machine_idprocessing_time
00.808
00.980
11.000
Step 3: Aggregating Average Processing Time

Finally, we calculate the average processing time for each machine. Using the results from ProcessingDurations, we calculate the average as follows:

SELECT 
machine_id,
ROUND(AVG(processing_time), 3) AS processing_time
FROM ProcessingDurations
GROUP BY machine_id
ORDER BY machine_id
  • AVG(processing_time): Calculates the average of all process durations for a machine.
  • ROUND(..., 3): Rounds the result to 3 decimal places, as required by the problem.
  • GROUP BY machine_id: Groups the results by each machine.
Final Output

The query produces the following result:

machine_idprocessing_time
00.894
10.995
21.456

This output satisfies the requirements of the problem.

Solution 2

SELECT 
    machine_id,
    ROUND(SUM(CASE WHEN activity_type='start' THEN timestamp*-1 ELSE timestamp END)*1.0/ (SELECT COUNT(DISTINCT process_id)),3) AS processing_time
FROM 
    Activity
GROUP BY machine_id

Explanation

For this solution you need to understand the below part, where entire three CTEs are in one line itself, using a mathmetical logic.

SUM(CASE WHEN activity_type='start' THEN timestamp*-1 ELSE timestamp END) * 1.0 / (SELECT COUNT(DISTINCT process_id))

To understand this better let’s break it down further:

SUM(CASE WHEN activity_type=’start’ THEN timestamp*-1 ELSE timestamp END)

This part calculates the total time for each process. the logic used here is:

  • If activity_type = 'start':
    • The timestamp is multiplied by -1. This effectively turns the start time into a negative value so it can be subtracted later.
  • If activity_type = 'end':
    • The timestamp is added as-is (positive).

This means for every process, the start time and end time are combined to produce the processing duration. For example:

  • For start = 0.712 and end = 1.52, this will sum to:
(−0.712)+1.52=0.808 (duration of the process)

SUM: The SUM function adds up these values for all processes within the same machine (GROUP BY machine_id is applied in the query). This gives the total processing time for the machine.

1.0

This part ensures that the division produces a floating-point result instead of an integer.

  • Without * 1.0, SQL might perform integer division and truncate the decimal part, depending on the database.
  • Multiplying by 1.0 ensures the calculation is performed in floating-point arithmetic, preserving the precision.
/ (SELECT COUNT(DISTINCT process_id))

This calculates the average processing time by dividing the total processing time (from step 1) by the number of unique processes for the machine.

Putting It All Together
SELECT 
    machine_id,
    ROUND(SUM(CASE WHEN activity_type='start' THEN timestamp*-1 ELSE timestamp END)*1.0/ (SELECT COUNT(DISTINCT process_id)),3) AS processing_time
FROM 
    Activity
GROUP BY machine_id
Final Output

The query produces the following result:

machine_idprocessing_time
00.894
10.995
21.456

I hope this would have been helpful for you, consider sharing it with your friends. thank you.

Spread the love
Exit mobile version