Blog

1661 Average Time of Process per Machine

1661 Average Time of Process per Machine: This is a really interesting question where we write a compact code of 1 line which is equivalent to multiline CTE code. Read more

Company: MAANG

Difficulty: Easy

Source: LeetCode

SQL Interview Question

Q. 1661 Average Time of Process per Machine

CREATE TABLE Activity (
    machine_id INT,
    process_id INT,
    activity_type TEXT CHECK(activity_type IN ('start', 'end')),
    timestamp FLOAT
);

INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 0, 'start', 0.712);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 0, 'end', 1.52);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 1, 'start', 3.14);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 1, 'end', 4.12);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 0, 'start', 0.55);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 0, 'end', 1.55);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 1, 'start', 0.43);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 1, 'end', 1.42);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 0, 'start', 4.1);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 0, 'end', 4.512);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 1, 'start', 2.5);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 1, 'end', 5.0);

View code on DB Fiddle

Solution

Solution1

WITH ProcessTimes AS (
    SELECT
        machine_id,
        process_id,
        MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time,
        MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time
    FROM Activity
    GROUP BY machine_id, process_id
),
ProcessingDurations AS (
    SELECT
        machine_id,
        (end_time - start_time) AS processing_time
    FROM ProcessTimes
)
SELECT
    machine_id,
    ROUND(AVG(processing_time), 3) AS processing_time
FROM ProcessingDurations
GROUP BY machine_id
ORDER BY machine_id;

Explanation

To solve this problem using the CTE (Common Table Expression) method, we’ve broken the process into three steps:

Step 1: Extracting Start and End Times (CTE: `ProcessTimes`)

The first thing we needed to do was transform the start and end timestamps from the row level to the column level for each machine_id and process_id. In the provided Activity table, the start and end times appear as separate rows for the same process_id. To perform calculations efficiently, these values need to appear in separate columns (start_time and end_time) for each process_id.

To achieve this, we used a CASE statement inside the first CTE (ProcessTimes):

CASE Statement:

CASE 
    WHEN activity_type = 'start' THEN timestamp 
    ELSE NULL 
END

This captures the timestamp for start and end rows.

Using MAX:
Wrapping the CASE statements in MAX allows us to satisfy SQL’s grouping rules when using GROUP BY on machine_id and process_id:

MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time,
MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time

Does MAX change the values? No, since the problem guarantees only one start and end timestamp per process_id. MAX simply ensures compliance with GROUP BY. If multiple values existed, MAX would pick the latest, which still works for duration calculations.

Resulting Output (ProcessTimes):

machine_id	process_id	start_time	end_time
0	0	0.712	1.520
0	0	3.140	4.120
1	0	0.550	1.550

Step 2: Calculating Process Duration (CTE: `ProcessingDurations`)

In this step, we calculate the duration of each process by subtracting the start_time from the end_time:

SELECT 
    machine_id, 
    (end_time - start_time) AS processing_time
FROM ProcessTimes

This creates a table (ProcessingDurations) where each row represents the duration of a single process for a specific machine. The result looks like this:

machine_id	processing_time
0	0.808
0	0.980
1	1.000

Step 3: Aggregating Average Processing Time

Finally, we calculate the average processing time for each machine. Using the results from ProcessingDurations, we calculate the average as follows:

SELECT 
    machine_id, 
    ROUND(AVG(processing_time), 3) AS processing_time
FROM ProcessingDurations
GROUP BY machine_id
ORDER BY machine_id

AVG(processing_time): Calculates the average of all process durations for a machine.
ROUND(..., 3): Rounds the result to 3 decimal places, as required by the problem.
GROUP BY machine_id: Groups the results by each machine.

Final Output

The query produces the following result:

machine_id	processing_time
0	0.894
1	0.995
2	1.456

This output satisfies the requirements of the problem.

Solution 2

SELECT 
    machine_id,
    ROUND(SUM(CASE WHEN activity_type='start' THEN timestamp*-1 ELSE timestamp END)*1.0/ (SELECT COUNT(DISTINCT process_id)),3) AS processing_time
FROM 
    Activity
GROUP BY machine_id

Explanation

For this solution you need to understand the below part, where entire three CTEs are in one line itself, using a mathmetical logic.

SUM(CASE WHEN activity_type='start' THEN timestamp*-1 ELSE timestamp END) * 1.0 / (SELECT COUNT(DISTINCT process_id))

To understand this better let’s break it down further:

SUM(CASE WHEN activity_type=’start’ THEN timestamp*-1 ELSE timestamp END)

This part calculates the total time for each process. the logic used here is:

If activity_type = 'start':
- The timestamp is multiplied by -1. This effectively turns the start time into a negative value so it can be subtracted later.
If activity_type = 'end':
- The timestamp is added as-is (positive).

This means for every process, the start time and end time are combined to produce the processing duration. For example:

For start = 0.712 and end = 1.52, this will sum to:

(−0.712)+1.52=0.808 (duration of the process)

SUM: The SUM function adds up these values for all processes within the same machine (GROUP BY machine_id is applied in the query). This gives the total processing time for the machine.

1.0

This part ensures that the division produces a floating-point result instead of an integer.

Without * 1.0, SQL might perform integer division and truncate the decimal part, depending on the database.
Multiplying by 1.0 ensures the calculation is performed in floating-point arithmetic, preserving the precision.

/ (SELECT COUNT(DISTINCT process_id))

This calculates the average processing time by dividing the total processing time (from step 1) by the number of unique processes for the machine.

Putting It All Together

SELECT 
    machine_id,
    ROUND(SUM(CASE WHEN activity_type='start' THEN timestamp*-1 ELSE timestamp END)*1.0/ (SELECT COUNT(DISTINCT process_id)),3) AS processing_time
FROM 
    Activity
GROUP BY machine_id

Final Output

The query produces the following result:

machine_id	processing_time
0	0.894
1	0.995
2	1.456

I hope this would have been helpful for you, consider sharing it with your friends. thank you.

Vishal

Hi, I am Vishal Jaiswal, I have about a decade of experience of working in MNCs like Genpact, Savista, Ingenious. Currently i am working in EXL as a senior quality analyst. Using my writing skills i want to share the experience i have gained and help as many as i can.

Spread the love