Blog

1661 Average Time of Process per Machine

1661 Average Time of Process per Machine: This is a really interesting question where we write a compact code of 1 line which is equivalent to multiline CTE code. Read more

Company: MAANG

Difficulty: Easy

SourceLeetCode

SQL Interview Question

Q. 1661 Average Time of Process per Machine

CREATE TABLE Activity (
    machine_id INT,
    process_id INT,
    activity_type TEXT CHECK(activity_type IN ('start', 'end')),
    timestamp FLOAT
);

INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 0, 'start', 0.712);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 0, 'end', 1.52);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 1, 'start', 3.14);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (0, 1, 'end', 4.12);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 0, 'start', 0.55);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 0, 'end', 1.55);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 1, 'start', 0.43);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (1, 1, 'end', 1.42);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 0, 'start', 4.1);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 0, 'end', 4.512);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 1, 'start', 2.5);
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp) VALUES (2, 1, 'end', 5.0);

View code on DB Fiddle

Solution

Solution1

WITH ProcessTimes AS (
    SELECT
        machine_id,
        process_id,
        MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time,
        MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time
    FROM Activity
    GROUP BY machine_id, process_id
),
ProcessingDurations AS (
    SELECT
        machine_id,
        (end_time - start_time) AS processing_time
    FROM ProcessTimes
)
SELECT
    machine_id,
    ROUND(AVG(processing_time), 3) AS processing_time
FROM ProcessingDurations
GROUP BY machine_id
ORDER BY machine_id;

Explanation

To solve this problem using the CTE (Common Table Expression) method, we’ve broken the process into three steps:

Step 1: Extracting Start and End Times (CTE: ProcessTimes)

The first thing we needed to do was transform the start and end timestamps from the row level to the column level for each machine_id and process_id. In the provided Activity table, the start and end times appear as separate rows for the same process_id. To perform calculations efficiently, these values need to appear in separate columns (start_time and end_time) for each process_id.

To achieve this, we used a CASE statement inside the first CTE (ProcessTimes):

  • CASE Statement:
CASE 
    WHEN activity_type = 'start' THEN timestamp 
    ELSE NULL 
END

This captures the timestamp for start and end rows.

Using MAX:
Wrapping the CASE statements in MAX allows us to satisfy SQL’s grouping rules when using GROUP BY on machine_id and process_id:

MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_time,
MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_time

Does MAX change the values? No, since the problem guarantees only one start and end timestamp per process_id. MAX simply ensures compliance with GROUP BY. If multiple values existed, MAX would pick the latest, which still works for duration calculations.

Resulting Output (ProcessTimes):

machine_idprocess_idstart_timeend_time
000.7121.520
003.1404.120
100.5501.550
Step 2: Calculating Process Duration (CTE: ProcessingDurations)

In this step, we calculate the duration of each process by subtracting the start_time from the end_time:

SELECT 
    machine_id, 
    (end_time - start_time) AS processing_time
FROM ProcessTimes

This creates a table (ProcessingDurations) where each row represents the duration of a single process for a specific machine. The result looks like this:

machine_idprocessing_time
00.808
00.980
11.000
Step 3: Aggregating Average Processing Time

Finally, we calculate the average processing time for each machine. Using the results from ProcessingDurations, we calculate the average as follows:

SELECT 
machine_id,
ROUND(AVG(processing_time), 3) AS processing_time
FROM ProcessingDurations
GROUP BY machine_id
ORDER BY machine_id
  • AVG(processing_time): Calculates the average of all process durations for a machine.
  • ROUND(..., 3): Rounds the result to 3 decimal places, as required by the problem.
  • GROUP BY machine_id: Groups the results by each machine.
Final Output

The query produces the following result:

machine_idprocessing_time
00.894
10.995
21.456

This output satisfies the requirements of the problem.

Solution 2

SELECT 
    machine_id,
    ROUND(SUM(CASE WHEN activity_type='start' THEN timestamp*-1 ELSE timestamp END)*1.0/ (SELECT COUNT(DISTINCT process_id)),3) AS processing_time
FROM 
    Activity
GROUP BY machine_id

Explanation

For this solution you need to understand the below part, where entire three CTEs are in one line itself, using a mathmetical logic.

SUM(CASE WHEN activity_type='start' THEN timestamp*-1 ELSE timestamp END) * 1.0 / (SELECT COUNT(DISTINCT process_id))

To understand this better let’s break it down further:

SUM(CASE WHEN activity_type=’start’ THEN timestamp*-1 ELSE timestamp END)

This part calculates the total time for each process. the logic used here is:

  • If activity_type = 'start':
    • The timestamp is multiplied by -1. This effectively turns the start time into a negative value so it can be subtracted later.
  • If activity_type = 'end':
    • The timestamp is added as-is (positive).

This means for every process, the start time and end time are combined to produce the processing duration. For example:

  • For start = 0.712 and end = 1.52, this will sum to:
(−0.712)+1.52=0.808 (duration of the process)

SUM: The SUM function adds up these values for all processes within the same machine (GROUP BY machine_id is applied in the query). This gives the total processing time for the machine.

1.0

This part ensures that the division produces a floating-point result instead of an integer.

  • Without * 1.0, SQL might perform integer division and truncate the decimal part, depending on the database.
  • Multiplying by 1.0 ensures the calculation is performed in floating-point arithmetic, preserving the precision.
/ (SELECT COUNT(DISTINCT process_id))

This calculates the average processing time by dividing the total processing time (from step 1) by the number of unique processes for the machine.

Putting It All Together
SELECT 
    machine_id,
    ROUND(SUM(CASE WHEN activity_type='start' THEN timestamp*-1 ELSE timestamp END)*1.0/ (SELECT COUNT(DISTINCT process_id)),3) AS processing_time
FROM 
    Activity
GROUP BY machine_id
Final Output

The query produces the following result:

machine_idprocessing_time
00.894
10.995
21.456

I hope this would have been helpful for you, consider sharing it with your friends. thank you.

Spread the love

Recent Posts

Mastering Pivot Table in Python: A Comprehensive Guide

Pivot tables are a powerful tool for summarizing and analyzing data, and Python’s Pandas library…

1 week ago

Data Science Interview Questions Section 3: SQL, Data Warehousing, and General Analytics Concepts

Welcome to Section 3 of our Data Science Interview Questions series! In this part, we…

1 week ago

Data Science Interview Questions Section 2: 25 Questions Designed To Deepen Your Understanding

Welcome back to our Data Science Interview Questions series! In the first section, we explored…

2 weeks ago

Data Science Questions Section 1: Data Visualization & BI Tools (Power BI, Tableau, etc.)

Data Science Questions in Section 1 focus on the essential concepts of Data Visualization and…

2 weeks ago

Optum Interview Questions: 30 Multiple Choice Questions (MCQs) with Answers

In this article, we’ve compiled 30 carefully selected multiple choice questions (MCQs) with answers to…

2 weeks ago

Day 15 of Learning Python for Data Science: Exploring Matplotlib Visualizations and EDA

Welcome to Day 15 of our Python for Data Science journey!On Day 15, we dived…

2 weeks ago