Master your Amazon SQL Interview Questions! This article reveals the key SQL concepts and question patterns Amazon uses in their hiring process. Understand Amazon’s data-focused approach and prepare with essential topic breakdowns and practical tips. Read More

1. Write a query to find the second highest salary from an employee table.

SELECT MAX(Salary)
FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);

https://www.db-fiddle.com/f/ea1PzLfkiUw6YrLmKDjMZv/0

2. How would you find the missing numbers in a given list of numbers from 1 to N?

Based on the database, we have multiple options to find the missing number like using a recursive CTE, using a left join, or using a subquery.

2.1 Using a Recursive CTE (If Supported – MySQL 8+, PostgreSQL, SQL Server 2022)

WITH RECURSIVE NumberSeries AS (
    SELECT 1 AS Num
    UNION ALL
    SELECT Num + 1 FROM NumberSeries WHERE Num < N
)
SELECT Num 
FROM NumberSeries 
WHERE Num NOT IN (SELECT Num FROM Numbers);

https://www.db-fiddle.com/f/bGP1B8JnLm3DCpu888f36r/0

In this method recursive CTE generates a series of number from 1 to N and then that series of numbers is filtered by our existing series to find the missing numbers.

2.2 Using a LEFT JOIN (For Databases Without Recursive CTEs)

SELECT ns.Num 
FROM NumberSeries ns
LEFT JOIN Numbers n ON ns.Num = n.Num
WHERE n.Num IS NULL;

https://www.db-fiddle.com/f/bGP1B8JnLm3DCpu888f36r/0

In this method we are using the same method of filtering but without using a CTE, as it compares a complete number series with the given list to find missing numbers.

2.3 Using a Subquery

SELECT t.Num + 1 AS MissingNum
FROM Numbers t
WHERE NOT EXISTS (SELECT 1 FROM Numbers n WHERE n.Num = t.Num + 1)
AND t.Num + 1 <= 10;

https://www.db-fiddle.com/f/bGP1B8JnLm3DCpu888f36r/0

Here, since we need to find the missing numbers, we don’t need to compare existing ones. Instead, we check if the next number in the sequence exists in the table. If it doesn’t, that means the number is missing.

3. Write a query to calculate the running total of a sales table by month.

WITH MonthlySales AS (
    SELECT 
        DATE_FORMAT(SaleDate, '%Y-%m') AS month, 
        SUM(Amount) AS total_sales
    FROM Sales
    GROUP BY DATE_FORMAT(SaleDate, '%Y-%m')
)
SELECT *,
       SUM(total_sales) OVER (ORDER BY month) AS running_total
FROM MonthlySales;

https://www.db-fiddle.com/f/8Fm7H4AyzYJvt63XJSLeaP/0

Within CTE, we have used group by to get one row per month and calculated the running total. This ensures monthly granularity while maintaining a progressive sales trend over time. The window function avoids self-joins, making it efficient.

4. How would you retrieve all rows where a column contains duplicates, but only showing the duplicates once?

SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1;

We can use a combination of group by and having on the column containing duplicates where we select only the row which have count greater than 1.

5. Write a query to find the most frequent order status in an orders table.

SELECT status, COUNT(*) AS status_count
FROM orders
GROUP BY status
ORDER BY status_count DESC
LIMIT 1;

https://www.db-fiddle.com/f/v41D9Tz6NhwFGa25frYpZW/0

6. How would you find the top 3 products with the highest total sales in the last year?

Solution 1: In a live database selecting for ‘last year’.

SELECT
    YEAR(order_date),
    product_id,
    product_name,
    sum(price * quantity) as sales
FROM order_items
JOIN products USING (product_id)
JOIN orders USING (order_id)
WHERE YEAR(order_date) >= YEAR(DATE_SUB(CURDATE(), INTERVAL 1 YEAR))
GROUP BY YEAR(order_date), product_id, product_name
ORDER BY sales DESC
LIMIT 3;

https://www.db-fiddle.com/f/s5s386TMAWbKqqjPC2U5rF/1

We have used CURDATE() inside DATE_SUB to get the last year which should geater than or equal to year extracted from order_date, this ensures the data is fltered for last year.

Solution 2: Where data itself is for last year.

SELECT
    date_format(order_date, '%Y') as 'year',
    product_id,
    product_name,
    sum(price * quantity) as sales
FROM order_items 
JOIN products USING (product_id) 
JOIN orders USING (order_id) 
GROUP BY 'year', product_id, product_name 
ORDER BY sales DESC
LIMIT 3;

https://www.db-fiddle.com/f/s5s386TMAWbKqqjPC2U5rF/1

7. Write a query to calculate the percentage of users who logged in today compared to the total user base.

SELECT ROUND((COUNT(DISTINCT user_id) / (SELECT COUNT(*) FROM users)) * 100, 2) AS login_perc
FROM logins
WHERE login_date = CURRENT_DATE;

https://www.db-fiddle.com/f/rFfFAD6QRFvNQZFBbfRYVf/0

8. How would you find all customers who placed an order more than once in the past month?

SELECT
    c.customer_id,
    c.customer_name,
    COUNT(o.order_id) AS Num_of_orders
FROM
    customers c
LEFT JOIN
    orders o ON c.customer_id = o.customer_id
WHERE
    MONTH(o.order_date) = MONTH(DATE_SUB(CURDATE(), INTERVAL 1 MONTH))
GROUP BY
    c.customer_id
HAVING
    Num_of_orders > 1;

I have filtered the orders table to find customers who have placed more than one order in the last month. To achieve this, I extracted the month from the order_date and matched it with the month from DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH). Then, I grouped the data by customer_id and applied a HAVING filter to retain only those customers who have placed more than one order. Finally, I joined this result with the customers table to retrieve the customer_name.

9. Write a query to retrieve the employee(s) with the highest salary for each department.

SELECT * 
FROM
(
    SELECT
        employee_id,
        employee_name,
        salary,
        department_name,
        RANK() OVER (PARTITION BY department_name ORDER BY salary DESC) AS employee_rank 
    FROM
        employees
    INNER JOIN
        departments USING (department_id)
) as temp
WHERE employee_rank = 1;

https://www.db-fiddle.com/f/7m53K8Hwkczd2xhWCM3rgf/0

10. How do you find the first and last purchase date for each customer from an order table?

SELECT 
    customer_id,
    MIN(order_date) as first_purchase,
    MAX(order_date) as last_purchase
FROM orders
GROUP BY customer_id;

https://www.db-fiddle.com/f/tRByvXoiYCTTv5czyqELCL/0

11. Write a query to find the first non-null value in a column across multiple rows.

SELECT group_id, MIN(value) 
FROM sample_table
WHERE value is NOT NULL
GROUP BY group_id;

https://www.db-fiddle.com/f/gb9hh1hjATYn2kR2bvkgec/0

12. How would you handle performance issues when querying large datasets in SQL?

There are many things that can be done to optimize a slow query, some common options are using index, avoid selecting unnecessary columns, filtering early like using where instead of having, using limit clause, use of CTEs and temp table

13. Write a query to get the average order value per customer, excluding those who have only placed one order.

SELECT customer_id, 
       AVG(order_amount) AS avg_order_val, 
       COUNT(*) AS order_count
FROM Orders
GROUP BY customer_id
HAVING COUNT(*) > 1;

https://www.db-fiddle.com/f/8wGvRinT3939khnx3JDyr2/0

14. How would you retrieve the employees who have worked for more than 10 years without using DATEDIFF?

SELECT 
    employee_name,
    hire_date
FROM 
    Employees
WHERE 
    hire_date <= DATE_SUB(CURDATE(), INTERVAL 10 YEAR);

15. Write a query to find the day of the week that had the maximum number of orders.

SELECT DAYNAME(order_date) AS day_of_week, 
       COUNT(*) AS total_orders
FROM Orders
GROUP BY day_of_week
ORDER BY total_orders DESC
LIMIT 1;

https://www.db-fiddle.com/f/99P5iCqrxwtv2Yhp164rog/0

16. How would you retrieve the customer(s) with the longest order history?

SELECT
    customer_id,
    DATEDIFF(day, MIN(order_date), MAX(order_date)) AS history_length 
FROM
    Orders
GROUP BY
    customer_id
ORDER BY
    history_length DESC
LIMIT 1;

https://www.db-fiddle.com/f/s5jBVpe79EEE4TbK7MJsji/0

17. Write a query to detect and list duplicate rows in a table based on specific columns.

SELECT 
    customer_id,
    product_id,
    quantity,
    COUNT(*) AS count
FROM Orders
GROUP BY 
    customer_id,
    product_id,
    quantity
HAVING COUNT(*) > 1;

https://www.db-fiddle.com/f/nYKvRND2AuTPFdV3aHEiqY/0

18. How would you find the total number of products sold each month, grouped by product category?

SELECT
    MONTHNAME(o.order_date) AS month,
    pc.category_name,
    SUM(oi.quantity) AS total_units_sold
FROM
    Order_Items oi
INNER JOIN
    Products p ON oi.product_id = p.product_id
INNER JOIN
    Product_Categories pc ON p.category_id = pc.category_id
INNER JOIN
    Orders o ON o.order_id = oi.order_id
GROUP BY
    MONTHNAME(o.order_date),
    pc.category_name
ORDER BY
    MONTH(o.order_date), 
    pc.category_name;

https://www.db-fiddle.com/f/9CjiGTL6BSn1wJWFTb47oc/0

19. Write a query to compare the current year’s sales data against the same period in the previous year.

WITH sd AS (
    SELECT
        DATE_FORMAT(order_date, '%Y-%m') AS month_year,
        YEAR(order_date) AS year,
        MONTH(order_date) AS month,
        SUM(total_price) AS total_sale
    FROM
        Order_Items oi
    JOIN
        Orders o ON oi.order_id = o.order_id
    GROUP BY
        month_year,
  	year,
        month
)
SELECT
    sd1.month_year AS current_period,
    sd1.year AS current_year,
    sd1.total_sale AS current_year_sales,
    sd2.year AS previous_year,
    sd2.total_sale AS previous_year_sales
FROM
    sd AS sd1
INNER JOIN
    sd AS sd2 ON sd1.month = sd2.month AND sd1.year = sd2.year + 1
ORDER BY
    current_period;

https://www.db-fiddle.com/f/k7ptZCamLz87E594M5sbzk/0

To achieve this, we used the CTE method, where we joined the Order_Items table with the Orders table. We extracted the month and year from the order_date column and calculated the total sales using SUM(total_price).

Using this CTE, we performed a self-join with the condition:
sd1.month = sd2.month AND sd1.year = sd2.year + 1.

Finally, we selected:

sd2.total_sale AS prev_year_sale (previous year sales).
sd1.month_year (current period)
sd1.year (current year)
sd1.total_sale (current year sales)
sd2.year AS prev_year (previous year)

20. How would you optimize a slow-running query in SQL involving multiple joins on large tables?

I would optimize a slow-running query involving multiple joins on large tables by:

Indexing – Ensuring indexes exist on JOIN, WHERE, and ORDER BY columns, and using composite or covering indexes where needed.
Optimizing Joins – Using INNER JOIN instead of LEFT JOIN where possible, ensuring indexed joins, and evaluating join order for efficiency.
Reducing Data Scanned – Applying selective WHERE filters early, using partitioning for large tables, and limiting result sets.
Avoiding Unnecessary Computation – Replacing DISTINCT with better indexing, minimizing functions on indexed columns, and using EXISTS instead of JOIN for filtering.
Analyzing Query Execution Plan – Using EXPLAIN to identify bottlenecks and adjusting indexes, join strategies, or restructuring the query accordingly.

Join Telegram | Join WhatsApp Channel

We hope this article about Data Analyst Interview Questions was helpful for you and you learned a lot of new things from it. If you have friends or family members who would find it helpful, please share it to them or on social media.

Also Read:

Vishal

Hi, I am Vishal Jaiswal, I have about a decade of experience of working in MNCs like Genpact, Savista, Ingenious. Currently i am working in EXL as a senior quality analyst. Using my writing skills i want to share the experience i have gained and help as many as i can.

Spread the love