
Master your Amazon SQL Interview Questions! This article reveals the key SQL concepts and question patterns Amazon uses in their hiring process. Understand Amazon’s data-focused approach and prepare with essential topic breakdowns and practical tips. Read More
1. Write a query to find the second highest salary from an employee table.
SELECT MAX(Salary)
FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);
https://www.db-fiddle.com/f/ea1PzLfkiUw6YrLmKDjMZv/0
2. How would you find the missing numbers in a given list of numbers from 1 to N?
Based on the database, we have multiple options to find the missing number like using a recursive CTE, using a left join, or using a subquery.
2.1 Using a Recursive CTE (If Supported – MySQL 8+, PostgreSQL, SQL Server 2022)
WITH RECURSIVE NumberSeries AS (
SELECT 1 AS Num
UNION ALL
SELECT Num + 1 FROM NumberSeries WHERE Num < N
)
SELECT Num
FROM NumberSeries
WHERE Num NOT IN (SELECT Num FROM Numbers);
https://www.db-fiddle.com/f/bGP1B8JnLm3DCpu888f36r/0
In this method recursive CTE generates a series of number from 1 to N and then that series of numbers is filtered by our existing series to find the missing numbers.
2.2 Using a LEFT JOIN (For Databases Without Recursive CTEs)
SELECT ns.Num
FROM NumberSeries ns
LEFT JOIN Numbers n ON ns.Num = n.Num
WHERE n.Num IS NULL;
https://www.db-fiddle.com/f/bGP1B8JnLm3DCpu888f36r/0
In this method we are using the same method of filtering but without using a CTE, as it compares a complete number series with the given list to find missing numbers.
2.3 Using a Subquery
SELECT t.Num + 1 AS MissingNum
FROM Numbers t
WHERE NOT EXISTS (SELECT 1 FROM Numbers n WHERE n.Num = t.Num + 1)
AND t.Num + 1 <= 10;
https://www.db-fiddle.com/f/bGP1B8JnLm3DCpu888f36r/0
Here, since we need to find the missing numbers, we don’t need to compare existing ones. Instead, we check if the next number in the sequence exists in the table. If it doesn’t, that means the number is missing.
3. Write a query to calculate the running total of a sales table by month.
WITH MonthlySales AS (
SELECT
DATE_FORMAT(SaleDate, '%Y-%m') AS month,
SUM(Amount) AS total_sales
FROM Sales
GROUP BY DATE_FORMAT(SaleDate, '%Y-%m')
)
SELECT *,
SUM(total_sales) OVER (ORDER BY month) AS running_total
FROM MonthlySales;
https://www.db-fiddle.com/f/8Fm7H4AyzYJvt63XJSLeaP/0
Within CTE, we have used group by to get one row per month and calculated the running total. This ensures monthly granularity while maintaining a progressive sales trend over time. The window function avoids self-joins, making it efficient.
4. How would you retrieve all rows where a column contains duplicates, but only showing the duplicates once?
SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1;
We can use a combination of group by and having on the column containing duplicates where we select only the row which have count greater than 1.
5. Write a query to find the most frequent order status in an orders table.
SELECT status, COUNT(*) AS status_count
FROM orders
GROUP BY status
ORDER BY status_count DESC
LIMIT 1;
https://www.db-fiddle.com/f/v41D9Tz6NhwFGa25frYpZW/0
6. How would you find the top 3 products with the highest total sales in the last year?
Solution 1: In a live database selecting for ‘last year’.
SELECT
YEAR(order_date),
product_id,
product_name,
sum(price * quantity) as sales
FROM order_items
JOIN products USING (product_id)
JOIN orders USING (order_id)
WHERE YEAR(order_date) >= YEAR(DATE_SUB(CURDATE(), INTERVAL 1 YEAR))
GROUP BY YEAR(order_date), product_id, product_name
ORDER BY sales DESC
LIMIT 3;
https://www.db-fiddle.com/f/s5s386TMAWbKqqjPC2U5rF/1
We have used CURDATE() inside DATE_SUB to get the last year which should geater than or equal to year extracted from order_date, this ensures the data is fltered for last year.
Solution 2: Where data itself is for last year.
SELECT
date_format(order_date, '%Y') as 'year',
product_id,
product_name,
sum(price * quantity) as sales
FROM order_items
JOIN products USING (product_id)
JOIN orders USING (order_id)
GROUP BY 'year', product_id, product_name
ORDER BY sales DESC
LIMIT 3;
https://www.db-fiddle.com/f/s5s386TMAWbKqqjPC2U5rF/1
7. Write a query to calculate the percentage of users who logged in today compared to the total user base.
SELECT ROUND((COUNT(DISTINCT user_id) / (SELECT COUNT(*) FROM users)) * 100, 2) AS login_perc
FROM logins
WHERE login_date = CURRENT_DATE;
https://www.db-fiddle.com/f/rFfFAD6QRFvNQZFBbfRYVf/0
8. How would you find all customers who placed an order more than once in the past month?
SELECT
c.customer_id,
c.customer_name,
COUNT(o.order_id) AS Num_of_orders
FROM
customers c
LEFT JOIN
orders o ON c.customer_id = o.customer_id
WHERE
MONTH(o.order_date) = MONTH(DATE_SUB(CURDATE(), INTERVAL 1 MONTH))
GROUP BY
c.customer_id
HAVING
Num_of_orders > 1;
I have filtered the orders table to find customers who have placed more than one order in the last month. To achieve this, I extracted the month from the order_date
and matched it with the month from DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)
. Then, I grouped the data by customer_id
and applied a HAVING
filter to retain only those customers who have placed more than one order. Finally, I joined this result with the customers table to retrieve the customer_name
.
9. Write a query to retrieve the employee(s) with the highest salary for each department.
SELECT *
FROM
(
SELECT
employee_id,
employee_name,
salary,
department_name,
RANK() OVER (PARTITION BY department_name ORDER BY salary DESC) AS employee_rank
FROM
employees
INNER JOIN
departments USING (department_id)
) as temp
WHERE employee_rank = 1;
https://www.db-fiddle.com/f/7m53K8Hwkczd2xhWCM3rgf/0
10. How do you find the first and last purchase date for each customer from an order table?
SELECT
customer_id,
MIN(order_date) as first_purchase,
MAX(order_date) as last_purchase
FROM orders
GROUP BY customer_id;
https://www.db-fiddle.com/f/tRByvXoiYCTTv5czyqELCL/0
11. Write a query to find the first non-null value in a column across multiple rows.
SELECT group_id, MIN(value)
FROM sample_table
WHERE value is NOT NULL
GROUP BY group_id;
https://www.db-fiddle.com/f/gb9hh1hjATYn2kR2bvkgec/0
12. How would you handle performance issues when querying large datasets in SQL?
There are many things that can be done to optimize a slow query, some common options are using index, avoid selecting unnecessary columns, filtering early like using where instead of having, using limit clause, use of CTEs and temp table
13. Write a query to get the average order value per customer, excluding those who have only placed one order.
SELECT customer_id,
AVG(order_amount) AS avg_order_val,
COUNT(*) AS order_count
FROM Orders
GROUP BY customer_id
HAVING COUNT(*) > 1;
https://www.db-fiddle.com/f/8wGvRinT3939khnx3JDyr2/0
14. How would you retrieve the employees who have worked for more than 10 years without using DATEDIFF?
SELECT
employee_name,
hire_date
FROM
Employees
WHERE
hire_date <= DATE_SUB(CURDATE(), INTERVAL 10 YEAR);
15. Write a query to find the day of the week that had the maximum number of orders.
SELECT DAYNAME(order_date) AS day_of_week,
COUNT(*) AS total_orders
FROM Orders
GROUP BY day_of_week
ORDER BY total_orders DESC
LIMIT 1;
https://www.db-fiddle.com/f/99P5iCqrxwtv2Yhp164rog/0
16. How would you retrieve the customer(s) with the longest order history?
SELECT
customer_id,
DATEDIFF(day, MIN(order_date), MAX(order_date)) AS history_length
FROM
Orders
GROUP BY
customer_id
ORDER BY
history_length DESC
LIMIT 1;
https://www.db-fiddle.com/f/s5jBVpe79EEE4TbK7MJsji/0
17. Write a query to detect and list duplicate rows in a table based on specific columns.
SELECT
customer_id,
product_id,
quantity,
COUNT(*) AS count
FROM Orders
GROUP BY
customer_id,
product_id,
quantity
HAVING COUNT(*) > 1;
https://www.db-fiddle.com/f/nYKvRND2AuTPFdV3aHEiqY/0
18. How would you find the total number of products sold each month, grouped by product category?
SELECT
MONTHNAME(o.order_date) AS month,
pc.category_name,
SUM(oi.quantity) AS total_units_sold
FROM
Order_Items oi
INNER JOIN
Products p ON oi.product_id = p.product_id
INNER JOIN
Product_Categories pc ON p.category_id = pc.category_id
INNER JOIN
Orders o ON o.order_id = oi.order_id
GROUP BY
MONTHNAME(o.order_date),
pc.category_name
ORDER BY
MONTH(o.order_date),
pc.category_name;
https://www.db-fiddle.com/f/9CjiGTL6BSn1wJWFTb47oc/0
19. Write a query to compare the current year’s sales data against the same period in the previous year.
WITH sd AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month_year,
YEAR(order_date) AS year,
MONTH(order_date) AS month,
SUM(total_price) AS total_sale
FROM
Order_Items oi
JOIN
Orders o ON oi.order_id = o.order_id
GROUP BY
month_year,
year,
month
)
SELECT
sd1.month_year AS current_period,
sd1.year AS current_year,
sd1.total_sale AS current_year_sales,
sd2.year AS previous_year,
sd2.total_sale AS previous_year_sales
FROM
sd AS sd1
INNER JOIN
sd AS sd2 ON sd1.month = sd2.month AND sd1.year = sd2.year + 1
ORDER BY
current_period;
https://www.db-fiddle.com/f/k7ptZCamLz87E594M5sbzk/0
To achieve this, we used the CTE method, where we joined the Order_Items table with the Orders table. We extracted the month and year from the order_date
column and calculated the total sales using SUM(total_price)
.
Using this CTE, we performed a self-join with the condition:sd1.month = sd2.month AND sd1.year = sd2.year + 1
.
Finally, we selected:
sd2.total_sale AS prev_year_sale
(previous year sales).sd1.month_year
(current period)sd1.year
(current year)sd1.total_sale
(current year sales)sd2.year AS prev_year
(previous year)
20. How would you optimize a slow-running query in SQL involving multiple joins on large tables?
I would optimize a slow-running query involving multiple joins on large tables by:
- Indexing – Ensuring indexes exist on
JOIN
,WHERE
, andORDER BY
columns, and using composite or covering indexes where needed. - Optimizing Joins – Using
INNER JOIN
instead ofLEFT JOIN
where possible, ensuring indexed joins, and evaluating join order for efficiency. - Reducing Data Scanned – Applying selective
WHERE
filters early, using partitioning for large tables, and limiting result sets. - Avoiding Unnecessary Computation – Replacing
DISTINCT
with better indexing, minimizing functions on indexed columns, and usingEXISTS
instead ofJOIN
for filtering. - Analyzing Query Execution Plan – Using
EXPLAIN
to identify bottlenecks and adjusting indexes, join strategies, or restructuring the query accordingly.
Join Telegram | Join WhatsApp Channel
We hope this article about Data Analyst Interview Questions was helpful for you and you learned a lot of new things from it. If you have friends or family members who would find it helpful, please share it to them or on social media.
Also Read:
- Amazon’s Most Asked SQL Interview Questions: Curious Club
- Data Analyst Interview Questions Experience at Flipkart
- Query Execution Plans & Advanced Optimization Techniques
- Single-Column Indexes: How They Work and When to Use Them
- Introduction to SQL Indexing: Boosting Your Database Performance
Hi, I am Vishal Jaiswal, I have about a decade of experience of working in MNCs like Genpact, Savista, Ingenious. Currently i am working in EXL as a senior quality analyst. Using my writing skills i want to share the experience i have gained and help as many as i can.