Amazon’s Most Asked SQL Interview Questions: Curious Club

Master your Amazon SQL Interview Questions! This article reveals the key SQL concepts and question patterns Amazon uses in their hiring process. Understand Amazon’s data-focused approach and prepare with essential topic breakdowns and practical tips. Read More

1. Write a query to find the second highest salary from an employee table.

https://www.db-fiddle.com/f/ea1PzLfkiUw6YrLmKDjMZv/0

2. How would you find the missing numbers in a given list of numbers from 1 to N?

Based on the database, we have multiple options to find the missing number like using a recursive CTE, using a left join, or using a subquery.

2.1 Using a Recursive CTE (If Supported – MySQL 8+, PostgreSQL, SQL Server 2022)

https://www.db-fiddle.com/f/bGP1B8JnLm3DCpu888f36r/0

In this method recursive CTE generates a series of number from 1 to N and then that series of numbers is filtered by our existing series to find the missing numbers.

2.2 Using a LEFT JOIN (For Databases Without Recursive CTEs)

https://www.db-fiddle.com/f/bGP1B8JnLm3DCpu888f36r/0

In this method we are using the same method of filtering but without using a CTE, as it compares a complete number series with the given list to find missing numbers.

2.3 Using a Subquery

https://www.db-fiddle.com/f/bGP1B8JnLm3DCpu888f36r/0

Here, since we need to find the missing numbers, we don’t need to compare existing ones. Instead, we check if the next number in the sequence exists in the table. If it doesn’t, that means the number is missing.

3. Write a query to calculate the running total of a sales table by month.

https://www.db-fiddle.com/f/8Fm7H4AyzYJvt63XJSLeaP/0

Within CTE, we have used group by to get one row per month and calculated the running total. This ensures monthly granularity while maintaining a progressive sales trend over time. The window function avoids self-joins, making it efficient.

4. How would you retrieve all rows where a column contains duplicates, but only showing the duplicates once?

We can use a combination of group by and having on the column containing duplicates where we select only the row which have count greater than 1.

5. Write a query to find the most frequent order status in an orders table.

https://www.db-fiddle.com/f/v41D9Tz6NhwFGa25frYpZW/0

6. How would you find the top 3 products with the highest total sales in the last year?

Solution 1: In a live database selecting for ‘last year’.

https://www.db-fiddle.com/f/s5s386TMAWbKqqjPC2U5rF/1

We have used CURDATE() inside DATE_SUB to get the last year which should geater than or equal to year extracted from order_date, this ensures the data is fltered for last year.

Solution 2: Where data itself is for last year.

https://www.db-fiddle.com/f/s5s386TMAWbKqqjPC2U5rF/1

7. Write a query to calculate the percentage of users who logged in today compared to the total user base.

https://www.db-fiddle.com/f/rFfFAD6QRFvNQZFBbfRYVf/0

8. How would you find all customers who placed an order more than once in the past month?

I have filtered the orders table to find customers who have placed more than one order in the last month. To achieve this, I extracted the month from the order_date and matched it with the month from DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH). Then, I grouped the data by customer_id and applied a HAVING filter to retain only those customers who have placed more than one order. Finally, I joined this result with the customers table to retrieve the customer_name.

9. Write a query to retrieve the employee(s) with the highest salary for each department.

https://www.db-fiddle.com/f/7m53K8Hwkczd2xhWCM3rgf/0

10. How do you find the first and last purchase date for each customer from an order table?

https://www.db-fiddle.com/f/tRByvXoiYCTTv5czyqELCL/0

11. Write a query to find the first non-null value in a column across multiple rows.

https://www.db-fiddle.com/f/gb9hh1hjATYn2kR2bvkgec/0

12. How would you handle performance issues when querying large datasets in SQL?

There are many things that can be done to optimize a slow query, some common options are using index, avoid selecting unnecessary columns, filtering early like using where instead of having, using limit clause, use of CTEs and temp table

13. Write a query to get the average order value per customer, excluding those who have only placed one order.

https://www.db-fiddle.com/f/8wGvRinT3939khnx3JDyr2/0

14. How would you retrieve the employees who have worked for more than 10 years without using DATEDIFF?

15. Write a query to find the day of the week that had the maximum number of orders.

https://www.db-fiddle.com/f/99P5iCqrxwtv2Yhp164rog/0

16. How would you retrieve the customer(s) with the longest order history?

https://www.db-fiddle.com/f/s5jBVpe79EEE4TbK7MJsji/0

17. Write a query to detect and list duplicate rows in a table based on specific columns.

https://www.db-fiddle.com/f/nYKvRND2AuTPFdV3aHEiqY/0

18. How would you find the total number of products sold each month, grouped by product category?

https://www.db-fiddle.com/f/9CjiGTL6BSn1wJWFTb47oc/0

19. Write a query to compare the current year’s sales data against the same period in the previous year.

https://www.db-fiddle.com/f/k7ptZCamLz87E594M5sbzk/0

To achieve this, we used the CTE method, where we joined the Order_Items table with the Orders table. We extracted the month and year from the order_date column and calculated the total sales using SUM(total_price).

Using this CTE, we performed a self-join with the condition:
sd1.month = sd2.month AND sd1.year = sd2.year + 1.

Finally, we selected:

  • sd2.total_sale AS prev_year_sale (previous year sales).
  • sd1.month_year (current period)
  • sd1.year (current year)
  • sd1.total_sale (current year sales)
  • sd2.year AS prev_year (previous year)

20. How would you optimize a slow-running query in SQL involving multiple joins on large tables?

I would optimize a slow-running query involving multiple joins on large tables by:

  1. Indexing – Ensuring indexes exist on JOIN, WHERE, and ORDER BY columns, and using composite or covering indexes where needed.
  2. Optimizing Joins – Using INNER JOIN instead of LEFT JOIN where possible, ensuring indexed joins, and evaluating join order for efficiency.
  3. Reducing Data Scanned – Applying selective WHERE filters early, using partitioning for large tables, and limiting result sets.
  4. Avoiding Unnecessary Computation – Replacing DISTINCT with better indexing, minimizing functions on indexed columns, and using EXISTS instead of JOIN for filtering.
  5. Analyzing Query Execution Plan – Using EXPLAIN to identify bottlenecks and adjusting indexes, join strategies, or restructuring the query accordingly.

Join Telegram | Join WhatsApp Channel


We hope this article about Data Analyst Interview Questions was helpful for you and you learned a lot of new things from it. If you have friends or family members who would find it helpful, please share it to them or on social media.

Also Read:

Spread the love