Mastering Pivot Table in Python: A Comprehensive Guide

Pivot tables are a powerful tool for summarizing and analyzing data, and Python’s Pandas library provides an easy-to-use and efficient way to create them. In this article, we’ll walk through different types of pivot tables and demonstrate how to implement them using Python code, including the outputs generated by each example.

Also Read: Understanding Dynamic Sets in Tableau

1. Basic Pivot Tables

It allow us to summarize data in a more meaningful way by organizing data around index and column values. Here’s how you can create a basic one that sums the sales by region and product:

# Sample data
data = {
'Region': ['North', 'South', 'East', 'West', 'North'],
'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
'Sales': [1200, 800, 600, 1100, 900]
}

df = pd.DataFrame(data)

# Basic pivot table
pivot = pd.pivot_table(df, index='Region', columns='Product', values='Sales', aggfunc='sum')

print(pivot)

Output:

Product  Laptop  Phone  Tablet
Region
East NaN NaN 600.0
North 1200.0 900.0 NaN
South NaN 800.0 NaN
West 1100.0 NaN NaN

This output shows the sum of sales for each product in each region.


2. Pivot Tables with Multiple Aggregation Functions

You can also apply multiple aggregation functions to the same column. For example, we can sum sales and calculate the average quantity sold:

# Sample data
data = {
'Region': ['North', 'South', 'East', 'West', 'North'],
'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
'Sales': [1200, 800, 600, 1100, 900],
'Quantity': [5, 2, 4, 3, 6]
}

df = pd.DataFrame(data)

# Pivot table with multiple aggregation functions
pivot = pd.pivot_table(df,
index='Region',
columns='Product',
values=['Sales', 'Quantity'],
aggfunc={'Sales': 'sum', 'Quantity': 'mean'})

print(pivot)

Output:

           Sales                 Quantity             
Product Laptop Phone Tablet Laptop Phone Tablet
Region
East NaN NaN 600.0 NaN NaN 4.0
North 1200.0 900.0 NaN 5.0 6.0 NaN
South NaN 800.0 NaN NaN 2.0 NaN
West 1100.0 NaN NaN 3.0 NaN NaN

This provides both the sum of sales and the average quantity sold for each region and product.


3. Pivot Table with Margins (Row and Column Totals)

Margins are useful for adding totals to rows and columns. Here’s how you can include them:

# Sample data
data = {
'Region': ['North', 'South', 'East', 'West', 'North'],
'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
'Sales': [1200, 800, 600, 1100, 900]
}

df = pd.DataFrame(data)

# Pivot table with margins
pivot = pd.pivot_table(df,
index='Region',
columns='Product',
values='Sales',
aggfunc='sum',
margins=True)

print(pivot)

Output:

Product  Laptop  Phone  Tablet  All
Region
East NaN NaN 600.0 600.0
North 1200.0 900.0 NaN 2100.0
South NaN 800.0 NaN 800.0
West 1100.0 NaN NaN 1100.0
All 2300.0 1700.0 600.0 4600.0

The margins=True argument adds a row and column at the end with totals for each region and product.


4. Pivot Table with Hierarchical Index (Multi-level Rows and Columns)

A multi-level table can group data by multiple categories, such as regions, departments, and products. Here’s an example:

# Sample data
data = {
'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West', 'West'],
'Department': ['Sales', 'Tech', 'Sales', 'Tech', 'Sales', 'Tech', 'Sales', 'Tech'],
'Product': ['Laptop', 'Laptop', 'Phone', 'Phone', 'Tablet', 'Tablet', 'Laptop', 'Phone'],
'Quarter': ['Q1', 'Q2', 'Q1', 'Q2', 'Q1', 'Q2', 'Q1', 'Q2'],
'Revenue': [1000, 1500, 900, 1100, 800, 1200, 950, 1050]
}

df = pd.DataFrame(data)

# Pivot table with hierarchical index
pivot = pd.pivot_table(df,
index=['Region', 'Department'],
columns=['Quarter', 'Product'],
values='Revenue',
aggfunc='sum')

print(pivot)

Output:

             Q1                Q2           
Product Laptop Phone Tablet Laptop Phone Tablet
Region Department
East Sales 800.0 NaN 800.0 NaN NaN NaN
Tech NaN NaN 1200.0 NaN NaN NaN
North Sales 1000.0 NaN NaN NaN NaN NaN
Tech NaN NaN NaN 1500.0 NaN NaN
South Sales NaN 900.0 NaN NaN NaN NaN
Tech NaN NaN NaN NaN 1100.0 NaN
West Sales NaN NaN NaN 950.0 NaN NaN
Tech NaN NaN NaN NaN 1050.0 NaN

This creates a multi-level table that groups data by region, department, product, and quarter.


5. Pivot Table Showing Percentage Contribution

You can also calculate the percentage contribution of each category. In this case, we’ll calculate the percentage contribution of each product in overall sales:

# Sample data
data = {
'Region': ['North', 'South', 'East', 'West', 'North'],
'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
'Sales': [1200, 800, 600, 1100, 900]
}

df = pd.DataFrame(data)

# Pivot table showing percentage contribution
pivot = pd.pivot_table(df,
index='Region',
columns='Product',
values='Sales',
aggfunc='sum',
margins=True)

# Calculate percentage contribution
pivot_percentage = pivot.div(pivot.loc['All', :], axis=1) * 100

print(pivot_percentage)

Output:

Product   Laptop       Phone      Tablet     All
Region
East NaN NaN 100.0 100.0
North 57.14 42.86 NaN 100.0
South NaN 100.0 NaN 100.0
West 100.0 NaN NaN 100.0
All 50.0 37.14 12.86 100.0

This code calculates the percentage contribution by dividing each value by the total sales and multiplying by 100.


6. Pivot Table with Filtered Values (e.g., Sales > 1000)

Sometimes, you may want to filter the table to show only specific values, like sales above 1000. Here’s an example:

# Sample data
data = {
'Region': ['North', 'South', 'East', 'West', 'North'],
'Product': ['Laptop', 'Phone', 'Laptop', 'Phone', 'Tablet'],
'Sales': [1200, 800, 600, 1100, 900]
}

df = pd.DataFrame(data)

# Create pivot table: average sales by region and product
pivot = pd.pivot_table(df,
index='Region',
columns='Product',
values='Sales',
aggfunc='mean')

# Filter: show only values where sales > 1000
filtered = pivot.where(pivot > 1000)

print(filtered)

Output:

Product   Laptop  Phone  Tablet
Region
East NaN NaN NaN
North 1200.0 NaN NaN
South NaN NaN NaN
West 1100.0 NaN NaN

The where() function is used here to filter and display values greater than 1000.

Conclusion

Pivot tables in Python, particularly using Pandas, are incredibly versatile for summarizing and analyzing large datasets. With the help of different aggregation functions, margins, and hierarchical indices, you can tailor your analysis to meet various data exploration needs. The ability to calculate percentages, filter specific values, and apply custom aggregation functions makes it a powerful tool for data analysis.


We hope this article was helpful for you and you learned a lot about data science from it. If you have friends or family members who would find it helpful, please share it to them or on social media.

Join our social media for more.

Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions

Spread the love