Mastering Pivot Table in Python: A Comprehensive Guide

Pivot tables are a powerful tool for summarizing and analyzing data, and Python’s Pandas library provides an easy-to-use and efficient way to create them. In this article, we’ll walk through different types of pivot tables and demonstrate how to implement them using Python code, including the outputs generated by each example.

Also Read: Understanding Dynamic Sets in Tableau

Table of Contents

1. Basic Pivot Tables

It allow us to summarize data in a more meaningful way by organizing data around index and column values. Here’s how you can create a basic one that sums the sales by region and product:

# Sample data
data = {
    'Region': ['North', 'South', 'East', 'West', 'North'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Sales': [1200, 800, 600, 1100, 900]
}

df = pd.DataFrame(data)

# Basic pivot table
pivot = pd.pivot_table(df, index='Region', columns='Product', values='Sales', aggfunc='sum')

print(pivot)

Output:

Product  Laptop  Phone  Tablet
Region                          
East        NaN    NaN   600.0
North    1200.0  900.0     NaN
South      NaN    800.0     NaN
West     1100.0    NaN     NaN

This output shows the sum of sales for each product in each region.

2. Pivot Tables with Multiple Aggregation Functions

You can also apply multiple aggregation functions to the same column. For example, we can sum sales and calculate the average quantity sold:

# Sample data
data = {
    'Region': ['North', 'South', 'East', 'West', 'North'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Sales': [1200, 800, 600, 1100, 900],
    'Quantity': [5, 2, 4, 3, 6]
}

df = pd.DataFrame(data)

# Pivot table with multiple aggregation functions
pivot = pd.pivot_table(df, 
                       index='Region', 
                       columns='Product', 
                       values=['Sales', 'Quantity'], 
                       aggfunc={'Sales': 'sum', 'Quantity': 'mean'})

print(pivot)

Output:

           Sales                 Quantity             
Product  Laptop Phone Tablet Laptop Phone Tablet
Region                                         
East       NaN   NaN   600.0    NaN   NaN   4.0
North   1200.0  900.0    NaN   5.0   6.0   NaN
South     NaN   800.0    NaN    NaN   2.0   NaN
West    1100.0   NaN    NaN   3.0   NaN   NaN

This provides both the sum of sales and the average quantity sold for each region and product.

3. Pivot Table with Margins (Row and Column Totals)

Margins are useful for adding totals to rows and columns. Here’s how you can include them:

# Sample data
data = {
    'Region': ['North', 'South', 'East', 'West', 'North'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Sales': [1200, 800, 600, 1100, 900]
}

df = pd.DataFrame(data)

# Pivot table with margins
pivot = pd.pivot_table(df, 
                       index='Region', 
                       columns='Product', 
                       values='Sales', 
                       aggfunc='sum', 
                       margins=True)

print(pivot)

Output:

Product  Laptop  Phone  Tablet  All
Region                                 
East        NaN    NaN   600.0  600.0
North    1200.0  900.0     NaN  2100.0
South      NaN    800.0     NaN  800.0
West     1100.0    NaN     NaN  1100.0
All      2300.0  1700.0   600.0  4600.0

The margins=True argument adds a row and column at the end with totals for each region and product.

4. Pivot Table with Hierarchical Index (Multi-level Rows and Columns)

A multi-level table can group data by multiple categories, such as regions, departments, and products. Here’s an example:

# Sample data
data = {
    'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West', 'West'],
    'Department': ['Sales', 'Tech', 'Sales', 'Tech', 'Sales', 'Tech', 'Sales', 'Tech'],
    'Product': ['Laptop', 'Laptop', 'Phone', 'Phone', 'Tablet', 'Tablet', 'Laptop', 'Phone'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2', 'Q1', 'Q2', 'Q1', 'Q2'],
    'Revenue': [1000, 1500, 900, 1100, 800, 1200, 950, 1050]
}

df = pd.DataFrame(data)

# Pivot table with hierarchical index
pivot = pd.pivot_table(df, 
                       index=['Region', 'Department'], 
                       columns=['Quarter', 'Product'], 
                       values='Revenue', 
                       aggfunc='sum')

print(pivot)

Output:

             Q1                Q2           
Product        Laptop Phone Tablet Laptop Phone Tablet
Region Department                                
East   Sales     800.0    NaN   800.0    NaN    NaN    NaN
      Tech      NaN     NaN   1200.0   NaN    NaN   NaN
North  Sales    1000.0   NaN     NaN    NaN    NaN    NaN
      Tech      NaN     NaN     NaN    1500.0 NaN   NaN
South  Sales     NaN     900.0   NaN    NaN     NaN    NaN
      Tech      NaN     NaN     NaN    NaN    1100.0  NaN
West   Sales     NaN     NaN     NaN    950.0  NaN    NaN
      Tech      NaN     NaN     NaN    NaN     1050.0 NaN

This creates a multi-level table that groups data by region, department, product, and quarter.

5. Pivot Table Showing Percentage Contribution

You can also calculate the percentage contribution of each category. In this case, we’ll calculate the percentage contribution of each product in overall sales:

# Sample data
data = {
    'Region': ['North', 'South', 'East', 'West', 'North'],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Sales': [1200, 800, 600, 1100, 900]
}

df = pd.DataFrame(data)

# Pivot table showing percentage contribution
pivot = pd.pivot_table(df, 
                       index='Region', 
                       columns='Product', 
                       values='Sales', 
                       aggfunc='sum', 
                       margins=True)

# Calculate percentage contribution
pivot_percentage = pivot.div(pivot.loc['All', :], axis=1) * 100

print(pivot_percentage)

Output:

Product   Laptop       Phone      Tablet     All
Region                                       
East        NaN        NaN         100.0    100.0
North     57.14      42.86          NaN     100.0
South       NaN      100.0          NaN     100.0
West      100.0        NaN          NaN     100.0
All        50.0       37.14      12.86    100.0

This code calculates the percentage contribution by dividing each value by the total sales and multiplying by 100.

6. Pivot Table with Filtered Values (e.g., Sales > 1000)

Sometimes, you may want to filter the table to show only specific values, like sales above 1000. Here’s an example:

# Sample data
data = {
    'Region': ['North', 'South', 'East', 'West', 'North'],
    'Product': ['Laptop', 'Phone', 'Laptop', 'Phone', 'Tablet'],
    'Sales': [1200, 800, 600, 1100, 900]
}

df = pd.DataFrame(data)

# Create pivot table: average sales by region and product
pivot = pd.pivot_table(df, 
                       index='Region', 
                       columns='Product', 
                       values='Sales', 
                       aggfunc='mean')

# Filter: show only values where sales > 1000
filtered = pivot.where(pivot > 1000)

print(filtered)

Output:

Product   Laptop  Phone  Tablet
Region                          
East        NaN    NaN     NaN
North    1200.0    NaN     NaN
South      NaN    NaN     NaN
West     1100.0    NaN     NaN

The where() function is used here to filter and display values greater than 1000.

Conclusion

Pivot tables in Python, particularly using Pandas, are incredibly versatile for summarizing and analyzing large datasets. With the help of different aggregation functions, margins, and hierarchical indices, you can tailor your analysis to meet various data exploration needs. The ability to calculate percentages, filter specific values, and apply custom aggregation functions makes it a powerful tool for data analysis.

We hope this article was helpful for you and you learned a lot about data science from it. If you have friends or family members who would find it helpful, please share it to them or on social media.

Join our social media for more.

Join us on Telegram

Follow our WhatsApp Channel

Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions Data Science Interview Questions

Also Read

Vishal

Hi, I am Vishal Jaiswal, I have about a decade of experience of working in MNCs like Genpact, Savista, Ingenious. Currently i am working in EXL as a senior quality analyst. Using my writing skills i want to share the experience i have gained and help as many as i can.

Spread the love