Day 13 of Learning Python for Data Science: Mastering Pivot, Apply and RegEx
Welcome to Day 13 of Learning Python for Data Science! Today, we’re focusing on three powerful tools in Pandas that can take your data analysis skills to the next level: Pivot, Apply, and Regular Expressions (RegEx). These features allow for advanced data transformation, customization, and pattern matching. Pivot tables help you restructure and summarize data effortlessly, Apply lets you run custom functions across rows or columns for flexible processing, and RegEx enables sophisticated string pattern searching and cleaning. Mastering these tools will give you greater control and insight when working with complex datasets in real-world projects.
In data analysis, a pivot table is a valuable tool for transforming or summarizing datasets. In pandas, the pivot() and pivot_table() functions let you reorganize and explore data with ease, much like Excel's pivot table functionality.
pivot() vs pivot_table()
pivot() is used when each index/column combination is unique and no aggregation is required.
pivot_table() is used when duplicate entries exist and an aggregation function (like sum or mean) is needed to resolve them.
Example: pivot() and pivot_table()
import pandas as pd
# Sample DataFrame
data = {
'Employee': ['Alice', 'Bob', 'Alice', 'Bob'],
'Department': ['HR', 'HR', 'IT', 'IT'],
'Hours': [5, 6, 7, 8]
}
df = pd.DataFrame(data)
pivot(): requires unique combinations
df.pivot(index='Employee', columns='Department', values='Hours')
Output:
Department HR IT
Employee
Alice 5 7
Bob 6 8
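pivot_table(): handles duplicates and performs aggregation
df.pivot_table(index='Employee', columns='Department', values='Hours', aggfunc='sum')
Because every Employee/Department pair in this sample is unique, this returns the same table as pivot().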
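To see the difference described above, here is a minimal sketch with a duplicated pair. The extra Alice/HR row (3 hours) and the names dup and summed are made-up illustration data, not part of the original example. pivot() refuses to reshape it, while pivot_table() sums the two Alice/HR values.

```python
import pandas as pd

# Illustrative data: Alice/HR appears twice, so the index/column pair is not unique
dup = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice'],
    'Department': ['HR', 'HR', 'IT', 'IT', 'HR'],
    'Hours': [5, 6, 7, 8, 3],
})

# pivot() cannot decide which Alice/HR value to keep, so it raises ValueError
try:
    dup.pivot(index='Employee', columns='Department', values='Hours')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() resolves the duplicates with the aggregation function:
# Alice/HR becomes 5 + 3 = 8
summed = dup.pivot_table(index='Employee', columns='Department',
                         values='Hours', aggfunc='sum')
print(summed)
```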
Key parameters of pivot_table()
Parameter | Description |
---|---|
index | Column(s) to use as the row index |
columns | Column(s) whose values become the new columns |
values | Column(s) to aggregate |
aggfunc | Aggregation function to apply (default is mean) |
fill_value | Value used to replace missing entries (NaN) |
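A short sketch of the aggfunc parameter from the table, again using made-up data with a duplicated Alice/HR row. When aggfunc is omitted, duplicates are averaged; passing a list of function names produces one block of columns per function.

```python
import pandas as pd

# Illustrative data with a duplicated Alice/HR entry
dup = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice'],
    'Department': ['HR', 'HR', 'IT', 'IT', 'HR'],
    'Hours': [5, 6, 7, 8, 3],
})

# No aggfunc given: the default (mean) averages duplicates, so Alice/HR is (5 + 3) / 2
default_mean = dup.pivot_table(index='Employee', columns='Department', values='Hours')

# A list of functions: one set of HR/IT columns for 'sum' and one for 'count'
multi = dup.pivot_table(index='Employee', columns='Department',
                        values='Hours', aggfunc=['sum', 'count'])
print(default_mean)
print(multi)
```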
Using fill_value
df.pivot_table(index='Employee', columns='Department', values='Hours', aggfunc='sum', fill_value=0)
Any Employee/Department combination with no data now shows 0 instead of NaN in the output.
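The sample DataFrame has no missing combinations, so fill_value has nothing to replace there. A minimal sketch with illustrative data in which Bob has no IT hours shows the effect:

```python
import pandas as pd

# Illustrative data: there is no Bob/IT row, so that cell has no value
sparse = pd.DataFrame({
    'Employee': ['Alice', 'Bob', 'Alice'],
    'Department': ['HR', 'HR', 'IT'],
    'Hours': [5, 6, 7],
})

without_fill = sparse.pivot_table(index='Employee', columns='Department',
                                  values='Hours', aggfunc='sum')
with_fill = sparse.pivot_table(index='Employee', columns='Department',
                               values='Hours', aggfunc='sum', fill_value=0)
print(without_fill)  # Bob/IT is NaN
print(with_fill)     # Bob/IT is 0
```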
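The apply() function in pandas lets you apply a function along an axis of a DataFrame (either rows or columns). It is widely used for row-wise or column-wise transformations without writing explicit for-loops, although apply() still calls your function once per row or column, so a vectorized expression is usually faster when one exists.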
DataFrame.apply(func, axis=0)
func: The function to apply
axis: 0 for columns (default), 1 for rows
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Score1': [85, 90, 95],
'Score2': [80, 88, 92]
}
df = pd.DataFrame(data)
print(df)
# Output:
# Name Score1 Score2
# 0 Alice 85 80
# 1 Bob 90 88
# 2 Charlie 95 92
def average(row):
    # Mean of the two score columns for a single row
    return (row['Score1'] + row['Score2']) / 2
# Apply row-wise function
df['Average'] = df.apply(average, axis=1)
print(df)
Output:
Name Score1 Score2 Average
0 Alice 85 80 82.5
1 Bob 90 88 89.0
2 Charlie 95 92 93.5
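The example above only uses axis=1. Here is a small sketch of both directions on the same scores; the range calculation and the names score_range, row_total and vectorized_total are illustrations added here. With axis=0 the function receives one column at a time, and with axis=1 it receives one row at a time.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score1': [85, 90, 95],
    'Score2': [80, 88, 92],
})
scores = df[['Score1', 'Score2']]

# axis=0 (default): the lambda gets each column, so this is the range per column
score_range = scores.apply(lambda col: col.max() - col.min())

# axis=1: the lambda gets each row, so this is the total per student
row_total = scores.apply(lambda row: row.sum(), axis=1)

# For simple arithmetic, a vectorized expression gives the same totals without apply
vectorized_total = df['Score1'] + df['Score2']
print(score_range)
print(row_total)
```

apply() shines when the per-row logic is too complex for a single vectorized expression; for plain sums like this one, the vectorized form is the simpler choice.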
Regular Expressions (re module)
Regex (Regular Expressions) is a powerful tool for searching and manipulating strings based on patterns. Python supports regex through the built-in re module.
Common re module functions with examples
re.search(): Search for a pattern anywhere in the string
import re
text = "I have 2 apples"
result = re.search(r'\d+', text)
print(result.group())
Output:
2
re.match(): Match a pattern only at the start of the string
text2 = "2023 Report Released"
result = re.match(r'\d+', text2)
print(result.group())
Output:
2023
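Both functions return None when nothing matches, and calling .group() on None raises an AttributeError. A short sketch reusing the re.search() sentence from above shows the difference between the two and the None check:

```python
import re

text = "I have 2 apples"

# re.search() scans the whole string, so it finds the 2 in the middle
found = re.search(r'\d+', text)

# re.match() only checks the start; "I" is not a digit, so it returns None
anchored = re.match(r'\d+', text)

# Check for None before calling .group()
if anchored is None:
    print("re.match(): no match at the start of the string")
print("re.search():", found.group(), "at position", found.start())
```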
re.findall(): Find all matches of a pattern
sentence = "The order numbers are 123, 456 and 789."
results = re.findall(r'\d+', sentence)
print(results)
Output:
['123', '456', '789']
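When the pattern contains capturing groups, re.findall() returns the group contents instead of the whole match. A minimal sketch; the "ORD-" order-ID format is a hypothetical example:

```python
import re

# Hypothetical order IDs written with an "ORD-" prefix
orders = "Shipped ORD-123 and ORD-456, pending ORD-789."

# One capturing group: findall returns only the digits inside the group
ids = re.findall(r'ORD-(\d+)', orders)

# Two or more groups: each result is a tuple with one entry per group
pairs = re.findall(r'(ORD)-(\d+)', orders)
print(ids)
print(pairs)
```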
re.sub(): Replace matched patterns with a string
messy = "Error: 404 Not Found"
cleaned = re.sub(r'\d+', 'XXX', messy)
print(cleaned)
Output:
Error: XXX Not Found
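re.sub() can also limit how many matches are replaced (the count argument) and reuse captured groups in the replacement via backreferences such as \1. A small sketch with illustrative strings, turning a DD/MM/YYYY date into YYYY-MM-DD:

```python
import re

# count=1 replaces only the first match
log = "codes 404 500 503"
first_only = re.sub(r'\d+', 'XXX', log, count=1)

# Backreferences \1, \2, \3 refer to the captured day, month and year
date = "Released on 05/03/2024"
iso = re.sub(r'(\d{2})/(\d{2})/(\d{4})', r'\3-\2-\1', date)
print(first_only)
print(iso)
```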
Common Regex Symbols and Their Meaning
Symbol | Description |
---|---|
. | Matches any character except newline |
^ | Start of string |
$ | End of string |
* | 0 or more repetitions |
+ | 1 or more repetitions |
? | 0 or 1 repetition or makes quantifier non-greedy |
{n} | Exactly n repetitions |
{n,} | n or more repetitions |
{n,m} | Between n and m repetitions |
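[...] | Matches one character in the set |
[^...] | Matches one character not in the set |
\d | Digit [0-9] (also other Unicode digits in str patterns) |
\D | Non-digit |
\w | Word character: letters, digits, underscore ([a-zA-Z0-9_] for ASCII text) |
\W | Non-word character |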
\s | Whitespace |
\S | Non-whitespace |
() | Capturing group |
(?:) | Non-capturing group |
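Two rows of the table are easiest to understand by example: ? as a non-greedy modifier, and capturing vs non-capturing groups. A short sketch with illustrative strings:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy .* runs to the last '>' it can reach, giving one long match
greedy = re.findall(r'<.*>', html)
# Non-greedy .*? stops at the first '>', giving one match per tag
lazy = re.findall(r'<.*?>', html)

text = "ababab cd ab"
# Capturing group: findall returns the group, which holds only its last repetition
captured = re.findall(r'(ab)+', text)
# Non-capturing group: groups for repetition only, so findall returns full matches
non_captured = re.findall(r'(?:ab)+', text)
print(greedy, lazy)
print(captured, non_captured)
```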
Practice
Write a function that classifies temperatures into categories (Cold, Warm, Hot) and apply it to a DataFrame.
Use .apply() to calculate the length of strings in a Name column.
Use .apply() to convert a list of dates from strings to datetime.
Use .apply() to calculate total marks from multiple score columns.
Use .apply() to extract the domain name from an email address column.
Use .apply() on grouped data to calculate the range of each group.
Use .apply() with np.where or nested if-else logic for complex row-wise classification.
Use re.search() to extract the first number from a sentence.
Use re.match() to check whether a string starts with a capital letter.
Use re.findall() to extract all email addresses from a paragraph.
Extract phone numbers in the format xxx-xxx-xxxx from text using re.findall().
Use re.sub() to replace matched text with #.
Use a regex to find dates in the format YYYY-MM-DD.
We hope this article was helpful and that you learned a lot about data science from it. If you have friends or family members who would find it helpful, please share it with them or on social media.
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs like Genpact, Savista, and Ingenious. Currently I am working at EXL as a senior quality analyst. Through my writing, I want to share the experience I have gained and help as many people as I can.