Practice day 13 of Learning Python for Data Science

Test your understanding of Python Data Structure, which we learned in our previous lesson of Day 13 of Learning Python for Data Science, with these targeted practice questions.

Welcome back to Day 13 of Learning Python for Data Science journey! In the last article, we explored:

✅ Pivot
✅ Apply
✅ Regular Expressions

Now, it’s time to solve the practice questions given in the previous article.
Each question is followed by a detailed explanation and output.

Table of Contents

Find the phone number. text = “My phone number is a 9876543210 986 047.”

text = "My phone number is a 9876543210 9876543210 986 047."

result = re.search(r'\d{10}', text)
result.group() # Returns first occurance of the match.

result1 = re.findall(r'\d{10}', text)
result1 # Returns all occurance of the match.

Output:

['9876543210', '9876543210']

Identify valid email addresses from a list.

emails = [“[email protected]”,”[email protected]”,”[email protected]”,”[email protected]”,”[email protected]”,”[email protected]”,”[email protected]”,”[email protected]”,”[email protected]”,”\”user@strange\”@example.com”, “[email protected]”, “[email protected]”]

[i for i in emails if re.search(r'[\w\._-]+@[\w_-]+\.\w+',i)]

Output:

['[email protected]',  '[email protected]',  '[email protected]',  '[email protected]',  '[email protected]',  '[email protected]',  '[email protected]',  '[email protected]', '[email protected]',  '[email protected]',  '[email protected]']

Extract all hashtags from a sentence. text = “I love #Python and #MachineLearning!”

re.findall(r'#\w+', text)

Output:

['#Python', '#MachineLearning']

Replace all dates in the format YYYY-MM-DD with [DATE]. text = “The meeting is on 2024-12-22.”

re.sub(r'\d+-\d+-\d+', "[DATE]", text)

Output:

'The meeting is on [DATE].'

Extract phone numbers from a string. text = “Call me at 9876543210 or at 1234567890.”

re.findall(r'\d{10}', text)

Output:

['9876543210', '1234567890']

Use `re.search()` to extract the first number from a sentence.

text = "I walked 5 kilometers today and burned 300 calories."

re.search(r'\d', text).group()

Output:

'5'

Use `re.match()` to check if a string starts with a capital letter.

text = "I walked 5 kilometers today and burned 300 calories."

re.match(r'[A-Z]', text).group() # re.match() returns an object .group() is used to extract the result.

Output:

'I'

Use `re.findall()` to extract all email addresses from a paragraph.

paragraph = “””
Please reach out to our team at [email protected] or [email protected].
For press inquiries, contact [email protected]. You can also connect with [email protected].
“””

re.findall(r'[\w.+_]+@[\w.+_].[\w.+_]+', paragraph)

Output:

['[email protected]',  '[email protected].',  '[email protected].',  '[email protected].']

Extract all phone numbers of pattern `xxx-xxx-xxxx` from text using `re.findall()`.

text = "You can reach us at 123-456-7890 or 987-654-3210. Emergency contact: 555-000-1111. Invalid: 12-3456-7890 or 1234567890."

re.findall(r'\d{3}-\d{3}-\d{4}', text)

Output:

['123-456-7890', '987-654-3210', '555-000-1111']

Replace all digits in a string with `#` using `re.sub()`.

text = "You can reach us at 123-456-7890 or 987-654-3210."

re.sub(r'\d', '#', text)

Output:

'You can reach us at ###-###-#### or ###-###-####.'

Extract all hashtags from a tweet using regex.

tweet = "Loving the vibes at the beach! 🌊 #sunset #vacation #RelaxMode"

re.findall(r'#\w+', tweet)

Output:

['#sunset', '#vacation', '#RelaxMode']

Validate if a given string is a valid date in format `YYYY-MM-DD`.

text = "The project began on 2023-05-12 and saw major updates on 2023-11-28 and 2024-02-30."

re.findall(r'\d{4}-\d{2}-\d{2}', text)

Output:

['2023-05-12', '2023-11-28', '2024-02-30']

Extract all words that start with capital letters and are more than 5 characters long.

text = "Alexander traveled to California for the International Conference on Data Science. While there, he met Benjamin and participated in Workshops and Seminars hosted by Google and Microsoft."

re.findall(r'\b[A-Z][a-zA-Z]{5,}\b', text)

Output:

['Alexander',  'California',  'International',  'Conference',  'Science',  'Benjamin',  'Workshops',  'Seminars',  'Google', Microsoft']

Remove all special characters except alphabets and numbers from a text string.

text = "Hello is this for > $12"

re.sub(r'[^A-Za-z0-9]', '', text)

Output:

Helloisthisfor12

Use regex to split a sentence into words ignoring punctuation.

sentence = "Hello, world! How's everything going?"

re.findall(r'\b\w+\b', sentence)

Output:

['Hello', 'world', 'How', 's', 'everything', 'going']

Create a DataFrame with sales data and pivot it.

data = {
    "Date": ["2024-04-01", "2024-04-01", "2024-04-02", "2024-04-02", "2024-04-03"],
    "Product": ["Laptop", "Phone", "Tablet", "Laptop", "Phone"],
    "Units Sold": [5, 10, 3, 2, 7],
    "Unit Price": [1000, 500, 300, 1000, 500],
    "Region": ["North", "South", "East", "West", "North"]
}

# Create DataFrame
df = pd.DataFrame(data)

df['Total Sales'] = df['Unit Price']*df['Units Sold']

df.pivot(index = 'Region', columns = 'Product', values= 'Total Sales').fillna(0)

Output:

Product	Laptop	Phone	Tablet
Region			
East	        0.0	        0.0	        900.0
North	5000.0	3500.0	0.0
South	0.0	        5000.0	0.0
West	2000.0	0.0	        0.0

Write a function to categorize temperatures (`Cold`, `Warm`, `Hot`) and apply it to a DataFrame.

data = {
    "Date": ["2024-04-01", "2024-04-01", "2024-04-02", "2024-04-02", "2024-04-03"],
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai", "Delhi"],
    "Temperature (C)": [15, 20, 36, 31, 37],
    "Condition": ["Sunny", "Cloudy", "Sunny", "Rainy", "Hot"]
}

# Create DataFrame
df = pd.DataFrame(data)

df['temp_cat'] = df['Temperature (C)'].apply(lambda x : 'Hot' if x > 20 else ('Warm' if x > 15 and x < 20 else 'Cold'))

Output:

Calculate total pay for employees given hours worked and hourly rate.

df["Total Pay"] = df["Hours Worked"] * df["Hourly Rate"]

Output:

  Employee  Hours Worked  Hourly Rate  Total Pay
0    Alice             40           20        800
1      Bob             35           22        770
2  Charlie             45           18        810
3    Diana             30           25        750
4    Ethan             50           15        750

Create a pivot table showing the average salary by department.

pivot_table = df.pivot_table(index='Department', values='Salary', aggfunc='mean')

Output:

            Salary
Department         
Finance     62500.0
HR          51000.0
IT          71000.0

Count the number of employees in each city using a pivot table.

pivot_table = df.pivot_table(index='City', values='Employee', aggfunc='count')

Output:

             Employee
City                 
Chicago             2
Los Angeles         2
New York            2

Create a pivot table showing total sales by product.

pivot_table = df.pivot_table(index='Product', values='Total Sales', aggfunc='sum')

Output:

         Total Sales
Product              
Laptop         7000
Phone          8500
Tablet         2100

Create a pivot table showing the average age by gender and department.

pivot_table = df.pivot_table(index=['Gender', 'Department'], values='Age', aggfunc='mean')

Output:

                                      Age
Gender Department      
Female Finance             30.0
       HR                          26.5
       IT                            22.0
Male   Finance               30.0
       HR                          28.0
       IT                           35.0

Create a pivot table with multiple aggregation functions (mean and sum) for sales by region.

pivot_table = df.pivot_table(
    index='Region', 
    values='Total Sales', 
    aggfunc={'Total Sales': ['sum', 'mean']}
)

Output:

                        Total Sales               
                        sum     mean
Region                         
East               2100    700.0000
North           10500   3500.0000
South           6500    3250.0000
West            5600    2800.0000

Create a pivot table showing percentage contribution of each product in overall sales.

# Pivot table: Total sales by product
pivot_table = df.pivot_table(
    index='Product', 
    values='Total Sales', 
    aggfunc='sum'
)

# Calculate overall total sales
overall_sales = pivot_table['Total Sales'].sum()

pivot_table['Percentage Contribution'] = (pivot_table['Total Sales'] / overall_sales) * 100

Output:

                    Total Sales  Percentage Contribution
Product                                    
Laptop         15000                50.000000
Phone          10500                35.000000
Tablet           3000                  15.000000

Create a pivot table with custom aggregation (e.g., standard deviation of scores by subject).

pivot_std = pd.pivot_table(
    df,
    index='Subject',
    values='Score',
    aggfunc=np.std
)

Output:

               Score
Subject        
English  2.516611
Math     3.511885
Science  2.054805

Use margins in a pivot table to include row and column totals.

# Pivot table with margins
pivot_table = pd.pivot_table(
    df,
    index='Region',
    columns='Product',
    values='Sales',
    aggfunc='sum',
    margins=True,
    margins_name='Total'  # Custom name for totals
)

Output:

Product   Laptop  Phone  Tablet  Total
Region                                
East        1300    500     NaN   1800
North       1000   1500     NaN   2500
South       1200    NaN     800   2000
West         NaN    600     700   1300
Total       3500   2600    1500   7600

margins=True: Adds row and column totals.

margins_name='Total': Customizes the label used for totals (default is "All").

Missing values are shown as NaN where no data is available.

Create a pivot table with a hierarchical index (multi-level rows and columns).

pivot_table = pd.pivot_table(
    df,
    index=['Region', 'Department'],
    columns=['Quarter', 'Product'],
    values='Revenue',
    aggfunc='sum'
)

Output:

Filter a pivot table to show only values where the average sales exceed a threshold.

pivot = pd.pivot_table(
    df,
    index='Region',
    columns='Product',
    values='Sales',
    aggfunc='mean'
)


filtered = pivot.where(pivot > 1000)

Output:

Product  Laptop   Phone  Tablet
Region                          
East       NaN     NaN   1100.0
North    1200.0    NaN     NaN
South    1300.0    NaN     NaN
West     1250.0    NaN     NaN

Use `.apply()` to calculate the length of strings in a Name column.

# Sample data with a 'Name' column
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
}

# Create DataFrame
df = pd.DataFrame(data)

# Calculate the length of strings in the 'Name' column using apply()
df['Name Length'] = df['Name'].apply(len)

print(df)

Output:

         Name       Age  Name Length
0         Alice         25            5
1         Bob          30            3
2         Charlie     35            7
3         David       40            5

Use `.apply()` to convert a list of dates from string to `datetime`.

df['Date'] = df['Date'].apply(pd.to_datetime)

Output:

        Date
0 2024-04-01
1 2024-05-01
2 2024-06-01
3 2024-07-01

Apply a function to standardize a column (z-score).

# Define function to calculate Z-score
def z_score(x, mean, std):
    return (x - mean) / std

# Calculate the mean and standard deviation
mean = df['Scores'].mean()
std = df['Scores'].std()

# Apply the function to the 'Scores' column to standardize it
df['Z-Score'] = df['Scores'].apply(z_score, args=(mean, std))

print(df)

Output:

   Scores   Z-Score
0      85    -1.264911
1      90    -0.632456
2      95     0.000000
3     100    0.632456
4     105    1.264911

Use row-wise `.apply()` to calculate total marks from multiple score columns.

# Define a function to calculate the total marks for each row
def total_marks(row):
    return row.sum()

# Apply the function row-wise (axis=1)
df['Total Marks'] = df.apply(total_marks, axis=1)

Output:

   Math  English  Science  Total Marks
0    85       78       90          253
1    90       88       85          263
2    95       92       95          282
3   100      85      100         285

Apply a function that flags customers as “High” or “Low” value based on total purchase.

# Sample sales data
data = {
    'Customer ID': [1, 2, 3, 4, 5],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Laptop', 'Phone'],
    'Quantity': [2, 5, 1, 3, 4],
    'Unit Price': [1000, 500, 300, 1000, 500]
}

# Create DataFrame
df = pd.DataFrame(data)

# Calculate total purchase (Total = Quantity * Unit Price)
df['Total Purchase'] = df['Quantity'] * df['Unit Price']

# Function to flag as 'High' or 'Low' value customer based on total purchase
def flag_customer(row):
    if row['Total Purchase'] > 2000:
        return 'High'
    else:
        return 'Low'

# Apply the function to create a new column 'Customer Value'
df['Customer Value'] = df.apply(flag_customer, axis=1)

print(df)

Output:

   Customer ID     Product     Quantity     Unit Price     Total Purchase    Customer Value
0            1               Laptop         2              1000                 2000                   Low
1            2               Phone          5               500                  2500                   High
2            3               Tablet           1               300                  300                     Low
3            4               Laptop         3              1000                 3000                   High
4            5               Phone          4                500                 2000                   Low

Use `.apply()` to extract the domain name from an email address column.

# Function to extract domain name from an email address
def extract_domain(email):
    return email.split('@')[1]

# Apply the function to extract domain names
df['Domain'] = df['Email'].apply(extract_domain)

Output:

      Name                Email                  Domain
0     John     [email protected]   example.com
1    Alice    [email protected]     company.org
2      Bob       [email protected]     domain.net
3  Charlie  [email protected]   webmail.com

Apply a lambda function that returns different values based on multiple conditions (e.g., risk score).

df['Risk Level'] = df['Score'].apply(lambda x: 'High Risk' if x < 50 else ('Medium Risk' if x < 75 else 'Low Risk'))

Output:

      Name   Score   Risk Level
0     John      85       Low Risk
1     Alice      42       High Risk
2     Bob       73       Medium Risk
3     Charlie   91      Low Risk

Use `.apply()` on grouped data to calculate the range of each group.

# Sample data with sales
data = {
    'Product': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Sales': [250, 300, 350, 400, 450, 100, 150, 200]
}

# Create DataFrame
df = pd.DataFrame(data)

# Group by 'Product' and apply a lambda function to calculate the range (max - min)
range_df = df.groupby('Product')['Sales'].apply(lambda x: x.max() - x.min()).reset_index()

# Rename the column for clarity
range_df.columns = ['Product', 'Sales Range']

print(range_df)

Output:

  Product  Sales Range
0       A          100
1       B           50
2       C          100

Create a new column that computes weighted average of other columns using `.apply()`.

# Sample data with scores and their corresponding weights
data = {
    'Subject1_Score': [90, 85, 88, 92],
    'Subject2_Score': [80, 75, 78, 85],
    'Subject1_Weight': [0.6, 0.7, 0.5, 0.4],
    'Subject2_Weight': [0.4, 0.3, 0.5, 0.6]
}

# Create DataFrame
df = pd.DataFrame(data)

# Define a function to calculate the weighted average for each row
def weighted_average(row):
    total_score = (row['Subject1_Score'] * row['Subject1_Weight']) + (row['Subject2_Score'] * row['Subject2_Weight'])
    total_weight = row['Subject1_Weight'] + row['Subject2_Weight']
    return total_score / total_weight

# Apply the weighted_average function to each row
df['Weighted_Avg'] = df.apply(weighted_average, axis=1)

# Display the DataFrame with the new column
print(df)

Output:

   Subject1_Score  Subject2_Score  Subject1_Weight  Subject2_Weight  Weighted_Avg
0               90                    80                    0.6                       0.4                       85.000000
1               85                    75                    0.7                       0.3                       81.000000
2               88                    78                    0.5                       0.5                       83.000000
3               92                    85                    0.4                       0.6                       88.000000

Combine `.apply()` with `np.where` or nested `if-else` logic for complex row-wise classification.

# Sample data with age and income columns
data = {
    'Age': [25, 35, 29, 40, 22],
    'Income': [35000, 50000, 29000, 60000, 25000]
}

# Create DataFrame
df = pd.DataFrame(data)

# Use np.where() for classification
df['Risk'] = np.where((df['Age'] < 30) & (df['Income'] < 40000), 'High Risk', 
                  np.where((df['Age'] >= 30) & (df['Income'] >= 40000), 'Low Risk', 'Medium Risk'))

print(df)

Output:

   Age  Income         Risk
0   25   35000    High Risk
1   35   50000     Low Risk
2   29   29000    High Risk
3   40   60000     Low Risk
4   22   25000    High Risk

We hope this article was helpful for you and you learned a lot about data science from it. If you have friends or family members who would find it helpful, please share it to them or on social media.

Join our social media for more.

Join us on Telegram

Follow our WhatsApp Channel

Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science Python for Data Science

Also Read:

Vishal

Hi, I am Vishal Jaiswal, I have about a decade of experience of working in MNCs like Genpact, Savista, Ingenious. Currently i am working in EXL as a senior quality analyst. Using my writing skills i want to share the experience i have gained and help as many as i can.

Spread the love

Find the phone number. text = “My phone number is a 9876543210 986 047.”

Identify valid email addresses from a list.

Extract all hashtags from a sentence. text = “I love #Python and #MachineLearning!”

Replace all dates in the format YYYY-MM-DD with [DATE]. text = “The meeting is on 2024-12-22.”

Extract phone numbers from a string. text = “Call me at 9876543210 or at 1234567890.”

Use re.search() to extract the first number from a sentence.

Use re.match() to check if a string starts with a capital letter.

Use re.findall() to extract all email addresses from a paragraph.

Extract all phone numbers of pattern xxx-xxx-xxxx from text using re.findall().

Replace all digits in a string with # using re.sub().

Extract all hashtags from a tweet using regex.

Validate if a given string is a valid date in format YYYY-MM-DD.

Extract all words that start with capital letters and are more than 5 characters long.

Remove all special characters except alphabets and numbers from a text string.

Use regex to split a sentence into words ignoring punctuation.

Create a DataFrame with sales data and pivot it.

Write a function to categorize temperatures (Cold, Warm, Hot) and apply it to a DataFrame.

Calculate total pay for employees given hours worked and hourly rate.

Create a pivot table showing the average salary by department.

Count the number of employees in each city using a pivot table.

Create a pivot table showing total sales by product.

Create a pivot table showing the average age by gender and department.

Create a pivot table with multiple aggregation functions (mean and sum) for sales by region.

Create a pivot table showing percentage contribution of each product in overall sales.

Create a pivot table with custom aggregation (e.g., standard deviation of scores by subject).

Use margins in a pivot table to include row and column totals.

Create a pivot table with a hierarchical index (multi-level rows and columns).

Filter a pivot table to show only values where the average sales exceed a threshold.

Use .apply() to calculate the length of strings in a Name column.

Use .apply() to convert a list of dates from string to datetime.

Apply a function to standardize a column (z-score).

Use row-wise .apply() to calculate total marks from multiple score columns.

Apply a function that flags customers as “High” or “Low” value based on total purchase.

Use .apply() to extract the domain name from an email address column.

Apply a lambda function that returns different values based on multiple conditions (e.g., risk score).

Use .apply() on grouped data to calculate the range of each group.

Create a new column that computes weighted average of other columns using .apply().

Combine .apply() with np.where or nested if-else logic for complex row-wise classification.

Also Read:

Use `re.search()` to extract the first number from a sentence.

Use `re.match()` to check if a string starts with a capital letter.

Use `re.findall()` to extract all email addresses from a paragraph.

Extract all phone numbers of pattern `xxx-xxx-xxxx` from text using `re.findall()`.

Replace all digits in a string with `#` using `re.sub()`.

Validate if a given string is a valid date in format `YYYY-MM-DD`.

Write a function to categorize temperatures (`Cold`, `Warm`, `Hot`) and apply it to a DataFrame.

Use `.apply()` to calculate the length of strings in a Name column.

Use `.apply()` to convert a list of dates from string to `datetime`.

Use row-wise `.apply()` to calculate total marks from multiple score columns.

Use `.apply()` to extract the domain name from an email address column.

Use `.apply()` on grouped data to calculate the range of each group.

Create a new column that computes weighted average of other columns using `.apply()`.

Combine `.apply()` with `np.where` or nested `if-else` logic for complex row-wise classification.