
Day 12 of Learning Python for Data Science – Pandas

Welcome to Day 12 of Learning Python for Data Science. Today we look at Pandas, one of the most widely used Python libraries for data manipulation and analysis. Pandas provides two core data structures that simplify working with structured data: the Series (a labeled one-dimensional array) and the DataFrame (a labeled two-dimensional table). In this article, we’ll explore how to load, inspect, clean, and transform data using Pandas, all key skills for any data science project. Whether you’re working with CSV files, handling missing data, or performing group operations, Pandas offers efficient tools for each step.
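To make the two data structures and the CSV loading step concrete, here is a minimal sketch. The CSV text is hypothetical sample data held in memory so the example runs without a file on disk; with a real file you would pass its path to read_csv instead.

```python
import io
import pandas as pd

# Hypothetical CSV content (an assumption for illustration, not a real dataset)
csv_text = """Name,Age,City
Sneha,27,Hyd
Alice,28,Guj
Bob,,Luk
"""

# read_csv accepts a file path or any file-like object
people = pd.read_csv(io.StringIO(csv_text))

# Selecting one column from a DataFrame gives a Series
ages = people['Age']

print(type(people).__name__)  # DataFrame
print(type(ages).__name__)    # Series
print(people.dtypes)          # Age is float64 because Bob's empty cell is read as NaN
```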


Creating & Viewing DataFrames

Creating a DataFrame

import pandas as pd
import numpy as np

data = {'Name': ['Sneha', 'Alice', 'Bob', 'Peter'],
        'Age': [27, 28, np.nan, 20],
        'City': ['Hyd', 'Guj', 'Luk', 'Blg']}

df = pd.DataFrame(data)
print(df)

Viewing Data

print(df.head(2))  # First 2 rows
print(df.tail(2))  # Last 2 rows
df.info()          # Summary of DataFrame (prints directly and returns None)
print(df.shape)    # (rows, columns) tuple
print(df.columns)  # Column labels
print(df.index)    # Row labels
print(df.describe())  # Statistical summary of numeric columns

Indexing & Selection

print(df['Name'])         # Select single column
print(df[['Name', 'City']])  # Select multiple columns
print(df.loc[1, 'Name'])  # Label-based indexing
print(df.iloc[0:2, 0:3])  # Position-based indexing
print(df.at[1, 'Name'])   # Fast access by label
print(df.iat[1, 1])       # Fast access by index position
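Rows can also be selected by condition rather than by label or position. The sketch below rebuilds the sample DataFrame from this article so it runs on its own; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Sneha', 'Alice', 'Bob', 'Peter'],
                   'Age': [27, 28, np.nan, 20],
                   'City': ['Hyd', 'Guj', 'Luk', 'Blg']})

# A comparison on a column yields a boolean Series; NaN compares as False
over_25 = df[df['Age'] > 25]

# Combine conditions with & (and) or | (or); each condition needs parentheses
young_or_hyd = df[(df['Age'] < 25) | (df['City'] == 'Hyd')]

# loc accepts a row mask and a list of columns together
names_over_25 = df.loc[df['Age'] > 25, ['Name', 'Age']]

print(over_25)
print(young_or_hyd)
print(names_over_25)
```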

Data Manipulation

Sorting & Dropping

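# Both calls return a new DataFrame and leave df unchanged
sorted_df = df.sort_values(by='Age', ascending=False)  # Oldest first; NaN ages go last
no_city_df = df.drop(columns=['City'])                  # Copy without the City column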

Renaming & Indexing

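renamed_df = df.rename(columns={'Age': 'Years'})  # Copy with Age renamed to Years
indexed_df = df.set_index('Name')                 # Copy with Name as the row index
restored_df = indexed_df.reset_index()            # Name moved back to a regular column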

Handling Missing Values

df['Age'] = df['Age'].fillna(df['Age'].mean())  # Fill missing ages with the mean age
df.dropna(inplace=True)                         # Drop rows that still contain missing values
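Before filling or dropping, it usually helps to see how many values are missing in each column. The following is a small self-contained sketch using the same sample values as above; it keeps the filled and dropped results in separate variables so the two strategies can be compared.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Sneha', 'Alice', 'Bob', 'Peter'],
                   'Age': [27, 28, np.nan, 20]})

# Count missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# mean() skips NaN by default: (27 + 28 + 20) / 3
mean_age = df['Age'].mean()

# Strategy 1: fill only the Age column with its mean
filled = df.assign(Age=df['Age'].fillna(mean_age))

# Strategy 2: drop rows where Age is missing
dropped = df.dropna(subset=['Age'])

print(filled)
print(dropped)
```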

Changing Data Types

df['Age'] = df['Age'].astype(int)  # Raises an error if Age still contains NaN

Aggregation & Grouping

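print(df.groupby('City')['Age'].mean())               # Mean age per city
print(df.agg({'Age': 'sum', 'City': 'count'}))        # One aggregate per column
print(df[['Age']].transform(lambda x: x - x.mean()))  # Deviation of each age from the mean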

Data Joining & Merging

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'City': ['NY', 'LA']})
merged_df = pd.merge(df1, df2, on='ID')    # Join rows that share the same ID
print(merged_df)
concat_df = pd.concat([df1, df2], axis=1)  # Place side by side; the ID column appears twice
print(concat_df)
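The merge above works cleanly because both frames contain the same IDs. When the keys only partly overlap, the how parameter decides which rows survive. The sketch below uses hypothetical frames (left, right) to show the difference.

```python
import pandas as pd

# Hypothetical frames whose IDs only partly overlap
left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Peter']})
right = pd.DataFrame({'ID': [2, 3, 4], 'City': ['LA', 'NY', 'SF']})

# Default how='inner': keep only IDs present in both frames
inner = pd.merge(left, right, on='ID')

# how='left': keep every row of left; City is NaN where no match exists
left_join = pd.merge(left, right, on='ID', how='left')

# how='outer': keep all IDs; indicator=True adds a _merge column showing the source
outer = pd.merge(left, right, on='ID', how='outer', indicator=True)

print(inner)
print(left_join)
print(outer)
```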

String Operations

df['Name'] = df['Name'].str.upper()
print(df['Name'].str.contains('ALICE'))
df['Name'] = df['Name'].str.replace('SNEHA', 'SARA')

Handling Time Series Data

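# The sample df has no Date column, so add illustrative date strings first (an assumption)
df['Date'] = pd.date_range('2024-01-01', periods=len(df), freq='15D').strftime('%Y-%m-%d')
df['Date'] = pd.to_datetime(df['Date'])          # Parse the strings into datetimes
df.set_index('Date', inplace=True)
print(df.resample('ME').sum(numeric_only=True))  # Monthly totals; use 'M' on pandas before 2.2
print(df.shift(1))                               # Move every value down one row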

Numerical Operations

num = df.select_dtypes(include='number')  # Keep numeric columns only
print(num.sum())
print(num.mean())
print(num.median())
print(num.min())
print(num.max())
print(num.std())
print(num.cumsum())
print(num.cumprod())

Pivoting & Melting

pivot_df = df.pivot(index='City', columns='Name', values='Age')
print(pivot_df)
melted_df = df.melt(id_vars=['City'], value_vars=['Age'])
print(melted_df)

Visualization

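import matplotlib.pyplot as plt  # Pandas plotting requires matplotlib
df.plot(kind='line')  # Line plot of every numeric column
df.hist()             # Histogram of every numeric column
plt.show()            # Display the figures when running as a script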

Practice Questions

Beginner:

  1. Create a DataFrame from a dictionary.
  2. Display the first 5 rows of a DataFrame.
  3. Retrieve a single column as a Series.
  4. Find the number of missing values in each column.
  5. Sort the DataFrame by a specific column.
  6. Select rows where a column’s value is greater than 50.
  7. Rename a column.
  8. Drop a column from the DataFrame.
  9. Fill missing values with the mean.
  10. Add a new column to the DataFrame.
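As a starting point for the beginner set, here is a minimal sketch covering questions 5, 6, and 10. The scores DataFrame and its values are hypothetical, chosen only so the threshold of 50 is meaningful.

```python
import pandas as pd

# Hypothetical practice data
scores = pd.DataFrame({'Name': ['Sneha', 'Alice', 'Bob'],
                       'Score': [45, 82, 67]})

# Question 10: add a new column by assigning to a new label
scores['Passed'] = scores['Score'] > 50

# Question 5: sort by a column (returns a new DataFrame)
ranked = scores.sort_values(by='Score', ascending=False)

# Question 6: select rows where a column's value is greater than 50
high = scores[scores['Score'] > 50]

print(ranked)
print(high)
```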

Intermediate:

  1. Select a subset of rows and columns using loc and iloc.
  2. Convert a column to a different data type.
  3. Find the average of all numerical columns.
  4. Count the number of unique values in a column.
  5. Use groupby() to find the mean of a column by category.
  6. Merge two DataFrames using merge().
  7. Find the most frequent value in a column.
  8. Compute cumulative sum of a column.
  9. Apply a custom function to transform a column.
  10. Use pivot_table() to summarize data.
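Several of the intermediate questions use methods not shown earlier in this article. The sketch below touches questions 4, 5, 7, 8, and 10 on a hypothetical sales table; treat it as one possible approach rather than the only answer.

```python
import pandas as pd

# Hypothetical practice data
sales = pd.DataFrame({
    'City': ['Hyd', 'Hyd', 'Guj', 'Guj', 'Hyd'],
    'Product': ['A', 'B', 'A', 'A', 'A'],
    'Units': [10, 5, 7, 3, 2],
})

unique_cities = sales['City'].nunique()             # Question 4: count of distinct values
by_city = sales.groupby('City')['Units'].mean()     # Question 5: mean per category
most_common = sales['Product'].mode()[0]            # Question 7: most frequent value
running_total = sales['Units'].cumsum()             # Question 8: cumulative sum

# Question 10: total units per city and product; missing combinations become 0
summary = sales.pivot_table(index='City', columns='Product',
                            values='Units', aggfunc='sum', fill_value=0)

print(unique_cities, most_common)
print(by_city)
print(running_total)
print(summary)
```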

Advanced:

  1. Reshape a DataFrame using melt().
  2. Create a time series and resample it to monthly data.
  3. Use applymap() (renamed to DataFrame.map() in pandas 2.1) to apply a function to every element in the DataFrame.
  4. Use map() to apply a function to a Series.
  5. Create a scatter plot using Pandas.
  6. Compute rolling averages using rolling().
  7. Rank values in a column.
  8. Filter rows based on multiple conditions.
  9. Use .xs() to slice data from a multi-index DataFrame.
  10. Implement a lambda function inside apply() for custom transformations.
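For the advanced set, questions 6, 7, and 9 rely on rolling(), rank(), and .xs(), none of which appear earlier in this article. Here is a minimal sketch on hypothetical data; the frame names (daily, multi) and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical daily series for rolling() and rank()
dates = pd.date_range('2024-01-01', periods=6, freq='D')
daily = pd.DataFrame({'Units': [4, 8, 6, 10, 2, 6]}, index=dates)

# Question 6: 3-day rolling average; the first two rows are NaN (window not yet full)
daily['Rolling3'] = daily['Units'].rolling(window=3).mean()

# Question 7: rank with the largest value first; ties share the lowest rank
daily['Rank'] = daily['Units'].rank(ascending=False, method='min')

# Hypothetical multi-index frame for .xs()
multi = pd.DataFrame(
    {'Units': [10, 5, 7, 3]},
    index=pd.MultiIndex.from_tuples(
        [('Hyd', 'A'), ('Hyd', 'B'), ('Guj', 'A'), ('Guj', 'B')],
        names=['City', 'Product']),
)

# Question 9: take a cross-section at one level of the index
hyd_rows = multi.xs('Hyd', level='City')      # Remaining index: Product
product_a = multi.xs('A', level='Product')    # Remaining index: City

print(daily)
print(hyd_rows)
print(product_a)
```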

We hope this article was helpful and that you learned a lot about data science from it. If you have friends or family members who would find it useful, please share it with them or on social media.

