
Welcome to Day 12 of Learning Python for Data Science. Today, we’ll dive into Pandas, one of the most essential libraries for data manipulation and analysis in Python. Pandas provides powerful, easy-to-use data structures, Series and DataFrame, that simplify handling structured data. In this article, we’ll explore how to load, inspect, clean, and transform data using Pandas, all key skills for any data science project. Whether you’re working with CSV files, handling missing data, or performing group operations, Pandas offers efficient tools to streamline the entire process.
Creating & Viewing DataFrames
Creating a DataFrame
import pandas as pd
import numpy as np
data = {'Name': ['Sneha', 'Alice', 'Bob', 'Peter'],
'Age': [27, 28, np.nan, 20],
'City': ['Hyd', 'Guj', 'Luk', 'Blg']}
df = pd.DataFrame(data)
print(df)
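The introduction mentions CSV files, so here is a minimal sketch of loading one with `pd.read_csv`. The CSV text in `csv_text` is made up for illustration and stands in for a real file path; the names `csv_text` and `csv_df` are assumptions, not part of the original example.

```python
import io

import pandas as pd

# Hypothetical CSV content, used here instead of a file on disk.
csv_text = """Name,Age,City
Sneha,27,Hyd
Alice,28,Guj
Bob,,Luk
Peter,20,Blg
"""

# read_csv accepts a path or any file-like object; an empty field becomes NaN.
csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df)
print(csv_df.dtypes)  # Age is float64 because of the missing value
```

With a real file, the call would typically be `pd.read_csv('your_file.csv')`, where the file name is whatever your data is stored as.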
Viewing Data
print(df.head(2)) # First 2 rows
print(df.tail(2)) # Last 2 rows
df.info() # Summary of DataFrame (info() prints directly and returns None, so no print() is needed)
print(df.shape) # Number of rows & columns
print(df.columns) # List of column names
print(df.index) # Row labels
print(df.describe()) # Statistical summary
Indexing & Selection
print(df['Name']) # Select single column
print(df[['Name', 'City']]) # Select multiple columns
print(df.loc[1, 'Name']) # Label-based indexing
print(df.iloc[0:2, 0:3]) # Position-based indexing
print(df.at[1, 'Name']) # Fast access by label
print(df.iat[1, 1]) # Fast access by index position
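The difference between label-based and position-based selection is easier to see when the row labels are not numbers. The sketch below reuses the data from above but assumes a hypothetical string index (`'a'` to `'d'`) purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Sneha', 'Alice', 'Bob', 'Peter'],
                   'Age': [27, 28, np.nan, 20]},
                  index=['a', 'b', 'c', 'd'])  # hypothetical string labels

# loc slices by label and includes the end label.
by_label = df.loc['a':'c', 'Name']      # rows a, b and c

# iloc slices by position and excludes the end position, like Python lists.
by_position = df.iloc[0:2, 0]           # rows at positions 0 and 1

# at / iat return a single scalar value.
one_by_label = df.at['b', 'Name']       # label 'b', column 'Name'
one_by_position = df.iat[1, 1]          # second row, second column

print(by_label)
print(by_position)
print(one_by_label, one_by_position)
```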
Data Manipulation
Sorting & Dropping
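df.sort_values(by='Age', ascending=False, inplace=True) # Sort rows by Age, highest first (NaN goes last)
df_without_city = df.drop(columns=['City']) # Returns a copy without City; df keeps the column for later sections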
Renaming & Indexing
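renamed_df = df.rename(columns={'Age': 'Years'}) # Returns a renamed copy; df keeps 'Age' for later sections
df.set_index('Name', inplace=True) # Use Name as the row labels
df.reset_index(inplace=True) # Move Name back to a regular column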
Handling Missing Values
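df['Age'] = df['Age'].fillna(df['Age'].mean()) # Fill missing ages with the mean of the Age column only
df.dropna(inplace=True) # Alternative: drop any rows that still contain missing values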
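Before filling or dropping, it usually helps to check how many values are actually missing. A minimal sketch on the same data, showing the count per column and the two strategies side by side; the names `missing_counts`, `filled`, and `dropped` are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Sneha', 'Alice', 'Bob', 'Peter'],
                   'Age': [27, 28, np.nan, 20],
                   'City': ['Hyd', 'Guj', 'Luk', 'Blg']})

# Count missing values per column before deciding how to handle them.
missing_counts = df.isna().sum()
print(missing_counts)

# Option 1: fill only the numeric column with its own mean (mean() skips NaN).
filled = df.assign(Age=df['Age'].fillna(df['Age'].mean()))

# Option 2: drop every row that contains a missing value.
dropped = df.dropna()

print(filled)
print(dropped)
```

Filling keeps all four rows, while dropping loses the row with the missing age; which one is appropriate depends on the analysis.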
Changing Data Types
df['Age'] = df['Age'].astype(int) # Requires no missing values; fractional parts are truncated
Aggregation & Grouping
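print(df.groupby('City')['Age'].mean()) # Mean Age per city (select the numeric column; text columns cannot be averaged)
print(df.agg({'Age': 'sum', 'City': 'count'})) # A different aggregation for each column
print(df[['Age']].transform(lambda x: x - x.mean())) # Subtract the column mean from each numeric value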
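Grouping is easier to follow when a category repeats. The sketch below assumes a small hypothetical table (`people`) with two cities appearing twice each, and shows a per-group mean, several aggregations at once, and a group-wise `transform`.

```python
import pandas as pd

# Hypothetical data with repeated cities so that grouping is meaningful.
people = pd.DataFrame({'City': ['Hyd', 'Hyd', 'Guj', 'Guj'],
                       'Age': [20, 30, 40, 50]})

# One row per city: mean of Age within each group.
city_mean = people.groupby('City')['Age'].mean()
print(city_mean)

# Several aggregations at once, with named output columns.
summary = people.groupby('City').agg(total_age=('Age', 'sum'),
                                     head_count=('Age', 'count'))
print(summary)

# transform returns a result aligned with the original rows:
# here, each Age minus the mean Age of its own city.
people['Age_centered'] = people.groupby('City')['Age'].transform(
    lambda x: x - x.mean())
print(people)
```

Unlike `agg`, which collapses each group to one row, `transform` keeps the original number of rows.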
Data Joining & Merging
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'City': ['NY', 'LA']})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)
concat_df = pd.concat([df1, df2], axis=1) # Places the frames side by side by row index; the ID column appears twice
print(concat_df)
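The merge above uses the default inner join, where both frames share the same IDs. As a sketch of what happens when IDs only partly overlap, the example below assumes two hypothetical frames (`left` and `right`) and compares the `how` options of `pd.merge`.

```python
import pandas as pd

# Hypothetical frames whose IDs only partly overlap.
left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Peter']})
right = pd.DataFrame({'ID': [2, 3, 4], 'City': ['LA', 'NY', 'SF']})

# Default how='inner' keeps only IDs present in both frames.
inner = pd.merge(left, right, on='ID')

# how='left' keeps every row of `left`; unmatched City values become NaN.
left_join = pd.merge(left, right, on='ID', how='left')

# how='outer' keeps all IDs; indicator=True adds a `_merge` column
# showing which frame each row came from.
outer = pd.merge(left, right, on='ID', how='outer', indicator=True)

print(inner)
print(left_join)
print(outer)
```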
String Operations
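df['Name'] = df['Name'].str.upper() # Convert every name to upper case
print(df['Name'].str.contains('ALICE')) # Boolean Series: True where the name contains 'ALICE'
df['Name'] = df['Name'].str.replace('SNEHA', 'SARA') # Replace matching text within each name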
Handling Time Series Data
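# Assumes df has a 'Date' column holding parseable dates (the example df above does not define one)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
print(df.resample('M').sum(numeric_only=True)) # Monthly totals; pandas 2.2+ prefers the alias 'ME'
print(df.shift(1)) # Shift every value down by one row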
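Since the example DataFrame has no date column, here is a self-contained sketch of the same steps on hypothetical data: 60 daily rows starting on 1 January 2024, each with a `Sales` value of 1. It uses the `'MS'` (month start) alias instead of `'M'`, because the spelling of the month-end alias changed in pandas 2.2; that substitution is a choice made for this example.

```python
import pandas as pd

# Hypothetical daily data: 60 consecutive days, dates stored as strings.
dates = pd.date_range('2024-01-01', periods=60, freq='D').strftime('%Y-%m-%d')
ts = pd.DataFrame({'Date': dates, 'Sales': 1})

# Parse the strings into datetimes and use them as the index.
ts['Date'] = pd.to_datetime(ts['Date'])
ts = ts.set_index('Date')

# 'MS' buckets rows by calendar month, labelled with the month's first day.
monthly = ts.resample('MS').sum()
print(monthly)

# shift(1) moves values down one row, so each row sees the previous day's value.
ts['Prev'] = ts['Sales'].shift(1)
print(ts.head(3))
```

The first row of `Prev` is NaN because there is no earlier day to take a value from.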
Numerical Operations
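num_df = df.select_dtypes(include='number') # Keep numeric columns only; mean() and similar fail on text columns
print(num_df.sum())
print(num_df.mean())
print(num_df.median())
print(num_df.min())
print(num_df.max())
print(num_df.std())
print(num_df.cumsum()) # Running total down each column
print(num_df.cumprod()) # Running product down each column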
Pivoting & Melting
pivot_df = df.pivot(index='City', columns='Name', values='Age')
print(pivot_df)
melted_df = df.melt(id_vars=['City'], value_vars=['Age'])
print(melted_df)
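Pivoting and melting are inverse operations, which is easier to see on data where a category repeats. The sketch below assumes a hypothetical long-format table (`long_df`) with one row per city and year.

```python
import pandas as pd

# Hypothetical long-format data: one row per (City, Year) pair.
long_df = pd.DataFrame({'City': ['Hyd', 'Hyd', 'Guj', 'Guj'],
                        'Year': [2023, 2024, 2023, 2024],
                        'Sales': [10, 12, 7, 9]})

# pivot: long -> wide. Each (index, columns) pair must be unique.
wide_df = long_df.pivot(index='City', columns='Year', values='Sales')
print(wide_df)

# melt: wide -> long again. reset_index turns City back into a column first.
back_to_long = wide_df.reset_index().melt(id_vars=['City'],
                                          var_name='Year',
                                          value_name='Sales')
print(back_to_long)

# pivot_table aggregates duplicate pairs instead of raising, here with a mean.
city_avg = long_df.pivot_table(index='City', values='Sales', aggfunc='mean')
print(city_avg)
```

`pivot` raises an error when an index/column pair occurs more than once, which is the usual reason to reach for `pivot_table` instead.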
Visualization
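import matplotlib.pyplot as plt # Pandas plotting relies on Matplotlib
df.plot(kind='line') # Line plot of the numeric columns
df.hist() # Histogram of each numeric column
plt.show() # Needed to display the figures when running as a script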
Practice Questions
Beginner:
- Create a DataFrame from a dictionary.
- Display the first 5 rows of a DataFrame.
- Retrieve a single column as a Series.
- Find the number of missing values in each column.
- Sort the DataFrame by a specific column.
- Select rows where a column’s value is greater than 50.
- Rename a column.
- Drop a column from the DataFrame.
- Fill missing values with the mean.
- Add a new column to the DataFrame.
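A few of the beginner tasks are not demonstrated earlier in the article, so here is one possible sketch. It assumes a hypothetical `marks` table; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical marks table for illustration.
marks = pd.DataFrame({'Student': ['A', 'B', 'C', 'D'],
                      'Marks': [45, 80, 62, 51]})

# Display the first rows (head() defaults to 5).
print(marks.head())

# Select rows where a column's value is greater than 50.
above_50 = marks[marks['Marks'] > 50]
print(above_50)

# Add a new column computed from an existing one.
marks['Passed'] = marks['Marks'] >= 50
print(marks)
```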
Intermediate:
- Select a subset of rows and columns using loc and iloc.
- Convert a column to a different data type.
- Find the average of all numerical columns.
- Count the number of unique values in a column.
- Use groupby() to find the mean of a column by category.
- Merge two DataFrames using merge().
- Find the most frequent value in a column.
- Compute the cumulative sum of a column.
- Apply a custom function to transform a column.
- Use pivot_table() to summarize data.
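As a hedged starting point for some of the intermediate tasks, the sketch below covers unique counts, the most frequent value, type conversion, and a cumulative sum. The Series `cities` and `amounts` are hypothetical and exist only for this example.

```python
import pandas as pd

cities = pd.Series(['Hyd', 'Guj', 'Hyd', 'Luk', 'Hyd'])  # hypothetical values

unique_count = cities.nunique()        # number of distinct values
counts = cities.value_counts()         # frequency of each value, most common first
most_frequent = cities.mode().iloc[0]  # mode() returns a Series; take its first entry

amounts = pd.Series(['1', '2', '3'])   # numbers stored as strings
as_int = amounts.astype(int)           # convert to an integer dtype
running_total = as_int.cumsum()        # cumulative sum down the Series

print(unique_count, most_frequent)
print(counts)
print(running_total)
```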
Advanced:
- Reshape a DataFrame using melt().
- Create a time series and resample it to monthly data.
- Use applymap() to apply a function to every element in the DataFrame (pandas 2.1 and later rename this method to DataFrame.map()).
- Use map() to apply a function to a Series.
- Create a scatter plot using Pandas.
- Compute rolling averages using rolling().
- Rank values in a column.
- Filter rows based on multiple conditions.
- Use .xs() to slice data from a multi-index DataFrame.
- Implement a lambda function inside apply() for custom transformations.
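Several of the advanced tasks use methods the article does not demonstrate. One possible sketch, assuming a hypothetical `scores` table invented for illustration, covers `rolling()`, `rank()`, filtering on multiple conditions, and `.xs()`.

```python
import pandas as pd

# Hypothetical scores, invented for illustration.
scores = pd.DataFrame({'City': ['Hyd', 'Hyd', 'Guj', 'Guj'],
                       'Year': [2023, 2024, 2023, 2024],
                       'Score': [10, 40, 20, 30]})

# rolling(2).mean(): average of each value and the one before it;
# the first row has no predecessor, so it is NaN.
scores['Rolling'] = scores['Score'].rolling(window=2).mean()

# rank(): 1 = smallest by default; ascending=False makes 1 = largest.
scores['Rank'] = scores['Score'].rank(ascending=False)

# Multiple conditions: combine with & (and) or | (or),
# wrapping each condition in parentheses.
recent_high = scores[(scores['Year'] == 2024) & (scores['Score'] >= 30)]

# xs(): take one cross-section from a MultiIndex.
indexed = scores.set_index(['City', 'Year'])
hyd_rows = indexed.xs('Hyd', level='City')

print(scores)
print(recent_high)
print(hyd_rows)
```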
We hope this article was helpful and that you learned a lot about Pandas from it. If you have friends or family members who would find it useful, please share it with them or on social media.
Hi, I am Vishal Jaiswal. I have about a decade of experience working in MNCs such as Genpact, Savista, and Ingenious. I currently work at EXL as a senior quality analyst. Through my writing, I want to share the experience I have gained and help as many people as I can.