2. Introduction to Pandas
2.1. Introduction
This lecture introduces Pandas, a Python library for data manipulation and analysis. It covers creating, inspecting, cleaning, aggregating, and plotting structured (tabular) data.
2.2. Understanding Pandas Basics
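Pandas provides two core data structures: Series, a one-dimensional labeled array, and DataFrame, a two-dimensional labeled table whose columns are Series. It is built on top of NumPy, so column data can be processed with fast array operations. The example below builds a DataFrame from a dictionary of lists: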
import pandas as pd
# Dummy data
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda', 'Jack'],
'Age': [28, 23, 25, 24, 30],
'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo']
}
# Creating a DataFrame
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)
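The section names Series alongside DataFrame, but the example only builds a DataFrame. As a minimal sketch (the variable names `ages`, `people`, `age_column`, and `age_array` are illustrative and not part of the lecture), the following shows that a DataFrame column is itself a Series and that its values can be exposed as a NumPy array:

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array
ages = pd.Series([28, 23, 25], index=['John', 'Anna', 'Peter'], name='Age')
print(ages['Anna'])  # label-based lookup

# Each column of a DataFrame is itself a Series
people = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 23, 25]})
age_column = people['Age']
print(type(age_column))

# The underlying values can be exposed as a NumPy array
age_array = age_column.to_numpy()
print(type(age_array), age_array.mean())
```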
2.3. Exploratory Data Analysis (EDA) with Pandas
Check data dimensions and examine its structure:
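# Shape of the DataFrame as (rows, columns)
print(df.shape)
# Information about the DataFrame: column names, non-null counts, and dtypes.
# info() prints its report directly and returns None, so it is not wrapped in print().
df.info()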
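A few other inspection calls are commonly used at this stage. The sketch below rebuilds the same dummy data so that it runs on its own; the variable name `summary` is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Jack'],
    'Age': [28, 23, 25, 24, 30],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo'],
})

# First three rows of the table
print(df.head(3))

# Data type of each column
print(df.dtypes)

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
summary = df.describe()
print(summary)
```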
2.4. Data Cleaning and Transformation
Rename columns:
# Rename columns
df.rename(columns={'Name': 'Full Name', 'City': 'Location'}, inplace=True)
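With `inplace=True`, `rename` modifies `df` itself, so every later reference must use the new labels. As a small sketch of the alternative, assuming a shortened copy of the dummy data and an illustrative variable name `renamed`, calling `rename` without `inplace=True` returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna'],
    'Age': [28, 23],
    'City': ['New York', 'Paris'],
})

# Without inplace=True, rename returns a new DataFrame and leaves df unchanged
renamed = df.rename(columns={'Name': 'Full Name', 'City': 'Location'})

print(list(df.columns))       # original labels
print(list(renamed.columns))  # new labels
```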
2.5. Data Manipulation and Aggregation
Select, filter, group, and aggregate data:
# Selecting columns ('Name' was renamed to 'Full Name' in the previous section)
print(df[['Full Name', 'Age']])
# Filtering data
filtered_data = df[df['Age'] > 25]
print(filtered_data)
# Grouping and aggregating data
age_group_stats = df.groupby('Age').size()
print(age_group_stats)
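Every age in the dummy data is unique, so grouping by `Age` yields five groups of size one. As a hedged sketch of a more informative aggregation, the block below groups by a derived column. The `Age Group` column, its labels, and the variable name `stats` are hypothetical additions for illustration, and the data is rebuilt with the original column names so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Jack'],
    'Age': [28, 23, 25, 24, 30],
})

# Hypothetical helper column that buckets people by age
df['Age Group'] = df['Age'].apply(lambda age: 'over 25' if age > 25 else '25 or under')

# Count and mean age within each group
stats = df.groupby('Age Group')['Age'].agg(['count', 'mean'])
print(stats)
```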
2.6. Data Visualization with Pandas and Matplotlib
The Pandas plotting methods are built on Matplotlib. The example below draws a histogram of the Age column:
import matplotlib.pyplot as plt
# Plotting example
df['Age'].plot(kind='hist', bins=5)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
display(plt, "plot_area") # Replace with plt.show() if running locally
Note
We are using PyScript to run Pandas and Matplotlib in the browser. If you are running the code locally, use plt.show() instead of display(plt, "plot_area") to show the plots.
2.7. Interactive Example
Here’s an interactive example where you can filter the DataFrame based on age and visualize the results:
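The interactive widget itself is not reproduced in this text. As a rough, non-interactive sketch of the same idea, the block below filters by an age threshold and plots the result. The function name `filter_by_min_age`, the threshold value, and the choice of a bar chart are assumptions made for illustration, not the lecture's own implementation:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Jack'],
    'Age': [28, 23, 25, 24, 30],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo'],
})

def filter_by_min_age(frame, min_age):
    """Return the rows of frame whose Age is strictly greater than min_age."""
    return frame[frame['Age'] > min_age]

min_age = 24  # hypothetical threshold; try other values
result = filter_by_min_age(df, min_age)
print(result)

# Bar chart of the ages that passed the filter
ax = result.plot(kind='bar', x='Name', y='Age', legend=False)
ax.set_title(f'Ages above {min_age}')
ax.set_ylabel('Age')
# Call plt.show() here when running locally
```

Changing `min_age` and re-running the block plays the role of the interactive control.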
Note
Ensure you run all the code blocks provided to see the complete results and understand the functionalities demonstrated.
2.8. Exercise
Write code to calculate the average age of the individuals in the DataFrame.