2. Introduction to Pandas

2.1. Introduction

This lecture introduces Pandas, a Python library for data manipulation and analysis. We will create a DataFrame, inspect its structure, rename columns, select, filter, and group rows, and plot a column.

2.2. Understanding Pandas Basics

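Pandas provides two core data structures: the Series, a one-dimensional labeled array, and the DataFrame, a two-dimensional labeled table whose columns are Series. It is built on top of NumPy, so operations on whole columns are vectorized instead of requiring explicit loops.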
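As a minimal sketch of how these structures relate (the names `ages` and `people` are illustrative and not part of the lecture), a Series can be built from a list, a single DataFrame column is itself a Series, and its values can be handed to NumPy:

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array
ages = pd.Series([28, 23, 25], index=['John', 'Anna', 'Peter'], name='Age')
print(ages['Anna'])  # look up a value by its label

# A DataFrame is a table; selecting one column returns a Series
people = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 23, 25]})
age_column = people['Age']
print(type(age_column))

# The column's values can be exported as a NumPy array
age_array = age_column.to_numpy()
print(np.mean(age_array))
```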

Importing Pandas and Loading Dummy Data
import pandas as pd

# Dummy data
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Jack'],
    'Age': [28, 23, 25, 24, 30],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo']
}

# Creating a DataFrame
df = pd.DataFrame(data)

# Displaying the DataFrame
print(df)

2.3. Exploratory Data Analysis (EDA) with Pandas

Check data dimensions and examine its structure:

Checking Data Dimensions and Info
# Shape of the DataFrame
print(df.shape)

# Information about the DataFrame (info() prints its report and returns None,
# so it should not be wrapped in print())
df.info()
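Beyond the shape and the info report, a few other standard calls are commonly used for a first look at a dataset. The sketch below rebuilds the lecture's dummy DataFrame so that it runs on its own; the variable names `first_rows`, `age_summary`, and `missing_counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Jack'],
    'Age': [28, 23, 25, 24, 30],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo']
})

# First three rows
first_rows = df.head(3)
print(first_rows)

# Data type of each column
print(df.dtypes)

# Summary statistics (count, mean, std, min, quartiles, max) for one column
age_summary = df['Age'].describe()
print(age_summary)

# Number of missing values per column
missing_counts = df.isna().sum()
print(missing_counts)
```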

2.4. Data Cleaning and Transformation

Rename columns:

Cleaning and Transforming Data
# Rename columns
df.rename(columns={'Name': 'Full Name', 'City': 'Location'}, inplace=True)
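With `inplace=True`, the rename changes `df` itself, so code that runs afterwards has to use the new column names. As a hedged alternative sketch, `rename` without `inplace` returns a new DataFrame and leaves the original untouched (the name `renamed` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Jack'],
    'Age': [28, 23, 25, 24, 30],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo']
})

# rename() returns a new DataFrame by default; df keeps its old column names
renamed = df.rename(columns={'Name': 'Full Name', 'City': 'Location'})

print(list(df.columns))
print(list(renamed.columns))
```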

2.5. Data Manipulation and Aggregation

Select, filter, group, and aggregate data:

Data Manipulation and Aggregation
# Selecting columns (section 2.4 renamed 'Name' to 'Full Name')
print(df[['Full Name', 'Age']])

# Filtering data
filtered_data = df[df['Age'] > 25]
print(filtered_data)

# Grouping and aggregating data: count the rows for each distinct age
age_group_stats = df.groupby('Age').size()
print(age_group_stats)
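Because every age in the dummy data is unique, grouping by age yields groups of one row each. The sketch below uses a hypothetical DataFrame (`people`, invented here with repeated locations purely for illustration) to show filtering on two conditions and computing several aggregates per group:

```python
import pandas as pd

# Hypothetical data with repeated locations so that groups hold several rows
people = pd.DataFrame({
    'Full Name': ['John', 'Anna', 'Peter', 'Linda', 'Jack', 'Maria'],
    'Age': [28, 23, 25, 24, 30, 27],
    'Location': ['Paris', 'Paris', 'Berlin', 'Berlin', 'Tokyo', 'Paris']
})

# Combine two conditions with & (each condition needs its own parentheses)
young_in_paris = people[(people['Age'] < 28) & (people['Location'] == 'Paris')]
print(young_in_paris)

# Group by location, then compute count, mean, and max of Age per group
stats = people.groupby('Location')['Age'].agg(['count', 'mean', 'max'])
print(stats)
```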

2.6. Data Visualization with Pandas and Matplotlib

Pandas plotting methods draw with Matplotlib by default, so Matplotlib functions can be used to title and label the plot:

Data Visualization
import matplotlib.pyplot as plt

# Plotting example
df['Age'].plot(kind='hist', bins=5)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
display(plt, "plot_area") # Replace with plt.show() if running locally

Note

We are using PyScript to run Pandas and Matplotlib in the browser. If you are running the code locally, use plt.show() instead of display(plt, "plot_area") to show the plots.
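For running outside the browser, one possible variant is sketched below. It assumes no plot window is needed and writes the histogram to an image file instead; the file name `age_distribution.png` and the choice of the non-interactive `Agg` backend are assumptions made for this sketch, not part of the lecture.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render to files only; remove this line
                       # and call plt.show() to open an interactive window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Age': [28, 23, 25, 24, 30]})

# Create the figure and axes explicitly, then let Pandas draw on them
fig, ax = plt.subplots()
df['Age'].plot(kind='hist', bins=5, ax=ax)
ax.set_title('Age Distribution')
ax.set_xlabel('Age')
ax.set_ylabel('Frequency')

# Write the plot to an image file (hypothetical file name)
fig.savefig('age_distribution.png')
```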

2.7. Interactive Example

Here’s an interactive example where you can filter the DataFrame based on age and visualize the results:
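A non-interactive sketch of the filtering step is shown below. The helper `filter_by_min_age` is a hypothetical function written for this illustration, not part of Pandas, and the column names assume the rename from section 2.4 has been applied. Changing the threshold values stands in for the interactive control.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Name': ['John', 'Anna', 'Peter', 'Linda', 'Jack'],
    'Age': [28, 23, 25, 24, 30],
    'Location': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo']
})

def filter_by_min_age(frame, min_age):
    """Return the rows of frame whose 'Age' is at least min_age."""
    return frame[frame['Age'] >= min_age]

# Try different thresholds and compare the results
for threshold in (24, 28):
    result = filter_by_min_age(df, threshold)
    print(f"Age >= {threshold}: {len(result)} rows")
    print(result)
```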

Note

Ensure you run all the code blocks provided to see the complete results and understand the functionalities demonstrated.

2.8. Exercise

Write code to calculate the average age of the individuals in the DataFrame.
