Power of Pandas

Introduction

This lecture covers Pandas, one of the most powerful Python libraries for data analysis. We'll explore how Pandas handles, manipulates, and analyzes structured data, from loading and cleaning a dataset to aggregating, visualizing, and optimizing it.

Understanding Pandas Basics

  • Pandas, built on top of NumPy, provides data structures and functions to work with structured data.

  • Key components: Series (1-dimensional labeled array) and DataFrame (2-dimensional labeled data structure).

  • Importing Pandas and loading data: import pandas as pd and pd.read_csv().
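A minimal sketch of the two core structures, assuming a small made-up dataset (the flavour names and numbers are hypothetical). io.StringIO stands in for a real file path so the snippet runs on its own; pd.read_csv() accepts a path, URL, or file-like object.

```python
import io

import pandas as pd

# A Series: a 1-dimensional array whose values are addressed by index labels
prices = pd.Series([3.5, 4.0, 2.75], index=["vanilla", "chocolate", "mint"])

# A DataFrame: a 2-dimensional table whose columns can hold different types
csv_text = """name,rating,rating_count
Vanilla,4.2,120
Chocolate,4.7,98
Mint,3.9,45
"""
# StringIO makes the text behave like an open CSV file
df = pd.read_csv(io.StringIO(csv_text))

print(prices["chocolate"])  # label-based lookup on the Series
print(df.dtypes)            # one inferred dtype per column
```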

Exploratory Data Analysis (EDA) with Pandas

  • Checking data dimensions: df.shape.

  • Getting summary statistics for numeric columns: df.describe().

  • Examining data types and non-null counts (to spot missing values): df.info().
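The three checks above can be run on a tiny hypothetical DataFrame with one missing rating. df.isna().sum() is an extra, commonly used way to count missing values per column; it is not part of the list above.

```python
import pandas as pd

# Hypothetical toy data; the None becomes NaN in a float column
df = pd.DataFrame(
    {
        "name": ["Vanilla", "Chocolate", "Mint", "Cookie Dough"],
        "rating": [4.2, 4.7, None, 4.5],
        "rating_count": [120, 98, 45, 210],
    }
)

print(df.shape)         # (rows, columns)
summary = df.describe() # count, mean, std, min, quartiles, max of numeric columns
print(summary)
df.info()               # prints dtypes and non-null counts; returns None
print(df.isna().sum())  # number of missing values in each column
```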

Data Cleaning and Transformation

  • Renaming columns for clarity: df.rename(columns={'old_name': 'new_name'}, inplace=True).

  • Handling missing data: df.dropna(), df.fillna().

  • Data type conversion: df.astype().
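A short sketch of the cleaning steps above on hypothetical data. It assigns the result of rename() back to df instead of using inplace=True; both work, but reassignment makes each step's output explicit. Filling with the column mean is just one possible strategy.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Nm": ["Vanilla", "Chocolate", "Mint"],
        "rating": [4.2, None, 3.9],
        "rating_count": ["120", "98", "45"],  # numbers stored as text
    }
)

# Give the unclear column a descriptive name
df = df.rename(columns={"Nm": "name"})

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()

# Option 2: fill missing ratings with the mean of the known ratings
filled = df.fillna({"rating": df["rating"].mean()})

# Convert the text column to 64-bit integers
filled = filled.astype({"rating_count": "int64"})
```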

Data Manipulation and Aggregation

  • Selecting columns and rows: df['column_name'], df.loc[] (by label), df.iloc[] (by integer position).

  • Filtering data: df.query().

  • Grouping and aggregating data: df.groupby().agg().
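The selection, filtering, and aggregation methods above are shown together below on a hypothetical table. The groupby uses named aggregation (output_name=(column, function)), one of several forms that .agg() accepts.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Vanilla", "Chocolate", "Mint", "Mango", "Coffee"],
        "category": ["classic", "classic", "fresh", "fresh", "classic"],
        "rating": [4.2, 4.7, 3.9, 4.4, 4.0],
        "rating_count": [120, 98, 45, 60, 75],
    }
)

ratings = df["rating"]                       # one column as a Series
first_two = df.loc[0:1, ["name", "rating"]]  # by label: the end label 1 is included
first_two_pos = df.iloc[0:2, [0, 2]]         # by position: the end position 2 is excluded

# Keep well-rated products with a reasonable number of reviews
popular = df.query("rating >= 4.2 and rating_count > 50")

# One row per category with a mean rating and a review total
summary = df.groupby("category").agg(
    mean_rating=("rating", "mean"),
    total_reviews=("rating_count", "sum"),
)
print(summary)
```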

Data Visualization with Pandas

  • Matplotlib integration: Pandas' plotting methods use Matplotlib as the default backend, and libraries such as Seaborn accept DataFrames directly.

  • Basic plots: df.plot().

  • Bar plots, histograms, box plots: df.plot(kind='bar'), df.plot(kind='hist'), df.plot(kind='box').
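The three kind= values above can be drawn side by side with hypothetical ratings. This sketch selects Matplotlib's non-interactive "Agg" backend and saves to an in-memory PNG so it runs without a display; in a notebook you would normally just let the figure render.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render to memory; no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"rating": [4.2, 4.7, 3.9, 4.4]},
    index=["Vanilla", "Chocolate", "Mint", "Mango"],
)

# One Matplotlib Axes per plot kind; df.plot draws into the Axes passed as ax
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
df.plot(kind="bar", ax=axes[0], legend=False, title="Bar")
df.plot(kind="hist", ax=axes[1], bins=5, legend=False, title="Histogram")
df.plot(kind="box", ax=axes[2], title="Box")

# Save the figure as PNG bytes, then release it
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```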

Advanced Data Analysis Techniques

  • Time series analysis: Converting strings to datetimes with pd.to_datetime(), indexing by date, and resampling with df.resample().

  • Merging and joining datasets: pd.merge(), pd.concat().

  • Handling duplicates: df.drop_duplicates().
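A minimal sketch of the techniques above using hypothetical order and product tables. Duplicates are removed first so they don't inflate the daily totals; the left merge keeps every order even if a product were missing from the lookup table.

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02", "2024-01-02"],
        "product_id": [1, 2, 1, 1, 2],
        "units": [3, 1, 2, 2, 5],
    }
)
products = pd.DataFrame({"product_id": [1, 2], "name": ["Vanilla", "Chocolate"]})

# Time series: parse the strings into datetime64 values
orders["date"] = pd.to_datetime(orders["date"])

# Duplicates: rows 2 and 3 are identical, so only the first is kept
orders = orders.drop_duplicates()

# Resample by calendar day ("D") and total the units
daily = orders.set_index("date")["units"].resample("D").sum()

# Merge: SQL-style join on the shared product_id column
merged = pd.merge(orders, products, on="product_id", how="left")

# Concat: stack DataFrames with the same columns vertically
more = pd.DataFrame({"product_id": [3], "name": ["Mint"]})
all_products = pd.concat([products, more], ignore_index=True)
```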

Real-world Applications and Case Studies

  • Analyzing healthcare data: Exploring patient wait times, service distribution, and geographical trends.

  • Financial data analysis: Stock market analysis, portfolio management.

  • Social media data analysis: Sentiment analysis, trend detection.

Best Practices and Performance Optimization

  • Efficient data loading and storage: Reading large files in chunks (chunksize=), loading only the columns you need (usecols=), and choosing compact data types (dtype=).

  • Vectorized operations: Operating on whole columns at once instead of looping over rows or calling apply(), which runs Python code for every row.

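  • Memory management: Measuring usage with df.memory_usage(deep=True) and reducing it for large datasets, for example by downcasting numeric columns or storing repetitive strings as the category dtype.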
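The practices above are sketched below on a generated, hypothetical CSV with a highly repetitive text column. The exact memory savings depend on the data; here the category dtype and int8 are chosen because there are only three flavours and ratings fit in a small integer.

```python
import io

import pandas as pd

# Hypothetical CSV: 3000 rows, only three distinct flavour strings
csv_text = "flavour,rating\n" + "vanilla,4\nchocolate,5\nmint,3\n" * 1000

# Chunked loading: process 500 rows at a time instead of the whole file
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=500):
    total += chunk["rating"].sum()

# Compact dtypes at load time: category for repeated strings, int8 for small ints
df = pd.read_csv(io.StringIO(csv_text), dtype={"flavour": "category", "rating": "int8"})

# Compare against default dtypes; deep=True includes the string contents
default_df = pd.read_csv(io.StringIO(csv_text))
before = default_df.memory_usage(deep=True).sum()
after = df.memory_usage(deep=True).sum()

# Vectorized: one operation over the whole column
scaled = df["rating"].astype("float64") / 5.0
# Same result via apply, but it calls the Python lambda once per row
slow = df["rating"].apply(lambda r: r / 5.0)
```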

Example

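  • Here is something you can build with Pandas in the browser using PyScript. This example uses Ben & Jerry's ice cream product data (bj-products.csv). The data is read from a CSV file with pd.read_csv() and visualized with Matplotlib. The user selects a flavour, and the chart shows the ratings of all products whose ingredients contain that flavour. The page loads the required packages through its PyScript configuration (packages = ["matplotlib", "pandas"]) and must provide radio buttons named flavour (including one with the value ALL) and an element with the id graph-area for the chart.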

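import js
import pandas as pd
import matplotlib.pyplot as plt

from pyodide.http import open_url
from pyodide.ffi import create_proxy

# Product data with name, rating and ingredients columns
url = (
    "https://raw.githubusercontent.com/Cheukting/pyscript-ice-cream/main/bj-products.csv"
)
ice_data = pd.read_csv(open_url(url))

# Radio buttons named "flavour" are defined in the page's HTML
flavour_elements = js.document.getElementsByName("flavour")
current_selected = "ALL"

def plot(data):
    """Draw a horizontal bar chart of ratings into the element with id "graph-area"."""
    fig, ax = plt.subplots(figsize=(22, 20))
    bars = ax.barh(data["name"], data["rating"], height=0.7)
    ax.bar_label(bars)
    ax.set_title("Rating of ice cream flavours of your choice")
    # display() is provided by PyScript; append=False replaces the previous chart
    display(fig, target="graph-area", append=False)
    plt.close(fig)  # free the figure so repeated selections don't accumulate memory

def select_flavour(event):
    """Re-plot whenever the selected radio button changes."""
    global current_selected  # update the module-level value, not a new local
    for ele in flavour_elements:
        if ele.checked:
            current_selected = ele.value
            break
    if current_selected == "ALL":
        plot(ice_data)
    else:
        # Vectorized substring match on ingredients; missing values count as no match
        mask = ice_data["ingredients"].str.contains(current_selected, regex=False, na=False)
        plot(ice_data[mask])

# Wrap the Python function so JavaScript can call it as an event listener
ele_proxy = create_proxy(select_flavour)

for ele in flavour_elements:
    if ele.value == "ALL":
        ele.checked = True
        current_selected = ele.value
    ele.addEventListener("change", ele_proxy)

plot(ice_data)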
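Because the picker only runs inside PyScript, the filtering step can be tried separately in plain Python. This sketch assumes only that the data has name, rating, and ingredients columns, as the plotting code implies; the rows are invented and filter_by_flavour is a hypothetical helper. It uses Series.str.contains with regex=False, which performs the same plain substring test as Python's `in` operator, but over the whole column at once.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: same column names, made-up rows
ice_data = pd.DataFrame(
    {
        "name": ["Choc Fudge", "Vanilla Bean", "Mint Chip"],
        "rating": [4.6, 4.1, 3.8],
        "ingredients": [
            "CREAM, SUGAR, COCOA, FUDGE",
            "CREAM, SUGAR, VANILLA EXTRACT",
            "CREAM, SUGAR, PEPPERMINT OIL, COCOA",
        ],
    }
)

def filter_by_flavour(data: pd.DataFrame, flavour: str) -> pd.DataFrame:
    """Return rows whose ingredients mention `flavour`; "ALL" returns every row."""
    if flavour == "ALL":
        return data
    # Plain substring match (no regex); NaN ingredients are treated as no match
    mask = data["ingredients"].str.contains(flavour, regex=False, na=False)
    return data[mask]

print(filter_by_flavour(ice_data, "COCOA")[["name", "rating"]])
```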
