Pandas – The Python Library for Data Analysis & Manipulation

Created by Adugna Asrat in Quick Notes 2 Apr 2025

💡 What Is Pandas?

Pandas is a Python library used to work with structured data like tables (rows and columns), spreadsheets, and CSV files.

✅ Built for fast, flexible, and powerful data analysis
✅ Can handle missing data, filtering, sorting, and transformations
✅ Works seamlessly with NumPy, Matplotlib, Seaborn, and Scikit-learn

🧱 1. Key Data Structures in Pandas

Structure	Description
Series	A one-dimensional labeled array (like a single column of a table)
DataFrame	A table with rows and columns (like Excel)

Example: A DataFrame can represent a table of students:

Name	Age	Score
Maya	21	88
Christian	22	91
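
To make the two structures concrete, here is a minimal sketch that builds the students table above as a DataFrame and pulls one column out as a Series. The variable names `students` and `scores` are only illustrative.

```python
import pandas as pd

# Build the students table from a dict of columns
# (values copied from the example table above).
students = pd.DataFrame(
    {
        "Name": ["Maya", "Christian"],
        "Age": [21, 22],
        "Score": [88, 91],
    }
)

# Selecting a single column returns a Series.
scores = students["Score"]

print(students)
print(type(scores).__name__)
```

Each column of a DataFrame is a Series, and all columns share the same row index.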

📥 2. Loading Data into Pandas

Pandas allows you to read many data formats:

✅ CSV files
✅ Excel files
✅ JSON
✅ SQL
✅ Online datasets (URLs)

Typical use:

import pandas as pd

df = pd.read_csv("students.csv")

Now df is a DataFrame holding the entire dataset.
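
If you do not have a students.csv file at hand, the same call can be tried on in-memory text. The sketch below assumes a small made-up CSV (the `csv_text` contents are hypothetical) and wraps it in `io.StringIO` so that `pd.read_csv` can treat it like a file.

```python
import io

import pandas as pd

# Stand-in for the contents of students.csv (made-up sample data).
csv_text = """Name,Age,Score
Maya,21,88
Christian,22,91
"""

# read_csv accepts a file path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```

The other formats listed above have matching readers, such as `pd.read_excel`, `pd.read_json`, and `pd.read_sql`.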

🔍 3. Exploring the Dataset

Once the data is loaded, Pandas provides tools to explore it:

✅ df.head() – View the first 5 rows (the default)
✅ df.info() – See structure, data types, and non-null counts
✅ df.describe() – Summary statistics for numeric columns
✅ df.columns – List column names
✅ df.shape – Show number of rows and columns
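
The exploration tools above can be tried together on a small table. This is a rough sketch with made-up student rows; the names and numbers are assumptions for illustration only.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame(
    {
        "Name": ["Maya", "Christian", "Sara", "Daniel", "Hana", "Yonas"],
        "Age": [21, 22, 20, 23, 21, 22],
        "Score": [88, 91, 76, 95, 82, 69],
    }
)

first_rows = df.head()           # first 5 rows by default
n_rows, n_cols = df.shape        # shape is a (rows, columns) tuple
column_names = list(df.columns)  # column labels as a plain list
summary = df.describe()          # count, mean, std, min, quartiles, max

df.info()                        # prints dtypes and non-null counts
print(summary)
```

Note that `df.columns` and `df.shape` are attributes, so they are written without parentheses.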

🧹 4. Data Cleaning with Pandas

Real-world data is often messy. Pandas helps clean it:

Task	Method
Remove rows with missing values	df.dropna()
Fill missing values	df.fillna(value)
Rename columns	df.rename(columns={'old': 'new'})
Change data types	df.astype(dtype)
Filter rows by condition	df[df['Score'] > 85]

✅ Data cleaning is essential before analysis or modeling.
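
The cleaning methods in the table can be chained on a small messy table. The sketch below uses made-up data; filling the missing score with the column mean is just one possible choice, not a recommendation from the original notes.

```python
import numpy as np
import pandas as pd

# Made-up messy data: a lowercase column name, a missing score,
# and ages stored as text.
raw = pd.DataFrame(
    {
        "name": ["Maya", "Christian", "Sara", "Daniel"],
        "Score": [88.0, 91.0, np.nan, 79.0],
        "Age": ["21", "22", "20", "23"],
    }
)

# Option 1: drop every row that has a missing value.
dropped = raw.dropna()

# Option 2: fill the missing score with the column mean.
filled = raw.fillna({"Score": raw["Score"].mean()})

# Rename a column and convert the text ages to integers.
clean = filled.rename(columns={"name": "Name"}).astype({"Age": int})

# Keep only the rows where Score is above 85.
high_scorers = clean[clean["Score"] > 85]

print(clean)
print(high_scorers)
```

Each of these methods returns a new DataFrame, so `raw` itself is left unchanged unless you assign the result back.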

🧮 5. Data Analysis Tasks

With Pandas, you can:

✅ Group and summarize:

df.groupby('Gender')['Score'].mean()

✅ Sort data:

df.sort_values('Score', ascending=False)

✅ Create new columns:

df['Grade'] = df['Score'] >= 90

✅ Merge datasets:

pd.merge(df1, df2, on='StudentID')

✅ Handle time-series:

df['Date'] = pd.to_datetime(df['Date'])
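
The five tasks above can be run end to end on a toy dataset. In this sketch the `clubs` table and all values are hypothetical, added only so that the merge has something to join against.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame(
    {
        "StudentID": [1, 2, 3, 4],
        "Gender": ["F", "M", "F", "M"],
        "Score": [88, 91, 94, 79],
        "Date": ["2025-01-10", "2025-01-11", "2025-01-12", "2025-01-13"],
    }
)
# Hypothetical second table sharing the StudentID key.
clubs = pd.DataFrame(
    {
        "StudentID": [1, 2, 3],
        "Club": ["Chess", "Robotics", "Debate"],
    }
)

# Group and summarize: average score per gender.
mean_by_gender = df.groupby("Gender")["Score"].mean()

# Sort: highest score first.
ranked = df.sort_values("Score", ascending=False)

# New column: True where the score is at least 90.
df["Grade"] = df["Score"] >= 90

# Merge: inner join by default, so student 4 (no club) is left out.
merged = pd.merge(df, clubs, on="StudentID")

# Time series: convert text dates to datetime values.
df["Date"] = pd.to_datetime(df["Date"])

print(mean_by_gender)
print(merged)
```

Here `Grade` holds True/False values rather than letter grades, because the comparison `df['Score'] >= 90` produces a boolean Series.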

📊 6. Visualization with Pandas

Pandas plotting methods are built on Matplotlib, and Seaborn accepts DataFrames directly:

✅ Line chart:

df.plot(x='Date', y='Sales')

✅ Bar chart:

df['Score'].value_counts().plot(kind='bar')

✅ Histogram:

df['Age'].plot.hist()

Visualization helps you see trends and patterns in data.
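
The three chart types above can be drawn side by side. This sketch assumes Matplotlib is installed and uses made-up values; the `Agg` backend is selected only so the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window needed

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2025-01-01", "2025-01-02", "2025-01-03", "2025-01-04"]
        ),
        "Sales": [120, 135, 128, 150],
        "Score": [88, 91, 88, 76],
        "Age": [21, 22, 21, 23],
    }
)

# One figure with three panels, one per chart type.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))

line_ax = df.plot(x="Date", y="Sales", ax=axes[0])                # line chart
bar_ax = df["Score"].value_counts().plot(kind="bar", ax=axes[1])  # bar chart
hist_ax = df["Age"].plot.hist(ax=axes[2])                         # histogram

fig.tight_layout()
```

In a Jupyter Notebook the figure is displayed automatically; in a script, call `fig.savefig(...)` with a file name of your choice to write it to disk.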

🧠 7. Why Pandas Matters for Ethiopians in Tech

✅ Used in data projects for finance, education, health, and NGOs
✅ Essential in roles like Data Analyst, AI/ML Engineer, and Business Intelligence
✅ Helps students analyze real Ethiopian datasets (grade records, surveys, budgeting, etc.)
✅ Works well with Jupyter Notebooks for data science learning