likhitha manikonda

Posted on Oct 25, 2025

Data Manipulation With Pandas And Numpy

#datascience #python #beginners #tutorial

If you're new to Python and want to work with data such as spreadsheets, tables, or numbers, Pandas and NumPy are your best friends. This guide walks you through the basics of data manipulation with these two libraries, using simple explanations and examples.

📦 Installing Pandas and NumPy

Before you start, install the libraries using pip:

pip install pandas numpy

Or using conda (recommended for Anaconda users):

conda install pandas numpy
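To confirm that both libraries installed correctly, a quick check like the one below imports them and prints their versions. This is a minimal sketch; the version numbers you see will depend on your environment.

```python
import numpy as np
import pandas as pd

# Print the installed versions; the exact numbers vary by environment
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)
```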

🔢 What is NumPy?

NumPy stands for Numerical Python. It provides a fast array type (the ndarray) and many functions for working with numbers efficiently.

✅ Creating Arrays

import numpy as np

# Create a simple array
arr = np.array([1, 2, 3, 4, 5])
print(arr)

Output:

[1 2 3 4 5]

📝 Explanation: np.array() turns a Python list into a NumPy array, which is faster and better for math operations.
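Besides converting lists, NumPy has helper functions that build arrays directly. The sketch below uses np.zeros, np.arange, and np.linspace, and inspects the shape and dtype attributes; the specific values are just illustrative.

```python
import numpy as np

zeros = np.zeros(3)             # [0. 0. 0.] (floats by default)
steps = np.arange(0, 10, 2)     # [0 2 4 6 8], like range() but returns an array
points = np.linspace(0, 1, 5)   # 5 evenly spaced values from 0 to 1, both ends included

grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)   # (2, 3): 2 rows, 3 columns
print(grid.dtype)   # an integer type such as int64 (platform dependent)
print(steps)
print(points)       # [0.   0.25 0.5  0.75 1.  ]
```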

✅ Array Operations

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Addition
print(a + b)

# Multiplication
print(a * b)

Output:

[5 7 9]
[ 4 10 18]

📝 Explanation: NumPy performs element-wise operations. It adds or multiplies each pair of elements from the arrays.
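NumPy can also combine an array with a single number, or with an array of a compatible shape, through a rule called broadcasting. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3])

# A scalar is applied to every element
print(a * 10)      # [10 20 30]
print(a + 100)     # [101 102 103]

# A 2-row array plus a 1-D array of length 3: the 1-D array is added to each row
m = np.array([[0, 0, 0],
              [10, 10, 10]])
print(m + a)
# [[ 1  2  3]
#  [11 12 13]]
```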

✅ Filtering with Conditions

data = np.array([10, 20, 30, 40, 50])
filtered = data[data > 30]
print(filtered)

Output:

[40 50]

📝 Explanation: This filters the array to show only values greater than 30.
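Under the hood, data > 30 produces a boolean array of the same length, and indexing with it keeps only the positions marked True. The sketch below prints that mask and combines two conditions; for element-wise logic NumPy uses & and | (with parentheses) rather than Python's and/or.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

mask = data > 30
print(mask)          # [False False False  True  True]

# Combine conditions element-wise; the parentheses are required
between = data[(data >= 20) & (data <= 40)]
print(between)       # [20 30 40]
```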

📊 What is Pandas?

Pandas is a library for working with tabular data: rows and columns, much like a sheet in Excel.

✅ Creating a DataFrame

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   22        Chicago

📝 Explanation: A DataFrame is like a table. Each column has a name, and each row has an index.
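For a quick overview of a DataFrame's size and column types, the shape, columns, and dtypes attributes and the head() method are handy. A minimal sketch on the same table:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Chicago']
})

print(df.shape)           # (3, 3): 3 rows, 3 columns
print(list(df.columns))   # ['Name', 'Age', 'City']
print(df.dtypes)          # Age is an integer column; text columns show as object (or str in newer pandas)
print(df.head(2))         # only the first 2 rows
```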

✅ Selecting Data

# Select a column
print(df['Name'])

# Select a row by index
print(df.loc[1])

Output:

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object

Name              Bob
Age                30
City    San Francisco
Name: 1, dtype: object
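df.loc[1] looks rows up by index label. Its sibling df.iloc selects by integer position, and both accept a column as a second argument. A short sketch, rebuilding the same DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Chicago']
})

print(df[['Name', 'City']])     # a list of column names selects several columns
print(df.loc[1, 'City'])        # row label 1, column 'City' -> San Francisco
print(df.iloc[0])               # first row by position
print(df.iloc[-1]['Name'])      # last row's name -> Charlie
```

Here labels and positions happen to coincide (0, 1, 2); they diverge after filtering or sorting, which is when choosing between loc and iloc matters.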

✅ Filtering Rows

# Show people older than 23
print(df[df['Age'] > 23])

Output:

    Name  Age           City
0  Alice   25       New York
1    Bob   30  San Francisco
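Conditions can be combined with & (and) and | (or), each wrapped in parentheses, and isin() checks whether a value is in a list. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Chicago']
})

# Older than 23 AND living in New York
ny_over_23 = df[(df['Age'] > 23) & (df['City'] == 'New York')]
print(ny_over_23)

# City is one of several values
selected = df[df['City'].isin(['Chicago', 'New York'])]
print(selected)
```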

✅ Adding a Column

df['Country'] = ['USA', 'USA', 'USA']
print(df)

Output:

      Name  Age           City Country
0    Alice   25       New York     USA
1      Bob   30  San Francisco     USA
2  Charlie   22        Chicago     USA
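A single value is repeated for every row, so you don't have to write out the list of 'USA' strings. New columns can also be computed from existing ones. In this minimal sketch, the BirthYear column and the 2025 reference year are hypothetical, chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
})

df['Country'] = 'USA'               # one value, repeated for every row
df['BirthYear'] = 2025 - df['Age']  # hypothetical: approximate birth year, assuming the year is 2025
print(df)
```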

✅ Summary Statistics

print(df.describe())

Output:

             Age
count   3.000000
mean   25.666667
std     4.041452
min    22.000000
25%    23.500000
50%    25.000000
75%    27.500000
max    30.000000

📝 Explanation: describe() gives you basic statistics like mean, min, max, etc.
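Each statistic in that table is also available as its own method, which is handy when you only need one number. A sketch for the ages 25, 30, and 22:

```python
import pandas as pd

ages = pd.Series([25, 30, 22], name='Age')

print(ages.mean())               # about 25.67
print(ages.median())             # 25.0
print(ages.std())                # sample standard deviation (ddof=1), about 4.04
print(ages.max() - ages.min())   # range: 8
```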

✅ Grouping Data

grouped = df.groupby('City')['Age'].mean()
print(grouped)

Output:

City
Chicago          22.0
New York         25.0
San Francisco    30.0
Name: Age, dtype: float64
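With one person per city, each group's mean is just that person's age. Grouping becomes more useful when several rows share a key. This sketch uses made-up sales data (the Region and Sales columns are hypothetical) and agg() to compute several statistics per group:

```python
import pandas as pd

# Hypothetical example data: repeated regions so each group has several rows
sales = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West', 'East'],
    'Sales': [100, 200, 150, 50, 250],
})

# One row per region, one column per statistic
summary = sales.groupby('Region')['Sales'].agg(['count', 'sum', 'mean'])
print(summary)
```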

✅ Merging DataFrames

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['New York', 'Chicago', 'Los Angeles']})

merged = pd.merge(df1, df2, on='ID')
print(merged)

Output:

   ID     Name      City
0   2      Bob  New York
1   3  Charlie   Chicago

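📝 Explanation: merge() combines two tables by matching values in a shared column (here, ID). By default it keeps only IDs that appear in both tables, so IDs 1 and 4 are dropped.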
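To keep rows that have no match in the other table, pass the how parameter; the missing cells are filled with NaN. A sketch with the same two tables:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['New York', 'Chicago', 'Los Angeles']})

left = pd.merge(df1, df2, on='ID', how='left')    # every row of df1; City is NaN for ID 1
outer = pd.merge(df1, df2, on='ID', how='outer')  # every ID from either table
print(left)
print(outer)
```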

✅ Handling Missing Data

import numpy as np

df.loc[1, 'Age'] = np.nan
print(df)

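# Fill missing values; assign the result back instead of calling
# df['Age'].fillna(0, inplace=True), which recent pandas warns about (chained assignment)
df['Age'] = df['Age'].fillna(0)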
print(df)

Output:

      Name   Age           City Country
0    Alice  25.0       New York     USA
1      Bob   NaN  San Francisco     USA
2  Charlie  22.0        Chicago     USA

      Name   Age           City Country
0    Alice  25.0       New York     USA
1      Bob   0.0  San Francisco     USA
2  Charlie  22.0        Chicago     USA
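Before filling, it usually helps to find where values are missing. isna() flags them, and dropna() is the alternative to filling: it removes rows that contain NaN. This self-contained sketch also fills with the column mean, since filling with 0 would pull the average down:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 22],
})

print(df.isna().sum())       # missing count per column: Name 0, Age 1
print(df.dropna())           # drops Bob's row, keeps Alice and Charlie

# mean() skips NaN, so the fill value is (25 + 22) / 2 = 23.5
filled = df['Age'].fillna(df['Age'].mean())
print(filled.tolist())       # [25.0, 23.5, 22.0]
```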

🔄 Reshaping Data

✅ NumPy Reshape

import numpy as np

arr = np.arange(12)
reshaped = arr.reshape(3, 4)
print(reshaped)

Output:

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

📝 Explanation: np.arange(12) creates an array from 0 to 11. reshape(3, 4) turns it into a 3-row, 4-column array.
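reshape requires the new shape to hold the same number of elements (3 rows times 4 columns = 12). You can pass -1 for one dimension and let NumPy work it out. A short sketch:

```python
import numpy as np

arr = np.arange(12)

print(arr.reshape(2, -1).shape)   # (2, 6): NumPy infers 6 columns
print(arr.reshape(-1, 3).shape)   # (4, 3): NumPy infers 4 rows

flat = arr.reshape(3, 4).ravel()  # back to one dimension
print(flat)

# arr.reshape(5, 2) would raise ValueError, because 5 * 2 != 12
```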

✅ Pandas Pivot Table

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
    'Subject': ['Math', 'Math', 'Science', 'Science'],
    'Score': [85, 90, 95, 80]
}

df = pd.DataFrame(data)
pivot = df.pivot_table(values='Score', index='Name', columns='Subject')
print(pivot)

Output:

Subject  Math  Science
Name                  
Alice      85       95
Bob        90       80

📝 Explanation: Pivot tables reshape and summarize data. Here, each Name becomes a row, each Subject becomes a column, and each cell holds that person's score for the subject.
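When a Name/Subject pair appears more than once, pivot_table combines the duplicates, averaging them by default (aggfunc='mean'). This sketch adds a second Alice/Math score, invented purely for illustration, and compares the default with aggfunc='sum':

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice'],
    'Subject': ['Math', 'Math', 'Science', 'Science', 'Math'],
    'Score': [85, 90, 95, 80, 75],   # hypothetical: Alice has two Math scores
}
df = pd.DataFrame(data)

avg = df.pivot_table(values='Score', index='Name', columns='Subject')  # mean by default
total = df.pivot_table(values='Score', index='Name', columns='Subject', aggfunc='sum')
print(avg)     # Alice/Math is (85 + 75) / 2 = 80
print(total)   # Alice/Math is 85 + 75 = 160
```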

🧮 Applying Functions

✅ NumPy Vectorized Operations

arr = np.array([1, 2, 3, 4, 5])
squared = arr ** 2
print(squared)

Output:

[ 1  4  9 16 25]

📝 Explanation: NumPy applies the operation to every element at once, without you writing a Python loop; the looping happens inside NumPy's compiled code.
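For comparison, the sketch below computes the same squares with a Python list comprehension, then shows np.where, a vectorized if/else that picks a value per element based on a condition:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Plain-Python equivalent: same result, but visits each element in the interpreter
squared_loop = np.array([x ** 2 for x in arr])

# Vectorized: one expression over the whole array
squared_vec = arr ** 2

# Vectorized if/else: label each value as even or odd
labels = np.where(arr % 2 == 0, 'even', 'odd')
print(labels)    # ['odd' 'even' 'odd' 'even' 'odd']
```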

✅ Pandas Apply Function

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22]
})

# Add 5 years to each age
df['AgePlus5'] = df['Age'].apply(lambda x: x + 5)
print(df)

Output:

      Name  Age  AgePlus5
0    Alice   25        30
1      Bob   30        35
2  Charlie   22        27

📝 Explanation: apply() lets you run a function on each value in a column.
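For simple arithmetic like adding 5, the vectorized form df['Age'] + 5 gives the same result and is typically faster; apply() is most useful when the logic doesn't map onto a single operator. A sketch with a hypothetical age_group helper:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22]
})

# Vectorized equivalent of apply(lambda x: x + 5)
df['AgePlus5'] = df['Age'] + 5

# Hypothetical helper: custom branching logic is a natural fit for apply()
def age_group(age):
    return 'under 25' if age < 25 else '25 and over'

df['Group'] = df['Age'].apply(age_group)
print(df)
```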

🧼 Cleaning Data

✅ Removing Duplicates

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice'],
    'Age': [25, 30, 25]
})

df_cleaned = df.drop_duplicates()
print(df_cleaned)

Output:

    Name  Age
0  Alice   25
1    Bob   30

📝 Explanation: drop_duplicates() removes repeated rows.
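By default drop_duplicates() compares every column and keeps the first occurrence, which is why row 2 disappears while rows 0 and 1 keep their original index labels. The subset and keep parameters change that behavior. A sketch with made-up data in which Alice's second row has a different age:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice'],
    'Age': [25, 30, 26],   # hypothetical: second Alice row differs in Age
})

# Which rows repeat an earlier Name?
print(df.duplicated(subset=['Name']).tolist())   # [False, False, True]

# Treat rows as duplicates when only the Name matches; keep the last one
latest = df.drop_duplicates(subset=['Name'], keep='last')
print(latest)
```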

✅ Renaming Columns

df.rename(columns={'Name': 'Full Name', 'Age': 'Years'}, inplace=True)
print(df)

Output:

  Full Name  Years
0     Alice     25
1       Bob     30
2     Alice     25

📅 Working with Dates

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03']),
    'Sales': [100, 150, 200]
})

# Extract day of week
df['Day'] = df['Date'].dt.day_name()
print(df)

Output:

        Date  Sales       Day
0 2023-01-01    100    Sunday
1 2023-01-02    150    Monday
2 2023-01-03    200   Tuesday

📝 Explanation: dt.day_name() extracts the weekday name from a date column.
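The .dt accessor exposes other parts of a date too, and datetime columns can be compared with date strings and subtracted from each other. A short sketch on the same sales data (the Month and DaysSinceStart column names are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03']),
    'Sales': [100, 150, 200]
})

df['Month'] = df['Date'].dt.month                # 1 for January
recent = df[df['Date'] >= '2023-01-02']          # compare against a date string
df['DaysSinceStart'] = (df['Date'] - df['Date'].min()).dt.days
print(recent)
print(df)
```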

🔗 Combining Data

✅ Concatenating DataFrames

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

combined = pd.concat([df1, df2])
print(combined)

Output:

   A
0  1
1  2
0  3
1  4

📝 Explanation: concat() stacks DataFrames vertically. Each piece keeps its original index labels, so 0 and 1 appear twice.
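Two common variations: ignore_index=True builds a fresh 0-based index instead of keeping each piece's labels, and axis=1 places DataFrames side by side, lining rows up by index. A sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.index.tolist())   # [0, 1, 2, 3]

df3 = pd.DataFrame({'B': ['x', 'y']})
side_by_side = pd.concat([df1, df3], axis=1)
print(side_by_side)             # columns A and B, two rows
```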

📌 Conclusion

With Pandas and NumPy, you can:

Clean messy data
Analyze and summarize information
Perform fast calculations
Work with dates and times
Combine and reshape datasets
