DEV Community

Aryama

NumPy & Data Manipulation

By the end of this session you will learn:

• What NumPy is
• Why NumPy is faster than Python lists
• Core NumPy operations
• Data manipulation techniques
• A real-world example

1. Introduction

Python is widely used in data science, machine learning, and AI systems.

At the core of these systems lies a powerful library called NumPy.

NumPy stands for Numerical Python and provides:

  • High-performance arrays
  • Mathematical operations
  • Matrix computations
  • Fast vectorized operations

Many popular libraries are built on top of NumPy:

  • Pandas
  • Scikit-Learn
  • TensorFlow
  • PyTorch

Understanding NumPy helps us understand how modern AI systems process data efficiently.

2. Why NumPy? (Python List vs NumPy Array)

Python lists are flexible but not optimized for large numerical computations.

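NumPy arrays store elements of a single type, typically in one contiguous block of memory, and operations on them run in compiled C code. For large numerical workloads this is usually much faster than looping over a Python list.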

Example: Performance Comparison

import numpy as np
import time

size = 1000000

list1 = list(range(size))
list2 = list(range(size))

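# Time the element-wise addition with a Python list comprehension
start = time.perf_counter()
result = [x + y for x, y in zip(list1, list2)]
print("Python list time:", time.perf_counter() - start)

arr1 = np.arange(size)
arr2 = np.arange(size)

# Time the same addition as a single vectorized NumPy operation
start = time.perf_counter()
result = arr1 + arr2
print("NumPy array time:", time.perf_counter() - start)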


Explanation:

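NumPy applies the operation to entire arrays at once, a technique called vectorization.

This removes the need for explicit Python loops. The element-by-element loop still runs, but inside NumPy's compiled code.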
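Part of the speed difference comes from how the data is stored. The sketch below inspects an array's element type and memory footprint. The explicit `dtype=np.int64` is an assumption made here so the numbers are the same on every platform.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.arange(1000, dtype=np.int64)  # dtype fixed explicitly for this illustration

# Every element of an ndarray has the same fixed-size type
print(arr.dtype)     # int64
print(arr.itemsize)  # 8 (bytes per element)
print(arr.nbytes)    # 8000 (bytes in the raw data buffer)

# A list holds references to separate Python int objects;
# getsizeof reports only the list's own reference storage, and varies by Python version
print(sys.getsizeof(values))
```

Because every element has the same size and sits next to its neighbours, NumPy can process the buffer in a tight compiled loop instead of inspecting each object individually.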

3. Creating NumPy Arrays

NumPy arrays are called ndarrays (N-dimensional arrays).

Basic Array Creation

import numpy as np

arr = np.array([1,2,3,4,5])
print(arr)

Creating Arrays of Zeros

zeros = np.zeros((3,3))
print(zeros)

Creates a 3×3 matrix of zeros.

Creating Arrays of Ones

ones = np.ones((2,4))
print(ones)

Creates a 2×4 matrix filled with ones.

Creating Ranges

numbers = np.arange(0,10,2)
print(numbers)

Output

[0 2 4 6 8]
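A few other creation helpers follow the same pattern as the ones above. This is a small sketch rather than a complete list, and the variable names are only illustrative.

```python
import numpy as np

# 5 evenly spaced points from 0 to 1 (the endpoint is included)
evenly_spaced = np.linspace(0, 1, 5)

# 2x3 array where every element is 7
sevens = np.full((2, 3), 7)

# 3x3 identity matrix (ones on the diagonal, zeros elsewhere)
identity = np.eye(3)

print(evenly_spaced.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(sevens.shape)            # (2, 3)
print(identity[1, 1])          # 1.0
```

Note the difference from np.arange: arange takes a step size and excludes the stop value, while linspace takes the number of points and includes it by default.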

4. Array Shapes and Dimensions

NumPy arrays can be multi-dimensional.

Example:

matrix = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])

print(matrix.shape)

Output

(3, 3)

Meaning:

3 rows
3 columns
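Besides shape, two related attributes and the reshape method are often useful when working with dimensions. A short sketch, reusing the same 3×3 matrix:

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(matrix.ndim)  # 2 (number of dimensions)
print(matrix.size)  # 9 (total number of elements)

# reshape rearranges the same elements into a new shape;
# the total element count must stay the same (6 = 2 * 3)
flat = np.arange(6)
grid = flat.reshape(2, 3)
print(grid.shape)   # (2, 3)
```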

5. Vectorized Operations

NumPy allows mathematical operations without loops.

Example:


arr = np.array([10,20,30,40])

print(arr + 5)
print(arr * 2)
print(arr / 10)

Output

[15 25 35 45]
[20 40 60 80]
[1. 2. 3. 4.]

Explanation:

The operation is applied to every element of the array without an explicit loop.

The scalar is treated as if it were stretched to the array's shape, a mechanism NumPy calls broadcasting.
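Broadcasting also works between arrays of different shapes, not only between an array and a single number. A minimal sketch with a 2×3 matrix, a row vector, and a column vector (all values chosen here for illustration):

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

# The row is added to every row of the matrix
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is added to every column of the matrix
print(matrix + col)
# [[101 102 103]
#  [204 205 206]]
```

The shapes are compared from the last dimension backwards. Two dimensions are compatible when they are equal or one of them is 1.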

6. Indexing and Slicing

1D Array

data = np.array([10,20,30,40,50])

print(data[0])
print(data[1:4])

Output

10
[20 30 40]

2D Array

matrix = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])

print(matrix[1,2])

Output

6

Extract Column

print(matrix[:,1])

Output

[2 5 8]

Explanation

In matrix[:,1], the : selects all rows and the 1 selects the second column (index 1).
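The same slicing rules extend to rows, negative indices, and sub-matrices. A short sketch on the same matrix:

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

first_row = matrix[0, :]    # row 0, all columns
last_col = matrix[:, -1]    # all rows, last column
top_left = matrix[:2, :2]   # first two rows and first two columns

print(first_row)  # [1 2 3]
print(last_col)   # [3 6 9]
print(top_left)
# [[1 2]
#  [4 5]]
```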

7. Aggregation Functions

NumPy provides built-in functions for data analysis.

Example:


data = np.array([10,20,30,40])

print("Mean:", np.mean(data))
print("Sum:", np.sum(data))
print("Max:", np.max(data))
print("Min:", np.min(data))

Output

Mean: 25.0
Sum: 100
Max: 40
Min: 10
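On 2D data, the same functions accept an axis argument to aggregate per column or per row. A minimal sketch with a small made-up matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

col_sums = matrix.sum(axis=0)    # collapse the rows: one sum per column
row_means = matrix.mean(axis=1)  # collapse the columns: one mean per row

print(col_sums)      # [5 7 9]
print(row_means)     # [2. 5.]
print(matrix.max())  # 6 (no axis: aggregate over all elements)
```

A useful way to remember it: the axis you pass is the one that disappears from the result.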

8. Boolean Filtering (Very Useful for Data Cleaning)

Example dataset:

scores = np.array([55,78,90,34,88,67])

Find students who passed:

passed = scores[scores > 60]
print(passed)

Output

[78 90 88 67]

Explanation:

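The comparison scores > 60 produces a boolean array (a mask), and indexing with that mask keeps only the elements where it is True, with no explicit loop.

This is heavily used in data preprocessing pipelines.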
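Masks can be combined and reused. The sketch below uses the same scores. The upper bound of 90 and the "pass"/"fail" labels are assumptions added for illustration.

```python
import numpy as np

scores = np.array([55, 78, 90, 34, 88, 67])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
mid_range = scores[(scores > 60) & (scores < 90)]
print(mid_range)  # [78 88 67]

# Count how many elements satisfy a condition
num_passed = np.count_nonzero(scores > 60)
print(num_passed)  # 4

# np.where picks one value where the mask is True and another where it is False
labels = np.where(scores > 60, "pass", "fail")
print(labels.tolist())  # ['fail', 'pass', 'pass', 'fail', 'pass', 'pass']
```

Python's `and` and `or` keywords do not work element-wise on arrays, which is why the `&` and `|` operators are used here.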

10. Real-World Example – Netflix Recommendation System

Recommendation systems power platforms like Netflix.

Users and movies can be represented as vectors.

Movie List

movies = ["Interstellar", "Inception", "Titanic", "Avengers"]


User Ratings Matrix

ratings = np.array([
    [5,4,1,1],  # Alice
    [4,5,1,1],  # Bob
    [1,1,5,4]   # Charlie
])

print(ratings)

Rows represent users

Columns represent movies

11. Cosine Similarity

Recommendation systems often compute similarity between users.

Formula

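similarity = (A · B) / (‖A‖ ‖B‖)

Here A · B is the dot product of the two rating vectors and ‖A‖ is the Euclidean length (norm) of A. The result is close to 1 when the vectors point in the same direction.

Implementing Similarity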

from numpy.linalg import norm

alice = ratings[0]
bob = ratings[1]
charlie = ratings[2]

sim_alice_bob = np.dot(alice,bob) / (norm(alice)*norm(bob))
sim_alice_charlie = np.dot(alice,charlie) / (norm(alice)*norm(charlie))

print("Alice vs Bob:", sim_alice_bob)
print("Alice vs Charlie:", sim_alice_charlie)

Expected output

Alice vs Bob: ~0.98
Alice vs Charlie: ~0.42

Explanation

Alice and Bob have similar movie taste.

Charlie has different preferences.
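With more users, computing each pair by hand gets tedious. One possible sketch, not necessarily how a production system does it, normalizes every row to unit length and obtains all pairwise similarities with a single matrix product:

```python
import numpy as np

ratings = np.array([
    [5, 4, 1, 1],  # Alice
    [4, 5, 1, 1],  # Bob
    [1, 1, 5, 4]   # Charlie
])

# Length of each user's rating vector, kept as a column so it broadcasts per row
norms = np.linalg.norm(ratings, axis=1, keepdims=True)  # shape (3, 1)

# Scale every row to unit length
unit = ratings / norms

# Dot products of unit vectors are cosine similarities
similarity = unit @ unit.T  # shape (3, 3)

print(np.round(similarity, 2))
```

Entry [i, j] is the similarity between user i and user j. For this data, Alice vs Bob is 42/43 (about 0.98) and Alice vs Charlie is 18/43 (about 0.42).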

12. Recommendation Logic

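If Alice hasn't watched Avengers, we can recommend movies liked by similar users, restricted to the ones she has not seen yet.

alice_ratings = np.array([5,4,1,0])  # 0 means "not watched yet"
bob_ratings = ratings[1]             # Bob is the user most similar to Alice

# Hide movies Alice has already rated, then pick Bob's best-rated remaining one
unseen = alice_ratings == 0
candidate_scores = np.where(unseen, bob_ratings, -1)
recommended_index = np.argmax(candidate_scores)

print("Recommended movie:", movies[recommended_index])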
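Putting the pieces together, the whole flow can be sketched end to end: find the most similar user, then suggest that user's best-rated movie among the ones Alice has not seen. This is a toy illustration under stated assumptions, not Netflix's actual algorithm. A rating of 0 is assumed to mean "not watched", and the helper name `cosine` is hypothetical.

```python
import numpy as np
from numpy.linalg import norm

movies = ["Interstellar", "Inception", "Titanic", "Avengers"]

alice = np.array([5, 4, 1, 0])   # assumption: 0 marks an unwatched movie
other_names = ["Bob", "Charlie"]
others = np.array([
    [4, 5, 1, 1],  # Bob
    [1, 1, 5, 4]   # Charlie
])

def cosine(a, b):
    """Cosine similarity between two 1D rating vectors (hypothetical helper)."""
    return np.dot(a, b) / (norm(a) * norm(b))

# Step 1: find the user whose ratings are most similar to Alice's
similarities = np.array([cosine(alice, other) for other in others])
best = int(np.argmax(similarities))

# Step 2: among movies Alice has not watched, take that user's highest rating
unseen = alice == 0
candidate_scores = np.where(unseen, others[best], -1)
recommended = movies[int(np.argmax(candidate_scores))]

print("Most similar user:", other_names[best])  # Bob
print("Recommended movie:", recommended)        # Avengers
```

In this tiny dataset Bob rated Avengers only 1, so a real system would likely also require a minimum rating before recommending anything.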
