Vuk Rosić

Posted on Jul 6

NumPy Essentials: Arrays and vectorization


Part 1: Getting Started

import numpy as np

# Create your first array
arr = np.array([1, 2, 3, 4, 5])
print(arr)  # [1 2 3 4 5]

What happened: We converted a Python list into a NumPy array (an ndarray), the core data structure of scientific computing in Python.

Part 2: Arrays vs Lists

# Python list
python_list = [1, 2, 3, 4, 5]
print(type(python_list))  # <class 'list'>

# NumPy array
numpy_array = np.array([1, 2, 3, 4, 5])
print(type(numpy_array))  # <class 'numpy.ndarray'>

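Key difference: A list stores references to arbitrary Python objects, while an array stores values of a single data type in one contiguous block of memory. That makes arrays more compact and much faster for math!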

Part 3: Array Properties

arr = np.array([1, 2, 3, 4, 5])
print(arr.shape)   # (5,) - 5 elements in 1 dimension
print(arr.size)    # 5 - total number of elements
print(arr.dtype)   # int64 on most 64-bit Linux/macOS systems (int32 on Windows before NumPy 2.0)

Intuition: shape gives the length along each dimension, size gives the total number of elements, and dtype gives the type that every element shares.
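A few more attributes are worth knowing. The sketch below shows ndim, itemsize, and nbytes, and passes an explicit dtype so the byte counts don't depend on the platform's default integer size (the variable name arr32 is just for illustration):

```python
import numpy as np

# Request the dtype explicitly instead of relying on the platform default
arr = np.array([1, 2, 3, 4, 5], dtype=np.int64)

print(arr.ndim)      # 1  - number of dimensions
print(arr.itemsize)  # 8  - bytes per element (int64 = 8 bytes)
print(arr.nbytes)    # 40 - total bytes = size * itemsize

# The same values stored as 32-bit floats use half the memory
arr32 = arr.astype(np.float32)
print(arr32.dtype, arr32.nbytes)  # float32 20
```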

Part 4: Creating Arrays

# Zeros
zeros = np.zeros(5)          # [0. 0. 0. 0. 0.]

# Ones
ones = np.ones(3)            # [1. 1. 1.]

# Range
range_arr = np.arange(10)    # [0 1 2 3 4 5 6 7 8 9]

# Evenly spaced
linspace = np.linspace(0, 1, 5)  # [0.   0.25 0.5  0.75 1.  ]

Practice: Create arrays filled with specific values or patterns.
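For that practice, np.full fills an array with any value, np.arange accepts a start, stop, and step, and np.tile repeats a pattern. A short sketch with arbitrary example values:

```python
import numpy as np

sevens = np.full((2, 3), 7)          # 2x3 array filled with 7
evens = np.arange(0, 10, 2)          # [0 2 4 6 8] - stop value is excluded
countdown = np.arange(5, 0, -1)      # [5 4 3 2 1] - a negative step counts down
quarters = np.linspace(0, 1, 4, endpoint=False)  # [0.   0.25 0.5  0.75] - end point excluded
pattern = np.tile([1, 0], 3)         # [1 0 1 0 1 0] - repeat a pattern

print(sevens)
print(evens, countdown, quarters, pattern)
```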

Part 5: 2D Arrays (Matrices)

# Create a 2D array
matrix = np.array([[1, 2, 3], 
                   [4, 5, 6]])
print(matrix.shape)  # (2, 3) - 2 rows, 3 columns
print(matrix.size)   # 6 - total elements

Visualization: Think of it as a table with rows and columns.

Part 6: Array Creation Shortcuts

# 2D zeros
zeros_2d = np.zeros((3, 4))    # 3x4 matrix of zeros

# Identity matrix
identity = np.eye(3)           # 3x3 identity matrix

# Random numbers
random_arr = np.random.random(5)  # 5 random numbers [0,1)

Use cases: Initialize matrices for machine learning, create test data.
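np.random.random uses NumPy's legacy global random state. For new code, NumPy's documentation recommends the Generator interface from np.random.default_rng, and seeding it makes test data reproducible. A minimal sketch (the seed 42 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded Generator: same numbers on every run

uniform = rng.random(5)                   # 5 floats in [0, 1)
ints = rng.integers(0, 10, size=5)        # 5 integers in [0, 10)
normal = rng.normal(0, 1, size=(2, 3))    # 2x3 samples from a normal distribution (mean 0, std 1)

# A second generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(5)))  # True
```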

Part 7: Array Indexing

arr = np.array([10, 20, 30, 40, 50])

# Single element
print(arr[0])    # 10 - first element
print(arr[-1])   # 50 - last element

# Multiple elements
print(arr[1:4])  # [20 30 40] - slice notation

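Rule: Same syntax as Python lists, but much faster for large arrays. One difference: a NumPy slice is a view that shares memory with the original array, not a copy (see Part 23).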

Part 8: 2D Array Indexing

matrix = np.array([[1, 2, 3], 
                   [4, 5, 6]])

# Single element
print(matrix[0, 1])  # 2 - row 0, column 1

# Entire row
print(matrix[1, :])  # [4 5 6] - row 1, all columns

# Entire column
print(matrix[:, 2])  # [3 6] - all rows, column 2

Syntax: [row, column] - comma separates dimensions.
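Slices work along each dimension, which lets you cut out sub-matrices. Note that a single integer index drops a dimension while a slice keeps it. A small sketch using the same matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Rows 0-1, columns 1-2 (stop indices are excluded)
sub = matrix[0:2, 1:3]
print(sub)
# [[2 3]
#  [5 6]]

row_1d = matrix[1]      # integer index -> shape (3,)
row_2d = matrix[1:2]    # slice         -> shape (1, 3)
print(row_1d.shape, row_2d.shape)  # (3,) (1, 3)
```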

Part 9: The Magic of Vectorization

# Python way (slow)
python_list = [1, 2, 3, 4, 5]
result = []
for x in python_list:
    result.append(x * 2)

# NumPy way (fast)
numpy_array = np.array([1, 2, 3, 4, 5])
result = numpy_array * 2  # [2 4 6 8 10]

Vectorization: Apply an operation to an entire array at once. The loop still happens, but inside NumPy's compiled code instead of the Python interpreter.

Part 10: Element-wise Operations

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Basic operations
print(a + b)   # [5 7 9]   - addition
print(a - b)   # [-3 -3 -3] - subtraction
print(a * b)   # [4 10 18] - multiplication
print(a / b)   # [0.25 0.4 0.5] - division

Key insight: Operations happen element by element automatically. The arrays must have the same shape, or shapes that broadcast (next part).

Part 11: Broadcasting

# Array and scalar
arr = np.array([1, 2, 3, 4])
result = arr + 10  # [11 12 13 14]

# Different shapes
a = np.array([[1, 2, 3]])      # 1x3
b = np.array([[10], [20]])     # 2x1
result = a + b                 # 2x3 result

Broadcasting: NumPy automatically stretches arrays of compatible shapes to a common shape, without copying the data.
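The second example above produces a 2x3 result. NumPy compares shapes starting from the last dimension; two dimensions are compatible when they are equal or one of them is 1. This sketch prints the result, asks NumPy for the broadcast shape (np.broadcast_shapes needs NumPy 1.20 or later), and shows the error for incompatible shapes:

```python
import numpy as np

a = np.array([[1, 2, 3]])    # shape (1, 3)
b = np.array([[10], [20]])   # shape (2, 1)

result = a + b
print(result)
# [[11 12 13]
#  [21 22 23]]

# Ask NumPy which shape the two arrays broadcast to
print(np.broadcast_shapes(a.shape, b.shape))  # (2, 3)

# Last dimensions 3 and 4 differ and neither is 1, so this fails
try:
    np.ones(3) + np.ones(4)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```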

Part 12: Mathematical Functions

arr = np.array([1, 4, 9, 16])

# Common functions
print(np.sqrt(arr))    # [1. 2. 3. 4.] - square root
print(np.log(arr))     # natural logarithm
print(np.exp(arr))     # exponential
print(np.sin(arr))     # sine

Advantage: All functions work element-wise across entire arrays.

Part 13: Array Statistics

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Basic statistics
print(np.mean(data))   # 5.5 - average
print(np.median(data)) # 5.5 - middle value
print(np.std(data))    # 2.87 - population standard deviation (ddof=0)
print(np.sum(data))    # 55 - total

Use case: Quick analysis of datasets without writing loops.
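Statistics functions also accept an axis argument to work down columns or across rows of 2D data, and np.std accepts ddof=1 when you need the sample rather than the population standard deviation. A sketch with small example arrays:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.mean(scores))          # 3.5           - over all elements
print(np.mean(scores, axis=0))  # [2.5 3.5 4.5] - down each column
print(np.mean(scores, axis=1))  # [2. 5.]       - across each row

data = np.arange(1, 11)
print(np.std(data))             # ~2.872 - population std (ddof=0)
print(np.std(data, ddof=1))     # ~3.028 - sample std (ddof=1)
```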

Part 14: Array Reshaping

arr = np.array([1, 2, 3, 4, 5, 6])

# Reshape to 2x3
reshaped = arr.reshape(2, 3)
print(reshaped)
# [[1 2 3]
#  [4 5 6]]

# Flatten back to 1D
flat = reshaped.flatten()  # [1 2 3 4 5 6]

Rule: Total elements must stay the same (2×3 = 6 elements).
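You can pass -1 for one dimension and let NumPy infer it from the total size. A shape whose product doesn't match the element count raises a ValueError. A quick sketch:

```python
import numpy as np

arr = np.arange(12)

print(arr.reshape(3, -1).shape)   # (3, 4) - NumPy works out the 4
print(arr.reshape(-1, 6).shape)   # (2, 6)
print(arr.reshape(-1).shape)      # (12,)  - back to 1D

try:
    arr.reshape(5, -1)            # 12 is not divisible by 5
except ValueError as err:
    reshape_error = str(err)
    print("ValueError:", reshape_error)
```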

Part 15: Boolean Indexing

data = np.array([1, 5, 3, 8, 2, 9])

# Create boolean mask
mask = data > 4  # [False True False True False True]

# Filter data
filtered = data[mask]  # [5 8 9]

# One-liner
big_numbers = data[data > 4]  # [5 8 9]

Power: Select elements based on conditions without loops.
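np.where complements boolean masks: with one argument it returns the indices where a condition holds, and with three arguments it picks values element by element from two alternatives. Summing a mask counts the matches, because True counts as 1. A sketch on the same data (the cap value 4 is arbitrary):

```python
import numpy as np

data = np.array([1, 5, 3, 8, 2, 9])

indices = np.where(data > 4)[0]          # positions of matches: [1 3 5]
capped = np.where(data > 4, 4, data)     # replace values above 4 with 4
count = np.sum(data > 4)                 # number of matches: 3

print(indices, capped, count)
```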

Part 16: Array Concatenation

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Concatenate
combined = np.concatenate([a, b])  # [1 2 3 4 5 6]

# Stack vertically
stacked = np.vstack([a, b])
# [[1 2 3]
#  [4 5 6]]

Use case: Combine datasets or results from different computations.
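For 2D arrays, np.concatenate joins along the axis you choose: axis=0 stacks rows (like np.vstack) and axis=1 joins columns side by side (like np.hstack). The other dimensions must match. A sketch with small example arrays:

```python
import numpy as np

top = np.array([[1, 2],
                [3, 4]])
bottom = np.array([[5, 6]])          # shape (1, 2): same number of columns as top
right = np.array([[9],
                  [9]])              # shape (2, 1): same number of rows as top

rows = np.concatenate([top, bottom], axis=0)   # shape (3, 2)
cols = np.concatenate([top, right], axis=1)    # shape (2, 3)

print(rows)
# [[1 2]
#  [3 4]
#  [5 6]]
print(cols)
# [[1 2 9]
#  [3 4 9]]
```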

Part 17: Matrix Operations

# Two 2x2 matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise multiplication
element_wise = A * B  # [[5 12] [21 32]]

# Matrix multiplication
matrix_mult = A @ B   # [[19 22] [43 50]]

Difference: * is element-wise, @ is true matrix multiplication.
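The @ operator also multiplies a matrix by a vector. The inner dimensions must agree: an (n, k) array times a (k, m) array gives (n, m). A sketch reusing A from above (v and C are illustrative):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])

print(A @ v)    # [3 7] - each row of A dotted with v
print(A.T)      # transpose: [[1 3] [2 4]]

C = np.ones((2, 3))
print((A @ C).shape)  # (2, 3) - (2, 2) @ (2, 3)

try:
    C @ A              # (2, 3) @ (2, 2): inner dimensions 3 and 2 differ
except ValueError as err:
    matmul_error = str(err)
    print("ValueError:", matmul_error)
```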

Part 18: Performance Comparison

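import time

# Large arrays
size = 1_000_000
a = np.random.random(size)
b = np.random.random(size)

# Time NumPy (time.perf_counter is meant for measuring short intervals)
start = time.perf_counter()
result = a + b
numpy_time = time.perf_counter() - start

# Time the same addition as a pure-Python loop over lists
a_list, b_list = a.tolist(), b.tolist()
start = time.perf_counter()
result_list = [x + y for x, y in zip(a_list, b_list)]
python_time = time.perf_counter() - start

print(f"NumPy time:  {numpy_time:.4f} seconds")
print(f"Python time: {python_time:.4f} seconds")
print(f"Speedup: {python_time / numpy_time:.0f}x")  # varies by machine; often tens to hundreds of times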

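Why faster: NumPy runs the loop in optimized, compiled C code over a contiguous block of same-typed values, avoiding Python's per-element interpreter overhead.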

Part 19: Common Patterns

# Generate data
x = np.linspace(0, 10, 100)  # 100 points from 0 to 10
y = np.sin(x)                # Sine wave

# Values near the peaks (above 0.9)
near_peaks = y[y > 0.9]

# Standardize (z-score): mean 0, standard deviation 1
normalized = (y - np.mean(y)) / np.std(y)

Real-world: Data generation, filtering, and preprocessing.

Part 20: Advanced Indexing

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Fancy indexing
rows = [0, 2]
cols = [1, 2]
result = arr[rows, cols]  # [2 9] - elements at (0,1) and (2,2)

# Boolean indexing with multiple conditions
# (the parentheses are required: & binds more tightly than > and <)
mask = (arr > 3) & (arr < 8)
filtered = arr[mask]  # [4 5 6 7]

Power: Extract complex patterns from data with simple syntax.
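A point that often trips people up: arr[rows, cols] pairs the two index lists element by element, so it returns two values, not a 2x2 block. To select every combination of those rows and columns, np.ix_ builds an open mesh of indices:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = [0, 2]
cols = [1, 2]

pairs = arr[rows, cols]            # [2 9] - elements (0,1) and (2,2)
block = arr[np.ix_(rows, cols)]    # rows 0 and 2 crossed with columns 1 and 2
print(block)
# [[2 3]
#  [8 9]]
```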

Part 21: Array Sorting

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Sort array
sorted_data = np.sort(data)  # [1 1 2 3 4 5 6 9]

# Get sort indices (kind="stable" keeps tied values, like the two 1s, in their original order)
indices = np.argsort(data, kind="stable")   # [1 3 6 0 2 4 7 5]

# Sort 2D array
matrix = np.array([[3, 1], [4, 2]])
sorted_matrix = np.sort(matrix, axis=1)  # Sort each row: [[1 3] [2 4]]

Use case: Order data for analysis or find top/bottom values.
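For finding top values, argsort gives the positions as well as the values. A sketch that finds the three largest entries by taking the last three sorted indices and reversing them with [::-1]:

```python
import numpy as np

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

order = np.argsort(data)           # indices that would sort ascending
top3_idx = order[-3:][::-1]        # last three indices, largest value first
top3 = data[top3_idx]

print(top3_idx)  # [5 7 4]
print(top3)      # [9 6 5]
print(np.sort(data)[::-1][:3])  # [9 6 5] - same values via a descending sort
```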

Part 22: Working with NaN

data = np.array([1, 2, np.nan, 4, 5])

# Check for NaN
has_nan = np.isnan(data)  # [False False True False False]

# Remove NaN
clean_data = data[~np.isnan(data)]  # [1. 2. 4. 5.]

# NaN-aware functions
mean_ignore_nan = np.nanmean(data)  # 3.0

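Real data: Often contains missing values. Use np.isnan to find or drop them, and the nan-prefixed functions (np.nanmean, np.nansum, np.nanmax) to compute statistics that skip them.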
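Two details explain why the NaN-specific tools are needed: ordinary functions like np.mean propagate NaN instead of skipping it, and NaN never compares equal to anything, itself included, so == cannot find it. A sketch on the same data:

```python
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])

print(np.mean(data))       # nan - a single NaN poisons the result
print(np.nanmean(data))    # 3.0 - NaN skipped
print(np.nanmax(data))     # 5.0

print(np.nan == np.nan)         # False
print(np.sum(data == np.nan))   # 0 - equality never matches NaN
print(np.sum(np.isnan(data)))   # 1 - isnan does
```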

Part 23: Array Memory and Views

arr = np.array([1, 2, 3, 4, 5])

# Slicing creates a view (shares memory)
view = arr[1:4]
view[0] = 999
print(arr)  # [1 999 3 4 5] - original changed!

# Copy creates new array
copy = arr.copy()
copy[0] = 777
print(arr)  # [1 999 3 4 5] - original unchanged

Memory efficiency: Views save memory, copies ensure independence.
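You can check whether two arrays share memory with np.shares_memory. Basic slices return views, while fancy (list) indexing and boolean masks always return copies. A sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]          # basic slice  -> view
fancy = arr[[1, 2, 3]]   # index list   -> copy
masked = arr[arr > 1]    # boolean mask -> copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, fancy))   # False
print(np.shares_memory(arr, masked))  # False

fancy[0] = 999
print(arr)  # [1 2 3 4 5] - unchanged, because fancy is a copy
```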

Part 24: Practical Example - Data Analysis

# Simulate temperature data
days = 30
temperatures = np.random.normal(25, 5, days)  # Mean 25°C, std 5°C

# Analysis
avg_temp = np.mean(temperatures)
hot_days = np.sum(temperatures > 30)
cold_days = np.sum(temperatures < 20)
temp_range = np.max(temperatures) - np.min(temperatures)

print(f"Average: {avg_temp:.1f}°C")
print(f"Hot days (>30°C): {hot_days}")
print(f"Cold days (<20°C): {cold_days}")
print(f"Temperature range: {temp_range:.1f}°C")

Real application: Weather data analysis with just a few lines.

Part 25: Image Processing Example

# Create a simple "image" (2D array)
image = np.random.randint(0, 256, (100, 100))  # 100x100 grayscale, values 0-255

# Basic operations
bright_image = np.clip(image + 50, 0, 255)  # Brighten, capped at the maximum pixel value 255
dark_image = image * 0.5                     # Darken (result is float)
threshold = image > 128                      # Binary threshold (boolean mask)

# Image statistics
print(f"Average brightness: {np.mean(image):.1f}")
print(f"Bright pixels: {np.sum(image > 200)}")

Application: Images are just arrays of numbers - perfect for NumPy.

Part 26: Scientific Computing

# Simulate a simple physics problem
t = np.linspace(0, 10, 1000)       # Time from 0 to 10 seconds (named t so it doesn't shadow the time module)
gravity = 9.81                     # m/s²
initial_velocity = 50              # m/s

# Position under constant gravity: y = v0*t - g*t²/2
position = initial_velocity * t - 0.5 * gravity * t**2

# Find maximum height
max_height = np.max(position)
max_time = t[np.argmax(position)]

print(f"Maximum height: {max_height:.1f}m at {max_time:.1f}s")  # Maximum height: 127.4m at 5.1s

Power: Solve complex scientific problems with vectorized operations.
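As a sanity check, this projectile problem has a closed-form answer: the velocity v0 - g*t is zero at t = v0/g, which gives a peak height of v0²/(2g). This sketch compares that with the grid search (the tolerance on time reflects the grid spacing of about 0.01 s):

```python
import numpy as np

g = 9.81      # m/s²
v0 = 50       # m/s
t = np.linspace(0, 10, 1000)
position = v0 * t - 0.5 * g * t**2

# Numerical maximum from the sampled grid
idx = np.argmax(position)
t_numeric, h_numeric = t[idx], position[idx]

# Closed-form answer: velocity v0 - g*t is zero at t = v0/g
t_exact = v0 / g
h_exact = v0**2 / (2 * g)

print(f"numeric: {h_numeric:.2f} m at {t_numeric:.3f} s")
print(f"exact:   {h_exact:.2f} m at {t_exact:.3f} s")
```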

Part 27: Performance Tips

large_array = np.random.random(1_000_000)  # example data for the comparisons below

# Avoid Python loops
# BAD:
result = []
for x in large_array:
    result.append(x**2)

# GOOD:
result = large_array**2

# Use built-in functions
# BAD:
total = 0
for x in large_array:
    total += x

# GOOD:
total = np.sum(large_array)

Golden rule: If you're writing a loop, there's probably a NumPy function for it.

Part 28: Common Mistakes

# Mistake 1: Creating arrays in loops
# BAD:
arr = np.array([])
for i in range(1000):
    arr = np.append(arr, i)  # Slow: every call copies the whole array

# GOOD:
arr = np.arange(1000)       # Fast!

# Mistake 2: Not using vectorization
# BAD:
result = np.zeros(len(arr))
for i in range(len(arr)):
    result[i] = arr[i] * 2

# GOOD:
result = arr * 2

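Efficiency: Create arrays in one step (np.arange, np.zeros) instead of growing them with np.append, and use vectorized operations instead of element-by-element loops.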
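When the final size isn't known up front, one common alternative to np.append is to collect values in a Python list (cheap to append to) and convert once at the end. When the size is known but each element truly needs Python-level logic, pre-allocate with np.empty or np.zeros and fill by index. Both patterns, sketched with an arbitrary example condition:

```python
import numpy as np

# Unknown final size: grow a Python list, convert to an array once
values = []
for i in range(1000):
    if i % 3 == 0:            # arbitrary example condition
        values.append(i)
from_list = np.array(values)

# Known final size: pre-allocate, then fill in place (no repeated copying)
squares = np.empty(1000, dtype=np.int64)
for i in range(1000):
    squares[i] = i * i

print(from_list[:5], squares[:5])
```

For these particular examples, the vectorized one-liners np.arange(0, 1000, 3) and np.arange(1000)**2 give the same results and are faster still.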

Part 29: Next Steps

# What you can do with NumPy:
# 1. Data analysis (pandas builds on NumPy)
# 2. Machine learning (scikit-learn uses NumPy)
# 3. Image processing (OpenCV, Pillow)
# 4. Scientific computing (SciPy)
# 5. Deep learning (TensorFlow, PyTorch)

# Example: least-squares linear regression (no intercept term) in one line
X = np.random.random((100, 2))
y = np.random.random(100)
weights = np.linalg.lstsq(X, y, rcond=None)[0]

Foundation: NumPy is the base for the entire Python scientific ecosystem.
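The one-liner above fits random data, so its weights are meaningless and the model has no intercept. A sketch that makes the fit checkable, assuming synthetic data: generate y from known weights plus an intercept (the values 2.0, -3.0, and 1.5 and the small noise level are arbitrary), append a column of ones to X, and confirm that np.linalg.lstsq recovers them:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 2))
true_w = np.array([2.0, -3.0])
true_b = 1.5
y = X @ true_w + true_b + rng.normal(0, 0.01, size=100)  # small noise

# A column of ones lets lstsq estimate the intercept as the last coefficient
X1 = np.column_stack([X, np.ones(len(X))])
coef, residuals, rank, sv = np.linalg.lstsq(X1, y, rcond=None)

print(coef)  # close to [2.0, -3.0, 1.5]
```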

Key Takeaways

Arrays > Lists: Faster, more memory efficient for numerical data
Vectorization: Apply operations to entire arrays at once
Broadcasting: Automatically handle different array shapes
Boolean indexing: Filter data with conditions
No loops: NumPy functions are optimized - use them!
Shape matters: Understanding dimensions is crucial
Memory views: Slicing shares memory, copying creates new arrays

Practice Challenge

# Create a 10x10 matrix of random numbers
# Find all numbers greater than 0.5
# Calculate their average
# Replace numbers less than 0.3 with 0

matrix = np.random.random((10, 10))
mask = matrix > 0.5
high_values = matrix[mask]
average = np.mean(high_values)
matrix[matrix < 0.3] = 0

print(f"Found {len(high_values)} values > 0.5")
print(f"Their average: {average:.3f}")

Master these concepts and you'll have a solid foundation for data science, machine learning, and scientific computing!

DEV Community