NumPy Essentials: Arrays and Vectorization
Part 1: Getting Started
import numpy as np
# Create your first array
arr = np.array([1, 2, 3, 4, 5])
print(arr) # [1 2 3 4 5]
What happened: We converted a Python list into a NumPy array - the foundation of scientific computing.
Part 2: Arrays vs Lists
# Python list
python_list = [1, 2, 3, 4, 5]
print(type(python_list)) # <class 'list'>
# NumPy array
numpy_array = np.array([1, 2, 3, 4, 5])
print(type(numpy_array)) # <class 'numpy.ndarray'>
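Key difference: A list stores references to arbitrary Python objects, while an array stores values of a single type in one contiguous block of memory - which is what makes array math fast.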
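To make the storage difference concrete, here is a small sketch that inspects both containers. The variable names are illustrative, and the dtype is set explicitly so the byte counts do not depend on the platform.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# Every array element has the same type and the same fixed size in bytes.
print(arr.dtype)      # int64
print(arr.itemsize)   # 8
print(arr.nbytes)     # 8000

# A list holds references to separate Python int objects. getsizeof counts
# only the list's own reference table, not the int objects it points to.
print(sys.getsizeof(values))

# A list can mix types freely; an array converts everything to one dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64
```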
Part 3: Array Properties
arr = np.array([1, 2, 3, 4, 5])
print(arr.shape) # (5,) - 5 elements in 1 dimension
print(arr.size) # 5 - total number of elements
print(arr.dtype) # int64 on most platforms - data type
Intuition: Shape tells you the dimensions, size tells you total elements.
Part 4: Creating Arrays
# Zeros
zeros = np.zeros(5) # [0. 0. 0. 0. 0.]
# Ones
ones = np.ones(3) # [1. 1. 1.]
# Range
range_arr = np.arange(10) # [0 1 2 3 4 5 6 7 8 9]
# Evenly spaced
linspace = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1. ]
Practice: Create arrays filled with specific values or patterns.
Part 5: 2D Arrays (Matrices)
# Create a 2D array
matrix = np.array([[1, 2, 3],
[4, 5, 6]])
print(matrix.shape) # (2, 3) - 2 rows, 3 columns
print(matrix.size) # 6 - total elements
Visualization: Think of it as a table with rows and columns.
Part 6: Array Creation Shortcuts
# 2D zeros
zeros_2d = np.zeros((3, 4)) # 3x4 matrix of zeros
# Identity matrix
identity = np.eye(3) # 3x3 identity matrix
# Random numbers
random_arr = np.random.random(5) # 5 random numbers [0,1)
Use cases: Initialize matrices for machine learning, create test data.
Part 7: Array Indexing
arr = np.array([10, 20, 30, 40, 50])
# Single element
print(arr[0]) # 10 - first element
print(arr[-1]) # 50 - last element
# Multiple elements
print(arr[1:4]) # [20 30 40] - slice notation
Rule: Same syntax as Python lists, with one difference: a NumPy slice is a view of the original data, not a copy.
Part 8: 2D Array Indexing
matrix = np.array([[1, 2, 3],
[4, 5, 6]])
# Single element
print(matrix[0, 1]) # 2 - row 0, column 1
# Entire row
print(matrix[1, :]) # [4 5 6] - row 1, all columns
# Entire column
print(matrix[:, 2]) # [3 6] - all rows, column 2
Syntax: [row, column] - a comma separates the dimensions.
Part 9: The Magic of Vectorization
# Python way (slow)
python_list = [1, 2, 3, 4, 5]
result = []
for x in python_list:
    result.append(x * 2)
# NumPy way (fast)
numpy_array = np.array([1, 2, 3, 4, 5])
result = numpy_array * 2 # [2 4 6 8 10]
Vectorization: Apply operations to entire arrays at once - no loops needed!
Part 10: Element-wise Operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Basic operations
print(a + b) # [5 7 9] - addition
print(a - b) # [-3 -3 -3] - subtraction
print(a * b) # [4 10 18] - multiplication
print(a / b) # [0.25 0.4 0.5] - division
Key insight: Operations happen element-by-element automatically.
Part 11: Broadcasting
# Array and scalar
arr = np.array([1, 2, 3, 4])
result = arr + 10 # [11 12 13 14]
# Different shapes
a = np.array([[1, 2, 3]]) # 1x3
b = np.array([[10], [20]]) # 2x1
result = a + b # 2x3 result
Broadcasting: NumPy stretches any dimension of size 1 so the two shapes match - here (1, 3) and (2, 1) combine into (2, 3) - without copying the data.
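The 1x3 plus 2x1 case is worth printing out, along with a pair of shapes that cannot be combined. This is a minimal sketch; np.broadcast_shapes assumes NumPy 1.20 or newer.

```python
import numpy as np

a = np.array([[1, 2, 3]])      # shape (1, 3)
b = np.array([[10], [20]])     # shape (2, 1)

result = a + b
print(result.shape)   # (2, 3)
print(result)
# [[11 12 13]
#  [21 22 23]]

# broadcast_shapes reports the combined shape without doing any arithmetic.
print(np.broadcast_shapes((1, 3), (2, 1)))   # (2, 3)

# Shapes that disagree on a dimension where neither side is 1 are rejected.
try:
    np.ones((2, 3)) + np.ones((2, 4))
except ValueError as err:
    print("Cannot broadcast:", err)
```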
Part 12: Mathematical Functions
arr = np.array([1, 4, 9, 16])
# Common functions
print(np.sqrt(arr)) # [1. 2. 3. 4.] - square root
print(np.log(arr)) # natural logarithm
print(np.exp(arr)) # exponential
print(np.sin(arr)) # sine
Advantage: All functions work element-wise across entire arrays.
Part 13: Array Statistics
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Basic statistics
print(np.mean(data)) # 5.5 - average
print(np.median(data)) # 5.5 - middle value
print(np.std(data)) # 2.87 - standard deviation
print(np.sum(data)) # 55 - total
Use case: Quick analysis of datasets without writing loops.
Part 14: Array Reshaping
arr = np.array([1, 2, 3, 4, 5, 6])
# Reshape to 2x3
reshaped = arr.reshape(2, 3)
print(reshaped)
# [[1 2 3]
# [4 5 6]]
# Flatten back to 1D
flat = reshaped.flatten() # [1 2 3 4 5 6]
Rule: Total elements must stay the same (2×3 = 6 elements).
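A short sketch of the size rule in action: passing -1 lets NumPy infer one dimension, and a shape with the wrong total raises an error.

```python
import numpy as np

arr = np.arange(1, 7)          # [1 2 3 4 5 6]

# -1 asks NumPy to work out that dimension from the total size.
auto = arr.reshape(-1, 3)
print(auto.shape)   # (2, 3)

# 6 elements cannot fill a 4x2 grid, so reshape raises ValueError.
try:
    arr.reshape(4, 2)
except ValueError as err:
    print("Reshape failed:", err)
```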
Part 15: Boolean Indexing
data = np.array([1, 5, 3, 8, 2, 9])
# Create boolean mask
mask = data > 4 # [False True False True False True]
# Filter data
filtered = data[mask] # [5 8 9]
# One-liner
big_numbers = data[data > 4] # [5 8 9]
Power: Select elements based on conditions without loops.
Part 16: Array Concatenation
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Concatenate
combined = np.concatenate([a, b]) # [1 2 3 4 5 6]
# Stack vertically
stacked = np.vstack([a, b])
# [[1 2 3]
# [4 5 6]]
Use case: Combine datasets or results from different computations.
Part 17: Matrix Operations
# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Element-wise multiplication
element_wise = A * B # [[5 12] [21 32]]
# Matrix multiplication
matrix_mult = A @ B # [[19 22] [43 50]]
Difference: * is element-wise, @ is true matrix multiplication.
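With square matrices both operators run, which can hide the difference. A sketch with non-square shapes (chosen here purely for illustration) shows that @ needs the inner dimensions to agree, while * needs shapes that match or broadcast.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # [[0 1 2] [3 4 5]]
B = np.arange(6).reshape(3, 2)   # [[0 1] [2 3] [4 5]]

# (2, 3) @ (3, 2): the inner 3s match, and the result is (2, 2).
product = A @ B
print(product)
# [[10 13]
#  [28 40]]

# (2, 3) * (3, 2): the shapes neither match nor broadcast.
try:
    A * B
except ValueError as err:
    print("Element-wise multiply failed:", err)
```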
Part 18: Performance Comparison
import time
# Large arrays
size = 1000000
a = np.random.random(size)
b = np.random.random(size)
# Time NumPy
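start = time.perf_counter() # perf_counter is the recommended clock for short timings
result = a + b
numpy_time = time.perf_counter() - start
print(f"NumPy time: {numpy_time:.4f} seconds")
# Often tens of times faster than an equivalent pure-Python loop; the exact ratio depends on the machine.
Why faster: NumPy runs the loop in compiled C code over a contiguous buffer instead of interpreting it one Python object at a time.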
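The snippet above times only the NumPy side. For a side-by-side number, a rough sketch using the standard timeit module could look like this. The helper names add_lists and add_arrays are made up for the example, and the measured ratio will vary from machine to machine.

```python
import timeit
import numpy as np

size = 1_000_000
a = np.random.random(size)
b = np.random.random(size)
a_list = a.tolist()
b_list = b.tolist()

def add_lists():
    # Pure Python: one interpreted addition per element.
    return [x + y for x, y in zip(a_list, b_list)]

def add_arrays():
    # NumPy: a single vectorized call.
    return a + b

# Average over a few runs to smooth out noise.
runs = 3
python_time = timeit.timeit(add_lists, number=runs) / runs
numpy_time = timeit.timeit(add_arrays, number=runs) / runs

print(f"Pure Python: {python_time:.4f} s")
print(f"NumPy:       {numpy_time:.4f} s")
print(f"Speedup:     {python_time / numpy_time:.0f}x")
```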
Part 19: Common Patterns
# Generate data
x = np.linspace(0, 10, 100) # 100 points from 0 to 10
y = np.sin(x) # Sine wave
# Select values near the peaks (everything above 0.9)
peaks = y[y > 0.9]
# Normalize data
normalized = (y - np.mean(y)) / np.std(y)
Real-world: Data generation, filtering, and preprocessing.
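Thresholding returns every sample above 0.9, not the peaks themselves. If the actual local maxima are wanted, one possible approach (a sketch, not the only way) is to compare each point with its two neighbours using shifted slices.

```python
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# An interior point is a local maximum if it is larger than both neighbours.
is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])

# +1 because the comparison skipped index 0.
peak_indices = np.where(is_peak)[0] + 1

print(x[peak_indices])   # two values, close to pi/2 and 5*pi/2
print(y[peak_indices])   # both close to 1.0
```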
Part 20: Advanced Indexing
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Fancy indexing
rows = [0, 2]
cols = [1, 2]
result = arr[rows, cols] # [2 9] - elements at (0,1) and (2,2)
# Boolean indexing with conditions
mask = (arr > 3) & (arr < 8) # Multiple conditions
filtered = arr[mask] # [4 5 6 7]
Power: Extract complex patterns from data with simple syntax.
Part 21: Array Sorting
data = np.array([3, 1, 4, 1, 5, 9, 2, 6])
# Sort array
sorted_data = np.sort(data) # [1 1 2 3 4 5 6 9]
# Get sort indices (a stable sort keeps tied values in their original order)
indices = np.argsort(data, kind="stable") # [1 3 6 0 2 4 7 5]
# Sort 2D array
matrix = np.array([[3, 1], [4, 2]])
sorted_matrix = np.sort(matrix, axis=1) # Sort each row
Use case: Order data for analysis or find top/bottom values.
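For the top/bottom use case, argsort gives positions as well as values. A minimal sketch for the three largest entries:

```python
import numpy as np

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

order = np.argsort(data)       # indices from smallest to largest value
top3_idx = order[-3:][::-1]    # last three, reversed so the largest comes first

print(top3_idx)          # [5 7 4]
print(data[top3_idx])    # [9 6 5]
```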
Part 22: Working with NaN
data = np.array([1, 2, np.nan, 4, 5])
# Check for NaN
has_nan = np.isnan(data) # [False False True False False]
# Remove NaN
clean_data = data[~np.isnan(data)] # [1. 2. 4. 5.]
# NaN-aware functions
mean_ignore_nan = np.nanmean(data) # 3.0
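Real data: Missing values are common. NaN propagates through ordinary functions such as np.mean, so mask it out or use the NaN-aware variants (np.nanmean, np.nansum, np.nanstd).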
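A quick sketch of what happens with and without the NaN-aware functions, plus one way to fill the gap. Filling with the mean is just an illustrative choice here.

```python
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])

# Ordinary reductions propagate NaN.
print(np.mean(data))      # nan

# NaN-aware versions skip it.
print(np.nanmean(data))   # 3.0
print(np.nansum(data))    # 12.0

# NaN never compares equal to anything, including itself, so use np.isnan.
print(np.nan == np.nan)   # False

# Replace missing entries with the mean of the valid ones.
filled = np.where(np.isnan(data), np.nanmean(data), data)
print(filled)             # [1. 2. 3. 4. 5.]
```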
Part 23: Array Memory and Views
arr = np.array([1, 2, 3, 4, 5])
# Slicing creates a view (shares memory)
view = arr[1:4]
view[0] = 999
print(arr) # [1 999 3 4 5] - original changed!
# Copy creates new array
copy = arr.copy()
copy[0] = 777
print(arr) # [1 999 3 4 5] - original unchanged
Memory efficiency: Views save memory, copies ensure independence.
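Not every indexing operation returns a view. As a sketch, np.shares_memory can be used to check: basic slices share the buffer, while integer-list and boolean indexing return copies.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]          # basic slice
fancy = arr[[1, 2, 3]]   # integer-list (fancy) indexing
masked = arr[arr > 2]    # boolean indexing

print(np.shares_memory(arr, view))     # True
print(np.shares_memory(arr, fancy))    # False
print(np.shares_memory(arr, masked))   # False

# A view also records which array owns its data.
print(view.base is arr)                # True
```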
Part 24: Practical Example - Data Analysis
# Simulate temperature data
days = 30
temperatures = np.random.normal(25, 5, days) # Mean 25°C, std 5°C
# Analysis
avg_temp = np.mean(temperatures)
hot_days = np.sum(temperatures > 30)
cold_days = np.sum(temperatures < 20)
temp_range = np.max(temperatures) - np.min(temperatures)
print(f"Average: {avg_temp:.1f}°C")
print(f"Hot days (>30°C): {hot_days}")
print(f"Cold days (<20°C): {cold_days}")
print(f"Temperature range: {temp_range:.1f}°C")
Real application: Weather data analysis with just a few lines.
Part 25: Image Processing Example
# Create a simple "image" (2D array)
image = np.random.randint(0, 256, (100, 100)) # 100x100 grayscale
# Basic operations
bright_image = np.clip(image + 50, 0, 255) # Brighten, capped at the 8-bit maximum
dark_image = image * 0.5 # Darken (result is float64)
threshold = image > 128 # Binary threshold
# Image statistics
print(f"Average brightness: {np.mean(image):.1f}")
print(f"Bright pixels: {np.sum(image > 200)}")
Application: Images are just arrays of numbers - perfect for NumPy.
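One caveat when moving from this toy example to real images: they are usually stored as uint8, and uint8 arithmetic wraps around past 255. The sketch below uses a three-pixel array to show the wraparound and one common way to avoid it.

```python
import numpy as np

pixels = np.array([100, 200, 250], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256: 250 + 50 = 300 becomes 44.
wrapped = pixels + 50
print(wrapped)   # [150 250  44]

# Widen the type first, clip to the valid range, then convert back.
safe = np.clip(pixels.astype(np.int16) + 50, 0, 255).astype(np.uint8)
print(safe)      # [150 250 255]
```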
Part 26: Scientific Computing
# Simulate a simple physics problem
t = np.linspace(0, 10, 1000) # Time from 0 to 10 seconds (named t so it does not shadow the time module)
gravity = 9.81 # m/s²
initial_velocity = 50 # m/s
# Calculate position: y = v0*t - (1/2)*g*t^2
position = initial_velocity * t - 0.5 * gravity * t**2
# Find maximum height
max_height = np.max(position)
max_time = t[np.argmax(position)]
print(f"Maximum height: {max_height:.1f}m at {max_time:.1f}s")
Power: Solve complex scientific problems with vectorized operations.
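Because this problem has a closed-form answer, it makes a handy sanity check for the vectorized result. The sketch below uses the short names g, v0 and t (chosen for this example) and compares the sampled maximum with the textbook formulas t_peak = v0 / g and h_max = v0² / (2g).

```python
import numpy as np

g = 9.81      # m/s^2
v0 = 50.0     # m/s

t = np.linspace(0, 10, 1000)
position = v0 * t - 0.5 * g * t**2

# Maximum found on the sampled grid.
numeric_height = position.max()
numeric_time = t[position.argmax()]

# Closed form: the velocity v0 - g*t is zero at the top of the arc.
exact_time = v0 / g
exact_height = v0**2 / (2 * g)

print(f"numeric: {numeric_height:.2f} m at {numeric_time:.2f} s")
print(f"exact:   {exact_height:.2f} m at {exact_time:.2f} s")
```

The two agree to within the grid spacing; a finer linspace narrows the gap further.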
Part 27: Performance Tips
# Avoid Python loops
large_array = np.random.random(1_000_000) # Example input
# BAD:
result = []
for x in large_array:
    result.append(x**2)
# GOOD:
result = large_array**2
# Use built-in functions
# BAD:
total = 0
for x in large_array:
    total += x
# GOOD:
total = np.sum(large_array)
Golden rule: If you're writing a loop, there's probably a NumPy function for it.
Part 28: Common Mistakes
# Mistake 1: Creating arrays in loops
# BAD:
arr = np.array([])
for i in range(1000):
    arr = np.append(arr, i) # Slow! Copies the whole array on every call
# GOOD:
arr = np.arange(1000) # Fast!
# Mistake 2: Not using vectorization
# BAD:
result = np.zeros(len(arr))
for i in range(len(arr)):
    result[i] = arr[i] * 2
# GOOD:
result = arr * 2
Efficiency: Pre-allocate arrays and use vectorized operations.
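When a loop really cannot be avoided, there are two common alternatives to np.append. In this sketch the expression i % 7 is only a stand-in for a computation that has to run step by step.

```python
import numpy as np

n = 1000

# Option 1: collect results in a Python list and convert once at the end.
values = []
for i in range(n):
    values.append(i % 7)
from_list = np.array(values)

# Option 2: pre-allocate the array and fill it in place.
filled = np.empty(n, dtype=np.int64)
for i in range(n):
    filled[i] = i % 7

print(np.array_equal(from_list, filled))   # True
```

Both avoid the repeated copying that np.append causes, since neither one rebuilds the array on every iteration.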
Part 29: Next Steps
# What you can do with NumPy:
# 1. Data analysis (pandas builds on NumPy)
# 2. Machine learning (scikit-learn uses NumPy)
# 3. Image processing (OpenCV, PIL)
# 4. Scientific computing (SciPy)
# 5. Deep learning (TensorFlow, PyTorch)
# Example: Least-squares linear fit in one line (no intercept term)
X = np.random.random((100, 2))
y = np.random.random(100)
weights = np.linalg.lstsq(X, y, rcond=None)[0]
Foundation: NumPy is the base for the entire Python scientific ecosystem.
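Random targets make it hard to see whether the fit worked. As a sketch, the version below builds y from known coefficients (3, -2, and an intercept of 1, all made up for the example) and adds a column of ones so lstsq can recover the intercept too.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 2))

# Targets built from known coefficients, with no noise added.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0

# Append a column of ones so the last coefficient acts as the intercept.
X_design = np.column_stack([X, np.ones(len(X))])

coeffs, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(coeffs, 6))   # [ 3. -2.  1.]
```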
Key Takeaways
- Arrays > Lists: Faster, more memory efficient for numerical data
- Vectorization: Apply operations to entire arrays at once
- Broadcasting: Automatically handle different array shapes
- Boolean indexing: Filter data with conditions
- No loops: NumPy functions are optimized - use them!
- Shape matters: Understanding dimensions is crucial
- Memory views: Slicing shares memory, copying creates new arrays
Practice Challenge
# Create a 10x10 matrix of random numbers
# Find all numbers greater than 0.5
# Calculate their average
# Replace numbers less than 0.3 with 0
matrix = np.random.random((10, 10))
mask = matrix > 0.5
high_values = matrix[mask]
average = np.mean(high_values)
matrix[matrix < 0.3] = 0
print(f"Found {len(high_values)} values > 0.5")
print(f"Their average: {average:.3f}")
Master these concepts and you'll have a solid foundation for data science, machine learning, and scientific computing!