Python has become a leading choice for data science, numerical computing, and exploratory analysis. At the heart of this ecosystem are two foundational libraries:
NumPy, which provides high-performance arrays and mathematical operations
SciPy, which extends NumPy with advanced statistical, scientific, and analytical tools
In this article, we’ll walk through how NumPy and SciPy can be used for statistical analysis — starting with array creation and manipulation, and progressing to key descriptive statistics.
A Quick Overview of NumPy and SciPy
✅ NumPy (Numerical Python)
NumPy provides:
Multidimensional array objects
Fast mathematical and logical operations
Vectorized computations that run significantly faster than pure Python
Efficient memory usage compared to lists
A NumPy array can replace a list in most numerical tasks, while typically being faster, more memory-efficient, and easier to work with at scale.
✅ SciPy (Scientific Python)
SciPy builds on NumPy by providing:
Probability distributions
Statistical tests
Optimization
Signal processing
Linear algebra
Interpolation
Together, NumPy + SciPy form the foundation of scientific computing in Python.
Installing NumPy
You can install NumPy in two ways:
✅ Using pip
pip install numpy
✅ Using Anaconda (recommended for data science)
The Anaconda distribution ships with NumPy (and SciPy) preinstalled.
Creating Arrays in NumPy
Let’s start by importing NumPy:
import numpy as np
✅ Creating a 5×5 matrix
a = np.arange(25).reshape(5,5)
print(a)
np.arange(25) creates the integers 0 through 24, which reshape(5, 5) then arranges into a 5×5 matrix.
✅ Checking data type
print(a.dtype)
The default integer type is platform-dependent: on most 64-bit Linux and macOS systems it is int64, while on Windows it was int32 before NumPy 2.0.
✅ Number of elements
a.size
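As a quick check of the attributes above, here is a minimal sketch that prints the shape, dimensionality, size, and dtype of the 5×5 matrix, and requests a specific integer width instead of relying on the platform default. The name a32 is only illustrative.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

print(a.shape)   # (5, 5)
print(a.ndim)    # 2
print(a.size)    # 25
print(a.dtype)   # default integer type, which depends on the platform

# Request an explicit width when the exact dtype matters
a32 = np.arange(25, dtype=np.int32).reshape(5, 5)
print(a32.dtype)     # int32
print(a32.itemsize)  # 4 (bytes per element)
```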
Basic Array Creation
✅ 1D array
arr = np.array([1, 2, 3, 4, 5])
✅ 2D array
b = np.array([[1, 2],[3, 4]])
✅ 5D array
You can also create arrays of higher dimension, though they are less common in statistical analysis.
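One way to build a higher-dimensional array is the ndmin argument of np.array, which pads the shape with leading axes of length 1. The sketch below uses it to get a 5D array and, for contrast, builds a 3D array directly from a shape tuple. The names arr5 and cube are illustrative.

```python
import numpy as np

# ndmin pads the shape with leading axes of length 1
arr5 = np.array([1, 2, 3, 4], ndmin=5)
print(arr5.ndim)    # 5
print(arr5.shape)   # (1, 1, 1, 1, 4)

# A 3D array: 2 blocks, each with 3 rows and 4 columns
cube = np.zeros((2, 3, 4))
print(cube.ndim)    # 3
print(cube.size)    # 24
```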
Basic Operations with NumPy
Let’s define two simple arrays:
a = np.array([1,2,3])
b = np.array([4,5,6])
NumPy supports vectorized operations:
a - b
a * b
a ** 2 # squaring
a > 2
b < 4
These operations run element-wise and are usually much faster than equivalent Python loops, because the looping happens in compiled code rather than in the interpreter.
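To make the element-wise behaviour concrete, the sketch below prints the result of each operation on the two arrays and compares the product with an equivalent pure-Python loop. The name loop_product is illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)    # [5 7 9]
print(a - b)    # [-3 -3 -3]
print(a * b)    # [ 4 10 18]
print(a ** 2)   # [1 4 9]
print(a > 2)    # [False False  True]
print(b < 4)    # [False False False]

# Equivalent pure-Python loop for the product, for comparison
loop_product = [x * y for x, y in zip(a.tolist(), b.tolist())]
print(loop_product)   # [4, 10, 18]
```

Comparisons return boolean arrays, which can also be used as masks: a[a > 2] selects only the elements greater than 2.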
Indexing and Slicing in NumPy
Consider the earlier 5×5 matrix:
a = np.arange(25).reshape(5,5)
✅ Slice the first row
a[0, :]
✅ Slice the first column
a[:, 0]
✅ Extract a specific element (2nd row, 3rd column)
a[1, 2]
Remember: NumPy uses zero-based indexing.
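The following sketch runs the slices above on the 5×5 matrix and shows what each one returns. The 2×2 block slice and the variable names are added for illustration.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

first_row = a[0, :]   # every column of row 0
first_col = a[:, 0]   # every row of column 0
element = a[1, 2]     # 2nd row, 3rd column
block = a[:2, :2]     # top-left 2×2 block

print(first_row)  # [0 1 2 3 4]
print(first_col)  # [ 0  5 10 15 20]
print(element)    # 7
print(block.shape)  # (2, 2)
```

Basic slices like these are views onto the original data, so writing to first_row would also modify a. Call .copy() when an independent array is needed.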
Stacking Arrays
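NumPy allows you to join arrays. Since a was reassigned to the 5×5 matrix above, redefine the two 1D arrays first:
a = np.array([1,2,3])
b = np.array([4,5,6])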
✅ Vertical stacking (row-wise)
np.vstack((a, b))
✅ Horizontal stacking (column-wise)
np.hstack((a, b))
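Arrays must have compatible shapes to stack: vstack requires the arrays to match along every axis except the first, and hstack on 2D arrays requires matching row counts.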
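Using the two 1D arrays [1, 2, 3] and [4, 5, 6] from the Basic Operations section, this sketch shows the shapes that each stacking function produces and what happens when the shapes do not match. The names v and h are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack((a, b))   # each 1D array becomes one row
h = np.hstack((a, b))   # 1D arrays are joined end to end

print(v.shape)  # (2, 3)
print(v)
print(h.shape)  # (6,)
print(h)        # [1 2 3 4 5 6]

# Arrays of different lengths cannot be stacked vertically
try:
    np.vstack((a, np.array([7, 8])))
except ValueError as err:
    print("vstack failed:", err)
```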
Descriptive Statistics with NumPy and SciPy
Descriptive statistics summarize and describe a dataset, forming the foundation of any statistical analysis.
We’ll use a 7×4 array for examples:
a = np.random.randint(1, 10, (7,4))
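Because the array is random, its values change on every run. If reproducible results are wanted, one option is NumPy's seeded Generator API, sketched below. The seed value 42 is an arbitrary assumption, and note that the upper bound is exclusive here, just as in np.random.randint, so the values fall between 1 and 9.

```python
import numpy as np

# Seeded generator so the same 7×4 array appears on every run
rng = np.random.default_rng(seed=42)   # the seed value is arbitrary
a = rng.integers(1, 10, size=(7, 4))   # upper bound 10 is exclusive

print(a.shape)        # (7, 4)
print(a.min() >= 1)   # True
print(a.max() <= 9)   # True
```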
Mean
Mean (average) is computed using:
np.mean(a)
✅ Mean by column
np.mean(a, axis=0)
✅ Mean by row
np.mean(a, axis=1)
Mean is widely used but sensitive to outliers.
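Since the 7×4 array above is random, the sketch below uses small fixed arrays (illustrative data, not from the article) so the effect of the axis argument and of a single outlier is easy to verify by hand.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.mean(scores))          # 3.5
print(np.mean(scores, axis=0))  # [2.5 3.5 4.5]  (one mean per column)
print(np.mean(scores, axis=1))  # [2. 5.]        (one mean per row)

# One extreme value pulls the mean far from the typical value
values = np.array([10, 12, 11, 13, 500])
print(np.mean(values))          # 109.2
```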
Median
Median represents the middle value when data is sorted.
np.median(a)
✅ Median by rows or columns
np.median(a, axis=0)
Median is preferred over mean when data contains extreme values.
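A small sketch of why the median is more robust, using made-up values with one extreme entry. It also shows that for an even number of values NumPy averages the two middle ones. The variable names and data are illustrative.

```python
import numpy as np

salaries = np.array([30, 32, 35, 38, 400])   # one extreme value
print(np.median(salaries))  # 35.0
print(np.mean(salaries))    # 107.0

# With an even count, the median is the average of the two middle values
even = np.array([1, 2, 3, 4])
print(np.median(even))      # 2.5
```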
Mode
Mode is available through SciPy.
from scipy import stats
stats.mode(a, axis=0)
Mode is useful for categorical or discrete values.
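stats.mode returns a result object with mode and count fields. The sketch below reads both from a small illustrative array and cross-checks the answer with a NumPy-only count. The np.ravel calls are a defensive assumption, since the shape of the returned fields has differed between SciPy versions.

```python
import numpy as np
from scipy import stats

data = np.array([2, 3, 3, 5, 3, 2, 7])

result = stats.mode(data)
# np.ravel copes with both scalar and length-1 array results
mode_value = int(np.ravel(result.mode)[0])
mode_count = int(np.ravel(result.count)[0])
print(mode_value, mode_count)   # 3 3

# NumPy-only cross-check using value counts
values, counts = np.unique(data, return_counts=True)
numpy_mode = int(values[np.argmax(counts)])
print(numpy_mode)               # 3
```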
Range
Range = max − min
Using NumPy:
np.ptp(a) # ptp = peak-to-peak
np.ptp(a, axis=0)
Range is easy to compute but sensitive to outliers and gives no information about internal distribution.
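The sketch below applies np.ptp to a small fixed array (illustrative data) so the results can be checked by hand against max − min.

```python
import numpy as np

a = np.array([[3, 7, 1],
              [9, 4, 6]])

print(np.ptp(a))           # 8  (9 - 1 over the whole array)
print(np.ptp(a, axis=0))   # [6 3 5]  (range of each column)
print(np.ptp(a, axis=1))   # [6 5]    (range of each row)
print(a.max() - a.min())   # 8, the same result written out
```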
Variance
Variance measures the spread of data around the mean.
np.var(a)
np.var(a, axis=0)
Standard Deviation
Standard deviation is simply the square root of the variance:
np.std(a)
np.std(a, axis=0)
It is widely used in finance, forecasting, simulations, and probability.
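One detail worth knowing: np.var and np.std compute the population versions by default (ddof=0). Passing ddof=1 gives the sample versions, which divide by n − 1. The sketch below uses a small illustrative dataset where the numbers come out round.

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9])

pop_var = np.var(x)              # population variance (ddof=0)
pop_std = np.std(x)              # population standard deviation
sample_var = np.var(x, ddof=1)   # sample variance (divides by n - 1)

print(pop_var)   # 4.0
print(pop_std)   # 2.0
print(np.isclose(pop_std, np.sqrt(pop_var)))  # True
print(sample_var > pop_var)                   # True
```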
Interquartile Range (IQR)
IQR = Q3 − Q1
from scipy.stats import iqr
iqr(a, axis=0, interpolation='linear')
IQR is critical for detecting outliers (boxplot whiskers are based on IQR).
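As a sketch of IQR-based outlier detection, the example below flags values lying more than 1.5 × IQR beyond the quartiles. The 1.5 multiplier is the common boxplot convention and is an assumption here, not something the article prescribes. The data is illustrative.

```python
import numpy as np
from scipy.stats import iqr

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, q3 = np.percentile(data, [25, 75])
spread = iqr(data)   # equals q3 - q1

# Fences at 1.5 × IQR beyond the quartiles (conventional choice)
lower = q1 - 1.5 * spread
upper = q3 + 1.5 * spread
outliers = data[(data < lower) | (data > upper)]

print(q1, q3, spread)   # 3.25 7.75 4.5
print(lower, upper)     # -3.5 14.5
print(outliers)         # [100]
```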
Skewness
Skewness describes the asymmetry of a distribution.
from scipy.stats import skew
skew(a, axis=0)
Positive skew → long right tail
Negative skew → long left tail
Skewness helps determine where most values lie relative to the average.
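To illustrate the sign convention, the sketch below computes skewness for three small illustrative datasets: a symmetric one, one with a long right tail, and its mirror image. It also compares mean and median for the right-tailed case.

```python
import numpy as np
from scipy.stats import skew

symmetric = np.array([1, 2, 3, 4, 5])
right_tail = np.array([1, 2, 3, 4, 100])
left_tail = -right_tail   # mirror image, so the tail points left

print(np.isclose(skew(symmetric), 0.0))  # True
print(skew(right_tail) > 0)              # True
print(skew(left_tail) < 0)               # True

# With a long right tail, the mean sits above the median
print(np.mean(right_tail), np.median(right_tail))  # 22.0 3.0
```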
Conclusion
NumPy and SciPy together provide a powerful, efficient, and intuitive way to perform statistical analysis in Python.
While descriptive statistics help summarize data, they cannot be used to generalize findings to a broader population. For that, inferential statistics — such as hypothesis testing, confidence intervals, or regression — are required.
SciPy, built on top of NumPy, supports many of these advanced techniques, making the pair an essential part of every data scientist’s toolkit.