Linear algebra is everywhere in machine learning, even if it’s often hidden behind frameworks like PyTorch or TensorFlow. Concepts like row space, column space, rank, and nullity aren’t just abstract math — they help us understand what our model can learn and predict.
In this blog, we’ll explore these ideas using a simple house price prediction dataset.
1. The Dataset
Suppose we have a dataset of houses with 3 features:
| House | Size (sq ft) | Bedrooms | Age (years) |
|---|---|---|---|
| 1 | 1200 | 3 | 10 |
| 2 | 1500 | 4 | 5 |
| 3 | 2700 | 7 | 15 |
We can represent this as a matrix \(X\):
\[
X =
\begin{bmatrix}
1200 & 3 & 10 \\
1500 & 4 & 5 \\
2700 & 7 & 15
\end{bmatrix}
\]
Rows = houses
Columns = features
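As a minimal sketch, the same matrix can be written as a NumPy array. The variable names are illustrative, and house 3's age is taken as 15, the value for which row 3 equals row 1 plus row 2.

```python
import numpy as np

# Feature matrix: one row per house, one column per feature.
# House 3's age is taken as 15 so that row 3 = row 1 + row 2.
X = np.array([
    [1200, 3, 10],
    [1500, 4, 5],
    [2700, 7, 15],
], dtype=float)

n_houses, n_features = X.shape

size_column = X[:, 0]   # the "Size" feature across all houses
house_2 = X[1, :]       # all features of house 2
```

Slicing with `X[:, j]` gives a column (one feature), and `X[i, :]` gives a row (one house).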
2. Column Space: The Space of Predictions
The column space is the set of all linear combinations of the columns of a matrix.
In simple terms:
- Each column represents a feature across all houses.
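- The column space contains every vector \(Xw\): for a linear model with no intercept term, these are all the prediction vectors the model can produce for these houses, one per choice of weights \(w\).
For our dataset, the column space is spanned by the three feature columns, and only the linearly independent ones add to its dimension.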
For example:
- Column 1 = [1200, 1500, 2700] → “Size pattern”
- Column 2 = [3, 4, 7] → “Bedrooms pattern”
- Column 3 = [10, 5, 15] → “Age pattern”
If one column can be written as a combination of others, it does not add new information to the column space.
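To make "possible predictions" concrete, here is a small sketch assuming a linear model with no intercept, so predictions are `X @ w`. The weights, the target prices, and the helper `in_column_space` are made up for illustration. House 3's age is taken as 15 so that row 3 = row 1 + row 2.

```python
import numpy as np

# House 3's age is taken as 15 so that row 3 = row 1 + row 2.
X = np.array([
    [1200, 3, 10],
    [1500, 4, 5],
    [2700, 7, 15],
], dtype=float)

def in_column_space(X, y):
    """Return True if y is a linear combination of the columns of X."""
    augmented = np.column_stack([X, y])
    return bool(np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(X))

# Hypothetical weights: price per sq ft, per bedroom, per year of age.
w = np.array([100.0, 5000.0, -300.0])
y_hat = X @ w   # predictions are a weighted sum of the feature columns

reachable = in_column_space(X, y_hat)

# Made-up prices that no choice of weights reproduces exactly.
y_target = np.array([200000.0, 250000.0, 500000.0])
unreachable = in_column_space(X, y_target)
```

Appending a vector to `X` leaves the rank unchanged only when that vector is already in the column space, which is what the helper checks. Here `reachable` is `True` and `unreachable` is `False`.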
3. Row Space: The Feature Patterns in the Dataset
The row space is the set of all linear combinations of the rows.
- Each row represents one house (all its features).
- Row space captures all feature patterns that exist in the dataset.
Let’s look closely:
- \(r_1 = [1200, 3, 10]\)
- \(r_2 = [1500, 4, 5]\)
- \(r_3 = [2700, 7, 15]\)
Notice:
\[
r_3 = r_1 + r_2 = [1200 + 1500,\ 3 + 4,\ 10 + 5] = [2700, 7, 15]
\]
(More generally, what matters is that \(r_3\) can be expressed as some linear combination of \(r_1\) and \(r_2\).)
✅ This means:
- r3’s features (size, bedrooms, age) can be predicted exactly from r1 and r2
- Mathematically, it adds no new direction in the row space
So, in essence:
Row space = all house patterns that can be represented by combining existing houses
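In ML terms, a linear model can only learn how price varies along directions in the row space. A new house whose feature vector has a component outside the row space (a combination of size, bedrooms, and age that no mix of the training houses reproduces) may get an unreliable prediction, because the data says nothing about that direction. Size alone is not the issue: a scaled-up copy of a training house still lies in the row space, although extrapolating that far carries its own risks.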
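The dependency between the rows, and the idea of a house lying partly outside the row space, can be checked numerically. This is a sketch only: `new_house` and the helper `split_by_row_space` are hypothetical, and house 3's age is taken as 15.

```python
import numpy as np

r1 = np.array([1200.0, 3.0, 10.0])
r2 = np.array([1500.0, 4.0, 5.0])
r3 = np.array([2700.0, 7.0, 15.0])   # age 15, so r3 = r1 + r2

# r1 and r2 form a basis of the row space; stack them as columns.
B = np.column_stack([r1, r2])

# Solve c1*r1 + c2*r2 = r3 in the least-squares sense.
coeffs, *_ = np.linalg.lstsq(B, r3, rcond=None)
residual = np.linalg.norm(B @ coeffs - r3)

def split_by_row_space(house):
    """Split a feature vector into its row-space part and the remainder."""
    c, *_ = np.linalg.lstsq(B, house, rcond=None)
    inside = B @ c
    return inside, house - inside

# Hypothetical new house: 2000 sq ft, 2 bedrooms, 40 years old.
new_house = np.array([2000.0, 2.0, 40.0])
inside, outside = split_by_row_space(new_house)
```

`coeffs` comes out as approximately `[1, 1]` with a residual near zero, confirming the dependency. For the new house, `outside` is nonzero and orthogonal to both `r1` and `r2`: it is the part of the house the training data says nothing about.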
4. Rank: How Many Independent Directions Exist
- Rank = number of linearly independent rows, which always equals the number of linearly independent columns
In our example:
- r1 and r2 are independent
- r3 is dependent
So, rank = 2
Rank tells us how many independent patterns exist in the dataset.
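NumPy can compute the rank directly. The sketch below uses `np.linalg.matrix_rank` on the matrix with house 3's age equal to 15, then shows that moving that single entry away from 15 (here to 25, an arbitrary choice) breaks the dependency.

```python
import numpy as np

# House 3's age is taken as 15 so that row 3 = row 1 + row 2.
X = np.array([
    [1200, 3, 10],
    [1500, 4, 5],
    [2700, 7, 15],
], dtype=float)

rank_rows = np.linalg.matrix_rank(X)     # number of independent rows
rank_cols = np.linalg.matrix_rank(X.T)   # transposing does not change the rank

# Break the dependency: with age 25, row 3 is no longer row 1 + row 2.
X_changed = X.copy()
X_changed[2, 2] = 25
rank_changed = np.linalg.matrix_rank(X_changed)
```

Both `rank_rows` and `rank_cols` are 2, while `rank_changed` is 3. Exact dependence is fragile: one changed measurement is enough to restore full rank.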
5. Null Space: What the Model Ignores
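- Null space = all vectors \(x\) such that \(Xx = 0\)
- Intuition: combinations of feature weights that do not affect predictions, since \(X(w + x) = Xw\) for any such \(x\)
- A nonzero vector in the null space means at least one feature column can be written as a combination of the others, so the weights on those features are not uniquely determined
- Nullity = dimension of the null space = number of features − rank; here \(3 - 2 = 1\)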
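One common way to find a null-space basis is the SVD: right singular vectors whose singular values are numerically zero span the null space. The sketch below assumes house 3's age is 15 and uses the same tolerance formula as NumPy's default rank test.

```python
import numpy as np

# House 3's age is taken as 15 so that row 3 = row 1 + row 2.
X = np.array([
    [1200, 3, 10],
    [1500, 4, 5],
    [2700, 7, 15],
], dtype=float)

_, s, Vt = np.linalg.svd(X)

# Treat singular values below this threshold as zero.
tol = s.max() * max(X.shape) * np.finfo(float).eps
null_basis = Vt[s <= tol]        # one row per null-space direction
nullity = null_basis.shape[0]

n = null_basis[0]
n_scaled = -n / n[0]             # rescale so the Size entry is -1
```

For this data `nullity` is 1 and `n_scaled` is approximately `[-1, 360, 12]`, which says that Size = 360 × Bedrooms + 12 × Age holds for every house in the table. That is the redundancy the null space exposes.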
6. Why This Matters in Machine Learning
- Column space = all possible outputs the model can produce
- Row space = all feature patterns the model sees during training
- Rank = how much independent information is in the data
- Null space = directions that don’t affect predictions
Understanding these helps with:
- Detecting redundant features
- Avoiding multicollinearity in regression
- Understanding limitations of your model for prediction
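A short sketch of why this matters for regression: when the feature matrix is rank-deficient, different weight vectors give identical predictions, so the fitted weights cannot be read as "the" effect of each feature. The weights `w_true` and the noise-free prices below are made up for illustration, and house 3's age is taken as 15.

```python
import numpy as np

# House 3's age is taken as 15 so that row 3 = row 1 + row 2.
X = np.array([
    [1200, 3, 10],
    [1500, 4, 5],
    [2700, 7, 15],
], dtype=float)

w_true = np.array([100.0, 5000.0, -300.0])   # hypothetical "true" weights
y = X @ w_true                               # made-up, noise-free prices

# A very large condition number signals (near-)dependent columns.
cond = np.linalg.cond(X)

# lstsq copes with the rank deficiency and returns minimum-norm weights.
w_fit, _, rank_used, _ = np.linalg.lstsq(X, y, rcond=None)

same_predictions = np.allclose(X @ w_fit, y)
same_weights = np.allclose(w_fit, w_true)
```

Here `same_predictions` is `True` but `same_weights` is `False`: the fit reproduces the prices exactly with weights that differ from `w_true` by a null-space vector. Dropping a redundant feature or adding regularization are common remedies.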
7. Summary Table
| Concept | Intuition (House Price) |
|---|---|
| Column space | All possible patterns of predictions using features |
| Row space | All feature patterns that exist in the dataset |
| Rank | Number of independent feature/house patterns |
| Null space | Feature combinations that don’t influence prediction |
8. Key Takeaways
- A dependent row like r3 adds no new direction to the row space, but we usually keep it in training: with noisy prices, every extra example still helps stabilize the fit
- Column space tells you what predictions are possible
- Row space tells you what the model can learn from the dataset
By looking at row space and column space, you can visualize:
- What your dataset contains
- What your model can learn
- Which features or houses are redundant
Understanding these concepts helps make linear regression and deep learning models more interpretable.