Sharpening the Axe: Performing Principal Component Analysis (PCA) in R for Modern Machine Learning

“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”
— attributed to Abraham Lincoln

This quote resonates strongly with modern machine learning and data science. In real-world projects, the majority of time is not spent on modeling, but on data preprocessing, feature engineering, and dimensionality reduction.

One of the most powerful and widely used dimensionality reduction techniques is Principal Component Analysis (PCA). PCA helps us transform high-dimensional data into a smaller, more informative feature space—often improving model performance, interpretability, and computational efficiency.

In this article, you will learn the conceptual foundations of PCA and how to implement PCA in R using modern, industry-standard practices.

Table of Contents

Lifting the Curse with Principal Component Analysis

Curse of Dimensionality in Simple Terms

Shlens’ Perspective on PCA

PCA: Conceptual Background

Implementing PCA in R (Modern Approach)

Loading and Preparing the Iris Dataset

Scaling the Data (Industry Standard)

Covariance Matrix and Eigen Decomposition

Performing PCA with prcomp()

Understanding PCA Outputs

Variance Explained

Loadings and Scores

Scree Plot and Biplot

PCA in a Modeling Workflow (Naive Bayes Example)

Summary and Practical Takeaways

1. Lifting the Curse with Principal Component Analysis

A common myth in analytics is:

“More features and more data will always improve model accuracy.”

In practice, this is often false.

When the number of features grows faster than the number of observations, models become unstable, harder to generalize, and prone to overfitting. This phenomenon is known as the curse of dimensionality.

PCA helps address this issue by reducing the dimensionality of data while preserving most of its informational content.

2. Curse of Dimensionality in Simple Terms

In layman’s language, the curse of dimensionality means:

Adding more features can decrease model accuracy

The volume of the feature space grows exponentially, so a fixed number of observations becomes sparse

Distance-based and probabilistic models degrade rapidly

There are two general ways to mitigate this:

Collect more data (often expensive or impossible)

Reduce the number of features (preferred and practical)

Dimensionality reduction techniques like PCA fall into the second category.
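
To make this concrete, here is a minimal sketch (purely illustrative, using uniform random data) of the distance-concentration effect: as the number of dimensions grows, the gap between the nearest and farthest pair of points shrinks relative to the distances themselves, which is exactly what degrades distance-based models.

set.seed(42)
for (d in c(2, 10, 100, 1000)) {
  X <- matrix(runif(100 * d), nrow = 100)  # 100 random points in d dimensions
  dists <- dist(X)                         # all pairwise Euclidean distances
  cat(sprintf("d = %4d   (max - min) / min = %.2f\n",
              d, (max(dists) - min(dists)) / min(dists)))
}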

3. Shlens’ Perspective on PCA

In his well-known paper, “A Tutorial on Principal Component Analysis,” Jonathon Shlens describes PCA using a simple physical analogy: observing the motion of a ball attached to a spring.

The ball oscillates along a single direction, but if we don’t know that direction in advance, we may need several cameras (features) to capture its motion. PCA helps us rotate the coordinate system so that we capture the motion with fewer, orthogonal views.

In essence, PCA:

Transforms correlated variables into uncorrelated (orthogonal) components

Orders these components by variance explained

Allows us to retain only the most informative components

4. PCA: Conceptual Background

Assume a dataset with:

m observations

n features

This can be represented as an m × n matrix A.

PCA transforms A into a new matrix A′ of size m × k (with k < n) by projecting the rows of A onto the top k eigenvectors of its covariance matrix.

Key ideas:

PCA relies on the eigenvectors and eigenvalues of the data’s covariance (or correlation) matrix

Eigenvectors define new axes (principal components)

Eigenvalues represent variance captured along those axes

Components are orthogonal and uncorrelated

Why Scaling Matters

PCA is scale-sensitive. Variables with larger units dominate variance.

Modern best practice:

Always standardize features unless units are naturally comparable

Perform PCA on the correlation matrix, not raw covariance, for most ML tasks
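
A quick, hypothetical two-variable example makes the point: when one variable lives on a much larger scale, covariance-based PCA assigns it almost all of the variance, while correlation-based (scaled) PCA treats the variables fairly.

set.seed(1)
x_small <- rnorm(100)             # unit-scale variable
x_large <- rnorm(100, sd = 1000)  # unrelated variable with much larger units

raw_pca    <- prcomp(cbind(x_small, x_large))                 # covariance PCA
scaled_pca <- prcomp(cbind(x_small, x_large), scale. = TRUE)  # correlation PCA

summary(raw_pca)$importance[2, ]     # PC1 ~ 100%: x_large dominates purely by units
summary(scaled_pca)$importance[2, ]  # roughly 50/50: units no longer matter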

5. Implementing PCA in R (Modern Approach)

Loading and Preparing the Iris Dataset

# Load the four numeric features (drop the Species column)
data_iris <- iris[, 1:4]

The Iris dataset contains:

150 observations

4 numeric features

3 species (target variable)

Scaling the Data (Industry Standard)

# Center each feature to mean 0 and scale to standard deviation 1
data_scaled <- scale(data_iris)

Covariance Matrix and Eigen Decomposition

cov_data <- cov(data_scaled)   # equals the correlation matrix of the raw features
eigen_data <- eigen(cov_data)  # eigenvectors = principal axes, eigenvalues = variances

Eigenvalues indicate variance explained by each component.
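
From these eigenvalues we can compute each component’s share of the variance, and we can project the standardized data onto the first k eigenvectors by hand. This is essentially what prcomp() automates below.

# Proportion of variance explained by each component
eigen_data$values / sum(eigen_data$values)

# Manual projection onto the first k = 2 principal axes
k <- 2
scores_manual <- data_scaled %*% eigen_data$vectors[, 1:k]
head(scores_manual)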

Performing PCA with prcomp()

Why prcomp()?
prcomp() is now preferred over princomp() because it:

Uses singular value decomposition (SVD)

Is numerically more stable

Works better for high-dimensional data

pca_data <- prcomp(data_iris, scale. = TRUE)  # scale. = TRUE centers and standardizes internally
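
As a sanity check, the prcomp() output matches the manual eigen decomposition above, up to arbitrary sign flips in the eigenvectors:

pca_data$sdev^2                                              # equals eigen_data$values
round(abs(pca_data$rotation) - abs(eigen_data$vectors), 10)  # ~0 everywhere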

6. Understanding PCA Outputs

Variance Explained

summary(pca_data)

Example output:

PC1 explains ~73% of the variance

PC2 explains ~23% of the variance

Together, the first two components explain ~96% of the variance

This means we can reduce 4 features → 2 components with minimal information loss.
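
Rather than reading the summary table by eye, we can extract the proportions programmatically, which is handy when choosing the number of components in an automated pipeline:

var_explained <- pca_data$sdev^2 / sum(pca_data$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components that retains at least 95% of the variance
which(cum_var >= 0.95)[1]   # 2 for the scaled iris data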

Loadings (Feature Contributions)

pca_data$rotation

Loadings show how original features contribute to each principal component.
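
Two quick checks help here: rounding the loadings makes the structure easier to scan, and the correlation matrix of the scores (stored in pca_data$x) confirms that the components really are uncorrelated.

round(pca_data$rotation, 2)   # rounded loadings are easier to read

# Scores live in pca_data$x; their correlations are numerically zero
round(cor(pca_data$x), 10)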

Visualizations

Scree Plot

screeplot(pca_data, type = "lines")

The “elbow” typically indicates the optimal number of components.

Biplot

biplot(pca_data, scale = 0)

The biplot reveals:

Feature directions (the loading arrows)

How observations spread along the first two components

Approximate correlations between variables (small angles between arrows suggest strong correlation)

7. PCA in a Modeling Workflow (Naive Bayes Example)

Baseline Model (All Features)

library(e1071)  # provides naiveBayes()

# Baseline: all four original features, predictions on the training data
model_full <- naiveBayes(iris[, 1:4], iris[, 5])
pred_full <- predict(model_full, iris[, 1:4])

table(pred_full, iris[, 5])  # confusion matrix

Model Using First Principal Component

# Keep only the first component's scores; drop = FALSE preserves the matrix shape
pc_scores <- pca_data$x[, 1, drop = FALSE]

model_pca <- naiveBayes(pc_scores, iris[, 5])
pred_pca <- predict(model_pca, pc_scores)

table(pred_pca, iris[, 5])  # confusion matrix
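
To quantify the tradeoff, compare the two models’ accuracy. Note that these are resubstitution (training-set) figures; honest estimates require a held-out set or cross-validation, as discussed below.

mean(pred_full == iris[, 5])  # accuracy with all four features
mean(pred_pca == iris[, 5])   # accuracy with the first component only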

Result

Slight reduction in accuracy

75% reduction in feature space

Faster training and simpler model

This tradeoff is often acceptable—and desirable—in production systems.

8. Summary and Practical Takeaways

PCA remains one of the most important tools in modern data science.

Strengths

Effective dimensionality reduction

Removes multicollinearity

Often improves model stability and computational performance

Widely used in image processing, genomics, NLP, and finance

Limitations

Sensitive to scaling

Components may lack business interpretability

Captures only linear relationships

Relies only on means and variances (second-order statistics), so it can miss higher-order structure

Best Practices (2025+)

Always scale features

Use prcomp() instead of princomp()

Combine PCA with cross-validation

Fit PCA inside the modeling pipeline, on training data only, never before data splitting (see the sketch below)
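
Here is a minimal base-R sketch of that last point, using only stats and e1071 (the 70/30 split and seed are arbitrary choices for illustration): fit PCA on the training data alone, then push the test set through the same rotation with predict() before scoring the model.

set.seed(123)
train_idx <- sample(nrow(iris), 0.7 * nrow(iris))  # arbitrary 70/30 split

# Fit PCA on the training rows ONLY
pca_train <- prcomp(iris[train_idx, 1:4], scale. = TRUE)

# Project both sets through the training-set centering, scaling, and rotation
train_scores <- pca_train$x[, 1:2]
test_scores  <- predict(pca_train, iris[-train_idx, 1:4])[, 1:2]

# Train and evaluate on properly separated data
model <- naiveBayes(train_scores, iris[train_idx, 5])
mean(predict(model, test_scores) == iris[-train_idx, 5])  # held-out accuracy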

Final Thoughts

PCA is not just a mathematical trick—it is a practical engineering tool. When used thoughtfully, it allows you to build simpler, faster, and more robust machine learning systems without sacrificing accuracy.

Just like sharpening the axe, investing time in feature engineering and dimensionality reduction pays off exponentially.

Our mission is “to enable businesses to unlock value in data.” We pursue many activities toward that goal; helping you solve tough problems is just one of them. For over 20 years, we’ve partnered with more than 100 clients, from Fortune 500 companies to mid-sized firms, to solve complex data analytics challenges. Our services include Power BI consulting and development: turning raw data into strategic insight.
