Lecture 7: Principal Component Analysis (PCA) — Intro, Example & ML Applications
📘 Machine Learning Concept — PCA

Principal Component Analysis (PCA)

Reduce dimensionality by finding new, orthogonal axes that capture the greatest variance in your data. Useful for visualization, compression, denoising, and as a preprocessing step for many ML models.

Intuition & Math

Idea. PCA rotates your data into new coordinates (principal components) so that the first axis captures as much spread (variance) as possible, the second the next most, and so on. Components are orthogonal.

Steps.

  1. Standardize features (optional but common): subtract mean and divide by standard deviation.
  2. Compute the covariance matrix Σ = (1/(n−1)) XᵀX for mean-centered data X.
  3. Find eigenvalues/eigenvectors of Σ. Eigenvectors are principal directions; eigenvalues are the variance explained.
  4. Sort by descending eigenvalue and project: Z = X · W where columns of W are top-k eigenvectors.
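
The four steps above can be sketched in NumPy (the data values here are illustrative, not from the lecture):

```python
import numpy as np

# Illustrative data: 5 samples, 3 features
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0],
              [3.0, 2.0, 0.0],
              [4.0, 1.0, 2.0],
              [1.0, 3.0, 1.0]])

# Step 1: mean-center (standardizing is optional; here we only center)
Xc = X - X.mean(axis=0)

# Step 2: covariance matrix, (1/(n-1)) * Xc^T Xc
n = Xc.shape[0]
Sigma = Xc.T @ Xc / (n - 1)

# Step 3: eigen-decomposition (eigh, since Sigma is symmetric)
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Step 4: sort descending and project onto the top-k eigenvectors
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 2
W = eigvecs[:, :k]   # columns are the top-k principal directions
Z = Xc @ W           # reduced-dimension data, shape (5, 2)
```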

Explained variance ratio. For eigenvalues λ₁, λ₂, …, the explained variance ratio of component i is EVRᵢ = λᵢ / Σⱼ λⱼ. Choose k such that the cumulative EVR reaches your target (e.g., 95%).
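
Choosing k from the cumulative EVR can be sketched as follows (the eigenvalue spectrum here is made up for illustration):

```python
import numpy as np

# Hypothetical eigenvalue spectrum, already sorted descending
eigvals = np.array([4.2, 2.1, 0.9, 0.5, 0.2, 0.1])

evr = eigvals / eigvals.sum()            # explained variance ratio per component
cum = np.cumsum(evr)                     # cumulative EVR
k = int(np.searchsorted(cum, 0.95)) + 1  # smallest k with cumulative EVR >= 95%
```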

Note: PCA is linear; it captures linear structure. For curved manifolds, consider Kernel PCA, t‑SNE, or UMAP.

Worked Example (by hand, conceptually)

Suppose we have points in 2D: (2, 0), (0, 1), (3, 2), (4, 1).

  1. Mean-center → subtract the mean of each column.
  2. Compute covariance matrix Σ.
  3. Compute eigenpairs of Σ (for a 2×2 matrix there is a closed form); the eigenvector with the larger eigenvalue is PC1.
  4. Project onto PC1 to get a 1D summary; PC1 captures the trend with maximal variance.
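
The worked example can be checked numerically; this sketch follows the four steps on the same four points:

```python
import numpy as np

# The four 2D points from the worked example
X = np.array([[2.0, 0.0], [0.0, 1.0], [3.0, 2.0], [4.0, 1.0]])

Xc = X - X.mean(axis=0)                   # 1. mean-center
Sigma = Xc.T @ Xc / (len(X) - 1)          # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)  # 3. eigenpairs (eigh returns ascending order)
pc1 = eigvecs[:, -1]                      #    eigenvector with the larger eigenvalue
z = Xc @ pc1                              # 4. 1D projection onto PC1
```

The variance of the 1D projection z equals the larger eigenvalue, which is exactly the "maximal variance" property of PC1.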

Common Applications in Machine Learning

  - Exploratory visualization: plot high‑D data in 2D/3D to see clusters/outliers before modeling.
  - Noise reduction / denoising: keep the top components; drop low‑variance components dominated by noise.
  - Feature compression: reduce dimensionality to speed up models (e.g., linear models, k‑NN).
  - Preprocessing for clustering: run PCA before k‑means to remove correlated dimensions.
  - Image compression: treat pixels as features; PCA approximates images with fewer components.
  - Genomics / NLP: summarize thousands of features (genes, token counts) into compact factors.
  - Anomaly detection: model normal patterns in the top PCs; anomalies have a high residual in the discarded PCs.
  - Recommendation systems: PCA/SVD on user–item matrices uncovers latent preference factors.

When to avoid: when features are on very different scales and have not been standardized, when relationships are highly nonlinear, or when interpretability of the original features is critical.

Lecture 7: Principal Component Analysis (PCA)

1. Introduction

Principal Component Analysis (PCA) is one of the most widely used techniques in machine learning and data science for dimensionality reduction. It transforms a dataset with possibly correlated features into a new set of uncorrelated features called principal components.

2. Key Concepts

  - Variance: the spread of the data along a direction; PCA seeks directions of maximal variance.
  - Covariance matrix: pairwise covariances among features; its eigenstructure defines the components.
  - Eigenvectors and eigenvalues: the principal directions and the variance each one captures.
  - Principal components: uncorrelated linear combinations of the original features, ordered by explained variance.

3. Steps in PCA

  1. Standardize the dataset (mean = 0, variance = 1).
  2. Compute the covariance matrix.
  3. Perform eigenvalue decomposition (or SVD) on the covariance matrix.
  4. Select the top k eigenvectors corresponding to the largest eigenvalues.
  5. Project the original data onto these new axes to obtain reduced-dimension data.
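
Step 3 notes that SVD can replace the eigen-decomposition. A sketch of the SVD route on made-up correlated data (the singular values sᵢ of centered data relate to covariance eigenvalues via λᵢ = sᵢ²/(n−1)):

```python
import numpy as np

# Illustrative data: 6 samples, 4 features, with correlation injected
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X[:, 1] += X[:, 0]                 # make feature 1 correlate with feature 0

Xc = X - X.mean(axis=0)            # center; SVD of Xc replaces eigendecomposing Sigma
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

eigvals = S**2 / (len(X) - 1)      # singular values -> covariance eigenvalues
k = 2
W = Vt[:k].T                       # top-k principal directions (rows of Vt)
Z = Xc @ W                         # projected, reduced-dimension data
```

SVD avoids forming the covariance matrix explicitly, which is numerically preferable for wide or ill-conditioned data.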

4. Example (2D Dataset)

Suppose we have data points in 2D space with correlation between X and Y. PCA finds a new axis (first principal component) that captures most variance. The second principal component is orthogonal to the first and captures remaining variance.

Dataset: X = [2.5, 0.5, 2.2, 1.9]
         Y = [2.4, 0.7, 2.9, 2.2]

Step 1: Compute covariance matrix
Step 2: Find eigenvalues (λ1, λ2) and eigenvectors (v1, v2)
Step 3: Choose v1 (corresponding to largest λ1) as first principal component
Step 4: Transform data into new coordinates
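
Steps 1–4 can be verified on this dataset with a short NumPy sketch:

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9])
y = np.array([2.4, 0.7, 2.9, 2.2])
X = np.column_stack([x, y])

Xc = X - X.mean(axis=0)                   # center the data
Sigma = np.cov(Xc, rowvar=False)          # Step 1: covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)  # Step 2: eigenpairs (ascending order)
v1 = eigvecs[:, -1]                       # Step 3: PC1 = vector for the largest eigenvalue
Z = Xc @ eigvecs[:, ::-1]                 # Step 4: new coordinates (PC1, PC2)
```

The transformed coordinates are uncorrelated, and the variance along PC1 equals the largest eigenvalue.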
    

5. Applications in Machine Learning

PCA appears throughout ML: exploratory visualization, denoising, feature compression, preprocessing for clustering, image compression, anomaly detection, and latent-factor models for recommendation systems, as listed in the overview above.

6. PCA Calculator (Simple Simulation)

The calculator accepts comma-separated X and Y values and applies the steps above: center the data, compute the covariance matrix, find the eigenpairs, and project onto the principal components.
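
A plain-Python sketch of what such a calculator might do (the function name pca_2d and its interface are illustrative, not part of the page):

```python
import numpy as np

def pca_2d(x_csv: str, y_csv: str):
    """Parse comma-separated X and Y values and run PCA on the 2D points."""
    x = np.array([float(v) for v in x_csv.split(",")])
    y = np.array([float(v) for v in y_csv.split(",")])
    X = np.column_stack([x, y])
    Xc = X - X.mean(axis=0)                   # center the data
    Sigma = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenpairs, ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort descending
    return eigvals[order], eigvecs[:, order], Xc @ eigvecs[:, order]

# Same numbers as the example dataset in Section 4
vals, vecs, Z = pca_2d("2.5, 0.5, 2.2, 1.9", "2.4, 0.7, 2.9, 2.2")
```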