Lecture 7: Principal Component Analysis (PCA) — Intro, Example & ML Applications
📘 Machine Learning Concept — PCA

Principal Component Analysis (PCA)

Reduce dimensionality by finding new, orthogonal axes that capture the greatest variance in your data. Useful for visualization, compression, denoising, and as a preprocessing step for many ML models.

Intuition & Math

Idea. PCA rotates your data into new coordinates (principal components) so that the first axis captures as much spread (variance) as possible, the second the next most, and so on. Components are orthogonal.

Steps.

  1. Standardize features (optional but common): subtract mean and divide by standard deviation.
  2. Compute the covariance matrix Σ = (1/(n−1)) XᵀX for mean-centered data X.
  3. Find eigenvalues/eigenvectors of Σ. Eigenvectors are principal directions; eigenvalues are the variance explained.
  4. Sort by descending eigenvalue and project: Z = X · W where columns of W are top-k eigenvectors.
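
The four steps above can be sketched in NumPy (the data values here are illustrative, not from the lecture):

```python
import numpy as np

# Illustrative data: 5 samples, 3 features
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0],
              [3.0, 2.0, 0.0],
              [4.0, 1.0, 2.0],
              [1.0, 3.0, 1.0]])

# Step 1: mean-center (standardizing is optional; here we only center)
Xc = X - X.mean(axis=0)

# Step 2: covariance matrix, (1/(n-1)) * Xc^T Xc
n = Xc.shape[0]
Sigma = Xc.T @ Xc / (n - 1)

# Step 3: eigen-decomposition (eigh, since Sigma is symmetric)
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Step 4: sort descending and project onto the top-k eigenvectors
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 2
W = eigvecs[:, :k]   # columns are the top-k principal directions
Z = Xc @ W           # reduced-dimension data, shape (5, 2)
```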

Explained variance ratio. For eigenvalues λ₁, λ₂, …, the explained variance ratio of component i is EVRᵢ = λᵢ / Σⱼ λⱼ. Choose k such that the cumulative EVR reaches your target (e.g., 95%).
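
Choosing k from the cumulative EVR can be sketched as follows (the eigenvalue spectrum here is made up for illustration):

```python
import numpy as np

# Hypothetical eigenvalue spectrum, already sorted descending
eigvals = np.array([4.2, 2.1, 0.9, 0.5, 0.2, 0.1])

evr = eigvals / eigvals.sum()            # explained variance ratio per component
cum = np.cumsum(evr)                     # cumulative EVR
k = int(np.searchsorted(cum, 0.95)) + 1  # smallest k with cumulative EVR >= 95%
```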

Note: PCA is linear; it captures linear structure. For curved manifolds, consider Kernel PCA, t‑SNE, or UMAP.

Worked Example (by hand, conceptually)

Suppose we have points in 2D: (2, 0), (0, 1), (3, 2), (4, 1).

  1. Mean-center → subtract the mean of each column.
  2. Compute covariance matrix Σ.
  3. Compute eigenpairs of Σ (for a 2×2 matrix there is a closed form); the eigenvector with the larger eigenvalue is PC1.
  4. Project onto PC1 to get a 1D summary; PC1 captures the trend with maximal variance.
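
The worked example can be checked numerically; this sketch follows the four steps on the same four points:

```python
import numpy as np

# The four 2D points from the worked example
X = np.array([[2.0, 0.0], [0.0, 1.0], [3.0, 2.0], [4.0, 1.0]])

Xc = X - X.mean(axis=0)                   # 1. mean-center
Sigma = Xc.T @ Xc / (len(X) - 1)          # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)  # 3. eigenpairs (eigh returns ascending order)
pc1 = eigvecs[:, -1]                      #    eigenvector with the larger eigenvalue
z = Xc @ pc1                              # 4. 1D projection onto PC1
```

The variance of the 1D projection z equals the larger eigenvalue, which is exactly the "maximal variance" property of PC1.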

Common Applications in Machine Learning

  - Exploratory visualization: plot high‑D data in 2D/3D to see clusters/outliers before modeling.
  - Noise reduction / denoising: keep the top components; drop low‑variance components dominated by noise.
  - Feature compression: reduce dimensionality to speed up models (e.g., linear models, k‑NN).
  - Preprocessing for clustering: run PCA before k‑means to remove correlated dimensions.
  - Image compression: treat pixels as features; PCA approximates images with fewer components.
  - Genomics / NLP: summarize thousands of features (genes, token counts) into compact factors.
  - Anomaly detection: model normal patterns in the top PCs; anomalies have a high residual in the discarded PCs.
  - Recommendation systems: PCA/SVD on user–item matrices uncovers latent preference factors.

When to avoid: when features are on very different scales and have not been standardized, when relationships are highly nonlinear, or when interpretability of the original features is critical.

Lecture 7: Principal Component Analysis (PCA)

1. Introduction

Principal Component Analysis (PCA) is one of the most widely used techniques in machine learning and data science for dimensionality reduction. It transforms a dataset with possibly correlated features into a new set of uncorrelated features called principal components.

2. Key Concepts

  - Variance: the spread of the data along a direction; PCA seeks directions of maximal variance.
  - Covariance matrix: pairwise covariances among features; its eigenstructure defines the components.
  - Eigenvectors and eigenvalues: the principal directions and the variance each one captures.
  - Principal components: uncorrelated linear combinations of the original features, ordered by explained variance.

3. Steps in PCA

  1. Standardize the dataset (mean = 0, variance = 1).
  2. Compute the covariance matrix.
  3. Perform eigenvalue decomposition (or SVD) on the covariance matrix.
  4. Select the top k eigenvectors corresponding to the largest eigenvalues.
  5. Project the original data onto these new axes to obtain reduced-dimension data.
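
Step 3 notes that SVD can replace the eigen-decomposition. A sketch of the SVD route on made-up correlated data (the singular values sᵢ of centered data relate to covariance eigenvalues via λᵢ = sᵢ²/(n−1)):

```python
import numpy as np

# Illustrative data: 6 samples, 4 features, with correlation injected
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X[:, 1] += X[:, 0]                 # make feature 1 correlate with feature 0

Xc = X - X.mean(axis=0)            # center; SVD of Xc replaces eigendecomposing Sigma
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

eigvals = S**2 / (len(X) - 1)      # singular values -> covariance eigenvalues
k = 2
W = Vt[:k].T                       # top-k principal directions (rows of Vt)
Z = Xc @ W                         # projected, reduced-dimension data
```

SVD avoids forming the covariance matrix explicitly, which is numerically preferable for wide or ill-conditioned data.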

4. Example (2D Dataset)

Suppose we have data points in 2D space with correlation between X and Y. PCA finds a new axis (first principal component) that captures most variance. The second principal component is orthogonal to the first and captures remaining variance.

Dataset: X = [2.5, 0.5, 2.2, 1.9]
         Y = [2.4, 0.7, 2.9, 2.2]

Step 1: Compute covariance matrix
Step 2: Find eigenvalues (λ1, λ2) and eigenvectors (v1, v2)
Step 3: Choose v1 (corresponding to largest λ1) as first principal component
Step 4: Transform data into new coordinates
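
Steps 1–4 can be verified on this dataset with a short NumPy sketch:

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9])
y = np.array([2.4, 0.7, 2.9, 2.2])
X = np.column_stack([x, y])

Xc = X - X.mean(axis=0)                   # center the data
Sigma = np.cov(Xc, rowvar=False)          # Step 1: covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)  # Step 2: eigenpairs (ascending order)
v1 = eigvecs[:, -1]                       # Step 3: PC1 = vector for the largest eigenvalue
Z = Xc @ eigvecs[:, ::-1]                 # Step 4: new coordinates (PC1, PC2)
```

The transformed coordinates are uncorrelated, and the variance along PC1 equals the largest eigenvalue.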
    

5. Applications in Machine Learning

PCA appears throughout ML: exploratory visualization, denoising, feature compression, preprocessing for clustering, image compression, anomaly detection, and latent-factor models for recommendation systems, as listed in the overview above.

6. PCA Calculator (Simple Simulation)

The calculator accepts comma-separated X and Y values and applies the steps above: center the data, compute the covariance matrix, find the eigenpairs, and project onto the principal components.
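
A plain-Python sketch of what such a calculator might do (the function name pca_2d and its interface are illustrative, not part of the page):

```python
import numpy as np

def pca_2d(x_csv: str, y_csv: str):
    """Parse comma-separated X and Y values and run PCA on the 2D points."""
    x = np.array([float(v) for v in x_csv.split(",")])
    y = np.array([float(v) for v in y_csv.split(",")])
    X = np.column_stack([x, y])
    Xc = X - X.mean(axis=0)                   # center the data
    Sigma = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenpairs, ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort descending
    return eigvals[order], eigvecs[:, order], Xc @ eigvecs[:, order]

# Same numbers as the example dataset in Section 4
vals, vecs, Z = pca_2d("2.5, 0.5, 2.2, 1.9", "2.4, 0.7, 2.9, 2.2")
```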