9. Dimensionality Reduction

Dimensionality reduction helps simplify datasets by reducing the number of input features, making models more efficient and easier to visualize.

📚 Table of Contents

9.1 Principal Component Analysis (PCA)
  9.1.1 PC1 & PC2
  9.1.2 Projection
  9.1.3 Visualization

9.1 Principal Component Analysis (PCA)

PCA is a statistical method that transforms the original features into a new set of orthogonal axes (principal components) capturing maximum variance.

PC = X · W, where X is the (centered) data matrix and the columns of W are the eigenvectors of the covariance matrix of X
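The transform above can be sketched with NumPy. This is a minimal from-scratch example on a small hypothetical dataset: center the data, eigendecompose its covariance matrix, and multiply by the eigenvector matrix W.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: 200 samples, 2 correlated features
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.5], [0.5, 1.0]])

# Center the data, then eigendecompose the covariance matrix
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Sort eigenvectors by descending eigenvalue; columns of W are the principal axes
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order]

# PC = X · W : each sample's coordinates on the principal components
PC = X_centered @ W
print(PC.shape)  # (200, 2)
```

Because the columns of W are orthogonal eigenvectors of the covariance matrix, the transformed columns of PC are uncorrelated with each other.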

9.1.1 PC1 & PC2

Principal components are ranked by variance explained. PC1 captures the most variance, followed by PC2, and so on.

Explained Variance Ratio of PCᵢ = λᵢ / Σⱼ λⱼ, where λᵢ is the eigenvalue associated with the i-th principal component
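The ratio is a one-line computation. A small sketch using hypothetical eigenvalues:

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, sorted in descending order
eigvals = np.array([4.0, 2.0, 1.0, 0.5, 0.5])

# Explained variance ratio: each eigenvalue divided by the sum of all eigenvalues
evr = eigvals / eigvals.sum()
print(evr)            # [0.5    0.25   0.125  0.0625 0.0625]
print(evr[:2].sum())  # PC1 + PC2 together explain 75% of the variance
```

A common rule of thumb is to keep enough components for the cumulative ratio to reach a target such as 90–95%.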

9.1.2 Projection

Data points are projected onto the lower-dimensional space formed by selected principal components.

Projected_X = X · W_k, where W_k contains the top-k eigenvectors as columns

This projection preserves as much variance as possible while reducing dimensionality.

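The projection step can be sketched as follows, again with NumPy on a hypothetical 5-feature dataset: keep only the top-k eigenvectors and multiply.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))  # hypothetical dataset: 100 samples, 5 features
X_centered = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix, sorted by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
order = np.argsort(eigvals)[::-1]

k = 2
W_k = eigvecs[:, order[:k]]  # top-k eigenvectors as columns

# Projected_X = X · W_k : 5-D data mapped onto a 2-D subspace
Projected_X = X_centered @ W_k
print(Projected_X.shape)  # (100, 2)
```

The variance of each projected column equals the corresponding eigenvalue, which is why keeping the largest eigenvalues preserves the most variance.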

9.1.3 Visualization

PCA is commonly used to plot high-dimensional data in 2D or 3D, which is useful for cluster analysis and for spotting outliers.
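As a sketch of this workflow, assuming scikit-learn and matplotlib are available, the 4-feature iris dataset (an illustrative choice) can be reduced to its top two components and scatter-plotted:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the figure is saved to a file
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# Reduce the 4-feature iris data to its top two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Scatter plot on PC1/PC2; class structure becomes visible in 2-D
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="viridis", s=15)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Iris projected onto its first two principal components")
plt.savefig("iris_pca.png")
```

Points that fall far from every cluster in such a plot are candidate outliers.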

Key Points:
Concept       | Description
PCA           | Transforms data using orthogonal axes that maximize variance
PC1, PC2      | Top principal components, ranked by variance captured
Projection    | Maps high-dimensional data onto a reduced subspace
Visualization | Plots data on the top PCs to explore patterns