Dimensionality reduction helps simplify datasets by reducing the number of input features, making models more efficient and easier to visualize.
Principal Component Analysis (PCA) is a statistical method that transforms the original features into a new set of orthogonal axes, called principal components, chosen to capture maximum variance.
PC = X · W, where X is the (mean-centered) data matrix and the columns of W are the eigenvectors of the covariance matrix of X.
Principal components are ranked by variance explained. PC1 captures the most variance, followed by PC2, and so on.
Explained Variance Ratio = λᵢ / Σⱼ λⱼ, where λᵢ is the eigenvalue (variance) associated with the i-th principal component.
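The two formulas above can be sketched directly with NumPy. This is a minimal illustration on synthetic data (the dataset and random seed are assumptions, not from the text): center the data, eigendecompose the covariance matrix, sort components by variance, then compute scores and explained-variance ratios.

```python
import numpy as np

# Minimal PCA via eigendecomposition of the covariance matrix.
# Synthetic data (illustrative assumption): 200 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 2 * X[:, 0]                  # correlate two features so PC1 dominates

Xc = X - X.mean(axis=0)                 # centre the data first
cov = np.cov(Xc, rowvar=False)          # 3x3 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # re-rank by variance, descending
eigvals, W = eigvals[order], eigvecs[:, order]

PC = Xc @ W                             # PC = X · W: scores on each component
explained = eigvals / eigvals.sum()     # λᵢ / Σλ for each component
```

Because the first two features were made strongly correlated, `explained[0]` ends up well above the other ratios, which is exactly the ranking property described above.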
Data points are projected onto the lower-dimensional space formed by selected principal components.
Projected_X = X · Wₖ, where Wₖ holds the top-k eigenvectors (the first k columns of W).
This projection preserves as much variance as possible while reducing dimensionality.
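A short sketch of the projection step, again on synthetic data (an assumption): keeping only the first k columns of W maps each sample from 5 dimensions down to k = 2, and the variance retained equals the sum of the top-k eigenvalues.

```python
import numpy as np

# Synthetic data with a redundant feature (illustrative assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # feature 2 ~ copy of feature 0

Xc = X - X.mean(axis=0)                          # centre before projecting
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = eigvecs[:, np.argsort(eigvals)[::-1]]        # eigenvectors sorted by variance

k = 2
Wk = W[:, :k]               # top-k eigenvectors
Projected_X = Xc @ Wk       # Projected_X = X · Wk, now shape (100, 2)
```

The total variance of `Projected_X` matches the sum of the two largest eigenvalues, which is the sense in which the projection "preserves as much variance as possible".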
PCA is commonly used to plot data in 2D or 3D, which makes it useful for exploring cluster structure and for detecting outliers.
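One way to use PCA for outlier detection, sketched here on synthetic data with a deliberately planted outlier (both are assumptions for illustration): project onto the top-2 components to get 2D coordinates for a scatter plot, then flag points with a large reconstruction error, i.e. points far from the plane spanned by PC1 and PC2.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
X[:, 0] *= 5                # most variance lives in the first two features
X[:, 1] *= 3
X[0, 3] += 10               # plant one point far from the main subspace

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]   # top-2 principal axes

Z = Xc @ W2                 # 2-D coordinates, ready for a scatter plot
X_hat = Z @ W2.T            # reconstruction using only 2 PCs
err = np.linalg.norm(Xc - X_hat, axis=1)         # distance from the PC plane
outlier = int(np.argmax(err))                    # planted point stands out
```

The planted point looks ordinary on the 2D scatter but has by far the largest reconstruction error, which is why the error (not just the 2D plot) is the quantity to inspect for outliers.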
| Concept | Description |
|---|---|
| PCA | Transforms data using orthogonal axes maximizing variance |
| PC1, PC2 | Top principal components ranked by variance captured |
| Projection | Maps high-dimensional data onto reduced subspace |
| Visualization | Plots data on top PCs to explore patterns |