Dimensionality reduction helps simplify datasets by reducing the number of input features, making models more efficient and easier to visualize.
Principal Component Analysis (PCA) is a statistical method that transforms the original features into a new set of orthogonal axes, called principal components, chosen to capture maximum variance.
PC = X · W, where X is the (mean-centered) data matrix and the columns of W are the eigenvectors of the covariance matrix of X.
Principal components are ranked by variance explained. PC1 captures the most variance, followed by PC2, and so on.
Explained Variance Ratio = λᵢ / Σⱼ λⱼ, where λᵢ is the eigenvalue (variance) associated with the i-th principal component.
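The two formulas above can be sketched directly with NumPy. This is a minimal illustration on synthetic data (the dataset and random seed are assumptions, not from the text): center the data, eigendecompose the covariance matrix, sort components by variance, then compute scores and explained-variance ratios.

```python
import numpy as np

# Minimal PCA via eigendecomposition of the covariance matrix.
# Synthetic data (illustrative assumption): 200 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 2 * X[:, 0]                  # correlate two features so PC1 dominates

Xc = X - X.mean(axis=0)                 # centre the data first
cov = np.cov(Xc, rowvar=False)          # 3x3 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # re-rank by variance, descending
eigvals, W = eigvals[order], eigvecs[:, order]

PC = Xc @ W                             # PC = X · W: scores on each component
explained = eigvals / eigvals.sum()     # λᵢ / Σλ for each component
```

Because the first two features were made strongly correlated, `explained[0]` ends up well above the other ratios, which is exactly the ranking property described above.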
Data points are projected onto the lower-dimensional space formed by selected principal components.
Projected_X = X · Wₖ, where Wₖ holds the top-k eigenvectors (the first k columns of W).
This projection preserves as much variance as possible while reducing dimensionality.
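A short sketch of the projection step, again on synthetic data (an assumption): keeping only the first k columns of W maps each sample from 5 dimensions down to k = 2, and the variance retained equals the sum of the top-k eigenvalues.

```python
import numpy as np

# Synthetic data with a redundant feature (illustrative assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # feature 2 ~ copy of feature 0

Xc = X - X.mean(axis=0)                          # centre before projecting
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = eigvecs[:, np.argsort(eigvals)[::-1]]        # eigenvectors sorted by variance

k = 2
Wk = W[:, :k]               # top-k eigenvectors
Projected_X = Xc @ Wk       # Projected_X = X · Wk, now shape (100, 2)
```

The total variance of `Projected_X` matches the sum of the two largest eigenvalues, which is the sense in which the projection "preserves as much variance as possible".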
PCA is commonly used to plot data in 2D or 3D, which makes it useful for exploring cluster structure and for detecting outliers.
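One way to use PCA for outlier detection, sketched here on synthetic data with a deliberately planted outlier (both are assumptions for illustration): project onto the top-2 components to get 2D coordinates for a scatter plot, then flag points with a large reconstruction error, i.e. points far from the plane spanned by PC1 and PC2.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
X[:, 0] *= 5                # most variance lives in the first two features
X[:, 1] *= 3
X[0, 3] += 10               # plant one point far from the main subspace

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
W2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]   # top-2 principal axes

Z = Xc @ W2                 # 2-D coordinates, ready for a scatter plot
X_hat = Z @ W2.T            # reconstruction using only 2 PCs
err = np.linalg.norm(Xc - X_hat, axis=1)         # distance from the PC plane
outlier = int(np.argmax(err))                    # planted point stands out
```

The planted point looks ordinary on the 2D scatter but has by far the largest reconstruction error, which is why the error (not just the 2D plot) is the quantity to inspect for outliers.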
| Concept | Description |
|---|---|
| PCA | Transforms data using orthogonal axes maximizing variance |
| PC1, PC2 | Top principal components ranked by variance captured |
| Projection | Maps high-dimensional data onto reduced subspace |
| Visualization | Plots data on top PCs to explore patterns |