6. Unsupervised Learning

6.1 Clustering

Definition: Clustering is an unsupervised learning technique that groups similar data points into clusters without labeled data.

Explanation: It helps discover hidden patterns or groupings in data and is widely used in customer segmentation, social network analysis, and image compression.

🧾 Examples:
- Grouping news articles by topic.
- Segmenting customers by purchasing behavior.
💡 Key Points:
  • No labeled data required.
  • Output is groups/clusters with high intra-group similarity.
  • Popular algorithms: K-Means, DBSCAN, Hierarchical.

6.1.1 K-Means

Definition: K-Means is a centroid-based clustering algorithm that partitions data into K clusters based on proximity to cluster centers.

Explanation: The algorithm minimizes the within-cluster sum of squared distances by assigning each point to the nearest centroid and then recalculating each centroid as the mean of its assigned points, repeating until the assignments stop changing.

🧾 Examples:
- Customer segmentation in marketing.
- Image color quantization.
📊 Facts:
- Requires K to be predefined.
- Sensitive to initial centroids.
- Fast and scalable.
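The assign-then-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; it uses a naive "first K points" initialization, and the variable names are illustrative:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal K-Means: alternate assignment and centroid-update steps."""
    centroids = X[:k].astype(float).copy()  # naive init: first k points
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs; K-Means should recover them as the two clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, centroids = kmeans(X, k=2)
```

In practice a library implementation (e.g. scikit-learn's `KMeans`) adds smarter initialization and multiple restarts, since the result depends on the starting centroids.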

6.1.1.1 Elbow Method

Definition: The Elbow Method is a technique to find the optimal number of clusters (K) by plotting within-cluster variance versus number of clusters.

🧾 Example:
- Used in K-Means to determine the ideal K in customer segmentation.
💡 Look for the "elbow point" where adding more clusters doesn’t significantly reduce the variance.
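The curve behind the elbow plot can be computed directly: run K-Means for each candidate K and record the within-cluster sum of squares (WCSS). The sketch below assumes a basic K-Means with a naive "first K points" initialization; on data with three tight blobs, WCSS drops sharply up to K=3 and then levels off:

```python
import numpy as np

def wcss(X, k, n_iter=50):
    """Within-cluster sum of squares after a basic K-Means run."""
    centroids = X[:k].astype(float).copy()  # naive init: first k points
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return sum(((X[labels == j] - centroids[j]) ** 2).sum() for j in range(k))

# Three tight blobs (ordered so the naive init spreads across them).
X = np.array([[0, 0], [5, 5], [10, 0],
              [0.2, 0.1], [5.2, 5.1], [10.2, 0.1]], dtype=float)
curve = {k: wcss(X, k) for k in range(1, 6)}
# Plotting curve.values() against K shows a sharp bend (the "elbow") at K = 3.
```

Beyond the elbow point, each extra cluster only shaves off a little variance, which is the visual cue the method relies on.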

6.1.1.2 K-Means++

Definition: K-Means++ is an enhanced version of K-Means that improves the initial centroid selection to avoid poor clustering results.

🧾 Example:
- Used in clustering large-scale location data with better consistency.
📊 Improves convergence speed and clustering quality over standard K-Means; it is the default initialization in many libraries, including scikit-learn.
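The seeding rule is the heart of K-Means++: pick the first centroid uniformly at random, then pick each subsequent centroid with probability proportional to its squared distance from the nearest centroid chosen so far. A minimal sketch of that sampling step (illustrative only; a full K-Means++ run would follow it with the standard K-Means loop):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """K-Means++ seeding: spread the initial centroids apart by sampling
    points with probability proportional to their squared distance from
    the nearest already-chosen centroid."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]  # first centroid: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        probs = d2 / d2.sum()  # far-away points are more likely to be picked
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

# Two well-separated pairs: the second centroid almost surely lands in the
# blob the first one missed, which is exactly the point of the ++ seeding.
X = np.array([[0, 0], [0.2, 0.1], [8, 8], [8.2, 8.1]], dtype=float)
centroids = kmeanspp_init(X, k=2)
```

Note that an already-chosen point has squared distance zero, so it can never be picked twice.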

6.1.2 Hierarchical Clustering

Definition: Hierarchical clustering builds a tree of clusters by either merging or splitting them recursively.

🧾 Example:
- Taxonomy of species.
- Document categorization.
💡 Doesn’t require predefining K and creates dendrograms for visualization.

6.1.2.1 Agglomerative Clustering

Definition: A bottom-up approach where each data point starts as its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

🧾 Example:
- Grouping species by genetic similarity in biology.
- Merging social network users based on mutual friends.
💡 Key Points:
  • Does not require specifying the number of clusters (K).
  • Produces a dendrogram to visualize cluster merging.
  • Slower than K-Means on large datasets due to repeated distance calculations.
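The bottom-up merge loop can be illustrated with a tiny single-linkage implementation (single linkage is one of several linkage criteria; this naive pairwise search is only a sketch for small data, which is why the key point above notes it scales poorly):

```python
import math

def single_linkage(points, k):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]

    def dist(a, b):
        # Single linkage: cluster distance = closest pair of points.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

points = [(0, 0), (0.2, 0.1), (5, 5), (5.2, 5.1)]
clusters = single_linkage(points, k=2)
```

Swapping the `min` inside `dist` for a `max` or a mean of pairwise distances gives complete or average linkage, respectively.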

6.1.2.2 Divisive Clustering

Definition: A top-down approach that starts with all data in one cluster and recursively splits it into smaller clusters.

🧾 Example:
- Separating academic departments from a university-wide dataset.
- Breaking down a customer base from general to specific segments.
💡 Key Points:
  • Opposite of agglomerative — starts big and splits downward.
  • Less commonly used due to higher computational complexity.
  • Useful when the natural grouping is known to start large.
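The top-down idea can be sketched with a simplified split heuristic (assigning each point to the nearer of the cluster's two farthest-apart points); this stands in for a full divisive method such as DIANA and is for illustration only:

```python
import numpy as np

def split(X):
    """One divisive step: split a cluster in two by assigning each point
    to the nearer of the two mutually farthest points (a simple heuristic)."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    i, j = np.unravel_index(d.argmax(), d.shape)  # farthest-apart pair
    mask = np.linalg.norm(X - X[i], axis=1) <= np.linalg.norm(X - X[j], axis=1)
    return X[mask], X[~mask]

def divisive(X, k):
    """Top-down clustering: start with everything in one cluster and
    repeatedly split the largest cluster until k clusters remain."""
    clusters = [X]
    while len(clusters) < k:
        biggest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        a, b = split(clusters.pop(biggest))
        clusters += [a, b]
    return clusters

X = np.array([[0, 0], [0.2, 0.1], [9, 9], [9.2, 9.1]], dtype=float)
clusters = divisive(X, k=2)
```

Each split examines the whole remaining cluster, which hints at why divisive methods cost more than agglomerative ones in practice.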

6.1.2.3 Dendrograms

Definition: A dendrogram is a tree-like diagram that records the sequences of merges or splits in hierarchical clustering.

🧾 Example:
- Visualizing how different types of wines group based on taste features.
- Showing user communities in a social media clustering analysis.
💡 Key Points:
  • Shows the entire clustering process and merging steps.
  • Cutting the dendrogram at a specific height reveals K clusters.
  • Helps evaluate natural grouping and hierarchy in the data.
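The merge history that a dendrogram draws can be computed with SciPy (assuming SciPy is available; `dendrogram(Z)` from the same module renders the tree itself):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Four points in two obvious pairs. `linkage` returns the merge history a
# dendrogram visualizes: each row records which two clusters merged, at what
# distance, and the resulting cluster size.
X = np.array([[0, 0], [0.2, 0.1], [5, 5], [5.2, 5.1]])
Z = linkage(X, method="ward")

# "Cutting" the dendrogram at a height between the two small merges and the
# final big one yields K = 2 clusters.
labels = fcluster(Z, t=2.0, criterion="distance")
```

Using `criterion="maxclust"` instead lets you ask for a specific number of clusters rather than a cut height.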

6.2 Association

Definition: Association learning is a rule-based machine learning method for discovering interesting relations between variables in large datasets.

Explanation: It finds frequent patterns, associations, or correlations in data, and is often used in market basket analysis to discover products that are frequently bought together.

🧾 Examples:
- "People who buy bread often buy butter."
- Amazon recommendation rules.
💡 Key Points:
  • Uses support, confidence, and lift to evaluate rules.
  • Apriori and FP-Growth are common algorithms.
  • Highly interpretable rules.
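The three rule metrics are simple to compute directly. A minimal sketch on a toy basket dataset (item names and transactions are illustrative) for the rule "bread → butter":

```python
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """P(rhs | lhs): how often 'lhs -> rhs' holds when lhs appears."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    """Confidence relative to rhs's base rate; > 1 means the items
    co-occur more often than chance."""
    return confidence(lhs, rhs) / support(rhs)

rule_conf = confidence({"bread"}, {"butter"})  # 3 of 4 bread baskets have butter
rule_lift = lift({"bread"}, {"butter"})
```

Algorithms like Apriori and FP-Growth exist to avoid scoring every possible rule this way: they prune the search to itemsets whose support clears a minimum threshold.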