Module 4: Supervised Learning
4.1 Regression
Regression is used when the output variable is a continuous value. It predicts real-valued quantities such as prices, sales, or temperature.
4.1.1 Linear Regression
Models the relationship between inputs and a continuous output with a linear function (a straight line in the single-input case).
Example 1: Predicting monthly electricity usage from square footage.
Example 2: Estimating the price of a used laptop based on age and brand.
Key Points:
- Assumes a linear relationship.
- Minimizes the sum of squared errors.
- Fast to train and easy to interpret.
- Prone to underfitting on complex data.
4.1.1.1 Simple Linear Regression
Uses one independent variable to predict a continuous outcome.
Example 1: Predicting crop yield based on rainfall.
Example 2: Estimating weight based on height.
Key Points:
- One input, one output.
- Best for linear trends.
- Model: y = mx + b
- Limited by simplicity.
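The model y = mx + b can be fit in a few lines. A minimal sketch using scikit-learn, with invented rainfall/yield numbers (illustrative only, not real agronomy data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: rainfall (mm) vs. crop yield (tons per hectare)
rainfall = np.array([[100], [150], [200], [250], [300]])   # one input
crop_yield = np.array([2.1, 2.9, 3.8, 4.5, 5.2])           # one continuous output

model = LinearRegression().fit(rainfall, crop_yield)

# The fitted line is y = mx + b
print(f"slope m = {model.coef_[0]:.4f}, intercept b = {model.intercept_:.2f}")
print("predicted yield at 220 mm:", model.predict([[220]]))
```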
4.1.1.2 Multiple Linear Regression
Involves two or more independent variables to predict the outcome.
Example 1: Predicting house prices based on area, number of rooms, and location.
Example 2: Estimating income from education level, age, and work experience.
Key Points:
- Supports multiple input variables.
- Useful for more realistic datasets.
- Still assumes linearity.
- Susceptible to multicollinearity.
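The same sketch extends to several inputs; the house-price figures below are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: [area_sqft, rooms, distance_to_center_km] -> price
X = np.array([[1200, 3, 5.0],
              [1500, 4, 3.2],
              [800,  2, 8.1],
              [2000, 5, 1.5],
              [1100, 3, 6.4]])
y = np.array([240_000, 310_000, 150_000, 450_000, 210_000])

model = LinearRegression().fit(X, y)
print("one coefficient per feature:", model.coef_)
print("predicted price:", model.predict([[1400, 3, 4.0]]))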
4.1.2 Polynomial Regression
Extends linear regression by allowing curved relationships between variables.
Example 1: Modeling population growth over time.
Example 2: Predicting CPU temperature based on load.
Key Points:
- Fits curves using polynomial terms (x², x³, ...).
- Can model complex trends.
- Higher degrees increase overfitting risk.
- Requires feature transformation.
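A sketch of that feature transformation: scikit-learn's PolynomialFeatures adds the x² term, and the regression itself stays linear in the expanded features. The CPU load/temperature numbers are made up:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Invented data: CPU load (%) vs. temperature (°C), with a curved trend
load = np.array([[10], [25], [40], [55], [70], [85], [100]])
temp = np.array([35, 39, 45, 53, 62, 74, 88])

# PolynomialFeatures adds the x² term; the model stays linear in the new features
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(load, temp)
print("predicted temp at 90% load:", model.predict([[90]]))
```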
4.1.4.1 Pruning in Decision Trees
Reduces model complexity by trimming branches from decision trees.
Example 1: Optimizing decision trees used for loan approvals.
Example 2: Preventing a medical diagnosis model from overfitting.
Key Points:
- Removes less informative branches.
- Improves generalization.
- Two types: pre-pruning (stop growth early) and post-pruning (trim a fully grown tree).
- Yields smaller, faster models; pre-pruning also shortens training.
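Both styles can be sketched with scikit-learn's DecisionTreeRegressor on synthetic data: max_depth acts as pre-pruning, while ccp_alpha applies cost-complexity post-pruning. The values used here are illustrative, not tuned:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)  # noisy sine target

full = DecisionTreeRegressor(random_state=0).fit(X, y)                  # no pruning
pre = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)      # pre-pruning
post = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X, y)  # post-pruning

# Pruned trees end up with far fewer leaves than the unpruned one
for name, tree in [("unpruned", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(f"{name}: {tree.get_n_leaves()} leaves")
```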
4.1.5 Random Forest Regressor
Uses an ensemble of decision trees for robust predictions.
Example 1: Predicting rental prices across cities.
Example 2: Estimating yield from farmland data.
Key Points:
- Reduces overfitting using multiple trees.
- Good for large datasets.
- Handles non-linear data well.
- Slower to train and predict than a single tree.
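A minimal sketch with scikit-learn's RandomForestRegressor on synthetic, non-linear data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 3))                   # three synthetic features
y = X[:, 0] ** 2 + 3 * X[:, 1] + rng.normal(size=500)   # non-linear target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# 100 trees, each fit on a bootstrap sample; predictions are averaged
forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(X_train, y_train)
print("test R^2:", forest.score(X_test, y_test))
```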
4.1.6 Support Vector Regressor (SVR)
Fits a function using support vectors and an ε-margin of tolerance: errors smaller than ε are ignored.
Example 1: Predicting stock prices.
Example 2: Modeling house energy usage.
Key Points:
- Effective in high-dimensional spaces.
- Kernel trick allows non-linear modeling.
- Relatively robust to noise thanks to the ε-insensitive loss.
- Requires careful parameter tuning.
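A minimal sketch using scikit-learn's SVR with an RBF kernel on synthetic data; the C and epsilon values are illustrative settings, not tuned ones:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# RBF kernel handles the non-linear trend; errors inside the ε-tube are ignored
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("prediction at x = 2.5:", model.predict([[2.5]]))
```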
4.1.8 Underfitting
Occurs when a model is too simple to capture data patterns.
Example 1: Using linear regression on non-linear data.
Example 2: Predicting traffic with only time of day.
Key Points:
- High training and test errors.
- Model is too simple.
- Can be fixed with more features or a more expressive model.
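A small demonstration on synthetic data: a straight line fit to a quadratic pattern scores poorly on both the training and test sets, the signature of underfitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.2, size=200)   # quadratic pattern

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)
line = LinearRegression().fit(X_train, y_train)

# Both scores are low: the model is too simple for the pattern
print("train R^2:", line.score(X_train, y_train))
print("test R^2:", line.score(X_test, y_test))
```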
4.1.9 Overfitting
Occurs when a model memorizes the training data, noise included, and fails to generalize.
Example 1: A deep, unpruned decision tree fit to noisy data.
Example 2: A high-degree polynomial model fit to only a few samples.
Key Points:
- Low training error but high test error.
- Model is too complex for the amount of data.
- Can be mitigated with regularization, pruning, or more data.
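A small demonstration, mirroring Example 2 on synthetic data: a degree-10 polynomial fit to just 12 points scores almost perfectly on the training set but poorly on fresh test data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(-1, 1, size=(12, 1)), axis=0)   # very few samples
y = X.ravel() + rng.normal(scale=0.2, size=12)          # true trend is linear

X_test = rng.uniform(-1, 1, size=(100, 1))
y_test = X_test.ravel()                                 # noise-free targets for testing

# A degree-10 polynomial chases the noise in the 12 training points
model = make_pipeline(PolynomialFeatures(degree=10), LinearRegression()).fit(X, y)
print("train R^2:", model.score(X, y))            # near 1.0
print("test R^2:", model.score(X_test, y_test))   # much worse
```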
4.2 Classification
Classification is used when the output is a discrete class. It answers questions like “Is this email spam?” or “Does this image contain a cat?”
4.2.1 Logistic Regression
Predicts the probability of a binary (or multi-class) outcome; despite its name, it is used for classification.
Example 1: Spam email detection.
Example 2: Predicting customer churn.
Key Points:
- Outputs values between 0 and 1.
- Uses the sigmoid (logistic) function to turn scores into probabilities.
- Good for linearly separable data.
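A minimal sketch using scikit-learn, with invented churn data (monthly charges and support calls as the features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented churn data: [monthly_charges, support_calls] -> churned (1) or stayed (0)
X = np.array([[20, 0], [85, 4], [30, 1], [95, 5], [40, 0], [70, 3]])
y = np.array([0, 1, 0, 1, 0, 1])

model = LogisticRegression().fit(X, y)

# predict_proba applies the sigmoid, so outputs land between 0 and 1
print("churn probability:", model.predict_proba([[60, 2]])[0, 1])
print("predicted class:", model.predict([[60, 2]])[0])
```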
4.2.2 KNN Classifier
Classifies based on majority class among k nearest neighbors.
Example 1: Handwritten digit classification.
Example 2: Recommending movies based on a user's profile.
Key Points:
- No explicit training phase; it simply stores the data (a “lazy” learner).
- Simple and intuitive.
- Computationally expensive on large data.
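A sketch matching Example 1, using scikit-learn's KNeighborsClassifier on its bundled handwritten-digit dataset (no download needed):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)   # 8x8 images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)

# fit() only stores the data; all the distance work happens at predict time
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```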
4.2.3 Naïve Bayes
Uses Bayes' Theorem assuming feature independence.
Example 1: News categorization.
Example 2: Sentiment analysis of reviews.
Key Points:
- Fast and efficient.
- Performs well on text data.
- Assumes independence between features.
4.2.3.1 Multinomial Naïve Bayes
Handles word counts and frequencies in text classification.
4.2.3.2 Gaussian Naïve Bayes
Assumes continuous features follow a Gaussian distribution.
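A minimal sketch of the multinomial variant on a tiny, invented sentiment corpus; CountVectorizer produces the word counts the model expects:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus for sentiment analysis
texts = ["great product, loved it",
         "terrible quality, waste of money",
         "absolutely fantastic",
         "awful, very disappointed"]
labels = ["pos", "neg", "pos", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)   # word counts, as Multinomial NB expects

model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["loved it, fantastic quality"])))
```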
4.2.4 Decision Tree
Splits data into branches to reach a decision or class label.
Example 1: Disease diagnosis from symptoms.
Example 2: Predicting if a loan will be approved.
Key Points:
- Visual and interpretable.
- Prone to overfitting.
- Works with both numerical and categorical data.
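A sketch using scikit-learn's DecisionTreeClassifier on the bundled iris dataset; export_text prints the learned if/else rules, which is what makes trees interpretable:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=6)

# max_depth=3 keeps the tree small, readable, and less prone to overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=6).fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
print(export_text(tree, feature_names=iris.feature_names))  # the learned rules
```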
4.2.5 Random Forest
Combines many decision trees that vote on the final class.
Example 1: Classifying email as spam or not.
Example 2: Credit card fraud detection.
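A minimal sketch on synthetic two-class data standing in for a fraud-style dataset; each tree casts a vote and the majority class wins:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for a fraud-detection dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Each of the 100 trees casts a vote; the majority class is the prediction
forest = RandomForestClassifier(n_estimators=100, random_state=7).fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```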
4.2.6 Support Vector Machine (SVM)
Finds the optimal hyperplane to separate classes with the widest margin.
Example 1: Image classification.
Example 2: Face recognition system.
Key Points:
- Effective in high-dimensional spaces.
- Can use kernel trick for non-linear data.
- Requires good parameter tuning.
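A sketch using scikit-learn's SVC with an RBF kernel on the bundled digit images (standing in for a larger image dataset); the C and gamma values are illustrative settings, not tuned ones:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)   # stands in for an image dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

# The RBF kernel separates classes non-linearly (the kernel trick);
# C and gamma are the parameters that usually need tuning
model = SVC(kernel="rbf", C=10.0, gamma=0.001).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```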