Module 4: Supervised Learning
4.1 Regression
Regression is used when the output variable is a continuous value. It predicts real-valued quantities such as prices, sales, or temperature.
4.1.1 Linear Regression
Models the relationship between inputs and a continuous output with a linear function (a straight line in the single-input case).
Example 1: Predicting monthly electricity usage from square footage.
Example 2: Estimating the price of a used laptop based on age and brand.
Key Points:
- Assumes a linear relationship.
- Minimizes the sum of squared errors.
- Fast to train and easy to interpret.
- Prone to underfitting on complex data.
4.1.1.1 Simple Linear Regression
Uses one independent variable to predict a continuous outcome.
Example 1: Predicting crop yield based on rainfall.
Example 2: Estimating weight based on height.
Key Points:
- One input, one output.
- Best for linear trends.
- Model: y = mx + b
- Limited by simplicity.
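The model y = mx + b can be fit in a few lines. A minimal sketch using scikit-learn, with invented rainfall/yield numbers (illustrative only, not real agronomy data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: rainfall (mm) vs. crop yield (tons per hectare)
rainfall = np.array([[100], [150], [200], [250], [300]])   # one input
crop_yield = np.array([2.1, 2.9, 3.8, 4.5, 5.2])           # one continuous output

model = LinearRegression().fit(rainfall, crop_yield)

# The fitted line is y = mx + b
print(f"slope m = {model.coef_[0]:.4f}, intercept b = {model.intercept_:.2f}")
print("predicted yield at 220 mm:", model.predict([[220]]))
```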
4.1.1.2 Multiple Linear Regression
Involves two or more independent variables to predict the outcome.
Example 1: Predicting house prices based on area, number of rooms, and location.
Example 2: Estimating income from education level, age, and work experience.
Key Points:
- Supports multiple input variables.
- Useful for more realistic datasets.
- Still assumes linearity.
- Susceptible to multicollinearity.
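The same sketch extends to several inputs; the house-price figures below are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: [area_sqft, rooms, distance_to_center_km] -> price
X = np.array([[1200, 3, 5.0],
              [1500, 4, 3.2],
              [800,  2, 8.1],
              [2000, 5, 1.5],
              [1100, 3, 6.4]])
y = np.array([240_000, 310_000, 150_000, 450_000, 210_000])

model = LinearRegression().fit(X, y)
print("one coefficient per feature:", model.coef_)
print("predicted price:", model.predict([[1400, 3, 4.0]]))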
4.1.2 Polynomial Regression
Extends linear regression by allowing curved relationships between variables.
Example 1: Modeling population growth over time.
Example 2: Predicting CPU temperature based on load.
Key Points:
- Fits curves using polynomial terms (x², x³, ...).
- Can model complex trends.
- Higher degrees increase overfitting risk.
- Requires feature transformation.
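A sketch of that feature transformation: scikit-learn's PolynomialFeatures adds the x² term, and the regression itself stays linear in the expanded features. The CPU load/temperature numbers are made up:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Invented data: CPU load (%) vs. temperature (°C), with a curved trend
load = np.array([[10], [25], [40], [55], [70], [85], [100]])
temp = np.array([35, 39, 45, 53, 62, 74, 88])

# PolynomialFeatures adds the x² term; the model stays linear in the new features
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(load, temp)
print("predicted temp at 90% load:", model.predict([[90]]))
```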
4.1.4.1 Pruning in Decision Trees
Reduces model complexity by trimming branches from decision trees.
Example 1: Optimizing decision trees used for loan approvals.
Example 2: Preventing a medical diagnosis model from overfitting.
Key Points:
- Removes less informative branches.
- Improves generalization.
- Two types: pre-pruning (stop growth early) and post-pruning (trim a fully grown tree).
- Yields smaller, faster models; pre-pruning also shortens training.
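Both styles can be sketched with scikit-learn's DecisionTreeRegressor on synthetic data: max_depth acts as pre-pruning, while ccp_alpha applies cost-complexity post-pruning. The values used here are illustrative, not tuned:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)  # noisy sine target

full = DecisionTreeRegressor(random_state=0).fit(X, y)                  # no pruning
pre = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)      # pre-pruning
post = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X, y)  # post-pruning

# Pruned trees end up with far fewer leaves than the unpruned one
for name, tree in [("unpruned", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(f"{name}: {tree.get_n_leaves()} leaves")
```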
4.1.5 Random Forest Regressor
Uses an ensemble of decision trees for robust predictions.
Example 1: Predicting rental prices across cities.
Example 2: Estimating yield from farmland data.
Key Points:
- Reduces overfitting using multiple trees.
- Good for large datasets.
- Handles non-linear data well.
- Slower to train and predict than a single tree.
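A minimal sketch with scikit-learn's RandomForestRegressor on synthetic, non-linear data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 3))                   # three synthetic features
y = X[:, 0] ** 2 + 3 * X[:, 1] + rng.normal(size=500)   # non-linear target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# 100 trees, each fit on a bootstrap sample; predictions are averaged
forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(X_train, y_train)
print("test R^2:", forest.score(X_test, y_test))
```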
4.1.6 Support Vector Regressor (SVR)
Fits a function using support vectors and an ε-margin of tolerance: errors smaller than ε are ignored.
Example 1: Predicting stock prices.
Example 2: Modeling house energy usage.
Key Points:
- Effective in high-dimensional spaces.
- Kernel trick allows non-linear modeling.
- Relatively robust to noise thanks to the ε-insensitive loss.
- Requires careful parameter tuning.
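A minimal sketch using scikit-learn's SVR with an RBF kernel on synthetic data; the C and epsilon values are illustrative settings, not tuned ones:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# RBF kernel handles the non-linear trend; errors inside the ε-tube are ignored
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("prediction at x = 2.5:", model.predict([[2.5]]))
```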
4.1.8 Underfitting
Occurs when a model is too simple to capture data patterns.
Example 1: Using linear regression on non-linear data.
Example 2: Predicting traffic with only time of day.
Key Points:
- High training and test errors.
- Model is too simple.
- Can be fixed with more features or a more expressive model.
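A small demonstration on synthetic data: a straight line fit to a quadratic pattern scores poorly on both the training and test sets, the signature of underfitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.2, size=200)   # quadratic pattern

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)
line = LinearRegression().fit(X_train, y_train)

# Both scores are low: the model is too simple for the pattern
print("train R^2:", line.score(X_train, y_train))
print("test R^2:", line.score(X_test, y_test))
```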
4.1.9 Overfitting
Occurs when a model memorizes the training data, noise included, and fails to generalize.
Example 1: A deep, unpruned decision tree fit to noisy data.
Example 2: A high-degree polynomial model fit to only a few samples.
Key Points:
- Low training error but high test error.
- Model is too complex for the amount of data.
- Can be mitigated with regularization, pruning, or more data.
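A small demonstration, mirroring Example 2 on synthetic data: a degree-10 polynomial fit to just 12 points scores almost perfectly on the training set but poorly on fresh test data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(-1, 1, size=(12, 1)), axis=0)   # very few samples
y = X.ravel() + rng.normal(scale=0.2, size=12)          # true trend is linear

X_test = rng.uniform(-1, 1, size=(100, 1))
y_test = X_test.ravel()                                 # noise-free targets for testing

# A degree-10 polynomial chases the noise in the 12 training points
model = make_pipeline(PolynomialFeatures(degree=10), LinearRegression()).fit(X, y)
print("train R^2:", model.score(X, y))            # near 1.0
print("test R^2:", model.score(X_test, y_test))   # much worse
```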
4.2 Classification
Classification is used when the output is a discrete class. It answers questions like “Is this email spam?” or “Does this image contain a cat?”
4.2.1 Logistic Regression
Predicts the probability of a binary (or multi-class) outcome; despite its name, it is used for classification.
Example 1: Spam email detection.
Example 2: Predicting customer churn.
Key Points:
- Outputs values between 0 and 1.
- Uses the sigmoid (logistic) function to turn scores into probabilities.
- Good for linearly separable data.
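A minimal sketch using scikit-learn, with invented churn data (monthly charges and support calls as the features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented churn data: [monthly_charges, support_calls] -> churned (1) or stayed (0)
X = np.array([[20, 0], [85, 4], [30, 1], [95, 5], [40, 0], [70, 3]])
y = np.array([0, 1, 0, 1, 0, 1])

model = LogisticRegression().fit(X, y)

# predict_proba applies the sigmoid, so outputs land between 0 and 1
print("churn probability:", model.predict_proba([[60, 2]])[0, 1])
print("predicted class:", model.predict([[60, 2]])[0])
```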
4.2.2 KNN Classifier
Classifies based on majority class among k nearest neighbors.
Example 1: Handwritten digit classification.
Example 2: Recommending movies based on a user's profile.
Key Points:
- No explicit training phase; it simply stores the data (a “lazy” learner).
- Simple and intuitive.
- Computationally expensive on large data.
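A sketch matching Example 1, using scikit-learn's KNeighborsClassifier on its bundled handwritten-digit dataset (no download needed):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)   # 8x8 images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)

# fit() only stores the data; all the distance work happens at predict time
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```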
4.2.3 Naïve Bayes
Uses Bayes' Theorem assuming feature independence.
Example 1: News categorization.
Example 2: Sentiment analysis of reviews.
Key Points:
- Fast and efficient.
- Performs well on text data.
- Assumes independence between features.
4.2.3.1 Multinomial Naïve Bayes
Handles word counts and frequencies in text classification.
4.2.3.2 Gaussian Naïve Bayes
Assumes continuous features follow a Gaussian distribution.
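A minimal sketch of the multinomial variant on a tiny, invented sentiment corpus; CountVectorizer produces the word counts the model expects:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus for sentiment analysis
texts = ["great product, loved it",
         "terrible quality, waste of money",
         "absolutely fantastic",
         "awful, very disappointed"]
labels = ["pos", "neg", "pos", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)   # word counts, as Multinomial NB expects

model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["loved it, fantastic quality"])))
```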
4.2.4 Decision Tree
Splits data into branches to reach a decision or class label.
Example 1: Disease diagnosis from symptoms.
Example 2: Predicting if a loan will be approved.
Key Points:
- Visual and interpretable.
- Prone to overfitting.
- Works with both numerical and categorical data.
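A sketch using scikit-learn's DecisionTreeClassifier on the bundled iris dataset; export_text prints the learned if/else rules, which is what makes trees interpretable:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=6)

# max_depth=3 keeps the tree small, readable, and less prone to overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=6).fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
print(export_text(tree, feature_names=iris.feature_names))  # the learned rules
```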
4.2.5 Random Forest
Combines many decision trees that vote on the final class.
Example 1: Classifying email as spam or not.
Example 2: Credit card fraud detection.
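A minimal sketch on synthetic two-class data standing in for a fraud-style dataset; each tree casts a vote and the majority class wins:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for a fraud-detection dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Each of the 100 trees casts a vote; the majority class is the prediction
forest = RandomForestClassifier(n_estimators=100, random_state=7).fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```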
4.2.6 Support Vector Machine (SVM)
Finds the optimal hyperplane to separate classes with the widest margin.
Example 1: Image classification.
Example 2: Face recognition system.
Key Points:
- Effective in high-dimensional spaces.
- Can use kernel trick for non-linear data.
- Requires good parameter tuning.
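A sketch using scikit-learn's SVC with an RBF kernel on the bundled digit images (standing in for a larger image dataset); the C and gamma values are illustrative settings, not tuned ones:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)   # stands in for an image dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

# The RBF kernel separates classes non-linearly (the kernel trick);
# C and gamma are the parameters that usually need tuning
model = SVC(kernel="rbf", C=10.0, gamma=0.001).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```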