Logistic Regression | Decision Trees | KNN | Naïve Bayes | SVM | Random Forest
Binary prediction problems occur when the output variable has only two categories, e.g., Yes/No, Pass/Fail, Spam/Not-Spam. In this lecture, we use a simple dataset: predicting whether a student passes an exam (1) or fails (0) based on study hours and attendance.
Hours Studied | Attendance (%) | Result (Pass=1, Fail=0)
-------------------------------------------------------
2 | 60 | 0
4 | 70 | 0
5 | 80 | 1
7 | 90 | 1
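The toy dataset in the table above can be written as two parallel Python lists; the variable names `X` and `y` are just the usual convention, not part of the original notes:

```python
# The toy dataset from the table above as parallel Python lists
X = [[2, 60], [4, 70], [5, 80], [7, 90]]  # [hours studied, attendance %]
y = [0, 0, 1, 1]                           # 0 = Fail, 1 = Pass

assert len(X) == len(y)
print(len(X), "examples,", sum(y), "passes")
```

All the classifiers below are trained on (or illustrated with) these same four rows.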
Logistic Regression: uses the logistic (sigmoid) function to map inputs to a probability between 0 and 1. A decision threshold (usually 0.5) converts that probability into the final prediction.
Example: if P(Pass) = 0.8 ≥ 0.5, classify as Pass (1).
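The sigmoid-plus-threshold idea can be sketched in a few lines. The weights `w0, w1, w2` below are illustrative assumptions chosen so the example resolves to Pass, not coefficients fitted to the data; the helper names `sigmoid` and `classify` are likewise made up for this sketch:

```python
import math

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def classify(p, threshold=0.5):
    # Apply the decision threshold to a predicted probability
    return 1 if p >= threshold else 0

# Illustrative weights (assumed, not fitted): z = w0 + w1*hours + w2*attendance
w0, w1, w2 = -9.0, 1.0, 0.05
z = w0 + w1 * 7 + w2 * 90   # student with 7 hours studied, 90% attendance
p = sigmoid(z)               # predicted pass probability
print(round(p, 2), classify(p))
```

With fitted weights the same two steps apply: compute the probability, then threshold it.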
Decision Tree: splits the data into branches using rules such as "If Hours > 4 → Pass". Easy to interpret, but a deep tree is prone to overfitting.
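The single rule quoted above is already a complete (one-split) decision tree for this dataset; a minimal sketch, with the hypothetical helper name `tree_predict`:

```python
def tree_predict(hours):
    # One-split decision stump mirroring the rule "If Hours > 4 -> Pass"
    return 1 if hours > 4 else 0

# The stump classifies every row of the toy dataset correctly
data = [(2, 0), (4, 0), (5, 1), (7, 1)]
print(all(tree_predict(hours) == label for hours, label in data))
```

A real tree learner would search over features and thresholds; here one split already separates the classes.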
KNN (k-Nearest Neighbors): classifies a point by the majority label among its k nearest neighbors in feature space.
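A minimal k-NN on the toy dataset, using Euclidean distance; the function name `knn_predict` is an assumption of this sketch:

```python
import math
from collections import Counter

def knn_predict(query, data, k=3):
    # Sort training points by Euclidean distance to the query.
    # Note: attendance (0-100) dominates hours (0-10) at this scale,
    # so real uses of k-NN would normalize the features first.
    nearest = sorted(data, key=lambda row: math.dist(query, row[0]))
    # Majority vote among the k nearest labels
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

data = [([2, 60], 0), ([4, 70], 0), ([5, 80], 1), ([7, 90], 1)]
print(knn_predict([6, 85], data, k=3))   # neighbors vote 1, 1, 0 -> Pass
```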
Naïve Bayes: applies Bayes' theorem with the assumption that features are independent given the class. Works well even with small datasets.
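A compact Gaussian Naive Bayes from scratch, showing the independence assumption as a product of per-feature likelihoods; the helper names (`gauss`, `fit_nb`, `predict_nb`) are invented for this sketch:

```python
import math

def gauss(x, mean, var):
    # Gaussian likelihood of one feature value
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_nb(X, y):
    # Per-class feature means/variances plus class priors
    stats = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(rows)
                 for col, m in zip(zip(*rows), means)]
        stats[c] = (means, vars_, len(rows) / len(y))
    return stats

def predict_nb(stats, x):
    # Independence assumption: multiply per-feature likelihoods and the prior
    return max(stats, key=lambda c: stats[c][2] * math.prod(
        gauss(v, m, s) for v, m, s in zip(x, stats[c][0], stats[c][1])))

X = [[2, 60], [4, 70], [5, 80], [7, 90]]
y = [0, 0, 1, 1]
stats = fit_nb(X, y)
print(predict_nb(stats, [6, 85]))
```

With only two rows per class the Gaussian estimates are crude, which illustrates why Naive Bayes is attractive on small data: it needs only a mean, a variance, and a prior per class.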
SVM (Support Vector Machine): finds the hyperplane that best separates Pass from Fail, maximizing the margin between the two classes.
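The decision rule of a linear SVM is just the sign of a hyperplane score. The weights `w` and bias `b` below are assumed for illustration (placing the boundary at hours = 4.5); a real SVM solver would learn them by maximizing the margin:

```python
# Assumed hyperplane parameters (not fitted by an SVM solver):
w = [1.0, 0.0]   # weight on hours; attendance ignored in this sketch
b = -4.5         # decision boundary at hours = 4.5

def svm_predict(x):
    # Sign of the signed distance to the hyperplane w.x + b = 0
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0

data = [([2, 60], 0), ([4, 70], 0), ([5, 80], 1), ([7, 90], 1)]
print(all(svm_predict(x) == label for x, label in data))
```

The margin-maximizing property means that, among all hyperplanes that separate the classes, the SVM picks the one farthest from the closest points on either side.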
Random Forest: an ensemble of many decision trees. Each tree votes, and the majority class is chosen.
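The voting step can be sketched with three one-split stumps standing in for full trees; the thresholds are assumed for illustration, whereas a real random forest grows many trees on bootstrap samples with random feature subsets:

```python
from collections import Counter

# Three hypothetical stumps on hours (thresholds assumed for illustration)
thresholds = [3.0, 4.5, 6.0]

def forest_predict(hours):
    # Each stump votes Pass (1) if hours exceed its threshold;
    # the majority vote is the forest's prediction
    votes = Counter(1 if hours > t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]

print(forest_predict(5))   # two of the three stumps vote Pass
```

Averaging many imperfect trees is what reduces the overfitting that a single deep tree suffers from.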
Enter study hours and attendance to estimate pass probability:
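A minimal estimator behind such a prompt, using the logistic model from earlier; the weights are the same illustrative assumptions (not fitted), and `pass_probability` is a hypothetical name:

```python
import math

def pass_probability(hours, attendance, w=(-9.0, 1.0, 0.05)):
    # Logistic model with assumed illustrative weights (not fitted)
    z = w[0] + w[1] * hours + w[2] * attendance
    return 1.0 / (1.0 + math.exp(-z))

# Example inputs in place of interactive entry
p = pass_probability(7, 90)
print(f"Estimated pass probability: {p:.2f}")
```

In an interactive version, `hours` and `attendance` would come from user input (e.g. `float(input(...))`) before calling the same function.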