Lecture 26 - Binary Data Prediction Models

Introduction

Binary prediction problems occur when the output variable has only two categories, e.g., Yes/No, Pass/Fail, Spam/Not-Spam. In this lecture, we use a simple dataset: predicting whether a student passes an exam (1) or fails (0) based on study hours and attendance.

Example Dataset

Hours Studied | Attendance (%) | Result (Pass=1, Fail=0)
-------------------------------------------------------
    2         |      60        |          0
    4         |      70        |          0
    5         |      80        |          1
    7         |      90        |          1

1. Logistic Regression

Uses the logistic (sigmoid) function to map a weighted sum of the features to a probability between 0 and 1. A decision threshold (usually 0.5) determines the final prediction.

Application: Predicting pass probability from hours studied.
If P(pass) = 0.8, which is above the 0.5 threshold, classify as Pass (1).
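
The thresholding step can be sketched in a few lines of Python. The coefficients below (WEIGHT_HOURS, INTERCEPT) are hypothetical values picked by hand so that the score crosses zero at 4.5 hours; they are not fitted to the example dataset, and the function names are illustrative only.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients chosen by hand for illustration (not fitted).
# The linear score WEIGHT_HOURS * hours + INTERCEPT is zero at 4.5 hours.
WEIGHT_HOURS = 1.5
INTERCEPT = -6.75

def pass_probability(hours: float) -> float:
    """Estimated P(pass) from hours studied under the assumed coefficients."""
    return sigmoid(WEIGHT_HOURS * hours + INTERCEPT)

def classify(probability: float, threshold: float = 0.5) -> int:
    """Turn a probability into a class label: 1 = Pass, 0 = Fail."""
    return 1 if probability >= threshold else 0

for hours in (2, 4, 5, 7):
    p = pass_probability(hours)
    print(f"{hours} h -> P(pass) = {p:.2f} -> class {classify(p)}")
```

In practice the coefficients would be estimated from data rather than set by hand; only the sigmoid and the threshold rule come from the description above.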

2. Decision Trees

Splits data into branches based on rules like "If Hours > 4 → Pass". Easy to interpret, but prone to overfitting.

Rule Example: If Attendance > 75% and Hours Studied > 4 → Pass, else Fail.
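
The rule example can be written as a two-level tree and checked against the four rows of the example dataset. This is a hand-coded sketch of that single rule, not a tree learned from data; the function name predict is an illustrative choice.

```python
# Rows from the example table: (hours_studied, attendance_percent, result)
DATA = [(2, 60, 0), (4, 70, 0), (5, 80, 1), (7, 90, 1)]

def predict(hours: float, attendance: float) -> int:
    """Hand-coded tree for the rule: Attendance > 75% and Hours > 4 -> Pass."""
    if attendance > 75:      # first split: attendance
        if hours > 4:        # second split: hours studied
            return 1         # Pass
        return 0             # Fail
    return 0                 # Fail

predictions = [predict(hours, attendance) for hours, attendance, _ in DATA]
labels = [result for _, _, result in DATA]
print("predictions:", predictions)
print("matches labels:", predictions == labels)
```

The rule reproduces all four labels, which on so few rows is also a reminder of how easily a tree can fit its training data exactly.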

3. K-Nearest Neighbors (KNN)

Classifies a point based on the majority label of its k-nearest neighbors in feature space.

Example: A student with 5 hours studied and 80% attendance has three nearest neighbors (k = 3) with labels (1, 1, 0). The majority is Pass → Predict Pass.
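
A minimal KNN sketch on the example dataset follows. It assumes k = 3 and plain Euclidean distance on the raw, unscaled features, so attendance (measured in percent) dominates the distance; neither choice is specified in the lecture. The query point also appears in the table, so it counts as its own nearest neighbor here.

```python
import math
from collections import Counter

# Rows from the example table: ((hours_studied, attendance_percent), result)
TRAIN = [((2, 60), 0), ((4, 70), 0), ((5, 80), 1), ((7, 90), 1)]

def knn_predict(query, train=TRAIN, k=3):
    """Return (majority label, labels of the k nearest neighbors)."""
    # Sort training rows by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda row: math.dist(query, row[0]))
    neighbor_labels = [label for _, label in by_distance[:k]]
    # Majority vote among the k nearest labels.
    majority_label = Counter(neighbor_labels).most_common(1)[0][0]
    return majority_label, neighbor_labels

label, neighbors = knn_predict((5, 80))
print("neighbor labels:", neighbors)
print("prediction:", "Pass" if label == 1 else "Fail")
```

An odd k avoids ties in a two-class problem, which is one reason k = 3 is a common starting point.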

4. Naïve Bayes

Applies Bayes' theorem with the assumption that the features are independent given the class. Because it estimates few parameters, it often works reasonably well even with small datasets.

Example: P(Pass | Hours, Attendance) = P(Hours | Pass) × P(Attendance | Pass) × P(Pass) / P(Hours, Attendance), i.e., Likelihood × Prior / Evidence.
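
The formula can be traced step by step in code. The sketch below assumes a Gaussian likelihood for each feature (the lecture does not say which likelihood model is used) and estimates means and variances from the four example rows, so the resulting numbers are illustrative only. The names fit and posterior_pass are hypothetical helpers.

```python
import math

# Rows from the example table: (hours_studied, attendance_percent, result)
DATA = [(2, 60, 0), (4, 70, 0), (5, 80, 1), (7, 90, 1)]

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution (assumed likelihood model)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(data):
    """Estimate the class prior and per-feature (mean, variance) for each class."""
    model = {}
    for cls in {row[2] for row in data}:
        rows = [row[:2] for row in data if row[2] == cls]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var))
        model[cls] = {"prior": len(rows) / len(data), "stats": stats}
    return model

def posterior_pass(model, features):
    """P(Pass | features) = likelihood * prior / evidence."""
    joint = {}
    for cls, params in model.items():
        likelihood = 1.0
        # Naive assumption: multiply the per-feature likelihoods.
        for x, (mean, var) in zip(features, params["stats"]):
            likelihood *= gaussian_pdf(x, mean, var)
        joint[cls] = likelihood * params["prior"]
    evidence = sum(joint.values())  # P(features), summed over both classes
    return joint[1] / evidence

model = fit(DATA)
print(f"P(Pass | 5 h, 80%) = {posterior_pass(model, (5, 80)):.3f}")
```

With only two rows per class the variance estimates are very rough; a real implementation would also guard against a zero variance.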

5. Support Vector Machine (SVM)

Finds the best hyperplane that separates Pass vs Fail. Maximizes the margin between the two classes.

Example: When the data are linearly separable, SVM finds a boundary line such that all Pass students are on one side and all Fail students are on the other.
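
To make "margin" concrete, the sketch below compares two hand-picked separating lines on the example dataset by their geometric margin, i.e., the distance from the line to the closest student. It does not train an SVM; both candidate lines are assumptions chosen for illustration. The second is the perpendicular bisector of the closest Fail/Pass pair, (4, 70) and (5, 80), which for this four-row table should coincide with the maximum-margin line.

```python
import math

# Rows from the example table, with labels recoded as -1 (Fail) and +1 (Pass).
DATA = [((2, 60), -1), ((4, 70), -1), ((5, 80), 1), ((7, 90), 1)]

def geometric_margin(w, b, data=DATA):
    """Smallest signed distance from any point to the line w.x + b = 0.

    A positive result means every point lies on its correct side.
    """
    norm = math.hypot(*w)
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, y in data)

# Candidate 1 (hypothetical): split on hours only, at 4.5 hours.
margin_hours = geometric_margin((1.0, 0.0), -4.5)

# Candidate 2 (hypothetical): perpendicular bisector of (4, 70) and (5, 80).
margin_bisector = geometric_margin((1.0, 10.0), -754.5)

print(f"hours-only line margin: {margin_hours:.3f}")
print(f"bisector line margin:   {margin_bisector:.3f}")
```

Both lines separate the classes, but an SVM would prefer the second because its closest points are farther from the boundary.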

6. Random Forest

Ensemble of multiple decision trees. Each tree votes, and the majority class is chosen.

Example: Of 100 trees, 80 predict Pass and 20 predict Fail → Final result = Pass.
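
The voting step on its own can be sketched as follows. The votes are written out directly to mirror the 80/20 example rather than produced by trained trees, and majority_vote is an illustrative helper name.

```python
from collections import Counter

def majority_vote(votes):
    """Return the class label that receives the most votes."""
    return Counter(votes).most_common(1)[0][0]

# 100 hypothetical tree predictions: 80 vote Pass (1), 20 vote Fail (0).
tree_votes = [1] * 80 + [0] * 20

final = majority_vote(tree_votes)
print("Final result:", "Pass" if final == 1 else "Fail")
```

In a full random forest each tree would be trained on a bootstrap sample of the data with a random subset of features, which is what makes the individual votes differ.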
