Linear Regression is one of the most fundamental and widely used algorithms in machine learning and statistics. It models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a straight line (or hyperplane) through the data.
y = w0 + w1x1 + w2x2 + ... + wnxn + ε
The goal is to estimate the coefficients w that minimize the mean squared error:

J(w) = (1/m) ∑ (yᵢ - ŷᵢ)²
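For ordinary least squares this minimization has a closed-form solution, w = (XᵀX)⁻¹Xᵀy. A minimal NumPy sketch on made-up data (the coefficients 2 and 3 are arbitrary, chosen just so we can check the recovery):

```python
import numpy as np

# Toy data: y = 2 + 3*x plus Gaussian noise (made up for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 + 3 * x + rng.normal(0, 0.5, size=50)

# Design matrix with a column of ones for the intercept w0
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares via a numerically stable solver
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept ~", w[0], " slope ~", w[1])
```

The fitted intercept and slope land close to the true values 2 and 3; the leftover gap comes from the injected noise.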
| Assumption | Description |
|---|---|
| Linearity | Relationship between predictors and target is linear. |
| Independence | Observations are independent of each other. |
| Homoscedasticity | Constant variance of errors across values of predictors. |
| No multicollinearity | Predictors should not be highly correlated with each other. |
| Normality of errors | Residuals are normally distributed. |
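Some of these assumptions can be checked numerically. A sketch on made-up data that inspects the pairwise predictor correlation (for multicollinearity) and the residual mean; the data and the 0.9 mixing weight are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up predictors; x2 is deliberately built to correlate with x1
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)
y = 1 + 2 * x1 + rng.normal(0, 0.3, size=200)

# Multicollinearity check: a high off-diagonal correlation is a warning sign
corr = np.corrcoef(x1, x2)[0, 1]
print("corr(x1, x2) =", round(corr, 2))

# Fit and inspect residuals: with an intercept, OLS residuals average to ~0;
# plotting them against predictions would reveal heteroscedasticity
X = np.column_stack([np.ones_like(x1), x1, x2])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ w
print("residual mean ~", residuals.mean())
```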
Geometrically, simple linear regression fits a straight line to data points in 2D (X vs. Y); with multiple features, it fits a hyperplane in higher dimensions.
Although diagnosing diabetes is a classification task, scikit-learn's diabetes dataset has a continuous target (a measure of disease progression one year after baseline), so linear regression fits it directly; its continuous predictions can also be thresholded to serve as a crude classification baseline.
```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, preds))
```
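As noted above, the continuous predictions can be thresholded into a crude binary baseline. A sketch that treats above-median disease progression as the positive class; the median cutoff is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_test)

# Threshold at the training-set median (an illustrative cutoff, not a clinical one)
cutoff = np.median(y_train)
pred_labels = preds > cutoff
true_labels = y_test > cutoff
accuracy = (pred_labels == true_labels).mean()
print("baseline accuracy:", round(accuracy, 2))
```

A dedicated classifier (e.g. logistic regression) would normally replace this baseline once it is established.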
We predict monthly sales using advertising spend on TV, radio, and newspaper.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Example sales data (toy-sized; real use needs far more rows)
sales = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})

X = sales[["TV", "Radio", "Newspaper"]]
y = sales["Sales"]
# With only 5 rows, keep at least 2 test samples: R² is undefined on a single sample
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("R² Score:", r2_score(y_test, preds))
```
By interpreting the coefficients (ideally after standardizing the features so they are on comparable scales), we can see which medium (TV, Radio, or Newspaper) contributes most to sales.
- **Ridge** penalizes large coefficients via the L2 norm: J = RSS + λ ∑ w²
- **Lasso** penalizes the absolute values of the coefficients (L1 norm), which can zero some out, giving feature selection: J = RSS + λ ∑ |w|
- **Elastic Net** combines the Ridge and Lasso penalties: J = RSS + λ₁ ∑ w² + λ₂ ∑ |w|
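Lasso's feature-selection effect is easy to see on synthetic data. In this sketch the coefficients (3 and 2), the number of irrelevant features, and alpha=0.5 are all made-up choices for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# 100 samples, 5 features, but only the first two actually drive y
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)
print("coefficients:", np.round(lasso.coef_, 2))
# The L1 penalty drives the three irrelevant coefficients to exactly 0,
# while shrinking (but keeping) the two informative ones
```

Ridge, by contrast, shrinks all coefficients toward zero but rarely makes any of them exactly zero.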
```python
# Pipeline with scaling + regularization
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("ridge", Ridge(alpha=1.0)),
])
pipe.fit(X_train, y_train)
```
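The penalty strength λ (called `alpha` in scikit-learn) is usually chosen by cross-validation rather than fixed by hand. A sketch using `RidgeCV` inside the same kind of pipeline; the synthetic data and the candidate alpha grid are assumptions for illustration:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
# Synthetic regression problem with known coefficients
X = rng.normal(size=(80, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.2, size=80)

# RidgeCV evaluates each candidate alpha with built-in cross-validation
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("ridge", RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])),
])
pipe.fit(X, y)
print("chosen alpha:", pipe.named_steps["ridge"].alpha_)
```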