Optimization Techniques in ML

Lecture 18 – Example 2: Logistic Regression (Gradient Descent)

Model & Loss

For inputs \(\mathbf{x}_i\in\mathbb{R}^d\), labels \(y_i\in\{0,1\}\), parameters \(\mathbf{w}, b\):

\[ \hat{p}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b), \quad \sigma(z)=\frac{1}{1+e^{-z}}. \]
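A practical note on computing \(\sigma(z)\): the naive formula overflows in `exp` for large negative \(z\). A minimal numerically stable sketch (the branching trick is an implementation detail, not part of the lecture's math):

```python
import numpy as np

def sigmoid(z):
    # Stable logistic function: branch on the sign of z so that
    # exp is only ever applied to non-positive arguments.
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))      # safe: -z <= 0
    ez = np.exp(z[~pos])                          # safe: z < 0
    out[~pos] = ez / (1.0 + ez)
    return out
```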

Negative log-likelihood (binary cross-entropy):

\[ \mathcal{L}(\mathbf{w},b) = -\sum_{i=1}^n\big[ y_i\log \hat{p}_i + (1-y_i)\log(1-\hat{p}_i)\big]. \]
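The loss above can be evaluated directly in NumPy. A minimal sketch; the `eps` clipping is an added safeguard (not in the formula) so the logs stay finite when a predicted probability saturates at 0 or 1:

```python
import numpy as np

def bce_loss(p_hat, y, eps=1e-12):
    # Negative log-likelihood for binary labels y in {0, 1}.
    # Clip probabilities away from 0/1 to avoid log(0).
    p_hat = np.clip(p_hat, eps, 1.0 - eps)
    return -np.sum(y * np.log(p_hat) + (1.0 - y) * np.log(1.0 - p_hat))
```

For example, a prediction of \(\hat p = 0.5\) on a single positive example gives loss \(-\log(1/2) = \log 2 \approx 0.693\).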

Gradients

Let \(\mathbf{X}\in\mathbb{R}^{n\times d}\) and \(\hat{\mathbf{p}}=\sigma(\mathbf{X}\mathbf{w}+b\mathbf{1})\). Then

\[ \nabla_{\mathbf{w}}\mathcal{L} = \mathbf{X}^\top(\hat{\mathbf{p}}-\mathbf{y}), \qquad \partial_b\mathcal{L}= \mathbf{1}^\top(\hat{\mathbf{p}}-\mathbf{y}). \]

Gradient descent update with step-size \(\eta\):

\[ \mathbf{w} \leftarrow \mathbf{w} - \eta\, \nabla_{\mathbf{w}}\mathcal{L}, \qquad b \leftarrow b - \eta\, \partial_b\mathcal{L}. \]
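The analytic gradients above can be verified numerically with central differences, a standard sanity check before running gradient descent. A sketch on synthetic data (the dataset and seed here are illustrative, not from the lecture):

```python
import numpy as np

def loss(w, b, X, y):
    # Binary cross-entropy for logistic regression.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([0., 1., 1., 0., 1.])
w = rng.normal(size=3); b = 0.1

# Analytic gradient: X^T (p - y)
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
grad_w = X.T @ (p - y)

# Numerical gradient via central differences
eps = 1e-6
num = np.zeros(3)
for j in range(3):
    e = np.zeros(3); e[j] = eps
    num[j] = (loss(w + e, b, X, y) - loss(w - e, b, X, y)) / (2 * eps)

err = np.max(np.abs(grad_w - num))  # should be tiny
```

A small `err` (on the order of \(10^{-6}\) or below) confirms the formula \(\nabla_{\mathbf{w}}\mathcal{L} = \mathbf{X}^\top(\hat{\mathbf{p}}-\mathbf{y})\).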

Worked Example (Tiny Dataset)

import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])   # AND gate: positive only for (1, 1)
w = np.zeros(2); b = 0.0         # initialize parameters at zero
eta = 0.5                        # step size
for t in range(10):
    z = X @ w + b                # logits
    p = 1 / (1 + np.exp(-z))     # predicted probabilities
    grad_w = X.T @ (p - y)       # gradient w.r.t. w
    grad_b = np.sum(p - y)       # gradient w.r.t. b
    w -= eta * grad_w            # gradient descent step
    b -= eta * grad_b
    print(t + 1, w, b)

Over the iterations, \(\mathbf{w}\) drifts toward positive components and \(b\) toward a negative value, so the decision boundary moves to separate the positive example \((1,1)\) from the other three points.

Regularization (Optional)

Add an L2 penalty: \(\mathcal{L}_\text{reg} = \mathcal{L} + \tfrac{\lambda}{2}\lVert\mathbf{w}\rVert^2\). The gradient with respect to \(\mathbf{w}\) gains a \(\lambda\mathbf{w}\) term (the bias \(b\) is conventionally left unregularized). Regularization improves generalization and the conditioning of the optimization problem.
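In code, the only change to the worked example's loop is the extra \(\lambda\mathbf{w}\) term in the weight gradient. A sketch on the same AND-gate data, with an illustrative choice of \(\lambda\) (not specified in the lecture):

```python
import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])   # AND gate
w = np.zeros(2); b = 0.0
eta, lam = 0.5, 0.1              # lam chosen for illustration
for t in range(100):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= eta * (X.T @ (p - y) + lam * w)  # penalty adds lam * w
    b -= eta * np.sum(p - y)              # bias left unregularized
```

Relative to the unregularized run, the penalty shrinks \(\lVert\mathbf{w}\rVert\) and keeps the iterates bounded even for separable data, where the unpenalized weights would grow without limit.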