Newton's Update
Given twice-differentiable \(f\), Newton's method updates
\[ \mathbf{w}_{t+1} = \mathbf{w}_t - \big[\nabla^2 f(\mathbf{w}_t)\big]^{-1} \nabla f(\mathbf{w}_t). \]
Near a local minimum where the Hessian is positive definite, Newton's method enjoys local quadratic convergence.
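The update can be sketched on a small strictly convex test function. This is a minimal sketch: the function \(f(\mathbf{w}) = e^{w_0+w_1} + e^{-w_0} + e^{-w_1}\) below is an illustrative choice, not part of the worked example later in this section. Note the step is obtained by solving a linear system, not by forming the inverse Hessian.

```python
import numpy as np

# Illustrative test function (assumed for this sketch):
# f(w) = exp(w0 + w1) + exp(-w0) + exp(-w1), strictly convex, minimum at the origin.
def f(w):
    return np.exp(w[0] + w[1]) + np.exp(-w[0]) + np.exp(-w[1])

def grad(w):
    s = np.exp(w[0] + w[1])
    return np.array([s - np.exp(-w[0]), s - np.exp(-w[1])])

def hess(w):
    s = np.exp(w[0] + w[1])
    return np.array([[s + np.exp(-w[0]), s],
                     [s,                 s + np.exp(-w[1])]])

w = np.array([1.0, -1.0])
for _ in range(8):
    # Solve H step = grad instead of inverting H: cheaper and numerically safer.
    w = w - np.linalg.solve(hess(w), grad(w))
```

A handful of iterations drives the gradient norm to machine precision, illustrating the quadratic rate near the optimum.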
Armijo Backtracking Line Search
Choose the direction \(\mathbf{p}_t = -\big[\nabla^2 f(\mathbf{w}_t)\big]^{-1} \nabla f(\mathbf{w}_t)\), then find the smallest \(m \in \{0, 1, 2, \dots\}\) such that, with \(\alpha = \beta^m\):
\[ f(\mathbf{w}_t + \alpha \mathbf{p}_t) \le f(\mathbf{w}_t) + c\,\alpha\,\nabla f(\mathbf{w}_t)^\top \mathbf{p}_t, \quad c\in(0,1),\ \beta\in(0,1). \]
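A minimal implementation of this rule (the function and parameter names here are my own, not from the text):

```python
import numpy as np

def backtracking(f, g, w, p, c=1e-4, beta=0.5):
    """Return alpha = beta^m for the smallest m satisfying the Armijo condition."""
    alpha = 1.0
    fw = f(w)
    slope = g @ p  # directional derivative; negative when p is a descent direction
    while f(w + alpha * p) > fw + c * alpha * slope:
        alpha *= beta
    return alpha

# Usage on f(w) = 0.5 ||w||^2 with a deliberately overlong descent direction:
f = lambda w: 0.5 * w @ w
w = np.array([3.0, -4.0])
g = w                  # gradient of f at w
p = -10.0 * g          # descent direction, but the full step overshoots
alpha = backtracking(f, g, w, p)
```

The loop always terminates when \(\mathbf{p}\) is a descent direction, since the Armijo condition holds for all sufficiently small \(\alpha\).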
Worked Example (Ridge-Logistic Newton Step)
```python
import numpy as np

X = np.array([[1., 0.], [0., 1.], [1., 1.], [2., 1.]])
y = np.array([0., 0., 1., 1.])
w = np.zeros(2); b = 0.0; lam = 1e-2

for t in range(6):
    z = X @ w + b
    p = 1 / (1 + np.exp(-z))          # sigmoid probabilities
    # gradient (add L2 on w)
    grad_w = X.T @ (p - y) + lam * w
    grad_b = np.sum(p - y)
    # Hessian (block form)
    S = np.diag(p * (1 - p))
    H_ww = X.T @ S @ X + lam * np.eye(2)
    H_wb = X.T @ (p * (1 - p))
    H_bb = np.sum(p * (1 - p))
    # Solve Newton step for (w, b)
    H = np.block([[H_ww, H_wb.reshape(-1, 1)],
                  [H_wb.reshape(1, -1), np.array([[H_bb]])]])
    g = np.concatenate([grad_w, [grad_b]])
    step = np.linalg.solve(H, g)
    w -= step[:2]; b -= step[2]
    print(t + 1, w, b)
```
Combine with backtracking to ensure monotone decrease in \(f\).
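The combination can be sketched as a generic damped-Newton loop (names and the test function are my own choices, not from the text). The 1D pseudo-Huber function makes the point: its pure Newton iteration is \(x \mapsto -x^3\), which diverges for \(|x| > 1\), while the damped version converges from \(x_0 = 2\).

```python
import numpy as np

def damped_newton(f, grad, hess, w, c=1e-4, beta=0.5, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(w), -g)                  # Newton direction
        alpha, fw, slope = 1.0, f(w), g @ p
        while f(w + alpha * p) > fw + c * alpha * slope:  # Armijo backtracking
            alpha *= beta
        w = w + alpha * p
    return w

# Pseudo-Huber in 1D: f(x) = sqrt(1 + x^2). Pure Newton maps x -> -x^3 and
# diverges for |x| > 1; backtracking shortens the early steps and converges.
f = lambda x: np.sqrt(1.0 + x[0] ** 2)
grad = lambda x: np.array([x[0] / np.sqrt(1.0 + x[0] ** 2)])
hess = lambda x: np.array([[(1.0 + x[0] ** 2) ** -1.5]])

x = damped_newton(f, grad, hess, np.array([2.0]))
```

Far from the optimum the line search accepts shortened steps; once the iterate is close, \(\alpha = 1\) is accepted and the quadratic rate takes over.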
Notes
- Per-iteration cost is higher than gradient descent, since each step forms the Hessian and solves a linear system with it.
- Trust-region methods are a robust alternative when the Hessian is ill-conditioned or indefinite.