Newton's Update
Given twice-differentiable \(f\), Newton's method updates
\[ \mathbf{w}_{t+1} = \mathbf{w}_t - \big[\nabla^2 f(\mathbf{w}_t)\big]^{-1} \nabla f(\mathbf{w}_t). \]
Near a local minimum where the Hessian is positive definite, Newton's method enjoys local quadratic convergence.
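The update can be sketched on a small strictly convex test function. This is a minimal sketch: the function \(f(\mathbf{w}) = e^{w_0+w_1} + e^{-w_0} + e^{-w_1}\) below is an illustrative choice, not part of the worked example later in this section. Note the step is obtained by solving a linear system, not by forming the inverse Hessian.

```python
import numpy as np

# Illustrative test function (assumed for this sketch):
# f(w) = exp(w0 + w1) + exp(-w0) + exp(-w1), strictly convex, minimum at the origin.
def f(w):
    return np.exp(w[0] + w[1]) + np.exp(-w[0]) + np.exp(-w[1])

def grad(w):
    s = np.exp(w[0] + w[1])
    return np.array([s - np.exp(-w[0]), s - np.exp(-w[1])])

def hess(w):
    s = np.exp(w[0] + w[1])
    return np.array([[s + np.exp(-w[0]), s],
                     [s,                 s + np.exp(-w[1])]])

w = np.array([1.0, -1.0])
for _ in range(8):
    # Solve H step = grad instead of inverting H: cheaper and numerically safer.
    w = w - np.linalg.solve(hess(w), grad(w))
```

A handful of iterations drives the gradient norm to machine precision, illustrating the quadratic rate near the optimum.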
Armijo Backtracking Line Search
Choose the direction \(\mathbf{p}_t = -\big[\nabla^2 f(\mathbf{w}_t)\big]^{-1} \nabla f(\mathbf{w}_t)\), then find the smallest \(m \in \{0, 1, 2, \dots\}\) such that, with \(\alpha = \beta^m\):
\[ f(\mathbf{w}_t + \alpha \mathbf{p}_t) \le f(\mathbf{w}_t) + c\,\alpha\,\nabla f(\mathbf{w}_t)^\top \mathbf{p}_t, \quad c\in(0,1),\ \beta\in(0,1). \]
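A minimal implementation of this rule (the function and parameter names here are my own, not from the text):

```python
import numpy as np

def backtracking(f, g, w, p, c=1e-4, beta=0.5):
    """Return alpha = beta^m for the smallest m satisfying the Armijo condition."""
    alpha = 1.0
    fw = f(w)
    slope = g @ p  # directional derivative; negative when p is a descent direction
    while f(w + alpha * p) > fw + c * alpha * slope:
        alpha *= beta
    return alpha

# Usage on f(w) = 0.5 ||w||^2 with a deliberately overlong descent direction:
f = lambda w: 0.5 * w @ w
w = np.array([3.0, -4.0])
g = w                  # gradient of f at w
p = -10.0 * g          # descent direction, but the full step overshoots
alpha = backtracking(f, g, w, p)
```

The loop always terminates when \(\mathbf{p}\) is a descent direction, since the Armijo condition holds for all sufficiently small \(\alpha\).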
Worked Example (Ridge-Logistic Newton Step)
```python
import numpy as np

X = np.array([[1., 0.], [0., 1.], [1., 1.], [2., 1.]])
y = np.array([0., 0., 1., 1.])
w = np.zeros(2); b = 0.0; lam = 1e-2

for t in range(6):
    z = X @ w + b
    p = 1 / (1 + np.exp(-z))          # sigmoid probabilities
    # gradient (add L2 on w)
    grad_w = X.T @ (p - y) + lam * w
    grad_b = np.sum(p - y)
    # Hessian (block form)
    S = np.diag(p * (1 - p))
    H_ww = X.T @ S @ X + lam * np.eye(2)
    H_wb = X.T @ (p * (1 - p))
    H_bb = np.sum(p * (1 - p))
    # Solve Newton step for (w, b)
    H = np.block([[H_ww, H_wb.reshape(-1, 1)],
                  [H_wb.reshape(1, -1), np.array([[H_bb]])]])
    g = np.concatenate([grad_w, [grad_b]])
    step = np.linalg.solve(H, g)
    w -= step[:2]; b -= step[2]
    print(t + 1, w, b)
```
Combine with backtracking to ensure monotone decrease in \(f\).
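The combination can be sketched as a generic damped-Newton loop (names and the test function are my own choices, not from the text). The 1D pseudo-Huber function makes the point: its pure Newton iteration is \(x \mapsto -x^3\), which diverges for \(|x| > 1\), while the damped version converges from \(x_0 = 2\).

```python
import numpy as np

def damped_newton(f, grad, hess, w, c=1e-4, beta=0.5, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(w), -g)                  # Newton direction
        alpha, fw, slope = 1.0, f(w), g @ p
        while f(w + alpha * p) > fw + c * alpha * slope:  # Armijo backtracking
            alpha *= beta
        w = w + alpha * p
    return w

# Pseudo-Huber in 1D: f(x) = sqrt(1 + x^2). Pure Newton maps x -> -x^3 and
# diverges for |x| > 1; backtracking shortens the early steps and converges.
f = lambda x: np.sqrt(1.0 + x[0] ** 2)
grad = lambda x: np.array([x[0] / np.sqrt(1.0 + x[0] ** 2)])
hess = lambda x: np.array([[(1.0 + x[0] ** 2) ** -1.5]])

x = damped_newton(f, grad, hess, np.array([2.0]))
```

Far from the optimum the line search accepts shortened steps; once the iterate is close, \(\alpha = 1\) is accepted and the quadratic rate takes over.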
Notes
- Per-iteration cost is higher than gradient descent, since each step forms the Hessian and solves a linear system with it.
- Trust-region methods are a robust alternative when the Hessian is ill-conditioned or indefinite.