Lecture 6 — Vector Calculus: Gradients, Jacobians, Hessians & ML Applications


Gradients, directional derivatives, Jacobian, Hessian, optimization & applications in machine learning

1. Why vector calculus matters in ML

Modern machine learning relies heavily on the calculus of multivariable functions. Training a model typically means minimizing a loss function using gradients (first derivatives) and sometimes Hessians (second derivatives). Backpropagation in neural networks is the repeated application of the multivariable chain rule.

2. Key concepts — short theory

Scalar field and gradient

A scalar field is a function \(f: \mathbb{R}^n \to \mathbb{R}\). The gradient of \(f\) is the vector of partial derivatives:

∇f = [∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ]

The gradient points in the direction of steepest increase. In optimization we follow −∇f to descend.
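The gradient can be approximated numerically by central finite differences, which is a handy sanity check for hand-derived formulas. A minimal sketch (the helper name `num_grad` and the test function are illustrative choices, not from the notes):

```python
def num_grad(f, x, y, h=1e-6):
    """Central-difference approximation of the gradient of f at (x, y)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

# f(x, y) = x^2 + y^2 has gradient (2x, 2y)
f = lambda x, y: x * x + y * y
gx, gy = num_grad(f, 1.0, 2.0)
print(gx, gy)  # approximately (2.0, 4.0)
```

Central differences are accurate to O(h²), which is why they are preferred over one-sided differences for gradient checking.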

Directional derivative

The directional derivative of f at x in direction u (unit vector) is:

D_u f(x) = ∇f(x) ⋅ u
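The identity D_u f = ∇f ⋅ u can be verified numerically: estimate the gradient by finite differences, dot it with a unit direction u, and compare with a direct difference quotient along u. A sketch (point, direction, and helper name are illustrative):

```python
import math

def num_grad(f, x, y, h=1e-6):
    # Central differences for both partial derivatives.
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

f = lambda x, y: 3 * x * x + 4 * x * y + 2 * y * y + 5 * x + 1

theta = 0.7
u = (math.cos(theta), math.sin(theta))   # unit direction

x0, y0 = 1.0, -1.0
gx, gy = num_grad(f, x0, y0)
dot = gx * u[0] + gy * u[1]              # ∇f(x) ⋅ u

# Direct central-difference estimate of D_u f(x)
h = 1e-6
direct = (f(x0 + h * u[0], y0 + h * u[1])
          - f(x0 - h * u[0], y0 - h * u[1])) / (2 * h)

print(dot, direct)  # the two estimates agree closely
```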

Vector-valued functions & Jacobian

For g: ℝⁿ → ℝᵐ, the Jacobian J_g(x) is the m×n matrix of partial derivatives whose i-th row is ∇g_i(x). The Jacobian generalizes the gradient and appears in coordinate transforms and backpropagation.
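For a 2→2 map the Jacobian can be built row by row from finite differences. A sketch using the example map from the interactive tool below, g(x, y) = (x² + y, x − y²), whose analytic Jacobian is [[2x, 1], [1, −2y]] (the helper name `num_jacobian` is illustrative):

```python
def g(x, y):
    # g(x, y) = (x^2 + y, x - y^2)
    return (x * x + y, x - y * y)

def num_jacobian(g, x, y, h=1e-6):
    """2x2 Jacobian of g: R^2 -> R^2 by central differences; row i is ∇g_i."""
    gxp, gxm = g(x + h, y), g(x - h, y)
    gyp, gym = g(x, y + h), g(x, y - h)
    return [[(gxp[i] - gxm[i]) / (2 * h), (gyp[i] - gym[i]) / (2 * h)]
            for i in range(2)]

J = num_jacobian(g, 1.0, 2.0)
print(J)  # analytic Jacobian at (1, 2): [[2, 1], [1, -4]]
```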

Hessian (matrix of second derivatives)

For scalar f, the Hessian H is an n×n symmetric matrix with entries H_{ij} = ∂²f/∂x_i∂x_j. The Hessian describes local curvature; at a stationary point, a positive definite Hessian implies a strict local minimum.
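The Hessian entries can also be estimated with second-order central differences, and the mixed partials come out equal, illustrating the symmetry H_{ij} = H_{ji}. A sketch (the test function f(x, y) = x³ + 2xy + y², with analytic Hessian [[6x, 2], [2, 2]], is an illustrative choice):

```python
f = lambda x, y: x ** 3 + 2 * x * y + y ** 2

def num_hessian(f, x, y, h=1e-4):
    """2x2 Hessian of f by second-order central differences."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return [[fxx, fxy], [fxy, fyy]]

H = num_hessian(f, 1.0, 2.0)
print(H)  # analytic Hessian at (1, 2): [[6, 2], [2, 2]]
```

A larger step (h = 1e-4 rather than 1e-6) is used here because dividing by h² amplifies floating-point round-off.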

3. Worked examples

Example A — Quadratic function (2D)

f(x,y) = 3x^2 + 4xy + 2y^2 + 5x + 1
Gradient: ∂f/∂x = 6x + 4y + 5
          ∂f/∂y = 4x + 4y
Hessian: [[6, 4],
          [4, 4]]  (constant)

Because the Hessian is constant, we can classify the curvature globally by checking its eigenvalues: both are positive (5 ± √17 ≈ 9.12 and 0.88), so H is positive definite and f is strictly convex with a unique global minimum.
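The eigenvalue check is quick to do by hand for a symmetric 2×2 matrix [[a, b], [b, d]], using the closed form λ = (a + d)/2 ± √(((a − d)/2)² + b²). A sketch for the Hessian above:

```python
import math

# Symmetric 2x2 Hessian of Example A: [[6, 4], [4, 4]]
a, b, d = 6.0, 4.0, 4.0
mean = (a + d) / 2
disc = math.sqrt(((a - d) / 2) ** 2 + b * b)
lam1, lam2 = mean + disc, mean - disc
print(lam1, lam2)  # 5 ± sqrt(17): about 9.123 and 0.877, both positive
```

Both eigenvalues positive means the Hessian is positive definite, so the quadratic is strictly convex.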

Example B — Logistic loss (single sample)

For y∈{0,1}, model p = σ(w^T x) where σ = sigmoid.
Loss (negative log-likelihood): L(w) = -[ y log p + (1-y) log(1-p) ]
Gradient: ∇_w L = (p - y) x
Hessian: H = p(1-p) x x^T  (rank-1, PSD)

This shows why the logistic loss is convex in w for linear models: the Hessian is positive semidefinite everywhere.
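The analytic gradient (p − y) x can be checked against finite differences of the loss itself. A sketch (the weights, input, and label are illustrative values):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    # Negative log-likelihood for a single sample
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

w, x, y = [0.5, -0.3], [1.0, 2.0], 1

# Analytic gradient: (p - y) * x
p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
grad_analytic = [(p - y) * xi for xi in x]

# Central-difference check, one coordinate at a time
h = 1e-6
grad_numeric = []
for i in range(2):
    wp = list(w); wp[i] += h
    wm = list(w); wm[i] -= h
    grad_numeric.append((loss(wp, x, y) - loss(wm, x, y)) / (2 * h))

print(grad_analytic, grad_numeric)  # the two agree closely
```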

4. Optimization connection

Gradient descent update: w ← w − η ∇L(w). The learning rate η and the gradient magnitude determine the step size. Newton's method uses the Hessian for second-order updates: w ← w − H^{-1} ∇L(w), which converges rapidly near the optimum when the Hessian is invertible (and positive definite).
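Both updates can be run on the quadratic from Example A, whose minimizer solves ∇f = 0 and is (−2.5, 2.5). A sketch (the learning rate, start point, and iteration count are illustrative; the 2×2 inverse is written out by hand):

```python
# f(x, y) = 3x^2 + 4xy + 2y^2 + 5x + 1, from Example A
def grad(x, y):
    return (6 * x + 4 * y + 5, 4 * x + 4 * y)

# Gradient descent: many small steps along -∇f
x, y, eta = 0.0, 0.0, 0.05
for _ in range(500):
    gx, gy = grad(x, y)
    x, y = x - eta * gx, y - eta * gy
gd_x, gd_y = x, y
print(gd_x, gd_y)  # close to the minimizer (-2.5, 2.5)

# Newton's method: w <- w - H^{-1} ∇f(w); exact in one step for a quadratic
H = [[6.0, 4.0], [4.0, 4.0]]
detH = H[0][0] * H[1][1] - H[0][1] * H[1][0]       # = 8
Hinv = [[H[1][1] / detH, -H[0][1] / detH],
        [-H[1][0] / detH, H[0][0] / detH]]
gx, gy = grad(0.0, 0.0)
nt_x = 0.0 - (Hinv[0][0] * gx + Hinv[0][1] * gy)
nt_y = 0.0 - (Hinv[1][0] * gx + Hinv[1][1] * gy)
print(nt_x, nt_y)  # (-2.5, 2.5) in a single step
```

The contrast is the point: gradient descent needs many steps and a tuned η, while Newton's method lands on the minimizer of a quadratic in one step, at the cost of forming and inverting the Hessian.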

5. Interactive numeric tools

The original page provides three interactive widgets; their inputs are described below.

- Numeric gradient & directional derivative (2 variables). Enter a function f(x, y) using JavaScript math syntax (Math.* allowed). Examples: x*x + y*y, 3*x*x + 4*x*y + 2*y*y + 5*x + 1, 1/(1+Math.exp(-(a*x + b*y))).
- Jacobian for a vector-valued function (2→2). Enter two component functions separated by a semicolon. Example: x*x + y; x - y*y.
- Hessian (2 variables).


6. Summary & links
