Modern machine learning relies heavily on the calculus of multivariable functions. Training a model typically means minimizing a loss function using gradients (first derivatives) and sometimes Hessians (second derivatives). Backpropagation in neural networks is the repeated application of the multivariable chain rule.
A scalar field is a function \(f: \mathbb{R}^n \to \mathbb{R}\). The gradient of \(f\) is the vector of partial derivatives:
∇f = [∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ]
The gradient points in the direction of steepest increase. In optimization we follow −∇f to descend.
The directional derivative of f at x in direction u (unit vector) is:
D_u f(x) = ∇f(x) ⋅ u
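The gradient and directional derivative can be approximated numerically by central differences. A minimal sketch in JavaScript (matching the demo's syntax); the helper names `numericalGradient` and `directionalDerivative` are illustrative, not library functions:

```javascript
// Example scalar field f(x, y) = x^2 + y^2, whose gradient is [2x, 2y].
function f(x, y) {
  return x * x + y * y;
}

// Central-difference approximation of the gradient [df/dx, df/dy].
function numericalGradient(f, x, y, h = 1e-5) {
  return [
    (f(x + h, y) - f(x - h, y)) / (2 * h),
    (f(x, y + h) - f(x, y - h)) / (2 * h),
  ];
}

// D_u f(x) = grad f(x) . u, with u normalized to a unit vector.
function directionalDerivative(f, x, y, u) {
  const norm = Math.hypot(u[0], u[1]);
  const [gx, gy] = numericalGradient(f, x, y);
  return (gx * u[0] + gy * u[1]) / norm;
}

// At (1, 2), grad f = [2, 4]; the slope along u = [1, 0] is 2.
console.log(directionalDerivative(f, 1, 2, [1, 0])); // ≈ 2
```

Because the formula normalizes `u`, any nonzero direction vector can be passed in.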
For g: ℝⁿ → ℝᵐ, the Jacobian J_g(x) is the m×n matrix of partial derivatives whose i-th row is ∇g_i(x). The Jacobian generalizes the gradient and appears in coordinate transforms and backpropagation.
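A numerical Jacobian can be built one column at a time by differencing each input variable. A sketch for a map g: ℝ² → ℝ² (the helper `numericalJacobian` is illustrative, not a library function):

```javascript
// Example vector field g(x, y) = [x^2 + y, x - y^2].
// Analytic Jacobian: [[2x, 1], [1, -2y]].
function g(x, y) {
  return [x * x + y, x - y * y];
}

// Central-difference Jacobian: row i holds [dg_i/dx, dg_i/dy].
function numericalJacobian(g, x, y, h = 1e-5) {
  const m = g(x, y).length;
  const J = [];
  for (let i = 0; i < m; i++) {
    J.push([
      (g(x + h, y)[i] - g(x - h, y)[i]) / (2 * h), // dg_i/dx
      (g(x, y + h)[i] - g(x, y - h)[i]) / (2 * h), // dg_i/dy
    ]);
  }
  return J;
}

// At (1, 2): [[2, 1], [1, -4]].
console.log(numericalJacobian(g, 1, 2));
```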
For scalar f, the Hessian H is the n×n matrix with entries H_{ij} = ∂²f/∂x_i∂x_j (symmetric when the second partials are continuous). The Hessian encodes local curvature; at a critical point, a positive definite Hessian implies a local minimum.
Worked example: f(x,y) = 3x^2 + 4xy + 2y^2 + 5x + 1
Gradient: ∂f/∂x = 6x + 4y + 5
∂f/∂y = 4x + 4y
Hessian: [[6, 4], [4, 4]] (constant)
Because the Hessian is constant with det = 8 > 0 and trace = 10 > 0 (eigenvalues 5 ± √17, both positive), it is positive definite everywhere: f is convex, and setting ∇f = 0 gives the unique global minimum at (x, y) = (−5/2, 5/2).
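The worked example can be checked directly in code, a quick sketch; the helper names are illustrative:

```javascript
// Worked example f(x, y) = 3x^2 + 4xy + 2y^2 + 5x + 1.
function f(x, y) {
  return 3 * x * x + 4 * x * y + 2 * y * y + 5 * x + 1;
}

// Analytic gradient and (constant) Hessian from the derivation above.
function grad(x, y) {
  return [6 * x + 4 * y + 5, 4 * x + 4 * y];
}
const H = [[6, 4], [4, 4]];

// A symmetric 2x2 matrix is positive definite iff its leading principal
// minors are positive: H[0][0] > 0 and det(H) > 0.
const det = H[0][0] * H[1][1] - H[0][1] * H[1][0]; // 24 - 16 = 8
console.log(H[0][0] > 0 && det > 0); // true -> f is convex

// Solving grad f = 0 gives the unique global minimum at (-5/2, 5/2).
console.log(grad(-2.5, 2.5)); // [0, 0]
console.log(f(-2.5, 2.5)); // minimum value -5.25
```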
For labels y ∈ {0,1}, the model predicts p = σ(wᵀx), where σ is the sigmoid function.
Loss (negative log-likelihood): L(w) = -[ y log p + (1-y) log(1-p) ]
Gradient: ∇_w L = (p - y) x
Hessian: H = p(1-p) x x^T (rank-1, PSD)
This shows why the logistic loss is convex for linear models: the per-example Hessian is PSD, and a sum of PSD matrices over the dataset remains PSD.
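The closed-form gradient (p − y)x can be validated against a finite-difference approximation of the loss, a sketch for a single example in two dimensions; the names `sigmoid`, `logisticLoss`, and `logisticGrad` are illustrative:

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Negative log-likelihood for one example (x, y), as defined above.
function logisticLoss(w, x, y) {
  const p = sigmoid(w[0] * x[0] + w[1] * x[1]);
  return -(y * Math.log(p) + (1 - y) * Math.log(1 - p));
}

// Closed-form gradient: grad_w L = (p - y) x.
function logisticGrad(w, x, y) {
  const p = sigmoid(w[0] * x[0] + w[1] * x[1]);
  return [(p - y) * x[0], (p - y) * x[1]];
}

// Compare the first gradient component with a central difference in w[0].
const w = [0.5, -1];
const x = [2, 1];
const y = 1;
const h = 1e-6;
const numeric0 =
  (logisticLoss([w[0] + h, w[1]], x, y) -
    logisticLoss([w[0] - h, w[1]], x, y)) / (2 * h);
console.log(numeric0, logisticGrad(w, x, y)[0]); // the two values agree closely
```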
Gradient descent update: w ← w − η ∇L(w). The learning rate η and the gradient magnitude determine the step size. Newton's method uses the Hessian for second-order updates: w ← w − H^{-1} ∇L(w), which converges rapidly near the optimum when the Hessian is invertible.
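Both updates can be compared on the quadratic worked example f(x,y) = 3x² + 4xy + 2y² + 5x + 1, whose minimum is at (−2.5, 2.5). A sketch (learning rate and iteration count are illustrative choices); note that on a quadratic, Newton's method reaches the exact minimum in a single step:

```javascript
// Gradient of f(x, y) = 3x^2 + 4xy + 2y^2 + 5x + 1, taking w = [x, y].
function grad(w) {
  return [6 * w[0] + 4 * w[1] + 5, 4 * w[0] + 4 * w[1]];
}

// Gradient descent: w <- w - eta * grad f(w).
let w = [0, 0];
const eta = 0.1;
for (let i = 0; i < 500; i++) {
  const g = grad(w);
  w = [w[0] - eta * g[0], w[1] - eta * g[1]];
}
console.log(w); // approaches [-2.5, 2.5]

// Newton step: w <- w - H^{-1} grad f(w).
// H = [[6, 4], [4, 4]], det(H) = 8, so H^{-1} = (1/8) [[4, -4], [-4, 6]].
const w0 = [0, 0];
const g0 = grad(w0); // [5, 0]
const step = [(4 * g0[0] - 4 * g0[1]) / 8, (-4 * g0[0] + 6 * g0[1]) / 8];
const wNewton = [w0[0] - step[0], w0[1] - step[1]];
console.log(wNewton); // [-2.5, 2.5] in one step
```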
Enter a function f(x,y). Use JavaScript math syntax (Math.* allowed). Examples: x*x + y*y, 3*x*x + 4*x*y + 2*y*y + 5*x + 1, 1/(1+Math.exp(-(a*x + b*y)))
Enter two component functions separated by semicolon. Example: x*x + y; x - y*y