Modern machine learning relies heavily on the calculus of multivariable functions. Training a model typically means minimizing a loss function using gradients (first derivatives) and sometimes Hessians (second derivatives). Backpropagation in neural networks is the repeated application of the multivariable chain rule.
A scalar field is a function \(f: \mathbb{R}^n \to \mathbb{R}\). The gradient of \(f\) is the vector of partial derivatives:
∇f = [∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ]
The gradient points in the direction of steepest increase. In optimization we follow −∇f to descend.
The directional derivative of f at x in direction u (unit vector) is:
D_u f(x) = ∇f(x) ⋅ u
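The gradient and directional derivative can be approximated numerically by central differences. A minimal sketch in JavaScript (matching the demo's syntax); the helper names `numericalGradient` and `directionalDerivative` are illustrative, not library functions:

```javascript
// Example scalar field f(x, y) = x^2 + y^2, whose gradient is [2x, 2y].
function f(x, y) {
  return x * x + y * y;
}

// Central-difference approximation of the gradient [df/dx, df/dy].
function numericalGradient(f, x, y, h = 1e-5) {
  return [
    (f(x + h, y) - f(x - h, y)) / (2 * h),
    (f(x, y + h) - f(x, y - h)) / (2 * h),
  ];
}

// D_u f(x) = grad f(x) . u, with u normalized to a unit vector.
function directionalDerivative(f, x, y, u) {
  const norm = Math.hypot(u[0], u[1]);
  const [gx, gy] = numericalGradient(f, x, y);
  return (gx * u[0] + gy * u[1]) / norm;
}

// At (1, 2), grad f = [2, 4]; the slope along u = [1, 0] is 2.
console.log(directionalDerivative(f, 1, 2, [1, 0])); // ≈ 2
```

Because the formula normalizes `u`, any nonzero direction vector can be passed in.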
For g: ℝⁿ → ℝᵐ, the Jacobian J_g(x) is the m×n matrix of partial derivatives whose i-th row is ∇g_i(x). The Jacobian generalizes the gradient and appears in coordinate transforms and backpropagation.
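A numerical Jacobian can be built one column at a time by differencing each input variable. A sketch for a map g: ℝ² → ℝ² (the helper `numericalJacobian` is illustrative, not a library function):

```javascript
// Example vector field g(x, y) = [x^2 + y, x - y^2].
// Analytic Jacobian: [[2x, 1], [1, -2y]].
function g(x, y) {
  return [x * x + y, x - y * y];
}

// Central-difference Jacobian: row i holds [dg_i/dx, dg_i/dy].
function numericalJacobian(g, x, y, h = 1e-5) {
  const m = g(x, y).length;
  const J = [];
  for (let i = 0; i < m; i++) {
    J.push([
      (g(x + h, y)[i] - g(x - h, y)[i]) / (2 * h), // dg_i/dx
      (g(x, y + h)[i] - g(x, y - h)[i]) / (2 * h), // dg_i/dy
    ]);
  }
  return J;
}

// At (1, 2): [[2, 1], [1, -4]].
console.log(numericalJacobian(g, 1, 2));
```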
For scalar f, the Hessian H is the n×n matrix with entries H_{ij} = ∂²f/∂x_i∂x_j (symmetric when the second partials are continuous). The Hessian encodes local curvature; at a critical point, a positive definite Hessian implies a local minimum.
Worked example: f(x,y) = 3x^2 + 4xy + 2y^2 + 5x + 1
Gradient: ∂f/∂x = 6x + 4y + 5
∂f/∂y = 4x + 4y
Hessian: [[6, 4], [4, 4]] (constant)
Because the Hessian is constant with det = 8 > 0 and trace = 10 > 0 (eigenvalues 5 ± √17, both positive), it is positive definite everywhere: f is convex, and setting ∇f = 0 gives the unique global minimum at (x, y) = (−5/2, 5/2).
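The worked example can be checked directly in code, a quick sketch; the helper names are illustrative:

```javascript
// Worked example f(x, y) = 3x^2 + 4xy + 2y^2 + 5x + 1.
function f(x, y) {
  return 3 * x * x + 4 * x * y + 2 * y * y + 5 * x + 1;
}

// Analytic gradient and (constant) Hessian from the derivation above.
function grad(x, y) {
  return [6 * x + 4 * y + 5, 4 * x + 4 * y];
}
const H = [[6, 4], [4, 4]];

// A symmetric 2x2 matrix is positive definite iff its leading principal
// minors are positive: H[0][0] > 0 and det(H) > 0.
const det = H[0][0] * H[1][1] - H[0][1] * H[1][0]; // 24 - 16 = 8
console.log(H[0][0] > 0 && det > 0); // true -> f is convex

// Solving grad f = 0 gives the unique global minimum at (-5/2, 5/2).
console.log(grad(-2.5, 2.5)); // [0, 0]
console.log(f(-2.5, 2.5)); // minimum value -5.25
```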
For labels y ∈ {0,1}, the model predicts p = σ(wᵀx), where σ is the sigmoid function.
Loss (negative log-likelihood): L(w) = -[ y log p + (1-y) log(1-p) ]
Gradient: ∇_w L = (p - y) x
Hessian: H = p(1-p) x x^T (rank-1, PSD)
This shows why the logistic loss is convex for linear models: the per-example Hessian is PSD, and a sum of PSD matrices over the dataset remains PSD.
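The closed-form gradient (p − y)x can be validated against a finite-difference approximation of the loss, a sketch for a single example in two dimensions; the names `sigmoid`, `logisticLoss`, and `logisticGrad` are illustrative:

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Negative log-likelihood for one example (x, y), as defined above.
function logisticLoss(w, x, y) {
  const p = sigmoid(w[0] * x[0] + w[1] * x[1]);
  return -(y * Math.log(p) + (1 - y) * Math.log(1 - p));
}

// Closed-form gradient: grad_w L = (p - y) x.
function logisticGrad(w, x, y) {
  const p = sigmoid(w[0] * x[0] + w[1] * x[1]);
  return [(p - y) * x[0], (p - y) * x[1]];
}

// Compare the first gradient component with a central difference in w[0].
const w = [0.5, -1];
const x = [2, 1];
const y = 1;
const h = 1e-6;
const numeric0 =
  (logisticLoss([w[0] + h, w[1]], x, y) -
    logisticLoss([w[0] - h, w[1]], x, y)) / (2 * h);
console.log(numeric0, logisticGrad(w, x, y)[0]); // the two values agree closely
```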
Gradient descent update: w ← w − η ∇L(w). The learning rate η and the gradient magnitude determine the step size. Newton's method uses the Hessian for second-order updates: w ← w − H^{-1} ∇L(w), which converges rapidly near the optimum when the Hessian is invertible.
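Both updates can be compared on the quadratic worked example f(x,y) = 3x² + 4xy + 2y² + 5x + 1, whose minimum is at (−2.5, 2.5). A sketch (learning rate and iteration count are illustrative choices); note that on a quadratic, Newton's method reaches the exact minimum in a single step:

```javascript
// Gradient of f(x, y) = 3x^2 + 4xy + 2y^2 + 5x + 1, taking w = [x, y].
function grad(w) {
  return [6 * w[0] + 4 * w[1] + 5, 4 * w[0] + 4 * w[1]];
}

// Gradient descent: w <- w - eta * grad f(w).
let w = [0, 0];
const eta = 0.1;
for (let i = 0; i < 500; i++) {
  const g = grad(w);
  w = [w[0] - eta * g[0], w[1] - eta * g[1]];
}
console.log(w); // approaches [-2.5, 2.5]

// Newton step: w <- w - H^{-1} grad f(w).
// H = [[6, 4], [4, 4]], det(H) = 8, so H^{-1} = (1/8) [[4, -4], [-4, 6]].
const w0 = [0, 0];
const g0 = grad(w0); // [5, 0]
const step = [(4 * g0[0] - 4 * g0[1]) / 8, (-4 * g0[0] + 6 * g0[1]) / 8];
const wNewton = [w0[0] - step[0], w0[1] - step[1]];
console.log(wNewton); // [-2.5, 2.5] in one step
```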
Enter a function f(x,y). Use JavaScript math syntax (Math.* allowed). Examples: x*x + y*y, 3*x*x + 4*x*y + 2*y*y + 5*x + 1, 1/(1+Math.exp(-(a*x + b*y)))
Enter two component functions separated by semicolon. Example: x*x + y; x - y*y