Optimization is the core of machine learning. Training a model essentially means minimizing or maximizing an objective (loss) function. For example:
Suppose we want to minimize:
f(x) = (x - 3)²
The derivative is f'(x) = 2(x - 3). Gradient Descent update rule:
x_new = x_old - η * f'(x_old)
where η is the learning rate.
Exercise: minimize f(x) = (x - 3)² using Gradient Descent.
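The update rule above can be sketched in a few lines of Python. The starting point, learning rate, and step count below are illustrative choices, not prescribed by the text:

```python
def gradient_descent(x0=0.0, eta=0.1, steps=100):
    """Minimize f(x) = (x - 3)^2 by repeated gradient-descent updates."""
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)   # f'(x) = 2(x - 3)
        x = x - eta * grad   # x_new = x_old - η * f'(x_old)
    return x

x_min = gradient_descent()
print(x_min)  # converges toward the true minimum, x = 3
```

With η = 0.1 each step shrinks the distance to the minimum by a factor of 0.8, so after 100 steps the iterate is within floating-point noise of x = 3. Too large an η (here, anything above 1.0) would make the updates overshoot and diverge.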