In this lecture, we explore how optimization algorithms are practically implemented in modern deep learning frameworks like PyTorch and TensorFlow. Optimization is at the heart of training machine learning models, especially neural networks. These frameworks provide built-in optimizers that make it easy to experiment with different techniques.
import torch
import torch.nn as nn
import torch.optim as optim
# Sample data
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[2.0], [4.0], [6.0], [8.0]])
# Model: y = wx + b
model = nn.Linear(1, 1)
# Loss function (Mean Squared Error)
criterion = nn.MSELoss()
# Optimizer: Stochastic Gradient Descent
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Training loop
for epoch in range(100):
    y_pred = model(x)          # Forward pass
    loss = criterion(y_pred, y)
    optimizer.zero_grad()      # Reset gradients
    loss.backward()            # Backpropagation
    optimizer.step()           # Update parameters
print("Learned parameters:", list(model.parameters()))
Key points:
optimizer.zero_grad() clears old gradients.
loss.backward() computes gradients using backpropagation.
optimizer.step() updates parameters.
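What these three calls accomplish together can be sketched in plain Python, with no framework at all: a full-batch gradient-descent loop fitting y = wx + b to the same data. The learning rate and epoch count here are chosen for this toy illustration, and the gradients of the MSE loss are written out by hand.

```python
# Manual gradient descent on y = w*x + b with MSE loss -- a plain-Python
# sketch of what zero_grad/backward/step do together in the PyTorch loop.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, b, lr = 0.0, 0.0, 0.01

for epoch in range(5000):
    # "zero_grad": start each step with fresh gradient accumulators
    grad_w, grad_b = 0.0, 0.0
    # "backward": accumulate dLoss/dw and dLoss/db over the batch
    for x, y in zip(xs, ys):
        err = (w * x + b) - y          # prediction error
        grad_w += 2 * err * x / len(xs)
        grad_b += 2 * err / len(xs)
    # "step": move each parameter against its gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")  # approaches w = 2, b = 0
```

Frameworks automate exactly this pattern: autograd computes the gradients, and the optimizer object applies the update rule.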
import tensorflow as tf
# Sample data
x = tf.constant([[1.0], [2.0], [3.0], [4.0]])
y = tf.constant([[2.0], [4.0], [6.0], [8.0]])
# Model: y = wx + b
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
# Compile with optimizer and loss
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss='mse')
# Train model
model.fit(x, y, epochs=100, verbose=0)
# Show learned weights
print("Learned parameters:", model.layers[0].get_weights())
Key points:
Optimizers such as SGD, Adam, and RMSprop can be swapped easily.
PyTorch and TensorFlow make optimization practical by handling gradient computation, parameter updates, and efficient GPU usage. Choosing the right optimizer and tuning the learning rate can dramatically improve training performance.
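To give a feel for why swapping optimizers changes training behavior, here is a plain-Python sketch of Adam's update rule on the same y = wx + b data. This is a simplified illustration, not the frameworks' actual implementation; the hyperparameters are Adam's commonly used defaults apart from the learning rate, which is chosen for this toy problem.

```python
# A plain-Python sketch of the Adam update rule: like SGD, but each
# parameter's step is rescaled by running estimates of the gradient's
# mean (m) and uncentered variance (v).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, b = 0.0, 0.0
lr, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8
m = [0.0, 0.0]  # first-moment (mean) estimates for (w, b)
v = [0.0, 0.0]  # second-moment (variance) estimates for (w, b)

for t in range(1, 2001):
    # Full-batch gradients of the MSE loss w.r.t. w and b
    gw = sum(2 * ((w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    gb = sum(2 * ((w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    new_params = []
    for i, (g, p) in enumerate(zip((gw, gb), (w, b))):
        m[i] = beta1 * m[i] + (1 - beta1) * g      # momentum-like average
        v[i] = beta2 * v[i] + (1 - beta2) * g * g  # per-parameter scale
        m_hat = m[i] / (1 - beta1 ** t)            # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (v_hat ** 0.5 + eps))
    w, b = new_params

loss = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.4f}")
```

The per-parameter rescaling is why Adam is often less sensitive to the raw learning rate than plain SGD; in PyTorch the swap is just optim.Adam(model.parameters(), lr=0.01) in place of optim.SGD, and in Keras it is the optimizer argument to compile.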