In this lecture, we explore how optimization algorithms are practically implemented in modern deep learning frameworks like PyTorch and TensorFlow. Optimization is at the heart of training machine learning models, especially neural networks. These frameworks provide built-in optimizers that make it easy to experiment with different techniques.
import torch
import torch.nn as nn
import torch.optim as optim
# Sample data
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[2.0], [4.0], [6.0], [8.0]])
# Model: y = wx + b
model = nn.Linear(1, 1)
# Loss function (Mean Squared Error)
criterion = nn.MSELoss()
# Optimizer: Stochastic Gradient Descent
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Training loop
for epoch in range(100):
    y_pred = model(x)          # Forward pass
    loss = criterion(y_pred, y)
    optimizer.zero_grad()      # Reset gradients
    loss.backward()            # Backpropagation
    optimizer.step()           # Update parameters
print("Learned parameters:", list(model.parameters()))
Key points:
optimizer.zero_grad() clears old gradients.
loss.backward() computes gradients using backpropagation.
optimizer.step() updates parameters.
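What these three calls accomplish together can be sketched in plain Python, with no framework at all: a full-batch gradient-descent loop fitting y = wx + b to the same data. The learning rate and epoch count here are chosen for this toy illustration, and the gradients of the MSE loss are written out by hand.

```python
# Manual gradient descent on y = w*x + b with MSE loss -- a plain-Python
# sketch of what zero_grad/backward/step do together in the PyTorch loop.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, b, lr = 0.0, 0.0, 0.01

for epoch in range(5000):
    # "zero_grad": start each step with fresh gradient accumulators
    grad_w, grad_b = 0.0, 0.0
    # "backward": accumulate dLoss/dw and dLoss/db over the batch
    for x, y in zip(xs, ys):
        err = (w * x + b) - y          # prediction error
        grad_w += 2 * err * x / len(xs)
        grad_b += 2 * err / len(xs)
    # "step": move each parameter against its gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")  # approaches w = 2, b = 0
```

Frameworks automate exactly this pattern: autograd computes the gradients, and the optimizer object applies the update rule.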
import tensorflow as tf
# Sample data
x = tf.constant([[1.0], [2.0], [3.0], [4.0]])
y = tf.constant([[2.0], [4.0], [6.0], [8.0]])
# Model: y = wx + b
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
# Compile with optimizer and loss
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss='mse')
# Train model
model.fit(x, y, epochs=100, verbose=0)
# Show learned weights
print("Learned parameters:", model.layers[0].get_weights())
Key points:
Optimizers such as SGD, Adam, and RMSprop can be swapped easily.
PyTorch and TensorFlow make optimization practical by handling gradient computation, parameter updates, and efficient GPU usage. Choosing the right optimizer and tuning the learning rate can dramatically improve training performance.
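To give a feel for why swapping optimizers changes training behavior, here is a plain-Python sketch of Adam's update rule on the same y = wx + b data. This is a simplified illustration, not the frameworks' actual implementation; the hyperparameters are Adam's commonly used defaults apart from the learning rate, which is chosen for this toy problem.

```python
# A plain-Python sketch of the Adam update rule: like SGD, but each
# parameter's step is rescaled by running estimates of the gradient's
# mean (m) and uncentered variance (v).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, b = 0.0, 0.0
lr, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8
m = [0.0, 0.0]  # first-moment (mean) estimates for (w, b)
v = [0.0, 0.0]  # second-moment (variance) estimates for (w, b)

for t in range(1, 2001):
    # Full-batch gradients of the MSE loss w.r.t. w and b
    gw = sum(2 * ((w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    gb = sum(2 * ((w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    new_params = []
    for i, (g, p) in enumerate(zip((gw, gb), (w, b))):
        m[i] = beta1 * m[i] + (1 - beta1) * g      # momentum-like average
        v[i] = beta2 * v[i] + (1 - beta2) * g * g  # per-parameter scale
        m_hat = m[i] / (1 - beta1 ** t)            # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (v_hat ** 0.5 + eps))
    w, b = new_params

loss = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.4f}")
```

The per-parameter rescaling is why Adam is often less sensitive to the raw learning rate than plain SGD; in PyTorch the swap is just optim.Adam(model.parameters(), lr=0.01) in place of optim.SGD, and in Keras it is the optimizer argument to compile.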