Introduction
After grasping the concepts of linear regression and its optimization technique, gradient descent, in the previous article, here's an opportunity to dive deeper into gradient descent for a more comprehensive understanding.
Types of Gradient Descent
The gradient descent methods can be broadly categorized into three primary types.
Batch Gradient Descent
In this optimization technique, the entire dataset is used to compute the gradient of the cost function. Essentially, it involves evaluating the loss and updating model parameters once per epoch (a complete pass through the dataset). Batch Gradient Descent is computationally efficient for small to medium-sized datasets but can be slow for large datasets.
Stochastic Gradient Descent, SGD
Unlike Batch Gradient Descent, SGD processes one training example at a time to calculate the gradient. This results in frequent updates to the model parameters and introduces more randomness into the optimization process. While it can be faster and can escape local minima, it may exhibit more oscillations in convergence due to the noise from individual data points.
Mini-batch Gradient Descent
Mini-batch Gradient Descent strikes a balance between Batch and Stochastic Gradient Descent. It divides the dataset into smaller subsets called mini-batches. The gradient is calculated and model parameters are updated after processing each mini-batch. This approach combines some benefits of both previous methods: it's computationally efficient and introduces some noise for faster convergence.
Implementation of Gradient Descent
For the scope of this article, our primary focus will be on the batch gradient descent method. Let's consider a simple example to illustrate the gradient descent method. We will use the following cost function:
The partial derivatives of with respect to and are:
Here, we will minimize the cost function using the gradient descent method. We will start with an initial point .
To begin, let's import the required libraries and define the problem we aim to solve.
pip install numpy==1.23.5 matplotlib==3.7.4
import numpy as np
def f(solution: np.ndarray) -> float:
"""The function to minimize.
:param solution: The solution to the function.
:return: The value of the function.
"""
x, y = solution # Unpack the solution
return 3 * x ** 2 - 2 * x * y + 3 * y ** 2 + 5 * x - 5 * y
def df(solution: np.ndarray) -> np.ndarray:
"""The derivative of the function.
:param solution: The solution to the function.
:return: The gradient of the function.
"""
x, y = solution # Unpack the solution
return np.array([6 * x - 2 * y + 5, -2 * x + 6 * y - 5])
Subsequently, we will proceed to implement the gradient descent method without relying on existing libraries.
from typing import Callable
class GradientDescent:
"""Gradient Descent Method."""
def __init__(self, f: Callable, df: Callable, alpha: float = 0.01, eps: float = 1e-6) -> None:
"""Initialize the gradient descent method.
:param f: The function to minimize.
:param df: The derivative of the function.
:param alpha: The learning rate.
:param eps: The convergence criterion.
"""
self.f = f
self.df = df
self.alpha = alpha
self.eps = eps
self.solutions = [] # Store the solutions (parameters) at each iteration
self.answers = [] # Store the value of the function at each iteration
self.gradients = [] # Store the gradient of the function at each iteration
def solve(self) -> None:
"""Solve the optimization problem."""
self.solutions = [] # Empty the solutions
self.answers = [] # Empty the value of the function
self.gradients = [] # Empty the gradient of the function
solution = np.array([1.0, 1.0]) # Initial solution
answer = self.f(solution) # Value of the function at the initial solution
grad = self.df(solution) # Gradient of the function at the initial solution
self.solutions.append(solution)
self.answers = [answer]
self.gradients.append(grad)
# Iterate until the gradient is close to zero
while (grad ** 2).sum() > self.eps ** 2:
solution = solution - self.alpha * grad # Update the solution
answer = self.f(solution) # Value of the function at the updated solution
grad = self.df(solution) # Gradient of the function at the updated solution
self.solutions.append(solution)
self.answers.append(answer)
self.gradients.append(grad)
self.solutions = np.array(self.solutions)
self.answers = np.array(self.answers)
self.gradients = np.array(self.gradients)
With the implementation described above, we can address the optimization problem as follows while also visualizing the optimization process.
problem = GradientDescent(f, df)
problem.solve()
import matplotlib.pyplot as plt
plt.scatter(problem.solutions[0, 0], problem.solutions[0, 1], color="k", marker="o", label="Initial Solution")
plt.plot(problem.solutions[:, 0], problem.solutions[:, 1], color="k", linewidth=1.5)
xs = np.linspace(-2.5, 1.5, 100)
ys = np.linspace(-1.5, 2.5, 100)
xmesh, ymesh = np.meshgrid(xs, ys)
z = np.concatenate([xmesh.reshape(1, -1), ymesh.reshape(1, -1)], axis=0)
levels = [-3, -2.8, -2.6, -2.4, -2.2, -2, -1, 0, 1, 2, 3, 4]
plt.contour(xs, ys, f(z).reshape(xmesh.shape), levels=levels, colors="k", linestyles="dotted")
plt.show()
The problem.solutions attribute will contain the solutions at each iteration, while the problem.answers attribute will contain the value of the function at each iteration. We can visualize the convergence of the gradient descent method using the following code.
fig = plt.figure(figsize=(15, 5))
ax = fig.add_subplot(2, 2, 1)
ax.set_title("Gradient (x)")
ax.set_ylabel("Gradient")
ax.set_xlabel("Iteration")
ax.plot(np.arange(len(problem.gradients)), problem.gradients[:, 0], color="b")
ax = fig.add_subplot(2, 2, 2)
ax.set_title("Gradient (y)")
ax.set_ylabel("Gradient")
ax.set_xlabel("Iteration")
ax.plot(np.arange(len(problem.gradients)), problem.gradients[:, 1], color="r")
ax = fig.add_subplot(2, 2, 3)
ax.set_title("Answer")
ax.set_ylabel("Value")
ax.set_xlabel("Iteration")
ax.plot(np.arange(len(problem.answers)), problem.answers, color="r")
plt.show()
Top comments (0)