Mohammad Ezzeddin Pratama

Posted on Jan 4

The Chain Rule in Calculus and Its Relevance to Machine Learning

#machinelearning #backpropagation #chainrule #neuralnetwork

What the Chain Rule Is in Calculus

The chain rule is one of the basic concepts of differential calculus. It lets us find the derivative of a function that is composed of other functions, such as f(g(x)).

Mathematically, if y = f(u) and u = g(x), then dy/dx = dy/du * du/dx.

In short, we differentiate the outer function with respect to its inner argument, then multiply by the derivative of the inner function with respect to the original variable.
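As a quick sanity check, the rule can be verified numerically. The sketch below uses an arbitrary example that is not part of the discussion above, y = sin(u) with u = x^2, and compares the chain-rule derivative against a central finite difference. The function names are made up for illustration.

```python
import math

def g(x):
    # Inner function u = g(x) = x^2
    return x ** 2

def f(u):
    # Outer function y = f(u) = sin(u)
    return math.sin(u)

def chain_rule_derivative(x):
    # dy/dx = dy/du * du/dx = cos(x^2) * 2x
    dy_du = math.cos(g(x))
    du_dx = 2 * x
    return dy_du * du_dx

def numerical_derivative(x, h=1e-6):
    # Central finite difference of the composed function f(g(x))
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

x0 = 1.5
analytic = chain_rule_derivative(x0)
numeric = numerical_derivative(x0)
print(f"chain rule: {analytic:.6f}, finite difference: {numeric:.6f}")
```

The two numbers should agree to several decimal places, which is the same kind of check that can later be applied to a whole network.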

Relevance in Machine Learning: Backpropagation

In machine learning, especially when training a neural network, the chain rule is the key ingredient of the backpropagation algorithm. Backpropagation is the procedure for computing the gradient (derivative) of the loss function with respect to every weight and bias in the network.

A neural network can be pictured as a chain of linked functions. Each layer transforms its input: for example, it multiplies by a weight matrix, adds a bias, then applies an activation function. The output of one layer becomes the input of the next, so the final output is the composition of all those functions.
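To make the "composition of functions" picture concrete, here is a minimal sketch in which each layer is literally a Python function and the network is their composition. The helper `layer` and the hand-picked parameter values are assumptions for illustration only.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def identity(z):
    return z

def layer(W, b, activation):
    # Build one layer as a function: activation(x @ W + b)
    def apply(x):
        return activation(x @ W + b)
    return apply

# Hypothetical fixed parameters, chosen by hand so the result is easy to follow
hidden = layer(np.array([[1.0, -1.0]]), np.array([[0.0, 0.5]]), relu)
output = layer(np.array([[2.0], [3.0]]), np.array([[1.0]]), identity)

def network(x):
    # The network output is the composition output(hidden(x))
    return output(hidden(x))

x = np.array([[2.0]])
print(network(x))
```

For x = 2 the hidden layer gives ReLU([2, -1.5]) = [2, 0], and the output layer gives 2*2 + 3*0 + 1 = 5.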

Why the Chain Rule Matters for Computing Gradients in a Neural Network

When training a neural network, the goal is to minimize the loss function. We use an optimizer such as gradient descent, which needs these gradients to know how much the loss changes when a weight or bias is nudged slightly.

Computing Gradients Layer by Layer: The chain rule lets us compute gradients efficiently from the output layer back toward the input. This is called the backward pass, or backpropagation. For every weight and bias in a given layer, we work out its effect on the final loss.
Computational Efficiency: Without the chain rule, we would have to work out the derivative for each parameter separately and from scratch, which would be extremely slow for large networks with millions of parameters. The chain rule lets us compute partial derivatives step by step going backward, recursively reusing the results from the layer that follows.
A Simple Example: Imagine a simple network with two hidden layers. To get the gradient for a weight in the first layer, we 'chain' the derivative of the loss with respect to the final output, times the derivative of the final output with respect to the second hidden layer's output, times the derivative of the second hidden layer's output with respect to the first hidden layer's output, times the derivative of the first hidden layer's output with respect to that weight.
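Written out, the chain for the two-hidden-layer example looks as follows. The symbols are assumptions introduced here for illustration: h_1 and h_2 are the outputs of the first and second hidden layers, \hat{y} is the final output, L is the loss, and W_1 is a weight in the first layer.

```latex
\frac{\partial L}{\partial W_1}
  = \frac{\partial L}{\partial \hat{y}}
    \cdot \frac{\partial \hat{y}}{\partial h_2}
    \cdot \frac{\partial h_2}{\partial h_1}
    \cdot \frac{\partial h_1}{\partial W_1}
```

The last two factors together are the derivative of the second hidden layer's output with respect to the first-layer weight. The first two factors are shared with the gradients of the later layers, which is why working backward lets them be computed once and reused.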

So the chain rule is the mathematical core that lets a neural network learn from data, by iteratively adjusting its parameters according to their contribution to the prediction error.

Create a Synthetic Dataset for Regression

Subtask:

Generate a simple synthetic dataset suitable for a regression task. This dataset will serve as our 'real-world use case': for example, predicting a value from a few input features.

Reasoning:
This subtask asks for a synthetic regression dataset and a visualization of it. That covers importing the required libraries, generating X and y, and plotting with suitable labels and a title. I combine all of these steps into a single code block.

import numpy as np
import matplotlib.pyplot as plt

# 1. Generate an array X of 100 evenly spaced values between 0 and 1
X = np.linspace(0, 1, 100)

# 2. Create the corresponding target values y by applying a simple non-linear function to X
# and then add some random noise
y = 2 * X**2 + 3 * X + 5 + np.random.randn(100) * 0.5

# 3. Plot the generated X and y data using a scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(X, y, label='Synthetic Data', alpha=0.7)

# 4. Label the x-axis as 'X' and the y-axis as 'y', and add a title
plt.xlabel('X')
plt.ylabel('y')
plt.title('Synthetic Regression Dataset')
plt.legend()
plt.grid(True)
plt.show()

print("Synthetic dataset generated and plotted.")
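Because the code above draws its noise without a seed, the dataset (and every loss value computed from it) changes from run to run. If reproducibility matters, one option is a seeded generator. This is a sketch only: the function name `make_dataset` and its parameters are hypothetical, and it uses NumPy's `default_rng` rather than the `np.random.randn` call used above.

```python
import numpy as np

def make_dataset(n_samples=100, noise_std=0.5, seed=0):
    # Same quadratic target as above, but the noise comes from a seeded generator
    rng = np.random.default_rng(seed)
    X = np.linspace(0, 1, n_samples)
    noise = rng.normal(0.0, noise_std, size=n_samples)
    y = 2 * X**2 + 3 * X + 5 + noise
    return X, y

X_a, y_a = make_dataset(seed=0)
X_b, y_b = make_dataset(seed=0)
print(np.array_equal(y_a, y_b))  # compares two runs that used the same seed
```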

Define a Simple Neural Network Architecture

Subtask:

Define the structure of a basic feedforward neural network (for example, input, hidden, output) using NumPy. This involves initializing the weights and biases.

Reasoning:
This subtask asks for a simple neural network architecture with initialized weights and biases. I use NumPy to set the number of units in the input, hidden, and output layers, then initialize the weight matrices with small random values from np.random.randn() and the bias vectors with zeros.

import numpy as np

# 1. Define the number of units for each layer
input_size = 1   # X is a 1D feature
hidden_size = 20  # Increased from 5 to 20 to improve model capacity
output_size = 1  # y is a single target value

# 2. Initialize weights and biases randomly
# Weights for input to hidden layer
W1 = np.random.randn(input_size, hidden_size) * 0.01 # Small random values to prevent large gradients
b1 = np.zeros((1, hidden_size)) # Initialize biases to zeros

# Weights for hidden to output layer
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))

print("Neural Network Architecture Defined:")
print(f"Input Layer Size: {input_size}")
print(f"Hidden Layer Size: {hidden_size}")
print(f"Output Layer Size: {output_size}\n")

print("Initial Weights and Biases:")
print(f"W1 shape: {W1.shape}, b1 shape: {b1.shape}")
print(f"W2 shape: {W2.shape}, b2 shape: {b2.shape}")
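A quick way to double-check an architecture like this is to count its trainable parameters. With the sizes above, that is 1*20 weights plus 20 biases in the first layer and 20*1 weights plus 1 bias in the second, 61 in total. The sketch below repeats the initialization so that it runs on its own.

```python
import numpy as np

input_size, hidden_size, output_size = 1, 20, 1

# Same shapes as the initialization above
W1 = np.random.randn(input_size, hidden_size) * 0.01
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))

# Each array contributes as many parameters as it has entries
n_params = sum(p.size for p in (W1, b1, W2, b2))
print(f"Trainable parameters: {n_params}")
```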

Implementing Forward Propagation

Subtask:

Implement the forward pass through the neural network, including the activation functions (for example, ReLU or Sigmoid for the hidden layer, linear for the output).

Reasoning:
This subtask asks for forward propagation, including the definition of the activation functions (ReLU and linear) and the computation of the intermediate values. I write one code block that defines those functions and runs the forward pass.

import numpy as np

# 1. Define the ReLU activation function
def relu(x):
    return np.maximum(0, x)

# 2. Define the linear activation function (for output layer in regression)
def linear(x):
    return x

# Reshape X for matrix multiplication if necessary.
# X is currently (100,) but needs to be (100, 1) to match W1's input_size.
X_reshaped = X.reshape(-1, 1)

# 3. Implement the forward propagation step

def forward_propagation(X_input, W1, b1, W2, b2):
    # a. Calculate weighted input for the hidden layer (Z1)
    Z1 = np.dot(X_input, W1) + b1

    # b. Apply ReLU activation to Z1 to get hidden layer output (A1)
    A1 = relu(Z1)

    # c. Calculate weighted input for the output layer (Z2)
    Z2 = np.dot(A1, W2) + b2

    # d. Apply linear activation to Z2 to get final output (y_pred)
    y_pred = linear(Z2)

    # Store intermediate values (known as 'cache' in some contexts) for backpropagation
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "y_pred": y_pred}

    return y_pred, cache

# Perform forward propagation with the generated X and initialized weights/biases
y_pred, cache = forward_propagation(X_reshaped, W1, b1, W2, b2)

print("Forward propagation executed.")
print(f"Shape of Z1: {cache['Z1'].shape}")
print(f"Shape of A1: {cache['A1'].shape}")
print(f"Shape of Z2: {cache['Z2'].shape}")
print(f"Shape of y_pred: {y_pred.shape}")
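It can help to run the same forward pass on a network small enough to check by hand. The sketch below uses three samples and two hidden units with hand-picked, hypothetical parameter values. It also shows that the (1, H) bias row is broadcast across all N rows of the batch.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    Z1 = X @ W1 + b1          # (N, 1) @ (1, H) + (1, H): b1 is broadcast over the N rows
    A1 = np.maximum(0, Z1)    # ReLU on the hidden layer
    Z2 = A1 @ W2 + b2         # (N, H) @ (H, 1) + (1, 1), linear output
    return Z2, {"Z1": Z1, "A1": A1}

# Hypothetical values chosen so every step is easy to verify by hand
X = np.array([[0.0], [1.0], [2.0]])
W1 = np.array([[1.0, -1.0]])
b1 = np.array([[0.0, 1.0]])
W2 = np.array([[1.0], [2.0]])
b2 = np.array([[0.5]])

y_pred, cache = forward(X, W1, b1, W2, b2)
print(cache["Z1"])
print(cache["A1"])
print(y_pred)
```

For the last sample, Z1 = [2, -1], ReLU turns that into [2, 0], and the output is 1*2 + 2*0 + 0.5 = 2.5.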

Implementing the Loss Function and Its Derivative

Subtask:

Define a loss function, such as Mean Squared Error (MSE) for regression, and compute the derivative of the loss with respect to the network's output.

Reasoning:
This subtask asks for the Mean Squared Error (MSE) loss function and its derivative. MSE takes the difference between prediction and target, squares it, and averages over the N samples: MSE = (1/N) * sum((y_pred - y_true)^2). Its derivative with respect to the prediction is (2/N) * (y_pred - y_true). The derivative matters for backpropagation: it shows how much the loss changes with respect to the network's output, which is what lets us adjust the weights. I implement both with NumPy from these formulas.

import numpy as np

# 1. Define the Mean Squared Error (MSE) loss function
def mse_loss(y_true, y_pred):
    # Ensure y_true and y_pred are 2D arrays for consistent operations if they are 1D
    if y_true.ndim == 1: y_true = y_true.reshape(-1, 1)
    if y_pred.ndim == 1: y_pred = y_pred.reshape(-1, 1)

    N = y_true.shape[0]  # Number of samples
    loss = (1/N) * np.sum(np.square(y_pred - y_true))
    return loss

# 2. Define the derivative of MSE loss with respect to y_pred
def mse_loss_derivative(y_true, y_pred):
    # Ensure y_true and y_pred are 2D arrays for consistent operations if they are 1D
    if y_true.ndim == 1: y_true = y_true.reshape(-1, 1)
    if y_pred.ndim == 1: y_pred = y_pred.reshape(-1, 1)

    N = y_true.shape[0]  # Number of samples
    derivative = (2/N) * (y_pred - y_true)
    return derivative

# Let's test the functions with our generated data and predicted values
# Ensure y (true values) is also reshaped to (N, 1) for consistency
y_true_reshaped = y.reshape(-1, 1)

# Calculate initial loss
initial_loss = mse_loss(y_true_reshaped, y_pred)
print(f"Initial MSE Loss: {initial_loss:.4f}")

# Calculate derivative of loss w.r.t. y_pred
dL_dy_pred = mse_loss_derivative(y_true_reshaped, y_pred)
print(f"Shape of dL/dy_pred: {dL_dy_pred.shape}")
print(f"First 5 elements of dL/dy_pred:\n{dL_dy_pred[:5]}")
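One way to gain confidence in a hand-written derivative is to compare it against finite differences. The sketch below restates the two MSE functions in compact form so that it runs on its own, then perturbs one prediction at a time. The random test values are arbitrary.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Mean over the N samples of the squared error
    return np.mean((y_pred - y_true) ** 2)

def mse_loss_derivative(y_true, y_pred):
    # dL/dy_pred = (2/N) * (y_pred - y_true)
    return (2 / y_true.shape[0]) * (y_pred - y_true)

rng = np.random.default_rng(0)
y_true = rng.normal(size=(5, 1))
y_pred = rng.normal(size=(5, 1))

analytic = mse_loss_derivative(y_true, y_pred)

# Perturb one prediction at a time and measure the change in the loss
h = 1e-6
numeric = np.zeros_like(y_pred)
for i in range(y_pred.shape[0]):
    plus, minus = y_pred.copy(), y_pred.copy()
    plus[i, 0] += h
    minus[i, 0] -= h
    numeric[i, 0] = (mse_loss(y_true, plus) - mse_loss(y_true, minus)) / (2 * h)

max_diff = np.max(np.abs(analytic - numeric))
print(f"max difference: {max_diff:.2e}")
```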

Implementing Backward Propagation (Applying the Chain Rule)

Subtask:

This is the core part, where the chain rule is applied. Compute the gradients of the weights and biases of each layer using the chain rule, starting from the loss and moving backward through the network.

Reasoning:
This subtask asks for backward propagation. That means defining the ReLU derivative and writing a backward_propagation function that computes the weight and bias gradients with the chain rule. I then test the function and print the gradient shapes.

import numpy as np

# 1. Define the derivative of the ReLU activation function
def relu_derivative(x):
    return (x > 0).astype(float)

# The derivative of the linear activation function is simply 1.
# No explicit function needed, as it will be a constant multiplication.

# 2. Implement the backward_propagation function
def backward_propagation(dL_dy_pred, cache, X_input, W2):
    # Retrieve intermediate values from cache
    A1 = cache['A1']
    Z1 = cache['Z1']

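    N = X_input.shape[0] # Number of samples
    # Note: dL_dy_pred from mse_loss_derivative already carries the 2/N factor of the MSE mean,
    # so the extra (1/N) applied to dW2, db2, dW1, and db1 below makes them 1/N times the exact
    # MSE gradient. The direction is unchanged; the effect is an N-times-smaller learning rate.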

    # Output layer gradients (dL/dZ2, dW2, db2)
    # 3. Calculate dL_dZ2 (derivative of loss with respect to Z2)
    # dL/dZ2 = dL/dy_pred * dy_pred/dZ2
    # Since y_pred = linear(Z2) = Z2, dy_pred/dZ2 = 1
    dL_dZ2 = dL_dy_pred * 1  # Derivative of linear activation is 1

    # 4. Calculate dW2 (gradient of loss with respect to W2)
    # dW2 = (A1.T) . dL_dZ2
    dW2 = (1/N) * np.dot(A1.T, dL_dZ2)

    # 5. Calculate db2 (gradient of loss with respect to b2)
    # db2 = sum(dL_dZ2) along axis 0
    db2 = (1/N) * np.sum(dL_dZ2, axis=0, keepdims=True)

    # Hidden layer gradients (dL/dA1, dZ1, dW1, db1)
    # 6. Calculate dL_dA1 (derivative of loss with respect to A1)
    # dL/dA1 = dL/dZ2 . W2.T
    dL_dA1 = np.dot(dL_dZ2, W2.T)

    # 7. Calculate dZ1 (derivative of loss with respect to Z1)
    # dZ1 = dL/dA1 * d(ReLU)/dZ1
    dZ1 = dL_dA1 * relu_derivative(Z1)

    # 8. Calculate dW1 (gradient of loss with respect to W1)
    # dW1 = (X_input.T) . dZ1
    dW1 = (1/N) * np.dot(X_input.T, dZ1)

    # 9. Calculate db1 (gradient of loss with respect to b1)
    # db1 = sum(dZ1) along axis 0
    db1 = (1/N) * np.sum(dZ1, axis=0, keepdims=True)

    # 10. Return a dictionary containing the gradients
    gradients = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
    return gradients

# 11. Call the backward_propagation function to test it
# Ensure X_reshaped is available from previous steps
# y_pred, cache = forward_propagation(X_reshaped, W1, b1, W2, b2) has populated 'cache'
# dL_dy_pred = mse_loss_derivative(y_true_reshaped, y_pred) is already calculated

grads = backward_propagation(dL_dy_pred, cache, X_reshaped, W2)

print("Backward propagation executed.")
print(f"Shape of dW1: {grads['dW1'].shape}")
print(f"Shape of db1: {grads['db1'].shape}")
print(f"Shape of dW2: {grads['dW2'].shape}")
print(f"Shape of db2: {grads['db2'].shape}")
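A standard way to test a backward pass is a gradient check: compare the chain-rule gradient against a finite-difference estimate of the loss. The sketch below does this for dW1 on a tiny network of the same shape (ReLU hidden layer, linear output, MSE loss). It is an independent illustration, not the article's function: it applies the 1/N of the mean only once, inside dZ2, so its values are the exact MSE gradient and are N times larger than what backward_propagation above returns. All names and sizes here are assumptions.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    Z1 = X @ W1 + b1
    A1 = np.maximum(0, Z1)
    return A1 @ W2 + b2, Z1, A1

def loss_fn(X, y, W1, b1, W2, b2):
    y_pred, _, _ = forward(X, W1, b1, W2, b2)
    return np.mean((y_pred - y) ** 2)

def backprop(X, y, W1, b1, W2, b2):
    y_pred, Z1, A1 = forward(X, W1, b1, W2, b2)
    dZ2 = (2 / X.shape[0]) * (y_pred - y)   # dL/dZ2; the 1/N of the mean enters once, here
    dW2 = A1.T @ dZ2                        # dL/dW2 = A1^T . dL/dZ2
    db2 = dZ2.sum(axis=0, keepdims=True)
    dZ1 = (dZ2 @ W2.T) * (Z1 > 0)           # back through W2, then through ReLU
    dW1 = X.T @ dZ1                         # dL/dW1 = X^T . dL/dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(8, 1))
y = 2 * X**2 + 3 * X + 5
W1, b1 = rng.normal(size=(1, 4)), rng.normal(size=(1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

dW1, db1, dW2, db2 = backprop(X, y, W1, b1, W2, b2)

# Finite-difference estimate of dL/dW1, one entry at a time
h = 1e-6
dW1_numeric = np.zeros_like(W1)
for j in range(W1.shape[1]):
    Wp, Wm = W1.copy(), W1.copy()
    Wp[0, j] += h
    Wm[0, j] -= h
    dW1_numeric[0, j] = (loss_fn(X, y, Wp, b1, W2, b2) - loss_fn(X, y, Wm, b1, W2, b2)) / (2 * h)

max_diff = np.max(np.abs(dW1 - dW1_numeric))
print(f"max |analytic - numeric| for dW1: {max_diff:.2e}")
```

The same loop can be repeated for b1, W2, and b2. A check like this only breaks down when a pre-activation sits almost exactly at the ReLU kink, which is unlikely with random values.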

Implementing the Weight Update (Gradient Descent)

Subtask:

Define how the weights and biases are updated using the gradients computed by backward propagation and the learning rate.

Reasoning:
This subtask asks for an update_parameters function for gradient descent, applied to the network's weights and biases, with a check that the update took place. I define the function, set the learning rate, record the initial parameter values, call the update function, then print a sample of the updated parameters to show the change.

import numpy as np

# 1. Define the update_parameters function
def update_parameters(W1, b1, W2, b2, grads, learning_rate):
    # Retrieve gradients
    dW1 = grads['dW1']
    db1 = grads['db1']
    dW2 = grads['dW2']
    db2 = grads['db2']

    # 2. Update each weight and bias
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    return W1, b1, W2, b2

# 3. Define an initial learning rate
learning_rate = 0.01

print(f"Learning Rate: {learning_rate}")

# 4. Print a sample of the parameters before the update
print("\n--- Before Update ---")
print(f"W1 (first 3 elements): {W1[0, :3]}")
print(f"b1 (first 3 elements): {b1[0, :3]}")
print(f"W2 (first 3 elements): {W2[:3, 0]}")
print(f"b2: {b2}")

# 5. Call the update_parameters function
W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, grads, learning_rate)

# 6. Print the same sample after the update to verify the change
print("\n--- After Update ---")
print(f"W1 (first 3 elements): {W1[0, :3]}")
print(f"b1 (first 3 elements): {b1[0, :3]}")
print(f"W2 (first 3 elements): {W2[:3, 0]}")
print(f"b2: {b2}")

print("\nParameters updated using gradient descent.")
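To see the update rule in isolation, here is a sketch that applies the same step, parameter minus learning rate times gradient, to a hypothetical one-parameter loss L(w) = (w - 3)^2. The loss, the starting point, and the step count are made up for illustration.

```python
def loss(w):
    # Hypothetical one-parameter loss with its minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # dL/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
history = [loss(w)]
for _ in range(50):
    w = w - learning_rate * grad(w)   # same rule as update_parameters, for a single scalar
    history.append(loss(w))

print(f"w after 50 steps: {w:.4f}, loss: {history[-1]:.2e}")
```

Each step shrinks the distance to the minimum by a factor of 1 - 2 * learning_rate = 0.8, so w approaches 3 and the loss falls at every step.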

Train the Neural Network

Subtask:

Run the training loop: forward pass, loss computation, backward pass, and weight update, repeated for a number of epochs.

Reasoning:
This subtask asks for a training loop for the neural network. That covers forward and backward propagation, computing and storing the loss, and updating the weights and biases over a number of epochs. I define the training parameters, set up the loop, and call the functions from the earlier steps at every training step.

import numpy as np
import matplotlib.pyplot as plt

# 1. Set the number of training epochs and learning rate
epochs = 7000
learning_rate = 0.07 # Overrides the 0.01 used in the single-step update demo above

# 2. Initialize an empty list called `losses`
losses = []

print(f"Starting training for {epochs} epochs with learning rate {learning_rate}...")

# Ensure y_true_reshaped and X_reshaped are available from previous cells
y_true_reshaped = y.reshape(-1, 1)
X_reshaped = X.reshape(-1, 1)

# 3. Start a loop that iterates for the specified number of epochs
for i in range(epochs):
    # a. Perform forward propagation
    y_pred, cache = forward_propagation(X_reshaped, W1, b1, W2, b2)

    # b. Calculate the MSE loss
    current_loss = mse_loss(y_true_reshaped, y_pred)

    # c. Append the calculated loss to the `losses` list
    losses.append(current_loss)

    # d. Calculate the derivative of the loss with respect to `y_pred`
    dL_dy_pred = mse_loss_derivative(y_true_reshaped, y_pred)

    # e. Perform backward propagation
    grads = backward_propagation(dL_dy_pred, cache, X_reshaped, W2)

    # f. Update the weights and biases
    W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, grads, learning_rate)

    # g. Optionally, print the current epoch number and the loss
    if (i + 1) % 500 == 0:
        print(f"Epoch {i+1}/{epochs}, Loss: {current_loss:.4f}")

print("Training finished.")

# Plot the loss over epochs to visualize training progress
plt.figure(figsize=(10, 6))
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('Training Loss over Epochs')
plt.grid(True)
plt.show()
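The loop above always runs for a fixed number of epochs. If you would rather decide from the loss curve when to stop, one possible approach is a small helper that looks at how much the loss improved over a recent window. The function `has_converged`, its `window` and `tol` parameters, and the synthetic loss history below are all hypothetical.

```python
def has_converged(losses, window=100, tol=1e-4):
    # True when the loss improved by less than `tol` over the last `window` epochs
    if len(losses) <= window:
        return False  # not enough history to judge yet
    return losses[-window - 1] - losses[-1] < tol

# Hypothetical loss history: fast decay that flattens out
losses = [1.0 / (1 + epoch) for epoch in range(2000)]

print(has_converged(losses[:50]))    # history shorter than the window
print(has_converged(losses[:200]))   # still improving noticeably
print(has_converged(losses))         # improvement over the last window is tiny
```

Inside a training loop, this could be called after `losses.append(current_loss)` and used to `break` early.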

Visualizing Training Progress and Predictions

Subtask:

Plot the loss over the epochs to see training converge, and compare the actual values with the predicted values on the data to evaluate the model's performance.

Reasoning:
This subtask asks for a visualization of actual versus predicted values using the trained neural network. I run a final forward pass to get the predictions, then draw a scatter plot comparing the original data with the predictions, adding labels, a title, and a legend for clarity.

import matplotlib.pyplot as plt

# 1. Use the trained model (W1, b1, W2, b2) to run forward propagation
# on X_reshaped and obtain the final predictions (final_y_pred).
# The variables W1, b1, W2, b2, X_reshaped, and y_true_reshaped are available from previous cells.
final_y_pred, _ = forward_propagation(X_reshaped, W1, b1, W2, b2)

# 2. Scatter plot of the original data (X_reshaped on the x-axis, y_true_reshaped on the y-axis).
plt.figure(figsize=(10, 7))
plt.scatter(X_reshaped, y_true_reshaped, label='Actual Data', alpha=0.7)

# 3. On the same plot, overlay a scatter plot of the predicted values
# (X_reshaped on the x-axis, final_y_pred on the y-axis).
plt.scatter(X_reshaped, final_y_pred, label='Neural Network Prediction', alpha=0.7)

# 4. Label the x-axis ('X') and the y-axis ('y'),
# and add a descriptive title.
plt.xlabel('X')
plt.ylabel('y')
plt.title('Actual Data vs. Neural Network Predictions')

# 5. Add a legend to distinguish 'Actual Data' from the predictions.
plt.legend()
plt.grid(True)
plt.show()

print("Visualization of actual vs. predicted values completed.")

Quantitative Evaluation (R^2 and MAE)

from sklearn.metrics import r2_score, mean_absolute_error

r2 = r2_score(y_true_reshaped, final_y_pred)
mae = mean_absolute_error(y_true_reshaped, final_y_pred)

print(f"R² Score: {r2:.4f}")
print(f"MAE: {mae:.4f}")
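For readers who want to see what these two metrics compute, here is a NumPy sketch of the usual definitions: R^2 = 1 - SS_res / SS_tot and MAE as the mean absolute error. For a single output with default settings, this is what the scikit-learn functions above compute. The function names and the four-point example are made up for illustration.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)             # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)    # total sum of squares around the mean
    return 1 - ss_res / ss_tot

def mae_manual(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# Hypothetical values, small enough to check by hand
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

r2 = r2_manual(y_true, y_pred)
mae = mae_manual(y_true, y_pred)
print(f"R^2: {r2:.4f}, MAE: {mae:.4f}")
```

Here SS_res = 0.5 and SS_tot = 5, so R^2 = 0.9, and the absolute errors average to 0.25.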

Final Task

Subtask:

Give a summary of the implementation and how it effectively demonstrates the application of the chain rule in a machine learning context.

Summary:

Q&A

How does this implementation effectively demonstrate the application of the chain rule in a machine learning context?

The implementation shows the chain rule at work through the backpropagation algorithm. Every step of backward propagation (computing the gradients for W2, b2, W1, and b1) uses the chain rule directly. It starts from the derivative of the loss with respect to the output (dL/dy_pred), then moves backward through the activation functions (linear and ReLU) and the matrix multiplications to obtain the gradient for each parameter of the earlier layers.

For example, the derivative of the loss with respect to the output layer's weighted input (dL/dZ2) is computed by multiplying dL/dy_pred by the derivative of the linear activation. Then dL/dA1 (the derivative of the loss with respect to the hidden layer's output) is computed as dL/dZ2 times the output layer's weights (W2). The pattern continues, recursively computing the gradient at each layer by 'chaining' the derivatives of the component functions, exactly as the chain rule prescribes. This is what makes adjusting the network's parameters to minimize the loss efficient.
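Collected in one place, the backward pass for this network can be written as the following chain. The notation follows the code: Z_1 = X W_1 + b_1, A_1 = ReLU(Z_1), Z_2 = A_1 W_2 + b_2, and \hat{y} = Z_2. The symbol \odot for the element-wise product and the sums over the sample index n are notational choices made here. The code's additional 1/N scaling of the weight and bias gradients is left out.

```latex
\begin{aligned}
\frac{\partial L}{\partial \hat{y}} &= \frac{2}{N}\,(\hat{y} - y) \\
\frac{\partial L}{\partial Z_2} &= \frac{\partial L}{\partial \hat{y}} \cdot 1 \\
\frac{\partial L}{\partial W_2} &= A_1^{\top}\,\frac{\partial L}{\partial Z_2}, \qquad
\frac{\partial L}{\partial b_2} = \sum_{n} \left(\frac{\partial L}{\partial Z_2}\right)_{n} \\
\frac{\partial L}{\partial A_1} &= \frac{\partial L}{\partial Z_2}\,W_2^{\top} \\
\frac{\partial L}{\partial Z_1} &= \frac{\partial L}{\partial A_1} \odot \mathbf{1}[Z_1 > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\,\frac{\partial L}{\partial Z_1}, \qquad
\frac{\partial L}{\partial b_1} = \sum_{n} \left(\frac{\partial L}{\partial Z_1}\right)_{n}
\end{aligned}
```

Each line reuses the result of the line before it, which is the recursion described above.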

Data Analysis Key Findings

Chain Rule Explained: The explanation clearly defines the chain rule (dy/dx = dy/du · du/dx) and its key role in backpropagation for computing gradients efficiently in a neural network.
Synthetic Dataset Generated: A synthetic regression dataset was created from a non-linear function (y = 2 · X² + 3 · X + 5) with added noise. The dataset has 100 samples with X between 0 and 1.
Neural Network Architecture Defined: A simple feedforward neural network was built with an input layer (1 unit), a hidden layer (20 units, per hidden_size in the code), and an output layer (1 unit). The weights (W1, W2) are initialized to small random values and the biases (b1, b2) to zero.
Forward Propagation Implemented: The forward pass computes the output using ReLU for the hidden layer and a linear activation for the output. The intermediate values (Z1, A1, Z2, y_pred) are stored for backpropagation.
Loss Function and Derivative: The Mean Squared Error (MSE) loss and its derivative are implemented correctly. The MSE was about 0.39 after training.
Backward Propagation (Chain Rule) Applied: The backward propagation function computes the gradients (dW1, db1, dW2, db2) for all weights and biases by applying the chain rule layer by layer.
Weight Update Mechanism: Parameters are updated with gradient descent, showing iterative adjustment based on the computed gradients. The single-step demo uses a learning_rate of 0.01, and the training loop uses 0.07.
Neural Network Trained: The network was trained for 7000 epochs. The MSE loss dropped significantly, from 0.54 at epoch 500 to 0.24 at epoch 7000, indicating successful learning and convergence.
Visualization Confirmed Learning: The final visualization shows the network's predictions close to the actual data, evidence that it learned the underlying non-linear relationship.
