DEV Community: HoangNg

Some key concepts when working with AWS VPC (Virtual Private Cloud)

HoangNg — Wed, 19 Feb 2025 15:07:50 +0000

1. VPC (Virtual Private Cloud)
A VPC is like a company building. Inside this building, you will have different sections (subnets) where company workers work (e.g., EC2 instances, Lambda functions, etc.).

A VPC is critically important because it provides security and keeps all your resources organized in one isolated place.

Example code to create a VPC using Terraform

resource "aws_vpc" "example_vpc" {
  cidr_block = "10.0.0.0/16"  # The address range for your 'company' (65,536 IPs)

  tags = {
    Name = "example-vpc"  # Name of the company building,
    # add more tags if needed
  }
}

2. Subnets
A subnet is like a specific section inside the company building. There are public and private subnets inside a VPC.

Public Subnet: A subnet is public if it has a route to an Internet Gateway (IGW). This means resources inside the subnet that have a public IP address can be reached from the internet, provided that security groups and network ACLs allow it.

Private Subnet: A subnet is private if it does not have a direct route to an IGW. Resources inside this subnet cannot be accessed from the internet directly.

Example code to create Public and Private Subnets

resource "aws_subnet" "public_subnet" {
  vpc_id                  = aws_vpc.example_vpc.id
  cidr_block              = "10.0.1.0/24"  # This is a smaller section in the building (256 rooms, 5 of which AWS reserves)
  map_public_ip_on_launch = true  # instances launched here get a public IP automatically

  tags = {
    Name = "public-subnet"  # Name of the public section
    # add more tags if needed
  }
}

resource "aws_subnet" "private_subnet" {
  vpc_id     = aws_vpc.example_vpc.id
  vpc_id                  = aws_vpc.example_vpc.id
  cidr_block              = "10.0.2.0/24"  # Another smaller section in the building (256 rooms, 5 of which AWS reserves)
  map_public_ip_on_launch = false  # instances launched here get no public IP

  tags = {
    Name = "private-subnet"  # Name of the private section
    # add more tags if needed
  }
}
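The address arithmetic in the comments above can be checked with a short Python sketch. It uses only the standard `ipaddress` module and the CIDR blocks from the Terraform examples. The constant `AWS_RESERVED_PER_SUBNET` and the helper `usable_addresses` are names introduced here for illustration.

```python
import ipaddress

# CIDR blocks taken from the Terraform examples above.
vpc = ipaddress.ip_network("10.0.0.0/16")
public_subnet = ipaddress.ip_network("10.0.1.0/24")
private_subnet = ipaddress.ip_network("10.0.2.0/24")

# AWS reserves 5 addresses in every subnet (the first four and the last one).
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(subnet):
    """Addresses left for resources after the AWS reservations."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


print(vpc.num_addresses)                        # 65536
print(public_subnet.subnet_of(vpc))             # True
print(private_subnet.subnet_of(vpc))            # True
print(public_subnet.overlaps(private_subnet))   # False
print(usable_addresses(public_subnet))          # 251
```

Both subnets fit inside the VPC range and do not overlap each other, which is what AWS requires of subnets in the same VPC.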

3. Internet Gateway (IGW)
An Internet Gateway is like the main door to your company building. It allows certain sections to access the outside world (the internet).

Example code to create an Internet Gateway

resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.example_vpc.id  # Attach the door to the company building

  tags = {
    Name = "main-igw"  # Name of the main door
    # more tags if needed
  }
}

4. Route Tables
A Route Table defines how traffic is routed in a subnet. It is like the internal map of the company building. It tells people how to get from one section to another (or outside the building).

If the map doesn’t exist, employees won’t know how to get in and out of the building.

Example code to create Route Tables

# Public Route Table
resource "aws_route_table" "public_rt" {
  vpc_id = aws_vpc.example_vpc.id

  route {
    cidr_block = "0.0.0.0/0"  # All destinations outside the VPC (the internet); only use this in route tables meant for public subnets
    gateway_id = aws_internet_gateway.gw.id  # Use the main door (IGW)
  }

  tags = {
    Name = "public-route-table"  # Name of the map for public section
  }
}

# Associating Route Table with Public Subnet
resource "aws_route_table_association" "public_assoc" {
  subnet_id      = aws_subnet.public_subnet.id
  route_table_id = aws_route_table.public_rt.id
}
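To make the "internal map" idea concrete, here is a minimal Python sketch of how a route table picks a target: the most specific matching route (longest prefix) wins. The `ROUTES` list and the `pick_target` function are hypothetical names used only for this illustration. The `local` entry models the route for traffic inside the VPC that AWS adds to every route table.

```python
import ipaddress

# Hypothetical in-memory model of the public route table above.
# "local" stands for the built-in route for traffic that stays inside the VPC.
ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "igw"),
]


def pick_target(destination, routes=ROUTES):
    """Return the target of the most specific route matching the destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the traffic has nowhere to go
    # The route with the longest prefix is the most specific one.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(pick_target("10.0.2.15"))      # local
print(pick_target("93.184.216.34"))  # igw
```

An address inside the VPC matches both routes, but the /16 route is more specific than /0, so the traffic stays local.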

5. NAT Gateway
A NAT Gateway allows resources in private subnets to access the internet while remaining isolated from incoming traffic. It is like a secret exit that allows people in the private section to go out to the internet, but not receive visitors.

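It helps private sections access the internet securely (for updates or external services), without being exposed to direct external access. For this to work, the NAT Gateway itself must sit in a public subnet, and the route table of the private subnet needs a 0.0.0.0/0 route that points to the NAT Gateway.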

Example code to create a NAT Gateway

# Elastic IP for NAT Gateway
resource "aws_eip" "nat_eip" {}

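# NAT Gateway
resource "aws_nat_gateway" "nat" {
  subnet_id     = aws_subnet.public_subnet.id  # The secret exit is placed in the public section
  allocation_id = aws_eip.nat_eip.id  # The secret exit uses an Elastic IP

  # Create the Internet Gateway first; the NAT Gateway relies on it to reach the internet
  depends_on = [aws_internet_gateway.gw]

  tags = {
    Name = "nat-gateway"  # Name of the secret exit
  }
}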

6. Security Groups (SG)
Security Groups are virtual firewalls for controlling inbound and outbound traffic to your instances. They can be thought of as security guards at the entrance of each section of your building. They decide who gets in and who doesn’t based on the rules (ports, IPs).

Example code to create a Security Group

resource "aws_security_group" "web_sg" {
  vpc_id = aws_vpc.example_vpc.id

  ingress {
    from_port   = 22  # SSH (Security guard allows SSH access)
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]  # Allow SSH from anywhere (restrict this to trusted IP ranges in practice)
  }

  ingress {
    from_port   = 80  # HTTP (Security guard allows web traffic)
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0  # Allow all outgoing traffic
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "web-security-group"
  }
}

Fundamentals of MongoDB (part 2) - optimization

HoangNg — Fri, 05 Jul 2024 15:02:18 +0000

In this article, I present an experiment that runs one query against two different designs in MongoDB, one of which requires far less work than the other. In other words, the query is exactly the same; however, applying an optimization technique in the database makes a significant difference. The system went from scanning 1,000,000 documents to scanning only 2,561 documents to complete the same task. The article also notes that understanding the business use case is key.

To replicate this experiment, you need a MongoDB deployment (a local installation is fine) and some basic knowledge of a client tool such as MongoDB Shell or MongoDB Compass to connect your local machine to it.

Data simulation

First of all, we need some data. Therefore, I have some code to simulate a collection of 1,000,000 customers, as shown below. After running the simulation code, it creates a collection named "customers" containing 1,000,000 documents with the fields of name, age, address, email, phone number, and profession.

// create a function to generate random customers
function generateRandomCustomer() {
  const specificNames = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"];
  const professions = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8"];

  const randomName = specificNames[Math.floor(Math.random() * specificNames.length)];
  const randomAge = Math.floor(Math.random() * 50 + 20);
  const randomAddress = "Address" + Math.floor(Math.random() * 1000);
  const randomEmail = randomName.toLowerCase() + "@email.com";
  const randomPhoneNumber = "123-456-789";
  const randomProfession = professions[Math.floor(Math.random() * professions.length)];

  return {
    name: randomName,
    age: randomAge,
    address: randomAddress,
    email: randomEmail,
    phoneNumber: randomPhoneNumber,
    profession: randomProfession
  };
}

// simulate 1000000 customers
const countDocuments = 1000000;
const randomCustomers = [];

for (let i = 0; i < countDocuments; i++) {
  randomCustomers.push(generateRandomCustomer());
}

// insert the customers to the database using a batch size of 10000
const batchSize = 10000;

for (let i = 0; i < randomCustomers.length; i += batchSize) {
  const batch = randomCustomers.slice(i, i + batchSize);
  db.customers.insertMany(batch);
}

There is one note about the batch size in the simulation code. To insert a large amount of data into the database, we divide the whole dataset into smaller batches for efficiency. I chose a batch size of 10,000 for this experiment.
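The batching pattern itself is independent of MongoDB. As a minimal sketch in Python, the slicing logic looks like this; `iter_batches` and `fake_insert_many` are hypothetical names, and `fake_insert_many` only stands in for a real database call such as `insertMany`.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical stand-in for a database call; it only records the batch sizes.
inserted_batch_sizes = []


def fake_insert_many(batch):
    inserted_batch_sizes.append(len(batch))


customers = [{"id": i} for i in range(25_000)]
for batch in iter_batches(customers, 10_000):
    fake_insert_many(batch)

print(inserted_batch_sizes)  # [10000, 10000, 5000]
```

The last batch is simply shorter when the total is not a multiple of the batch size, which is also how the `slice` call in the simulation code behaves.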

Overview of the data

Use the following command in MongoDB Shell to view an example document.

db.customers.findOne()
{
  _id: ObjectId('6684b910294f269e10fd8498'),
  name: 'A7',
  age: 29,
  address: 'Address438',
  email: 'a7@email.com',
  phoneNumber: '123-456-789',
  profession: 'B3'
}

Check the number of documents

db.customers.countDocuments()
1000000

We are sure that we have a collection of 1,000,000 customers with their information.

Experiment

Imagine that I need to find all customers named "A1" who are 36 years old. In a real-world use case, it could be a task to query all transactions, given the customer's name and the date.

The query is as follows.

db.customers.find({name:"A1", age:36})

Query output

Atlas atlas-4gwxni-shard-0 [primary] justatest> db.customers.find({name:"A1", age:36})
[
  {
    _id: ObjectId('6684b910294f269e10fd84aa'),
    name: 'A1',
    age: 36,
    address: 'Address534',
    email: 'a1@email.com',
    phoneNumber: '123-456-789',
    profession: 'B5'
  },

...

  {
    _id: ObjectId('6684b910294f269e10fd85ff'),
    name: 'A1',
    age: 36,
    address: 'Address802',
    email: 'a1@email.com',
    phoneNumber: '123-456-789',
    profession: 'B3'
  },

In my experiment, it returns 2561 results. It is noted that the simulation is random; therefore, the number of results could be different between different trials.
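A quick back-of-the-envelope check shows why the count lands near 2,500. The sketch below assumes what the simulation code implies: 8 equally likely names and 50 equally likely ages (20 to 69), drawn independently. The variable names are introduced here for illustration.

```python
import math

n_documents = 1_000_000
n_names = 8   # "A1" .. "A8"
n_ages = 50   # ages 20 .. 69 from Math.floor(Math.random() * 50 + 20)

# Probability that one random document has name "A1" and age 36.
p_match = 1 / (n_names * n_ages)

expected_matches = n_documents / (n_names * n_ages)
# Standard deviation of a binomial count with n trials and probability p.
std_dev = math.sqrt(n_documents * p_match * (1 - p_match))

print(expected_matches)   # 2500.0
print(round(std_dev, 1))  # 49.9
```

The observed 2561 is a little more than one standard deviation above the expected 2,500, so it is well within the variation expected from a random simulation.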

Cost examination

Now comes the interesting part: checking how much work the system must do to execute the query. To do that, I use the command "explain()" as shown below.

db.customers.find({name:"A1", age:36}).explain("executionStats");

The output is lengthy. Therefore, I only focus on the important lines.

totalKeysExamined: 0,
totalDocsExamined: 1000000,
    executionStages: {
      stage: 'COLLSCAN',
      filter: {
        '$and': [ { age: { '$eq': 36 } }, { name: { '$eq': 'A1' } } ]
      },
      nReturned: 2561,
      executionTimeMillisEstimate: 455,

The output above shows that the system must go through 1,000,000 documents to complete the work (i.e., returning 2561 results). The reason is that the execution stage used for the query is "COLLSCAN." It simply means that the system must scan the entire collection to complete the work.

Another way to view the cost is to click the "explain" button in MongoDB Compass.

Database optimization

The workload seems to be heavy. Introducing indexes to the database is a commonly used technique to reduce the workload. We can do that using the command "createIndex()".

db.customers.createIndex({name: 1, age: 1}, {name: "IDX_NAME_AGE"});
IDX_NAME_AGE

The command creates a single compound index on the fields name and age (both in ascending order). We can check the indexes as follows.

db.customers.getIndexes();
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { name: 1, age: 1 }, name: 'IDX_NAME_AGE' }
]

Querying after optimization

The next step is to use the "explain" command to check the workload required to complete the same query.

db.customers.find({name:"A1", age:36}).explain("executionStats");

Output

winningPlan: {
      stage: 'FETCH',
      inputStage: {
        stage: 'IXSCAN',

...

    totalKeysExamined: 2561,
    totalDocsExamined: 2561,
    executionStages: {
      ...
      nReturned: 2561,
      executionTimeMillisEstimate: 6,

The output demonstrates that the workload gets significantly reduced. The system went from scanning the entire 1,000,000 documents to only scanning 2561 documents. It might be observed that the execution time drops significantly; however, the execution time can vary a lot depending on the physical machine. Therefore, I only want to draw your attention to the workload.

Important notes

As presented above, using indexes can be very helpful. Behind the scenes, the database builds a separate, smaller data structure that keeps the indexed fields in sorted order, with each entry pointing to its document. As an analogy, imagine a line of 1 million people with different names and ages. The task is to find the people named "A1" who are 36 years old. Without any ordering, the only way to complete the task is to walk along the line and ask everyone, which means asking 1,000,000 times. However, if the people in the line are already sorted by their names and ages, it becomes much easier to find the expected ones.
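The analogy can be sketched in a few lines of Python. This is only a toy model: a sorted list searched with `bisect` stands in for the index, whereas real MongoDB indexes are B-tree structures. The collection size, the seed and all variable names are assumptions made for this illustration.

```python
import bisect
import random

random.seed(0)  # fixed seed so the sketch is repeatable

names = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]
# A small hypothetical collection: (name, age) pairs in insertion order.
documents = [(random.choice(names), random.randrange(20, 70)) for _ in range(100_000)]

# Collection scan: every document is examined.
scan_hits = [pos for pos, doc in enumerate(documents) if doc == ("A1", 36)]

# Toy "index": (name, age, position) entries kept in sorted order.
index = sorted((name, age, pos) for pos, (name, age) in enumerate(documents))

# Binary search finds the first entry for ("A1", 36) and the first entry after it.
low = bisect.bisect_left(index, ("A1", 36, -1))
high = bisect.bisect_left(index, ("A1", 37, -1))
index_hits = [pos for _, _, pos in index[low:high]]

print(index_hits == scan_hits)       # both approaches find the same documents
print(len(documents), high - low)    # documents scanned vs. index entries read
```

The scan touches all 100,000 documents, while the sorted structure only reads the contiguous run of matching entries, which mirrors the drop from totalDocsExamined: 1000000 to 2561 in the experiment.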

Because the index is a separate sorted structure, it physically takes up space in your storage and must be kept up to date whenever documents are written; creating redundant indexes is therefore not good practice. This highlights the importance of understanding the business. In a specific business use case, certain fields might be queried much more often than others. Those frequently queried fields should be prioritized when designing the database configuration, while the rarely used ones can be treated differently.

There is still much more to mention about indexing, partitioning and other techniques and strategies for database optimization. I believe that understanding those things would be significantly beneficial for developers.

Thank you for reading this far!
Have a nice day
Hoang
P/S: I will come back to this topic for another experiment.

Code a Neural Network from scratch to solve the binary MNIST problem

HoangNg — Wed, 15 May 2024 03:31:46 +0000

Introduction

This article walks through the development of a 3-layer Neural Network (NN) from scratch (i.e., only using NumPy) for solving the binary MNIST dataset. This project offers a practical guide to the foundational aspects of deep learning and the architecture of neural networks. It primarily concentrates on building the network from the ground up (i.e., the mathematics running under the hood of NNs). Note that this project is an extension of a project titled "Code a 2-layer Neural Network from Scratch," where I explained in detail the mathematics behind the scenes of a neural network (see this article for more details). In other words, solving the binary MNIST can be considered a use case for the from-scratch neural network.

Load MNIST dataset

Once the helper files are available in AWS SageMaker, we use pre-defined functions to load the MNIST dataset.

from utils_data import *
download_and_save_MNIST(path="data/")

The purpose of this experiment is to handle the binary MNIST only. Therefore, we need a function to load the binary MNIST of 0 and 1 only (i.e., other MNIST digits from 2 to 9 are out of scope in this example).

We applied those functions to load the binary MNIST dataset.

X_train_org, Y_train_org, X_test_org, Y_test_org = load_mnist()
X_train_org, Y_train_org, X_test_org, Y_test_org = load_binary_mnist(X_train_org, Y_train_org, X_test_org, Y_test_org)

Data visualization

visualize_multi_images(X_train_org, Y_train_org, layout=(3, 3), figsize=(10, 10), fontsize=12)

I store data in an AWS S3 bucket for later use if needed.

import boto3

# BUCKET_NAME must be set beforehand to the name of your own S3 bucket
key = "data/mnist.npz"
bucket_url = "s3://{}/{}".format(BUCKET_NAME, key)
boto3.Session().resource("s3").Bucket(BUCKET_NAME).Object(key).upload_file("data/mnist.npz")

Data preparation for model training

We applied a helper function to prepare the binary MNIST for training the Neural Network.

X_train, X_test, Y_train, Y_test = make_inputs(X_train_org, X_test_org, Y_train_org, Y_test_org)

Build a Neural Network for solving binary MNIST

To build a Neural Network, we must define helper functions as the building blocks for constructing the architecture. I will not list those functions here because they would make this writing unnecessarily long. I only present the construction of the Neural Network. For more details regarding helper functions and components of nn_Llayers_binary(), please see utils_binary.py in this repository or refer to this article.

Set up the hyperparameters

layer_dims = [784, 128, 64, 1]
learning_rate = 0.01
number_iterations = 250
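To get a feeling for the size of this network, the number of trainable parameters can be counted from layer_dims. The sketch below assumes fully connected layers with one bias per unit, which is the usual setup but is not shown in this article; `count_parameters` is a hypothetical helper and not part of utils_binary.py.

```python
layer_dims = [784, 128, 64, 1]


def count_parameters(dims):
    """Count the weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(dims[:-1], dims[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


print(count_parameters(layer_dims))  # 108801
```

Almost all of the parameters sit in the first layer (784 x 128 weights), because each 28 x 28 image is flattened into 784 input features.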

Train the Neural Network

from utils_binary import *

parameters, costs, time = nn_Llayers_binary(X_train, Y_train, layer_dims, learning_rate, number_iterations, print_cost=False)

Compute accuracy on train and test datasets

Yhat_train = predict_binary(X_train, Y_train, parameters)
train_accuracy = compute_accuracy(Yhat_train, Y_train)

Yhat_test = predict_binary(X_test, Y_test, parameters)
test_accuracy = compute_accuracy(Yhat_test, Y_test)

print(f"Train accuracy: {train_accuracy} %")
print(f"Test accuracy: {test_accuracy} %")

The accuracy output

Train accuracy: 99.65 %
Test accuracy: 99.81 %

Given that the MNIST dataset is not difficult, using only binary MNIST to distinguish between 0 and 1 makes this task even simpler. Therefore, it is no surprise to see such high accuracy on both the train and test datasets, even though the solution presented in this experiment is not an advanced neural network.

There are some visualizations of the misclassified cases.

Summary

This repository could make a great introductory project for those new to artificial intelligence, machine learning, and deep learning. Experimenting with this simple neural network taught me many basic principles that operate behind the scenes.

Fundamentals of MongoDB (part 1) - architecture

HoangNg — Thu, 11 Apr 2024 00:56:09 +0000

What is MongoDB

MongoDB is a document-oriented NoSQL database that stores data in flexible JSON-like documents. This flexibility, along with its scalability and query capabilities, makes it a popular choice for modern applications dealing with diverse and rapidly changing data.

Relational Database Management System vs MongoDB

If you have already worked with a relational database management system (RDBMS), much of that knowledge transfers to MongoDB. The diagram below presents concepts often used in RDBMS and their counterparts in MongoDB.

Deployment Architecture

There are three options for deploying MongoDB: standalone, replication, and sharding architectures.

1) Standalone architecture

There's only one server (i.e., standalone).

Advantages:
Simplicity, lower resource requirements and faster startup time.

Disadvantages:
No high availability, limited scalability and no fault tolerance. More specifically, if the server hosting the database malfunctions or experiences data corruption, you could lose all your data. There's no built-in mechanism for data redundancy or recovery.

2) Replication architecture

Data is replicated from the primary server across multiple secondary servers. If the primary server in the set fails, another member (secondary) can be automatically elected and promoted to become the primary, minimizing downtime and ensuring data remains accessible.

Advantages:
High availability, improved read scalability (i.e., data can be read from secondary servers, reducing bottleneck issue), disaster recovery (i.e., data can be restored from a surviving secondary server as mentioned above).

Disadvantages:
Complexity and higher hardware costs when compared to the standalone architecture.

3) Sharding architecture

Sharding allows for horizontal scaling of the database by adding more shard servers. This is ideal for handling massive datasets and high write/read throughput that a single server can't manage.

Advantages:
Horizontal scalability, improved performance for specific queries (i.e., only the relevant shard(s) need to be accessed for the query), and flexibility (i.e., independently scale different parts of the database).

Disadvantages:
Increased complexity, potential performance overhead, and the risk of uneven data distribution that leads to bottlenecks.
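The idea that a query only needs to touch the relevant shard can be illustrated with a toy hash-based router in Python. This is a simplified sketch and not how MongoDB routes queries internally; the shard names and the `shard_for` function are hypothetical.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def shard_for(key, shards=SHARDS):
    """Map a shard-key value to one shard using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Writing: each document is placed on exactly one shard.
customer_ids = [f"customer-{i}" for i in range(9)]
placement = {cid: shard_for(cid) for cid in customer_ids}

# Reading by shard key: the same hash points to the same single shard.
print(shard_for("customer-4") == placement["customer-4"])  # True
```

A query that does not include the shard key cannot be routed this way and has to be sent to every shard, which is one source of the performance overhead listed above.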

How I saw MongoDB from my previous developer's view

I admit that, previously, I only focused on making a correct configuration for the data flowing to the database (i.e., fetching the right data for the right collection, field, and document) without understanding the database architecture, as depicted below.

However, I believe it would be very beneficial to have a solid understanding of databases. Then, I could optimize my queries. Therefore, I'm learning more about databases.

What happens when querying

We first need to know that our query operations happen in the database memory, and users (i.e., developers) mostly interact with this memory. If we consider our database as a car, the database memory is the engine. The storage engine that manages the database memory and the way data is laid out in physical storage strongly influence our database's performance. WiredTiger and In-Memory are two commonly used storage engines.

To check the storage engine, type the following command.

db.serverStatus().storageEngine

The following illustration presents a typical architecture of a WiredTiger storage engine. It provides a fundamental understanding of a WiredTiger database architecture.

Thank you for reading this far
Hoang
P/S: In part 2, I will write an example of optimizing a query in MongoDB.

Code a 2-layer Neural Network from Scratch

HoangNg — Tue, 09 Apr 2024 19:05:25 +0000

Introduction

This article walks through the development of a 2-layer neural network (NN) using only NumPy. This project is a practical introduction to the fundamentals of deep learning and neural network architecture. The main focus will be on the step-by-step construction of the network, aiming to provide a clear and straightforward understanding of its underlying mechanics (i.e., the mathematics behind NNs).

Why a 2-layer neural network?

There is no secret behind the selection of 2 layers. In this project, we will experiment with different choices for hyperparameters for the NN; therefore, a 2-layer architecture is simple enough to make the test feasible.

Data simulation

First of all, we simulate some data using datasets from the Scikit-learn package.

import os

from utils_data import *

N = 2000
noise = 0.25
# load the data
X, Y = load_data(N, noise)

# visualize the data
path_to_save_plot = os.path.join("input", "viz")
plot_data(X, Y, path_to_save_plot)

Our dataset consists of two categories, represented by red and blue dots. If you would like to think of a real-world problem, the blue could represent males, and the red could represent females in a sample.

The objective is to develop a model that accurately distinguishes between the red and blue groups. The challenge here is that the data isn’t linearly separable; in other words, it’s likely difficult to draw a straight line that cleanly divides the two groups. This limitation means that linear models (e.g., logistic regression) are unlikely to be effective. This scenario highlights one of the key strengths of neural networks: their ability to handle data effectively that isn’t linearly separable.

A general form of neural network

A neural network comprises layers of interconnected nodes (or “neurons”). These layers include:

Input Layer: This is where the network receives its input data.
Hidden Layers: These layers, which can be one or multiple, perform computations on the input data. Each neuron in these layers applies two mathematical functions to the data: a linear combination of its inputs followed by a nonlinear activation function.
Output Layer: This layer produces the final output of the network, such as a classification (e.g., identifying whether a data point belongs to red or blue groups) or a continuous value (e.g., predicting house prices).

source: Andrew Ng

How a neural network makes predictions

In this 2-layer NN example, data flows from the input layer, is transformed in the hidden layer, and the output layer generates the outcome. Mathematically, the NN generates a probability that determines whether the prediction belongs to group 0 or 1. The computation can be written as follows:
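The original equation image is not reproduced here. Reading it off the forward_propagation code later in this article, the computation can be sketched as follows, where g is the chosen hidden-layer activation and the bracketed superscripts (an assumed notation) denote the layer number:

```latex
\begin{aligned}
Z^{[1]} &= W^{[1]} X + b^{[1]} \\
A^{[1]} &= g\left(Z^{[1]}\right) \\
Z^{[2]} &= W^{[2]} A^{[1]} + b^{[2]} \\
A^{[2]} &= \sigma\left(Z^{[2]}\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
\end{aligned}
```

Each column of A^{[2]} is the predicted probability that the corresponding example belongs to group 1.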

This project will examine three activation function options: sigmoid, tanh and relu.

In Python code:

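# define helper functions in utils_1batch.py
import numpy as np

# ________________ sigmoid function ________________ #
def sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s

# ________________ relu function ________________ #
def relu(x):
    return np.maximum(0, x)

# ________________ relu derivative ________________ #
# assumed definition (not shown in the original post); it is used in backward_propagation
def relu_derivative(a):
    return (a > 0).astype(float)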

For tanh, we use np.tanh() from the NumPy package.

Then, we need a way to initialize the parameters and compute the forward propagation in Python code as follows:

# ________________ initialize parameters ________________ #
def initialize_parameters(n_x, n_h, n_y):
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

    return parameters

# ________________ compute forward propagation ________________ #
def forward_propagation(X, parameters, activation):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    Z1 = np.dot(W1, X) + b1

    # there are 3 options for the function g()
    if activation == "tanh":
        A1 = np.tanh(Z1)
    elif activation == "sigmoid":
        A1 = sigmoid(Z1)
    elif activation == "relu":
        A1 = relu(Z1)
    else:
        raise ValueError(f"Unsupported activation: {activation}")

    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    # store values for the back_propagation usage later
    temp_cache = {
        "Z1": Z1,
        "A1": A1,
        "Z2": Z2,
        "A2": A2,
    }

    return A2, temp_cache

How a neural network learns

Training our neural network involves identifying the optimal parameters (W1, b1, W2, b2) that minimize the discrepancy between prediction and ground truth. The key question is how to quantify this discrepancy or error. To evaluate this error, we use what’s called a cost function J() as follows:
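The original equation image is not reproduced here. Based on the compute_cost code in this section, the cost is the binary cross-entropy averaged over the m examples. In the sketch below, y^{(i)} is the label of example i and a^{[2](i)} is the predicted probability for that example (the notation is an assumption):

```latex
J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{[2](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[2](i)}\right) \right]
```

When the prediction for an example is close to its label, the corresponding term is close to 0; a confident wrong prediction makes the term very large.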

In Python code:

# define helper functions in utils_1batch.py
def compute_cost(A2, Y):
    # get the number of examples
    m = Y.shape[1]

    # compute the loss function
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), (1 - Y))
    # average of the per-example losses = the cost function
    cost = -np.sum(logprobs) / m

    cost = float(np.squeeze(cost))

    return cost

Once the cost can be computed, the goal is to minimize it. In other words, we search for the parameters that minimize the discrepancy between the prediction and the ground truth (i.e., maximize the likelihood). This is where gradient descent comes in. In this project, we will implement a vanilla version of the gradient descent algorithm (i.e., applying gradient descent through the entire batch of data with one fixed learning rate). If you’re curious about different techniques regarding gradient descent, see CS231n.

For gradient descent to work, it needs the gradients (i.e., the vector of derivatives) of the cost with respect to the parameters, as follows:
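The original equation image is not reproduced here. Reading the gradients off the backward_propagation code in this section, and using bracketed superscripts for the layer number (an assumed notation), they can be sketched as follows. The symbol \odot is the element-wise product, g' is the derivative of the hidden-layer activation, and the sums run over the m examples (the columns):

```latex
\begin{aligned}
dZ^{[2]} &= A^{[2]} - Y \\
dW^{[2]} &= \frac{1}{m}\, dZ^{[2]} \left(A^{[1]}\right)^{T} \\
db^{[2]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)} \\
dZ^{[1]} &= \left(W^{[2]}\right)^{T} dZ^{[2]} \odot g'\left(Z^{[1]}\right) \\
dW^{[1]} &= \frac{1}{m}\, dZ^{[1]} X^{T} \\
db^{[1]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}
\end{aligned}
```

For tanh, g'(Z^{[1]}) = 1 - (A^{[1]})^2, and for sigmoid, g'(Z^{[1]}) = A^{[1]}(1 - A^{[1]}), which is why the code can compute both from the cached A1.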

The calculation of these gradients is achieved through the backpropagation algorithm, an efficient method that begins at the output and works its way backwards to determine the gradients. The parameters are updated simultaneously until the minimum cost is determined.

In Python code:


# define helper functions in utils_1batch.py
# ________________ compute back propagation ________________ #
def backward_propagation(parameters, temp_cache, X, Y, activation):
    m = X.shape[1]

    W1 = parameters["W1"]
    W2 = parameters["W2"]

    A1 = temp_cache["A1"]
    A2 = temp_cache["A2"]

    # compute the backward_propagation
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    if activation == "tanh":
        dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))  # derivative of tanh
    elif activation == "sigmoid":
        dZ1 = np.dot(W2.T, dZ2) * (A1 * (1 - A1))  # derivative of sigmoid
    elif activation == "relu":
        dZ1 = np.dot(W2.T, dZ2) * relu_derivative(A1)  # derivative of ReLU

    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    gradients = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

    return gradients

# ________________ update the parameters ________________ #
def update_parameters(parameters, grads, learning_rate):
    # retrieve the parameters from the input
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    # retrieve the gradient from the input
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    # update the parameters with one gradient descent step
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {
        "W1": W1,
        "b1": b1,
        "W2": W2,
        "b2": b2,
    }

    return parameters
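A common way to gain confidence in hand-written backpropagation is a numerical gradient check: compare the analytic gradients with finite-difference estimates of the cost. The sketch below is self-contained and deliberately compact. It re-implements the tanh case with its own function names (`forward`, `cost`, `analytic_grads`, `numeric_grads`), which are assumptions for this illustration and not the functions from utils_1batch.py.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])
    A2 = sigmoid(p["W2"] @ A1 + p["b2"])
    return A1, A2


def cost(X, Y, p):
    _, A2 = forward(X, p)
    return float(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))


def analytic_grads(X, Y, p):
    """Gradients from the backpropagation formulas (tanh hidden layer)."""
    m = X.shape[1]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)
    return {
        "W1": dZ1 @ X.T / m,
        "b1": dZ1.sum(axis=1, keepdims=True) / m,
        "W2": dZ2 @ A1.T / m,
        "b2": dZ2.sum(axis=1, keepdims=True) / m,
    }


def numeric_grads(X, Y, p, eps=1e-6):
    """Central finite differences of the cost, one parameter entry at a time."""
    grads = {}
    for name, value in p.items():
        grad = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + eps
            cost_plus = cost(X, Y, p)
            value[idx] = original - eps
            cost_minus = cost(X, Y, p)
            value[idx] = original  # restore the parameter
            grad[idx] = (cost_plus - cost_minus) / (2 * eps)
        grads[name] = grad
    return grads


rng = np.random.default_rng(0)
X = rng.normal(size=(2, 20))
Y = (rng.random(size=(1, 20)) > 0.5).astype(float)
params = {
    "W1": rng.normal(size=(3, 2)) * 0.5,
    "b1": np.zeros((3, 1)),
    "W2": rng.normal(size=(1, 3)) * 0.5,
    "b2": np.zeros((1, 1)),
}

analytic = analytic_grads(X, Y, params)
numeric = numeric_grads(X, Y, params)
max_gap = max(np.abs(analytic[k] - numeric[k]).max() for k in params)
print(max_gap < 1e-6)
```

If the two sets of gradients disagree by more than a tiny amount, the backpropagation formulas (or their implementation) most likely contain a mistake.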

Bring everything together to make a 2-layer NN as follows:


# define helper functions in utils_1batch.py
def nn_1layer_1batch(
    X, Y, n_h, learning_rate, activation, number_iterations, print_cost=False
):
    # set up
    np.random.seed(0)
    n_x = X.shape[0]
    n_y = Y.shape[0]

    # initialize parameters
    parameters = initialize_parameters(n_x, n_h, n_y)

    # initialize cost array
    costs = np.zeros(number_iterations)

    # Loop through forward and backward propagations
    for i in range(0, number_iterations):
        # apply forward_propagation
        A2, temp_cache = forward_propagation(X, parameters, activation)

        # compute the cost
        cost = compute_cost(A2, Y)

        # save the cost
        costs[i] = cost

        # apply backward_propagation
        grads = backward_propagation(parameters, temp_cache, X, Y, activation)

        # gradient descent parameter updates
        parameters = update_parameters(parameters, grads, learning_rate=learning_rate)

        # print the cost after every 1000 loops
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    return parameters, costs

Test different model configurations

A commonly asked question from beginners when learning NN is, what are the optimal hyperparameters to use? A fairly simple architecture like this 2-layer NN allows experiments for multiple choices of hyperparameters.


from utils_1batch import *

# number of features
n_x = 2
# the number of nodes in the hidden layer
n_hs = np.array([1, 2, 3, 4, 5, 10, 50])
# choice of activation function
activations = np.array(["tanh", "sigmoid", "relu"])
# learning rate
learning_rates = np.array([1.2, 0.6, 0.1, 0.01, 0.001])
# number of iterations
number_iterations = np.array([100, 1000, 10000, 100000])

def run_test_1batch():
    # run the test
    test_nodes_1batch(
        file_name,
        data,
        n_hs,
        number_iterations=number_iterations,
        learning_rates=learning_rates,
        activations=activations,
        batch_type="one batch",
    )

A crucial component is examining how each configuration performs on the train and test datasets. We will apply the functions below to compute accuracy.


# ________________ make predictions using the NN ________________ #
def predict(X, parameters, activation):
    X = X.T
    A2, temp_cache = forward_propagation(X, parameters, activation)
    predictions = (A2 > 0.5).astype(int)

    return predictions

# ________________ compute the accuracy of the NN ________________ #
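def compute_accuracy(Y, Y_hat):
    # count the labels that match (both 1 or both 0), then convert to a percentage
    correct = np.dot(Y, Y_hat.T) + np.dot(1 - Y, 1 - Y_hat.T)
    accuracy = float(np.squeeze(correct)) / float(Y.size) * 100

    accuracy = round(accuracy, 2)

    return accuracy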

Examining results

The visualization reveals a broad spectrum of outcomes when using different configurations for the same neural network architecture to solve the same problem. Some configurations stand out, achieving high accuracy levels above 90%, and within this group, there are exceptional cases where accuracy surpasses 95%. These high-performing configurations are prime candidates for further investigation. The next analytical step would be to select the configurations with an accuracy greater than 95% and conduct a more focused comparison between the training and development datasets.

A common practice in model development is to avoid overfitting, where a model performs very well on the training dataset but fits poorly on the development one. Therefore, we remove the configurations that meet both of the following conditions:

Accuracies of > 95%; and
A difference in accuracy between the training and development datasets of > 1%.

The 1% difference is subjective; however, it is reasonably good in this exercise.

After filtering, we now have several configurations left. The next step is considering how much training time each configuration takes.

From the scatter plot, we can observe a significant variation in the training times for different configurations. Even though the goodness of fit is very similar, some configurations require a lot of training resources, while others can be trained very fast. In solving real-world tasks, the ones requiring fewer resources are likely preferred. I will now keep only the configurations with a training time of < 30 seconds. After this filtering, two candidates are retained.
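The selection steps described above can be summarized in a short Python sketch. The `results` table below is entirely hypothetical (the numbers are made up for illustration and are not the results of this experiment), and `select_candidates` is a name introduced here. The sketch also assumes that "accuracies of > 95%" applies to both the training and the development accuracy.

```python
# Hypothetical results table; the numbers are invented for illustration only.
results = [
    {"config": "a", "train_acc": 96.2, "dev_acc": 95.8, "train_seconds": 12.0},
    {"config": "b", "train_acc": 97.5, "dev_acc": 95.1, "train_seconds": 8.0},
    {"config": "c", "train_acc": 95.9, "dev_acc": 95.6, "train_seconds": 140.0},
    {"config": "d", "train_acc": 91.0, "dev_acc": 90.7, "train_seconds": 3.0},
    {"config": "e", "train_acc": 95.7, "dev_acc": 95.4, "train_seconds": 25.0},
]


def select_candidates(rows, min_acc=95.0, max_gap=1.0, max_seconds=30.0):
    """Keep accurate, non-overfitting, fast-to-train configurations."""
    kept = []
    for row in rows:
        accurate = min(row["train_acc"], row["dev_acc"]) > min_acc
        small_gap = abs(row["train_acc"] - row["dev_acc"]) <= max_gap
        fast = row["train_seconds"] < max_seconds
        if accurate and small_gap and fast:
            kept.append(row)
    return kept


candidates = select_candidates(results)
best = min(candidates, key=lambda row: row["train_seconds"])

print([row["config"] for row in candidates])  # ['a', 'e']
print(best["config"])                         # a
```

Configuration "b" is dropped for its train/dev gap, "c" for its training time, and "d" for its accuracy; the fastest of the remaining candidates is then chosen.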

Looking at the two candidates, I will choose the one with the least training time required. Finally, in this experiment, I will try the chosen configuration on a new simulation dataset to see how it performs on completely new data.

Data simulation


N = 500
noise = 0.25
# load the data
X, Y = load_data(N, noise)
X = X.T
Y = Y.reshape(1, Y.shape[0])

Testing on the newly simulated data


import pickle

with open('../output/data/parameters/parameters_3_tanh_one batch_0.6_10000.pkl', 'rb') as file:
    parameters = pickle.load(file)

Y_hat = predict(X.T, parameters, "tanh")
accuracy = compute_accuracy(Y, Y_hat)
print(f"Accuracy: {accuracy} % ")

Accuracy: 93.6 %

All the steps presented in the "Examining results" section can be found in the file EDA.ipynb.

Source code

To wrap things up, this 2-layer neural network will not solve any real-world tasks, but it is an excellent starting point for anyone diving into artificial intelligence, machine learning, and deep learning. It is often the case that you will employ well-known libraries for solving real-world problems. This experiment could clear the mist about what is happening under the hood. It’s important to recognize that this is just the tip of the iceberg in the universe of deep learning, which includes advanced concepts like minibatch, learning rate decay, and many more.

Thank you for reading this far

Have a great day
Hoang

Python Optimization with NumPy (Vectorization)

HoangNg — Thu, 04 Apr 2024 14:37:48 +0000

Methods

I created different methods to simulate some data and compare these methods regarding their performance when increasing the sample size.

Method 1: Unvectorized method using a Python list;
Method 2: Unvectorized method using a NumPy array;
Method 3: Partially vectorized method (i.e., this method still utilizes a Python list and an explicit loop);
Method 4: Fully vectorized method (i.e., it only uses NumPy arrays and the vectorization provided by NumPy).

See the code below

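def make_dummy_y_unvectorized1(x, vector_w, b, error_term):
    y = []
    m = x.shape[1]
    for i in range(m):
        y_i = 0
        for j in range(len(vector_w)):
            y_i += vector_w[j] * x[j, i]
        y_i = (y_i + b) * np.exp(error_term[i])

        y.append(y_i)
    # convert to a NumPy array only after the loop has finished;
    # converting inside the loop would break the next y.append() call
    y = np.array(y)
    return y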

def make_dummy_y_unvectorized2(x, vector_w, b, error_term):
    m, n = x.shape
    y = np.zeros(n)
    for i in range(n):
        for j in range(m):
            y[i] += vector_w[j] * x[j, i]
    y = (y + b) * np.exp(error_term)
    return y


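def make_dummy_y_vectorized1(x, vector_w, b, error_term):
    y = []
    for i in range(x.shape[1]):
        # the inner loop over features is replaced by np.dot
        y.append((np.dot(vector_w, x[:, i]) + b) * np.exp(error_term[i]))
    # convert to a NumPy array only after the loop has finished
    y = np.array(y)
    return y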


def make_dummy_y_vectorized2(x, vector_w, b, error_term):
    y = (np.dot(vector_w, x) + b) * np.exp(error_term)
    return y

In the comparison chart, method 1 and method 2 show a sharp increase in the time it takes to finish calculations as the amount of data grows, indicating they're not well-suited for large tasks. Method 3 improves on this by handling more data before slowing down. Method 4 - a fully vectorized method - stands out as the clear winner, maintaining fast and consistent performance across the tested data sizes, showcasing its efficiency with heavy workloads.
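As a minimal, self-contained sketch of such a comparison, the code below times an explicit double loop against the fully vectorized expression and checks that both give the same result. The data shapes, the seed and the function names `y_loop` and `y_vectorized` are assumptions for this illustration; the absolute timings depend on the machine, so none are quoted here.

```python
import time

import numpy as np


def y_loop(x, w, b, err):
    """Explicit double loop over samples and features (like method 2)."""
    n_features, n_samples = x.shape
    y = np.zeros(n_samples)
    for i in range(n_samples):
        for j in range(n_features):
            y[i] += w[j] * x[j, i]
    return (y + b) * np.exp(err)


def y_vectorized(x, w, b, err):
    """Single NumPy expression without Python loops (like method 4)."""
    return (np.dot(w, x) + b) * np.exp(err)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 20_000))
w = rng.normal(size=5)
b = 0.5
err = rng.normal(scale=0.1, size=20_000)

start = time.perf_counter()
slow = y_loop(x, w, b, err)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = y_vectorized(x, w, b, err)
vectorized_seconds = time.perf_counter() - start

print(np.allclose(slow, fast))  # both approaches compute the same values
print(f"loop: {loop_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```

Checking that the results agree before comparing speed matters: a faster method is only useful if it computes the same thing.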

Source code

Have a nice day
Hoang