DEV Community: Awadelrahman M. A. Ahmed

Chatting with the Cloud: Setting up a ChatGPT Service on AWS

Awadelrahman M. A. Ahmed — Sat, 11 Feb 2023 17:28:10 +0000

Are you interested in setting up a ChatGPT service in the cloud? In this tutorial, I'll walk you through the steps of getting a ChatGPT service up and running on Amazon Web Services (AWS).

Here's what you'll need before you get started:

An AWS account
An OpenAI API key
An SSH client like PuTTY or Terminal (if you're using a Windows machine)

Step 1: Launch an EC2 Instance

The first step is to launch an EC2 instance, which is essentially a virtual machine in the cloud that you can use to run your ChatGPT service. Here's how:

Log in to the AWS Management Console and navigate to the EC2 dashboard.
Click the "Launch Instance" button.
Choose the Amazon Linux 2 AMI.
Select an instance type that meets the requirements for OpenAI's GPT-3 API. OpenAI recommends using an instance with at least 16 GB of RAM and 2 vCPUs, such as the c5.large instance type.
Follow the steps to configure your instance and launch it.

Step 2: Connect to Your Instance

Once your instance is up and running, you'll need to connect to it using an SSH client. Here's how:

Open your SSH client.
Enter the public DNS or IP address of your EC2 instance.
Log in as the ec2-user user.

Step 3: Install Dependencies

Once you're connected to your EC2 instance, you'll need to install Git, Docker, and the dependencies required to run OpenAI's GPT-3 API. Here's the code you'll need to run:

# Install Git
sudo yum install git -y

# Install Docker
sudo amazon-linux-extras install docker
sudo service docker start
sudo usermod -a -G docker ec2-user

# Install Docker Compose
sudo curl -L "https://github.com/docker/compose/releases/download/1.26.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Install pip
sudo yum install python3-pip -y

Step 4: Clone the OpenAI Repository

Now that you've installed the necessary dependencies, you can clone the OpenAI repository and install the dependencies required to run the API. Here's the code you'll need to run:

# Clone the OpenAI repository
git clone https://github.com/openai/api
cd api

# Install the dependencies
pip install -r requirements.txt

Step 5: Start the API

Finally, you can start the API by running the following command:

docker-compose up -d

This will start the API in the background. To test that your API is working, you can make a request to the endpoint http://your-instance-ip:8080/request and provide your OpenAI API key in the Keyheader. You should receive a response from the model!

And that's it! You now have a fully functional ChatGPT service running on AWS. Start building your conversational applications today!

How I use AWS SageMaker DeepAR to Manage my Child Diabetes! Part [1/3]

Awadelrahman M. A. Ahmed — Sat, 12 Mar 2022 08:24:40 +0000

Introducing the Problem

"I Am Not a Medical Professional” Disclaimer !

I clearly state here that I am not a doctor or any kind of medical professional! I am writing these posts only to reflect my experience and share my story of how I used technology at a personal level to tackle one of the tedious and critical tasks we have in our daily life. So, this content does not constitute any professional medical advice, diagnosis, or treatment!

1. Pre-

I split my write-up into 3 parts which I post successively. In this first post, I intend to describe the problem and provide essential domain knowledge to make it smoother to follow up the following posts.

The second post will describe the data, data extraction, preparation and transformation. The last (third) post will explain the modelling part and how I used AWS DeepAR algorithm to tackle the discussed problem.

At this moment, the other two post are not yet published, I will update those lines with the links when published.

2. What am I sharing?

Being a parent of a child with type 1 diabetes (T1D) is a kind of a hassle, even with being fortunate to acquire smart systems like continuous glucose monitoring system (CGM) and insulin pumps. Those are small computerized devices to mimic the way the human pancreas works by delivering small doses of short acting insulin continuously. However, these devices still need a lot of careful tuning and parameterization to deliver the correct amount of insulin at the right time. The parameters and factors' values are basically calculated and set by diabetes doctors along with the discussion with the patient (parent in our case).

Periodically (e.g., every 2 months), the diabetes doctor looks through the blood glucose levels registered by the CGM and the insulin amounts delivered by the insulin pump and tries to adjust particular factors, aiming to keep the blood glucose in range. Normally, the values are set following some rules and learning from experience, patterns and trends.

I believe we are so privileged to have a doctor who regards discussing the technical details. So every check-up appointment, we have a deep discussion akin to brain storming session to decide which parameter to keep and which to adjust.

Frankly to say, those discussions always tickle my sentiment regarding extracting knowledge and insights from data, as what we basically try to do is to find some patterns in the historical data and try to optimize the pump parameters in order to keep the blood glucose within range! It looks like a very legit machine learning problem! That stimulated my current passion regarding AWS and its machine learning algorithms, so I tried to experiment using AWS to enlighten our parameters-setting discussions! Yes, it might not yet be obvious how my problem is an awesome use case of AWS, so let me explain the details of the problem to be more acquainted with technical terms.

3. Acting like a pancreas

It is ironic that we can not actualize either the wise saying of the scientist and philosopher, Benjamin Franklin (1706-1790) "Eat to live, and not live to eat." or the sarcastic quote by anonymous "I don't eat to live, I live to eat"; without possessing a well-functioning pancreas!

The pancreas is an organ that plays a very essential role in converting the food we eat into fuel for the body's cells. A principal function of the pancreas is to regulate blood glucose. The pancreas secretes a hormone called insulin in response to elevated blood glucose levels. The insulin acts as a key 🗝️ to unlock the cell allowing the glucose to enter the cell and being used for energy. I tried to visually simplify the idea in the animation below.

So, without insulin, the glucose remains in the bloodstream, which can lead to dangerous complications like what is called hyperglycemia. If hyperglycemia is not treated, it may lead to even more serious problem, a condition called ketoacidosis (diabetic coma!). Without proper production of insulin, the body can't thrust glucose into cells and the body reacts by breaking down the body fat to use for energy!

A consequence of this temporary remedy of producing energy by breaking down fats is the production of waste products called ketones. It's a sad thing to know that our bodies cannot tolerate large amounts of ketones and tries to get rid of them through the urinary system. But it is even sadder to know that if the body cannot release all the ketones and they build up in the bloodstream, the ketoacidosis will be life-threatening and needs immediate treatment.

The plot below from [1] shows how the insulin production goes hand in hand with the glucose levels increase caused by the meals.

You might get many insights from the above plot, but two essential observations to understand for our machine learning problem which also map sharply to two insulin components used by the insulin pump to imitate the pancreas are:

There are peaks (bursts) of insulin production by the pancreas every time the body consumes food. The insulin pump tries to imitate these peaks by injecting a quick-acting insulin before mealtimes and it is called Bolus.
The plot above never crosses 0 of the Y-axis. In other words, the pancreas, continuously, produces some base amount of insulin around the clock. To mimic this, Basal insulin is continuously injected throughout the day as a longer-acting insulin that helps keep glucose levels steady day and night.

This plot from [2] shows the two insulin types Basal and Bolus working together to mimic the pancreas function.

4. "How much insulin to inject" is a factors' game

Now we know that insulin pumps are smart devices that attached to the patient body and tries to mimic the function of the pancreas by delivering both types of insulin, the Basal and the Bolus.

We tried using different types of insulin pumps and each had its pros and cons for us. Lastly, we ended up using Omnipod system for a couple of years now. Even though Omnipod still lacks the closed-loop control system compared to Medtronic products, i.e., it does not allow the CGM (glucose sensor) to communicate directly to inject the insulin, we still see its tubeless system is a great advantage for our active kid.

4.1 Basal program

The insulin pump PDM allows setting a basal program (i.e., schedule) to deliver a pre-set insulin amount for each pre-set time period. I thought at the beginning that the basal should have only a single value for the whole day/week, but it turned out the need for insulin changes during the day based on the person's activity level and other conditions. We can also have multiple basal programs to meet different activity levels and conditions such as weekends or gym-days. The image below shows how a basal program looks like as a graph and as a list that stored in the Omnipod PDM (personal diabetes manager).

4.2 Bolus settings

When deciding to have a meal, we should also decide how much insulin we take for that a meal. A that moment, we should garb a pen and a piece of paper and calculate a bolus dose that consists of two components:

A component that is proportional to the carbohydrate-content of the meal with a coefficient of proportionality that is called Insulin-to-Carbohydrate factor (ICF). This ICF defines how many carbohydrates can be handled by one unit of insulin.
It is typical that the blood glucose be at an undesired level due to some error propagation. For instance if we incorrectly calculated the carbs for the breakfast, the blood glucose may stay high till the dinner time if we do not correct as rapidly as possible. Hence a second bolus component is calculated when having a meal or whenever required during the day. This correction component is controlled by a Correction Factor (CF) which defines how much one unit of insulin lowers the blood glucose level, and then we can calculate how much insulin we need to reach our an optimal (preferred) target blood glucose level, e.g. 5.0 mmol/L.

So, to calculate a bolus dose we use this simple equation:

As an example, suppose we pre-decided that these factor to be, ICF=20 and CF=10. And suppose the patient would like to drink a cup of milk which contains 10 gm of carbohydrates. At that moment, we measured the blood glucose and it was 7.0 mmol/L; however, we wish if we had it at 5.0 mmol/L. So to calculate the bolus, we plug those values into the above equation and that will lead to a bolus dose of 0.70 units as below:

Now you can imagine doing this calculation every time the patient needs to eat something! Things become more complicated when you know that those ICF and CF usually have different values for different times of the day, e.g. ICF might be 20.0 for the breakfast but 15.0 for the lunch! This why being equipped with an insulin pump that has a bolus calculator within it along with a CGM (continuous glucose monitoring) can save a lot of hassle.

We can compare the situation in which we do not have smart devices and the situation of having them. In the table below, I summarize the steps/activities that we must go through if we "decide" to have a meal. I describe how we perform each step in both situation.

5. Motivation to use machine learning?

Yes, so far that looks satisfactory and smart, and the table above gives a fairly positive picture of how the CGM and the insulin pump smartly lift a huge burden off our shoulders. However, to ensure an effective and tight control over blood sugar, we must optimally design the the basal program and the bolus factors. We have already mentioned that those factors can have a wide range of settings, not only at the factor value dimension but also the time and the activity dimensions. In other words, those factors have different optimal values during the day and based on the activity and many other conditions. I envision that to look like the plot below which this leads to unlimited number of values.

Normally, the optimal values of the factors are set following some rules and learning from experience. As still there is a substantial portion of the setting procedure done empirically, I take this as a strong motivation to leverage machine learning to at least give some insights. My motivation is also supported by the fact that those devices register considerable amount of data to learn from.

In the next post, I try to formulate the problem in the terminology of data science and machine learning context and discuss the data extraction and transformation.

Tutorial: Graph Neural Networks for Social Networks Using PyTorch

Awadelrahman M. A. Ahmed — Wed, 07 Jul 2021 16:41:10 +0000

1. Set your expectations of this tutorial

You can follow this tutorial if you would like to know about Graph Neural Networks (GNNs) through a practical example using PyTorch framework.
I am aiming, at the end of this step-by-step tutorial, that you will be able to:

Gain insights about what graph neural networks (GNNs) are and what type of problems they may solve.
Know how graph datasets, which are expected by GNNs, look like. We will download and explore a social network dataset collected from GitHub.
Construct graphs and visualize them using code.
Construct and train a simple GNN model for node classification task based on convolutional GNN using torch_geometric, the geometric deep learning extension library for PyTorch.

Of course you are so welcome to post comments if you have any question or you feel that I deviated from what I promised you here!

2. GNN $≈\approx$ Graph + Convolution Neural Network!

A BIG caveat here is to emphasize that I do not mean GNNs are just CNNs that operate on graphs, but what I want to say is that I felt comfortable with GNNs when I linked it to my understanding of CNNs and learnt something about graphs. For sure there are many other variations of GNNs but let us stick to this for those 10 minutes of reading. I hope this works for you as well, note that I put the $≈\approx$ sign to avoid causing some people to cringe.

Graph neural networks, as their name tells, are neural networks that work on graphs. And the graph is a data structure that has two main ingredients: nodes (a.k.a. vertices) which are connected by the second ingredient: edges. You can conceptualize the nodes as the graph entities or objects and the edges are any kind of relation that those nodes may have. Also you should know that both nodes and edges can have features (a.k.a. attributes), note the can as having features is not a must.

Mathematicians like to be concise and they represent the graph as the set $G=(V,E)\mathcal{G = (V, E)}$ where $G,V,E\mathcal{G, V, E}$ are the graph, vertices and edges sets respectively. But for our tutorial, let us make some visually appealing illustrations. The figure below represents a graph that consists of 4 nodes, represented by those blue circles with indices (1,2,3 and 4) and connected by directed edges represented by those green arrows. This graph is directed because the edges may represent asymmetric relationship, for instance if the nodes represent social network users and node 1 follows node 2 it does not mean that node 2 follows node 1. The contrary is the undirected graph which has bidirectional edges, then the relation could be friendship for example.

Since you are now reading this post about GNNs, I can confidently assume that you already have knowledge about CNNs. Yes, convolutional neural networks, the deep neural network class, that is most commonly applied to image analysis and computer vision applications.

Most of us, if not all, already know that images consist of pixels and the CNN leverages the convolution operation to calculate latent (hidden) features for different pixels based on their surrounding pixel values. It does this by sliding a kernel (a.k.a. filter) over the input image and calculating the dot product between that small filter and the overlapping image area. This dot product leads to aggregate the neighboring pixel values to one representative scaler. Now let us twist our conceptualization of images a little bit and think of images as a graphs. You can think of the pixels of the image as the nodes of the graph and the edges are present only between the adjacent pixels. You might have your own way to represent the image as a graph, what matters is that you just have the image-to-graph concept in your mind. If you get this then it is easier for you to think of GNNs as a generic version of CNNs that can operate on graphs.

The benefit of using GNNs is the provision of a generalized form that enables us to exploit non-Euclidean space data. In contrast to the pixels representation of images, the graph can be visualized as a collection of nodes and edges without having any order. So if we can learn node features in the latent space, we can do many interesting things. Those things are under three main categories:

Graph classification: where we can answer questions like: can we predict chemical compounds activity against lung cancer? Remember that chemical compounds can be represented as graphs with compound atoms as its nodes and chemical bonds are its edges.
Edge classification: as an example we can predict the congestion in a particular street, if we are given the city map. Then our graph has the streets as its edges and their intersections as its nodes.
Node classification: our tutorial problem falls under this category, as we are given a graph of a social network where users represent its nodes and the relationships between them are its edges and our task is to predict each user group (i.e. node class).

3. Adjacency matrix? Degree matrix? an example

Let us take a simple type of graphs, the undirected graph, this graph type is simple because the relationship between nodes is symmetric. Let us have an example of undirected graphs which has some numerical features attached to its nodes as well as its edges. Those features can represent any attributes that describe a node (blue boxes) or an edge (green boxes). Features are normally vectors, but we represent them by scalers here for simplicity.

We can express the above graph using two matrices which can capture its structure. The first is called the adjacency matrix and mostly denoted by A which is a square matrix used to represent finite graphs like this one. The elements of A indicate whether pairs of nodes are adjacent (i.e. connected by edges) or not in the graph. Those elements can be weighted (e.g. by edge features) as in our case; or can be unweighted (i.e. zeros
for disconnected nodes and ones for the adjacent nodes). For our graph we come up with a symmetric matrix with zero diagonal as shown below. The other matrix is the features matrix H (a.k.a. graph signal X) which holds the nodes' features of the graph.

An interesting operation that we may like to do is to calculate the dot product between those two matrices. This product, if we closely look at it, is generating a new feature vector that captures the feature of the surrounding nodes for each node based on how those surrounding nodes are connected to it. You can take a few seconds to relate the numbers on the calculation to the graph.

Now, if we take a critical look at the result we can observe two problems. The first is that when calculating the entry of the features matrix H' for any of the nodes we exclude that particular node features from the calculation as shown by red zeros. Which seems to be absurd! in the sense that we are ignoring our central node! To solve this we can add self-connections to every node on the graph and this can be achieved by adding the identity matrix ${I}$ to the graph adjacency matrix, which will alter its zero-valued diagonal to ones. So, now we can define the modified adjacency matrix $A~\tilde{A}$ as follows:

The second problem is the risk that may result from aggregating the features as we might end up with extremely high values which can lead to computation instability. We can see already signs of that in our simple example on the latent feature matrix H' which has entries at least 10 times more than the entries in the original H. The solution of this problem is to normalize the latent features by dividing by the graph degree matrix D (specifically we multiply by its inverse). The degree matrix is a diagonal matrix that contains information about the number of edges attached to each vertex. Our graph degree matrix looks like this:

Then the latent space feature representation can be calculated by the following, note that you may see different ways to solve this problem in the literature, but their common goal is to achieve some level of normalization by exploiting the graph structure:

4. Let us add some intelligence (learning)

So far, we discussed how we can calculate latent features of a graph data structure. But if we want to accomplish a particular task we can guide this calculation toward our goal. Particularly, we can leverage our last equation to be multiplied by a learnable weight matrix W and apply a nonlinear function to the product as in the next equation. We can learn the optimal W based on the task at hand. And here is where we can start doing interesting things.

Implementation wise, W can be a linear transformation (e.g. a linear layer) of the product and $ϕ\phi$ is a non-linear activation function. I do not know if now you got the same intuition of mine (which is true) that we can stack as many functions of those as we like and learn the weight matrices W using backpropagation. Stacking those functions together promote this equation to be a deep learning model.

5. Time to get hands dirty!

Even though the above functions look natural and manageable to implement using many machine learning frameworks, torch_geometric (PyTorch Geometric), a geometric deep learning extension library for PyTorch, provides many variations of deep learning on graphs and other irregular structures.

Of course, as this field is evolving, you will find many interesting original GNN models and many "bells and whistles" for better suiting your problem. But for the sake of this tutorial, we will keep it simple and use only one flavor of GNNs which is the graph convolutional operator class GCNConv which is implemented from the paper “Semi-supervised Classification with Graph Convolutional Networks”, if you would like to have a look at how it was developed.

The particular problem we will be working on is a typical node classification problem. We have a large social network of GitHub developers named musae-github which was collected from the public API. Nodes represent developers (platform users) who have starred at least 10 repositories and edges are mutual follower relationships between them (note that word mutual indicates undirected relationship). The node features are extracted based on the location, repositories starred, employer and e-mail address. Our task is to predict whether the GitHub user is a web or a machine learning developer. This target feature was derived from the job title of each user. The dataset referenced paper is "Multi-scale Attributed Node Embedding".

Working with data, a golden rule is to learn as we go about the dataset. I am pasting here a dataset statistics table from the same source, so we can have a preliminary idea about what we are dealing with. This statistics tell us that our data set has those many nodes and those many edges. It also tells us that the edges has no features and no directions. Also, the data has no temporal property. All this information can give us a feeling that our dataset is a right candidate for this tutorial in term of simplicity.

After downloading the dataset, we can see 3 important files:
musae_git_edges.csv which contains the edges' indices.
musae_git_features.json which contains the nodes' features.
musae_git_target.csv which contains the targets, i.e. node labels.

5.1 Required packages

To begin with, we must download and import the required packages. To download torch_geometric please follow strictly the installation guide here, there are different options if you want to install using Pip Wheels, please check your syntax to download the compatible version based on your machine and software.
We will also need NetworkX Python package, we will use it to visualize the graph, I installed with pip. From torch_geometric we need to import AddTrainValTestMask, this will help us to segregate the training and test sets later. We also need json,pandas and numpy and some other packages.



%matplotlib inline

import json
import collections
import numpy as np
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F

from torch_geometric.data import Data
from torch_geometric.transforms import AddTrainValTestMask as masking
from torch_geometric.utils.convert import to_networkx
from torch_geometric.nn import GCNConv

import networkx as nx

5.2 Dataset exploration

I assume the dataset files are in a subfolder named data. We read them and plot the top 5 rows and the last 5 rows from the labels file. Even though we see 4 columns but only 2 columns concern us here: id of the node (i.e. user) and ml_target which is 1 if the user is a machine learning community user and 0 otherwise. Now, we are sure that our task is a binary classification problem, as we only have 2 classes.



with open("data/musae_git_features.json") as json_data:
    data_raw = json.load(json_data)
edges=pd.read_csv("data/musae_git_edges.csv")
target_df=pd.read_csv("data/musae_git_target.csv")

print("5 top nodes labels")
print(target_df.head(5).to_markdown())
print()
print("5 last nodes")
print(target_df.tail(5).to_markdown())

Another important property we need to look at the class balance, this is because severe class imbalances may cause the classifier to simply guessing the majority class and not making any evaluation on the underrepresented class. By plotting the histogram (frequency distribution), we see some imbalance as the machine learning (label=1) are fewer than the other classes. This leads to some issues, but let us say, for now, it is not a severe problem.

We may also see how nodes are different based on the number of features. The second histogram tells that most of the users have between 15 and about 23 features, and very few of them have more than 30 and less than 5 features. The third histogram shows the most common features accross users, we can see different peaks on the distribution.



plt.hist(target_df.ml_target,bins=4);
plt.title("Classes distribution")
plt.show()

plt.hist(feat_counts,bins=20)
plt.title("Number of features per graph distribution")
plt.show()

plt.hist(feats,bins=50)
plt.title("Features distribution")
plt.show()

5.3 Features encoding

The nodes features tell us which feature is attached to each node. We can one-hot encode those features by writing our function encode_data. Our plan is to use this function to encode a light subset of the graph (e.g. only 60 nodes) for the purpose of visualization. Here is the function.



def encode_data(light=False,n=60):
    if light==True:
        nodes_included=n
    elif light==False:
        nodes_included=len(data_raw)

    data_encoded={}
    for i in range(nodes_included):# 
        one_hot_feat=np.array([0]*(max(feats)+1))
        this_feat=data_raw[str(i)]
        one_hot_feat[this_feat]=1
        data_encoded[str(i)]=list(one_hot_feat)

    if light==True:
        sparse_feat_matrix=np.zeros((1,max(feats)+1))
        for j in range(nodes_included):
            temp=np.array(data_encoded[str(j)]).reshape(1,-1)
            sparse_feat_matrix=np.concatenate((sparse_feat_matrix,temp),axis=0)
        sparse_feat_matrix=sparse_feat_matrix[1:,:]
        return(data_encoded,sparse_feat_matrix)
    elif light==False:
        return(data_encoded, None)

We can plot the first 250 features columns (the total is 4005) of the code for the first 60 users. Below, we can see how sparse is the matrix that we construct for nodes features.



data_encoded_vis,sparse_feat_matrix_vis=encode_data(light=True,n=60)
plt.figure(figsize=(25,25));
plt.imshow(sparse_feat_matrix_vis[:,:250],cmap='Greys');

5.4 Graph construction and visualization

To construct our graph, we will use torch_geometric.data.Data which is a plain old python object to model a single graph with various (optional) attributes. We will construct our graph object using this class and passing the following attributes, noting that all arguments are torch tensors.

x: will be assigned to the encoded node features, its shape is [number_of_nodes, number_of_features].
y: will be assigned to the node labels, its shape is [number_of_nodes]
edge_index: to represent an undirected graph, we need to extend the original edge indices in a way that we can have two separate directed edges connecting the same two nodes but pointing opposite to each other. For example, we need to have 2 edges between node 100 and node 200, one edge points from 100 to 200 and the other points from 200 to 100. This is a way to represent the undirected graph if we are given the edge indecies. The tensor shape will be [2,2*number_of_original_edges].

It deserve mentioning here that this Data class is very abstract in a sense that you can add any attribute that you think it describes your graph. For instance we can add a metadata attribute g["meta_data"]="bla bla bla" which make it flexible to encapsulate any information you would like. Now, we will build construct_graph function which does the following:



def construct_graph(data_encoded,light=False):
    node_features_list=list(data_encoded.values())
    node_features=torch.tensor(node_features_list)
    node_labels=torch.tensor(target_df['ml_target'].values)
    edges_list=edges.values.tolist()
    edge_index01=torch.tensor(edges_list, dtype = torch.long).T
    edge_index02=torch.zeros(edge_index01.shape, dtype = torch.long)#.T
    edge_index02[0,:]=edge_index01[1,:]
    edge_index02[1,:]=edge_index01[0,:]
    edge_index0=torch.cat((edge_index01,edge_index02),axis=1)
    g = Data(x=node_features, y=node_labels, edge_index=edge_index0)
    g_light = Data(x=node_features[:,0:2],
                     y=node_labels   ,
                     edge_index=edge_index0[:,:55])
    if light:
        return(g_light)
    else:
        return(g)

For drawing a graph, we construct our draw_graph function. We need to convert our homogeneous graph to a NetworkX graph then pot using NetworkX.draw.



def draw_graph(data0):
    #node_labels=data0.y
    if data0.num_nodes>100:
        print("This is a big graph, can not plot...")
        return

    else:
        data_nx = to_networkx(data0)
        node_colors=data0.y[list(data_nx.nodes)]
        pos= nx.spring_layout(data_nx,scale =1)
        plt.figure(figsize=(12,8))
        nx.draw(data_nx, pos,  cmap=plt.get_cmap('Set1'),
                node_color =node_colors,node_size=600,connectionstyle="angle3",
                width =1, with_labels = False, edge_color = 'k', arrowstyle = "-")

Now, we can draw a sub graph by calling construct_graph with light=True. Then we can pass it to draw_graph, to show the following graph. You can see how the nodes are connected by edges and labeled by color.



g_sample=construct_graph(data_encoded=data_encoded_vis,light=True)
draw_graph(g_sample)

5.5 Construct and train our GNN model

We start by encoding the whole data by calling encode_data with light=False and construct the full graph by calling construct_graph with light=False. We will not try to visualize this big graph as I am assuming that you are using your local computer with limited resources.



data_encoded,_=encode_data(light=False)
g=construct_graph(data_encoded=data_encoded,light=False)

The usual immediate step after data preparation in the machine learning pipelines is to perform data segregation and split subsets of the dataset to train the model and further validate how it performs against new data and save a test segment to report the overall performance.

The splitting is not straight forward in our case as we have a single giant graph that should be taken all at once (there are some other ways to deal with graph segments, but let us assume this is the case for now). In order to tell the training phase which nodes that should be included during training and tell the inference phase which are the test data we can use masks which are binary vectors that indicate (using 0 and 1) which nodes belong to each particular mask. torch_geometric.transforms.AddTrainValTestMask class can take our graph and let us set how we want our masks to formed and it will add a node-level split via train_mask, val_mask and test_mask attributes. In our training, we use 30% as a validation set and 60% as test set while we keep only 10% for the training. You may have different split ratios, but in this way we may have a more realistic performance and we will not overfit easily (I know you might disagree with me on this point)! We can also print the graph information and the number of nodes each set (mask). The numbers between the brackets is the shape of each attribute tensor.



msk=masking(split="train_rest", num_splits = 1, num_val = 0.3, num_test= 0.6)
g=msk(g)
print(g)
print()
print("training samples",torch.sum(g.train_mask).item())
print("validation samples",torch.sum(g.val_mask ).item())
print("test samples",torch.sum(g.test_mask ).item())

At this point, I believe you agree with me that we are ready to construct our GNN model class. As we agreed we will use torch_geometric.nn.GCNConv class, however there are many other layers you can try on PyTorch Geometric documentation.
We will stack two GCNConv layers, the first has input features equal to the number of features in our graph and some arbitrary number of output features f. Then we apply a relu activation function and deliver the latent features to the second layer which has output nodes equal to the number our classes (i.e. 2). In the forward function, GCNConv can accept many arguments x as the nodes features, edge_index and edge_weight, in our case we only use the first two arguments.



class SocialGNN(torch.nn.Module):
    def __init__(self,num_of_feat,f):
        super(SocialGNN, self).__init__()
        self.conv1 = GCNConv(num_of_feat, f)
        self.conv2 = GCNConv(f, 2)

    def forward(self, data):
        x = data.x.float()
        edge_index =  data.edge_index
        x = self.conv1(x=x, edge_index=edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        return x

As our model will predict the class of all nodes in the graph, we, however, want to calculate the loss and accuracy for specific set based on the phase we are in. For instance, during training we want to calculate the training loss and accuracy only based on the training set and here where we should use our masks. We will define functions masked_loss and masked_accuracy for which we pass the respective masks and it returns the corresponding loss and accuracy. The idea is to calculate the loss and accuracy for all nodes and multiply it by the mask to zero out the not required nodes.



def masked_loss(predictions,labels,mask):
    mask=mask.float()
    mask=mask/torch.mean(mask)
    loss=criterion(predictions,labels)
    loss=loss*mask
    loss=torch.mean(loss)
    return (loss)    

def masked_accuracy(predictions,labels,mask):
    mask=mask.float()
    mask/=torch.mean(mask)
    accuracy=(torch.argmax(predictions,axis=1)==labels).long()
    accuracy=mask*accuracy
    accuracy=torch.mean(accuracy)
    return (accuracy)

Now, we can define our training function in which we will use torch.optim.Adam optimizer. We will run the training for some number of epochs and we keep track for the best validation accuracy. We also plot the losses and accuracies across epochs for the training.



def train_social(net,data,epochs=10,lr=0.01):
    optimizer = torch.optim.Adam(net.parameters(), lr=lr) # 00001
    best_accuracy=0.0

    train_losses=[]
    train_accuracies=[]

    val_losses=[]
    val_accuracies=[]

    test_losses=[]
    test_accuracies=[]

    for ep in range(epochs+1):
        optimizer.zero_grad()
        out=net(data)
        loss=masked_loss(predictions=out,
                         labels=data.y,
                         mask=data.train_mask)
        loss.backward()
        optimizer.step()
        train_losses+=[loss]
        train_accuracy=masked_accuracy(predictions=out,
                                       labels=data.y, 
                                       mask=data.train_mask)
        train_accuracies+=[train_accuracy]

        val_loss=masked_loss(predictions=out,
                             labels=data.y, 
                             mask=data.val_mask)
        val_losses+=[val_loss]

        val_accuracy=masked_accuracy(predictions=out,
                                     labels=data.y, 
                                     mask=data.val_mask)
        val_accuracies+=[val_accuracy]

        test_accuracy=masked_accuracy(predictions=out,
                                      labels=data.y, 
                                      mask=data.test_mask)
        test_accuracies+=[test_accuracy]
        if np.round(val_accuracy,4)> np.round(best_accuracy ,4):
            print("Epoch {}/{}, Train_Loss: {:.4f}, Train_Accuracy: {:.4f}, Val_Accuracy: {:.4f}, Test_Accuracy: {:.4f}"
                      .format(ep+1,epochs, loss.item(), train_accuracy, val_accuracy,  test_accuracy))
            best_accuracy=val_accuracy

    plt.plot(train_losses)  
    plt.plot(val_losses)
    plt.plot(test_losses)  
    plt.show()

    plt.plot(train_accuracies) 
    plt.plot(val_accuracies)
    plt.plot(test_accuracies) 
    plt.show()

At this point we constructed all the required functions and ready to instantiate our model and train it. We will construct the model with 16 filters and will use nn.CrossEntropyLoss as our loss criterion. Below, we can see that our simple model reached a very decent accuracy on the test set, more than 87%. We can also see the learning curves (losses) and accuracies development through epochs on the top and bottom plots respectively.



num_of_feat=g.num_node_features
net=SocialGNN(num_of_feat=num_of_feat,f=16)
criterion=nn.CrossEntropyLoss()
train_social(net,g,epochs=50,lr=0.1)

6. Conclusion and next steps

We have gone through this step-by-step tutorial covering fundamental concepts about graph neural networks and developed our simple GNN model based on convolutional GNN on PyTorch framework using torch_geometric extension. You might want to access the github repository of this code.

Awadelrahman / GNN4SocialNWTutorial

6.1 Further steps that might be interesting to you:

Solving other categories of problems that GNNs can solve, i.e. graph classification and edges classification.
You might want to check other GNN layers that torch_geometric offers.

7. Some resources that helped me to prepare this tutorial:

PyTorch Geometric Documentation.
“Semi-supervised Classification with Graph Convolutional Networks” Paper by Kipf, Thomas N., and Max Welling.
Intro to graph neural networks presentation by Petar Veličković on YouTube.

Thank you for following this tutorial, I hope it was of help for you🙂.