<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alcione Paiva</title>
    <description>The latest articles on DEV Community by Alcione Paiva (@alcionepaiva).</description>
    <link>https://dev.to/alcionepaiva</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2614765%2F0f2cde7e-5a9e-44ad-adc3-5194cb8d450b.png</url>
      <title>DEV Community: Alcione Paiva</title>
      <link>https://dev.to/alcionepaiva</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alcionepaiva"/>
    <language>en</language>
    <item>
      <title>Artificial Neurons: The Heart of AI</title>
      <dc:creator>Alcione Paiva</dc:creator>
      <pubDate>Wed, 15 Jan 2025 13:50:36 +0000</pubDate>
      <link>https://dev.to/alcionepaiva/artificial-neurons-the-heart-of-ai-25kg</link>
      <guid>https://dev.to/alcionepaiva/artificial-neurons-the-heart-of-ai-25kg</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvzmjlx7mvti367fd69z.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvzmjlx7mvti367fd69z.jpeg" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Artificial Intelligence&lt;/strong&gt; has generated a lot of excitement recently due to the advances made by large &lt;strong&gt;language models (LLMs)&lt;/strong&gt;. This success has motivated many people to enter the field and benefit from its growth. However, most texts do not address the fundamental building block of these neural networks: the artificial neuron. We believe this knowledge is the foundation for a solid understanding of artificial neural networks. In this tutorial, we will describe the functioning of an &lt;strong&gt;artificial neuron&lt;/strong&gt;, also known as &lt;strong&gt;logistic regression&lt;/strong&gt;. Despite its simplicity, the artificial neuron is very useful for solving various classification problems, such as spam detection, diabetes prediction, and credit granting, among others. &lt;/p&gt;

&lt;h2&gt;
  
  
  Classification of Machine Learning Systems
&lt;/h2&gt;

&lt;p&gt;To better understand this type of technique, it is important to be familiar with a way of &lt;strong&gt;classifying machine learning models&lt;/strong&gt;. Machine Learning is a sub-field of Artificial Intelligence devoted to developing systems that learn and improve automatically from data. We can categorize machine learning models into &lt;strong&gt;supervised&lt;/strong&gt;, &lt;strong&gt;unsupervised&lt;/strong&gt;, and &lt;strong&gt;reinforcement learning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;supervised models&lt;/strong&gt;, the system learns from examples. In the case of &lt;strong&gt;unsupervised techniques&lt;/strong&gt;, the system detects patterns by examining data without these patterns being presented beforehand. Finally, in the third class of models, &lt;strong&gt;reinforcement learning&lt;/strong&gt;, the system learns from its actions and the feedback received in terms of rewards.&lt;/p&gt;

&lt;p&gt;The Artificial Neuron, in the form of &lt;strong&gt;logistic regression&lt;/strong&gt;, is a supervised learning technique. &lt;strong&gt;Supervised models&lt;/strong&gt; can be further divided into &lt;strong&gt;classification&lt;/strong&gt; systems and &lt;strong&gt;regression&lt;/strong&gt; systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fut9bqcd3djqucbuhrrcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fut9bqcd3djqucbuhrrcz.png" alt="Image description" width="800" height="192"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Classification of machine learning models.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Logistic regression
&lt;/h2&gt;

&lt;p&gt;In &lt;strong&gt;classification models&lt;/strong&gt;, the system tries to identify which class is correct given an input. For example, based on a person’s financial data, the system attempts to determine whether it’s appropriate to lend money or deny the loan. Another example is when the system receives data about a specific animal and, based on that information, identifies whether it’s a mammal, reptile, bird, or fish.&lt;/p&gt;

&lt;p&gt;In the case of &lt;strong&gt;regression&lt;/strong&gt;, the system attempts to output a value based on the received data. For instance, using financial data, the system might try to &lt;strong&gt;predict&lt;/strong&gt; the inflation rate — a technique commonly employed in the financial market.&lt;/p&gt;

&lt;p&gt;Despite its name, &lt;strong&gt;logistic regression is used for classification&lt;/strong&gt;. Classification can be binary, where there are only two classes, such as yes or no, positive or negative. It can also be multiclass, for example, classifying whether a word is a verb, noun, adjective, adverb, and so forth.&lt;/p&gt;

&lt;p&gt;To &lt;strong&gt;distinguish logistic regression from linear regression&lt;/strong&gt;, we can observe the graphical difference using an example with two inputs, or dimensions, since two inputs make visualization easier. In linear regression applied to a set of points in a plane, our objective is to fit a line that captures the underlying trend of the points.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1sj2deyo5zde1qi9y30c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1sj2deyo5zde1qi9y30c.png" alt="Image description" width="800" height="668"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Example of linear regression.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once this line is adjusted, we can use it to predict one axis value based on the other. If it’s a three-dimensional space, we’ll try to fit a &lt;strong&gt;plane&lt;/strong&gt;. If there are more dimensions, we’ll attempt to fit a &lt;strong&gt;hyperplane&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the case of logistic regression, what we aim for is to return a decision, such as yes or no, or a classification. So, drawing a line won’t help. Consider this simple example where a decision needs to be made on whether to lend money to an individual based on their salary. With previous loan data, it becomes challenging to fit a line that can answer this question.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7eyme0yz3gqx4ewgs56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7eyme0yz3gqx4ewgs56.png" alt="Image description" width="800" height="475"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Illustration of the inadequacy of linear regression for classification.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nevertheless, if you use a curve in an &lt;strong&gt;“S” shape&lt;/strong&gt; instead of a straight line, the adjustment becomes much easier. When the salary is fed into this curved function, a result near the top of the curve means yes; otherwise, the answer is no. To turn the line into such a curve, it is necessary to &lt;strong&gt;introduce non-linearity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F00m77zpeurgyz7cmsqqw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F00m77zpeurgyz7cmsqqw.png" alt="Image description" width="800" height="448"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Demonstration of the suitability of the “S”-shaped curve for classification.&lt;/center&gt;&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;A widely used function to introduce this nonlinearity is the &lt;strong&gt;logistic function&lt;/strong&gt;, hence the name logistic regression. Its general formula is shown below. Note that it is a fraction whose numerator is 1 and whose denominator is always equal to or greater than 1, so the value of the function is confined between 0 and 1. The denominator contains an exponentiation whose base is the mathematical constant known as Euler’s number, approximately 2.718.&lt;/p&gt;

&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;f(z)=11+e−z
 f(z) = \frac{1}{1 + e^{-z}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;z&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;e&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;z&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose 
nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;em&gt;&lt;center&gt;Logistic function.&lt;/center&gt;&lt;/em&gt;&lt;br&gt;
&lt;br&gt;

&lt;p&gt;The logistic function has interesting characteristics. Firstly, when plotted on a two-dimensional graph, it takes the form of an “S”. This is why it is referred to as a &lt;strong&gt;sigmoid function&lt;/strong&gt;. The second characteristic is that the values returned by the function range between 0 and 1. This makes it very suitable for binary classifications, where there are two classes for classification, such as yes or no, positive or negative, lend or deny a loan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnshxadd7owqspsxp59tt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnshxadd7owqspsxp59tt.png" alt="Image description" width="800" height="532"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Graphical representation of the logistic function.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
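&lt;p&gt;To make these two properties concrete, here is a minimal sketch of the logistic function in Python (the function name &lt;code&gt;sigmoide&lt;/code&gt; matches the one used later in this post):&lt;/p&gt;

```python
from math import exp

def sigmoide(z):
    # Logistic (sigmoid) function: 1 / (1 + e^(-z))
    return 1 / (1 + exp(-z))

# The output stays strictly between 0 and 1, with exactly 0.5 at z = 0
for z in [-10, -1, 0, 1, 10]:
    print(z, round(sigmoide(z), 4))
```

&lt;p&gt;Large negative inputs land near 0 and large positive inputs land near 1, tracing the “S” shape described above.&lt;/p&gt;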

&lt;p&gt;The third advantage is that it is a &lt;strong&gt;continuous function&lt;/strong&gt;. This means that at any point on the curve, you can draw a tangent line and calculate the slope of the tangent at that point. This characteristic is used during the model fitting stage, that is, during its learning. If the model calculates a wrong value, we can calculate the slope at the point that was predicted and determine the direction in which we should adjust the model to reduce the error. We will see how this is done in future posts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphxtgizjdvs9yy2f2dbw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphxtgizjdvs9yy2f2dbw.png" alt="Image description" width="800" height="646"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Graphical representation of a tangent line to a point in the logistic function.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
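&lt;p&gt;This slope even has a convenient closed form: the derivative of the sigmoid is σ(z)·(1 − σ(z)). A small sketch verifies this against a numerical tangent slope:&lt;/p&gt;

```python
from math import exp, isclose

def sigmoide(z):
    return 1 / (1 + exp(-z))

def slope(z):
    # Closed-form derivative of the logistic function:
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoide(z)
    return s * (1 - s)

# Cross-check against a numerical tangent slope at z = 0.3
h = 1e-6
numerical = (sigmoide(0.3 + h) - sigmoide(0.3 - h)) / (2 * h)
print(isclose(slope(0.3), numerical, rel_tol=1e-4))  # True
```

&lt;p&gt;This cheap-to-compute slope is exactly what the learning stage will exploit to decide in which direction to adjust the model.&lt;/p&gt;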
&lt;h2&gt;
  
  
  Calculation Example
&lt;/h2&gt;

&lt;p&gt;Let’s now explore &lt;strong&gt;how logistic regression works in detail&lt;/strong&gt;. To start, we need a &lt;strong&gt;dataset&lt;/strong&gt; designed for classification purposes. For example, consider a scenario where we collect data from various individuals to assess whether they qualify for a loan. This dataset might include &lt;strong&gt;features&lt;/strong&gt; such as their salary and the amount of money they wish to borrow.&lt;/p&gt;

&lt;p&gt;In reality, companies use a much larger set of information to make such decisions. But for our example, these two pieces of information will be sufficient. Each type of information is called a &lt;strong&gt;feature&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We will employ a pre-classified dataset that denotes whether loans were approved. They are divided into two groups: one for learning, which will be used to &lt;strong&gt;train&lt;/strong&gt; the model, and another for &lt;strong&gt;testing&lt;/strong&gt; the model. At the end of the learning process, and after passing the tests with a predefined &lt;strong&gt;accuracy level&lt;/strong&gt;, we can say that the system is ready to approve or deny new loan requests. Let’s now build our logistic regression model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0c7rp44lhrdxjojajq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0c7rp44lhrdxjojajq1.png" alt="Image description" width="800" height="348"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;The Dataset.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the first part of the model, we take each input value, which in our example are the salary and the requested loan amount. We multiply each by a weight and then add a value known as the &lt;strong&gt;bias&lt;/strong&gt;. The resulting value is referred to as Z.&lt;/p&gt;

&lt;p&gt;You may be wondering where these weight values and the bias value come from. Initially, these values are random and are adjusted during the model’s learning stage. They are called system parameters. Thus, the model learns which values should be assigned to the weights and bias to produce a correct output. The weights determine the importance assigned to each input attribute, while the bias corresponds to a general adjustment of the model.&lt;/p&gt;

&lt;p&gt;Let’s illustrate this calculation with an example. Suppose a person earns $3000 and wants a loan of $10,000. Assume both weights are set to 0.01 and the bias is set to 1. In this case, the value of Z would be 3000 × 0.01 + 10,000 × 0.01 + 1 = 131. This value doesn’t convey much information on its own, and what we want to determine is whether or not we should grant the loan. To do this, we will feed the value of Z into the sigmoid function in the second step of the model’s execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxbe49o14iof9ghoy6rv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxbe49o14iof9ghoy6rv.png" alt="Image description" width="800" height="268"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Example of calculating the value of Z.&lt;/center&gt;&lt;/em&gt;&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
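&lt;p&gt;The arithmetic of this example can be reproduced in a few lines of Python:&lt;/p&gt;

```python
# Worked example: salary 3000, requested loan 10000,
# both weights 0.01, bias 1
salary, loan = 3000, 10000
w1 = w2 = 0.01
bias = 1

z = salary * w1 + loan * w2 + bias  # 30 + 100 + 1
print(z)  # 131.0
```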

&lt;p&gt;In the second step, we use the value of Z as the (negated) exponent in the denominator of the sigmoid formula. Performing the calculation, the final value is effectively 1.&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;f(131)=11+e−131=1
 f(131) = \frac{1}{1 + e^{-131}}=1
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;131&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;e&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;131&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;em&gt;&lt;center&gt;Application of the sigmoid function to Z.&lt;/center&gt;&lt;/em&gt;&lt;br&gt;
&lt;br&gt;

&lt;p&gt;This can be interpreted as a suggestion that the loan should be granted. In other words, any value equal to or greater than 0.5 can be considered a &lt;strong&gt;Yes&lt;/strong&gt;, while values below 0.5 may be considered a &lt;strong&gt;No&lt;/strong&gt;. However, the data table indicates that the final value should be 0, meaning the loan should be denied. &lt;strong&gt;An error has occurred&lt;/strong&gt;, and the model needs to be adjusted to correct this mistake. This correction is carried out during the &lt;strong&gt;learning stage&lt;/strong&gt;, and we will explain how this step is performed a bit later. Now, let’s provide a more graphical explanation of what has been done.&lt;/p&gt;
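&lt;p&gt;The 0.5 cut-off can be written as a tiny decision rule (the threshold itself is a modeling choice):&lt;/p&gt;

```python
def decide(p):
    # Sigmoid outputs of 0.5 or more read as "yes" (grant the loan),
    # values below 0.5 read as "no" (deny it)
    return 1 if p >= 0.5 else 0

print(decide(0.9999))  # 1, yet the dataset expects 0: the model must be adjusted
print(decide(0.2))     # 0
```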

&lt;p&gt;We can view the logistic regression calculation as a flow. We have input attributes, which can be seen as a sequence of values &lt;em&gt;x1, x2,&lt;/em&gt; up to &lt;em&gt;xn&lt;/em&gt;. These values are multiplied by their respective weights &lt;em&gt;w1, w2&lt;/em&gt;, up to &lt;em&gt;wn&lt;/em&gt;. The results of the multiplication are summed along with a bias value, generating a value Z.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fba6xnjmp0t325il1o6ub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fba6xnjmp0t325il1o6ub.png" alt="Image description" width="800" height="305"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Graphical representation of the logistic regression calculation flow.&lt;/center&gt;&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;This value is then applied to the sigmoid function, represented here by the Greek letter &lt;strong&gt;sigma&lt;/strong&gt;. This function is also called the &lt;strong&gt;activation function&lt;/strong&gt;, and, as we will see later, there are other possible activation functions besides the sigmoid function. The value generated by the activation function is the output value emitted by the network, represented here by the letter y with a circumflex, also called &lt;strong&gt;y-hat&lt;/strong&gt;. We use this notation to indicate that it is an &lt;strong&gt;estimated&lt;/strong&gt; or calculated value, differentiating it from the &lt;strong&gt;expected&lt;/strong&gt; or real value.&lt;/p&gt;

&lt;p&gt;This operation is a metaphor for the functioning of a biological neuron. A neuron is connected to other neurons through filaments called &lt;strong&gt;dendrites&lt;/strong&gt;. Neurons provide inputs to one another through electrochemical stimuli, and the strength of each stimulus depends on the strength of the connection, which is equivalent to the role played by the weights in logistic regression. If the stimuli a neuron receives surpass a certain threshold, it fires, emitting an electrochemical signal through its &lt;strong&gt;axon&lt;/strong&gt; that is transmitted to other neurons. Because of this resemblance, which is superficial and does not capture the true complexity of a biological neuron, logistic regression can be seen as an artificial neuron. And, as we will see later, composing these neurons into a network forms an artificial neural network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk61rt1i3cgplchosu8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk61rt1i3cgplchosu8w.png" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Neuron.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Before we continue, let’s briefly discuss the notation used in our formulas. When a variable represents a single value, we use a regular lowercase letter. When it represents a vector or matrix, we use a lowercase letter in bold.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztukaxj3j7m75nd0citz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztukaxj3j7m75nd0citz.png" alt="Image description" width="800" height="821"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Notational convention.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now that we understand the calculation performed by logistic regression, let’s describe this step a bit more formally. Given an input vector &lt;strong&gt;x&lt;/strong&gt; with values &lt;em&gt;x1, x2&lt;/em&gt;, up to &lt;em&gt;xn&lt;/em&gt;, a weight vector &lt;strong&gt;w&lt;/strong&gt; with values &lt;em&gt;w1, w2&lt;/em&gt;, up to &lt;em&gt;wn&lt;/em&gt;, and a bias value &lt;em&gt;b&lt;/em&gt;, the linear function &lt;em&gt;Z&lt;/em&gt; is defined as the product of the transpose of vector &lt;strong&gt;w&lt;/strong&gt; with vector &lt;strong&gt;x&lt;/strong&gt;, plus the &lt;strong&gt;bias&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;transpose&lt;/strong&gt; of a vector is simply the transformation of a column vector into a row vector, a rearrangement that makes the multiplication well defined.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd2g0qopio5fsplt6sia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd2g0qopio5fsplt6sia.png" alt="Image description" width="800" height="354"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Vector multiplication.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The multiplication of two vectors, their dot product, involves multiplying their elements pair-wise and then summing the products, resulting in a single &lt;strong&gt;scalar value&lt;/strong&gt;. When we apply the logistic function, or sigma activation function, to the value of Z, we obtain a value between 0 and 1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftp9191mkhvh4g7jpw163.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftp9191mkhvh4g7jpw163.png" alt="Image description" width="800" height="96"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Application of the sigmoid function to Z.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s consider another example. Given a vector with a salary value of 1 and a loan request value of 4, both values in thousands, and also given a weight vector with values 0.2 and 0.1, along with a bias value of 0.1, in this case, the value of Z would be 0.7. Applying the logistic function to this value yields a resulting value of 0.67.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce9zuwz38q437yqhdlk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce9zuwz38q437yqhdlk9.png" alt="Image description" width="800" height="546"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Example of calculation in logistic regression.&lt;/center&gt;&lt;/em&gt; &lt;/p&gt;
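&lt;p&gt;The same calculation can be written in vector form with a plain Python dot product (numbers taken from the example above):&lt;/p&gt;

```python
from math import exp

x = [1, 4]      # salary and loan request, in thousands
w = [0.2, 0.1]  # weights
b = 0.1         # bias

# w transposed times x, plus b: pair-wise products summed with the bias
z = sum(wi * xi for wi, xi in zip(w, x)) + b
yhat = 1 / (1 + exp(-z))
print(round(z, 1), round(yhat, 2))  # 0.7 0.67
```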
&lt;h2&gt;
  
  
  Artificial Neuron Implementation
&lt;/h2&gt;

&lt;p&gt;Now, let’s demonstrate how to implement this calculation using the &lt;strong&gt;Python&lt;/strong&gt; programming language. First, we import the &lt;code&gt;exp&lt;/code&gt; function from the &lt;code&gt;math&lt;/code&gt; module, which, given a number, returns Euler’s number raised to that power. We use this function to create our sigmoid function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from math import exp

def sigmoide(x):
  return 1 / (1 + exp(-x))

# Input X[0] Wage, x[1] Loan
X = [[3,10],[1.5,11.8],[5.5,20.0],[3.5,15.2],[3.1,14.5],
     [7.6,15.5],[1.5,3.5],[6.9,8.5],[8.6,2.0],[7.66,3.5]]
Y = [0   , 0   , 0   , 0   , 0   , 1   , 1  , 1  ,   1, 1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The input will be defined by a matrix of 10 rows and 2 columns, where the index 0 column contains the salary value, and the index 1 column contains the loan request value. The 10 rows represent the ten loan request cases.&lt;/p&gt;

&lt;p&gt;To train the network, we also need the expected outputs, represented as a 10-position vector. In this vector, a value of 0 indicates that the request should be rejected, while a value of 1 signifies that the request should be accepted. The process of &lt;strong&gt;training the neuron&lt;/strong&gt; will be detailed in the next post.&lt;/p&gt;

&lt;p&gt;Next, we need to establish the initial values for the system parameters, namely the weights and bias. Let’s randomly choose values of 0.2 and 0.1 for the weights and 0.1 for the bias. After that, the program executes a loop, going through each request and calculating &lt;em&gt;Z&lt;/em&gt; and the prediction. We also calculate the error for each request based on the difference between the &lt;strong&gt;prediction&lt;/strong&gt; and the &lt;strong&gt;expected&lt;/strong&gt; value. The program prints, for each request, the input values and what was calculated. See the code below.&lt;/p&gt;

&lt;p&gt;Definition of parameters and calculation of outputs&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m = len(X)

w=[0.2,0.1]
b=0.1

for j in range(m):
  z = X[j][0]*w[0]+X[j][1]*w[1]+b
  yhat = sigmoide(z)

  # Calculates error
  erro = yhat-Y[j]

  print(" Wage:{0:5.2f}  Wage:{1:5.2f} Expected value:{2} ".
        format( X[j][0]*1000, X[j][1], Y[j]))
  print(" z:{0:2.3f}   yhat:{1:2.3f}  error:{2:2.3f}\n ".format( z, yhat, erro))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below is the initial segment of the program’s output. These predictions reflect the current weights; when they are wrong, the weights need to be adjusted, which is the job of the &lt;strong&gt;learning stage&lt;/strong&gt;, described in the next post.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4xlrrd8lo90wzqwqggd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4xlrrd8lo90wzqwqggd.png" alt="Image description" width="800" height="1145"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Output issued by the program.&lt;/center&gt;&lt;/em&gt; &lt;/p&gt;
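&lt;p&gt;The learning stage is the subject of the next post; as a preview, here is a minimal sketch of a single weight update for one request, assuming the standard logistic-regression (cross-entropy) gradient. The learning rate &lt;code&gt;alpha&lt;/code&gt; is an illustrative choice, not a value from this post:&lt;/p&gt;

```python
from math import exp

def sigmoide(x):
    return 1 / (1 + exp(-x))

# One training case: wage 3 (thousands), loan 10; expected output 0 (reject)
x = [3, 10]
y = 0

w = [0.2, 0.1]
b = 0.1
alpha = 0.01  # learning rate (illustrative value)

z = x[0]*w[0] + x[1]*w[1] + b
yhat = sigmoide(z)
error = yhat - y

# Standard logistic-regression gradient step: w_i = w_i - alpha * error * x_i
w = [w[i] - alpha * error * x[i] for i in range(2)]
b = b - alpha * error

print(w, b)  # weights nudged toward rejecting this request
```

Repeating this update over all requests, many times, is what drives the errors toward zero.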

&lt;p&gt;That brings us to the end of this post. If it was useful to you, please consider leaving a comment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>neuron</category>
      <category>python</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Alcione Paiva</dc:creator>
      <pubDate>Mon, 06 Jan 2025 22:16:20 +0000</pubDate>
      <link>https://dev.to/alcionepaiva/-3fl8</link>
      <guid>https://dev.to/alcionepaiva/-3fl8</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/alcionepaiva" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2614765%2F0f2cde7e-5a9e-44ad-adc3-5194cb8d450b.png" alt="alcionepaiva"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/alcionepaiva/using-langchain-to-search-your-own-pdf-documents-23k3" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Using LangChain to Search Your Own PDF Documents&lt;/h2&gt;
      &lt;h3&gt;Alcione Paiva ・ Jan 3&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#llm&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#rag&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#langchain&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#pdf&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>langchain</category>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Using LangChain to Search Your Own PDF Documents</title>
      <dc:creator>Alcione Paiva</dc:creator>
      <pubDate>Fri, 03 Jan 2025 13:12:18 +0000</pubDate>
      <link>https://dev.to/alcionepaiva/using-langchain-to-search-your-own-pdf-documents-23k3</link>
      <guid>https://dev.to/alcionepaiva/using-langchain-to-search-your-own-pdf-documents-23k3</guid>
      <description>&lt;p&gt;Artificial Intelligence applications like OpenAI's ChatGPT or Google's Gemini enable users to explore a wide range of topics and ask questions with ease. However, there are situations where the information we seek is not readily accessible to these tools but resides in private or less accessible documents. Even in such cases, these applications can leverage their advanced language processing capabilities to analyze these documents, extract relevant information, and provide targeted answers—eliminating the need to manually read through the entire content.&lt;/p&gt;

&lt;p&gt;Using a language model to search for information outside of its training base is one of the applications of a technique called &lt;strong&gt;RAG&lt;/strong&gt; (Retrieval-Augmented Generation). In this post, we will show how it's possible to easily create an application to search through local documents. In our example, we will use a PDF document, but the example can be adapted for various types of documents, such as TXT, MD, JSON, etc. To assist us in building our example, we will use the &lt;strong&gt;LangChain&lt;/strong&gt; library.&lt;/p&gt;

&lt;p&gt;LangChain is a powerful open-source framework that simplifies the construction of &lt;strong&gt;natural language processing&lt;/strong&gt; (NLP) pipelines using large language models (LLMs). LangChain stands out for its ability to build complex process chains, combining different stages of text manipulation and data processing in a modular and scalable manner.&lt;/p&gt;

&lt;p&gt;As a development environment, we will use &lt;strong&gt;Google Colab Notebook&lt;/strong&gt;. The notebook can be viewed at this &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fcolab.research.google.com%2Fdrive%2F15mxuBqAtXU7FHIwRheTucTxREeYrYVsO%3Fusp%3Dsharing" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 - Download the PDF Document
&lt;/h2&gt;

&lt;p&gt;To begin, we'll need to download the PDF document that we want to process and analyze using the LangChain library. In our example, we will use a document from the GLOBAL FINANCIAL STABILITY REPORT conducted by the International Monetary Fund. In the Colab Notebook, the document can be downloaded with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!wget https://www.imf.org/-/media/Files/Publications/GFSR/2024/April/English/text.ashx -O text.pdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2 - Install the Libraries
&lt;/h2&gt;

&lt;p&gt;Next, we need to install the necessary libraries using pip. In the Google Colab Notebook, you can install these libraries by running the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install langchain
!pip install -U langchain-community
!pip install -U langchain-openai
!pip install chromadb
!pip install pypdf2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the explanation of each library:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LangChain:&lt;/code&gt;&lt;br&gt;
LangChain is the main library for building natural language processing (NLP) pipelines using large language models (LLMs). This library facilitates the integration of different stages of text manipulation and data processing, enabling the creation of advanced NLP applications.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LangChain-Community:&lt;/code&gt;&lt;br&gt;
LangChain-Community is an extension of the LangChain library that includes additional modules and functionalities developed by the community. This extension allows users to benefit from contributions and improvements made by other developers, expanding the capabilities and functionalities available.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LangChain-OpenAI:&lt;/code&gt;&lt;br&gt;
LangChain-OpenAI is a specific module for integration with OpenAI's language models, such as GPT-3 and GPT-4. This package allows developers to efficiently use the OpenAI API within the LangChain ecosystem, facilitating the construction of pipelines involving OpenAI's powerful language models.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ChromaDB:&lt;/code&gt;&lt;br&gt;
ChromaDB is a database library designed for the efficient storage and management of data as vectors. It is important because textual elements are represented in the form of numeric vectors (embeddings) for analysis by the language model. ChromaDB facilitates the retrieval and manipulation of these vectors for tasks such as search and information retrieval.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;PyPDF2:&lt;/code&gt;&lt;br&gt;
PyPDF2 is a Python library that enables reading, manipulating, and extracting text from PDF files. This library is essential when working with PDF documents in NLP applications, allowing you to load and process the content of PDFs programmatically.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 3 - Import the Modules to Be Used
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pprint import pprint
import PyPDF2
import os
from google.colab import userdata

import openai
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;pprint&lt;/code&gt; module (short for &lt;em&gt;pretty-print&lt;/em&gt;) is used to format complex data structures in a way that is more readable and organized for humans. The &lt;code&gt;os&lt;/code&gt; module will be used to store the API key value in an environment variable for accessing the OpenAI API. &lt;code&gt;userdata&lt;/code&gt; is used in Google Colab to access and manipulate user data, facilitating operations that involve exchanging data between the notebook and the user's Colab environment. In this case, we will use it to obtain the OpenAI API key, which should be registered in the Colab secrets space. The other modules will be explained at the time of their use.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 4 - Reading the Document and Converting it to Text
&lt;/h2&gt;

&lt;p&gt;In the code snippet below, a PDF file is read, the text contained in each of its pages is extracted, and a portion of this text is displayed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Open the PDF file
file_path = "./text.pdf"
pdf_file = open(file_path, "rb")

# Create an Object to read the PDF
pdf_reader = PyPDF2.PdfReader(pdf_file)

# Extract the text from each page
pdf_text = ""
for page in pdf_reader.pages:
    pdf_text += page.extract_text()

# Close PDF
pdf_file.close()

# Shows an excerpt of the text read
pdf_text[:2000]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below are the first 2000 characters of the text extracted from the PDF (pdf_text[:2000]).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszywrryi40276jukgxhy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszywrryi40276jukgxhy.png" alt="Image description" width="800" height="132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5 - Splitting the Text
&lt;/h2&gt;

&lt;p&gt;The next step is to split the text before vectorizing it, that is, before &lt;strong&gt;converting the words into vectors&lt;/strong&gt;. Splitting is important because language models, especially those based on Transformers like BERT, GPT, etc., have a limit on the number of tokens (words or characters) they can process at once. Long texts that exceed this limit need to be divided into smaller parts to be processed correctly. Additionally, dividing the text into smaller parts allows each segment to maintain coherent context. If a text is too long and not split, the model might lose context or ignore important parts of the text. By splitting it into segments, we ensure that each part is meaningful and comprehensible on its own.&lt;br&gt;
In our example, we will use LangChain's &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;. It is designed to split the text into smaller, coherent, and meaningful pieces.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_text(pdf_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the example, we use a &lt;code&gt;chunk_size&lt;/code&gt; of 1000, defining the maximum size of each segment, and a &lt;code&gt;chunk_overlap&lt;/code&gt; of 100, defining the number of characters that overlap between consecutive segments. The overlap helps to maintain context between the segments.&lt;/p&gt;
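&lt;p&gt;To make &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;chunk_overlap&lt;/code&gt; concrete, here is a simplified character-based splitter. Note that LangChain’s &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; is smarter than this: it tries to split on paragraph and sentence boundaries first, while this sketch only counts characters:&lt;/p&gt;

```python
def split_text(text, chunk_size, chunk_overlap):
    # Naive character-window splitter: each chunk starts
    # chunk_size - chunk_overlap characters after the previous one
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij" * 3  # 30 characters
chunks = split_text(sample, chunk_size=10, chunk_overlap=3)

print(len(chunks))                      # 5
print(chunks[0][-3:] == chunks[1][:3])  # True: 3 characters shared between neighbors
```

The shared tail of each chunk is what keeps a sentence that straddles a boundary visible in both segments.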

&lt;h2&gt;
  
  
  Step 6 - Text Vectorization
&lt;/h2&gt;

&lt;p&gt;The code snippet below sets up an environment to use the OpenAI API and creates a vector database using ChromaDB to store text embeddings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;os.environ['OPENAI_API_KEY'] = userdata.get("OPENAI_API_KEY")
persist_directory = 'db'

embedding = OpenAIEmbeddings()
vectordb = Chroma.from_texts(texts=texts,  embedding=embedding,  persist_directory=persist_directory)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;os.environ&lt;/code&gt;: A dictionary in Python that contains the system's environment variables.&lt;br&gt;
&lt;code&gt;userdata.get("OPENAI_API_KEY")&lt;/code&gt;: Retrieves the OpenAI API key from the user data registered in Google Colab's secrets.&lt;br&gt;
&lt;code&gt;persist_directory&lt;/code&gt;: Sets the path of the directory where persistent data will be stored. In this case, it is set as 'db'.&lt;br&gt;
&lt;code&gt;OpenAIEmbeddings()&lt;/code&gt;: Creates an instance of embeddings (vectors) provided by OpenAI.&lt;br&gt;
&lt;code&gt;Chroma.from_texts(...)&lt;/code&gt;: A method from the ChromaDB library that creates a vector database from a list of texts.&lt;br&gt;
&lt;code&gt;texts=texts&lt;/code&gt;: Passes the list of texts that will be converted into vectors and stored in the database.&lt;br&gt;
&lt;code&gt;embedding=embedding&lt;/code&gt;: Specifies the OpenAI embeddings object to be used for converting the texts into vector representations.&lt;br&gt;
&lt;code&gt;persist_directory=persist_directory&lt;/code&gt;: Sets the directory where the database will be saved and persisted.&lt;/p&gt;
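&lt;p&gt;The point of storing embeddings is that semantic similarity becomes a geometric comparison: texts about related topics get vectors pointing in similar directions. A toy illustration with hand-made 3-dimensional vectors (real OpenAI embeddings have hundreds or thousands of dimensions, and their values come from the model, not from us):&lt;/p&gt;

```python
from math import sqrt

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (hand-made, for illustration only)
bank_risk  = [0.9, 0.1, 0.0]
cyber_risk = [0.8, 0.3, 0.1]
weather    = [0.0, 0.1, 0.9]

print(cosine_similarity(bank_risk, cyber_risk))  # high: related topics
print(cosine_similarity(bank_risk, weather))     # low: unrelated topics
```

This is, conceptually, the comparison the vector database performs when it retrieves the chunks most relevant to a query.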
&lt;h2&gt;
  
  
  Step 7 - Create an Object for Querying
&lt;/h2&gt;

&lt;p&gt;Now we will create an object to query the text. In the code snippet below, an instance of &lt;code&gt;RetrievalQA&lt;/code&gt; is created using a specific chain type. &lt;code&gt;RetrievalQA&lt;/code&gt; is a class used to answer questions based on an index of documents. It is used to set up a question-and-answer system that combines information retrieval capabilities with a large language model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qa = RetrievalQA.from_chain_type(llm=OpenAI(),
    chain_type="stuff", retriever=vectordb.as_retriever())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;from_chain_type(...)&lt;/code&gt; method of this class creates an instance based on the specified chain type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Arguments:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;llm=OpenAI():&lt;/code&gt; Creates an instance of the OpenAI language model. This instance will be used to generate responses based on the retrieved text.&lt;br&gt;
&lt;code&gt;OpenAI():&lt;/code&gt; This command calls the class or function that creates a connection with the OpenAI language model, using the previously configured API key.&lt;br&gt;
&lt;code&gt;chain_type="stuff":&lt;/code&gt; The chain type is set to "stuff." This indicates how the retrieved documents will be combined to form the final answer. In the case of "stuff," the documents are simply concatenated.&lt;br&gt;
&lt;code&gt;retriever=vectordb.as_retriever():&lt;/code&gt; vectordb is a vector database being used to retrieve relevant documents. The as_retriever() method transforms this database into an object that can be used to search for documents.&lt;/p&gt;
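&lt;p&gt;Conceptually, the "stuff" strategy does something like the following: the retrieved chunks are stuffed into a single prompt along with the question, and that prompt is sent to the language model. The actual LangChain prompt template differs; this sketch only illustrates the idea:&lt;/p&gt;

```python
def build_stuff_prompt(question, retrieved_chunks):
    # Concatenate all retrieved chunks into one context block,
    # then append the user's question
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = ["Cyber incidents have risen sharply.", "Reporting lags understate losses."]
prompt = build_stuff_prompt("Analyze cyber incidents.", chunks)
print(prompt)
```

Because everything is concatenated, "stuff" only works while the retrieved chunks fit within the model's context window; other chain types exist for larger document sets.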
&lt;h2&gt;
  
  
  Step 8 - Conducting the Search
&lt;/h2&gt;

&lt;p&gt;In this final step, we perform the query. In this case, we will ask to "Analyze cyber incidents in the current context." Remember to ask politely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query = "Please, analyze cyber incidents in the current context."
response = qa.invoke(query)
pprint(response['result'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;response = qa.invoke(query):&lt;/code&gt; This line uses the qa object (created in the previous code) to search for the answer to the question. The invoke() method takes the question as a parameter and returns a response in the variable response.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pprint(response['result']):&lt;/code&gt; This line prints the answer stored in the result key of the response dictionary. The pprint() function formats the output to make it easier to read, by indenting and aligning the text.&lt;br&gt;
Below is the output issued by the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(' Cyber incidents, including cyber attacks, have increased almost doubled '
 'since before the COVID-19 pandemic. However, the total number of incidents '
 'and losses may still be underestimated due to factors such as lag in '
 'reporting and concerns about reputation. Improved reporting and data '
 'collection are needed, and supervisors should require firms to have response '
 'and recovery procedures in place. Ongoing digital transformation and '
 'technological innovation, as well as geopolitical tensions, exacerbate the '
 'risk of cyber incidents. Recent significant incidents, such as a ransomware '
 'attack on a major Chinese bank, highlight the potential impact of cyber '
 'incidents on financial stability. ')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That brings us to the end of this post. If it was useful to you, please consider leaving a comment.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>rag</category>
      <category>langchain</category>
      <category>pdf</category>
    </item>
  </channel>
</rss>
