<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alcione Paiva</title>
    <description>The latest articles on DEV Community by Alcione Paiva (@alcionepaiva).</description>
    <link>https://dev.to/alcionepaiva</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2614765%2F0f2cde7e-5a9e-44ad-adc3-5194cb8d450b.png</url>
      <title>DEV Community: Alcione Paiva</title>
      <link>https://dev.to/alcionepaiva</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alcionepaiva"/>
    <language>en</language>
    <item>
      <title>Artificial Neurons: The Heart of AI</title>
      <dc:creator>Alcione Paiva</dc:creator>
      <pubDate>Wed, 15 Jan 2025 13:50:36 +0000</pubDate>
      <link>https://dev.to/alcionepaiva/artificial-neurons-the-heart-of-ai-25kg</link>
      <guid>https://dev.to/alcionepaiva/artificial-neurons-the-heart-of-ai-25kg</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvzmjlx7mvti367fd69z.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvzmjlx7mvti367fd69z.jpeg" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Artificial Intelligence&lt;/strong&gt; has generated a lot of excitement recently due to the advances made by large &lt;strong&gt;language models (LLMs)&lt;/strong&gt;. This success has motivated many people to enter the field and benefit from its growth. However, most texts do not address the fundamental building block of these neural networks: the artificial neuron. We believe this knowledge is the foundation for a solid understanding of artificial neural networks. In this tutorial, we will describe the functioning of an &lt;strong&gt;artificial neuron&lt;/strong&gt;, also known as &lt;strong&gt;logistic regression&lt;/strong&gt;. Despite its simplicity, the artificial neuron is very useful for solving various classification problems, such as spam detection, diabetes prediction, and credit granting, among others. &lt;/p&gt;

&lt;h2&gt;
  
  
  Classification of Machine Learning Systems
&lt;/h2&gt;

&lt;p&gt;To better understand this type of technique, it is important to be familiar with a way of &lt;strong&gt;classifying machine learning models&lt;/strong&gt;. Machine Learning is a sub-field of Artificial Intelligence devoted to developing systems that learn and improve automatically from data. We can categorize machine learning models into &lt;strong&gt;supervised&lt;/strong&gt;, &lt;strong&gt;unsupervised&lt;/strong&gt;, and &lt;strong&gt;reinforcement learning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;supervised models&lt;/strong&gt;, the system learns from examples. In the case of &lt;strong&gt;unsupervised techniques&lt;/strong&gt;, the system detects patterns by examining data without these patterns being presented beforehand. Finally, in the third class of models, &lt;strong&gt;reinforcement learning&lt;/strong&gt;, the system learns from its actions and the feedback received in terms of rewards.&lt;/p&gt;

&lt;p&gt;The Artificial Neuron, in the form of &lt;strong&gt;logistic regression&lt;/strong&gt;, is a supervised learning technique. &lt;strong&gt;Supervised models&lt;/strong&gt; can be further divided into &lt;strong&gt;classification&lt;/strong&gt; systems and &lt;strong&gt;regression&lt;/strong&gt; systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fut9bqcd3djqucbuhrrcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fut9bqcd3djqucbuhrrcz.png" alt="Image description" width="800" height="192"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Classification of machine learning models.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Logistic regression
&lt;/h2&gt;

&lt;p&gt;In &lt;strong&gt;classification models&lt;/strong&gt;, the system tries to identify which class is correct given an input. For example, based on a person’s financial data, the system attempts to determine whether it’s appropriate to lend money or deny the loan. Another example is when the system receives data about a specific animal and, based on that information, identifies whether it’s a mammal, reptile, bird, or fish.&lt;/p&gt;

&lt;p&gt;In the case of &lt;strong&gt;regression&lt;/strong&gt;, the system attempts to output a value based on the received data. For instance, using financial data, the system might try to &lt;strong&gt;predict&lt;/strong&gt; the inflation rate — a technique commonly employed in the financial market.&lt;/p&gt;

&lt;p&gt;Despite its name, &lt;strong&gt;logistic regression is used for classification&lt;/strong&gt;. Classification can be binary, where there are only two classes, such as yes or no, positive or negative. It can also be multiclass, for example, classifying whether a word is a verb, noun, adjective, adverb, and so forth.&lt;/p&gt;

&lt;p&gt;To &lt;strong&gt;distinguish logistic regression from linear regression&lt;/strong&gt;, we can observe the graphical difference using an example with two inputs, or dimensions, since two inputs make visualization easier. In linear regression applied to a set of points in a plane, our objective is to fit a line that captures the underlying trend of the points.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1sj2deyo5zde1qi9y30c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1sj2deyo5zde1qi9y30c.png" alt="Image description" width="800" height="668"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Example of linear regression.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once this line is adjusted, we can use it to predict one axis value based on the other. If it’s a three-dimensional space, we’ll try to fit a &lt;strong&gt;plane&lt;/strong&gt;. If there are more dimensions, we’ll attempt to fit a &lt;strong&gt;hyperplane&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the case of logistic regression, what we aim for is to return a decision, such as yes or no, or a classification. So, drawing a line won’t help. Consider this simple example where a decision needs to be made on whether to lend money to an individual based on their salary. With previous loan data, it becomes challenging to fit a line that can answer this question.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7eyme0yz3gqx4ewgs56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7eyme0yz3gqx4ewgs56.png" alt="Image description" width="800" height="475"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Illustration of the inadequacy of linear regression for classification.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nevertheless, if you use a curve in an &lt;strong&gt;“S” shape&lt;/strong&gt; instead of a straight line, the adjustment becomes much easier. When the salary is fed into this curved function, a result near the top of the curve means yes; otherwise, the answer is no. To turn the line into such a curve, it is necessary to &lt;strong&gt;introduce non-linearity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F00m77zpeurgyz7cmsqqw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F00m77zpeurgyz7cmsqqw.png" alt="Image description" width="800" height="448"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Demonstration of the suitability of the “S”-shaped curve for classification.&lt;/center&gt;&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;A widely used function to introduce this nonlinearity is the &lt;strong&gt;logistic function&lt;/strong&gt;, hence the name logistic regression. Its general formula is shown below. Note that it is a fraction whose numerator is 1 and whose denominator is always equal to or greater than 1, so the value of the function is confined between 0 and 1. The denominator contains an exponentiation whose base is the mathematical constant known as Euler’s number, approximately 2.718.&lt;/p&gt;

&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;f(z)=11+e−z
 f(z) = \frac{1}{1 + e^{-z}}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord mathnormal"&gt;z&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;e&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mathnormal mtight"&gt;z&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose 
nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;em&gt;&lt;center&gt;Logistic function.&lt;/center&gt;&lt;/em&gt;&lt;br&gt;
&lt;br&gt;

&lt;p&gt;The logistic function has interesting characteristics. Firstly, when plotted on a two-dimensional graph, it takes the form of an “S”. This is why it is referred to as a &lt;strong&gt;sigmoid function&lt;/strong&gt;. The second characteristic is that the values returned by the function range between 0 and 1. This makes it very suitable for binary classifications, where there are two classes for classification, such as yes or no, positive or negative, lend or deny a loan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnshxadd7owqspsxp59tt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnshxadd7owqspsxp59tt.png" alt="Image description" width="800" height="532"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Graphical representation of the logistic function.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
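&lt;p&gt;To make these two properties concrete, here is a minimal sketch of the logistic function in Python (the function name &lt;code&gt;sigmoide&lt;/code&gt; matches the one used later in this post):&lt;/p&gt;

```python
from math import exp

def sigmoide(z):
    # Logistic (sigmoid) function: 1 / (1 + e^(-z))
    return 1 / (1 + exp(-z))

# The output stays strictly between 0 and 1, with exactly 0.5 at z = 0
for z in [-10, -1, 0, 1, 10]:
    print(z, round(sigmoide(z), 4))
```

&lt;p&gt;Large negative inputs land near 0 and large positive inputs land near 1, tracing the “S” shape described above.&lt;/p&gt;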

&lt;p&gt;The third advantage is that it is a &lt;strong&gt;continuous function&lt;/strong&gt;. This means that at any point on the curve, you can draw a tangent line and calculate the slope of the tangent at that point. This characteristic is used during the model fitting stage, that is, during its learning. If the model calculates a wrong value, we can calculate the slope at the point that was predicted and determine the direction in which we should adjust the model to reduce the error. We will see how this is done in future posts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphxtgizjdvs9yy2f2dbw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphxtgizjdvs9yy2f2dbw.png" alt="Image description" width="800" height="646"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Graphical representation of a tangent line to a point in the logistic function.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;
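&lt;p&gt;This slope even has a convenient closed form: the derivative of the sigmoid is σ(z)·(1 − σ(z)). A small sketch verifies this against a numerical tangent slope:&lt;/p&gt;

```python
from math import exp, isclose

def sigmoide(z):
    return 1 / (1 + exp(-z))

def slope(z):
    # Closed-form derivative of the logistic function:
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoide(z)
    return s * (1 - s)

# Cross-check against a numerical tangent slope at z = 0.3
h = 1e-6
numerical = (sigmoide(0.3 + h) - sigmoide(0.3 - h)) / (2 * h)
print(isclose(slope(0.3), numerical, rel_tol=1e-4))  # True
```

&lt;p&gt;This cheap-to-compute slope is exactly what the learning stage will exploit to decide in which direction to adjust the model.&lt;/p&gt;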
&lt;h2&gt;
  
  
  Calculation Example
&lt;/h2&gt;

&lt;p&gt;Let’s now explore &lt;strong&gt;how logistic regression works in detail&lt;/strong&gt;. To start, we need a &lt;strong&gt;dataset&lt;/strong&gt; designed for classification purposes. For example, consider a scenario where we collect data from various individuals to assess whether they qualify for a loan. This dataset might include &lt;strong&gt;features&lt;/strong&gt; such as their salary and the amount of money they wish to borrow.&lt;/p&gt;

&lt;p&gt;In reality, companies use a much larger set of information to make such decisions. But for our example, these two pieces of information will be sufficient. Each type of information is called a &lt;strong&gt;feature&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We will employ a pre-classified dataset that denotes whether loans were approved. They are divided into two groups: one for learning, which will be used to &lt;strong&gt;train&lt;/strong&gt; the model, and another for &lt;strong&gt;testing&lt;/strong&gt; the model. At the end of the learning process, and after passing the tests with a predefined &lt;strong&gt;accuracy level&lt;/strong&gt;, we can say that the system is ready to approve or deny new loan requests. Let’s now build our logistic regression model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0c7rp44lhrdxjojajq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0c7rp44lhrdxjojajq1.png" alt="Image description" width="800" height="348"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;The Dataset.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the first part of the model, we take each input value, which in our example are the salary and the requested loan amount. We multiply each by a weight and then add a value known as the &lt;strong&gt;bias&lt;/strong&gt;. The resulting value is referred to as Z.&lt;/p&gt;

&lt;p&gt;You may be wondering where these weight values and the bias value come from. Initially, these values are random and are adjusted during the model’s learning stage. They are called system parameters. Thus, the model learns which values should be assigned to the weights and bias to produce a correct output. The weights determine the importance assigned to each input attribute, while the bias corresponds to a general adjustment of the model.&lt;/p&gt;

&lt;p&gt;Let’s illustrate this calculation with an example. Suppose a person earns $3000 and wants a loan of $10,000. Assume both weights are set to 0.01 and the bias is set to 1. In this case, the value of Z would be 3000 × 0.01 + 10,000 × 0.01 + 1 = 131. This value doesn’t convey much information on its own, and what we want to determine is whether or not we should grant the loan. To do this, we will feed the value of Z into the sigmoid function in the second step of the model’s execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxbe49o14iof9ghoy6rv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxbe49o14iof9ghoy6rv.png" alt="Image description" width="800" height="268"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Example of calculating the value of Z.&lt;/center&gt;&lt;/em&gt;&lt;br&gt;
&lt;br&gt;&lt;/p&gt;
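&lt;p&gt;The arithmetic of this example can be reproduced in a few lines of Python:&lt;/p&gt;

```python
# Worked example: salary 3000, requested loan 10000,
# both weights 0.01, bias 1
salary, loan = 3000, 10000
w1 = w2 = 0.01
bias = 1

z = salary * w1 + loan * w2 + bias  # 30 + 100 + 1
print(z)  # 131.0
```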

&lt;p&gt;In the second step, we use the value of Z as the (negated) exponent in the denominator of the sigmoid formula. Performing the calculation, the final value is effectively 1.&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;f(131)=11+e−131=1
 f(131) = \frac{1}{1 + e^{-131}}=1
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;f&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;131&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;e&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mtight"&gt;−&lt;/span&gt;&lt;span class="mord mtight"&gt;131&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span 
class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;em&gt;&lt;center&gt;Application of the sigmoid function to Z.&lt;/center&gt;&lt;/em&gt;&lt;br&gt;
&lt;br&gt;

&lt;p&gt;This can be interpreted as a suggestion that the loan should be granted. In other words, any value equal to or greater than 0.5 can be considered a &lt;strong&gt;Yes&lt;/strong&gt;, while values below 0.5 may be considered a &lt;strong&gt;No&lt;/strong&gt;. However, the data table indicates that the final value should be 0, meaning the loan should be denied. &lt;strong&gt;An error has occurred&lt;/strong&gt;, and the model needs to be adjusted to correct this mistake. This correction is carried out during the &lt;strong&gt;learning stage&lt;/strong&gt;, and we will explain how this step is performed a bit later. Now, let’s provide a more graphical explanation of what has been done.&lt;/p&gt;
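&lt;p&gt;The 0.5 cut-off can be written as a tiny decision rule (the threshold itself is a modeling choice):&lt;/p&gt;

```python
def decide(p):
    # Sigmoid outputs of 0.5 or more read as "yes" (grant the loan),
    # values below 0.5 read as "no" (deny it)
    return 1 if p >= 0.5 else 0

print(decide(0.9999))  # 1, yet the dataset expects 0: the model must be adjusted
print(decide(0.2))     # 0
```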

&lt;p&gt;We can view the logistic regression calculation as a flow. We have input attributes, which can be seen as a sequence of values &lt;em&gt;x1, x2,&lt;/em&gt; up to &lt;em&gt;xn&lt;/em&gt;. These values are multiplied by their respective weights &lt;em&gt;w1, w2&lt;/em&gt;, up to &lt;em&gt;wn&lt;/em&gt;. The results of the multiplication are summed along with a bias value, generating a value Z.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fba6xnjmp0t325il1o6ub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fba6xnjmp0t325il1o6ub.png" alt="Image description" width="800" height="305"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Graphical representation of the logistic regression calculation flow.&lt;/center&gt;&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;This value is then applied to the sigmoid function, represented here by the Greek letter &lt;strong&gt;sigma&lt;/strong&gt;. This function is also called the &lt;strong&gt;activation function&lt;/strong&gt;, and, as we will see later, there are other possible activation functions besides the sigmoid function. The value generated by the activation function is the output value emitted by the network, represented here by the letter y with a circumflex, also called &lt;strong&gt;y-hat&lt;/strong&gt;. We use this notation to indicate that it is an &lt;strong&gt;estimated&lt;/strong&gt; or calculated value, differentiating it from the &lt;strong&gt;expected&lt;/strong&gt; or real value.&lt;/p&gt;

&lt;p&gt;This operation is a metaphor for the functioning of a biological neuron. A neuron is connected to other neurons through filaments called &lt;strong&gt;dendrites&lt;/strong&gt;. Neurons provide inputs to one another through electrochemical stimuli, and the strength of each stimulus depends on the strength of the connection, which is equivalent to the role played by the weights in logistic regression. If the stimuli a neuron receives surpass a certain threshold, it fires, emitting an electrochemical signal through its &lt;strong&gt;axon&lt;/strong&gt; that is transmitted to other neurons. Because of this resemblance, which is superficial and does not capture the true complexity of a biological neuron, logistic regression can be seen as an artificial neuron. And, as we will see later, composing these neurons into a network forms an artificial neural network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk61rt1i3cgplchosu8w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgk61rt1i3cgplchosu8w.png" alt="Image description" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Neuron.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Before we continue, let’s briefly discuss the notation used in our formulas. When a variable represents a single value, we use a regular lowercase letter. When it represents a vector or matrix, we use a lowercase letter in bold.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztukaxj3j7m75nd0citz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztukaxj3j7m75nd0citz.png" alt="Image description" width="800" height="821"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Notational convention.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now that we understand the calculation performed by logistic regression, let’s describe this step a bit more formally. Given an input vector &lt;strong&gt;x&lt;/strong&gt; with values &lt;em&gt;x1, x2&lt;/em&gt;, up to &lt;em&gt;xn&lt;/em&gt;, a weight vector &lt;strong&gt;w&lt;/strong&gt; with values &lt;em&gt;w1, w2&lt;/em&gt;, up to &lt;em&gt;wn&lt;/em&gt;, and a bias value &lt;em&gt;b&lt;/em&gt;, the linear function &lt;em&gt;Z&lt;/em&gt; is defined as the product of the transpose of vector &lt;strong&gt;w&lt;/strong&gt; with vector &lt;strong&gt;x&lt;/strong&gt;, plus the &lt;strong&gt;bias&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;transpose&lt;/strong&gt; of a vector is simply the transformation of a column vector into a row vector, a rearrangement that makes the multiplication well defined.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd2g0qopio5fsplt6sia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd2g0qopio5fsplt6sia.png" alt="Image description" width="800" height="354"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Vector multiplication.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The multiplication of two vectors, their dot product, involves multiplying their elements pair-wise and then summing the products, resulting in a single &lt;strong&gt;scalar value&lt;/strong&gt;. When we apply the logistic function, or sigma activation function, to the value of Z, we obtain a value between 0 and 1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftp9191mkhvh4g7jpw163.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftp9191mkhvh4g7jpw163.png" alt="Image description" width="800" height="96"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Application of the sigmoid function to Z.&lt;/center&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s consider another example. Given a vector with a salary value of 1 and a loan request value of 4, both values in thousands, and also given a weight vector with values 0.2 and 0.1, along with a bias value of 0.1, in this case, the value of Z would be 0.7. Applying the logistic function to this value yields a resulting value of 0.67.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce9zuwz38q437yqhdlk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce9zuwz38q437yqhdlk9.png" alt="Image description" width="800" height="546"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Example of calculation in logistic regression.&lt;/center&gt;&lt;/em&gt; &lt;/p&gt;
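&lt;p&gt;The same calculation can be written in vector form with a plain Python dot product (numbers taken from the example above):&lt;/p&gt;

```python
from math import exp

x = [1, 4]      # salary and loan request, in thousands
w = [0.2, 0.1]  # weights
b = 0.1         # bias

# w transposed times x, plus b: pair-wise products summed with the bias
z = sum(wi * xi for wi, xi in zip(w, x)) + b
yhat = 1 / (1 + exp(-z))
print(round(z, 1), round(yhat, 2))  # 0.7 0.67
```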
&lt;h2&gt;
  
  
  Artificial Neuron Implementation
&lt;/h2&gt;

&lt;p&gt;Now, let’s demonstrate how to implement this calculation using the &lt;strong&gt;Python&lt;/strong&gt; programming language. First, we import the &lt;code&gt;exp&lt;/code&gt; function from the &lt;code&gt;math&lt;/code&gt; module, which, given a number, returns Euler’s number raised to that power. We use this function to create our sigmoid function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from math import exp

def sigmoide(x):
  return 1 / (1 + exp(-x))

# Input X[0] Wage, x[1] Loan
X = [[3,10],[1.5,11.8],[5.5,20.0],[3.5,15.2],[3.1,14.5],
     [7.6,15.5],[1.5,3.5],[6.9,8.5],[8.6,2.0],[7.66,3.5]]
Y = [0   , 0   , 0   , 0   , 0   , 1   , 1  , 1  ,   1, 1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The input will be defined by a matrix of 10 rows and 2 columns, where the index 0 column contains the salary value, and the index 1 column contains the loan request value. The 10 rows represent the ten loan request cases.&lt;/p&gt;

&lt;p&gt;To train the network, we also need the expected outputs, represented as a 10-position vector. In this vector, a value of 0 indicates that the request should be rejected, while a value of 1 signifies that the request should be accepted. The process of &lt;strong&gt;training the neuron&lt;/strong&gt; will be detailed in the next post.&lt;/p&gt;

&lt;p&gt;Next, we need to establish the initial values for the system parameters, namely the weights and bias. Let’s randomly choose values of 0.2 and 0.1 for the weights and 0.1 for the bias. After that, the program executes a loop, going through each request and calculating &lt;em&gt;Z&lt;/em&gt; and the prediction. We also calculate the error for each request based on the difference between the &lt;strong&gt;prediction&lt;/strong&gt; and the &lt;strong&gt;expected&lt;/strong&gt; value. The program prints, for each request, the input values and what was calculated. See the code below.&lt;/p&gt;

&lt;p&gt;Definition of parameters and calculation of outputs&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m = len(X)

w=[0.2,0.1]
b=0.1

for j in range(m):
  z = X[j][0]*w[0]+X[j][1]*w[1]+b
  yhat = sigmoide(z)

  # Calculates error
  erro = yhat-Y[j]

  print(" Wage:{0:5.2f}  Wage:{1:5.2f} Expected value:{2} ".
        format( X[j][0]*1000, X[j][1], Y[j]))
  print(" z:{0:2.3f}   yhat:{1:2.3f}  error:{2:2.3f}\n ".format( z, yhat, erro))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below is the initial segment of the program’s output. These predictions reflect the current weights; when they are wrong, the weights need to be adjusted, which is the job of the &lt;strong&gt;learning stage&lt;/strong&gt;, described in the next post.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4xlrrd8lo90wzqwqggd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4xlrrd8lo90wzqwqggd.png" alt="Image description" width="800" height="1145"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;center&gt;Output issued by the program.&lt;/center&gt;&lt;/em&gt; &lt;/p&gt;
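&lt;p&gt;The learning stage is the subject of the next post; as a preview, here is a minimal sketch of a single weight update for one request, assuming the standard logistic-regression (cross-entropy) gradient. The learning rate &lt;code&gt;alpha&lt;/code&gt; is an illustrative choice, not a value from this post:&lt;/p&gt;

```python
from math import exp

def sigmoide(x):
    return 1 / (1 + exp(-x))

# One training case: wage 3 (thousands), loan 10; expected output 0 (reject)
x = [3, 10]
y = 0

w = [0.2, 0.1]
b = 0.1
alpha = 0.01  # learning rate (illustrative value)

z = x[0]*w[0] + x[1]*w[1] + b
yhat = sigmoide(z)
error = yhat - y

# Standard logistic-regression gradient step: w_i = w_i - alpha * error * x_i
w = [w[i] - alpha * error * x[i] for i in range(2)]
b = b - alpha * error

print(w, b)  # weights nudged toward rejecting this request
```

Repeating this update over all requests, many times, is what drives the errors toward zero.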

&lt;p&gt;That brings us to the end of this post. If it was useful to you, please consider leaving a comment.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>neuron</category>
      <category>python</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>Alcione Paiva</dc:creator>
      <pubDate>Mon, 06 Jan 2025 22:16:20 +0000</pubDate>
      <link>https://dev.to/alcionepaiva/-3fl8</link>
      <guid>https://dev.to/alcionepaiva/-3fl8</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/alcionepaiva" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2614765%2F0f2cde7e-5a9e-44ad-adc3-5194cb8d450b.png" alt="alcionepaiva"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/alcionepaiva/using-langchain-to-search-your-own-pdf-documents-23k3" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Using LangChain to Search Your Own PDF Documents&lt;/h2&gt;
      &lt;h3&gt;Alcione Paiva ・ Jan 3&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#llm&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#rag&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#langchain&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#pdf&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>langchain</category>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Using LangChain to Search Your Own PDF Documents</title>
      <dc:creator>Alcione Paiva</dc:creator>
      <pubDate>Fri, 03 Jan 2025 13:12:18 +0000</pubDate>
      <link>https://dev.to/alcionepaiva/using-langchain-to-search-your-own-pdf-documents-23k3</link>
      <guid>https://dev.to/alcionepaiva/using-langchain-to-search-your-own-pdf-documents-23k3</guid>
      <description>&lt;p&gt;Artificial Intelligence applications like OpenAI's ChatGPT or Google's Gemini enable users to explore a wide range of topics and ask questions with ease. However, there are situations where the information we seek is not readily accessible to these tools but resides in private or less accessible documents. Even in such cases, these applications can leverage their advanced language processing capabilities to analyze these documents, extract relevant information, and provide targeted answers—eliminating the need to manually read through the entire content.&lt;/p&gt;

&lt;p&gt;Using a language model to search for information outside of its training base is one of the applications of a technique called &lt;strong&gt;RAG&lt;/strong&gt; (Retrieval-Augmented Generation). In this post, we will show how it's possible to easily create an application to search through local documents. In our example, we will use a PDF document, but the example can be adapted for various types of documents, such as TXT, MD, JSON, etc. To assist us in building our example, we will use the &lt;strong&gt;LangChain&lt;/strong&gt; library.&lt;/p&gt;

&lt;p&gt;LangChain is a powerful open-source framework that simplifies the construction of &lt;strong&gt;natural language processing&lt;/strong&gt; (NLP) pipelines using large language models (LLMs). LangChain stands out for its ability to build complex process chains, combining different stages of text manipulation and data processing in a modular and scalable manner.&lt;/p&gt;

&lt;p&gt;As a development environment, we will use &lt;strong&gt;Google Colab Notebook&lt;/strong&gt;. The notebook can be viewed at this &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fcolab.research.google.com%2Fdrive%2F15mxuBqAtXU7FHIwRheTucTxREeYrYVsO%3Fusp%3Dsharing" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 - Download the PDF Document
&lt;/h2&gt;

&lt;p&gt;To begin, we'll need to download the PDF document that we want to process and analyze using the LangChain library. In our example, we will use a document from the GLOBAL FINANCIAL STABILITY REPORT conducted by the International Monetary Fund. In the Colab Notebook, the document can be downloaded with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!wget https://www.imf.org/-/media/Files/Publications/GFSR/2024/April/English/text.ashx -O text.pdf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2 - Install the Libraries
&lt;/h2&gt;

&lt;p&gt;Next, we need to install the necessary libraries using pip. In the Google Colab Notebook, you can install these libraries by running the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;!pip install langchain
!pip install -U langchain-community
!pip install -U langchain-openai
!pip install chromadb
!pip install pypdf2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the explanation of each library:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LangChain:&lt;/code&gt;&lt;br&gt;
LangChain is the main library for building natural language processing (NLP) pipelines using large language models (LLMs). This library facilitates the integration of different stages of text manipulation and data processing, enabling the creation of advanced NLP applications.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LangChain-Community:&lt;/code&gt;&lt;br&gt;
LangChain-Community is an extension of the LangChain library that includes additional modules and functionalities developed by the community. This extension allows users to benefit from contributions and improvements made by other developers, expanding the capabilities and functionalities available.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LangChain-OpenAI:&lt;/code&gt;&lt;br&gt;
LangChain-OpenAI is a specific module for integration with OpenAI's language models, such as GPT-3 and GPT-4. This package allows developers to efficiently use the OpenAI API within the LangChain ecosystem, facilitating the construction of pipelines involving OpenAI's powerful language models.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ChromaDB:&lt;/code&gt;&lt;br&gt;
ChromaDB is a database library designed for the efficient storage and management of data as vectors. It is important because textual elements are represented in the form of numeric vectors (embeddings) for analysis by the language model. ChromaDB facilitates the retrieval and manipulation of these vectors for tasks such as search and information retrieval.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;PyPDF2:&lt;/code&gt;&lt;br&gt;
PyPDF2 is a Python library that enables reading, manipulating, and extracting text from PDF files. This library is essential when working with PDF documents in NLP applications, allowing you to load and process the content of PDFs programmatically.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 3 - Import the Modules to Be Used
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pprint import pprint
import PyPDF2
import os
from google.colab import userdata

import openai
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;pprint&lt;/code&gt; module (short for &lt;em&gt;pretty-print&lt;/em&gt;) is used to format complex data structures in a way that is more readable and organized for humans. The &lt;code&gt;os&lt;/code&gt; module will be used to store the API key value in an environment variable for accessing the OpenAI API. &lt;code&gt;userdata&lt;/code&gt; is used in Google Colab to access and manipulate user data, facilitating operations that involve exchanging data between the notebook and the user's Colab environment. In this case, we will use it to obtain the OpenAI API key, which should be registered in the Colab secrets space. The other modules will be explained at the time of their use.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 4 - Reading the Document and Converting it to Text
&lt;/h2&gt;

&lt;p&gt;In the code snippet below, a PDF file is read, the text contained in each of its pages is extracted, and a portion of this text is displayed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Open the PDF file
file_path = "./text.pdf"
pdf_file = open(file_path, "rb")

# Create an Object to read the PDF
pdf_reader = PyPDF2.PdfReader(pdf_file)

# Extract the text from each page
pdf_text = ""
for page in pdf_reader.pages:
    pdf_text += page.extract_text()

# Close PDF
pdf_file.close()

# Shows an excerpt of the text read
pdf_text[:2000]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below are the first 2000 characters of the text extracted from the PDF (pdf_text[:2000]).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszywrryi40276jukgxhy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszywrryi40276jukgxhy.png" alt="Image description" width="800" height="132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5 - Splitting the Text
&lt;/h2&gt;

&lt;p&gt;The next step is to split the text before vectorizing it, that is, before &lt;strong&gt;converting the words into vectors&lt;/strong&gt;. Splitting is important because language models, especially those based on Transformers like BERT, GPT, etc., have a limit on the number of tokens (words or characters) they can process at once. Long texts that exceed this limit need to be divided into smaller parts to be processed correctly. Additionally, dividing the text into smaller parts allows each segment to maintain coherent context. If a text is too long and not split, the model might lose context or ignore important parts of the text. By splitting it into segments, we ensure that each part is meaningful and comprehensible on its own.&lt;br&gt;
In our example, we will use LangChain's &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;. It is designed to split the text into smaller, coherent, and meaningful pieces.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_text(pdf_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the example, we use a &lt;code&gt;chunk_size&lt;/code&gt; of 1000, defining the maximum size of each segment, and a &lt;code&gt;chunk_overlap&lt;/code&gt; of 100, defining the number of characters that overlap between consecutive segments. The overlap helps to maintain context between the segments.&lt;/p&gt;
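&lt;p&gt;To make &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;chunk_overlap&lt;/code&gt; concrete, here is a simplified character-based splitter. Note that LangChain’s &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; is smarter than this: it tries to split on paragraph and sentence boundaries first, while this sketch only counts characters:&lt;/p&gt;

```python
def split_text(text, chunk_size, chunk_overlap):
    # Naive character-window splitter: each chunk starts
    # chunk_size - chunk_overlap characters after the previous one
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij" * 3  # 30 characters
chunks = split_text(sample, chunk_size=10, chunk_overlap=3)

print(len(chunks))                      # 5
print(chunks[0][-3:] == chunks[1][:3])  # True: 3 characters shared between neighbors
```

The shared tail of each chunk is what keeps a sentence that straddles a boundary visible in both segments.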

&lt;h2&gt;
  
  
  Step 6 - Text Vectorization
&lt;/h2&gt;

&lt;p&gt;The code snippet below sets up an environment to use the OpenAI API and creates a vector database using ChromaDB to store text embeddings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;os.environ['OPENAI_API_KEY'] = userdata.get("OPENAI_API_KEY")
persist_directory = 'db'

embedding = OpenAIEmbeddings()
vectordb = Chroma.from_texts(texts=texts,  embedding=embedding,  persist_directory=persist_directory)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;os.environ&lt;/code&gt;: A dictionary in Python that contains the system's environment variables.&lt;br&gt;
&lt;code&gt;userdata.get("OPENAI_API_KEY")&lt;/code&gt;: Retrieves the OpenAI API key from the user data registered in Google Colab's secrets.&lt;br&gt;
&lt;code&gt;persist_directory&lt;/code&gt;: Sets the path of the directory where persistent data will be stored. In this case, it is set as 'db'.&lt;br&gt;
&lt;code&gt;OpenAIEmbeddings()&lt;/code&gt;: Creates an instance of embeddings (vectors) provided by OpenAI.&lt;br&gt;
&lt;code&gt;Chroma.from_texts(...)&lt;/code&gt;: A method from the ChromaDB library that creates a vector database from a list of texts.&lt;br&gt;
&lt;code&gt;texts=texts&lt;/code&gt;: Passes the list of texts that will be converted into vectors and stored in the database.&lt;br&gt;
&lt;code&gt;embedding=embedding&lt;/code&gt;: Specifies the OpenAI embeddings object to be used for converting the texts into vector representations.&lt;br&gt;
&lt;code&gt;persist_directory=persist_directory&lt;/code&gt;: Sets the directory where the database will be saved and persisted.&lt;/p&gt;
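&lt;p&gt;The point of storing embeddings is that semantic similarity becomes a geometric comparison: texts about related topics get vectors pointing in similar directions. A toy illustration with hand-made 3-dimensional vectors (real OpenAI embeddings have hundreds or thousands of dimensions, and their values come from the model, not from us):&lt;/p&gt;

```python
from math import sqrt

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (hand-made, for illustration only)
bank_risk  = [0.9, 0.1, 0.0]
cyber_risk = [0.8, 0.3, 0.1]
weather    = [0.0, 0.1, 0.9]

print(cosine_similarity(bank_risk, cyber_risk))  # high: related topics
print(cosine_similarity(bank_risk, weather))     # low: unrelated topics
```

This is, conceptually, the comparison the vector database performs when it retrieves the chunks most relevant to a query.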
&lt;h2&gt;
  
  
  Step 7 - Create an Object for Querying
&lt;/h2&gt;

&lt;p&gt;Now we will create an object to query the text. In the code snippet below, an instance of &lt;code&gt;RetrievalQA&lt;/code&gt; is created using a specific chain type. &lt;code&gt;RetrievalQA&lt;/code&gt; is a class used to answer questions based on an index of documents. It is used to set up a question-and-answer system that combines information retrieval capabilities with a large language model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qa = RetrievalQA.from_chain_type(llm=OpenAI(),
    chain_type="stuff", retriever=vectordb.as_retriever())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;from_chain_type(...)&lt;/code&gt; method of this class creates an instance based on the specified chain type.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Arguments:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;llm=OpenAI():&lt;/code&gt; Creates an instance of the OpenAI language model. This instance will be used to generate responses based on the retrieved text.&lt;br&gt;
&lt;code&gt;OpenAI():&lt;/code&gt; This command calls the class or function that creates a connection with the OpenAI language model, using the previously configured API key.&lt;br&gt;
&lt;code&gt;chain_type="stuff":&lt;/code&gt; The chain type is set to "stuff." This indicates how the retrieved documents will be combined to form the final answer. In the case of "stuff," the documents are simply concatenated.&lt;br&gt;
&lt;code&gt;retriever=vectordb.as_retriever():&lt;/code&gt; vectordb is a vector database being used to retrieve relevant documents. The as_retriever() method transforms this database into an object that can be used to search for documents.&lt;/p&gt;
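&lt;p&gt;Conceptually, the "stuff" strategy does something like the following: the retrieved chunks are stuffed into a single prompt along with the question, and that prompt is sent to the language model. The actual LangChain prompt template differs; this sketch only illustrates the idea:&lt;/p&gt;

```python
def build_stuff_prompt(question, retrieved_chunks):
    # Concatenate all retrieved chunks into one context block,
    # then append the user's question
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = ["Cyber incidents have risen sharply.", "Reporting lags understate losses."]
prompt = build_stuff_prompt("Analyze cyber incidents.", chunks)
print(prompt)
```

Because everything is concatenated, "stuff" only works while the retrieved chunks fit within the model's context window; other chain types exist for larger document sets.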
&lt;h2&gt;
  
  
  Step 8 - Conducting the Search
&lt;/h2&gt;

&lt;p&gt;In this final step, we perform the query. In this case, we will ask to "Analyze cyber incidents in the current context." Remember to ask politely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query = "Please, analyze cyber incidents in the current context."
response = qa.invoke(query)
pprint(response['result'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;response = qa.invoke(query):&lt;/code&gt; This line uses the qa object (created in the previous code) to search for the answer to the question. The invoke() method takes the question as a parameter and returns a response in the variable response.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pprint(response['result']):&lt;/code&gt; This line prints the answer stored in the result key of the response dictionary. The pprint() function formats the output to make it easier to read, by indenting and aligning the text.&lt;br&gt;
Below is the output issued by the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(' Cyber incidents, including cyber attacks, have increased almost doubled '
 'since before the COVID-19 pandemic. However, the total number of incidents '
 'and losses may still be underestimated due to factors such as lag in '
 'reporting and concerns about reputation. Improved reporting and data '
 'collection are needed, and supervisors should require firms to have response '
 'and recovery procedures in place. Ongoing digital transformation and '
 'technological innovation, as well as geopolitical tensions, exacerbate the '
 'risk of cyber incidents. Recent significant incidents, such as a ransomware '
 'attack on a major Chinese bank, highlight the potential impact of cyber '
 'incidents on financial stability. ')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That brings us to the end of this post. If it was useful to you, please consider leaving a comment.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>rag</category>
      <category>langchain</category>
      <category>pdf</category>
    </item>
  </channel>
</rss>
