<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sam Der</title>
    <description>The latest articles on DEV Community by Sam Der (@samder).</description>
    <link>https://dev.to/samder</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2132806%2F5efad557-ee03-44be-acb6-c4e933a4f9a4.jpg</url>
      <title>DEV Community: Sam Der</title>
      <link>https://dev.to/samder</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/samder"/>
    <language>en</language>
    <item>
      <title>Getting Started With Local LLMs Using AnythingLLM</title>
      <dc:creator>Sam Der</dc:creator>
      <pubDate>Sat, 07 Jun 2025 17:42:27 +0000</pubDate>
      <link>https://dev.to/samder/getting-started-with-local-llms-using-anythingllm-47gl</link>
      <guid>https://dev.to/samder/getting-started-with-local-llms-using-anythingllm-47gl</guid>
      <description>&lt;p&gt;In this tutorial, &lt;a href="https://anythingllm.com/" rel="noopener noreferrer"&gt;AnythingLLM&lt;/a&gt; will be used to load and ask questions to a model. AnythingLLM provides a desktop interface to allow users to send queries to a variety of different models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; At the time of writing, AnythingLLM &lt;a href="https://docs.anythingllm.com/installation-desktop/system-requirements#recommended-configuration-for-anythingllm" rel="noopener noreferrer"&gt;recommends&lt;/a&gt; that user machines have at least 2 GB of RAM, a 2-core CPU, and 5 GB of storage.&lt;/p&gt;

&lt;p&gt;Navigate to the homepage, download the desktop application for your device, and follow the installation instructions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjevcnh4k3mi8xuftfhla.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjevcnh4k3mi8xuftfhla.png" alt="Selecting an LLM. This should appear upon first startup." width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, you can choose from any model that's supported. The first selection, AnythingLLM, contains a collection of open source models, including Llama3.2 and Gemma, that do not require any additional setup and are free to use. The remaining selections allow you to connect AnythingLLM to external, third-party providers such as OpenAI and HuggingFace, through API keys or self-hosted endpoints. For this tutorial, we'll choose AnythingLLM's Gemma 3 1 billion parameter model.&lt;/p&gt;

&lt;p&gt;After proceeding, the model will begin downloading in the background.  In the meantime, you can play around with your settings and create a workspace. AnythingLLM allocates workspaces where you can create chat threads, leverage different models, and configure custom settings for each one. For example, it is entirely possible (and even encouraged!) to create a new workspace that uses Llama 3.2 with a LLM temperature of 0.1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg1zilkkkqguexvl2wme.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg1zilkkkqguexvl2wme.png" alt="AnythingLLM homepage. Lots of customizations are available!" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once your model is downloaded, you can begin asking it questions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrcbeztpg43s9gd82z5v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrcbeztpg43s9gd82z5v.png" alt="Asking a basic question to Gemma" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also attach files to the workspace and prompt the model about them for RAG-based capabilities. For example, I uploaded a Python file containing the code below and asked Gemma what it does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hello_world&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;hello_world&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha4endsyhno8vi67bzfa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fha4endsyhno8vi67bzfa.png" alt="Uploading files to the workspace" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f134ozi03vi0t8zzex2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f134ozi03vi0t8zzex2.png" alt="Prompting Gemma about the above Python script" width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As expected, it is able to parse through the code and correctly outputs what does.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>anythingllm</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>Training a Convolutional Neural Network on the CIFAR-10 Dataset</title>
      <dc:creator>Sam Der</dc:creator>
      <pubDate>Sat, 26 Oct 2024 18:43:14 +0000</pubDate>
      <link>https://dev.to/samder/training-a-convolutional-neural-network-on-the-cifar-10-dataset-1mdm</link>
      <guid>https://dev.to/samder/training-a-convolutional-neural-network-on-the-cifar-10-dataset-1mdm</guid>
      <description>&lt;p&gt;I've created a &lt;a href="https://www.kaggle.com/code/samder123/training-a-cnn-on-cifar-10" rel="noopener noreferrer"&gt;Kaggle notebook&lt;/a&gt; where I create a convolutional neural network to recognize images from the &lt;a href="https://www.cs.toronto.edu/~kriz/cifar.html" rel="noopener noreferrer"&gt;CIFAR-10 dataset&lt;/a&gt;. feel free to take a look!&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;And with that, that concludes this machine learning series! This was meant to be an introductory series to machine learning, so I'll continue to publish posts later down the line on different topics, including natural language processing. Thanks for reading and I hope you've been able to learn something through my posts so far!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Introduction to Convolutional Neural Networks</title>
      <dc:creator>Sam Der</dc:creator>
      <pubDate>Sat, 19 Oct 2024 18:16:49 +0000</pubDate>
      <link>https://dev.to/samder/introduction-to-convolutional-neural-networks-4lmp</link>
      <guid>https://dev.to/samder/introduction-to-convolutional-neural-networks-4lmp</guid>
      <description>&lt;p&gt;Now that we know how to create simple neural networks and how to optimize them, we can transition to more complex variants of neural networks. This week, I'll be covering one that's used very often in image processing and computer vision: &lt;strong&gt;convolutional neural networks (CNN)&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why CNNs?
&lt;/h2&gt;

&lt;p&gt;In my first article, I gave a walkthrough of how to leverage &lt;code&gt;scikit-learn&lt;/code&gt; to recognize digits by training a simple 

&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;kk &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
-nearest neighbors model. You might think that you can apply the same methodology to recognizing certain objects in images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's why that doesn't work.&lt;/strong&gt; First off, all the images of digits shared the same black color and white background. In reality, digits are written down on paper or are shown on various displays with differing backgrounds and colors. Our 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;kk &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;k&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
-nearest neighbors model would be overwhelmed at the staggering amount of colors and differences in the image, implode on itself, and fail to make accurate predictions.&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcowd333csnt5y0lirf5.png" width="400" height="402"&gt;You might have seen the meme where some models failed to tell the difference between a chihuahua and a muffin. They do look pretty similar!
  


&lt;p&gt;A CNN, on the other hand, is specifically designed to handle different aspects in an image, including colors and backgrounds across multiple environments. There are three concepts that CNNs rely on: convolution, pooling, and the fully-connected layer in that order.&lt;/p&gt;
&lt;h2&gt;
  
  
  What is Convolution?
&lt;/h2&gt;

&lt;p&gt;In the context of image processing, the &lt;strong&gt;convolution&lt;/strong&gt; operation slides a small matrix, known as &lt;strong&gt;kernels&lt;/strong&gt; or equivalently &lt;strong&gt;filters&lt;/strong&gt;, across the input data of the pixels in the image and calculates their dot product. The dimensions of a kernel and its values depend on the operation being performed. If, for example, you were trying to blur an image, the values would be somewhat small around the edges of the matrix, but larger towards the center to more closely (but not exactly) preserve the color of the pixel that lies in the center kernel. Different operations lead to different dot products, or in other words, different aspects of an image, and a CNN leverages these kernels to deduce the characteristics of an image.&lt;/p&gt;

&lt;p&gt;Sometimes, for images with lots of pixels, rather than convoluting at each individual pixel, the process can be sped up by only processing rows at intervals (such as every other row or every third row). In the first case, you would be performing convolution with a &lt;strong&gt;stride length&lt;/strong&gt; of 2 and in the second case, you would be doing it with a stride length of 3. Using a stride length greater than 1 causes the dot product to be less representative of an image as a result of skipping out on pixels in the image. At the same time, however, the chances of overfitting on an image would be greatly reduced. Again, a balance needs to be determined.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/fpt39y6yvr4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The above animation is a bit of a simplification, as pixels consist of RGB values rather than a single scalar, but the concept is still there. As the yellow rectangle moves across the matrix, that submatrix is multiplied by the kernel and then summed to a number that becomes the corresponding element of the convoluted image.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Pooling?
&lt;/h2&gt;

&lt;p&gt;You can imagine that the more pixels an image has, the more computationally intensive convolution and interpreting the results gets for the model. &lt;strong&gt;Pooling&lt;/strong&gt; takes the convoluted image and compresses it in a manner similar to the convolution process. Slide a small matrix over the values and then calculate some sort of metric on them that outputs a single value for that element of the smaller matrix. The dimensions of the matrix in question once again depends on the developer's desires. If they are more focused on smoothing out noise, then the maximum of those values should be used. Otherwise, they can use the average of the values.&lt;/p&gt;

&lt;p&gt;The output of the pooling process is another smaller matrix, which can be subsequently fed back into multiple other convolution-pooling loop layers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fully-Connected Layer
&lt;/h2&gt;

&lt;p&gt;Once processing in the convolution and pooling layers is complete, the matrix is then flattened into a column vector (like what we've seen with MNIST), and then passed into a simple feed forward neural network with backpropagation. Activation functions such as ReLU and SoftMax can then be used to fit non-linear data and classify the results respectively.&lt;/p&gt;

&lt;p&gt;This part of the CNN as a whole is called the &lt;strong&gt;fully-connected layer&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  That's It For Now!
&lt;/h3&gt;

&lt;p&gt;Take some time to digest what you read. It's complicated! I'll provide a PyTorch demo in next week's article. Thanks for reading and see you in the demo!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Optimizing Your Neural Networks</title>
      <dc:creator>Sam Der</dc:creator>
      <pubDate>Sat, 12 Oct 2024 16:37:38 +0000</pubDate>
      <link>https://dev.to/samder/optimizing-your-neural-networks-133h</link>
      <guid>https://dev.to/samder/optimizing-your-neural-networks-133h</guid>
      <description>&lt;p&gt;Last week I posted an &lt;a href="https://dev.to/samder/mlseries-an-introduction-to-neural-networks-2mo5"&gt;article&lt;/a&gt; about how to build simple neural networks, specifically multi-layer perceptrons. This article will dive deeper into the specifics of neural networks to discuss how we can maximize the performance of a neural network by tweaking its configurations.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Long to Train Your Model For
&lt;/h2&gt;

&lt;p&gt;When training a model, you might think that if you train your model enough, the model will become flawless. This may be true, but that only holds for the dataset it was trained on. In fact, if you give it another set of data where the values are different, the model could output completely incorrect predictions.&lt;/p&gt;

&lt;p&gt;To understand this further, let's say you were practicing every single day for your driver's exam by driving in a straight line without moving the wheel. (Please don't do this.) While you would probably perform very well on the drag strip, if you were told to make a left turn on the actual exam, you might end up turning into a STOP sign instead.&lt;/p&gt;

&lt;p&gt;This phenomenon is called &lt;strong&gt;overfitting&lt;/strong&gt;. Your model can learn all the aspects and patterns of the data it's trained on but if it learns a pattern that adheres to the training dataset too closely, then when given a new dataset, your model will perform poorly. At the same time, if you don't train your model enough, then your model won't be able to recognize patterns in other datasets properly. In this case, you would be &lt;strong&gt;underfitting&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj382kqvadgfx4oh7qlr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj382kqvadgfx4oh7qlr5.png" width="630" height="470"&gt;&lt;/a&gt;&lt;br&gt;An example of overfitting. The validation loss, represented by the orange line is gradually increasing while the training loss, represented by the blue line is decreasing.
  &lt;/p&gt;

&lt;p&gt;In the example above, a great position to stop training your model would be right when the validation loss reaches its minimum. It's possible to do this with &lt;strong&gt;early stopping&lt;/strong&gt;, which stops training once there is no improvement in validation loss after an arbitrary number of training cycles (epochs).&lt;/p&gt;

&lt;p&gt;Training your model is all about finding a balance between overfitting and underfitting while utilizing early stopping if necessary. That's why your training dataset should be as representative as possible of your overall population so that your model can more accurately make predictions on data it hasn't seen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Loss Functions
&lt;/h2&gt;

&lt;p&gt;Perhaps one of the most important training configurations that can be tweaked is the &lt;strong&gt;loss function&lt;/strong&gt;, which is the "inaccuracy" between your model's predictions and their actual values. The "inaccuracy" can be represented mathematically in many different ways, one of the most common being &lt;strong&gt;mean squared error (MSE)&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;MSE=∑i=1n(yiˉ−yi)2n\text{MSE} = \frac{\sum_{i=1}^n (\bar{y_i} - y_i)^2}{n}

&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;MSE&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop"&gt;&lt;span class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;ˉ&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose"&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;where 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;yiˉ\bar{y_i} &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;ˉ&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the model's prediction and 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;yiy_i &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is the true value. There's a similar variant called &lt;strong&gt;mean absolute error (MAE)&lt;/strong&gt;&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;MAE=∑i=1n∣yiˉ−yi∣n\text{MAE} = \frac{\sum_{i=1}^n |\bar{y_i} - y_i|}{n}

&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord text"&gt;&lt;span class="mord"&gt;MAE&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mopen nulldelimiter"&gt;&lt;/span&gt;&lt;span class="mfrac"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="frac-line"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mop"&gt;&lt;span class="mop op-symbol small-op"&gt;∑&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;span class="mrel mtight"&gt;=&lt;/span&gt;&lt;span class="mord mtight"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;n&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;span class="mord accent"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="accent-body"&gt;&lt;span class="mord"&gt;ˉ&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;−&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="msupsub"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="sizing reset-size6 size3 mtight"&gt;&lt;span class="mord mathnormal mtight"&gt;i&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mord"&gt;∣&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="mclose nulldelimiter"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;What's the difference between these two and which one is better? The real answer is that &lt;strong&gt;it depends on a variety of factors&lt;/strong&gt;. Let's consider a simple 2-dimensional linear regression example.&lt;/p&gt;

&lt;p&gt;In many cases, there can be data points that act &lt;strong&gt;outliers&lt;/strong&gt;, points that are far away from other data points. In terms of linear regression, this means that there are a few points on the 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;xyxy &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
-plane that are far away from the rest of them. If you remember from your statistics classes, it's points like these that can significantly affect the linear regression line that's calculated.&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnx175xbpy4dg9qfcdtp.png" width="800" height="828"&gt;A simple graph with points on (1, 1), (2, 2), (3, 3), and (4, 4)
  


&lt;p&gt;If you wanted to think of a line that could cross all four points, then 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;y=xy = x &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 would be a great choice because this line would go through all the points.&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4n3ckku6yvcoojqpo0xu.png" width="800" height="831"&gt;A simple graph with points on (1, 1), (2, 2), (3, 3), and (4, 4) and the line 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;y=xy = x &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 going through it
  


&lt;p&gt;However, let's say I decide to add another point at 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;(5,1)(5, 1) &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;5&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
. Now what should the regression line be? Well, it turns out that it's completely different: 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;y=0.2x+1.6y = 0.2x + 1.6 &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;0.2&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;1.6&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7to5w7m35oa1da49oc3e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7to5w7m35oa1da49oc3e.png" width="800" height="801"&gt;&lt;/a&gt;&lt;br&gt;A simple graph with points on (1, 1), (2, 2), (3, 3), (4, 4), and (5,1) with a linear regression line going through it.
  &lt;/p&gt;

&lt;p&gt;Given the previous data points, the line would expect that the value of 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;yy &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 when 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;x=5x = 5 &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 is 5, but because of the outlier and its MSE, the regression line is "pulled downwards" significantly.&lt;/p&gt;

&lt;p&gt;This is just a simple example, but this poses a question that you, as a machine learning developer, need to stop and think about: &lt;em&gt;How sensitive should my model be to outliers?&lt;/em&gt; If you want your model to be more sensitive to outliers, then you would choose a metric like MSE, because in that case, errors involving outliers are more pronounced due to the squaring and your model will adjust itself to minimize that. Otherwise, you would choose a metric like MAE, which doesn't care as much about outliers.&lt;/p&gt;
&lt;h2&gt;
  
  
  Optimizers
&lt;/h2&gt;

&lt;p&gt;In my previous post, I also discussed the concept of backpropagation, gradient descent, and how they work to minimize the loss of the model. The &lt;strong&gt;gradient&lt;/strong&gt; is a vector that points towards the direction of greatest change. A gradient descent algorithm will calculate this vector and move in the exact opposite direction so that it eventually reaches a minimum.&lt;/p&gt;

&lt;p&gt;Most optimizers have a specific &lt;strong&gt;learning rate&lt;/strong&gt;, commonly denoted as 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;α\alpha &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;α&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 that they adhere to. Essentially, this represents how much the algorithm will move towards the minimum each time it calculates the gradient. &lt;strong&gt;Be careful of setting your learning rate to be too large! Your algorithm may never reach the minimum due to the large steps it takes that could repeatedly skip over the minimum.&lt;/strong&gt;&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fat62biif1cavdi3c8uan.png" width="800" height="540"&gt;[Tensorflow's neural network playground](https://playground.tensorflow.org) showing what can happen if you set the learning rate to be too large. Notice how the testing and training loss are both `NaN`.
  


&lt;p&gt;Going back to gradient descent, while it is effective in minimizing loss, this might significantly slow down the training process as the loss function is calculated on the entire dataset. There are several alternatives to gradient descent that are more efficient but have their respective downsides.&lt;/p&gt;
&lt;h3&gt;
  
  
  Stochastic Gradient Descent
&lt;/h3&gt;

&lt;p&gt;One of the most popular alternatives to standard gradient descent is a variant called &lt;strong&gt;stochastic gradient descent (SGD)&lt;/strong&gt;. As with gradient descent, SGD has a fixed learning rate. But rather than running through the entire dataset like gradient descent, SGD takes a small sample is randomly selected and the weights of your neural network are updated based on the sample instead. Eventually, the parameter values converge to a point that approximately (but not exactly) minimizes the loss function. This is one of the downsides of SGD, as it doesn't always reach the exact minimum. Additionally, similar to gradient descent, it remains sensitive to the learning rate that you set.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Adam Optimizer
&lt;/h3&gt;

&lt;p&gt;The name, Adam, is derived from &lt;strong&gt;adaptive moment estimation&lt;/strong&gt;. It essentially combines two variants of SGD to adjust the learning rate for each input parameter based on how often it gets updated during each training iteration (&lt;strong&gt;adaptive learning rate&lt;/strong&gt;). At the same time, it also keeps track of past gradient calculations as a moving average to smooth out updates (&lt;strong&gt;momentum&lt;/strong&gt;). However, because of its momentum characteristic, it can sometimes take longer to converge than other algorithms.&lt;/p&gt;
&lt;h2&gt;
  
  
  Putting it All Together
&lt;/h2&gt;

&lt;p&gt;Now for an example!&lt;/p&gt;

&lt;p&gt;I've created an example walkthrough on &lt;a href="https://colab.research.google.com/drive/1A1ipCan3BIJ1MNlkcX6Y234kckoKqQoP?usp=sharing" rel="noopener noreferrer"&gt;Google Colab&lt;/a&gt; that uses PyTorch to create a neural network that learns a simple linear relationship.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;If you're a bit new to Python, don't worry! I've included some explanations that discuss what's going on in each section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reflection
&lt;/h2&gt;

&lt;p&gt;While this obviously doesn't cover everything about optimizing neural networks, I wanted to at least cover a few of the most important concepts that you can take advantage of while training your own models. Hopefully you've learned something this week and thanks for reading!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>pytorch</category>
      <category>python</category>
    </item>
    <item>
      <title>An Introduction to Neural Networks</title>
      <dc:creator>Sam Der</dc:creator>
      <pubDate>Sat, 05 Oct 2024 16:57:40 +0000</pubDate>
      <link>https://dev.to/samder/mlseries-an-introduction-to-neural-networks-2mo5</link>
      <guid>https://dev.to/samder/mlseries-an-introduction-to-neural-networks-2mo5</guid>
      <description>&lt;p&gt;The rise of machine learning has been accompanied by a similar rise of tools and libraries to aid machine learning applications. From basic model training with &lt;code&gt;scikit-learn&lt;/code&gt; to deep learning frameworks such as PyTorch and Tensorflow, these tools have come a long way in terms of advancing machine learning development.&lt;/p&gt;

&lt;p&gt;But as someone starting out with machine learning, how does one make the jump from &lt;code&gt;scikit-learn&lt;/code&gt; to PyTorch or Tensorflow? And how do we use these frameworks to create &lt;strong&gt;neural networks&lt;/strong&gt;, one of the most powerufl models in machine learning today?&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Neural Network?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;neural network&lt;/strong&gt; is a different way of training a model. Rather than using an algorithm such as &lt;em&gt;k&lt;/em&gt;-nearest neighbors to make predictions, a neural network takes in input(s), assigns weights to each input parameter, and performs mathematical calculations using these weights (along with several bias constants) to arrive at a prediction. The network will then adjust the weights based on the correctness of its predictions.&lt;/p&gt;

&lt;p&gt;Simple neural networks generally consist of three layers, each of which contain nodes, or &lt;strong&gt;neurons&lt;/strong&gt;. Input flows in one direction through the layers to the output. Specifically, these kinds of networks are called &lt;strong&gt;multi-layer perceptrons&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Input Layer
&lt;/h3&gt;

&lt;p&gt;As the name suggests, each node in this layer represents an input parameter such as an individual pixel in an image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hidden Layers
&lt;/h3&gt;

&lt;p&gt;These layers are where the mathematical calculations are performed. Given the input layer, the nodes in the hidden layer multiply the inputs by their weights and add any biases that provide additional weighting to an input to adjust its predictions. In short, a node essentially performs the following linear calculation:&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;y=wx+b
\begin{gather*}
y = wx + b
\end{gather*}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mtable"&gt;&lt;span class="col-align-c"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;b&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;p&gt;
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;yy &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 can then get fed to a node in another hidden layer where it will be processed again or directly to the output layer.&lt;/p&gt;

&lt;p&gt;It's possible to transform 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;yy &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 using what's called an &lt;strong&gt;activation function&lt;/strong&gt;. These functions transform it into a non-linear function, which is extremely helpful for allowing the model to fit data points that are non-linear. One example is the Rectified Linear Unit (ReLU) function, which simply takes the maximum of 
&lt;span class="katex-element"&gt;
  &lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;yy &lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/span&gt;
 and 0 and returns that as the node's output:&lt;/p&gt;


&lt;div class="katex-element"&gt;
  &lt;span class="katex-display"&gt;&lt;span class="katex"&gt;&lt;span class="katex-mathml"&gt;y=max⁡(0,wx+b)
\begin{gather*}
y = \max(0, wx + b)
\end{gather*}
&lt;/span&gt;&lt;span class="katex-html"&gt;&lt;span class="base"&gt;&lt;span class="strut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mtable"&gt;&lt;span class="col-align-c"&gt;&lt;span class="vlist-t vlist-t2"&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;span class="pstrut"&gt;&lt;/span&gt;&lt;span class="mord"&gt;&lt;span class="mord mathnormal"&gt;y&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mrel"&gt;=&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mop"&gt;max&lt;/span&gt;&lt;span class="mopen"&gt;(&lt;/span&gt;&lt;span class="mord"&gt;0&lt;/span&gt;&lt;span class="mpunct"&gt;,&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;w&lt;/span&gt;&lt;span class="mord mathnormal"&gt;x&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mbin"&gt;+&lt;/span&gt;&lt;span class="mspace"&gt;&lt;/span&gt;&lt;span class="mord mathnormal"&gt;b&lt;/span&gt;&lt;span class="mclose"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-s"&gt;​&lt;/span&gt;&lt;/span&gt;&lt;span class="vlist-r"&gt;&lt;span class="vlist"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  Output Layer
&lt;/h3&gt;

&lt;p&gt;The output layer represents the final output, or the prediction the neural network makes given some input data. This can range from probabilities that an input data point is of a certain class to regression predictions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How a Neural Network Trains Itself
&lt;/h3&gt;

&lt;p&gt;When creating a neural network, you specify a variety of different parameters, ranging from &lt;strong&gt;optimizers&lt;/strong&gt; to &lt;strong&gt;loss functions&lt;/strong&gt;. I'll be going over these in more detail in my next article, but for now, the most important concept to know of is &lt;strong&gt;gradient descent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A model trains itself by evaluating its performance on a training dataset, calculating a loss function and adjusting the model's weights in response. A developer can choose the exact loss function to use, such as &lt;strong&gt;mean absolute error&lt;/strong&gt;, as well as the way the weights are adjusted. One of the most common ways that weights are adjusted is through &lt;strong&gt;backpropagation&lt;/strong&gt;, which uses gradient descent to fine-tune the model's weights so that it eventually reaches a local minimum of the loss function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; A local minimum is not necessarily equivalent to a global minimum. This means that the best weights for your model might not be found even when using backpropagation!&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding What's Available To You
&lt;/h2&gt;

&lt;p&gt;Now that we understand how simple neural networks work, let's view what options are available to build our own!&lt;/p&gt;

&lt;p&gt;In training a model, you can split your dataset into training and testing datasets. However, you can further split your dataset into training, &lt;strong&gt;validation&lt;/strong&gt;, and testing datasets, where the validation dataset acts as a sanity check for your model before actually testing itself on the testing dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;scikit-learn&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.neural_network&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MLPClassifier&lt;/span&gt;

&lt;span class="n"&gt;mlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MLPClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;hidden_layer_sizes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;validation_fraction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;early_stopping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;mlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is pretty simple! In this case, &lt;code&gt;mlp&lt;/code&gt; uses a multi-layer perceptron as a classifier. As you can see, you can customize a variety of parameters including the number of nodes in your layers, the activation function, the optimizer, and the maximum number of training iterations by the neural network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Side Note:&lt;/strong&gt; &lt;code&gt;sgd&lt;/code&gt; stands for &lt;strong&gt;Stochastic Gradient Descent&lt;/strong&gt;, which is more efficient in terms of training than standard gradient descent in that it uses smaller batches in determining how to adjust weights rather than the whole dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  PyTorch
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NeuralNetwork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;flatten&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Flatten&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linear_relu_stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ReLU&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Softmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;linear_relu_stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;logits&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NeuralNetwork&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;some&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="n"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt; &lt;span class="n"&gt;representing&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;

&lt;span class="c1"&gt;# We still need to transform the model's output into something that makes sense!
# Right now, it's just the output of a bunch of linear calculations and activation functions.
# We can use the softmax activation function to convert our output into a prediction.
# Specifically, this returns the index of the class that has the maximum probability.
&lt;/span&gt;&lt;span class="n"&gt;logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pred_probab&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Softmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;logits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pred_probab&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is slightly more complicated, but much more customizable than &lt;code&gt;scikit-learn&lt;/code&gt;. For each layer, you can define the number of inputs and outputs and you can also define activation functions within each one.&lt;/p&gt;

&lt;p&gt;The snippet above also introduces something called a tensor. A &lt;strong&gt;tensor&lt;/strong&gt; is essentially an optimized version of a matrix (of any size) for machine learning purposes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tensorflow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;keras&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tensorflow.keras&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;layers&lt;/span&gt;

&lt;span class="n"&gt;input_shape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;input_shape&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;relu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;units&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mae&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;validation_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_valid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_valid&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, this is an example of a neural network in Tensorflow. It's quite similar to the PyTorch, with a few minor differences. Ultimately, the library you choose is up to you!&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimizing Your Neural Networks
&lt;/h2&gt;

&lt;p&gt;As you can see, there are a lot of options to configure when building your neural networks. I'll be going over these in my next article! See you in the next one!&lt;/p&gt;

&lt;p&gt;Moving forward, we'll stick with PyTorch.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Important:&lt;/strong&gt; Taking Advantage of Your GPUs
&lt;/h2&gt;

&lt;p&gt;Machine learning training obviously requires a lot of intensive computation. With GPUs being much more efficient at mathematical calculations than CPUs, the more advanced machine learning libraries cater to GPUs by allowing you to write code that takes advantage of them during training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unfortunately, &lt;code&gt;scikit-learn&lt;/code&gt; does not support using GPUs to train models.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;device&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/device:GPU:0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# GPU code here
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backends&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Using &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; device&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>Getting Started With Machine Learning Using MNIST</title>
      <dc:creator>Sam Der</dc:creator>
      <pubDate>Sat, 28 Sep 2024 15:38:53 +0000</pubDate>
      <link>https://dev.to/samder/mlseries-getting-started-with-machine-learning-using-mnist-42bp</link>
      <guid>https://dev.to/samder/mlseries-getting-started-with-machine-learning-using-mnist-42bp</guid>
      <description>&lt;h1&gt;
  
  
  What is Machine Learning?
&lt;/h1&gt;

&lt;p&gt;Machine learning is one of the hottest topics in computer science in recent years. With the rise of software like ChatGPT by OpenAI, Copilot by Microsoft, and countless other large language models that are powered by machine learning, it's becoming ever more relevant in our daily lives. From helping us write emails to answering our questions about life and the pursuit of happiness, these chatbots seem almost omnipotent.&lt;/p&gt;

&lt;p&gt;But how do we get from machine learning to large language models? How do we get from training a model that can, for example, identify handwritten digits to something as complex as ChatGPT? I'll be publishing posts every week that goes further in depth into answering these questions, but for now, let's start with the basics.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Basics
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Machine learning&lt;/strong&gt; is exactly what it is: making a machine (or more specifically, a computer) learn. More precisely, it's writing code to train our computers so that it can be used to make predictions later down the line. The result of this training is called a &lt;strong&gt;model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In the handwritten digits example, you might think that we can just use existing images to train a model, but we can't just use any image of number. Since we each have our own way of writing digits, we can't just train on a screenshot of a number and then hope it can extrapolate to our handwriting. The numbers for a specific font are rendered exactly the same way on a computer screen whereas each time we write a number, there are extremely minor differences that could convince a computer that the number we wrote down is a different one. We need to train our model using the same type of data we want it to predict on. &lt;em&gt;In other words, we want to train our model with images of handwritten digits so that it can do the same with images it hasn't seen before.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Google Colab
&lt;/h1&gt;

&lt;p&gt;To get started, we can head over to &lt;a href="//colab.google.com"&gt;Google Colab&lt;/a&gt;, a notebook-based coding environment. What that means is that you can split your code into individual cells that you can re-run and you can view the output of each one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg41hzdi0lx4tgejuv5k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg41hzdi0lx4tgejuv5k.png" width="800" height="363"&gt;&lt;/a&gt;&lt;br&gt;A simple example using the Fibonacci sequence. Competitive programmers know what I'm doing here :). If you don't, then hopefully you learned something new!
  &lt;/p&gt;

&lt;p&gt;Notice how you don't need to use the &lt;code&gt;print&lt;/code&gt; function to view output in this scenario. The return value from the last line of a cell gets outputted by default, but if you want to see output from other lines, then you'll have to use &lt;code&gt;print&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Training A Model With &lt;code&gt;scikit-learn&lt;/code&gt; To Recognize Digits
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Getting Our Data
&lt;/h2&gt;

&lt;p&gt;There's a popular library to get started with machine learning called &lt;a href="https://scikit-learn.org/stable/" rel="noopener noreferrer"&gt;&lt;code&gt;scikit-learn&lt;/code&gt;&lt;/a&gt; that's already installed in Colab. We can use this to train our model using &lt;a href="https://yann.lecun.com/exdb/mnist/" rel="noopener noreferrer"&gt;the MNIST dataset of handwritten digits&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Importing the necessary libraries and functions that will be used later
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;fetch_openml&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.neighbors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KNeighborsClassifier&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mnist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_openml&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mnist_784&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mnist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwp2d2blccy83h9cqy8z8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwp2d2blccy83h9cqy8z8.png" width="800" height="208"&gt;&lt;/a&gt;&lt;br&gt;&lt;strong&gt;The dataset:&lt;/strong&gt; Each row represents an image and each column represents the weight of a pixel in that image.
  &lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mnist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kimne064fz2d5e9rqkn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kimne064fz2d5e9rqkn.png" width="149" height="401"&gt;&lt;/a&gt;&lt;br&gt;&lt;strong&gt;The targets:&lt;/strong&gt; This contains the digits each image corresponds to. For example, the image in the first row is a 5.
  &lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mnist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_numpy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gray&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizkelacdc0jk3p55eot1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizkelacdc0jk3p55eot1.png" width="" height=""&gt;&lt;/a&gt;&lt;br&gt;An example image rendered using &lt;code&gt;numpy&lt;/code&gt; and &lt;code&gt;matplotlib&lt;/code&gt;.
  &lt;/p&gt;

&lt;p&gt;Now that we have our images, let's proceed with building our model! Since we're classifying images to a certain digit and also just to keep things simple, we should use a &lt;strong&gt;classification&lt;/strong&gt; algorithm. One of the most common algorithms of this type is &lt;strong&gt;&lt;em&gt;k&lt;/em&gt;-nearest neighbors&lt;/strong&gt;, which looks at data points that are near in terms of Euclidean distance (or some other distance metric) and outputs the classification that is most common among them. This is appropriate in our scenario because images that have similar pixel weights should represent the same number.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Our Classifier
&lt;/h2&gt;

&lt;p&gt;To train our model and test it appropriately, we'll need to split this large dataset into smaller training and testing datasets. Luckily, &lt;code&gt;scikit-learn&lt;/code&gt; provides a function for us named &lt;code&gt;train_test_split&lt;/code&gt; to do just that! However, it requires that &lt;code&gt;mnist.data&lt;/code&gt; and &lt;code&gt;mnist.target&lt;/code&gt; be combined into one data structure first. We can join them together using the &lt;code&gt;join&lt;/code&gt; method provided by the &lt;code&gt;pandas&lt;/code&gt; library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mnist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mnist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mnist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_numpy&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The test size can be adjusted arbitrarily but usually 80/20 is a good split.&lt;/p&gt;

&lt;p&gt;We can then use the &lt;code&gt;KNeighborsClassifer&lt;/code&gt; that we imported at the beginning to train our model using our training dataset and then test the accuracy of our model using the test dataset. This snippet looks at the 5 nearest neighbors but feel free to change the number as you see fit!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KNeighborsClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_neighbors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This output means that our model had an accuracy of 97% with the test dataset. That's a pretty good score! And congratulations, you just built your first machine learning model!&lt;/p&gt;

&lt;h1&gt;
  
  
  Where To Go From Here?
&lt;/h1&gt;

&lt;p&gt;We used the &lt;code&gt;scikit-learn&lt;/code&gt; library in this tutorial, but for more intensive machine learning applications that, for instance, require more customization of your parameters or utilization of compute resources, it would be better to use a library like PyTorch or Tensorflow. I'll be going over both in later posts!&lt;/p&gt;

&lt;p&gt;In the meantime, thanks for sticking with me. See you in the next one!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
