<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rishi Agrawal</title>
    <description>The latest articles on DEV Community by Rishi Agrawal (@rishiagrawal2609).</description>
    <link>https://dev.to/rishiagrawal2609</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F767398%2F611971d7-0f8d-41a4-958c-c41159f3999c.jpeg</url>
      <title>DEV Community: Rishi Agrawal</title>
      <link>https://dev.to/rishiagrawal2609</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rishiagrawal2609"/>
    <language>en</language>
    <item>
      <title>NLP: Deep dive Term Frequency</title>
      <dc:creator>Rishi Agrawal</dc:creator>
      <pubDate>Sat, 17 May 2025 11:38:02 +0000</pubDate>
      <link>https://dev.to/rishiagrawal2609/nlp-deep-dive-term-frequency-b2c</link>
      <guid>https://dev.to/rishiagrawal2609/nlp-deep-dive-term-frequency-b2c</guid>
      <description>&lt;h2&gt;
  
  
  What is NLP?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;NLP&lt;/strong&gt; stands for &lt;strong&gt;Natural Language Processing&lt;/strong&gt;. It is a field of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language in a way that is both meaningful and useful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vjm938tksnzq2x1gep8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2vjm938tksnzq2x1gep8.png" alt="NLP"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You might say: computers understand only 0s and 1s, so how can they understand human language? If you had the same question, let's dive into how we make a computer understand what natural language means.&lt;/p&gt;

&lt;h2&gt;
  
  
  All Words Are Just Numbers
&lt;/h2&gt;

&lt;p&gt;Yes, the title is correct—&lt;strong&gt;all words are just numbers&lt;/strong&gt; when it comes to how computers understand language. In Natural Language Processing (NLP), every word is transformed into a numerical representation, typically a vector. These vectors exist in a high-dimensional space known as &lt;strong&gt;vector space&lt;/strong&gt;, where words with similar meanings are located closer to one another.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F80q0h4oyfqgmmlyagp95.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F80q0h4oyfqgmmlyagp95.png" alt="Vector Space"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This mathematical transformation allows the computer to process and analyze human language efficiently. Instead of understanding words the way humans do, the computer compares these vectors, calculates distances, and identifies patterns based on proximity. &lt;/p&gt;

&lt;p&gt;When a task such as prediction, translation, or classification is required, the computer doesn't think in words—it simply locates the vector that is most similar or "nearest" to the input and produces the corresponding output. The rich complexity of human language is distilled down into numerical patterns that machines can rapidly process, making NLP both a fascinating and powerful field in artificial intelligence.&lt;/p&gt;
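&lt;p&gt;To make "closer in vector space" concrete, here is a tiny sketch. The 2-D vectors below are made up purely for illustration (real word embeddings have hundreds of dimensions), and "closeness" is measured with cosine similarity:&lt;/p&gt;

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 2-D "embeddings" -- invented values, for illustration only
vectors = {
    "king": [0.9, 0.8],
    "queen": [0.85, 0.82],
    "banana": [0.1, 0.9],
}

# Similar words score closer to 1 than dissimilar ones
print(cosine(vectors["king"], vectors["queen"]))
print(cosine(vectors["king"], vectors["banana"]))
```

&lt;p&gt;The similarity between "king" and "queen" comes out higher than between "king" and "banana" -- that is all "nearest vector" means in practice.&lt;/p&gt;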

&lt;p&gt;There are various methods for achieving this; here we will look at one of the fundamentals: the TF (Term Frequency) algorithm, which is used to measure the importance of a word in a document.&lt;/p&gt;

&lt;p&gt;Let's explore this mathematically, and then we will implement it in Python. Why maths? The reason is below!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75cfnrmid0ps5o5ew3ui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75cfnrmid0ps5o5ew3ui.png" alt="Maths meme"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Term Frequency
&lt;/h3&gt;

&lt;p&gt;Term frequency refers to the number of occurrences of a word in a document. Let's see this with a sample:&lt;/p&gt;

&lt;p&gt;"There was an amazing community event happened in Chennai last week and everyone loved all the talks and described it as one of the greatest community events of the entire month. There were 4 talks."&lt;/p&gt;

&lt;p&gt;The statement above is our sample corpus.&lt;/p&gt;

&lt;p&gt;Term Frequency is expressed mathematically as:&lt;/p&gt;

&lt;p&gt;
TF(x, d) = (Number of times the word x has occurred in document d) / (Total number of words in document d)
&lt;/p&gt;
 

&lt;p&gt;So, let's pick one word from the corpus: "community"&lt;/p&gt;

&lt;p&gt;Number of times "community" has occurred = 2&lt;br&gt;
Total number of words in the document = 35&lt;/p&gt;

&lt;p&gt;TF(community, d) = 2/35 ≈ 0.057&lt;/p&gt;

&lt;p&gt;To implement this in Python, here is the code:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60porpxbt3jvt3yt6u5a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60porpxbt3jvt3yt6u5a.png" alt="Coding Time"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Basic Implementation
&lt;/h3&gt;

&lt;p&gt;To implement this in vanilla Python, we first need a counter function that counts how many times a given word appears in the document, then a function that finds the total number of words, and finally the TF formula itself. Let's do this step by step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Word Counter function
&lt;/span&gt;
&lt;span class="c1"&gt;# Input (Document:str) -&amp;gt; Counter Function -&amp;gt; Output (Number of times the word is repeated)
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wordCounter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wordToCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
    &lt;span class="n"&gt;docLower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Converting to all lower case
&lt;/span&gt;    &lt;span class="n"&gt;wordDict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="c1"&gt;# initalizing the return dict
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-.,&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;!?;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;docNoPunc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;docLower&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;char&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Repalcing all the punctuation so that it will not interfere.
&lt;/span&gt;    &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;docNoPunc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Splitting the sentence into an array with the breaking point as a space.
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
        &lt;span class="n"&gt;wordDict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wordDict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;# Counter logic, counting the words
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wordDict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;wordToCount&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# returning count of the word
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Total number of words Counter
#Input(Document:str) -&amp;gt; TotalWordsCounter Function -&amp;gt; no of words:int
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;totalWordsCounter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Splitting the sentence into an array with the breaking point as a space.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Returning the length of the array that is equal to number of words.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Implementing the TF formula: 
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;numberWordRepeat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;wordCounter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;totalWordsInDoc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;totalWordsCounter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;fnumberWordRepeat&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;totalWordsInDoc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
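&lt;p&gt;As a quick cross-check of the functions above, the same term-frequency computation can be sketched with Python's standard library &lt;code&gt;collections.Counter&lt;/code&gt; and a regex tokenizer (a minimal alternative sketch, not a replacement for the code above):&lt;/p&gt;

```python
import re
from collections import Counter

def term_frequency(word, document):
    # Lowercase alphanumeric tokens; the regex drops punctuation for us
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    counts = Counter(tokens)
    return counts[word.lower()] / len(tokens)

doc = ("There was an amazing community event happened in Chennai last week "
       "and everyone loved all the talks and described it as one of the "
       "greatest community events of the entire month. There were 4 talks.")

# "community" occurs 2 times out of 35 words
print(round(term_frequency("community", doc), 3))  # prints 0.057
```

&lt;p&gt;This reproduces the hand calculation from earlier: 2/35 ≈ 0.057.&lt;/p&gt;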



&lt;p&gt;For the detailed code, please refer &lt;a href="https://github.com/rishiagrawal2609/blogs/blob/main/Term_Frequency.ipynb" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the next article, we will look at IDF (Inverse Document Frequency) to complete the foundation for the TF-IDF algorithm.&lt;/p&gt;

</description>
      <category>nlp</category>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>Unlock LLMs: SaaS vs. Local Solutions &amp; Crafting Custom LLM for Swagger - Series Intro</title>
      <dc:creator>Rishi Agrawal</dc:creator>
      <pubDate>Sun, 24 Mar 2024 01:14:24 +0000</pubDate>
      <link>https://dev.to/rishiagrawal2609/unlock-llms-saas-vs-local-solutions-crafting-custom-llm-for-swagger-series-intro-1l8m</link>
      <guid>https://dev.to/rishiagrawal2609/unlock-llms-saas-vs-local-solutions-crafting-custom-llm-for-swagger-series-intro-1l8m</guid>
      <description>&lt;p&gt;
As I embark on my journey into the realm of Large Language Models (LLMs), I'm discovering fascinating applications that redefine how we work. From leveraging AI like GitHub Copilot for coding to harnessing ChatGPT for email composition, the possibilities seem endless. However, I'm also intrigued by the limitations posed by these solutions being Software-as-a-Service (SaaS) products, lacking full control.

In this blog, I delve into a topic often overlooked: Swagger API documentation. Join me as I explore the potential of local setups and document my journey. As a newcomer to the world of LLMs, I seek to uncover practical applications and share insights along the way.
&lt;/p&gt;

&lt;p&gt;
Join us in a groundbreaking series as we delve into the world of Large Language Models (LLMs), examining both Software-as-a-Service (SaaS) solutions and local setups. Together, we'll compare their capabilities and uncover the potential of crafting our own local LLM for a unique purpose: Swagger documentation. This uncharted territory promises to revolutionize how we document APIs. Don't miss out on this pioneering exploration!
&lt;/p&gt;

&lt;p&gt;
My goal with this series is to provide a path to understanding LLMs and to help software engineers use them to their full potential in their projects.
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbn2x8xbbw0ss4kmo66d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbn2x8xbbw0ss4kmo66d.png" alt="Big Brain - generated using canva" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What is Machine Learning, and why do we need it in the first place?&lt;/em&gt;&lt;br&gt;
As our dependency on technology grows, we have started to use machine learning to predict and classify things, automating processes and reducing the need for human intervention. Nearly any natural process can be represented as a function that takes some input and produces some output.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Functions Describes the world"&lt;br&gt;
 Quote from the introduction to Thomas Garrity's "Mathematical Maturity"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And since computers are good at crunching numbers, we can use machine learning to approximate almost any function, given appropriate data to train on.&lt;/p&gt;

&lt;p&gt;Let me define machine learning in layman's terms:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Machine learning is like teaching a computer to learn from examples rather than programming it with specific instructions. Just like how we learn from experiences, machine learning algorithms analyze data to recognize patterns and make predictions or decisions. Imagine you're teaching a child to differentiate between animals. You show them various animals like cat, dogs, and cows, explaining their unique features like appearance, sound they make, diet, etc. Over time, the child learns to identify each animal correctly without explicit instructions. Similarly, in machine learning, algorithms learn from data to perform tasks such as recognizing spam emails, recommending movies, or even driving cars. It's about enabling computers to learn and improve from data, making them more intelligent and adaptable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Different areas of machine learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Statistical Machine Learning&lt;/li&gt;
&lt;li&gt;Deep learning (Neural Networks)&lt;/li&gt;
&lt;li&gt;Reinforcement Learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Machine learning is a sub-domain of AI, which also covers fields like computer vision and NLP. I started learning AI in my freshman year of my undergraduate degree, and the first project I built was a clone of AlexNet.&lt;/p&gt;

&lt;p&gt;As Jensen Huang, CEO of Nvidia, said at GTC 2024: in 2012 we gave a model a 32x32-pixel image as input and used to get one word back as an answer, and the potential was clear. Today, we give that one word/vector to an AI model and it generates millions of pixels back; that's the age of Generative AI we are heading towards.&lt;/p&gt;

&lt;p&gt;I hope you enjoy the series; share it with friends who want to build a solid understanding of LLMs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tentative Articles (titles might differ)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Exploring the realm of LLMs - what and why?&lt;/li&gt;
&lt;li&gt;LLM SaaS offerings - OpenAI, cloud, and Hugging Face models&lt;/li&gt;
&lt;li&gt;LLM local - deploying an LLM in a local environment&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>mistral7b</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
  </channel>
</rss>
