<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Durga Pokharel</title>
    <description>The latest articles on DEV Community by Durga Pokharel (@iamdurga).</description>
    <link>https://dev.to/iamdurga</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F546410%2F5aeebbef-a01e-4795-bd65-e72b3f578b4b.png</url>
      <title>DEV Community: Durga Pokharel</title>
      <link>https://dev.to/iamdurga</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/iamdurga"/>
    <language>en</language>
    <item>
      <title>How Can We Generate Random Number From Congruential Method?</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Mon, 17 Oct 2022 12:30:12 +0000</pubDate>
      <link>https://dev.to/iamdurga/how-can-we-generate-random-number-from-congruential-method-2m04</link>
      <guid>https://dev.to/iamdurga/how-can-we-generate-random-number-from-congruential-method-2m04</guid>
      <description>&lt;h1&gt;
  
  
  What Is a Random Number?
&lt;/h1&gt;

&lt;p&gt;A random number is, as the name suggests, a number selected at random from a set of numbers. The earliest methods of producing random numbers, such as dice, coin flipping, and roulette wheels, are far too slow for most applications in statistics and cryptography, so today they are employed mainly in games and gambling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who generated the first random numbers?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;John von Neumann proposed a method for generating random numbers in 1946&lt;/code&gt;. His plan was to square an initial seed value, extract the middle digits, and repeat. The sequence of integers that results from repeatedly squaring the result and extracting the middle digits exhibits the statistical characteristics of randomness.&lt;/p&gt;

&lt;p&gt;Consider a reasonably large number such as 2934: its square is 8608356, from which we take the middle digits 083; the square of 83 is 6889, whose middle digits give the next number, 88, and so on. Note that the initial seed needs to be sufficiently large for the method to work well.&lt;/p&gt;
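&lt;p&gt;The steps above can be sketched in Python. This is a minimal illustration of the middle-square idea, assuming a fixed four-digit window; the function name is ours, not a standard one.&lt;/p&gt;

```python
def middle_square(seed, digits=4, n=5):
    """Generate n numbers with von Neumann's middle-square method."""
    results = []
    for _ in range(n):
        squared = str(seed * seed).zfill(2 * digits)  # pad so the middle is well defined
        mid = len(squared) // 2
        half = digits // 2
        seed = int(squared[mid - half: mid + half])   # take the middle digits as the next seed
        results.append(seed)
    return results

print(middle_square(2934))
```

&lt;p&gt;Note that the sequence quickly collapses toward zero for many seeds, which is a well-known weakness of the middle-square method.&lt;/p&gt;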

&lt;h1&gt;
  
  
  Main Characteristics of Random Numbers
&lt;/h1&gt;

&lt;p&gt;A good random number generator should have the following desirable properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The numbers we generate should be random, meaning there is no discernible pattern in the data: the sequence should behave as if it follows no rule.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reproducible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another important property is reproducibility: starting from the same seed, the generator must reproduce exactly the same sequence, which is essential for debugging and repeating experiments.&lt;/p&gt;
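&lt;p&gt;Reproducibility can be illustrated with Python's standard &lt;code&gt;random&lt;/code&gt; module (a quick sketch, not part of the article's method): seeding the generator twice with the same value yields the same sequence.&lt;/p&gt;

```python
import random

random.seed(42)
first = [random.random() for _ in range(3)]
random.seed(42)
second = [random.random() for _ in range(3)]
assert first == second  # same seed, same sequence
```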

&lt;ul&gt;
&lt;li&gt;Portable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A random number generator should be portable: implemented on different machines or in different programming languages, it should produce the same sequence.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Efficient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Generation should be efficient: producing each number should be fast and require little memory, since simulations may need millions of them.&lt;/p&gt;

&lt;h1&gt;
  
  
  Generating Random Numbers for the Monte Carlo Method
&lt;/h1&gt;

&lt;p&gt;Multiple sources of systematic and statistical error can affect MC simulations; poor-quality random numbers introduce systematic error. The creation and testing of random numbers remain significant issues that have not been fully resolved. As already indicated, the random number sequences required for MC should be uniform, uncorrelated, of very long period, and should not repeat over the length of a simulation.&lt;/p&gt;

&lt;p&gt;Additionally, if we employ parallel computing (which is necessary to handle massive amounts of data), we must ensure that all generated random number sequences are independent and uncorrelated.&lt;/p&gt;
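&lt;p&gt;As a sketch of how independent parallel streams are obtained in practice, NumPy's &lt;code&gt;SeedSequence.spawn&lt;/code&gt; derives statistically independent child generators from one parent seed (this is NumPy's mechanism, not part of the article's method):&lt;/p&gt;

```python
import numpy as np

# Spawn four independent child seed sequences, one per parallel worker
parent = np.random.SeedSequence(12345)
children = parent.spawn(4)
rngs = [np.random.default_rng(c) for c in children]
samples = [rng.random() for rng in rngs]
print(samples)
```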

&lt;h1&gt;
  
  
  Generating Random Numbers Using the Congruential Method
&lt;/h1&gt;

&lt;p&gt;The fundamental principle is that a seed is picked together with a fixed multiplier c, and successive numbers are then produced by simple multiplication:&lt;/p&gt;

&lt;p&gt;[X_n = (c \times X_{n-1} + a_0) \bmod N_{max}]&lt;/p&gt;

&lt;p&gt;where (X_n) is an integer between 1 and (N_{max}).&lt;/p&gt;

&lt;p&gt;Experience has shown that a good 32-bit linear congruential generator is&lt;/p&gt;

&lt;p&gt;[X_n = (16807 \times X_{n-1} + a_0) \bmod (2^{31}-1)]&lt;/p&gt;

&lt;p&gt;The multiplier 16807 (which is 7^5) is often called a magic number.&lt;/p&gt;
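&lt;p&gt;The recurrence above can be written directly as a small Python function (a sketch; the function name and default arguments are ours):&lt;/p&gt;

```python
def lcg(seed, n, a=16807, c=0, m=2**31 - 1):
    """X_n = (a * X_{n-1} + c) mod m, the recurrence from the text."""
    numbers = []
    x = seed
    for _ in range(n):
        x = (a * x + c) % m
        numbers.append(x)
    return numbers

print(lcg(10000, 3))
```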

&lt;h3&gt;
  
  
  Let’s Generate Random Numbers Using the Congruential Method in Python
&lt;/h3&gt;

&lt;p&gt;First, import the necessary modules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np
import matplotlib.pyplot as plt

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can choose any value for the seed, but it should be a large number. We set &lt;code&gt;ran = (16807*seed)%(2**31)&lt;/code&gt; and then assign &lt;code&gt;seed = ran&lt;/code&gt; so that the seed changes on each iteration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed = 10000
count = 0
random1 = []
while count &amp;lt; 100:
    ran = (16807 * seed) % (2**31)
    seed = ran  # update the seed for the next iteration
    random1.append(ran)
    count += 1
#print(random1)




&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s scale the random numbers above into the range [0, 1):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L1 = [x/(2**31) for x in random1]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s generate another sequence using a different seed. The procedure is the same except for the seed value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed = 137474
count = 0
random2 = []
while count &amp;lt; 100:
    ran = (16807 * seed) % (2**31)
    seed = ran  # update the seed for the next iteration
    random2.append(ran)
    count += 1
#print(random2)




&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scale these numbers into the range [0, 1) as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L2 = [x/(2**31) for x in random2]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Plot of the Two Random Number Sequences
&lt;/h4&gt;

&lt;p&gt;We should check the randomness of the numbers we generated, and a scatter plot is a good choice for this. If the scatter of points is uniform, the numbers behave randomly. However, we cannot produce perfectly random numbers this way: because we set a seed value, we produce pseudorandom numbers. The only sources of truly random numbers are physical processes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.scatter(random1,random2)
plt.show()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_14_0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_14_0.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the above plot, points are distributed nearly, but not completely, uniformly. This is because we generated only a few numbers. What happens if we increase the sample size? Let’s check.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed = 10000
count = 0
random1 = []
while count &amp;lt; 10000:
    ran = (16807 * seed) % (2**31)
    seed = ran
    random1.append(ran)
    count += 1
L1 = [x/(2**31) for x in random1]


seed = 137474
count = 0
random2 = []
while count &amp;lt; 10000:
    ran = (16807 * seed) % (2**31)
    seed = ran
    random2.append(ran)
    count += 1
L2 = [x/(2**31) for x in random2]





plt.scatter(random1,random2,s = 0.3)
plt.show()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_18_0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_18_0.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the sample size increases, the points become more uniformly distributed. Hence, we are able to build a pseudorandom number generator.&lt;/p&gt;

&lt;h4&gt;
  
  
  Let’s Transform the Random Numbers into a Normal Distribution
&lt;/h4&gt;

&lt;p&gt;For this, let’s build new numbers from the ones above using the Box-Muller transform:&lt;/p&gt;

&lt;p&gt;[y_1 = (-2\log(random1))^\frac{1}{2} \cos(2\pi \times random2)]&lt;/p&gt;

&lt;p&gt;[y_2 = (-2\log(random1))^\frac{1}{2} \sin(2\pi \times random2)]&lt;/p&gt;

&lt;h4&gt;
  
  
  Using Python
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arr1 = np.array(L1)
arr2 = np.array(L2)
ran_guss1 = ((-2*np.log(arr1)))**0.5 *np.cos(2*np.pi*arr2)
ran_guss2 = ((-2*np.log(arr1)))**0.5 *np.sin(2*np.pi*arr2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Let’s Check the Distribution of the New Numbers
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.hist(ran_guss1, bins= 10)
plt.title("Histogram From First Method")
plt.show()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_23_0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_23_0.png" alt="image"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plt.hist(ran_guss2, bins= 10)
plt.title("Histogram From Second Method")
plt.show()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_24_0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_24_0.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparing Random Numbers from the Congruential Method and a Library
&lt;/h3&gt;

&lt;p&gt;Here, we generate random numbers with NumPy’s &lt;code&gt;random.rand()&lt;/code&gt; function, which produces numbers in the range [0, 1).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ra = np.random.rand(10000)


plt.scatter(L1, ra, s = 0.3)
plt.title("Random Number From Library and Congruential Method")
plt.show()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_27_0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FRandomn%2Foutput_27_0.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;
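&lt;p&gt;As an aside, modern NumPy code would typically use the &lt;code&gt;Generator&lt;/code&gt; API rather than &lt;code&gt;np.random.rand&lt;/code&gt;; a minimal sketch:&lt;/p&gt;

```python
import numpy as np

# default_rng uses the PCG64 algorithm, a much stronger generator than a basic LCG
rng = np.random.default_rng(seed=10000)
sample = rng.random(5)  # uniform numbers in [0, 1)
print(sample)
```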

</description>
    </item>
    <item>
      <title>News Classification using Neural Network</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Wed, 28 Sep 2022 11:16:28 +0000</pubDate>
      <link>https://dev.to/iamdurga/news-classification-using-neural-network-2jma</link>
      <guid>https://dev.to/iamdurga/news-classification-using-neural-network-2jma</guid>
      <description>&lt;p&gt;News Classification with Simple Neural Network is one of the application of Deep Learning. And here in this part of the blog, I am going to perform a Nepali News Classification. Before jumping into the main part, I would love to share some of my previous contents based upon which this blog has been written.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dataqoil.com/2021/03/20/nepali-news-annapurna-post-scrapping-using-beautifulsoup-and-python/"&gt;How I collected news data?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dataqoil.com/2022/04/23/nepali-news-analysis-eda/"&gt;EDA in Nepali News Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dataqoil.com/2022/04/30/nepali-news-classification-with-logistic-regression/"&gt;Nepali News Classification with Logistic Regression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dataqoil.com/2022/05/07/nepali-news-classification-with-naive-bayes/"&gt;Nepali News Classification with Naive Bayes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The blogs above were written by me in sequential order. The part of this blog up to text pre-processing is the same as in the other classification blogs.&lt;/p&gt;

&lt;h1&gt;
  
  
  Import Necessary Modules
&lt;/h1&gt;

&lt;p&gt;Let’s import the modules that we need for data preprocessing before modelling.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;os&lt;/code&gt;: The OS module in Python provides functions for interacting with the operating system and files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;pandas&lt;/code&gt;: Working with DataFrame and data analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;numpy&lt;/code&gt;: For numerical operations and arrays.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;matplotlib&lt;/code&gt;: For visualization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;matplotlib.font_manager&lt;/code&gt;: A module for finding, managing, and using fonts across platforms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;warnings&lt;/code&gt;: Warnings are provided to warn the developer of circumstances that aren’t always exceptions.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from matplotlib.font_manager import FontProperties
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
import pprint

plt.style.use("seaborn-whitegrid")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Data Load
&lt;/h1&gt;

&lt;p&gt;The data is currently in &lt;a href="https://drive.google.com/drive/folders/1eaZUvctC6mqK6kBxWOi5Ab-Jc5_rQBtO?usp=sharing"&gt;my drive&lt;/a&gt; which is available publicly. And I run the scraping code frequently to get more data so the number of rows could be different later.&lt;/p&gt;

&lt;p&gt;I used data that I had gathered over the course of a month or two by scraping news from several news portals. The daily news was amalgamated into a final consolidated CSV file, which I use here. That file has 5838 rows and 9 columns; the Category column labels each article with a news field such as business, sports, news, or entertainment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;df = pd.read_csv("/content/drive/MyDrive/News Scraping/combined_csv.csv")
df.shape


(5838, 9)


df.Category.value_counts()


business 1550
news 1228
entertainment 1092
technology 441
prabhas-news 441
sports 420
world 331
national 120
international 120
province 95
Name: Category, dtype: int64

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the above, we can see that most news belongs to the business, news, and entertainment categories. For classification problems, we would ideally have an equal number of data points in all classes. If not, we have a class imbalance problem: the model will tend to predict the majority classes and ignore the rest. Hence, we should be concerned with how to achieve class balance.&lt;/p&gt;

&lt;p&gt;One way to improve the balance here is to combine two or more classes into a single class. In doing so, we combined classes that contain similar types of data, such as news and prabhas-news, or international and world.&lt;br&gt;
&lt;/p&gt;
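&lt;p&gt;The merging of similar classes can be sketched with a pandas mapping (a toy illustration using a few category names from the list above):&lt;/p&gt;

```python
import pandas as pd

# Fold similar categories into a single class
toy = pd.DataFrame({"Category": ["news", "prabhas-news", "world", "international"]})
merge_map = {"prabhas-news": "news", "international": "world"}
toy["Category"] = toy["Category"].replace(merge_map)
print(toy["Category"].tolist())
```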

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# # business, entertainment 
# df.query('Category in ("business", "entertainment")')

# # business, entertainment, technology, sports, world + international

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Open the stopwords.txt file.
&lt;/h1&gt;

&lt;p&gt;Stop words are a collection of terms that are commonly used in any language. Stop words in English include words like “the,” “is,” and “and.” In NLP and text mining applications, stop words are removed so that the model can focus on the important terms. The following is how I loaded the stop-words file. Because stop words carry little information for news classification, we should eliminate them during preprocessing.&lt;/p&gt;
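&lt;p&gt;The idea of stop-word removal can be shown with a tiny English example (the article itself uses a Nepali stop-word file):&lt;/p&gt;

```python
# Keep only the words that are not in the stop-word set
stop_words = {"the", "is", "and"}
sentence = "the model is fast and accurate"
filtered = [w for w in sentence.split() if w not in stop_words]
print(" ".join(filtered))
```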

&lt;h1&gt;
  
  
  Stop words file
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stop_file = "/content/drive/MyDrive/News Scraping/News classification/nepali_stopwords.txt"
stop_words = []
with open(stop_file) as fp:
  lines = fp.readlines()
  stop_words =list( map(lambda x:x.strip(), lines))
#stop_words

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Open the Punctuation file.
&lt;/h1&gt;

&lt;p&gt;The code below is for loading a punctuation file. Punctuation is a set of tools used in writing to clearly distinguish sentences, phrases, and clauses so that their intended meaning may be understood. These tools provide no useful information during categorization, thus they should be eliminated before we train our model.&lt;br&gt;
&lt;/p&gt;
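&lt;p&gt;Removing punctuation tokens works the same way as stop-word removal; a minimal sketch with a few of the marks from the punctuation file:&lt;/p&gt;

```python
# Drop punctuation tokens before training
punctuation_words = [":", "?", "|", "!", ".", ","]
tokens = ["नेपाल", ",", "समाचार", "!"]
clean = [t for t in tokens if t not in punctuation_words]
print(clean)
```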

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;punctuation_file = "/content/drive/MyDrive/News Scraping/News classification/nepali_punctuation (1).txt"
punctuation_words = []
with open(punctuation_file) as fp:
  lines = fp.readlines()
  punctuation_words =list( map(lambda x:x.strip(), lines))
punctuation_words


[':', '?', '|', '!', '.', ',', '" "', '( )', '—', '-', "?'"]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Pre-processing of text
&lt;/h1&gt;

&lt;p&gt;I’m only going to use the titles from all of the categories. The content columns contain an enormous quantity of words; I’ll use them in a later blog post. In this blog, I’ll show you how to use a simple neural network on title data to classify news by category.&lt;/p&gt;

&lt;p&gt;First, I created a function named &lt;code&gt;preprocess_text&lt;/code&gt; that accepts the data, stop words, and punctuation words as parameters. I made a list called &lt;code&gt;new_cat&lt;/code&gt; to hold the processed rows and initialized a &lt;code&gt;noise&lt;/code&gt; list of digits, as you can see in the code. Then I loop over the rows, strip the whitespace from each row, split it into words, and keep only the words that are not stop words, punctuation, or noise.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
def preprocess_text(cat_data, stop_words, punctuation_words):
  new_cat = []
  noise = "1,2,3,4,5,6,7,8,9,0,०,१,२,३,४,५,६,७,८,९".split(",")

  for row in cat_data:
    words = row.strip().split(" ")      
    nwords = "" # []

    for word in words:
      if word not in punctuation_words and word not in stop_words:
        is_noise = False
        for n in noise:
          #print(n)
          if n in word:
            is_noise = True
            break
        if is_noise == False:
          word = word.replace("(","")
          word = word.replace(")","")
          # nwords.append(word)
          if len(word)&amp;gt;1:
            nwords+=word+" "

    new_cat.append(nwords.strip())
  # print(new_cat)
  return new_cat

title_clean = preprocess_text(["शिक्षण संस्थामा ज जनस्वास्थ्य 50 मापदण्ड पालना शिक्षा मन्त्रालयको निर्देशन"], stop_words, punctuation_words)
print(title_clean)


['शिक्षण संस्थामा जनस्वास्थ्य मापदण्ड पालना शिक्षा मन्त्रालयको निर्देशन']

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we take only the title from our data and apply the stop-word and punctuation filtering.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ndf = df.copy()
cat_title = []
for i, row in ndf.iterrows():
  ndf.loc[i, "Title"]= preprocess_text([row.Title], stop_words, punctuation_words)[0]

ndf.head()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Unnamed: 0&lt;/th&gt;
&lt;th&gt;Title&lt;/th&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Author&lt;/th&gt;
&lt;th&gt;Author URL&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;प्रधानमन्त्री देउवा, दाहाल नेपाल भारतीय राजदूत...&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://ekantipur.com/news/2022/04/12/16497794"&gt;https://ekantipur.com/news/2022/04/12/16497794&lt;/a&gt;...&lt;/td&gt;
&lt;td&gt;चैत्र २९, २०७८&lt;/td&gt;
&lt;td&gt;कान्तिपुर संवाददाता&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ekantipur.com/author/author-14301"&gt;https://ekantipur.com/author/author-14301&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;काठमाडौँ — प्रधानमन्त्री शेरबहादुर देउवा, नेकप...&lt;/td&gt;
&lt;td&gt;news&lt;/td&gt;
&lt;td&gt;प्रधानमन्त्री शेरबहादुर देउवा, नेकपा (माओवादी ...&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;गठबन्धनले महानगर उपमहानगरमा प्रमुख-उपप्रमुख के...&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://ekantipur.com/news/2022/04/12/16497772"&gt;https://ekantipur.com/news/2022/04/12/16497772&lt;/a&gt;...&lt;/td&gt;
&lt;td&gt;चैत्र २९, २०७८&lt;/td&gt;
&lt;td&gt;कान्तिपुर संवाददाता&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ekantipur.com/author/author-14301"&gt;https://ekantipur.com/author/author-14301&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;काठमाडौँ — स्थानीय तहको निर्वाचनका लागि सत्ता ...&lt;/td&gt;
&lt;td&gt;news&lt;/td&gt;
&lt;td&gt;स्थानीय तहको निर्वाचनका लागि सत्ता गठबन्धन दलह...&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;परराष्ट्रमन्त्री खड्कासँग भारतीय राजदूत क्वात्...&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://ekantipur.com/news/2022/04/12/16497754"&gt;https://ekantipur.com/news/2022/04/12/16497754&lt;/a&gt;...&lt;/td&gt;
&lt;td&gt;चैत्र २९, २०७८&lt;/td&gt;
&lt;td&gt;कान्तिपुर संवाददाता&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ekantipur.com/author/author-14301"&gt;https://ekantipur.com/author/author-14301&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;काठमाडौँ — भारतको विदेश मन्त्रालयमा सचिव पदमा ...&lt;/td&gt;
&lt;td&gt;news&lt;/td&gt;
&lt;td&gt;भारतको विदेश मन्त्रालयमा सचिव पदमा नियुक्त भएप...&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;स्थानीय तहको नेतृत्व बाँडफाँट केन्द्रमा पठाउन ...&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://ekantipur.com/news/2022/04/12/16497720"&gt;https://ekantipur.com/news/2022/04/12/16497720&lt;/a&gt;...&lt;/td&gt;
&lt;td&gt;चैत्र २९, २०७८&lt;/td&gt;
&lt;td&gt;कान्तिपुर संवाददाता&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ekantipur.com/author/author-14301"&gt;https://ekantipur.com/author/author-14301&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;काठमाडौँ — सत्ता गठबन्धनले स्थानीय तहको नेतृत्...&lt;/td&gt;
&lt;td&gt;news&lt;/td&gt;
&lt;td&gt;सत्ता गठबन्धनले स्थानीय तहको नेतृत्व बाँडफाँट ...&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;प्रधानसेनापति भारतीय सेनाका रथीबीच भेटवार्ता&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://ekantipur.com/news/2022/04/12/16497700"&gt;https://ekantipur.com/news/2022/04/12/16497700&lt;/a&gt;...&lt;/td&gt;
&lt;td&gt;चैत्र २९, २०७८&lt;/td&gt;
&lt;td&gt;कान्तिपुर संवाददाता&lt;/td&gt;
&lt;td&gt;&lt;a href="https://ekantipur.com/author/author-14301"&gt;https://ekantipur.com/author/author-14301&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;काठमाडौँ — प्रधानसेनापति प्रभुराम शर्मा र भारत...&lt;/td&gt;
&lt;td&gt;news&lt;/td&gt;
&lt;td&gt;प्रधानसेनापति प्रभुराम शर्मा र भारतीय सेनाका र...&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&amp;lt;h1&amp;gt;
  &amp;lt;a name="importing-necessary-module-for-simple-neural-network" href="#importing-necessary-module-for-simple-neural-network" class="anchor"&amp;gt;
  &amp;lt;/a&amp;gt;
  Importing Necessary Module for simple neural network
&amp;lt;/h1&amp;gt;

&amp;lt;p&amp;gt;Here we first import to_categorical from tensorflow.keras.utils. It is used to turn our label data into one-hot encoded (OHE) form. OHE is very useful in multiclass classification because it assigns 1 to the class a sample belongs to and 0 to all other classes.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;Let’s import:&amp;lt;/p&amp;gt;

&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;OneHotEncoder&amp;lt;/code&amp;gt; : Encode categorical features as a one-hot numeric array&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;confusion_matrix&amp;lt;/code&amp;gt;: To evaluate classification model performance&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sequence&amp;lt;/code&amp;gt;: The sequential API allows you to create models layer-by-layer for most problems &amp;lt;a href="https://stackoverflow.com/questions/57751417/what-is-meant-by-sequential-model-in-keras"&amp;gt;from&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;Tokenizer&amp;lt;/code&amp;gt; : To tokenize a paragraph into sentences or a sentence into words
&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import confusion_matrix
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from tensorflow.keras.utils import to_categorical


data = pd.DataFrame()

data["text"]=ndf.Title
data["label"]=ndf.Category
data["target"] = data["label"].apply(lambda x: "news" if x=="prabhas-news" else "national" if x=="province" else "world" if x=="international" else x)
classes = {c:i for i,c in enumerate(data.target.unique())}
data["target"] = data.target.apply(lambda x: classes[x])

targets = to_categorical(data.target)

vectorizer = CountVectorizer(ngram_range=(1, 2)).fit(data.text)
vectext = vectorizer.transform(data.text)

X_train, X_test, Y_train, Y_test = train_test_split(vectext, 
                                                    targets, 
                                                    random_state=0)


targets, targets.shape


(array([[1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        [1., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], dtype=float32), (5838, 7))

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;Above is the OHE form of our target: each row of the array contains exactly one 1 and 0s elsewhere. The number of columns is 7, which is the total number of classes.&amp;lt;/p&amp;gt;
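&amp;lt;p&amp;gt;The shape of a one-hot encoding can be sketched with plain NumPy (an illustration, not the to_categorical call used above):&amp;lt;/p&amp;gt;

```python
import numpy as np

# Label i becomes a row with a single 1 in column i
labels = np.array([0, 2, 1])
one_hot = np.eye(3)[labels]
print(one_hot)
```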
&amp;lt;h1&amp;gt;
  &amp;lt;a name="simple-neural-network-having-one-layer" href="#simple-neural-network-having-one-layer" class="anchor"&amp;gt;
  &amp;lt;/a&amp;gt;
  Simple Neural Network having one layer
&amp;lt;/h1&amp;gt;

&amp;lt;p&amp;gt;First I want to go through what a simple neural network is. It is the most straightforward type of deep learning architecture: the source nodes in the input layer are directly connected to the neurons in the output layer (the computation nodes), but not the other way around. The simplest form of neural network is also known as a perceptron. This type of network is mainly used for classification tasks. If our data is linearly separable, it is beneficial to use a perceptron.&amp;lt;/p&amp;gt;
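&amp;lt;p&amp;gt;As a minimal sketch of the perceptron idea (the weights, bias, and input below are made-up values, not part of this article's pipeline), a single neuron computes a weighted sum of its inputs and applies a step activation:&amp;lt;/p&amp;gt;

```python
import numpy as np

# Hypothetical weights, bias, and input for a single neuron
w = np.array([0.5, -0.6, 0.2])
b = 0.1
x = np.array([1.0, 0.4, 2.0])

# Weighted sum of the inputs plus the bias
z = np.dot(w, x) + b
# Step activation: the neuron fires (1) if the sum is positive, else 0
output = 1 if z > 0 else 0
print(output)  # 1, since z = 0.76 > 0
```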

&amp;lt;p&amp;gt;&amp;lt;img src="https://iamdurga.github.io/assets/simple_nn/simple_nn.png" alt="image"&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;We use only one neuron to classify binary-class data. If we want to classify data with multiple classes, we should add more neurons to the output layer. If our data is not linearly separable, we can use a multi-layer perceptron with at least one hidden layer; we can add as many hidden layers as we want. Adding more hidden layers helps us extract higher-order statistics from the input. Our news classification is an example of a non-linear problem. The linearity also depends on the activation function that we use.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;Since our target output is in one-hot encoded form, we should use an activation function that produces probabilities, because in our one-hot encoded labels there is a 1 for the true class and a 0 for every other class. Our goal is to make the model's predicted probabilities as close as possible to these values.&amp;lt;br&amp;gt;
&amp;lt;/p&amp;gt;
&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;# simple NN
in_shape = X_train.shape[-1]
out_shape = targets.shape[-1]
model = Sequential()
model.add(Dense(input_dim=in_shape, units=out_shape, activation='softmax'))

model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type) Output Shape Param #   
=================================================================
 dense (Dense) (None, 7) 65814     

=================================================================
Total params: 65,814
Trainable params: 65,814
Non-trainable params: 0
_________________________________________________________________

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;In the above block of code we build the model using Sequential, which lets us construct the model layer by layer. The input dimension is 9401, which is our vocabulary size, and the output layer has 7 units because we want to classify our data into 7 categories.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;It is beneficial to use the softmax activation function in multiclass classification. In its most basic form, softmax is a vector function: it takes a vector as input and produces a vector as output. Softmax squashes each unit's output to lie between 0 and 1, and it also normalizes the outputs so that they sum to 1. From softmax's output you can therefore read off the likelihood that each class is the true one.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;\begin{equation} \mathrm{softmax}(x_j) = \frac{e^{x_j}}{\sum_{i=1}^n e^{x_i}} \end{equation}&amp;lt;/p&amp;gt;

&amp;lt;blockquote&amp;gt;
&amp;lt;p&amp;gt;If we do not specify an activation function, a linear activation is used by default.&amp;lt;/p&amp;gt;
&amp;lt;/blockquote&amp;gt;
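&amp;lt;p&amp;gt;A quick numeric check of the softmax formula, using NumPy with made-up logits (this snippet is illustrative, not part of the article's pipeline):&amp;lt;/p&amp;gt;

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # made-up raw scores
probs = softmax(logits)
print(probs.round(3))  # each value lies between 0 and 1
print(probs.sum())     # the probabilities sum to 1
```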

&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;model.compile(loss='categorical_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;We should always compile the model after building it. We can view compilation as a setup phase that configures how the model will be trained. While compiling the model we use 'adam' as the optimizer; there are many other optimizers, and choosing the right one is crucial. The loss function is "categorical_crossentropy" because we classify our data into multiple classes.&amp;lt;/p&amp;gt;
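&amp;lt;p&amp;gt;To see what categorical cross-entropy measures, here is a hand-computed example with made-up values (not taken from the training run). For a one-hot target, the loss reduces to the negative log of the probability the model assigns to the true class:&amp;lt;/p&amp;gt;

```python
import numpy as np

y_true = np.array([0.0, 1.0, 0.0])  # one-hot target: class 1 is correct
y_pred = np.array([0.1, 0.7, 0.2])  # made-up softmax output of a model

# Categorical cross-entropy: -sum(y_true * log(y_pred))
loss = -np.sum(y_true * np.log(y_pred))
print(round(loss, 4))  # 0.3567, i.e. -log(0.7)
```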
&amp;lt;h1&amp;gt;
  &amp;lt;a name="model-fit" href="#model-fit" class="anchor"&amp;gt;
  &amp;lt;/a&amp;gt;
  Model Fit
&amp;lt;/h1&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;history = model.fit(X_train.toarray(), Y_train, epochs=100, batch_size=32, validation_data=(X_test.toarray(), Y_test))


Epoch 1/100
137/137 [==============================] - 3s 6ms/step - loss: 1.7741 - accuracy: 0.5160 - val_loss: 1.6109 - val_accuracy: 0.6096
Epoch 2/100
137/137 [==============================] - 1s 4ms/step - loss: 1.4537 - accuracy: 0.6802 - val_loss: 1.3941 - val_accuracy: 0.6418
Epoch 3/100
137/137 [==============================] - 1s 4ms/step - loss: 1.2407 - accuracy: 0.7165 - val_loss: 1.2510 - val_accuracy: 0.6623
Epoch 4/100
137/137 [==============================] - 1s 4ms/step - loss: 1.0887 - accuracy: 0.7478 - val_loss: 1.1483 - val_accuracy: 0.6877
Epoch 5/100
137/137 [==============================] - 1s 4ms/step - loss: 0.9735 - accuracy: 0.7730 - val_loss: 1.0708 - val_accuracy: 0.7082
Epoch 6/100
137/137 [==============================] - 1s 4ms/step - loss: 0.8817 - accuracy: 0.7988 - val_loss: 1.0110 - val_accuracy: 0.7322
Epoch 7/100
137/137 [==============================] - 1s 4ms/step - loss: 0.8066 - accuracy: 0.8202 - val_loss: 0.9620 - val_accuracy: 0.7384
Epoch 8/100
137/137 [==============================] - 1s 4ms/step - loss: 0.7433 - accuracy: 0.8387 - val_loss: 0.9230 - val_accuracy: 0.7466
Epoch 9/100
137/137 [==============================] - 1s 5ms/step - loss: 0.6893 - accuracy: 0.8556 - val_loss: 0.8906 - val_accuracy: 0.7527
Epoch 10/100
137/137 [==============================] - 1s 4ms/step - loss: 0.6426 - accuracy: 0.8677 - val_loss: 0.8633 - val_accuracy: 0.7575
Epoch 11/100
137/137 [==============================] - 1s 4ms/step - loss: 0.6019 - accuracy: 0.8789 - val_loss: 0.8398 - val_accuracy: 0.7644
Epoch 12/100
137/137 [==============================] - 1s 5ms/step - loss: 0.5660 - accuracy: 0.8858 - val_loss: 0.8206 - val_accuracy: 0.7658
Epoch 13/100
137/137 [==============================] - 1s 8ms/step - loss: 0.5343 - accuracy: 0.8938 - val_loss: 0.8032 - val_accuracy: 0.7705
Epoch 14/100
137/137 [==============================] - 1s 4ms/step - loss: 0.5059 - accuracy: 0.8988 - val_loss: 0.7890 - val_accuracy: 0.7719
Epoch 15/100
137/137 [==============================] - 1s 4ms/step - loss: 0.4805 - accuracy: 0.9032 - val_loss: 0.7771 - val_accuracy: 0.7685
Epoch 16/100
137/137 [==============================] - 1s 4ms/step - loss: 0.4576 - accuracy: 0.9054 - val_loss: 0.7656 - val_accuracy: 0.7712
Epoch 17/100
137/137 [==============================] - 1s 4ms/step - loss: 0.4368 - accuracy: 0.9091 - val_loss: 0.7564 - val_accuracy: 0.7753
Epoch 18/100
137/137 [==============================] - 1s 4ms/step - loss: 0.4178 - accuracy: 0.9125 - val_loss: 0.7480 - val_accuracy: 0.7747
Epoch 19/100
137/137 [==============================] - 1s 4ms/step - loss: 0.4006 - accuracy: 0.9127 - val_loss: 0.7414 - val_accuracy: 0.7781
Epoch 20/100
137/137 [==============================] - 1s 4ms/step - loss: 0.3847 - accuracy: 0.9155 - val_loss: 0.7352 - val_accuracy: 0.7795
Epoch 21/100
137/137 [==============================] - 1s 4ms/step - loss: 0.3701 - accuracy: 0.9187 - val_loss: 0.7304 - val_accuracy: 0.7815
Epoch 22/100
137/137 [==============================] - 1s 4ms/step - loss: 0.3566 - accuracy: 0.9226 - val_loss: 0.7263 - val_accuracy: 0.7849
Epoch 23/100
137/137 [==============================] - 1s 4ms/step - loss: 0.3440 - accuracy: 0.9246 - val_loss: 0.7223 - val_accuracy: 0.7842
Epoch 24/100
137/137 [==============================] - 1s 4ms/step - loss: 0.3326 - accuracy: 0.9260 - val_loss: 0.7190 - val_accuracy: 0.7856
Epoch 25/100
137/137 [==============================] - 1s 5ms/step - loss: 0.3220 - accuracy: 0.9260 - val_loss: 0.7155 - val_accuracy: 0.7849
Epoch 26/100
137/137 [==============================] - 1s 4ms/step - loss: 0.3119 - accuracy: 0.9294 - val_loss: 0.7130 - val_accuracy: 0.7842
Epoch 27/100
137/137 [==============================] - 1s 4ms/step - loss: 0.3025 - accuracy: 0.9310 - val_loss: 0.7120 - val_accuracy: 0.7842
Epoch 28/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2938 - accuracy: 0.9317 - val_loss: 0.7104 - val_accuracy: 0.7836
Epoch 29/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2856 - accuracy: 0.9322 - val_loss: 0.7096 - val_accuracy: 0.7808
Epoch 30/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2780 - accuracy: 0.9317 - val_loss: 0.7088 - val_accuracy: 0.7822
Epoch 31/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2708 - accuracy: 0.9331 - val_loss: 0.7088 - val_accuracy: 0.7815
Epoch 32/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2638 - accuracy: 0.9338 - val_loss: 0.7084 - val_accuracy: 0.7822
Epoch 33/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2576 - accuracy: 0.9335 - val_loss: 0.7083 - val_accuracy: 0.7815
Epoch 34/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2518 - accuracy: 0.9331 - val_loss: 0.7090 - val_accuracy: 0.7788
Epoch 35/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2459 - accuracy: 0.9331 - val_loss: 0.7095 - val_accuracy: 0.7788
Epoch 36/100
137/137 [==============================] - 1s 5ms/step - loss: 0.2407 - accuracy: 0.9347 - val_loss: 0.7106 - val_accuracy: 0.7788
Epoch 37/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2355 - accuracy: 0.9344 - val_loss: 0.7112 - val_accuracy: 0.7808
Epoch 38/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2306 - accuracy: 0.9338 - val_loss: 0.7135 - val_accuracy: 0.7808
Epoch 39/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2262 - accuracy: 0.9338 - val_loss: 0.7145 - val_accuracy: 0.7808
Epoch 40/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2219 - accuracy: 0.9347 - val_loss: 0.7161 - val_accuracy: 0.7795
Epoch 41/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2176 - accuracy: 0.9354 - val_loss: 0.7176 - val_accuracy: 0.7788
Epoch 42/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2139 - accuracy: 0.9351 - val_loss: 0.7197 - val_accuracy: 0.7795
Epoch 43/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2100 - accuracy: 0.9340 - val_loss: 0.7214 - val_accuracy: 0.7781
Epoch 44/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2066 - accuracy: 0.9342 - val_loss: 0.7237 - val_accuracy: 0.7774
Epoch 45/100
137/137 [==============================] - 1s 4ms/step - loss: 0.2032 - accuracy: 0.9344 - val_loss: 0.7259 - val_accuracy: 0.7767
Epoch 46/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1999 - accuracy: 0.9351 - val_loss: 0.7279 - val_accuracy: 0.7774
Epoch 47/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1969 - accuracy: 0.9340 - val_loss: 0.7301 - val_accuracy: 0.7795
Epoch 48/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1940 - accuracy: 0.9367 - val_loss: 0.7329 - val_accuracy: 0.7781
Epoch 49/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1912 - accuracy: 0.9349 - val_loss: 0.7353 - val_accuracy: 0.7781
Epoch 50/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1887 - accuracy: 0.9344 - val_loss: 0.7382 - val_accuracy: 0.7774
Epoch 51/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1861 - accuracy: 0.9340 - val_loss: 0.7406 - val_accuracy: 0.7774
Epoch 52/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1836 - accuracy: 0.9370 - val_loss: 0.7432 - val_accuracy: 0.7774
Epoch 53/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1814 - accuracy: 0.9338 - val_loss: 0.7467 - val_accuracy: 0.7801
Epoch 54/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1790 - accuracy: 0.9333 - val_loss: 0.7492 - val_accuracy: 0.7815
Epoch 55/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1770 - accuracy: 0.9363 - val_loss: 0.7520 - val_accuracy: 0.7815
Epoch 56/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1750 - accuracy: 0.9351 - val_loss: 0.7555 - val_accuracy: 0.7788
Epoch 57/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1731 - accuracy: 0.9340 - val_loss: 0.7588 - val_accuracy: 0.7808
Epoch 58/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1710 - accuracy: 0.9347 - val_loss: 0.7619 - val_accuracy: 0.7808
Epoch 59/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1691 - accuracy: 0.9360 - val_loss: 0.7651 - val_accuracy: 0.7815
Epoch 60/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1674 - accuracy: 0.9347 - val_loss: 0.7686 - val_accuracy: 0.7808
Epoch 61/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1659 - accuracy: 0.9344 - val_loss: 0.7718 - val_accuracy: 0.7815
Epoch 62/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1642 - accuracy: 0.9342 - val_loss: 0.7748 - val_accuracy: 0.7842
Epoch 63/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1628 - accuracy: 0.9360 - val_loss: 0.7787 - val_accuracy: 0.7836
Epoch 64/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1613 - accuracy: 0.9335 - val_loss: 0.7823 - val_accuracy: 0.7849
Epoch 65/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1599 - accuracy: 0.9356 - val_loss: 0.7855 - val_accuracy: 0.7856
Epoch 66/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1585 - accuracy: 0.9367 - val_loss: 0.7888 - val_accuracy: 0.7849
Epoch 67/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1572 - accuracy: 0.9342 - val_loss: 0.7932 - val_accuracy: 0.7863
Epoch 68/100
137/137 [==============================] - 1s 8ms/step - loss: 0.1559 - accuracy: 0.9354 - val_loss: 0.7968 - val_accuracy: 0.7849
Epoch 69/100
137/137 [==============================] - 1s 6ms/step - loss: 0.1546 - accuracy: 0.9381 - val_loss: 0.8002 - val_accuracy: 0.7856
Epoch 70/100
137/137 [==============================] - 1s 6ms/step - loss: 0.1535 - accuracy: 0.9338 - val_loss: 0.8037 - val_accuracy: 0.7856
Epoch 71/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1523 - accuracy: 0.9349 - val_loss: 0.8071 - val_accuracy: 0.7842
Epoch 72/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1514 - accuracy: 0.9347 - val_loss: 0.8110 - val_accuracy: 0.7849
Epoch 73/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1503 - accuracy: 0.9344 - val_loss: 0.8150 - val_accuracy: 0.7849
Epoch 74/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1493 - accuracy: 0.9347 - val_loss: 0.8190 - val_accuracy: 0.7870
Epoch 75/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1484 - accuracy: 0.9356 - val_loss: 0.8224 - val_accuracy: 0.7842
Epoch 76/100
137/137 [==============================] - 1s 5ms/step - loss: 0.1474 - accuracy: 0.9358 - val_loss: 0.8271 - val_accuracy: 0.7856
Epoch 77/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1463 - accuracy: 0.9335 - val_loss: 0.8310 - val_accuracy: 0.7849
Epoch 78/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1455 - accuracy: 0.9349 - val_loss: 0.8349 - val_accuracy: 0.7863
Epoch 79/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1448 - accuracy: 0.9342 - val_loss: 0.8389 - val_accuracy: 0.7877
Epoch 80/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1439 - accuracy: 0.9342 - val_loss: 0.8428 - val_accuracy: 0.7863
Epoch 81/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1430 - accuracy: 0.9326 - val_loss: 0.8474 - val_accuracy: 0.7856
Epoch 82/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1423 - accuracy: 0.9354 - val_loss: 0.8513 - val_accuracy: 0.7863
Epoch 83/100
137/137 [==============================] - 1s 5ms/step - loss: 0.1416 - accuracy: 0.9354 - val_loss: 0.8560 - val_accuracy: 0.7836
Epoch 84/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1409 - accuracy: 0.9379 - val_loss: 0.8595 - val_accuracy: 0.7849
Epoch 85/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1403 - accuracy: 0.9354 - val_loss: 0.8633 - val_accuracy: 0.7836
Epoch 86/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1394 - accuracy: 0.9333 - val_loss: 0.8678 - val_accuracy: 0.7836
Epoch 87/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1389 - accuracy: 0.9349 - val_loss: 0.8727 - val_accuracy: 0.7842
Epoch 88/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1382 - accuracy: 0.9351 - val_loss: 0.8763 - val_accuracy: 0.7842
Epoch 89/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1378 - accuracy: 0.9351 - val_loss: 0.8805 - val_accuracy: 0.7849
Epoch 90/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1370 - accuracy: 0.9338 - val_loss: 0.8851 - val_accuracy: 0.7836
Epoch 91/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1365 - accuracy: 0.9338 - val_loss: 0.8884 - val_accuracy: 0.7849
Epoch 92/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1358 - accuracy: 0.9365 - val_loss: 0.8933 - val_accuracy: 0.7842
Epoch 93/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1354 - accuracy: 0.9374 - val_loss: 0.8968 - val_accuracy: 0.7842
Epoch 94/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1348 - accuracy: 0.9349 - val_loss: 0.9014 - val_accuracy: 0.7849
Epoch 95/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1345 - accuracy: 0.9347 - val_loss: 0.9061 - val_accuracy: 0.7836
Epoch 96/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1338 - accuracy: 0.9367 - val_loss: 0.9104 - val_accuracy: 0.7836
Epoch 97/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1332 - accuracy: 0.9351 - val_loss: 0.9144 - val_accuracy: 0.7856
Epoch 98/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1330 - accuracy: 0.9356 - val_loss: 0.9193 - val_accuracy: 0.7849
Epoch 99/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1323 - accuracy: 0.9347 - val_loss: 0.9237 - val_accuracy: 0.7836
Epoch 100/100
137/137 [==============================] - 1s 4ms/step - loss: 0.1319 - accuracy: 0.9360 - val_loss: 0.9278 - val_accuracy: 0.7842

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;Finally, we fit the model with the required arguments. First we need to provide the training data. While training the model I used the &amp;lt;code&amp;gt;.toarray()&amp;lt;/code&amp;gt; method because fitting requires a dense array, whereas the vectorizer gives us a sparse matrix; without the conversion we would get an error.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;We set epochs to 100, which means the model trains for up to 100 passes over the data. We could also set it to 20 or 25; if the model already fits well within 20 iterations, training longer gains little, and pushing further risks overfitting. Hence the right choice of epoch count is crucial.&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;Similarly, in the above example we set batch_size to 32. The batch size splits the training data into mini-batches of 32 samples each; the gradient is computed on each batch and the weight updates are accumulated across batches. A smaller batch size means more updates per epoch and usually a longer training time, and vice versa.&amp;lt;/p&amp;gt;
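&amp;lt;p&amp;gt;The "137/137" shown in the training log follows directly from the batch size: with roughly 4,378 training samples (about 75% of the 5,838 rows, the default train/test split) and batches of 32, the number of gradient steps per epoch works out to:&amp;lt;/p&amp;gt;

```python
import math

n_train = 4378   # ~75% of the 5838 samples (default train_test_split)
batch_size = 32

# The last, possibly partial, batch still counts as one step
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 137, matching the "137/137" in the log
```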
&amp;lt;h1&amp;gt;
  &amp;lt;a name="performance-of-model" href="#performance-of-model" class="anchor"&amp;gt;
  &amp;lt;/a&amp;gt;
  Performance of model
&amp;lt;/h1&amp;gt;

&amp;lt;p&amp;gt;In the above example we trained the model using only an input and an output layer. Our model works pretty well: both the training and validation accuracy reach a satisfactory level, and the loss keeps decreasing at each iteration.&amp;lt;br&amp;gt;
&amp;lt;/p&amp;gt;
&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;def performance_model(hist, model, X_test, Y_test, classes):
  # subplot
  fig, axes = plt.subplots(1,2, figsize=(20,5))
  axes[0].set_title('Accuracy score')
  axes[0].plot(history.history['accuracy'])
  axes[0].plot(history.history['val_accuracy'])
  axes[0].legend(['accuracy', 'val_accuracy'])
  # plt.show()
  # plt.figure(figsize=(9,7))
  axes[1].set_title('Loss value')
  axes[1].plot(history.history['loss'])
  axes[1].plot(history.history['val_loss'])
  axes[1].legend(['loss', 'val_loss'])
  plt.show()

  predictions = model.predict(X_test.toarray())

  y_test_evaluate = np.argmax(Y_test, axis=1)
  pred = np.argmax(predictions, axis=1)

  cm = confusion_matrix(y_test_evaluate, pred)

  plt.figure(figsize=(8,8))
  plt.title('Confusion matrix on test data')
  sns.heatmap(cm, annot=True, fmt='d', xticklabels=classes.keys(), yticklabels=classes.keys(), 
              cmap=plt.cm.Blues, cbar=False, annot_kws={'size':14})
  plt.xlabel('Predicted Label')
  plt.ylabel('True Label')
  plt.show()

performance_model(history, model, X_test, Y_test, classes)

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;img src="https://iamdurga.github.io/assets/simple_nn/output_31_0.png" alt="image"&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;img src="https://iamdurga.github.io/assets/simple_nn/output_31_1.png" alt="image"&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;The above graph shows that both the training and validation accuracy increase significantly, but there is a noticeable gap between the two. This may happen for the following reasons:&amp;lt;/p&amp;gt;

&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;A class imbalance&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Underfitting&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Overfitting&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Imprecise preprocessing&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;

&amp;lt;p&amp;gt;Similarly, in our classification problem, accuracy alone is not a good measure of model performance, because we have multiple classes and there may be class imbalance. Hence, to evaluate our model, we used the multiclass confusion matrix. In multiclass classification problems, the F1 score is a better measure of model performance.&amp;lt;/p&amp;gt;
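&amp;lt;p&amp;gt;Once we have class indices for the true and predicted labels (as with &amp;lt;code&amp;gt;np.argmax&amp;lt;/code&amp;gt; above), the F1 score can be computed with scikit-learn; a small sketch with made-up labels for a hypothetical 3-class problem:&amp;lt;/p&amp;gt;

```python
from sklearn.metrics import f1_score

# Hypothetical true and predicted class indices for a 3-class problem
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 1, 2, 1, 1, 0, 2]

# 'macro' averages the per-class F1 scores equally, which is
# informative when the classes are imbalanced
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(round(macro_f1, 3))  # 0.867
```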
&amp;lt;h1&amp;gt;
  &amp;lt;a name="adding-one-more-hidden-layer" href="#adding-one-more-hidden-layer" class="anchor"&amp;gt;
  &amp;lt;/a&amp;gt;
  Adding one more hidden layer
&amp;lt;/h1&amp;gt;

&amp;lt;p&amp;gt;Let's extend our model by adding a hidden layer between the input and output layers and compare its performance with the previous one.&amp;lt;br&amp;gt;
&amp;lt;/p&amp;gt;
&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;# simple NN with one hidden layer
model = Sequential()
model.add(Dense(input_dim=9401, units=800))
model.add(Dense(800))
model.add(Dense(out_shape, activation='softmax'))
model.summary()


Model: "sequential_1"
_________________________________________________________________
 Layer (type) Output Shape Param #   
=================================================================
 dense_1 (Dense) (None, 800) 7521600   

 dense_2 (Dense) (None, 800) 640800    

 dense_3 (Dense) (None, 7) 5607      

=================================================================
Total params: 8,168,007
Trainable params: 8,168,007
Non-trainable params: 0
_________________________________________________________________

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;In the above model everything is the same as before, except that we add an extra hidden layer of 800 units, i.e. 800 neurons; this raises the trainable parameter count from 65,814 to 8,168,007.&amp;lt;br&amp;gt;
&amp;lt;/p&amp;gt;
&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;model.compile(loss='categorical_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;Let's fit the model.&amp;lt;br&amp;gt;
&amp;lt;/p&amp;gt;
&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;history = model.fit(X_train.toarray(), Y_train, epochs=100, batch_size=35, validation_data=(X_test.toarray(), Y_test))


Epoch 1/100
126/126 [==============================] - 1s 8ms/step - loss: 1.0233 - accuracy: 0.6690 - val_loss: 0.7602 - val_accuracy: 0.7658
Epoch 2/100
126/126 [==============================] - 1s 7ms/step - loss: 0.3653 - accuracy: 0.8956 - val_loss: 0.8770 - val_accuracy: 0.7630
Epoch 3/100
126/126 [==============================] - 1s 6ms/step - loss: 0.2748 - accuracy: 0.9143 - val_loss: 0.8179 - val_accuracy: 0.7719
Epoch 4/100
126/126 [==============================] - 1s 6ms/step - loss: 0.2407 - accuracy: 0.9233 - val_loss: 0.8869 - val_accuracy: 0.7616
Epoch 5/100
126/126 [==============================] - 1s 6ms/step - loss: 0.2325 - accuracy: 0.9203 - val_loss: 0.8362 - val_accuracy: 0.7849
Epoch 6/100
126/126 [==============================] - 1s 6ms/step - loss: 0.2135 - accuracy: 0.9230 - val_loss: 0.8728 - val_accuracy: 0.7733
Epoch 7/100
126/126 [==============================] - 1s 6ms/step - loss: 0.2102 - accuracy: 0.9255 - val_loss: 0.8760 - val_accuracy: 0.7788
Epoch 8/100
126/126 [==============================] - 1s 6ms/step - loss: 0.2127 - accuracy: 0.9251 - val_loss: 0.9314 - val_accuracy: 0.7466
Epoch 9/100
126/126 [==============================] - 1s 6ms/step - loss: 0.2107 - accuracy: 0.9230 - val_loss: 1.0136 - val_accuracy: 0.7616
Epoch 10/100
126/126 [==============================] - 1s 8ms/step - loss: 0.2009 - accuracy: 0.9274 - val_loss: 0.9555 - val_accuracy: 0.7692
Epoch 11/100
126/126 [==============================] - 1s 9ms/step - loss: 0.1977 - accuracy: 0.9274 - val_loss: 0.9830 - val_accuracy: 0.7781
Epoch 12/100
126/126 [==============================] - 1s 9ms/step - loss: 0.1908 - accuracy: 0.9278 - val_loss: 0.9244 - val_accuracy: 0.7555
Epoch 13/100
126/126 [==============================] - 1s 7ms/step - loss: 0.1828 - accuracy: 0.9258 - val_loss: 0.9641 - val_accuracy: 0.7548
Epoch 14/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1740 - accuracy: 0.9301 - val_loss: 1.0089 - val_accuracy: 0.7630
Epoch 15/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1811 - accuracy: 0.9301 - val_loss: 0.9290 - val_accuracy: 0.7603
Epoch 16/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1690 - accuracy: 0.9322 - val_loss: 0.9671 - val_accuracy: 0.7493
Epoch 17/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1743 - accuracy: 0.9280 - val_loss: 1.0466 - val_accuracy: 0.7822
Epoch 18/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1785 - accuracy: 0.9294 - val_loss: 1.2220 - val_accuracy: 0.7500
Epoch 19/100
126/126 [==============================] - 1s 6ms/step - loss: 0.2090 - accuracy: 0.9260 - val_loss: 1.1526 - val_accuracy: 0.7514
Epoch 20/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1936 - accuracy: 0.9310 - val_loss: 1.0568 - val_accuracy: 0.7459
Epoch 21/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1746 - accuracy: 0.9278 - val_loss: 1.0345 - val_accuracy: 0.7527
Epoch 22/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1600 - accuracy: 0.9326 - val_loss: 1.0820 - val_accuracy: 0.7740
Epoch 23/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1823 - accuracy: 0.9278 - val_loss: 0.9914 - val_accuracy: 0.7664
Epoch 24/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1770 - accuracy: 0.9331 - val_loss: 1.0346 - val_accuracy: 0.7719
Epoch 25/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1738 - accuracy: 0.9342 - val_loss: 1.0696 - val_accuracy: 0.7678
Epoch 26/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1678 - accuracy: 0.9310 - val_loss: 1.0201 - val_accuracy: 0.7692
Epoch 27/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1551 - accuracy: 0.9340 - val_loss: 1.0547 - val_accuracy: 0.7521
Epoch 28/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1590 - accuracy: 0.9338 - val_loss: 0.9871 - val_accuracy: 0.7699
Epoch 29/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1527 - accuracy: 0.9340 - val_loss: 1.0202 - val_accuracy: 0.7767
Epoch 30/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1560 - accuracy: 0.9290 - val_loss: 1.0242 - val_accuracy: 0.7658
Epoch 31/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1507 - accuracy: 0.9326 - val_loss: 0.9984 - val_accuracy: 0.7781
Epoch 32/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1521 - accuracy: 0.9310 - val_loss: 1.0071 - val_accuracy: 0.7753
Epoch 33/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1542 - accuracy: 0.9322 - val_loss: 1.0931 - val_accuracy: 0.7644
Epoch 34/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1584 - accuracy: 0.9328 - val_loss: 1.0295 - val_accuracy: 0.7760
Epoch 35/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1521 - accuracy: 0.9324 - val_loss: 1.0787 - val_accuracy: 0.7801
Epoch 36/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1527 - accuracy: 0.9370 - val_loss: 1.0878 - val_accuracy: 0.7788
Epoch 37/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1695 - accuracy: 0.9317 - val_loss: 1.1328 - val_accuracy: 0.7685
Epoch 38/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1639 - accuracy: 0.9283 - val_loss: 1.0733 - val_accuracy: 0.7664
Epoch 39/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1525 - accuracy: 0.9322 - val_loss: 1.1290 - val_accuracy: 0.7829
Epoch 40/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1505 - accuracy: 0.9347 - val_loss: 1.0747 - val_accuracy: 0.7781
Epoch 41/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1483 - accuracy: 0.9319 - val_loss: 1.0657 - val_accuracy: 0.7671
Epoch 42/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1482 - accuracy: 0.9335 - val_loss: 1.0446 - val_accuracy: 0.7685
Epoch 43/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1424 - accuracy: 0.9349 - val_loss: 1.0357 - val_accuracy: 0.7863
Epoch 44/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1530 - accuracy: 0.9312 - val_loss: 1.3468 - val_accuracy: 0.7521
Epoch 45/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1688 - accuracy: 0.9296 - val_loss: 1.2449 - val_accuracy: 0.7589
Epoch 46/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1566 - accuracy: 0.9319 - val_loss: 1.1356 - val_accuracy: 0.7678
Epoch 47/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1547 - accuracy: 0.9322 - val_loss: 1.0903 - val_accuracy: 0.7644
Epoch 48/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1478 - accuracy: 0.9303 - val_loss: 1.0850 - val_accuracy: 0.7637
Epoch 49/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1443 - accuracy: 0.9335 - val_loss: 1.0880 - val_accuracy: 0.7562
Epoch 50/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1442 - accuracy: 0.9312 - val_loss: 1.0931 - val_accuracy: 0.7815
Epoch 51/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1503 - accuracy: 0.9349 - val_loss: 1.0529 - val_accuracy: 0.7630
Epoch 52/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1448 - accuracy: 0.9324 - val_loss: 1.0910 - val_accuracy: 0.7658
Epoch 53/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1444 - accuracy: 0.9290 - val_loss: 1.0953 - val_accuracy: 0.7692
Epoch 54/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1427 - accuracy: 0.9335 - val_loss: 1.0905 - val_accuracy: 0.7726
Epoch 55/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1402 - accuracy: 0.9358 - val_loss: 1.1630 - val_accuracy: 0.7664
Epoch 56/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1372 - accuracy: 0.9358 - val_loss: 1.2109 - val_accuracy: 0.7678
Epoch 57/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1468 - accuracy: 0.9322 - val_loss: 1.1316 - val_accuracy: 0.7568
Epoch 58/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1442 - accuracy: 0.9335 - val_loss: 1.1098 - val_accuracy: 0.7582
Epoch 59/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1422 - accuracy: 0.9335 - val_loss: 1.1099 - val_accuracy: 0.7582
Epoch 60/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1409 - accuracy: 0.9340 - val_loss: 1.0642 - val_accuracy: 0.7719
Epoch 61/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1413 - accuracy: 0.9356 - val_loss: 1.1239 - val_accuracy: 0.7678
Epoch 62/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1424 - accuracy: 0.9338 - val_loss: 1.1617 - val_accuracy: 0.7637
Epoch 63/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1447 - accuracy: 0.9306 - val_loss: 1.0902 - val_accuracy: 0.7603
Epoch 64/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1515 - accuracy: 0.9342 - val_loss: 1.1615 - val_accuracy: 0.7630
Epoch 65/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1452 - accuracy: 0.9340 - val_loss: 1.1811 - val_accuracy: 0.7603
Epoch 66/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1404 - accuracy: 0.9338 - val_loss: 1.1923 - val_accuracy: 0.7678
Epoch 67/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1387 - accuracy: 0.9372 - val_loss: 1.1625 - val_accuracy: 0.7671
Epoch 68/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1384 - accuracy: 0.9328 - val_loss: 1.1772 - val_accuracy: 0.7678
Epoch 69/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1445 - accuracy: 0.9333 - val_loss: 1.1646 - val_accuracy: 0.7719
Epoch 70/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1409 - accuracy: 0.9317 - val_loss: 1.1859 - val_accuracy: 0.7733
Epoch 71/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1363 - accuracy: 0.9340 - val_loss: 1.1493 - val_accuracy: 0.7767
Epoch 72/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1413 - accuracy: 0.9347 - val_loss: 1.1126 - val_accuracy: 0.7705
Epoch 73/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1395 - accuracy: 0.9335 - val_loss: 1.1422 - val_accuracy: 0.7637
Epoch 74/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1485 - accuracy: 0.9358 - val_loss: 1.4028 - val_accuracy: 0.7541
Epoch 75/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1518 - accuracy: 0.9354 - val_loss: 1.4361 - val_accuracy: 0.7726
Epoch 76/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1571 - accuracy: 0.9335 - val_loss: 1.3987 - val_accuracy: 0.7589
Epoch 77/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1570 - accuracy: 0.9347 - val_loss: 1.3393 - val_accuracy: 0.7637
Epoch 78/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1512 - accuracy: 0.9347 - val_loss: 1.3479 - val_accuracy: 0.7623
Epoch 79/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1458 - accuracy: 0.9331 - val_loss: 1.3049 - val_accuracy: 0.7562
Epoch 80/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1479 - accuracy: 0.9340 - val_loss: 1.3393 - val_accuracy: 0.7644
Epoch 81/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1394 - accuracy: 0.9386 - val_loss: 1.2484 - val_accuracy: 0.7671
Epoch 82/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1391 - accuracy: 0.9370 - val_loss: 1.2412 - val_accuracy: 0.7644
Epoch 83/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1368 - accuracy: 0.9365 - val_loss: 1.2545 - val_accuracy: 0.7664
Epoch 84/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1427 - accuracy: 0.9340 - val_loss: 1.2018 - val_accuracy: 0.7603
Epoch 85/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1325 - accuracy: 0.9372 - val_loss: 1.2868 - val_accuracy: 0.7651
Epoch 86/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1362 - accuracy: 0.9349 - val_loss: 1.2306 - val_accuracy: 0.7658
Epoch 87/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1338 - accuracy: 0.9333 - val_loss: 1.2871 - val_accuracy: 0.7616
Epoch 88/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1367 - accuracy: 0.9331 - val_loss: 1.2153 - val_accuracy: 0.7836
Epoch 89/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1351 - accuracy: 0.9363 - val_loss: 1.2591 - val_accuracy: 0.7699
Epoch 90/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1315 - accuracy: 0.9349 - val_loss: 1.2360 - val_accuracy: 0.7616
Epoch 91/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1317 - accuracy: 0.9372 - val_loss: 1.3478 - val_accuracy: 0.7534
Epoch 92/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1383 - accuracy: 0.9317 - val_loss: 1.2972 - val_accuracy: 0.7685
Epoch 93/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1365 - accuracy: 0.9349 - val_loss: 1.2775 - val_accuracy: 0.7562
Epoch 94/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1374 - accuracy: 0.9406 - val_loss: 1.2316 - val_accuracy: 0.7644
Epoch 95/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1352 - accuracy: 0.9338 - val_loss: 1.2494 - val_accuracy: 0.7630
Epoch 96/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1555 - accuracy: 0.9370 - val_loss: 1.3208 - val_accuracy: 0.7548
Epoch 97/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1346 - accuracy: 0.9360 - val_loss: 1.2599 - val_accuracy: 0.7651
Epoch 98/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1382 - accuracy: 0.9363 - val_loss: 1.2476 - val_accuracy: 0.7616
Epoch 99/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1324 - accuracy: 0.9344 - val_loss: 1.2851 - val_accuracy: 0.7637
Epoch 100/100
126/126 [==============================] - 1s 6ms/step - loss: 0.1432 - accuracy: 0.9358 - val_loss: 1.2710 - val_accuracy: 0.7664


performance_model(history, model, X_test, Y_test, classes)

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;img src="https://iamdurga.github.io/assets/simple_nn/output_41_0.png" alt="image"&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;img src="https://iamdurga.github.io/assets/simple_nn/output_41_1.png" alt="image"&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;Compared to the previous model, the validation metrics fluctuate, which may be due to overfitting or possibly underfitting. Let’s add some dropout. Dropout is used to regularize our model in case of overfitting.&amp;lt;br&amp;gt;
&amp;lt;/p&amp;gt;
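Before wiring Dropout into the model below, it helps to see what the layer actually does: during training a random fraction `rate` of activations is zeroed, and the survivors are rescaled by `1/(1 - rate)` so the expected activation is unchanged (inverted dropout). This is a minimal numpy sketch for illustration, not Keras's internal implementation:

```python
import numpy as np

def dropout(x, rate=0.5, rng=None):
    """Inverted dropout: zero a fraction `rate` of activations at
    training time and rescale the survivors to keep the expected value."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate   # keep each unit with prob 1 - rate
    return x * mask / (1.0 - rate)

x = np.ones((4, 800))        # a batch of activations, same width as the layers below
y = dropout(x, rate=0.5)
print(y.shape)               # shape is unchanged; values are now 0.0 or 2.0
```

At inference time no units are dropped; the rescaling during training is what makes that consistent.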
&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;# simple NN
model = Sequential()
model.add(Dense(input_dim=in_shape, units=800))
model.add(Dense(800))
model.add(Dropout(0.5))
model.add(Dense(out_shape, activation='softmax'))

model.summary()


Model: "sequential_2"
_________________________________________________________________
 Layer (type) Output Shape Param #   
=================================================================
 dense_4 (Dense) (None, 800) 7521600   

 dense_5 (Dense) (None, 800) 640800    

 dropout (Dropout) (None, 800) 0         

 dense_6 (Dense) (None, 7) 5607      

=================================================================
Total params: 8,168,007
Trainable params: 8,168,007
Non-trainable params: 0
_________________________________________________________________


model.compile(loss='categorical_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])


history = model.fit(X_train.toarray(), Y_train, epochs=10, batch_size=32, validation_data=(X_test.toarray(), Y_test))


Epoch 1/10
137/137 [==============================] - 2s 8ms/step - loss: 1.0331 - accuracy: 0.6592 - val_loss: 0.7495 - val_accuracy: 0.7630
Epoch 2/10
137/137 [==============================] - 1s 6ms/step - loss: 0.3937 - accuracy: 0.8856 - val_loss: 0.8275 - val_accuracy: 0.7719
Epoch 3/10
137/137 [==============================] - 1s 6ms/step - loss: 0.3173 - accuracy: 0.9100 - val_loss: 0.8168 - val_accuracy: 0.7747
Epoch 4/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2610 - accuracy: 0.9223 - val_loss: 0.8826 - val_accuracy: 0.7877
Epoch 5/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2415 - accuracy: 0.9258 - val_loss: 0.8702 - val_accuracy: 0.7740
Epoch 6/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2325 - accuracy: 0.9267 - val_loss: 0.9430 - val_accuracy: 0.7630
Epoch 7/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2360 - accuracy: 0.9212 - val_loss: 0.9362 - val_accuracy: 0.7781
Epoch 8/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2266 - accuracy: 0.9233 - val_loss: 0.9364 - val_accuracy: 0.7815
Epoch 9/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2180 - accuracy: 0.9246 - val_loss: 0.8815 - val_accuracy: 0.7747
Epoch 10/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2200 - accuracy: 0.9269 - val_loss: 0.8999 - val_accuracy: 0.7822


performance_model(history, model, X_test, Y_test, classes)

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;img src="https://iamdurga.github.io/assets/simple_nn/output_46_0.png" alt="image"&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;img src="https://iamdurga.github.io/assets/simple_nn/output_46_1.png" alt="image"&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;After adding &amp;lt;code&amp;gt;Dropout&amp;lt;/code&amp;gt;, our model does pretty well. In the block of code above we also reduced the number of epochs to get better performance from our model.&amp;lt;/p&amp;gt;
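Manually cutting the number of epochs approximates early stopping: train once, then keep the epoch with the lowest validation loss. A minimal sketch over a Keras-style `history.history` dict (the numbers here are made up and shortened for illustration):

```python
# history.history maps metric names to one value per epoch (Keras layout);
# these values are hypothetical, not taken from the training runs above.
history = {
    "loss":     [1.03, 0.39, 0.32, 0.26, 0.24],
    "val_loss": [0.75, 0.83, 0.82, 0.88, 0.87],
}

# Index of the epoch with the lowest validation loss.
best_epoch = min(range(len(history["val_loss"])),
                 key=history["val_loss"].__getitem__)
print(best_epoch + 1)  # Keras logs number epochs from 1
```

Keras can also automate this with the `EarlyStopping` callback (e.g. `restore_best_weights=True`), which monitors `val_loss` during `fit` instead of post-hoc.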
&amp;lt;h1&amp;gt;
  &amp;lt;a name="lets-add-few-more-layer-dropout-and-observe-the-result" href="#lets-add-few-more-layer-dropout-and-observe-the-result" class="anchor"&amp;gt;
  &amp;lt;/a&amp;gt;
  Let’s Add few more Layer, Dropout and Observe the result
&amp;lt;/h1&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;div class="highlight"&amp;gt;&amp;lt;pre class="highlight plaintext"&amp;gt;&amp;lt;code&amp;gt;# simple NN
model = Sequential()
model.add(Dense(input_dim=in_shape, units=800))
model.add(Dense(800))
model.add(Dropout(0.2))
model.add(Dense(400))
#model.add(Dropout(0.2))
model.add(Dense(out_shape, activation='softmax'))

model.summary()


Model: "sequential_4"
_________________________________________________________________
 Layer (type) Output Shape Param #   
=================================================================
 dense_11 (Dense) (None, 800) 7521600   

 dense_12 (Dense) (None, 800) 640800    

 dropout_2 (Dropout) (None, 800) 0         

 dense_13 (Dense) (None, 400) 320400    

 dense_14 (Dense) (None, 7) 2807      

=================================================================
Total params: 8,485,607
Trainable params: 8,485,607
Non-trainable params: 0
_________________________________________________________________


model.compile(loss='categorical_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])


history = model.fit(X_train.toarray(), Y_train, epochs=10, batch_size=32, validation_data=(X_test.toarray(), Y_test))


Epoch 1/10
137/137 [==============================] - 2s 9ms/step - loss: 1.0494 - accuracy: 0.6585 - val_loss: 0.7984 - val_accuracy: 0.7610
Epoch 2/10
137/137 [==============================] - 1s 6ms/step - loss: 0.4105 - accuracy: 0.8862 - val_loss: 0.9538 - val_accuracy: 0.7534
Epoch 3/10
137/137 [==============================] - 1s 6ms/step - loss: 0.3477 - accuracy: 0.9091 - val_loss: 0.9417 - val_accuracy: 0.7445
Epoch 4/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2813 - accuracy: 0.9175 - val_loss: 0.9003 - val_accuracy: 0.7781
Epoch 5/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2623 - accuracy: 0.9217 - val_loss: 0.8869 - val_accuracy: 0.7767
Epoch 6/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2364 - accuracy: 0.9219 - val_loss: 0.8637 - val_accuracy: 0.7760
Epoch 7/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2442 - accuracy: 0.9221 - val_loss: 0.9527 - val_accuracy: 0.7932
Epoch 8/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2369 - accuracy: 0.9203 - val_loss: 0.9556 - val_accuracy: 0.7733
Epoch 9/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2210 - accuracy: 0.9233 - val_loss: 0.9772 - val_accuracy: 0.7808
Epoch 10/10
137/137 [==============================] - 1s 6ms/step - loss: 0.2238 - accuracy: 0.9258 - val_loss: 0.9673 - val_accuracy: 0.7705


performance_model(history, model, X_test, Y_test, classes)

&amp;lt;/code&amp;gt;&amp;lt;/pre&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;img src="https://iamdurga.github.io/assets/simple_nn/output_51_0.png" alt="image"&amp;gt;&amp;lt;/p&amp;gt;

&amp;lt;p&amp;gt;&amp;lt;img src="https://iamdurga.github.io/assets/simple_nn/output_51_1.png" alt="image"&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;h1&amp;gt;
  &amp;lt;a name="conclusion" href="#conclusion" class="anchor"&amp;gt;
  &amp;lt;/a&amp;gt;
  Conclusion:
&amp;lt;/h1&amp;gt;

&amp;lt;p&amp;gt;The graph above shows that our training accuracy and validation accuracy are both increasing significantly, but there is a significant gap between the two. This may happen for the following reasons:&amp;lt;/p&amp;gt;

&amp;lt;ul&amp;gt;
&amp;lt;li&amp;gt;A class imbalance&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Underfitting&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Overfitting&amp;lt;/li&amp;gt;
&amp;lt;li&amp;gt;Imprecise preprocessing&amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;

&amp;lt;p&amp;gt;Hence we can improve our model and get better performance by adding more training data and by better preprocessing (we currently use only a limited set of stop words).&amp;lt;/p&amp;gt;
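One of the listed causes, class imbalance, is often countered by weighting the loss per class; Keras's `model.fit` accepts a `class_weight` dict. A common heuristic (sketched here with made-up per-class counts) weights each class inversely to its frequency:

```python
# Hypothetical sample counts for a 7-class problem (not the real dataset).
counts = {0: 1200, 1: 900, 2: 700, 3: 500, 4: 400, 5: 480, 6: 200}

total = sum(counts.values())
n_classes = len(counts)
# weight = total / (n_classes * count): rare classes get larger weights,
# so the weighted class totals are equalized in the loss.
class_weight = {c: total / (n_classes * n) for c, n in counts.items()}

print(class_weight[6] > class_weight[0])  # the rarest class is weighted most
```

The resulting dict would then be passed as `model.fit(..., class_weight=class_weight)`.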
&lt;/p&gt;

</description>
    </item>
    <item>
      <title>R Exercise: Social Network Analysis</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Fri, 02 Sep 2022 03:06:03 +0000</pubDate>
      <link>https://dev.to/iamdurga/r-exercise-social-network-analysis-2inh</link>
      <guid>https://dev.to/iamdurga/r-exercise-social-network-analysis-2inh</guid>
      <description>&lt;h1&gt;
  
  
  Social Network Analysis
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Definition
&lt;/h2&gt;

&lt;p&gt;Social networks are simply networks of social interactions and personal relationships. Think about our group of friends and how we got to know them. Maybe we met them a while ago at school, or perhaps through a hobby or our community. In fact, 72% of all Internet users are active on social media today, taking part in social interactions and developing personal relationships. However, to understand social networks we do not need to look only at the Internet or social media; they come in a variety of forms in our daily lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Social Network Analysis
&lt;/h2&gt;

&lt;p&gt;Social network analysis (SNA), also known as network science, is a field of data analytics that uses networks and graph theory to understand social structures. We can see networks all around us, such as road networks, online networks, and social media networks like Facebook and Twitter. Learning SNA allows us to explore insights from various data sources.&lt;/p&gt;

&lt;h2&gt;
  
  
  SNA Graph
&lt;/h2&gt;

&lt;p&gt;We are all familiar with graphs: a graph is simply a nonempty set of vertices together with a set of edges. To build SNA graphs, two key components are required: actors and relationships. Here an actor represents a vertex, and a relationship is an edge between two actors. Let us draw an SNA graph in R. To do this, we should have the &lt;code&gt;igraph&lt;/code&gt; package already installed in R or RStudio.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(igraph)


## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:stats':
## 
## decompose, spectrum

## The following object is masked from 'package:base':
## 
## union


g &amp;lt;- graph(c(1,2))
plot(g)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-1-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-1-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the figure, we can see a directed edge from node 1 to node 2, which makes it clear that by default the graph is directed. Since we cannot clearly see the node and edge, let’s increase their size and give the node a different color.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(g,
vertex.color = 'green',
vertex.size = 40,
edge.color ='red',
edge.size = 20)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-2-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-2-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have changed the node color and size, let’s add more nodes to the graph with the following code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;g &amp;lt;- graph(c(1,2,2,3,3,4,4,1))
plot(g,
     vertex.color = 'green',
     vertex.size =40,
     edge.color = 'red',
     edge.size = 20)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-3-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-3-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We got a graph with the four vertices 1, 2, 3, 4, and it is again directed. Let’s make it undirected; for this we need to set directed to FALSE.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;g &amp;lt;- graph(c(1,2,2,3,3,4,4,1),directed = FALSE)
plot(g,
     vertex.color = 'green',
     vertex.size =40,
     edge.color = 'red',
     edge.size = 20)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-4-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-4-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We got the type of graph we wanted. Now let’s move forward. We can specify the number of vertices without writing them all out; see the following code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;g &amp;lt;- graph(c(1,2,2,3,3,4,4,1), 
directed = F, n=7)
plot(g,
     vertex.color = 'green',
     vertex.size =40,
     edge.color = 'red',
     edge.size = 20)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-5-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-5-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the graph above we asked for seven vertices. Among them we can see three isolated nodes; they are isolated because we did not specify any edges or relationships for them. The graph also shows that it is not directed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adjacency Matrix
&lt;/h2&gt;

&lt;p&gt;Let’s see what will happen if we type only &lt;code&gt;g[]&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;g[]


## 7 x 7 sparse Matrix of class "dgCMatrix"
##                   
## [1,] . 1 . 1 . . .
## [2,] 1 . 1 . . . .
## [3,] . 1 . 1 . . .
## [4,] 1 . 1 . . . .
## [5,] . . . . . . .
## [6,] . . . . . . .
## [7,] . . . . . . .

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us the 7 by 7 adjacency matrix. An adjacency matrix has a 1 wherever there is an edge between two vertices and a 0 otherwise. In the sparse matrix above, zeros are simply printed as &lt;code&gt;.&lt;/code&gt;.&lt;/p&gt;
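The same construction can be cross-checked outside igraph. Here is a small, purely illustrative Python sketch that builds the 0/1 adjacency matrix for the undirected edge list used above, padded to 7 vertices:

```python
def adjacency(edges, n):
    """Build an n x n 0/1 adjacency matrix from undirected edge pairs
    (1-based vertex labels, matching the igraph call above)."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1
        A[v - 1][u - 1] = 1   # undirected: the matrix is symmetric
    return A

A = adjacency([(1, 2), (2, 3), (3, 4), (4, 1)], n=7)
print(A[0])  # [0, 1, 0, 1, 0, 0, 0] -- vertex 1 is linked to vertices 2 and 4
```

Rows 5 to 7 are all zeros, matching the three isolated nodes seen earlier.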

&lt;p&gt;Let us try to build our graph using text labels in place of numbers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;g1 &amp;lt;-
graph(c("Binu","Binita","Binita","Rita"
,"Rita","Binu","Binu","Rita", "Anju", 
"Binita"))
plot(g1,
vertex.color = "green",
vertex.size = 40,
edge.color = "red",
edge.size = 5)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-7-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-7-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we want to check the features of g1, we simply type g1 and press Ctrl+Enter. We get the following output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;g1


## IGRAPH d030509 DN-- 4 5 -- 
## + attr: name (v/c)
## + edges from d030509 (vertex names):
## [1] Binu -&amp;gt;Binita Binita-&amp;gt;Rita Rita -&amp;gt;Binu Binu -&amp;gt;Rita Anju -&amp;gt;Binita

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It shows that the graph has 4 nodes and 5 edges, with the edges directed as &lt;code&gt;Binu -&amp;gt;Binita, Binita-&amp;gt;Rita, Rita -&amp;gt;Binu, Binu -&amp;gt;Rita, Anju -&amp;gt;Binita&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Degree
&lt;/h2&gt;

&lt;p&gt;Degree is the number of connections each node has. Let’s check the degrees in graph g1. To do so we can call &lt;code&gt;degree(g1)&lt;/code&gt; or &lt;code&gt;degree(g1, mode='all')&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;degree(g1) 


## Binu Binita Rita Anju 
## 3 3 3 1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The degree of Binu is 3; similarly, Anju has degree 1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;degree(g1, mode='all')


## Binu Binita Rita Anju 
## 3 3 3 1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Diameter
&lt;/h2&gt;

&lt;p&gt;The diameter is the length of the longest shortest path between any two nodes in the network. Now, let’s check the diameter of the graph.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;diameter(g1, directed = F, weights = 
NA)


## [1] 2

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We got a diameter of 2, realized for example by the paths &lt;code&gt;Anju &amp;lt;- Binita &amp;lt;- Rita&lt;/code&gt; and &lt;code&gt;Anju &amp;lt;- Binita &amp;lt;- Binu&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Edge density
&lt;/h2&gt;

&lt;p&gt;Edge density is the ratio of actual edges to possible edges: &lt;code&gt;ecount(g1)/(vcount(g1)*(vcount(g1) -1))&lt;/code&gt; for a directed graph. We can calculate it with the following code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;edge_density(g1, loops = F)


## [1] 0.4166667

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We got the value of edge density 0.4166667.&lt;/p&gt;
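We can verify this number by hand from the formula: g1 has 5 directed edges out of 4 × 3 = 12 possible ordered pairs of distinct vertices. A quick arithmetic check (plain Python, not igraph):

```python
ecount, vcount = 5, 4                       # edges and vertices of g1
density = ecount / (vcount * (vcount - 1))  # directed graph: n*(n-1) possible edges
print(round(density, 7))                    # 0.4166667, matching edge_density(g1)
```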

&lt;h2&gt;
  
  
  Reciprocity
&lt;/h2&gt;

&lt;p&gt;Reciprocity is the proportion of directed edges that are reciprocated, i.e. part of a mutual pair.&lt;/p&gt;

&lt;p&gt;Total edges = 5&lt;/p&gt;

&lt;p&gt;Tied edges = 2&lt;/p&gt;

&lt;p&gt;Reciprocity = 2/5 = 0.4&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;reciprocity(g1)


## [1] 0.4

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closeness
&lt;/h2&gt;

&lt;p&gt;Closeness measures how near a node is to all other nodes. Let’s now get the closeness of the graph.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;closeness(g1, mode = "all", weights = NA)


## Binu Binita Rita Anju 
## 0.2500000 0.3333333 0.2500000 0.2000000

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the result above we can see that Binita is closest to the other three persons and Anju is farthest from the other three.&lt;/p&gt;

&lt;h2&gt;
  
  
  Betweenness
&lt;/h2&gt;

&lt;p&gt;Betweenness counts how often a node lies on the shortest paths between other nodes. Let’s calculate the betweenness of g1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;betweenness(g1, directed = T, weights = NA)


## Binu Binita Rita Anju 
## 1 2 2 0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Binita and Rita each lie on 2 shortest paths between other nodes; similarly, Binu lies on 1 and Anju on none.&lt;/p&gt;

&lt;h2&gt;
  
  
  Edge Betweenness
&lt;/h2&gt;

&lt;p&gt;For every pair of vertices in a connected graph, there exists at least one shortest path between them; the edge betweenness of an edge is the number of such shortest paths that pass through it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;edge_betweenness(g1, directed = T, weights = NA)


## [1] 2 4 4 1 3

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  SNA in TwitterData
&lt;/h2&gt;

&lt;p&gt;Here I have loaded the &lt;code&gt;Twitterdata&lt;/code&gt; from my local machine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;load("F:/MDS R/termDocMatrix.rdata")


m&amp;lt;- as.matrix(termDocMatrix)
termM &amp;lt;- m %*% t(m)
termM[1:10,1:10]


## Terms
## Terms analysis applications code computing data examples introduction
## analysis 23 0 1 0 4 4 2
## applications 0 9 0 0 8 0 0
## code 1 0 9 0 1 6 0
## computing 0 0 0 10 2 0 0
## data 4 8 1 2 85 5 3
## examples 4 0 6 0 5 17 2
## introduction 2 0 0 0 3 2 10
## mining 4 7 3 1 50 5 3
## network 12 0 1 0 0 2 2
## package 2 1 0 2 12 2 0
## Terms
## Terms mining network package
## analysis 4 12 2
## applications 7 0 1
## code 3 1 0
## computing 1 0 2
## data 50 0 12
## examples 5 2 2
## introduction 3 2 0
## mining 64 1 6
## network 1 17 1
## package 6 1 27

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we have built a term-term adjacency matrix, where the rows and columns represent terms, and every entry is the number of co-occurrences of two terms. Next we can build a graph with &lt;code&gt;graph.adjacency()&lt;/code&gt; from the igraph package.&lt;br&gt;
&lt;/p&gt;
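The `m %*% t(m)` step generalizes: if `m` is a term-document count matrix, then `m @ m.T` gives a term-term matrix whose off-diagonal entries count weighted co-occurrences across shared documents, and whose diagonal entries are each term's counts squared and summed over documents. A tiny numpy cross-check with a made-up 3-term, 2-document matrix (not the real termDocMatrix):

```python
import numpy as np

# Rows = terms, columns = documents (hypothetical toy counts).
m = np.array([[1, 0],   # term A appears once in doc 1
              [1, 1],   # term B appears once in each doc
              [0, 2]])  # term C appears twice in doc 2

termM = m @ m.T         # same as m %*% t(m) in R
print(termM)
# [[1 1 0]
#  [1 2 2]
#  [0 2 4]]
```

For example, A and B co-occur only in doc 1 (entry 1), while A and C share no document (entry 0).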

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;g &amp;lt;- graph.adjacency(termM,weighted = T,mode = 'undirected')
g


## IGRAPH d066e18 UNW- 21 151 -- 
## + attr: name (v/c), weight (e/n)
## + edges from d066e18 (vertex names):
## [1] analysis --analysis analysis --code        
## [3] analysis --data analysis --examples    
## [5] analysis --introduction analysis --mining      
## [7] analysis --network analysis --package     
## [9] analysis --positions analysis --postdoctoral
## [11] analysis --r analysis --research    
## [13] analysis --series analysis --slides      
## [15] analysis --social analysis --time        
## + ... omitted several edges

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we have built a graph on &lt;code&gt;termDocMatrix&lt;/code&gt;. In the result we can see the edges between different nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;g &amp;lt;- simplify(g)
g


## IGRAPH d06a87e UNW- 21 130 -- 
## + attr: name (v/c), weight (e/n)
## + edges from d06a87e (vertex names):
## [1] analysis --code analysis --data        
## [3] analysis --examples analysis --introduction
## [5] analysis --mining analysis --network     
## [7] analysis --package analysis --positions   
## [9] analysis --postdoctoral analysis --r           
## [11] analysis --research analysis --series      
## [13] analysis --slides analysis --social      
## [15] analysis --time analysis --tutorial    
## + ... omitted several edges

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function &lt;code&gt;simplify()&lt;/code&gt; in igraph handily removes self-loops from a network. The previous graph had 151 edges, while the simplified graph has only 130; hence the 21 self-loops were omitted from the graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Check the degree of each node of the graph
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;V(g)$label &amp;lt;- V(g)$name
V(g)$label


## [1] "analysis" "applications" "code" "computing" "data"        
## [6] "examples" "introduction" "mining" "network" "package"     
## [11] "parallel" "positions" "postdoctoral" "r" "research"    
## [16] "series" "slides" "social" "time" "tutorial"    
## [21] "users"


V(g)$degree &amp;lt;- degree(g)
V(g)$degree


## [1] 17 6 9 9 18 14 12 20 14 13 8 7 8 17 9 11 15 11 11 16 15

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The maximum node degree in the graph is 20.&lt;/p&gt;

&lt;h2&gt;
  
  
  Histogram on the basis of degree
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hist(V(g)$degree, breaks = 100,col = 'green', main = "Frequency Of Degree", 
     xlab = " Degree of vertices", ylab = " Frequency")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-22-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-22-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that the most common degrees are 9 and 11. Recall that the degree of a node is the number of edges incident to it.&lt;/p&gt;
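<p>Counting degrees needs nothing graph-specific: each edge adds one to the degree of both endpoints. A minimal Python sketch over a hypothetical edge list:</p>

```python
from collections import Counter

# Hypothetical undirected edge list (no self-loops).
edges = [("r", "mining"), ("r", "analysis"),
         ("mining", "analysis"), ("mining", "network")]

degree = Counter()
for a, b in edges:
    degree[a] += 1  # each edge contributes one to both endpoints
    degree[b] += 1

print(dict(degree))  # "mining" has the highest degree, 3
```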

&lt;h2&gt;
  
  
  Let’s Plot Graph on the Data.
&lt;/h2&gt;

&lt;p&gt;Let’s call &lt;code&gt;set.seed(3952)&lt;/code&gt;, which makes the random layout reproducible: the same seed always produces the same result.&lt;br&gt;
&lt;/p&gt;
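<p>The same reproducibility idea exists in any language: a fixed seed makes the pseudo-random stream, and hence the layout, repeatable. A quick Python sketch (purely illustrative):</p>

```python
import random

def pseudo_layout(seed, n=3):
    # A fixed seed yields the same "random" coordinates every time.
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

assert pseudo_layout(3952) == pseudo_layout(3952)  # reproducible
```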

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set.seed(3952)

layout1 &amp;lt;- layout.fruchterman.reingold(g)

plot(g, layout=layout1)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-23-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-23-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A different layout can be generated with the code below. An interactive plot, which allows us to manually rearrange the layout, can be produced with igraph’s &lt;code&gt;tkplot()&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(g, layout=layout.kamada.kawai)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-24-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-24-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Make it better.
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;V(g)$label.cex &amp;lt;- 2.2 * V(g)$degree / max(V(g)$degree)+ .2

V(g)$label.color &amp;lt;- rgb(0, 0, .2, .8)

V(g)$frame.color &amp;lt;- NA

egam &amp;lt;- (log(E(g)$weight)+.4) / max(log(E(g)$weight)+.4)

E(g)$color &amp;lt;- rgb(.5, .5, 0, egam)

E(g)$width &amp;lt;- egam
# plot the graph in layout1

plot(g, layout=layout1)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-25-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-25-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here each word is sized according to its degree. From the graph we can clearly see that the node &lt;code&gt;mining&lt;/code&gt; has the maximum degree.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community detection
&lt;/h2&gt;

&lt;p&gt;Communities are known as groups, clusters, coherent subgroups, or modules in different fields. Community detection in a social network means identifying sets of nodes such that nodes within a set are more densely connected to each other than to the rest of the network.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;comm &amp;lt;- cluster_edge_betweenness(g)


## Warning in cluster_edge_betweenness(g): At community.c:461 :Membership vector
## will be selected based on the lowest modularity score.

## Warning in cluster_edge_betweenness(g): At community.c:468 :Modularity
## calculation with weighted edge betweenness community detection might not make
## sense -- modularity treats edge weights as similarities while edge betwenness
## treats them as distances


plot(comm,g)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-26-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-26-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Connections are dense within each community and sparse between communities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prop &amp;lt;- cluster_label_prop(g)
plot(prop, g)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-27-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-27-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Label propagation is a different community-detection algorithm, and it produces a different grouping from the previous one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hubs
&lt;/h2&gt;

&lt;p&gt;Hubs are nodes with many outgoing edges. We compute a hub score for each node.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hs &amp;lt;- hub_score(g,weights = NA)$vector
hs


## analysis applications code computing data examples 
## 0.9047777 0.3589289 0.5606314 0.5223206 0.9065063 0.8195307 
## introduction mining network package parallel positions 
## 0.7307838 1.0000000 0.7483791 0.7210610 0.4939614 0.3733995 
## postdoctoral r research series slides social 
## 0.4095660 0.9147530 0.4481802 0.6761093 0.8510808 0.6018664 
## time tutorial users 
## 0.6761093 0.8899001 0.8342594

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Authority
&lt;/h2&gt;

&lt;p&gt;Authorities are nodes with many incoming edges. We compute an authority score for each node.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;as &amp;lt;- authority_score(g, weights = NA)$vector
as


## analysis applications code computing data examples 
## 0.9047777 0.3589289 0.5606314 0.5223206 0.9065063 0.8195307 
## introduction mining network package parallel positions 
## 0.7307838 1.0000000 0.7483791 0.7210610 0.4939614 0.3733995 
## postdoctoral r research series slides social 
## 0.4095660 0.9147530 0.4481802 0.6761093 0.8510808 0.6018664 
## time tutorial users 
## 0.6761093 0.8899001 0.8342594

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Hub in Plot
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;par(mfrow = c(1,2))
plot(g,vertex.size= hs*50, main = "Hubs",
     vertex.label = NA,
     vertex.color = rainbow(50))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-30-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-30-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Authority in Plot
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(g,vertex.size= as*30, main = "Authorities",
     vertex.label = NA,
     vertex.color = rainbow(50))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-31-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fsna%2Funnamed-chunk-31-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hubs are expected to have a large number of outgoing links, and authorities are expected to receive a large number of incoming links from hubs.&lt;/p&gt;
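<p>The hub and authority scores above come from the HITS algorithm: a node’s authority score sums the hub scores of the nodes pointing to it, and its hub score sums the authority scores of the nodes it points to, iterated until the scores stabilize. A minimal power-iteration sketch on a small hypothetical directed graph (not the article’s term graph, which is undirected; that is why its hub and authority scores coincide):</p>

```python
# Hypothetical directed graph as out-adjacency lists.
out_edges = {"a": ["c", "d"], "b": ["c"], "c": ["d"], "d": []}
nodes = list(out_edges)

hub = {n: 1.0 for n in nodes}
auth = {n: 1.0 for n in nodes}

for _ in range(50):
    # Authority: sum of hub scores of the nodes that point at you.
    auth = {n: sum(hub[m] for m in nodes if n in out_edges[m]) for n in nodes}
    # Hub: sum of authority scores of the nodes you point to.
    hub = {n: sum(auth[m] for m in out_edges[n]) for n in nodes}
    # Scale so the best score is 1, matching igraph's convention.
    auth = {n: v / max(auth.values()) for n, v in auth.items()}
    hub = {n: v / max(hub.values()) for n, v in hub.items()}

print(hub)   # "a" points at the most authoritative nodes: top hub
print(auth)  # "c" and "d" receive links from the best hubs: top authorities
```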

&lt;h2&gt;
  
  
  Application of SNA in Real World
&lt;/h2&gt;

&lt;p&gt;Social network analysis can provide information about the reach of gangs, their impact, and their activity. The approach may also allow us to identify those who may be at risk of gang association or of being exploited by gangs.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>R Exercise: Association Rule Mining in R</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Fri, 02 Sep 2022 03:04:45 +0000</pubDate>
      <link>https://dev.to/iamdurga/r-exercise-association-rule-mining-in-r-3lpb</link>
      <guid>https://dev.to/iamdurga/r-exercise-association-rule-mining-in-r-3lpb</guid>
      <description>&lt;h1&gt;
  
  
  Association Rule Mining
&lt;/h1&gt;

&lt;p&gt;Association rule mining (also known as Association Rule Learning) is a common technique for determining relationships (co-occurrence) between many variables in massive transactional databases. It is mostly used in grocery stores, e-commerce websites, and other similar establishments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon knows what else you want to buy when you order something on their site.&lt;/strong&gt; This is a very prevalent example in our daily life.&lt;/p&gt;

&lt;p&gt;Spotify works on the same principle: they know what song you want to listen to next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use of Association Mining Results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Changing the store layout according to trends&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Customer behavior analysis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Catalogue design&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cross-marketing in online stores&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Identifying the trending items customers buy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Customized email with add-on sales&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Association Mining is used?
&lt;/h2&gt;

&lt;p&gt;We use association rule mining when we wish to find associations between different objects in a collection, or frequent patterns in a transaction database, a relational database, or any other information repository. In retailing, association rule mining is best known as ‘Market Basket Analysis’; it is also used alongside clustering and classification.&lt;/p&gt;

&lt;p&gt;By developing a set of rules known as &lt;strong&gt;Association Rules&lt;/strong&gt; , it can tell us what things clients commonly buy together. In simple terms, it generates output in the manner of &lt;strong&gt;if this, then that&lt;/strong&gt; rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Apriori Algorithm and Rule?
&lt;/h2&gt;

&lt;p&gt;Data from a retail market or an online e-commerce store is typically used to mine association rules. The apriori algorithm makes it possible to detect these patterns or rules rapidly, because most transaction data is huge. Running &lt;code&gt;apriori()&lt;/code&gt; without any constraints on the rules is not a smart idea!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule&lt;/strong&gt; A rule is a notation that shows which items are frequently purchased together.&lt;/p&gt;

&lt;p&gt;It has two parts: a ‘LHS’ and a ‘RHS’, which can be represented as follows:&lt;/p&gt;

&lt;p&gt;‘itemsetA =&amp;gt; itemsetB’, read as: if itemsetA is bought, then itemsetB is likely to be bought as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Some Association Rule Mining Terms
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Support
&lt;/h3&gt;

&lt;p&gt;Association rules are given in the following form,&lt;/p&gt;

&lt;p&gt;&lt;code&gt;A=&amp;gt;B[support, confidence]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Where A and B are disjoint sets of items in the transaction data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Support = Number of transactions with both A and B / Total number of
transactions = P(A∩B) = frequency(A, B)/N

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Confidence
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Confidence = Number of transactions with both A and B / Total number of
transactions with A = P(A∩B)/P(A) = frequency(A, B)/frequency(A)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Expected Confidence
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Expected Confidence = Number of transactions withB/Total number of
transactions = P(B) = frequency(B)/N

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Lift
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Lift = Confidence/ExpectedConfidence =P(A∩B)/P(A).P(B) =
Support(A,B)/Support(A).Support(B)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
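<p>These formulas can be checked directly on the article’s five example transactions (a quick sketch in Python rather than R; the rule tested, {dipers} =&amp;gt; {beer}, matches the arules output later in the post):</p>

```python
# The article's five market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "dipers", "beer", "Egg"},
    {"milk", "dipers", "beer", "coka"},
    {"bread", "milk", "dipers", "beer"},
    {"bread", "milk", "dipers", "coka"},
]
N = len(transactions)

def count(items):
    # Number of transactions containing every item in `items`.
    return sum(1 for t in transactions if items.issubset(t))

A, B = {"dipers"}, {"beer"}
support = count(A.union(B)) / N            # P(A ∩ B)
confidence = count(A.union(B)) / count(A)  # P(A ∩ B) / P(A)
# Lift = Support(A,B) / (Support(A) · Support(B)), rearranged to stay in integers.
lift = count(A.union(B)) * N / (count(A) * count(B))
print(support, confidence, lift)  # 0.6 0.75 1.25
```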



&lt;h2&gt;
  
  
  Let’s Do Association Rule Mining in R
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Create a list of baskets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;market_basket&amp;lt;- list(c("bread", "milk"),
                     c("bread","dipers","beer","Egg"),
                     c("milk","dipers","beer","coka"),
                     c("bread","milk","dipers","beer"),
                     c("bread","milk","dipers","coka")
                     )
names(market_basket) &amp;lt;- paste("T",c(1:5),sep = "")
market_basket


## $T1
## [1] "bread" "milk" 
## 
## $T2
## [1] "bread" "dipers" "beer" "Egg"   
## 
## $T3
## [1] "milk" "dipers" "beer" "coka"  
## 
## $T4
## [1] "bread" "milk" "dipers" "beer"  
## 
## $T5
## [1] "bread" "milk" "dipers" "coka"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The five transactions were created from the preceding data and given the names T1, T2, T3, T4, T5.&lt;/p&gt;

&lt;p&gt;Now we’ll use the &lt;code&gt;arules&lt;/code&gt; package to do some more association rule mining. The package must be installed before moving on.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(arules)


## Warning: package 'arules' was built under R version 4.1.2

## Loading required package: Matrix

## 
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
## 
## abbreviate, write

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s convert the list of baskets into a transactions object.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;trans &amp;lt;- as(market_basket,"transactions")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s check the dimensions of the &lt;code&gt;trans&lt;/code&gt; object,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dim(trans)


## [1] 5 6

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response 5 6 means we have 5 transactions and 6 items. Let’s take a look at the labels of the transactions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;labels(trans)


## [1] "{bread,milk}" "{beer,bread,dipers,Egg}" 
## [3] "{beer,coka,dipers,milk}" "{beer,bread,dipers,milk}"
## [5] "{bread,coka,dipers,milk}"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These labels show the item set of each transaction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(trans)


## transactions as itemMatrix in sparse format with
## 5 rows (elements/itemsets/transactions) and
## 6 columns (items) and a density of 0.6 
## 
## most frequent items:
## bread dipers milk beer coka (Other) 
## 4 4 4 3 2 1 
## 
## element (itemset/transaction) length distribution:
## sizes
## 2 4 
## 1 4 
## 
## Min. 1st Qu. Median Mean 3rd Qu. Max. 
## 2.0 4.0 4.0 3.6 4.0 4.0 
## 
## includes extended item information - examples:
## labels
## 1 beer
## 2 bread
## 3 coka
## 
## includes extended transaction information - examples:
## transactionID
## 1 T1
## 2 T2
## 3 T3

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Let’s inspect the &lt;code&gt;trans&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;inspect(trans)


## items transactionID
## [1] {bread, milk} T1           
## [2] {beer, bread, dipers, Egg} T2           
## [3] {beer, coka, dipers, milk} T3           
## [4] {beer, bread, dipers, milk} T4           
## [5] {bread, coka, dipers, milk} T5

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is preferable to use the &lt;code&gt;inspect()&lt;/code&gt; function, which displays the transactions together with their IDs. If the data were really huge, we would inspect only the first few transactions rather than all of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Relative frequency plot and plot of trans
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;itemFrequencyPlot(trans, topN=10, cex.names =1)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-8-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-8-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most frequently sold items were &lt;code&gt;bread&lt;/code&gt;, &lt;code&gt;dipers&lt;/code&gt; and &lt;code&gt;milk&lt;/code&gt;; the least sold item is &lt;code&gt;Egg&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;image(trans)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-9-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-9-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is the Apriori algorithm important here?
&lt;/h2&gt;

&lt;p&gt;Because it requires a full database scan, frequent item-set generation is the most computationally intensive stage. Our example has only 5 transactions, but real-world retail transaction data can exceed GBs and TBs, necessitating an optimized technique that prunes out item-sets which will not aid the later phases.&lt;/p&gt;
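<p>The key pruning idea (the apriori property: every subset of a frequent item-set must itself be frequent, so nothing built on an infrequent set ever needs to be counted) can be sketched on the five example transactions. This is an illustrative toy miner in Python, not the arules implementation:</p>

```python
# The article's five transactions; minimum support 0.3 as in the post.
transactions = [
    {"bread", "milk"},
    {"bread", "dipers", "beer", "Egg"},
    {"milk", "dipers", "beer", "coka"},
    {"bread", "milk", "dipers", "beer"},
    {"bread", "milk", "dipers", "coka"},
]
min_support = 0.3
N = len(transactions)

def support(items):
    return sum(1 for t in transactions if items.issubset(t)) / N

# Level 1: frequent single items ("Egg" is pruned, support 0.2).
all_items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in all_items
            if support(frozenset([i])) >= min_support]

# Level k+1: only extend frequent k-item-sets; pruned sets are never revisited.
level = list(frequent)
while level:
    candidates = {a.union(b) for a in level for b in level
                  if len(a.union(b)) == len(a) + 1}
    level = [c for c in candidates if support(c) >= min_support]
    frequent.extend(level)

print(len(frequent))  # 17 frequent item-sets in total
```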

&lt;h2&gt;
  
  
  Apriori algorithm on “trans”, without and with a min. support of 0.3 and min. confidence of 0.5
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rues &amp;lt;- apriori(trans)


## Apriori
## 
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
## 
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
## 
## Absolute minimum support count: 0 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 5 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [31 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].


rules


## function (rhs, lhs, itemLabels, quality = data.frame()) 
## {
## if (!is(lhs, "itemMatrix")) 
## lhs &amp;lt;- encode(lhs, itemLabels = itemLabels)
## if (!is(rhs, "itemMatrix")) 
## rhs &amp;lt;- encode(rhs, itemLabels = itemLabels)
## new("rules", lhs = lhs, rhs = rhs, quality = quality)
## }
## &amp;lt;bytecode: 0x0000000022947bd8&amp;gt;
## &amp;lt;environment: namespace:arules&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the default settings, 31 rules were generated. Note the typo above: the result was assigned to &lt;code&gt;rues&lt;/code&gt;, so printing &lt;code&gt;rules&lt;/code&gt; displayed the &lt;code&gt;rules()&lt;/code&gt; constructor from arules instead of the rule set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rules &amp;lt;- apriori(trans, parameter = list(supp=0.3,conf=0.5,
                                         maxlen=10,
                                         target ="rules"))


## Apriori
## 
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.3 1
## maxlen target ext
## 10 rules TRUE
## 
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
## 
## Absolute minimum support count: 1 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 5 transaction(s)] done [0.00s].
## sorting and recoding items ... [5 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [32 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note: maxlen is the maximum number of items in a rule. We could have used maxlen=4 here, since no transaction contains more than 4 items, but this will not be known in real life!&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary of rules
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(rules)


## set of 32 rules
## 
## rule length distribution (lhs + rhs):sizes
## 1 2 3 
## 4 16 12 
## 
## Min. 1st Qu. Median Mean 3rd Qu. Max. 
## 1.00 2.00 2.00 2.25 3.00 3.00 
## 
## summary of quality measures:
## support confidence coverage lift       
## Min. :0.4000 Min. :0.5000 Min. :0.4000 Min. :0.8333  
## 1st Qu.:0.4000 1st Qu.:0.6667 1st Qu.:0.6000 1st Qu.:0.8333  
## Median :0.4000 Median :0.7500 Median :0.6000 Median :1.0000  
## Mean :0.4938 Mean :0.7474 Mean :0.6813 Mean :1.0473  
## 3rd Qu.:0.6000 3rd Qu.:0.8000 3rd Qu.:0.8000 3rd Qu.:1.2500  
## Max. :0.8000 Max. :1.0000 Max. :1.0000 Max. :1.6667  
## count      
## Min. :2.000  
## 1st Qu.:2.000  
## Median :2.000  
## Mean :2.469  
## 3rd Qu.:3.000  
## Max. :4.000  
## 
## mining info:
## data ntransactions support confidence
## trans 5 0.3 0.5
## call
## apriori(data = trans, parameter = list(supp = 0.3, conf = 0.5, maxlen = 10, target = "rules"))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A collection of 32 rules has been created: four rules of length 1, sixteen of length 2, and twelve of length 3. The length-1 rules have an empty left-hand side, which makes them useless; let’s get rid of them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rules &amp;lt;- apriori(trans, parameter = list(supp=0.3,conf = 0.5,
                                         maxlen =10,
                                         minlen=2,
                                         target="rules"))


## Apriori
## 
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.3 2
## maxlen target ext
## 10 rules TRUE
## 
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
## 
## Absolute minimum support count: 1 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 5 transaction(s)] done [0.00s].
## sorting and recoding items ... [5 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [28 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Let’s set RHS rule for trans data
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# we set rhs =beer and default = lhs
beer_rules_rhs&amp;lt;- apriori(trans, parameter = list(supp= 0.3,conf= 0.5,
                                                 maxlen= 10,
                                                 minlen=2),
                         appearance = list(default="lhs",
                                           rhs ="beer"))


## Apriori
## 
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.3 2
## maxlen target ext
## 10 rules TRUE
## 
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
## 
## Absolute minimum support count: 1 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[6 item(s), 5 transaction(s)] done [0.00s].
## sorting and recoding items ... [5 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [5 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].


inspect(beer_rules_rhs)


## lhs rhs support confidence coverage lift count
## [1] {bread} =&amp;gt; {beer} 0.4 0.5000000 0.8 0.8333333 2    
## [2] {milk} =&amp;gt; {beer} 0.4 0.5000000 0.8 0.8333333 2    
## [3] {dipers} =&amp;gt; {beer} 0.6 0.7500000 0.8 1.2500000 3    
## [4] {bread, dipers} =&amp;gt; {beer} 0.4 0.6666667 0.6 1.1111111 2    
## [5] {dipers, milk} =&amp;gt; {beer} 0.4 0.6666667 0.6 1.1111111 2

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Customers who bought dipers, alone or together with bread or milk, frequently also bought beer. It’s a pretty interesting data insight. We can guess from this that fathers sent to the grocery store for the baby’s necessities most likely picked up beer as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s put beer in LHS and set RHS as default values
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;beer_rules_lhs &amp;lt;- apriori(trans, parameter = list(supp=0.3,conf=0.5,
                                                  maxlen =10,
                                                  minlen =2),
                          appearance = list(default="rhs",lhs ="beer"))


## Apriori
## 
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.3 2
## maxlen target ext
## 10 rules TRUE
## 
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
## 
## Absolute minimum support count: 1 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[6 item(s), 5 transaction(s)] done [0.00s].
## sorting and recoding items ... [5 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [3 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].


inspect(beer_rules_lhs)


## lhs rhs support confidence coverage lift count
## [1] {beer} =&amp;gt; {bread} 0.4 0.6666667 0.6 0.8333333 2    
## [2] {beer} =&amp;gt; {milk} 0.4 0.6666667 0.6 0.8333333 2    
## [3] {beer} =&amp;gt; {dipers} 0.6 1.0000000 0.6 1.2500000 3

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;People who bought beer also bought dipers: every transaction that contains beer contains dipers as well (confidence 1.0).&lt;/p&gt;

&lt;h2&gt;
  
  
  Product Recommendation Rule
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rules_conf&amp;lt;- sort(rules,by ="confidence",
                  decreasing = TRUE)
inspect(rules_conf)


## lhs rhs support confidence coverage lift count
## [1] {coka} =&amp;gt; {milk} 0.4 1.0000000 0.4 1.2500000 2    
## [2] {coka} =&amp;gt; {dipers} 0.4 1.0000000 0.4 1.2500000 2    
## [3] {beer} =&amp;gt; {dipers} 0.6 1.0000000 0.6 1.2500000 3    
## [4] {coka, milk} =&amp;gt; {dipers} 0.4 1.0000000 0.4 1.2500000 2    
## [5] {coka, dipers} =&amp;gt; {milk} 0.4 1.0000000 0.4 1.2500000 2    
## [6] {beer, milk} =&amp;gt; {dipers} 0.4 1.0000000 0.4 1.2500000 2    
## [7] {beer, bread} =&amp;gt; {dipers} 0.4 1.0000000 0.4 1.2500000 2    
## [8] {dipers} =&amp;gt; {beer} 0.6 0.7500000 0.8 1.2500000 3    
## [9] {milk} =&amp;gt; {bread} 0.6 0.7500000 0.8 0.9375000 3    
## [10] {bread} =&amp;gt; {milk} 0.6 0.7500000 0.8 0.9375000 3    
## [11] {milk} =&amp;gt; {dipers} 0.6 0.7500000 0.8 0.9375000 3    
## [12] {dipers} =&amp;gt; {milk} 0.6 0.7500000 0.8 0.9375000 3    
## [13] {bread} =&amp;gt; {dipers} 0.6 0.7500000 0.8 0.9375000 3    
## [14] {dipers} =&amp;gt; {bread} 0.6 0.7500000 0.8 0.9375000 3    
## [15] {beer} =&amp;gt; {milk} 0.4 0.6666667 0.6 0.8333333 2    
## [16] {beer} =&amp;gt; {bread} 0.4 0.6666667 0.6 0.8333333 2    
## [17] {dipers, milk} =&amp;gt; {coka} 0.4 0.6666667 0.6 1.6666667 2    
## [18] {beer, dipers} =&amp;gt; {milk} 0.4 0.6666667 0.6 0.8333333 2    
## [19] {dipers, milk} =&amp;gt; {beer} 0.4 0.6666667 0.6 1.1111111 2    
## [20] {beer, dipers} =&amp;gt; {bread} 0.4 0.6666667 0.6 0.8333333 2    
## [21] {bread, dipers} =&amp;gt; {beer} 0.4 0.6666667 0.6 1.1111111 2    
## [22] {bread, milk} =&amp;gt; {dipers} 0.4 0.6666667 0.6 0.8333333 2    
## [23] {dipers, milk} =&amp;gt; {bread} 0.4 0.6666667 0.6 0.8333333 2    
## [24] {bread, dipers} =&amp;gt; {milk} 0.4 0.6666667 0.6 0.8333333 2    
## [25] {milk} =&amp;gt; {coka} 0.4 0.5000000 0.8 1.2500000 2    
## [26] {dipers} =&amp;gt; {coka} 0.4 0.5000000 0.8 1.2500000 2    
## [27] {milk} =&amp;gt; {beer} 0.4 0.5000000 0.8 0.8333333 2    
## [28] {bread} =&amp;gt; {beer} 0.4 0.5000000 0.8 0.8333333 2

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rules are sorted by confidence in decreasing order in the above results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Plotting rules with “arulesViz” package
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(arulesViz)


## Warning: package 'arulesViz' was built under R version 4.1.2


plot(rules)


## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-17-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-17-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, a darker orange color indicates rules with higher lift values; as the lift value decreases, the color fades to a lighter orange.&lt;/p&gt;

&lt;p&gt;Let’s plot the same plot by setting &lt;code&gt;measure = "confidence"&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(rules, measure = "confidence")


## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-18-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-18-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Plot &lt;code&gt;two-key-plot&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(arulesViz)
plot(rules, method = 'two-key plot')


## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-19-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-19-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Plot with “ggplot2” engine
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(ggplot2)

plot(rules, engine = "ggplot2")


## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-20-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-20-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we hover our cursor over the orange points, we can see the values of support, confidence, and lift. The darker the orange color, the higher the value of the corresponding parameter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parallel Coordinate plot for 10 rules
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(subrules, method = "paracoord")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-22-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fassociation-rule%2Funnamed-chunk-22-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We used the parallel coordinate method to visualize rules in a higher-dimensional space. In this case, we visualize ten rules at once.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>R Exercise: Getting Started With ggplot in R</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Fri, 29 Jul 2022 15:34:19 +0000</pubDate>
      <link>https://dev.to/iamdurga/r-exercise-getting-started-with-ggplot-in-r-1a4c</link>
      <guid>https://dev.to/iamdurga/r-exercise-getting-started-with-ggplot-in-r-1a4c</guid>
      <description>&lt;h1&gt;
  
  
  Getting Started with ggplot2 in R
&lt;/h1&gt;

&lt;h2&gt;
  
  
Grammar
&lt;/h2&gt;

&lt;p&gt;A grammar provides a foundation for understanding different types of graphics. A grammar may also tell us what a well-formed or correct graphic looks like, but there will still be many grammatically correct yet nonsensical graphics. This is easy to see by analogy to the English language: good grammar is just the first step in creating a good sentence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Grammar of Graphics
&lt;/h2&gt;

&lt;p&gt;A grammar of graphics is a tool that enables us to clearly describe the components of a graphic. Such a grammar allows us to move beyond named graphics (e.g., the “scatterplot”) and gain insight into the deep structures that underlie statistical graphics. &lt;code&gt;ggplot2&lt;/code&gt; proposes an alternative parameterization of the grammar, based on the idea of building up a graphic from multiple layers of data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Components of ggplot2
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Data and aesthetic mappings&lt;/li&gt;
&lt;li&gt;Geometric objects&lt;/li&gt;
&lt;li&gt;Scale&lt;/li&gt;
&lt;li&gt;Facet Specification&lt;/li&gt;
&lt;li&gt;Statistical Transformation&lt;/li&gt;
&lt;li&gt;Coordinate System&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Layered grammar of graphics
&lt;/h2&gt;

&lt;p&gt;Together, the data, mappings, statistical transformations, and geometric objects form a layer. A plot may have multiple layers. Layers are responsible for creating the objects that we see on the plot.&lt;/p&gt;
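
&lt;p&gt;As a minimal sketch of this idea, the plot below is built from two layers that share the same data and aesthetic mappings: a point layer and a smoother layer (using the built-in &lt;code&gt;mpg&lt;/code&gt; data).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(ggplot2)
# Layer 1: data + mappings + point geom
# Layer 2: same data and mappings, with a smoother geom stacked on top
ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;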

&lt;h2&gt;
  
  
  How to use ggplot2 in R?
&lt;/h2&gt;

&lt;p&gt;For this, we need to have the ggplot2 package installed. Let us use ggplot2 on the R built-in dataset &lt;code&gt;diamonds&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(ggplot2)
ggplot(diamonds, aes(carat,price)) + geom_point()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-1-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-1-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;geom_point&lt;/code&gt; is used for a scatter plot. From the figure above, we can see that as a diamond’s carat increases, its price also increases. We cannot see clearly how the data are distributed; for this, let’s make some changes in our code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(diamonds,aes(carat,price)) + geom_point() +
  scale_x_continuous() + scale_y_continuous()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-2-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-2-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see the distribution of points better than in the previous plot. We can clearly see that the carat and price variables are not linearly related. To linearize the relationship, let’s make some changes to our code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(diamonds, aes(carat,price)) + geom_point() +
  stat_smooth(method = lm) + scale_x_log10() + scale_y_log10()


## `geom_smooth()` using formula 'y ~ x'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-3-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-3-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the graph above, the relationship between the price and carat variables (on log scales) is linear. If we run the code without &lt;code&gt;stat_smooth(method = lm)&lt;/code&gt;, we do not see the fitted line in the graph; here &lt;code&gt;lm&lt;/code&gt; stands for linear model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(diamonds, aes(carat,price)) + geom_point() +
  scale_x_log10() + scale_y_log10()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-4-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-4-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
Let’s make a histogram of the &lt;code&gt;diamonds&lt;/code&gt; data
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(diamonds, aes(price)) + geom_histogram()


## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-5-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-5-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To build a histogram, we use the function &lt;code&gt;geom_histogram()&lt;/code&gt;. Note that a histogram is made from one-dimensional data. If we want to add a title to the plot, we can do so as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(diamonds, aes(price)) + geom_histogram() + ggtitle("ggplot2 Histogram")


## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-6-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-6-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Let us try some other ggplot2 features on the R built-in data &lt;code&gt;mpg&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-7-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-7-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The figure above shows a scatterplot between the &lt;code&gt;hwy&lt;/code&gt; and &lt;code&gt;displ&lt;/code&gt; variables of the &lt;code&gt;mpg&lt;/code&gt; data. From the figure, we can see that as &lt;code&gt;displ&lt;/code&gt; increases, &lt;code&gt;hwy&lt;/code&gt; tends to decrease.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s add &lt;code&gt;geom_smooth()&lt;/code&gt;: What will happen?
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth()


## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-8-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-8-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We see a smooth line passing through the middle of the data points.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding “wiggliness” in the smoothing plot
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth(span = 0.2) 


## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-9-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-9-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What changes do we see between this graph and the previous one? Let us check again by setting &lt;code&gt;span = 1&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(displ, hwy)) + geom_point()+ geom_smooth(span = 1)


## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-10-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-10-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By default, &lt;code&gt;geom_smooth()&lt;/code&gt; uses &lt;code&gt;span = 0.75&lt;/code&gt;, so &lt;code&gt;span = 1&lt;/code&gt; gives a slightly smoother curve than the default. If we set &lt;code&gt;&lt;br&gt;
method = lm&lt;/code&gt; inside &lt;code&gt;geom_smooth()&lt;/code&gt;, we get a straight fitted line. Let us try.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth(method = lm) 


## `geom_smooth()` using formula 'y ~ x'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-11-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-11-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s modify our code a little
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth(method = lm, se= FALSE) 


## `geom_smooth()` using formula 'y ~ x'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-12-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-12-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;se = FALSE&lt;/code&gt; we removed the shaded confidence band around the fitted line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fixed color
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(displ,hwy)) + geom_point(color = 'red')

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-13-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-13-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Changing color by variable attributes
&lt;/h2&gt;

&lt;p&gt;Let’s change the color based on the &lt;code&gt;class&lt;/code&gt; variable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(displ, hwy, colour = class)) + 
geom_point()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-14-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-14-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, points are colored according to the vehicle class.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting multiple scatterplots of attributes
&lt;/h2&gt;

&lt;p&gt;We can get multiple scatterplots by using the &lt;code&gt;facet_wrap()&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(displ, hwy)) + geom_point() + 
facet_wrap(~class)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-15-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-15-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the figure above, we see the relationship between the &lt;code&gt;displ&lt;/code&gt; and &lt;code&gt;hwy&lt;/code&gt; variables separately for each vehicle class.&lt;/p&gt;

&lt;h2&gt;
  
  
  Histogram
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(hwy)) + geom_histogram()


## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-16-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-16-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;hwy&lt;/code&gt; variable is binned automatically (30 bins by default).&lt;/p&gt;

&lt;h2&gt;
  
  
  Changing bin size of the histogram
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 2.5)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-17-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-17-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequency polygon
&lt;/h2&gt;

&lt;p&gt;A frequency polygon is a line graph of class frequency plotted against class midpoint. It can be obtained by joining the midpoints of the top of the rectangles in the histogram.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(hwy)) + geom_freqpoly()


## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-18-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-18-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Changing bin size of the frequency polygon
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(hwy)) + geom_freqpoly(binwidth= 1)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-19-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-19-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see the effect of &lt;code&gt;binwidth&lt;/code&gt; by comparing this figure with the previous one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Histogram with faceting:
&lt;/h2&gt;

&lt;p&gt;We have already discussed what faceting does in a scatterplot. Similarly, for histograms it produces multiple subplots.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(displ, fill = drv)) + geom_histogram(binwidth = 0.5) + 
facet_wrap(~drv, ncol = 1)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-20-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-20-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Bar plot
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(manufacturer)) + geom_bar()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-21-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-21-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can draw a bar plot with the &lt;code&gt;geom_bar()&lt;/code&gt; function. From the bar plot we can see that &lt;code&gt;dodge&lt;/code&gt; and &lt;code&gt;toyota&lt;/code&gt; have the highest frequencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Use alpha inside &lt;code&gt;geom_point()&lt;/code&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(cty, hwy)) + geom_point(alpha = 1 / 3)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-22-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-22-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alpha refers to the opacity of a geom. Values of alpha range from 0 to 1, with lower values corresponding to more transparent colors.&lt;/p&gt;
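
&lt;p&gt;To see the effect, we can compare a few alpha values (a quick sketch, not from the original post):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# alpha = 1 is fully opaque; smaller values reveal where points overlap
ggplot(mpg, aes(cty, hwy)) + geom_point(alpha = 1)
ggplot(mpg, aes(cty, hwy)) + geom_point(alpha = 1 / 10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;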

&lt;h2&gt;
  
  
  Modifying the axes
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(cty, hwy)) +geom_point(alpha = 1 / 3) + xlab("city driving (mpg)") + 
ylab("highway driving (mpg)")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-23-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-23-1.png"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggplot(mpg, aes(cty, hwy)) + geom_point(alpha = 1 / 3) + xlab(NULL) + 
 ylab(NULL)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-24-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fggplot%2Funnamed-chunk-24-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s all for this part. Thank you so much for reading.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>R Exercise: Monte Carlo Simulations in R</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Fri, 29 Jul 2022 15:33:02 +0000</pubDate>
      <link>https://dev.to/iamdurga/r-exercise-monte-carlo-simulations-in-r-2nk6</link>
      <guid>https://dev.to/iamdurga/r-exercise-monte-carlo-simulations-in-r-2nk6</guid>
      <description>&lt;h1&gt;
  
  
  Monte Carlo Simulations
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What is Monte Carlo Simulations?
&lt;/h2&gt;

&lt;p&gt;One of the main motivations to switch from spreadsheet-type tools (such as Microsoft Excel) to a program like R is for simulation modeling. R allows us to repeat the same (potentially complex and detailed) calculations with different random values over and over again.&lt;/p&gt;

&lt;p&gt;Within the same software, we can then summarize and plot the results of these replicated calculations. Monte Carlo methods are used to perform this type of analysis: they randomly sample from a set of values in order to generate and summarize a distribution of some statistic related to the sampled quantities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Randomness
&lt;/h2&gt;

&lt;p&gt;Random processes are an important aspect of simulation modeling. A random process is one that produces a different result each time it is run, according to some rules. Random processes are inextricably tied to the concept of uncertainty: you have no idea what will happen the next time the process is run.&lt;/p&gt;

&lt;p&gt;There are two basic ways to introduce randomness in R:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Random deviates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resampling&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Random Deviates
&lt;/h2&gt;

&lt;p&gt;Each person alive at the start of the year either survives to the end of the year or dies. There are two possible outcomes, and each person has an 80% probability of surviving. The number of survivors is the outcome of a binomial random process in which &lt;code&gt;n&lt;/code&gt; individuals are alive at the start of the year and &lt;code&gt;p&lt;/code&gt; is the probability that any one of them lives to the next year.&lt;/p&gt;

&lt;p&gt;In R, we can simulate a binomial random process with p=0.8 and n=100.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rbinom(n= 1, size =100,
       prob= 0.8)


## [1] 80

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time we got 80, but you will almost certainly get a different number when you run it, since the process is random.&lt;/p&gt;

&lt;h2&gt;
  
  
  With a little tinkering, we can also plot it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;survivors = rbinom(1000,
                   100, 0.8)
hist(survivors,
  col = "skyblue")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-2-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-2-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  We can also use other processes, such as the log-normal
&lt;/h2&gt;

&lt;p&gt;The log-normal process is another random process. It generates random numbers whose logarithm is normally distributed, with mean meanlog and standard deviation sdlog.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hist(rlnorm(1000,0,0.1),col="skyblue")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-3-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-3-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Need for sampling
&lt;/h2&gt;

&lt;p&gt;There are several situations in probability, and more broadly in machine learning, where an analytical solution cannot be calculated directly. In machine learning, for example, the problem of class imbalance is often tackled by resampling. In fact, some would argue that for most practical probabilistic models, exact inference is intractable.&lt;/p&gt;

&lt;p&gt;The desired calculation is usually a sum over discrete variables or an integral over continuous variables, and thus is computationally difficult. The calculation may be intractable for a variety of reasons: a huge number of random variables, the stochastic nature of the domain, noise in the data, a shortage of observations, and more.&lt;/p&gt;
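As a toy illustration of approximating such a quantity by sampling (the example and its values are mine, not from the article), we can estimate E[X^2] for X ~ N(0, 1), whose true value is 1:

```r
# Monte Carlo estimate of an expectation that we could also get analytically:
# E[X^2] = 1 for a standard normal. With enough draws, the sample average
# of x^2 gets close to the true value.
set.seed(1)
x <- rnorm(100000)   # draws from the target distribution
mc_est <- mean(x^2)  # Monte Carlo estimate of E[X^2]
mc_est
```

The same averaging-over-draws idea applies when no closed form exists, which is where sampling methods earn their keep.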

&lt;h2&gt;
  
  
  Resampling
&lt;/h2&gt;

&lt;p&gt;Using random deviates to generate fresh random numbers is excellent, but what if we already have a set of numbers to which we want to add randomness? We can use resampling techniques for this. To sample &lt;code&gt;size&lt;/code&gt; elements from the vector &lt;code&gt;x&lt;/code&gt; in R, use the &lt;code&gt;sample()&lt;/code&gt; function.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resampling of 1 to 10
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample(x = 1:10, size =5)


## [1] 4 3 10 9 2

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Sample with replacement
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample(x = c("a","b","c"), size = 10, replace = T)


## [1] "a" "a" "a" "b" "a" "b" "a" "c" "b" "b"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Sample with set probabilities
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sample(x = c("live","die"),size = 10, replace = T, prob = c(0.8,0.2))


## [1] "live" "live" "die" "die" "live" "live" "live" "die" "live" "live"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reproducing Randomness
&lt;/h2&gt;

&lt;p&gt;For reproducibility, we may want to obtain the same random numbers each time we run our script. To do so, we must first set the random seed, which is the starting point of our computer’s random number generator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set.seed(1234)
rnorm(1)


## [1] -1.207066

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s try without setting the seed&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rnorm(1)


## [1] 0.2774292

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each time, we get a different result.&lt;/p&gt;

&lt;h2&gt;
  
  
  Replication
&lt;/h2&gt;

&lt;p&gt;To use Monte Carlo methods, we need to be able to replicate some random process many times. There are two main ways this is commonly done: with &lt;code&gt;replicate()&lt;/code&gt; or with &lt;code&gt;for()&lt;/code&gt; loops.&lt;/p&gt;

&lt;p&gt;The replicate() function executes the same expression many times and returns the output of each execution. Say we have a vector x, which represents 30 observations of an animal’s length (mm).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x = rnorm(30, 500,40)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We want to create the mean length sampling distribution “by hand.” We can take a random sample, determine the mean, and then repeat the process as many times as necessary.&lt;/p&gt;
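One way this could be sketched with replicate() (the per-replicate sample size of 10 is my choice for illustration, not from the original):

```r
# Build the sampling distribution of the mean "by hand": resample from x,
# take the mean, and repeat 1000 times with replicate().
x <- rnorm(30, 500, 40)                 # 30 animal lengths (mm)
means <- replicate(1000, {
  samp <- sample(x, size = 10, replace = TRUE)  # resample 10 lengths
  mean(samp)                                    # mean of this resample
})
hist(means, col = "skyblue")            # sampling distribution of the mean
```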

&lt;h2&gt;
  
  
  Replication with “for” loop
&lt;/h2&gt;

&lt;p&gt;A loop is a programming construct that repeats a command until it reaches a specified point. R has a few types of loops: repeat(), while(), and for(). for() loops are among the most common in simulation modeling. A for() loop performs an operation once for each value in a vector.&lt;/p&gt;

&lt;p&gt;For loop syntax&lt;/p&gt;

&lt;p&gt;for(var in seq){ expression(var) }&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for( i in 1:5){
print(i^2)
}


## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25


nt = 100     # number of years to simulate
N = NULL     # abundance vector
N[1] = 1000  # initial abundance
for(t in 2:nt){
  # 10% growth with log-normal noise, followed by 8% mortality
  N[t] = (N[t-1]*1.1*rlnorm(1,0,0.1))*(1-0.08)
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s plot it&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(N, type= "l", pch = 15, xlab = "Year", ylab = "Abundance")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-12-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-12-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summarizing the simulation
&lt;/h2&gt;

&lt;p&gt;After replicating a calculation many times, we will need to summarize the results.&lt;/p&gt;
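For instance (the summaries chosen here are my illustration), 1000 binomial replicates of the survival process can be reduced to a few key numbers:

```r
# Summarize 1000 replicates of the survival process with a point estimate,
# a measure of spread, and a central interval.
survivors <- rbinom(1000, 100, 0.8)      # 1000 simulated years
mean(survivors)                          # average number of survivors
sd(survivors)                            # spread across replicates
quantile(survivors, c(0.025, 0.975))     # central 95% interval
```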

&lt;h2&gt;
  
  
  Simulation-Based Learning
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mu = 500
sig = 30
random = rnorm(100,mu,sig)
p = seq(0.01, 0.99, 0.01)
random_q = quantile(random,p)
normal_q = qnorm(p,mu,sig)
plot(normal_q~random_q)
abline(c(0,1))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-13-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-13-1.png"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;q = seq(400,600,10)
random_cdf = ecdf(random)
random_p =random_cdf(q)
normal_p = pnorm(q,mu,sig)
plot(normal_p~q, type= "l", col = "blue")
points(random_p~q,col = "red")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-14-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fr_exercises%2Fmonte_carlo%2Funnamed-chunk-14-1.png"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>R Exercise: Validation and Cross Validation for Predictive Modeling R</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Fri, 15 Jul 2022 15:49:00 +0000</pubDate>
      <link>https://dev.to/iamdurga/r-exercise-validation-and-cross-validation-for-predictive-modeling-r-gng</link>
      <guid>https://dev.to/iamdurga/r-exercise-validation-and-cross-validation-for-predictive-modeling-r-gng</guid>
      <description>&lt;h1&gt;
  
  
  Validation &amp;amp; Cross-validation for Predictive Modelling including Linear Model as well as Multi Linear Model
&lt;/h1&gt;

&lt;p&gt;Before starting the topic, let’s become familiar with some terms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation&lt;/strong&gt; : An act of confirming something as true or correct. Also, Validation is the process of establishing documentary evidence that a procedure, process, or activity was carried out in testing before being put into production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-Validation&lt;/strong&gt; : Cross-validation, also known as rotation estimation or out-of-sample testing, is a set of model validation procedures for determining how well the results of a statistical analysis will generalize to new data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linear Model&lt;/strong&gt; : The term “linear model” refers to a model that has a linear relationship between the target variable and the independent variable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi Linear Model&lt;/strong&gt; : A regression model that uses a straight line to evaluate the connection between a quantitative dependent variable and two or more independent variables is known as multiple linear regression.&lt;/p&gt;

&lt;p&gt;Here we will use R’s built-in data &lt;code&gt;mtcars&lt;/code&gt; for coding purposes. First, let’s divide the data into a train set and a test set in the ratio of 70% to 30%. While doing that, never forget to use the &lt;code&gt;set.seed()&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;set.seed()&lt;/strong&gt;: The random number generator is initialized using the set.seed() function. To generate a random number, the random number generator requires a starting value (seed value). By default, the random number generator is seeded from the current system time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Define the mtcars data as “data”:
data &amp;lt;- mtcars
#Use random seed to replicate the result
set.seed(123)
#Do random sampling to divide the cases into two independent samples
ind &amp;lt;- sample(2, nrow(mtcars), replace = T, prob = c(0.7, 0.3))
#Data partition
train.data &amp;lt;- data[ind==1,]
test.data &amp;lt;- data[ind==2,]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We divided our data into a training set and a testing set in the ratio of 70% to 30%.&lt;/p&gt;

&lt;h1&gt;
  
  
  Let’s fit Linear Model
&lt;/h1&gt;

&lt;p&gt;Set miles per gallon (mpg) as the dependent variable and weight (wt) as the independent variable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lmodel &amp;lt;- lm(mpg~wt, data = train.data, method = "lm")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s do model prediction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pred &amp;lt;- predict(lmodel, data= test.data)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the R-squared and error values. To do this, we should first load &lt;code&gt;library(caret)&lt;/code&gt; into our R session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(caret)


## Loading required package: ggplot2

## Loading required package: lattice


pred &amp;lt;- predict(lmodel)  # fitted values on the training data
R2 &amp;lt;- R2(pred, train.data$mpg)
R2


## [1] 0.7377021

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we found an R-squared value of 0.7377, which means the model explains about 73.77% of the variance in the training data. Let’s check the error,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RMSE &amp;lt;- RMSE(pred, test.data$mpg)


## Warning in pred - obs: longer object length is not a multiple of shorter object
## length


RMSE


## [1] 8.786064

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives the root mean squared error (RMSE) of the model.&lt;/p&gt;

&lt;h1&gt;
  
  
  Leave-One-Out Cross-Validation approach
&lt;/h1&gt;

&lt;p&gt;It’s usual practice when building a machine learning model to validate your methods by setting aside a subset of your data as a test set.&lt;/p&gt;

&lt;p&gt;LOOCV (leave-one-out cross-validation) is a type of cross-validation that uses each individual observation as a “test” set. It is a form of k-fold cross-validation in which the number of folds, k, equals the number of observations in the dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(caret)
# Define training control
train.control &amp;lt;- trainControl(method = "LOOCV")


# Train the model
model1 &amp;lt;- train(mpg ~wt, data = mtcars, method = 
"lm",
trControl = train.control)
print(model1)


## Linear Regression 
## 
## 32 samples
## 1 predictor
## 
## No pre-processing
## Resampling: Leave-One-Out Cross-Validation 
## Summary of sample sizes: 31, 31, 31, 31, 31, 31, ... 
## Resampling results:
## 
## RMSE Rsquared MAE     
## 3.201673 0.7104641 2.517436
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE


pred1 &amp;lt;- predict(model1, test.data)
R2 &amp;lt;- R2(pred1, test.data$mpg)
R2


## [1] 0.7864736

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We obtain an R-squared of 78.65% when evaluating the model fitted with the leave-one-out strategy on the test set, which is higher than for the plain linear regression model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RMSE &amp;lt;- RMSE(pred1, test.data$mpg)
RMSE


## [1] 2.843768

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The error is only 2.84, which is much lower than the previous one.&lt;/p&gt;

&lt;h1&gt;
  
  
  Let’s fit the model using K-folds Cross-Validation approach
&lt;/h1&gt;

&lt;p&gt;A K-fold CV is one in which a given data set is divided into K sections/folds, with each fold serving as a testing set at some point. Let’s look at a 10-fold cross validation case (K=10). The data set is divided into ten folds here. The first fold is used to test the model, while the others are used to train it in the first iteration. The second iteration uses the second fold as the testing set and the rest as the training set. This procedure is repeated until each of the ten folds has been utilized as a test set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#k-fold cross validation
library(caret)
# Define training control
set.seed(123) 
train.control &amp;lt;- trainControl(method = "cv", number = 10)
# Train the model
model2 &amp;lt;- train(mpg ~ wt, data = train.data, method = 
"lm",
trControl = train.control)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calculate the R-squared and error values and observe whether they differ from the previous ones.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(caret)
pred2 &amp;lt;- predict(model2, train.data)
R2 &amp;lt;- R2(pred2, train.data$mpg)
R2


## [1] 0.7377021

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method gives an R-squared value of 73.77%, which means the model explains about 73.77% of the variance in the training data.&lt;/p&gt;

&lt;h1&gt;
  
  
  Fit the model using Repeated K-folds Cross-Validation approach
&lt;/h1&gt;

&lt;p&gt;Repeated k-fold cross-validation is a technique for improving a machine learning model’s predicted performance. Simply repeat the cross-validation technique several times and return the mean result across all folds from all runs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#repeated k-fold cross validation
library(caret)
# Define training control
set.seed(123)
train.control &amp;lt;- trainControl(method = "repeatedcv", 
number = 10, repeats = 3)
# Train the model
model &amp;lt;- train(mpg ~wt, data = mtcars, method = 
"lm",
trControl = train.control)
# Summarize the results
print(model)


## Linear Regression 
## 
## 32 samples
## 1 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 28, 28, 29, 29, 29, 30, ... 
## Resampling results:
## 
## RMSE Rsquared MAE     
## 2.975392 0.8351572 2.539797
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hence we get an R-squared value of 83.52% and, similarly, an RMSE of 2.975.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary: Which one should be used based on R-squared values of “lm” model?
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;R-square for training set: 0.7013&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;R-square for training with LOOCV: 0.7104641&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;R-square for training with k-folds CV: 0.7346939&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;R-square for training with repeated k-folds CV: 0.8351572&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;R-square for testing set: 0.9031085&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;R-square for testing with LOOCV: 0.9031085&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;R-square for testing with k-folds CV: 0.9031085&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;R-square for testing with repeated k-folds CV: 0.9031085&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Which one should be used based on RMSE value?
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;RMSE for training set: 3.08648&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RMSE for training with LOOCV: 3.201673&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RMSE for training with k-folds CV: 2.85133&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RMSE for training with repeated k-folds CV: 2.975392&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RMSE for testing set: 2.279303&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RMSE for testing with LOOCV: 2.244232&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RMSE for testing with k-folds CV: 2.244232&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RMSE for testing with repeated k-folds CV: 2.244232&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Let’s Repeat the Same Process for the Multiple Linear Regression Model
&lt;/h1&gt;

&lt;p&gt;Multiple linear regression is an extension of simple linear regression. It has more than one (two or more) independent variables and one continuous dependent variable. It is a supervised learning method. All the assumptions of simple linear regression also apply here, with one additional condition.&lt;/p&gt;

&lt;p&gt;Multicollinearity must not be present, i.e., correlations between the independent variables must not be “high”.&lt;/p&gt;

&lt;h1&gt;
  
  
  Fitting Multi Linear Regression Model
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mlr &amp;lt;- lm(mpg~., data = mtcars)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s check the variance inflation factor (VIF) of &lt;code&gt;mlr&lt;/code&gt;. The variance inflation factor is the ratio of the variance of a parameter estimate in a model with many other terms to its variance in a model with only that term. It is available in the car package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(car)


## Loading required package: carData


vif(mlr)


## cyl disp hp drat wt qsec vs am 
## 15.373833 21.620241 9.832037 3.374620 15.164887 7.527958 4.965873 4.648487 
## gear carb 
## 5.357452 7.908747

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We need to drop the independent variable with the highest VIF and run the model again until all VIFs are below 10.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Removing “disp” variable:
mlr1 &amp;lt;- lm(mpg ~ cyl+hp+drat+wt+qsec+vs+am+gear+carb, data = mtcars)
vif(mlr)


## cyl disp hp drat wt qsec vs am 
## 15.373833 21.620241 9.832037 3.374620 15.164887 7.527958 4.965873 4.648487 
## gear carb 
## 5.357452 7.908747


#Removing “cyl” variable:
mlr2 &amp;lt;- lm(mpg ~ 
hp+drat+wt+qsec+vs+am+gear+carb, data = mtcars)
summary(mlr1)


## 
## Call:
## lm(formula = mpg ~ cyl + hp + drat + wt + qsec + vs + am + gear + 
## carb, data = mtcars)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -3.7863 -1.4055 -0.2635 1.2029 4.4753 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)  
## (Intercept) 12.55052 18.52585 0.677 0.5052  
## cyl 0.09627 0.99715 0.097 0.9240  
## hp -0.01295 0.01834 -0.706 0.4876  
## drat 0.92864 1.60794 0.578 0.5694  
## wt -2.62694 1.19800 -2.193 0.0392 *
## qsec 0.66523 0.69335 0.959 0.3478  
## vs 0.16035 2.07277 0.077 0.9390  
## am 2.47882 2.03513 1.218 0.2361  
## gear 0.74300 1.47360 0.504 0.6191  
## carb -0.61686 0.60566 -1.018 0.3195  
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.623 on 22 degrees of freedom
## Multiple R-squared: 0.8655, Adjusted R-squared: 0.8105 
## F-statistic: 15.73 on 9 and 22 DF, p-value: 1.183e-07


vif(mlr2)


## hp drat wt qsec vs am gear carb 
## 6.015788 3.111501 6.051127 5.918682 4.270956 4.285815 4.690187 4.290468

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now all VIFs are less than 10, so the data are ready for fitting different prediction models.&lt;/p&gt;

&lt;h1&gt;
  
  
  Leave-One-Out Cross-Validation approach on Multi Regression Model.
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Leave one out CV
library(caret)
# Define training control
train.control &amp;lt;- trainControl(method = "LOOCV")
# Train the model
mlr &amp;lt;- train(mpg ~ hp+drat+wt+qsec+vs+am+gear+carb, data = mtcars, method = "lm",
trControl = train.control)
# Summarize 
summary(mlr)


## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -3.8187 -1.3903 -0.3045 1.2269 4.5183 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)  
## (Intercept) 13.80810 12.88582 1.072 0.2950  
## hp -0.01225 0.01649 -0.743 0.4650  
## drat 0.88894 1.52061 0.585 0.5645  
## wt -2.60968 1.15878 -2.252 0.0342 *
## qsec 0.63983 0.62752 1.020 0.3185  
## vs 0.08786 1.88992 0.046 0.9633  
## am 2.42418 1.91227 1.268 0.2176  
## gear 0.69390 1.35294 0.513 0.6129  
## carb -0.61286 0.59109 -1.037 0.3106  
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.566 on 23 degrees of freedom
## Multiple R-squared: 0.8655, Adjusted R-squared: 0.8187 
## F-statistic: 18.5 on 8 and 23 DF, p-value: 2.627e-08

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We got an R-squared value of 86.55% and a residual standard error of 2.566 on 23 degrees of freedom.&lt;/p&gt;

&lt;h1&gt;
  
  
  Let’s fit the model using K-folds Cross-Validation approach on Multi Linear Regression Model.
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#K- folds Cross- Validation
library(caret)
# Define training control
train.control &amp;lt;- trainControl(method = "cv", number = 10)
# Train the model
mlr1&amp;lt;- train(mpg ~ hp+drat+wt+qsec+vs+am+gear+carb, data = mtcars, method = "lm",
trControl = train.control)
# Summarize 
summary(mlr1)


## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -3.8187 -1.3903 -0.3045 1.2269 4.5183 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)  
## (Intercept) 13.80810 12.88582 1.072 0.2950  
## hp -0.01225 0.01649 -0.743 0.4650  
## drat 0.88894 1.52061 0.585 0.5645  
## wt -2.60968 1.15878 -2.252 0.0342 *
## qsec 0.63983 0.62752 1.020 0.3185  
## vs 0.08786 1.88992 0.046 0.9633  
## am 2.42418 1.91227 1.268 0.2176  
## gear 0.69390 1.35294 0.513 0.6129  
## carb -0.61286 0.59109 -1.037 0.3106  
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.566 on 23 degrees of freedom
## Multiple R-squared: 0.8655, Adjusted R-squared: 0.8187 
## F-statistic: 18.5 on 8 and 23 DF, p-value: 2.627e-08

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, we got an R-squared value of 86.55%; similarly, the residual standard error is 2.566.&lt;/p&gt;

&lt;h1&gt;
  
  
  Fit the model using Repeated K-folds Cross-Validation approach
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set.seed(224)
# Repeated K- folds Cross- Validation
library(caret)
# Define training control
train.control &amp;lt;- trainControl(method = "repeatedcv", 
number = 10, repeats = 3)
# Train the model
mlr2&amp;lt;- train(mpg ~ hp+drat+wt+qsec+vs+am+gear+carb, data = mtcars, method = "lm",
trControl = train.control)
# Summarize 
summary(mlr2)


## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -3.8187 -1.3903 -0.3045 1.2269 4.5183 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)  
## (Intercept) 13.80810 12.88582 1.072 0.2950  
## hp -0.01225 0.01649 -0.743 0.4650  
## drat 0.88894 1.52061 0.585 0.5645  
## wt -2.60968 1.15878 -2.252 0.0342 *
## qsec 0.63983 0.62752 1.020 0.3185  
## vs 0.08786 1.88992 0.046 0.9633  
## am 2.42418 1.91227 1.268 0.2176  
## gear 0.69390 1.35294 0.513 0.6129  
## carb -0.61286 0.59109 -1.037 0.3106  
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.566 on 23 degrees of freedom
## Multiple R-squared: 0.8655, Adjusted R-squared: 0.8187 
## F-statistic: 18.5 on 8 and 23 DF, p-value: 2.627e-08

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We got an R-squared value of 86.55% and a residual standard error of 2.566.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;             Than you for Reading

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>R Exercise: Polynomial Regression Model in R</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Fri, 15 Jul 2022 15:48:05 +0000</pubDate>
      <link>https://dev.to/iamdurga/r-exercise-polynomial-regression-model-in-r-3dfh</link>
      <guid>https://dev.to/iamdurga/r-exercise-polynomial-regression-model-in-r-3dfh</guid>
      <description>&lt;h1&gt;
  
  
  Polynomial Regression
&lt;/h1&gt;

&lt;p&gt;Curvilinear regression and curve fitting are other terms for the same thing. It is used when a scatterplot shows a non-linear relationship. It is most typically employed with time-series data, but it can be applied in a variety of other situations.&lt;/p&gt;
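A minimal sketch of fitting such a model with lm() and poly() (toy data of my own, not the Covid data used below):

```r
# Quadratic polynomial regression on simulated curvilinear data.
set.seed(42)
x <- 1:50
y <- 2 + 0.5 * x + 0.03 * x^2 + rnorm(50, 0, 5)  # non-linear trend plus noise
fit <- lm(y ~ poly(x, 2, raw = TRUE))            # fit a degree-2 polynomial
coef(fit)                                        # intercept, linear and quadratic terms
```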

&lt;h2&gt;
  
  
  Let’s use the Nepal Covid data and fit a polynomial models on Covid deaths using R
&lt;/h2&gt;

&lt;p&gt;To do this, first import the Excel file into RStudio using the &lt;code&gt;readxl&lt;/code&gt; library, like below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(readxl)
data &amp;lt;- read_excel("F:/MDS-Private-Study-Materials/First Semester/Statistical Computing with R/Assignments/Data/covid_tbl_final.xlsx")
head(data)


## # A tibble: 6 x 14
## SN Date Confirmed_cases_~ Confirmed_cases~ `Confirmed _case~
## &amp;lt;dbl&amp;gt; &amp;lt;dttm&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 1 2020-01-23 00:00:00 1 1 1
## 2 2 2020-01-24 00:00:00 1 0 1
## 3 3 2020-01-25 00:00:00 1 0 1
## 4 4 2020-01-26 00:00:00 1 0 1
## 5 5 2020-01-27 00:00:00 1 0 1
## 6 6 2020-01-28 00:00:00 1 0 1
## # ... with 9 more variables: Recoveries_total &amp;lt;dbl&amp;gt;, Recoveries_daily &amp;lt;dbl&amp;gt;,
## # Deaths_total &amp;lt;dbl&amp;gt;, Deaths_daily &amp;lt;dbl&amp;gt;, RT-PCR_tests_total &amp;lt;dbl&amp;gt;,
## # RT-PCR_tests_daily &amp;lt;dbl&amp;gt;, Test_positivity_rate &amp;lt;dbl&amp;gt;, Recovery_rate &amp;lt;dbl&amp;gt;,
## # Case_fatality_rate &amp;lt;dbl&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;head()&lt;/code&gt; function returns the top 6 rows of the dataframe along with all columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;str(data)


## tibble [495 x 14] (S3: tbl_df/tbl/data.frame)
## $ SN : num [1:495] 1 2 3 4 5 6 7 8 9 10 ...
## $ Date : POSIXct[1:495], format: "2020-01-23" "2020-01-24" ...
## $ Confirmed_cases_total : num [1:495] 1 1 1 1 1 1 1 1 1 1 ...
## $ Confirmed_cases_new : num [1:495] 1 0 0 0 0 0 0 0 0 0 ...
## $ Confirmed _cases_active: num [1:495] 1 1 1 1 1 1 0 0 0 0 ...
## $ Recoveries_total : num [1:495] 0 0 0 0 0 0 1 1 1 1 ...
## $ Recoveries_daily : num [1:495] 0 0 0 0 0 0 1 0 0 0 ...
## $ Deaths_total : num [1:495] 0 0 0 0 0 0 0 0 0 0 ...
## $ Deaths_daily : num [1:495] 0 0 0 0 0 0 0 0 0 0 ...
## $ RT-PCR_tests_total : num [1:495] NA NA NA NA NA 3 4 5 5 NA ...
## $ RT-PCR_tests_daily : num [1:495] NA NA NA NA NA NA 1 1 0 NA ...
## $ Test_positivity_rate : num [1:495] NA NA NA NA NA ...
## $ Recovery_rate : num [1:495] 0 0 0 0 0 0 100 100 100 100 ...
## $ Case_fatality_rate : num [1:495] 0 0 0 0 0 0 0 0 0 0 ...

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;str()&lt;/code&gt; function shows each column’s data type. Confirmed_cases_total is numeric, as are the other count columns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let us plot the daily deaths by date and see what is causing the problem
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(data$Date,data$Deaths_daily, main= "Daily Deaths:23jan 2020-31 may 2021 ",xlab = "Date",
  ylab = "Daily Deaths" )

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EKzx_FFD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-3-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EKzx_FFD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-3-1.png" alt="" width="672" height="480"&gt;&lt;/a&gt;The problem is associated with the three outliers (all the missed deaths a priori added to the data on those 3 days!)&lt;/p&gt;

&lt;h2&gt;
  
  
  Let us plot the cumulative deaths again before these outliers i.e. till 23 Feb 2021
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot.data &amp;lt;- data[data$SN &amp;lt;= 398,]
plot(plot.data$Date, plot.data$Deaths_total,
     main= "Daily Covid Deaths,Nepal:23 jan-23 feb2021",
     xlab= "Date",
     ylab= "Daily Deaths")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rOVRPelg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-4-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rOVRPelg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-4-1.png" alt="" width="672" height="480"&gt;&lt;/a&gt;As a result, we eliminate outliers. Our data is now ready to be fitted into a model. Let’s divide our model into a train set and a test set in the proportions of 70% to 30%.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set.seed(132)
ind &amp;lt;- sample(2, nrow(plot.data), replace = T, prob = c(0.7,0.3))
train_data &amp;lt;- plot.data[ind==1,]
test_data &amp;lt;- plot.data[ind==2,]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;set.seed()&lt;/code&gt; function in R is used to make results reproducible, i.e. it produces the same sample again and again. When we generate random numbers without calling &lt;code&gt;set.seed()&lt;/code&gt;, each execution produces a different sample.&lt;/p&gt;
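
&lt;p&gt;As a quick illustration (a toy sketch, not part of the original analysis), resetting the seed before each draw yields an identical sample:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set.seed(42)
sample(1:10, 3)  # three values drawn at random

set.seed(42)
sample(1:10, 3)  # identical to the first draw, because the seed was reset

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;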

&lt;h2&gt;
  
  
  Let us fit a linear model to the filtered data (plot.data) using SN as the time variable
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(caret)


## Warning: package 'caret' was built under R version 4.1.2

## Loading required package: ggplot2

## Loading required package: lattice


lm1 &amp;lt;- lm(plot.data$Deaths_total~plot.data$SN, 
         data= train_data)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After loading the caret package, we fit a linear model to the covid data. We can later use the test set to evaluate predictions.&lt;/p&gt;

&lt;p&gt;Before calculating the linear model summary, it is necessary to master some concepts in order to comprehend the summary.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Coefficient of Determination&lt;/code&gt; :&lt;/p&gt;

&lt;p&gt;The coefficient of determination (R-squared) is a statistical measure of how much of the variation in one variable can be explained by variation in another. The higher the value of R-squared, the better the model fits the data.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Residual Standard Error&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;The residual standard error is used to measure how well a regression model fits a dataset. The lower the residual standard error, the better the model.&lt;br&gt;
&lt;/p&gt;
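
&lt;p&gt;Both quantities can be computed by hand from a fitted model. As a sketch (using a simple &lt;code&gt;lm()&lt;/code&gt; on the built-in &lt;code&gt;mtcars&lt;/code&gt; data rather than the covid data), this shows where the numbers reported by &lt;code&gt;summary()&lt;/code&gt; come from:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fit &amp;lt;- lm(mpg ~ wt, data = mtcars)
res &amp;lt;- residuals(fit)
# R-squared: proportion of variance explained
r2 &amp;lt;- 1 - sum(res^2) / sum((mtcars$mpg - mean(mtcars$mpg))^2)
# Residual standard error: typical size of a residual
rse &amp;lt;- sqrt(sum(res^2) / df.residual(fit))
c(r2, rse)  # match summary(fit)$r.squared and summary(fit)$sigma

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;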

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(lm1)


## 
## Call:
## lm(formula = plot.data$Deaths_total ~ plot.data$SN, data = train_data)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -537.91 -344.76 22.38 351.50 582.90 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)    
## (Intercept) -588.8326 35.1575 -16.75 &amp;lt;2e-16 ***
## plot.data$SN 5.9315 0.1527 38.84 &amp;lt;2e-16 ***
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 350 on 396 degrees of freedom
## Multiple R-squared: 0.7921, Adjusted R-squared: 0.7916 
## F-statistic: 1509 on 1 and 396 DF, p-value: &amp;lt; 2.2e-16

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we fit a linear model, we get an R-squared of 79.21%, which means the independent variable explains only 79.21% of the variance in the dependent variable. The residual standard error is 350 on 396 degrees of freedom.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s plot the linear model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(plot.data$SN, plot.data$Deaths_total, data= plot.data,
     main= "Daily Covid Deaths,Nepal:23 jan-23 feb2021",
     xlab= "Date",
     ylab= "Daily Deaths")
abline(lm(plot.data$Deaths_total~plot.data$SN,data= plot.data), col="red",lwd=2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ya1Oauwq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-8-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ya1Oauwq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-8-1.png" alt="" width="672" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Let us fit a quadratic model to the filtered data
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qlm &amp;lt;- lm(plot.data$Deaths_total~ poly(plot.data$SN,2), data= train_data)
summary(qlm)


## 
## Call:
## lm(formula = plot.data$Deaths_total ~ poly(plot.data$SN, 2), 
## data = train_data)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -422.04 -110.87 8.94 81.97 282.94 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)    
## (Intercept) 594.495 6.763 87.90 &amp;lt;2e-16 ***
## poly(plot.data$SN, 2)1 13595.485 134.928 100.76 &amp;lt;2e-16 ***
## poly(plot.data$SN, 2)2 6428.710 134.928 47.65 &amp;lt;2e-16 ***
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 134.9 on 395 degrees of freedom
## Multiple R-squared: 0.9692, Adjusted R-squared: 0.969 
## F-statistic: 6211 on 2 and 395 DF, p-value: &amp;lt; 2.2e-16

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we obtain an R-squared of 96.92 percent, i.e. the model explains 96.92 percent of the variability in the dependent variable. Similarly, the residual standard error is 134.9 on 395 degrees of freedom. Compared to the linear model, the R-squared has increased and the error has decreased.&lt;/p&gt;
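
&lt;p&gt;Nested polynomial models can also be compared formally with an F-test via &lt;code&gt;anova()&lt;/code&gt;. A minimal sketch, assuming the models are refitted with regular formulas (column names plus a &lt;code&gt;data&lt;/code&gt; argument) so they refer to the same observations:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m1 &amp;lt;- lm(Deaths_total ~ SN, data = plot.data)
m2 &amp;lt;- lm(Deaths_total ~ poly(SN, 2), data = plot.data)
anova(m1, m2)  # a small p-value indicates the quadratic term significantly improves the fit

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;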

&lt;h2&gt;
  
  
  Let’s plot the quadratic model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(plot.data$SN, plot.data$Deaths_total, data= plot.data,
     main= "Daily Covid Deaths,Nepal:23 jan-23 feb2021",
     xlab= "Date",
     ylab= "Daily Deaths")


## Warning in plot.window(...): "data" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "data" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "data" is not a
## graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "data" is not a
## graphical parameter

## Warning in box(...): "data" is not a graphical parameter

## Warning in title(...): "data" is not a graphical parameter


lines(fitted(qlm)~SN, data=plot.data, col= "red",lwd=2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RBllzK6U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-10-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RBllzK6U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-10-1.png" alt="" width="672" height="480"&gt;&lt;/a&gt;The quadratic model fits the data better than the linear model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Fit a Cubic Model
&lt;/h2&gt;

&lt;p&gt;We fit the cubic model in the following way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;clm &amp;lt;- lm(plot.data$Deaths_total~poly(SN,3), data= plot.data)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s calculate the summary of the cubic model and observe what changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(clm)


## 
## Call:
## lm(formula = plot.data$Deaths_total ~ poly(SN, 3), data = plot.data)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -369.58 -123.49 12.82 99.36 267.65 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)    
## (Intercept) 594.495 6.696 88.789 &amp;lt; 2e-16 ***
## poly(SN, 3)1 13595.485 133.576 101.781 &amp;lt; 2e-16 ***
## poly(SN, 3)2 6428.710 133.576 48.128 &amp;lt; 2e-16 ***
## poly(SN, 3)3 -401.539 133.576 -3.006 0.00282 ** 
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 133.6 on 394 degrees of freedom
## Multiple R-squared: 0.9699, Adjusted R-squared: 0.9696 
## F-statistic: 4228 on 3 and 394 DF, p-value: &amp;lt; 2.2e-16

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The R-square value is 96.99 percent, and the residual standard error is 133.6. When we compare the prior model to this one, we can immediately see the differences.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Plot the Cubic Model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(plot.data$SN, plot.data$Deaths_total, data= plot.data,
     main= "Daily Covid Deaths,Nepal:23 jan-23 feb2021",
     xlab= "Date",
     ylab= "Daily Deaths")


## Warning in plot.window(...): "data" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "data" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "data" is not a
## graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "data" is not a
## graphical parameter

## Warning in box(...): "data" is not a graphical parameter

## Warning in title(...): "data" is not a graphical parameter


lines(fitted(clm)~plot.data$SN,data = plot.data, col= "red",lwd= 2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0juCSMP2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-13-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0juCSMP2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-13-1.png" alt="" width="672" height="480"&gt;&lt;/a&gt;From the figure we can see that the fitted curve follows the actual data more closely than in the quadratic case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Fit the Double Quadratic (Quartic) Model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dlm &amp;lt;- lm(plot.data$Deaths_total~poly(plot.data$SN,4))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s calculate the summary of it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(dlm)


## 
## Call:
## lm(formula = plot.data$Deaths_total ~ poly(plot.data$SN, 4))
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -105.44 -53.22 -12.50 53.61 159.13 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)    
## (Intercept) 594.50 3.13 189.92 &amp;lt; 2e-16 ***
## poly(plot.data$SN, 4)1 13595.49 62.45 217.71 &amp;lt; 2e-16 ***
## poly(plot.data$SN, 4)2 6428.71 62.45 102.94 &amp;lt; 2e-16 ***
## poly(plot.data$SN, 4)3 -401.54 62.45 -6.43 3.71e-10 ***
## poly(plot.data$SN, 4)4 -2344.63 62.45 -37.55 &amp;lt; 2e-16 ***
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 62.45 on 393 degrees of freedom
## Multiple R-squared: 0.9934, Adjusted R-squared: 0.9934 
## F-statistic: 1.486e+04 on 4 and 393 DF, p-value: &amp;lt; 2.2e-16

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this scenario, the model explains 99.34 percent of the variability in the dependent variable. In addition, the residual standard error is 62.45, less than half that of the cubic model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Plot the Double Quadratic Model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plot(plot.data$SN, plot.data$Deaths_total, data= plot.data,
     main= "Daily Covid Deaths,Nepal:23 jan-23 feb2021",
     xlab= "Date",
     ylab= "Daily Deaths")


## Warning in plot.window(...): "data" is not a graphical parameter

## Warning in plot.xy(xy, type, ...): "data" is not a graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "data" is not a
## graphical parameter

## Warning in axis(side = side, at = at, labels = labels, ...): "data" is not a
## graphical parameter

## Warning in box(...): "data" is not a graphical parameter

## Warning in title(...): "data" is not a graphical parameter


lines(fitted(dlm)~plot.data$SN,data = plot.data, col= "red",lwd= 2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RUuSYWd_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-16-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RUuSYWd_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://iamdurga.github.io/assets/r_exercises/poly/unnamed-chunk-16-1.png" alt="" width="672" height="480"&gt;&lt;/a&gt;Here the fitted curve and the actual data nearly overlap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Fit a Fifth-Order Polynomial
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flm &amp;lt;- lm(plot.data$Deaths_total~poly(plot.data$SN,5),data= plot.data)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s calculate the summary of flm to see the value of R square and residual standard error.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(flm)


## 
## Call:
## lm(formula = plot.data$Deaths_total ~ poly(plot.data$SN, 5), 
## data = plot.data)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -77.300 -16.980 -3.571 19.199 140.089 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)    
## (Intercept) 594.495 1.716 346.36 &amp;lt;2e-16 ***
## poly(plot.data$SN, 5)1 13595.485 34.242 397.04 &amp;lt;2e-16 ***
## poly(plot.data$SN, 5)2 6428.710 34.242 187.74 &amp;lt;2e-16 ***
## poly(plot.data$SN, 5)3 -401.539 34.242 -11.73 &amp;lt;2e-16 ***
## poly(plot.data$SN, 5)4 -2344.634 34.242 -68.47 &amp;lt;2e-16 ***
## poly(plot.data$SN, 5)5 -1035.863 34.242 -30.25 &amp;lt;2e-16 ***
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34.24 on 392 degrees of freedom
## Multiple R-squared: 0.998, Adjusted R-squared: 0.998 
## F-statistic: 3.973e+04 on 5 and 392 DF, p-value: &amp;lt; 2.2e-16

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, the residual standard error is approximately half that of the double quadratic (quartic) model, and the R-squared is 99.8 percent. This model fits the training data better than the previous one since we used a higher-order polynomial; within this dataset, each increase in polynomial order has reduced the error and improved the fit.&lt;/p&gt;
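
&lt;p&gt;One caveat: in-sample error always shrinks as the polynomial order grows, so it is worth checking the error on the held-out test set as well. A hedged sketch (note that for &lt;code&gt;predict()&lt;/code&gt; to work on new data, the model must be fitted with a regular formula such as &lt;code&gt;Deaths_total ~ poly(SN, 5)&lt;/code&gt; rather than with &lt;code&gt;plot.data$&lt;/code&gt; references):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;m5 &amp;lt;- lm(Deaths_total ~ poly(SN, 5), data = train_data)
pred &amp;lt;- predict(m5, newdata = test_data)
rmse &amp;lt;- sqrt(mean((test_data$Deaths_total - pred)^2))
rmse  # test-set error; a large gap from the training error would suggest overfitting

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;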

</description>
    </item>
    <item>
      <title>R Exercise: Different Hypothesis Testing in R</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Fri, 15 Jul 2022 15:46:33 +0000</pubDate>
      <link>https://dev.to/iamdurga/r-exercise-different-hypothesis-testing-in-r-31nh</link>
      <guid>https://dev.to/iamdurga/r-exercise-different-hypothesis-testing-in-r-31nh</guid>
      <description>&lt;h2&gt;
  
  
  What is Hypothesis Testing
&lt;/h2&gt;

&lt;p&gt;It is a type of inferential statistics that involves extrapolating results from a sample (random) to the entire population. It’s used to make decisions based on statistical tests and models that use the p-value, also known as the Type I error or alpha error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Type I Error&lt;/strong&gt; : When we reject a true null hypothesis, it is called a &lt;code&gt;type I error&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Type II Error&lt;/strong&gt; : When we fail to reject a false null hypothesis, it is called a type II error.&lt;/p&gt;

&lt;p&gt;It can be done using parametric or non-parametric methods/models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parametric&lt;/strong&gt; : They have certain assumptions about the data (model) and/or errors that must be validated before the results can be accepted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-parametric&lt;/strong&gt; : They make no assumptions about the data distribution (model) or the errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why to use parametric test?
&lt;/h2&gt;

&lt;p&gt;Because they are based on the mean, standard deviation, and the normal distribution, parametric tests are regarded as “more powerful” than non-parametric tests/models. Non-parametric tests are based on the median, IQR, and non-normal distributions, and are therefore deemed “less powerful” than parametric tests/models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Statistical Hypothesis
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Null Hypothesis&lt;/code&gt; : It is also known as the hypothesis of no difference. &lt;code&gt;Alternative Hypothesis&lt;/code&gt; : It is complementary to the null hypothesis and is also known as the research hypothesis.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to accept Null or Alternative Hypothesis
&lt;/h2&gt;

&lt;p&gt;To accept (fail to reject) the null hypothesis in a parametric or non-parametric test, the p-value must be &amp;gt; 0.05 (goodness-of-fit tests).&lt;/p&gt;

&lt;p&gt;To accept the alternative hypothesis in a parametric or non-parametric test (research hypothesis tests!), the p-value must be less than 0.05.&lt;/p&gt;
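
&lt;p&gt;The decision rule itself is mechanical and can be expressed as a toy snippet (the p-value here is a made-up placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;p_value &amp;lt;- 0.03  # hypothetical p-value from some test
if (p_value &amp;lt; 0.05) {
  print("Reject the null hypothesis")
} else {
  print("Fail to reject the null hypothesis")
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;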

&lt;h1&gt;
  
  
  Some Commonly Used Parametric Test Using R
&lt;/h1&gt;

&lt;h2&gt;
  
  
  One Sample Z Test On Mtcars Data
&lt;/h2&gt;

&lt;p&gt;In this blog I am only going to show how to perform a one-sample z-test in R, without explaining what the z-test is or how it works, because I already explained that in a past blog.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# we need to define parameter
muO &amp;lt;- 20
sigma &amp;lt;- 6
xbar &amp;lt;- mean(mtcars$mpg)
n &amp;lt;- length(mtcars$mpg)
z &amp;lt;-sqrt(n)*(xbar-muO)/sigma
p_value&amp;lt;-2*pnorm(-abs(z))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s check z value and p value,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;z


## [1] 0.08544207

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hence, we found the value of z to be 0.08544207.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;p_value


## [1] 0.9319099

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We found a p-value of 0.9319099, which is &amp;gt; 0.05; hence we accept (fail to reject) the null hypothesis, i.e. the sample mean and the population mean are equal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why there is no one sample z-test in base R package?
&lt;/h2&gt;

&lt;p&gt;Because the t-distribution behaves like the z-distribution for n &amp;gt;= 30, the t-test can be employed for both small and large samples. Thus, we don’t need a one-sample z-test in R!&lt;/p&gt;
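
&lt;p&gt;If a dedicated z-test is still wanted, add-on packages provide one. For example, the &lt;code&gt;BSDA&lt;/code&gt; package offers &lt;code&gt;z.test()&lt;/code&gt; (a sketch; check the package documentation for the exact arguments):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# install.packages("BSDA")  # if not already installed
library(BSDA)
z.test(mtcars$mpg, mu = 20, sigma.x = 6)  # should agree with the manual calculation above

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;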

&lt;h1&gt;
  
  
  One Sample t-test: We can work for small sample as well as for large sample
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;t.test(mtcars$mpg, mu =20)


## 
## One Sample t-test
## 
## data: mtcars$mpg
## t = 0.08506, df = 31, p-value = 0.9328
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
## 17.91768 22.26357
## sample estimates:
## mean of x 
## 20.09062

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hence we obtained a p-value of 0.9328, which means we do not reject the null hypothesis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Sample T-test
&lt;/h2&gt;

&lt;p&gt;It is used to compare the means of a dependent variable across the two categories of a grouping (independent) variable. For instance, we can compare exam scores (dependent variable) between male and female groups of students!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Assumptions&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For each category, the dependent variable must follow the normal distribution (Test of normality-GOF)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The variance is homogeneous (i.e. equal) across independent variable categories (Test of equal variance-GOF)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What to do if variance across independent variable categories not equal
&lt;/h2&gt;

&lt;p&gt;In this case we use the Welch test.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Assumption&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For each category, the dependent variable must follow the normal distribution (Test of normality-GOF)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Variance across independent variable categories are not homogenous i.e; not equal.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
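
&lt;p&gt;In R, the Welch test is in fact the default behaviour of &lt;code&gt;t.test()&lt;/code&gt;; passing &lt;code&gt;var.equal = TRUE&lt;/code&gt; is what switches to the classical Student t-test:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;t.test(mpg ~ am, data = mtcars)                    # Welch two-sample t-test (default)
t.test(mpg ~ am, var.equal = TRUE, data = mtcars)  # classical Student t-test

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;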

&lt;h2&gt;
  
  
  Let’s do a normality test on the mtcars data
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with(mtcars, shapiro.test(mpg[am == 0]))


## 
## Shapiro-Wilk normality test
## 
## data: mpg[am == 0]
## W = 0.97677, p-value = 0.8987

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the p-value is 0.8987. Hence we do not reject the null hypothesis, which means this group follows a normal distribution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with(mtcars, shapiro.test(mpg[am == 1]))


## 
## Shapiro-Wilk normality test
## 
## data: mpg[am == 1]
## W = 0.9458, p-value = 0.5363

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also follows a normal distribution. Hence the first condition is satisfied, i.e. the dependent variable mpg follows a normal distribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Variance Check
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;var.test(mpg ~ am, data = mtcars)


## 
## F test to compare two variances
## 
## data: mpg by am
## F = 0.38656, num df = 18, denom df = 12, p-value = 0.06691
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.1243721 1.0703429
## sample estimates:
## ratio of variances 
## 0.3865615

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see the p-value is 0.06691, which is greater than 0.05. Hence we can say the variances across the independent variable categories are the same. Now we can use the two-sample Student t-test.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;t.test(mpg ~ am, var.equal= T, data = mtcars)


## 
## Two Sample t-test
## 
## data: mpg by am
## t = -4.1061, df = 30, p-value = 0.000285
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -10.84837 -3.64151
## sample estimates:
## mean in group 0 mean in group 1 
## 17.14737 24.39231

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we see a p-value of 0.000285, which is less than 0.05. Hence we reject the null hypothesis, which means mileage (mpg) is statistically different between cars with automatic and manual transmission systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s check two sample student t-test result with simple linear regression model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(lm(mpg ~ am, data = mtcars))


## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -9.3923 -3.0923 -0.2974 3.2439 9.5077 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)    
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385 
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This difference is statistically significant, and the p-value is the same as that given by the two-sample t-test.&lt;/p&gt;

&lt;h1&gt;
  
  
  What test should we use to compare the means of more than two samples?
&lt;/h1&gt;

&lt;p&gt;If we need to compare the means of more than two samples, we use the 1-way ANOVA test.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Assumption&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Dependent variable must be “normally distributed”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Variance across categories must be same&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  1-way ANOVA assumptions checks
&lt;/h1&gt;

&lt;p&gt;Normality by categories&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with(mtcars, shapiro.test(mpg[gear == 3]))


## 
## Shapiro-Wilk normality test
## 
## data: mpg[gear == 3]
## W = 0.95833, p-value = 0.6634

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Category 3 follows normal distribution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with(mtcars, shapiro.test(mpg[gear == 4]))


## 
## Shapiro-Wilk normality test
## 
## data: mpg[gear == 4]
## W = 0.90908, p-value = 0.2076

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Category 4 also follows normal distribution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with(mtcars, shapiro.test(mpg[gear == 5]))


## 
## Shapiro-Wilk normality test
## 
## data: mpg[gear == 5]
## W = 0.90897, p-value = 0.4614

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Category 5 also follows a normal distribution. So the dependent variable follows a normal distribution in all three categories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s do variance test
&lt;/h2&gt;

&lt;p&gt;In the case of more than two samples we do not use &lt;code&gt;var.test()&lt;/code&gt;. Instead we use &lt;code&gt;leveneTest()&lt;/code&gt;, available in the car package. Before doing this we need to convert our independent variable into a factor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;library(car)


## Loading required package: carData


leveneTest(mpg ~ as.factor(gear), data=mtcars)


## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(&amp;gt;F)
## group 2 1.4886 0.2424
## 29

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we find a p-value of 0.2424, which is greater than 0.05. Hence the variances across categories are the same, so we can now use the classical one-way ANOVA test.&lt;/p&gt;
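
&lt;p&gt;One caution before running it: with a numeric &lt;code&gt;gear&lt;/code&gt; column, &lt;code&gt;aov(mpg ~ gear)&lt;/code&gt; fits a regression slope (1 degree of freedom) rather than comparing the three group means. For a genuine between-groups ANOVA, convert the grouping variable to a factor first:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(aov(mpg ~ as.factor(gear), data = mtcars))  # 2 df for gear: a true 3-group comparison

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;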

&lt;h2&gt;
  
  
  1-Way Classical ANOVA test
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(aov(mpg ~ gear, data = mtcars))


## Df Sum Sq Mean Sq F value Pr(&amp;gt;F)   
## gear 1 259.7 259.75 8.995 0.0054 **
## Residuals 30 866.3 28.88                  
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We find a p-value less than 0.05, hence we reject the null hypothesis, which means the sample means are not all equal. This means a post-hoc test (pairwise comparison) is required. If the alternative hypothesis is accepted we need to do a post-hoc test; for classical 1-way ANOVA the TukeyHSD post-hoc test is best. Let’s use it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TukeyHSD(aov(mpg ~ as.factor(gear), data = mtcars))


## Tukey multiple comparisons of means
## 95% family-wise confidence level
## 
## Fit: aov(formula = mpg ~ as.factor(gear), data = mtcars)
## 
## $`as.factor(gear)`
## diff lwr upr p adj
## 4-3 8.426667 3.9234704 12.929863 0.0002088
## 5-3 5.273333 -0.7309284 11.277595 0.0937176
## 5-4 -3.153333 -9.3423846 3.035718 0.4295874

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Let’s check this result with simple linear model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;summary(lm(mpg ~ gear, data = mtcars))


## 
## Call:
## lm(formula = mpg ~ gear, data = mtcars)
## 
## Residuals:
## Min 1Q Median 3Q Max 
## -10.240 -2.793 -0.205 2.126 12.583 
## 
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)   
## (Intercept) 5.623 4.916 1.144 0.2618   
## gear 3.923 1.308 2.999 0.0054 **
## ---
## Signif. codes: 0 ' ***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.374 on 30 degrees of freedom
## Multiple R-squared: 0.2307, Adjusted R-squared: 0.205 
## F-statistic: 8.995 on 1 and 30 DF, p-value: 0.005401


pairwise.t.test(mtcars$mpg, mtcars$gear, p.adj= "none")


## 
## Pairwise comparisons using t tests with pooled SD 
## 
## data: mtcars$mpg and mtcars$gear 
## 
## 3 4    
## 4 7.3e-05 -    
## 5 0.038 0.218
## 
## P value adjustment method: none

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When gear is treated as a factor, the gear = 3 category is omitted from the result because R automatically creates dummy variables for the 3 categories of the gear variable (3, 4 and 5), uses only the last two of them in the model, and takes the first one as the reference level.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Making a Stack Data Type in Python</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Thu, 14 Jul 2022 07:22:59 +0000</pubDate>
      <link>https://dev.to/iamdurga/making-a-stack-data-type-in-python-2lci</link>
      <guid>https://dev.to/iamdurga/making-a-stack-data-type-in-python-2lci</guid>
      <description>&lt;h1&gt;
  
  
  Making a Stack Data Type in Python
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A stack is one of the primitive data structures that we always have to study before diving into Data Structures and Algorithms. It is an example of an ADT (Abstract Data Type) &lt;a href="https://en.wikipedia.org/wiki/Abstract_data_type" rel="noopener noreferrer"&gt;where operations are predefined&lt;/a&gt;. There are other ADTs as well, such as Queue and List.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operations
&lt;/h2&gt;

&lt;p&gt;For any data type, the most common operations include inserting data, removing data and retrieving data. A simple stack supports 3 operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Push&lt;/strong&gt; : To insert a data on the top of a stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pop&lt;/strong&gt; : To remove a data from the top of a stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top&lt;/strong&gt; : To retrieve a top element data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A stack operates in a LIFO (Last In, First Out) way. This means that at any time the pointer sits at the top of the stack, and we are only allowed to operate on the data the pointer points to. If a stack uses an array its size is fixed, but if it uses a list its size can vary.&lt;/p&gt;
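
&lt;p&gt;Python’s built-in &lt;code&gt;list&lt;/code&gt; already behaves this way, which gives a quick feel for LIFO order before we build our own class:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stack = []
stack.append(4)    # push
stack.append(5)
stack.append(6)
print(stack[-1])   # 6 -- top element, without removing it
print(stack.pop()) # 6 -- last in, first out
print(stack.pop()) # 5

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;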

&lt;h3&gt;
  
  
  Push Operation
&lt;/h3&gt;

&lt;p&gt;Let’s assume we are using an array of size 6 for the stack, with data &lt;code&gt;[4 5 6 8 5 2]&lt;/code&gt;. Initially the stack is empty and the pointer points at the bottom, which is the top of the empty stack. In the first step, to push a value, we place 4 at the 0th position; the pointer then moves one step upwards. In the next step, the next value is inserted at the 1st position, and so on until the stack is full.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fdsa%2Finsert_stack.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2Fdsa%2Finsert_stack.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s write it in Python.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Stack:
    def __init__ (self, size):
        self.size = size
        self.storage = ["~"]*size
        self.pointer = 0
        print(f"New Stack: {self.storage}")
        print(f"Pointer at {self.pointer}.\n")

    def push(self, x):
        self.storage[self.pointer] = x
        print(f"New Stack: {self.storage}")
        print(f"Pointer at {self.pointer+1}.\n")

        self.pointer+=1




data = [4, 5, 6, 8, 5, 2]
stack = Stack(size=6)

for x in data:
    stack.push(x)


New Stack: ['~', '~', '~', '~', '~', '~']
Pointer at 0.

New Stack: [4, '~', '~', '~', '~', '~']
Pointer at 1.

New Stack: [4, 5, '~', '~', '~', '~']
Pointer at 2.

New Stack: [4, 5, 6, '~', '~', '~']
Pointer at 3.

New Stack: [4, 5, 6, 8, '~', '~']
Pointer at 4.

New Stack: [4, 5, 6, 8, 5, '~']
Pointer at 5.

New Stack: [4, 5, 6, 8, 5, 2]
Pointer at 6.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we have inserted the data into our stack, but what if we insert more data than the stack's size allows? In that case, an overflow happens. In the example above we used a list as storage, so Python will also raise an error if we try to insert beyond its bounds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stack.push(0)


---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

&amp;lt;ipython-input-23-c549502547a3&amp;gt; in &amp;lt;module&amp;gt;
----&amp;gt; 1 stack.push(0)

&amp;lt;ipython-input-21-dfbe390820f6&amp;gt; in push(self, x)
      8 
      9 def push(self, x):
---&amp;gt; 10 self.storage[self.pointer] = x
     11 print(f"New Stack: {self.storage}")
     12 print(f"Pointer at {self.pointer+1}.\n")

IndexError: list assignment index out of range

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is expected, but let's make the error message a little more descriptive for our case.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Stack:
    def __init__ (self, size):
        self.size = size
        self.storage = ["~"]*size
        self.pointer = 0
        print(f"New Stack: {self.storage}")
        print(f"Pointer at {self.pointer}.\n")

    def push(self, x):
        try:
            self.storage[self.pointer] = x
            print(f"New Stack: {self.storage}")
            print(f"Pointer at {self.pointer+1}.\n")

            self.pointer+=1
        except IndexError as e:
            print(f"Overflow Occured at pointer: {self.pointer}")



data = [4, 5, 6, 8, 5, 2, 0]
stack = Stack(size=6)

for x in data:
    stack.push(x)


New Stack: ['~', '~', '~', '~', '~', '~']
Pointer at 0.

New Stack: [4, '~', '~', '~', '~', '~']
Pointer at 1.

New Stack: [4, 5, '~', '~', '~', '~']
Pointer at 2.

New Stack: [4, 5, 6, '~', '~', '~']
Pointer at 3.

New Stack: [4, 5, 6, 8, '~', '~']
Pointer at 4.

New Stack: [4, 5, 6, 8, 5, '~']
Pointer at 5.

New Stack: [4, 5, 6, 8, 5, 2]
Pointer at 6.

Overflow Occured at pointer: 6

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pop Operation
&lt;/h3&gt;

&lt;p&gt;Now that our stack is fully filled, let's remove values from it. The pop operation again works on the data at the pointer's position.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Stack:
    def __init__ (self, size):
        self.size = size
        self.storage = ["~"]*size
        self.pointer = 0
        print(f"New Stack: {self.storage}")
        print(f"Pointer at {self.pointer}.\n")

    def push(self, x):
        try:
            self.storage[self.pointer] = x
            print(f"New Stack: {self.storage}")
            print(f"Pointer at {self.pointer+1}.\n")

            self.pointer+=1
        except IndexError as e:
            print(f"Overflow Occured at pointer: {self.pointer} \n")

    def pop(self):
        self.storage = self.storage[:-1]

        print(f"New Stack: {self.storage}")
        print(f"Pointer at {self.pointer-1}.\n")

        self.pointer-=1


data = [4, 5, 6, 8, 5, 2, 0]
stack = Stack(size=6)

for x in data:
    stack.push(x)
stack.pop()


New Stack: ['~', '~', '~', '~', '~', '~']
Pointer at 0.

New Stack: [4, '~', '~', '~', '~', '~']
Pointer at 1.

New Stack: [4, 5, '~', '~', '~', '~']
Pointer at 2.

New Stack: [4, 5, 6, '~', '~', '~']
Pointer at 3.

New Stack: [4, 5, 6, 8, '~', '~']
Pointer at 4.

New Stack: [4, 5, 6, 8, 5, '~']
Pointer at 5.

New Stack: [4, 5, 6, 8, 5, 2]
Pointer at 6.

Overflow Occured at pointer: 6 

New Stack: [4, 5, 6, 8, 5]
Pointer at 5.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
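
&lt;p&gt;One detail worth noting: the pop above shrinks the storage list by slicing, so the storage no longer keeps its fixed size. A hypothetical alternative for a fixed-size, array-style stack (a sketch only, not the class used in this post) overwrites the freed slot with the placeholder instead:&lt;br&gt;
&lt;/p&gt;

```python
# Sketch of a fixed-size, array-style stack: pop restores the "~"
# placeholder instead of shrinking the storage list.
class FixedStack:
    def __init__(self, size):
        self.size = size
        self.storage = ["~"] * size
        self.pointer = 0

    def push(self, x):
        self.storage[self.pointer] = x
        self.pointer += 1

    def pop(self):
        self.pointer -= 1
        value = self.storage[self.pointer]
        self.storage[self.pointer] = "~"  # storage length stays fixed
        return value

s = FixedStack(size=3)
s.push(4)
s.push(5)
print(s.pop())    # 5
print(s.storage)  # [4, '~', '~'] -- still length 3
```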



&lt;p&gt;Now that we have implemented the pop operation, what happens if our stack is empty and we try to remove an element? That case is a stack underflow. Let's write that as well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in range(len(data)):
    stack.pop()


New Stack: [4, 5, 6, 8]
Pointer at 4.

New Stack: [4, 5, 6]
Pointer at 3.

New Stack: [4, 5]
Pointer at 2.

New Stack: [4]
Pointer at 1.

New Stack: []
Pointer at 0.

New Stack: []
Pointer at -1.

New Stack: []
Pointer at -2.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the code above, no error is shown when the pointer becomes negative. Let's say that if the pointer is already at 0 and we try to remove an element, it should be an error.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Stack:
    def __init__ (self, size):
        self.size = size
        self.storage = ["~"]*size
        self.pointer = 0
        print(f"New Stack: {self.storage}")
        print(f"Pointer at {self.pointer}.\n")

    def push(self, x):
        try:
            self.storage[self.pointer] = x
            print(f"New Stack: {self.storage}")
            print(f"Pointer at {self.pointer+1}.\n")

            self.pointer+=1
        except IndexError as e:
            print(f"Overflow Occured at pointer: {self.pointer} \n")

    def pop(self):
        if self.pointer&amp;lt;1:
            print("Stack Underflow occured.")
        else:
            self.storage = self.storage[:-1]
            print(f"New Stack: {self.storage}")
            print(f"Pointer at {self.pointer-1}.\n")

            self.pointer-=1



data = [4, 5, 6, 8, 5, 2, 0]
stack = Stack(size=6)

for x in data:
    stack.push(x)
stack.pop()

for i in range(len(data)):
    stack.pop()


New Stack: ['~', '~', '~', '~', '~', '~']
Pointer at 0.

New Stack: [4, '~', '~', '~', '~', '~']
Pointer at 1.

New Stack: [4, 5, '~', '~', '~', '~']
Pointer at 2.

New Stack: [4, 5, 6, '~', '~', '~']
Pointer at 3.

New Stack: [4, 5, 6, 8, '~', '~']
Pointer at 4.

New Stack: [4, 5, 6, 8, 5, '~']
Pointer at 5.

New Stack: [4, 5, 6, 8, 5, 2]
Pointer at 6.

Overflow Occured at pointer: 6 

New Stack: [4, 5, 6, 8, 5]
Pointer at 5.

New Stack: [4, 5, 6, 8]
Pointer at 4.

New Stack: [4, 5, 6]
Pointer at 3.

New Stack: [4, 5]
Pointer at 2.

New Stack: [4]
Pointer at 1.

New Stack: []
Pointer at 0.

Stack Underflow occured.
Stack Underflow occured.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's tweak it a little more to make the output more readable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Stack:
    def __init__ (self, size):
        self.size = size
        self.storage = ["~"]*size
        self.pointer = 0
        print(f"New Stack: {self.storage}")
        print(f"Pointer at {self.pointer}.\n")

    def push(self, x):
        print("=" * 20+" Push Operation "+"=" * 20)
        try:
            self.storage[self.pointer] = x
            print(f"New Stack: {self.storage}")
            print(f"Pointer at {self.pointer+1}.")

            self.pointer+=1
        except IndexError as e:
            print(f"Overflow Occured at pointer: {self.pointer}")
        print("="*55 + "\n")

    def pop(self):
        print("=" * 20+" Pop Operation "+"=" * 20)
        if self.pointer&amp;lt;1:
            print("Stack Underflow occured.")
        else:
            self.storage = self.storage[:-1]
            print(f"New Stack: {self.storage}")
            print(f"Pointer at {self.pointer-1}.")

            self.pointer-=1
        print("="*55 + "\n")



data = [4, 5, 6, 8, 5, 2, 0]
stack = Stack(size=6)

for x in data:
    stack.push(x)
stack.pop()

for i in range(len(data)):
    stack.pop()


New Stack: ['~', '~', '~', '~', '~', '~']
Pointer at 0.

==================== Push Operation ====================
New Stack: [4, '~', '~', '~', '~', '~']
Pointer at 1.
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, '~', '~', '~', '~']
Pointer at 2.
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, 6, '~', '~', '~']
Pointer at 3.
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, 6, 8, '~', '~']
Pointer at 4.
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, 6, 8, 5, '~']
Pointer at 5.
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, 6, 8, 5, 2]
Pointer at 6.
=======================================================

==================== Push Operation ====================
Overflow Occured at pointer: 6
=======================================================

==================== Pop Operation ====================
New Stack: [4, 5, 6, 8, 5]
Pointer at 5.
=======================================================

==================== Pop Operation ====================
New Stack: [4, 5, 6, 8]
Pointer at 4.
=======================================================

==================== Pop Operation ====================
New Stack: [4, 5, 6]
Pointer at 3.
=======================================================

==================== Pop Operation ====================
New Stack: [4, 5]
Pointer at 2.
=======================================================

==================== Pop Operation ====================
New Stack: [4]
Pointer at 1.
=======================================================

==================== Pop Operation ====================
New Stack: []
Pointer at 0.
=======================================================

==================== Pop Operation ====================
Stack Underflow occured.
=======================================================

==================== Pop Operation ====================
Stack Underflow occured.
=======================================================

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Top Operation
&lt;/h3&gt;

&lt;p&gt;This is a simple operation. It returns the value just below the position currently pointed to by the pointer, i.e. the top element.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Stack:
    def __init__ (self, size):
        self.size = size
        self.storage = ["~"]*size
        self.pointer = 0
        print(f"New Stack: {self.storage}")
        print(f"Pointer at {self.pointer}.\n")

    def push(self, x):
        print("=" * 20+" Push Operation "+"=" * 20)
        try:
            self.storage[self.pointer] = x
            print(f"New Stack: {self.storage}")
            print(f"Pointer at {self.pointer+1}.")

            self.pointer+=1
        except IndexError as e:
            print(f"Overflow Occured at pointer: {self.pointer}")
        print("="*55 + "\n")

    def pop(self):
        print("=" * 20+" Pop Operation "+"=" * 20)
        if self.pointer&amp;lt;1:
            print("Stack Underflow occured.")
        else:
            self.storage = self.storage[:-1]
            print(f"New Stack: {self.storage}")
            print(f"Pointer at {self.pointer-1}.")

            self.pointer-=1
        print("="*55 + "\n")

    def top(self):
        print("=" * 20+" Top Operation "+"=" * 20)
        try:
            print(f"Pointer at {self.pointer}.")
            print(f"Return: {self.storage[self.pointer-1]}")
        except:
            print("Nothing on the top. Stack is empty.")
        print("="*55 + "\n")



data = [4, 5, 6, 8, 5, 2, 0]
stack = Stack(size=6)

for x in data:
    stack.push(x)
    stack.top()
stack.pop()

for i in range(len(data)):
    stack.pop()
    stack.top()


New Stack: ['~', '~', '~', '~', '~', '~']
Pointer at 0.

==================== Push Operation ====================
New Stack: [4, '~', '~', '~', '~', '~']
Pointer at 1.
=======================================================

==================== Top Operation ====================
Pointer at 1.
Return: 4
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, '~', '~', '~', '~']
Pointer at 2.
=======================================================

==================== Top Operation ====================
Pointer at 2.
Return: 5
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, 6, '~', '~', '~']
Pointer at 3.
=======================================================

==================== Top Operation ====================
Pointer at 3.
Return: 6
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, 6, 8, '~', '~']
Pointer at 4.
=======================================================

==================== Top Operation ====================
Pointer at 4.
Return: 8
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, 6, 8, 5, '~']
Pointer at 5.
=======================================================

==================== Top Operation ====================
Pointer at 5.
Return: 5
=======================================================

==================== Push Operation ====================
New Stack: [4, 5, 6, 8, 5, 2]
Pointer at 6.
=======================================================

==================== Top Operation ====================
Pointer at 6.
Return: 2
=======================================================

==================== Push Operation ====================
Overflow Occured at pointer: 6
=======================================================

==================== Top Operation ====================
Pointer at 6.
Return: 2
=======================================================

==================== Pop Operation ====================
New Stack: [4, 5, 6, 8, 5]
Pointer at 5.
=======================================================

==================== Pop Operation ====================
New Stack: [4, 5, 6, 8]
Pointer at 4.
=======================================================

==================== Top Operation ====================
Pointer at 4.
Return: 8
=======================================================

==================== Pop Operation ====================
New Stack: [4, 5, 6]
Pointer at 3.
=======================================================

==================== Top Operation ====================
Pointer at 3.
Return: 6
=======================================================

==================== Pop Operation ====================
New Stack: [4, 5]
Pointer at 2.
=======================================================

==================== Top Operation ====================
Pointer at 2.
Return: 5
=======================================================

==================== Pop Operation ====================
New Stack: [4]
Pointer at 1.
=======================================================

==================== Top Operation ====================
Pointer at 1.
Return: 4
=======================================================

==================== Pop Operation ====================
New Stack: []
Pointer at 0.
=======================================================

==================== Top Operation ====================
Pointer at 0.
Nothing on the top. Stack is empty.
=======================================================

==================== Pop Operation ====================
Stack Underflow occured.
=======================================================

==================== Top Operation ====================
Pointer at 0.
Nothing on the top. Stack is empty.
=======================================================

==================== Pop Operation ====================
Stack Underflow occured.
=======================================================

==================== Top Operation ====================
Pointer at 0.
Nothing on the top. Stack is empty.
=======================================================

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
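
&lt;p&gt;For comparison, here is a hypothetical, more compact sketch of the same three operations that relies on Python's built-in list and raises exceptions instead of printing. This illustrates an alternative design, not the class developed above.&lt;br&gt;
&lt;/p&gt;

```python
# Compact stack sketch: exceptions signal overflow/underflow,
# leaving error handling to the caller.
class Stack:
    def __init__(self, size):
        self.size = size
        self.storage = []

    def push(self, x):
        if len(self.storage) >= self.size:
            raise OverflowError("stack is full")
        self.storage.append(x)

    def pop(self):
        if not self.storage:
            raise IndexError("stack underflow")
        return self.storage.pop()

    def top(self):
        if not self.storage:
            raise IndexError("stack is empty")
        return self.storage[-1]

stack = Stack(size=2)
stack.push(1)
stack.push(2)
print(stack.top())  # 2
print(stack.pop())  # 2
```

&lt;p&gt;Raising exceptions lets the caller decide how to handle overflow and underflow, at the cost of the step-by-step tracing that the printed version provides.&lt;/p&gt;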



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Thank you for reading this blog all the way to the end. In the next post, I will take a similar approach to the queue, which will also be fun to try.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Dataframe in R.</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Thu, 14 Jul 2022 07:20:43 +0000</pubDate>
      <link>https://dev.to/iamdurga/dataframe-in-r-22id</link>
      <guid>https://dev.to/iamdurga/dataframe-in-r-22id</guid>
      <description>&lt;h1&gt;
  
  
  Getting Started With Dataframes
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Data frames are the most widely used data structure in R. A data frame is a list in which every component has a name and the same length. The easiest way to understand a data frame is to visualize a spreadsheet: the first row is the header, given by the names of the list components. Each column stores a single data type and is called a variable, and each row is an observation across multiple variables. Since data frames are like spreadsheets, we can insert data however we like; there are many possibilities for inserting data.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;apple&lt;/th&gt;
&lt;th&gt;Banana&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;price store A&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;price store B&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The table above is not a data frame, because the price per store is split across two rows. If we rearrange the data so that product is one variable, price is another, and store is a third, it becomes a data frame.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Store&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;apple&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;apple&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;banana&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;banana&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Attributes of dataframe
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Length&lt;/li&gt;
&lt;li&gt;Dimension&lt;/li&gt;
&lt;li&gt;Name&lt;/li&gt;
&lt;li&gt;Class&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Create a DataFrame
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;product &amp;lt;- c('apple','banana','orange','papaya','rice','wheat','pee','noodle')
catagory &amp;lt;- c( 'groceries','groceries','electronic','electronic','groceries','electronic','electronic','groceries')
price &amp;lt;- c(24,45,67,88,56,78,89,90)
quality &amp;lt;- c('high','low','high','low','high','low','high','low') 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create a data frame from the above data, we can do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; shopping_data &amp;lt;- data.frame(product,catagory,price,quality,
                           budget = c(120,3000,600,500,45,67,89,90))
shopping_data

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of the above code is the data frame.&lt;/p&gt;

&lt;p&gt;To check whether it is a data frame or not, we can use the following code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;str(shopping_data)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of the above code is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'data.frame':   8 obs. of 5 variables:
 $ product : chr "apple" "banana" "orange" "papaya" ...
 $ catagory: chr "groceries" "groceries" "electronic" "electronic" ...
 $ price : num 24 45 67 88 56 78 89 90
 $ quality : chr "high" "low" "high" "low" ...
 $ budget : num 120 3000 600 500 45 67 89 90 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Check the attributes of the dataframe.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; names(shopping_data)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Check dimension of dataframe.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; dim(shopping_data)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Check first six rows of dataframe
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; head(shopping_data)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Check last six rows of dataframe.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; tail(shopping_data)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Take only two rows of dataframe.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; head(shopping_data, n = 2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Access a specified column of the dataframe.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; shopping_data$product

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of the above code is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; 'apple''banana''orange''papaya''rice''wheat''pee''noodle'


 shopping_data[['product']]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of the above code is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; 'apple''banana''orange''papaya''rice''wheat''pee''noodle'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Manipulating a DataFrame
&lt;/h2&gt;

&lt;p&gt;By manipulating a data frame we learn how to select data, how to add new rows, and how to sort and rank within a data frame. Data frames are lists where each element is a named vector of the same length, so we can select elements just as in a list, using [[]] or $column. Data frames are also two-dimensional matrices, which means we can index them like matrices using square brackets: [row, column]. If we fix one dimension, they behave like lists. Therefore a data frame can be indexed either like a list or like a matrix, by position, by logical rule, or by name.&lt;/p&gt;

&lt;h3&gt;
  
  
  List subsetting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#list subsetting
shopping_data[[2]]
shopping_data[['budget']]
shopping_data$price
shopping_data$price[1:3]
shopping_data[[3]][3]
shopping_data$price[3]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of the above code is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'groceries''groceries''electronic''electronic''groceries''electronic''electronic''groceries'
120300060050045678990
2445678856788990
244567
67
67

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Matrix subsetting
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Matrix subsetting
shopping_data[,1]
shopping_data[,"product"]
shopping_data[1,]
shopping_data[1,"price"]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output will be&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'apple''banana''orange''papaya''rice''wheat''pee''noodle'
'apple''banana''orange''papaya''rice''wheat''pee''noodle'
A data.frame: 1 × 5
1   apple   groceries   24  high    120
24

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Add new attribute into dataframe.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;feedback&amp;lt;- c('good','outstanding','ordinary','nice','excilent','brillent','extra-ordinary','satisfactory')
shopping_data &amp;lt;- cbind(shopping_data,feedback)
shopping_data

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output will be&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A data.frame: 8 × 6
apple   groceries   24  high    120 good
banana  groceries   45  low 3000    outstanding
orange  electronic  67  high    600 ordinary
papaya  electronic  88  low 500 nice
rice    groceries   56  high    45  excilent
wheat   electronic  78  low 67  brillent
pee electronic  89  high    89  extra-ordinary
noodle  groceries   90  low 90  satisfactory

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  We can use the following operations to access data from the dataframe
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;shopping_data[c(1:3),1]
shopping_data[1]
shopping_data[[1]]
is.vector(shopping_data[1])
is.vector(shopping_data[[1]])
is.list(shopping_data[1])
is.list(shopping_data[1])

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'apple''banana''orange'
A data.frame: 8 × 1
apple
banana
orange
papaya
rice
wheat
pee
noodle
'apple''banana''orange''papaya''rice''wheat''pee''noodle'
FALSE
TRUE
TRUE
TRUE

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Working with tidyverse
&lt;/h2&gt;

&lt;p&gt;During data analysis we spend most of our time cleaning and transforming the raw data. The tidyverse is an add-on collection of packages that lets us perform operations such as cleaning data and creating powerful graphs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;product &amp;lt;- c('apple','banana','orange','papaya','Rice','wheat','pee','noodle')
catagory &amp;lt;- c( 'groceries','groceries','electronic','electronic','groceries','electronic','electronic','groceries')
price &amp;lt;- c(24,45,67,88,56,78,89,90)
quality &amp;lt;- c('high','low','high','low','high','low','high','low')
shopping_data &amp;lt;- data.frame(product,catagory,price,quality,
                           budget = c(120,3000,600,500,45,67,89,90))
#arrange(desc(price))
shopping_data

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A data.frame: 8 × 5
apple   groceries   24  high    120
banana  groceries   45  low 3000
orange  electronic  67  high    600
papaya  electronic  88  low 500
Rice    groceries   56  high    45
wheat   electronic  78  low 67
pee electronic  89  high    89
noodle  groceries   90  low 90

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Select Function
&lt;/h3&gt;

&lt;p&gt;The select function allows us to select specified columns from a data frame.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# dplyr never change the original data
#install.packages("tidyverse")
#library(tidyverse)
library(dplyr) 
product &amp;lt;- select(shopping_data,price,budget)
product

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A data.frame: 8 × 2
24  120
45  3000
67  600
88  500
56  45
78  67
89  89
90  90

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Filter
&lt;/h3&gt;

&lt;p&gt;The filter function works similarly to select. Using the pipe operator %&amp;gt;%, we can chain multiple operations at once without naming the intermediate results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;filter(product,budget &amp;gt; 100)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A data.frame: 4 × 2
24  120
45  3000
67  600
88  500


dataset2 &amp;lt;- shopping_data %&amp;gt;%
select(product,price)%&amp;gt;%
filter(price&amp;gt;45)%&amp;gt;%
group_by( product)%&amp;gt;%
summarize(avg = mean(price))

dataset2 

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A tibble: 6 × 2
noodle  90
orange  67
papaya  88
pee 89
Rice    56
wheat   78

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Arrange function
&lt;/h3&gt;

&lt;p&gt;It sorts our data frame in ascending order: &lt;code&gt;arrange(price)&lt;/code&gt;. To arrange the data frame in descending order we use &lt;code&gt;arrange(desc(price))&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arrange(product,price)


Output is,
A data.frame: 8 × 2
24  120
45  3000
56  45
67  600
78  67
88  500
89  89
90  90

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Managing control statements:
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;If statement:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The if statement is the most common control statement; it executes code only when the condition placed between the brackets is true. Otherwise, the if statement ignores that particular piece of code: &lt;code&gt;if(condition){ code to be executed }&lt;/code&gt;. To overcome this obstacle we add an extra element, else.&lt;/p&gt;

&lt;h1&gt;
  
  
  Paste Function
&lt;/h1&gt;

&lt;p&gt;Paste converts its arguments (via as.character) to character strings and concatenates them (separating them by the string given by sep). If the arguments are vectors, they are concatenated term-by-term to give a character vector result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;product &amp;lt;- "tshirt"
price&amp;lt;- 110
if(price &amp;lt; 100){
    print(paste('adding',product,'to cart'))
}else
{
    print(paste('adding',product,'to wishlist'))
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] "adding tshirt to wishlist"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
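&lt;p&gt;As a small sketch of &lt;code&gt;paste&lt;/code&gt; on its own (the variable names here are only illustrative), the &lt;code&gt;sep&lt;/code&gt; argument controls the string placed between the pieces, and vector arguments are combined term by term:&lt;/p&gt;

```r
# paste() joins its arguments, separated by sep (default is a space)
item = "apple"
print(paste("adding", item, "to cart"))             # "adding apple to cart"

# a custom separator
print(paste("adding", item, "to cart", sep = "_"))  # "adding_apple_to_cart"

# vector arguments are concatenated term by term
print(paste(c("a", "b"), c(1, 2), sep = "-"))       # "a-1" "b-2"
```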



&lt;h1&gt;
  
  
  Control Statement in vectors
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;quantity &amp;lt;- c(1,1,2,3,4)
ifelse(quantity == 1,'Yes','No')

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'Yes''Yes''No''No''No'


price &amp;lt;- 100
if(price &amp;lt; 100){
    print("price"&amp;lt; "budget")
}else if(price == 100){
    print("the price is equal to budget")

}else{
    print("The budget is less then price")
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] "the price is equal to budget"


price &amp;lt;- c(58,100,110)
if(price &amp;lt; 100){
    print("price"&amp;lt; "budget")
}else if(price == 100){
    print("the price is equal to budget")

}else{
    print("The budget is less then price")
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the condition has length greater than one, only the first element is tested (and in R 4.2 and later this raises an error outright). That means it checks the first element and then stops. This problem is resolved by using the &lt;code&gt;any&lt;/code&gt; function.&lt;/p&gt;

&lt;h1&gt;
  
  
  Any Function
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if(any(price &amp;lt; 100)){

    print('At least one price is under budget')
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] "At least one price is under budget"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  All Function
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if(all(price&amp;lt;100)){
    print('all the price are under budget')
}else{
    print('Not all prices satisfies the condition.')
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] "Not all prices satisfies the condition."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To combine conditions we can use the &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; and &lt;code&gt;||&lt;/code&gt; operators. The single &lt;code&gt;&amp;amp;&lt;/code&gt; and &lt;code&gt;|&lt;/code&gt; work element-wise on vectors, while the double forms compare a single value on each side (the non-vectorised form), which is what &lt;code&gt;if&lt;/code&gt; expects.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;price &amp;lt;- 58
if(price&amp;gt; 50 &amp;amp;&amp;amp; price &amp;lt; 100){
    print('The price is between 50 and 100')
}else {
    print("the price is not in between 50 and 100")
}


[1] "The price is between 50 and 100"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
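&lt;p&gt;A minimal sketch of the difference, using the element-wise &lt;code&gt;|&lt;/code&gt; on a vector and the single-value &lt;code&gt;||&lt;/code&gt; inside an if (the vectors here are only illustrative):&lt;/p&gt;

```r
# single | is vectorised: it tests every element
price = c(58, 100, 110)
print((price == 100) | (price > 105))   # FALSE TRUE TRUE

# double || expects single values, as used inside if()
x = 120
if (x > 100 || x == 0) {
    print("x is outside the 1-100 range")
}
```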



&lt;h1&gt;
  
  
  Switch Statement
&lt;/h1&gt;

&lt;p&gt;We can chain as many if...else statements as we like; however, with more than about four it becomes difficult to keep track of what happens when each condition is true. The switch statement works with cases: its syntax contains the value to be tested, followed by the possible cases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;quantity &amp;lt;- c(1,3,4,5)

average_quantity &amp;lt;- function(quantity,type) {
    switch(type,
          arithmetic = mean(quantity),
          geometric = prod(quantity)^(1/length(quantity)))
}
average_quantity(quantity,"arithmetic")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3.25


x &amp;lt;- c(1,2,3,4,5)
sumfunction &amp;lt;- function(x,i){
    switch(i, 
          s = sum(x)
        )
}
sumfunction(x,"s")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;15

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
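&lt;p&gt;One detail worth knowing: when no case matches, &lt;code&gt;switch&lt;/code&gt; returns NULL invisibly, and a final unnamed argument acts as a default. A sketch (the &lt;code&gt;describe&lt;/code&gt; function is hypothetical):&lt;/p&gt;

```r
# the last unnamed argument of switch() serves as the default case
describe = function(type) {
    switch(type,
          arithmetic = "uses the mean",
          geometric = "uses the product",
          "unknown type")   # returned when nothing matches
}
print(describe("geometric"))   # "uses the product"
print(describe("harmonic"))    # "unknown type"
```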



&lt;h1&gt;
  
  
  Loop
&lt;/h1&gt;

&lt;p&gt;A loop is a sequence of instructions that is repeated until a certain condition is reached.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For loop: it performs the same operation on every element of the input. Its syntax is &lt;code&gt;for(variable in sequence){ expression }&lt;/code&gt;. Between the parentheses there are three parts: first a variable, which can take any name, then the keyword &lt;code&gt;in&lt;/code&gt;, and last a sequence or vector of any kind.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A for loop does not display its output unless we print it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cart &amp;lt;- c('apple','cookie','lemoan')
    for(product in cart){
        print(product)
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] "apple"
[1] "cookie"
[1] "lemoan"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
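&lt;p&gt;Because a for loop only prints, results are usually collected into a pre-allocated vector instead. A minimal sketch using &lt;code&gt;seq_along&lt;/code&gt; (the prices are only illustrative):&lt;/p&gt;

```r
# collect loop results instead of only printing them
prices = c(120, 3000, 45)
discounted = numeric(length(prices))   # pre-allocate the output vector
for (i in seq_along(prices)) {
    discounted[i] = prices[i] * 0.9    # apply a 10% discount
}
print(discounted)
```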



&lt;h1&gt;
  
  
  While loop
&lt;/h1&gt;

&lt;p&gt;A while loop performs its operation as long as the given condition is true. The syntax is similar to that of a for loop. To make the loop stop, there must be a relationship between the condition and the body; otherwise the loop never stops.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index &amp;lt;- 1
while(index &amp;lt;3 ) {
    print(paste("The index value is",index))
    index &amp;lt;- index + 1
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] "The index value is 1"
[1] "The index value is 2"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Repeat Loop
&lt;/h1&gt;

&lt;p&gt;A repeat loop repeats the same operation until we interrupt it or insert a special statement (&lt;code&gt;break&lt;/code&gt;) to stop it. Repeat loops are important in optimization and maximization algorithms. The syntax is &lt;code&gt;repeat { expression }&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The next statement is used to skip the rest of one particular iteration and move on to the next one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x &amp;lt;- 1
repeat {
    print(x)
    x = x + 1
    if( x==3){
        break
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] 1
[1] 2


price &amp;lt;- c(123,456,78,900,987)
for(value in price){
    if( value &amp;lt; 100){
        next
    }
    discount &amp;lt;- value - value * 0.1
    print(discount)
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1] 110.7
[1] 410.4
[1] 810
[1] 888.3

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>Getting Started with R Programming Language.</title>
      <dc:creator>Durga Pokharel</dc:creator>
      <pubDate>Tue, 12 Jul 2022 15:39:33 +0000</pubDate>
      <link>https://dev.to/iamdurga/getting-started-with-r-programming-language-1oin</link>
      <guid>https://dev.to/iamdurga/getting-started-with-r-programming-language-1oin</guid>
      <description>&lt;h1&gt;
  
  
  Getting Started With R
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;R is a programming language and software environment for statistical computing and graphics, supported by the R Foundation. R is not a general-purpose programming language like Java or C; it was created by statisticians as an interactive environment. Interactivity is the critical characteristic that allows us to explore our data in R. It supports statistical techniques such as linear and non-linear modeling, classification, and many more, and data analysis also requires many different types of plots. To run R we will use an IDE (according to Wikipedia, an integrated development environment (IDE) is a software application that provides comprehensive facilities to the programmer for software development). The core component required for every R program is base R, which contains only the most important bits needed to run our code successfully.&lt;/p&gt;

&lt;h2&gt;
  
  
  History About R
&lt;/h2&gt;

&lt;p&gt;Bell Labs developed the S language in 1976. In 1993 Ross Ihaka and Robert Gentleman created R in New Zealand. R became free and open source in 1995. R version 1.0.0 was released to the public in 2000. The RStudio IDE was released in 2011.&lt;/p&gt;

&lt;h2&gt;
  
  
  Drawback
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;R is built on &lt;code&gt;S&lt;/code&gt;. If we want to build general-purpose apps, R is probably not the right choice.&lt;/li&gt;
&lt;li&gt;The objects we work with must be stored in memory, so working with very large data sets can quickly exhaust it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Installing and Setting up R in your Windows
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Downloading installation file
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Download R tools from &lt;a href="https://cran.r-project.org/bin/windows/Rtools/" rel="noopener noreferrer"&gt;Official Website&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Next, we need to have an IDE, most popular one is Rstudio. We can download it from &lt;a href="https://www.rstudio.com/products/rstudio/download/" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After downloading the installation files, install them in the desired locations and then open the console.&lt;/p&gt;

&lt;p&gt;After the installation completes, open R and we get a window just like the one below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FMath_blog%2Fwindows%25203.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fiamdurga.github.io%2Fassets%2FMath_blog%2Fwindows%25203.PNG" alt="img"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we can write our R codes within console or we can do it via Rstudio.&lt;/p&gt;

&lt;p&gt;I prefer to use Jupyter Notebook for running R because it is friendlier for me. A good tutorial is available at &lt;a href="https://docs.anaconda.com/anaconda/navigator/tutorials/r-lang/" rel="noopener noreferrer"&gt;Anaconda’s Documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  My First R program
&lt;/h2&gt;

&lt;p&gt;Assigning a variable in R is my first R program.&lt;/p&gt;

&lt;h3&gt;
  
  
  Assigning Variable and operator in R
&lt;/h3&gt;

&lt;p&gt;A variable is a container that stores values. An assignment statement sets or resets the value stored in the storage location(s) denoted by a variable name (per Wikipedia). The assignment operator is a command telling the computer, for example, to assign the text apple to the variable product. We can also assign with &lt;code&gt;assign('product', 'apple')&lt;/code&gt;. We can assign a variable in R in several ways, as shown below.&lt;/p&gt;

&lt;h4&gt;
  
  
  Way 1
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;('apple'-&amp;gt; product)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Way 2
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(product = 'apple')

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Way 3
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;assign('products', ' apple)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Logical Operators in R
&lt;/h2&gt;

&lt;p&gt;Logical operators are those that give &lt;code&gt;TRUE&lt;/code&gt; or &lt;code&gt;FALSE&lt;/code&gt; values. For example:&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 1
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apple &amp;lt;- 2
banana &amp;lt;- 3
most_expensive &amp;lt;- banana&amp;gt; apple
most_expensive

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of above code is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TRUE

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 2
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apple &amp;lt;- 2
banana &amp;lt;- 3
most_expensive &amp;lt;- banana&amp;lt; apple
most_expensive

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of above code is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FALSE

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 3
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apple &amp;lt;- 2
banana &amp;lt;- 2
most_expensive &amp;lt;- banana == apple
most_expensive

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TRUE

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example 4
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apple &amp;lt;- 2
banana &amp;lt;- 2
most_expensive &amp;lt;- banana != apple
most_expensive

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FALSE

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Some Commonly Used Data Types in R
&lt;/h2&gt;

&lt;p&gt;Data is central to analysis: if there is no data, there is no analysis. Every piece of data we work with has some characteristics, and these characteristics can be summarized by its data type.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Character&lt;/code&gt; : Anything inside quotation is a character.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Number&lt;/code&gt;: numbers in R are doubles by default. Holding both whole numbers and fractions is the distinctive feature of a double. The other numeric type is integer.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Integer&lt;/code&gt;: an integer is a simplified version of a double. To store a value as an integer we must append the capital letter L, as in &lt;code&gt;2L&lt;/code&gt;. In most everyday use we want doubles rather than integers.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Logical(Boolean)&lt;/code&gt;: &lt;code&gt;TRUE&lt;/code&gt; or &lt;code&gt;FALSE&lt;/code&gt;, which can be abbreviated &lt;code&gt;T&lt;/code&gt; or &lt;code&gt;F&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Complex Number&lt;/code&gt;: (2 + 6i)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Raw&lt;/code&gt;: a less common data type. It is not easy to create a variable of raw type directly; when we really need one, we call a function such as &lt;code&gt;as.raw()&lt;/code&gt;, and the result is raw-type data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All these fundamental data types are called atomic data types.&lt;/p&gt;
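&lt;p&gt;A quick way to verify these atomic types is to ask &lt;code&gt;class()&lt;/code&gt; for each kind of value (a small sketch):&lt;/p&gt;

```r
# class() reports the data type of each atomic value
print(class("apple"))     # "character"
print(class(2))           # "numeric"
print(class(2L))          # "integer"
print(class(TRUE))        # "logical"
print(class(2 + 6i))      # "complex"
print(class(as.raw(10)))  # "raw"
```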

&lt;h3&gt;
  
  
  Example of numbers
&lt;/h3&gt;

&lt;p&gt;An integer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a &amp;lt;- 2L
class(a)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'integer'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A numeric:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a &amp;lt;- 2
class(a)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'numeric'


quantity &amp;lt;- 2
typeof(quantity)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'double'


quantity_integer &amp;lt;- 2L
typeof(quantity_integer)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'integer'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Comments
&lt;/h2&gt;

&lt;p&gt;Comments are used to give important information about the code. Comments are not executed by the program; a programmer writes them to better explain the code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# This is a comment in R

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Exploring vectors and factors
&lt;/h2&gt;

&lt;p&gt;A data structure, as the name suggests, represents a way to organize data so that different operations can be performed faster.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Vectors&lt;/code&gt;: a collection of data of the same type.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Factors&lt;/code&gt;: used to store categorical data.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Array&lt;/code&gt;: a generalization of a matrix to more than two dimensions, just as a matrix is a generalization of a vector.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;List\DataFrame&lt;/code&gt;: lists are more complex data structures because they allow us to store elements of different types, including other lists. We can think of a data frame as a spreadsheet where data are organized in rows and columns and each column has a specific data type. Within a data frame we can have all kinds of data types, but within one column only one. Another criterion for categorizing our data structures is dimensionality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vectors and lists are one-dimensional objects. Matrices and data frames are two-dimensional data structures. Arrays are objects that can have more than two dimensions.&lt;/p&gt;
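&lt;p&gt;The dimensionality of these structures can be inspected directly with &lt;code&gt;dim()&lt;/code&gt; and &lt;code&gt;length()&lt;/code&gt; (a minimal sketch; the example values are only illustrative):&lt;/p&gt;

```r
m = matrix(1:6, nrow = 2)           # matrix: two-dimensional
df = data.frame(a = 1:2, b = 3:4)   # data frame: rows and columns
l = list(1:3, m, "text")            # list: holds mixed types

print(dim(m))     # 2 3
print(dim(df))    # 2 2
print(length(l))  # 3
```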

&lt;blockquote&gt;
&lt;p&gt;A vector has two properties: it is one-dimensional and it contains elements of the same type.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Assigning a column vector
&lt;/h3&gt;

&lt;p&gt;Lets assign a column vector,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;assign('b',c(1,2,3,4))
print(b)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 2 3 4

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Vectors attributes:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;length&lt;/code&gt;: denoted by &lt;code&gt;length(a)&lt;/code&gt;; it gives the number of elements.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Name&lt;/code&gt;: &lt;code&gt;names(a)&lt;/code&gt; lets us attach labels to the elements.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Type&lt;/code&gt;: &lt;code&gt;typeof(a)&lt;/code&gt; gives the type of the data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are six vector types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Double&lt;/li&gt;
&lt;li&gt;logical&lt;/li&gt;
&lt;li&gt;character&lt;/li&gt;
&lt;li&gt;complex&lt;/li&gt;
&lt;li&gt;Raw&lt;/li&gt;
&lt;li&gt;Integer
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vector &amp;lt;- c("Durga","Puja","Ram","Hari")
vector
length(vector) # length 
names(vector)= "Sita" #names
typeof(vector) # type
vector

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'Durga''Puja''Ram''Hari'
4
'character'
Sita'Durga'2'Puja'3'Ram'4'Hari'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Manipulating vectors.
&lt;/h3&gt;

&lt;p&gt;Manipulating of vectors consists of sorting, ordering, indexing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sorting&lt;/code&gt;: Sort the data in some order.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Ordering&lt;/code&gt;: The order function returns the indices needed to sort the vector.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Indexing&lt;/code&gt;: selecting specific items by position.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;quantity &amp;lt;- c(1,3,2,5,6,7)
sort(quantity)
order(quantity)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 2 3 5 6 7
1 3 2 4 5 6


a &amp;lt;- c(1,7,36,0,7,5)
a[2]
a[3:5]
a[c(2,4)]
a[c(4,7)] # returns the requested elements; out-of-range indices give NA
a[-2]
a[-(2:4)] # negative indices skip those elements
a[a==1]
a[a&amp;gt;3]
a[a %in% c(2,4)] # keeps the elements that match the set (none here)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;7
36 0 7
7 0
0 &amp;lt;NA&amp;gt;
1 36 0 7 5
1 7 5
1
7 36 7 5

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Operating vector
&lt;/h3&gt;

&lt;p&gt;Adding or multiplying vectors of different sizes invokes the recycling rule. For recycling to work cleanly, the length of the larger vector must be a multiple of the length of the smaller one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;c &amp;lt;- 1:6
d &amp;lt;- 1:3
c * d

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 4 9 4 10 18

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
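&lt;p&gt;When the longer length is not a multiple of the shorter one, R still recycles but emits a warning. A sketch (the warning is suppressed here just to show the result):&lt;/p&gt;

```r
# lengths 6 and 4: 6 is not a multiple of 4, so R warns that the
# longer object length is not a multiple of the shorter object length
result = suppressWarnings(1:6 + 1:4)
print(result)   # 2 4 6 8 6 8
```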



&lt;h2&gt;
  
  
  Sequence generation
&lt;/h2&gt;

&lt;p&gt;It is used to create a sequence of elements in a vector. The &lt;code&gt;seq()&lt;/code&gt; function takes the length and the difference between values as optional arguments. In the code below, I take elements in the range 1 to 5 with a step of 1.5.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seq(1,5,by = 1.5)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 2.5 4

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Replicating elements
&lt;/h2&gt;

&lt;p&gt;It is used to replicate the elements of a vector a specified number of times. In the following code I replicate the numbers from 1 to 6 two times, using the built-in function &lt;code&gt;rep()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;e&amp;lt;- rep(1:6,times = 2)
e

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 2 3 4 5 6 1 2 3 4 5 6

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can also replicate a single number as many times as desired.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x &amp;lt;- rep(c(1),each = 10)
x

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Out put is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 1 1 1 1 1 1 1 1 1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Scan Function
&lt;/h2&gt;

&lt;p&gt;The scan function reads a file into a vector. It is a very powerful function. In the code given below, scan reads &lt;code&gt;covid data.csv&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;f &amp;lt;- scan("covid data.csv", what = "Character")
f

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Out put of the above code is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'date,totalCases,newCases,totalRecoveries,newRecoveries,totalDeaths,newDeaths' '1/23/2020,1,1,0,0,0,0' '1/24/2020,0,0,0,0,0,0' '1/25/2020,0,0,0,0,0,0' '1/26/2020,0,0,0,0,0,0' '1/27/2020,0,0,0,0,0,0' '1/28/2020,0,0,0,0,0,0' '1/29/2020,0,0,0,0,0,0' '1/30/2020,0,0,0,0,0,0' '1/31/2020,0,0,1,1,0,0' '2/1/2020,0,0,1,0,0,0' '2/2/2020,0,0,1,0,0,0' '2/3/2020,0,0,1,0,0,0' '2/4/2020,0,0,1,0,0,0' '2/5/2020,0,0,1,0,0,0' '2/6/2020,0,0,1,0,0,0' '2/7/2020,0,0,1,0,0,0' '2/8/2020,0,0,1,0,0,0' '2/9/2020,0,0,1,0,0,0' '2/10/2020,0,0,1,0,0,0' '2/11/2020,0,0,1,0,0,0' '2/12/2020,0,0,1,0,0,0' '2/13/2020,0,0,1,0,0,0' '2/14/2020,0,0,1,0,0,0' '2/15/2020,0,0,1,0,0,0' '2/16/2020,0,0,1,0,0,0' '2/17/2020,0,0,1,0,0,0' '2/18/2020,0,0,1,0,0,0' '2/19/2020,0,0,1,0,0,0' '2/20/2020,0,0,2,1,0,0' '2/21/2020,0,0,2,0,0,0' '2/22/2020,0,0,2,0,0,0' '2/23/2020,0,0,2,0,0,0' '2/24/2020,0,0,2,0,0,0' '2/25/2020,0,0,2,0,0,0' '2/26/2020,0,0,2,0,0,0' '2/27/2020,0,0,2,0,0,0' '2/28/2020,0,0,2,0,0,0' '2/29/2020,0,0,2,0,0,0' '3/1/2020,0,0,2,0,0,0' '3/2/2020,0,0,2,0,0,0' '3/3/2020,0,0,2,0,0,0' '3/4/2020,0,0,2,0,0,0' '3/5/2020,0,0,2,0,0,0' '3/6/2020,0,0,2,0,0,0' '3/7/2020,0,0,2,0,0,0' '3/8/2020,0,0,2,0,0,0' '3/9/2020,0,0,2,0,0,0' '3/10/2020,0,0,2,0,0,0' '3/11/2020,0,0,2,0,0,0' '3/12/2020,0,0,2,0,0,0' '3/13/2020,0,0,2,0,0,0' '3/14/2020,0,0,2,0,0,0' '3/15/2020,0,0,2,0,0,0' '3/16/2020,0,0,2,0,0,0' '3/17/2020,0,0,2,0,0,0' '3/18/2020,0,0,2,0,0,0' '3/19/2020,0,0,2,0,0,0' '3/20/2020,0,0,2,0,0,0' '3/21/2020,0,0,2,0,0,0' '3/22/2020,0,0,2,0,0,0' '3/23/2020,1,1,2,0,0,0' '3/24/2020,1,0,2,0,0,0' '3/25/2020,2,1,2,0,0,0' '3/26/2020,2,0,2,0,0,0' '3/27/2020,3,1,2,0,0,0' '3/28/2020,4,1,2,0,0,0' '3/29/2020,4,0,2,0,0,0' '3/30/2020,4,0,2,0,0,0' '3/31/2020,4,0,2,0,0,0' '4/1/2020,4,0,2,0,0,0' '4/2/2020,5,1,2,0,0,0' '4/3/2020,5,0,2,0,0,0' '4/4/2020,8,3,2,0,0,0' '4/5/2020,8,0,2,0,0,0' '4/6/2020,8,0,2,0,0,0' '4/7/2020,8,0,2,0,0,0' '4/8/2020,8,0,2,0,0,0' '4/9/2020,8,0,2,0,0,0' '4/10/2020,8,0,2,0,0,0' 
'4/11/2020,8,0,2,0,0,0' '4/12/2020,11,3,2,0,0,0' '4/13/2020,13,2,2,0,0,0' '4/14/2020,15,2,2,0,0,0' '4/15/2020,15,0,2,0,0,0' '4/16/2020,15,0,2,0,0,0' '4/17/2020,29,14,2,0,0,0' '4/18/2020,30,1,4,2,0,0' '4/19/2020,30,0,5,1,0,0' '4/20/2020,30,0,5,0,0,0' '4/21/2020,41,11,6,1,0,0' '4/22/2020,44,3,8,2,0,0' '4/23/2020,47,3,9,1,0,0' '4/24/2020,48,1,11,2,0,0' '4/25/2020,48,0,12,1,0,0' '4/26/2020,51,3,14,2,0,0' '4/27/2020,51,0,14,0,0,0' '4/28/2020,53,2,14,0,0,0' '4/29/2020,56,3,14,0,0,0' '4/30/2020,56,0,14,0,0,0' '5/1/2020,58,2,14,0,0,0' '5/2/2020,58,0,14,0,0,0' '5/3/2020,74,16,14,0,0,0' '5/4/2020,74,0,14,0,0,0' '5/5/2020,81,7,14,0,0,0' '5/6/2020,98,17,20,6,0,0' '5/7/2020,100,2,20,0,0,0' '5/8/2020,101,1,28,8,0,0' '5/9/2020,108,7,29,1,0,0' '5/10/2020,109,1,29,0,0,0' '5/11/2020,133,24,31,2,0,0' '5/12/2020,216,83,31,0,0,0' '5/13/2020,242,26,33,2,0,0' '5/14/2020,248,6,33,0,1,1' '5/15/2020,266,18,34,1,1,0' '5/16/2020,280,14,34,0,1,0' '5/17/2020,294,14,34,0,3,2' '5/18/2020,374,80,34,0,3,0' '5/19/2020,401,27,35,1,3,0' '5/20/2020,426,25,43,8,3,0' '5/21/2020,456,30,47,4,4,1' '5/22/2020,515,59,68,21,4,0' '5/23/2020,583,68,68,0,5,1' '5/24/2020,602,19,85,17,5,0' '5/25/2020,681,79,110,25,5,0' '5/26/2020,771,90,152,42,5,0' '5/27/2020,885,114,180,28,6,1' '5/28/2020,1041,156,184,4,6,0' '5/29/2020,1211,170,184,0,6,0' '5/30/2020,1400,189,188,4,7,1' '5/31/2020,1571,171,189,1,8,1' '6/1/2020,1810,239,190,1,8,0' '6/2/2020,2098,288,235,45,9,1' '6/3/2020,2299,201,238,3,11,2' '6/4/2020,2633,334,256,18,12,1' '6/5/2020,2911,278,289,33,12,0' '6/6/2020,3234,323,295,6,13,1' '6/7/2020,3447,213,340,45,13,0' '6/8/2020,3760,313,363,23,14,1' '6/9/2020,4083,323,394,31,15,1' '6/10/2020,4362,279,394,0,17,2' '6/11/2020,4612,250,394,0,17,0' '6/12/2020,5059,447,394,0,18,1' '6/13/2020,5334,275,394,0,19,1' '6/14/2020,5759,425,394,0,19,0' '6/15/2020,6210,451,1044,650,20,1' '6/16/2020,6590,380,1161,117,20,0' '6/17/2020,7176,586,1170,9,22,2' '6/18/2020,7847,671,1189,19,22,0' '6/19/2020,8273,426,1405,216,23,1' 
'6/20/2020,8604,331,1581,176,23,0' '6/21/2020,9025,421,1775,194,23,0' '6/22/2020,9558,533,2151,376,24,1' '6/23/2020,10098,540,2225,74,24,0' '6/24/2020,10727,629,2339,114,25,1' '6/25/2020,11161,434,2651,312,27,2' '6/26/2020,11754,593,2699,48,27,0' '6/27/2020,12308,554,2835,136,29,2' '6/28/2020,12771,463,3014,179,30,1' '6/29/2020,13247,476,3135,121,30,0' '6/30/2020,13563,316,3195,60,30,0' '7/1/2020,14045,482,4657,1462,33,3' '7/2/2020,14518,473,5321,664,33,0' '7/3/2020,15258,740,6144,823,33,0' '7/4/2020,15490,232,6416,272,34,1' '7/5/2020,15783,293,6548,132,35,1' '7/6/2020,15963,180,6812,264,35,0' '7/7/2020,16167,204,7500,688,36,1' '7/8/2020,16422,255,7753,253,36,0' '7/9/2020,16530,108,7892,139,38,2' '7/10/2020,16648,118,8012,120,39,1' '7/11/2020,16718,70,8443,431,39,0' '7/12/2020,16800,82,8590,147,39,0' '7/13/2020,16944,144,10295,1705,39,0' '7/14/2020,17060,116,10329,34,39,0' '7/15/2020,17176,116,11026,697,40,1' '7/16/2020,17343,167,11250,224,41,1' '7/17/2020,17444,101,11388,138,41,0' '7/18/2020,17501,57,11491,103,41,0' '7/19/2020,17657,156,11549,58,41,0' '7/20/2020,17843,186,11722,173,41,0' '7/21/2020,17993,150,12331,609,42,1' '7/22/2020,18093,100,12538,207,44,2' '7/23/2020,18240,147,12694,156,44,0' '7/24/2020,18373,133,12801,107,47,3' '7/25/2020,18482,109,12907,106,47,0' '7/26/2020,18612,130,12982,75,49,2' '7/27/2020,18751,139,13608,626,50,1' '7/28/2020,19062,311,13729,121,50,0' '7/29/2020,19272,210,13875,146,53,3' '7/30/2020,19546,274,14102,227,56,3' '7/31/2020,19770,224,14253,151,57,1' '8/1/2020,20085,315,14346,93,59,2' '8/2/2020,20331,246,14457,111,59,0' '8/3/2020,20749,418,14815,358,61,2' '8/4/2020,21008,259,14880,65,62,1' '8/5/2020,21389,381,15010,130,67,5' '8/6/2020,21749,360,15243,233,71,4' '8/7/2020,22213,464,15668,425,74,3' '8/8/2020,22591,378,16167,499,76,2' '8/9/2020,22971,380,16207,40,80,4' '8/10/2020,23309,338,16347,140,83,3' '8/11/2020,23947,638,16518,171,86,3' '8/12/2020,24431,484,16582,64,95,9' '8/13/2020,24956,525,16691,109,96,1' 
'8/14/2020,25550,594,16931,240,101,5' '8/15/2020,26018,468,17055,124,102,1' '8/16/2020,26659,641,17189,134,104,2' '8/17/2020,27240,581,17349,160,107,3' '8/18/2020,28256,1016,17434,85,114,7' '8/19/2020,28937,681,17554,120,120,6' '8/20/2020,29644,707,17818,264,126,6' '8/21/2020,30482,838,18068,250,137,11' '8/22/2020,31116,634,18204,136,146,9' '8/23/2020,31934,818,18485,281,149,3' '8/24/2020,32677,743,18660,175,157,8' '8/25/2020,33532,855,18973,313,164,7' '8/26/2020,34417,885,19358,385,175,11' '8/27/2020,35528,1111,19927,569,183,8' '8/28/2020,36455,927,20096,169,195,12' '8/29/2020,37339,884,20409,313,207,12' '8/30/2020,38560,1221,20676,267,221,14' '8/31/2020,39459,899,21264,588,228,7' '9/1/2020,40528,1069,22032,768,239,11' '9/2/2020,41648,1120,23144,1112,250,11' '9/3/2020,42876,1228,24061,917,257,7' '9/4/2020,44235,1359,25415,1354,271,14' '9/5/2020,45276,1041,26981,1566,280,9' '9/6/2020,46256,980,28795,1814,289,9' '9/7/2020,47235,979,30531,1736,300,11' '9/8/2020,48137,902,32818,2287,306,6' '9/9/2020,49218,1081,33736,918,312,6' '9/10/2020,50464,1246,35554,1818,317,5' '9/11/2020,51918,1454,36526,972,322,5' '9/12/2020,53119,1201,37378,852,336,14' '9/13/2020,54158,1039,38551,1173,345,9' '9/14/2020,55328,1170,39430,879,360,15' '9/15/2020,56787,1459,40492,1062,371,11' '9/16/2020,58326,1539,41560,1068,379,8' '9/17/2020,59572,1246,42803,1243,383,4' '9/18/2020,61592,2020,43674,871,390,7' '9/19/2020,62796,1204,45121,1447,401,11' '9/20/2020,64121,1325,46087,966,411,10' '9/21/2020,65275,1154,47092,1005,427,16' '9/22/2020,66631,1356,47915,823,429,2' '9/23/2020,67803,1172,49808,1893,436,7' '9/24/2020,69300,1497,50265,457,452,16' '9/25/2020,70613,1313,51720,1455,458,6' '9/26/2020,71820,1207,52867,1147,466,8' '9/27/2020,73393,1573,53752,885,476,10' '9/28/2020,74744,1351,54494,742,481,5' '9/29/2020,76257,1513,55225,731,491,10' '9/30/2020,77816,1559,56282,1057,498,7'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Implicit Type Coercion: Converting Mixed Data Types to Character
&lt;/h2&gt;

&lt;p&gt;When a vector mixes values of different types, R implicitly coerces every element to the most flexible type present; if any element is a character string, the whole vector becomes character.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x &amp;lt;- c(1,'two',4,"durga")
x
typeof(x)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'1' 'two' '4' 'durga'
'character'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
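
&lt;p&gt;As a minimal sketch (these exact calls are illustrative, not from the example above), implicit coercion follows a fixed hierarchy, and a mixed vector is promoted to the most flexible type it contains:&lt;br&gt;
&lt;/p&gt;

```r
# Implicit coercion hierarchy: logical -&gt; integer -&gt; double -&gt; character.
# A mixed vector is coerced to the most flexible type present.
typeof(c(TRUE, 1L))    # "integer": the logical value is promoted to integer
typeof(c(1L, 2.5))     # "double": the integer is promoted to double
typeof(c(2.5, "two"))  # "character": everything becomes character
```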



&lt;h2&gt;
  
  
  Explicit Type Coercion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;We do this with the &lt;code&gt;as.*()&lt;/code&gt; family of functions, such as &lt;code&gt;as.character()&lt;/code&gt; and &lt;code&gt;as.numeric()&lt;/code&gt;. Explicit type coercion helps us deal with incorrectly categorized data.&lt;/li&gt;
&lt;li&gt;We can always convert numeric values into character.&lt;/li&gt;
&lt;li&gt;We can convert character values into numeric only when they look like numbers; otherwise R produces &lt;code&gt;NA&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;num &amp;lt;- 1:5
num_char &amp;lt;- as.character(num)
num_char

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;'1' '2' '3' '4' '5'

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Coercing character values that are not number-like does not work cleanly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;product &amp;lt;- c("apple",1,"banana")
as.numeric(product)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Warning message in eval(expr, envir, enclos):
"NAs introduced by coercion"
&amp;lt;NA&amp;gt; 1 &amp;lt;NA&amp;gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
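
&lt;p&gt;As a small follow-up sketch (an assumption about how one might handle this, not part of the original example), the &lt;code&gt;NA&lt;/code&gt; values introduced by a failed coercion can be found with &lt;code&gt;is.na()&lt;/code&gt; and dropped:&lt;br&gt;
&lt;/p&gt;

```r
# Values that cannot be parsed as numbers become NA (with a warning);
# is.na() locates them so they can be inspected or removed.
product = c("apple", 1, "banana")
nums = suppressWarnings(as.numeric(product))
is.na(nums)         # TRUE FALSE TRUE
nums[!is.na(nums)]  # 1 - keep only the values that converted
```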



&lt;h2&gt;
  
  
  Installing Packages in R
&lt;/h2&gt;

&lt;p&gt;R has numerous useful packages for all kinds of tasks, and with them we can work better and faster. One simple way to install a package is via the console:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;install.packages("haven")


library("haven") # provides read_sav() for reading SPSS .sav files
saq8 &amp;lt;- read_sav("F:/Statisticts with R/CSV file for covid data/SAQ8.sav")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the example above, I first installed the package named &lt;code&gt;haven&lt;/code&gt; and then used it to read a &lt;code&gt;sav&lt;/code&gt; file.&lt;/p&gt;
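
&lt;p&gt;A common idiom (an addition of mine, not from the original post) is to install a package only when it is missing, so the script can be re-run without reinstalling:&lt;br&gt;
&lt;/p&gt;

```r
# Install "haven" only if it is not already available, then load it.
if (!requireNamespace("haven", quietly = TRUE)) {
  install.packages("haven")
}
library(haven)  # read_sav() can then read SPSS .sav files
```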

&lt;p&gt;That is all for this blog, and I hope you enjoyed it. Please leave feedback and stay tuned for my next blog.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
