<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pushpa Sree Potluri</title>
    <description>The latest articles on DEV Community by Pushpa Sree Potluri (@sreepotluri).</description>
    <link>https://dev.to/sreepotluri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F285049%2F918893e1-58dc-4847-a5c3-c0bb0546c60e.jpg</url>
      <title>DEV Community: Pushpa Sree Potluri</title>
      <link>https://dev.to/sreepotluri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sreepotluri"/>
    <language>en</language>
    <item>
      <title>Introduction to Data Science - Part 2</title>
      <dc:creator>Pushpa Sree Potluri</dc:creator>
      <pubDate>Sat, 13 Jun 2020 17:05:19 +0000</pubDate>
      <link>https://dev.to/sreepotluri/introduction-to-data-science-part-2-26lo</link>
      <guid>https://dev.to/sreepotluri/introduction-to-data-science-part-2-26lo</guid>
<description>&lt;p&gt;Data Science is all about how well you understand your data. Knowing what type of data you have makes a big difference. By knowing your data type, you will be able to apply the appropriate statistical measurements and draw certain conclusions about the data.&lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Gegq237Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/u4lpvxtkcuqo4q0ke28g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Gegq237Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/u4lpvxtkcuqo4q0ke28g.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;1. Categorical data or qualitative data&lt;/strong&gt; - data that can be divided into categories. Categorical data cannot be defined in numbers. Sometimes we use numbers to represent categorical data, but those numbers do not hold any mathematical value. &lt;br&gt;
Ex: You have population data of a city divided into male and female categories, where you have represented male as 1 and female as 0. Here 1 &amp;amp; 0 are numbers, but they do not hold any mathematical meaning.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;i. Nominal data&lt;/em&gt; - categorical data that has no order or sequence&lt;br&gt;
Ex: gender, race, language, etc.&lt;br&gt;
&lt;em&gt;ii. Ordinal data&lt;/em&gt; - categorical data that follows an order or sequence&lt;br&gt;
Ex: education (high school, college, graduate, PhD)&lt;/p&gt;
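&lt;p&gt;As a quick illustration of the two categorical subtypes, here is a minimal pandas sketch (the data is made up for the example):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd

# Nominal: categories with no inherent order.
gender = pd.Series(pd.Categorical(["male", "female", "female", "male"]))

# Encoding male as 1 and female as 0 gives numbers with no mathematical meaning.
encoded = gender.map({"male": 1, "female": 0})

# Ordinal: categories with a meaningful order.
education = pd.Categorical(
    ["college", "high school", "PhD", "graduate"],
    categories=["high school", "college", "graduate", "PhD"],
    ordered=True,
)
print(education.min(), education.max())  # high school PhD
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;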

&lt;p&gt;&lt;strong&gt;2. Numerical or quantitative data&lt;/strong&gt; - data that represents numerical values &lt;br&gt;
Ex: You have population data of a city that includes each individual's annual income and number of children. Both of these are considered numerical data, since they give information about the quantity of a specific thing.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;i. Discrete data&lt;/em&gt; - data that can be counted and not measured&lt;br&gt;
Ex: Number of students in a class&lt;br&gt;
&lt;em&gt;ii. Continuous data&lt;/em&gt; - data that represents measurements&lt;br&gt;
Ex: Temperature, heights of the students in a class&lt;/p&gt;

&lt;p&gt;And just as there are different types of data, we have different ways to visualize each type (a short plotting sketch follows this list):&lt;br&gt;
&lt;em&gt;a. Nominal data&lt;/em&gt; - pie chart, bar graphs&lt;br&gt;
&lt;em&gt;b. Ordinal data&lt;/em&gt; - stacked bar graph&lt;br&gt;
&lt;em&gt;c. Discrete data&lt;/em&gt; - bar graphs, scatter plots&lt;br&gt;
&lt;em&gt;d. Continuous data&lt;/em&gt; - box plot, histograms&lt;/p&gt;
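&lt;p&gt;Here is a minimal matplotlib sketch of two of these pairings, using made-up data:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import matplotlib.pyplot as plt

# Nominal data: a bar graph of counts per category.
languages = ["English", "Spanish", "Hindi"]
counts = [50, 30, 20]
plt.bar(languages, counts)
plt.title("Nominal: language counts")
plt.show()

# Continuous data: a histogram of measurements.
heights = [150, 152, 155, 160, 160, 162, 165, 170, 171, 175]
plt.hist(heights, bins=5)
plt.title("Continuous: student heights (cm)")
plt.show()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;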

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Introduction to Data Science</title>
      <dc:creator>Pushpa Sree Potluri</dc:creator>
      <pubDate>Sat, 13 Jun 2020 03:39:46 +0000</pubDate>
      <link>https://dev.to/sreepotluri/introduction-to-data-science-227b</link>
      <guid>https://dev.to/sreepotluri/introduction-to-data-science-227b</guid>
<description>&lt;p&gt;Many people are under the misconception that data science is all about machine learning algorithms. That is not true. Data Science is a combination of mathematics, computer science, and machine learning. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xR65-9EQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/41vfi1clim213cv5hcct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xR65-9EQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/41vfi1clim213cv5hcct.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Data Science is a study of data, where you maintain datasets and derive insights from them. Data Science combines the components shown in the diagram below to solve problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1nbzccMa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/m0v770upuc0s1qki3mq9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1nbzccMa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/m0v770upuc0s1qki3mq9.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Perception - trying to identify patterns with the help of the data&lt;br&gt;
Planning - involves two steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Finding all possible solutions&lt;/li&gt;
&lt;li&gt;Finding the best possible solution among all solutions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What do you need to know to be a successful data scientist?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Programming Knowledge&lt;/li&gt;
&lt;li&gt;Data modelling and evaluation&lt;/li&gt;
&lt;li&gt;Data Visualization and reporting&lt;/li&gt;
&lt;li&gt;Probability and Statistics&lt;/li&gt;
&lt;li&gt;Machine Learning techniques&lt;/li&gt;
&lt;li&gt;Relational Database knowledge&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's get started with some basic terminology used in data science (a short code sketch after the list shows these terms in action):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Observations - data points in your dataset (rows)&lt;/li&gt;
&lt;li&gt;Features - variables in your dataset (columns)&lt;/li&gt;
&lt;li&gt;Target Variable - the variable you are trying to predict&lt;/li&gt;
&lt;li&gt;Train data - data from which your algorithm learns&lt;/li&gt;
&lt;li&gt;Test data - data to evaluate your model performance&lt;/li&gt;
&lt;li&gt;Model - set of patterns learned from the data&lt;/li&gt;
&lt;li&gt;Algorithm - specific machine learning process used to train your model&lt;/li&gt;
&lt;/ol&gt;
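&lt;p&gt;A minimal scikit-learn sketch of these terms, using the built-in iris toy dataset (the classifier choice is just for illustration):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X holds the features (columns); each row is an observation.
# y is the target variable we are trying to predict.
X, y = load_iris(return_X_y=True)

# Split into train data (to learn from) and test data (to evaluate on).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

algorithm = DecisionTreeClassifier()      # the algorithm
model = algorithm.fit(X_train, y_train)   # the model: patterns learned from the data
print(model.score(X_test, y_test))        # evaluate performance on the test data
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;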

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Environment setup for Data Analysis with PySpark and Spark SQL</title>
      <dc:creator>Pushpa Sree Potluri</dc:creator>
      <pubDate>Mon, 27 Apr 2020 21:46:32 +0000</pubDate>
      <link>https://dev.to/sreepotluri/environment-setup-for-data-analysis-with-pyspark-and-spark-sql-gnb</link>
      <guid>https://dev.to/sreepotluri/environment-setup-for-data-analysis-with-pyspark-and-spark-sql-gnb</guid>
<description>&lt;p&gt;Data Analysis is all about extracting all possible insights from your dataset. A very important step in building a machine learning model is getting to know the data. Spark is widely used for its parallel data processing on computer clusters. Spark supports multiple programming languages (Python, Scala, R, and Java) and includes libraries for SQL (Spark SQL), machine learning (MLlib), stream processing (Spark Streaming), and graph analytics (GraphX). In this post, I am going to use PySpark and Spark SQL for my data analysis.&lt;/p&gt;

&lt;p&gt;If you want to run Spark locally, you should have Java, as well as Python (Python 3), installed on your machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install Spark&lt;/strong&gt;&lt;br&gt;
i. Go to &lt;a href="https://spark.apache.org/downloads.html" rel="noopener noreferrer"&gt;https://spark.apache.org/downloads.html&lt;/a&gt; &lt;br&gt;
ii. Select version and package type&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fom6rnzuqivixv1bhbdet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fom6rnzuqivixv1bhbdet.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
iii. Click on the download link; it will take you to the Apache Software Foundation site, from which you can download the package&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsm9lypajdb0tsoas94k4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsm9lypajdb0tsoas94k4.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
iv. Set up some environment variables for the Spark home and PySpark in a file called .bash_profile (a sketch of typical entries follows these steps)&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbeouglu3zdpizq13rgyw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbeouglu3zdpizq13rgyw.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
v. Install PySpark - I am using the Python installer program (pip) to install PySpark&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvingolript4klfczhtt4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvingolript4klfczhtt4.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Launching Jupyter Notebook&lt;/strong&gt;&lt;br&gt;
i. Install Jupyter Notebook with the Python installer program (pip)&lt;/p&gt;
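&lt;p&gt;For reference, the entries from steps iv and v might look something like this (the Spark version, install path, and commands below are assumptions based on a typical local setup; match them to your actual download):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ~/.bash_profile (hypothetical paths)
export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_PYTHON=python3

# install PySpark and Jupyter Notebook with pip
pip install pyspark
pip install jupyter
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;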

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnnl2cae045if943xpovs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnnl2cae045if943xpovs.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ii. Open a terminal window, navigate to your working directory, and type jupyter notebook. This will launch Jupyter Notebook&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fa7hzqey0ix9lwwegsyud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fa7hzqey0ix9lwwegsyud.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6ofpf5ap5kwzllqj47em.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6ofpf5ap5kwzllqj47em.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;iii. Create a new Jupyter notebook by clicking on the "New" button on the upper right side and selecting Python 3&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fp6w03exrl5gy33344ob4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fp6w03exrl5gy33344ob4.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>python</category>
      <category>spark</category>
    </item>
    <item>
      <title>Best online data science course for beginners (My opinion)</title>
      <dc:creator>Pushpa Sree Potluri</dc:creator>
      <pubDate>Thu, 02 Jan 2020 22:21:21 +0000</pubDate>
      <link>https://dev.to/sreepotluri/best-online-data-science-course-for-beginners-my-opinion-1dc</link>
      <guid>https://dev.to/sreepotluri/best-online-data-science-course-for-beginners-my-opinion-1dc</guid>
<description>&lt;p&gt;I have done some research to find the best online course for learning data science, especially for beginners, and I found this course really interesting. It has a well-organized structure, starting with data preprocessing and covering all the popular algorithms with hands-on exercises. &lt;/p&gt;

&lt;p&gt;Course: Udemy - Machine Learning A-Z : Hands-On Python &amp;amp; R in Data Science&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Intermediate level of Python or R&lt;/li&gt;
&lt;li&gt;Anyone with a programming background can try this, but I suggest going through the Python or R basics before starting this course&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This course provides hands-on practice building models using some of the most basic and widely used algorithms in Regression, Classification, Clustering, Association Rule Mining, Neural Networks, etc., in both R and Python. &lt;/p&gt;

&lt;p&gt;Each section is structured in a way that helps us understand the basics of how to build a model. Every section consists of the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Dataset (Explanation and Importing)&lt;/li&gt;
&lt;li&gt;Algorithm (Intuition and Implementation)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;And coming to how I learned data science&lt;/strong&gt; (a minimal code sketch of this workflow follows the list):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Think of a use case you want to implement (I work in the telecommunications industry, so I searched for the most popular machine learning use cases in telecom)&lt;/li&gt;
&lt;li&gt;Set your objective&lt;/li&gt;
&lt;li&gt;Which category does your objective fall into? (For this, you need to have a prior understanding of machine learning categories like Regression, Classification, Clustering, etc.)&lt;/li&gt;
&lt;li&gt;Choose a dataset (you can download datasets online - Kaggle.com has some good datasets)&lt;/li&gt;
&lt;li&gt;Start your project (I started in Jupyter Notebooks)&lt;/li&gt;
&lt;li&gt;Go through the data and make sure you have a clear understanding of the features (you should be able to answer all your questions from the data itself)&lt;/li&gt;
&lt;li&gt;And now the most important part: data pre-processing (handling missing data, removing duplicates, etc.)&lt;/li&gt;
&lt;li&gt;Select the features you need from data to train your model&lt;/li&gt;
&lt;li&gt;Select an algorithm that will fit your purpose&lt;/li&gt;
&lt;li&gt;Train your model&lt;/li&gt;
&lt;li&gt;Validate the model performance&lt;/li&gt;
&lt;/ol&gt;
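&lt;p&gt;Here is a minimal scikit-learn sketch of steps 7-11, assuming a hypothetical telecom churn CSV (the file name, column names, and model choice are all made up for illustration):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("telecom_churn.csv")   # hypothetical dataset
df = df.drop_duplicates().dropna()      # step 7: basic pre-processing

X = df[["monthly_charges", "tenure"]]   # step 8: select your features
y = df["churn"]                         # the target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()            # step 9: choose an algorithm
model.fit(X_train, y_train)             # step 10: train your model
print(accuracy_score(y_test, model.predict(X_test)))  # step 11: validate performance
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;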

</description>
      <category>machinelearning</category>
      <category>beginners</category>
      <category>python</category>
    </item>
    <item>
      <title>Association Rule Learning - Part I</title>
      <dc:creator>Pushpa Sree Potluri</dc:creator>
      <pubDate>Sun, 15 Dec 2019 05:42:48 +0000</pubDate>
      <link>https://dev.to/sreepotluri/association-rule-learning-part-i-2o4p</link>
      <guid>https://dev.to/sreepotluri/association-rule-learning-part-i-2o4p</guid>
<description>&lt;p&gt;Ever wondered how retailers figure out which products customers tend to buy together? &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4gl5tdV1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/1vth0mkx51zxp3byle82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4gl5tdV1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/1vth0mkx51zxp3byle82.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The answer is simple: Association Rule Learning. This technique is used by retailers across the globe to understand customer buying patterns by finding correlations between the products that customers have bought.&lt;/p&gt;

&lt;p&gt;Association Rule Learning involves two steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Finding all frequent itemsets&lt;/li&gt;
&lt;li&gt;Generating strong association rules from the frequent itemsets&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Finding frequent itemsets can be done using either the Apriori algorithm or the FP-Growth algorithm. In this part, we will see how the Apriori algorithm works. Apriori works on the principle that &lt;/p&gt;

&lt;p&gt;"All nonempty subsets of a frequent itemset must also be frequent".&lt;/p&gt;

&lt;p&gt;Here is a sample dataset consisting of 9 transactions containing the items I1, I2, ..., I5.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZsvtewX8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/b6lz4bwv5oah6t3ipu7c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZsvtewX8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/b6lz4bwv5oah6t3ipu7c.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In order to have a proper understanding of association rule learning, it's better if we know the following metrics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Support: The support of an item I1 is simply the ratio of the number of transactions containing I1 to the total number of transactions&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   Support (I1) = Transactions containing I1 / Total transactions
                = 6 / 9 = 0.66
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Confidence: How likely a customer is to purchase item I3 when I1 is purchased. &lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   Confidence (I1 =&amp;gt; I3) = Transactions containing both I1 and I3 / Transactions containing I1
                        = 4 / 6 = 0.66
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now that we are familiar with these terms, let's try to understand the Apriori algorithm. For this example, I'm taking a minimum support count of 2. &lt;/p&gt;

&lt;p&gt;Step 1: Find the 1-frequent itemsets (all the items) and calculate their support counts (simply the number of times each itemset appears in our transactions)&lt;/p&gt;

&lt;p&gt;Step 2: Compare each item's support with the minimum support and remove the items whose support is less than the minimum support. Here, all the items satisfy the minimum support.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Yj2VKDnC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/7yo8m6vgi5ndcpwsq9yn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Yj2VKDnC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/7yo8m6vgi5ndcpwsq9yn.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Step 3: From the resulting table, find the 2-frequent itemsets&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xJE2ZxGD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/0dh0wlyce1njhj4ybygp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xJE2ZxGD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/0dh0wlyce1njhj4ybygp.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Step 4: Compare each itemset's support with the minimum support and remove the itemsets whose support is less than the minimum support. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lMlzKqqe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/xzp02naj14uw8knv4hgr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lMlzKqqe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/xzp02naj14uw8knv4hgr.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Step 5: From the resulting table, find the 3-frequent itemsets and calculate their support counts&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eOP4XDJf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/slczpi3kxaqozsq1ie59.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eOP4XDJf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/slczpi3kxaqozsq1ie59.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Step 6: Compare each itemset's support with the minimum support and remove the itemsets whose support is less than the minimum support. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QB0fsI13--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/w19198lecd9vhdlt69o0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QB0fsI13--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/w19198lecd9vhdlt69o0.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Step 7: From the resulting table, find the 4-frequent itemsets&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Zgeu5-hn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/766epnscl5tip0mvc5ay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Zgeu5-hn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/766epnscl5tip0mvc5ay.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Step 8: Compare each itemset's support with the minimum support and remove the itemsets whose support is less than the minimum support. &lt;/p&gt;

&lt;p&gt;Repeat these steps until you get an empty set. Since no 4-itemset satisfies our minimum support count, we stop generating itemsets here. A minimal Python sketch of this frequent-itemset search follows.&lt;/p&gt;
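&lt;p&gt;This sketch assumes a hypothetical 9-transaction list consistent with the counts quoted above (the actual dataset is in the image), and skips Apriori's candidate-pruning step for brevity:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support_count = 2

def support_count(itemset):
    # number of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset.issubset(t))

items = sorted({item for t in transactions for item in t})
candidates = [frozenset([i]) for i in items]
k, frequent = 1, {}
while candidates:
    # keep the k-itemsets that meet the minimum support count
    frequent[k] = {c: support_count(c) for c in candidates if support_count(c) &amp;gt;= min_support_count}
    # join frequent k-itemsets into (k+1)-itemset candidates
    keys = list(frequent[k])
    candidates = list({a | b for a in keys for b in keys if len(a | b) == k + 1})
    k += 1

print(frequent[3])  # expect {I1, I2, I3} and {I1, I2, I5}, each with support count 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;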

&lt;p&gt;Once the frequent itemsets are generated, it is time to generate strong association rules from them. Association rules can be generated as follows (a short code sketch of this follows below): &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For each frequent itemset l, generate all non-empty subsets s&lt;/li&gt;
&lt;li&gt;For every non-empty subset s, output the rule "s =&amp;gt; (l-s)"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For this, I'm taking minimum confidence value = 60%&lt;/p&gt;
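&lt;p&gt;Reusing the support_count() helper and transaction list from the earlier sketch, generating and filtering the rules for the itemset {I1, I2, I5} might look like this:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from itertools import combinations

min_confidence = 0.6
l = frozenset({"I1", "I2", "I5"})

for size in range(1, len(l)):
    for s in map(frozenset, combinations(sorted(l), size)):
        # rule: s =&amp;gt; (l - s); confidence = support count of l / support count of s
        confidence = support_count(l) / support_count(s)
        if confidence &amp;gt;= min_confidence:
            print(set(s), "=&amp;gt;", set(l - s), round(confidence, 2))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With the 60% threshold, this prints exactly the three strong rules identified in step 11 below.&lt;/p&gt;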

&lt;p&gt;Step 9: Generating all non-empty subsets of an itemset. Here, I am generating all non-empty subsets for an itemset {I1, I2, I5}&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eCfaPD9O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/30embyb6csamfs8jtv86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eCfaPD9O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/30embyb6csamfs8jtv86.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Step 10: Generating rules from the non-empty subsets&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1YiGZz7h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/i510arogh71aswdvoe81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1YiGZz7h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/i510arogh71aswdvoe81.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
Step 11: Which rules to consider? For this, we have to calculate the confidence value for each rule&lt;/p&gt;

&lt;p&gt;Consider the first rule in the table I1 =&amp;gt; I2∩I5&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         Confidence = Support count of (I1, I2, I5) / Support Count of I1
                    = 2 / 6 * 100 = 33.3%
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Calculate the confidence for all the rules&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6hScBS1o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/r8g6ry0u52b59t05wy7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6hScBS1o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/r8g6ry0u52b59t05wy7u.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
After considering the minimum confidence value, rules 3,5 &amp;amp; 6 are strong rules for the itemset {I1, I2, I5}&lt;/p&gt;

&lt;p&gt;Step 12: Take the itemset {I1, I2, I3} and follow through steps 10 &amp;amp; 11&lt;/p&gt;

&lt;p&gt;This series consists of &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Apriori algorithm working (Current Post).&lt;/li&gt;
&lt;li&gt;Python implementation of apriori.&lt;/li&gt;
&lt;li&gt;FP Growth algorithm working.&lt;/li&gt;
&lt;li&gt;Python implementation of FP growth.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>beginners</category>
    </item>
    <item>
      <title>What do you need to know to get into Data Science as a Beginner?</title>
      <dc:creator>Pushpa Sree Potluri</dc:creator>
      <pubDate>Wed, 04 Dec 2019 02:35:30 +0000</pubDate>
      <link>https://dev.to/sreepotluri/what-do-you-need-to-know-to-get-into-data-science-as-a-beginner-1m4l</link>
      <guid>https://dev.to/sreepotluri/what-do-you-need-to-know-to-get-into-data-science-as-a-beginner-1m4l</guid>
<description>&lt;p&gt;Data Science is a combination of Programming &amp;amp; Statistics, so to be a data scientist you need to know at least one programming language, preferably Python or R, as there are large communities of people who use these languages to build their models. &lt;/p&gt;

&lt;p&gt;For a complete beginner, Python is easy to learn. Here are some of the basic tools used in data science from the Python stack (a typical import block follows the list):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jupyter Notebooks&lt;/strong&gt; - interactive development environment&lt;br&gt;
&lt;strong&gt;Pandas&lt;/strong&gt; - library for data manipulation and analysis&lt;br&gt;
&lt;strong&gt;Numpy&lt;/strong&gt; - library for scientific computing&lt;br&gt;
&lt;strong&gt;Matplotlib &amp;amp; Seaborn&lt;/strong&gt; - libraries for data visualization&lt;br&gt;
&lt;strong&gt;Scikit-Learn&lt;/strong&gt; - library for machine learning&lt;/p&gt;
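&lt;p&gt;In practice, a notebook built on this stack usually starts with an import block along these lines (a sketch with made-up data; the aliases are the common conventions):&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import numpy as np                  # scientific computing
import pandas as pd                 # data manipulation and analysis
import matplotlib.pyplot as plt     # data visualization
import seaborn as sns               # statistical data visualization
from sklearn.linear_model import LinearRegression  # machine learning

# a tiny end-to-end example with made-up data
df = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2.0})
sns.scatterplot(data=df, x="x", y="y")
plt.show()
model = LinearRegression().fit(df[["x"]], df["y"])
print(model.coef_)  # expect approximately [2.]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;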

&lt;p&gt;Good mathematical knowledge helps to make a better judgment while choosing a procedure (algorithm) based on the data available to you and also to diagnose the problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ETT3tZdo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zqvelxm4ki8vs7alnf7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ETT3tZdo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/zqvelxm4ki8vs7alnf7u.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you don't have time to go through the theory, start with a tutorial. Follow the tutorial step-by-step. After you complete a tutorial, apply what you learned to new datasets. You can find some sample datasets online (&lt;a href="https://www.kaggle.com/datasets"&gt;https://www.kaggle.com/datasets&lt;/a&gt;). If you try the same modeling on a new dataset, you might run into new issues. Upon doing some research, you might discover data issues in the dataset like different formats or missing values.&lt;/p&gt;

&lt;p&gt;If you are looking for more resources, &lt;a href="https://www.coursera.org/"&gt;https://www.coursera.org/&lt;/a&gt; and &lt;a href="https://www.datacamp.com/"&gt;https://www.datacamp.com/&lt;/a&gt; offer some good free courses.&lt;/p&gt;

&lt;p&gt;This blog was first posted on &lt;a href="https://www.hackerheap.com/"&gt;hackerheap.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>sql</category>
      <category>beginners</category>
      <category>python</category>
    </item>
  </channel>
</rss>
