<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: D Siddhant Patro</title>
    <description>The latest articles on DEV Community by D Siddhant Patro (@siddhantpatro).</description>
    <link>https://dev.to/siddhantpatro</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F397727%2F842a192e-0a41-4603-adcc-52d47bfa2984.jpg</url>
      <title>DEV Community: D Siddhant Patro</title>
      <link>https://dev.to/siddhantpatro</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/siddhantpatro"/>
    <language>en</language>
    <item>
      <title>Spark MLlib for Big data and Machine learning</title>
      <dc:creator>D Siddhant Patro</dc:creator>
      <pubDate>Tue, 09 Feb 2021 18:50:51 +0000</pubDate>
      <link>https://dev.to/siddhantpatro/spark-mllib-for-big-data-and-machine-learning-330j</link>
      <guid>https://dev.to/siddhantpatro/spark-mllib-for-big-data-and-machine-learning-330j</guid>
      <description>&lt;p&gt;In a world full of data, there’s a good chance you already know what big data and Apache Spark are. If you don’t, that’s OK! I’ll explain both, but before getting to big data and Spark, you need to understand what &lt;em&gt;data&lt;/em&gt; is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data&lt;/strong&gt; :- The quantities, characters, or symbols containing some kind of information on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F38ktb6vcbh7fub5knx65.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F38ktb6vcbh7fub5knx65.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that you have an idea of what data is, it will be easier to understand what big data is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Big data&lt;/strong&gt; :- It is a collection of data that is huge in volume and high in complexity, often obtained from new data sources, and it is growing exponentially with time. These data sets are so voluminous that traditional data processing software just can’t manage them. &lt;br&gt;
It consists of three types of data: &lt;em&gt;structured&lt;/em&gt;, &lt;em&gt;semi-structured&lt;/em&gt; and &lt;em&gt;unstructured&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fazdw0o2zm77x20zj4wu4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fazdw0o2zm77x20zj4wu4.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Machine learning&lt;/strong&gt; :- It is an application of artificial intelligence (AI) that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgz7lcmq6yo109jg7tky2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgz7lcmq6yo109jg7tky2.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Apache Spark&lt;/strong&gt; :- With an immense amount of data, we need a tool to digest it, and that tool is &lt;em&gt;Apache Spark&lt;/em&gt;. It is a fast, unified, open source engine for parallel data processing on computer clusters. It is designed to deliver the computational speed and scalability required for big data, specifically for streaming data, graph data, and machine learning applications. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F43dygg0wxg1e74zskmuv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F43dygg0wxg1e74zskmuv.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Spark provides a unified data processing engine known as the &lt;u&gt;Spark stack&lt;/u&gt;. This stack is built on top of a strong foundation called &lt;u&gt;Spark Core&lt;/u&gt;, which provides the functionality needed to manage and run distributed applications, such as scheduling, coordination, and fault tolerance. The libraries built on top of it are &lt;em&gt;Spark SQL&lt;/em&gt;, &lt;em&gt;Spark Streaming&lt;/em&gt;, &lt;em&gt;GraphX&lt;/em&gt;, &lt;em&gt;Spark MLlib&lt;/em&gt; and &lt;em&gt;SparkR&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;&lt;u&gt;Spark SQL&lt;/u&gt; is for batch as well as interactive data processing. &lt;br&gt;
&lt;u&gt;Spark Streaming&lt;/u&gt; is for real-time stream data processing. &lt;br&gt;
&lt;u&gt;Spark GraphX&lt;/u&gt; is for graph processing. &lt;br&gt;
&lt;u&gt;Spark MLlib&lt;/u&gt; is for machine learning. &lt;br&gt;
&lt;u&gt;SparkR&lt;/u&gt; is for using Spark from R, including running machine learning tasks from the R shell.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzyl2rrxgcsypw2cl8chd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzyl2rrxgcsypw2cl8chd.jpg" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spark MLlib&lt;/strong&gt; is Spark’s machine learning library. It simplifies many model-building tasks, such as featurization and the construction, evaluation, and tuning of pipelines and models. Many machine learning algorithms are iterative in nature, meaning they run through many iterations until a desired objective is achieved. Spark makes it easier to implement those algorithms and run them in a scalable manner across a cluster of machines.&lt;/p&gt;

&lt;p&gt;Spark MLlib tools are given below:-&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ML Algorithms&lt;/li&gt;
&lt;li&gt;Featurization&lt;/li&gt;
&lt;li&gt;Pipelines&lt;/li&gt;
&lt;li&gt;Model Tuning&lt;/li&gt;
&lt;li&gt;Persistence&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;ML Algorithms&lt;/strong&gt;:-&lt;br&gt;
ML Algorithms form the core of MLlib. These include common learning algorithms such as classification, regression, clustering, and collaborative filtering. MLlib standardizes APIs to make it easier to combine multiple algorithms into a single pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Featurization&lt;/strong&gt;:-&lt;br&gt;
Featurization includes feature extraction, transformation, dimensionality reduction, and selection.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Feature Extraction is extracting features from raw data.&lt;/li&gt;
&lt;li&gt;Feature Transformation includes scaling and modifying features.&lt;/li&gt;
&lt;li&gt;Feature Selection involves selecting a subset of necessary features from a huge set of features.&lt;/li&gt;
&lt;/ol&gt;
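
&lt;p&gt;As a rough illustration of feature transformation and selection, the sketch below rescales two made-up columns to the 0 to 1 range and then keeps the column with the larger variance. It is plain Python rather than MLlib code, and the data, the function names, and the variance-based selection rule are all assumptions made for this example.&lt;/p&gt;

```python
from statistics import pvariance

# Hypothetical raw data: each row is (age in years, income in dollars).
rows = [(20, 30000.0), (30, 60000.0), (40, 90000.0), (60, 60000.0)]


def min_max_scale(column):
    """Feature transformation: rescale values to the range [0, 1]."""
    low, high = min(column), max(column)
    span = high - low
    return [(value - low) / span if span else 0.0 for value in column]


def select_top_k(columns, k):
    """Feature selection: keep the k columns with the highest variance."""
    ranked = sorted(columns, key=lambda name: pvariance(columns[name]), reverse=True)
    return {name: columns[name] for name in ranked[:k]}


# Split the rows into named columns, then scale each column.
raw_columns = {"age": [r[0] for r in rows], "income": [r[1] for r in rows]}
scaled = {name: min_max_scale(values) for name, values in raw_columns.items()}
selected = select_top_k(scaled, k=1)

print(scaled["age"])   # [0.0, 0.25, 0.5, 1.0]
print(list(selected))  # ['age']
```

&lt;p&gt;Scaling first puts both columns on the same footing, so the selection step compares their spread rather than their units.&lt;/p&gt;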

&lt;p&gt;&lt;strong&gt;Pipelines&lt;/strong&gt;:- &lt;br&gt;
In machine learning, it is common to run a sequence of steps to clean and transform data, then train one or more ML algorithms to learn from the data. MLlib has a class called Pipeline, which consists of a sequence of Pipeline Stages (Transformers and Estimators) to be run in a specific order. &lt;/p&gt;
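
&lt;p&gt;To make the Transformer and Estimator distinction concrete, here is a toy pipeline in plain Python. It only mimics the idea: a Transformer maps data to data, while an Estimator is fitted on data and produces a model that is itself a Transformer. The class names and the tiny data set are hypothetical, and this is not MLlib’s actual implementation.&lt;/p&gt;

```python
class Lowercase:
    """Transformer: maps each text to lower case, no fitting needed."""

    def transform(self, texts):
        return [text.lower() for text in texts]


class VocabularyEstimator:
    """Estimator: learns a vocabulary and returns a fitted model."""

    def fit(self, texts):
        vocabulary = sorted({word for text in texts for word in text.split()})
        return VocabularyModel(vocabulary)


class VocabularyModel:
    """Fitted model (also a Transformer): turns texts into word-count vectors."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def transform(self, texts):
        return [[text.split().count(word) for word in self.vocabulary] for text in texts]


class ToyPipeline:
    """Runs stages in order; estimators are fitted, then applied."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, data):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(data)  # the Estimator becomes a fitted model
            data = stage.transform(data)
            fitted.append(stage)
        return ToyPipeline(fitted)

    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data


model = ToyPipeline([Lowercase(), VocabularyEstimator()]).fit(["Spark is fast", "Spark scales"])
print(model.stages[1].vocabulary)           # ['fast', 'is', 'scales', 'spark']
print(model.transform(["spark is Spark"]))  # [[0, 1, 0, 2]]
```

&lt;p&gt;Fitting returns a new pipeline that contains only Transformers, so the same sequence of steps can be replayed on new data.&lt;/p&gt;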

&lt;p&gt;&lt;strong&gt;Model Tuning&lt;/strong&gt;:-&lt;br&gt;
The goal of model tuning is to train a model with the right set of parameters so that it achieves the best performance against the objective defined in the first step of the ML development process.&lt;/p&gt;
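
&lt;p&gt;One common way to tune a model is a grid search: try each candidate value and keep the one that scores best on held-out data. The sketch below does this in plain Python for a tiny nearest-neighbour regressor. The data, the grid, and the function names are made up for illustration. MLlib provides utilities for this kind of search (for example ParamGridBuilder and CrossValidator), which this sketch does not use.&lt;/p&gt;

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(train, validation, k):
    """Score one candidate value of k on the held-out validation set."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in validation]
    return sum(errors) / len(errors)


# Hypothetical (x, y) pairs, split into training and validation data.
train = [(1, 1.0), (2, 2.1), (3, 2.9), (4, 4.2), (5, 5.1), (6, 5.8)]
validation = [(2.5, 2.5), (4.5, 4.5)]
grid = [1, 2, 4, 6]  # candidate values for the parameter k

# Evaluate every candidate and keep the one with the lowest error.
scores = {k: mean_squared_error(train, validation, k) for k in grid}
best_k = min(scores, key=scores.get)
print(best_k)  # 4
```

&lt;p&gt;Scoring on data the model has not seen during training is what keeps the search from simply rewarding the most flexible setting.&lt;/p&gt;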

&lt;p&gt;&lt;strong&gt;Persistence&lt;/strong&gt;:-&lt;br&gt;
Persistence helps in saving and loading ML algorithms, models, and pipelines. This saves time and effort: once a model is persisted, it can be loaded and reused whenever it is needed.&lt;/p&gt;
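
&lt;p&gt;The save-then-load idea can be sketched with a tiny stand-in model written to a JSON file. Everything here (the LinearModel class, the file name, the JSON layout) is hypothetical and only shows the round trip; MLlib’s own persistence format is different and is not shown.&lt;/p&gt;

```python
import json
import os
import tempfile


class LinearModel:
    """A tiny stand-in for a trained model: y = slope * x + intercept."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def predict(self, x):
        return self.slope * x + self.intercept

    def save(self, path):
        # Write the learned parameters to disk.
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"slope": self.slope, "intercept": self.intercept}, handle)

    @classmethod
    def load(cls, path):
        # Rebuild the model from the stored parameters.
        with open(path, encoding="utf-8") as handle:
            return cls(**json.load(handle))


path = os.path.join(tempfile.mkdtemp(), "linear_model.json")
LinearModel(slope=2.0, intercept=1.0).save(path)

restored = LinearModel.load(path)
print(restored.predict(3.0))  # 7.0
```

&lt;p&gt;The restored object predicts without any retraining, which is the whole point of persisting a model.&lt;/p&gt;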

&lt;p&gt;With these tools, you can apply machine learning algorithms on the Apache Spark framework to process massive, voluminous data better and faster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7rddh6ls498qiwvg40c0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7rddh6ls498qiwvg40c0.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;:-&lt;br&gt;
In the Python world, scikit-learn is one of the most popular open source machine learning libraries. It provides a set of supervised and unsupervised learning algorithms. It is designed to be simple and efficient, which makes it a great tool to learn and practice machine learning on a single machine. But the moment the size of the data exceeds the storage capacity of a single machine, it is time to switch to Spark MLlib.&lt;/p&gt;

&lt;p&gt;Thank you.&lt;/p&gt;

</description>
      <category>bigdata</category>
      <category>machinelearning</category>
      <category>apachespark</category>
    </item>
    <item>
      <title>Rundown on Deep Learning</title>
      <dc:creator>D Siddhant Patro</dc:creator>
      <pubDate>Sat, 01 Aug 2020 18:27:35 +0000</pubDate>
      <link>https://dev.to/siddhantpatro/rundown-on-deep-learning-d25</link>
      <guid>https://dev.to/siddhantpatro/rundown-on-deep-learning-d25</guid>
      <description>&lt;p&gt;&lt;strong&gt;What is deep learning?&lt;/strong&gt; 😀&lt;/p&gt;

&lt;p&gt;Deep learning is an artificial intelligence (AI) function that imitates the working of the human brain in processing data and creating patterns for decision making. The word "Deep" in Deep Learning isn't a reference to any kind of deeper understanding achieved by the approach; rather, it stands for the idea of successive layers of representation. It is a subset of machine learning that uses layered networks capable of learning without supervision from data that is unstructured or unlabeled. Deep learning is also known as deep neural learning or deep neural networks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JC2o7DnW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3midp3bzx8meedsmnvny.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JC2o7DnW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/3midp3bzx8meedsmnvny.jpeg" alt="AI, ML, DL"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Other appropriate names for deep learning could have been hierarchical representation learning and layered representation learning. Modern deep learning involves tens or even thousands of successive layers of representations, and they are learned automatically from exposure to training data.&lt;br&gt;
These layered or hierarchical representations are learned via models called neural networks, which are structured as layers stacked on top of each other. Most of us first came across neural networks in biology class. Yes, it is true that some of the core concepts of deep learning were developed by drawing inspiration from our understanding of how the brain learns. But since there is no evidence that our brain learns in the same way as modern deep learning models do, it is not right to say that deep learning models are models of our brain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does deep learning work?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Let’s examine how a network of several layers transforms an image of a digit in order to recognize what digit it is.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---fXnfT_---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mgxxa4b53fn83387dvaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---fXnfT_---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/mgxxa4b53fn83387dvaq.png" alt="MNIST 4"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The image above gives an idea of the basic architecture used by neural network models. There are three kinds of layers: the input layer (layer 1), the hidden layers (layers 2 &amp;amp; 3) and the output layer (layer 4). Each connection between neurons in adjacent layers is associated with a weight, dictating the importance of the input value. &lt;br&gt;
The steps followed by the neural network are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;An image of 28x28 pixels showing the digit 4 is provided as input to the input layer of the neural network. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;This input gets transformed in the successive hidden layers. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The transformed representation is passed to the output layer, where the deep learning model identifies the digit.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
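
&lt;p&gt;The three steps above can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the weights are random and untrained, the hidden layers have 16 neurons each, and the input is random noise standing in for a flattened 28x28 image, so the predicted digit is meaningless. It shows how data flows through the layers, not a working digit recognizer.&lt;/p&gt;

```python
import math
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Hidden-layer activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Output activation: turns scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out):
    """Hypothetical untrained layer: small random weights, zero biases."""
    weights = [[random.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


# 784 inputs (28x28 pixels), two hidden layers of 16 neurons, 10 outputs.
layers = [random_layer(784, 16), random_layer(16, 16), random_layer(16, 10)]

image = [random.random() for _ in range(784)]  # stand-in for a flattened digit image

activations = image
for index, (weights, biases) in enumerate(layers):
    activations = dense(activations, weights, biases)
    is_output = index == len(layers) - 1
    activations = softmax(activations) if is_output else relu(activations)

# The output layer holds one probability per digit; pick the largest.
predicted_digit = activations.index(max(activations))
print(len(activations), round(sum(activations), 6), predicted_digit)
```

&lt;p&gt;Training would adjust the weights so that the largest output probability lands on the correct digit; here they stay random.&lt;/p&gt;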

&lt;p&gt;You can think of a network as a distillation process in which the information passes through successive filters and comes out purified.&lt;/p&gt;

&lt;p&gt;This is just a brief idea of how a deep learning model works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why deep learning?&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Research from Gartner revealed that a huge percentage of an organization’s data is unstructured, because the majority of it exists in formats such as pictures and text. Most machine learning algorithms have difficulty analyzing unstructured data, which means it remains unused, and this is exactly where deep learning becomes useful.&lt;br&gt;
According to &lt;strong&gt;Andrew Ng&lt;/strong&gt; (former chief scientist of China’s major search engine Baidu, a founder of the Google Brain Project and co-founder of Coursera), &lt;strong&gt;“The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms.”&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Yi8Pt9VI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ym3573dlog0rlabsyr8e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Yi8Pt9VI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ym3573dlog0rlabsyr8e.png" alt="Why DL"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The quote and the graph above suggest that deep learning models become more useful as the amount of available data increases. The ability to process large numbers of features makes deep learning very powerful when dealing with unstructured data. This is a major reason why deep learning has emerged in recent years.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages of Deep learning over traditional machine learning&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The major advantages of deep learning over traditional machine learning algorithms are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A deep learning model can do feature engineering on its own. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Massively parallel computation on GPUs makes it scalable to large volumes of data. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In deep learning, problems are solved on an end-to-end basis, while in traditional machine learning, tasks are divided into small pieces and the results are then combined into one conclusion. &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Refer to the picture below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_CTwLxt_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/he8bwfmq9ydhzdbgknsy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_CTwLxt_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/he8bwfmq9ydhzdbgknsy.png" alt="Feature Extraction"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples of deep learning in real-world scenarios&lt;/strong&gt;  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;u&gt;Electronics&lt;/u&gt;: Deep learning is being used in automated speech translation. Think of home assistant devices that respond to your voice and understand your preferences.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;u&gt;Automated driving&lt;/u&gt;: With the help of deep learning, automotive researchers are now able to detect objects like traffic lights and stop signs automatically. They’re also using it to detect pedestrians, which helps reduce accidents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;u&gt;Medical research&lt;/u&gt;: Deep learning is being used by researchers to detect cancer cells automatically.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;When to use deep learning?&lt;/strong&gt; &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Deep learning performs exceptionally well on massive amounts of data. For small data sets, traditional machine learning algorithms are preferable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deep learning really shines when it comes to complex problems such as image classification, natural language processing, and speech recognition.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deep learning techniques need high-end infrastructure to train in a reasonable time.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Challenges faced&lt;/strong&gt; &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;One needs to find and process massive datasets for training, and such datasets are rarely available. Once the datasets are in hand, using them to train deep learning networks can require days on big clusters of CPUs and GPUs. Emerging techniques such as transfer learning show some promise in overcoming this challenge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There is also the danger of over-fitting. Over-fitting happens when an algorithm learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model in real-life scenarios.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Due to the sheer number of layers, nodes, and connections, it is difficult to understand how deep learning networks arrive at their insights.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The points presented above illustrate that deep learning has a lot of potential, but needs to overcome a few challenges before becoming a more versatile tool. The question now is not whether this technology is useful, but rather how companies can implement it in their projects to improve the way they process data. Interest in and enthusiasm for the field are growing, however, and already today we see incredible real-world applications of this technology.&lt;/p&gt;

&lt;p&gt;Thank you !!!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
