<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hitansh</title>
    <description>The latest articles on DEV Community by Hitansh (@hitansh159).</description>
    <link>https://dev.to/hitansh159</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F494280%2F3872d3cf-a45f-445e-8107-438de0508efd.png</url>
      <title>DEV Community: Hitansh</title>
      <link>https://dev.to/hitansh159</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hitansh159"/>
    <language>en</language>
    <item>
      <title>Summary of VGG paper</title>
      <dc:creator>Hitansh</dc:creator>
      <pubDate>Fri, 12 Nov 2021 13:34:44 +0000</pubDate>
      <link>https://dev.to/hitansh159/summary-of-vgg-paper-2n16</link>
      <guid>https://dev.to/hitansh159/summary-of-vgg-paper-2n16</guid>
      <description>&lt;h2&gt;
  
  
  Title
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Very Deep Convolutional Networks for Large-Scale Image Recognition&lt;/li&gt;
&lt;li&gt;Result:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Large Scale image recognition&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Method

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Very deep convolution&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Why / aim:
To investigate the effect of network depth on recognition accuracy &lt;/li&gt;
&lt;li&gt;Description / method:
increasing the depth of an architecture that uses very small (3*3) convolution filters,
pushing the depth up to 16-19 weight layers&lt;/li&gt;
&lt;li&gt;Result:
a model that also generalizes well to other datasets
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Background:
Convolutional networks have seen great success in large-scale image and video recognition.
This was made possible by:

&lt;ol&gt;
&lt;li&gt;large public image datasets&lt;/li&gt;
&lt;li&gt;high-performance computing systems&lt;/li&gt;
&lt;li&gt;large-scale distributed clusters&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Intention:
This paper addresses an important aspect of ConvNet architecture design: &lt;code&gt;depth&lt;/code&gt;.
The authors keep all other aspects constant while steadily increasing the depth of the network by adding 3*3 convolution layers.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Dataset&lt;br&gt;
ILSVRC-2012 dataset &lt;br&gt;
It includes images of 1000 classes, divided as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Training: &lt;code&gt;1.3 million&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Validation: &lt;code&gt;50 thousand&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Testing: &lt;code&gt;100 thousand&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Performance evaluation is done in 2 ways: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Top-1 / multi-class classification error:
the proportion of images for which the top predicted class differs from the ground-truth class&lt;/li&gt;
&lt;li&gt;Top-5 error:
the proportion of images for which the ground-truth class is outside the top 5 predicted classes &lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Preprocessing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Isotropically rescaling (i.e. keeping the aspect ratio the same while scaling) so the image can be cropped to 224*224 &lt;/li&gt;
&lt;li&gt;Subtracting the mean RGB value&lt;/li&gt;
&lt;li&gt;Random cropping: one crop per image per SGD iteration&lt;/li&gt;
&lt;li&gt;Random horizontal flips&lt;/li&gt;
&lt;li&gt;Random RGB colour shifts &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;
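&lt;p&gt;The two evaluation metrics above can be sketched in plain Python; the score matrix and labels below are invented toy data, not ILSVRC results:&lt;/p&gt;

```python
# Top-1 and top-5 classification error, as used for ILSVRC evaluation.
# Hypothetical toy data: each row of `scores` holds per-class confidences.

def top_k_error(scores, labels, k):
    """Fraction of samples whose ground-truth class is not among
    the k highest-scoring predicted classes."""
    errors = 0
    for row, truth in zip(scores, labels):
        # indices of the k largest scores
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if truth not in top_k:
            errors += 1
    return errors / len(labels)

scores = [
    [0.1, 0.7, 0.2],   # predicted class 1
    [0.5, 0.3, 0.2],   # predicted class 0
    [0.2, 0.3, 0.5],   # predicted class 2
]
labels = [1, 2, 2]     # ground truth

print(top_k_error(scores, labels, 1))  # top-1 error: 1 of 3 samples wrong
print(top_k_error(scores, labels, 3))  # with k = number of classes, error is 0
```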

&lt;h2&gt;
  
  
  Model Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ah-geGGN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/175bze1jb2eurto111wl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ah-geGGN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/175bze1jb2eurto111wl.png" alt="VGG architecture" width="475" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Input: a &lt;code&gt;fixed-size RGB image of 224*224 px&lt;/code&gt;, followed by the &lt;code&gt;preprocessing steps&lt;/code&gt;. &lt;br&gt;
The architecture is a stack of convolution layers with filters of a small &lt;code&gt;receptive field: 3*3, or 1*1&lt;/code&gt; in some cases.&lt;br&gt;
A 1*1 convolution layer can be treated as a linear transformation of the channels followed by a non-linearity.&lt;br&gt;
&lt;code&gt;Stride is fixed to 1 px&lt;/code&gt;. Padding is chosen such that the spatial size is maintained.&lt;br&gt;
&lt;code&gt;Pooling is done by max-pooling layers with a 2*2 window and stride 2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After this, &lt;code&gt;3 FC layers&lt;/code&gt; are used. &lt;br&gt;
&lt;code&gt;The first 2 have 4096 channels&lt;/code&gt; and the &lt;code&gt;last one is a 1000-way soft-max layer&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;All hidden layers use &lt;code&gt;ReLU&lt;/code&gt;.&lt;br&gt;
Also, some models used &lt;code&gt;Local Response Normalization&lt;/code&gt;, but it did not improve performance while increasing memory consumption and computation time. &lt;/p&gt;

&lt;p&gt;There are 6 models (A-E, plus A-LRN) based on this generic design.&lt;br&gt;&lt;br&gt;
The 5 models A-E differ only in depth, from 11 to 19 weight layers.&lt;br&gt;
The width of the convolution layers starts at 64 and doubles after each max-pooling layer, up to 512.&lt;/p&gt;
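&lt;p&gt;How the 224*224 input shrinks through the conv/pool stacks can be checked with a short sketch. The per-block layer counts below follow configuration D (VGG-16); the arithmetic is plain Python:&lt;/p&gt;

```python
# Spatial size through the five conv blocks of configuration D (VGG-16):
# 3x3 convs with 1 px padding keep the spatial size, and each 2x2
# max-pool with stride 2 halves it.

blocks = [  # (number of 3x3 conv layers, output channels)
    (2, 64), (2, 128), (3, 256), (3, 512), (3, 512),
]

size = 224
for n_convs, width in blocks:
    # conv: (size - 3 + 2*1) // 1 + 1 == size, so padding preserves size
    size = size // 2  # 2x2 max-pool, stride 2
    print(f"{n_convs} x conv3-{width} then pool gives {size}x{size}")

# final 7x7x512 feature map is flattened into the first 4096-d FC layer
print(size, size * size * 512)
```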

&lt;h2&gt;
  
  
  Input and Output
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Input:&lt;code&gt;fixed size 224*224 RGB image&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Output: &lt;code&gt;class scores (1000-way soft-max)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  New techniques
&lt;/h2&gt;

&lt;p&gt;An important part of the paper is: &lt;code&gt;why did the authors select a 3*3 window?&lt;/code&gt;&lt;br&gt;
Explanation by the authors: &lt;br&gt;
a stack of 2 conv layers with 3*3 windows has the same effective receptive field as 1 window of 5*5&lt;br&gt;
a stack of 3 conv layers with 3*3 windows has the same effective receptive field as 1 window of 7*7&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Using 3 conv layers means we get to involve 3 non-linear rectification layers, which makes the decision function more discriminative.&lt;/li&gt;
&lt;li&gt;It decreases the number of parameters.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's assume there are C channels; then,&lt;br&gt;
3 conv layers with a window of 3*3 = 3(3^2 * C^2) = 27C^2&lt;br&gt;
1 conv layer with a window of 7*7 = 7^2 * C^2 = 49C^2&lt;/p&gt;

&lt;p&gt;One can also use a 1*1 window to increase the non-linearity of the decision function. &lt;br&gt;
&lt;code&gt;This small-window technique works best if the net is deep.&lt;/code&gt;&lt;/p&gt;
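&lt;p&gt;The receptive-field and parameter arithmetic above can be verified with a few lines of Python (C = 64 is an arbitrary choice for illustration):&lt;/p&gt;

```python
# Stacking 3x3 convs grows the effective receptive field while using
# fewer parameters than one large filter. C is the (assumed equal)
# number of input and output channels.

def receptive_field(n_layers, kernel=3):
    # each stride-1 conv adds (kernel - 1) to the receptive field
    return 1 + n_layers * (kernel - 1)

def conv_params(kernel, channels):
    # weights only (biases ignored), C input and C output channels
    return kernel * kernel * channels * channels

C = 64
print(receptive_field(2))     # two 3x3 layers: 5x5 field
print(receptive_field(3))     # three 3x3 layers: 7x7 field
print(3 * conv_params(3, C))  # 27*C^2 parameters
print(conv_params(7, C))      # 49*C^2 parameters
```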

&lt;h2&gt;
  
  
  Loss
&lt;/h2&gt;

&lt;p&gt;At test time, the input image is isotropically rescaled to a pre-defined smallest image side.&lt;br&gt;
The fully connected layers are first converted to convolution layers: the first FC layer to a 7*7 conv layer, the last two to 1*1 conv layers.&lt;/p&gt;

&lt;p&gt;Now, on prediction we get a class score map, i.e. a confidence for each class at each spatial position.&lt;br&gt;
To obtain a fixed-size vector, the class score map is sum-pooled (spatially averaged).&lt;br&gt;
The final score is the average of the scores from the original image and its horizontal flip.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Multi-crop evaluation was dropped, as the accuracy gain did not justify the extra computation time it required.&lt;/code&gt;&lt;/p&gt;
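&lt;p&gt;A minimal sketch of the sum-pooling and flip-averaging step described above, using invented 2-class, 2*2 score maps:&lt;/p&gt;

```python
# The fully convolutional net yields a class score *map*; spatially
# averaging it gives one fixed-size score vector, which is then averaged
# with the score vector of the horizontally flipped image.

def spatial_average(score_map):
    """score_map: [classes][height][width] to per-class average score."""
    return [
        sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
        for plane in score_map
    ]

# hypothetical 2-class, 2x2 score maps for an image and its flip
scores_orig = [[[1.0, 3.0], [2.0, 2.0]], [[0.0, 1.0], [1.0, 2.0]]]
scores_flip = [[[2.0, 2.0], [3.0, 1.0]], [[1.0, 0.0], [2.0, 1.0]]]

avg_orig = spatial_average(scores_orig)
avg_flip = spatial_average(scores_flip)
final = [(a + b) / 2 for a, b in zip(avg_orig, avg_flip)]
print(final)  # one fixed-size score per class
```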

&lt;h2&gt;
  
  
  Model training
&lt;/h2&gt;

&lt;p&gt;technique: &lt;code&gt;mini-batch gradient descent&lt;/code&gt;&lt;br&gt;
batch size: &lt;code&gt;256&lt;/code&gt;&lt;br&gt;
objective: &lt;code&gt;multinomial logistic regression&lt;/code&gt;&lt;br&gt;
momentum: &lt;code&gt;0.9&lt;/code&gt;&lt;br&gt;
regularization: &lt;code&gt;L2&lt;/code&gt; &lt;br&gt;
regularization multiplier: &lt;code&gt;5*10^-4&lt;/code&gt;&lt;br&gt;
dropout: &lt;code&gt;0.5 for the first 2 FC layers&lt;/code&gt;&lt;br&gt;
learning rate: &lt;code&gt;10^-2, decreased by a factor of 10 when validation accuracy stopped improving&lt;/code&gt; &lt;br&gt;
(the learning rate was decreased 3 times in total)&lt;br&gt;
&lt;code&gt;learning stopped after 370K iterations (74 epochs)&lt;/code&gt;&lt;/p&gt;
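&lt;p&gt;The learning-rate schedule above (start at 10^-2, divide by 10 on a validation plateau, 3 drops in total) can be sketched as follows; the validation-accuracy history is invented:&lt;/p&gt;

```python
# Reduce-on-plateau schedule: keep the learning rate while validation
# accuracy improves, divide it by 10 whenever it stalls. The accuracy
# history below is made up and stalls exactly 3 times.

lr = 1e-2
drops = 0
best_acc = 0.0
val_accuracy = [0.40, 0.55, 0.60, 0.60, 0.68, 0.68, 0.70, 0.70]

for acc in val_accuracy:
    if acc > best_acc:
        best_acc = acc   # still improving: keep the current lr
    else:
        lr /= 10         # plateau: decrease lr by a factor of 10
        drops += 1
    print(f"val acc {acc:.2f}  lr {lr:g}")
```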

&lt;p&gt;The models converge in fewer epochs because of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;implicit regularization imposed by greater depth and smaller filter sizes&lt;/li&gt;
&lt;li&gt;pre-initialization of certain layers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It is also possible to initialize the weights without pre-training by using &lt;code&gt;Xavier (Glorot) initialization&lt;/code&gt;.&lt;/p&gt;
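&lt;p&gt;A minimal sketch of Xavier (Glorot) initialization, assuming the Gaussian variant with variance 2 / (fan_in + fan_out); the layer sizes below are illustrative, not the paper's:&lt;/p&gt;

```python
import math
import random

# Glorot ("Xavier") initialization: draw weights with a variance scaled
# to the layer's fan-in and fan-out, so activation magnitudes neither
# explode nor vanish with depth.

def glorot_normal(fan_in, fan_out, seed=0):
    """Draw a fan_in x fan_out weight matrix from N(0, 2/(fan_in+fan_out))."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

# hypothetical small FC layer: 512 inputs, 256 outputs
w = glorot_normal(512, 256)
print(len(w), len(w[0]))
```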

&lt;p&gt;Two training settings were used&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;fixed scale &lt;/li&gt;
&lt;li&gt;Multi scale &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two fixed scales were considered: S = 256 and S = 384. &lt;br&gt;
First, a network was trained with S = 256.&lt;br&gt;
Then the same weights were used to initialize the network for S = 384, with the learning rate set to 10^-3.&lt;/p&gt;

&lt;p&gt;Second approach: &lt;br&gt;
set the input image scale to S, where S is sampled from the range [256, 512].&lt;br&gt;
This can also be seen as training-set augmentation by scale jittering, where a single model is trained to recognize objects over a wide range of scales. &lt;br&gt;
This approach was sped up by fine-tuning the weights from the first approach.&lt;/p&gt;
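&lt;p&gt;Scale jittering can be sketched as follows, assuming S is drawn uniformly from [256, 512] and the image is rescaled isotropically so its smallest side equals S; the example image size (640*480) is invented:&lt;/p&gt;

```python
import random

# Scale jittering: sample the training scale S per image, rescale the
# image so its smallest side equals S (keeping the aspect ratio), then
# take a random 224x224 crop from the result.

CROP = 224
rng = random.Random(42)

def jittered_size(width, height, s_min=256, s_max=512):
    s = rng.randint(s_min, s_max)      # sampled training scale S
    scale = s / min(width, height)     # isotropic rescale factor
    return s, round(width * scale), round(height * scale)

s, w, h = jittered_size(640, 480)
print(s, w, h)  # a random 224x224 crop would then be taken from (w, h)
```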

&lt;p&gt;This was implemented with the &lt;code&gt;C++ Caffe toolbox&lt;/code&gt; with a &lt;code&gt;number of significant modifications&lt;/code&gt;&lt;br&gt;
to make use of multiple GPUs.&lt;br&gt;
Each training batch was &lt;code&gt;split into sub-batches processed in parallel on the GPUs&lt;/code&gt;.&lt;br&gt;
The gradients from the GPUs were &lt;code&gt;averaged&lt;/code&gt; to &lt;code&gt;obtain the gradient for the full batch&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;4 &lt;code&gt;NVIDIA Titan Black GPUs&lt;/code&gt; were used, giving &lt;code&gt;3.75 times faster training than a single GPU&lt;/code&gt;. &lt;br&gt;
It took &lt;code&gt;2-3 weeks for a single net to train&lt;/code&gt;.&lt;/p&gt;
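&lt;p&gt;The gradient-averaging step amounts to the following; plain lists stand in for real tensors, and the per-GPU gradient values are invented:&lt;/p&gt;

```python
# Data-parallel training: each GPU computes a gradient on its sub-batch,
# and the per-GPU gradients are averaged to recover the gradient of the
# full batch.

def average_gradients(per_gpu_grads):
    n = len(per_gpu_grads)
    return [sum(g[i] for g in per_gpu_grads) / n
            for i in range(len(per_gpu_grads[0]))]

# hypothetical gradients from 4 GPUs for a 3-parameter model
grads = [
    [0.4, -0.2, 0.1],
    [0.2, -0.4, 0.3],
    [0.6,  0.0, 0.1],
    [0.0, -0.2, 0.1],
]
print(average_gradients(grads))
```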

&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QtbtKLRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/40286xlcifgjhgcdsfze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QtbtKLRf--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/40286xlcifgjhgcdsfze.png" alt="Result table" width="880" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In very deep convolutional networks for large-scale image classification, depth is beneficial for classification accuracy. &lt;/p&gt;

&lt;h2&gt;
  
  
  Check data and code available
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Dataset: https://image-net.org/challenges/LSVRC/index.php 
Code: 
    By Paper's author: https://worksheets.codalab.org/worksheets/0xe2ac460eee7443438d5ab9f43824a819
    Tensorflow implementation: https://github.com/tensorflow/models/blob/master/research/slim/nets/vgg.py 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Hecked-All-october</title>
      <dc:creator>Hitansh</dc:creator>
      <pubDate>Fri, 13 Nov 2020 13:28:19 +0000</pubDate>
      <link>https://dev.to/hitansh159/hecked-all-october-4m5l</link>
      <guid>https://dev.to/hitansh159/hecked-all-october-4m5l</guid>
      <description>&lt;p&gt;Just completed Git and Github course on coursera. My friend contacted me and told me about this competition and it become perfect start point to start contributing to open source. I was afraid at start but then confidence grew PR by PR and now I also pitched new ideas and features into existing repo.&lt;/p&gt;

&lt;p&gt;Thank you so much for such an wonderful event!&lt;br&gt;
thanks Intel, Dev and Digital ocean it was an awesome experience!!&lt;/p&gt;

</description>
      <category>hacktoberfest</category>
    </item>
  </channel>
</rss>
