<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: B.Nikhil Tej</title>
    <description>The latest articles on DEV Community by B.Nikhil Tej (@bnikhil_tej_b7a6e92e).</description>
    <link>https://dev.to/bnikhil_tej_b7a6e92e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3697362%2F0cdc5097-81fb-4bbf-8eb6-15f7977dc98d.png</url>
      <title>DEV Community: B.Nikhil Tej</title>
      <link>https://dev.to/bnikhil_tej_b7a6e92e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bnikhil_tej_b7a6e92e"/>
    <language>en</language>
    <item>
      <title>I Tried to Run VGG19 on a CPU… It Failed. So I Fixed It.</title>
      <dc:creator>B.Nikhil Tej</dc:creator>
      <pubDate>Tue, 21 Apr 2026 13:57:59 +0000</pubDate>
      <link>https://dev.to/bnikhil_tej_b7a6e92e/i-tried-to-run-vgg19-on-a-cpu-it-failed-so-i-fixed-it-2pbj</link>
      <guid>https://dev.to/bnikhil_tej_b7a6e92e/i-tried-to-run-vgg19-on-a-cpu-it-failed-so-i-fixed-it-2pbj</guid>
      <description>&lt;p&gt;Turning a 500MB deep learning model into something actually usable with pruning, quantization, and a few simple tricks.&lt;/p&gt;

&lt;p&gt;Deep learning models look impressive when you read about them.&lt;/p&gt;

&lt;p&gt;High accuracy, benchmark scores, state-of-the-art results.&lt;/p&gt;

&lt;p&gt;But the moment you try to actually use one in a real system, things feel very different.&lt;/p&gt;

&lt;p&gt;That’s exactly what happened when I started working with VGG19.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem I ran into
&lt;/h2&gt;

&lt;p&gt;I picked VGG19 because it's simple and widely used. It felt like a safe choice.&lt;/p&gt;

&lt;p&gt;But when I tried to run it on a CPU:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inference was slow
&lt;/li&gt;
&lt;li&gt;Even loading the model took a long time
&lt;/li&gt;
&lt;li&gt;Memory usage was huge
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, it became obvious:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This model is powerful, but not practical.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of switching to a different model, I wanted to understand something deeper:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I make this model usable instead of replacing it?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What VGG19 actually looks like
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fij78m4odocuxafr0ox40.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fij78m4odocuxafr0ox40.png" alt="VGG 19 Architecture " width="800" height="572"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flsdyqmggwcb1238ycse0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flsdyqmggwcb1238ycse0.png" alt="A comparison diagram showing the layer-by-layer architecture of VGG-19 alongside plain and residual 34-layer networks, highlighting how convolution blocks, pooling, and residual connections differ in structure and depth." width="572" height="1314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;VGG19 is built using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;16 convolution layers
&lt;/li&gt;
&lt;li&gt;5 max-pooling layers
&lt;/li&gt;
&lt;li&gt;3 fully connected layers
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Initially, I thought the convolution layers were the main reason for its size.&lt;/p&gt;

&lt;p&gt;But after digging deeper, I realized something important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most of the parameters are actually in the fully connected layers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s why the model ends up with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~143 million parameters
&lt;/li&gt;
&lt;li&gt;~500MB size
&lt;/li&gt;
&lt;/ul&gt;
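&lt;p&gt;A quick back-of-the-envelope count makes this concrete. The shapes below are the standard VGG19 classifier head (as in torchvision), not something specific to my code:&lt;/p&gt;

```python
# Rough parameter count for VGG19's three fully connected layers.
# Shapes assume the standard VGG19 head: 7x7x512 flattened conv output,
# two 4096-unit hidden layers, 1000 ImageNet classes.
fc_shapes = [
    (7 * 7 * 512, 4096),  # flattened conv output -> fc1
    (4096, 4096),         # fc1 -> fc2
    (4096, 1000),         # fc2 -> class scores
]

fc_params = sum(in_f * out_f + out_f for in_f, out_f in fc_shapes)  # weights + biases
total_params = 143_667_240  # published VGG19 total

print(f"FC parameters: {fc_params:,}")                    # 123,642,856
print(f"Share of total: {fc_params / total_params:.0%}")  # 86%
```

&lt;p&gt;The first fully connected layer alone holds over 100 million parameters — more than all 16 convolution layers combined.&lt;/p&gt;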




&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;Instead of guessing what works, I built a small system to test things properly.&lt;/p&gt;

&lt;p&gt;The idea was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload an image
&lt;/li&gt;
&lt;li&gt;Run it through different optimized versions of the VGG19 model
&lt;/li&gt;
&lt;li&gt;Compare the results side-by-side
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s what my interface looks like in practice:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw516e697zupjju7qzzr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw516e697zupjju7qzzr.png" alt="My Interface" width="800" height="565"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each card represents a different version of the same model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Baseline
&lt;/li&gt;
&lt;li&gt;Pruned
&lt;/li&gt;
&lt;li&gt;Quantized
&lt;/li&gt;
&lt;li&gt;Pruned + Quantized
&lt;/li&gt;
&lt;li&gt;Input scaled
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it easy to visually compare how each optimization affects performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Structured pruning — removing unnecessary filters
&lt;/h2&gt;

&lt;p&gt;The first thing I tried was pruning.&lt;/p&gt;

&lt;p&gt;Not random pruning, but structured pruning.&lt;/p&gt;

&lt;p&gt;Instead of removing individual weights, I removed entire filters from convolution layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I did
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Focused on deeper layers (early layers learn general features and are more sensitive to pruning)
&lt;/li&gt;
&lt;li&gt;Removed about 10% of filters
&lt;/li&gt;
&lt;li&gt;Ranked filters by the L2 norm of their weights to identify the least important ones
&lt;/li&gt;
&lt;/ul&gt;
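&lt;p&gt;Here's a minimal sketch of the ranking idea on a single layer — NumPy only, illustrative rather than the actual code from my repo:&lt;/p&gt;

```python
import numpy as np

def prune_filters_by_l2(weights, prune_ratio=0.10):
    """Drop the filters with the smallest L2 norm from one conv layer.

    weights: array of shape (out_channels, in_channels, kH, kW).
    Returns the pruned weights and the indices of the kept filters.
    """
    out_channels = weights.shape[0]
    # One L2 norm per output filter.
    norms = np.linalg.norm(weights.reshape(out_channels, -1), axis=1)
    n_keep = out_channels - int(out_channels * prune_ratio)
    keep = np.sort(np.argsort(norms)[-n_keep:])  # keep the largest-norm filters
    return weights[keep], keep

rng = np.random.default_rng(0)
layer = rng.standard_normal((512, 512, 3, 3)).astype(np.float32)
pruned, kept = prune_filters_by_l2(layer, prune_ratio=0.10)
print(layer.shape, "->", pruned.shape)  # (512, 512, 3, 3) -> (461, 512, 3, 3)
```

&lt;p&gt;In a real network you also have to slice the matching input channels of the &lt;em&gt;next&lt;/em&gt; layer (and any BatchNorm parameters), which is what makes structured pruning trickier than it looks.&lt;/p&gt;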

&lt;h3&gt;
  
  
  What I observed
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The model became lighter
&lt;/li&gt;
&lt;li&gt;Inference got faster
&lt;/li&gt;
&lt;li&gt;Accuracy dropped slightly
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To fix that, I added a short fine-tuning step.&lt;/p&gt;

&lt;p&gt;That helped recover most of the performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Quantization — reducing precision
&lt;/h2&gt;

&lt;p&gt;Next, I explored quantization.&lt;/p&gt;

&lt;p&gt;Instead of changing the structure, this changes how weights are stored.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before → 32-bit floating point
&lt;/li&gt;
&lt;li&gt;After → 8-bit integers
&lt;/li&gt;
&lt;/ul&gt;
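&lt;p&gt;The conversion itself is simple arithmetic. Here's a sketch of symmetric per-tensor quantization — one possible scheme, not necessarily the exact one a framework uses under the hood:&lt;/p&gt;

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4096, 4096)).astype(np.float32)  # one VGG19-sized FC weight

q, scale = quantize_int8(w)
print(f"fp32: {w.nbytes / 1e6:.1f} MB, int8: {q.nbytes / 1e6:.1f} MB")  # 4x smaller

err = np.abs(w - dequantize(q, scale)).max()
print(f"max round-trip error: {err:.4f}")
```

&lt;p&gt;Storage drops by exactly 4x per quantized tensor, and the round-trip error stays bounded by half the quantization step.&lt;/p&gt;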

&lt;h3&gt;
  
  
  What stood out
&lt;/h3&gt;

&lt;p&gt;Just quantizing the fully connected layers reduced a huge portion of the model size.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fully connected layers are the main bottleneck.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It also made inference faster on CPU.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Combining both — what worked best
&lt;/h2&gt;

&lt;p&gt;After trying both techniques separately, I combined them.&lt;/p&gt;

&lt;p&gt;The order mattered:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prune the model
&lt;/li&gt;
&lt;li&gt;Fine-tune it
&lt;/li&gt;
&lt;li&gt;Apply quantization
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This worked much better than using either technique on its own.&lt;/p&gt;
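&lt;p&gt;The pipeline order can be sketched end-to-end on one layer — again NumPy only and purely illustrative, with fine-tuning elided since it needs real training data:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512, 3, 3)).astype(np.float32)  # one conv layer

# 1. Prune: drop the 10% of filters with the smallest L2 norm.
norms = np.linalg.norm(w.reshape(512, -1), axis=1)
keep = np.sort(np.argsort(norms)[-(512 - 51):])
pruned = w[keep]

# 2. Fine-tune here in the real pipeline, to recover accuracy
#    while the weights are still in float32.

# 3. Quantize what is left to int8.
scale = np.abs(pruned).max() / 127.0
q = np.clip(np.round(pruned / scale), -127, 127).astype(np.int8)

print(f"original: {w.nbytes/1e6:.2f} MB -> pruned+int8: {q.nbytes/1e6:.2f} MB")
# ~4.4x smaller: the two savings multiply instead of competing
```

&lt;p&gt;Quantizing last matters: fine-tuning needs full-precision gradients, so squeezing the weights down to int8 is the final step, not the first.&lt;/p&gt;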

&lt;h3&gt;
  
  
  Final result:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Smaller model
&lt;/li&gt;
&lt;li&gt;Faster inference
&lt;/li&gt;
&lt;li&gt;Predictions stayed mostly consistent with the baseline
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. Input scaling — the simplest trick
&lt;/h2&gt;

&lt;p&gt;One of the simplest changes gave surprisingly good results.&lt;/p&gt;

&lt;p&gt;Instead of modifying the model, I reduced the input size:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From 224×224
&lt;/li&gt;
&lt;li&gt;To 160×160
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s almost half the number of pixels.&lt;/p&gt;

&lt;p&gt;Since convolution cost scales with the number of input pixels, this reduced computation significantly.&lt;/p&gt;

&lt;p&gt;No retraining was needed.&lt;/p&gt;
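&lt;p&gt;The math behind "almost half" is one line:&lt;/p&gt;

```python
# Convolution MACs scale roughly with the output spatial size (H x W),
# so shrinking the input side from 224 to 160 cuts compute almost in half.
full = 224 * 224
scaled = 160 * 160
print(f"pixel ratio: {scaled / full:.2f}")  # 0.51 -> ~49% fewer pixels
```

&lt;p&gt;Note the usual caveat: VGG19 was trained on 224x224 inputs, so accuracy at 160x160 should be checked on your own data rather than assumed.&lt;/p&gt;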




&lt;h2&gt;
  
  
  What the results showed
&lt;/h2&gt;

&lt;p&gt;Looking at the interface output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Baseline → slowest and largest
&lt;/li&gt;
&lt;li&gt;Pruning → moderate speed improvement
&lt;/li&gt;
&lt;li&gt;Quantization → major size reduction
&lt;/li&gt;
&lt;li&gt;Pruning + Quantization → best balance
&lt;/li&gt;
&lt;li&gt;Input scaling → highest speed boost
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;No single technique solves everything.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each technique targets a different limitation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pruning → reduces parameters and computation
&lt;/li&gt;
&lt;li&gt;Quantization → reduces memory and model size
&lt;/li&gt;
&lt;li&gt;Input scaling → reduces per-inference workload
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;A few things stood out clearly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Models are often overbuilt
&lt;/li&gt;
&lt;li&gt;Fully connected layers are the real bottleneck
&lt;/li&gt;
&lt;li&gt;Simple changes matter
&lt;/li&gt;
&lt;li&gt;Combining techniques works best
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The trade-off
&lt;/h2&gt;

&lt;p&gt;There’s always a balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller model → faster
&lt;/li&gt;
&lt;li&gt;But → slight accuracy drop
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to manage that trade-off.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;This isn’t just about VGG19.&lt;/p&gt;

&lt;p&gt;These ideas apply to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Edge devices
&lt;/li&gt;
&lt;li&gt;Mobile applications
&lt;/li&gt;
&lt;li&gt;Real-time systems
&lt;/li&gt;
&lt;li&gt;IoT deployments
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anywhere performance and memory are limited.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;Working on this changed how I think about deep learning models.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It’s not just about building accurate models.&lt;br&gt;&lt;br&gt;
It’s about making them usable.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Code and Project
&lt;/h2&gt;

&lt;p&gt;If you want to explore the implementation:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/Nikhil-tej108/VGG19" rel="noopener noreferrer"&gt;https://github.com/Nikhil-tej108/VGG19&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I want to explore next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Knowledge distillation
&lt;/li&gt;
&lt;li&gt;Edge deployment
&lt;/li&gt;
&lt;li&gt;Real-time inference
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you've worked on similar optimization problems, I'd love to hear your approach.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>vgg19</category>
      <category>computervision</category>
    </item>
  </channel>
</rss>
