<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: shubham oulkar</title>
    <description>The latest articles on DEV Community by shubham oulkar (@shubhamoulkar).</description>
    <link>https://dev.to/shubhamoulkar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1233131%2Fea4e0074-4630-4945-9267-6c24ed779aeb.png</url>
      <title>DEV Community: shubham oulkar</title>
      <link>https://dev.to/shubhamoulkar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shubhamoulkar"/>
    <language>en</language>
    <item>
      <title>Don't let large git repositories slow you down</title>
      <dc:creator>shubham oulkar</dc:creator>
      <pubDate>Sun, 19 Apr 2026 03:45:57 +0000</pubDate>
      <link>https://dev.to/shubhamoulkar/dont-let-large-git-repositories-slow-you-down-1da8</link>
      <guid>https://dev.to/shubhamoulkar/dont-let-large-git-repositories-slow-you-down-1da8</guid>
      <description>&lt;p&gt;When a project matures over several years, the repository inevitably accumulates a massive history of commits, heavy binary assets, and complex trees. A standard &lt;code&gt;git clone&lt;/code&gt; can take hours, turning what should be a simple contribution into a frustrating hurdle. For developers on limited hardware or internet connections, this isn't just an annoyance; it's a barrier to entry.&lt;/p&gt;

&lt;p&gt;To understand how to bypass this, we need to treat Git as what it truly is: a &lt;strong&gt;content-addressed Directed Acyclic Graph (DAG)&lt;/strong&gt; of objects. &lt;/p&gt;

&lt;p&gt;Instead of downloading the entire database, we can selectively fetch only the nodes of the graph we need to get to work. I’ve set up a &lt;strong&gt;&lt;a href="https://github.com/ShubhamOulkar/git-clone-testing" rel="noopener noreferrer"&gt;demo repository&lt;/a&gt;&lt;/strong&gt; so you can test these optimizations without waiting on a massive repo like the Linux kernel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common challenges in large repositories
&lt;/h2&gt;

&lt;p&gt;A bloated repository creates friction across the entire development lifecycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Network Strain:&lt;/strong&gt; Cloning takes a long time and can fail partway through on unstable connections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Bloat:&lt;/strong&gt; Disk space vanishes, especially on local machines or ephemeral CI runners.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipeline Drag:&lt;/strong&gt; CI/CD runs slow down because every build starts by cloning the full repository.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Overload:&lt;/strong&gt; In a monorepo, you’re forced to download thousands of files just to edit a single folder.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Understanding Git’s data model
&lt;/h2&gt;

&lt;p&gt;Git stores a repository as three main kinds of objects, which you can think of as layers. Once you understand them, you can choose exactly where to "cut" the data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;History (Commits):&lt;/strong&gt; The timeline of who changed what and when.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structure (Trees):&lt;/strong&gt; The "snapshots" of your directory and file layout at any given point.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content (Blobs):&lt;/strong&gt; The raw contents of each file version, text or binary. Blobs store no file names; trees supply those.&lt;/li&gt;
&lt;/ul&gt;
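
&lt;p&gt;To make the three layers concrete, here is a small sketch. It builds one object of each type by hand with Git's plumbing commands inside a throwaway repository, then asks Git for each object's type. The temporary directory, the &lt;code&gt;README.md&lt;/code&gt; name and the "Demo" identity are illustrative assumptions. It also assumes a recent Git (2.28 or later for &lt;code&gt;git init -b&lt;/code&gt;).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/bin/sh
# Build one blob, one tree, and one commit by hand, then ask Git for
# each object's type. Everything happens in a throwaway temp repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main

# Content: store raw bytes as a blob; Git prints the content hash.
blob=$(printf 'hello, git\n' | git hash-object -w --stdin)

# Structure: file the blob under a name and mode in the index,
# then snapshot the index as a tree object.
git update-index --add --cacheinfo 100644,"$blob",README.md
tree=$(git write-tree)

# History: a commit points at a tree and adds author, date, and message.
commit=$(git -c user.name=Demo -c user.email=demo@example.com \
  commit-tree -m "first commit" "$tree")
git update-ref refs/heads/main "$commit"

git cat-file -t "$blob"    # blob
git cat-file -t "$tree"    # tree
git cat-file -t "$commit"  # commit
git cat-file -p "$tree"    # mode, type, hash, and name of README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;git cat-file -p&lt;/code&gt; on the commit instead shows the &lt;code&gt;tree&lt;/code&gt; line that links the history layer to the structure layer.&lt;/p&gt;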

&lt;h3&gt;
  
  
  Cloning strategies overview
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Flag&lt;/th&gt;
&lt;th&gt;Layer Skipped&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Shallow&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--depth=1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Older commits (with their trees and blobs)&lt;/td&gt;
&lt;td&gt;CI/CD pipelines &amp;amp; "drive-by" fixes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Blobless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--filter=blob:none&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Blobs not needed for the checkout&lt;/td&gt;
&lt;td&gt;Professional day-to-day development.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Treeless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--filter=tree:0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Trees and blobs not needed for the checkout&lt;/td&gt;
&lt;td&gt;Automated scripts that only scan commit logs.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Benchmarks: Comparing clone strategies
&lt;/h2&gt;

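&lt;p&gt;I ran &lt;code&gt;git count-objects -vH&lt;/code&gt; inside each clone of my demo repo to compare three numbers: packed objects (&lt;code&gt;in-pack&lt;/code&gt;), pack files (&lt;code&gt;packs&lt;/code&gt;) and their total size (&lt;code&gt;size-pack&lt;/code&gt;). The output below is trimmed to those three fields.&lt;/p&gt;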

&lt;h3&gt;
  
  
  Full clone (the baseline)
&lt;/h3&gt;

&lt;p&gt;Every commit, tree, and blob is downloaded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git count-objects &lt;span class="nt"&gt;-vH&lt;/span&gt;
&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="nt"&gt;-pack&lt;/span&gt;: 48
packs: 1
size-pack: 85.54 KiB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Shallow clone (&lt;code&gt;--depth=1&lt;/code&gt;)
&lt;/h3&gt;

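&lt;p&gt;This truncates the history, fetching only the latest commit of a single branch (&lt;code&gt;--depth&lt;/code&gt; implies &lt;code&gt;--single-branch&lt;/code&gt;). It is the fastest way to get code on your screen, but history-based commands suffer. &lt;code&gt;git blame&lt;/code&gt; attributes every line to that one commit, and &lt;code&gt;git bisect&lt;/code&gt; has no older commits to search.&lt;br&gt;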
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git clone &lt;span class="nt"&gt;--depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 https://github.com/ShubhamOulkar/git-clone-testing.git
&lt;span class="nv"&gt;$ &lt;/span&gt;git count-objects &lt;span class="nt"&gt;-vH&lt;/span&gt;
&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="nt"&gt;-pack&lt;/span&gt;: 25
packs: 1
size-pack: 29.76 KiB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
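
&lt;p&gt;If you later need &lt;code&gt;git blame&lt;/code&gt; or &lt;code&gt;git bisect&lt;/code&gt;, you can deepen a shallow clone in place instead of re-cloning it. The sketch below reproduces this offline against a throwaway local repository with three empty commits, an assumption made so it runs without the network. With the demo repository, you would run the same &lt;code&gt;git fetch&lt;/code&gt; commands inside your clone. It assumes a recent Git.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/bin/sh
# Shallow-clone a throwaway local repo, then recover its history step by step.
set -e
work=$(mktemp -d)
cd "$work"
git init -q -b main src
for n in 1 2 3; do
  git -C src -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "commit $n"
done

# A file:// URL makes Git use its network transport;
# --depth is ignored for plain local-path clones.
git clone -q --depth=1 "file://$work/src" shallow
git -C shallow rev-parse --is-shallow-repository   # true
git -C shallow rev-list --count HEAD               # 1

# Fetch one more commit behind the current shallow boundary.
git -C shallow fetch -q --deepen=1
git -C shallow rev-list --count HEAD               # 2

# Fetch the rest of the history; blame and bisect now see everything.
git -C shallow fetch -q --unshallow
git -C shallow rev-list --count HEAD               # 3
git -C shallow rev-parse --is-shallow-repository   # false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;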



&lt;h3&gt;
  
  
  Blobless clone (&lt;code&gt;--filter=blob:none&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;This is the "sweet spot." It fetches all commits and all trees (so &lt;code&gt;git log&lt;/code&gt; and &lt;code&gt;git checkout&lt;/code&gt; work instantly) but skips the actual file contents until you need them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git clone &lt;span class="nt"&gt;--filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;blob:none https://github.com/ShubhamOulkar/git-clone-testing.git
&lt;span class="nv"&gt;$ &lt;/span&gt;git count-objects &lt;span class="nt"&gt;-vH&lt;/span&gt;
&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="nt"&gt;-pack&lt;/span&gt;: 41
packs: 2
size-pack: 35.79 KiB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Notice the &lt;strong&gt;2 packs&lt;/strong&gt;. The second pack comes from a lazy fetch that Git triggers automatically to download the file contents needed for your working directory.&lt;/p&gt;
&lt;/blockquote&gt;
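
&lt;p&gt;You can watch lazy fetching happen with a local stand-in for the server. This sketch assumes a throwaway repository whose &lt;code&gt;notes.txt&lt;/code&gt; has three versions, served over &lt;code&gt;file://&lt;/code&gt;. A local server has to opt in to filtered clones with &lt;code&gt;uploadpack.allowFilter&lt;/code&gt;. To be safe, the sketch also enables &lt;code&gt;uploadpack.allowAnySHA1InWant&lt;/code&gt; so that individual objects can be requested on demand; hosted services such as GitHub support partial clone out of the box. The &lt;code&gt;missing&lt;/code&gt; helper is defined for this sketch. It counts the objects that &lt;code&gt;git rev-list --missing=print&lt;/code&gt; reports as absent.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/bin/sh
# Blobless clone of a throwaway local repository, then one lazy fetch.
set -e
work=$(mktemp -d)
cd "$work"
git init -q -b main src
git -C src config user.name Demo
git -C src config user.email demo@example.com
# Assumption: a local file:// "server" must opt in to filtered clones
# and to serving individual objects by ID.
git -C src config uploadpack.allowFilter true
git -C src config uploadpack.allowAnySHA1InWant true

# Commit three versions of notes.txt, written straight into the object
# database with plumbing commands.
for n in 1 2 3; do
  blob=$(printf 'version %s\n' "$n" | git -C src hash-object -w --stdin)
  git -C src update-index --add --cacheinfo 100644,"$blob",notes.txt
  git -C src commit -q -m "version $n"
done

git clone -q --filter=blob:none "file://$work/src" partial
cd partial

# Helper for this sketch: count objects reachable from history but absent
# locally (rev-list prints them with a leading "?" and does not fetch them).
missing() { git rev-list --objects --all --missing=print | grep -c '^?' || true; }

missing                    # 2: only the checked-out version of notes.txt is local
git show HEAD~2:notes.txt  # lazily fetches one blob, prints "version 1"
missing                    # 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;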

&lt;h3&gt;
  
  
  Treeless clone (&lt;code&gt;--filter=tree:0&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;This is the most aggressive partial clone. It fetches only commits up front; trees and blobs are downloaded on demand when a command needs them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git clone &lt;span class="nt"&gt;--filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;tree:0 https://github.com/ShubhamOulkar/git-clone-testing.git
&lt;span class="nv"&gt;$ &lt;/span&gt;git count-objects &lt;span class="nt"&gt;-vH&lt;/span&gt;
&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="nt"&gt;-pack&lt;/span&gt;: 31
packs: 3
size-pack: 36.07 KiB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This results in &lt;strong&gt;3 packs&lt;/strong&gt;. The clone itself fetches only commits. Checking out the branch then needs the tree structure, which triggers one on-demand fetch, and the file contents, which trigger another. Each of these packs is a separate network round trip.&lt;/p&gt;
&lt;/blockquote&gt;
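
&lt;p&gt;You can observe the treeless trade-off locally. In this sketch, a throwaway repository served over &lt;code&gt;file://&lt;/code&gt; stands in for the remote and opts in to filters through &lt;code&gt;uploadpack&lt;/code&gt; settings. The &lt;code&gt;--no-checkout&lt;/code&gt; flag postpones the working tree so that only commits arrive at first. After that, &lt;code&gt;git log&lt;/code&gt; runs without the network, while &lt;code&gt;git ls-tree&lt;/code&gt; needs a tree and triggers a fetch. The &lt;code&gt;packs&lt;/code&gt; helper, which counts pack files, is part of the sketch, not a Git command.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/bin/sh
# Treeless clone: commits arrive up front, trees only when a command reads them.
set -e
work=$(mktemp -d)
cd "$work"
git init -q -b main src
git -C src config user.name Demo
git -C src config user.email demo@example.com
# Assumption: the local file:// "server" must opt in to filters and
# to serving individual objects by ID.
git -C src config uploadpack.allowFilter true
git -C src config uploadpack.allowAnySHA1InWant true
for n in 1 2; do
  blob=$(printf 'build %s\n' "$n" | git -C src hash-object -w --stdin)
  git -C src update-index --add --cacheinfo 100644,"$blob",app/main.txt
  git -C src commit -q -m "build $n"
done

# --no-checkout skips the working tree, so nothing is fetched lazily yet.
git clone -q --filter=tree:0 --no-checkout "file://$work/src" treeless
cd treeless

# Helper for this sketch: count the pack files received so far.
packs() { ls .git/objects/pack | grep -c '\.pack$'; }

git log --oneline      # commits are local, so this needs no network
before=$(packs)
git ls-tree -r HEAD    # HEAD's tree is missing, so Git fetches it on demand
after=$(packs)
echo "packs before: $before, after: $after"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;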

&lt;h2&gt;
  
  
  Operational trade-offs
&lt;/h2&gt;

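&lt;p&gt;Partial clones are fast to start, but they introduce "lazy fetching": certain commands trigger an on-demand network request when Git realizes it is missing a node in the graph. Shallow clones never fetch on demand. Commands that need older history simply see a truncated one, marked "Limited" in the table.&lt;/p&gt;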

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Full&lt;/th&gt;
&lt;th&gt;Shallow&lt;/th&gt;
&lt;th&gt;Blobless&lt;/th&gt;
&lt;th&gt;Treeless&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git log&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git checkout&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fetches Blobs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fetches Trees + Blobs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git blame&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fetches Blobs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fetches Trees + Blobs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;git diff&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fetches Blobs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Fetches Trees + Blobs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Under the hood
&lt;/h2&gt;

&lt;p&gt;When you run a partial clone, Git marks the remote as a &lt;strong&gt;promisor remote&lt;/strong&gt;. It adds &lt;code&gt;promisor = true&lt;/code&gt; to the &lt;code&gt;[remote "origin"]&lt;/code&gt; section of your &lt;code&gt;.git/config&lt;/code&gt;, along with a &lt;code&gt;partialclonefilter&lt;/code&gt; entry that records the filter you chose. This tells the local Git client: &lt;em&gt;"If you can't find an object, don't error out; just ask the remote for it."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To see these changes yourself in VS Code (which hides &lt;code&gt;.git&lt;/code&gt; by default):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the Command Palette (&lt;code&gt;Ctrl + Shift + P&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Search for &lt;strong&gt;Preferences: Open Settings (JSON)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add or update:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"files.exclude"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"**/.git"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
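
&lt;p&gt;If you prefer the terminal, &lt;code&gt;git config&lt;/code&gt; reads the same settings without opening an editor. This sketch makes a blobless clone of a throwaway local repository so that it runs offline. It then prints the remote's promisor settings and lists the &lt;code&gt;.promisor&lt;/code&gt; marker files Git keeps next to packs received from a promisor remote. The local &lt;code&gt;uploadpack.allowFilter&lt;/code&gt; opt-in is an assumption of the local setup only.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/bin/sh
# Inspect the promisor settings that a partial clone writes to .git/config.
set -e
work=$(mktemp -d)
cd "$work"
git init -q -b main src
git -C src -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "init"
# Assumption: a local file:// "server" must opt in to filtered clones.
git -C src config uploadpack.allowFilter true

git clone -q --filter=blob:none "file://$work/src" partial
git -C partial config --get remote.origin.promisor            # true
git -C partial config --get remote.origin.partialclonefilter  # blob:none
# Packs received from a promisor remote carry a matching .promisor marker file.
ls partial/.git/objects/pack | grep '\.promisor$'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Inside your own partial clone of the demo repository, the same two &lt;code&gt;git config --get&lt;/code&gt; commands should print the same values.&lt;/p&gt;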

&lt;h2&gt;
  
  
  Advanced tools for large repositories
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Git LFS (Large File Storage):&lt;/strong&gt; Replaces heavy assets in the repository with lightweight pointer files. The actual binary data lives on a separate LFS server and, by default, is downloaded only for the versions you check out. It is widely supported by hosting platforms such as GitHub and Bitbucket.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sparse Checkout:&lt;/strong&gt; Perfect for monorepos. It limits your working directory to the folders you name; combined with a partial clone, Git downloads file contents only for those folders.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git sparse-checkout &lt;span class="nb"&gt;set&lt;/span&gt; &amp;lt;folder-path&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalar:&lt;/strong&gt; An opinionated tool, originally built by Microsoft and now part of the Git project. It sets up partial clones and sparse checkouts and schedules background maintenance automatically for massive enterprise repos.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
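
&lt;p&gt;Sparse checkout and a blobless clone combine naturally for monorepos. The sketch below fakes a tiny monorepo with hypothetical &lt;code&gt;frontend&lt;/code&gt;, &lt;code&gt;backend&lt;/code&gt; and &lt;code&gt;docs&lt;/code&gt; folders and serves it over &lt;code&gt;file://&lt;/code&gt;. It clones the repo with &lt;code&gt;--filter=blob:none --sparse&lt;/code&gt;, which starts with only top-level files in the working tree, and then populates just &lt;code&gt;backend&lt;/code&gt;. The &lt;code&gt;add_file&lt;/code&gt; helper, the folder names and the local &lt;code&gt;uploadpack&lt;/code&gt; opt-ins are assumptions of this sketch.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/bin/sh
# Blobless + sparse clone of a tiny, made-up monorepo.
set -e
work=$(mktemp -d)
cd "$work"
git init -q -b main mono
git -C mono config user.name Demo
git -C mono config user.email demo@example.com
# Assumption: the local file:// "server" must opt in to filters and
# to serving individual objects by ID.
git -C mono config uploadpack.allowFilter true
git -C mono config uploadpack.allowAnySHA1InWant true

# Helper for this sketch: stage a file (path, content) straight into the index.
add_file() {
  blob=$(printf '%s\n' "$2" | git -C mono hash-object -w --stdin)
  git -C mono update-index --add --cacheinfo 100644,"$blob","$1"
}
add_file frontend/app.js "export const app = 1;"
add_file backend/server.go "package main"
add_file docs/guide.md "Guide"
git -C mono commit -q -m "monorepo layout"

# --sparse starts with only top-level files checked out, so no blobs yet.
git clone -q --filter=blob:none --sparse "file://$work/mono" checkout
cd checkout
git sparse-checkout set backend   # populate backend/ and fetch only its blobs
ls                                # backend
# Blobs of the folders you never checked out stay on the server.
git rev-list --objects --all --missing=print | grep -c '^?'   # 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;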

&lt;h2&gt;
  
  
  Which should you use?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;CI/CD or One-off Fixes:&lt;/strong&gt; Use &lt;code&gt;--depth=1&lt;/code&gt;. It’s the fastest path to a build, as long as the build does not need older history or tags (for example, to derive a version with &lt;code&gt;git describe&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Professional Development:&lt;/strong&gt; Use &lt;code&gt;--filter=blob:none&lt;/code&gt;. You keep the full history for context but download old file contents only when you actually need them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monorepos:&lt;/strong&gt; Combine &lt;code&gt;--filter=blob:none&lt;/code&gt; with &lt;code&gt;sparse-checkout&lt;/code&gt;. Download the map, but only visit the rooms you need.&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.blog/open-source/git/gits-database-internals-i-packed-object-store/#packfiles-and-pack-indexes" rel="noopener noreferrer"&gt;GitHub Blog: How Packfiles Work&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://git-scm.com/docs/git-rev-list#Documentation/git-rev-list.txt---filterfilter-spec" rel="noopener noreferrer"&gt;Official Git Filter Specs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ShubhamOulkar/git-clone-testing" rel="noopener noreferrer"&gt;Test Lab Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>git</category>
      <category>github</category>
      <category>bitbucket</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
