<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: hitesh</title>
    <description>The latest articles on DEV Community by hitesh (@hiteshsisara).</description>
    <link>https://dev.to/hiteshsisara</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1957755%2Fe266a5c1-17b8-422c-bc2d-78a29303f265.png</url>
      <title>DEV Community: hitesh</title>
      <link>https://dev.to/hiteshsisara</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hiteshsisara"/>
    <language>en</language>
    <item>
      <title>I Made My Terminal 7.5x Faster by Deleting the Tools Everyone Tells You to Install</title>
      <dc:creator>hitesh</dc:creator>
      <pubDate>Sun, 07 Jun 2026 19:20:26 +0000</pubDate>
      <link>https://dev.to/hiteshsisara/i-made-my-terminal-75x-faster-by-deleting-the-tools-everyone-tells-you-to-install-50m2</link>
      <guid>https://dev.to/hiteshsisara/i-made-my-terminal-75x-faster-by-deleting-the-tools-everyone-tells-you-to-install-50m2</guid>
      <description>&lt;h2&gt;
  
  
  My shell took 2 seconds to start. Then I realized an AI agent was paying that tax hundreds of times a day.
&lt;/h2&gt;

&lt;p&gt;For fifteen years the advice was settled: install oh-my-zsh, add some plugins, get autosuggestions and a pretty git-aware prompt. It made the terminal nicer to &lt;em&gt;sit in front of&lt;/em&gt;. That assumption — a human, sitting, typing, one command at a time — is now wrong most of the time.&lt;/p&gt;

&lt;p&gt;On my machine, the overwhelming majority of terminal commands are issued by AI agents, not me. And once that flips, every plugin you installed for human comfort turns into a tax the machine pays on your behalf, over and over.&lt;/p&gt;

&lt;p&gt;I profiled my own setup and cut shell startup from &lt;strong&gt;2.05 seconds to 0.27 seconds&lt;/strong&gt; — roughly &lt;strong&gt;7.5x&lt;/strong&gt; — by deleting tools that added zero value to an agent workflow. Here's what I found, why it matters more now than ever, and how to audit your own machine.&lt;/p&gt;




&lt;h2&gt;
  
  
  The thing nobody tells you: agents don't reuse your shell
&lt;/h2&gt;

&lt;p&gt;When &lt;em&gt;you&lt;/em&gt; work in a terminal, you open it once. Your &lt;code&gt;~/.zshrc&lt;/code&gt; runs a single time, you pay the startup cost once, and then you type into that warm session for hours. If sourcing oh-my-zsh costs 600ms at launch — who cares. You paid it at 9am and forgot about it.&lt;/p&gt;

&lt;p&gt;AI agents don't work that way. An agent that runs terminal commands typically &lt;strong&gt;spawns a fresh shell process per command&lt;/strong&gt; (or per small batch). There's no long-lived session it's lovingly curating. It runs &lt;code&gt;git status&lt;/code&gt; → shell starts → sources your full init → runs the command → exits. Then it runs &lt;code&gt;bun install&lt;/code&gt; → a &lt;em&gt;brand new shell&lt;/em&gt; starts → sources your full init &lt;em&gt;again&lt;/em&gt; → exits. Every command re-pays the entire startup cost.&lt;/p&gt;

&lt;p&gt;Now layer on how modern agentic tools actually operate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parallel sub-agents.&lt;/strong&gt; Complex tasks fan out across multiple agent threads working concurrently — one gathering context, one running tests, one editing files. Each spawns its own shells.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background processes.&lt;/strong&gt; Dev servers, watchers, and test runners get their own shell instances.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry and verification loops.&lt;/strong&gt; Run a command, check output, fix, re-run. More shells.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the cost isn't &lt;code&gt;startup_time&lt;/code&gt;. It's:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;startup_time × commands_per_task × parallel_agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That 1.78 seconds I shaved off isn't saved once — it's saved on &lt;strong&gt;every shell spawn across every agent thread, all day&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A rough illustration: an agent task that runs 50 commands used to burn ~100 seconds &lt;em&gt;just starting shells&lt;/em&gt; before doing any real work. Run three agents in parallel on a big task and you're throwing away minutes of pure init overhead — plus the CPU and battery to do it.&lt;/p&gt;
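
&lt;p&gt;To make the multiplication concrete, here is a minimal shell sketch of that formula. The function name &lt;code&gt;init_overhead_ms&lt;/code&gt; is hypothetical, the inputs are the startup times measured in this post (2050 ms before, 270 ms after), and multiplying by the agent count gives aggregate startup time across agents, not wall-clock time.&lt;/p&gt;

```shell
# Hypothetical helper: total milliseconds spent only on starting shells.
# Usage: init_overhead_ms STARTUP_MS COMMANDS_PER_TASK PARALLEL_AGENTS
init_overhead_ms() {
  echo $(( $1 * $2 * $3 ))
}

init_overhead_ms 2050 50 1   # before the cleanup: 102500 ms, about 100 s
init_overhead_ms 270 50 1    # after the cleanup: 13500 ms
init_overhead_ms 2050 50 3   # three agents, before: 307500 ms in aggregate
```

&lt;p&gt;Working in whole milliseconds keeps the arithmetic inside the shell's integer math, so the sketch needs no external tools.&lt;/p&gt;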




&lt;h2&gt;
  
  
  What I found when I profiled it
&lt;/h2&gt;

&lt;p&gt;I used zsh's built-in profiler to see where 2 seconds per shell was going:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Drop this at the very top of ~/.zshrc temporarily&lt;/span&gt;
zmodload zsh/zprof
&lt;span class="c"&gt;# ... your config ...&lt;/span&gt;
&lt;span class="c"&gt;# And this at the very bottom&lt;/span&gt;
zprof
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open a new shell and read the table. The results were almost comically lopsided.&lt;/p&gt;

&lt;h3&gt;
  
  
  Offender #1: The Yandex Cloud CLI's completion script — 827ms (52% of total)
&lt;/h3&gt;

&lt;p&gt;The single biggest cost wasn't even a "plugin." It was the shell-completion script for the &lt;strong&gt;Yandex Cloud CLI (&lt;code&gt;yc&lt;/code&gt;)&lt;/strong&gt; — an SDK I'd installed months earlier and stopped using. The function eating the time showed up in the profiler as &lt;code&gt;__yc_bash_source&lt;/code&gt;, and the completion file it sourced (&lt;code&gt;completion.zsh.inc&lt;/code&gt;) was &lt;strong&gt;9 MB&lt;/strong&gt;. Every shell launch sourced nine megabytes of bash-compatibility completion logic to enable tab-completion for a command I never typed — and that the agent certainly never typed. The whole SDK was 189 MB sitting in &lt;code&gt;~/yandex-cloud&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is the purest form of the problem. Tab-completion helps a &lt;em&gt;human&lt;/em&gt; who is &lt;em&gt;typing and pressing Tab&lt;/em&gt;. An agent generates the entire command string in one shot from a language model. It will never press Tab. The feature's whole value proposition is irrelevant to the primary user of the shell, yet it was the most expensive thing in the startup path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Offender #2: The completion framework itself — ~640ms
&lt;/h3&gt;

&lt;p&gt;oh-my-zsh's machinery (&lt;code&gt;compinit&lt;/code&gt;, &lt;code&gt;compdump&lt;/code&gt;, and ~2,500 &lt;code&gt;compdef&lt;/code&gt; calls) accounted for most of the rest. On each launch, &lt;code&gt;compinit&lt;/code&gt; checks whether its completion dump is still current, rebuilds it if not, and runs a security audit over the completion directories. Useful if you live in the shell. Dead weight if a model is generating your commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  Offender #3: Forge's shell plugin
&lt;/h3&gt;

&lt;p&gt;Ironically, a &lt;em&gt;different&lt;/em&gt; AI coding tool — &lt;strong&gt;Forge&lt;/strong&gt; (a CLI agent) — had installed its own zsh plugin (commands, completions, keybindings) that ran on every launch via &lt;code&gt;eval "$(forge zsh plugin)"&lt;/code&gt; in the hot path. AI tooling adding shell-interactivity features to speed up humans, slowing down the AI tooling that's actually running the commands.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cleanup, in order of impact
&lt;/h2&gt;

&lt;p&gt;I removed things in descending order of cost, measuring after each step:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What I removed&lt;/th&gt;
&lt;th&gt;Startup time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.05s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove Yandex Cloud CLI (&lt;code&gt;yc&lt;/code&gt;) + its 9MB completion&lt;/td&gt;
&lt;td&gt;source lines + 189MB SDK&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.90s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove oh-my-zsh&lt;/td&gt;
&lt;td&gt;framework + 14MB install&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.40s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove Forge's shell plugin&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;eval&lt;/code&gt; block + 33MB binary&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.30s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remove completion entirely (&lt;code&gt;compinit&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;I never tab-complete&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.27s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few details that turn this from "delete stuff" into something you can reason about:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interactive vs. non-interactive shells matter.&lt;/strong&gt; History expansion (the &lt;code&gt;!&lt;/code&gt; that breaks commit messages), completion, and prompt theming are &lt;em&gt;interactive&lt;/em&gt; features: they only take effect in interactive shells, and zsh reads &lt;code&gt;~/.zshrc&lt;/code&gt; only for those. The fact that they bit an agent at all confirmed it was launching interactive shells, which is exactly why trimming the file paid off per-command.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Removing a tool means removing what depended on it.&lt;/strong&gt; When I deleted oh-my-zsh, Forge's plugin started throwing &lt;code&gt;command not found: compdef&lt;/code&gt; — oh-my-zsh had been initializing the completion system, and Forge registered its completions through it. You can't just yank the framework; you either restore the one primitive the dependents need (a lightweight &lt;code&gt;compinit&lt;/code&gt;) or remove the dependents too. (I eventually removed both Forge and completion entirely.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache logic is easy to get backwards.&lt;/strong&gt; The standard "rebuild the completion cache once a day" idiom uses a glob qualifier to check the dump's age. I initially inverted it and silently ran the &lt;em&gt;slow&lt;/em&gt; audited rebuild on every launch instead of the fast cached path. It looked fine and cost 300ms every time. Profile &lt;em&gt;after&lt;/em&gt; each change, not just before.&lt;/p&gt;




&lt;h2&gt;
  
  
  The principle: optimize for your actual primary user
&lt;/h2&gt;

&lt;p&gt;The mental shift is this: &lt;strong&gt;your shell's primary user is now a program, and programs have different needs than people.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A human at a terminal values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Autosuggestions and tab-completion (you're typing, you want help finishing)&lt;/li&gt;
&lt;li&gt;A rich, git-aware, colored prompt (you're reading it constantly)&lt;/li&gt;
&lt;li&gt;Syntax highlighting (you're scanning your own input)&lt;/li&gt;
&lt;li&gt;History shortcuts like &lt;code&gt;!!&lt;/code&gt; and &lt;code&gt;!$&lt;/code&gt; (you're recalling past commands)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An AI agent values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fast process startup&lt;/strong&gt; (it spawns shells constantly)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-interactive, non-blocking behavior&lt;/strong&gt; (it can't press &lt;code&gt;q&lt;/code&gt; to exit a pager or type into an editor)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable, literal command parsing&lt;/strong&gt; (history expansion silently breaking a &lt;code&gt;!&lt;/code&gt; in a commit message just wastes a turn)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Almost none of the human-comfort features help the agent, and several actively hurt it. The pager that opens &lt;code&gt;less&lt;/code&gt; on &lt;code&gt;git log&lt;/code&gt; and waits for a keypress will &lt;strong&gt;hang the agent&lt;/strong&gt;: it sees no further output and stalls until its own timeout fires. The editor that pops open for a commit without &lt;code&gt;-m&lt;/code&gt; waits forever for a human who isn't coming. A glob that errors on no-match aborts a command the agent expected to succeed.&lt;/p&gt;

&lt;p&gt;So the same audit that speeds things up also makes the environment &lt;em&gt;correct&lt;/em&gt; for an agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Non-blocking defaults — nothing waits for a human&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PAGER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;cat
export &lt;/span&gt;&lt;span class="nv"&gt;GIT_PAGER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;cat
export &lt;/span&gt;&lt;span class="nv"&gt;EDITOR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;          &lt;span class="c"&gt;# a no-op "editor" that exits instantly&lt;/span&gt;
git config &lt;span class="nt"&gt;--global&lt;/span&gt; core.pager &lt;span class="nb"&gt;cat
&lt;/span&gt;git config &lt;span class="nt"&gt;--global&lt;/span&gt; core.editor &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Parse command strings literally&lt;/span&gt;
setopt no_bang_hist         &lt;span class="c"&gt;# '!' stops triggering history expansion&lt;/span&gt;
unsetopt nomatch            &lt;span class="c"&gt;# don't error when a glob matches nothing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't speed optimizations exactly — they're "stop the agent from getting stuck and burning tokens recovering from a hang" fixes. In an agentic workflow a blocked command is worse than a slow one: it doesn't just cost time, it costs context and tokens while the model tries to figure out why it got no response, often flailing into other tools.&lt;/p&gt;
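
&lt;p&gt;One way to confirm the exports actually reached the shell an agent gets is a small check like the sketch below. The function name &lt;code&gt;agent_env_issues&lt;/code&gt; is hypothetical, it inspects only the three environment variables set above (not the git config entries), and it prints one line per problem it finds.&lt;/p&gt;

```shell
# Hypothetical check: print one line for each variable that could still
# hand control to an interactive program. Prints nothing when all is well.
agent_env_issues() {
  if [ "${PAGER:-}" != "cat" ]; then
    echo "PAGER is '${PAGER:-unset}', expected 'cat'"
  fi
  if [ "${GIT_PAGER:-}" != "cat" ]; then
    echo "GIT_PAGER is '${GIT_PAGER:-unset}', expected 'cat'"
  fi
  if [ "${EDITOR:-}" != "true" ]; then
    echo "EDITOR is '${EDITOR:-unset}', expected 'true'"
  fi
}

agent_env_issues
```

&lt;p&gt;Running it through the agent's own terminal tool, rather than from your interactive session, checks the environment the agent really inherits.&lt;/p&gt;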




&lt;h2&gt;
  
  
  It's not just the shell — it's the whole IDE surface
&lt;/h2&gt;

&lt;p&gt;The same logic extends past &lt;code&gt;~/.zshrc&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The file watcher.&lt;/strong&gt; My editor was watching ~163,000 files, 64,000 of them in &lt;code&gt;node_modules&lt;/code&gt;. A file watcher exists so the UI updates when files change — but an agent is &lt;em&gt;constantly&lt;/em&gt; changing files and doesn't need a watcher to know it did; it made the change. Excluding &lt;code&gt;node_modules&lt;/code&gt;, build output, and caches removes a continuous CPU/memory drain and, crucially, removes event-queue lag that can delay saves.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="c1"&gt;// settings.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"files.watcherExclude"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"**/node_modules/**"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"**/.svelte-kit/**"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"**/build/**"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"**/dist/**"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"**/.git/objects/**"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The dirty-buffer conflict.&lt;/strong&gt; Without auto-save, a human's unsaved editor buffer and the on-disk file diverge the moment an agent edits that file directly. You get a "save / revert / overwrite" conflict prompt — and the agent, seeing stale or conflicting content, second-guesses itself and wastes turns. Aggressive auto-save keeps disk and buffer in sync so the conflict never arises:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="nl"&gt;"files.autoSave"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"afterDelay"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"files.autoSaveDelay"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a pure agent-era failure mode. It literally cannot happen in a human-only workflow where the same person owns both the edit and the save.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to audit your own setup
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Profile, don't guess.&lt;/strong&gt; &lt;code&gt;zmodload zsh/zprof&lt;/code&gt; at the top, &lt;code&gt;zprof&lt;/code&gt; at the bottom, open a shell, read the table. The offenders are almost never what you'd assume.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time it honestly:&lt;/strong&gt; &lt;code&gt;for i in 1 2 3; do /usr/bin/time -p zsh -i -c exit; done&lt;/code&gt;. A lean config is well under 0.3s; over a second means you have a problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delete by impact.&lt;/strong&gt; Start with the biggest line in the profile. Re-measure after each removal so you know what actually moved the needle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove the dependents, not just the framework.&lt;/strong&gt; Anything hooked into what you deleted will break loudly or silently. Decide whether you need the primitive or the dependent at all.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it non-blocking.&lt;/strong&gt; Pager → &lt;code&gt;cat&lt;/code&gt;, editor → no-op, disable history expansion, don't error on unmatched globs. Correctness for a non-interactive caller, not just speed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extend to the IDE.&lt;/strong&gt; Exclude heavy directories from the watcher and search; turn on auto-save so agent edits never collide with stale buffers.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;oh-my-zsh, fancy prompts, and completion frameworks weren't bad tools. They were &lt;em&gt;right for their era&lt;/em&gt; — when a human sat in front of one long-lived terminal and the startup cost amortized to nothing. That era is ending. When an AI agent spawns hundreds of short-lived shells across parallel threads, the math inverts completely: a one-time convenience becomes a per-command tax multiplied by your concurrency.&lt;/p&gt;

&lt;p&gt;The fix isn't anti-tooling. It's recognizing who your tools are actually serving now. My terminal got 7.5x faster not because of some clever trick, but because I stopped optimizing it for a user who's barely there anymore and started optimizing it for the one doing most of the work.&lt;/p&gt;

&lt;p&gt;If a program is typing your commands, build the environment that program needs — fast, non-blocking, and literal — and keep the human-comfort layer for the rare moments you're actually in the driver's seat.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you stripped your shell for an agent workflow? What was your biggest offender in &lt;code&gt;zprof&lt;/code&gt;? Drop your before/after numbers in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>terminal</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>From Node.js to Go: Supercharging S3 Downloads of Thousands of Files as a Single Zip</title>
      <dc:creator>hitesh</dc:creator>
      <pubDate>Wed, 21 Aug 2024 03:09:09 +0000</pubDate>
      <link>https://dev.to/hiteshsisara/from-nodejs-to-go-supercharging-s3-downloads-of-thousands-of-files-as-a-single-zip-474b</link>
      <guid>https://dev.to/hiteshsisara/from-nodejs-to-go-supercharging-s3-downloads-of-thousands-of-files-as-a-single-zip-474b</guid>
      <description>&lt;p&gt;As developers, we often face challenges when dealing with large-scale data processing and delivery. At Kamero, we recently tackled a significant bottleneck in our file delivery pipeline. Our application allows users to download thousands of files associated with a particular event as a single zip file. This feature, powered by a Node.js-based Lambda function responsible for fetching and zipping files from S3 buckets, was struggling with memory constraints and long execution times as our user base grew.&lt;/p&gt;

&lt;p&gt;This post details our journey from a resource-hungry Node.js implementation to a lean and lightning-fast Go solution that efficiently handles massive S3 downloads. We'll explore how we optimized our system to provide users with a seamless experience when requesting large numbers of files from specific events, all packaged into a convenient single zip download.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;Our original Lambda function faced several critical issues when processing large event-based file sets:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory Consumption&lt;/strong&gt;: Even with 10GB of allocated memory, the function would fail when processing 20,000+ files for larger events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution Time&lt;/strong&gt;: Zip operations for events with numerous files were taking too long, sometimes timing out before completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: The function couldn't handle the increasing load efficiently, limiting our ability to serve users with large file sets from popular events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Experience&lt;/strong&gt;: Slow download preparation times were impacting user satisfaction, especially for events with substantial file counts.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Node.js Implementation: A Quick Look
&lt;/h2&gt;

&lt;p&gt;Our original implementation used the &lt;code&gt;s3-zip&lt;/code&gt; library to create zip files from S3 objects. Here's a simplified snippet of how we were processing files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;s3Zip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;s3-zip&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// ... other code ...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;s3Zip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;archive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bucketName&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;eventId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;entryData&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;uploadZipFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Upload_Bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;zipfileKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this approach worked, it loaded all files into memory before creating the zip, leading to high memory usage and potential out-of-memory errors for large file sets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter Go: A Game-Changing Rewrite
&lt;/h2&gt;

&lt;p&gt;We decided to rewrite our Lambda function in Go, leveraging its efficiency and built-in concurrency features. The results were astounding:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory Usage&lt;/strong&gt;: Dropped from 10GB to a mere 100MB for the same workload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt;: The function became approximately 10 times faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability&lt;/strong&gt;: Successfully processes 20,000+ files without issues.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Key Optimizations in the Go Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Efficient S3 Operations
&lt;/h3&gt;

&lt;p&gt;We used the AWS SDK for Go v2, which offers better performance and lower memory usage compared to v1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LoadDefaultConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TODO&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;s3Client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewFromConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Concurrent Processing
&lt;/h3&gt;

&lt;p&gt;Go's goroutines allowed us to process multiple files concurrently:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt; &lt;span class="n"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WaitGroup&lt;/span&gt;
&lt;span class="n"&gt;sem&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="nb"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;chan&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// Limit concurrent operations&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;photo&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;photos&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;photo&lt;/span&gt; &lt;span class="n"&gt;Photo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;sem&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;{}{}&lt;/span&gt; &lt;span class="c"&gt;// Acquire semaphore&lt;/span&gt;
        &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="n"&gt;sem&lt;/span&gt; &lt;span class="p"&gt;}()&lt;/span&gt; &lt;span class="c"&gt;// Release semaphore&lt;/span&gt;

        &lt;span class="c"&gt;// Process photo&lt;/span&gt;
    &lt;span class="p"&gt;}(&lt;/span&gt;&lt;span class="n"&gt;photo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;wg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach allows us to process multiple files simultaneously while controlling the level of concurrency to prevent overwhelming the system.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Streaming Zip Creation
&lt;/h3&gt;

&lt;p&gt;Instead of loading all files into memory, we stream the zip content directly to S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;pipeReader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pipeWriter&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Pipe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;zipWriter&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;zip&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewWriter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipeWriter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c"&gt;// Add files to zip&lt;/span&gt;
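    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;zipWriter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;pipeWriter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CloseWithError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// Pass the zip error to the reading side of the pipe&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;pipeWriter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;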
&lt;span class="p"&gt;}()&lt;/span&gt;

&lt;span class="c"&gt;// Upload streaming content to S3&lt;/span&gt;
&lt;span class="n"&gt;uploader&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PutObjectInput&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;destBucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;zipFileKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;pipeReader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



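&lt;p&gt;Because the zip bytes flow through the pipe to S3 as they are produced, the full archive is never held in memory. This streaming approach significantly reduces memory usage and allows us to handle much larger file sets.&lt;/p&gt;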
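
&lt;p&gt;To make the pattern concrete without an AWS account, here is a minimal, self-contained sketch. It is not our production code: &lt;code&gt;streamZip&lt;/code&gt; and &lt;code&gt;countEntries&lt;/code&gt; are hypothetical helpers, the in-memory map stands in for objects read from S3, and draining the reader into a buffer stands in for the upload.&lt;/p&gt;

```go
package main

import (
	"archive/zip"
	"bytes"
	"io"
)

// streamZip builds a zip archive from in-memory entries and returns a reader
// that yields archive bytes as they are produced. Hypothetical helper: the
// map of name to content stands in for objects fetched from S3.
func streamZip(files map[string]string) io.Reader {
	pipeReader, pipeWriter := io.Pipe()

	go func() {
		zipWriter := zip.NewWriter(pipeWriter)
		for name, content := range files {
			entry, err := zipWriter.Create(name)
			if err != nil {
				pipeWriter.CloseWithError(err)
				return
			}
			if _, err := io.WriteString(entry, content); err != nil {
				pipeWriter.CloseWithError(err)
				return
			}
		}
		// Close flushes the central directory; a nil error ends the stream with EOF.
		pipeWriter.CloseWithError(zipWriter.Close())
	}()

	return pipeReader
}

// countEntries drains the stream (standing in for the S3 upload) and
// reports how many entries the finished archive contains.
func countEntries(r io.Reader) (int, error) {
	buf := new(bytes.Buffer)
	if _, err := io.Copy(buf, r); err != nil {
		return 0, err
	}
	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return 0, err
	}
	return len(zr.File), nil
}
```

&lt;p&gt;Calling &lt;code&gt;countEntries(streamZip(files))&lt;/code&gt; exercises both ends of the pipe. In the real function the reader is handed to the uploader instead of a buffer, so only the chunk currently in flight needs to live in memory.&lt;/p&gt;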

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before: Node.js Implementation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk6wl1h2xiwlvfcfs7n4d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk6wl1h2xiwlvfcfs7n4d.png" alt="CloudWatch Logs for Node.js Lambda Function" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  After: Go Implementation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbyp2iza3ejkos6gjiq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffbyp2iza3ejkos6gjiq4.png" alt="CloudWatch Logs for Go Lambda Function" width="800" height="196"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The rewrite to Go delivered impressive improvements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory Usage&lt;/strong&gt;: Reduced by 99% (from 10 GB to around 100 MB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing Speed&lt;/strong&gt;: Increased by approximately 1000%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability&lt;/strong&gt;: Successfully handles 20,000+ files without issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Efficiency&lt;/strong&gt;: Lower memory usage and faster execution time result in reduced AWS Lambda costs&lt;/li&gt;
&lt;/ol&gt;
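
&lt;p&gt;The cost point follows from how Lambda compute is metered: in GB-seconds, meaning configured memory multiplied by run time. The sketch below only illustrates that arithmetic. &lt;code&gt;gbSeconds&lt;/code&gt; and &lt;code&gt;savingsRatio&lt;/code&gt; are hypothetical helpers, and per-request fees are ignored.&lt;/p&gt;

```go
package main

// gbSeconds returns the metered compute for one invocation, given the
// configured memory in MB and the run time in seconds.
func gbSeconds(memoryMB, seconds float64) float64 {
	return memoryMB / 1024 * seconds
}

// savingsRatio reports how many times less compute the "after"
// configuration consumes compared with the "before" configuration.
func savingsRatio(beforeMB, beforeSec, afterMB, afterSec float64) float64 {
	return gbSeconds(beforeMB, beforeSec) / gbSeconds(afterMB, afterSec)
}
```

&lt;p&gt;With made-up inputs (not our measured figures), &lt;code&gt;savingsRatio(10240, 100, 1024, 10)&lt;/code&gt; evaluates to 100. Note that the memory term is the configured size, so a lower peak usage only saves money once the function's memory setting is reduced to match.&lt;/p&gt;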

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Language Choice Matters&lt;/strong&gt;: Go's efficiency and concurrency model made a massive difference in our use case.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Understand Your Bottlenecks&lt;/strong&gt;: Profiling our Node.js function helped us identify key areas for improvement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverage Cloud-Native Solutions&lt;/strong&gt;: Using the AWS SDK for Go v2 and understanding S3's capabilities allowed for better integration and performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Think in Streams&lt;/strong&gt;: Processing data as streams rather than loading everything into memory is crucial for large-scale operations.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Rewriting our Lambda function in Go not only solved our immediate scaling issues but also provided a more robust and efficient solution for our file processing needs. While Node.js served us well initially, this experience highlighted the importance of choosing the right tool for the job, especially when dealing with resource-intensive tasks at scale.&lt;/p&gt;

&lt;p&gt;Remember, the best language or framework depends on your specific use case. In our scenario, Go's performance characteristics aligned perfectly with our needs, resulting in a significantly improved user experience and reduced operational costs.&lt;/p&gt;

&lt;p&gt;Have you faced similar challenges with serverless functions? How did you overcome them? We'd love to hear about your experiences in the comments below!&lt;/p&gt;

</description>
      <category>lambda</category>
      <category>go</category>
      <category>s3</category>
      <category>node</category>
    </item>
  </channel>
</rss>
