<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pavel Zeman</title>
    <description>The latest articles on DEV Community by Pavel Zeman (@pavel-zeman).</description>
    <link>https://dev.to/pavel-zeman</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3068452%2F89baef6c-256d-4eb1-8ab9-7b1831fcb9ad.png</url>
      <title>DEV Community: Pavel Zeman</title>
      <link>https://dev.to/pavel-zeman</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pavel-zeman"/>
    <language>en</language>
    <item>
      <title>Analyzing Storage Consumption in Sonatype Nexus npm Repositories</title>
      <dc:creator>Pavel Zeman</dc:creator>
      <pubDate>Tue, 20 May 2025 10:34:52 +0000</pubDate>
      <link>https://dev.to/pavel-zeman/analyzing-storage-consumption-in-sonatype-nexus-npm-repositories-10i3</link>
      <guid>https://dev.to/pavel-zeman/analyzing-storage-consumption-in-sonatype-nexus-npm-repositories-10i3</guid>
      <description>&lt;p&gt;&lt;em&gt;I've been using &lt;a href="https://www.sonatype.com/products/nexus-community-edition-download" rel="noopener noreferrer"&gt;Sonatype Nexus Repository Community Edition&lt;/a&gt; (or just Nexus) as an npm repository for some time. While using it, I observed that it requires quite a lot of storage, so I decided to analyze the storage usage. This article presents the results.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;For this article, I've downloaded the latest version of Nexus and analyzed its storage requirements with the following configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version: 3.79.1-04&lt;/li&gt;
&lt;li&gt;Database: Embedded H2&lt;/li&gt;
&lt;li&gt;Blob store type: File&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  npm repository types
&lt;/h2&gt;

&lt;p&gt;For npm, Nexus provides the following &lt;a href="https://help.sonatype.com/en/npm-registry.html" rel="noopener noreferrer"&gt;repository types&lt;/a&gt;, each serving a distinct purpose and having specific properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hosted&lt;/li&gt;
&lt;li&gt;Proxy&lt;/li&gt;
&lt;li&gt;Group&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdrive.usercontent.google.com%2Fdownload%3Fid%3D1jiQcvP__yFdEkGHCzpY9ZBhsp8inReiK" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdrive.usercontent.google.com%2Fdownload%3Fid%3D1jiQcvP__yFdEkGHCzpY9ZBhsp8inReiK" alt="Nexus architecture" width="1582" height="784"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Hosted repository
&lt;/h3&gt;

&lt;p&gt;A hosted repository is designed to store npm packages that are published internally by an organization. This type of repository provides a private space for proprietary or custom packages, allowing users to upload, manage, and share their own npm packages securely within the organization.&lt;/p&gt;

&lt;p&gt;Hosted repositories are writable, meaning users with appropriate permissions can publish new packages or update existing ones directly to Nexus. These repositories do not automatically fetch or cache external npm packages; their content is limited to what has been explicitly published to them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Proxy repository
&lt;/h3&gt;

&lt;p&gt;A proxy repository, on the other hand, acts as an intermediary between npm clients and a remote registry such as &lt;a href="https://registry.npmjs.org" rel="noopener noreferrer"&gt;https://registry.npmjs.org&lt;/a&gt;. When a package is requested from a proxy repository, Nexus fetches it from the remote registry if it is not already cached locally. Once retrieved, the package is stored in the proxy repository, making subsequent requests for the same package faster and reducing external bandwidth usage.&lt;/p&gt;

&lt;p&gt;Proxy repositories are read-only from the perspective of users, i.e. they cannot be directly published to. Instead, they serve as a transparent cache, improving reliability and performance for external dependencies, and ensuring that previously downloaded packages remain available even if the remote registry is temporarily unreachable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Group repository
&lt;/h3&gt;

&lt;p&gt;A group repository aggregates multiple repositories of any type. Typically, it combines one or more hosted and proxy repositories into a single unified endpoint. The group repository presents a consolidated view to npm clients, so users only need to configure a single registry URL in their npm settings. When a package is requested from a group repository, Nexus searches its member repositories in a defined order and serves the first match it finds. This configuration simplifies development and CI/CD workflows by allowing both internal and external npm packages to be accessed seamlessly through one URL.&lt;/p&gt;

&lt;p&gt;In Nexus Community Edition, group repositories are read-only, i.e. they cannot be directly published to. All the packages must be published to a hosted repository, which can then be included in a group repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage principles
&lt;/h2&gt;

&lt;p&gt;Nexus stores all its data in two places: a relational database and a blob store.&lt;/p&gt;

&lt;p&gt;The relational database (H2 for this article) stores metadata about so-called assets. In Nexus terminology, an asset in an npm repository is either a package root or a tarball. A package root is a single JSON document containing information about the package itself, i.e. its name, versions, license, etc. (see, for example, the package root of the &lt;a href="https://registry.npmjs.org/react" rel="noopener noreferrer"&gt;react&lt;/a&gt; package). For each version, the package root also contains a link to its tarball, i.e. a compressed file containing the code of that specific package version. As an example, you can download the tarball for &lt;a href="https://registry.npmjs.org/react/-/react-19.0.0.tgz" rel="noopener noreferrer"&gt;react version 19.0.0&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The blob store contains the actual asset data, i.e. a single JSON file for a package root and a compressed &lt;code&gt;tgz&lt;/code&gt; file for a tarball. The package root JSON file is not compressed and can be quite large, because it contains metadata for all package versions. For example, at the time of writing, the package root of the &lt;a href="https://registry.npmjs.org/react" rel="noopener noreferrer"&gt;react&lt;/a&gt; package is about 5.7 MB. The blob store content can be stored either directly in a filesystem or in cloud storage (Amazon S3 or Google Cloud Storage). This article focuses on filesystem storage.&lt;/p&gt;
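
&lt;p&gt;To make the relationship between a package root and its tarballs concrete, here is a minimal JavaScript sketch. It assumes the registry metadata layout in which each entry under &lt;code&gt;versions&lt;/code&gt; carries a &lt;code&gt;dist.tarball&lt;/code&gt; URL. The sample object is a hand-trimmed, hypothetical stand-in for a real package root, and &lt;code&gt;listTarballs&lt;/code&gt; is a name introduced only for this illustration.&lt;/p&gt;

```javascript
// Hypothetical, heavily trimmed package root; a real one has many more fields.
const samplePackageRoot = {
  name: "react",
  versions: {
    "19.0.0": { dist: { tarball: "https://registry.npmjs.org/react/-/react-19.0.0.tgz" } },
    "19.1.0": { dist: { tarball: "https://registry.npmjs.org/react/-/react-19.1.0.tgz" } },
  },
};

// Returns one { version, tarball } record per version listed in the package root.
function listTarballs(packageRoot) {
  return Object.entries(packageRoot.versions).map(function ([version, meta]) {
    return { version, tarball: meta.dist.tarball };
  });
}

const tarballs = listTarballs(samplePackageRoot);
console.log(tarballs);
```

&lt;p&gt;Each record corresponds to one tarball asset, while the whole JSON document corresponds to the single package root asset.&lt;/p&gt;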

&lt;p&gt;As the relational database stores only asset metadata, its size is expected to be much smaller than the size of the blob store. As a result, when calculating the Nexus storage requirements, it should be sufficient to consider only the blob store. For example, my proxy repository with about 2,500 assets requires 4 MB of storage in the relational database and more than 400 MB of storage in the blob store, i.e. the blob store is more than 100 times larger than the relational database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage analysis
&lt;/h2&gt;

&lt;p&gt;The amount of data stored by Nexus differs between repository types.&lt;/p&gt;

&lt;p&gt;For hosted repositories, the following rules apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nexus creates the package root when the first version of a package is published.&lt;/li&gt;
&lt;li&gt;Nexus updates the package root every time a version of the package is published or removed.&lt;/li&gt;
&lt;li&gt;Nexus creates or removes a tarball when a package version is published or removed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The total size of the assets for a single package is then the size of the package root plus the sum of the sizes of the tarballs for all its versions. This size can be reduced by removing specific versions or the package itself. Another option is to define a &lt;a href="https://help.sonatype.com/repomanager3/cleanup-policies" rel="noopener noreferrer"&gt;cleanup policy&lt;/a&gt;, which automatically removes package versions (i.e. tarballs) based on defined criteria. However, I have found no way to automatically remove the packages themselves.&lt;/p&gt;
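
&lt;p&gt;The calculation can be sketched in a few lines of JavaScript. The byte counts below are made-up example values, and &lt;code&gt;totalPackageBytes&lt;/code&gt; is a hypothetical helper, not part of Nexus.&lt;/p&gt;

```javascript
// Hypothetical sizes in bytes; real values would come from Nexus itself.
const examplePackage = {
  packageRootBytes: 12000,
  tarballBytes: [45000, 47000, 52000], // one entry per published version
};

// Total storage of one package: the package root plus all of its tarballs.
function totalPackageBytes(pkg) {
  const tarballTotal = pkg.tarballBytes.reduce(function (sum, size) {
    return sum + size;
  }, 0);
  return pkg.packageRootBytes + tarballTotal;
}

const total = totalPackageBytes(examplePackage);
console.log(total); // 156000
```

&lt;p&gt;Removing a version shrinks the tarball list, but the package root stays in place as long as the package exists.&lt;/p&gt;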

&lt;p&gt;For proxy repositories, the following rules apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nexus creates the package root when package metadata or a package version is requested.&lt;/li&gt;
&lt;li&gt;Nexus creates a tarball when a package version is requested.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Similarly to hosted repositories, cleanup policies can be defined to automatically remove package versions.&lt;/p&gt;

&lt;p&gt;Group repositories are virtual by nature: they just group packages stored somewhere else. However, even group repositories require storage. Nexus does not directly reuse package roots from the repositories included in the group, but creates a local copy of the package root instead. The tarballs, on the other hand, are reused, i.e. they are not stored in the group repository. As with proxy repositories, the package root is created when package metadata or a package version is requested for the first time. Unfortunately, I have found no way to remove obsolete package roots other than completely recreating the group repository. I have also found no way to check which package roots are physically stored in the group repository other than querying the metadata in the relational database.&lt;/p&gt;

&lt;p&gt;A summary of the storage analysis is provided in the following table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Repository type&lt;/th&gt;
&lt;th&gt;Package root&lt;/th&gt;
&lt;th&gt;Tarball&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hosted&lt;/td&gt;
&lt;td&gt;Created when first package version is published&lt;/td&gt;
&lt;td&gt;Created/removed when package version is published/removed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proxy&lt;/td&gt;
&lt;td&gt;Created when package metadata or first package version is requested&lt;/td&gt;
&lt;td&gt;Created when package version is requested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Group&lt;/td&gt;
&lt;td&gt;Created when package metadata or first package version is requested&lt;/td&gt;
&lt;td&gt;Reused from other repositories included in the group, i.e. no extra storage required&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To analyze the storage of a specific repository, we can use the Nexus GUI, which provides only limited information, or query the relational database to get all the details. For example, the following SQL query provides a summary of all assets and their sizes grouped by repository and asset kind (package root or tarball).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ab&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;blob_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
    &lt;span class="n"&gt;npm_asset&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt;
    &lt;span class="n"&gt;npm_asset_blob&lt;/span&gt; &lt;span class="n"&gt;ab&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asset_blob_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ab&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asset_blob_id&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt;
    &lt;span class="n"&gt;npm_content_repository&lt;/span&gt; &lt;span class="n"&gt;cr&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;cr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repository_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;repository_id&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt;
    &lt;span class="n"&gt;repository&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;config_repository_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
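
&lt;p&gt;As a rough illustration of how the query result could be post-processed, the following JavaScript sketch sums the bytes per repository. The rows are invented sample data with the same columns as the query (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt;, &lt;code&gt;bytes&lt;/code&gt;); the repository names and the &lt;code&gt;kind&lt;/code&gt; labels are placeholders and may differ from what Nexus actually stores.&lt;/p&gt;

```javascript
// Hypothetical rows shaped like the query result: repository name, asset kind, count, bytes.
// The kind labels are placeholders, not necessarily the values Nexus stores.
const rows = [
  { name: "npm-group", kind: "package-root", c: 120, bytes: 30000000 },
  { name: "npm-proxy", kind: "package-root", c: 120, bytes: 30000000 },
  { name: "npm-proxy", kind: "tarball", c: 2400, bytes: 390000000 },
];

// Sums the bytes of all asset kinds per repository.
function bytesPerRepository(resultRows) {
  const totals = {};
  for (const row of resultRows) {
    totals[row.name] = (totals[row.name] || 0) + row.bytes;
  }
  return totals;
}

const totals = bytesPerRepository(rows);
for (const [name, bytes] of Object.entries(totals)) {
  console.log(name + ": " + (bytes / 1e6).toFixed(1) + " MB");
}
```

&lt;p&gt;In this made-up data set, the group repository stores only package roots, which matches the behavior summarized in the table above.&lt;/p&gt;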



&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Nexus supports 3 types of npm repositories (hosted, proxy and group), each serving a distinct purpose and behaving differently in terms of storage.&lt;/li&gt;
&lt;li&gt;Group repositories are virtual by nature, but they still require storage for package roots.&lt;/li&gt;
&lt;li&gt;Blob store size is typically the dominant factor in overall storage requirements.&lt;/li&gt;
&lt;li&gt;Cleanup policies can be used to automatically remove package versions, but not packages themselves.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>nexus</category>
      <category>sonatype</category>
      <category>npm</category>
      <category>javascript</category>
    </item>
    <item>
      <title>The Pitfalls of Streamed ZIP Decompression: An In-Depth Analysis</title>
      <dc:creator>Pavel Zeman</dc:creator>
      <pubDate>Mon, 12 May 2025 22:07:06 +0000</pubDate>
      <link>https://dev.to/pavel-zeman/the-pitfalls-of-streamed-zip-decompression-an-in-depth-analysis-3l99</link>
      <guid>https://dev.to/pavel-zeman/the-pitfalls-of-streamed-zip-decompression-an-in-depth-analysis-3l99</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href="https://en.wikipedia.org/wiki/ZIP_(file_format)" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt; says it clearly: "Tools that correctly read ZIP archives ... must not scan for entries from the top of the ZIP file". In other words, ZIP archives cannot be reliably decompressed as a stream. Still, some libraries (e.g. &lt;a href="https://www.npmjs.com/package/unzipper" rel="noopener noreferrer"&gt;unzipper&lt;/a&gt;) support it. Is it safe to use them? Do they work? Let's analyze it.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;As motivation and a basis for further discussion, let's create a simple ZIP archive using the following script (you can clone it from my &lt;a href="https://github.com/pavel-zeman/unzip-stream" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;JSZip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;jszip&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;zipItem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;JSZip&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromCharCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;zipItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`dummy-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.txt`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;zipItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateAsync&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;nodebuffer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;streamFiles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;zip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;JSZip&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`invalid-item-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.zip`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateNodeStream&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;nodebuffer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;streamFiles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createWriteStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;invalid.zip&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script creates a ZIP archive &lt;code&gt;invalid.zip&lt;/code&gt;, which contains nested ZIP archives in the following structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;invalid-item-0.zip&lt;/code&gt; - 664 bytes

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dummy-0.txt&lt;/code&gt; - 100 bytes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dummy-1.txt&lt;/code&gt; - 100 bytes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dummy-2.txt&lt;/code&gt; - 100 bytes&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;invalid-item-1.zip&lt;/code&gt; - 664 bytes

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;dummy-0.txt&lt;/code&gt; - 100 bytes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dummy-1.txt&lt;/code&gt; - 100 bytes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dummy-2.txt&lt;/code&gt; - 100 bytes&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;We can verify that the ZIP archive is valid using &lt;code&gt;unzip&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;user@localhost:~&lt;span class="nv"&gt;$ &lt;/span&gt;unzip &lt;span class="nt"&gt;-tl&lt;/span&gt; invalid.zip
Archive:  invalid.zip
    testing: invalid-item-0.zip       OK
    testing: invalid-item-1.zip       OK
No errors detected &lt;span class="k"&gt;in &lt;/span&gt;compressed data of invalid.zip.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's try to decompress the ZIP archive as a stream using the &lt;a href="https://www.npmjs.com/package/unzipper" rel="noopener noreferrer"&gt;unzipper&lt;/a&gt; library and list all files inside it together with their sizes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;unzipper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unzipper&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createReadStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;invalid.zip&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;unzipper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Parse&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;entry&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;end&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`File: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, size: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; bytes`&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;finish&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Finished processing all entries&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Error during processing:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;File: invalid-item-0.zip, size: 141 bytes
File: dummy-1.txt, size: 100 bytes
File: dummy-2.txt, size: 100 bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We expect the output to contain just 2 files: &lt;code&gt;invalid-item-0.zip&lt;/code&gt; and &lt;code&gt;invalid-item-1.zip&lt;/code&gt;, each 664 bytes in size. Instead, we get 3 files. The first one has an invalid size and the others are read from inside &lt;code&gt;invalid-item-0.zip&lt;/code&gt;, which makes no sense. Additionally, the &lt;code&gt;finish&lt;/code&gt; event is never emitted, so its log message is missing.&lt;/p&gt;

&lt;p&gt;If you try to modify the archive content, you may get other results including various errors. And if you search for &lt;a href="https://github.com/ZJONSSON/node-unzipper/issues" rel="noopener noreferrer"&gt;issues&lt;/a&gt; of the &lt;a href="https://github.com/ZJONSSON/node-unzipper" rel="noopener noreferrer"&gt;unzipper&lt;/a&gt; library, you can find about 10 of them mentioning similar problems.&lt;/p&gt;

&lt;p&gt;All of these have the same root cause: zipped data cannot be reliably decompressed as a stream. The only way to reliably decompress zipped data is to save it to a file and then decompress the file.&lt;/p&gt;
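
&lt;p&gt;One plausible reading of the 141-byte result above can be sketched as follows. The sketch assumes that the nested archive is stored without compression and that a streaming parser, not knowing the entry size up front, scans forward for the first data descriptor signature (&lt;code&gt;PK\x07\x08&lt;/code&gt;). Whether unzipper works exactly this way is an assumption here, and the buffer is a simplified stand-in rather than a real archive.&lt;/p&gt;

```javascript
// Data descriptor signature "PK\x07\x08" as defined by the ZIP format.
const DATA_DESCRIPTOR_SIGNATURE = Buffer.from([0x50, 0x4b, 0x07, 0x08]);

// Simplified stand-in for the first bytes of a nested archive stored as-is
// inside the outer archive (assumption: no compression is applied).
const nestedStart = Buffer.concat([
  Buffer.alloc(30),            // fixed-size part of a local file header (zero-filled placeholder)
  Buffer.from("dummy-0.txt"),  // 11-byte entry name
  Buffer.alloc(100, 0x61),     // 100 bytes of entry content (placeholder bytes)
  DATA_DESCRIPTOR_SIGNATURE,   // data descriptor that belongs to the nested entry
]);

// Scanning forward for the signature stops at the nested entry's descriptor,
// long before the real end of the outer entry.
const firstMatch = nestedStart.indexOf(DATA_DESCRIPTOR_SIGNATURE);
console.log(firstMatch); // 141
```

&lt;p&gt;Under these assumptions, 30 + 11 + 100 = 141 bytes would be treated as the complete content of &lt;code&gt;invalid-item-0.zip&lt;/code&gt;, and everything after that point would be parsed as if it belonged to the outer archive.&lt;/p&gt;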

&lt;p&gt;We can use the same library and decompress the same ZIP archive as a file as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;unzipper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unzipper&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;directory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;unzipper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;invalid.zip&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;finish&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`File: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, size: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; bytes`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Finished processing all entries&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;})();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we get the expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;File: invalid-item-0.zip, size: 664 bytes
File: invalid-item-1.zip, size: 664 bytes
Finished processing all entries
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  ZIP archive structure
&lt;/h2&gt;

&lt;p&gt;To understand the problem, let's analyze the ZIP archive structure. An overview is available on &lt;a href="https://en.wikipedia.org/wiki/ZIP_(file_format)" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt; and all the details are available in the &lt;a href="https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.9.TXT" rel="noopener noreferrer"&gt;PKWARE Inc. ZIP File Format Specification&lt;/a&gt;. This text contains only a brief summary needed to understand the presented problem.&lt;/p&gt;

&lt;p&gt;The ZIP archive (usually) starts with a series of entries, each of them representing a single stored file. Each entry consists of the following items:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Local file header&lt;/em&gt; - Contains a signature (4-byte constant), the file name, the compressed and uncompressed sizes, and other metadata. The sizes are optional and can be set to 0 when they are not known in advance, which is the case when the archive is created as a stream.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Actual compressed data&lt;/em&gt; - Byte stream containing the compressed data.&lt;/li&gt;
&lt;li&gt;Optional &lt;em&gt;Data descriptor&lt;/em&gt; - Contains a signature (4-byte constant), the CRC-32 checksum of the uncompressed data, and the compressed and uncompressed sizes. It is present only when the archive is created as a stream. In that case, the sizes are always filled in, because they are already known once the data has been written.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the end of the archive, there is a central directory. It consists of the following items: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Central directory file header&lt;/em&gt; - An extended version of the &lt;em&gt;Local file header&lt;/em&gt;, present once for each stored file. It always contains the real compressed and uncompressed sizes, and it also contains the offset of the file's &lt;em&gt;Local file header&lt;/em&gt; inside the archive.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;End of central directory record&lt;/em&gt; - Data structure that must always be present at the very end of the archive. Among other things, it contains the offset of the start of the central directory.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;em&gt;End of central directory record&lt;/em&gt; is the only data structure that has a guaranteed position in the archive - it must be at the very end. The positions of all other data structures are not guaranteed, although the specification mentions that they &lt;strong&gt;should&lt;/strong&gt; be in the order given in this description with no gaps between them.&lt;/p&gt;

&lt;p&gt;The whole structure is summarized in the following diagram. Again, please note that archives &lt;strong&gt;should&lt;/strong&gt; be created with the data structures in the order shown in the diagram, but this is not guaranteed. For example, an archive with gaps between entries is perfectly valid and must be decompressed correctly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdrive.usercontent.google.com%2Fdownload%3Fid%3D1ehtGiMpHHZvo7umy9eJKOFf1_KeWfrJ5" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdrive.usercontent.google.com%2Fdownload%3Fid%3D1ehtGiMpHHZvo7umy9eJKOFf1_KeWfrJ5" alt="ZIP file structure" width="1088" height="1090"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ZIP archive decompression from a file
&lt;/h2&gt;

&lt;p&gt;To decompress a ZIP archive, we just need to follow the arrows in the previous diagram. You may notice that all the arrows go bottom-up, which means that the archive needs to be read from the end to the beginning as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We read the &lt;em&gt;End of central directory record&lt;/em&gt;. It is always located at the very end of the archive. Among other things, it contains the offset of the start of the central directory.&lt;/li&gt;
&lt;li&gt;We scan the central directory to get the list of all files from the &lt;em&gt;Central directory file headers&lt;/em&gt;. For each file, we get its name, its offset in the archive, and its compressed and uncompressed sizes.&lt;/li&gt;
&lt;li&gt;For each file, we jump to its offset, read its compressed data and decompress it using the compression method recorded in the &lt;em&gt;Central directory file header&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;
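
&lt;p&gt;As an illustration of the first step, here is a minimal sketch that locates the &lt;em&gt;End of central directory record&lt;/em&gt; in an archive that is already completely available in a &lt;code&gt;Buffer&lt;/code&gt;. The function name &lt;code&gt;findEndOfCentralDirectory&lt;/code&gt; and the names of the returned fields are made up for this example. The byte offsets follow the ZIP specification, and ZIP64 archives are not handled.&lt;/p&gt;

```javascript
// Sketch only: locate the End of central directory record in an in-memory archive.
// Field offsets follow the ZIP specification; ZIP64 archives are not handled.
const EOCD_SIGNATURE = 0x06054b50;
const EOCD_MIN_SIZE = 22; // fixed part of the record, without the trailing comment

function findEndOfCentralDirectory(archive) {
  // The record may be followed by an archive comment, so we scan backwards
  // from the last position where the record could start.
  for (let pos = archive.length - EOCD_MIN_SIZE; pos >= 0; pos--) {
    if (archive.readUInt32LE(pos) === EOCD_SIGNATURE) {
      return {
        entryCount: archive.readUInt16LE(pos + 10),
        centralDirectorySize: archive.readUInt32LE(pos + 12),
        centralDirectoryOffset: archive.readUInt32LE(pos + 16),
      };
    }
  }
  throw new Error("End of central directory record not found");
}

// Usage: the smallest valid archive is an empty one, i.e. just the 22-byte record.
const emptyArchive = Buffer.alloc(EOCD_MIN_SIZE);
emptyArchive.writeUInt32LE(EOCD_SIGNATURE, 0);
console.log(findEndOfCentralDirectory(emptyArchive));
```

&lt;p&gt;Note that the search starts at the end of the buffer, which is exactly what a forward-only stream cannot offer.&lt;/p&gt;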

&lt;p&gt;To summarize, the only reliable method to read a ZIP archive is to read it from the end to the beginning. This makes it impossible to reliably decompress the zipped content as a stream, because a stream can only be read from the beginning to the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  ZIP archive decompression from a stream
&lt;/h2&gt;

&lt;p&gt;But wait. The &lt;a href="https://www.npmjs.com/package/unzipper" rel="noopener noreferrer"&gt;unzipper&lt;/a&gt; library is actually able to decompress a ZIP archive from a stream. How is it implemented?&lt;/p&gt;

&lt;p&gt;The library is based on a simple assumption: The ZIP entries start at the beginning of the archive and follow one by one with no gaps between them. This is quite a safe assumption, since the ZIP file format specification states that all tools &lt;strong&gt;should&lt;/strong&gt; create ZIP archives exactly this way.&lt;/p&gt;

&lt;p&gt;With this assumption in mind, we can design an alternative algorithm to decompress a ZIP archive from a stream:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We read the &lt;em&gt;Local file header&lt;/em&gt; of a ZIP entry (the first one is at the beginning of the stream, the other ones follow without gaps). It contains the file name as well as its compressed and uncompressed sizes.&lt;/li&gt;
&lt;li&gt;Based on the compressed size from the previous step, we read the compressed data and decompress it using the compression method given in the &lt;em&gt;Local file header&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;If there is a &lt;em&gt;Data descriptor&lt;/em&gt; present, we can either skip it or use it to verify the uncompressed data checksum (CRC-32).&lt;/li&gt;
&lt;li&gt;We continue from the first step until we reach the start of the central directory. This can be easily recognized using the signature of the &lt;em&gt;Central directory file header&lt;/em&gt;, or of the &lt;em&gt;End of central directory record&lt;/em&gt; if the archive contains no files.&lt;/li&gt;
&lt;li&gt;We skip all central directory entries as well as the &lt;em&gt;End of central directory record&lt;/em&gt;. These are not needed anymore.&lt;/li&gt;
&lt;/ol&gt;
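
&lt;p&gt;To make the first step more concrete, the following sketch decodes the fixed 30-byte part of a &lt;em&gt;Local file header&lt;/em&gt; from a &lt;code&gt;Buffer&lt;/code&gt;. The function &lt;code&gt;parseLocalFileHeader&lt;/code&gt; and its field names are hypothetical and are not taken from any library. The offsets follow the ZIP specification, and the file name is assumed to be UTF-8 encoded.&lt;/p&gt;

```javascript
// Sketch only: decode the fixed 30-byte part of a Local file header.
const LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50;

function parseLocalFileHeader(buf, offset = 0) {
  if (buf.readUInt32LE(offset) !== LOCAL_FILE_HEADER_SIGNATURE) {
    throw new Error("Not a Local file header");
  }
  const flags = buf.readUInt16LE(offset + 6);
  const nameLength = buf.readUInt16LE(offset + 26);
  const extraLength = buf.readUInt16LE(offset + 28);
  return {
    // Bit 3 of the flags: the sizes are zero here and follow in a Data descriptor.
    sizesInDataDescriptor: (flags >> 3) % 2 === 1,
    compressionMethod: buf.readUInt16LE(offset + 8), // 0 = stored, 8 = DEFLATE
    compressedSize: buf.readUInt32LE(offset + 18),
    uncompressedSize: buf.readUInt32LE(offset + 22),
    // Assumption of this sketch: the name is UTF-8 encoded.
    fileName: buf.toString("utf8", offset + 30, offset + 30 + nameLength),
    // Position of the first byte of the compressed data.
    dataOffset: offset + 30 + nameLength + extraLength,
  };
}

// Usage: a hand-built header of a streamed entry named "a.txt".
const header = Buffer.alloc(30 + 5);
header.writeUInt32LE(LOCAL_FILE_HEADER_SIGNATURE, 0);
header.writeUInt16LE(0x0008, 6); // general purpose flags with bit 3 set
header.writeUInt16LE(5, 26); // file name length
header.write("a.txt", 30, "utf8");
console.log(parseLocalFileHeader(header));
```

&lt;p&gt;For this hand-built header, &lt;code&gt;sizesInDataDescriptor&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt; and &lt;code&gt;compressedSize&lt;/code&gt; is 0, so a streaming reader would not know how many bytes of data follow.&lt;/p&gt;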

&lt;p&gt;This algorithm works, but there is a catch. The sizes in the &lt;em&gt;Local file header&lt;/em&gt; are optional, and they are not set when the ZIP archive is created as a stream. As a result, we do not know the size of the compressed data. How can we solve this? Let's check the library source code. The relevant part is located in &lt;a href="https://github.com/ZJONSSON/node-unzipper/blob/master/lib/parse.js" rel="noopener noreferrer"&gt;parse.js&lt;/a&gt; at lines 181 through 187:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileSizeKnown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vars&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;uncompressedSize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;eof&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vars&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compressedSize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;eof&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;alloc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;eof&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeUInt32LE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mh"&gt;0x08074b50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see that if the compressed size is known, it is used. Otherwise, the library searches for the end of the compressed data based on a 4-byte signature (&lt;code&gt;0x08074b50&lt;/code&gt;). This is the signature of the &lt;em&gt;Data descriptor&lt;/em&gt;, which is located immediately after the compressed data. Searching for the end of the compressed data using the 4-byte signature often works, but it fails when the same 4 bytes occur in the compressed data itself. As we can hardly assume anything about the compressed data, I consider this approach to be too risky.&lt;/p&gt;
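
&lt;p&gt;The risk can be reproduced with a few lines of code. The following sketch does not use the library at all. It builds a hypothetical entry payload that happens to contain the signature bytes, appends a simplified &lt;em&gt;Data descriptor&lt;/em&gt; (the CRC-32 field is left at zero), and then searches for the signature the same way a streaming reader would.&lt;/p&gt;

```javascript
// Sketch only: why a signature search can stop too early.
// 0x08074b50 written as little-endian bytes.
const DATA_DESCRIPTOR_SIGNATURE = Buffer.from([0x50, 0x4b, 0x07, 0x08]);

// Hypothetical payload (stored without compression) that happens to contain
// the signature bytes, e.g. because it is itself a ZIP archive.
const payload = Buffer.concat([
  Buffer.from("some data "),
  DATA_DESCRIPTOR_SIGNATURE,
  Buffer.from(" more data"),
]);

// Simplified Data descriptor that follows the payload:
// signature, CRC-32 (left at zero here), compressed size, uncompressed size.
const descriptor = Buffer.alloc(16);
DATA_DESCRIPTOR_SIGNATURE.copy(descriptor, 0);
descriptor.writeUInt32LE(payload.length, 8);
descriptor.writeUInt32LE(payload.length, 12);

const stream = Buffer.concat([payload, descriptor]);
const guessedEnd = stream.indexOf(DATA_DESCRIPTOR_SIGNATURE);

// prints "real payload size: 24, size found by the search: 10"
console.log(`real payload size: ${payload.length}, size found by the search: ${guessedEnd}`);
```

&lt;p&gt;The search stops at the first occurrence of the signature, which lies inside the payload, so the entry is cut short.&lt;/p&gt;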

&lt;p&gt;To summarize, the library makes the following assumptions in order to decompress a ZIP archive from a stream:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The ZIP entries start at the beginning of the archive and follow one by one with no gaps between them.&lt;/li&gt;
&lt;li&gt;One of the following is true:

&lt;ol&gt;
&lt;li&gt;The archive was not created as a stream (i.e. all the sizes in the &lt;em&gt;Local file header&lt;/em&gt; are known).&lt;/li&gt;
&lt;li&gt;The compressed data does not contain the signature of the &lt;em&gt;Data descriptor&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;Based on these assumptions, the algorithm to decompress a ZIP archive from a stream can be refined as follows (this is the algorithm used by the library):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We read the &lt;em&gt;Local file header&lt;/em&gt; of a ZIP entry (the first one is at the beginning of the stream, the other ones follow without a gap). It contains the file name as well as its compressed and uncompressed sizes.&lt;/li&gt;
&lt;li&gt;If the compressed data size from the previous step is known, we read the compressed data of that size. If it is unknown, we read the compressed data until we find the &lt;em&gt;Data descriptor&lt;/em&gt; signature.&lt;/li&gt;
&lt;li&gt;We decompress the compressed data using the compression method given in the &lt;em&gt;Local file header&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;If there is a &lt;em&gt;Data descriptor&lt;/em&gt; present, we just discard it.&lt;/li&gt;
&lt;li&gt;We continue from the first step until we reach the start of the central directory. This can be easily recognized using the signature of the &lt;em&gt;Central directory file header&lt;/em&gt;, or of the &lt;em&gt;End of central directory record&lt;/em&gt; if the archive contains no files.&lt;/li&gt;
&lt;li&gt;We skip all central directory entries as well as the &lt;em&gt;End of central directory record&lt;/em&gt;. These are not needed anymore.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  invalid.zip archive analysis
&lt;/h2&gt;

&lt;p&gt;Now it should be clear why our &lt;code&gt;invalid.zip&lt;/code&gt; archive cannot be decompressed as a stream. The first assumption is satisfied, but the second one is not. The &lt;code&gt;invalid.zip&lt;/code&gt; archive is intentionally created as a stream (notice &lt;code&gt;streamFiles: true&lt;/code&gt; in the source code) and its compressed data contains the signature of the &lt;em&gt;Data descriptor&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The last point may not be obvious at first sight, so let's analyze it in more detail. First of all, notice that no compression level is specified in the source code that creates the &lt;code&gt;invalid.zip&lt;/code&gt; archive. As a result, compression level 0 (i.e. no compression) is used by default. This means that the contents of the &lt;code&gt;invalid-item-0.zip&lt;/code&gt; and &lt;code&gt;invalid-item-1.zip&lt;/code&gt; files are simply copied to the &lt;code&gt;invalid.zip&lt;/code&gt; archive, which leads to the structure shown in the following diagram.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdrive.usercontent.google.com%2Fdownload%3Fid%3D1zz2-f1dQrDKrHbpMEFXlrimSJ7Yk2cSM" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdrive.usercontent.google.com%2Fdownload%3Fid%3D1zz2-f1dQrDKrHbpMEFXlrimSJ7Yk2cSM" alt="invalid.zip structure" width="1634" height="882"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This structure is then processed as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We read the &lt;em&gt;Local file header&lt;/em&gt; of the first file in the archive, i.e. &lt;code&gt;invalid-item-0.zip&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;Local file header&lt;/em&gt; does not contain the data size, so we search for the signature of the &lt;em&gt;Data descriptor&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;The first &lt;em&gt;Data descriptor&lt;/em&gt; that we find belongs to the first file inside &lt;code&gt;invalid-item-0.zip&lt;/code&gt;, i.e. &lt;code&gt;dummy-0.txt&lt;/code&gt;. This is not the &lt;em&gt;Data descriptor&lt;/em&gt; that we want, but we don't know that, so we finish the processing of the first file. This is where the reported size of 141 bytes comes from: it is the amount of data in front of that signature.&lt;/li&gt;
&lt;li&gt;We read the &lt;em&gt;Data descriptor&lt;/em&gt; and discard it (the library does not use it in any way).&lt;/li&gt;
&lt;li&gt;We continue processing with the next &lt;em&gt;Local file header&lt;/em&gt;, which is the &lt;em&gt;Local file header&lt;/em&gt; of &lt;code&gt;dummy-1.txt&lt;/code&gt;. Please note that we are now processing a file that does not exist in the &lt;code&gt;invalid.zip&lt;/code&gt; archive itself. It exists only inside &lt;code&gt;invalid-item-0.zip&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;We search for the following &lt;em&gt;Data descriptor&lt;/em&gt;, find it and finish the processing of &lt;code&gt;dummy-1.txt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;In the same way, we process &lt;code&gt;dummy-2.txt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Now, we identify a &lt;em&gt;Central directory file header&lt;/em&gt; based on its signature. To the library, this means that we have reached the end of the archive.&lt;/li&gt;
&lt;li&gt;We drain all following &lt;em&gt;Central directory file headers&lt;/em&gt; until the &lt;em&gt;End of central directory record&lt;/em&gt; is reached.&lt;/li&gt;
&lt;li&gt;
The &lt;em&gt;End of central directory record&lt;/em&gt; denotes the very end of the archive, so we stop processing here without any error, but also without processing &lt;code&gt;invalid-item-1.zip&lt;/code&gt; at all.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Trying to improve it
&lt;/h2&gt;

&lt;p&gt;Based on the previous text, it should be clear that streamed decompression of zipped data is a bad idea and can never be fully reliable. The ZIP file format is simply not designed for it. Still, it is tempting to use it, because it works in many cases. So how can the &lt;a href="https://www.npmjs.com/package/unzipper" rel="noopener noreferrer"&gt;unzipper&lt;/a&gt; library be improved so that it provides better results than those presented in this text?&lt;/p&gt;

&lt;p&gt;I would suggest the following improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation - Add a big red warning to the documentation stating that streamed decompression of zipped data is not reliable and that users of this feature use it at their own risk.&lt;/li&gt;
&lt;li&gt;Verify compressed data size based on the &lt;em&gt;Data descriptor&lt;/em&gt; - Currently, the library does not use the &lt;em&gt;Data descriptor&lt;/em&gt;, which contains useful information, including the real compressed data size. We can compare this size with the amount of data already processed. If they differ, we can either fail with a reasonable error message stating that the archive cannot be decompressed as a stream, or continue reading until we have processed all the compressed data.&lt;/li&gt;
&lt;li&gt;Verify CRC-32 of the uncompressed data based on the &lt;em&gt;Data descriptor&lt;/em&gt; - Same as the previous one, but in this case CRC-32 of the uncompressed data is verified.&lt;/li&gt;
&lt;/ul&gt;
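
&lt;p&gt;The second improvement could look roughly like the sketch below. It is not the code of any library, and the function name &lt;code&gt;findVerifiedDataEnd&lt;/code&gt; is made up for this example. A signature match is accepted only if the compressed size stored behind it equals the number of bytes read so far. Otherwise, the search continues. The sketch assumes 4-byte sizes (no ZIP64) and data that is completely available in a &lt;code&gt;Buffer&lt;/code&gt;.&lt;/p&gt;

```javascript
// Sketch only: accept a Data descriptor candidate only if its compressed size
// matches the number of bytes read so far. Assumes 4-byte sizes (no ZIP64).
const DESCRIPTOR_SIGNATURE = Buffer.from([0x50, 0x4b, 0x07, 0x08]);

function findVerifiedDataEnd(buf, dataStart) {
  let candidate = buf.indexOf(DESCRIPTOR_SIGNATURE, dataStart);
  while (candidate !== -1) {
    const bytesRead = candidate - dataStart;
    // Descriptor layout: signature (4), CRC-32 (4), compressed size (4), uncompressed size (4).
    const hasFullDescriptor = buf.length - candidate >= 16;
    if (hasFullDescriptor) {
      if (buf.readUInt32LE(candidate + 8) === bytesRead) return candidate;
    }
    // False alarm: the signature bytes were part of the data, so keep searching.
    candidate = buf.indexOf(DESCRIPTOR_SIGNATURE, candidate + 1);
  }
  throw new Error("No Data descriptor matches the amount of data read");
}

// Usage: 12 bytes of data containing the signature, followed by the real descriptor.
const body = Buffer.concat([Buffer.from("abc"), DESCRIPTOR_SIGNATURE, Buffer.from("defgh")]);
const trailer = Buffer.alloc(16);
DESCRIPTOR_SIGNATURE.copy(trailer, 0);
trailer.writeUInt32LE(body.length, 8); // compressed size
trailer.writeUInt32LE(body.length, 12); // uncompressed size
const entry = Buffer.concat([body, trailer]);

console.log(findVerifiedDataEnd(entry, 0)); // 12, the position of the real descriptor
```

&lt;p&gt;This is still a heuristic, because the bytes behind a false match may equal the expected size by coincidence. Verifying the CRC-32 as well, as suggested in the third improvement, would make such a coincidence even less likely.&lt;/p&gt;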

&lt;h2&gt;
  
  
  Alternative libraries
&lt;/h2&gt;

&lt;p&gt;There are other libraries, which can be used to decompress ZIP archives. Their support of streamed decompression is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/adm-zip" rel="noopener noreferrer"&gt;adm-zip&lt;/a&gt; - Streamed decompression is not supported. The API only accepts a file name or a buffer as its input.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/decompress" rel="noopener noreferrer"&gt;decompress&lt;/a&gt; - Streamed decompression is not supported. The API only accepts a file name or a buffer as its input.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/decompress-zip" rel="noopener noreferrer"&gt;decompress-zip&lt;/a&gt; - Streamed decompression is not supported. The API only accepts a file name as its input.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/extract-zip" rel="noopener noreferrer"&gt;extract-zip&lt;/a&gt; - Based on &lt;a href="https://www.npmjs.com/package/yauzl" rel="noopener noreferrer"&gt;yauzl&lt;/a&gt;, so streamed decompression is not supported.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/jszip" rel="noopener noreferrer"&gt;jszip&lt;/a&gt; - Streamed decompression is not supported. The API only accepts in-memory data as its input.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/node-stream-zip" rel="noopener noreferrer"&gt;node-stream-zip&lt;/a&gt; - Streamed decompression is not supported. The API only accepts a file name as its input.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/unzip-stream" rel="noopener noreferrer"&gt;unzip-stream&lt;/a&gt; - Based on &lt;a href="https://www.npmjs.com/package/unzipper" rel="noopener noreferrer"&gt;unzipper&lt;/a&gt;, so streamed decompression is supported. It works better than &lt;a href="https://www.npmjs.com/package/unzipper" rel="noopener noreferrer"&gt;unzipper&lt;/a&gt;, because it implements the second improvement suggested above. Thanks to this, it is even able to successfully decompress the &lt;code&gt;invalid.zip&lt;/code&gt; file as a stream. However, its documentation clearly warns that the ZIP file format is not really meant to be processed as a stream and that the library may fail in some cases.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/yauzl" rel="noopener noreferrer"&gt;yauzl&lt;/a&gt; - Streamed decompression is not supported. The documentation explicitly &lt;a href="https://www.npmjs.com/package/yauzl#no-streaming-unzip-api" rel="noopener noreferrer"&gt;states&lt;/a&gt; that this is intentional, because streamed decompression of a ZIP archive is not possible without sacrificing correctness.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/zip-lib" rel="noopener noreferrer"&gt;zip-lib&lt;/a&gt; - Streamed decompression is not supported. The API only accepts a file name as its input.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The ZIP file format is not designed for streamed decompression, because the archive must be read from the end to the beginning.&lt;/li&gt;
&lt;li&gt;Avoid streamed decompression of zipped data as much as you can. Always prefer to store the stream to a file or a memory buffer and decompress it from there.&lt;/li&gt;
&lt;li&gt;Streamed decompression of zipped data is reliable only when you can make certain assumptions about the archive that you decompress. But this is rarely the case.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://www.npmjs.com/package/unzipper" rel="noopener noreferrer"&gt;unzipper&lt;/a&gt; library can be improved so that it at least fails with a reasonable error message when the archive cannot be decompressed as a stream.&lt;/li&gt;
&lt;li&gt;When streamed decompression is required, consider using &lt;a href="https://www.npmjs.com/package/unzip-stream" rel="noopener noreferrer"&gt;unzip-stream&lt;/a&gt;, which provides better results than &lt;a href="https://www.npmjs.com/package/unzipper" rel="noopener noreferrer"&gt;unzipper&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>zip</category>
      <category>streaming</category>
      <category>node</category>
      <category>unzipper</category>
    </item>
    <item>
      <title>Demystifying npm package installation: Insights, analysis and optimization tips</title>
      <dc:creator>Pavel Zeman</dc:creator>
      <pubDate>Tue, 22 Apr 2025 21:24:33 +0000</pubDate>
      <link>https://dev.to/pavel-zeman/demystifying-npm-package-installation-insights-analysis-and-optimization-tips-4nmj</link>
      <guid>https://dev.to/pavel-zeman/demystifying-npm-package-installation-insights-analysis-and-optimization-tips-4nmj</guid>
      <description>&lt;p&gt;&lt;em&gt;When analyzing issues with our internal npm registry, I was quite surprised how the package installation in &lt;code&gt;npm&lt;/code&gt; works. This post summarizes the principles, analyzes the behavior using a sample &lt;code&gt;package.json&lt;/code&gt; and points out some gotchas that you may encounter.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Disclaimer: I'm aware that there are other (and newer) package managers available, but &lt;code&gt;npm&lt;/code&gt; is still widely used and, at least for my work, I have no other choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Package registry
&lt;/h2&gt;

&lt;p&gt;Before delving into the details of &lt;code&gt;npm install&lt;/code&gt;, let's look briefly at the package registry, which provides the packages to install. You are probably well aware of the public registry at &lt;a href="https://registry.npmjs.org/" rel="noopener noreferrer"&gt;https://registry.npmjs.org/&lt;/a&gt;, but there are many other public and private registries as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  API specification
&lt;/h3&gt;

&lt;p&gt;The package registries provide a simple &lt;a href="https://github.com/npm/registry/blob/main/docs/REGISTRY-API.md" rel="noopener noreferrer"&gt;REST API&lt;/a&gt; to get package metadata as well as the actual package contents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;GET /{package}&lt;/code&gt; - Gets package metadata for all package versions. &lt;/p&gt;

&lt;p&gt;For example, &lt;a href="https://registry.npmjs.org/react" rel="noopener noreferrer"&gt;https://registry.npmjs.org/react&lt;/a&gt; provides metadata about the &lt;a href="https://www.npmjs.com/package/react" rel="noopener noreferrer"&gt;react&lt;/a&gt; package. Notice that the metadata is quite large. In this case, it is about 5.6 MB (1.2 MB compressed) at the time of writing. This is because the metadata contains details for all package versions, which means 2,280 versions in this case.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;GET /{package}/{version}&lt;/code&gt; - Similar to the previous one, but returns metadata just for a single version. The version must really be a single version, no floating versions like &lt;code&gt;^19.0.0&lt;/code&gt; are supported.&lt;/p&gt;

&lt;p&gt;For example, we can get metadata for &lt;a href="https://www.npmjs.com/package/react" rel="noopener noreferrer"&gt;react&lt;/a&gt; version 19.0.0 using &lt;a href="https://registry.npmjs.org/react/19.0.0" rel="noopener noreferrer"&gt;https://registry.npmjs.org/react/19.0.0&lt;/a&gt;. As there is just a single version, the metadata is much smaller - about 2 KB in this case.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;GET /{package}/-/{package}-{version}.tgz&lt;/code&gt; - Gets actual package content as a single compressed tarball. This endpoint is not documented (and does not have to be), because it is referenced from the package metadata.&lt;/p&gt;

&lt;p&gt;For example, we can download &lt;a href="https://www.npmjs.com/package/react" rel="noopener noreferrer"&gt;react&lt;/a&gt; version 19.0.0 from &lt;a href="https://registry.npmjs.org/react/-/react-19.0.0.tgz" rel="noopener noreferrer"&gt;https://registry.npmjs.org/react/-/react-19.0.0.tgz&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are also endpoints used e.g. for package publishing, but these are not needed for package installation, so they are not mentioned here.&lt;/p&gt;

&lt;h3&gt;
  
  
  API behavior
&lt;/h3&gt;

&lt;p&gt;It is obvious that, in general, &lt;code&gt;npm install&lt;/code&gt; can use only two of these endpoints - the first one to get package metadata for all versions and the last one to get package content. The second one cannot be used in many cases, because floating package versions are requested. When evaluating floating versions, &lt;code&gt;npm&lt;/code&gt; must use the first endpoint to get metadata for all versions and then evaluate the floating version locally.&lt;/p&gt;

&lt;h4&gt;
  
  
  Metadata endpoint
&lt;/h4&gt;

&lt;p&gt;As we will need it later, let's analyze the behavior of the first endpoint using &lt;code&gt;curl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-I&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'accept: application/json'&lt;/span&gt; https://registry.npmjs.org/react

HTTP/2 200
&lt;span class="nb"&gt;date&lt;/span&gt;: Sun, 20 Apr 2025 19:52:23 GMT
content-type: application/json
cf-ray: 93372e8c4bdbf99e-PRG
cf-cache-status: HIT
accept-ranges: bytes
access-control-allow-origin: &lt;span class="k"&gt;*&lt;/span&gt;
age: 141
cache-control: public, max-age&lt;span class="o"&gt;=&lt;/span&gt;300
etag: &lt;span class="s2"&gt;"25b33cc1cc58fb5a156c19f2ec3408ca"&lt;/span&gt;
last-modified: Sun, 20 Apr 2025 01:17:26 GMT
vary: accept-encoding, accept
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The endpoint returns package metadata (not shown here) and a couple of headers. Out of these headers, we will further need the following ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cache-control&lt;/code&gt; - based on this header, the metadata can be cached for up to 300 seconds, i.e. 5 minutes. I have checked several packages and this value is the same for all of them, so it seems to be a common default.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;etag&lt;/code&gt;, &lt;code&gt;last-modified&lt;/code&gt; - values of these headers can be used in the future to check whether there is a new version available.&lt;/li&gt;
&lt;/ul&gt;
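
&lt;p&gt;To illustrate how a client could interpret the &lt;code&gt;cache-control&lt;/code&gt; header, here is a small sketch that compares &lt;code&gt;max-age&lt;/code&gt; with the age of a cached response. The helper names &lt;code&gt;parseMaxAgeSeconds&lt;/code&gt; and &lt;code&gt;isFresh&lt;/code&gt; are made up for this example and this is not the code used by &lt;code&gt;npm&lt;/code&gt;. The sketch ignores all other caching directives.&lt;/p&gt;

```javascript
// Sketch only: decide whether a cached response is still fresh.
// Only the max-age directive is considered; other directives are ignored.
function parseMaxAgeSeconds(cacheControl) {
  const match = /max-age=(\d+)/.exec(cacheControl);
  // Without max-age we treat the response as immediately stale.
  return match ? Number(match[1]) : 0;
}

function isFresh(cacheControl, ageSeconds) {
  // A response is fresh while its lifetime is greater than its current age.
  return parseMaxAgeSeconds(cacheControl) > ageSeconds;
}

// Usage with the header values from the response above.
console.log(isFresh("public, max-age=300", 141)); // true
console.log(isFresh("public, max-age=300", 301)); // false
```

&lt;p&gt;With the values shown above (&lt;code&gt;max-age=300&lt;/code&gt; and &lt;code&gt;age: 141&lt;/code&gt;), the cached metadata is still considered fresh.&lt;/p&gt;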

&lt;p&gt;Let's try to use the &lt;code&gt;last-modified&lt;/code&gt; header in another request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-I&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'accept: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'if-modified-since: Sun, 20 Apr 2025 01:17:26 GMT'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  https://registry.npmjs.org/react

HTTP/2 200
&lt;span class="nb"&gt;date&lt;/span&gt;: Sun, 20 Apr 2025 19:59:19 GMT
content-type: application/json
cf-ray: 933738b08aa0b377-PRG
cf-cache-status: HIT
accept-ranges: bytes
access-control-allow-origin: &lt;span class="k"&gt;*&lt;/span&gt;
age: 254
cache-control: public, max-age&lt;span class="o"&gt;=&lt;/span&gt;300
etag: &lt;span class="s2"&gt;"25b33cc1cc58fb5a156c19f2ec3408ca"&lt;/span&gt;
last-modified: Sun, 20 Apr 2025 01:17:26 GMT
vary: accept-encoding, accept
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is quite surprising. With the &lt;code&gt;if-modified-since&lt;/code&gt; header, the server should have returned HTTP code &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/304" rel="noopener noreferrer"&gt;304 Not Modified&lt;/a&gt;, but we get 200 instead. Sending the &lt;code&gt;etag&lt;/code&gt; value in an &lt;code&gt;if-none-match&lt;/code&gt; header gives the same result. Even when both &lt;code&gt;if-modified-since&lt;/code&gt; and &lt;code&gt;if-none-match&lt;/code&gt; are specified, HTTP 200 is still returned.&lt;/p&gt;

&lt;p&gt;This means that the public npm registry does not support efficient checks for new versions. As we will see later, after the cache expires (i.e. after 5 minutes), the complete metadata has to be downloaded again even if nothing has changed.&lt;/p&gt;

&lt;p&gt;For the purposes of this post, I also have my own local npm registry running on &lt;a href="https://www.sonatype.com/products/sonatype-nexus-repository" rel="noopener noreferrer"&gt;Sonatype Nexus&lt;/a&gt;. The registry is set up to mirror content from the public &lt;a href="https://registry.npmjs.org" rel="noopener noreferrer"&gt;https://registry.npmjs.org&lt;/a&gt;, so we can easily repeat the previous test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-I&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'accept: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'if-modified-since: Sun, 20 Apr 2025 01:17:26 GMT'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  http://localhost:8081/repository/npmjs/react

HTTP/1.1 304 Not Modified
Date: Sun, 20 Apr 2025 20:24:56 GMT
Server: Nexus/3.79.1-04 &lt;span class="o"&gt;(&lt;/span&gt;COMMUNITY&lt;span class="o"&gt;)&lt;/span&gt;
X-Content-Type-Options: nosniff
Content-Security-Policy: sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation
X-XSS-Protection: 1&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;block
ETag: W/&lt;span class="s2"&gt;"25b33cc1cc58fb5a156c19f2ec3408ca"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, HTTP code &lt;code&gt;304&lt;/code&gt; is correctly returned. This means that if the metadata has not been modified, only a few HTTP headers (and no content) are transferred, resulting in substantial bandwidth savings.&lt;/p&gt;
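
&lt;p&gt;The revalidation round trip tested above can be sketched in a few lines. This is only an illustration of the HTTP mechanics, not code taken from &lt;code&gt;npm&lt;/code&gt; or Nexus; the function names and the shape of the &lt;code&gt;cached&lt;/code&gt; record are assumptions made for this example.&lt;/p&gt;

```javascript
// Build revalidation headers from a previously stored response.
// `cached` is a hypothetical record holding the validators saved earlier.
function buildConditionalHeaders(cached) {
  const headers = { accept: 'application/json' };
  if (cached.etag) headers['if-none-match'] = cached.etag;
  if (cached.lastModified) headers['if-modified-since'] = cached.lastModified;
  return headers;
}

// Decide what to do with the server's answer to a conditional request.
function interpretRevalidation(status) {
  if (status === 304) return 'reuse cached body';
  if (status === 200) return 'replace cached body';
  return 'unexpected status ' + status;
}

const cached = {
  etag: '"25b33cc1cc58fb5a156c19f2ec3408ca"',
  lastModified: 'Sun, 20 Apr 2025 01:17:26 GMT',
};
console.log(buildConditionalHeaders(cached));
console.log(interpretRevalidation(304)); // what Nexus answered in the test above
console.log(interpretRevalidation(200)); // what the public registry answered
```

&lt;p&gt;With a 304 answer, only the headers travel over the network; with a 200 answer, the full body is transferred again.&lt;/p&gt;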

&lt;p&gt;There is one more difference between my local Nexus and the public npm registry: my local Nexus does not return a &lt;code&gt;cache-control&lt;/code&gt; header. As a result, the client application (&lt;code&gt;npm&lt;/code&gt; in this case) does not know how long the returned data can be safely cached, so it has to use a heuristic algorithm, as we will see later.&lt;/p&gt;

&lt;h4&gt;
  
  
  Package content endpoint
&lt;/h4&gt;

&lt;p&gt;The package content endpoint works similarly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-I&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'accept: text/html'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  https://registry.npmjs.org/react/-/react-19.0.0.tgz

HTTP/2 200
&lt;span class="nb"&gt;date&lt;/span&gt;: Sun, 20 Apr 2025 20:57:35 GMT
content-type: application/octet-stream
cf-ray: 93378e0f7e5771ef-PRG
cf-cache-status: HIT
accept-ranges: bytes
access-control-allow-origin: &lt;span class="k"&gt;*&lt;/span&gt;
age: 1705545
cache-control: public, immutable, max-age&lt;span class="o"&gt;=&lt;/span&gt;31557600
etag: &lt;span class="s2"&gt;"7860ab2d152873bfbc3e990b2bbc62da"&lt;/span&gt;
last-modified: Thu, 05 Dec 2024 18:10:24 GMT
vary: Accept-Encoding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see headers similar to those for the package metadata:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cache-control&lt;/code&gt; - based on this header, the package content may be cached for 31,557,600 seconds, i.e. 1 year. Again, this value seems to be the same for all packages.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;etag&lt;/code&gt;, &lt;code&gt;last-modified&lt;/code&gt; - the values of these headers can be used later to check whether a new version is available.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we try to use the &lt;code&gt;last-modified&lt;/code&gt; value to check for a new version, the behavior is the same as for the package metadata endpoint: we get HTTP code &lt;code&gt;200&lt;/code&gt; from the public npm registry and &lt;code&gt;304&lt;/code&gt; from my local Nexus. The &lt;code&gt;cache-control&lt;/code&gt; header is also missing for Nexus.&lt;/p&gt;
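
&lt;p&gt;To double-check the two &lt;code&gt;max-age&lt;/code&gt; values quoted above, the header can be parsed and converted to friendlier units. The helper name &lt;code&gt;maxAgeSeconds&lt;/code&gt; is made up for this sketch, and it deliberately ignores every directive other than &lt;code&gt;max-age&lt;/code&gt;.&lt;/p&gt;

```javascript
// Extract max-age (in seconds) from a Cache-Control header value, or null if absent.
function maxAgeSeconds(cacheControl) {
  const match = /(?:^|[\s,])max-age=(\d+)/i.exec(cacheControl || '');
  return match ? Number(match[1]) : null;
}

const metadata = maxAgeSeconds('public, max-age=300');
const content = maxAgeSeconds('public, immutable, max-age=31557600');

console.log(metadata / 60);            // 5 (minutes)
console.log(content / 86400);          // 365.25 (days)
console.log(maxAgeSeconds(undefined)); // null, as for a response without the header
```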

&lt;h2&gt;
  
  
  Package cache
&lt;/h2&gt;

&lt;p&gt;To optimize package installation, &lt;code&gt;npm&lt;/code&gt; maintains a local cache of package metadata and package content. This is described in the &lt;a href="https://docs.npmjs.com/cli/v11/commands/npm-cache" rel="noopener noreferrer"&gt;npm documentation&lt;/a&gt;. However, the documentation does not describe the cache policy, so it is not clear when a cache entry is considered stale and needs to be refreshed.&lt;/p&gt;

&lt;p&gt;To analyze the cache policy, we can check the source code. The cache policy is implemented in the &lt;a href="https://github.com/kornelski/http-cache-semantics" rel="noopener noreferrer"&gt;http-cache-semantics&lt;/a&gt; package in its &lt;a href="https://github.com/kornelski/http-cache-semantics/blob/main/index.js" rel="noopener noreferrer"&gt;index.js&lt;/a&gt; file. After analyzing this file, we can see that the cache behavior depends on the HTTP cache headers. Leaving the technical details aside, the principles can be summarized as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If there is a &lt;code&gt;cache-control&lt;/code&gt; header with a &lt;code&gt;max-age&lt;/code&gt; parameter, this value is used. This means that package metadata fetched from &lt;a href="https://registry.npmjs.org" rel="noopener noreferrer"&gt;https://registry.npmjs.org&lt;/a&gt; is cached for 5 minutes, while package content is cached for 1 year. 5 minutes for package metadata seems quite aggressive, but up-to-date metadata is probably more important than the extra bandwidth required for frequent checks. One year for package content is fine because the content should never change.&lt;/li&gt;
&lt;li&gt;If there is no &lt;code&gt;cache-control&lt;/code&gt; header (as is the case for my local Nexus), the caching behavior is not defined by the server. In this case, the implementation falls back to the heuristic freshness described in &lt;a href="https://datatracker.ietf.org/doc/html/rfc7234#section-4.2.2" rel="noopener noreferrer"&gt;RFC 7234&lt;/a&gt; and calculates &lt;code&gt;max-age&lt;/code&gt; as &lt;code&gt;(current-time - last-modified) * 0.1&lt;/code&gt;. As a result, the data is cached for one-tenth of the time elapsed since its last modification. For example, if the data was last modified 10 months ago, it will be cached for one month. If this behavior is not desired, it can be overridden, for example by a reverse proxy that adds a &lt;code&gt;cache-control&lt;/code&gt; header to the headers returned by Nexus. This way, the same behavior as for &lt;a href="https://registry.npmjs.org" rel="noopener noreferrer"&gt;https://registry.npmjs.org&lt;/a&gt; can be configured.&lt;/li&gt;
&lt;/ul&gt;
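
&lt;p&gt;The two rules above can be condensed into a short function. This is a simplified sketch and not the actual http-cache-semantics code: the function name, the plain-object header format with lower-case keys and the explicit &lt;code&gt;nowMs&lt;/code&gt; argument are assumptions, and other directives such as &lt;code&gt;no-store&lt;/code&gt; or the &lt;code&gt;expires&lt;/code&gt; header are ignored.&lt;/p&gt;

```javascript
// Freshness lifetime in seconds for a response, following the two rules above.
// `headers` is a plain object with lower-case header names (an assumption of this sketch).
function freshnessLifetimeSeconds(headers, nowMs) {
  // Rule 1: an explicit max-age from the server wins.
  const match = /(?:^|[\s,])max-age=(\d+)/i.exec(headers['cache-control'] || '');
  if (match) return Number(match[1]);

  // Rule 2: no cache-control, so use one-tenth of the age of the data.
  const lastModifiedMs = Date.parse(headers['last-modified'] || '');
  if (Number.isNaN(lastModifiedMs) || lastModifiedMs >= nowMs) return 0;
  return ((nowMs - lastModifiedMs) / 1000) * 0.1;
}

const now = Date.UTC(2025, 3, 20, 20, 57, 35); // 20 Apr 2025 20:57:35 GMT
const fromRegistry = freshnessLifetimeSeconds({ 'cache-control': 'public, max-age=300' }, now);
const fromNexus = freshnessLifetimeSeconds({ 'last-modified': 'Thu, 05 Dec 2024 18:10:24 GMT' }, now);

console.log(fromRegistry, 'seconds');
console.log((fromNexus / 86400).toFixed(1), 'days');
```

&lt;p&gt;The second call uses the &lt;code&gt;last-modified&lt;/code&gt; value of the react tarball shown earlier, so it illustrates how long such a response could stay fresh when it comes from a registry that sends no &lt;code&gt;cache-control&lt;/code&gt; header.&lt;/p&gt;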

&lt;p&gt;The cache behavior can be observed when running &lt;code&gt;npm install&lt;/code&gt; with the &lt;code&gt;--loglevel=http&lt;/code&gt; option. The output then looks as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm http fetch GET 200 https://registry.npmjs.org/@testing-library%2fjest-dom 11ms &lt;span class="o"&gt;(&lt;/span&gt;cache hit&lt;span class="o"&gt;)&lt;/span&gt;
npm http fetch GET 200 https://registry.npmjs.org/@testing-library%2freact 5ms &lt;span class="o"&gt;(&lt;/span&gt;cache hit&lt;span class="o"&gt;)&lt;/span&gt;
npm http fetch GET 200 https://registry.npmjs.org/react 13ms &lt;span class="o"&gt;(&lt;/span&gt;cache hit&lt;span class="o"&gt;)&lt;/span&gt;
npm http fetch GET 200 https://registry.npmjs.org/react-dom 15ms &lt;span class="o"&gt;(&lt;/span&gt;cache hit&lt;span class="o"&gt;)&lt;/span&gt;
npm http fetch GET 200 https://registry.npmjs.org/@testing-library%2fuser-event 4ms &lt;span class="o"&gt;(&lt;/span&gt;cache hit&lt;span class="o"&gt;)&lt;/span&gt;
npm http fetch GET 200 https://registry.npmjs.org/@testing-library%2fdom 5ms &lt;span class="o"&gt;(&lt;/span&gt;cache hit&lt;span class="o"&gt;)&lt;/span&gt;
npm http fetch GET 200 https://registry.npmjs.org/react-scripts 4ms &lt;span class="o"&gt;(&lt;/span&gt;cache hit&lt;span class="o"&gt;)&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cache behavior is shown at the end of each request as &lt;code&gt;cache &amp;lt;status&amp;gt;&lt;/code&gt;. Specific values are not documented, but based on my analysis of the &lt;code&gt;npm&lt;/code&gt; source code, they are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cache hit&lt;/code&gt; - The requested data was found in the cache and was not fetched from the server. The HTTP status 200 in this case is misleading because there was no HTTP request sent.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cache miss&lt;/code&gt; - The requested data was not found in the cache, so it was fetched from the server.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cache revalidated&lt;/code&gt; - The requested data was found in the cache, but it is stale, so a request with &lt;code&gt;if-none-match&lt;/code&gt; and &lt;code&gt;if-modified-since&lt;/code&gt; headers was sent to the server. The server returned HTTP 304 without any data, so the cache item was marked as up-to-date. Similarly to &lt;code&gt;cache hit&lt;/code&gt;, the log contains HTTP status 200, which is misleading.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cache updated&lt;/code&gt; - Similar to &lt;code&gt;cache revalidated&lt;/code&gt;, but in this case the server responded with HTTP 200 and provided up-to-date data.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cache stale&lt;/code&gt; - The requested data was found in the cache and identified as stale, but it was used anyway because, for example, the &lt;code&gt;--prefer-offline&lt;/code&gt; option told &lt;code&gt;npm&lt;/code&gt; to prefer cached data. The HTTP status 200 is misleading here as well because no HTTP request was sent.&lt;/li&gt;
&lt;/ul&gt;
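
&lt;p&gt;The statuses above can be read as a small decision table. The following sketch only restates the list; it is not code taken from &lt;code&gt;npm&lt;/code&gt;, and the field names (&lt;code&gt;inCache&lt;/code&gt;, &lt;code&gt;stale&lt;/code&gt;, &lt;code&gt;usedStale&lt;/code&gt;, &lt;code&gt;serverStatus&lt;/code&gt;) are invented for the example.&lt;/p&gt;

```javascript
// Map the outcome of a cache lookup to the status label shown in the npm log.
// All field names are hypothetical; they only mirror the list above.
function cacheStatus({ inCache, stale, usedStale, serverStatus }) {
  if (!inCache) return 'miss';                    // nothing cached, full download
  if (!stale) return 'hit';                       // fresh entry, no request sent
  if (usedStale) return 'stale';                  // stale entry used anyway
  if (serverStatus === 304) return 'revalidated'; // conditional request, no body
  if (serverStatus === 200) return 'updated';     // conditional request, new body
  return 'unknown';
}

// Only "hit" and "stale" avoid the network entirely.
function sentHttpRequest(status) {
  return !['hit', 'stale'].includes(status);
}

const samples = [
  { inCache: false },
  { inCache: true, stale: false },
  { inCache: true, stale: true, usedStale: true },
  { inCache: true, stale: true, serverStatus: 304 },
  { inCache: true, stale: true, serverStatus: 200 },
];
for (const sample of samples) {
  const status = cacheStatus(sample);
  console.log('cache ' + status, '| request sent:', sentHttpRequest(status));
}
```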

&lt;h2&gt;
  
  
  Installing packages using npm install and others
&lt;/h2&gt;

&lt;p&gt;When installing packages using &lt;code&gt;npm install&lt;/code&gt; (just plain &lt;code&gt;npm install&lt;/code&gt; without any options is assumed), &lt;code&gt;npm&lt;/code&gt; performs the following steps (see &lt;a href="https://docs.npmjs.com/cli/v11/commands/npm-install" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;package.json&lt;/code&gt; resolution - &lt;code&gt;npm&lt;/code&gt; reads the &lt;code&gt;package.json&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Package resolution - based on the direct dependencies specified in &lt;code&gt;package.json&lt;/code&gt;, &lt;code&gt;npm&lt;/code&gt; calculates the complete dependency tree including transitive dependencies. If a &lt;code&gt;package-lock.json&lt;/code&gt; file is present, &lt;code&gt;npm&lt;/code&gt; first checks whether the locked versions satisfy the version requirements from &lt;code&gt;package.json&lt;/code&gt;. If they do, &lt;code&gt;package-lock.json&lt;/code&gt; is used instead of resolving the dependency tree; otherwise, the conflicts are resolved using the dependency tree.&lt;/li&gt;
&lt;li&gt;Package download - &lt;code&gt;npm&lt;/code&gt; downloads all missing packages used in the dependency tree.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;node_modules&lt;/code&gt; creation/update - &lt;code&gt;npm&lt;/code&gt; creates the &lt;code&gt;node_modules&lt;/code&gt; directory based on the dependency tree and the downloaded packages. If the &lt;code&gt;node_modules&lt;/code&gt; directory already exists, &lt;code&gt;npm&lt;/code&gt; updates its content to reflect the dependency tree.&lt;/li&gt;
&lt;li&gt;Script execution - &lt;code&gt;npm&lt;/code&gt; runs scripts defined in the installed &lt;code&gt;package.json&lt;/code&gt; files.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;package-lock.json&lt;/code&gt; creation/update - &lt;code&gt;npm&lt;/code&gt; creates or updates the &lt;code&gt;package-lock.json&lt;/code&gt; file to reflect the installed packages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But &lt;code&gt;npm install&lt;/code&gt; is not the only command to install packages. Another command is &lt;code&gt;npm update&lt;/code&gt; (see &lt;a href="https://docs.npmjs.com/cli/v11/commands/npm-update" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;), which behaves similarly to &lt;code&gt;npm install&lt;/code&gt;. The key difference is how an existing &lt;code&gt;package-lock.json&lt;/code&gt; file is handled. &lt;code&gt;npm update&lt;/code&gt; ignores the &lt;code&gt;package-lock.json&lt;/code&gt; file, always calculates the complete dependency tree and updates &lt;code&gt;package-lock.json&lt;/code&gt; accordingly. If there is no &lt;code&gt;package-lock.json&lt;/code&gt; file, &lt;code&gt;npm update&lt;/code&gt; behaves the same as &lt;code&gt;npm install&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The last package installation command is &lt;code&gt;npm ci&lt;/code&gt; (see &lt;a href="https://docs.npmjs.com/cli/v11/commands/npm-ci" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;). This command requires an existing &lt;code&gt;package-lock.json&lt;/code&gt; file, but otherwise its behavior is similar to &lt;code&gt;npm install&lt;/code&gt;. The key difference is how conflicts between &lt;code&gt;package.json&lt;/code&gt; and &lt;code&gt;package-lock.json&lt;/code&gt; are handled: &lt;code&gt;npm ci&lt;/code&gt; does not try to resolve them, but fails instead.&lt;/p&gt;

&lt;p&gt;Behavior of all package installation commands can be summarized using the following table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;npm install&lt;/th&gt;
&lt;th&gt;npm update&lt;/th&gt;
&lt;th&gt;npm ci&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Resolves complete dependency tree using &lt;code&gt;package.json&lt;/code&gt; and installs resolved dependencies&lt;/td&gt;
&lt;td&gt;Same as &lt;code&gt;npm install&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Fails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;package-lock.json&lt;/code&gt; consistent with &lt;code&gt;package.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Installs dependencies based on &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Same as with no &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Same as &lt;code&gt;npm install&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;package-lock.json&lt;/code&gt; inconsistent with &lt;code&gt;package.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Resolves dependency conflicts and installs dependencies based on &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Same as with no &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Fails&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Package installation behavior analysis
&lt;/h2&gt;

&lt;p&gt;Now, we have all the important theoretical information about &lt;code&gt;npm&lt;/code&gt;, so let's try to analyze package installation in more detail. To do this, we are going to install npm packages in multiple scenarios and observe &lt;code&gt;npm&lt;/code&gt; behavior in each of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Environment
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Hardware

&lt;ul&gt;
&lt;li&gt;CPU: Intel(R) Core(TM) Ultra 7 155H @ 1.40 GHz&lt;/li&gt;
&lt;li&gt;RAM: 32 GB&lt;/li&gt;
&lt;li&gt;Network bandwidth: 100 Mbps (this applies to my local Nexus registry as well)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Software

&lt;ul&gt;
&lt;li&gt;OS: Ubuntu 24.04.1 LTS (in a virtual machine)&lt;/li&gt;
&lt;li&gt;Node.js version: 22.14.0 (latest LTS version)&lt;/li&gt;
&lt;li&gt;npm version: 10.9.2 (bundled with Node.js)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenarios
&lt;/h3&gt;

&lt;p&gt;The tested scenarios are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;npm ci&lt;/code&gt; with no cache&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm ci&lt;/code&gt; with cache&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; with no cache and no &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; with no cache, no &lt;code&gt;package-lock.json&lt;/code&gt; and &lt;code&gt;--package-lock-only&lt;/code&gt; option&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; with no cache and &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; with up-to-date cache and no &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; with up-to-date cache, no &lt;code&gt;package-lock.json&lt;/code&gt; and &lt;code&gt;--prefer-online&lt;/code&gt; option&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; with up-to-date cache, no &lt;code&gt;package-lock.json&lt;/code&gt; and &lt;code&gt;--package-lock-only&lt;/code&gt; option&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; with up-to-date cache and &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm install&lt;/code&gt; with up-to-date cache, &lt;code&gt;package-lock.json&lt;/code&gt; and &lt;code&gt;--prefer-online&lt;/code&gt; option&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm update&lt;/code&gt; for the same scenarios as &lt;code&gt;npm install&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of the scenarios will be run using the public &lt;a href="https://registry.npmjs.org" rel="noopener noreferrer"&gt;https://registry.npmjs.org&lt;/a&gt; registry as well as my local Nexus registry.&lt;/p&gt;

&lt;h3&gt;
  
  
  package.json
&lt;/h3&gt;

&lt;p&gt;To run each scenario, a &lt;code&gt;package.json&lt;/code&gt; file must be present. For this analysis, we will use the following &lt;code&gt;package.json&lt;/code&gt; file. It is a real &lt;code&gt;package.json&lt;/code&gt; file that I used when creating a simple demo of the &lt;a href="https://recharts.org" rel="noopener noreferrer"&gt;Recharts&lt;/a&gt; library.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"recharts-random-data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"private"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@testing-library/jest-dom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^5.17.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@testing-library/react"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^13.4.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@testing-library/user-event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^13.5.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"react"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^18.2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"react-dom"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^18.2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"react-scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5.0.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"recharts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^2.9.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"html2canvas"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^1.4.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"web-vitals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^2.1.4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"typescript"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^4.0.0"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may notice the dependency on &lt;code&gt;typescript&lt;/code&gt; version &lt;code&gt;^4.0.0&lt;/code&gt;. It is not really needed because &lt;code&gt;typescript&lt;/code&gt; is only a transitive dependency of other packages. However, without this dependency, &lt;code&gt;npm install&lt;/code&gt; installs &lt;code&gt;typescript&lt;/code&gt; version 5.x and the following &lt;code&gt;npm ci&lt;/code&gt; complains that the generated &lt;code&gt;package-lock.json&lt;/code&gt; is not in sync with &lt;code&gt;package.json&lt;/code&gt;. This seems to be a bug in &lt;code&gt;npm&lt;/code&gt;, but as a workaround, the &lt;code&gt;typescript&lt;/code&gt; dependency can be added explicitly.&lt;/p&gt;

&lt;p&gt;Installing this &lt;code&gt;package.json&lt;/code&gt; produces a &lt;code&gt;node_modules&lt;/code&gt; directory with the following contents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total number of packages: 1,507 (as reported by &lt;code&gt;npm install&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Total number of unique packages (excluding version): 1,168&lt;/li&gt;
&lt;li&gt;Total number of unique packages (including version): 1,317&lt;/li&gt;
&lt;li&gt;Total number of files: 41,087&lt;/li&gt;
&lt;li&gt;Total number of directories: 5,240&lt;/li&gt;
&lt;li&gt;Total file size: 258 MB&lt;/li&gt;
&lt;li&gt;Total used disk space: 387 MB&lt;/li&gt;
&lt;/ul&gt;
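
&lt;p&gt;The article does not say how the package counts above were obtained. One possible way, shown here purely as an assumption, is to derive them from the &lt;code&gt;packages&lt;/code&gt; map of a &lt;code&gt;package-lock.json&lt;/code&gt; file, where each key is an install path such as &lt;code&gt;node_modules/a/node_modules/b&lt;/code&gt;. The function name and the tiny lockfile object are invented for the example.&lt;/p&gt;

```javascript
// Count installed packages from the "packages" map of a lockfile object.
// Keys are install paths; the empty key is the root project itself.
function countPackages(lockfile) {
  const marker = 'node_modules/';
  const names = new Set();
  const namesWithVersion = new Set();
  let total = 0;

  for (const [path, entry] of Object.entries(lockfile.packages || {})) {
    const index = path.lastIndexOf(marker);
    if (index === -1) continue; // root project or a path outside node_modules
    const name = path.slice(index + marker.length);
    total += 1;
    names.add(name);
    namesWithVersion.add(name + '@' + entry.version);
  }
  return { total, unique: names.size, uniqueWithVersion: namesWithVersion.size };
}

// Made-up lockfile: package "a" is installed three times in two versions.
const lockfile = {
  packages: {
    '': { name: 'demo', version: '0.1.0' },
    'node_modules/a': { version: '1.0.0' },
    'node_modules/b': { version: '1.0.0' },
    'node_modules/b/node_modules/a': { version: '2.0.0' },
    'node_modules/c': { version: '1.0.0' },
    'node_modules/c/node_modules/a': { version: '2.0.0' },
  },
};
console.log(countPackages(lockfile)); // { total: 5, unique: 3, uniqueWithVersion: 4 }
```

&lt;p&gt;The three numbers correspond to the first three items of the list: every installed copy, unique package names, and unique name and version pairs.&lt;/p&gt;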

&lt;h3&gt;
  
  
  Observed metrics
&lt;/h3&gt;

&lt;p&gt;For each scenario, we will observe the following metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number of HTTP requests sent to the registry for each request type (package metadata and package content) and HTTP response code - see below for more details.&lt;/li&gt;
&lt;li&gt;Total size of the downloaded data with the same breakdown as for the number of HTTP requests - see below for more details.&lt;/li&gt;
&lt;li&gt;Number of duplicate HTTP requests - it may seem surprising, but &lt;code&gt;npm&lt;/code&gt; really does generate duplicate HTTP requests. This is probably related to the dependency tree being resolved in parallel, but it still seems like a bug.&lt;/li&gt;
&lt;li&gt;Total duration of the whole scenario&lt;/li&gt;
&lt;li&gt;Time to generate the complete dependency tree - this information is provided when running &lt;code&gt;npm&lt;/code&gt; with the &lt;code&gt;--timing&lt;/code&gt; parameter as a single log line with the following contents: &lt;code&gt;npm timing idealTree Completed in 5800ms&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As we already know from the previous sections, the &lt;code&gt;npm&lt;/code&gt; log at the &lt;code&gt;http&lt;/code&gt; level does not show the real HTTP response codes. It also does not show the size of the downloaded data, and there seem to be no other command-line options that would provide this information. So we need to add logging directly to the &lt;code&gt;npm&lt;/code&gt; source code. The HTTP requests are handled by the &lt;a href="https://www.npmjs.com/package/minipass-fetch" rel="noopener noreferrer"&gt;minipass-fetch&lt;/a&gt; package, which can be easily extended by adding the following code to its &lt;a href="https://github.com/npm/minipass-fetch/blob/main/lib/index.js" rel="noopener noreferrer"&gt;index.js&lt;/a&gt; source:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
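&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Added inside the response handling of minipass-fetch; res, body, url and headers come from the surrounding code.&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;totalSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;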

&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;totalSize&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;end&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;minipass-fetch&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;totalSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cache-control&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we calculate the total size of the downloaded data (without HTTP headers) and log it together with the URL, the HTTP status code and the &lt;code&gt;cache-control&lt;/code&gt; header.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tooling
&lt;/h3&gt;

&lt;p&gt;To run the scenarios, I've created a simple Node.js script, which runs the scenarios one by one and performs the following steps for each of them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cleans &lt;code&gt;npm&lt;/code&gt; cache using &lt;code&gt;npm cache clean --force&lt;/code&gt; command.&lt;/li&gt;
&lt;li&gt;Removes &lt;code&gt;node_modules&lt;/code&gt; directory and &lt;code&gt;package-lock.json&lt;/code&gt; file, which may have been created by a previous scenario.&lt;/li&gt;
&lt;li&gt;Runs &lt;code&gt;npm install&lt;/code&gt; if the scenario needs cached data or a &lt;code&gt;package-lock.json&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Cleans the &lt;code&gt;npm&lt;/code&gt; cache or removes the &lt;code&gt;package-lock.json&lt;/code&gt; file if the scenario does not need it.&lt;/li&gt;
&lt;li&gt;Synchronously runs the command specific to the scenario, measures its execution time and gathers its log.&lt;/li&gt;
&lt;li&gt;Analyzes the log to gather statistics about HTTP requests and time to generate complete dependency tree.&lt;/li&gt;
&lt;li&gt;Saves the log as well as all metrics to files.&lt;/li&gt;
&lt;/ol&gt;
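
&lt;p&gt;The log analysis step can be sketched as follows. This is not the script from the repository linked below, only a minimal illustration under a few assumptions: the patched minipass-fetch prints one space-separated line per request (marker, URL, status code, body size, &lt;code&gt;cache-control&lt;/code&gt;), tarball URLs contain &lt;code&gt;/-/&lt;/code&gt; and end with &lt;code&gt;.tgz&lt;/code&gt;, and a duplicate is any URL requested more than once. The sample log and its sizes are made up.&lt;/p&gt;

```javascript
// Classify a registry URL as package content (tarball) or package metadata.
function requestType(url) {
  return /\/-\/.+\.tgz$/.test(url) ? 'content' : 'metadata';
}

// Aggregate request counts and sizes per type and status, count duplicate
// URLs and pick up the idealTree timing line.
function analyzeLog(log) {
  const stats = {}; // e.g. stats['metadata 200'] = { count, bytes }
  const seen = new Set();
  let duplicates = 0;
  let idealTreeMs = null;

  for (const line of log.split('\n')) {
    const timing = /npm timing idealTree Completed in (\d+)ms/.exec(line);
    if (timing) idealTreeMs = Number(timing[1]);

    const request = /^minipass-fetch (\S+) (\d{3}) (\d+)/.exec(line);
    if (!request) continue;
    const [, url, status, size] = request;
    const key = requestType(url) + ' ' + status;
    stats[key] = stats[key] || { count: 0, bytes: 0 };
    stats[key].count += 1;
    stats[key].bytes += Number(size);
    if (seen.has(url)) duplicates += 1;
    seen.add(url);
  }
  return { stats, duplicates, idealTreeMs };
}

// Made-up sample log; the sizes are not real measurements.
const sampleLog = [
  'minipass-fetch https://registry.npmjs.org/react 200 1000 public, max-age=300',
  'minipass-fetch https://registry.npmjs.org/react 200 1000 public, max-age=300',
  'minipass-fetch https://registry.npmjs.org/react/-/react-19.0.0.tgz 200 500 public, immutable, max-age=31557600',
  'npm timing idealTree Completed in 5800ms',
].join('\n');
console.log(JSON.stringify(analyzeLog(sampleLog), null, 2));
```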

&lt;p&gt;Complete source code is available in my &lt;a href="https://github.com/pavel-zeman/npm-analysis" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. The code can be run using &lt;code&gt;node benchmark.js&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;Complete results are available in a &lt;a href="https://docs.google.com/spreadsheets/d/1rzumATOUnYovjRikUNvnkNy6rHvbUWuXlTj89JzKgdQ" rel="noopener noreferrer"&gt;Google sheet&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you prefer the results directly in the article, the following table contains results for each scenario for the public &lt;a href="https://registry.npmjs.org" rel="noopener noreferrer"&gt;https://registry.npmjs.org&lt;/a&gt; registry.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdnl4gtz25mugwimatnbz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdnl4gtz25mugwimatnbz.png" alt="Results for public registry" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same table for my local Nexus registry follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25mt8pxsi39e9tspscxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25mt8pxsi39e9tspscxh.png" alt="Results for Nexus" width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results contain the following columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Command - &lt;code&gt;npm&lt;/code&gt; command being run.&lt;/li&gt;
&lt;li&gt;Cache - indicates whether up-to-date cache data was used.&lt;/li&gt;
&lt;li&gt;Lockfile - indicates whether the &lt;code&gt;package-lock.json&lt;/code&gt; file was present.&lt;/li&gt;
&lt;li&gt;Arguments - additional command-line arguments passed to &lt;code&gt;npm&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Metadata requests - count and size of HTTP requests for package metadata, split by HTTP status code (200 and 304).&lt;/li&gt;
&lt;li&gt;Content requests - count and size of HTTP requests for package content, split by HTTP status code (200 and 304).&lt;/li&gt;
&lt;li&gt;Duplicate requests - number of duplicate HTTP requests, i.e. requests that fetch the same data as a previously sent request.&lt;/li&gt;
&lt;li&gt;Dep. generation time - time to generate the complete dependency tree in seconds. Each scenario was run just once, so this time should be interpreted with caution.&lt;/li&gt;
&lt;li&gt;Total time - total time of the scenario in seconds. Each scenario was run just once, so this time should be interpreted with caution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Observations
&lt;/h3&gt;

&lt;p&gt;The following list summarizes observations based on the results presented above:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metadata is huge - One may expect package metadata to be smaller than package content. The results show that exactly the opposite is true: package content is about 53 MB, while package metadata is about 77 MB (or 81 MB for Nexus). This should be no surprise, since metadata describes all versions of a package, which can easily be more than a thousand, as evidenced above for the &lt;a href="https://www.npmjs.com/package/react" rel="noopener noreferrer"&gt;react&lt;/a&gt; package.&lt;/li&gt;
&lt;li&gt;Number of HTTP requests needed to install packages is huge - When starting with no &lt;code&gt;package-lock.json&lt;/code&gt; and no cache (or with the &lt;code&gt;--prefer-online&lt;/code&gt; option), about 2,800 requests are generated. Even though &lt;code&gt;npm&lt;/code&gt; runs them in parallel (&lt;a href="https://docs.npmjs.com/cli/v11/using-npm/config#maxsockets" rel="noopener noreferrer"&gt;15 by default&lt;/a&gt;), the requests take some time, depending mainly on your bandwidth and latency.&lt;/li&gt;
&lt;li&gt;Dependency tree generation is slow - Generating the complete dependency tree takes about 6 seconds even when all the metadata is cached. For 1,507 packages in total, this seems quite slow, and it is one of the pain points that other package managers try to address.&lt;/li&gt;
&lt;li&gt;Total metadata size with Nexus is higher than with the public registry - I suppose that the metadata of the public registry is gzipped with maximum compression and then stored on a CDN to be available for end users. Nexus itself does not compress metadata, so I configured dynamic compression at a reverse proxy. Because it runs on every response, this compression is probably not as efficient as the one-off compression used by the public registry.&lt;/li&gt;
&lt;li&gt;Missing support of HTTP 304 by the public registry - As already explained above, the public registry does not support the HTTP 304 status code. As a result, when you ask for up-to-date data using the &lt;code&gt;--prefer-online&lt;/code&gt; option, be prepared to download everything - complete package metadata as well as package content. On the other hand, with Nexus and an up-to-date local cache, the traffic is low, because only HTTP headers are transferred.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--prefer-online&lt;/code&gt; tries to fetch not only up-to-date package metadata but also package content - This does not make sense in most cases, because the content of a published package version should not change. If the package content needs to be modified, a new package version should be published instead.&lt;/li&gt;
&lt;li&gt;Many duplicate requests - The number of duplicate requests is quite high, especially in some scenarios. For example, when using the &lt;code&gt;--prefer-online&lt;/code&gt; option with no &lt;code&gt;package-lock.json&lt;/code&gt; and the public registry, there are 611 duplicate requests out of 3,112 requests in total, which is about 20%. In this case, the duplicate requests also increase the amount of transferred data from 77 MB to 106 MB.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm update&lt;/code&gt; behaves the same as &lt;code&gt;npm install&lt;/code&gt; without &lt;code&gt;package-lock.json&lt;/code&gt; - This confirms the principles of &lt;code&gt;npm update&lt;/code&gt; described above.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;npm ci&lt;/code&gt; behaves the same as &lt;code&gt;npm install&lt;/code&gt; with &lt;code&gt;package-lock.json&lt;/code&gt; - This confirms the principles of &lt;code&gt;npm ci&lt;/code&gt; described above.&lt;/li&gt;
&lt;/ul&gt;
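
&lt;p&gt;To make the duplicate request figures concrete, here is a minimal sketch of how duplicates could be counted and how the share of about 20% follows from the numbers above. The helper names &lt;code&gt;countDuplicates&lt;/code&gt; and &lt;code&gt;percentage&lt;/code&gt; are hypothetical, and treating two requests as duplicates when their URLs are identical is an assumption, not necessarily the exact rule used by the benchmark.&lt;/p&gt;

```javascript
// Hypothetical helper: counts every request whose URL was already requested.
// Assumption: identical URL means identical data.
function countDuplicates(urls) {
  const seen = new Set();
  let duplicates = 0;
  for (const url of urls) {
    if (seen.has(url)) {
      duplicates += 1;
    } else {
      seen.add(url);
    }
  }
  return duplicates;
}

// Share of `part` in `total` as a percentage, rounded to one decimal place.
function percentage(part, total) {
  return Math.round((part / total) * 1000) / 10;
}

// Figures from the --prefer-online scenario against the public registry.
console.log(percentage(611, 3112)); // 19.6
```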

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Package metadata is huge, so be prepared to download a lot of data when installing packages.&lt;/li&gt;
&lt;li&gt;Check the caching headers and HTTP 304 support of your npm registry to understand how your packages are cached and refreshed.&lt;/li&gt;
&lt;li&gt;When you need to install up-to-date packages, &lt;code&gt;npm install&lt;/code&gt; with the &lt;code&gt;--prefer-online&lt;/code&gt; option can be used. However, this approach tries to refresh package metadata as well as package content. A better approach might be to use &lt;code&gt;npm install --package-lock-only --prefer-online&lt;/code&gt; to generate an up-to-date &lt;code&gt;package-lock.json&lt;/code&gt; and then use &lt;code&gt;npm install --prefer-offline&lt;/code&gt; to install packages based on the &lt;code&gt;package-lock.json&lt;/code&gt; file. This way, the package content is only downloaded when it is not present in the cache.&lt;/li&gt;
&lt;/ul&gt;
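
&lt;p&gt;The two-step approach from the last takeaway could be scripted roughly as follows. This is only a sketch: the names &lt;code&gt;REFRESH_STEPS&lt;/code&gt;, &lt;code&gt;runNpm&lt;/code&gt; and &lt;code&gt;refreshAndInstall&lt;/code&gt; are hypothetical, and the script assumes that &lt;code&gt;npm&lt;/code&gt; is available on the &lt;code&gt;PATH&lt;/code&gt; and that it is started in the project directory.&lt;/p&gt;

```javascript
const { spawnSync } = require('node:child_process');

// The two steps from the takeaway above, as argument lists for the npm CLI.
const REFRESH_STEPS = [
  ['install', '--package-lock-only', '--prefer-online'], // refresh package-lock.json only
  ['install', '--prefer-offline'], // install, fetching only what the cache lacks
];

// Default runner: executes npm and returns its exit status
// (1 if the process ended without a status, e.g. it was killed by a signal).
function runNpm(args) {
  const result = spawnSync('npm', args, {
    stdio: 'inherit',
    shell: process.platform === 'win32', // npm is a .cmd script on Windows
  });
  return result.status ?? 1;
}

// Runs the steps in order and stops at the first failure.
// `run` is injectable so the sequence can be exercised without calling npm.
function refreshAndInstall(run = runNpm) {
  for (const args of REFRESH_STEPS) {
    const status = run(args);
    if (status !== 0) return status;
  }
  return 0;
}
```

&lt;p&gt;Calling &lt;code&gt;refreshAndInstall()&lt;/code&gt; runs both commands and returns the exit status of the first step that fails, or 0 if both succeed.&lt;/p&gt;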

</description>
      <category>npm</category>
      <category>packageregistry</category>
      <category>performance</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
