<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joni Sar</title>
    <description>The latest articles on DEV Community by Joni Sar (@jonisar).</description>
    <link>https://dev.to/jonisar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F13629%2F82514506-b658-415e-8114-a7f2e04c4ff3.png</url>
      <title>DEV Community: Joni Sar</title>
      <link>https://dev.to/jonisar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jonisar"/>
    <language>en</language>
    <item>
      <title>Introducing QueryFlux: Open-Source Universal Multi-Engine Query Router and SQL Proxy</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Mon, 06 Apr 2026 09:07:13 +0000</pubDate>
      <link>https://dev.to/jonisar/introducing-queryflux-multi-engine-query-router-and-universal-sql-proxy-19e9</link>
      <guid>https://dev.to/jonisar/introducing-queryflux-multi-engine-query-router-and-universal-sql-proxy-19e9</guid>
      <description>&lt;p&gt;Efficiently routing queries across multiple query engines is a critical challenge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://queryflux.dev/" rel="noopener noreferrer"&gt;QueryFlux&lt;/a&gt; is a universal SQL proxy and multi-engine query router written in Rust. It sits between clients and query engines. Clients connect to QueryFlux using a protocol they already know. QueryFlux routes each query to the right backend, translates SQL dialects when needed, enforces concurrency limits, and gives you a unified observability surface.&lt;/p&gt;

&lt;p&gt;Open table formats unified the data. QueryFlux unifies the access.&lt;/p&gt;

&lt;p&gt;If you already run more than one query engine, you know the problem is not only where data lives. The harder part is how query access works in practice.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which engine should this run on?&lt;/li&gt;
&lt;li&gt;Which client should connect where?&lt;/li&gt;
&lt;li&gt;How do you protect low-latency traffic from batch workloads?&lt;/li&gt;
&lt;li&gt;What happens when one cluster is saturated?&lt;/li&gt;
&lt;li&gt;How much routing logic ends up hardcoded across the stack?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the problem &lt;a href="https://queryflux.dev/" rel="noopener noreferrer"&gt;QueryFlux&lt;/a&gt; is built to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why QueryFlux exists
&lt;/h2&gt;

&lt;p&gt;Modern data platforms are multi-engine by design.&lt;/p&gt;

&lt;p&gt;A team may use Trino for federated queries, DuckDB for embedded analytics, StarRocks for low-latency serving, and Athena for pay-per-scan workloads on cold data. That mix is not a sign of architectural drift. In many cases, it is the right shape of the system.&lt;/p&gt;

&lt;p&gt;Open table formats made this possible. With Apache Iceberg, Delta Lake, or Hudi, multiple engines can read the same data in object storage without duplicating it. That solved storage interoperability.&lt;/p&gt;

&lt;p&gt;What it did not solve is compute access.&lt;/p&gt;

&lt;p&gt;Each engine still comes with its own protocol, its own SQL dialect, its own connection handling, and its own operational behavior. Clients still need to know where to connect. Routing logic still leaks into notebooks, applications, dashboards, and team conventions. Capacity management is still fragmented across backends.&lt;/p&gt;

&lt;p&gt;QueryFlux adds the missing layer above the table format: one access layer in front of the engine fleet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What QueryFlux does
&lt;/h2&gt;

&lt;p&gt;At a high level, QueryFlux handles three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;protocol ingestion&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;dispatch and dialect translation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clients connect using protocols they already speak. QueryFlux supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trino HTTP&lt;/li&gt;
&lt;li&gt;PostgreSQL wire&lt;/li&gt;
&lt;li&gt;MySQL wire&lt;/li&gt;
&lt;li&gt;Arrow Flight SQL&lt;/li&gt;
&lt;li&gt;Admin REST API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the backend side, it already supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trino&lt;/li&gt;
&lt;li&gt;DuckDB&lt;/li&gt;
&lt;li&gt;StarRocks&lt;/li&gt;
&lt;li&gt;Athena&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives it a very specific place in the stack. It is not trying to replace engines, and it is not introducing a custom client model. It is making a heterogeneous engine fleet look coherent from the access layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  How a query flows through the system
&lt;/h2&gt;

&lt;p&gt;A client connects to QueryFlux using a native protocol.&lt;/p&gt;

&lt;p&gt;The query is evaluated against an ordered routing chain.&lt;/p&gt;

&lt;p&gt;The first matching rule selects the cluster group that should handle the query.&lt;/p&gt;

&lt;p&gt;From there, QueryFlux selects a healthy cluster in that group, optionally rewrites the SQL into the target dialect using sqlglot, and dispatches the query.&lt;/p&gt;

&lt;p&gt;If the group is already at its concurrency limit, the query can queue at the proxy instead of failing immediately.&lt;/p&gt;

&lt;p&gt;That is the important design move. QueryFlux is not just a forwarder. It is the runtime layer where access, routing, translation, and capacity handling meet.&lt;/p&gt;
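
&lt;p&gt;The flow above is easy to sketch. The snippet below is an illustrative toy, not QueryFlux's actual code or config schema: it evaluates a query against an ordered chain and returns the first matching rule's cluster group, with a fallback at the end.&lt;/p&gt;

```python
# Toy sketch of first-match routing over an ordered rule chain.
# Rule shapes and field names are hypothetical, not QueryFlux's actual schema.
import re

ROUTES = [
    {"name": "fast_queries", "query_regex": r"SELECT .* LIMIT \d+", "target": "duckdb_group"},
    {"name": "dashboard_queries", "protocol": "mysql", "target": "starrocks_group"},
    {"name": "heavy_analytics", "query_regex": r"JOIN|GROUP BY|WINDOW", "target": "trino_group"},
    {"name": "fallback", "fallback": True, "target": "athena_group"},
]

def route(sql, protocol):
    """Return the target group of the first rule that matches."""
    for rule in ROUTES:
        if rule.get("fallback"):
            return rule["target"]  # unconditional catch-all
        if "protocol" in rule and rule["protocol"] != protocol:
            continue
        if "query_regex" in rule and not re.search(rule["query_regex"], sql):
            continue
        return rule["target"]
    return None
```

&lt;p&gt;Ordering is the point: a query that matches several rules lands wherever the first match sends it, which is what makes the policy predictable and traceable.&lt;/p&gt;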

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client (psql / Trino CLI / mysql / BI tool)
    │
    │ native protocol
    ▼
┌─────────────────────────────────────────────┐
│                 QueryFlux                   │
│                                             │
│  Frontend ──► Router ──► Dialect translation│
│                    │                        │
│              Cluster group                  │
│         (concurrency limit + queue)         │
└──────────────────┬──────────────────────────┘
                   │
      ┌────────────┼────────────┐
      ▼            ▼            ▼
   Trino       StarRocks      Athena
      └────────────┴────────────┘
          Apache Iceberg / Delta / Hudi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The architecture is simple enough to understand quickly, but deep enough to be useful in real environments.&lt;/p&gt;

&lt;p&gt;The simplicity is at the edge. Clients keep using the protocols they already know.&lt;/p&gt;

&lt;p&gt;The depth is inside the routing and dispatch path, where QueryFlux can apply routing policy, translation, concurrency limits, queueing, health-aware selection, and load balancing without pushing that complexity back into every client.&lt;/p&gt;

&lt;h2&gt;
  
  
  Routing is where the value becomes obvious
&lt;/h2&gt;

&lt;p&gt;QueryFlux evaluates each query against an ordered router chain.&lt;/p&gt;

&lt;p&gt;Routing can be based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;protocol&lt;/li&gt;
&lt;li&gt;HTTP headers&lt;/li&gt;
&lt;li&gt;SQL text using regex&lt;/li&gt;
&lt;li&gt;client tags&lt;/li&gt;
&lt;li&gt;Python script logic&lt;/li&gt;
&lt;li&gt;compound rules&lt;/li&gt;
&lt;li&gt;fallback routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because real routing logic is rarely a single condition. In practice, you may want to steer PostgreSQL wire traffic to a low-latency group, send ETL-tagged traffic to a batch-oriented cluster, and use query patterns to catch common fast-path cases.&lt;/p&gt;

&lt;p&gt;A simple example looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fast_queries&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;query_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;.*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;LIMIT&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;d+"&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;duckdb_group&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dashboard_queries&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;starrocks_group&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;heavy_analytics&lt;/span&gt;
    &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;query_regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOIN|GROUP&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;BY|WINDOW"&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;trino_group&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fallback&lt;/span&gt;
    &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;athena_group&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not that every deployment should use this exact policy. The point is that the policy becomes explicit, traceable, and shared.&lt;/p&gt;

&lt;p&gt;That alone removes a surprising amount of hidden operational drag.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cluster groups make routing operational
&lt;/h2&gt;

&lt;p&gt;Once a route resolves to a cluster group, QueryFlux handles execution there.&lt;/p&gt;

&lt;p&gt;It supports these load-balancing strategies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;roundRobin&lt;/li&gt;
&lt;li&gt;leastLoaded&lt;/li&gt;
&lt;li&gt;failover&lt;/li&gt;
&lt;li&gt;engineAffinity&lt;/li&gt;
&lt;li&gt;weighted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per-group concurrency limits&lt;/li&gt;
&lt;li&gt;proxy-side queueing when groups are full&lt;/li&gt;
&lt;li&gt;health-aware cluster selection&lt;/li&gt;
&lt;li&gt;background health checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where QueryFlux starts to feel deeper than a typical proxy.&lt;/p&gt;

&lt;p&gt;It is not only deciding where a query should go. It is also giving operators a place to control how traffic behaves when systems are under load, how overflow is absorbed, and how healthy capacity is chosen.&lt;/p&gt;

&lt;p&gt;That is the part that makes the system practical.&lt;/p&gt;
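
&lt;p&gt;A rough sketch of what health-aware, least-loaded selection under a per-group concurrency cap means in practice (names and structures here are illustrative, not QueryFlux's API):&lt;/p&gt;

```python
# Hypothetical sketch of "leastLoaded" selection with a per-group
# concurrency limit. Field names are illustrative, not QueryFlux's API.

def pick_cluster(clusters, max_concurrent):
    """Return the healthy cluster with the fewest running queries,
    or None when the group is at its concurrency limit (the proxy
    would then queue the query instead of failing it)."""
    healthy = [c for c in clusters if c["healthy"]]
    total_running = sum(c["running"] for c in healthy)
    if not healthy or total_running >= max_concurrent:
        return None  # caller queues the query at the proxy
    return min(healthy, key=lambda c: c["running"])
```

&lt;p&gt;The other strategies differ only in the selection step: round-robin cycles an index, weighted draws proportionally, failover prefers a primary while it stays healthy.&lt;/p&gt;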

&lt;h2&gt;
  
  
  SQL translation is built into the path
&lt;/h2&gt;

&lt;p&gt;Multi-engine routing is much more useful when SQL dialect differences do not immediately get in the way.&lt;/p&gt;

&lt;p&gt;QueryFlux integrates dialect-only translation through sqlglot. When needed, it can rewrite SQL into the target engine’s dialect during dispatch.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clients can keep speaking the SQL they naturally emit&lt;/li&gt;
&lt;li&gt;QueryFlux can normalize for the backend that will actually execute the query&lt;/li&gt;
&lt;li&gt;teams do not need to maintain multiple versions of the same query only because engines differ&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current design is disciplined here. What is implemented today is dialect-only translation. Schema-aware translation is explicitly on the roadmap.&lt;/p&gt;

&lt;p&gt;That is a good balance: the system is already useful now, and the path to deeper translation is clear.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability is part of the product, not an add-on
&lt;/h2&gt;

&lt;p&gt;A routing layer only works if operators can see what it is doing.&lt;/p&gt;

&lt;p&gt;QueryFlux includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus metrics&lt;/li&gt;
&lt;li&gt;Grafana dashboard&lt;/li&gt;
&lt;li&gt;Admin REST API&lt;/li&gt;
&lt;li&gt;QueryFlux Studio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current observability surface covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;query counts&lt;/li&gt;
&lt;li&gt;query duration&lt;/li&gt;
&lt;li&gt;translation metrics&lt;/li&gt;
&lt;li&gt;running queries&lt;/li&gt;
&lt;li&gt;queued queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also supports routing traces, which matters in practice. When you introduce a routing layer, one of the first questions engineers ask is: why did this query land there? QueryFlux has a real answer to that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this becomes useful quickly
&lt;/h2&gt;

&lt;p&gt;The value of QueryFlux is easier to see in real scenarios than in abstract feature lists.&lt;/p&gt;

&lt;h3&gt;
  
  
  A multi-engine platform
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;BI tools connect through one access layer&lt;/li&gt;
&lt;li&gt;different workloads are routed to the engines they fit best&lt;/li&gt;
&lt;li&gt;backend topology becomes configuration instead of client code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Dashboard SLA protection
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;low-latency groups can be protected with concurrency limits&lt;/li&gt;
&lt;li&gt;overflow can queue or spill instead of degrading the serving path&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Incremental engine migration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;weighted routing makes gradual traffic shifts possible&lt;/li&gt;
&lt;li&gt;clients do not need to change while the migration happens&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Mixed workloads on shared data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;batch, interactive, and exploratory traffic can be separated by policy&lt;/li&gt;
&lt;li&gt;routing intent lives in one place instead of being spread across the stack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are practical benefits. They show up immediately once a platform becomes multi-engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started is intentionally simple
&lt;/h2&gt;

&lt;p&gt;One of the nice things about the project is that the first-run experience is straightforward.&lt;/p&gt;

&lt;p&gt;A minimal setup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/lakeops-org/queryflux.git
&lt;span class="nb"&gt;cd &lt;/span&gt;queryflux/examples/minimal-trino
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;QueryFlux on &lt;code&gt;http://localhost:8080&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Trino direct on &lt;code&gt;http://localhost:8081&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Admin API on &lt;code&gt;http://localhost:9000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Studio on &lt;code&gt;http://localhost:3000&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Postgres on &lt;code&gt;localhost:5433&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can then send a simple query through the Trino HTTP frontend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/v1/statement &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-Trino-User: dev"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"SELECT 42"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are also examples for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a minimal in-memory setup&lt;/li&gt;
&lt;li&gt;a Prometheus + Grafana stack&lt;/li&gt;
&lt;li&gt;a full stack with Trino, StarRocks, and Iceberg-related services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination is important. The system is conceptually ambitious, but the on-ramp is short.&lt;/p&gt;

&lt;p&gt;It feels like deep infrastructure without feeling heavy to try.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is already shipped
&lt;/h2&gt;

&lt;p&gt;QueryFlux already includes a substantial set of capabilities on the main branch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trino HTTP frontend&lt;/li&gt;
&lt;li&gt;PostgreSQL wire frontend&lt;/li&gt;
&lt;li&gt;MySQL wire frontend&lt;/li&gt;
&lt;li&gt;Arrow Flight SQL frontend&lt;/li&gt;
&lt;li&gt;Admin REST API&lt;/li&gt;
&lt;li&gt;Trino backend&lt;/li&gt;
&lt;li&gt;DuckDB backend&lt;/li&gt;
&lt;li&gt;StarRocks backend&lt;/li&gt;
&lt;li&gt;Athena backend&lt;/li&gt;
&lt;li&gt;ordered router chains and routing fallback&lt;/li&gt;
&lt;li&gt;route tracing support&lt;/li&gt;
&lt;li&gt;per-group concurrency limits&lt;/li&gt;
&lt;li&gt;proxy-side queueing&lt;/li&gt;
&lt;li&gt;multiple load-balancing strategies&lt;/li&gt;
&lt;li&gt;health-aware cluster selection&lt;/li&gt;
&lt;li&gt;dialect-only translation through sqlglot&lt;/li&gt;
&lt;li&gt;in-memory persistence&lt;/li&gt;
&lt;li&gt;PostgreSQL persistence&lt;/li&gt;
&lt;li&gt;authentication providers including none, static, OIDC, and LDAP&lt;/li&gt;
&lt;li&gt;authorization modes including allow-all, simple policy, and OpenFGA&lt;/li&gt;
&lt;li&gt;Prometheus metrics&lt;/li&gt;
&lt;li&gt;Grafana dashboard&lt;/li&gt;
&lt;li&gt;QueryFlux Studio&lt;/li&gt;
&lt;li&gt;dynamic config reload from Postgres&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because the project already feels like infrastructure, not just an idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;The roadmap extends the same core design.&lt;/p&gt;

&lt;p&gt;Near-term work includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;schema-aware SQL translation&lt;/li&gt;
&lt;li&gt;ClickHouse backend and HTTP frontend&lt;/li&gt;
&lt;li&gt;richer routing telemetry in Studio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Medium-term work includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cost- and performance-aware routing&lt;/li&gt;
&lt;li&gt;Snowflake backend&lt;/li&gt;
&lt;li&gt;BigQuery backend&lt;/li&gt;
&lt;li&gt;Redis persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That roadmap makes sense. It deepens the same access layer instead of changing the project’s center of gravity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving the data access side
&lt;/h2&gt;

&lt;p&gt;The most interesting thing about QueryFlux is not that it is a proxy.&lt;/p&gt;

&lt;p&gt;It is that it is a carefully placed layer in a part of the modern data stack that is still surprisingly underbuilt.&lt;/p&gt;

&lt;p&gt;Open table formats solved the data side.&lt;/p&gt;

&lt;p&gt;QueryFlux is solving the access side.&lt;/p&gt;

&lt;p&gt;That creates an appealing combination:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;conceptually clean architecture&lt;/li&gt;
&lt;li&gt;obvious operational benefits&lt;/li&gt;
&lt;li&gt;room for sophisticated policy and routing logic&lt;/li&gt;
&lt;li&gt;low-friction adoption path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It feels like the kind of infrastructure that becomes more valuable as the rest of the stack becomes more heterogeneous.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;Once a data platform becomes multi-engine, the missing piece is usually not another engine.&lt;br&gt;&lt;br&gt;
It is the access layer.&lt;br&gt;&lt;br&gt;
Clients still need to know where to connect. Routing still leaks into tools and applications. SQL dialect differences still show up at the edges. Capacity handling is still fragmented.&lt;br&gt;&lt;br&gt;
QueryFlux gives that layer a shape.&lt;br&gt;&lt;br&gt;
It makes multi-engine access easier to reason about, easier to operate, and easier to evolve.&lt;br&gt;&lt;br&gt;
That is why it is a compelling project: the idea is deep, the benefits are immediate, and the first experience is simple.&lt;br&gt;&lt;br&gt;
To try it out visit: &lt;a href="https://queryflux.dev/" rel="noopener noreferrer"&gt;https://queryflux.dev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>dataengineering</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>11 Compaction Optimizations for Iceberg Data Lakes</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Mon, 16 Feb 2026 12:55:54 +0000</pubDate>
      <link>https://dev.to/jonisar/11-compaction-optimizations-for-iceberg-data-lakes-52h2</link>
      <guid>https://dev.to/jonisar/11-compaction-optimizations-for-iceberg-data-lakes-52h2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F742rq0qi27n0m42m1rka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F742rq0qi27n0m42m1rka.png" alt="An Iceberg control plane provides automated, optimized compaction" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Compaction should provide an easy solution to a very difficult problem: controlling file count, keeping the cost of delete files from dominating read costs, and keeping metadata growth from turning every query plan into a deep, time-consuming walk through snapshots and manifests.&lt;/p&gt;

&lt;p&gt;Compaction is the data layer's mechanism for solving these issues, but only if it runs under a defined set of rules: what scope each compaction covers, what thresholds trigger a run, and how runs are synchronized with snapshot expiration and manifest maintenance.&lt;/p&gt;

&lt;p&gt;Compaction can be run manually via scripts and schedules, or automatically by a control plane.&lt;/p&gt;

&lt;p&gt;Manual scripts can manage compaction effectively for a small number of tables and a single engine. But as soon as there are many tables, or multiple engines, the manual process becomes guesswork: scripts may rewrite too much, run too infrequently, or interfere with ongoing ingestion, and they churn both snapshots and manifests.&lt;/p&gt;

&lt;p&gt;A control plane flips this model completely around.&lt;/p&gt;

&lt;p&gt;Instead of rewriting everything all the time, a control plane continuously monitors the health and workload characteristics of tables. Then, only when necessary, a control plane spends rewrite budget on the parts of the tables that actually change performance or cost, while also managing the entire lifecycle of maintaining the table.&lt;/p&gt;

&lt;p&gt;This article will teach you how to run compaction the way production lakes do: how to choose your baseline strategy (bin-packing vs. sorting), how to avoid rewriting healthy partitions, how to limit the scope of each compaction so maintenance remains invisible, how to focus on hot and delete-heavy areas first, how to keep the continuous commit cadence of streaming data from becoming a snapshot factory, and how to synchronize compaction with the metadata cleanup needed for stable query planning.&lt;/p&gt;

&lt;p&gt;For further reading on other aspects of optimizing and maintaining Iceberg, please also refer to:&lt;/p&gt;


&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://overcast.blog/7-best-compaction-engines-for-apache-iceberg/" rel="noopener noreferrer"&gt;7 Best Compaction Engines for Apache Iceberg&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://overcast.blog/11-iceberg-performance-optimizations-you-should-know/" rel="noopener noreferrer"&gt;11 Iceberg Performance Optimizations You Should Know&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://overcast.blog/9-apache-iceberg-table-maintenance-tools-you-should-know/" rel="noopener noreferrer"&gt;9 Apache Iceberg Table Maintenance Tools You Should Know&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's move on to the compaction strategies that actually work in real production lakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Add a Control Plane for 20x Faster Compaction and Optimized Table Maintenance
&lt;/h2&gt;

&lt;p&gt;If you add a control plane to your lake, LakeOps brings an intelligent compaction engine and will also manage and optimize table maintenance and lake operations for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwo5b6j2npuehmuta4r3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwo5b6j2npuehmuta4r3.png" alt=" " width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Snowflake-like experience for Iceberg with 10x performance (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;Instead of being confined to fixed schedules, LakeOps operates as a control plane for Iceberg tables that knows when, what, and how to compact. It treats compaction as a continuous operational problem rather than a periodic batch job, and optimizes it in real time.&lt;/p&gt;

&lt;p&gt;It analyzes telemetry data from query engines and Iceberg catalogs and uses that data to decide when compaction is actually needed, what to compact, and how. It takes actual usage patterns into account as well.&lt;/p&gt;

&lt;p&gt;Under the hood, LakeOps uses a dedicated Rust-based compaction engine that is designed specifically for Iceberg layouts and metadata behavior. Compaction is coordinated with snapshot expiration, manifest rewrites, orphan cleanup, and statistics maintenance so these operations reinforce each other instead of fighting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmtgryzwbhhygpmnmn9s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmtgryzwbhhygpmnmn9s.png" alt=" " width="800" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results are ~20x faster compaction, ~15x faster queries, and ~80% CPU/storage cost savings.&lt;/p&gt;

&lt;p&gt;🚢 Apache Iceberg compaction is not “background maintenance.” It’s a time-critical optimization problem that directly impacts query latency, metadata growth, and infrastructure cost.&lt;/p&gt;

&lt;p&gt;Learn more about it here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/posts/amit-gilad_apache-iceberg-compaction-time-critical-optimization-activity/" rel="noopener noreferrer"&gt;Apache Iceberg Compaction: Time-Critical Optimization | Amit Gilad&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition to compaction, LakeOps gives you control with manual and autopilot modes for all maintenance operations on your tables and coordinates them with compaction. That includes expiring snapshots, manifest rewrites, orphan file cleanup, and more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0rrirvr0hsr1al47v0r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0rrirvr0hsr1al47v0r.png" alt=" " width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Real-time compaction and maintenance optimization with a control plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;You can choose between manual mode and autopilot, per table or for groups of tables, to control compaction and maintenance processes.&lt;/p&gt;

&lt;p&gt;LakeOps also lets you define policies across the lake to enforce your standards, and provides you with dashboards to see and manage all compaction and maintenance processes per table and for the entire lake.&lt;/p&gt;

&lt;p&gt;Learn more: &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;https://lakeops.dev&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Use bin-pack as the baseline correction
&lt;/h2&gt;

&lt;p&gt;Most Iceberg tables do not require complex layout schemes. Nearly all of them, however, suffer from file fragmentation.&lt;/p&gt;

&lt;p&gt;Before attempting to fix the problem with sort-based layouts, clustering, or partitioning, look at the most obvious source of fragmentation: the write path itself. In almost all cases, the initial performance decline comes from streaming ingestion landing very small batches, micro-batch commits happening very frequently, and backfill data arriving in unevenly sized chunks.&lt;/p&gt;

&lt;p&gt;As a result, many small Parquet files accumulate within each partition. None of these files are "broken," and queries still return accurate answers. But as the count grows, planning time increases, task-scheduling overhead grows, and the number of object store calls climbs.&lt;/p&gt;

&lt;p&gt;This is not a layout issue; it is a file count issue.&lt;/p&gt;

&lt;p&gt;The easiest and most reliable way to solve the file count issue is bin-pack compaction. It combines small data files into a smaller number of larger, properly sized files. It does not alter the existing sort order or re-cluster data; it merely normalizes file sizes and reduces the metadata overhead of a high file count.&lt;/p&gt;

&lt;p&gt;In practice, this is usually sufficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Approach to Compaction Really Does Improve Performance
&lt;/h3&gt;

&lt;p&gt;Iceberg engines operate at the file level. The more files a table has, the more the engine must plan:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more manifest entries must be read&lt;/li&gt;
&lt;li&gt;more file footers must be inspected&lt;/li&gt;
&lt;li&gt;more scan tasks must be scheduled&lt;/li&gt;
&lt;li&gt;more file references to delete must be tracked&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As the number of files grows, so does planning time. Bin-pack compaction reduces the number of physical files while preserving the existing logical layout. The result is fewer planning reads and fewer tasks to schedule, without requiring an additional shuffle.&lt;/p&gt;

&lt;p&gt;A good rule of thumb for most production tables is to target file sizes between 128 MB and 512 MB. The specific range depends on the engine and the workload. What matters is consistency.&lt;/p&gt;
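&lt;p&gt;The long byte values that appear in the SQL comments in this article are just these sizes written out; a quick sanity check:&lt;/p&gt;

```python
# The raw byte values behind the size comments in the SQL examples.
MB = 1024 * 1024
min_file_size = 128 * MB      # 134217728 bytes ('min-file-size-bytes')
target_file_size = 512 * MB   # 536870912 bytes ('target-file-size-bytes')
```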
&lt;h3&gt;
  
  
  Start with the Default Rewriting Method
&lt;/h3&gt;

&lt;p&gt;Unless you specify otherwise, Iceberg uses the bin-pack rewriting method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the effectiveness of the rewrite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_size_in_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_size_gb&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You want to see fewer files and the same total size. If the total size changes substantially, something other than file fragmentation is going on.&lt;/p&gt;
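&lt;p&gt;If you automate this check, the invariant is easy to encode. The helper below is a hypothetical sketch: it flags a rewrite as healthy only when the file count dropped and total bytes stayed roughly constant.&lt;/p&gt;

```python
# Hypothetical post-compaction sanity check: fewer files, same total size.

def compaction_looks_healthy(before, after, tolerance=0.02):
    """before/after are (file_count, total_bytes) tuples, e.g. captured
    from the table's `files` metadata before and after the rewrite."""
    fewer_files = after[0] < before[0]
    size_drift = abs(after[1] - before[1]) / before[1]
    return fewer_files and size_drift <= tolerance

# 1,800 small files compacted down to 120, total bytes unchanged:
ok = compaction_looks_healthy((1800, 10**12), (120, 10**12))
```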

&lt;h3&gt;
  
  
  Specify Your Target File Size
&lt;/h3&gt;

&lt;p&gt;If file fragmentation continues, specify a target file size at the table level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;TBLPROPERTIES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="s1"&gt;'write.target-file-size-bytes'&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt; &lt;span class="c1"&gt;-- 512MB&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then perform the rewrite with the same target file size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt; 
  &lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without an explicit target, different engines and writers will produce files of varying sizes, and subsequent compactions will keep having to normalize them back to target.&lt;/p&gt;

&lt;h3&gt;
  
  
  Define Thresholds to Prevent Unnecessary Rewrites
&lt;/h3&gt;

&lt;p&gt;At scale, performing rewrites on healthy data wastes compute resources. Define the following thresholds to prevent unnecessary rewrites:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'134217728'&lt;/span&gt; &lt;span class="c1"&gt;-- 128MB &lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that only file groups containing at least five input files are rewritten. Partitions that already hold one or two reasonably sized files are left alone.&lt;/p&gt;


&lt;h2&gt;
  
  
  3. Conditional Compaction
&lt;/h2&gt;

&lt;p&gt;Another way to waste compute cycles in an Iceberg lake is to blindly compact data on a regular basis.&lt;/p&gt;

&lt;p&gt;It usually begins innocently. A periodic rewrite job is created to "keep things tidy." For a while, it appears to help. Eventually, however, it begins rewriting partitions that were already healthy. Each rewrite generates new files, creates a new snapshot, and updates the manifest list. None of the files are "broken," but the system is spending compute on rewrites that do not improve performance.&lt;/p&gt;

&lt;p&gt;Compaction should be used as a corrective measure, not as a routine activity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Unconditional Rewrites Cause Churn
&lt;/h3&gt;

&lt;p&gt;Each time you rewrite data files, Iceberg:&lt;/p&gt;

&lt;p&gt;Creates new data files&lt;br&gt;&lt;br&gt;
Generates a new snapshot&lt;br&gt;&lt;br&gt;
Updates manifest lists&lt;br&gt;&lt;br&gt;
Increases metadata history&lt;/p&gt;

&lt;p&gt;If the files being rewritten are already close to their target size, you are essentially cycling data through the system. Over time, that churn deepens the metadata history and lengthens planning.&lt;/p&gt;

&lt;p&gt;At scale, this overhead becomes noticeable.&lt;/p&gt;

&lt;p&gt;Your objective is not to compact frequently. It is to compact when the layout is measurably unhealthy.&lt;/p&gt;
&lt;h3&gt;
  
  
  Implement Gateways to Control Rewrites
&lt;/h3&gt;

&lt;p&gt;Iceberg's rewrite_data_files procedure lets you gate rewrites behind eligibility conditions. The most effective is min-input-files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'3'&lt;/span&gt; 
  &lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using this condition, Iceberg will only rewrite file groups that contain at least three files. Partitions that already have one or two files that are of reasonable size are excluded.&lt;/p&gt;

&lt;p&gt;This is a relatively small change to make, but in large lakes, it will significantly reduce unnecessary compaction.&lt;/p&gt;

&lt;p&gt;You can implement additional conditions to make this gateway even tighter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="s1"&gt;'min-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'134217728'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- 128MB &lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt; &lt;span class="c1"&gt;-- 512MB &lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compaction will now only run when the following conditions are met:&lt;/p&gt;

&lt;p&gt;There are sufficient small files to warrant consolidation&lt;br&gt;&lt;br&gt;
Files are currently below a reasonable size threshold&lt;br&gt;&lt;br&gt;
There is a valid target to normalize to&lt;/p&gt;

&lt;p&gt;This converts compaction from an automatic rewrite into a targeted repair operation.&lt;/p&gt;
&lt;h3&gt;
  
  
  Let the Table State Drive the Decision
&lt;/h3&gt;

&lt;p&gt;Prior to initiating a compaction operation, review the distribution of files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_size_in_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_mb&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt; 
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt; 
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt; 
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a partition has four files averaging 480 MB and your target file size is 512 MB, rewriting the partition will not materially affect either planning or scan time.&lt;/p&gt;

&lt;p&gt;However, if another partition has 180 files averaging 25 MB, that partition is a prime candidate for compaction.&lt;/p&gt;

&lt;p&gt;Compaction decisions should be based on signals such as this. Not a schedule.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Limit Rewrite Scope Per Run
&lt;/h2&gt;

&lt;p&gt;Big compaction jobs appear great on paper. In reality, they are among the easiest ways to create instability in a production lake.&lt;/p&gt;

&lt;p&gt;Backfills, partition evolutions, or long stretches of time without maintenance can quickly turn terabytes of data into rewrite candidates. If you don't specify any boundaries, Iceberg will rewrite everything that meets its criteria. The outcome is well understood: long-running jobs, significant shuffles, large amounts of object store I/O, significant increases in snapshot sizes, and sometimes even cluster contention with users' queries.&lt;/p&gt;

&lt;p&gt;Compaction operates at its best when it is incremental.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why "Rewrite Everything" Is A Risk
&lt;/h3&gt;

&lt;p&gt;When you rewrite a large section of a table in a single pass, you are executing a number of costly operations simultaneously:&lt;/p&gt;

&lt;p&gt;Reading numerous data files&lt;br&gt;&lt;br&gt;
Shuffling and rewriting them&lt;br&gt;&lt;br&gt;
Creating many new files&lt;br&gt;&lt;br&gt;
Committing a large new snapshot&lt;br&gt;&lt;br&gt;
Possibly rewriting manifests&lt;/p&gt;

&lt;p&gt;Even when it succeeds, you have produced a major maintenance event. If it fails partway through, you have wasted the compute and extended your maintenance window.&lt;/p&gt;

&lt;p&gt;Operationally, smaller and more frequent corrections are safer than infrequent large-scale rewrites.&lt;/p&gt;
&lt;h3&gt;
  
  
  Restrict rewrite size explicitly
&lt;/h3&gt;

&lt;p&gt;The rewrite_data_files procedure accepts options that limit how much work a single run performs.&lt;/p&gt;

&lt;p&gt;For example, you can cap the number of file group rewrites per run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'max-file-group-rewrites'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'20'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This caps how many file groups are rewritten in a single invocation of rewrite_data_files. Instead of rewriting hundreds of partitions at once, you run a controlled sequence of bounded passes.&lt;/p&gt;
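&lt;p&gt;The arithmetic of bounded passes is straightforward. As a sketch (plain Python, with the actual rewrite call abstracted away), draining a backlog of fragmented file groups with a per-run cap looks like this:&lt;/p&gt;

```python
# Sketch of incremental maintenance: drain a backlog of eligible file
# groups in bounded passes rather than one enormous rewrite. Each pass
# stands in for one capped rewrite_data_files run.

def passes_needed(eligible_group_count, max_groups_per_pass):
    """How many bounded runs it takes to compact the whole backlog."""
    passes = 0
    remaining = eligible_group_count
    while remaining > 0:
        remaining -= min(remaining, max_groups_per_pass)
        passes += 1
    return passes

# 87 fragmented file groups, capped at 20 group rewrites per run:
runs = passes_needed(87, 20)   # 5 bounded runs instead of one huge job
```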

&lt;p&gt;You can also utilize these with eligibility thresholds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'max-file-group-rewrites'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'20'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compaction now becomes predictable. Each pass of compaction corrects a finite amount of drift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Disperse correction over cycles
&lt;/h3&gt;

&lt;p&gt;If a table has accumulated months of small files from high-frequency writes, there is rarely a good reason to try to "fix everything tonight."&lt;/p&gt;

&lt;p&gt;Instead, a more sustainable, steady-state approach is:&lt;/p&gt;

&lt;p&gt;Compact data with limited rewrite scope.&lt;br&gt;&lt;br&gt;
Permit normal operation of user workloads.&lt;br&gt;&lt;br&gt;
Repeat on the subsequent maintenance cycle.&lt;/p&gt;

&lt;p&gt;Within a couple of cycles, fragmentation will decrease significantly, without generating a maintenance peak.&lt;/p&gt;

&lt;p&gt;This also produces less "snapshot shock." Instead of a large, single-pass rewrite snapshot replacing nearly half of the table, you generate a series of smaller, incremental snapshots.&lt;/p&gt;
&lt;h3&gt;
  
  
  Prioritize rather than rewrite randomly
&lt;/h3&gt;

&lt;p&gt;Once you limit rewrite scope, prioritization becomes essential.&lt;/p&gt;

&lt;p&gt;Practically speaking, you want to rewrite:&lt;/p&gt;

&lt;p&gt;Partition groups with the most files.&lt;br&gt;&lt;br&gt;
Partition groups with the smallest average file size.&lt;br&gt;&lt;br&gt;
Partition groups with the most accumulated deletes.&lt;br&gt;&lt;br&gt;
Partition groups with the heaviest query traffic.&lt;/p&gt;

&lt;p&gt;You can find the worst offending partition groups using the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_size_in_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_mb&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then either target the offending partition groups directly with a where predicate, or let your orchestration layer decide which groups to rewrite first for maximum impact.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_date &amp;gt;= DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-01-01&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'max-file-group-rewrites'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'10'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This limits both the logical scope (only recent partitions) and the physical rewrite volume.&lt;/p&gt;
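&lt;p&gt;In an orchestration layer, the prioritization can be a few lines of glue. The helper below is a hypothetical sketch: it takes rows shaped like the output of the files-metadata query above (partition, file_count, avg_mb) and returns the highest-impact candidates first.&lt;/p&gt;

```python
# Hypothetical orchestration helper: rank partitions from the `files`
# metadata query and return the highest-impact compaction candidates.

def pick_compaction_targets(partition_stats, min_files=5, max_avg_mb=128, limit=10):
    """partition_stats: list of (partition, file_count, avg_mb) rows."""
    candidates = [
        row for row in partition_stats
        if row[1] >= min_files and row[2] < max_avg_mb
    ]
    # Worst offenders first: most files, then smallest average size.
    candidates.sort(key=lambda row: (-row[1], row[2]))
    return [row[0] for row in candidates[:limit]]

stats = [
    ("2026-02-09", 180, 25.0),  # badly fragmented: compact first
    ("2026-02-08", 42, 96.0),   # fragmented: compact next
    ("2026-02-01", 4, 480.0),   # healthy: leave alone
]
targets = pick_compaction_targets(stats)
# -> ["2026-02-09", "2026-02-08"]
```

Each returned partition can then be fed into a where predicate for a bounded rewrite run.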

&lt;h3&gt;
  
  
  Keep Maintenance Invisible to Users
&lt;/h3&gt;

&lt;p&gt;The ultimate objective of limiting rewrite scope is not merely cluster stability. It is predictability.&lt;/p&gt;

&lt;p&gt;When each compaction run is small and bounded:&lt;/p&gt;

&lt;p&gt;Maintenance windows are brief.&lt;br&gt;&lt;br&gt;
Resource spikes are under control.&lt;br&gt;&lt;br&gt;
Snapshots grow gradually.&lt;br&gt;&lt;br&gt;
Query performance improves incrementally, rather than suddenly.&lt;/p&gt;

&lt;p&gt;In production lakes, stability is usually more important than the rate of correction. Incremental correction is generally preferred to dramatic restructuring.&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Focus On Hot Partition Groups
&lt;/h2&gt;

&lt;p&gt;In the majority of Iceberg tables, compaction impact is not uniformly distributed.&lt;/p&gt;

&lt;p&gt;A small set of partitions is responsible for most of the pain: they receive the most writes (so they fragment fastest) and the most reads (so every additional file shows up as both planning and scan overhead). If you rewrite only the "hot" partitions, you typically gain 80% of the benefit for a fraction of the rewrite volume.&lt;/p&gt;

&lt;p&gt;The simplest approach to achieve this is to treat compaction as a rolling window problem.&lt;/p&gt;
&lt;h3&gt;
  
  
  "Hot" Generally Means Two Things
&lt;/h3&gt;

&lt;p&gt;Hot partitions generally represent partitions that are still active:&lt;/p&gt;

&lt;p&gt;They are still receiving new files from streaming or micro-batch systems&lt;br&gt;&lt;br&gt;
They are the partitions that your analysts / dashboards / downstream jobs are accessing constantly.&lt;/p&gt;

&lt;p&gt;This results in two operational principles:&lt;/p&gt;

&lt;p&gt;Compact relatively recent partitions regularly, since they will accumulate the most small files.&lt;br&gt;&lt;br&gt;
Do not compact the actively written partitions unless you know you can tolerate collisions with writers.&lt;/p&gt;

&lt;p&gt;AWS gives the same guidance for Iceberg compaction: use a where predicate to exclude actively written partitions, so you avoid data conflicts with writers and leave only metadata conflicts, which Iceberg can normally resolve.&lt;/p&gt;
&lt;h3&gt;
  
  
  Identify Your Rolling Window With Where
&lt;/h3&gt;

&lt;p&gt;Iceberg's Spark procedure provides a where predicate for filtering which files (and hence which partitions) qualify for rewriting.&lt;/p&gt;

&lt;p&gt;An extremely common use case is "Compact everything older than the current ingest window":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_date &amp;lt; DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-10&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'536870912'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'5'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps compaction away from the partitions that are currently being written, while continually cleaning up yesterday's data and older.&lt;/p&gt;

&lt;p&gt;If your table is partitioned hourly, apply the same idea at hourly granularity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_hour &amp;lt; TIMESTAMP &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-11 12:00:00&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'536870912'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'5'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main principle here is not the specific cut-off. It is maintaining a buffer so that compaction does not conflict with ingestion.&lt;/p&gt;
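&lt;p&gt;The rolling window is easy to automate. A minimal Python sketch that builds the cut-off predicate for a scheduled job (the &lt;code&gt;event_date&lt;/code&gt; column name and the one-day buffer are illustrative):&lt;/p&gt;

```python
from datetime import date, timedelta

def compaction_cutoff_predicate(today: date, buffer_days: int = 1) -> str:
    """Build a rolling-window predicate that excludes the current ingest window.

    buffer_days controls how close to "now" compaction is allowed to reach.
    """
    cutoff = today - timedelta(days=buffer_days)
    return f"event_date < DATE '{cutoff.isoformat()}'"

# With a one-day buffer, compaction on 2026-02-11 stops at 2026-02-10.
print(compaction_cutoff_predicate(date(2026, 2, 11)))
# event_date < DATE '2026-02-10'
```

&lt;p&gt;Feed the returned string to the &lt;code&gt;where&lt;/code&gt; argument of &lt;code&gt;rewrite_data_files&lt;/code&gt; and the window advances on its own each day.&lt;/p&gt;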

&lt;h3&gt;
  
  
  Locate the Worst Partition Groups First
&lt;/h3&gt;

&lt;p&gt;Even within the "hot-ish" window, not all partition groups are equally bad. You can typically find the worst offender simply by examining the file count and average size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_size_in_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_mb&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;file_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you plan to compact only a limited portion of the data per pass (as you probably should), this query shows where you will get the largest initial benefit.&lt;/p&gt;

&lt;h3&gt;
  
  
  If You Must Compact Partition Groups That Are Still Receiving Late Data
&lt;/h3&gt;

&lt;p&gt;Some workloads receive late-arriving events, updates, or merges that keep older partition groups "active." If you compact them regardless, you will periodically collide with writers.&lt;/p&gt;

&lt;p&gt;Iceberg includes a partial progress mode that commits compaction in smaller chunks rather than one large commit, so a conflict only forces the retry of a small piece of work instead of the whole job.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_date &amp;gt;= DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-01&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt; AND event_date &amp;lt; DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-10&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'partial-progress.enabled'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'partial-progress.max-commits'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'10'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'max-concurrent-file-group-rewrites'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'10'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You are trading "one clean commit" for "multiple smaller commits that fail more cheaply." In real production lakes with continuous write activity, that trade is usually worthwhile.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where a Control Plane Helps
&lt;/h3&gt;

&lt;p&gt;Once you have many tables, "hot partition groups" stops being something you can track by intuition. You need a loop that continuously identifies the hot partition groups from your system's actual read/write activity and then applies the rolling-window idea to them.&lt;/p&gt;

&lt;p&gt;That is where a control plane such as LakeOps becomes useful: it is not adding a new compaction algorithm so much as deciding where to spend your rewrite budget based on real workload telemetry, and applying that decision consistently across hundreds of tables.&lt;/p&gt;
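&lt;p&gt;To make the idea concrete, here is a toy version of such a loop in plain Python. It is not LakeOps or any real product; the telemetry fields and thresholds are invented for illustration:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class PartitionStats:
    name: str
    file_count: int
    avg_file_mb: float
    writes_last_hour: int  # ingest activity, from your own telemetry

def pick_rewrite_targets(stats, budget_files=5000, min_files=50, max_avg_mb=64):
    """Spend a fixed rewrite budget on the worst cold partitions.

    Partitions still receiving writes are skipped (the rolling-window idea);
    the rest are ranked by fragmentation. All thresholds are illustrative.
    """
    cold = [s for s in stats if s.writes_last_hour == 0]
    fragmented = [s for s in cold
                  if s.file_count >= min_files and s.avg_file_mb <= max_avg_mb]
    fragmented.sort(key=lambda s: s.file_count, reverse=True)

    targets, spent = [], 0
    for s in fragmented:
        if spent + s.file_count > budget_files:
            break
        targets.append(s.name)
        spent += s.file_count
    return targets
```

&lt;p&gt;Each selected partition then gets a scoped &lt;code&gt;rewrite_data_files&lt;/code&gt; call; anything still hot simply waits for the next pass.&lt;/p&gt;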

&lt;h2&gt;
  
  
  6. Sort or Z-Order When Scan Efficiency Is The Bottleneck
&lt;/h2&gt;

&lt;p&gt;Bin-packing compaction decreases the number of files. However, it does not affect how the data is organized internally within those files.&lt;/p&gt;

&lt;p&gt;If partitions are appropriately sized and file counts are reasonable, but selective queries scan far more data files than expected, the cause is likely clustering. The typical pattern: planning times are consistent and partition pruning works, yet filtered queries read a large percentage of the files within a given partition. This is where sort-based compaction becomes relevant.&lt;/p&gt;

&lt;p&gt;Query engines rely on per-file statistics (such as min and max values) to decide whether a file can be skipped. When data is written in random order, value ranges overlap heavily between files, so even selective predicates cannot exclude many of them. Sorting changes this: when data is ordered by a frequently filtered column, each file covers a narrower value range, more files can be skipped by the predicate, and fewer bytes are scanned.&lt;/p&gt;
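&lt;p&gt;A toy model makes the pruning effect easy to see. The tuples below stand in for per-file min/max statistics; this illustrates the mechanism, not Iceberg's actual metadata layout:&lt;/p&gt;

```python
def files_to_scan(file_stats, cutoff):
    """file_stats: list of (min_val, max_val) per file.

    A predicate "value < cutoff" can skip any file whose min is already
    >= cutoff, so only files whose range starts below the cutoff are read.
    """
    return [(lo, hi) for (lo, hi) in file_stats if lo < cutoff]

# Randomly written data: value ranges overlap, so nothing can be skipped.
random_layout = [(1, 95), (3, 99), (2, 97), (5, 98)]
# Sorted data: each file covers a narrow, mostly disjoint range.
sorted_layout = [(1, 25), (26, 50), (51, 75), (76, 99)]

assert len(files_to_scan(random_layout, 30)) == 4  # every file might match
assert len(files_to_scan(sorted_layout, 30)) == 2  # half the files pruned
```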

&lt;p&gt;If there is one column that dominates your predicates (e.g., event_time within a date-partitioned table), a simple sort-based rewrite is typically sufficient:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'sort'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;sort_order&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_time ASC NULLS LAST'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This rearranges the rows of data within files so that time-based predicates can remove more files early in the process. The effect is evident not only in terms of runtime, but also in the reduction in scanned bytes and the number of splits generated.&lt;/p&gt;

&lt;p&gt;If your workload filters on multiple columns (e.g., user_id, event_type, and occasionally device_type), Z-order is typically a superior choice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'sort'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;sort_order&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'zorder(user_id, event_type)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'min-input-files'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Z-ordering enhances locality across multiple dimensions. While Z-ordering will never perfectly optimize any individual column, it typically minimizes overall scan expansion when filter patterns vary.&lt;/p&gt;
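&lt;p&gt;For intuition, the classic Z-order construction interleaves the bits of the clustered columns, so rows that are close in either dimension end up close in the sort order. This is a simplified sketch of the curve itself, not Iceberg's implementation:&lt;/p&gt;

```python
def z_value(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of two column values into a single Z-order key.

    Sorting rows by this key keeps rows that are near each other in (x, y)
    near each other on disk, which is what lets min/max stats prune on
    either column.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z

# Neighbors in both dimensions map to nearby Z values.
assert [z_value(x, y) for (x, y) in [(0, 0), (1, 0), (0, 1), (1, 1)]] == [0, 1, 2, 3]
```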

&lt;p&gt;Important Note: Defining a sort order at the table level does not rewrite historical data. It only affects newly-written data. All existing files will remain unmodified until a rewrite occurs.&lt;/p&gt;

&lt;p&gt;It is typical to define a sort order and then expect improvements, only to discover that no changes occurred to the physical layout of the data.&lt;/p&gt;

&lt;p&gt;After a sort-based compaction, verify it properly: check how many files common predicates scan, and compare total bytes scanned before and after. Runtime is hard to measure in shared environments; file count and total bytes scanned are more reliable metrics.&lt;/p&gt;

&lt;p&gt;Sorting is more resource-intensive than bin-packing: it adds shuffle and CPU overhead during compaction. Applied blindly to every partition, its maintenance cost can exceed the query gains. Sorting works best when applied selectively: target high-traffic partitions, align the sort with real filter patterns, and apply rewrites incrementally.&lt;/p&gt;

&lt;p&gt;When scan efficiency is the primary bottleneck rather than file count, sorting or Z-order is one of the few techniques that will reliably enhance pruning. The key is to apply sorting or Z-order in a manner that aligns with the characteristics of your workload.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Compact delete-heavy partitions deliberately
&lt;/h2&gt;

&lt;p&gt;You can look at a table, see that file sizes are healthy, and believe compaction is handled - yet queries keep slowing down.&lt;/p&gt;

&lt;p&gt;A common cause is delete files.&lt;/p&gt;

&lt;p&gt;Iceberg does not rewrite data files when rows are modified or deleted. Instead, it stores position deletes or equality deletes alongside the data, and at read time the engine merges data files with their associated delete files. This makes writes efficient, but as delete files accumulate, every query pays extra cost.&lt;/p&gt;
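&lt;p&gt;A toy sketch of that merge-on-read behavior in plain Python (position deletes here are simply &lt;code&gt;(file, row_position)&lt;/code&gt; pairs; real delete files carry more structure):&lt;/p&gt;

```python
def read_with_deletes(data_files, position_deletes):
    """data_files: {file_name: [rows]}; position_deletes: set of (file_name, pos).

    Every scan must consult the delete set for every row - the extra work
    that accumulating delete files imposes on reads.
    """
    out = []
    for fname, rows in data_files.items():
        out.extend(row for pos, row in enumerate(rows)
                   if (fname, pos) not in position_deletes)
    return out

data = {"f1.parquet": ["a", "b", "c"], "f2.parquet": ["d", "e"]}
deletes = {("f1.parquet", 1), ("f2.parquet", 0)}
assert read_with_deletes(data, deletes) == ["a", "c", "e"]
```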

&lt;p&gt;The effect is subtle. File sizes appear fine. Bin-pack has already normalized fragmentation. Yet scan CPU climbs, and update-heavy partitions begin to perform worse than append-only partitions.&lt;/p&gt;

&lt;p&gt;You can usually verify this by examining the delete file distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;delete_file_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delete_files&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_date&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;delete_file_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If certain partitions contain a high concentration of delete files, that is an indication that reads are performing more work than necessary.&lt;/p&gt;

&lt;p&gt;Iceberg supports delete-aware compaction. Instead of rewriting files solely based on size, you can set a threshold on the ratio of deleted rows and have Iceberg rewrite the data files that exceed it. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_data_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'delete-ratio-threshold'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'0.3'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'remove-dangling-deletes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'true'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'target-file-size-bytes'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'536870912'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, Iceberg will rewrite data files that are severely impacted by deletes and will remove the physical representation of deleted rows as well as the associated delete files.&lt;/p&gt;

&lt;p&gt;The practical result is that queries will no longer require the merging of as many delete files at runtime; CPU decreases; scan cost stabilizes; and planning becomes easier since there are fewer auxiliary files to track.&lt;/p&gt;

&lt;p&gt;This matters mostly for tables subject to upserts, CDC pipelines, or frequent merges. Append-only event tables rarely exhibit this behavior; dimension tables and slowly changing datasets often do.&lt;/p&gt;

&lt;p&gt;As with any other compaction strategy, stay focused. Combine delete thresholds, partition filters, and rewrite limits. There is no need to rewrite the entire table just because a handful of partitions see heavy delete activity.&lt;/p&gt;

&lt;p&gt;Healthy file size does not ensure healthy performance. If delete files comprise the majority of a partition, the compaction strategy must specifically target these delete files - otherwise, read cost will continue to increase regardless of whether the layout appears to be "correct".&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Rewrite position delete files individually when needed
&lt;/h2&gt;

&lt;p&gt;Rewriting data files does not necessarily resolve all delete-related issues.&lt;/p&gt;

&lt;p&gt;In many update-intensive workloads, position delete files accumulate faster than data files are rewritten. Even if you run delete-aware compaction, you can still end up with a large number of position delete files attached to otherwise healthy data files.&lt;/p&gt;

&lt;p&gt;Even after a compaction pass reduces the number of data files, the engine still has to open and apply the delete files on every read. So as delete files accumulate, scan overhead stays higher than it should be.&lt;/p&gt;

&lt;p&gt;This is particularly prevalent in tables that receive regular upserts or merges. Tables that are subject to append-only inserts do not typically exhibit this type of behavior. However, CDC pipelines and dimension tables do.&lt;/p&gt;

&lt;p&gt;Iceberg permits the explicit rewriting of position delete files as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_position_delete_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This rewrites smaller delete files into fewer, larger ones and drops obsolete entries where possible. The objective is not merely to reduce the total file count, but to minimize the number of files the engine has to open during a read.&lt;/p&gt;

&lt;p&gt;You can also limit the scope of this rewrite, just as with data file rewrites, by specifying which partitions to rewrite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_position_delete_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'event_date &amp;gt;= DATE &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;2026-02-01&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have already rewritten the data files and performance has not improved, inspect the delete file distribution. Rewriting data files can reduce fragmentation while leaving a heavy delete layer behind; in that case, the delete files need to be rewritten separately.&lt;/p&gt;

&lt;p&gt;Similar to all of the strategies described throughout this guide, keep the rewriting of delete files focused. There is little to be gained by rewriting delete files across the entire table if only a limited number of partitions are subject to upserts.&lt;/p&gt;

&lt;p&gt;Treating delete file maintenance as its own task also keeps read cost predictable. Otherwise, even with appropriately sized data files, the accumulated overhead of applying delete files can keep growing over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Lower the commit frequency for streaming workloads
&lt;/h2&gt;

&lt;p&gt;When writing to Iceberg from streaming or micro-batch applications, the commit frequency is one of the largest factors contributing to the overall cost multiplier in the system.&lt;/p&gt;

&lt;p&gt;Each commit generates a new snapshot and new metadata work: manifest updates plus new, small data files. If you commit every few seconds, you don't simply create small files; you create a long chain of snapshots and a continuous stream of metadata churn. Nothing "breaks," but planning slows down and maintenance must continually struggle to keep pace with the growing overhead.&lt;/p&gt;
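&lt;p&gt;The arithmetic alone makes the point:&lt;/p&gt;

```python
def snapshots_per_day(commit_interval_seconds: float) -> int:
    """Snapshots produced per day at a fixed commit cadence."""
    return int(86_400 // commit_interval_seconds)

# Committing every 10 seconds versus every 5 minutes is the difference
# between thousands of snapshots a day and a few hundred.
assert snapshots_per_day(10) == 8_640
assert snapshots_per_day(60) == 1_440
assert snapshots_per_day(300) == 288
```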

&lt;p&gt;The frustrating aspect is that teams typically attempt to resolve this issue by applying more compaction, while the true solution lies upstream: stop committing as often.&lt;/p&gt;

&lt;h3&gt;
  
  
  The benefits of modifying the commit frequency
&lt;/h3&gt;

&lt;p&gt;When you increase the interval between commits, you generally gain three tangible benefits at once.&lt;/p&gt;

&lt;p&gt;First, you generate fewer snapshots, resulting in fewer pieces of metadata that the engine has to evaluate during planning.&lt;/p&gt;

&lt;p&gt;Second, you generate fewer manifests / manifest updates overall.&lt;/p&gt;

&lt;p&gt;Third, each commit contains more data, resulting in larger files (or fewer small files) and therefore reduced compaction pressure.&lt;/p&gt;

&lt;p&gt;You're essentially lowering entropy at the source.&lt;/p&gt;

&lt;h3&gt;
  
  
  Structured Streaming in Spark: Set a sensible trigger interval
&lt;/h3&gt;

&lt;p&gt;A common antipattern is to configure structured streaming to run "as fast as possible" or with a very short trigger. If you are writing to Iceberg tables, avoid this practice unless you have a true requirement for sub-minute freshness.&lt;/p&gt;

&lt;p&gt;The following shows the configuration for setting a reasonable commit interval in PySpark:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;writeStream&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iceberg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;outputMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checkpointLocation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://prod-checkpoints/events/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trigger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processingTime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1 minute&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod.db.events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your service level agreement (SLA) allows it, increase the commit interval to 2–5 minutes. In most analytics lakes, this tradeoff is worthwhile: slightly increased data freshness lag in exchange for significantly decreased metadata churn and less maintenance overhead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Flink: Commit frequency follows checkpointing
&lt;/h3&gt;

&lt;p&gt;For Flink, Iceberg commits typically follow the checkpoint intervals. If you checkpoint every 30 seconds, you are essentially committing every 30 seconds. That's a lot.&lt;/p&gt;

&lt;p&gt;A more reasonable interval would be minutes, not seconds, unless you are operating a low-latency serving pipeline.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 5 minutes&lt;/span&gt;
&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;enableCheckpointing&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;300_000&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ultimately, the best value will depend on recovery requirements and end-to-end latency needs. However, the underlying premise is the same: do not checkpoint so frequently that you turn your Iceberg table into a snapshot factory.&lt;/p&gt;

&lt;h3&gt;
  
  
  A simple method to select the interval
&lt;/h3&gt;

&lt;p&gt;Do not overcomplicate things. Ask yourself: what is the longest delay that your downstream consumers can tolerate for data freshness?&lt;/p&gt;

&lt;p&gt;If the response is "near real-time", you may still be fine at 1 minute. If the response is "a few minutes", take advantage of the situation and commit every few minutes.&lt;/p&gt;

&lt;p&gt;If the response is "we run dashboards hourly", then committing every 10 seconds is just self-imposed suffering.&lt;/p&gt;
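&lt;p&gt;If you want to encode that decision, a rule-of-thumb helper is enough. The thresholds below are illustrative, not a standard; the only real input is the freshness your consumers actually need:&lt;/p&gt;

```python
def suggest_trigger_interval(freshness_sla_s: int) -> str:
    """Map a downstream freshness SLA (seconds) to a streaming trigger interval.

    The idea: always pick the longest interval the SLA tolerates, because
    every extra commit is pure metadata overhead. Cut-offs are illustrative.
    """
    if freshness_sla_s <= 120:
        return "1 minute"
    if freshness_sla_s <= 900:
        return "5 minutes"
    return "10 minutes"

assert suggest_trigger_interval(60) == "1 minute"      # near real-time
assert suggest_trigger_interval(600) == "5 minutes"    # "a few minutes" is fine
assert suggest_trigger_interval(3600) == "10 minutes"  # hourly dashboards
```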

&lt;h3&gt;
  
  
  Sanity check
&lt;/h3&gt;

&lt;p&gt;If you observe thousands of snapshots being created daily for a single table, this is typically an indication that your commit cadence is too aggressive for an analytics lake. You can certainly use Iceberg as a means of generating data in this manner - it is designed to be correct - but you will pay for it in terms of planning overhead and ongoing maintenance.&lt;/p&gt;

&lt;p&gt;Lowering the commit frequency is one of the few optimizations that decreases cost and improves stability regardless of how you tune compaction. Fix it early: once you have dozens or hundreds of streaming-written tables, this behavior dictates your operational overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Stop the repair loop; fix the write path
&lt;/h2&gt;

&lt;p&gt;Write paths that produce too many small files or heavily skewed partitions turn compaction into a never-ending battle. As long as the write path keeps recreating the same problems, the lake will keep drifting back into an unhealthy state.&lt;/p&gt;

&lt;p&gt;The majority of "we need more compaction" situations are actually "our write path is poorly configured."&lt;/p&gt;

&lt;h3&gt;
  
  
  Begin with Distribution Mode
&lt;/h3&gt;

&lt;p&gt;Small files are a common result of poor data distribution at write time. A typical scenario: one writer task holds the majority of the data for a partition and emits a couple of large files, while the remaining tasks emit many small ones. Worse, if the distribution is unstable between batches, you will see fragmentation no matter how often you compact.&lt;/p&gt;

&lt;p&gt;Iceberg allows you to configure the way data is written across multiple writers. A good baseline configuration for many workloads is hash, as it generally spreads rows out more evenly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;TBLPROPERTIES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s1"&gt;'write.distribution-mode'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'hash'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does not completely remove the necessity for compaction, but it helps slow down how quickly fragmentation occurs again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Establish a Target File Size for Writers at the Table Level
&lt;/h3&gt;

&lt;p&gt;When writers have no target file size, file sizes vary widely across engines and jobs. Some writers produce 16MB files, others 1GB files, and compaction continually attempts to normalize the mess.&lt;/p&gt;

&lt;p&gt;Set a target file size for writers at the table level and keep it consistent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;TBLPROPERTIES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s1"&gt;'write.target-file-size-bytes'&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'536870912'&lt;/span&gt; &lt;span class="c1"&gt;-- 512MB&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a table-level target file size in place, compaction transitions from "fix everything" to "fix the outliers."&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimize the Writer and Not Just the Table
&lt;/h3&gt;

&lt;p&gt;Writers also produce small files when the number of write tasks is mismatched with the amount of data in each micro-batch or partition. The easiest fix is to adjust the degree of parallelism at the point of write.&lt;/p&gt;

&lt;p&gt;If you are experiencing hundreds of files per partition per batch, consider reducing the number of output partitions prior to writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;df&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;repartition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;pick&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;number&lt;/span&gt; &lt;span class="n"&gt;that&lt;/span&gt; &lt;span class="n"&gt;corresponds&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;your&lt;/span&gt; &lt;span class="n"&gt;cluster&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeTo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod.db.events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't have to find the optimal number. All you need to do is stop creating 1000 small files because your job happened to run with 1000 tasks.&lt;/p&gt;
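&lt;p&gt;As a rough starting point, you can derive the repartition count from the micro-batch size and your target file size. This is a sketch, not an Iceberg API; the helper name and the 512MB default are illustrative:&lt;/p&gt;

```python
import math

def output_partitions(batch_bytes, target_file_bytes=512 * 1024 * 1024):
    """One output partition per target-sized file, never fewer than one."""
    return max(1, math.ceil(batch_bytes / target_file_bytes))

# A 10GB micro-batch with a 512MB target yields 20 output partitions
print(output_partitions(10 * 1024**3))
```

&lt;p&gt;Feed the result into repartition() instead of letting the task count dictate the file count.&lt;/p&gt;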

&lt;h3&gt;
  
  
  Do Not Create "Hot Partitions" by Design
&lt;/h3&gt;

&lt;p&gt;Some datasets inherently skew towards certain items, such as one customer producing 70% of the events, or a specific date receiving a massive backfill. When a partitioning scheme directs a large amount of data into a single partition, you will continually be fighting it with compaction.&lt;/p&gt;

&lt;p&gt;This is one of the few instances where adjusting the partitioning scheme to reduce skew can greatly reduce compaction load. A common strategy is to add another dimension to the partitioning scheme (or create a derived shard key) so that a single logical partition does not become a physical hotspot.&lt;/p&gt;

&lt;p&gt;You do not need to re-design the entire table. One additional dimension may be sufficient to prevent the worst skew.&lt;/p&gt;
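&lt;p&gt;A minimal sketch of a derived shard key, assuming a hypothetical hot customer_id column; the shard count and function name are illustrative. You would store the shard as a column and add it to the partition spec:&lt;/p&gt;

```python
import hashlib

def shard_key(customer_id, num_shards=8):
    """Stable shard derived from a hot key, so one customer's rows
    spread across num_shards physical partitions."""
    digest = hashlib.sha256(str(customer_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same customer always lands on the same shard, so reads
# only fan out by num_shards
print(shard_key("customer-42"))
```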

&lt;h2&gt;
  
  
  11. Maintain Metadata With Compaction
&lt;/h2&gt;

&lt;p&gt;You can obtain the desired file sizes, reduce the number of files, and yet still end up with a table whose performance and cost characteristics degrade over time. This is typically a metadata issue and not a physical layout issue.&lt;/p&gt;

&lt;p&gt;Every time you run compaction, you create a new snapshot. Every snapshot adds to the table's history, and manifests accumulate. The old metadata remains until something removes the history. If nothing does, the table's history grows deeper and more expensive to reason through, even if the physical data files appear clean.&lt;/p&gt;

&lt;p&gt;This is the most common trap: teams focus on rewrite_data_files and neglect what happens to the snapshots and manifests afterwards.&lt;/p&gt;

&lt;p&gt;In general, compaction should be followed immediately by snapshot expiration. If you keep thousands of historical snapshots around "just in case," the engine still has to traverse that lineage during planning. Over time, this shows up as slower metadata reads and longer planning times.&lt;/p&gt;

&lt;p&gt;Typically, a snapshot expiration would look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;expire_snapshots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;retain_last&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual number of retained snapshots will vary based on your rollback and time-travel policies. What matters is establishing a clear retention policy: infinite retention is rarely what you truly need.&lt;/p&gt;

&lt;p&gt;After expiring snapshots, it is also beneficial to perform manifest consolidation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewrite_manifests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even if your file sizes are acceptable, fragmented manifests will require the planner to open and evaluate numerous small metadata files. Manifest consolidation will reduce the fan-out and stabilize the planner costs.&lt;/p&gt;

&lt;p&gt;Then there is orphan removal. Failed jobs, speculative tasks, and partial rewrites leave files in object storage that are no longer referenced by the table. Over months, this adds up to significant storage cost. Removing these orphans keeps the lake's storage footprint predictable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remove_orphan_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'db.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;older_than&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="s1"&gt;'2026-02-10 00:00:00'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The older_than guardrail is essential. You cannot afford to be racing with active writers in a production environment. Safety is more important than aggression.&lt;/p&gt;

&lt;p&gt;What complicates this is that the above actions are not independent. Snapshot retention determines what you can remove. Update frequency determines how often you need to rewrite manifests. Safe orphan removal depends on when commits occur and on active streaming jobs.&lt;/p&gt;

&lt;p&gt;Therefore, compaction is not simply a single maintenance action. It is part of a life cycle: data files, snapshots, manifests, and object storage move in tandem.&lt;/p&gt;

&lt;p&gt;At small scales, you can run these actions manually and get away with it. At larger scales, you need to be able to enforce consistency. Tables drift in various ways at varying rates. Without coordinated metadata maintenance, you will continue to repair file layout, while the metadata layer quietly continues to grow.&lt;/p&gt;

&lt;p&gt;Your goal is not simply to minimize the number of small files. Your goal is to maintain a table whose performance and cost characteristics remain stable over time. Compaction addresses the data layer. The three above actions address the metadata layer to prevent it from becoming the next bottleneck.&lt;/p&gt;
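&lt;p&gt;The life cycle above can be sketched as an ordered pipeline. This is illustrative structure, not a LakeOps or Iceberg API; each step name stands in for the corresponding Spark procedure:&lt;/p&gt;

```python
# Order matters: fix layout first, then let metadata cleanup take effect.
MAINTENANCE_ORDER = [
    "rewrite_data_files",   # compaction: fix the physical layout
    "expire_snapshots",     # drop history so old metadata can be released
    "rewrite_manifests",    # consolidate the remaining metadata
    "remove_orphan_files",  # reclaim unreferenced storage
]

def plan_maintenance(table):
    """Return the ordered (table, operation) pairs to run."""
    return [(table, step) for step in MAINTENANCE_ORDER]

for table, step in plan_maintenance("prod.db.events"):
    print(table, step)
```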

&lt;h2&gt;
  
  
  Recap and Conclusion
&lt;/h2&gt;

&lt;p&gt;Compaction in Iceberg is not about scheduling rewrite_data_files. It is about maintaining the alignment between the layout, deletes, and metadata of a table with its actual usage.&lt;/p&gt;

&lt;p&gt;Compaction plus table maintenance is now a coordination problem. A control plane that continuously assesses table health, prioritizes the partitions that most need compaction, and coordinates compaction with metadata maintenance, rather than treating these as separate jobs, is the natural first step.&lt;/p&gt;

&lt;p&gt;Manual work and scripting are typically the alternatives.&lt;/p&gt;

&lt;p&gt;We previously reviewed the practical aspects of this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Use bin-pack to control the file count&lt;/li&gt;
&lt;li&gt;  Escalate to sorting only when scan efficiency is the limiting factor&lt;/li&gt;
&lt;li&gt;  Use gates to avoid rewriting healthy data&lt;/li&gt;
&lt;li&gt;  Limit the scope to make maintenance predictable&lt;/li&gt;
&lt;li&gt;  Proactively resolve delete-heavy partitions&lt;/li&gt;
&lt;li&gt;  Reduce commit entropy in streaming jobs&lt;/li&gt;
&lt;li&gt;  Fix the write path to stop repairing the same issues&lt;/li&gt;
&lt;li&gt;  Connect compaction to snapshot expiration, manifest rewrites, and orphan removal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, we connected compaction to snapshot expiration, manifest rewrites, and orphan removal - because the physical data layout and the metadata health are interdependent.&lt;/p&gt;

&lt;p&gt;Your goal is not to simply minimize the number of small files. Your goal is to maintain a lake that remains predictable - in terms of performance, cost, and operational overhead - as it grows.&lt;/p&gt;

&lt;p&gt;If you are operating Iceberg in production, I would appreciate your feedback regarding what has worked (and failed) for you. Real world patterns are always more interesting than theoretical ones.&lt;/p&gt;

&lt;p&gt;Thank you for reading 🍺&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>iceberg</category>
      <category>snowflake</category>
    </item>
    <item>
      <title>Iceberg Rewrite Manifest Files: A Guide</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Sun, 08 Feb 2026 15:31:17 +0000</pubDate>
      <link>https://dev.to/jonisar/iceberg-rewrite-manifest-files-a-guide-m5f</link>
      <guid>https://dev.to/jonisar/iceberg-rewrite-manifest-files-a-guide-m5f</guid>
      <description>&lt;h3&gt;
  
  
  Iceberg Rewrite Manifest Files: A Guide
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8m6hfp8trhca4vpg19m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi8m6hfp8trhca4vpg19m.png" width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A data engineer happily running into the quicksand&lt;/p&gt;

&lt;p&gt;Manifest rewrites are a critical ongoing operation in Iceberg table maintenance.&lt;/p&gt;

&lt;p&gt;Data keeps landing, queries stay correct, and nothing looks obviously wrong. But over time, query planning takes longer, metadata reads increase, and latency creeps up even though the amount of data scanned hasn’t really changed. In most production systems, the root cause is not data layout — it’s &lt;strong&gt;metadata&lt;/strong&gt;, and specifically how manifest files accumulate and degrade over time.&lt;/p&gt;

&lt;p&gt;Manifest files are central to how Iceberg works. They’re what allow engines to plan efficiently without listing object storage. But with frequent commits, streaming writes, deletes, and long snapshot histories, manifests naturally fragment. Iceberg doesn’t reorganize them automatically, so planning cost quietly grows until it starts to matter.&lt;/p&gt;

&lt;p&gt;This guide focuses on &lt;strong&gt;rewrite manifests&lt;/strong&gt;: what they actually do, when they help, and how to run them correctly in production. You’ll learn how to detect when manifest rewrites are needed, how they interact with snapshot expiration and compaction, and why running them in isolation often delivers disappointing results.&lt;/p&gt;

&lt;p&gt;We’ll also contrast two operational models: managing all of this manually with scripts and schedules, and handling it continuously through a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;Control&lt;/strong&gt; &lt;strong&gt;Plane&lt;/strong&gt; like LakeOps&lt;/a&gt;, which optimizes table maintenance based on real workload behavior instead of fixed timers.&lt;/p&gt;

&lt;p&gt;The rest of the article guides you through manual optimization with practical examples. No spec theory, no generic advice — just what actually works when Iceberg tables grow, change, and age in real systems.&lt;/p&gt;

&lt;p&gt;Let’s begin then 🙂&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Automation vs Manual Scripts
&lt;/h3&gt;

&lt;p&gt;Before getting into mechanics, it’s important to understand the two main ways teams approach manifest management: automated maintenance with a Control Plane like &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;LakeOps&lt;/strong&gt;&lt;/a&gt;, or performing the operation by hand, on an ongoing basis, with generic scripts.&lt;/p&gt;

&lt;h4&gt;
  
  
  Continuous Optimization with a Control Plane
&lt;/h4&gt;

&lt;p&gt;A &lt;strong&gt;control plane&lt;/strong&gt; is a layer that sits above your data lake, catalogs, and query engines and takes responsibility for &lt;em&gt;operating and optimizing&lt;/em&gt; tables over time. Iceberg defines table structure and guarantees correctness, but it intentionally does not decide &lt;strong&gt;when&lt;/strong&gt;, &lt;strong&gt;where&lt;/strong&gt;, or &lt;strong&gt;how aggressively&lt;/strong&gt; maintenance should run. That operational and optimization gap is exactly what a control plane fills.&lt;/p&gt;

&lt;p&gt;Instead of running maintenance because a schedule says it’s time, a control plane continuously &lt;strong&gt;optimizes&lt;/strong&gt; tables based on what is actually happening in the system. Operations run only when and where they are needed, or according to explicit policies you define, rather than blindly across all tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lakeops.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;LakeOps&lt;/strong&gt;&lt;/a&gt; acts as a control plane for Iceberg by continuously analyzing telemetry from Iceberg catalogs and query engines. Using this data, LakeOps builds a live understanding of how each table behaves in practice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqi5oes035kbuki5fgus.png" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Continuous Optimization and Maintenance with an Iceberg Control Plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;From that telemetry, LakeOps continuously optimizes table maintenance. Manifest rewrites are triggered only when metadata fragmentation begins to impact planning or cost. Snapshot expiration runs only when retained history no longer provides real value. Compaction is optimized continuously to reduce small files before they create downstream metadata pressure. Orphan cleanup runs when metadata and data files are no longer referenced and can safely be removed.&lt;/p&gt;

&lt;p&gt;Coordination is central to optimization. Rewrite manifests, snapshot expiration, compaction, and cleanup are not independent jobs. They are executed as part of a single, continuous optimization loop that ensures only the required operations run, only on the tables that need them, and only at the point where they actually improve performance or cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lakeops.dev/" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79itrha3wsuk3c3n8zhw.png" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Automating smart rewrite manifest operations (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;Engineers don’t tune per-table schedules or chase drifting thresholds. They decide &lt;em&gt;what&lt;/em&gt; should be optimized and &lt;em&gt;within what constraints&lt;/em&gt;, and the control plane decides &lt;em&gt;when and where&lt;/em&gt; to run each operation. The result is stable metadata, predictable performance, and far less work.&lt;/p&gt;

&lt;p&gt;Learn more:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/9-apache-iceberg-table-maintenance-tools-you-should-know-df864ed7a6d5" rel="noopener noreferrer"&gt;&lt;strong&gt;9 Apache Iceberg Table Maintenance Tools You Should Know&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/9-apache-iceberg-table-maintenance-tools-you-should-know-df864ed7a6d5" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Start at the beginning: What Manifest Files Are
&lt;/h3&gt;

&lt;p&gt;Manifest files are the core metadata units Iceberg uses to describe &lt;em&gt;which data files exist&lt;/em&gt; and &lt;em&gt;what is inside them&lt;/em&gt;. They sit between snapshots and actual data files and are the reason Iceberg can plan queries efficiently without scanning directories or listing objects in storage.&lt;/p&gt;

&lt;p&gt;Each manifest file is essentially a list of data file entries. For every data file, the manifest records:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  the partition values for that file&lt;/li&gt;
&lt;li&gt;  record count&lt;/li&gt;
&lt;li&gt;  per-column statistics such as min and max values&lt;/li&gt;
&lt;li&gt;  file size and other low-level attributes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a query runs, the engine does &lt;strong&gt;not&lt;/strong&gt; discover data files by walking object storage. Instead, it reads manifests referenced by the current snapshot and uses the stored statistics to decide which data files can be skipped entirely. This is how Iceberg enables predicate and partition pruning at planning time.&lt;/p&gt;

&lt;p&gt;Manifests are immutable. Every commit creates new metadata. When a write happens, Iceberg typically creates one or more new manifest files describing the files added or removed by that commit. Over time, a snapshot references many manifests, some created recently and some carried forward from older snapshots.&lt;/p&gt;

&lt;p&gt;This design is powerful, but it has predictable operational consequences.&lt;/p&gt;

&lt;p&gt;Frequent small commits, especially from streaming or micro-batch ingestion, tend to produce many small manifest files. For example, a streaming job that commits every minute may generate hundreds or thousands of manifests per day, each describing only a handful of data files. From Iceberg’s point of view this is correct, but for the query engine it means more metadata to read and evaluate during planning.&lt;/p&gt;
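&lt;p&gt;The arithmetic is worth making explicit. Assuming at least one new manifest per commit (a typical lower bound, not a guarantee), commit cadence translates directly into manifest growth:&lt;/p&gt;

```python
def manifests_per_day(trigger_seconds):
    """Lower-bound manifest growth for a job committing once per trigger."""
    return (24 * 60 * 60) // trigger_seconds

print(manifests_per_day(60))   # one-minute commits: 1440 manifests/day
print(manifests_per_day(600))  # ten-minute commits: 144 manifests/day
```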

&lt;p&gt;Another issue is &lt;strong&gt;manifest clustering&lt;/strong&gt;. Manifests are not automatically reorganized around how tables are queried. If files are appended over time with mixed partitions or evolving data distributions, manifests may contain entries that are poorly aligned with common filters. The engine still prunes correctly, but it has to examine more metadata to do so.&lt;/p&gt;

&lt;p&gt;Snapshots make this worse if they are not expired. Each snapshot retains references to the manifests that describe its table state. Even if newer snapshots supersede old ones, the metadata remains live as long as those snapshots are kept. This means manifests that are no longer useful for active queries still participate in metadata reads and storage costs.&lt;/p&gt;

&lt;p&gt;The net effect is subtle but significant. Query planning time increases even though data size stays flat. Metadata I/O grows quietly. Storage costs creep up due to retained metadata. None of this breaks correctness, which is why it often goes unnoticed until performance degrades.&lt;/p&gt;

&lt;p&gt;Manifest rewrites exist specifically to address these issues. They allow Iceberg to reorganize and consolidate manifests so that the metadata layer reflects the &lt;em&gt;current&lt;/em&gt; table state and access patterns, rather than the historical accident of how data arrived over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Rewrite Manifests Does
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;rewrite manifests&lt;/strong&gt; operation restructures the metadata layer of an Iceberg table without touching the data itself.&lt;/p&gt;

&lt;p&gt;At a high level, Iceberg takes the manifest files referenced by the &lt;em&gt;current snapshot&lt;/em&gt;, reads the data-file entries inside them, and writes a new set of manifest files that describe the &lt;strong&gt;exact same live data files&lt;/strong&gt;, just in a layout that’s cheaper for engines to plan against. The commit updates table metadata to point the current snapshot at the new manifests. The old manifests become obsolete once nothing references them anymore (usually after snapshot expiration and cleanup).&lt;/p&gt;

&lt;p&gt;This is a metadata rewrite, not a data rewrite. No Parquet/ORC/Avro files are rewritten.&lt;/p&gt;

&lt;h4&gt;
  
  
  What actually improves
&lt;/h4&gt;

&lt;p&gt;Rewrite manifests helps in three very concrete ways.&lt;/p&gt;

&lt;p&gt;It reduces manifest fan-out. When you have many small commits (streaming, micro-batch), you often end up with lots of tiny manifest files. Each query has to open and evaluate those manifests during planning. Rewriting consolidates many small manifests into fewer, larger ones, which reduces metadata I/O and planning latency.&lt;/p&gt;

&lt;p&gt;It aligns manifest layout with partitioning. Iceberg sorts data-file entries in manifests by fields in the partition spec. In practice, this tends to make partition pruning cheaper because related entries are adjacent and engines do less work to decide what to skip.&lt;/p&gt;

&lt;p&gt;It removes “historical write shape” from the current snapshot. Without rewrites, manifests reflect how data arrived over time, not how it’s queried. Rewriting reorganizes metadata around the current state, which is usually what you actually care about for planning.&lt;/p&gt;

&lt;h4&gt;
  
  
  What rewrite manifests does not do
&lt;/h4&gt;

&lt;p&gt;It does not compact data files. Tiny data files stay tiny. It does not change partitioning or rewrite records.&lt;/p&gt;

&lt;p&gt;It does not delete old manifests by itself. If old snapshots still reference them, they’ll remain. Cleanup is a separate step.&lt;/p&gt;

&lt;h4&gt;
  
  
  Practical code examples
&lt;/h4&gt;

&lt;p&gt;Below are a few examples that are actually useful in day-to-day operations, not just “hello world”.&lt;/p&gt;

&lt;p&gt;1) Measure the problem before you touch anything&lt;/p&gt;

&lt;p&gt;Start by inspecting the metadata table that lists manifests. Don’t assume column names — Iceberg versions and engines can differ — so first look at the schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\-- Spark: inspect the manifests metadata table schema  
DESCRIBE TABLE EXTENDED prod.db.my\_table.manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then get a baseline count:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\-- How many manifests does the current snapshot reference?  
SELECT COUNT(\*) AS manifest\_count  
FROM prod.db.my\_table.manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this number grows steadily week over week while the table isn’t exploding in size, planning overhead is usually creeping up.&lt;/p&gt;

&lt;p&gt;2) Rewrite manifests via Spark SQL procedure (the most common operational path)&lt;/p&gt;

&lt;p&gt;This runs the rewrite in parallel using Spark:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.rewrite_manifests('db.my_table');
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In Spark, this returns a small result set with counters (how many manifests were rewritten, how many were added). In practice, you run the call, note the counters, and then re-check &lt;code&gt;my_table.manifests&lt;/code&gt; to see the manifest count drop.&lt;/p&gt;

&lt;p&gt;3) Rewrite manifests for a specific partition spec (when you’ve done partition evolution)&lt;/p&gt;

&lt;p&gt;If your table has evolved partition specs over time, you may want to rewrite manifests for a particular spec id:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.rewrite\_manifests(  
  table   =&amp;gt; 'db.my\_table',  
  spec\_id =&amp;gt; 1  
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful when an older spec still contributes a lot of manifest fragmentation and you want to target it instead of doing everything blindly.&lt;/p&gt;

&lt;p&gt;4) Disable Spark caching if executors get memory pressure during rewrites&lt;/p&gt;

&lt;p&gt;Some environments prefer to avoid caching during maintenance to reduce executor memory footprint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.rewrite_manifests('db.my_table', false);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If you’ve ever seen maintenance jobs destabilize executor memory, this is one of the first knobs to reach for.&lt;/p&gt;

&lt;p&gt;5) Validate the effect (simple but important)&lt;/p&gt;

&lt;p&gt;After the rewrite, validate that you actually improved the metadata shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(\*) AS manifest\_count\_after  
FROM prod.db.my\_table.manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the count doesn’t drop (or drops only slightly), the usual causes are that snapshots weren’t expired (old manifests still referenced), or the table’s write pattern keeps producing fragmentation faster than your maintenance cadence.&lt;/p&gt;

&lt;p&gt;That’s the point where you either tighten the full maintenance loop (expire snapshots, rewrite manifests, remove orphans, and revisit compaction) or stop doing this manually and let a control plane keep it stable continuously.&lt;/p&gt;
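&lt;p&gt;If you script this loop, it helps to assert the outcome rather than eyeball it. A minimal check, with an illustrative threshold and function name:&lt;/p&gt;

```python
def rewrite_was_effective(before, after, min_reduction=0.30):
    """Flag a manifest rewrite that failed to meaningfully shrink the
    manifest count, e.g. because old snapshots still hold references."""
    if before == 0:
        return True
    return (before - after) / before >= min_reduction

# 100 manifests down to 40 clears a 30% reduction threshold
print(rewrite_was_effective(100, 40))
```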

&lt;h3&gt;
  
  
  When You Should Rewrite Manifests
&lt;/h3&gt;

&lt;p&gt;Manifest rewrites are not something you run on a fixed schedule “just in case”. They are most effective when there is a clear signal that metadata, not data, is becoming the bottleneck.&lt;/p&gt;

&lt;p&gt;The most common trigger is &lt;strong&gt;planning getting slower while data size stays flat&lt;/strong&gt;. If query runtimes increase but the amount of data scanned is roughly the same, the extra time is often spent in planning and metadata evaluation. This is especially visible in engines that log planning or analysis time separately.&lt;/p&gt;

&lt;p&gt;Another strong signal is &lt;strong&gt;manifest growth that outpaces data growth&lt;/strong&gt;. If storage size grows slowly but the number of manifests keeps climbing, you are accumulating metadata fragmentation. This usually happens in tables with frequent commits, even if each commit is small.&lt;/p&gt;

&lt;p&gt;Tables that receive &lt;strong&gt;streaming or micro-batch writes&lt;/strong&gt; are prime candidates. Frequent commits tend to generate many small manifests. Even if data files are reasonably sized, the metadata layer becomes increasingly expensive to process.&lt;/p&gt;

&lt;p&gt;A very common real-world pattern is a table that “looks healthy” in storage metrics but becomes steadily slower to query over weeks. Nothing is broken, nothing obvious changed, but planning time creeps up. That is almost always a manifest problem.&lt;/p&gt;
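&lt;p&gt;One way to turn that signal into a check: track manifests per GB over time and flag the table when the ratio climbs while data stays flat. The 1.5x threshold here is illustrative, not a recommendation:&lt;/p&gt;

```python
def fragmentation_trend(samples):
    """samples: chronological (manifest_count, data_gb) observations.
    True when manifests-per-GB grew by more than 1.5x, i.e. metadata
    is fragmenting faster than data is growing."""
    ratios = [count / max(gb, 1) for count, gb in samples]
    return ratios[-1] > ratios[0] * 1.5

print(fragmentation_trend([(200, 100), (500, 110)]))  # True
```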

&lt;h3&gt;
  
  
  Managing Manifest Rewrites Manually
&lt;/h3&gt;

&lt;p&gt;If you don’t use a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;control plane&lt;/strong&gt;&lt;/a&gt;, the following sequence reflects what works well in production if done right and in context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Inspect Metadata Health
&lt;/h3&gt;

&lt;p&gt;Before you decide to rewrite manifests, you need &lt;strong&gt;visibility into the live metadata&lt;/strong&gt; — not guesswork, not periodic dashboards, but concrete numbers that reflect how fragmented the metadata has become.&lt;/p&gt;

&lt;p&gt;Iceberg exposes &lt;strong&gt;metadata tables&lt;/strong&gt; that you can query just like regular tables. These include tables like &lt;code&gt;…$manifests&lt;/code&gt;, &lt;code&gt;…$files&lt;/code&gt;, &lt;code&gt;…$snapshots&lt;/code&gt;, etc. You can use these directly in SQL to inspect current state and spot trouble early.&lt;/p&gt;

&lt;h4&gt;
  
  
  Iceberg stores metadata in layers:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-   A **manifest list** per snapshot points to all manifests for that snapshot.
-   Each **manifest file** lists a subset of data files, partition values, and column statistics (min/max/null counts).
-   Manifests may be reused across snapshots.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As a result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Lots of small commits → many small manifests.&lt;/li&gt;
&lt;li&gt;  Old snapshots hold onto old manifests.&lt;/li&gt;
&lt;li&gt;  Query engines read manifests during plan time to prune partitions/files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If manifests are fragmented or numerous, query planning becomes slow because engines read and evaluate many metadata files before they touch actual data.&lt;/p&gt;

&lt;p&gt;This is why &lt;strong&gt;metadata health matters early, not late&lt;/strong&gt;.&lt;/p&gt;
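&lt;p&gt;As a rough sketch of the relationship: planning cost grows roughly linearly with the number of manifests the engine must open. The per-manifest and fixed costs below are illustrative assumptions, not measured Iceberg numbers.&lt;/p&gt;

```python
# Back-of-envelope model of why manifest count drives planning time:
# the engine reads every manifest referenced by the current snapshot
# before it touches any data. All costs here are illustrative.
def planning_cost_ms(manifest_count: int,
                     per_manifest_ms: float = 2.0,
                     fixed_ms: float = 50.0) -> float:
    """Estimated planning time in milliseconds (toy model)."""
    return fixed_ms + manifest_count * per_manifest_ms

print(planning_cost_ms(50))     # healthy table: 150.0
print(planning_cost_ms(2000))   # fragmented metadata: 4050.0
```

&lt;p&gt;The exact constants vary by engine and storage; the point is the linear dependence on manifest count.&lt;/p&gt;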

&lt;h4&gt;
  
  
  What to Look At
&lt;/h4&gt;

&lt;p&gt;Here are the core checks you should be doing regularly — ideally automated — to monitor manifest health.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1) Count the Current Manifests&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run a live count of manifests referenced by the current snapshot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(\*) AS active\_manifest\_count  
FROM prod.db.my\_table$manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A sudden jump in this number relative to data size usually correlates with planning overhead.&lt;br&gt;&lt;br&gt;
 A steady climb over time, without data volume growth, is a strong indicator your metadata is fragmenting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2) Look at Files per Manifest&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Iceberg metadata stores statistics such as file counts per manifest. Pull a distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  CASE  
    WHEN record\_count &amp;lt; 10 THEN '&amp;lt;10 rows'  
    WHEN record\_count BETWEEN 10 AND 100 THEN '10–100 rows'  
    ELSE '100+ rows'  
  END AS manifest\_size\_bucket,  
  COUNT(\*) AS manifests  
FROM prod.db.my\_table$manifests  
GROUP BY 1  
ORDER BY 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see &lt;strong&gt;lots of manifests tracking very few files&lt;/strong&gt;, that is fragmentation: many small manifests, typically produced by tiny commits, that inflate planning work.&lt;/p&gt;

&lt;p&gt;Conversely, if most manifests are dense, a rewrite will buy you little. The shape of this distribution tells you whether maintenance is worth scheduling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3) Compare Manifests to Data Growth&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you track how data size and manifest count change together, you can spot divergence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\-- number of data files  
SELECT COUNT(\*) AS data\_file\_count  
FROM prod.db.my\_table$files;

\-- number of manifests  
SELECT COUNT(\*) AS manifest\_count  
FROM prod.db.my\_table$manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;manifest_count&lt;/code&gt; grows faster than &lt;code&gt;data_file_count&lt;/code&gt;, that’s another sign of metadata inefficiency.&lt;/p&gt;
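&lt;p&gt;As a toy illustration of this check, here is a hypothetical helper that flags drift when the manifest-to-data-file ratio crosses a threshold. The 0.25 cutoff is an assumption to tune per table, not an Iceberg default.&lt;/p&gt;

```python
# Hypothetical drift check: too many manifests per data file suggests
# metadata fragmentation. The 0.25 ratio threshold is an assumption.
def manifests_drifting(data_file_count: int, manifest_count: int,
                       max_ratio: float = 0.25) -> bool:
    """Return True when manifests are disproportionate to data files."""
    if data_file_count == 0:
        return manifest_count > 0
    return manifest_count / data_file_count > max_ratio

print(manifests_drifting(5000, 2000))  # ratio 0.4: True
print(manifests_drifting(5000, 100))   # ratio 0.02: False
```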

&lt;p&gt;&lt;strong&gt;4) Look at Snapshots (Optional but Useful)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Snapshots tell you how many historical versions you’re retaining, which impacts how many manifests persist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  committed\_at,  
  snapshot\_id  
FROM prod.db.my\_table$snapshots  
ORDER BY committed\_at DESC  
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Long snapshot histories mean old manifests may still be referenced and not cleaned up until expiration happens.&lt;/p&gt;

&lt;h4&gt;
  
  
  Interpreting the Results
&lt;/h4&gt;

&lt;p&gt;Here are practical heuristics data engineers use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;High manifest count with small average manifest size&lt;/strong&gt; → metadata fragmentation (good candidate for rewrite).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Stable manifest count but slow query planning&lt;/strong&gt; → the problem might be clustering, not count; manifest rewrites can help.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Lots of snapshots older than retention needs&lt;/strong&gt; → metadata is being kept too long; expire them first so rewrites can be effective.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Manifest growth outpacing data file growth&lt;/strong&gt; → metadata is drifting away from the current shape of data.&lt;/li&gt;
&lt;/ul&gt;
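&lt;p&gt;These heuristics can be sketched as a small decision helper. The thresholds (100 manifests, 10 files per manifest) are illustrative assumptions, and the function names are not Iceberg APIs:&lt;/p&gt;

```python
# Sketch: turn the heuristics above into maintenance recommendations.
# All thresholds are illustrative and should be tuned per table.
def recommend_maintenance(manifest_count: int,
                          avg_files_per_manifest: float,
                          snapshots_retained: int,
                          snapshots_needed: int) -> list:
    actions = []
    if snapshots_retained > snapshots_needed:
        # Expire first, or a rewrite cannot consolidate old manifests.
        actions.append("expire_snapshots")
    if manifest_count > 100 and 10 > avg_files_per_manifest:
        actions.append("rewrite_manifests")
    return actions

print(recommend_maintenance(2000, 2.5, 300, 20))
# ['expire_snapshots', 'rewrite_manifests']
```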

&lt;h4&gt;
  
  
  Example Scenario
&lt;/h4&gt;

&lt;p&gt;Imagine a streaming table ingesting updates every minute. You might see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  5,000 data files&lt;/li&gt;
&lt;li&gt;  2,000 manifests&lt;/li&gt;
&lt;li&gt;  70% of manifests contain &amp;lt;10 files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a classic candidate for manifest consolidation: a smaller number of larger manifests will cut planning time dramatically, especially if queries filter on partitions that aren’t well clustered yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Expire Snapshots First
&lt;/h3&gt;

&lt;p&gt;Always expire snapshots &lt;strong&gt;before&lt;/strong&gt; rewriting manifests. This is not a best-practice nicety — it directly determines whether a manifest rewrite will actually do anything useful.&lt;/p&gt;

&lt;p&gt;The easiest way to achieve this is using a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;Control Plane&lt;/strong&gt;&lt;/a&gt; for automated and optimized maintenance operations that include snapshot expirations in addition to manifest rewrites.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fai8r2rsuyf1p4w5ju96k.png" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Automated and optimized snapshot expiration with a Control Plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;Here’s a deep dive into the topic and practical solutions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/11-apache-iceberg-expired-snapshots-strategiesyou-should-know-ca7b81e87fb5" rel="noopener noreferrer"&gt;&lt;strong&gt;11 Expire Snapshots Optimizations for Apache Iceberg&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/11-apache-iceberg-expired-snapshots-strategiesyou-should-know-ca7b81e87fb5" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Snapshots are what keep manifests alive. Every snapshot references a specific set of manifest files that describe the table state at that point in time. As long as a snapshot exists, all of its manifests must remain reachable, even if they describe data that is no longer relevant for current queries.&lt;/p&gt;

&lt;p&gt;If you run a manifest rewrite while old snapshots are still retained, Iceberg can only optimize the manifests referenced by the &lt;em&gt;current&lt;/em&gt; snapshot. Older snapshots will continue to reference older manifests, which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  old manifests stay in storage,&lt;/li&gt;
&lt;li&gt;  metadata fan-out remains higher than expected,&lt;/li&gt;
&lt;li&gt;  storage costs don’t drop,&lt;/li&gt;
&lt;li&gt;  and in some engines, planning still touches more metadata than necessary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the most common reason teams say “we ran rewrite manifests and it didn’t really help”.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why snapshot expiration comes first
&lt;/h4&gt;

&lt;p&gt;Think of snapshot expiration as &lt;strong&gt;pruning the metadata graph&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Until you expire snapshots, Iceberg is obligated to preserve historical metadata for correctness and time travel. A rewrite cannot remove or consolidate manifests that are still referenced by retained snapshots. Expiring snapshots reduces the metadata surface area first, so the rewrite can actually consolidate what remains.&lt;/p&gt;

&lt;p&gt;In practice, snapshot expiration is what turns a rewrite from “cosmetic” into “effective”.&lt;/p&gt;

&lt;h4&gt;
  
  
  Inspect snapshot history before expiring
&lt;/h4&gt;

&lt;p&gt;Before expiring anything, look at what you’re retaining:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  snapshot\_id,  
  committed\_at,  
  operation  
FROM prod.db.my\_table$snapshots  
ORDER BY committed\_at DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In many production systems, you’ll find snapshots going back weeks or months, even though nobody ever queries historical versions beyond a few days.&lt;/p&gt;

&lt;p&gt;That’s usually accidental, not intentional.&lt;/p&gt;

&lt;h4&gt;
  
  
  Expire snapshots based on real needs
&lt;/h4&gt;

&lt;p&gt;Snapshot retention should reflect &lt;strong&gt;actual recovery and audit requirements&lt;/strong&gt;, not defaults or copy-pasted examples.&lt;/p&gt;

&lt;p&gt;If you only need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  a few days of rollback for operational safety, or&lt;/li&gt;
&lt;li&gt;  short-term auditability,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then retaining dozens or hundreds of snapshots actively hurts metadata efficiency with no upside.&lt;/p&gt;

&lt;p&gt;A common pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  retain snapshots newer than a time threshold, and&lt;/li&gt;
&lt;li&gt;  always keep the last N snapshots as a safety net.&lt;/li&gt;
&lt;/ul&gt;
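&lt;p&gt;That pattern is easy to express in code. A minimal sketch, assuming the snapshot commit timestamps have already been fetched (for example from the &lt;code&gt;$snapshots&lt;/code&gt; metadata table); the function name is hypothetical:&lt;/p&gt;

```python
from datetime import datetime, timedelta

# Sketch of the "time threshold plus keep last N" retention pattern.
def snapshots_to_expire(committed_ats, older_than, retain_last=2):
    """Timestamps safe to expire: older than the cutoff AND not among
    the newest `retain_last` snapshots kept as a safety net."""
    newest = set(sorted(committed_ats, reverse=True)[:retain_last])
    return [t for t in committed_ats
            if older_than > t and t not in newest]

now = datetime(2024, 6, 1)
history = [now - timedelta(days=d) for d in (0, 1, 10, 40, 90)]
print(len(snapshots_to_expire(history, now - timedelta(days=7))))  # 3
```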

&lt;h4&gt;
  
  
  Example: expire old snapshots in Spark
&lt;/h4&gt;

&lt;p&gt;Here’s a practical Spark SQL example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.expire\_snapshots(  
  table =&amp;gt; 'db.my\_table',  
  older\_than =&amp;gt; TIMESTAMP '2024-01-01',  
  retain\_last =&amp;gt; 2  
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This removes snapshots older than the specified timestamp while keeping the most recent snapshots for safety.&lt;/p&gt;

&lt;p&gt;After this runs, many old manifests will become unreferenced — which is exactly what you want &lt;em&gt;before&lt;/em&gt; rewriting manifests.&lt;/p&gt;

&lt;h4&gt;
  
  
  Validate the effect
&lt;/h4&gt;

&lt;p&gt;After expiring snapshots, re-check your metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(\*) AS remaining\_snapshots  
FROM prod.db.my\_table$snapshots;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a much smaller snapshot set. At this point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  old manifests are no longer protected,&lt;/li&gt;
&lt;li&gt;  rewrite manifests can actually consolidate metadata,&lt;/li&gt;
&lt;li&gt;  and orphan cleanup will be able to reclaim storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Run Rewrite Manifests
&lt;/h3&gt;

&lt;p&gt;At this point, the current snapshot references only the metadata that still matters. That gives Iceberg room to consolidate and reorganize manifests instead of carrying forward historical baggage.&lt;/p&gt;

&lt;h4&gt;
  
  
  What this step actually does now
&lt;/h4&gt;

&lt;p&gt;After snapshot expiration, rewrite manifests can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  merge many small manifests into fewer, larger ones,&lt;/li&gt;
&lt;li&gt;  reorganize data-file entries so they’re better clustered by partition and statistics,&lt;/li&gt;
&lt;li&gt;  reduce the amount of metadata the engine has to read during planning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you skip snapshot expiration, most of these benefits are muted. After expiration, they show up immediately in planning time and metadata size.&lt;/p&gt;

&lt;h4&gt;
  
  
  Running rewrite manifests (Spark example)
&lt;/h4&gt;

&lt;p&gt;In Spark-based environments, this is usually done via a system procedure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.rewrite\_manifests('db.my\_table');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This executes the rewrite in parallel across the cluster. Spark will read the existing manifests, generate a new optimized set, and commit a new snapshot that references them.&lt;/p&gt;

&lt;p&gt;The command itself is simple. The impact depends entirely on whether you prepared the table correctly in the earlier steps.&lt;/p&gt;

&lt;h4&gt;
  
  
  Validate that it actually worked
&lt;/h4&gt;

&lt;p&gt;After the rewrite, always check the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT COUNT(\*) AS manifest\_count\_after  
FROM prod.db.my\_table$manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a noticeable drop in manifest count or, at the very least, fewer very small manifests. If nothing changes, the usual reasons are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  snapshots were not expired, so old manifests are still referenced,&lt;/li&gt;
&lt;li&gt;  the table’s write pattern is fragmenting metadata faster than maintenance runs,&lt;/li&gt;
&lt;li&gt;  or the table is already in a reasonably healthy state.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Why this step is cheap — but not free
&lt;/h4&gt;

&lt;p&gt;Rewrite manifests does not rewrite data files, so it’s much cheaper than compaction. However, it still:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  reads all manifests referenced by the current snapshot,&lt;/li&gt;
&lt;li&gt;  writes new manifest files,&lt;/li&gt;
&lt;li&gt;  and commits new metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On large tables with many manifests, this can still consume noticeable CPU, memory, and I/O. That’s why you should not run it blindly across hundreds of tables at once.&lt;/p&gt;

&lt;h4&gt;
  
  
  Practical scheduling guidance
&lt;/h4&gt;

&lt;p&gt;If you’re doing this manually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  avoid peak query hours,&lt;/li&gt;
&lt;li&gt;  stagger rewrites across tables,&lt;/li&gt;
&lt;li&gt;  and gate execution on actual metadata health signals rather than time alone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running rewrite manifests selectively, when metadata drift is real, is what keeps it a high-ROI operation instead of background noise.&lt;/p&gt;
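&lt;p&gt;A minimal scheduling gate might look like the following sketch. The peak-hour window and fragmentation thresholds are assumptions to adapt to your workload, and the function name is hypothetical:&lt;/p&gt;

```python
# Hypothetical gate: run rewrite_manifests only when metadata health
# signals justify it and the cluster is off-peak. Thresholds and the
# peak window (06:00-22:00 UTC) are illustrative assumptions.
def should_rewrite(manifest_count: int,
                   small_manifest_fraction: float,
                   hour_utc: int) -> bool:
    off_peak = 6 > hour_utc or hour_utc >= 22
    fragmented = manifest_count > 200 and small_manifest_fraction > 0.5
    return off_peak and fragmented

print(should_rewrite(2000, 0.7, 3))    # fragmented, off-peak: True
print(should_rewrite(2000, 0.7, 14))   # peak hours: False
```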

&lt;h3&gt;
  
  
  Step 4: Remove Orphan Files
&lt;/h3&gt;

&lt;p&gt;Once snapshots are expired and manifests are rewritten, you need to clean up what is no longer referenced.&lt;/p&gt;

&lt;p&gt;Orphan files are data or metadata files that exist in storage but are no longer referenced by any snapshot. They typically appear after snapshot expiration, manifest rewrites, failed jobs, or aborted commits. Iceberg does not delete these files automatically, because doing so without coordination would risk correctness.&lt;/p&gt;

&lt;p&gt;If you stop after rewriting manifests, those unreferenced files will remain in object storage indefinitely.&lt;/p&gt;

&lt;p&gt;From Iceberg’s point of view, everything is correct after a rewrite. From your cloud bill’s point of view, nothing changed.&lt;/p&gt;

&lt;p&gt;Skipping orphan cleanup is one of the most common reasons teams see storage costs grow even though they “ran all the maintenance jobs.” The metadata graph is clean, but the physical files are still sitting in S3, GCS, or ADLS.&lt;/p&gt;

&lt;p&gt;This step is what turns logical cleanup into actual cost reduction.&lt;/p&gt;

&lt;h4&gt;
  
  
  What orphan cleanup actually removes
&lt;/h4&gt;

&lt;p&gt;Orphan cleanup removes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  old manifest files no longer referenced by any snapshot,&lt;/li&gt;
&lt;li&gt;  metadata files left behind by rewrites and expired snapshots,&lt;/li&gt;
&lt;li&gt;  data files created by failed or rolled-back writes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It does &lt;strong&gt;not&lt;/strong&gt; remove any file that is reachable from a live snapshot. If a file is still referenced, it stays.&lt;/p&gt;

&lt;h4&gt;
  
  
  Running orphan cleanup (Spark example)
&lt;/h4&gt;

&lt;p&gt;In Spark environments, orphan cleanup is typically done with a system procedure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL prod.system.remove\_orphan\_files(  
  table \=\&amp;gt; 'db.my\_table',  
  older\_than \=\&amp;gt; TIMESTAMP '2024-01-01'  
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;older_than&lt;/code&gt; guard is critical. It ensures Iceberg only deletes files older than a safe cutoff, protecting against races with in-flight or recently committed jobs.&lt;/p&gt;

&lt;p&gt;Never run orphan cleanup without a time threshold.&lt;/p&gt;

&lt;h4&gt;
  
  
  Validate the effect
&lt;/h4&gt;

&lt;p&gt;After cleanup, you should see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  a reduction in storage usage over time,&lt;/li&gt;
&lt;li&gt;  fewer unreferenced metadata files,&lt;/li&gt;
&lt;li&gt;  and no change in query results or correctness.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Storage metrics won’t always drop instantly due to object store reporting delays, but the trend should flatten instead of creeping upward.&lt;/p&gt;

&lt;h4&gt;
  
  
  The key takeaway
&lt;/h4&gt;

&lt;p&gt;Snapshot expiration and manifest rewrites clean up &lt;strong&gt;logical metadata&lt;/strong&gt;. Orphan cleanup is what turns that into &lt;strong&gt;physical cleanup&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you skip this step, maintenance looks successful on paper but storage costs keep rising. If you include it consistently, metadata maintenance finally translates into real, measurable savings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Coordinate with Compaction
&lt;/h3&gt;

&lt;p&gt;Manifest rewrites optimize &lt;em&gt;how files are described&lt;/em&gt;. Compaction optimizes &lt;em&gt;how many files exist&lt;/em&gt;. If you ignore compaction, manifest rewrites will help briefly — then fragmentation will return.&lt;/p&gt;

&lt;p&gt;Small data files are the main upstream cause of manifest churn. Every time a write job produces many small files, Iceberg must record them in metadata. Even if you rewrite manifests perfectly, frequent small-file writes will recreate fragmentation within days.&lt;/p&gt;

&lt;p&gt;The optimal solution is to use an &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;Iceberg Control Plane&lt;/a&gt;. LakeOps, for example, compacts data up to 95% faster and cheaper than alternatives thanks to a Rust-based engine and analysis of cross-system data. Compaction is also smarter, so query times and costs both drop dramatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;![](https://cdn-images-1.medium.com/max/1600/1*7w1IT-CzuDQRQsVuCionDA.png)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compaction optimization with a control plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;In LakeOps, compaction processes are also synchronized with maintenance processes like manifest rewrites, so everything runs smoothly and you don’t have to connect and coordinate it yourself. Results are optimized and are usually far better than a home-made solution.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example: why rewrites alone don’t hold
&lt;/h4&gt;

&lt;p&gt;Let’s go back to the core problem for a second, and then see how to manually address it if you don’t use a control plane.&lt;/p&gt;

&lt;p&gt;Consider a table with streaming ingestion committing every few minutes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Each commit writes 20–50 small Parquet files.&lt;/li&gt;
&lt;li&gt;  Each commit creates one or more new manifests.&lt;/li&gt;
&lt;li&gt;  After a week, the table has thousands of data files and hundreds of manifests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You run snapshot expiration and rewrite manifests. Planning time improves.&lt;/p&gt;

&lt;p&gt;Two days later, the table is slow again.&lt;/p&gt;

&lt;p&gt;Nothing is broken. The write pattern simply recreated the same metadata pressure. This is what happens when compaction is missing or misaligned.&lt;/p&gt;

&lt;h4&gt;
  
  
  Use metadata to confirm compaction pressure
&lt;/h4&gt;

&lt;p&gt;Before scheduling more manifest rewrites, check whether small files are the real problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  COUNT(\*) AS file\_count,  
  AVG(file\_size\_in\_bytes) / 1024 / 1024 AS avg\_file\_mb  
FROM prod.db.my\_table$files;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the average file size is far below your engine’s sweet spot, manifest rewrites are treating symptoms, not the cause.&lt;/p&gt;
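&lt;p&gt;As a sketch, the “sweet spot” comparison can be encoded like this. The 128 MB target and the 25% cutoff are illustrative assumptions (for reference, Iceberg’s default write target is 512 MB, and engines vary):&lt;/p&gt;

```python
# Sketch: flag compaction pressure when the table has many files and
# their average size sits far below the target. Thresholds are
# illustrative assumptions, not engine defaults.
def compaction_pressure(file_count: int, avg_file_mb: float,
                        target_mb: float = 128.0) -> bool:
    return file_count > 1000 and target_mb * 0.25 > avg_file_mb

print(compaction_pressure(5000, 8.0))    # tiny files: True
print(compaction_pressure(5000, 300.0))  # healthy sizes: False
```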

&lt;p&gt;Another useful signal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT  
  COUNT(\*) AS manifests,  
  SUM(added\_data\_files\_count) AS total\_files\_tracked  
FROM prod.db.my\_table$manifests;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If file counts are high and keep growing quickly, metadata pressure will return unless compaction slows it down.&lt;/p&gt;

&lt;h4&gt;
  
  
  How compaction stabilizes manifest rewrites
&lt;/h4&gt;

&lt;p&gt;When compaction is running correctly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  fewer data files are created per write cycle,&lt;/li&gt;
&lt;li&gt;  manifests grow more slowly and stay denser,&lt;/li&gt;
&lt;li&gt;  rewrite manifests becomes an occasional cleanup, not a recurring firefight.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In stable tables, teams often find that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  compaction runs frequently (or continuously),&lt;/li&gt;
&lt;li&gt;  manifest rewrites run infrequently,&lt;/li&gt;
&lt;li&gt;  snapshot expiration runs regularly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That balance is what keeps planning predictable.&lt;/p&gt;

&lt;h4&gt;
  
  
  Practical coordination rule
&lt;/h4&gt;

&lt;p&gt;In production, a simple rule holds up well: If manifest rewrites are needed often, compaction is not doing enough.&lt;/p&gt;

&lt;p&gt;If you find yourself rewriting manifests weekly or daily on the same tables, it’s usually a sign that upstream file layout is unstable.&lt;/p&gt;

&lt;h4&gt;
  
  
  Should you add compaction code here?
&lt;/h4&gt;

&lt;p&gt;At this point in the guide, &lt;strong&gt;full compaction code examples are usually not helpful&lt;/strong&gt;. Compaction is engine-specific, workload-specific, and already well-covered elsewhere. What matters here is understanding the dependency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  compaction reduces metadata churn,&lt;/li&gt;
&lt;li&gt;  reduced churn makes manifest rewrites effective,&lt;/li&gt;
&lt;li&gt;  without compaction, rewrites are temporary relief.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That mental model is more valuable than a generic compaction snippet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Add policies and guardrails
&lt;/h3&gt;

&lt;p&gt;If you manage maintenance with scripts or schedulers, automation needs guardrails or it will drift out of alignment with reality.&lt;/p&gt;

&lt;p&gt;The simplest way is to add a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;Control Plane&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Then you can define policies per table or for your entire lake.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy7vm5m1c8trzkylvrtb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgy7vm5m1c8trzkylvrtb.png" width="800" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Define maintenance policies with a control plane (source: lakeops.dev)&lt;/p&gt;

&lt;p&gt;Start by skipping inactive tables. Tables that are rarely queried or written to don’t need aggressive maintenance. Running rewrites on them just burns cluster resources without benefit.&lt;/p&gt;

&lt;p&gt;Avoid peak query hours. Even though manifest rewrites are cheaper than data compaction, they still consume CPU, memory, and I/O. Running them during high query load increases contention and hurts user-facing performance.&lt;/p&gt;

&lt;p&gt;Trigger maintenance based on &lt;strong&gt;observed metadata health&lt;/strong&gt;, not time alone. Manifest count, average files per manifest, snapshot growth, and planning time trends are far better signals than “once a day” or “once a week”.&lt;/p&gt;

&lt;p&gt;Finally, expect thresholds to change. Write patterns evolve, query behavior shifts, and what worked six months ago may be wrong today. Scripts that never get revisited slowly turn into background noise or, worse, a source of instability.&lt;/p&gt;

&lt;p&gt;This is the point where many teams decide that maintaining guardrails manually is more work than it’s worth and move metadata maintenance into a control plane.&lt;/p&gt;

&lt;h3&gt;
  
  
  Practical steps to take
&lt;/h3&gt;

&lt;p&gt;Rewrite manifests is one of those Iceberg operations that looks optional until it isn’t. When metadata is healthy, planning is fast and predictable. When it drifts, everything still works — just slower and more expensively. That’s why manifest issues often go unnoticed for a long time.&lt;/p&gt;

&lt;p&gt;In this guide, we walked through what manifest files actually are, why they fragment in real systems, and what rewrite manifests really does under the hood. We covered when rewrites are worth running, why snapshot expiration has to come first, how orphan cleanup turns logical cleanup into real cost savings, and why compaction is the long-term stabilizer that keeps metadata from degrading again.&lt;/p&gt;

&lt;p&gt;If you’re managing this manually, the step-by-step approach will get you there. If you want this handled continuously and optimized, a &lt;a href="https://lakeops.dev" rel="noopener noreferrer"&gt;&lt;strong&gt;Control Plane&lt;/strong&gt;&lt;/a&gt; exists to do exactly that: operating and optimizing Iceberg tables based on real workload behavior instead of fixed schedules.&lt;/p&gt;

&lt;p&gt;The big takeaway is that rewrite manifests only works well as part of a &lt;strong&gt;coordinated maintenance loop&lt;/strong&gt;. Run it in isolation and the benefits are usually temporary. Pair it with snapshot expiration, compaction, and cleanup, and it becomes one of the highest-ROI metadata optimizations Iceberg offers.&lt;/p&gt;

&lt;p&gt;If you’ve run into edge cases, different patterns, or lessons learned the hard way, feel free to share them in the comments.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and hope this helps keep your Iceberg tables healthy, fast, predictable, and boring in all the right ways.&lt;/p&gt;

&lt;p&gt;Cheers 🍺&lt;/p&gt;

&lt;h3&gt;
  
  
  Learn more
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/11-iceberg-performance-optimizations-you-should-know-d9aef7aab235" rel="noopener noreferrer"&gt;&lt;strong&gt;11 Iceberg Performance Optimizations You Should Know&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/11-iceberg-performance-optimizations-you-should-know-d9aef7aab235" rel="noopener noreferrer"&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/13-apache-iceberg-optimizations-you-should-know-85bc25690f00" rel="noopener noreferrer"&gt;&lt;strong&gt;13 Apache Iceberg Optimizations You Should Know&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/13-apache-iceberg-optimizations-you-should-know-85bc25690f00" rel="noopener noreferrer"&gt;&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/11-apache-iceberg-cost-reduction-strategies-you-should-know-8de7acb14151" rel="noopener noreferrer"&gt;&lt;strong&gt;11 Apache Iceberg Cost Reduction Strategies You Should Know&lt;/strong&gt;&lt;/a&gt;&lt;a href="https://overcast.blog/11-apache-iceberg-cost-reduction-strategies-you-should-know-8de7acb14151" rel="noopener noreferrer"&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/9-data-lake-cost-optimization-tools-you-should-know-2a5995be8f4b" rel="noopener noreferrer"&gt;&lt;strong&gt;9 Data Lake Cost Optimization Tools You Should Know&lt;/strong&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://overcast.blog/9-data-lake-cost-optimization-tools-you-should-know-2a5995be8f4b" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>11 Must-Know FrontEnd Trends for 2020</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Sun, 29 Dec 2019 12:30:59 +0000</pubDate>
      <link>https://dev.to/jonisar/11-must-know-frontend-trends-for-2020-13e1</link>
      <guid>https://dev.to/jonisar/11-must-know-frontend-trends-for-2020-13e1</guid>
      <description>&lt;h3&gt;
  
  
  Or: how to sound smart in frontend lunch conversations!
&lt;/h3&gt;

&lt;p&gt;Sounding smart at your team's lunch talks is obviously a great reason to stay updated with the latest frontend trends. It might even help you become a better developer, build better technology and better products. Maybe.&lt;/p&gt;

&lt;p&gt;So, please allow me to make this honorable quest easier by pointing you in a few interesting directions. I won’t explain every concept from A to Z, but I’ll introduce each one, explain how it’s useful, and point you to further resources.&lt;/p&gt;

&lt;p&gt;For example, we’ll briefly cover an introduction to Micro Frontends, Atomic Design, Web Components, the TypeScript take-over, ESM CDNs and even design tokens. Feel free to scroll through and mark the topics you’d like to learn more about. For any questions or more suggestions, just drop a comment below. &lt;/p&gt;

&lt;p&gt;Short disclaimer: I'm on the team building Bit. This doesn't make any of the following less true though. Enjoy!&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Micro frontends
&lt;/h2&gt;

&lt;p&gt;Micro Frontends are the buzziest frontend topic for lunch conversations.&lt;br&gt;
Ironically, while frontend development enjoys the modular advantages of components, it is still largely more monolithic than backend microservices.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5RM_mJgL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ASdrrxeKfuAyDEAKATFNUNg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5RM_mJgL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ASdrrxeKfuAyDEAKATFNUNg.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
Micro frontends bring the promise of splitting your frontend architecture into different frontends for different teams working on different parts of your app. Each team can gain autonomy over the end-to-end lifecycle of their micro frontend, which can be developed, versioned, tested, built, rendered, updated and deployed independently (using &lt;a href="https://bit.dev"&gt;tools like Bit&lt;/a&gt; for example).&lt;br&gt;
Instead of explaining the whole concept here, &lt;a href="https://martinfowler.com/articles/micro-frontends.html#InANutshell"&gt;&lt;strong&gt;read this great post&lt;/strong&gt;&lt;/a&gt; by &lt;a class="comment-mentioned-user" href="https://dev.to/thecamjackson"&gt;@thecamjackson&lt;/a&gt;
 published at the @martinfowler blog. It’s really good and should cover everything you need to start digging into this concept.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kAQ3L-9K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AfxACkCp1y_fDwnF-N7bVMQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kAQ3L-9K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AfxACkCp1y_fDwnF-N7bVMQ.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
However, there are still certain gaps in today’s ecosystem. Mostly, people are concerned about issues like deploying separate frontends, bundling, environment differences, etc. &lt;a href="https://bit.dev"&gt;Bit&lt;/a&gt; already lets you isolate, version, build, test and update individual frontends/components. For now, this is mainly useful when working with multiple applications (though it’s already commonly used for gradually refactoring parts of existing apps via components).&lt;br&gt;
When Bit introduces deployments in 2020, independent teams will gain the power to develop, compose, version, deploy and update standalone frontends. It will let you compose UI apps together and let teams create simple, decoupled codebases with independent continuous deployments and incremental upgrades. The composition of these frontends will end up creating your application. Here’s what a UI app composed with Bit looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j0oWJyZI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/v2nf316tdaw9nkxw7mng.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j0oWJyZI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/v2nf316tdaw9nkxw7mng.png" alt="Composed UI app"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Learn more:&lt;br&gt;
&lt;a href="https://martinfowler.com/articles/micro-frontends.html"&gt;Micro Frontends - Martin Fowler&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Atomic Design
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://blog.bitsrc.io/atomic-design-and-ui-components-theory-to-practice-f200db337c24"&gt;&lt;br&gt;
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--usZNK7ni--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/6528/1%2Aq5IW7xZF8AYFj8NZEVi17Q.jpeg"&gt;&lt;br&gt;
&lt;/a&gt; &lt;br&gt;
&lt;a href="https://bradfrost.com/blog/post/atomic-web-design/"&gt;Atomic Design&lt;/a&gt; is yet another super interesting topic for lunch talks, which I like to think about more of as a philosophy than a pure methodology.&lt;br&gt;
Simply put, the theory introduced by Brad Frost compares the composition of web applications to the natural composition of atoms, molecules, organisms and so on, ending with concrete web pages. Atoms compose molecules (e.g. text-input + button + label atoms = search molecule). Molecules compose an organism. Organisms live in a layout template, which can be concretized into a page delivered to your users.&lt;br&gt;
Here’s a &lt;a href="https://blog.bitsrc.io/atomic-design-and-ui-components-theory-to-practice-f200db337c24?"&gt;detailed 30-second explanation with visual examples&lt;/a&gt;. It includes very impressive drawings I made with great artistic talent, which you can copy-paste to your office board 😆&lt;br&gt;
The advantages of Atomic components go beyond building modular UI applications from reusable parts. This paradigm forces you to think in composition, so you better understand the role and API of every component, their hierarchy, and how to abstract the building process of your application in an effective and efficient way. &lt;a href="https://bradfrost.com/blog/post/atomic-web-design/"&gt;Take a look.&lt;/a&gt;&lt;/p&gt;
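&lt;p&gt;To make the idea concrete, here’s a tiny, hypothetical sketch (all names are illustrative) of how atoms compose molecules and molecules compose organisms as plain data:&lt;/p&gt;

```javascript
// Hypothetical sketch of atomic composition; every name here is illustrative.
const atom = (type, props) => ({ type, props });

// Molecule: a search bar composed of three atoms.
const searchBar = () => ({
  type: 'search-bar',
  children: [
    atom('text-input', { placeholder: 'Search…' }),
    atom('button', { label: 'Go' }),
    atom('label', { text: 'Search' }),
  ],
});

// Organism: a page header composed of molecules.
const header = () => ({ type: 'header', children: [searchBar()] });
```

&lt;p&gt;The same shape scales upward: organisms slot into templates, and templates are concretized into the pages your users see.&lt;/p&gt;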
&lt;h2&gt;
  
  
  3. Encapsulated Styling and Shadow Dom
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ud3q7udi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2276/1%2ATSOpITlAqbyYC_UYYW7zMg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ud3q7udi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2276/1%2ATSOpITlAqbyYC_UYYW7zMg.png" alt="Source: developer.mozzila.org"&gt;&lt;/a&gt;&lt;em&gt;Source: developer.mozzila.org&lt;/em&gt;&lt;br&gt;
An important aspect of components is encapsulation — being able to keep the markup structure, style, and behavior hidden and separate from other code on the page so that different parts do not clash, and the code can be kept nice and clean. &lt;a href="https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_shadow_DOM"&gt;The Shadow DOM API&lt;/a&gt; is a key part of this, providing a way to attach a hidden separated DOM to an element.&lt;br&gt;
&lt;em&gt;Shadow&lt;/em&gt; DOM has actually been used by browsers for a long time now. You &lt;a href="https://bitsofco.de/what-is-the-shadow-dom/"&gt;can think of the shadow DOM &lt;/a&gt;as a “DOM within a DOM”: its own isolated DOM tree with its own elements and styles, completely separate from the original DOM.&lt;br&gt;
It allows hidden DOM trees to be attached to elements in the regular DOM tree: this shadow DOM tree starts with a shadow root, underneath which you can attach any elements you want, in the same way as in the normal DOM. The &lt;a href="https://dev.to/maxart2501/css-for-an-encapsulated-web-7fo"&gt;main implication&lt;/a&gt; of this is that we have &lt;em&gt;no need for a namespace&lt;/em&gt; for our classes, as there’s no risk of name clashing or style spilling. There are additional advantages as well. It is often referred to as the long-promised solution for true encapsulation of styles in web components. Learn more:&lt;br&gt;
&lt;a href="https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_shadow_DOM"&gt;Using shadow DOM&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  4. The TypeScript takeover
&lt;/h2&gt;

&lt;p&gt;So lately, every conversation &lt;a href="https://medium.com/@jtomaszewski/why-typescript-is-the-best-way-to-write-front-end-in-2019-feb855f9b164"&gt;makes it sound like TS is taking over&lt;/a&gt; frontend development. It is reported that &lt;a href="https://2018.stateofjs.com/javascript-flavors/typescript/"&gt;80% of developers say they would like to use or learn TypeScript in their next project&lt;/a&gt;.&lt;br&gt;
Although it has its shortcomings, TS code is easier to understand, faster to implement, produces fewer bugs and requires less boilerplate. Want to refactor your React app to work with TS? Go for it. Want to start gradually? Use tools like &lt;a href="https://github.com/teambit/bit"&gt;Bit&lt;/a&gt; to gradually refactor components in your app to TS and use the &lt;a href="https://bit.dev/bit/envs/compilers/react-typescript"&gt;React-TypeScript compiler&lt;/a&gt; to build them independently of your app. This way you can gradually upgrade your code one component at a time.&lt;br&gt;
Learn more:&lt;br&gt;
&lt;a href="https://medium.com/@jtomaszewski/why-typescript-is-the-best-way-to-write-front-end-in-2019-feb855f9b164"&gt;Why TypeScript is the best way to write Front-end in 2019And why you should convince everybody to use it.&lt;/a&gt;&lt;br&gt;
&lt;a href="https://eng.lyft.com/typescript-at-lyft-64f0702346ea"&gt;TypeScript at Lyft&lt;/a&gt;&lt;br&gt;
&lt;a href="https://slack.engineering/typescript-at-slack-a81307fa288d"&gt;TypeScript at Slack&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Web Components: Stencil, Svelte, Lit &amp;amp; friends!
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tnwswldm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3200/1%2A-zkpV1IfOv-1dux6ZqWBCQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tnwswldm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/3200/1%2A-zkpV1IfOv-1dux6ZqWBCQ.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
So basically, this is the future. Why? Because these pure web components are framework-agnostic and can work without a framework or with any framework, which spells &lt;strong&gt;standardization&lt;/strong&gt;. Because they are free from JS fatigue and are supported by modern browsers. And because their bundle size and consumption can be kept optimal, with impressive rendering performance.&lt;br&gt;
These components build on Custom Elements, a JavaScript API that allows you to define a new kind of HTML tag, HTML templates to specify layouts, and of course the Shadow DOM, which is component-specific by nature.&lt;br&gt;
Prominent tools to know in this space are &lt;a href="https://github.com/Polymer/lit-html"&gt;Lit-html&lt;/a&gt; (and &lt;a href="https://lit-element.polymer-project.org/"&gt;Lit-element&lt;/a&gt;), &lt;a href="https://github.com/ionic-team/stencil"&gt;StencilJS&lt;/a&gt;, &lt;a href="https://github.com/sveltejs/svelte"&gt;SvelteJS&lt;/a&gt; and of course &lt;a href="https://bit.dev/"&gt;Bit&lt;/a&gt;, for reusable modular components which can be directly shared, consumed and developed anywhere.&lt;br&gt;
When thinking of the future of our UI development, and of how the principles of modularity, reusability, encapsulation and standardization should look in the era of components, web components are the answer. Learn more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.bitsrc.io/7-tools-for-developing-web-components-in-2019-1d5b7360654d"&gt;7 Tools for Developing Web Components in 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.bitsrc.io/9-web-component-ui-libraries-you-should-know-in-2019-9d4476c3f103"&gt;9 Web Components UI Libraries You Should Know in 2019&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.bitsrc.io/prototyping-with-web-components-build-an-rss-reader-5bb753508d48"&gt;Prototyping with Web Components: Build an RSS Reader&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
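&lt;p&gt;For a feel of the underlying API, here’s a minimal, hypothetical custom element; the greeting logic is plain JS, and registration is guarded so the sketch also loads outside a browser:&lt;/p&gt;

```javascript
// Plain logic kept separate from the DOM so it is easy to reuse and test.
const greet = (name) => `Hello, ${name}!`;

if (typeof HTMLElement !== 'undefined') {
  // Defines a new kind of HTML tag, usable as <hello-card name="Ada"></hello-card>.
  class HelloCard extends HTMLElement {
    connectedCallback() {
      const shadow = this.attachShadow({ mode: 'open' }); // scoped DOM
      const p = document.createElement('p');
      p.textContent = greet(this.getAttribute('name') || 'world');
      shadow.appendChild(p);
    }
  }
  customElements.define('hello-card', HelloCard);
}
```

&lt;p&gt;Because it’s a native element, any framework (or none at all) can render it: that’s the standardization point above.&lt;/p&gt;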
&lt;h2&gt;
  
  
  6. From component libraries to dynamic collections
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MxwQnLBi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AVmerRS_ufSltgSGYiNHinQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MxwQnLBi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2AVmerRS_ufSltgSGYiNHinQ.png" alt="Organize components in dynamic collections; reuse, compose, stay independent"&gt;&lt;/a&gt;&lt;em&gt;Organize components in dynamic collections; reuse, compose, stay independent&lt;/em&gt;&lt;br&gt;
The emergence of &lt;a href="https://blog.bitsrc.io/a-guide-to-component-driven-development-cdd-69dbd3d07bf0?source=collection_home---4------13-----------------------"&gt;component-driven development&lt;/a&gt; gave birth to a variety of tools. One prominent tool is &lt;a href="https://github.com/teambit/bit"&gt;Bit&lt;/a&gt;, alongside its hosting platform &lt;a href="https://bit.dev"&gt;Bit.dev&lt;/a&gt;.&lt;br&gt;
Instead of working hard to build a cumbersome, highly coupled component library, use Bit to continuously isolate and export existing components into a dynamically reusable shared collection.&lt;br&gt;
Using &lt;a href="https://github.com/teambit/bit"&gt;Bit (GitHub)&lt;/a&gt; you can independently isolate, version, build, test and update UI components. It streamlines the process of isolating a component in an existing app, exporting it to a remote collection, and using it anywhere. Every component can build, test, and render outside of any project. You can update a single component (and its dependents) rather than the whole app.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ost0MZ7C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Ac6475ieLqqEzb4htt3T94Q.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ost0MZ7C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2Ac6475ieLqqEzb4htt3T94Q.gif" alt=""&gt;&lt;/a&gt;&lt;br&gt;
On the bit.dev platform (or on your own server) your components can be remotely hosted and organized for different teams, so that every team can control the development of its own components. Every team can share and reuse components while keeping its independence and control.&lt;br&gt;
The platform also provides an all-in-one ecosystem for shared components out of the box: it auto-documents UI components, renders components in an interactive playground, and even provides a built-in registry to install components using npm/yarn. In addition, you can use bit import to bring components into any repository for modification.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qVhfkPYZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ARZP_jNEEilVtmjGH4O4UHQ.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qVhfkPYZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2ARZP_jNEEilVtmjGH4O4UHQ.gif" alt=""&gt;&lt;/a&gt;&lt;br&gt;
In the short run, this revolutionizes the process of sharing and composing components, much like Spotify and iTunes changed how music is shared, which previously happened through static CD albums. It’s a dynamic and modular solution that lets everyone share and use components together.&lt;br&gt;
In the long run, Bit helps pave the way to micro frontends. Why? Because it already lets you independently version, test, build and update parts of your UI application. In 2020 it will introduce independent deployments, which will finally allow different teams to own parts of your apps end to end: keep decoupled and simple codebases, let teams cautiously and continuously build and deploy incremental UI upgrades, and compose frontends together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://bit.dev"&gt;Share reusable code components as a team&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://bit.dev/collections"&gt;UI Component design systems&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  7. State management: Bye Bye Redux? (Not….)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--00YNB73y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2290/1%2A6oeKSYnPG2pbg8vdaiteYg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--00YNB73y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2290/1%2A6oeKSYnPG2pbg8vdaiteYg.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://blog.bitsrc.io/state-of-react-state-management-in-2019-779647206bbc"&gt;Redux is a hard beast to kill&lt;/a&gt;. While the pains of globally managing states in your app are becoming more clear as frontend becomes more modular, the sheer usefulness of Redux makes it a go-to solution for many teams.&lt;br&gt;
So will we say bye-bye to Redux in 2020? Probably not entirely 😄&lt;br&gt;
However, the rise of new features within frameworks that handle state (React Hooks, the Context API, etc.) is paving the way to a future without a global store. Tools like &lt;a href="https://github.com/mobxjs/mobx"&gt;MobX&lt;/a&gt;, which only a year ago saw rather scarce adoption, are becoming more popular every day thanks to their component-oriented and scalable nature. You can explore &lt;a href="https://blog.bitsrc.io/state-of-react-state-management-in-2019-779647206bbc"&gt;more alternatives here&lt;/a&gt;.&lt;br&gt;
&lt;em&gt;Read&lt;/em&gt;: &lt;a href="https://medium.com/@dan_abramov/making-sense-of-react-hooks-fdbde8803889"&gt;Making Sense of React Hooks&lt;/a&gt; by Dan Abramov&lt;/p&gt;
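&lt;p&gt;To see why a global store is both useful and replaceable, here’s a minimal Redux-style store sketched from scratch (illustrative only, not the actual Redux API):&lt;/p&gt;

```javascript
// A tiny Redux-style store: single state object, pure reducer, subscriptions.
function createStore(reducer, initialState) {
  let state = initialState;
  const listeners = [];
  return {
    getState: () => state,
    dispatch(action) {
      state = reducer(state, action); // the reducer computes the next state
      listeners.forEach((listener) => listener(state));
    },
    subscribe(listener) {
      listeners.push(listener);
    },
  };
}

// A pure reducer for a counter.
const counter = (state, action) =>
  action.type === 'increment' ? { count: state.count + 1 } : state;

const store = createStore(counter, { count: 0 });
store.dispatch({ type: 'increment' }); // state becomes { count: 1 }
```

&lt;p&gt;Hooks and the Context API give you the same dispatch/subscribe loop per component tree, which is exactly why a single global store is no longer the only option.&lt;/p&gt;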
&lt;h2&gt;
  
  
  8. ESM CDN
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---ahkLvgh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4000/1%2AdSWVWelaiGClQXD6nGhBuA.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---ahkLvgh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4000/1%2AdSWVWelaiGClQXD6nGhBuA.jpeg" alt=""&gt;&lt;/a&gt;&lt;br&gt;
ES Modules are the standard for working with modules in the browser, standardized by ECMAScript. Using ES modules you can easily encapsulate functionality into modules which can be consumed via CDN and similar channels. With the release of Firefox 60, all &lt;a href="https://hacks.mozilla.org/2018/03/es-modules-a-cartoon-deep-dive/"&gt;major browsers will support&lt;/a&gt; ES modules, and the Node team is working on adding ES module support to &lt;a href="https://nodejs.org/en/"&gt;Node.js&lt;/a&gt;. Also, &lt;a href="https://www.youtube.com/watch?v=qR_b5gajwug"&gt;ES module integration for WebAssembly&lt;/a&gt; is coming in the next few years. Just imagine modular &lt;a href="https://github.com/teambit/bit"&gt;Bit&lt;/a&gt; UI components composed into your app via CDN…&lt;/p&gt;
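&lt;p&gt;As a small sketch of the mechanics, a module can be loaded straight from a URL with dynamic import; here a data: URL stands in for a CDN address:&lt;/p&gt;

```javascript
// A one-line module; in practice this source would live on a CDN.
const source = 'export const greet = (name) => `Hi, ${name}`;';
const url = 'data:text/javascript,' + encodeURIComponent(source);

// Dynamic import works in modern browsers and in Node alike.
import(url)
  .then((mod) => console.log(mod.greet('CDN'))) // logs "Hi, CDN"
  .catch((err) => console.error('module load failed', err));
```

&lt;p&gt;Swap the data: URL for a real CDN address and the consuming code does not change at all.&lt;/p&gt;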

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://hacks.mozilla.org/2018/03/es-modules-a-cartoon-deep-dive/"&gt;ES modules: A cartoon deep-dive — Mozilla Hacks — the Web developer blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/denoland/deno"&gt;denoland/deno&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  9. Progressive web apps. Still growing.
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://developers.google.com/web/progressive-web-apps"&gt;Progressive web applications&lt;/a&gt; take advantage of the latest technologies to &lt;a href="https://www.smashingmagazine.com/2016/08/a-beginners-guide-to-progressive-web-apps/"&gt;combine the best of web and mobile apps&lt;/a&gt;. Think of it as a website built using web technologies but that acts and feels like an app. Recent advancements in the browser and in the availability of service workers and in the Cache and Push APIs have enabled web developers to allow users to install web apps to their home screen, receive push notifications and even work offline.&lt;br&gt;
Since PWAs provide an intimate user experience and because all network requests can be intercepted through service workers, it is imperative that the app be hosted over HTTPS to prevent man-in-the-middle attacks, which also spells better security. Here’s a great talk by Facebook developer &lt;a href="https://dev.toundefined"&gt;Omer Goldberg&lt;/a&gt; outlining best practices for PWAs.&lt;br&gt;
&lt;/p&gt;
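&lt;p&gt;Registration itself is only a few lines. A hedged sketch, where the '/sw.js' path is hypothetical and the browser-only call is guarded:&lt;/p&gt;

```javascript
// Feature detection kept as a plain function so it is easy to test.
function canUseServiceWorker(env) {
  return Boolean(env) && 'serviceWorker' in env;
}

if (typeof navigator !== 'undefined' && canUseServiceWorker(navigator)) {
  // '/sw.js' is a hypothetical path to your service worker script.
  navigator.serviceWorker
    .register('/sw.js')
    .then((registration) => console.log('SW registered, scope:', registration.scope))
    .catch((err) => console.error('SW registration failed', err));
}
```

&lt;p&gt;The worker script itself is where caching and push handling live; registration just hands it control over requests within its scope.&lt;/p&gt;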
&lt;h2&gt;
  
  
  10. Designer-developer integrations
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MZ8AwG4x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A55RGwH_5D3mIZoVhSCXWOA.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MZ8AwG4x--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://cdn-images-1.medium.com/max/2000/1%2A55RGwH_5D3mIZoVhSCXWOA.gif" alt=""&gt;&lt;/a&gt;&lt;br&gt;
With the rise of &lt;a href="https://dev.to/jonisar/ui-component-design-system-a-developer-s-guide-19fg"&gt;component-driven design systems&lt;/a&gt; meant to enable a &lt;a href="https://blog.bitsrc.io/building-a-consistent-ui-design-system-4481fb37470f"&gt;consistent UI across products and teams&lt;/a&gt;, &lt;a href="https://blog.bitsrc.io/7-tools-for-building-your-design-system-in-2020-452d9c9b3b8e"&gt;new tools have emerged&lt;/a&gt; to bridge the gap between designers and developers. &lt;a href="https://codeburst.io/ui-design-system-and-component-library-where-things-break-d9c55dc6e386"&gt;This is no simple task, however&lt;/a&gt;: while code itself is really the only source of truth (it is what your users actually get), most tools try to bridge the gap from the designer’s end. In this category you can find Framer, Figma, InVision DSM and more.&lt;br&gt;
From the developer’s end, platforms like &lt;a href="https://bit.dev"&gt;Bit.dev&lt;/a&gt; host your next-gen component library and help drive adoption of shared components. The platform provides rendered visualizations of your actual source code so that designers can collaborate with developers and hold discussions over the source code itself, in a visual way.&lt;br&gt;
Another promising idea to take note of is &lt;a href="https://css-tricks.com/what-are-design-tokens/"&gt;design tokens&lt;/a&gt;: placing tokens in your code through which designers can control simple styling aspects (e.g. colors) directly from external collaboration tools. Integrated with platforms like Bit.dev, this can create a tighter workflow than ever before.&lt;br&gt;
&lt;/p&gt;
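&lt;p&gt;A token map can be as simple as a shared object that code turns into CSS custom properties (the token names and values below are illustrative):&lt;/p&gt;

```javascript
// Designers own the values; developers own how they are applied.
const tokens = {
  'color-primary': '#0b72e7',
  'spacing-md': '16px',
};

// Render the token map as CSS custom properties on :root.
function toCssVariables(map) {
  const lines = Object.entries(map).map(
    ([name, value]) => `  --${name}: ${value};`
  );
  return ':root {\n' + lines.join('\n') + '\n}';
}

console.log(toCssVariables(tokens));
```

&lt;p&gt;When a designer edits a value in the collaboration tool, regenerating this one stylesheet updates every component that references the variable.&lt;/p&gt;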
&lt;div class="ltag__link"&gt;
  &lt;a href="/jonisar" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KIXYeytP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://res.cloudinary.com/practicaldev/image/fetch/s--_HzUKqXm--/c_fill%2Cf_auto%2Cfl_progressive%2Ch_150%2Cq_auto%2Cw_150/https://dev-to-uploads.s3.amazonaws.com/uploads/user/profile_image/13629/9979db72-9117-41df-83ca-0404028463e3.jpg" alt="jonisar image"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/jonisar/ui-component-design-system-a-developer-s-guide-19fg" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;UI Component Design System: A Developer’s Guide&lt;/h2&gt;
      &lt;h3&gt;JoniSar ・ Oct 23 '19 ・ 10 min read&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#design&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ui&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#javascript&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#frontend&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/codeburstio/ui-design-system-and-component-library-where-things-break-d9c55dc6e386" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tWTMxjIU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2ApLN3R5sML3dcjAvUZDWtOA.png" alt="Jonathan Saring"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/codeburstio/ui-design-system-and-component-library-where-things-break-d9c55dc6e386" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;UI Design System and Component Library: Where Things Break | by Jonathan Saring | codeburst&lt;/h2&gt;
      &lt;h3&gt;Jonathan Saring ・ &lt;time&gt;Aug 22, 2019&lt;/time&gt; ・ 8 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/bitsrcio/7-tools-for-building-your-design-system-in-2020-452d9c9b3b8e" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tWTMxjIU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2ApLN3R5sML3dcjAvUZDWtOA.png" alt="Jonathan Saring"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/bitsrcio/7-tools-for-building-your-design-system-in-2020-452d9c9b3b8e" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;7 Tools for Building Your Design System in 2020 | by Jonathan Saring | Bits and Pieces&lt;/h2&gt;
      &lt;h3&gt;Jonathan Saring ・ &lt;time&gt;Dec 4, 2019&lt;/time&gt; ・ 11 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


&lt;h2&gt;
  
  
  11. WebAssembly: into the future?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://webassembly.org/"&gt;Web assembly&lt;/a&gt; brings language diversity into web development to cover gaps created by JavaScript. It is defined as a “a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications”.&lt;br&gt;
In his post, &lt;a href="https://dev.toundefined"&gt;Eric Elliott&lt;/a&gt; &lt;a href="https://medium.com/javascript-scene/what-is-webassembly-the-dawn-of-a-new-era-61256ec5a8f6"&gt;elegantly outlines the concept’s benefits&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;An improvement to JavaScript:&lt;/strong&gt; Implement your performance critical stuff in wasm and import it like a standard JavaScript module.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A new language:&lt;/strong&gt; WebAssembly code defines an AST (Abstract Syntax Tree) represented in a &lt;strong&gt;binary format&lt;/strong&gt;. You can &lt;strong&gt;author and debug in a text format&lt;/strong&gt; so it’s readable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A browser improvement:&lt;/strong&gt; &lt;strong&gt;Browsers will understand the binary format&lt;/strong&gt;, which means we’ll be able to compile binary bundles that compress smaller than the text JavaScript we use today. Smaller payloads mean faster delivery. Depending on &lt;strong&gt;compile-time optimization opportunities&lt;/strong&gt;, WebAssembly bundles may run faster than JavaScript, too!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Compile Target:&lt;/strong&gt; A way for other languages to get first-class binary support across the entire web platform stack.
To learn more about this concept, why it’s useful, where it will be used and why it’s not here yet, I suggest &lt;a href="https://medium.com/javascript-scene/why-we-need-webassembly-an-interview-with-brendan-eich-7fb2a60b0723"&gt;this great post&lt;/a&gt; and &lt;a href="https://www.youtube.com/watch?v=aZqhRICne_M&amp;amp;feature=emb_title"&gt;this great video&lt;/a&gt;.
&lt;a href="https://medium.com/javascript-scene/why-we-need-webassembly-an-interview-with-brendan-eich-7fb2a60b0723"&gt;&lt;strong&gt;Why We Need WebAssembly: An Interview with Brendan Eich&lt;/strong&gt; (Brendan Eich &amp;amp; Eric Elliott discuss WebAssembly details)&lt;/a&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/aZqhRICne_M"&gt;
&lt;/iframe&gt;
&lt;/li&gt;
&lt;/ul&gt;
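&lt;p&gt;To ground the idea, here’s a tiny hand-assembled Wasm module exporting an add(a, b) function, instantiated straight from its bytes. The WebAssembly JavaScript API used here runs in modern browsers and in Node:&lt;/p&gt;

```javascript
// A minimal WebAssembly binary: one exported function, add(a, b) = a + b.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section header
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

const wasmModule = new WebAssembly.Module(bytes);       // compile the binary
const instance = new WebAssembly.Instance(wasmModule);  // instantiate it
console.log(instance.exports.add(2, 3)); // 5
```

&lt;p&gt;In practice you’d compile these bytes from C, C++ or Rust rather than write them by hand, then import the result like any other module.&lt;/p&gt;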
&lt;h2&gt;
  
  
  Learn more
&lt;/h2&gt;


&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/bitsrcio/13-top-react-component-libraries-for-2020-488cc810ca49" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jpCwpTfl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2Aw12_5tQWwj3V1nS8wOc3Hg.jpeg" alt="Fernando Doglio"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/bitsrcio/13-top-react-component-libraries-for-2020-488cc810ca49" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;13 Top React Component Libraries for 2020 | by Fernando Doglio | Bits and Pieces&lt;/h2&gt;
      &lt;h3&gt;Fernando Doglio ・ &lt;time&gt;Jun 8, 2020&lt;/time&gt; ・ 13 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="https://medium.com/bitsrcio/11-top-angular-developer-tools-for-2020-3d2621f1e157" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--F8kFRNS---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/2%2AePGCw-LWOt-vRWj4REfBlA.jpeg" alt="Giancarlo Buomprisco"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://medium.com/bitsrcio/11-top-angular-developer-tools-for-2020-3d2621f1e157" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;11 Top Angular Developer Tools for 2020 | by Giancarlo Buomprisco | Bits and Pieces&lt;/h2&gt;
      &lt;h3&gt;Giancarlo Buomprisco ・ &lt;time&gt;Dec 24, 2019&lt;/time&gt; ・ 8 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        Medium
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="https://blog.bitsrc.io/top-10-vuejs-developer-tools-becd61375447" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gPPTQzS9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://miro.medium.com/fit/c/96/96/1%2A0yN1ln4bBjuXhg10DyW-6Q.jpeg" alt="Shanika Wickramasinghe"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://blog.bitsrc.io/top-10-vuejs-developer-tools-becd61375447" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;11 Top VueJS Developer Tools for 2020 | by Shanika Wickramasinghe | Bits and Pieces&lt;/h2&gt;
      &lt;h3&gt;Shanika Wickramasinghe ・ &lt;time&gt;Dec 24, 2019&lt;/time&gt; ・ 8 min read
      &lt;div class="ltag__link__servicename"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KBvj_QRD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/medium_icon-90d5232a5da2369849f285fa499c8005e750a788fdbf34f5844d5f2201aae736.svg" alt="Medium Logo"&gt;
        blog.bitsrc.io
      &lt;/div&gt;
    &lt;/h3&gt;
&lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>react</category>
      <category>javascript</category>
      <category>frontend</category>
      <category>ui</category>
    </item>
    <item>
      <title>Reuse React Components Between Apps Like a Pro</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Wed, 20 Nov 2019 13:58:52 +0000</pubDate>
      <link>https://dev.to/jonisar/reuse-react-components-between-apps-like-a-pro-2a39</link>
      <guid>https://dev.to/jonisar/reuse-react-components-between-apps-like-a-pro-2a39</guid>
      <description>&lt;p&gt;One of the reasons we love React is the truly reusable nature of its components, even compared to other frameworks. Reusing components means you can save time writing the same code, prevent bugs and mistakes, and keep your UI consistent for users across your different applications.&lt;/p&gt;

&lt;p&gt;But reusing React components between apps can be harder than it sounds. In the past, this process involved splitting repositories, boilerplating packages, configuring builds, refactoring our apps and more.&lt;/p&gt;

&lt;p&gt;In this post, I'll show how to &lt;a href="https://bit.dev/" rel="noopener noreferrer"&gt;use Bit&lt;/a&gt; (&lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;) in order to make this process much easier, saving around 90% of the work. It will also let you gradually collect existing components from your apps into a reusable collection for your team to share - &lt;a href="https://bit.dev/collections" rel="noopener noreferrer"&gt;like these ones&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb3sn7l6g011e1vs6l9d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb3sn7l6g011e1vs6l9d.png" alt="reuse react components" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this short tutorial, we'll learn how to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Quickly set up a Bit workspace&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Track and isolate components in your app&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define a zero-config React compiler&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Version and export components from your app&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the components in a new app&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Bonus: Leveraging Bit to modify the component from the consuming app (yes, really), and syncing the changes between the two apps.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Quick Setup
&lt;/h1&gt;

&lt;p&gt;So for this tutorial, we've prepared &lt;a href="https://github.com/teambit/bit-react-tutorial" rel="noopener noreferrer"&gt;an example React app on GitHub&lt;/a&gt; that you can clone.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git clone https://github.com/teambit/bit-react-tutorial
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;bit-react-tutorial
&lt;span class="nv"&gt;$ &lt;/span&gt;yarn 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, go ahead and install Bit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;bit-bin &lt;span class="nt"&gt;-g&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we'll need a remote collection to host the shared components. You can set one up on &lt;a href="https://docs.bit.dev/docs/bit-server" rel="noopener noreferrer"&gt;your own server&lt;/a&gt;, but let's use Bit's free component hub instead. This way our collection can be visualized and shared with our team, which is very useful.&lt;/p&gt;

&lt;p&gt;Quickly head over to &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev and create a free collection&lt;/a&gt;. It should take less than a minute.&lt;/p&gt;

&lt;p&gt;Now return to your terminal and run &lt;code&gt;bit login&lt;/code&gt; to connect your local workspace with the remote collection, where we'll export our components.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cool. Now return to the project you've cloned and init a Bit workspace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit init &lt;span class="nt"&gt;--package-manager&lt;/span&gt; yarn
successfully initialized a bit workspace.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Next, let's track and isolate a reusable component from the app.&lt;/p&gt;

&lt;h1&gt;
  
  
  Track and isolate reusable components
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fad6zwhrb5zhde3ug03tc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fad6zwhrb5zhde3ug03tc.png" alt="reusable-react-component-example" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bit lets you track components in your app and isolates them for reuse, including automatically defining all their dependencies. You can track multiple components using a glob pattern (&lt;code&gt;src/components/*&lt;/code&gt;) or specify a path to a specific component. In this example, we'll use the latter.&lt;/p&gt;

&lt;p&gt;Let's use the &lt;code&gt;bit add&lt;/code&gt; command to track the "product list" component in the app. We'll track it with the ID 'product-list'. Here's &lt;a href="https://bit.dev/bit/react-tutorial/product-list" rel="noopener noreferrer"&gt;an example of how it will look&lt;/a&gt; as a shared component in bit.dev.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit add src/components/product-list
tracking component product-list:
added src/components/product-list/index.js
added src/components/product-list/product-list.css
added src/components/product-list/products.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's run a quick &lt;code&gt;bit status&lt;/code&gt; to verify that Bit successfully tracked all of the component's files. You can use this command at any stage to learn more; it's quite useful!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit status
new components
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit tag --all [version]"&lt;/span&gt; to lock a version with all your changes&lt;span class="o"&gt;)&lt;/span&gt;

     &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; product-list ... ok
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Define a zero-config reusable React compiler
&lt;/h1&gt;

&lt;p&gt;To make sure the component can run outside of the project, we'll tell Bit to define a reusable React compiler for it. This is part of how Bit isolates components for reuse, while saving you the work of having to define a build step for every component.&lt;/p&gt;

&lt;p&gt;Let's import the &lt;a href="https://bit.dev/bit/envs/compilers/react" rel="noopener noreferrer"&gt;React compiler&lt;/a&gt; into your project's workspace. You can find more compilers &lt;a href="https://bit.dev/bit/envs" rel="noopener noreferrer"&gt;here in this collection&lt;/a&gt;, including &lt;a href="https://bit.dev/bit/envs/compilers/react-typescript" rel="noopener noreferrer"&gt;react-typescript&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit import bit.envs/compilers/react &lt;span class="nt"&gt;--compiler&lt;/span&gt;
the following component environments were installed
- bit.envs/react@0.1.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Right now the component may consume dependencies from your project. Bit builds it in an &lt;em&gt;isolated environment&lt;/em&gt; to make sure the process will also succeed in the cloud or in any other project. To build your component, run this command inside your React project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Version and export reusable components
&lt;/h1&gt;

&lt;p&gt;Now let's export the component to your collection. As you can see, you don't need to split your repos or refactor your app. &lt;/p&gt;

&lt;p&gt;First, let's tag a version for the component. Bit lets you version and export individual components, and since it knows about each component's dependents, you can later bump the version of a single component and all its dependents at once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit tag &lt;span class="nt"&gt;--all&lt;/span&gt; 0.0.1
1 component&lt;span class="o"&gt;(&lt;/span&gt;s&lt;span class="o"&gt;)&lt;/span&gt; tagged
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit export [collection]"&lt;/span&gt; to push these components to a remote&lt;span class="s2"&gt;")
(use "&lt;/span&gt;bit untag&lt;span class="s2"&gt;" to unstage versions)

new components
(first version for components)
     &amp;gt; product-list@0.0.1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can run a quick &lt;code&gt;bit status&lt;/code&gt; to verify, if you like, and then export the component to your collection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit &lt;span class="nb"&gt;export&lt;/span&gt; &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;
exported 1 components to &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now head over to your bit.dev collection and see how it looks!&lt;br&gt;
You can &lt;a href="https://docs.bit.dev/docs/tutorials/bit-react-tutorial#preview-the-react-component" rel="noopener noreferrer"&gt;save a visual example for your component&lt;/a&gt;, so you and your team can easily discover, try and use this component later on.&lt;/p&gt;
&lt;h1&gt;
  
  
  Install components in a new app
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq99v2ey4ti5vx9cbyo6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffq99v2ey4ti5vx9cbyo6.png" alt="reuse-react-component-in-new-app" width="597" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create a new React app using create-react-app (or your own).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;npx create-react-app my-new-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Move over to the new app you created.&lt;br&gt;
Install the component from bit.dev:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;yarn add @bit/&amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;.product-list &lt;span class="nt"&gt;--save&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! You can now &lt;a href="https://docs.bit.dev/docs/tutorials/bit-react-tutorial#use-in-your-application" rel="noopener noreferrer"&gt;use the component in your new app&lt;/a&gt;!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you want to use npm, run &lt;code&gt;npm install&lt;/code&gt; once after the project is created, so that a package-lock.json file is created and npm organizes dependencies correctly.&lt;/li&gt;
&lt;/ul&gt;
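
&lt;p&gt;For example, using the installed component could look something like this (a sketch only: the exact package name depends on your username and collection name):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// App.js — hypothetical usage sketch; replace &amp;lt;username&amp;gt; and
// &amp;lt;collection-name&amp;gt; with your own values
import React from 'react';
import ProductList from '@bit/&amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;.product-list';

function App() {
  return &amp;lt;ProductList /&amp;gt;;
}

export default App;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;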

&lt;h1&gt;
  
  
  Modify components from the consuming app
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5c2kp0wggc385upepu6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5c2kp0wggc385upepu6y.png" alt="develop-reusable-react-component" width="590" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's use Bit to &lt;a href="https://docs.bit.dev/docs/tutorials/bit-react-tutorial#modify-the-component" rel="noopener noreferrer"&gt;import the component's source-code&lt;/a&gt; from bit.dev and make some changes, right from the new app.&lt;/p&gt;

&lt;p&gt;First, init a Bit workspace for the new project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And import the component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit import &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;/product-list
successfully imported one component
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is what happened:&lt;/p&gt;

&lt;p&gt;A new top-level components folder is created that includes the component's code, along with its compiled output and node_modules (in this case the node_modules folder is empty, as all of the component's dependencies are peer dependencies taken from the root project).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.bitmap&lt;/code&gt; file was modified to include a reference to the component.&lt;br&gt;
The package.json file was modified to point to the local files rather than the remote package. Your package.json now contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@bit/&amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;.product-list&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;file:./components/product-list&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start your application to make sure it still works. As you'll see, no changes are required: Bit takes care of everything.&lt;/p&gt;

&lt;p&gt;Then, just go ahead and make changes to the code any way you like!&lt;br&gt;
&lt;a href="https://docs.bit.dev/docs/tutorials/bit-react-tutorial#update-the-code" rel="noopener noreferrer"&gt;Here's an example&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now run a quick &lt;code&gt;bit status&lt;/code&gt; to see that the code has changed. Since Bit tracks the source code itself (via a Git extension), it "knows" that the component was modified.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit status
modified components
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit tag --all [version]"&lt;/span&gt; to lock a version with all your changes&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit diff"&lt;/span&gt; to compare changes&lt;span class="o"&gt;)&lt;/span&gt;

     &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; product-list ... ok
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now tag a version and export the component back to bit.dev:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ bit tag product-list
1 component(s) tagged
(use "bit export [collection]" to push these components to a remote)
(use "bit untag" to unstage versions)

changed components
(components that got a version bump)
     &amp;gt; &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;/product-list@0.0.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and...&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit &lt;span class="nb"&gt;export&lt;/span&gt; &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;
exported 1 components to &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can now see the updated version with the changes in bit.dev!&lt;/p&gt;

&lt;h1&gt;
  
  
  Update changes in the first app (checkout)
&lt;/h1&gt;

&lt;p&gt;Switch back to the &lt;code&gt;react-tutorial&lt;/code&gt; app you cloned and exported the component from, and check for updates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit import
successfully imported one component
- updated &amp;lt;username&amp;gt;.&amp;lt;collection-name&amp;gt;/product-list new versions: 0.0.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;bit status&lt;/code&gt; to see that an update is available for &lt;code&gt;product-list&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit status
pending updates
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit checkout [version] [component_id]"&lt;/span&gt; to merge changes&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit diff [component_id] [new_version]"&lt;/span&gt; to compare changes&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="s2"&gt;"bit log [component_id]"&lt;/span&gt; to list all available versions&lt;span class="o"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &amp;lt;username&amp;gt;.react-tutorial/product-list current: 0.0.1 latest: 0.0.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Merge the changes done to the component to your project. The structure of the command is &lt;code&gt;bit checkout &amp;lt;version&amp;gt; &amp;lt;component&amp;gt;&lt;/code&gt;. So you run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;bit checkout 0.0.2 product-list
successfully switched &amp;lt;username&amp;gt;.react-tutorial/product-list to version 0.0.2
updated src/components/product-list/index.js
updated src/components/product-list/product-list.css
updated src/components/product-list/products.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bit performs a git merge. The code from the updated component is now merged into your code.&lt;/p&gt;

&lt;p&gt;Run the application again to verify that it works properly with the updated component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;yarn start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. A change was moved between the two projects. Your application is running with an updated component.&lt;/p&gt;

&lt;p&gt;Happy coding!&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;By making it easier to reuse React components between applications, you can speed up your development with React, keep a consistent UI, prevent bugs and mistakes, and collaborate better as a team over a collection of shared components. It's also a useful way to gradually create a reusable UI component library for your team, without having to stop everything or lose focus. &lt;/p&gt;

&lt;p&gt;Feel free to try it out yourself and &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;explore the project on GitHub&lt;/a&gt;. Happy coding! &lt;/p&gt;

</description>
      <category>javascript</category>
      <category>frontend</category>
      <category>react</category>
      <category>ui</category>
    </item>
    <item>
      <title>UI Component Design System: A Developer’s Guide</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Wed, 23 Oct 2019 12:08:08 +0000</pubDate>
      <link>https://dev.to/jonisar/ui-component-design-system-a-developer-s-guide-19fg</link>
      <guid>https://dev.to/jonisar/ui-component-design-system-a-developer-s-guide-19fg</guid>
      <description>&lt;p&gt;Component design systems let teams collaborate to introduce a &lt;a href="https://blog.bitsrc.io/building-a-consistent-ui-design-system-4481fb37470f" rel="noopener noreferrer"&gt;consistent user visual and functional experience&lt;/a&gt; across different products and applications.&lt;/p&gt;

&lt;p&gt;On the designer's side, a predefined style guide and a set of reusable master components enable a consistent design and brand across all the different products built by the organization. This is why great teams like &lt;a href="https://eng.uber.com/introducing-base-web/" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;, &lt;a href="https://airbnb.design/building-a-visual-language/" rel="noopener noreferrer"&gt;Airbnb&lt;/a&gt;, &lt;a href="https://polaris.shopify.com/" rel="noopener noreferrer"&gt;Shopify&lt;/a&gt; and many others work so hard to build one.&lt;/p&gt;

&lt;p&gt;On the developer's side, a &lt;a href="https://stg.bit.dev/design-system" rel="noopener noreferrer"&gt;reusable set of components&lt;/a&gt; helps to standardize front-end development across different projects, save time building new apps, reduce maintenance overhead and provide easier onboarding for new team members.&lt;/p&gt;

&lt;p&gt;Most importantly, on the user's side, a successful component design system means less confusion, better navigation of your products, warm and fuzzy brand-familiarity feeling and better overall satisfaction and happiness. For your business, this means better results.&lt;/p&gt;

&lt;p&gt;But building a successful design system can be trickier than you might think. Bridging the gap between designers and developers is no simple task, both while building your system and over time. In this post, we’ll walk through the fundamentals of successfully building a component design system, using it across projects and products, and growing a thriving and &lt;a href="https://blog.bitsrc.io/getting-adoption-for-design-systems-a-practical-guide-cde86ee9bf40" rel="noopener noreferrer"&gt;collaborative component ecosystem within the organization&lt;/a&gt; that brings everyone together. We’ll also introduce some shiny modern tools that can help you build it. Please feel free to comment below, ask anything, or share from your own experience! &lt;/p&gt;

&lt;h1&gt;
  
  
  Bridging the gap between design and development through components
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfaap481lgylikf7tb82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmfaap481lgylikf7tb82.png" alt="Component design systemst" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When building your system you will face several challenges. The first is achieving true collaboration between &lt;a href="https://blog.bitsrc.io/let-everyone-in-your-company-see-your-reusable-components-270cd3213fe9" rel="noopener noreferrer"&gt;designers, developers and everyone else&lt;/a&gt; (product, marketing etc). This is hard. Designers use tools like Photoshop and Sketch, which are built for generating “flat” visual assets that don’t translate into the real code developers will use. Tools like &lt;a href="https://www.framer.com/" rel="noopener noreferrer"&gt;Framer&lt;/a&gt; aim to bridge this gap on the designer’s side.&lt;/p&gt;

&lt;p&gt;Developers work with Git (and GitHub) and use different languages and technologies (such as component-based frameworks: React, Vue etc) and have to translate the design into code as the source of truth of the design’s implementation. Tools like &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; turn real components written in your codebase into a visual and collaborative design system (&lt;a href="https://bit.dev/collections" rel="noopener noreferrer"&gt;examples&lt;/a&gt;), making it easy to reuse and update components across apps, and visualizing them for designers.&lt;/p&gt;

&lt;p&gt;Modern components are the key to bridging this gap. They function as both visual UI design elements as well as encapsulated and reusable functional units that implement UX functionality that can be used and standardized across different projects in your organization’s codebase. &lt;/p&gt;
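
&lt;p&gt;As a concrete (and purely illustrative) example, a design-system button encapsulates both the visual style and the behavior in one reusable unit; the names below are made up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Button.js — a hypothetical design-system component: it encapsulates
// the visual style (via its CSS classes) and the behavior (click
// handling, disabled state) in a single reusable unit
import React from 'react';

export function Button({ variant = 'primary', disabled, onClick, children }) {
  return (
    &amp;lt;button
      className={`ds-button ds-button--${variant}`}
      disabled={disabled}
      onClick={onClick}
    &amp;gt;
      {children}
    &amp;lt;/button&amp;gt;
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;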

&lt;p&gt;&lt;a href="https://bit.dev/components" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo75adqf0n5tkkcrr8e8n.gif" alt="Alt Text" width="720" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To bridge the gap, you’d have to let designers and other non-coding stakeholders collaborate over the source of truth, which is code. You can use &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; or similar tools to bridge this gap and build a collaborative component economy where developers can easily &lt;a href="https://blog.bitsrc.io/getting-adoption-for-design-systems-a-practical-guide-cde86ee9bf40" rel="noopener noreferrer"&gt;build, distribute and adopt components&lt;/a&gt; while designers and everyone else can collaborate to build and align the design implementation of components across applications.&lt;/p&gt;

&lt;h1&gt;
  
  
  Choosing your stack and tools
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://medium.com/memory-leak/introducing-redpoints-design-and-front-end-engineering-landscape-ab377302a164" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvb0ovbgkfao2jldbfhj.jpeg" alt="design-system-landscape" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The choice of technologies and tools is a major key in the success of your design system. We’ll try to narrow it down to a few key choices you’d have to make along the way:&lt;/p&gt;

&lt;h4&gt;
  
  
  Framework or no framework?
&lt;/h4&gt;

&lt;p&gt;Modern frameworks like React, Vue and Angular provide an environment where you can build components and build applications with them. Whether you choose a view library or a full-blown MVC framework, you can start building your components with a mature and extensive toolchain and community behind you. However, such frameworks might not be future-proof, and can limit the reuse and standardization of components across different platforms, stacks and use-cases.&lt;/p&gt;

&lt;p&gt;Another way to go is &lt;a href="https://blog.bitsrc.io/9-web-component-ui-libraries-you-should-know-in-2019-9d4476c3f103" rel="noopener noreferrer"&gt;framework-agnostic web components&lt;/a&gt;: custom components and widgets built on the Web Component standards that work across modern browsers and can be used with any JavaScript library or framework that works with HTML.&lt;/p&gt;
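
&lt;p&gt;A minimal sketch of such a framework-agnostic component, using the standard Custom Elements API (the element name here is made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// A hypothetical widget built on the Custom Elements standard; once
// defined, &amp;lt;brand-badge&amp;gt; works in plain HTML or inside any framework
class BrandBadge extends HTMLElement {
  connectedCallback() {
    this.textContent = this.getAttribute('label') || 'badge';
    this.style.padding = '2px 8px';
    this.style.borderRadius = '4px';
  }
}

customElements.define('brand-badge', BrandBadge);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;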

&lt;p&gt;This means more reuse, better stability, abstraction and standardization, less work, and pretty much everything else that comes with better modularity. While many people are waiting on projects like WebAssembly, in the past year &lt;a href="https://blog.bitsrc.io/7-tools-for-developing-web-components-in-2019-1d5b7360654d" rel="noopener noreferrer"&gt;we've seen new tools and technologies&lt;/a&gt; rise to bring the future today.&lt;/p&gt;

&lt;p&gt;The core concept of a standardized component system that works everywhere &lt;a href="https://hackernoon.com/7-frontend-javascript-trends-and-tools-you-should-know-for-2020-fb1476e41083" rel="noopener noreferrer"&gt;goes naturally well with the core concept of web components&lt;/a&gt;, so don’t be quick to overlook it despite the less mature ecosystem existing around it today.&lt;/p&gt;

&lt;h4&gt;
  
  
  Component library or no library?
&lt;/h4&gt;

&lt;p&gt;Building a component library is basically a way to reduce the overhead that comes with maintaining multiple repositories for multiple components. Instead, you group multiple components into one repository and distribute it like a multi-song CD music album. &lt;/p&gt;

&lt;p&gt;The tradeoff? App developers (component consumers) can’t use, update or modify the individual components they need, and they struggle with coupling the development of their products to that of the library. Component collaboration platforms like &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; can greatly mitigate this pain by sharing your library as a “playlist”-like system of components that people can easily discover, use, update and collaborate over across projects and teams. Every developer can share, find, use and update components right from their projects.&lt;/p&gt;

&lt;p&gt;Most larger organizations implement a library (&lt;a href="https://blog.bitsrc.io/11-react-component-libraries-you-should-know-178eb1dd6aa4" rel="noopener noreferrer"&gt;examples&lt;/a&gt;) to consolidate the development of their components, centralize development workflows around one project and control changes. In today's ecosystem, it’s hard to scale component-based design systems without libraries, mostly due to development workflows (PRs, issues, deployment etc). In the future, we might see more democratized component economies where everyone can freely share and collaborate.&lt;/p&gt;

&lt;p&gt;When building your library you effectively build a multi-component monorepo. &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Open-source tools like bit-cli can help&lt;/a&gt; you isolate each component, automatically define all its dependencies and environments, test and build it in isolation, and share it as a standalone reusable unit. It also lets app-developers import and suggest updates to components right from their own projects, to increase the adoption of shared components.&lt;/p&gt;

&lt;h4&gt;
  
  
  Component discoverability and visualization
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa92ty8goncjotq1karfx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa92ty8goncjotq1karfx.png" alt="Component design systems examples" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When building and distributing components you must create a way for other developers, and for non-developers collaborating with you, to discover and learn exactly which components you have, what they look like, how they behave in different states and how to use them.&lt;/p&gt;

&lt;p&gt;If you work with tools like Bit, you get this out of the box, as all your components &lt;a href="https://bit.dev/collections" rel="noopener noreferrer"&gt;are visualized in a design system made from your actual components&lt;/a&gt;. Developers can use and develop components from the same place where designers, marketers and product managers view and monitor them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bit.dev/collections" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feaf1eaposw72qg37g2e3.gif" alt="Component design systems" width="600" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If not, you can create your own documentation portal or leverage tools like &lt;a href="https://storybook.js.org/" rel="noopener noreferrer"&gt;Storybook&lt;/a&gt; to organize the visual documentation of the components you develop. Either way, without making components visually discoverable, it will be hard to achieve true reusability and collaboration.&lt;/p&gt;

&lt;h1&gt;
  
  
  Building your design system: top-down vs. bottom-up
&lt;/h1&gt;

&lt;p&gt;There are two ways to build a component design system. Choosing the right one mostly depends on who you are and what you need to achieve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design first, then implement reusable components
&lt;/h3&gt;

&lt;p&gt;The first, mostly used by larger organizations that need to standardize UX/UI and development across multiple teams and products, is to &lt;strong&gt;design components first&lt;/strong&gt; and then make sure this design is implemented as components (often building a library) and used everywhere. &lt;/p&gt;

&lt;p&gt;A super over-simplified structure of this workflow looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a visual language and design components&lt;/li&gt;
&lt;li&gt;Implement components in a git-based project in GitHub/Gitlab etc&lt;/li&gt;
&lt;li&gt;Distribute using component-platforms like Bit and/or to package managers&lt;/li&gt;
&lt;li&gt;Standardize instances of components across projects and apps&lt;/li&gt;
&lt;li&gt;Collaboratively monitor, update and evolve components (using Bit or other tools)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code first, then collect components into a design system
&lt;/h3&gt;

&lt;p&gt;The second, often used by smaller and younger teams or startups, is to &lt;strong&gt;build first&lt;/strong&gt; and then collect existing components from your apps into one system, align the design, and keep going from there. This approach saves the time consumed by a dedicated design-system project, time which startups often can’t afford to spend. &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;bit-cli&lt;/a&gt; introduces the ability to virtually isolate components from existing repositories, build and export each of them individually as a standalone reusable unit, and collect them into one visual system made of your real code. So, you can probably use it to collect your components into a system in a few hours, without having to refactor, split or configure anything.&lt;/p&gt;

&lt;p&gt;This workflow looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Isolate and collect components already existing in your apps into one collection (Bit is useful)&lt;/li&gt;
&lt;li&gt;Bring in designers and other stakeholders to learn what you have and introduce your visual language into this collection&lt;/li&gt;
&lt;li&gt;Update components across projects to align to your new collection&lt;/li&gt;
&lt;li&gt;Use these components to build more products and apps&lt;/li&gt;
&lt;li&gt;Collaboratively monitor, update and evolve components (using Bit or other tools)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Design systems and atomic design
&lt;/h4&gt;

&lt;p&gt;By comparing components and their composition to atoms, molecules, and organisms, we can think of the design of our UI as a composition of self-contained modules put together.&lt;/p&gt;

&lt;p&gt;Atomic Design helps you &lt;a href="https://blog.bitsrc.io/atomic-design-and-ui-components-theory-to-practice-f200db337c24" rel="noopener noreferrer"&gt;create and maintain robust design systems&lt;/a&gt;, allowing you to roll out higher quality, more consistent UIs faster than ever before. &lt;/p&gt;

&lt;p&gt;Learn more in this post: &lt;a href="https://blog.bitsrc.io/atomic-design-and-ui-components-theory-to-practice-f200db337c24" rel="noopener noreferrer"&gt;Atomic Design and UI Components: Theory to Practice&lt;/a&gt;.&lt;/p&gt;
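
&lt;p&gt;Atomic design maps naturally onto plain code. Here is a minimal, framework-free sketch (the component names and string output are invented purely for illustration) of atoms composing into a molecule, which composes into an organism:&lt;/p&gt;

```javascript
// Atoms: the smallest self-contained units (invented examples).
const button = (label) => `[button: ${label}]`;
const input = (placeholder) => `[input: ${placeholder}]`;

// Molecule: a small composition of atoms.
const searchBox = () => `${input('Search')} ${button('Go')}`;

// Organism: a larger UI section composed of molecules and atoms.
const header = (title) => `[header: ${title} | ${searchBox()}]`;

console.log(header('My App'));
```

&lt;p&gt;The same layering applies whether the "atoms" are React components, Vue components, or web components.&lt;/p&gt;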

&lt;h1&gt;
  
  
  Collaboratively manage and update components
&lt;/h1&gt;

&lt;p&gt;Your design system is a living creature that changes as its environment does. The design might change, and so should the components. Components might change to fit new products, and so should the design. So, you must think of this process as a two-way collaborative workflow.&lt;/p&gt;

&lt;h4&gt;
  
  
  Controlling component changes across projects
&lt;/h4&gt;

&lt;p&gt;When a component is used in two or more projects, sooner or later you will have to change it. So, you should be able to update a component from one project to another, consolidate code changes, and update all dependent components impacted by the change.&lt;/p&gt;
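
&lt;p&gt;The impact analysis described above boils down to a reverse walk over the component dependency graph. This is only an illustrative sketch with invented component names, not how any particular tool implements it:&lt;/p&gt;

```javascript
// Which components depend on which (hypothetical data).
const dependsOn = {
  'app-header': ['search-box', 'logo'],
  'search-box': ['button', 'input'],
  'login-form': ['button', 'input'],
  'logo': [],
  'button': [],
  'input': [],
};

// Fixed-point loop: keep adding components until no new ones
// (transitively) depend on the changed component.
function impactedBy(changed, graph) {
  const impacted = new Set();
  let grew = true;
  while (grew) {
    grew = false;
    for (const [name, deps] of Object.entries(graph)) {
      if (impacted.has(name)) continue;
      if (deps.includes(changed) || deps.some((d) => impacted.has(d))) {
        impacted.add(name);
        grew = true;
      }
    }
  }
  return [...impacted].sort();
}

console.log(impactedBy('button', dependsOn));
// logs: [ 'app-header', 'login-form', 'search-box' ]
```

&lt;p&gt;Updating a shared component safely means running the tests of everything this walk returns before tagging a new version.&lt;/p&gt;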

&lt;p&gt;If you are using &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; this is fairly easy. You can import a component into any project, make changes, and update them as a new version. Since Bit “knows” exactly which other components depend on this component in different projects, you can update all of them at once and verify that nothing breaks before updating. Since Bit extends Git, you can merge the changes across projects just like you do in a single repository. All the changes will be visually available to view and monitor in your shared &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; component collection.&lt;/p&gt;

&lt;p&gt;If not, things become trickier: your component infrastructure team will have to enforce library updates on every project using those libraries, which impairs flexibility, creates friction and makes it hard to achieve true standardization through adoption. Harder, but not impossible; here is &lt;a href="https://medium.com/walmartlabs/how-to-achieve-reusability-with-react-components-81edeb7fb0e0" rel="noopener noreferrer"&gt;how Walmart Labs do it&lt;/a&gt;. You will also have to make sure that changes to both code and design stay aligned across your design tools and library docs, to avoid misunderstandings and mistakes.&lt;/p&gt;

&lt;h1&gt;
  
  
  Grow a component ecosystem in your organization
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4iy52k0jeekpifei8qqh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4iy52k0jeekpifei8qqh.png" alt="component-economyt" width="700" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building a design system is really about growing a component ecosystem in your organization. This means that managing components isn’t a one-way street; you have to include the app-builders (component consumers) in this new economy, so that they will actually use the components you build in their applications and products.&lt;/p&gt;

&lt;p&gt;Share components that people can easily find and use. Let them collaborate, and make it easy and fun to do so. Don’t force developers to install heavy libraries or dive too deep into your library just to make a small pull request. Don’t make it hard for designers to learn exactly which components change over time, and make it easy for them to collaborate in the process.&lt;/p&gt;

&lt;p&gt;Your component design system is a &lt;strong&gt;living and breathing organism&lt;/strong&gt; that grows and evolves over time. If you try to force it on your organization, it might die. Instead, prefer the democratization of components, their development and their design. Regulate this process to achieve standardization, but don’t block or impair adoption at any cost. &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; is probably the most prominent power tool here too, but if you know of others, please do share.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Design systems help to create consistency in the visual and functional experience you give your users, while forming your brand across different products and applications. Modern components, with or without a framework, let you implement this system as a living set of building blocks that can and should be shared across projects to standardize and speed up development.&lt;/p&gt;

&lt;p&gt;As designers and developers use different tools, it’s critical to bring them together over a single source of truth, which is really your code, since this is what your users actually experience. A democratized and collaborative process between developers, designers, product managers, marketers and everyone else is the only way to grow a thriving and sustainable component ecosystem that breathes life into your design system.&lt;/p&gt;

&lt;p&gt;Modern tools built for this purpose, such as &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; and others (&lt;a href="https://www.framer.com" rel="noopener noreferrer"&gt;FramerX&lt;/a&gt; and &lt;a href="https://builderx.io/" rel="noopener noreferrer"&gt;BuilderX&lt;/a&gt; are also interesting on the designer’s end) can be used to build, distribute and collaborate over components to turn your design system into a consistent and positive user experience everywhere, and to manage and collaborate over components across teams within the organization.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>design</category>
      <category>ui</category>
      <category>javascript</category>
      <category>frontend</category>
    </item>
    <item>
      <title>Do You Still Need a Component Library?</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Thu, 08 Aug 2019 12:46:48 +0000</pubDate>
      <link>https://dev.to/jonisar/do-you-still-need-a-component-library-28kn</link>
      <guid>https://dev.to/jonisar/do-you-still-need-a-component-library-28kn</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;Let's rethink the way we share components to build our applications&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Today, front-end components in React, Vue and Angular let us compose applications through modular UI building blocks. A couple of years from now, &lt;a href="https://hackernoon.com/7-frontend-javascript-trends-and-tools-you-should-know-for-2020-fb1476e41083" rel="noopener noreferrer"&gt;framework-agnostic web components&lt;/a&gt; will take this to the next level.&lt;/p&gt;

&lt;p&gt;Yet, up until 2018 the way we shared and reused modular components wasn't very different from the way we shared entire projects. If we wanted to share a component from one repository to another, we would have to create a new repository to host it, move the code there, boilerplate it as a package, publish it, and install it as a dependency in the new project.&lt;/p&gt;

&lt;p&gt;That process is very hard to scale when it comes to smaller atomic components. It wasn't meant for components, it was meant for projects.&lt;/p&gt;

&lt;p&gt;So, teams began to struggle with sharing components, trying to reduce the overhead around the process. This often led to the creation of projects called "shared component libraries" (&lt;a href="https://hackernoon.com/23-best-react-ui-component-libraries-and-frameworks-250a81b2ac42" rel="noopener noreferrer"&gt;example&lt;/a&gt;) which are basically a single project with many components.&lt;/p&gt;

&lt;p&gt;But, in 2018 a new kind of sharing became possible: sharing components directly between projects, synced through a remote cloud-based collection. This was made possible thanks to a new open-source project called &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Bit&lt;/a&gt;, built for sharing smaller modules between larger projects.&lt;/p&gt;

&lt;p&gt;In this post, we'll try to explore the question "Do I still need a component library?" and present the pros and cons of different component-sharing workflows. Let's dive in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pros and cons of a component library
&lt;/h2&gt;

&lt;p&gt;To better understand &lt;a href="https://blog.bitsrc.io/do-we-really-use-reusable-components-959a252a0a98" rel="noopener noreferrer"&gt;if a component library is the right choice&lt;/a&gt;, let's briefly review the pros and cons of building a component library. In short, the answer is: it depends :)&lt;/p&gt;

&lt;h3&gt;
  
  
  Pros of a component library
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Instead of setting up 30 more repositories for 30 more components, you can just have 1 external repository to host all 30 components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consolidate the development of shared components into one project: PRs, Issues etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Assign a clear owner to the components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enforcement of stacks and standards (double-edged sword).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Basically, the main advantage of a component library depends on the perspective. Compared to a repo-per-component approach, it saves overhead and consolidates the development and consumption of components into one repository and package. However, this can also be a downside. Let's review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pains of a component library
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If the components are internal to your apps, it will require heavy refactoring to move them to the library.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consumers just need a single component, yet they are forced to install a whole library. &lt;a href="https://github.com/lerna/lerna" rel="noopener noreferrer"&gt;Lerna&lt;/a&gt; can help publish each component, but the overhead is heavy for many components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How will you version and update individual components? &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Discoverability for components is poor, so you have to invest in docs portals and maybe add tools like &lt;a href="https://github.com/storybookjs/storybook" rel="noopener noreferrer"&gt;Storybook&lt;/a&gt; or &lt;a href="https://codesandbox.io/" rel="noopener noreferrer"&gt;CodeSandbox&lt;/a&gt;. Still, how can you search for a button component with X dependencies and only Y kb in bundle size? (see &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; below).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Component consumers can't make changes to the components without diving into the library and making a PR, then waiting for it to maybe get accepted. This often blocks the adoption of such libraries inside organizations. For many teams, this alone becomes a breaking point between the infra team building the library and the app developers consuming it. Collaboration over the components suffers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You enforce styles and other choices that don't fit every consuming app's use case, blocking the adoption of the library.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You make it hard to handle dependencies between components: when you change a component, it's hard to tell which other components (in the library and elsewhere) are affected and how.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need to commit to additional tooling around the library to relieve some of the pains (basic discoverability, individual publishing etc).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A component library can be compared to a music album on CD-ROM (those of you over 25 will remember :). It's a static medium you carry around with you, holding ~30 items. You have to read the cover to learn what's inside, and you can't search for songs. You also can't change the content without burning the CD again. Over time, it takes some damage from ad-hoc adjustments and starts to wear off. Collaboration across teams is very difficult with libraries, which often fail to get adopted at scale.&lt;/p&gt;

&lt;p&gt;But, what if instead of a component CD album we can have a "component iTunes" - where we can easily share, discover, consume and update individual components from different projects? Keep reading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sharing components in the cloud
&lt;/h2&gt;

&lt;p&gt;In 2018 an &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;open-source project called Bit&lt;/a&gt; was first introduced on GitHub. &lt;/p&gt;

&lt;p&gt;Unlike the project-oriented tools we use for our projects (Git repos, package managers etc), Bit was built for atomic components.&lt;/p&gt;

&lt;p&gt;It lets us share JavaScript code between projects without having to set up more external repositories to do so (though we can if we want to; it also works for sharing code from a library to other projects). It manages changes to both source code and dependencies across projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; is Bit's component hub. Like GitHub, it's free for open-source too (and for some private code). Through bit.dev, components become available to discvoer, use and sync across projects and teams.&lt;/p&gt;

&lt;p&gt;Let's quickly review.&lt;/p&gt;

&lt;h3&gt;
  
  
  Isolation and publishing
&lt;/h3&gt;

&lt;p&gt;When it comes to front-end components, Bit lets us automatically isolate components from a project (app or library) and wrap them in a contained environment that lets them run in other projects, out of the box. This environment contains all the files of the component, all its dependencies, and the configuration it needs to build and run outside of the project.&lt;/p&gt;

&lt;p&gt;This means we can individually share multiple components from a given project in little time, with zero to very little refactoring.&lt;/p&gt;

&lt;p&gt;Bit handles each component's versions and dependencies while extending Git to track changes to its source code, across projects.&lt;/p&gt;
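
&lt;p&gt;Conceptually, isolating a component means collecting its own files plus the files of every transitive dependency. This toy sketch (the registry and file names are invented, and Bit itself also captures build configuration and environments) shows the core graph walk:&lt;/p&gt;

```javascript
// A made-up registry: each component lists its files and dependencies.
const components = {
  button: { files: ['button.js', 'button.css'], deps: ['theme'] },
  theme:  { files: ['theme.js'],                deps: [] },
  navbar: { files: ['navbar.js'],               deps: ['button'] },
};

// Depth-first walk: gather the component's files, then recurse into
// each dependency, skipping anything already collected.
function isolate(name, registry, seen = new Set()) {
  if (seen.has(name)) return [];
  seen.add(name);
  const { files, deps } = registry[name];
  return files.concat(deps.flatMap((d) => isolate(d, registry, seen)));
}

console.log(isolate('navbar', components));
// Everything navbar needs to run outside its home project.
```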

&lt;h3&gt;
  
  
  Discoverability for components
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hrrfm75zunmyktns7li.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hrrfm75zunmyktns7li.png" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Through &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; the components you share become discoverable to yourself and others to find, learn about and choose from.&lt;/p&gt;

&lt;p&gt;You can semantically &lt;strong&gt;search for components&lt;/strong&gt; by name, and filter results based on context-relevant labels, dependencies, bundle size and more useful parameters.&lt;/p&gt;
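
&lt;p&gt;That kind of filtered search is easy to picture as a query over component metadata. A hedged sketch with an invented catalog (not bit.dev's actual data model):&lt;/p&gt;

```javascript
// Hypothetical component metadata, roughly the parameters mentioned above.
const catalog = [
  { name: 'button',   labels: ['ui', 'form'],  sizeKb: 2 },
  { name: 'datagrid', labels: ['ui', 'table'], sizeKb: 48 },
  { name: 'tooltip',  labels: ['ui'],          sizeKb: 3 },
];

// Filter by label and maximum bundle size, return matching names.
const search = (items, query) =>
  items.filter((c) => c.labels.includes(query.label))
       .filter((c) => query.maxKb >= c.sizeKb)
       .map((c) => c.name);

console.log(search(catalog, { label: 'ui', maxKb: 10 }));
// logs: [ 'button', 'tooltip' ]
```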

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs30jz6x74xikix6cwyi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqs30jz6x74xikix6cwyi.png" width="787" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq074zc5kaq0gzcc3gqa5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq074zc5kaq0gzcc3gqa5.png" width="794" height="583"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can quickly browse through components with visual snapshots, and when you go into a component's page you can try it hands-on in a live playground before using it in your project. You can also view the API docs, automatically parsed from the code, to learn how it works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjocdoll1i95es7q9zw5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjocdoll1i95es7q9zw5.png" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Through &lt;a href="https://bit.dev" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt; components are visualized so that developers, product, designers and other stakeholders can collaborate and have universal access to all the components within the organization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Component consumption and collaboration
&lt;/h3&gt;

&lt;p&gt;Once you find a component you like, shared by your team or the community for example, you can install it using package managers like npm and yarn.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r9sz3pb985nl38z3nuq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r9sz3pb985nl38z3nuq.png" width="478" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Updating components right from the consuming project...
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawcucbw8dcipmxutas85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawcucbw8dcipmxutas85.png" width="511" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bit also lets you &lt;code&gt;bit import&lt;/code&gt; a component (or a whole collection) to a new project. This means Bit will bring the component's actual source-code into the repository, while tracking the changes you make.&lt;/p&gt;

&lt;p&gt;You can then change something in the code, maybe a style for example, and tag the component with a new version. You can then share the new version back to the collection, and even pull the changes into any other repository the component is used in, while leveraging Git to merge the changes between the versions.&lt;/p&gt;

&lt;p&gt;Simply put, this means you can very quickly update a component right from your consuming app, so you don't have to dive into the library and wait on long PRs. While it requires some rules for collaboration (for example, choosing who can push new versions into the collection on bit.dev), it also means people can adopt the components and fit them to their needs. Otherwise, the component might just not be used (or just get copy-pasted and changed without anyone ever knowing about it :).&lt;/p&gt;

&lt;h2&gt;
  
  
  Component library + bit.dev together?
&lt;/h2&gt;

&lt;p&gt;Given the advantages of both approaches, many choose to combine their component library with the advantages of &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; and &lt;a href="http://bit.dev/" rel="noopener noreferrer"&gt;bit.dev&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;In this structure, the library functions as a development and staging area for the shared components. Bit and bit.dev are used to share the components, make them discoverable, and enable collaboration on top of them to drive their adoption in the real world.&lt;/p&gt;

&lt;p&gt;The best choice depends on your needs. For larger organizations with infra teams publishing components while other teams consume them, it's recommended to combine both: develop all components owned by the infra team in their repo, and make all of them individually discoverable to find, use and, given simple regulation, update as needed.&lt;/p&gt;

&lt;p&gt;For smaller teams or single developers trying to share a component between a couple of applications, a library might be overkill. You can just share components through your bit.dev collection, from one application to another, and keep them synced. You won't even need to refactor anything or add additional repositories to maintain.&lt;/p&gt;

&lt;p&gt;Bottom line, it's really up to you :)&lt;/p&gt;

&lt;p&gt;Cheers&lt;/p&gt;

</description>
      <category>react</category>
      <category>angular</category>
      <category>vue</category>
      <category>javascript</category>
    </item>
    <item>
      <title>When Writing Code Meets The Marshmallow Test</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Mon, 29 May 2017 14:40:50 +0000</pubDate>
      <link>https://dev.to/jonisar/when-writing-code-meets-the-marshmallow-test</link>
      <guid>https://dev.to/jonisar/when-writing-code-meets-the-marshmallow-test</guid>
      <description>&lt;p&gt;One of my favorite videos around the web is this video of kids trying to resist eating a marshmallow after being promised a greater reward if they can hold on for a short while:&lt;/p&gt;

&lt;p&gt;This isn't only a funny video. It's also a fascinating experiment in human psychology that demonstrates one of our most important cognitive biases: &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Time_preference" rel="noopener noreferrer"&gt;time preference&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is time preference
&lt;/h1&gt;

&lt;p&gt;Time preference basically means we value short-term rewards more than we value long-term ones. Practically speaking, it means we would rather have a single marshmallow right now than a whole bunch of them later. The longer the time difference, the worse it gets.&lt;/p&gt;

&lt;p&gt;As time preference is itself affected by time (shockingly), studies suggest that older people may suffer less from its pull. To help battle this bias without waiting to become wise elders, here are a few tips on how to identify and overcome it today.&lt;/p&gt;

&lt;h1&gt;
  
  
  Time preference and writing code
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbm8ly6x10prdzvjssqr6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fbm8ly6x10prdzvjssqr6.jpg" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Like many other tasks, writing code often means choosing between short-term and long-term value. Time preference might tempt us to choose immediate satisfaction over long-term value. Enforcing better practices upon ourselves often requires the activation of precious and limited mental resources such as willpower.&lt;/p&gt;

&lt;p&gt;Here are a few suggestions for how and when we can be aware of this bias when writing code, and how we can try to overcome it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test-driven development
&lt;/h2&gt;

&lt;p&gt;We all know about TDD. We all know how important unit tests are. There is little doubt about how they help make sure nothing breaks when we change stuff, and how they affect the overall quality and maintainability of our applications.&lt;/p&gt;

&lt;p&gt;But, "Interrupting" our development flow to write tests in short cycles isn't always psychologically simple. Sometimes we just want to get the job done and worry about long-term stuff later. We can adopt different methodologies, but the curve for adoption often comes with a bit of a struggle.&lt;/p&gt;

&lt;p&gt;There are a few ways we can look at TDD to increase short-term value and satisfaction, balancing the equation a little more in our favor.&lt;/p&gt;

&lt;p&gt;A good example is the fact that tests help to &lt;strong&gt;decide when something is good enough&lt;/strong&gt;. They define the behavioral scope of different components, helping us to better understand what each of them is supposed to do and when it's good enough to be considered "working". This can actually help us &lt;strong&gt;save time&lt;/strong&gt; and stop optimizing things which already hit home. &lt;/p&gt;
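
&lt;p&gt;As a concrete illustration, a small spec written with plain assertions can serve as exactly that "good enough" boundary. The slugify component and its expected cases below are invented for the example:&lt;/p&gt;

```javascript
// An invented component: turn a title into a URL-friendly slug.
function slugify(title) {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')  // collapse non-alphanumeric runs
    .replace(/^-|-$/g, '');       // strip leading/trailing dashes
}

// The spec doubles as the definition of done: once every case
// passes, the component is good enough and we can stop polishing.
const spec = [
  ['Hello World', 'hello-world'],
  ['  Spaces  ', 'spaces'],
  ['Already-slugged', 'already-slugged'],
];
for (const [given, expected] of spec) {
  console.assert(slugify(given) === expected, `${given} failed`);
}
```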

&lt;p&gt;Also, don't be ashamed to take pride in work ethic and quality of practice. Seeing green indicators flash over our code and knowing we put in the effort to create something we're proud of isn't something to be taken lightly. Practice makes perfect, and good practice should make us proud.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design for modularity and reusability
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Feqp8e7cxprb0ovdl12qj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Feqp8e7cxprb0ovdl12qj.png" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Building modular software out of smaller atomic functionalities &lt;a href="https://dev.to/jonisar/coding-in-the-age-of-code-components"&gt;offers many advantages&lt;/a&gt;. Designing our applications with these principles in mind &lt;a href="https://addyosmani.com/first/" rel="noopener noreferrer"&gt;makes for better and more maintainable software&lt;/a&gt;. Still, much like TDD, it might also require some additional thinking and effort right now.&lt;/p&gt;

&lt;p&gt;To help make life easier, we can try to generate short-term value and satisfaction from this practice. Designing with modularity in mind helps us better understand how our application is built and how every component fits into the bigger picture. Such a clear structure can help avoid writing stuff we don't need and helps get things ready for production quicker. Think twice, write once.&lt;/p&gt;

&lt;p&gt;Also, we can and should aim to make our modular components truly reusable as we work, not only by design.&lt;/p&gt;
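
&lt;p&gt;One practical signal that a helper is ready for export is that it no longer references anything app-specific. A small illustration (formatPrice is an invented example) using only built-in APIs, taking everything it needs as arguments instead of reaching into app globals:&lt;/p&gt;

```javascript
// Self-contained and reusable: no app config, no hidden state.
// Assumed defaults (USD, en-US) are just for the example.
function formatPrice(amount, currency = 'USD', locale = 'en-US') {
  return new Intl.NumberFormat(locale, {
    style: 'currency',
    currency,
  }).format(amount);
}

console.log(formatPrice(1999.5));               // "$1,999.50"
console.log(formatPrice(1999.5, 'EUR', 'de-DE'));
```

&lt;p&gt;A helper shaped like this can be dropped into any project, or exported as a standalone component, with zero refactoring.&lt;/p&gt;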

&lt;p&gt;Some add components to a general "util" library they drag across projects. Others keep a "waiting for export" directory, ready for exporting a reusable component every time they create one. Publishing every single component to package managers such as npm can consume much of our day (boilerplating etc.), thus creating more immediate negative value in the equation. To lower the barrier we can also use projects such as &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; and take advantage of the simplicity of export to create a growing arsenal of &lt;a href="https://bitsrc.io" rel="noopener noreferrer"&gt;reusable components&lt;/a&gt; to be shared with the open source community.&lt;/p&gt;

&lt;p&gt;Building an arsenal of open-source work is fun and generates clear visual feedback for our effort. It's also a great way to collaborate with others while getting feedback and improvement suggestions for our work. &lt;br&gt;
Giving to others is something to be proud of, and social metrics or downloads make us (biologically) feel the rush of what we did.&lt;/p&gt;

&lt;p&gt;In the long run, your code will also be easier to maintain and understand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Short documentation cycle
&lt;/h2&gt;

&lt;p&gt;We often think of documentation in the context of explaining our code to the next person who will have to look at it. We know how important that is, and we've all tried to dive into someone else's code wishing they'd taken the extra time to write useful documentation.&lt;/p&gt;

&lt;p&gt;Taking the time to document every component isn't always simple. Future us or future others aren't always our first priority, and we get tempted to overlook it and put our mental resources into getting our code to work.&lt;/p&gt;

&lt;p&gt;There are a few ways to make this process more practical and satisfying in the short term. First, we can create a clear format for documenting different components and modules. Deciding on an exact format for the documentation helps us grasp what we're going to do and how long it will take, lowering the mental barrier to start writing. For example, when documenting a core JavaScript functionality we can work with a checklist of (a) a short description, (b) the signature (arguments, returns), (c) 1-3 examples, and so on. This makes it easier to repeat the process.&lt;/p&gt;
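&lt;p&gt;As a sketch of how that checklist might look in practice, here is a hypothetical utility documented with a JSDoc comment that follows the three steps (the function itself is made up for illustration):&lt;/p&gt;

```javascript
/**
 * (a) Description: counts how many times `char` appears in `str`.
 *
 * (b) Signature:
 * @param {string} str  - the string to search in
 * @param {string} char - a single character to look for
 * @returns {number} the number of occurrences
 *
 * (c) Examples:
 *   countChar('banana', 'a'); // => 3
 *   countChar('', 'a');       // => 0
 */
function countChar(str, char) {
  // split into characters and keep only the matches
  return str.split('').filter((c) => c === char).length;
}
```

Because the format is always the same, writing the next component's docs becomes a quick, repeatable task rather than an open-ended one.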

&lt;p&gt;We can also make sure our documentation builds a logical story. If we know that functionality A leads to B, which together work with C, we can "read" the story of our code, make sure everything makes sense, and check that we didn't add redundant chapters. Of course, modularity means every component will be independently documented. Still, chapters in a story should build upon one another in a logical way as much as possible. If they don't, the docs are a good way to find out.&lt;/p&gt;

&lt;p&gt;Good docs also help when publishing our work to our team or to the open source community, playing well with modularity and reusability. &lt;/p&gt;

&lt;p&gt;Unit tests can also work as part of a component's documentation. For example, if we have a simple &lt;a href="https://bitsrc.io/bit/utils/array/first" rel="noopener noreferrer"&gt;array-first&lt;/a&gt; JavaScript function, the tests can tell us that (a) when the array is empty it returns &lt;code&gt;null&lt;/code&gt; and (b) otherwise it returns the array's first value. The tests then also function as usage examples that help us better understand the different cases our code handles. Two marshmallows with one bite.&lt;/p&gt;
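&lt;p&gt;A minimal sketch of this idea: an illustrative version of an array-first function whose assertions read like usage documentation (the actual bit/utils component linked above may behave differently):&lt;/p&gt;

```javascript
// Returns the first element of an array, or null when there is none.
function first(array) {
  if (!Array.isArray(array) || array.length === 0) return null;
  return array[0];
}

// Tests that double as documentation:
console.assert(first([]) === null);        // (a) empty array => null
console.assert(first([1, 2, 3]) === 1);    // (b) otherwise => first value
console.assert(first(undefined) === null); // non-array input => null
```

Reading the three assertions is enough to understand the component's contract without opening its implementation.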

&lt;p&gt;At the end of the day, these are only suggestions. It's really up to us to know when and how we can gain immediate satisfaction from practices that also create long-term value. Over time, we develop our own understanding of "what works for us", and there isn't one rule that applies to everyone. In many ways, that's part of the beauty of it all. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;"I can resist everything except temptation"&lt;/em&gt;&lt;br&gt;
    - Oscar Wilde&lt;/p&gt;

</description>
      <category>programming</category>
      <category>psychology</category>
      <category>coding</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Coding In The Age Of Code Components</title>
      <dc:creator>Joni Sar</dc:creator>
      <pubDate>Mon, 01 May 2017 09:18:00 +0000</pubDate>
      <link>https://dev.to/jonisar/coding-in-the-age-of-code-components</link>
      <guid>https://dev.to/jonisar/coding-in-the-age-of-code-components</guid>
      <description>&lt;p&gt;This is the age of code components. Web, React, Angular, Vue, and even Node components are the building blocks of pretty much everything these days. &lt;/p&gt;

&lt;p&gt;This makes sense. Software should be built by composing smaller, isolated functionalities together. &lt;a href="https://blog.bitsrc.io/introducing-bit-writing-code-in-the-age-of-code-components-fd8512a9aa90" rel="noopener noreferrer"&gt;Modularity and reusability are key for composability&lt;/a&gt;. When designing software, we should be designing a composition of smaller functionalities.  When I say small, I don't really mean X or Y lines of code. What I do mean is small in the sense that it handles a single focus or responsibility.  &lt;/p&gt;
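&lt;p&gt;One way to picture this composition principle is a handful of tiny, single-responsibility functions piped into a larger behavior (the names here are illustrative, not from any particular library):&lt;/p&gt;

```javascript
// Each function does exactly one thing.
const trim = (s) => s.trim();
const lower = (s) => s.toLowerCase();
const dashify = (s) => s.replace(/\s+/g, '-');

// pipe: composes functions left to right.
const pipe = (...fns) => (x) => fns.reduce((acc, fn) => fn(acc), x);

// The "application" is just a composition of the small parts.
const toSlug = pipe(trim, lower, dashify);

toSlug('  Hello World  '); // => 'hello-world'
```

Each piece stays small enough to test, document, and reuse on its own, while the composed function carries the actual feature.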

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxkgrb64zxr8lx4ayty72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fxkgrb64zxr8lx4ayty72.png" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  On Reusability And Modularity
&lt;/h1&gt;

&lt;p&gt;More and more, particularly when it comes to web components, it seems like designing isolated and reusable components is not only simpler than it used to be, it's sometimes the only &lt;em&gt;right&lt;/em&gt; way to design them.&lt;/p&gt;

&lt;p&gt;So, how come achieving &lt;strong&gt;true reusability&lt;/strong&gt; for code components remains such a challenge? The answer lies not in design, but in the question of &lt;strong&gt;how we create, find and use these components&lt;/strong&gt;. &lt;/p&gt;

&lt;h1&gt;
  
  
  Micro Packages Are Not The Answer
&lt;/h1&gt;

&lt;p&gt;Obviously, we don't want to be copy-pasting components everywhere. Duplications are very bad, and there is no need to elaborate. The problem is, up until now the only alternative to duplicating code was publishing these components as packages, or "micro-packages".&lt;/p&gt;

&lt;p&gt;I don't think small components should become packages. Packages don't make them practically reusable, and they add too much complexity. Here is why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Initial overhead&lt;/strong&gt;: for every small component you would have to create a new repository and the package boilerplate (build, testing, etc.), and somehow make this process practical for a large set of components.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;: modifying a repository and a package takes time and forces you to go through multiple steps such as cloning, linking, debugging, committing, republishing and so on. Build and install times quickly increase and dependency hell always feels near.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Discoverability&lt;/strong&gt;: it’s hard if not impossible to organize and search multiple repositories and packages to quickly find the components you need. People often use different terms to describe the same functionality, and there is no single source of truth to search and trust.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Making reusability practical
&lt;/h1&gt;

&lt;p&gt;I'll start at the end: we built an open source project to solve this problem. It's &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;called Bit&lt;/a&gt;, and it enables you to quickly create reusable components during your workflow, export them to a distributed repository designed for code components called a &lt;a href="https://teambit.github.io/bit/bit-scope.html" rel="noopener noreferrer"&gt;Scope&lt;/a&gt; (which stores, organizes, manages, tests and builds your components), and then use them anywhere across repositories and applications. Components can be used as a virtual API, pulling nothing but the code actually used in your application.&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fbit-assets%2Fgifs%2Fleftpad2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fbit-assets%2Fgifs%2Fleftpad2.gif"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Bit was created to solve the three problems described above. To do so, it had to give components everything packages couldn't. It had to make them quick to create, simple to maintain and easy to find. Designed for code components, Bit introduces some new capabilities to make this not only possible but also practical:&lt;/p&gt;

&lt;p&gt;Bit comes with a reusable and isolated component environment. This environment is also configurable, saving the overhead of creating new boilerplate for new components (which can be done directly within any project you're working on). It also takes care of testing and building your components, using any framework you choose. &lt;/p&gt;

&lt;p&gt;To organize, store and manage your components, Bit uses "Scopes". A Bit Scope is a distributed, virtual layer of components on top of a source-code repository (and outside of it). You can export components to a Scope, where they will be stored, organized and made reusable. Scopes are distributed and can be used to organize components by context, collaborators or other criteria. Scopes make it simpler to maintain a large collection of reusable components, keeping them all in one place while modifying and using them individually.&lt;/p&gt;

&lt;p&gt;Bit also solves the discoverability problem using the Scoping organizational system, and features such as a built-in semantic search engine. &lt;/p&gt;

&lt;h1&gt;
  
  
  Distribution and centralization
&lt;/h1&gt;

&lt;p&gt;Distribution has many advantages, both practical and in keeping things separate from commercial interests. However, centralization has its own advantages, particularly around collaboration and convenience. To make it easier to work and collaborate as a team, we also built a free &lt;a href="https://www.bitsrc.io" rel="noopener noreferrer"&gt;community Hub called bitsrc&lt;/a&gt;. It's free for open source and always will be. &lt;a href="https://www.bitsrc.io/bit/utils" rel="noopener noreferrer"&gt;Here is an example Scope of utility functions&lt;/a&gt; I made with my team. &lt;/p&gt;

&lt;h1&gt;
  
  
  What now?
&lt;/h1&gt;

&lt;p&gt;Bit is working, but it's also a work in progress.&lt;br&gt;
For example, Bit is designed to be language agnostic and uses external drivers to work with different languages. Javascript is the first one we added, and more should soon follow. &lt;/p&gt;

&lt;p&gt;Other features should also be added, such as automatic dependency definition, source code indexing, component quality measurement, semi-automatic semantic versioning and more.&lt;/p&gt;

&lt;p&gt;For now, working with &lt;a href="https://github.com/teambit/bit" rel="noopener noreferrer"&gt;Bit&lt;/a&gt; and/or &lt;a href="https://www.bitsrc.io" rel="noopener noreferrer"&gt;bitsrc&lt;/a&gt; allows us to create, maintain and reuse a growing set of "building blocks" including Web and React components, utility functions, small modules and more. This not only speeds up work and prevents duplication, it also aligns with the basic principles of how software should be composed. One step at a time.&lt;/p&gt;

&lt;p&gt;Feel free to try it out for yourselves; contributions on GitHub are always welcome.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>opensource</category>
      <category>softwareengineering</category>
      <category>webcomponents</category>
    </item>
  </channel>
</rss>
